bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/8] Link-based program attachment to network namespaces
@ 2020-05-27 17:08 Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit Jakub Sitnicki
                   ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Lorenz Bauer, Marek Majkowski, Stanislav Fomichev

One of the pieces of feedback from recent review of BPF hooks for socket
lookup [0] was that new program types should use bpf_link-based attachment.

Unlike with cgroup-attached programs, there is no existing infrastructure
for using bpf_links with network namespaces. It is also not as simple as
updating a single bpf_prog pointer, as in the case of direct program
attachment.

For that reason, I've split this work out into its own series.

The series is organized as follows:

Patches 1-4 prepare a space in struct net to keep state for attached BPF
programs, and massage the code in flow_dissector to make it attach type
agnostic to finally move under kernel/bpf/.

Patch 5, the most important one, introduces a new bpf_link type for
attaching to a network namespace. As opposed to cgroups, we can't hold a
ref count on struct net from a bpf_link, so we get by with just RCU
protection.

I'd be grateful if someone more experienced with RCU could give this
patch a look; it is not exactly a textbook application.

Patches 6-8 make libbpf and bpftool aware of the new link type and add
tests to check that link create/update/query API works as intended.

Thanks to Lorenz and Marek for early feedback & reviews.

-jkbs

Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Stanislav Fomichev <sdf@google.com>

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Jakub Sitnicki (8):
  flow_dissector: Don't grab update-side lock on prog detach from
    pre_exit
  flow_dissector: Pull locking up from prog attach callback
  net: Introduce netns_bpf for BPF programs attached to netns
  flow_dissector: Move out netns_bpf prog callbacks
  bpf: Add link-based BPF program attachment to network namespace
  libbpf: Add support for bpf_link-based netns attachment
  bpftool: Support link show for netns-attached links
  selftests/bpf: Add tests for attaching bpf_link to netns

 include/linux/bpf-netns.h                     |  64 +++
 include/linux/skbuff.h                        |  26 -
 include/net/flow_dissector.h                  |   6 +
 include/net/net_namespace.h                   |   4 +-
 include/net/netns/bpf.h                       |  18 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/net_namespace.c                    | 387 ++++++++++++++
 kernel/bpf/syscall.c                          |  10 +-
 net/core/filter.c                             |   1 +
 net/core/flow_dissector.c                     | 124 +----
 tools/bpf/bpftool/link.c                      |  19 +
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/lib/bpf/libbpf.c                        |  20 +-
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../bpf/prog_tests/flow_dissector_reattach.c  | 500 +++++++++++++++++-
 17 files changed, 1026 insertions(+), 167 deletions(-)
 create mode 100644 include/linux/bpf-netns.h
 create mode 100644 include/net/netns/bpf.h
 create mode 100644 kernel/bpf/net_namespace.c

-- 
2.25.4



* [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-27 17:35   ` sdf
  2020-05-27 17:08 ` [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

When there are no references left to struct net, that is, when
check_net(net) is false, we don't need to synchronize updates to the
flow_dissector prog with the BPF prog attach/detach callbacks. This
allows us to pull locking up from the shared detach routine into the
BPF prog detach callback.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/flow_dissector.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5dceed467f64..73cc085dc2b7 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -164,22 +164,26 @@ static int flow_dissector_bpf_prog_detach(struct net *net)
 {
 	struct bpf_prog *attached;
 
-	mutex_lock(&flow_dissector_mutex);
+	/* No need for update-side lock when net is going away. */
 	attached = rcu_dereference_protected(net->flow_dissector_prog,
+					     !check_net(net) ||
 					     lockdep_is_held(&flow_dissector_mutex));
-	if (!attached) {
-		mutex_unlock(&flow_dissector_mutex);
+	if (!attached)
 		return -ENOENT;
-	}
 	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
 	bpf_prog_put(attached);
-	mutex_unlock(&flow_dissector_mutex);
 	return 0;
 }
 
 int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 {
-	return flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
+	int ret;
+
+	mutex_lock(&flow_dissector_mutex);
+	ret = flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
+	mutex_unlock(&flow_dissector_mutex);
+
+	return ret;
 }
 
 static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
-- 
2.25.4



* [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-27 17:35   ` sdf
  2020-05-27 17:08 ` [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Split out the part of the attach callback that runs with the
attach/detach lock held. This structures the prog attach callback
similarly to prog detach, and opens the door to moving the locking out of
flow_dissector and into generic callbacks for attaching/detaching progs
to netns in subsequent patches.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/flow_dissector.c | 40 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 73cc085dc2b7..4f73f6c51602 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -109,15 +109,10 @@ int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog)
+static int flow_dissector_bpf_prog_attach(struct net *net,
+					  struct bpf_prog *prog)
 {
 	struct bpf_prog *attached;
-	struct net *net;
-	int ret = 0;
-
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
 
 	if (net == &init_net) {
 		/* BPF flow dissector in the root namespace overrides
@@ -130,33 +125,38 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->flow_dissector_prog)) {
-				ret = -EEXIST;
-				goto out;
-			}
+			if (rcu_access_pointer(ns->flow_dissector_prog))
+				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.flow_dissector_prog)) {
-			ret = -EEXIST;
-			goto out;
-		}
+		if (rcu_access_pointer(init_net.flow_dissector_prog))
+			return -EEXIST;
 	}
 
 	attached = rcu_dereference_protected(net->flow_dissector_prog,
 					     lockdep_is_held(&flow_dissector_mutex));
-	if (attached == prog) {
+	if (attached == prog)
 		/* The same program cannot be attached twice */
-		ret = -EINVAL;
-		goto out;
-	}
+		return -EINVAL;
+
 	rcu_assign_pointer(net->flow_dissector_prog, prog);
 	if (attached)
 		bpf_prog_put(attached);
-out:
+	return 0;
+}
+
+int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
+				       struct bpf_prog *prog)
+{
+	int ret;
+
+	mutex_lock(&flow_dissector_mutex);
+	ret = flow_dissector_bpf_prog_attach(current->nsproxy->net_ns, prog);
 	mutex_unlock(&flow_dissector_mutex);
+
 	return ret;
 }
 
-- 
2.25.4



* [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-27 17:40   ` sdf
  2020-05-27 17:08 ` [PATCH bpf-next 4/8] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

In order to:

 (1) attach more than one BPF program type to netns, or
 (2) support attaching BPF programs to netns with bpf_link, or
 (3) support multi-prog attach points for netns

we will need to keep more state per netns than the single pointer we
have now for the BPF flow dissector program.

Prepare for the above by extracting a new struct, netns_bpf, embedded in
struct net, which stores all state related to BPF programs attached to
netns.

Turn the flow dissector callbacks for querying/attaching/detaching a
program into generic ones that operate on netns_bpf. The next patch will
move the generic callbacks into their own module.

This is similar to how it is organized for cgroup with cgroup_bpf.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf-netns.h   | 56 ++++++++++++++++++++++
 include/linux/skbuff.h      | 26 ----------
 include/net/net_namespace.h |  4 +-
 include/net/netns/bpf.h     | 17 +++++++
 kernel/bpf/syscall.c        |  7 +--
 net/core/flow_dissector.c   | 96 ++++++++++++++++++++++++-------------
 6 files changed, 143 insertions(+), 63 deletions(-)
 create mode 100644 include/linux/bpf-netns.h
 create mode 100644 include/net/netns/bpf.h

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
new file mode 100644
index 000000000000..f3aec3d79824
--- /dev/null
+++ b/include/linux/bpf-netns.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_NETNS_H
+#define _BPF_NETNS_H
+
+#include <linux/mutex.h>
+#include <uapi/linux/bpf.h>
+
+enum netns_bpf_attach_type {
+	NETNS_BPF_INVALID = -1,
+	NETNS_BPF_FLOW_DISSECTOR = 0,
+	MAX_NETNS_BPF_ATTACH_TYPE
+};
+
+static inline enum netns_bpf_attach_type
+to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
+{
+	switch (attach_type) {
+	case BPF_FLOW_DISSECTOR:
+		return NETNS_BPF_FLOW_DISSECTOR;
+	default:
+		return NETNS_BPF_INVALID;
+	}
+}
+
+/* Protects updates to netns_bpf */
+extern struct mutex netns_bpf_mutex;
+
+union bpf_attr;
+struct bpf_prog;
+
+#ifdef CONFIG_NET
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr);
+int netns_bpf_prog_attach(const union bpf_attr *attr,
+			  struct bpf_prog *prog);
+int netns_bpf_prog_detach(const union bpf_attr *attr);
+#else
+static inline int netns_bpf_prog_query(const union bpf_attr *attr,
+				       union bpf_attr __user *uattr)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netns_bpf_prog_attach(const union bpf_attr *attr,
+					struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* _BPF_NETNS_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 531843952809..a0d5c2760103 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1283,32 +1283,6 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 			     const struct flow_dissector_key *key,
 			     unsigned int key_count);
 
-#ifdef CONFIG_NET
-int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-				  union bpf_attr __user *uattr);
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog);
-
-int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr);
-#else
-static inline int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-						union bpf_attr __user *uattr)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-						     struct bpf_prog *prog)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
-{
-	return -EOPNOTSUPP;
-}
-#endif
-
 struct bpf_flow_dissector;
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
 		      __be16 proto, int nhoff, int hlen, unsigned int flags);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 8e001e049497..2ee5901bec7a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -33,6 +33,7 @@
 #include <net/netns/mpls.h>
 #include <net/netns/can.h>
 #include <net/netns/xdp.h>
+#include <net/netns/bpf.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
@@ -162,7 +163,8 @@ struct net {
 #endif
 	struct net_generic __rcu	*gen;
 
-	struct bpf_prog __rcu	*flow_dissector_prog;
+	/* Used to store attached BPF programs */
+	struct netns_bpf	bpf;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
new file mode 100644
index 000000000000..a858d1c5b166
--- /dev/null
+++ b/include/net/netns/bpf.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * BPF programs attached to network namespace
+ */
+
+#ifndef __NETNS_BPF_H__
+#define __NETNS_BPF_H__
+
+#include <linux/bpf-netns.h>
+
+struct bpf_prog;
+
+struct netns_bpf {
+	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+};
+
+#endif /* __NETNS_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d13b804ff045..93f329a6736d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -26,6 +26,7 @@
 #include <linux/audit.h>
 #include <uapi/linux/btf.h>
 #include <linux/bpf_lsm.h>
+#include <linux/bpf-netns.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2855,7 +2856,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		ret = lirc_prog_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
-		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
+		ret = netns_bpf_prog_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
@@ -2895,7 +2896,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
-		return skb_flow_dissector_bpf_prog_detach(attr);
+		return netns_bpf_prog_detach(attr);
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2948,7 +2949,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
-		return skb_flow_dissector_prog_query(attr, uattr);
+		return netns_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 4f73f6c51602..5c978c87e6bc 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -31,8 +31,10 @@
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_labels.h>
 #endif
+#include <linux/bpf-netns.h>
 
-static DEFINE_MUTEX(flow_dissector_mutex);
+/* Protects updates to netns_bpf */
+DEFINE_MUTEX(netns_bpf_mutex);
 
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
@@ -70,23 +72,28 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
-int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-				  union bpf_attr __user *uattr)
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
 {
 	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
 	u32 prog_id, prog_cnt = 0, flags = 0;
+	enum netns_bpf_attach_type type;
 	struct bpf_prog *attached;
 	struct net *net;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
 
+	type = to_netns_bpf_attach_type(attr->query.attach_type);
+	if (type < 0)
+		return -EINVAL;
+
 	net = get_net_ns_by_fd(attr->query.target_fd);
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
 	rcu_read_lock();
-	attached = rcu_dereference(net->flow_dissector_prog);
+	attached = rcu_dereference(net->bpf.progs[type]);
 	if (attached) {
 		prog_cnt = 1;
 		prog_id = attached->aux->id;
@@ -112,6 +119,7 @@ int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 static int flow_dissector_bpf_prog_attach(struct net *net,
 					  struct bpf_prog *prog)
 {
+	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
 	struct bpf_prog *attached;
 
 	if (net == &init_net) {
@@ -125,78 +133,98 @@ static int flow_dissector_bpf_prog_attach(struct net *net,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->flow_dissector_prog))
+			if (rcu_access_pointer(ns->bpf.progs[type]))
 				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.flow_dissector_prog))
+		if (rcu_access_pointer(init_net.bpf.progs[type]))
 			return -EEXIST;
 	}
 
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
 	if (attached == prog)
 		/* The same program cannot be attached twice */
 		return -EINVAL;
 
-	rcu_assign_pointer(net->flow_dissector_prog, prog);
+	rcu_assign_pointer(net->bpf.progs[type], prog);
 	if (attached)
 		bpf_prog_put(attached);
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog)
+int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	enum netns_bpf_attach_type type;
+	struct net *net;
 	int ret;
 
-	mutex_lock(&flow_dissector_mutex);
-	ret = flow_dissector_bpf_prog_attach(current->nsproxy->net_ns, prog);
-	mutex_unlock(&flow_dissector_mutex);
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = current->nsproxy->net_ns;
+	mutex_lock(&netns_bpf_mutex);
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		ret = flow_dissector_bpf_prog_attach(net, prog);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	mutex_unlock(&netns_bpf_mutex);
 
 	return ret;
 }
 
-static int flow_dissector_bpf_prog_detach(struct net *net)
+static int __netns_bpf_prog_detach(struct net *net,
+				   enum netns_bpf_attach_type type)
 {
 	struct bpf_prog *attached;
 
 	/* No need for update-side lock when net is going away. */
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
+	attached = rcu_dereference_protected(net->bpf.progs[type],
 					     !check_net(net) ||
-					     lockdep_is_held(&flow_dissector_mutex));
+					     lockdep_is_held(&netns_bpf_mutex));
 	if (!attached)
 		return -ENOENT;
-	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
 	bpf_prog_put(attached);
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
+int netns_bpf_prog_detach(const union bpf_attr *attr)
 {
+	enum netns_bpf_attach_type type;
 	int ret;
 
-	mutex_lock(&flow_dissector_mutex);
-	ret = flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
-	mutex_unlock(&flow_dissector_mutex);
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
+	mutex_unlock(&netns_bpf_mutex);
 
 	return ret;
 }
 
-static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
+static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
-	/* We're not racing with attach/detach because there are no
-	 * references to netns left when pre_exit gets called.
-	 */
-	if (rcu_access_pointer(net->flow_dissector_prog))
-		flow_dissector_bpf_prog_detach(net);
+	enum netns_bpf_attach_type type;
+
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		if (rcu_access_pointer(net->bpf.progs[type]))
+			__netns_bpf_prog_detach(net, type);
+	}
 }
 
-static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
-	.pre_exit = flow_dissector_pernet_pre_exit,
+static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.pre_exit = netns_bpf_pernet_pre_exit,
 };
 
 /**
@@ -1034,11 +1062,13 @@ bool __skb_flow_dissect(const struct net *net,
 
 	WARN_ON_ONCE(!net);
 	if (net) {
+		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+
 		rcu_read_lock();
-		attached = rcu_dereference(init_net.flow_dissector_prog);
+		attached = rcu_dereference(init_net.bpf.progs[type]);
 
 		if (!attached)
-			attached = rcu_dereference(net->flow_dissector_prog);
+			attached = rcu_dereference(net->bpf.progs[type]);
 
 		if (attached) {
 			struct bpf_flow_keys flow_keys;
@@ -1857,6 +1887,6 @@ static int __init init_default_flow_dissectors(void)
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
 
-	return register_pernet_subsys(&flow_dissector_pernet_ops);
+	return register_pernet_subsys(&netns_bpf_pernet_ops);
 }
 core_initcall(init_default_flow_dissectors);
-- 
2.25.4



* [PATCH bpf-next 4/8] flow_dissector: Move out netns_bpf prog callbacks
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-05-27 17:08 ` [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Move the functions for managing BPF programs attached to netns, which are
not specific to the flow dissector, into a dedicated module,
kernel/bpf/net_namespace.c.

The set of functions will grow with the addition of bpf_link support for
netns-attached programs. This patch prepares the ground by creating a
place for them.

This is a code move with no functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/flow_dissector.h |   6 ++
 kernel/bpf/Makefile          |   1 +
 kernel/bpf/net_namespace.c   | 134 +++++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c    | 126 ++------------------------------
 4 files changed, 145 insertions(+), 122 deletions(-)
 create mode 100644 kernel/bpf/net_namespace.c

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 628383915827..9af143760e35 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -8,6 +8,8 @@
 #include <linux/string.h>
 #include <uapi/linux/if_ether.h>
 
+struct bpf_prog;
+struct net;
 struct sk_buff;
 
 /**
@@ -357,4 +359,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
 	memset(key_basic, 0, sizeof(*key_basic));
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
+#endif /* CONFIG_BPF_SYSCALL */
+
 #endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 375b933010dd..ccecb70bfaa0 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,6 +13,7 @@ ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
+obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
 endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
new file mode 100644
index 000000000000..fc89154aed27
--- /dev/null
+++ b/kernel/bpf/net_namespace.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <net/net_namespace.h>
+
+/*
+ * Functions to manage BPF programs attached to netns
+ */
+
+/* Protects updates to netns_bpf */
+DEFINE_MUTEX(netns_bpf_mutex);
+
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	u32 prog_id, prog_cnt = 0, flags = 0;
+	enum netns_bpf_attach_type type;
+	struct bpf_prog *attached;
+	struct net *net;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	type = to_netns_bpf_attach_type(attr->query.attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	rcu_read_lock();
+	attached = rcu_dereference(net->bpf.progs[type]);
+	if (attached) {
+		prog_cnt = 1;
+		prog_id = attached->aux->id;
+	}
+	rcu_read_unlock();
+
+	put_net(net);
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	enum netns_bpf_attach_type type;
+	struct net *net;
+	int ret;
+
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = current->nsproxy->net_ns;
+	mutex_lock(&netns_bpf_mutex);
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		ret = flow_dissector_bpf_prog_attach(net, prog);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+static int __netns_bpf_prog_detach(struct net *net,
+				   enum netns_bpf_attach_type type)
+{
+	struct bpf_prog *attached;
+
+	/* No need for update-side lock when net is going away. */
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     !check_net(net) ||
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (!attached)
+		return -ENOENT;
+	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	bpf_prog_put(attached);
+	return 0;
+}
+
+int netns_bpf_prog_detach(const union bpf_attr *attr)
+{
+	enum netns_bpf_attach_type type;
+	int ret;
+
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
+{
+	enum netns_bpf_attach_type type;
+
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		if (rcu_access_pointer(net->bpf.progs[type]))
+			__netns_bpf_prog_detach(net, type);
+	}
+}
+
+static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.pre_exit = netns_bpf_pernet_pre_exit,
+};
+
+static int __init netns_bpf_init(void)
+{
+	return register_pernet_subsys(&netns_bpf_pernet_ops);
+}
+
+subsys_initcall(netns_bpf_init);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5c978c87e6bc..9a42ac5114e3 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -33,9 +33,6 @@
 #endif
 #include <linux/bpf-netns.h>
 
-/* Protects updates to netns_bpf */
-DEFINE_MUTEX(netns_bpf_mutex);
-
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
 {
@@ -72,52 +69,8 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
-int netns_bpf_prog_query(const union bpf_attr *attr,
-			 union bpf_attr __user *uattr)
-{
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
-	enum netns_bpf_attach_type type;
-	struct bpf_prog *attached;
-	struct net *net;
-
-	if (attr->query.query_flags)
-		return -EINVAL;
-
-	type = to_netns_bpf_attach_type(attr->query.attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	net = get_net_ns_by_fd(attr->query.target_fd);
-	if (IS_ERR(net))
-		return PTR_ERR(net);
-
-	rcu_read_lock();
-	attached = rcu_dereference(net->bpf.progs[type]);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
-
-	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
-}
-
-static int flow_dissector_bpf_prog_attach(struct net *net,
-					  struct bpf_prog *prog)
+#ifdef CONFIG_BPF_SYSCALL
+int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
 	struct bpf_prog *attached;
@@ -155,77 +108,7 @@ static int flow_dissector_bpf_prog_attach(struct net *net,
 		bpf_prog_put(attached);
 	return 0;
 }
-
-int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
-{
-	enum netns_bpf_attach_type type;
-	struct net *net;
-	int ret;
-
-	type = to_netns_bpf_attach_type(attr->attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	net = current->nsproxy->net_ns;
-	mutex_lock(&netns_bpf_mutex);
-	switch (type) {
-	case NETNS_BPF_FLOW_DISSECTOR:
-		ret = flow_dissector_bpf_prog_attach(net, prog);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-	mutex_unlock(&netns_bpf_mutex);
-
-	return ret;
-}
-
-static int __netns_bpf_prog_detach(struct net *net,
-				   enum netns_bpf_attach_type type)
-{
-	struct bpf_prog *attached;
-
-	/* No need for update-side lock when net is going away. */
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     !check_net(net) ||
-					     lockdep_is_held(&netns_bpf_mutex));
-	if (!attached)
-		return -ENOENT;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
-	bpf_prog_put(attached);
-	return 0;
-}
-
-int netns_bpf_prog_detach(const union bpf_attr *attr)
-{
-	enum netns_bpf_attach_type type;
-	int ret;
-
-	type = to_netns_bpf_attach_type(attr->attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	mutex_lock(&netns_bpf_mutex);
-	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
-	mutex_unlock(&netns_bpf_mutex);
-
-	return ret;
-}
-
-static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
-{
-	enum netns_bpf_attach_type type;
-
-	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
-		if (rcu_access_pointer(net->bpf.progs[type]))
-			__netns_bpf_prog_detach(net, type);
-	}
-}
-
-static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
-	.pre_exit = netns_bpf_pernet_pre_exit,
-};
+#endif /* CONFIG_BPF_SYSCALL */
 
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
@@ -1886,7 +1769,6 @@ static int __init init_default_flow_dissectors(void)
 	skb_flow_dissector_init(&flow_keys_basic_dissector,
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
-
-	return register_pernet_subsys(&netns_bpf_pernet_ops);
+	return 0;
 }
 core_initcall(init_default_flow_dissectors);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2020-05-27 17:08 ` [PATCH bpf-next 4/8] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-27 17:48   ` sdf
                     ` (4 more replies)
  2020-05-27 17:08 ` [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
                   ` (2 subsequent siblings)
  7 siblings, 5 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Add support for the bpf() syscall subcommands that operate on
bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
network namespaces (that is, the flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can be in use at a time. Attempts to attach a link when a prog is
already attached directly, and the other way around, will be met with
-EBUSY.

Attachment of multiple links of the same attach type to one netns is not
supported, with the intention to lift this restriction once a use case
presents itself. Because of that, attempts to create a netns link when one
already exists result in an -E2BIG error, signifying that there is no space
left for another attachment.

Link-based attachments to a netns don't keep the netns alive by holding a
ref to it. Instead, links get auto-detached from the netns when the latter
is being destroyed, by a pernet pre_exit callback.

When auto-detached, the link lives on in a defunct state as long as there
are open FDs for it. -ENOLINK is returned if a user tries to update a
defunct link.

Because bpf_link to netns doesn't hold a ref to struct net, special care is
taken when releasing the link. The netns might be getting torn down when
the release function tries to access it to detach the link.

To ensure the struct net object is alive when the release function accesses
it, we rely on the fact that cleanup_net(), the struct net destructor,
calls synchronize_rcu() after invoking the pre_exit callbacks. If
auto-detach from pre_exit happens first, link release will not attempt to
access struct net.

The same applies the other way around: the network namespace doesn't keep
an attached link alive, because it does not hold a ref to it. Instead,
bpf_links to a netns are RCU-freed, so that the pernet pre_exit callback
can safely access and auto-detach the link when racing with link
release/free.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf-netns.h      |   8 +
 include/net/netns/bpf.h        |   1 +
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c           |   3 +
 net/core/filter.c              |   1 +
 tools/include/uapi/linux/bpf.h |   5 +
 7 files changed, 278 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
index f3aec3d79824..4052d649f36d 100644
--- a/include/linux/bpf-netns.h
+++ b/include/linux/bpf-netns.h
@@ -34,6 +34,8 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 int netns_bpf_prog_attach(const union bpf_attr *attr,
 			  struct bpf_prog *prog);
 int netns_bpf_prog_detach(const union bpf_attr *attr);
+int netns_bpf_link_create(const union bpf_attr *attr,
+			  struct bpf_prog *prog);
 #else
 static inline int netns_bpf_prog_query(const union bpf_attr *attr,
 				       union bpf_attr __user *uattr)
@@ -51,6 +53,12 @@ static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
 {
 	return -EOPNOTSUPP;
 }
+
+static inline int netns_bpf_link_create(const union bpf_attr *attr,
+					struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif /* _BPF_NETNS_H */
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a858d1c5b166..baabefdeb10c 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -12,6 +12,7 @@ struct bpf_prog;
 
 struct netns_bpf {
 	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
 #endif /* __NETNS_BPF_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 54b93f8b49b8..05469d3c5c82 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -235,6 +235,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TRACING = 2,
 	BPF_LINK_TYPE_CGROUP = 3,
 	BPF_LINK_TYPE_ITER = 4,
+	BPF_LINK_TYPE_NETNS = 5,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -3753,6 +3754,10 @@ struct bpf_link_info {
 			__u64 cgroup_id;
 			__u32 attach_type;
 		} cgroup;
+		struct  {
+			__u32 netns_ino;
+			__u32 attach_type;
+		} netns;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index fc89154aed27..1c6009ab93c5 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -8,9 +8,155 @@
  * Functions to manage BPF programs attached to netns
  */
 
-/* Protects updates to netns_bpf */
+struct bpf_netns_link {
+	struct bpf_link	link;
+	enum bpf_attach_type type;
+	enum netns_bpf_attach_type netns_type;
+
+	/* struct net is not RCU-freed but we treat it as such because
+	 * our pre_exit callback will NULL this pointer before
+	 * cleanup_net() calls synchronize_rcu().
+	 */
+	struct net __rcu *net;
+
+	/* bpf_netns_link is RCU-freed so that the pre_exit callback
+	 * invoked by cleanup_net() can safely access the link.
+	 */
+	struct rcu_head	rcu;
+};
+
+/* Protects updates to netns_bpf. */
 DEFINE_MUTEX(netns_bpf_mutex);
 
+static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
+{
+	return container_of(link, struct bpf_netns_link, link);
+}
+
+/* Called with RCU read lock. */
+static void __net_exit
+bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
+{
+	struct bpf_netns_link *net_link;
+	struct bpf_link *link;
+
+	link = rcu_dereference(net->bpf.links[type]);
+	if (!link)
+		return;
+	net_link = to_bpf_netns_link(link);
+	RCU_INIT_POINTER(net_link->net, NULL);
+}
+
+static void bpf_netns_link_release(struct bpf_link *link)
+{
+	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
+	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct net *net;
+
+	/* Link auto-detached by dying netns. */
+	if (!rcu_access_pointer(net_link->net))
+		return;
+
+	mutex_lock(&netns_bpf_mutex);
+
+	/* Recheck after potential sleep. We can race with cleanup_net
+	 * here, but if we see a non-NULL struct net pointer, pre_exit
+	 * and the following synchronize_rcu() have not happened yet,
+	 * and we have until the end of the grace period to access net.
+	 */
+	rcu_read_lock();
+	net = rcu_dereference(net_link->net);
+	if (net) {
+		RCU_INIT_POINTER(net->bpf.links[type], NULL);
+		RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	}
+	rcu_read_unlock();
+
+	mutex_unlock(&netns_bpf_mutex);
+}
+
+static void bpf_netns_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
+
+	/* Delay kfree in case we're racing with cleanup_net. */
+	kfree_rcu(net_link, rcu);
+}
+
+static int bpf_netns_link_update_prog(struct bpf_link *link,
+				      struct bpf_prog *new_prog,
+				      struct bpf_prog *old_prog)
+{
+	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
+	struct net *net;
+	int ret = 0;
+
+	if (old_prog && old_prog != link->prog)
+		return -EPERM;
+	if (new_prog->type != link->prog->type)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+	rcu_read_lock();
+
+	net = rcu_dereference(net_link->net);
+	if (!net || !check_net(net)) {
+		/* Link auto-detached or netns dying */
+		ret = -ENOLINK;
+		goto out_unlock;
+	}
+
+	old_prog = xchg(&link->prog, new_prog);
+	bpf_prog_put(old_prog);
+
+out_unlock:
+	rcu_read_unlock();
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+static int bpf_netns_link_fill_info(const struct bpf_link *link,
+				    struct bpf_link_info *info)
+{
+	const struct bpf_netns_link *net_link;
+	unsigned int inum = 0; /* 0 if the link was auto-detached */
+	struct net *net;
+
+	net_link = container_of(link, struct bpf_netns_link, link);
+
+	rcu_read_lock();
+	net = rcu_dereference(net_link->net);
+	if (net)
+		inum = net->ns.inum;
+	rcu_read_unlock();
+
+	info->netns.netns_ino = inum;
+	info->netns.attach_type = net_link->type;
+	return 0;
+}
+
+static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
+				       struct seq_file *seq)
+{
+	struct bpf_link_info info = {};
+
+	bpf_netns_link_fill_info(link, &info);
+	seq_printf(seq,
+		   "netns_ino:\t%u\n"
+		   "attach_type:\t%u\n",
+		   info.netns.netns_ino,
+		   info.netns.attach_type);
+}
+
+static const struct bpf_link_ops bpf_netns_link_ops = {
+	.release = bpf_netns_link_release,
+	.dealloc = bpf_netns_link_dealloc,
+	.update_prog = bpf_netns_link_update_prog,
+	.fill_link_info = bpf_netns_link_fill_info,
+	.show_fdinfo = bpf_netns_link_show_fdinfo,
+};
+
 int netns_bpf_prog_query(const union bpf_attr *attr,
 			 union bpf_attr __user *uattr)
 {
@@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	net = current->nsproxy->net_ns;
 	mutex_lock(&netns_bpf_mutex);
+
+	/* Attaching prog directly is not compatible with links */
+	if (rcu_access_pointer(net->bpf.links[type])) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
 		ret = flow_dissector_bpf_prog_attach(net, prog);
@@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 		ret = -EINVAL;
 		break;
 	}
+unlock:
 	mutex_unlock(&netns_bpf_mutex);
 
 	return ret;
@@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
 {
 	struct bpf_prog *attached;
 
+	/* Progs attached via links cannot be detached */
+	if (rcu_access_pointer(net->bpf.links[type]))
+		return -EBUSY;
+
 	/* No need for update-side lock when net is going away. */
 	attached = rcu_dereference_protected(net->bpf.progs[type],
 					     !check_net(net) ||
@@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 	return ret;
 }
 
+static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
+				   enum netns_bpf_attach_type type)
+{
+	int err;
+
+	/* Allow attaching only one prog or link for now */
+	if (rcu_access_pointer(net->bpf.links[type]))
+		return -E2BIG;
+	/* Links are not compatible with attaching prog directly */
+	if (rcu_access_pointer(net->bpf.progs[type]))
+		return -EBUSY;
+
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		err = flow_dissector_bpf_prog_attach(net, link->prog);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+	if (!err)
+		rcu_assign_pointer(net->bpf.links[type], link);
+	return err;
+}
+
+static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
+				 enum netns_bpf_attach_type type)
+{
+	int ret;
+
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_link_attach(net, link, type);
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	enum netns_bpf_attach_type netns_type;
+	struct bpf_link_primer link_primer;
+	struct bpf_netns_link *net_link;
+	enum bpf_attach_type type;
+	struct net *net;
+	int err;
+
+	if (attr->link_create.flags)
+		return -EINVAL;
+
+	type = attr->link_create.attach_type;
+	netns_type = to_netns_bpf_attach_type(type);
+	if (netns_type < 0)
+		return -EINVAL;
+
+	net = get_net_ns_by_fd(attr->link_create.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	net_link = kzalloc(sizeof(*net_link), GFP_USER);
+	if (!net_link) {
+		err = -ENOMEM;
+		goto out_put_net;
+	}
+	bpf_link_init(&net_link->link, BPF_LINK_TYPE_NETNS,
+		      &bpf_netns_link_ops, prog);
+	rcu_assign_pointer(net_link->net, net);
+	net_link->type = type;
+	net_link->netns_type = netns_type;
+
+	err = bpf_link_prime(&net_link->link, &link_primer);
+	if (err) {
+		kfree(net_link);
+		goto out_put_net;
+	}
+
+	err = netns_bpf_link_attach(net, &net_link->link, netns_type);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		goto out_put_net;
+	}
+
+	err = bpf_link_settle(&link_primer);
+out_put_net:
+	/* To auto-detach the link from netns when it is getting
+	 * destroyed, we can't hold a ref to it. Instead, we rely on
+	 * RCU when accessing link->net pointer.
+	 */
+	put_net(net);
+	return err;
+}
+
 static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
 	enum netns_bpf_attach_type type;
 
+	rcu_read_lock();
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
-		if (rcu_access_pointer(net->bpf.progs[type]))
+		if (rcu_access_pointer(net->bpf.links[type]))
+			bpf_netns_link_auto_detach(net, type);
+		else if (rcu_access_pointer(net->bpf.progs[type]))
 			__netns_bpf_prog_detach(net, type);
 	}
+	rcu_read_unlock();
 }
 
 static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 93f329a6736d..47a7b426d75c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3874,6 +3874,9 @@ static int link_create(union bpf_attr *attr)
 	case BPF_PROG_TYPE_TRACING:
 		ret = tracing_bpf_link_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+		ret = netns_bpf_link_create(attr, prog);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index a6fc23447f12..b77391a1adb1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9081,6 +9081,7 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
 #endif /* CONFIG_INET */
 
 DEFINE_BPF_DISPATCHER(xdp)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 54b93f8b49b8..05469d3c5c82 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -235,6 +235,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TRACING = 2,
 	BPF_LINK_TYPE_CGROUP = 3,
 	BPF_LINK_TYPE_ITER = 4,
+	BPF_LINK_TYPE_NETNS = 5,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -3753,6 +3754,10 @@ struct bpf_link_info {
 			__u64 cgroup_id;
 			__u32 attach_type;
 		} cgroup;
+		struct  {
+			__u32 netns_ino;
+			__u32 attach_type;
+		} netns;
 	};
 } __attribute__((aligned(8)));
 
-- 
2.25.4



* [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (4 preceding siblings ...)
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-28  5:59   ` Andrii Nakryiko
  2020-05-27 17:08 ` [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links Jakub Sitnicki
  2020-05-27 17:08 ` [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Add bpf_program__attach_netns(), which uses the LINK_CREATE subcommand to
create an FD-based kernel bpf_link, for attach types tied to a network
namespace, that is BPF_FLOW_DISSECTOR for the moment.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/lib/bpf/libbpf.c   | 20 ++++++++++++++++----
 tools/lib/bpf/libbpf.h   |  2 ++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5d60de6fd818..a49c1eb5db64 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7894,8 +7894,8 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
 	return bpf_program__attach_iter(prog, NULL);
 }
 
-struct bpf_link *
-bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
+static struct bpf_link *
+bpf_program__attach_fd(struct bpf_program *prog, int target_fd)
 {
 	enum bpf_attach_type attach_type;
 	char errmsg[STRERR_BUFSIZE];
@@ -7915,11 +7915,11 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
 	link->detach = &bpf_link__detach_fd;
 
 	attach_type = bpf_program__get_expected_attach_type(prog);
-	link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
+	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
 	if (link_fd < 0) {
 		link_fd = -errno;
 		free(link);
-		pr_warn("program '%s': failed to attach to cgroup: %s\n",
+		pr_warn("program '%s': failed to attach to cgroup/netns: %s\n",
 			bpf_program__title(prog, false),
 			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
 		return ERR_PTR(link_fd);
@@ -7928,6 +7928,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
 	return link;
 }
 
+struct bpf_link *
+bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
+{
+	return bpf_program__attach_fd(prog, cgroup_fd);
+}
+
+struct bpf_link *
+bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
+{
+	return bpf_program__attach_fd(prog, netns_fd);
+}
+
 struct bpf_link *
 bpf_program__attach_iter(struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 1e2e399a5f2c..adf6fd9b6fe8 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_lsm(struct bpf_program *prog);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
 
 struct bpf_map;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 381a7342ecfc..7ad21ba1feb6 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -263,4 +263,5 @@ LIBBPF_0.0.9 {
 		bpf_link_get_next_id;
 		bpf_program__attach_iter;
 		perf_buffer__consume;
+		bpf_program__attach_netns;
 } LIBBPF_0.0.8;
-- 
2.25.4


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (5 preceding siblings ...)
  2020-05-27 17:08 ` [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-28  6:02   ` Andrii Nakryiko
  2020-05-27 17:08 ` [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Make `bpf link show` aware of the new link type, that is, links attached to
a netns. When listing netns-attached links, display the netns inode number
as the identifier, along with the link attach type.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/bpf/bpftool/link.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
index 670a561dc31b..83a17d62c4c3 100644
--- a/tools/bpf/bpftool/link.c
+++ b/tools/bpf/bpftool/link.c
@@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
 	[BPF_LINK_TYPE_TRACING]			= "tracing",
 	[BPF_LINK_TYPE_CGROUP]			= "cgroup",
 	[BPF_LINK_TYPE_ITER]			= "iter",
+	[BPF_LINK_TYPE_NETNS]			= "netns",
 };
 
 static int link_parse_fd(int *argc, char ***argv)
@@ -122,6 +123,16 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
 			jsonw_uint_field(json_wtr, "attach_type",
 					 info->cgroup.attach_type);
 		break;
+	case BPF_LINK_TYPE_NETNS:
+		jsonw_uint_field(json_wtr, "netns_ino",
+				 info->netns.netns_ino);
+		if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+				attach_type_name[info->netns.attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type",
+					 info->netns.attach_type);
+		break;
 	default:
 		break;
 	}
@@ -190,6 +201,14 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
 		else
 			printf("attach_type %u  ", info->cgroup.attach_type);
 		break;
+	case BPF_LINK_TYPE_NETNS:
+		printf("\n\tnetns_ino %u  ", info->netns.netns_ino);
+		if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
+			printf("attach_type %s  ",
+			       attach_type_name[info->netns.attach_type]);
+		else
+			printf("attach_type %u  ", info->netns.attach_type);
+		break;
 	default:
 		break;
 	}
-- 
2.25.4



* [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns
  2020-05-27 17:08 [PATCH bpf-next 0/8] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (6 preceding siblings ...)
  2020-05-27 17:08 ` [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links Jakub Sitnicki
@ 2020-05-27 17:08 ` Jakub Sitnicki
  2020-05-28  6:08   ` Andrii Nakryiko
  7 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 17:08 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Extend the existing test case for flow dissector attaching to cover:

 - link creation,
 - link updates,
 - link info querying,
 - mixing links with direct prog attachment.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../bpf/prog_tests/flow_dissector_reattach.c  | 500 +++++++++++++++++-
 1 file changed, 471 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
index 1f51ba66b98b..b702226fa762 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
@@ -11,6 +11,7 @@
 #include <fcntl.h>
 #include <sched.h>
 #include <stdbool.h>
+#include <sys/stat.h>
 #include <unistd.h>
 
 #include <linux/bpf.h>
@@ -18,6 +19,8 @@
 
 #include "test_progs.h"
 
+static int init_net = -1;
+
 static bool is_attached(int netns)
 {
 	__u32 cnt;
@@ -32,7 +35,7 @@ static bool is_attached(int netns)
 	return cnt > 0;
 }
 
-static int load_prog(void)
+static int load_prog(enum bpf_prog_type type)
 {
 	struct bpf_insn prog[] = {
 		BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
@@ -40,61 +43,494 @@ static int load_prog(void)
 	};
 	int fd;
 
-	fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
-			      ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
+	fd = bpf_load_program(type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
 	if (CHECK_FAIL(fd < 0))
 		perror("bpf_load_program");
 
 	return fd;
 }
 
-static void do_flow_dissector_reattach(void)
+static void test_prog_attach_prog_attach(int netns, int prog1, int prog2)
 {
-	int prog_fd[2] = { -1, -1 };
 	int err;
 
-	prog_fd[0] = load_prog();
-	if (prog_fd[0] < 0)
-		return;
-
-	prog_fd[1] = load_prog();
-	if (prog_fd[1] < 0)
-		goto out_close;
-
-	err = bpf_prog_attach(prog_fd[0], 0, BPF_FLOW_DISSECTOR, 0);
+	err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
 	if (CHECK_FAIL(err)) {
-		perror("bpf_prog_attach-0");
-		goto out_close;
+		perror("bpf_prog_attach(prog1)");
+		return;
 	}
 
 	/* Expect success when attaching a different program */
-	err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
+	err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
 	if (CHECK_FAIL(err)) {
-		perror("bpf_prog_attach-1");
+		perror("bpf_prog_attach(prog2) #1");
 		goto out_detach;
 	}
 
 	/* Expect failure when attaching the same program twice */
-	err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
+	err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
 	if (CHECK_FAIL(!err || errno != EINVAL))
-		perror("bpf_prog_attach-2");
+		perror("bpf_prog_attach(prog2) #2");
 
 out_detach:
 	err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
 	if (CHECK_FAIL(err))
 		perror("bpf_prog_detach");
+}
+
+static void test_link_create_link_create(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int link1, link2;
+
+	link1 = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link1 < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect failure creating link when another link exists */
+	errno = 0;
+	link2 = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link2 != -1 || errno != E2BIG))
+		perror("bpf_link_create(prog2) expected E2BIG");
+	if (link2 != -1)
+		close(link2);
+
+	close(link1);
+}
+
+static void test_prog_attach_link_create(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	err = bpf_prog_attach(prog1, -1, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_attach(prog1)");
+		return;
+	}
+
+	/* Expect failure creating link when prog attached */
+	errno = 0;
+	link = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link != -1 || errno != EBUSY))
+		perror("bpf_link_create(prog2) expected EBUSY");
+	if (link != -1)
+		close(link);
+
+	err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(err))
+		perror("bpf_prog_detach");
+}
+
+static void test_link_create_prog_attach(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect failure attaching prog when link exists */
+	errno = 0;
+	err = bpf_prog_attach(prog2, -1, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(!err || errno != EBUSY))
+		perror("bpf_prog_attach(prog2) expected EBUSY");
+
+	close(link);
+}
+
+static void test_link_create_prog_detach(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect failure detaching prog when link exists */
+	errno = 0;
+	err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(!err || errno != EBUSY))
+		perror("bpf_prog_detach expected EBUSY");
+
+	close(link);
+}
+
+static void test_prog_attach_detach_query(int netns, int prog1, int prog2)
+{
+	int err;
+
+	err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_attach(prog1)");
+		return;
+	}
+	if (CHECK_FAIL(!is_attached(netns)))
+		return;
+
+	err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_detach");
+		return;
+	}
+
+	/* Expect no prog attached after successful detach */
+	CHECK_FAIL(is_attached(netns));
+}
+
+static void test_link_create_close_query(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	if (CHECK_FAIL(!is_attached(netns)))
+		return;
+
+	close(link);
+	/* Expect no prog attached after closing last link FD */
+	CHECK_FAIL(is_attached(netns));
+}
+
+static void test_link_update_no_old_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect success replacing the prog when old prog not specified */
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err))
+		perror("bpf_link_update");
+
+	close(link);
+}
+
+static void test_link_update_replace_old_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect update with F_REPLACE and old prog specified to succeed */
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = prog1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err))
+		perror("bpf_link_update");
+
+	close(link);
+}
+
+static void test_link_update_invalid_opts(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect update to fail w/ old prog FD but w/o F_REPLACE */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = prog1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL)) {
+		perror("bpf_link_update expected EINVAL");
+		goto out_close;
+	}
+
+	/* Expect update to fail on old prog FD mismatch */
+	errno = 0;
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = prog2;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EPERM)) {
+		perror("bpf_link_update expected EPERM");
+		goto out_close;
+	}
+
+	/* Expect update to fail for invalid old prog FD */
+	errno = 0;
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = -1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EBADF)) {
+		perror("bpf_link_update expected EBADF");
+		goto out_close;
+	}
+
+	/* Expect update to fail with invalid flags */
+	errno = 0;
+	update_opts.flags = BPF_F_ALLOW_MULTI;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL))
+		perror("bpf_link_update expected EINVAL");
 
 out_close:
-	close(prog_fd[1]);
-	close(prog_fd[0]);
+	close(link);
+}
+
+static void test_link_update_invalid_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link, prog3;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	/* Expect failure when new prog FD is not valid */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, -1, &update_opts);
+	if (CHECK_FAIL(!err || errno != EBADF)) {
+		perror("bpf_link_update expected EBADF");
+		goto out_close_link;
+	}
+
+	prog3 = load_prog(BPF_PROG_TYPE_SOCKET_FILTER);
+	if (prog3 < 0)
+		goto out_close_link;
+
+	/* Expect failure when new prog FD type doesn't match */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog3, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL))
+		perror("bpf_link_update expected EINVAL");
+
+	close(prog3);
+out_close_link:
+	close(link);
+}
+
+static void test_link_update_netns_gone(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link, old_netns;
+
+	err = unshare(CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		perror("unshare(CLONE_NEWNET)");
+		return;
+	}
+
+	old_netns = netns;
+	netns = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(netns < 0)) {
+		perror("open(/proc/self/ns/net)");
+		setns(old_netns, CLONE_NEWNET);
+		return;
+	}
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	close(netns);
+	err = setns(old_netns, CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		perror("setns(CLONE_NEWNET)");
+		close(link);
+		return;
+	}
+
+	/* Expect failure when netns destroyed */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != ENOLINK))
+		perror("bpf_link_update");
+
+	close(link);
+}
+
+static void test_link_get_info(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	struct bpf_prog_info prog1_info = {};
+	struct bpf_prog_info prog2_info = {};
+	struct bpf_link_info link_info1 = {};
+	struct bpf_link_info link_info2 = {};
+	struct stat netns_stat = {};
+	unsigned int info_len;
+	int err, link;
+
+	err = fstat(netns, &netns_stat);
+	if (CHECK_FAIL(err)) {
+		perror("stat(netns)");
+		return;
+	}
+
+	info_len = sizeof(prog1_info);
+	err = bpf_obj_get_info_by_fd(prog1, &prog1_info, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info_by_fd(prog1)");
+		return;
+	}
+	CHECK_FAIL(info_len != sizeof(prog1_info));
+
+	info_len = sizeof(prog2_info);
+	err = bpf_obj_get_info_by_fd(prog2, &prog2_info, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info_by_fd(prog2)");
+		return;
+	}
+	CHECK_FAIL(info_len != sizeof(prog2_info));
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+
+	info_len = sizeof(link_info1);
+	err = bpf_obj_get_info_by_fd(link, &link_info1, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info");
+		goto out_close;
+	}
+	CHECK_FAIL(info_len != sizeof(link_info1));
+
+	/* Expect link info to be sane and match prog and netns details */
+	CHECK_FAIL(link_info1.type != BPF_LINK_TYPE_NETNS);
+	CHECK_FAIL(link_info1.id == 0);
+	CHECK_FAIL(link_info1.prog_id != prog1_info.id);
+	CHECK_FAIL(link_info1.netns.netns_ino != netns_stat.st_ino);
+	CHECK_FAIL(link_info1.netns.attach_type != BPF_FLOW_DISSECTOR);
+
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_link_update(prog2)");
+		goto out_close;
+	}
+
+	info_len = sizeof(link_info2);
+	err = bpf_obj_get_info_by_fd(link, &link_info2, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info");
+		goto out_close;
+	}
+	CHECK_FAIL(info_len != sizeof(link_info2));
+
+	/* Expect no info change after update except in prog id */
+	CHECK_FAIL(link_info2.type != BPF_LINK_TYPE_NETNS);
+	CHECK_FAIL(link_info2.id != link_info1.id);
+	CHECK_FAIL(link_info2.prog_id != prog2_info.id);
+	CHECK_FAIL(link_info2.netns.netns_ino != netns_stat.st_ino);
+	CHECK_FAIL(link_info2.netns.attach_type != BPF_FLOW_DISSECTOR);
+
+out_close:
+	close(link);
+}
+
+static void run_tests(int netns)
+{
+	struct test {
+		const char *test_name;
+		void (*test_func)(int netns, int prog1, int prog2);
+	} tests[] = {
+		{ "prog attach, prog attach",
+		  test_prog_attach_prog_attach },
+		{ "link create, link create",
+		  test_link_create_link_create },
+		{ "prog attach, link create",
+		  test_prog_attach_link_create },
+		{ "link create, prog attach",
+		  test_link_create_prog_attach },
+		{ "link create, prog detach",
+		  test_link_create_prog_detach },
+		{ "prog attach, detach, query",
+		  test_prog_attach_detach_query },
+		{ "link create, close, query",
+		  test_link_create_close_query },
+		{ "link update no old prog",
+		  test_link_update_no_old_prog },
+		{ "link update with replace old prog",
+		  test_link_update_replace_old_prog },
+		{ "link update invalid opts",
+		  test_link_update_invalid_opts },
+		{ "link update invalid prog",
+		  test_link_update_invalid_prog },
+		{ "link update netns gone",
+		  test_link_update_netns_gone },
+		{ "link get info",
+		  test_link_get_info },
+	};
+	int i, progs[2] = { -1, -1 };
+	char test_name[80];
+
+	for (i = 0; i < ARRAY_SIZE(progs); i++) {
+		progs[i] = load_prog(BPF_PROG_TYPE_FLOW_DISSECTOR);
+		if (progs[i] < 0)
+			goto out_close;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		snprintf(test_name, sizeof(test_name),
+			 "flow dissector %s%s",
+			 tests[i].test_name,
+			 netns == init_net ? " (init_net)" : "");
+		if (test__start_subtest(test_name))
+			tests[i].test_func(netns, progs[0], progs[1]);
+	}
+out_close:
+	for (i = 0; i < ARRAY_SIZE(progs); i++) {
+		if (progs[i] != -1)
+			CHECK_FAIL(close(progs[i]));
+	}
 }
 
 void test_flow_dissector_reattach(void)
 {
-	int init_net, self_net, err;
+	int err, new_net, saved_net;
 
-	self_net = open("/proc/self/ns/net", O_RDONLY);
-	if (CHECK_FAIL(self_net < 0)) {
+	saved_net = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(saved_net < 0)) {
 		perror("open(/proc/self/ns/net");
 		return;
 	}
@@ -118,7 +554,7 @@ void test_flow_dissector_reattach(void)
 	}
 
 	/* First run tests in root network namespace */
-	do_flow_dissector_reattach();
+	run_tests(init_net);
 
 	/* Then repeat tests in a non-root namespace */
 	err = unshare(CLONE_NEWNET);
@@ -126,15 +562,21 @@ void test_flow_dissector_reattach(void)
 		perror("unshare(CLONE_NEWNET)");
 		goto out_setns;
 	}
-	do_flow_dissector_reattach();
+	new_net = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(new_net < 0)) {
+		perror("open(/proc/self/ns/net)");
+		goto out_setns;
+	}
+	run_tests(new_net);
+	close(new_net);
 
 out_setns:
 	/* Move back to netns we started in. */
-	err = setns(self_net, CLONE_NEWNET);
+	err = setns(saved_net, CLONE_NEWNET);
 	if (CHECK_FAIL(err))
 		perror("setns(/proc/self/ns/net)");
 
 out_close:
 	close(init_net);
-	close(self_net);
+	close(saved_net);
 }
-- 
2.25.4


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit
  2020-05-27 17:08 ` [PATCH bpf-next 1/8] flow_dissector: Don't grab update-side lock on prog detach from pre_exit Jakub Sitnicki
@ 2020-05-27 17:35   ` sdf
  0 siblings, 0 replies; 36+ messages in thread
From: sdf @ 2020-05-27 17:35 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> When there are no references to struct net left, that is check_net(net) is
> false, we don't need to synchronize updates to flow_dissector prog with  
> BPF
> prog attach/detach callbacks. That allows us to pull locking up from the
> shared detach routine and into the bpf prog detach callback.

> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>

* Re: [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback
  2020-05-27 17:08 ` [PATCH bpf-next 2/8] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
@ 2020-05-27 17:35   ` sdf
  0 siblings, 0 replies; 36+ messages in thread
From: sdf @ 2020-05-27 17:35 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> Split out the part of attach callback that happens with attach/detach lock
> acquired. This structures the prog attach callback similar to prog detach,
> and opens up doors for moving the locking out of flow_dissector and into
> generic callbacks for attaching/detaching progs to netns in subsequent
> patches.

> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>

* Re: [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns
  2020-05-27 17:08 ` [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
@ 2020-05-27 17:40   ` sdf
  2020-05-27 19:31     ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: sdf @ 2020-05-27 17:40 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> In order to:

>   (1) attach more than one BPF program type to netns, or
>   (2) support attaching BPF programs to netns with bpf_link, or
>   (3) support multi-prog attach points for netns

> we will need to keep more state per netns than a single pointer like we
> have now for BPF flow dissector program.

> Prepare for the above by extracting netns_bpf that is part of struct net,
> for storing all state related to BPF programs attached to netns.

> Turn flow dissector callbacks for querying/attaching/detaching a program
> into generic ones that operate on netns_bpf. Next patch will move the
> generic callbacks into their own module.

> This is similar to how it is organized for cgroup with cgroup_bpf.

> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>   include/linux/bpf-netns.h   | 56 ++++++++++++++++++++++
>   include/linux/skbuff.h      | 26 ----------
>   include/net/net_namespace.h |  4 +-
>   include/net/netns/bpf.h     | 17 +++++++
>   kernel/bpf/syscall.c        |  7 +--
>   net/core/flow_dissector.c   | 96 ++++++++++++++++++++++++-------------
>   6 files changed, 143 insertions(+), 63 deletions(-)
>   create mode 100644 include/linux/bpf-netns.h
>   create mode 100644 include/net/netns/bpf.h

> diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
> new file mode 100644
> index 000000000000..f3aec3d79824
> --- /dev/null
> +++ b/include/linux/bpf-netns.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _BPF_NETNS_H
> +#define _BPF_NETNS_H
> +
> +#include <linux/mutex.h>
> +#include <uapi/linux/bpf.h>
> +
> +enum netns_bpf_attach_type {
> +	NETNS_BPF_INVALID = -1,
> +	NETNS_BPF_FLOW_DISSECTOR = 0,
> +	MAX_NETNS_BPF_ATTACH_TYPE
> +};
> +
> +static inline enum netns_bpf_attach_type
> +to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
> +{
> +	switch (attach_type) {
> +	case BPF_FLOW_DISSECTOR:
> +		return NETNS_BPF_FLOW_DISSECTOR;
> +	default:
> +		return NETNS_BPF_INVALID;
> +	}
> +}
> +
> +/* Protects updates to netns_bpf */
> +extern struct mutex netns_bpf_mutex;
I wonder whether it's a good time to make this mutex per-netns, WDYT?

The only problem I see is that it might complicate the global
mode of flow dissector where we go over every ns to make sure no
progs are attached. That will be racy with per-ns mutex unless
we do something about it...

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
@ 2020-05-27 17:48   ` sdf
  2020-05-27 19:54     ` Jakub Sitnicki
  2020-05-27 20:53   ` sdf
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: sdf @ 2020-05-27 17:48 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> Add support for bpf() syscall subcommands that operate on
> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied  
> to
> network namespaces (that is flow dissector at the moment).

> Link-based and prog-based attachment can be used interchangeably, but only
> one can be in use at a time. Attempts to attach a link when a prog is
> already attached directly, and the other way around, will be met with
> -EBUSY.

> Attachment of multiple links of same attach type to one netns is not
> supported, with the intention to lift it when a use-case presents
> itself. Because of that attempts to create a netns link, when one already
> exists result in -E2BIG error, signifying that there is no space left for
> another attachment.

> Link-based attachments to netns don't keep a netns alive by holding a ref
> to it. Instead links get auto-detached from netns when the latter is being
> destroyed by a pernet pre_exit callback.

> When auto-detached, link lives in defunct state as long as there are open FDs
> for it. -ENOLINK is returned if a user tries to update a defunct link.

> Because bpf_link to netns doesn't hold a ref to struct net, special care  
> is
> taken when releasing the link. The netns might be getting torn down when
> the release function tries to access it to detach the link.

> To ensure the struct net object is alive when release function accesses it
> we rely on the fact that cleanup_net(), struct net destructor, calls
> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
> pre_exit happens first, link release will not attempt to access struct  
> net.

> Same applies the other way around, network namespace doesn't keep an
> attached link alive by not holding a ref to it. Instead bpf_links
> to netns are RCU-freed, so that pernet pre_exit callback can safely access
> and auto-detach the link when racing with link release/free.

[..]
> +	rcu_read_lock();
>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
> -		if (rcu_access_pointer(net->bpf.progs[type]))
> +		if (rcu_access_pointer(net->bpf.links[type]))
> +			bpf_netns_link_auto_detach(net, type);
> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>   			__netns_bpf_prog_detach(net, type);
>   	}
> +	rcu_read_unlock();
Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
Is it allowed under rcu_read_lock()?

* Re: [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns
  2020-05-27 17:40   ` sdf
@ 2020-05-27 19:31     ` Jakub Sitnicki
  2020-05-27 20:36       ` sdf
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 19:31 UTC (permalink / raw)
  To: sdf; +Cc: bpf, netdev, kernel-team

On Wed, May 27, 2020 at 07:40 PM CEST, sdf@google.com wrote:
> On 05/27, Jakub Sitnicki wrote:
>> In order to:
>
>>   (1) attach more than one BPF program type to netns, or
>>   (2) support attaching BPF programs to netns with bpf_link, or
>>   (3) support multi-prog attach points for netns
>
>> we will need to keep more state per netns than a single pointer like we
>> have now for BPF flow dissector program.
>
>> Prepare for the above by extracting netns_bpf that is part of struct net,
>> for storing all state related to BPF programs attached to netns.
>
>> Turn flow dissector callbacks for querying/attaching/detaching a program
>> into generic ones that operate on netns_bpf. Next patch will move the
>> generic callbacks into their own module.
>
>> This is similar to how it is organized for cgroup with cgroup_bpf.
>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>   include/linux/bpf-netns.h   | 56 ++++++++++++++++++++++
>>   include/linux/skbuff.h      | 26 ----------
>>   include/net/net_namespace.h |  4 +-
>>   include/net/netns/bpf.h     | 17 +++++++
>>   kernel/bpf/syscall.c        |  7 +--
>>   net/core/flow_dissector.c   | 96 ++++++++++++++++++++++++-------------
>>   6 files changed, 143 insertions(+), 63 deletions(-)
>>   create mode 100644 include/linux/bpf-netns.h
>>   create mode 100644 include/net/netns/bpf.h
>
>> diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
>> new file mode 100644
>> index 000000000000..f3aec3d79824
>> --- /dev/null
>> +++ b/include/linux/bpf-netns.h
>> @@ -0,0 +1,56 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _BPF_NETNS_H
>> +#define _BPF_NETNS_H
>> +
>> +#include <linux/mutex.h>
>> +#include <uapi/linux/bpf.h>
>> +
>> +enum netns_bpf_attach_type {
>> +	NETNS_BPF_INVALID = -1,
>> +	NETNS_BPF_FLOW_DISSECTOR = 0,
>> +	MAX_NETNS_BPF_ATTACH_TYPE
>> +};
>> +
>> +static inline enum netns_bpf_attach_type
>> +to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
>> +{
>> +	switch (attach_type) {
>> +	case BPF_FLOW_DISSECTOR:
>> +		return NETNS_BPF_FLOW_DISSECTOR;
>> +	default:
>> +		return NETNS_BPF_INVALID;
>> +	}
>> +}
>> +
>> +/* Protects updates to netns_bpf */
>> +extern struct mutex netns_bpf_mutex;
> I wonder whether it's a good time to make this mutex per-netns, WDYT?
>
> The only problem I see is that it might complicate the global
> mode of flow dissector where we go over every ns to make sure no
> progs are attached. That will be racy with per-ns mutex unless
> we do something about it...

It crossed my mind. I stuck with a global mutex for a couple of
reasons, different from the one you bring up, which I forgot about.

1. Don't know if it has potential to be a bottleneck.

cgroup BPF uses a global mutex too. Even one that serializes access to
more data than just BPF programs attached to a cgroup.

Also, we grab the netns_bpf_mutex only on prog attach/detach, and link
create/update/release. Netns teardown is not grabbing it. So if you're
not using netns BPF you're not going to "feel" contention.

2. Makes locking on netns bpf_link release trickier

In bpf_netns_link_release (patch 5), we deref the pointer from link to
struct net under the RCU read lock, in case the net is being destroyed
simultaneously.

However, we're also grabbing the netns_bpf_mutex, in case of another
possible scenario, when struct net is alive and well (refcnt > 0), but
we're racing with a prog attach/detach to access net->bpf.{links,progs}.

Making the mutex part of net->bpf means I first need to somehow ensure
netns stays alive if I go to sleep waiting for the lock. Or it would
have to be a spinlock, or some better (simpler?) locking scheme.


The above two convinced me that I should start with a global mutex, and
go for more pain if there's contention.

Thanks for giving the series a review.

-jkbs

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:48   ` sdf
@ 2020-05-27 19:54     ` Jakub Sitnicki
  2020-05-27 20:38       ` sdf
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-27 19:54 UTC (permalink / raw)
  To: sdf; +Cc: bpf, netdev, kernel-team

On Wed, May 27, 2020 at 07:48 PM CEST, sdf@google.com wrote:
> On 05/27, Jakub Sitnicki wrote:
>> Add support for bpf() syscall subcommands that operate on
>> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> network namespaces (that is flow dissector at the moment).
>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can be in use at a time. Attempts to attach a link when a prog is
>> already attached directly, and the other way around, will be met with
>> -EBUSY.
>
>> Attachment of multiple links of same attach type to one netns is not
>> supported, with the intention to lift it when a use-case presents
>> itself. Because of that attempts to create a netns link, when one already
>> exists result in -E2BIG error, signifying that there is no space left for
>> another attachment.
>
>> Link-based attachments to netns don't keep a netns alive by holding a ref
>> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed by a pernet pre_exit callback.
>
>> When auto-detached, link lives in defunct state as long as there are open FDs
>> for it. -ENOLINK is returned if a user tries to update a defunct link.
>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> taken when releasing the link. The netns might be getting torn down when
>> the release function tries to access it to detach the link.
>
>> To ensure the struct net object is alive when release function accesses it
>> we rely on the fact that cleanup_net(), struct net destructor, calls
>> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
>> pre_exit happens first, link release will not attempt to access struct net.
>
>> Same applies the other way around, network namespace doesn't keep an
>> attached link alive by not holding a ref to it. Instead bpf_links
>> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> and auto-detach the link when racing with link release/free.
>
> [..]
>> +	rcu_read_lock();
>>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
>> -		if (rcu_access_pointer(net->bpf.progs[type]))
>> +		if (rcu_access_pointer(net->bpf.links[type]))
>> +			bpf_netns_link_auto_detach(net, type);
>> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>>   			__netns_bpf_prog_detach(net, type);
>>   	}
>> +	rcu_read_unlock();
> Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
> Is it allowed under rcu_read_lock()?

Yes, that's true. __netns_bpf_prog_detach does

	RCU_INIT_POINTER(net->bpf.progs[type], NULL);

RCU read lock is here for the rcu_dereference() that happens in
bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link):

/* Called with RCU read lock. */
static void __net_exit
bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
{
	struct bpf_netns_link *net_link;
	struct bpf_link *link;

	link = rcu_dereference(net->bpf.links[type]);
	if (!link)
		return;
	net_link = to_bpf_netns_link(link);
	RCU_INIT_POINTER(net_link->net, NULL);
}

I've pulled it up, out of the loop, perhaps too eagerly and just made it
confusing, considering we're iterating over a 1-item array :-)

Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in
bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is
hazy to me.

There are no concurrent writers to net_link->net, just readers, i.e.
bpf_netns_link_release(). And I know bpf_link won't be freed until the
grace period elapses.

sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it
doesn't hold up or make sense.

I certainly can push the rcu_read_lock() down into
bpf_netns_link_auto_detach().

-jkbs

* Re: [PATCH bpf-next 3/8] net: Introduce netns_bpf for BPF programs attached to netns
  2020-05-27 19:31     ` Jakub Sitnicki
@ 2020-05-27 20:36       ` sdf
  0 siblings, 0 replies; 36+ messages in thread
From: sdf @ 2020-05-27 20:36 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> On Wed, May 27, 2020 at 07:40 PM CEST, sdf@google.com wrote:
> > On 05/27, Jakub Sitnicki wrote:
> >> In order to:
> >
> >>   (1) attach more than one BPF program type to netns, or
> >>   (2) support attaching BPF programs to netns with bpf_link, or
> >>   (3) support multi-prog attach points for netns
> >
> >> we will need to keep more state per netns than a single pointer like we
> >> have now for BPF flow dissector program.
> >
> >> Prepare for the above by extracting netns_bpf that is part of struct  
> net,
> >> for storing all state related to BPF programs attached to netns.
> >
> >> Turn flow dissector callbacks for querying/attaching/detaching a  
> program
> >> into generic ones that operate on netns_bpf. Next patch will move the
> >> generic callbacks into their own module.
> >
> >> This is similar to how it is organized for cgroup with cgroup_bpf.
> >
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >>   include/linux/bpf-netns.h   | 56 ++++++++++++++++++++++
> >>   include/linux/skbuff.h      | 26 ----------
> >>   include/net/net_namespace.h |  4 +-
> >>   include/net/netns/bpf.h     | 17 +++++++
> >>   kernel/bpf/syscall.c        |  7 +--
> >>   net/core/flow_dissector.c   | 96  
> ++++++++++++++++++++++++-------------
> >>   6 files changed, 143 insertions(+), 63 deletions(-)
> >>   create mode 100644 include/linux/bpf-netns.h
> >>   create mode 100644 include/net/netns/bpf.h
> >
> >> diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
> >> new file mode 100644
> >> index 000000000000..f3aec3d79824
> >> --- /dev/null
> >> +++ b/include/linux/bpf-netns.h
> >> @@ -0,0 +1,56 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _BPF_NETNS_H
> >> +#define _BPF_NETNS_H
> >> +
> >> +#include <linux/mutex.h>
> >> +#include <uapi/linux/bpf.h>
> >> +
> >> +enum netns_bpf_attach_type {
> >> +	NETNS_BPF_INVALID = -1,
> >> +	NETNS_BPF_FLOW_DISSECTOR = 0,
> >> +	MAX_NETNS_BPF_ATTACH_TYPE
> >> +};
> >> +
> >> +static inline enum netns_bpf_attach_type
> >> +to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
> >> +{
> >> +	switch (attach_type) {
> >> +	case BPF_FLOW_DISSECTOR:
> >> +		return NETNS_BPF_FLOW_DISSECTOR;
> >> +	default:
> >> +		return NETNS_BPF_INVALID;
> >> +	}
> >> +}
> >> +
> >> +/* Protects updates to netns_bpf */
> >> +extern struct mutex netns_bpf_mutex;
> > I wonder whether it's a good time to make this mutex per-netns, WDYT?
> >
> > The only problem I see is that it might complicate the global
> > mode of flow dissector where we go over every ns to make sure no
> > progs are attached. That will be racy with per-ns mutex unless
> > we do something about it...

> It crossed my mind. I stuck with a global mutex for a couple of
> reasons. Different that one you bring up, which I forgot about.

> 1. Don't know if it has potential to be a bottleneck.

> cgroup BPF uses a global mutex too. Even one that serializes access to
> more data than just BPF programs attached to a cgroup.

> Also, we grab the netns_bpf_mutex only on prog attach/detach, and link
> create/update/release. Netns teardown is not grabbing it. So if you're
> not using netns BPF you're not going to "feel" contention.

> 2. Makes locking on nets bpf_link release trickier

> In bpf_netns_link_release (patch 5), we deref pointer from link to
> struct net under RCU read lock, in case the net is being destroyed
> simulatneously.

> However, we're also grabbing the netns_bpf_mutex, in case of another
> possible scenario, when struct net is alive and well (refcnt > 0), but
> we're racing with a prog attach/detach to access net->bpf.{links,progs}.

> Making the mutex part of net->bpf means I first need to somehow ensure
> netns stays alive if I go to sleep waiting for the lock. Or it would
> have to be a spinlock, or some better (simpler?) locking scheme.


> The above two convinced me that I should start with a global mutex, and
> go for more pain if there's contention.

> Thanks for giving the series a review.
Yeah, everything makes sense, agreed, feel free to slap:
Reviewed-by: Stanislav Fomichev <sdf@google.com>

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 19:54     ` Jakub Sitnicki
@ 2020-05-27 20:38       ` sdf
  2020-05-28 10:34         ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: sdf @ 2020-05-27 20:38 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> On Wed, May 27, 2020 at 07:48 PM CEST, sdf@google.com wrote:
> > On 05/27, Jakub Sitnicki wrote:
> >> Add support for bpf() syscall subcommands that operate on
> >> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points  
> tied to
> >> network namespaces (that is flow dissector at the moment).
> >
> >> Link-based and prog-based attachment can be used interchangeably, but  
> only
> >> one can be in use at a time. Attempts to attach a link when a prog is
> >> already attached directly, and the other way around, will be met with
> >> -EBUSY.
> >
> >> Attachment of multiple links of same attach type to one netns is not
> >> supported, with the intention to lift it when a use-case presents
> >> itself. Because of that attempts to create a netns link, when one  
> already
> >> exists result in -E2BIG error, signifying that there is no space left  
> for
> >> another attachment.
> >
> >> Link-based attachments to netns don't keep a netns alive by holding a  
> ref
> >> to it. Instead links get auto-detached from netns when the latter is  
> being
> >> destroyed by a pernet pre_exit callback.
> >
> >> When auto-detached, link lives in defunct state as long as there are open  
> FDs
> >> for it. -ENOLINK is returned if a user tries to update a defunct link.
> >
> >> Because bpf_link to netns doesn't hold a ref to struct net, special  
> care is
> >> taken when releasing the link. The netns might be getting torn down  
> when
> >> the release function tries to access it to detach the link.
> >
> >> To ensure the struct net object is alive when release function  
> accesses it
> >> we rely on the fact that cleanup_net(), struct net destructor, calls
> >> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach  
> from
> >> pre_exit happens first, link release will not attempt to access struct  
> net.
> >
> >> Same applies the other way around, network namespace doesn't keep an
> >> attached link alive by not holding a ref to it. Instead  
> bpf_links
> >> to netns are RCU-freed, so that pernet pre_exit callback can safely  
> access
> >> and auto-detach the link when racing with link release/free.
> >
> > [..]
> >> +	rcu_read_lock();
> >>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
> >> -		if (rcu_access_pointer(net->bpf.progs[type]))
> >> +		if (rcu_access_pointer(net->bpf.links[type]))
> >> +			bpf_netns_link_auto_detach(net, type);
> >> +		else if (rcu_access_pointer(net->bpf.progs[type]))
> >>   			__netns_bpf_prog_detach(net, type);
> >>   	}
> >> +	rcu_read_unlock();
> > Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
> > Is it allowed under rcu_read_lock()?

> Yes, that's true. __netns_bpf_prog_detach does

> 	RCU_INIT_POINTER(net->bpf.progs[type], NULL);

> RCU read lock is here for the rcu_dereference() that happens in
> bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link):

> /* Called with RCU read lock. */
> static void __net_exit
> bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type  
> type)
> {
> 	struct bpf_netns_link *net_link;
> 	struct bpf_link *link;

> 	link = rcu_dereference(net->bpf.links[type]);
> 	if (!link)
> 		return;
> 	net_link = to_bpf_netns_link(link);
> 	RCU_INIT_POINTER(net_link->net, NULL);
> }

> I've pulled it up, out of the loop, perhaps too eagerly and just made it
> confusing, considering we're iterating over a 1-item array :-)

> Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in
> bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is
> hazy to me.

> There are no concurrent writers to net_link->net, just readers, i.e.
> bpf_netns_link_release(). And I know bpf_link won't be freed until the
> grace period elapses.

> sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it
> doesn't hold up or make sense.

> I certainly can push the rcu_read_lock() down into
> bpf_netns_link_auto_detach().
I think it would be much nicer if you push them down to preserve the
assumption that nothing is modified under read lock and you flip
the pointers only when holding the mutexes.

I'll do another pass on this patch, I think I don't understand a bunch
of bits where you do:

mutex_lock
rcu_read_lock -> why? you're already in the update section, can use
                  rcu_dereference_protected
...
rcu_read_unlock
mutex_unlock


But I'll post those comments inline shortly.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
  2020-05-27 17:48   ` sdf
@ 2020-05-27 20:53   ` sdf
  2020-05-28 11:03     ` Jakub Sitnicki
  2020-05-28  2:54   ` kbuild test robot
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: sdf @ 2020-05-27 20:53 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/27, Jakub Sitnicki wrote:
> Add support for bpf() syscall subcommands that operate on
> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
> network namespaces (that is flow dissector at the moment).

> Link-based and prog-based attachment can be used interchangeably, but only
> one can be in use at a time. Attempts to attach a link when a prog is
> already attached directly, and the other way around, will be met with
> -EBUSY.

> Attaching multiple links of the same attach type to one netns is not
> supported, with the intention to lift this restriction when a use case
> presents itself. Because of that, attempts to create a netns link when one
> already exists result in an -E2BIG error, signifying that there is no
> space left for another attachment.

> Link-based attachments to netns don't keep a netns alive by holding a ref
> to it. Instead links get auto-detached from netns when the latter is being
> destroyed by a pernet pre_exit callback.

> When auto-detached, the link lives on in a defunct state as long as there
> are open FDs for it. -ENOLINK is returned if a user tries to update a
> defunct link.

> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> taken when releasing the link. The netns might be getting torn down when
> the release function tries to access it to detach the link.

> To ensure the struct net object is alive when the release function accesses
> it, we rely on the fact that cleanup_net(), the struct net destructor, calls
> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
> pre_exit happens first, link release will not attempt to access struct net.

> The same applies the other way around: a network namespace doesn't keep an
> attached link alive, as it doesn't hold a ref to it. Instead bpf_links
> to netns are RCU-freed, so that pernet pre_exit callback can safely access
> and auto-detach the link when racing with link release/free.

> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>   include/linux/bpf-netns.h      |   8 +
>   include/net/netns/bpf.h        |   1 +
>   include/uapi/linux/bpf.h       |   5 +
>   kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
>   kernel/bpf/syscall.c           |   3 +
>   net/core/filter.c              |   1 +
>   tools/include/uapi/linux/bpf.h |   5 +
>   7 files changed, 278 insertions(+), 2 deletions(-)

> diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
> index f3aec3d79824..4052d649f36d 100644
> --- a/include/linux/bpf-netns.h
> +++ b/include/linux/bpf-netns.h
> @@ -34,6 +34,8 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
>   int netns_bpf_prog_attach(const union bpf_attr *attr,
>   			  struct bpf_prog *prog);
>   int netns_bpf_prog_detach(const union bpf_attr *attr);
> +int netns_bpf_link_create(const union bpf_attr *attr,
> +			  struct bpf_prog *prog);
>   #else
>   static inline int netns_bpf_prog_query(const union bpf_attr *attr,
>   				       union bpf_attr __user *uattr)
> @@ -51,6 +53,12 @@ static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
>   {
>   	return -EOPNOTSUPP;
>   }
> +
> +static inline int netns_bpf_link_create(const union bpf_attr *attr,
> +					struct bpf_prog *prog)
> +{
> +	return -EOPNOTSUPP;
> +}
>   #endif

>   #endif /* _BPF_NETNS_H */
> diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
> index a858d1c5b166..baabefdeb10c 100644
> --- a/include/net/netns/bpf.h
> +++ b/include/net/netns/bpf.h
> @@ -12,6 +12,7 @@ struct bpf_prog;

>   struct netns_bpf {
>   	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
> +	struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
>   };

>   #endif /* __NETNS_BPF_H__ */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 54b93f8b49b8..05469d3c5c82 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -235,6 +235,7 @@ enum bpf_link_type {
>   	BPF_LINK_TYPE_TRACING = 2,
>   	BPF_LINK_TYPE_CGROUP = 3,
>   	BPF_LINK_TYPE_ITER = 4,
> +	BPF_LINK_TYPE_NETNS = 5,

>   	MAX_BPF_LINK_TYPE,
>   };
> @@ -3753,6 +3754,10 @@ struct bpf_link_info {
>   			__u64 cgroup_id;
>   			__u32 attach_type;
>   		} cgroup;
> +		struct  {
> +			__u32 netns_ino;
> +			__u32 attach_type;
> +		} netns;
>   	};
>   } __attribute__((aligned(8)));

> diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
> index fc89154aed27..1c6009ab93c5 100644
> --- a/kernel/bpf/net_namespace.c
> +++ b/kernel/bpf/net_namespace.c
> @@ -8,9 +8,155 @@
>    * Functions to manage BPF programs attached to netns
>    */

> -/* Protects updates to netns_bpf */
> +struct bpf_netns_link {
> +	struct bpf_link	link;
> +	enum bpf_attach_type type;
> +	enum netns_bpf_attach_type netns_type;
> +
> +	/* struct net is not RCU-freed but we treat it as such because
> +	 * our pre_exit callback will NULL this pointer before
> +	 * cleanup_net() calls synchronize_rcu().
> +	 */
> +	struct net __rcu *net;
> +
> +	/* bpf_netns_link is RCU-freed for pre_exit callback invoked
> +	 * by cleanup_net() to safely access the link.
> +	 */
> +	struct rcu_head	rcu;
> +};
> +
> +/* Protects updates to netns_bpf. */
>   DEFINE_MUTEX(netns_bpf_mutex);

> +static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
> +{
> +	return container_of(link, struct bpf_netns_link, link);
> +}
> +
> +/* Called with RCU read lock. */
As mentioned earlier, ^^^ probably doesn't make sense. Try to avoid
RCU_INIT_POINTER under read lock.

> +static void __net_exit
> +bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
> +{
> +	struct bpf_netns_link *net_link;
> +	struct bpf_link *link;
> +
> +	link = rcu_dereference(net->bpf.links[type]);
Btw, maybe use rcu_dereference_protected() with !check_net here and drop
the rcu read lock requirement?

> +	if (!link)
> +		return;
> +	net_link = to_bpf_netns_link(link);
> +	RCU_INIT_POINTER(net_link->net, NULL);
> +}
> +
> +static void bpf_netns_link_release(struct bpf_link *link)
> +{
> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +	enum netns_bpf_attach_type type = net_link->netns_type;
> +	struct net *net;
> +
> +	/* Link auto-detached by dying netns. */
> +	if (!rcu_access_pointer(net_link->net))
> +		return;
> +
> +	mutex_lock(&netns_bpf_mutex);
> +
> +	/* Recheck after potential sleep. We can race with cleanup_net
> +	 * here, but if we see a non-NULL struct net pointer pre_exit
> +	 * and following synchronize_rcu() has not happened yet, and
> +	 * we have until the end of grace period to access net.
> +	 */
> +	rcu_read_lock();
> +	net = rcu_dereference(net_link->net);
Use rcu_dereference_protected() with netns_bpf_mutex here instead of
grabbing the rcu read lock? Or are you guarding here against 'net'
going away? In that case, moving the unlock higher would make it more
visually clear.

> +	if (net) {
> +		RCU_INIT_POINTER(net->bpf.links[type], NULL);
> +		RCU_INIT_POINTER(net->bpf.progs[type], NULL);
> +	}
> +	rcu_read_unlock();
> +
> +	mutex_unlock(&netns_bpf_mutex);
> +}
> +
> +static void bpf_netns_link_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +
> +	/* Delay kfree in case we're racing with cleanup_net. */
> +	kfree_rcu(net_link, rcu);
> +}
> +
> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> +				      struct bpf_prog *new_prog,
> +				      struct bpf_prog *old_prog)
> +{
> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +	struct net *net;
> +	int ret = 0;
> +
> +	if (old_prog && old_prog != link->prog)
> +		return -EPERM;
> +	if (new_prog->type != link->prog->type)
> +		return -EINVAL;
> +
> +	mutex_lock(&netns_bpf_mutex);
> +	rcu_read_lock();
Again, what are you protecting here? 'net' disappearing? Then maybe do:

	rcu_read_lock
	net = rcu_deref
	valid = net && check_net(net)
	rcu_read_unlock

	if (!valid)
		bail out.

Otherwise, those mutex_lock+rcu_read_lock are a bit confusing.

> +
> +	net = rcu_dereference(net_link->net);
> +	if (!net || !check_net(net)) {
> +		/* Link auto-detached or netns dying */
> +		ret = -ENOLINK;
> +		goto out_unlock;
> +	}
> +
> +	old_prog = xchg(&link->prog, new_prog);
> +	bpf_prog_put(old_prog);
> +
> +out_unlock:
> +	rcu_read_unlock();
> +	mutex_unlock(&netns_bpf_mutex);
> +
> +	return ret;
> +}
> +
> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
> +				    struct bpf_link_info *info)
> +{
> +	const struct bpf_netns_link *net_link;
> +	unsigned int inum;
> +	struct net *net;
> +
> +	net_link = container_of(link, struct bpf_netns_link, link);
> +
> +	rcu_read_lock();
> +	net = rcu_dereference(net_link->net);
> +	if (net)
> +		inum = net->ns.inum;
> +	rcu_read_unlock();
> +
> +	info->netns.netns_ino = inum;
> +	info->netns.attach_type = net_link->type;
> +	return 0;
> +}
> +
> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
> +				       struct seq_file *seq)
> +{
> +	struct bpf_link_info info = {};
> +
> +	bpf_netns_link_fill_info(link, &info);
> +	seq_printf(seq,
> +		   "netns_ino:\t%u\n"
> +		   "attach_type:\t%u\n",
> +		   info.netns.netns_ino,
> +		   info.netns.attach_type);
> +}
> +
> +static const struct bpf_link_ops bpf_netns_link_ops = {
> +	.release = bpf_netns_link_release,
> +	.dealloc = bpf_netns_link_dealloc,
> +	.update_prog = bpf_netns_link_update_prog,
> +	.fill_link_info = bpf_netns_link_fill_info,
> +	.show_fdinfo = bpf_netns_link_show_fdinfo,
> +};
> +
>   int netns_bpf_prog_query(const union bpf_attr *attr,
>   			 union bpf_attr __user *uattr)
>   {
> @@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)

>   	net = current->nsproxy->net_ns;
>   	mutex_lock(&netns_bpf_mutex);
> +
> +	/* Attaching prog directly is not compatible with links */
> +	if (rcu_access_pointer(net->bpf.links[type])) {
> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
>   	switch (type) {
>   	case NETNS_BPF_FLOW_DISSECTOR:
>   		ret = flow_dissector_bpf_prog_attach(net, prog);
> @@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>   		ret = -EINVAL;
>   		break;
>   	}
> +unlock:
>   	mutex_unlock(&netns_bpf_mutex);

>   	return ret;
> @@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
>   {
>   	struct bpf_prog *attached;

> +	/* Progs attached via links cannot be detached */
> +	if (rcu_access_pointer(net->bpf.links[type]))
> +		return -EBUSY;
> +
>   	/* No need for update-side lock when net is going away. */
>   	attached = rcu_dereference_protected(net->bpf.progs[type],
>   					     !check_net(net) ||
> @@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
>   	return ret;
>   }

> +static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
> +				   enum netns_bpf_attach_type type)
> +{
> +	int err;
> +
> +	/* Allow attaching only one prog or link for now */
> +	if (rcu_access_pointer(net->bpf.links[type]))
> +		return -E2BIG;
> +	/* Links are not compatible with attaching prog directly */
> +	if (rcu_access_pointer(net->bpf.progs[type]))
> +		return -EBUSY;
> +
> +	switch (type) {
> +	case NETNS_BPF_FLOW_DISSECTOR:
> +		err = flow_dissector_bpf_prog_attach(net, link->prog);
> +		break;
> +	default:
> +		err = -EINVAL;
> +		break;
> +	}
> +	if (!err)
> +		rcu_assign_pointer(net->bpf.links[type], link);
> +	return err;
> +}
> +
> +static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
> +				 enum netns_bpf_attach_type type)
> +{
> +	int ret;
> +
> +	mutex_lock(&netns_bpf_mutex);
> +	ret = __netns_bpf_link_attach(net, link, type);
What's the point of separating __netns_bpf_link_attach out?
Does it make sense to replace those rcu_access_pointer() calls in
__netns_bpf_link_attach with rcu_dereference_protected() and put everything
here? That would make it a bit easier to reason about.

> +	mutex_unlock(&netns_bpf_mutex);
> +
> +	return ret;
> +}
> +
> +int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	enum netns_bpf_attach_type netns_type;
> +	struct bpf_link_primer link_primer;
> +	struct bpf_netns_link *net_link;
> +	enum bpf_attach_type type;
> +	struct net *net;
> +	int err;
> +
> +	if (attr->link_create.flags)
> +		return -EINVAL;
> +
> +	type = attr->link_create.attach_type;
> +	netns_type = to_netns_bpf_attach_type(type);
> +	if (netns_type < 0)
> +		return -EINVAL;
> +
> +	net = get_net_ns_by_fd(attr->link_create.target_fd);
> +	if (IS_ERR(net))
> +		return PTR_ERR(net);
> +
> +	net_link = kzalloc(sizeof(*net_link), GFP_USER);
> +	if (!net_link) {
> +		err = -ENOMEM;
> +		goto out_put_net;
> +	}
> +	bpf_link_init(&net_link->link, BPF_LINK_TYPE_NETNS,
> +		      &bpf_netns_link_ops, prog);
> +	rcu_assign_pointer(net_link->net, net);
> +	net_link->type = type;
> +	net_link->netns_type = netns_type;
> +
> +	err = bpf_link_prime(&net_link->link, &link_primer);
> +	if (err) {
> +		kfree(net_link);
> +		goto out_put_net;
> +	}
> +
> +	err = netns_bpf_link_attach(net, &net_link->link, netns_type);
> +	if (err) {
> +		bpf_link_cleanup(&link_primer);
> +		goto out_put_net;
> +	}
> +
> +	err = bpf_link_settle(&link_primer);
> +out_put_net:
> +	/* To auto-detach the link from netns when it is getting
> +	 * destroyed, we can't hold a ref to it. Instead, we rely on
> +	 * RCU when accessing link->net pointer.
> +	 */
> +	put_net(net);
> +	return err;
> +}
> +
>   static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
>   {
>   	enum netns_bpf_attach_type type;

> +	rcu_read_lock();
>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
> -		if (rcu_access_pointer(net->bpf.progs[type]))
> +		if (rcu_access_pointer(net->bpf.links[type]))
> +			bpf_netns_link_auto_detach(net, type);
> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>   			__netns_bpf_prog_detach(net, type);
>   	}
> +	rcu_read_unlock();
>   }

>   static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 93f329a6736d..47a7b426d75c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3874,6 +3874,9 @@ static int link_create(union bpf_attr *attr)
>   	case BPF_PROG_TYPE_TRACING:
>   		ret = tracing_bpf_link_attach(attr, prog);
>   		break;
> +	case BPF_PROG_TYPE_FLOW_DISSECTOR:
> +		ret = netns_bpf_link_create(attr, prog);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   	}
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a6fc23447f12..b77391a1adb1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9081,6 +9081,7 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {

>   const struct bpf_prog_ops sk_reuseport_prog_ops = {
>   };
> +
>   #endif /* CONFIG_INET */

>   DEFINE_BPF_DISPATCHER(xdp)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 54b93f8b49b8..05469d3c5c82 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -235,6 +235,7 @@ enum bpf_link_type {
>   	BPF_LINK_TYPE_TRACING = 2,
>   	BPF_LINK_TYPE_CGROUP = 3,
>   	BPF_LINK_TYPE_ITER = 4,
> +	BPF_LINK_TYPE_NETNS = 5,

>   	MAX_BPF_LINK_TYPE,
>   };
> @@ -3753,6 +3754,10 @@ struct bpf_link_info {
>   			__u64 cgroup_id;
>   			__u32 attach_type;
>   		} cgroup;
> +		struct  {
> +			__u32 netns_ino;
> +			__u32 attach_type;
> +		} netns;
>   	};
>   } __attribute__((aligned(8)));

> --
> 2.25.4


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
  2020-05-27 17:48   ` sdf
  2020-05-27 20:53   ` sdf
@ 2020-05-28  2:54   ` kbuild test robot
  2020-06-04 23:38     ` Nick Desaulniers
  2020-05-28  5:56   ` Andrii Nakryiko
  2020-05-28 13:30   ` Jakub Sitnicki
  4 siblings, 1 reply; 36+ messages in thread
From: kbuild test robot @ 2020-05-28  2:54 UTC (permalink / raw)
  To: Jakub Sitnicki, bpf; +Cc: kbuild-all, clang-built-linux, netdev, kernel-team

[-- Attachment #1: Type: text/plain, Size: 2698 bytes --]

Hi Jakub,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]
[also build test WARNING on net-next/master next-20200526]
[cannot apply to bpf/master net/master linus/master v5.7-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jakub-Sitnicki/Link-based-program-attachment-to-network-namespaces/20200528-011159
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm-randconfig-r035-20200527 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 3393cc4cebf9969db94dc424b7a2b6195589c33b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> kernel/bpf/net_namespace.c:130:6: warning: variable 'inum' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (net)
               ^~~
   kernel/bpf/net_namespace.c:134:26: note: uninitialized use occurs here
           info->netns.netns_ino = inum;
                                   ^~~~
   kernel/bpf/net_namespace.c:130:2: note: remove the 'if' if its condition is always true
           if (net)
           ^~~~~~~~
   kernel/bpf/net_namespace.c:123:19: note: initialize the variable 'inum' to silence this warning
           unsigned int inum;
                        ^
                         = 0
   1 warning generated.

vim +130 kernel/bpf/net_namespace.c

   118	
   119	static int bpf_netns_link_fill_info(const struct bpf_link *link,
   120					    struct bpf_link_info *info)
   121	{
   122		const struct bpf_netns_link *net_link;
   123		unsigned int inum;
   124		struct net *net;
   125	
   126		net_link = container_of(link, struct bpf_netns_link, link);
   127	
   128		rcu_read_lock();
   129		net = rcu_dereference(net_link->net);
 > 130		if (net)
   131			inum = net->ns.inum;
   132		rcu_read_unlock();
   133	
   134		info->netns.netns_ino = inum;
   135		info->netns.attach_type = net_link->type;
   136		return 0;
   137	}
   138	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 35429 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
                     ` (2 preceding siblings ...)
  2020-05-28  2:54   ` kbuild test robot
@ 2020-05-28  5:56   ` Andrii Nakryiko
  2020-05-28 12:26     ` Jakub Sitnicki
  2020-05-28 13:30   ` Jakub Sitnicki
  4 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28  5:56 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add support for bpf() syscall subcommands that operate on
> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
> network namespaces (that is flow dissector at the moment).
>
> Link-based and prog-based attachment can be used interchangeably, but only
> one can be in use at a time. Attempts to attach a link when a prog is
> already attached directly, and the other way around, will be met with
> -EBUSY.
>
> Attaching multiple links of the same attach type to one netns is not
> supported, with the intention to lift this restriction when a use case
> presents itself. Because of that, attempts to create a netns link when one
> already exists result in an -E2BIG error, signifying that there is no
> space left for another attachment.
>
> Link-based attachments to netns don't keep a netns alive by holding a ref
> to it. Instead links get auto-detached from netns when the latter is being
> destroyed by a pernet pre_exit callback.
>
> When auto-detached, the link lives on in a defunct state as long as there
> are open FDs for it. -ENOLINK is returned if a user tries to update a
> defunct link.
>
> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> taken when releasing the link. The netns might be getting torn down when
> the release function tries to access it to detach the link.
>
> To ensure the struct net object is alive when the release function accesses
> it, we rely on the fact that cleanup_net(), the struct net destructor, calls
> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
> pre_exit happens first, link release will not attempt to access struct net.
>
> The same applies the other way around: a network namespace doesn't keep an
> attached link alive, as it doesn't hold a ref to it. Instead bpf_links
> to netns are RCU-freed, so that pernet pre_exit callback can safely access
> and auto-detach the link when racing with link release/free.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/linux/bpf-netns.h      |   8 +
>  include/net/netns/bpf.h        |   1 +
>  include/uapi/linux/bpf.h       |   5 +
>  kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
>  kernel/bpf/syscall.c           |   3 +
>  net/core/filter.c              |   1 +
>  tools/include/uapi/linux/bpf.h |   5 +
>  7 files changed, 278 insertions(+), 2 deletions(-)
>

[...]

>  struct netns_bpf {
>         struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
> +       struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
>  };
>

[...]

>
> -/* Protects updates to netns_bpf */
> +struct bpf_netns_link {
> +       struct bpf_link link;
> +       enum bpf_attach_type type;
> +       enum netns_bpf_attach_type netns_type;
> +
> +       /* struct net is not RCU-freed but we treat it as such because
> +        * our pre_exit callback will NULL this pointer before
> +        * cleanup_net() calls synchronize_rcu().
> +        */
> +       struct net __rcu *net;

It feels to me (see comments below) that if you use the mutex
consistently, then this shouldn't be __rcu and you won't even need
rcu_read_lock() when working with this pointer, because auto_detach
and everything else won't be racing: if you got the mutex in release() and
you see a non-NULL net pointer, auto-detach either didn't happen yet, or
is happening at the same time but is blocked on the mutex. If you got the
mutex and see net == NULL, auto-detach succeeded before release, so skip
the net clean-up. Easy, no?

> +
> +       /* bpf_netns_link is RCU-freed for pre_exit callback invoked
> +        * by cleanup_net() to safely access the link.
> +        */
> +       struct rcu_head rcu;
> +};
> +
> +/* Protects updates to netns_bpf. */
>  DEFINE_MUTEX(netns_bpf_mutex);
>
> +static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
> +{
> +       return container_of(link, struct bpf_netns_link, link);
> +}
> +
> +/* Called with RCU read lock. */
> +static void __net_exit
> +bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
> +{
> +       struct bpf_netns_link *net_link;
> +       struct bpf_link *link;
> +
> +       link = rcu_dereference(net->bpf.links[type]);
> +       if (!link)
> +               return;
> +       net_link = to_bpf_netns_link(link);
> +       RCU_INIT_POINTER(net_link->net, NULL);

Given link attach and release is done under netns_bpf_mutex, shouldn't
this be done under the same mutex? You are modifying link concurrently
with update, but you are not synchronizing that access.

> +}
> +
> +static void bpf_netns_link_release(struct bpf_link *link)
> +{
> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +       enum netns_bpf_attach_type type = net_link->netns_type;
> +       struct net *net;
> +
> +       /* Link auto-detached by dying netns. */
> +       if (!rcu_access_pointer(net_link->net))
> +               return;
> +
> +       mutex_lock(&netns_bpf_mutex);
> +
> +       /* Recheck after potential sleep. We can race with cleanup_net
> +        * here, but if we see a non-NULL struct net pointer pre_exit
> +        * and following synchronize_rcu() has not happened yet, and
> +        * we have until the end of grace period to access net.
> +        */
> +       rcu_read_lock();
> +       net = rcu_dereference(net_link->net);
> +       if (net) {
> +               RCU_INIT_POINTER(net->bpf.links[type], NULL);
> +               RCU_INIT_POINTER(net->bpf.progs[type], NULL);

bpf.progs[type] is supposed to be NULL already, why setting it again here?

> +       }
> +       rcu_read_unlock();
> +
> +       mutex_unlock(&netns_bpf_mutex);
> +}
> +
> +static void bpf_netns_link_dealloc(struct bpf_link *link)
> +{
> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +
> +       /* Delay kfree in case we're racing with cleanup_net. */
> +       kfree_rcu(net_link, rcu);

It feels to me like this RCU stuff for links is a bit overcomplicated.
If I understand your changes correctly (and please correct me if I'm
wrong), netns_bpf's progs are sort of like "effective progs" for
cgroup. Regardless if attachment was bpf_link-based or straight
bpf_prog-based, prog will always be set. link[type] would be set
additionally only if bpf_link-based attachment was done. And that
makes sense to make dissector hot path faster and simpler.

But if that's the case, link itself is always (except for auto-detach,
which I think should be fixed) accessed under mutex and doesn't
need/rely on rcu_read_lock() at all. So __rcu annotation for links is
not necessary, all the rcu dereferences for links are unnecessary, and
this kfree_rcu() is unnecessary. If that's not the case, please help
me understand why not.

> +}
> +
> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> +                                     struct bpf_prog *new_prog,
> +                                     struct bpf_prog *old_prog)
> +{
> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +       struct net *net;
> +       int ret = 0;
> +
> +       if (old_prog && old_prog != link->prog)
> +               return -EPERM;
> +       if (new_prog->type != link->prog->type)
> +               return -EINVAL;
> +
> +       mutex_lock(&netns_bpf_mutex);
> +       rcu_read_lock();
> +
> +       net = rcu_dereference(net_link->net);
> +       if (!net || !check_net(net)) {
> +               /* Link auto-detached or netns dying */
> +               ret = -ENOLINK;

This is an interesting error code. If we are going to adopt this, we
should change it for similar cgroup link situation as well.

> +               goto out_unlock;
> +       }
> +
> +       old_prog = xchg(&link->prog, new_prog);
> +       bpf_prog_put(old_prog);
> +
> +out_unlock:
> +       rcu_read_unlock();
> +       mutex_unlock(&netns_bpf_mutex);
> +
> +       return ret;
> +}
> +
> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
> +                                   struct bpf_link_info *info)
> +{
> +       const struct bpf_netns_link *net_link;
> +       unsigned int inum;
> +       struct net *net;
> +
> +       net_link = container_of(link, struct bpf_netns_link, link);

you use to_bpf_netns_link() in few places above, but straight
container_of() here. Let's do this consistently (I'd rather stick to
straight container_of, but that's minor).

> +
> +       rcu_read_lock();
> +       net = rcu_dereference(net_link->net);
> +       if (net)
> +               inum = net->ns.inum;
> +       rcu_read_unlock();
> +
> +       info->netns.netns_ino = inum;
> +       info->netns.attach_type = net_link->type;
> +       return 0;
> +}
> +
> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
> +                                      struct seq_file *seq)
> +{
> +       struct bpf_link_info info = {};
> +
> +       bpf_netns_link_fill_info(link, &info);
> +       seq_printf(seq,
> +                  "netns_ino:\t%u\n"
> +                  "attach_type:\t%u\n",
> +                  info.netns.netns_ino,
> +                  info.netns.attach_type);
> +}
> +
> +static const struct bpf_link_ops bpf_netns_link_ops = {
> +       .release = bpf_netns_link_release,
> +       .dealloc = bpf_netns_link_dealloc,
> +       .update_prog = bpf_netns_link_update_prog,
> +       .fill_link_info = bpf_netns_link_fill_info,
> +       .show_fdinfo = bpf_netns_link_show_fdinfo,
> +};
> +
>  int netns_bpf_prog_query(const union bpf_attr *attr,
>                          union bpf_attr __user *uattr)
>  {
> @@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>
>         net = current->nsproxy->net_ns;
>         mutex_lock(&netns_bpf_mutex);
> +
> +       /* Attaching prog directly is not compatible with links */
> +       if (rcu_access_pointer(net->bpf.links[type])) {
> +               ret = -EBUSY;

EEXIST would be returned if another prog is attached. Given that in this
case attaching a prog or a link has the same semantics (one cannot replace
an attached program, unlike for cgroups), should we keep it consistent
and return EEXIST here as well?

> +               goto unlock;
> +       }
> +
>         switch (type) {
>         case NETNS_BPF_FLOW_DISSECTOR:
>                 ret = flow_dissector_bpf_prog_attach(net, prog);
> @@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>                 ret = -EINVAL;
>                 break;
>         }
> +unlock:
>         mutex_unlock(&netns_bpf_mutex);
>
>         return ret;
> @@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
>  {
>         struct bpf_prog *attached;
>
> +       /* Progs attached via links cannot be detached */
> +       if (rcu_access_pointer(net->bpf.links[type]))
> +               return -EBUSY;

This is more of -EINVAL?

> +
>         /* No need for update-side lock when net is going away. */
>         attached = rcu_dereference_protected(net->bpf.progs[type],
>                                              !check_net(net) ||
> @@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
>         return ret;
>  }
>
> +static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
> +                                  enum netns_bpf_attach_type type)
> +{
> +       int err;
> +
> +       /* Allow attaching only one prog or link for now */
> +       if (rcu_access_pointer(net->bpf.links[type]))
> +               return -E2BIG;
> +       /* Links are not compatible with attaching prog directly */
> +       if (rcu_access_pointer(net->bpf.progs[type]))
> +               return -EBUSY;

Same as above. Do we need three different error codes instead of
consistently using EEXIST?
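If consistency is the goal, the three checks collapse into a single test. A
minimal userspace model of that suggestion (illustrative names, not the
posted patch):

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the per-type slots in struct netns_bpf. */
struct netns_bpf_slot {
	void *prog;	/* models net->bpf.progs[type] */
	void *link;	/* models net->bpf.links[type] */
};

/* One attachment per type, no matter how it was made: any existing
 * attachment, prog or link, consistently yields -EEXIST. */
static int netns_bpf_attach_check(const struct netns_bpf_slot *s)
{
	if (s->link || s->prog)
		return -EEXIST;
	return 0;
}
```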


> +
> +       switch (type) {
> +       case NETNS_BPF_FLOW_DISSECTOR:
> +               err = flow_dissector_bpf_prog_attach(net, link->prog);
> +               break;
> +       default:
> +               err = -EINVAL;
> +               break;
> +       }
> +       if (!err)
> +               rcu_assign_pointer(net->bpf.links[type], link);
> +       return err;
> +}
> +

[...]

> +       err = bpf_link_prime(&net_link->link, &link_primer);
> +       if (err) {
> +               kfree(net_link);
> +               goto out_put_net;
> +       }
> +
> +       err = netns_bpf_link_attach(net, &net_link->link, netns_type);
> +       if (err) {
> +               bpf_link_cleanup(&link_primer);
> +               goto out_put_net;
> +       }
> +
> +       err = bpf_link_settle(&link_primer);

This looks a bit misleading. bpf_link_settle() cannot fail and returns
an FD, but here it looks like it might return an error. I think it would
be more straightforward to just:

    put_net(net);
    return bpf_link_settle(&link_primer);

out_put_net:
    put_net(net);
    return err;

> +out_put_net:
> +       /* To auto-detach the link from netns when it is getting
> +        * destroyed, we can't hold a ref to it. Instead, we rely on
> +        * RCU when accessing link->net pointer.
> +        */
> +       put_net(net);
> +       return err;
> +}
> +
>  static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
>  {
>         enum netns_bpf_attach_type type;
>
> +       rcu_read_lock();
>         for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
> -               if (rcu_access_pointer(net->bpf.progs[type]))
> +               if (rcu_access_pointer(net->bpf.links[type]))
> +                       bpf_netns_link_auto_detach(net, type);
> +               else if (rcu_access_pointer(net->bpf.progs[type]))
>                         __netns_bpf_prog_detach(net, type);
>         }
> +       rcu_read_unlock();
>  }
>

[...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment
  2020-05-27 17:08 ` [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
@ 2020-05-28  5:59   ` Andrii Nakryiko
  2020-05-28 13:05     ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28  5:59 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add bpf_program__attach_netns(), which uses the LINK_CREATE subcommand to
> create an FD-based kernel bpf_link for attach types tied to a network
> namespace, that is BPF_FLOW_DISSECTOR for the moment.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  tools/lib/bpf/libbpf.c   | 20 ++++++++++++++++----
>  tools/lib/bpf/libbpf.h   |  2 ++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 5d60de6fd818..a49c1eb5db64 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -7894,8 +7894,8 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
>         return bpf_program__attach_iter(prog, NULL);
>  }
>
> -struct bpf_link *
> -bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
> +static struct bpf_link *
> +bpf_program__attach_fd(struct bpf_program *prog, int target_fd)
>  {
>         enum bpf_attach_type attach_type;
>         char errmsg[STRERR_BUFSIZE];
> @@ -7915,11 +7915,11 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>         link->detach = &bpf_link__detach_fd;
>
>         attach_type = bpf_program__get_expected_attach_type(prog);
> -       link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
> +       link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
>         if (link_fd < 0) {
>                 link_fd = -errno;
>                 free(link);
> -               pr_warn("program '%s': failed to attach to cgroup: %s\n",
> +               pr_warn("program '%s': failed to attach to cgroup/netns: %s\n",

I understand the desire to save a few lines of code, but it hurts error
reporting. Now it's cgroup/netns, tomorrow cgroup/netns/lirc/whatever.
If you want to generalize, let's preserve the clarity of the error
message, please.
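One way to keep the message specific while still sharing the attach path
would be to pass a target name into the generic helper. A rough sketch,
where the `target_name` parameter is hypothetical and not part of the
posted patch:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: the shared helper formats its warning from a
 * caller-supplied target name instead of a hardcoded "cgroup/netns". */
static void warn_attach_failed(char *buf, size_t len, const char *prog_name,
			       const char *target_name, const char *errmsg)
{
	snprintf(buf, len, "program '%s': failed to attach to %s: %s\n",
		 prog_name, target_name, errmsg);
}
```

bpf_program__attach_cgroup() would then call the shared helper with
"cgroup", bpf_program__attach_netns() with "netns", and future attach
types with their own names.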

>                         bpf_program__title(prog, false),
>                         libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
>                 return ERR_PTR(link_fd);
> @@ -7928,6 +7928,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>         return link;
>  }
>
> +struct bpf_link *
> +bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
> +{
> +       return bpf_program__attach_fd(prog, cgroup_fd);
> +}
> +
> +struct bpf_link *
> +bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
> +{
> +       return bpf_program__attach_fd(prog, netns_fd);
> +}
> +
>  struct bpf_link *
>  bpf_program__attach_iter(struct bpf_program *prog,
>                          const struct bpf_iter_attach_opts *opts)
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 1e2e399a5f2c..adf6fd9b6fe8 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
>  bpf_program__attach_lsm(struct bpf_program *prog);
>  LIBBPF_API struct bpf_link *
>  bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
> +LIBBPF_API struct bpf_link *
> +bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
>
>  struct bpf_map;
>
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 381a7342ecfc..7ad21ba1feb6 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -263,4 +263,5 @@ LIBBPF_0.0.9 {
>                 bpf_link_get_next_id;
>                 bpf_program__attach_iter;
>                 perf_buffer__consume;
> +               bpf_program__attach_netns;

Please keep it alphabetical.

>  } LIBBPF_0.0.8;
> --
> 2.25.4
>


* Re: [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links
  2020-05-27 17:08 ` [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links Jakub Sitnicki
@ 2020-05-28  6:02   ` Andrii Nakryiko
  2020-05-28 13:10     ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28  6:02 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Make `bpf link show` aware of the new link type, that is, links attached to
> netns. When listing netns-attached links, display the netns inode number as
> the link's identifier, along with its attach type.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  tools/bpf/bpftool/link.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
> index 670a561dc31b..83a17d62c4c3 100644
> --- a/tools/bpf/bpftool/link.c
> +++ b/tools/bpf/bpftool/link.c
> @@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
>         [BPF_LINK_TYPE_TRACING]                 = "tracing",
>         [BPF_LINK_TYPE_CGROUP]                  = "cgroup",
>         [BPF_LINK_TYPE_ITER]                    = "iter",
> +       [BPF_LINK_TYPE_NETNS]                   = "netns",
>  };
>
>  static int link_parse_fd(int *argc, char ***argv)
> @@ -122,6 +123,16 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
>                         jsonw_uint_field(json_wtr, "attach_type",
>                                          info->cgroup.attach_type);
>                 break;
> +       case BPF_LINK_TYPE_NETNS:
> +               jsonw_uint_field(json_wtr, "netns_ino",
> +                                info->netns.netns_ino);
> +               if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
> +                       jsonw_string_field(json_wtr, "attach_type",
> +                               attach_type_name[info->netns.attach_type]);
> +               else
> +                       jsonw_uint_field(json_wtr, "attach_type",
> +                                        info->netns.attach_type);
> +               break;

Can you please extract this attach_type handling into a helper func?
It's annoying to read so many repetitive if/elses. Same for the
plain-text variant below. Thanks!
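For illustration, a rough sketch of what the extracted plain-text helper
could look like — buffer-based here so it is self-contained; the bpftool
version would print directly and use the real attach_type_name[] table
from link.c:

```c
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative stand-in for bpftool's attach_type_name[] table. */
static const char * const attach_type_name[] = {
	"cgroup_inet_ingress",
	"cgroup_inet_egress",
};

/* Print the attach type by name if known, otherwise fall back to the
 * raw number -- the repeated if/else the review wants factored out. */
static int show_attach_type_plain(char *buf, size_t len, unsigned int type)
{
	if (type < ARRAY_SIZE(attach_type_name))
		return snprintf(buf, len, "attach_type %s  ",
				attach_type_name[type]);
	return snprintf(buf, len, "attach_type %u  ", type);
}
```

Both the JSON and plain-text `BPF_LINK_TYPE_NETNS` (and cgroup) cases could
then call a helper of this shape instead of open-coding the bounds check.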

>         default:
>                 break;
>         }
> @@ -190,6 +201,14 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
>                 else
>                         printf("attach_type %u  ", info->cgroup.attach_type);
>                 break;
> +       case BPF_LINK_TYPE_NETNS:
> +               printf("\n\tnetns_ino %u  ", info->netns.netns_ino);
> +               if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
> +                       printf("attach_type %s  ",
> +                              attach_type_name[info->netns.attach_type]);
> +               else
> +                       printf("attach_type %u  ", info->netns.attach_type);
> +               break;
>         default:
>                 break;
>         }
> --
> 2.25.4
>


* Re: [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns
  2020-05-27 17:08 ` [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
@ 2020-05-28  6:08   ` Andrii Nakryiko
  2020-05-28 13:29     ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28  6:08 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Extend the existing test case for flow dissector attaching to cover:
>
>  - link creation,
>  - link updates,
>  - link info querying,
>  - mixing links with direct prog attachment.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

You are not using bpf_program__attach_netns() at all. Would be nice to
actually use the higher-level API here...

Also... what's up with people using CHECK_FAIL + perror instead of
CHECK? Is CHECK being avoided for some reason, or are people just not
aware of it (which is strange, because CHECK was there before
CHECK_FAIL)?
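For context, a simplified userspace model of the CHECK semantics from the
selftests' test_progs.h (the real macro also does per-test bookkeeping) —
the point being that it logs and counts the failure in one step, so a
separate perror() is unnecessary:

```c
#include <stdio.h>

static int error_cnt;

/* Simplified model of selftests' CHECK(): evaluates the condition,
 * logs the formatted message on failure, bumps the error counter,
 * and returns whether the check failed. */
#define CHECK(condition, tag, format...) ({			\
	int __ret = !!(condition);				\
	if (__ret) {						\
		error_cnt++;					\
		fprintf(stderr, "%s:FAIL: ", tag);		\
		fprintf(stderr, format);			\
	}							\
	__ret;							\
})
```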

>  .../bpf/prog_tests/flow_dissector_reattach.c  | 500 +++++++++++++++++-
>  1 file changed, 471 insertions(+), 29 deletions(-)
>

[...]


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 20:38       ` sdf
@ 2020-05-28 10:34         ` Jakub Sitnicki
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 10:34 UTC (permalink / raw)
  To: sdf; +Cc: bpf, netdev, kernel-team

On Wed, May 27, 2020 at 10:38 PM CEST, sdf@google.com wrote:
> On 05/27, Jakub Sitnicki wrote:
>> On Wed, May 27, 2020 at 07:48 PM CEST, sdf@google.com wrote:
>> > On 05/27, Jakub Sitnicki wrote:
>> >> Add support for bpf() syscall subcommands that operate on
>> >> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> >> network namespaces (that is flow dissector at the moment).
>> >
>> >> Link-based and prog-based attachment can be used interchangeably, but only
>> >> one can be in use at a time. Attempts to attach a link when a prog is
>> >> already attached directly, and the other way around, will be met with
>> >> -EBUSY.
>> >
>> >> Attachment of multiple links of the same attach type to one netns is not
>> >> supported, with the intention to lift that restriction when a use-case
>> >> presents itself. Because of that, attempts to create a netns link when one
>> >> already exists result in an -E2BIG error, signifying that there is no
>> >> space left for another attachment.
>> >
>> >> Link-based attachments to netns don't keep a netns alive by holding a ref
>> >> to it. Instead links get auto-detached from netns when the latter is being
>> >> destroyed by a pernet pre_exit callback.
>> >
>> >> When auto-detached, link lives in defunct state as long as there are open FDs
>> >> for it. -ENOLINK is returned if a user tries to update a defunct link.
>> >
>> >> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> >> taken when releasing the link. The netns might be getting torn down when
>> >> the release function tries to access it to detach the link.
>> >
>> >> To ensure the struct net object is alive when release function accesses it
>> >> we rely on the fact that cleanup_net(), struct net destructor, calls
>> >> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
>> >> pre_exit happens first, link release will not attempt to access struct net.
>> >
>> >> Same applies the other way around, network namespace doesn't keep an
>> >> attached link alive by not holding a ref to it. Instead bpf_links
>> >> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> >> and auto-detach the link when racing with link release/free.
>> >
>> > [..]
>> >> +	rcu_read_lock();
>> >>   	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
>> >> -		if (rcu_access_pointer(net->bpf.progs[type]))
>> >> +		if (rcu_access_pointer(net->bpf.links[type]))
>> >> +			bpf_netns_link_auto_detach(net, type);
>> >> +		else if (rcu_access_pointer(net->bpf.progs[type]))
>> >>   			__netns_bpf_prog_detach(net, type);
>> >>   	}
>> >> +	rcu_read_unlock();
>> > Aren't you doing RCU_INIT_POINTER in __netns_bpf_prog_detach?
>> > Is it allowed under rcu_read_lock?
>
>> Yes, that's true. __netns_bpf_prog_detach does
>
>> 	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
>
>> RCU read lock is here for the rcu_dereference() that happens in
>> bpf_netns_link_auto_detach (netns doesn't hold a ref to bpf_link):
>
>> /* Called with RCU read lock. */
>> static void __net_exit
>> bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
>> {
>> 	struct bpf_netns_link *net_link;
>> 	struct bpf_link *link;
>
>> 	link = rcu_dereference(net->bpf.links[type]);
>> 	if (!link)
>> 		return;
>> 	net_link = to_bpf_netns_link(link);
>> 	RCU_INIT_POINTER(net_link->net, NULL);
>> }
>
>> I've pulled it up, out of the loop, perhaps too eagerly and just made it
>> confusing, considering we're iterating over a 1-item array :-)
>
>> Now, I'm also doing RCU_INIT_POINTER on the *contents of bpf_link* in
>> bpf_netns_link_auto_detach. Is that allowed? I'm not sure, that bit is
>> hazy to me.
>
>> There are no concurrent writers to net_link->net, just readers, i.e.
>> bpf_netns_link_release(). And I know bpf_link won't be freed until the
>> grace period elapses.
>
>> sparse and CONFIG_PROVE_RCU are not shouting at me, but please do if it
>> doesn't hold up or make sense.
>
>> I certainly can push the rcu_read_lock() down into
>> bpf_netns_link_auto_detach().
> I think it would be much nicer if you push them down to preserve the
> assumption that nothing is modified under read lock and you flip
> the pointers only when holding the mutexes.

I certainly see how that would save some head-scratching. Might be
doable with grabbing a temporary reference to struct net/struct
bpf_link. Please see the code snippet below.

>
> I'll do another pass on this patch, I think I don't understand a bunch
> of bits where you do:
>
> mutex_lock
> rcu_read_lock -> why? you're already in the update section, can use
>                  rcu_dereference_protected
> ...
> rcu_read_unlock
> mutex_unlock

The rcu_read_lock is to get the grace-period guarantee for the value
(struct net) we access by dereferencing an RCU-protected pointer
(bpf_netns_link.net).

While mutex_lock is to serialize updates to values within struct
net. That is net->bpf.progs or net->bpf.links.

The locking is done in reverse order because I cannot grab the mutex
while in RCU read-side critical section.

If I was holding a reference to struct net, I would not need to be
inside an RCU read-side critical section to access it. (This is how
bpf_cgroup_link does it when accessing cgroup->bpf.)

One thought I had is that I could rearrange synchronization so that we
try to grab a reference to struct net when we need to modify it:

	rcu_read_lock();
	net = rcu_dereference(to_bpf_netns_link(link)->net);
	if (net)
		net = maybe_get_net(net);
	rcu_read_unlock();

	if (!net)
		return; /* netns is dead */

	mutex_lock(&netns_bpf_mutex);
	rcu_assign_pointer(net->bpf.progs[type], link->prog);
	rcu_assign_pointer(net->bpf.links[type], link);
	mutex_unlock(&netns_bpf_mutex);
	put_net(net);

That seems easier to reason about, no?
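For readers unfamiliar with maybe_get_net(), its inc-not-zero semantics —
take a reference only if the object is still alive — can be modeled in
plain C11 roughly like this (illustrative only, not kernel code):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of maybe_get_net()-style refcounting: a count of
 * zero means the object is already dying and must not be revived. */
struct obj {
	atomic_int refcount;
};

static bool maybe_get(struct obj *o)
{
	int c = atomic_load(&o->refcount);

	while (c > 0) {
		/* Try to bump the count; retry if it changed under us. */
		if (atomic_compare_exchange_weak(&o->refcount, &c, c + 1))
			return true;
	}
	return false;	/* object already dying, caller must bail out */
}
```

If maybe_get() fails under the RCU read lock, the netns is dead and the
update is skipped; if it succeeds, the caller can safely take the mutex
and flip pointers outside the RCU read-side critical section, as in the
snippet above.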

> But I'll post those comments inline shortly.

Thanks. Will be better to discuss in context.


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 20:53   ` sdf
@ 2020-05-28 11:03     ` Jakub Sitnicki
  2020-05-28 16:09       ` sdf
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 11:03 UTC (permalink / raw)
  To: sdf; +Cc: bpf, netdev, kernel-team

On Wed, May 27, 2020 at 10:53 PM CEST, sdf@google.com wrote:
> On 05/27, Jakub Sitnicki wrote:
>> Add support for bpf() syscall subcommands that operate on
>> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> network namespaces (that is flow dissector at the moment).
>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can be in use at a time. Attempts to attach a link when a prog is
>> already attached directly, and the other way around, will be met with
>> -EBUSY.
>
>> Attachment of multiple links of the same attach type to one netns is not
>> supported, with the intention to lift that restriction when a use-case
>> presents itself. Because of that, attempts to create a netns link when one
>> already exists result in an -E2BIG error, signifying that there is no
>> space left for another attachment.
>
>> Link-based attachments to netns don't keep a netns alive by holding a ref
>> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed by a pernet pre_exit callback.
>
>> When auto-detached, link lives in defunct state as long as there are open FDs
>> for it. -ENOLINK is returned if a user tries to update a defunct link.
>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> taken when releasing the link. The netns might be getting torn down when
>> the release function tries to access it to detach the link.
>
>> To ensure the struct net object is alive when release function accesses it
>> we rely on the fact that cleanup_net(), struct net destructor, calls
>> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
>> pre_exit happens first, link release will not attempt to access struct net.
>
>> Same applies the other way around, network namespace doesn't keep an
>> attached link alive by not holding a ref to it. Instead bpf_links
>> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> and auto-detach the link when racing with link release/free.
>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>   include/linux/bpf-netns.h      |   8 +
>>   include/net/netns/bpf.h        |   1 +
>>   include/uapi/linux/bpf.h       |   5 +
>>   kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
>>   kernel/bpf/syscall.c           |   3 +
>>   net/core/filter.c              |   1 +
>>   tools/include/uapi/linux/bpf.h |   5 +
>>   7 files changed, 278 insertions(+), 2 deletions(-)
>
>> diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
>> index f3aec3d79824..4052d649f36d 100644
>> --- a/include/linux/bpf-netns.h
>> +++ b/include/linux/bpf-netns.h
>> @@ -34,6 +34,8 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
>>   int netns_bpf_prog_attach(const union bpf_attr *attr,
>>   			  struct bpf_prog *prog);
>>   int netns_bpf_prog_detach(const union bpf_attr *attr);
>> +int netns_bpf_link_create(const union bpf_attr *attr,
>> +			  struct bpf_prog *prog);
>>   #else
>>   static inline int netns_bpf_prog_query(const union bpf_attr *attr,
>>   				       union bpf_attr __user *uattr)
>> @@ -51,6 +53,12 @@ static inline int netns_bpf_prog_detach(const union
>> bpf_attr *attr)
>>   {
>>   	return -EOPNOTSUPP;
>>   }
>> +
>> +static inline int netns_bpf_link_create(const union bpf_attr *attr,
>> +					struct bpf_prog *prog)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>>   #endif
>
>>   #endif /* _BPF_NETNS_H */
>> diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
>> index a858d1c5b166..baabefdeb10c 100644
>> --- a/include/net/netns/bpf.h
>> +++ b/include/net/netns/bpf.h
>> @@ -12,6 +12,7 @@ struct bpf_prog;
>
>>   struct netns_bpf {
>>   	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
>> +	struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
>>   };
>
>>   #endif /* __NETNS_BPF_H__ */
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 54b93f8b49b8..05469d3c5c82 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -235,6 +235,7 @@ enum bpf_link_type {
>>   	BPF_LINK_TYPE_TRACING = 2,
>>   	BPF_LINK_TYPE_CGROUP = 3,
>>   	BPF_LINK_TYPE_ITER = 4,
>> +	BPF_LINK_TYPE_NETNS = 5,
>
>>   	MAX_BPF_LINK_TYPE,
>>   };
>> @@ -3753,6 +3754,10 @@ struct bpf_link_info {
>>   			__u64 cgroup_id;
>>   			__u32 attach_type;
>>   		} cgroup;
>> +		struct  {
>> +			__u32 netns_ino;
>> +			__u32 attach_type;
>> +		} netns;
>>   	};
>>   } __attribute__((aligned(8)));
>
>> diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
>> index fc89154aed27..1c6009ab93c5 100644
>> --- a/kernel/bpf/net_namespace.c
>> +++ b/kernel/bpf/net_namespace.c
>> @@ -8,9 +8,155 @@
>>    * Functions to manage BPF programs attached to netns
>>    */
>
>> -/* Protects updates to netns_bpf */
>> +struct bpf_netns_link {
>> +	struct bpf_link	link;
>> +	enum bpf_attach_type type;
>> +	enum netns_bpf_attach_type netns_type;
>> +
>> +	/* struct net is not RCU-freed but we treat it as such because
>> +	 * our pre_exit callback will NULL this pointer before
>> +	 * cleanup_net() calls synchronize_rcu().
>> +	 */
>> +	struct net __rcu *net;
>> +
>> +	/* bpf_netns_link is RCU-freed for pre_exit callback invoked
>> +	 * by cleanup_net() to safely access the link.
>> +	 */
>> +	struct rcu_head	rcu;
>> +};
>> +
>> +/* Protects updates to netns_bpf. */
>>   DEFINE_MUTEX(netns_bpf_mutex);
>
>> +static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
>> +{
>> +	return container_of(link, struct bpf_netns_link, link);
>> +}
>> +
>> +/* Called with RCU read lock. */
> As mentioned earlier, ^^^ probably doesn't make sense. Try to avoid
> RCU_INIT_POINTER under read lock.

Agree. As mentioned in my earlier response, I will rework it so the only
thing that happens inside the RCU read-side critical section is an attempt
to bump a ref count on the object.

Pointer flipping will be done outside of RCU read locks, with mutex held
when there are multiple writers.

>
>> +static void __net_exit
>> +bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
>> +{
>> +	struct bpf_netns_link *net_link;
>> +	struct bpf_link *link;
>> +
>> +	link = rcu_dereference(net->bpf.links[type]);
> Btw, maybe use rcu_deref_protected with !check_net here and drop
> rcu read lock requirement?

In the current synchronization design !check_net wouldn't hold.

bpf_netns_link_{release,update_prog} need to update net->bpf.links,
while not holding a ref count on struct net.

>
>> +	if (!link)
>> +		return;
>> +	net_link = to_bpf_netns_link(link);
>> +	RCU_INIT_POINTER(net_link->net, NULL);
>> +}
>> +
>> +static void bpf_netns_link_release(struct bpf_link *link)
>> +{
>> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +	enum netns_bpf_attach_type type = net_link->netns_type;
>> +	struct net *net;
>> +
>> +	/* Link auto-detached by dying netns. */
>> +	if (!rcu_access_pointer(net_link->net))
>> +		return;
>> +
>> +	mutex_lock(&netns_bpf_mutex);
>> +
>> +	/* Recheck after potential sleep. We can race with cleanup_net
>> +	 * here, but if we see a non-NULL struct net pointer pre_exit
>> +	 * and following synchronize_rcu() has not happened yet, and
>> +	 * we have until the end of grace period to access net.
>> +	 */
>> +	rcu_read_lock();
>> +	net = rcu_dereference(net_link->net);
> Use rcu_dereference_protected with netns_bpf_mutex here instead of
> grabbing the rcu read lock? Or are you guarding here against 'net'
> going away? In that case, moving the unlock higher can make it more
> visually clear.

RCU read lock is to guard against 'net' going away.

While netns_bpf_mutex serializes access to net->bpf.progs. Concurrent
updates are possible from bpf_netns_link_release and
netns_bpf_prog_{attach,detach}.

I agree that having an RCU read-side critical section embedded in
another critical section protected by a mutex is not visually clear.

>
>> +	if (net) {
>> +		RCU_INIT_POINTER(net->bpf.links[type], NULL);
>> +		RCU_INIT_POINTER(net->bpf.progs[type], NULL);
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	mutex_unlock(&netns_bpf_mutex);
>> +}
>> +
>> +static void bpf_netns_link_dealloc(struct bpf_link *link)
>> +{
>> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +
>> +	/* Delay kfree in case we're racing with cleanup_net. */
>> +	kfree_rcu(net_link, rcu);
>> +}
>> +
>> +static int bpf_netns_link_update_prog(struct bpf_link *link,
>> +				      struct bpf_prog *new_prog,
>> +				      struct bpf_prog *old_prog)
>> +{
>> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +	struct net *net;
>> +	int ret = 0;
>> +
>> +	if (old_prog && old_prog != link->prog)
>> +		return -EPERM;
>> +	if (new_prog->type != link->prog->type)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&netns_bpf_mutex);
>> +	rcu_read_lock();
> Again, what are you protecting here? 'net' disappearing? Then maybe do:
>
> 	rcu_read_lock
> 	net = rcu_deref
> 	valid = net && check_net(net)
> 	rcu_read_unlock
>
> 	if (!valid)
> 		bail out.
>
> Otherwise, those mutex_lock+rcu_read_lock are a bit confusing.

Great idea, thanks. This is almost the same as what I was thinking
about. The only difference being that I want to also get a ref to net, so
it doesn't go away while we continue outside of the RCU read-side
critical section.

>
>> +
>> +	net = rcu_dereference(net_link->net);
>> +	if (!net || !check_net(net)) {
>> +		/* Link auto-detached or netns dying */
>> +		ret = -ENOLINK;
>> +		goto out_unlock;
>> +	}
>> +
>> +	old_prog = xchg(&link->prog, new_prog);
>> +	bpf_prog_put(old_prog);
>> +
>> +out_unlock:
>> +	rcu_read_unlock();
>> +	mutex_unlock(&netns_bpf_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
>> +				    struct bpf_link_info *info)
>> +{
>> +	const struct bpf_netns_link *net_link;
>> +	unsigned int inum;
>> +	struct net *net;
>> +
>> +	net_link = container_of(link, struct bpf_netns_link, link);
>> +
>> +	rcu_read_lock();
>> +	net = rcu_dereference(net_link->net);
>> +	if (net)
>> +		inum = net->ns.inum;
>> +	rcu_read_unlock();
>> +
>> +	info->netns.netns_ino = inum;
>> +	info->netns.attach_type = net_link->type;
>> +	return 0;
>> +}
>> +
>> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
>> +				       struct seq_file *seq)
>> +{
>> +	struct bpf_link_info info = {};
>> +
>> +	bpf_netns_link_fill_info(link, &info);
>> +	seq_printf(seq,
>> +		   "netns_ino:\t%u\n"
>> +		   "attach_type:\t%u\n",
>> +		   info.netns.netns_ino,
>> +		   info.netns.attach_type);
>> +}
>> +
>> +static const struct bpf_link_ops bpf_netns_link_ops = {
>> +	.release = bpf_netns_link_release,
>> +	.dealloc = bpf_netns_link_dealloc,
>> +	.update_prog = bpf_netns_link_update_prog,
>> +	.fill_link_info = bpf_netns_link_fill_info,
>> +	.show_fdinfo = bpf_netns_link_show_fdinfo,
>> +};
>> +
>>   int netns_bpf_prog_query(const union bpf_attr *attr,
>>   			 union bpf_attr __user *uattr)
>>   {
>> @@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr,
>> struct bpf_prog *prog)
>
>>   	net = current->nsproxy->net_ns;
>>   	mutex_lock(&netns_bpf_mutex);
>> +
>> +	/* Attaching prog directly is not compatible with links */
>> +	if (rcu_access_pointer(net->bpf.links[type])) {
>> +		ret = -EBUSY;
>> +		goto unlock;
>> +	}
>> +
>>   	switch (type) {
>>   	case NETNS_BPF_FLOW_DISSECTOR:
>>   		ret = flow_dissector_bpf_prog_attach(net, prog);
>> @@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr,
>> struct bpf_prog *prog)
>>   		ret = -EINVAL;
>>   		break;
>>   	}
>> +unlock:
>>   	mutex_unlock(&netns_bpf_mutex);
>
>>   	return ret;
>> @@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
>>   {
>>   	struct bpf_prog *attached;
>
>> +	/* Progs attached via links cannot be detached */
>> +	if (rcu_access_pointer(net->bpf.links[type]))
>> +		return -EBUSY;
>> +
>>   	/* No need for update-side lock when net is going away. */
>>   	attached = rcu_dereference_protected(net->bpf.progs[type],
>>   					     !check_net(net) ||
>> @@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
>>   	return ret;
>>   }
>
>> +static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
>> +				   enum netns_bpf_attach_type type)
>> +{
>> +	int err;
>> +
>> +	/* Allow attaching only one prog or link for now */
>> +	if (rcu_access_pointer(net->bpf.links[type]))
>> +		return -E2BIG;
>> +	/* Links are not compatible with attaching prog directly */
>> +	if (rcu_access_pointer(net->bpf.progs[type]))
>> +		return -EBUSY;
>> +
>> +	switch (type) {
>> +	case NETNS_BPF_FLOW_DISSECTOR:
>> +		err = flow_dissector_bpf_prog_attach(net, link->prog);
>> +		break;
>> +	default:
>> +		err = -EINVAL;
>> +		break;
>> +	}
>> +	if (!err)
>> +		rcu_assign_pointer(net->bpf.links[type], link);
>> +	return err;
>> +}
>> +
>> +static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
>> +				 enum netns_bpf_attach_type type)
>> +{
>> +	int ret;
>> +
>> +	mutex_lock(&netns_bpf_mutex);
>> +	ret = __netns_bpf_link_attach(net, link, type);
> What's the point of separating __netns_bpf_link_attach out?
> Does it make sense to replace those rcu_access_pointer's in
> __netns_bpf_link_attach with rcu_deref_protected and put everything here?
> This makes it a bit easier to reason about.

I've structured it like that just to avoid goto jumps to mutex_unlock on
error. No strong feelings that it must be like that.

You're right about rcu_deref_protected(); the rcu_access_pointer() docs
are explicit about it:

/**
 * rcu_access_pointer() - fetch RCU pointer with no dereferencing
 * @p: The pointer to read
...

 * Return the value of the specified RCU-protected pointer, but omit the
 * lockdep checks for being in an RCU read-side critical section.  This is
 * useful when the value of this pointer is accessed, but the pointer is
 * not dereferenced, for example, when testing an RCU-protected pointer
 * against NULL.  Although rcu_access_pointer() may also be used in cases
 * where update-side locks prevent the value of the pointer from changing,
 * you should instead use rcu_dereference_protected() for this use case.
...
 */

My mistake. Will switch to rcu_deref_protected in v2.


Thanks again for feedback.

-jkbs

[...]


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-28  5:56   ` Andrii Nakryiko
@ 2020-05-28 12:26     ` Jakub Sitnicki
  2020-05-28 18:09       ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 12:26 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 07:56 AM CEST, Andrii Nakryiko wrote:
> On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Add support for bpf() syscall subcommands that operate on
>> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
>> network namespaces (that is flow dissector at the moment).
>>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can be in use at a time. Attempts to attach a link when a prog is
>> already attached directly, and the other way around, will be met with
>> -EBUSY.
>>
>> Attachment of multiple links of same attach type to one netns is not
>> supported, with the intention to lift it when a use-case presents
>> itself. Because of that attempts to create a netns link, when one already
>> exists result in -E2BIG error, signifying that there is no space left for
>> another attachment.
>>
>> Link-based attachments to netns don't keep a netns alive by holding a ref
>> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed by a pernet pre_exit callback.
>>
>> When auto-detached, link lives in defunct state as long as there are open FDs
>> for it. -ENOLINK is returned if a user tries to update a defunct link.
>>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> taken when releasing the link. The netns might be getting torn down when
>> the release function tries to access it to detach the link.
>>
>> To ensure the struct net object is alive when release function accesses it
>> we rely on the fact that cleanup_net(), struct net destructor, calls
>> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
>> pre_exit happens first, link release will not attempt to access struct net.
>>
>> Same applies the other way around: a network namespace doesn't keep an
>> attached link alive, by not holding a ref to it. Instead bpf_links
>> to netns are RCU-freed, so that pernet pre_exit callback can safely access
>> and auto-detach the link when racing with link release/free.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/linux/bpf-netns.h      |   8 +
>>  include/net/netns/bpf.h        |   1 +
>>  include/uapi/linux/bpf.h       |   5 +
>>  kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
>>  kernel/bpf/syscall.c           |   3 +
>>  net/core/filter.c              |   1 +
>>  tools/include/uapi/linux/bpf.h |   5 +
>>  7 files changed, 278 insertions(+), 2 deletions(-)
>>
>
> [...]
>
>>  struct netns_bpf {
>>         struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
>> +       struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
>>  };
>>
>
> [...]
>
>>
>> -/* Protects updates to netns_bpf */
>> +struct bpf_netns_link {
>> +       struct bpf_link link;
>> +       enum bpf_attach_type type;
>> +       enum netns_bpf_attach_type netns_type;
>> +
>> +       /* struct net is not RCU-freed but we treat it as such because
>> +        * our pre_exit callback will NULL this pointer before
>> +        * cleanup_net() calls synchronize_rcu().
>> +        */
>> +       struct net __rcu *net;
>
> It feels to me (see comments below), that if you use mutex
> consistently, then this shouldn't be __rcu and you won't even need
> rcu_read_lock() when working with this pointer, because auto_detach
> and everything else won't be racing: if you got mutex in release() and
> you see non-null net pointer, auto-detach either didn't happen yet, or
> is happening at the same time, but is blocked on mutex. If you got
> mutex and see net == NULL, ok, auto-detach succeeded before release,
> so ignore net clean-up. Easy, no?

Yes, right. Grabbing the mutex in auto_detach changes everything.

I think I went about it the wrong way by trying to avoid a mutex there.
I'll go with your advice and see if I can get rid of the RCU pointers in v2.

>
>> +
>> +       /* bpf_netns_link is RCU-freed for pre_exit callback invoked
>> +        * by cleanup_net() to safely access the link.
>> +        */
>> +       struct rcu_head rcu;
>> +};
>> +
>> +/* Protects updates to netns_bpf. */
>>  DEFINE_MUTEX(netns_bpf_mutex);
>>
>> +static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
>> +{
>> +       return container_of(link, struct bpf_netns_link, link);
>> +}
>> +
>> +/* Called with RCU read lock. */
>> +static void __net_exit
>> +bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
>> +{
>> +       struct bpf_netns_link *net_link;
>> +       struct bpf_link *link;
>> +
>> +       link = rcu_dereference(net->bpf.links[type]);
>> +       if (!link)
>> +               return;
>> +       net_link = to_bpf_netns_link(link);
>> +       RCU_INIT_POINTER(net_link->net, NULL);
>
> Given link attach and release is done under netns_bpf_mutex, shouldn't
> this be done under the same mutex? You are modifying link concurrently
> with update, but you are not synchronizing that access.

After attaching the link to netns, there are no other updaters to
net_link->net field. Link update has to modify link->prog, and update
net->bpf.progs, but doesn't need to write to net_link->net, just read
it.

That's why I don't grab the netns_bpf_mutex on auto_detach.

>
>> +}
>> +
>> +static void bpf_netns_link_release(struct bpf_link *link)
>> +{
>> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +       enum netns_bpf_attach_type type = net_link->netns_type;
>> +       struct net *net;
>> +
>> +       /* Link auto-detached by dying netns. */
>> +       if (!rcu_access_pointer(net_link->net))
>> +               return;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +
>> +	/* Recheck after potential sleep. We can race with cleanup_net
>> +	 * here, but if we see a non-NULL struct net pointer, pre_exit
>> +	 * and the following synchronize_rcu() have not happened yet,
>> +	 * and we have until the end of the grace period to access net.
>> +        */
>> +       rcu_read_lock();
>> +       net = rcu_dereference(net_link->net);
>> +       if (net) {
>> +               RCU_INIT_POINTER(net->bpf.links[type], NULL);
>> +               RCU_INIT_POINTER(net->bpf.progs[type], NULL);
>
> bpf.progs[type] is supposed to be NULL already, why setting it again here?
>
>> +       }
>> +       rcu_read_unlock();
>> +
>> +       mutex_unlock(&netns_bpf_mutex);
>> +}
>> +
>> +static void bpf_netns_link_dealloc(struct bpf_link *link)
>> +{
>> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +
>> +       /* Delay kfree in case we're racing with cleanup_net. */
>> +       kfree_rcu(net_link, rcu);
>
> It feels to me like this RCU stuff for links is a bit overcomplicated.
> If I understand your changes correctly (and please correct me if I'm
> wrong), netns_bpf's progs are sort of like "effective progs" for
> cgroup. Regardless if attachment was bpf_link-based or straight
> bpf_prog-based, prog will always be set. link[type] would be set
> additionally only if bpf_link-based attachment was done. And that
> makes sense to make dissector hot path faster and simpler.

Correct. The netns link is heavily modelled after the cgroup
link. net->bpf.progs is like "effective progs". In fact, the next step
would be to turn it into a bpf_prog_array.

> But if that's the case, link itself is always (except for auto-detach,
> which I think should be fixed) accessed under mutex and doesn't
> need/rely on rcu_read_lock() at all. So __rcu annotation for links is
> not necessary, all the rcu dereferences for links are unnecessary, and
> this kfree_rcu() is unnecessary. If that's not the case, please help
> me understand why not.

The __rcu annotation for net->bpf.links is there, in its current form, so
that auto_detach can access links with a grace period guarantee. If we can
get around that, then I'm with you that net->bpf.links doesn't need to be
an RCU-protected pointer.

I think you're right that grabbing netns_bpf_mutex in auto_detach will
guarantee that the bpf_link stays alive while we access it from there.
Either we see a non-NULL net->bpf.links[type] and know that release can't
run before we finish working on it, or we see a NULL net->bpf.links[type]
and there is nothing to do.

That makes the __rcu annotation and kfree_rcu() unnecessary.

Thanks, that is a great suggestion.
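With the mutex taken in auto_detach, the pre_exit path could drop RCU
entirely. A rough sketch of that shape (not the submitted code; it assumes
the __rcu annotations on net->bpf.links and net_link->net are removed):

```c
/* Called from the pernet pre_exit callback, before cleanup_net()
 * invokes synchronize_rcu().
 */
static void __net_exit
bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
{
	struct bpf_netns_link *net_link;
	struct bpf_link *link;

	mutex_lock(&netns_bpf_mutex);
	link = net->bpf.links[type];	/* plain pointer, mutex protects it */
	if (link) {
		net_link = to_bpf_netns_link(link);
		net_link->net = NULL;	/* mark link defunct for release() */
		net->bpf.links[type] = NULL;
	}
	mutex_unlock(&netns_bpf_mutex);
}
```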

>
>> +}
>> +
>> +static int bpf_netns_link_update_prog(struct bpf_link *link,
>> +                                     struct bpf_prog *new_prog,
>> +                                     struct bpf_prog *old_prog)
>> +{
>> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
>> +       struct net *net;
>> +       int ret = 0;
>> +
>> +       if (old_prog && old_prog != link->prog)
>> +               return -EPERM;
>> +       if (new_prog->type != link->prog->type)
>> +               return -EINVAL;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +       rcu_read_lock();
>> +
>> +       net = rcu_dereference(net_link->net);
>> +       if (!net || !check_net(net)) {
>> +               /* Link auto-detached or netns dying */
>> +               ret = -ENOLINK;
>
> This is an interesting error code. If we are going to adopt this, we
> should change it for similar cgroup link situation as well.

Credit goes to Lorenz for suggesting a different error code than EINVAL
for this situation, so that user-space has a way to tell this case apart
from an invalid argument.

I'm happy to patch cgroup BPF link, if you support this change.

>
>> +               goto out_unlock;
>> +       }
>> +
>> +       old_prog = xchg(&link->prog, new_prog);
>> +       bpf_prog_put(old_prog);
>> +
>> +out_unlock:
>> +       rcu_read_unlock();
>> +       mutex_unlock(&netns_bpf_mutex);
>> +
>> +       return ret;
>> +}
>> +
>> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
>> +                                   struct bpf_link_info *info)
>> +{
>> +       const struct bpf_netns_link *net_link;
>> +       unsigned int inum;
>> +       struct net *net;
>> +
>> +       net_link = container_of(link, struct bpf_netns_link, link);
>
> you use to_bpf_netns_link() in few places above, but straight
> container_of() here. Let's do this consistently (I'd rather stick to
> straight container_of, but that's minor).

Ah, yes. That's because the bpf_link pointer is const-qualified here. I
think I can turn to_bpf_netns_link() into a macro to get around that.

>
>> +
>> +       rcu_read_lock();
>> +       net = rcu_dereference(net_link->net);
>> +       if (net)
>> +               inum = net->ns.inum;
>> +       rcu_read_unlock();
>> +
>> +       info->netns.netns_ino = inum;
>> +       info->netns.attach_type = net_link->type;
>> +       return 0;
>> +}
>> +
>> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
>> +                                      struct seq_file *seq)
>> +{
>> +       struct bpf_link_info info = {};
>> +
>> +       bpf_netns_link_fill_info(link, &info);
>> +       seq_printf(seq,
>> +                  "netns_ino:\t%u\n"
>> +                  "attach_type:\t%u\n",
>> +                  info.netns.netns_ino,
>> +                  info.netns.attach_type);
>> +}
>> +
>> +static const struct bpf_link_ops bpf_netns_link_ops = {
>> +       .release = bpf_netns_link_release,
>> +       .dealloc = bpf_netns_link_dealloc,
>> +       .update_prog = bpf_netns_link_update_prog,
>> +       .fill_link_info = bpf_netns_link_fill_info,
>> +       .show_fdinfo = bpf_netns_link_show_fdinfo,
>> +};
>> +
>>  int netns_bpf_prog_query(const union bpf_attr *attr,
>>                          union bpf_attr __user *uattr)
>>  {
>> @@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>
>>         net = current->nsproxy->net_ns;
>>         mutex_lock(&netns_bpf_mutex);
>> +
>> +       /* Attaching prog directly is not compatible with links */
>> +       if (rcu_access_pointer(net->bpf.links[type])) {
>> +               ret = -EBUSY;
>
> EEXIST would be returned if another prog is attached. Given in this
> case attaching prog or link has same semantics (one cannot replace
> attached program, unlike for cgroups), should we keep it consistent
> and return EEXIST here as well?

Makes sense, will switch to EEXIST.

>
>> +               goto unlock;
>> +       }
>> +
>>         switch (type) {
>>         case NETNS_BPF_FLOW_DISSECTOR:
>>                 ret = flow_dissector_bpf_prog_attach(net, prog);
>> @@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>                 ret = -EINVAL;
>>                 break;
>>         }
>> +unlock:
>>         mutex_unlock(&netns_bpf_mutex);
>>
>>         return ret;
>> @@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
>>  {
>>         struct bpf_prog *attached;
>>
>> +       /* Progs attached via links cannot be detached */
>> +       if (rcu_access_pointer(net->bpf.links[type]))
>> +               return -EBUSY;
>
> This is more of -EINVAL?

No strong feelings here. The intention of returning EBUSY on
attach/detach was to signal that the attach point is in use, busy, with
another, incompatible attachment.

EINVAL sends a message that I did a bad job passing arguments to the
syscall. But this is just my interpretation.

>
>> +
>>         /* No need for update-side lock when net is going away. */
>>         attached = rcu_dereference_protected(net->bpf.progs[type],
>>                                              !check_net(net) ||
>> @@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
>>         return ret;
>>  }
>>
>> +static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
>> +                                  enum netns_bpf_attach_type type)
>> +{
>> +       int err;
>> +
>> +       /* Allow attaching only one prog or link for now */
>> +       if (rcu_access_pointer(net->bpf.links[type]))
>> +               return -E2BIG;
>> +       /* Links are not compatible with attaching prog directly */
>> +       if (rcu_access_pointer(net->bpf.progs[type]))
>> +               return -EBUSY;
>
> Same as above. Do we need three different error codes instead of
> consistently using EEXIST?

Ok, EEXIST seems fine to me as well.

>
>
>> +
>> +       switch (type) {
>> +       case NETNS_BPF_FLOW_DISSECTOR:
>> +               err = flow_dissector_bpf_prog_attach(net, link->prog);
>> +               break;
>> +       default:
>> +               err = -EINVAL;
>> +               break;
>> +       }
>> +       if (!err)
>> +               rcu_assign_pointer(net->bpf.links[type], link);
>> +       return err;
>> +}
>> +
>
> [...]
>
>> +       err = bpf_link_prime(&net_link->link, &link_primer);
>> +       if (err) {
>> +               kfree(net_link);
>> +               goto out_put_net;
>> +       }
>> +
>> +       err = netns_bpf_link_attach(net, &net_link->link, netns_type);
>> +       if (err) {
>> +               bpf_link_cleanup(&link_primer);
>> +               goto out_put_net;
>> +       }
>> +
>> +       err = bpf_link_settle(&link_primer);
>
> This looks a bit misleading. bpf_link_settle() cannot fail and returns
> FD, but here it looks like it might return error. I think it would be
> more straightforward to just:
>
>     put_net(net);
>     return bpf_link_settle(&link_primer);
>
> out_put_net:
>     put_net(net);
>     return err;

That reads better. Will change. Thanks.


Appreciate the review. Thanks for all the feedback.

-jkbs

[...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 6/8] libbpf: Add support for bpf_link-based netns attachment
  2020-05-28  5:59   ` Andrii Nakryiko
@ 2020-05-28 13:05     ` Jakub Sitnicki
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 13:05 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 07:59 AM CEST, Andrii Nakryiko wrote:
> On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Add bpf_program__attach_nets(), which uses LINK_CREATE subcommand to create
>> an FD-based kernel bpf_link, for attach types tied to network namespace,
>> that is BPF_FLOW_DISSECTOR for the moment.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  tools/lib/bpf/libbpf.c   | 20 ++++++++++++++++----
>>  tools/lib/bpf/libbpf.h   |  2 ++
>>  tools/lib/bpf/libbpf.map |  1 +
>>  3 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 5d60de6fd818..a49c1eb5db64 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -7894,8 +7894,8 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
>>         return bpf_program__attach_iter(prog, NULL);
>>  }
>>
>> -struct bpf_link *
>> -bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>> +static struct bpf_link *
>> +bpf_program__attach_fd(struct bpf_program *prog, int target_fd)
>>  {
>>         enum bpf_attach_type attach_type;
>>         char errmsg[STRERR_BUFSIZE];
>> @@ -7915,11 +7915,11 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>>         link->detach = &bpf_link__detach_fd;
>>
>>         attach_type = bpf_program__get_expected_attach_type(prog);
>> -       link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
>> +       link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
>>         if (link_fd < 0) {
>>                 link_fd = -errno;
>>                 free(link);
>> -               pr_warn("program '%s': failed to attach to cgroup: %s\n",
>> +               pr_warn("program '%s': failed to attach to cgroup/netns: %s\n",
>
> I understand the desire to save few lines of code, but it hurts error
> reporting. Now it's cgroup/netns, tomorrow cgroup/netns/lirc/whatever.
> If you want to generalize, let's preserve clarity of error message,
> please.

Ok, that's fair. I could pass the link target type to
bpf_program__attach_fd() and map it to a string.
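One way that could look (a sketch only, not libbpf's final API;
`target_name` is a hypothetical parameter, and the prog_fd lookup and link
allocation from the quoted diff are elided):

```c
static struct bpf_link *
bpf_program__attach_fd(struct bpf_program *prog, int target_fd,
		       const char *target_name)
{
	enum bpf_attach_type attach_type;
	char errmsg[STRERR_BUFSIZE];
	struct bpf_link *link;
	int prog_fd, link_fd;

	/* ... prog_fd lookup and link allocation as in the quoted diff ... */

	attach_type = bpf_program__get_expected_attach_type(prog);
	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
	if (link_fd < 0) {
		link_fd = -errno;
		free(link);
		pr_warn("program '%s': failed to attach to %s: %s\n",
			bpf_program__title(prog, false), target_name,
			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
		return ERR_PTR(link_fd);
	}
	link->fd = link_fd;
	return link;
}

struct bpf_link *
bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
{
	return bpf_program__attach_fd(prog, cgroup_fd, "cgroup");
}

struct bpf_link *
bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
{
	return bpf_program__attach_fd(prog, netns_fd, "netns");
}
```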

>
>>                         bpf_program__title(prog, false),
>>                         libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
>>                 return ERR_PTR(link_fd);
>> @@ -7928,6 +7928,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>>         return link;
>>  }
>>
>> +struct bpf_link *
>> +bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
>> +{
>> +       return bpf_program__attach_fd(prog, cgroup_fd);
>> +}
>> +
>> +struct bpf_link *
>> +bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
>> +{
>> +       return bpf_program__attach_fd(prog, netns_fd);
>> +}
>> +
>>  struct bpf_link *
>>  bpf_program__attach_iter(struct bpf_program *prog,
>>                          const struct bpf_iter_attach_opts *opts)
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 1e2e399a5f2c..adf6fd9b6fe8 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
>>  bpf_program__attach_lsm(struct bpf_program *prog);
>>  LIBBPF_API struct bpf_link *
>>  bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
>> +LIBBPF_API struct bpf_link *
>> +bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
>>
>>  struct bpf_map;
>>
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index 381a7342ecfc..7ad21ba1feb6 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -263,4 +263,5 @@ LIBBPF_0.0.9 {
>>                 bpf_link_get_next_id;
>>                 bpf_program__attach_iter;
>>                 perf_buffer__consume;
>> +               bpf_program__attach_netns;
>
> Please keep it alphabetical.

Will do. Not sure how I didn't pick up the convention.

>
>>  } LIBBPF_0.0.8;
>> --
>> 2.25.4
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 7/8] bpftool: Support link show for netns-attached links
  2020-05-28  6:02   ` Andrii Nakryiko
@ 2020-05-28 13:10     ` Jakub Sitnicki
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 13:10 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 08:02 AM CEST, Andrii Nakryiko wrote:
> On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Make `bpf link show` aware of new link type, that is links attached to
>> netns. When listing netns-attached links, display netns inode number as its
>> identifier and link attach type.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  tools/bpf/bpftool/link.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
>> index 670a561dc31b..83a17d62c4c3 100644
>> --- a/tools/bpf/bpftool/link.c
>> +++ b/tools/bpf/bpftool/link.c
>> @@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
>>         [BPF_LINK_TYPE_TRACING]                 = "tracing",
>>         [BPF_LINK_TYPE_CGROUP]                  = "cgroup",
>>         [BPF_LINK_TYPE_ITER]                    = "iter",
>> +       [BPF_LINK_TYPE_NETNS]                   = "netns",
>>  };
>>
>>  static int link_parse_fd(int *argc, char ***argv)
>> @@ -122,6 +123,16 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
>>                         jsonw_uint_field(json_wtr, "attach_type",
>>                                          info->cgroup.attach_type);
>>                 break;
>> +       case BPF_LINK_TYPE_NETNS:
>> +               jsonw_uint_field(json_wtr, "netns_ino",
>> +                                info->netns.netns_ino);
>> +               if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
>> +                       jsonw_string_field(json_wtr, "attach_type",
>> +                               attach_type_name[info->netns.attach_type]);
>> +               else
>> +                       jsonw_uint_field(json_wtr, "attach_type",
>> +                                        info->netns.attach_type);
>> +               break;
>
> Can you please extract this attach_type handling into a helper func,
> it's annoying to read so many repetitive if/elses. Same for plain-text
> variant below. Thanks!

Looks easy enough. Will be done in v2.

>
>>         default:
>>                 break;
>>         }
>> @@ -190,6 +201,14 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
>>                 else
>>                         printf("attach_type %u  ", info->cgroup.attach_type);
>>                 break;
>> +       case BPF_LINK_TYPE_NETNS:
>> +               printf("\n\tnetns_ino %u  ", info->netns.netns_ino);
>> +               if (info->netns.attach_type < ARRAY_SIZE(attach_type_name))
>> +                       printf("attach_type %s  ",
>> +                              attach_type_name[info->netns.attach_type]);
>> +               else
>> +                       printf("attach_type %u  ", info->netns.attach_type);
>> +               break;
>>         default:
>>                 break;
>>         }
>> --
>> 2.25.4
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns
  2020-05-28  6:08   ` Andrii Nakryiko
@ 2020-05-28 13:29     ` Jakub Sitnicki
  2020-05-28 18:31       ` Andrii Nakryiko
  0 siblings, 1 reply; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 13:29 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 08:08 AM CEST, Andrii Nakryiko wrote:
> On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Extend the existing test case for flow dissector attaching to cover:
>>
>>  - link creation,
>>  - link updates,
>>  - link info querying,
>>  - mixing links with direct prog attachment.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>
> You are not using bpf_program__attach_netns() at all. Would be nice to
> actually use higher-level API here...

That's true. I didn't exercise the high-level API. I can cover that.

>
> Also... what's up with people using CHECK_FAIL + perror instead of
> CHECK? Is CHECK being avoided for some reason or people are just not
> aware of it (which is strange, because CHECK was there before
> CHECK_FAIL)?

I can only speak for myself. Funnily enough I think I've switched from
CHECK to CHECK_FAIL when I touched on BPF flow dissector last time [0].

CHECK needs an "external" duration variable to be in scope, so it was
suggested to me that if I'm not measuring run-time with
bpf_prog_test_run, CHECK_FAIL might be a better choice.

CHECK is also perhaps too verbose because it emits a log message on
success (to report duration, I assume).

You have a better overview of all the tests than I do, but if I had the
cycles I'd see if renaming CHECK to something more specific, for those
tests that actually track prog run time, can work.

-jkbs


[0] https://lore.kernel.org/bpf/87imov1y5m.fsf@cloudflare.com/



>
>>  .../bpf/prog_tests/flow_dissector_reattach.c  | 500 +++++++++++++++++-
>>  1 file changed, 471 insertions(+), 29 deletions(-)
>>
>
> [...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-27 17:08 ` [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
                     ` (3 preceding siblings ...)
  2020-05-28  5:56   ` Andrii Nakryiko
@ 2020-05-28 13:30   ` Jakub Sitnicki
  4 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-05-28 13:30 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

On Wed, May 27, 2020 at 07:08 PM CEST, Jakub Sitnicki wrote:
> Add support for bpf() syscall subcommands that operate on
> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
> network namespaces (that is flow dissector at the moment).
>
> Link-based and prog-based attachment can be used interchangeably, but only
> one can be in use at a time. Attempts to attach a link when a prog is
> already attached directly, and the other way around, will be met with
> -EBUSY.
>
> Attachment of multiple links of same attach type to one netns is not
> supported, with the intention to lift it when a use-case presents
> itself. Because of that attempts to create a netns link, when one already
> exists result in -E2BIG error, signifying that there is no space left for
> another attachment.
>
> Link-based attachments to netns don't keep a netns alive by holding a ref
> to it. Instead links get auto-detached from netns when the latter is being
> destroyed by a pernet pre_exit callback.
>
> When auto-detached, link lives in defunct state as long as there are open FDs
> for it. -ENOLINK is returned if a user tries to update a defunct link.
>
> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> taken when releasing the link. The netns might be getting torn down when
> the release function tries to access it to detach the link.
>
> To ensure the struct net object is alive when release function accesses it
> we rely on the fact that cleanup_net(), struct net destructor, calls
> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
> pre_exit happens first, link release will not attempt to access struct net.
>
> Same applies the other way around: a network namespace doesn't keep an
> attached link alive, by not holding a ref to it. Instead bpf_links
> to netns are RCU-freed, so that pernet pre_exit callback can safely access
> and auto-detach the link when racing with link release/free.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

[...]

> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> +				      struct bpf_prog *new_prog,
> +				      struct bpf_prog *old_prog)
> +{
> +	struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> +	struct net *net;
> +	int ret = 0;
> +
> +	if (old_prog && old_prog != link->prog)
> +		return -EPERM;
> +	if (new_prog->type != link->prog->type)
> +		return -EINVAL;
> +
> +	mutex_lock(&netns_bpf_mutex);
> +	rcu_read_lock();
> +
> +	net = rcu_dereference(net_link->net);
> +	if (!net || !check_net(net)) {
> +		/* Link auto-detached or netns dying */
> +		ret = -ENOLINK;
> +		goto out_unlock;
> +	}
> +
> +	old_prog = xchg(&link->prog, new_prog);
> +	bpf_prog_put(old_prog);

I've noticed a bug here. I should be updating net->bpf.progs[type] here
as well. Will fix in v2.

> +
> +out_unlock:
> +	rcu_read_unlock();
> +	mutex_unlock(&netns_bpf_mutex);
> +
> +	return ret;
> +}

[...]
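The v2 fix mentioned in this follow-up might look roughly like this (a
sketch against the quoted hunk, not the actual patch):

```c
	old_prog = xchg(&link->prog, new_prog);
	/* Also swap the prog the flow dissector actually runs,
	 * not just the one the link reports via fill_link_info.
	 */
	rcu_assign_pointer(net->bpf.progs[net_link->netns_type], new_prog);
	bpf_prog_put(old_prog);
```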

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-28 11:03     ` Jakub Sitnicki
@ 2020-05-28 16:09       ` sdf
  0 siblings, 0 replies; 36+ messages in thread
From: sdf @ 2020-05-28 16:09 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, netdev, kernel-team

On 05/28, Jakub Sitnicki wrote:
> On Wed, May 27, 2020 at 10:53 PM CEST, sdf@google.com wrote:
> > On 05/27, Jakub Sitnicki wrote:
[..]
> > Otherwise, those mutex_lock+rcu_read_lock are a bit confusing.

> Great idea, thanks. This is almost the same as what I was thinking
> about. The only difference is that I also want to get a ref to net, so
> it doesn't go away while we continue outside of the RCU read-side
> critical section.
That will also work, up to you. Thanks!

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-28 12:26     ` Jakub Sitnicki
@ 2020-05-28 18:09       ` Andrii Nakryiko
  2020-05-28 18:48         ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28 18:09 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 5:26 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, May 28, 2020 at 07:56 AM CEST, Andrii Nakryiko wrote:
> > On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Add support for bpf() syscall subcommands that operate on
> >> bpf_link (LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO) for attach points tied to
> >> network namespaces (that is flow dissector at the moment).
> >>
> >> Link-based and prog-based attachment can be used interchangeably, but only
> >> one can be in use at a time. Attempts to attach a link when a prog is
> >> already attached directly, and the other way around, will be met with
> >> -EBUSY.
> >>
> >> Attachment of multiple links of same attach type to one netns is not
> >> supported, with the intention to lift it when a use-case presents
> >> itself. Because of that attempts to create a netns link, when one already
> >> exists result in -E2BIG error, signifying that there is no space left for
> >> another attachment.
> >>
> >> Link-based attachments to netns don't keep a netns alive by holding a ref
> >> to it. Instead links get auto-detached from netns when the latter is being
> >> destroyed by a pernet pre_exit callback.
> >>
> >> When auto-detached, link lives in defunct state as long as there are open FDs
> >> for it. -ENOLINK is returned if a user tries to update a defunct link.
> >>
> >> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> >> taken when releasing the link. The netns might be getting torn down when
> >> the release function tries to access it to detach the link.
> >>
> >> To ensure the struct net object is alive when release function accesses it
> >> we rely on the fact that cleanup_net(), struct net destructor, calls
> >> synchronize_rcu() after invoking pre_exit callbacks. If auto-detach from
> >> pre_exit happens first, link release will not attempt to access struct net.
> >>
> >> Same applies the other way around: a network namespace doesn't keep an
> >> attached link alive, by not holding a ref to it. Instead bpf_links
> >> to netns are RCU-freed, so that pernet pre_exit callback can safely access
> >> and auto-detach the link when racing with link release/free.
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >>  include/linux/bpf-netns.h      |   8 +
> >>  include/net/netns/bpf.h        |   1 +
> >>  include/uapi/linux/bpf.h       |   5 +
> >>  kernel/bpf/net_namespace.c     | 257 ++++++++++++++++++++++++++++++++-
> >>  kernel/bpf/syscall.c           |   3 +
> >>  net/core/filter.c              |   1 +
> >>  tools/include/uapi/linux/bpf.h |   5 +
> >>  7 files changed, 278 insertions(+), 2 deletions(-)
> >>
> >
> > [...]
> >
> >>  struct netns_bpf {
> >>         struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
> >> +       struct bpf_link __rcu *links[MAX_NETNS_BPF_ATTACH_TYPE];
> >>  };
> >>
> >
> > [...]
> >
> >>
> >> -/* Protects updates to netns_bpf */
> >> +struct bpf_netns_link {
> >> +       struct bpf_link link;
> >> +       enum bpf_attach_type type;
> >> +       enum netns_bpf_attach_type netns_type;
> >> +
> >> +       /* struct net is not RCU-freed but we treat it as such because
> >> +        * our pre_exit callback will NULL this pointer before
> >> +        * cleanup_net() calls synchronize_rcu().
> >> +        */
> >> +       struct net __rcu *net;
> >
> > It feels to me (see comments below), that if you use mutex
> > consistently, then this shouldn't be __rcu and you won't even need
> > rcu_read_lock() when working with this pointer, because auto_detach
> > and everything else won't be racing: if you got mutex in release() and
> > you see non-null net pointer, auto-detach either didn't happen yet, or
> > is happening at the same time, but is blocked on mutex. If you got
> > mutex and see net == NULL, ok, auto-detach succeeded before release,
> > so ignore net clean-up. Easy, no?
>
> Yes, right. Grabbing the mutex in auto_detach changes everything.
>
> I think I went about it the wrong way by trying to avoid a mutex there.
> I'll go with your advice and see if I can get rid of RCU pointers in v2.
>
> >
> >> +
> >> +       /* bpf_netns_link is RCU-freed for pre_exit callback invoked
> >> +        * by cleanup_net() to safely access the link.
> >> +        */
> >> +       struct rcu_head rcu;
> >> +};
> >> +
> >> +/* Protects updates to netns_bpf. */
> >>  DEFINE_MUTEX(netns_bpf_mutex);
> >>
> >> +static inline struct bpf_netns_link *to_bpf_netns_link(struct bpf_link *link)
> >> +{
> >> +       return container_of(link, struct bpf_netns_link, link);
> >> +}
> >> +
> >> +/* Called with RCU read lock. */
> >> +static void __net_exit
> >> +bpf_netns_link_auto_detach(struct net *net, enum netns_bpf_attach_type type)
> >> +{
> >> +       struct bpf_netns_link *net_link;
> >> +       struct bpf_link *link;
> >> +
> >> +       link = rcu_dereference(net->bpf.links[type]);
> >> +       if (!link)
> >> +               return;
> >> +       net_link = to_bpf_netns_link(link);
> >> +       RCU_INIT_POINTER(net_link->net, NULL);
> >
> > Given link attach and release is done under netns_bpf_mutex, shouldn't
> > this be done under the same mutex? You are modifying link concurrently
> > with update, but you are not synchronizing that access.
>
> After attaching the link to netns, there are no other updaters to
> net_link->net field. Link update has to modify link->prog, and update
> net->bpf.progs, but doesn't need to write to net_link->net, just read
> it.
>
> That's why I don't grab the netns_bpf_mutex on auto_detach.
>
> >
> >> +}
> >> +
> >> +static void bpf_netns_link_release(struct bpf_link *link)
> >> +{
> >> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> >> +       enum netns_bpf_attach_type type = net_link->netns_type;
> >> +       struct net *net;
> >> +
> >> +       /* Link auto-detached by dying netns. */
> >> +       if (!rcu_access_pointer(net_link->net))
> >> +               return;
> >> +
> >> +       mutex_lock(&netns_bpf_mutex);
> >> +
> >> +       /* Recheck after potential sleep. We can race with cleanup_net
> >> +        * here, but if we see a non-NULL struct net pointer, pre_exit
> >> +        * and the following synchronize_rcu() have not happened yet,
> >> +        * and we have until the end of the grace period to access net.
> >> +        */
> >> +       rcu_read_lock();
> >> +       net = rcu_dereference(net_link->net);
> >> +       if (net) {
> >> +               RCU_INIT_POINTER(net->bpf.links[type], NULL);
> >> +               RCU_INIT_POINTER(net->bpf.progs[type], NULL);
> >
> > bpf.progs[type] is supposed to be NULL already, why setting it again here?
> >
> >> +       }
> >> +       rcu_read_unlock();
> >> +
> >> +       mutex_unlock(&netns_bpf_mutex);
> >> +}
> >> +
> >> +static void bpf_netns_link_dealloc(struct bpf_link *link)
> >> +{
> >> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> >> +
> >> +       /* Delay kfree in case we're racing with cleanup_net. */
> >> +       kfree_rcu(net_link, rcu);
> >
> > It feels to me like this RCU stuff for links is a bit overcomplicated.
> > If I understand your changes correctly (and please correct me if I'm
> > wrong), netns_bpf's progs are sort of like "effective progs" for
> > cgroup. Regardless if attachment was bpf_link-based or straight
> > bpf_prog-based, prog will always be set. link[type] would be set
> > additionally only if bpf_link-based attachment was done. And that
> > makes sense to make dissector hot path faster and simpler.
>
> Correct. netns link is heavily modelled after cgroup
> link. net->bpf.progs is like "effective progs". In fact, the next step
> would be to turn it into bpf_prog_array.
>
> > But if that's the case, link itself is always (except for auto-detach,
> > which I think should be fixed) accessed under mutex and doesn't
> > need/rely on rcu_read_lock() at all. So __rcu annotation for links is
> > not necessary, all the rcu dereferences for links are unnecessary, and
> > this kfree_rcu() is unnecessary. If that's not the case, please help
> > me understand why not.
>
> __rcu annotation for net->bpf.links is for auto_detach to access links
> with grace period guarantee, in its current form. If we can get around
> that, then I'm with you that net->bpf.links doesn't need to be an
> RCU-protected pointer.
>
> I think you're right that grabbing netns_bpf_mutex in auto_detach will
> guarantee that bpf_link stays alive while we access it from there. We
> either see a non-NULL net->bpf.links[type] and know that release can't
> run before we finish working on it.
>
> Or we see a NULL net->bpf.links[type] and there is nothing to do.
> That makes the __rcu annotation and kfree_rcu not needed any more.
>
> Thanks, that is a great suggestion.
>
> >
> >> +}
> >> +
> >> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> >> +                                     struct bpf_prog *new_prog,
> >> +                                     struct bpf_prog *old_prog)
> >> +{
> >> +       struct bpf_netns_link *net_link = to_bpf_netns_link(link);
> >> +       struct net *net;
> >> +       int ret = 0;
> >> +
> >> +       if (old_prog && old_prog != link->prog)
> >> +               return -EPERM;
> >> +       if (new_prog->type != link->prog->type)
> >> +               return -EINVAL;
> >> +
> >> +       mutex_lock(&netns_bpf_mutex);
> >> +       rcu_read_lock();
> >> +
> >> +       net = rcu_dereference(net_link->net);
> >> +       if (!net || !check_net(net)) {
> >> +               /* Link auto-detached or netns dying */
> >> +               ret = -ENOLINK;
> >
> > This is an interesting error code. If we are going to adopt this, we
> > should change it for similar cgroup link situation as well.
>
> Credit goes to Lorenz for suggesting a different error code than EINVAL
> for this situation, so that user-space has a way to distinguish it.
>
> I'm happy to patch cgroup BPF link, if you support this change.

Yeah, I guess let's do that. I like the "link was severed" message for
that errno :) But for me it's way more about consistency than any
specific error code.

>
> >
> >> +               goto out_unlock;
> >> +       }
> >> +
> >> +       old_prog = xchg(&link->prog, new_prog);
> >> +       bpf_prog_put(old_prog);
> >> +
> >> +out_unlock:
> >> +       rcu_read_unlock();
> >> +       mutex_unlock(&netns_bpf_mutex);
> >> +
> >> +       return ret;
> >> +}
> >> +
> >> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
> >> +                                   struct bpf_link_info *info)
> >> +{
> >> +       const struct bpf_netns_link *net_link;
> >> +       unsigned int inum;
> >> +       struct net *net;
> >> +
> >> +       net_link = container_of(link, struct bpf_netns_link, link);
> >
> > you use to_bpf_netns_link() in few places above, but straight
> > container_of() here. Let's do this consistently (I'd rather stick to
> > straight container_of, but that's minor).
>
> Ah, yes. That's because bpf_link is const-qualified here. I think I
> can turn to_bpf_netns_link() into a macro to get around that.

maybe just stick to container_of()? Why a multi-layer macro just to
save a few characters?

>
> >
> >> +
> >> +       rcu_read_lock();
> >> +       net = rcu_dereference(net_link->net);
> >> +       if (net)
> >> +               inum = net->ns.inum;
> >> +       rcu_read_unlock();
> >> +
> >> +       info->netns.netns_ino = inum;
> >> +       info->netns.attach_type = net_link->type;
> >> +       return 0;
> >> +}
> >> +
> >> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
> >> +                                      struct seq_file *seq)
> >> +{
> >> +       struct bpf_link_info info = {};
> >> +
> >> +       bpf_netns_link_fill_info(link, &info);
> >> +       seq_printf(seq,
> >> +                  "netns_ino:\t%u\n"
> >> +                  "attach_type:\t%u\n",
> >> +                  info.netns.netns_ino,
> >> +                  info.netns.attach_type);
> >> +}
> >> +
> >> +static const struct bpf_link_ops bpf_netns_link_ops = {
> >> +       .release = bpf_netns_link_release,
> >> +       .dealloc = bpf_netns_link_dealloc,
> >> +       .update_prog = bpf_netns_link_update_prog,
> >> +       .fill_link_info = bpf_netns_link_fill_info,
> >> +       .show_fdinfo = bpf_netns_link_show_fdinfo,
> >> +};
> >> +
> >>  int netns_bpf_prog_query(const union bpf_attr *attr,
> >>                          union bpf_attr __user *uattr)
> >>  {
> >> @@ -67,6 +213,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >>
> >>         net = current->nsproxy->net_ns;
> >>         mutex_lock(&netns_bpf_mutex);
> >> +
> >> +       /* Attaching prog directly is not compatible with links */
> >> +       if (rcu_access_pointer(net->bpf.links[type])) {
> >> +               ret = -EBUSY;
> >
> > EEXIST would be returned if another prog is attached. Given in this
> > case attaching prog or link has same semantics (one cannot replace
> > attached program, unlike for cgroups), should we keep it consistent
> > and return EEXIST here as well?
>
> Makes sense, will switch to EEXIST.
>
> >
> >> +               goto unlock;
> >> +       }
> >> +
> >>         switch (type) {
> >>         case NETNS_BPF_FLOW_DISSECTOR:
> >>                 ret = flow_dissector_bpf_prog_attach(net, prog);
> >> @@ -75,6 +228,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >>                 ret = -EINVAL;
> >>                 break;
> >>         }
> >> +unlock:
> >>         mutex_unlock(&netns_bpf_mutex);
> >>
> >>         return ret;
> >> @@ -85,6 +239,10 @@ static int __netns_bpf_prog_detach(struct net *net,
> >>  {
> >>         struct bpf_prog *attached;
> >>
> >> +       /* Progs attached via links cannot be detached */
> >> +       if (rcu_access_pointer(net->bpf.links[type]))
> >> +               return -EBUSY;
> >
> > This is more of -EINVAL?
>
> No strong feelings here. The intention of returning EBUSY on
> attach/detach was to signal that the attach point is in use, busy with
> another incompatible attachment.

But we are doing detachment here. We are trying to detach something
that we didn't attach (because link can't be detached explicitly,
right?). So yeah, user clearly did something wrong here.

>
> EINVAL sends a message that I did a bad job passing argument to the
> syscall. But this is just my interpretation.
>
> >
> >> +
> >>         /* No need for update-side lock when net is going away. */
> >>         attached = rcu_dereference_protected(net->bpf.progs[type],
> >>                                              !check_net(net) ||
> >> @@ -112,14 +270,109 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
> >>         return ret;
> >>  }
> >>
> >> +static int __netns_bpf_link_attach(struct net *net, struct bpf_link *link,
> >> +                                  enum netns_bpf_attach_type type)
> >> +{
> >> +       int err;
> >> +
> >> +       /* Allow attaching only one prog or link for now */
> >> +       if (rcu_access_pointer(net->bpf.links[type]))
> >> +               return -E2BIG;
> >> +       /* Links are not compatible with attaching prog directly */
> >> +       if (rcu_access_pointer(net->bpf.progs[type]))
> >> +               return -EBUSY;
> >
> > Same as above. Do we need three different error codes instead of
> > consistently using EEXIST?
>
> Ok, EEXIST seems fine to me as well.
>
> >
> >
> >> +
> >> +       switch (type) {
> >> +       case NETNS_BPF_FLOW_DISSECTOR:
> >> +               err = flow_dissector_bpf_prog_attach(net, link->prog);
> >> +               break;
> >> +       default:
> >> +               err = -EINVAL;
> >> +               break;
> >> +       }
> >> +       if (!err)
> >> +               rcu_assign_pointer(net->bpf.links[type], link);
> >> +       return err;
> >> +}
> >> +
> >
> > [...]
> >
> >> +       err = bpf_link_prime(&net_link->link, &link_primer);
> >> +       if (err) {
> >> +               kfree(net_link);
> >> +               goto out_put_net;
> >> +       }
> >> +
> >> +       err = netns_bpf_link_attach(net, &net_link->link, netns_type);
> >> +       if (err) {
> >> +               bpf_link_cleanup(&link_primer);
> >> +               goto out_put_net;
> >> +       }
> >> +
> >> +       err = bpf_link_settle(&link_primer);
> >
> > This looks a bit misleading. bpf_link_settle() cannot fail and returns
> > FD, but here it looks like it might return error. I think it would be
> > more straightforward to just:
> >
> >     put_net(net);
> >     return bpf_link_settle(&link_primer);
> >
> > out_put_net:
> >     put_net(net);
> >     return err;
>
> That reads better. Will change. Thanks.
>
>
> Appreciate the review. Thanks for all the feedback.
>

Sure, glad to help.

> -jkbs
>
> [...]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next 8/8] selftests/bpf: Add tests for attaching bpf_link to netns
  2020-05-28 13:29     ` Jakub Sitnicki
@ 2020-05-28 18:31       ` Andrii Nakryiko
  0 siblings, 0 replies; 36+ messages in thread
From: Andrii Nakryiko @ 2020-05-28 18:31 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Thu, May 28, 2020 at 6:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, May 28, 2020 at 08:08 AM CEST, Andrii Nakryiko wrote:
> > On Wed, May 27, 2020 at 12:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Extend the existing test case for flow dissector attaching to cover:
> >>
> >>  - link creation,
> >>  - link updates,
> >>  - link info querying,
> >>  - mixing links with direct prog attachment.
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >
> > You are not using bpf_program__attach_netns() at all. Would be nice to
> > actually use higher-level API here...
>
> That's true. I didn't exercise the high-level API. I can cover that.
>
> >
> > Also... what's up with people using CHECK_FAIL + perror instead of
> > CHECK? Is CHECK being avoided for some reason or people are just not
> > aware of it (which is strange, because CHECK was there before
> > CHECK_FAIL)?
>
> I can only speak for myself. Funnily enough I think I've switched from
> CHECK to CHECK_FAIL when I touched on BPF flow dissector last time [0].
>
> CHECK needs an "external" duration variable to be in scope, and so it
> was suggested to me that if I'm not measuring run-time with
> bpf_prog_test_run, CHECK_FAIL might be a better choice.

duration is unfortunate and historical; we can eventually fix that,
but it simply didn't feel worthwhile to me. I just do `static int
duration;` at the top of the test and forget about it.

>
> CHECK is also perhaps too verbose because it emits a log message on
> success (to report duration, I assume).

I actually find that CHECK emitting a message regardless of success or
failure (in verbose mode or when the test fails) helps a lot to narrow
down where exactly the failure happens (it's not always obvious from
the error message). So unless we are talking about 100+ element loops,
I'm all for CHECK verbosity. Again, you'll see it only when something
fails or you run tests in verbose mode.

CHECK_FAIL is the worst in that case, because it makes me blind in
terms of tracking the progress of a test.

>
> You have a better overview of all the tests than me, but if I had the
> cycles I'd see if renaming CHECK to something more specific, for those
> tests that actually track prog run time, can work.

I'd make CHECK not depend on duration instead, and convert existing
cases that do rely on duration to something like CHECK_TIMED.

>
> -jkbs
>
>
> [0] https://lore.kernel.org/bpf/87imov1y5m.fsf@cloudflare.com/
>
>
>
> >
> >>  .../bpf/prog_tests/flow_dissector_reattach.c  | 500 +++++++++++++++++-
> >>  1 file changed, 471 insertions(+), 29 deletions(-)
> >>
> >
> > [...]


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-28 18:09       ` Andrii Nakryiko
@ 2020-05-28 18:48         ` Alexei Starovoitov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2020-05-28 18:48 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Jakub Sitnicki, bpf, Networking, kernel-team

On Thu, May 28, 2020 at 11:09:20AM -0700, Andrii Nakryiko wrote:
> > >> +       net = rcu_dereference(net_link->net);
> > >> +       if (!net || !check_net(net)) {
> > >> +               /* Link auto-detached or netns dying */
> > >> +               ret = -ENOLINK;
> > >
> > > This is an interesting error code. If we are going to adopt this, we
> > > should change it for similar cgroup link situation as well.
> >
> > Credit goes to Lorenz for suggesting a different error code than EINVAL
> > for this situation so that user-space has a way to distinguish.
> >
> > I'm happy to patch cgroup BPF link, if you support this change.
> 
> Yeah, I guess let's do that. I like the "link was severed" message for
> that errno :) But for me it's way more about consistency than any
> specific error code.

I like the ENOLINK idea as well.
Let's fix it there and patch older kernels via a separate patch.

Overall the set looks great. Looking forward to v2.


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-05-28  2:54   ` kbuild test robot
@ 2020-06-04 23:38     ` Nick Desaulniers
  2020-06-05 14:41       ` Jakub Sitnicki
  0 siblings, 1 reply; 36+ messages in thread
From: Nick Desaulniers @ 2020-06-04 23:38 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, kbuild-all, clang-built-linux, Network Development,
	kernel-team, kbuild test robot

On Wed, May 27, 2020 at 8:19 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Jakub,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on bpf-next/master]
> [also build test WARNING on net-next/master next-20200526]
> [cannot apply to bpf/master net/master linus/master v5.7-rc7]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url:    https://github.com/0day-ci/linux/commits/Jakub-Sitnicki/Link-based-program-attachment-to-network-namespaces/20200528-011159
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: arm-randconfig-r035-20200527 (attached as .config)
> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 3393cc4cebf9969db94dc424b7a2b6195589c33b)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install arm cross compiling tool for clang build
>         # apt-get install binutils-arm-linux-gnueabi
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>
> >> kernel/bpf/net_namespace.c:130:6: warning: variable 'inum' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]

This looks legit to me. Please fix.

> if (net)
> ^~~
> kernel/bpf/net_namespace.c:134:26: note: uninitialized use occurs here
> info->netns.netns_ino = inum;
> ^~~~
> kernel/bpf/net_namespace.c:130:2: note: remove the 'if' if its condition is always true
> if (net)
> ^~~~~~~~
> kernel/bpf/net_namespace.c:123:19: note: initialize the variable 'inum' to silence this warning
> unsigned int inum;
> ^
> = 0
> 1 warning generated.
>
> vim +130 kernel/bpf/net_namespace.c
>
>    118
>    119  static int bpf_netns_link_fill_info(const struct bpf_link *link,
>    120                                      struct bpf_link_info *info)
>    121  {
>    122          const struct bpf_netns_link *net_link;
>    123          unsigned int inum;
>    124          struct net *net;
>    125
>    126          net_link = container_of(link, struct bpf_netns_link, link);
>    127
>    128          rcu_read_lock();
>    129          net = rcu_dereference(net_link->net);
>  > 130          if (net)
>    131                  inum = net->ns.inum;
>    132          rcu_read_unlock();
>    133
>    134          info->netns.netns_ino = inum;
>    135          info->netns.attach_type = net_link->type;
>    136          return 0;
>    137  }
>    138
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
>



-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH bpf-next 5/8] bpf: Add link-based BPF program attachment to network namespace
  2020-06-04 23:38     ` Nick Desaulniers
@ 2020-06-05 14:41       ` Jakub Sitnicki
  0 siblings, 0 replies; 36+ messages in thread
From: Jakub Sitnicki @ 2020-06-05 14:41 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: bpf, kbuild-all, clang-built-linux, Network Development,
	kernel-team, kbuild test robot

On Fri, Jun 05, 2020 at 01:38 AM CEST, Nick Desaulniers wrote:
> On Wed, May 27, 2020 at 8:19 PM kbuild test robot <lkp@intel.com> wrote:
>>
>> Hi Jakub,
>>
>> I love your patch! Perhaps something to improve:
>>
>> [auto build test WARNING on bpf-next/master]
>> [also build test WARNING on net-next/master next-20200526]
>> [cannot apply to bpf/master net/master linus/master v5.7-rc7]
>> [if your patch is applied to the wrong git tree, please drop us a note to help
>> improve the system. BTW, we also suggest to use '--base' option to specify the
>> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>>
>> url:    https://github.com/0day-ci/linux/commits/Jakub-Sitnicki/Link-based-program-attachment-to-network-namespaces/20200528-011159
>> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
>> config: arm-randconfig-r035-20200527 (attached as .config)
>> compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 3393cc4cebf9969db94dc424b7a2b6195589c33b)
>> reproduce (this is a W=1 build):
>>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # install arm cross compiling tool for clang build
>>         # apt-get install binutils-arm-linux-gnueabi
>>         # save the attached .config to linux build tree
>>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kbuild test robot <lkp@intel.com>
>>
>> All warnings (new ones prefixed by >>, old ones prefixed by <<):
>>
>> >> kernel/bpf/net_namespace.c:130:6: warning: variable 'inum' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
>
> This looks legit to me. Please fix.

It is legit, or rather it was. Already fixed in v2 [0] (jump to
bpf_netns_link_fill_info).

[...]

^ permalink raw reply	[flat|nested] 36+ messages in thread
