* [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces
@ 2020-05-31  8:28 Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Andrii Nakryiko,
	Lorenz Bauer, Marek Majkowski, Stanislav Fomichev

One of the pieces of feedback from recent review of BPF hooks for socket
lookup [0] was that new program types should use bpf_link-based
attachment.

This series introduces a new bpf_link type for attaching to a network
namespace. All link operations are supported. Errors returned from the link
ops follow the cgroup example. The description of patch 4 goes into the
error semantics.
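
For reference, a minimal sketch of the intended link lifecycle from user
space, using the low-level libbpf wrappers (bpf_link_create, bpf_link_update,
bpf_obj_get_info_by_fd); prog_fd and new_prog_fd are placeholders for already
loaded flow dissector programs, headers and error handling are omitted:

  int netns_fd, link_fd;
  struct bpf_link_info info = {};
  __u32 info_len = sizeof(info);

  netns_fd = open("/proc/self/ns/net", O_RDONLY);

  /* LINK_CREATE: attach prog_fd to the netns behind netns_fd */
  link_fd = bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR, NULL);

  /* LINK_UPDATE: atomically replace the attached program */
  if (bpf_link_update(link_fd, new_prog_fd, NULL))
          perror("LINK_UPDATE"); /* ENOLINK once the link is defunct */

  /* OBJ_GET_INFO: reports netns inode number and attach type */
  if (!bpf_obj_get_info_by_fd(link_fd, &info, &info_len))
          printf("netns_ino %u\n", info.netns.netns_ino);

  close(link_fd); /* last FD gone: link released, prog detached */
  close(netns_fd);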

The major change in v2 is a switch away from RCU to mutex-only
synchronization. Andrii pointed out that RCU is not needed here, and it
makes sense to keep the locking straightforward.

Also, there were a couple of bugs in the initial implementation of
update_prog and fill_info, one of them picked up by kbuild. Those are now
fixed, and tests have been extended to cover them. Full changelog below.

The series is organized as follows:

Patches 1-3 prepare a space in struct net to keep state for attached BPF
programs, and massage the flow_dissector code to make it attach-type
agnostic, so that it can finally be moved under kernel/bpf/.

Patch 4, the most important one, introduces the new bpf_link type for
attaching to a network namespace.

Patch 5 unifies the update error (ENOLINK) between BPF cgroup and netns.

Patches 6-8 make libbpf and bpftool aware of the new link type.

Patches 9-12 add and extend tests to check that the low- and high-level
APIs for operating on links to a netns work as intended.

Thanks to Alexei, Andrii, Lorenz, Marek, and Stanislav for feedback.

-jkbs

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Stanislav Fomichev <sdf@google.com>

v1 -> v2:

- Switch to mutex-only synchronization. Don't rely on RCU grace period
  guarantee when accessing struct net from link release / update /
  fill_info, and when accessing bpf_link from pernet pre_exit
  callback. (Andrii)
- Drop patch 1, no longer needed with mutex-only synchronization.
- Don't leak uninitialized variable contents from fill_info callback
  when link is in defunct state. (kbuild)
- Make fill_info treat the link as defunct (i.e. no attached netns) when
  struct net refcount is 0, but the link has not yet been auto-detached.
- Add missing BPF_LINK_TYPE define in bpf_types.h for new link type.
- Fix link update_prog callback to update the prog that will run, and
  not just the link itself.
- Return EEXIST on prog attach when link already exists, and on link
  create when prog is already attached directly. (Andrii)
- Return EINVAL on prog detach when link is attached. (Andrii)
- Fold __netns_bpf_link_attach into its only caller. (Stanislav)
- Get rid of a wrapper around container_of() (Andrii)
- Use rcu_dereference_protected instead of rcu_access_pointer on
  update-side. (Stanislav)
- Make return-on-success from netns_bpf_link_create less
  confusing. (Andrii)
- Adapt bpf_link for cgroup to return ENOLINK when updating a defunct
  link. (Andrii, Alexei)
- Order new exported symbols in libbpf.map alphabetically (Andrii)
- Keep libbpf's "failed to attach link" warning message clear as to what
  we failed to attach to (cgroup vs netns). (Andrii)
- Extract helpers for printing link attach type. (bpftool, Andrii)
- Switch flow_dissector tests to BPF skeleton and extend them to
  exercise link-based flow dissector attachment. (Andrii)
- Harden flow dissector attachment tests with prog query checks after
  prog attach/detach, or link create/update/close.
- Extend flow dissector tests to cover fill_info for defunct links.
- Rebase onto recent bpf-next

Jakub Sitnicki (12):
  flow_dissector: Pull locking up from prog attach callback
  net: Introduce netns_bpf for BPF programs attached to netns
  flow_dissector: Move out netns_bpf prog callbacks
  bpf: Add link-based BPF program attachment to network namespace
  bpf, cgroup: Return ENOLINK for auto-detached links on update
  libbpf: Add support for bpf_link-based netns attachment
  bpftool: Extract helpers for showing link attach type
  bpftool: Support link show for netns-attached links
  selftests/bpf: Add tests for attaching bpf_link to netns
  selftests/bpf, flow_dissector: Close TAP device FD after the test
  selftests/bpf: Convert test_flow_dissector to use BPF skeleton
  selftests/bpf: Extend test_flow_dissector to cover link creation

 include/linux/bpf-netns.h                     |  64 ++
 include/linux/bpf_types.h                     |   3 +
 include/linux/skbuff.h                        |  26 -
 include/net/flow_dissector.h                  |   6 +
 include/net/net_namespace.h                   |   4 +-
 include/net/netns/bpf.h                       |  18 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/cgroup.c                           |   2 +-
 kernel/bpf/net_namespace.c                    | 373 +++++++++++
 kernel/bpf/syscall.c                          |  10 +-
 net/core/flow_dissector.c                     | 124 +---
 tools/bpf/bpftool/link.c                      |  54 +-
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/lib/bpf/libbpf.c                        |  23 +-
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/flow_dissector.c | 166 +++--
 .../bpf/prog_tests/flow_dissector_reattach.c  | 588 ++++++++++++++++--
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  20 +-
 20 files changed, 1248 insertions(+), 247 deletions(-)
 create mode 100644 include/linux/bpf-netns.h
 create mode 100644 include/net/netns/bpf.h
 create mode 100644 kernel/bpf/net_namespace.c

-- 
2.25.4



* [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 02/12] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Stanislav Fomichev

Split out the part of the attach callback that runs with the attach/detach
lock held. This structures the prog attach callback in a way that opens the
door to moving the locking out of flow_dissector and into generic callbacks
for attaching/detaching progs to netns in subsequent patches.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/flow_dissector.c | 40 +++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5dceed467f64..ad08b51c781e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -109,15 +109,10 @@ int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog)
+static int flow_dissector_bpf_prog_attach(struct net *net,
+					  struct bpf_prog *prog)
 {
 	struct bpf_prog *attached;
-	struct net *net;
-	int ret = 0;
-
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
 
 	if (net == &init_net) {
 		/* BPF flow dissector in the root namespace overrides
@@ -130,33 +125,38 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->flow_dissector_prog)) {
-				ret = -EEXIST;
-				goto out;
-			}
+			if (rcu_access_pointer(ns->flow_dissector_prog))
+				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.flow_dissector_prog)) {
-			ret = -EEXIST;
-			goto out;
-		}
+		if (rcu_access_pointer(init_net.flow_dissector_prog))
+			return -EEXIST;
 	}
 
 	attached = rcu_dereference_protected(net->flow_dissector_prog,
 					     lockdep_is_held(&flow_dissector_mutex));
-	if (attached == prog) {
+	if (attached == prog)
 		/* The same program cannot be attached twice */
-		ret = -EINVAL;
-		goto out;
-	}
+		return -EINVAL;
+
 	rcu_assign_pointer(net->flow_dissector_prog, prog);
 	if (attached)
 		bpf_prog_put(attached);
-out:
+	return 0;
+}
+
+int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
+				       struct bpf_prog *prog)
+{
+	int ret;
+
+	mutex_lock(&flow_dissector_mutex);
+	ret = flow_dissector_bpf_prog_attach(current->nsproxy->net_ns, prog);
 	mutex_unlock(&flow_dissector_mutex);
+
 	return ret;
 }
 
-- 
2.25.4



* [PATCH bpf-next v2 02/12] net: Introduce netns_bpf for BPF programs attached to netns
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 03/12] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Stanislav Fomichev

In order to:

 (1) attach more than one BPF program type to netns, or
 (2) support attaching BPF programs to netns with bpf_link, or
 (3) support multi-prog attach points for netns

we will need to keep more state per netns than the single pointer we have
now for the BPF flow dissector program.

Prepare for the above by extracting netns_bpf, a new struct embedded in
struct net, that stores all state related to BPF programs attached to netns.

Turn the flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. The next patch will move the
generic callbacks into their own module.

This is similar to how it is organized for cgroup with cgroup_bpf.

Cc: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    Stanislav, this patch has changed slightly, so I didn't carry over your
    Reviewed-by tag. Note that the pernet pre_exit callback now grabs the mutex.

 include/linux/bpf-netns.h   |  56 +++++++++++++++++++
 include/linux/skbuff.h      |  26 ---------
 include/net/net_namespace.h |   4 +-
 include/net/netns/bpf.h     |  17 ++++++
 kernel/bpf/syscall.c        |   7 +--
 net/core/flow_dissector.c   | 105 +++++++++++++++++++++++-------------
 6 files changed, 149 insertions(+), 66 deletions(-)
 create mode 100644 include/linux/bpf-netns.h
 create mode 100644 include/net/netns/bpf.h

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
new file mode 100644
index 000000000000..f3aec3d79824
--- /dev/null
+++ b/include/linux/bpf-netns.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_NETNS_H
+#define _BPF_NETNS_H
+
+#include <linux/mutex.h>
+#include <uapi/linux/bpf.h>
+
+enum netns_bpf_attach_type {
+	NETNS_BPF_INVALID = -1,
+	NETNS_BPF_FLOW_DISSECTOR = 0,
+	MAX_NETNS_BPF_ATTACH_TYPE
+};
+
+static inline enum netns_bpf_attach_type
+to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
+{
+	switch (attach_type) {
+	case BPF_FLOW_DISSECTOR:
+		return NETNS_BPF_FLOW_DISSECTOR;
+	default:
+		return NETNS_BPF_INVALID;
+	}
+}
+
+/* Protects updates to netns_bpf */
+extern struct mutex netns_bpf_mutex;
+
+union bpf_attr;
+struct bpf_prog;
+
+#ifdef CONFIG_NET
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr);
+int netns_bpf_prog_attach(const union bpf_attr *attr,
+			  struct bpf_prog *prog);
+int netns_bpf_prog_detach(const union bpf_attr *attr);
+#else
+static inline int netns_bpf_prog_query(const union bpf_attr *attr,
+				       union bpf_attr __user *uattr)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netns_bpf_prog_attach(const union bpf_attr *attr,
+					struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* _BPF_NETNS_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 531843952809..a0d5c2760103 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1283,32 +1283,6 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 			     const struct flow_dissector_key *key,
 			     unsigned int key_count);
 
-#ifdef CONFIG_NET
-int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-				  union bpf_attr __user *uattr);
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog);
-
-int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr);
-#else
-static inline int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-						union bpf_attr __user *uattr)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-						     struct bpf_prog *prog)
-{
-	return -EOPNOTSUPP;
-}
-
-static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
-{
-	return -EOPNOTSUPP;
-}
-#endif
-
 struct bpf_flow_dissector;
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
 		      __be16 proto, int nhoff, int hlen, unsigned int flags);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 8e001e049497..2ee5901bec7a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -33,6 +33,7 @@
 #include <net/netns/mpls.h>
 #include <net/netns/can.h>
 #include <net/netns/xdp.h>
+#include <net/netns/bpf.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
@@ -162,7 +163,8 @@ struct net {
 #endif
 	struct net_generic __rcu	*gen;
 
-	struct bpf_prog __rcu	*flow_dissector_prog;
+	/* Used to store attached BPF programs */
+	struct netns_bpf	bpf;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
new file mode 100644
index 000000000000..a858d1c5b166
--- /dev/null
+++ b/include/net/netns/bpf.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * BPF programs attached to network namespace
+ */
+
+#ifndef __NETNS_BPF_H__
+#define __NETNS_BPF_H__
+
+#include <linux/bpf-netns.h>
+
+struct bpf_prog;
+
+struct netns_bpf {
+	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+};
+
+#endif /* __NETNS_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9de3540fa90c..5b2ae498cc3f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -27,6 +27,7 @@
 #include <uapi/linux/btf.h>
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
+#include <linux/bpf-netns.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2868,7 +2869,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 		ret = lirc_prog_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
-		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
+		ret = netns_bpf_prog_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
@@ -2908,7 +2909,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		if (!capable(CAP_NET_ADMIN))
 			return -EPERM;
-		return skb_flow_dissector_bpf_prog_detach(attr);
+		return netns_bpf_prog_detach(attr);
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2961,7 +2962,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
-		return skb_flow_dissector_prog_query(attr, uattr);
+		return netns_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index ad08b51c781e..91e2f5a1f8ad 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -31,8 +31,10 @@
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_labels.h>
 #endif
+#include <linux/bpf-netns.h>
 
-static DEFINE_MUTEX(flow_dissector_mutex);
+/* Protects updates to netns_bpf */
+DEFINE_MUTEX(netns_bpf_mutex);
 
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
@@ -70,23 +72,28 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
-int skb_flow_dissector_prog_query(const union bpf_attr *attr,
-				  union bpf_attr __user *uattr)
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
 {
 	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
 	u32 prog_id, prog_cnt = 0, flags = 0;
+	enum netns_bpf_attach_type type;
 	struct bpf_prog *attached;
 	struct net *net;
 
 	if (attr->query.query_flags)
 		return -EINVAL;
 
+	type = to_netns_bpf_attach_type(attr->query.attach_type);
+	if (type < 0)
+		return -EINVAL;
+
 	net = get_net_ns_by_fd(attr->query.target_fd);
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
 	rcu_read_lock();
-	attached = rcu_dereference(net->flow_dissector_prog);
+	attached = rcu_dereference(net->bpf.progs[type]);
 	if (attached) {
 		prog_cnt = 1;
 		prog_id = attached->aux->id;
@@ -112,6 +119,7 @@ int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 static int flow_dissector_bpf_prog_attach(struct net *net,
 					  struct bpf_prog *prog)
 {
+	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
 	struct bpf_prog *attached;
 
 	if (net == &init_net) {
@@ -125,74 +133,97 @@ static int flow_dissector_bpf_prog_attach(struct net *net,
 		for_each_net(ns) {
 			if (ns == &init_net)
 				continue;
-			if (rcu_access_pointer(ns->flow_dissector_prog))
+			if (rcu_access_pointer(ns->bpf.progs[type]))
 				return -EEXIST;
 		}
 	} else {
 		/* Make sure root flow dissector is not attached
 		 * when attaching to the non-root namespace.
 		 */
-		if (rcu_access_pointer(init_net.flow_dissector_prog))
+		if (rcu_access_pointer(init_net.bpf.progs[type]))
 			return -EEXIST;
 	}
 
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
 	if (attached == prog)
 		/* The same program cannot be attached twice */
 		return -EINVAL;
 
-	rcu_assign_pointer(net->flow_dissector_prog, prog);
+	rcu_assign_pointer(net->bpf.progs[type], prog);
 	if (attached)
 		bpf_prog_put(attached);
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
-				       struct bpf_prog *prog)
+int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	enum netns_bpf_attach_type type;
+	struct net *net;
 	int ret;
 
-	mutex_lock(&flow_dissector_mutex);
-	ret = flow_dissector_bpf_prog_attach(current->nsproxy->net_ns, prog);
-	mutex_unlock(&flow_dissector_mutex);
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = current->nsproxy->net_ns;
+	mutex_lock(&netns_bpf_mutex);
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		ret = flow_dissector_bpf_prog_attach(net, prog);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	mutex_unlock(&netns_bpf_mutex);
 
 	return ret;
 }
 
-static int flow_dissector_bpf_prog_detach(struct net *net)
+/* Must be called with netns_bpf_mutex held. */
+static int __netns_bpf_prog_detach(struct net *net,
+				   enum netns_bpf_attach_type type)
 {
 	struct bpf_prog *attached;
 
-	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (!attached) {
-		mutex_unlock(&flow_dissector_mutex);
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (!attached)
 		return -ENOENT;
-	}
-	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
 	bpf_prog_put(attached);
-	mutex_unlock(&flow_dissector_mutex);
 	return 0;
 }
 
-int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
+int netns_bpf_prog_detach(const union bpf_attr *attr)
 {
-	return flow_dissector_bpf_prog_detach(current->nsproxy->net_ns);
+	enum netns_bpf_attach_type type;
+	int ret;
+
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
 }
 
-static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
+static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
-	/* We're not racing with attach/detach because there are no
-	 * references to netns left when pre_exit gets called.
-	 */
-	if (rcu_access_pointer(net->flow_dissector_prog))
-		flow_dissector_bpf_prog_detach(net);
+	enum netns_bpf_attach_type type;
+
+	mutex_lock(&netns_bpf_mutex);
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
+		__netns_bpf_prog_detach(net, type);
+	mutex_unlock(&netns_bpf_mutex);
 }
 
-static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
-	.pre_exit = flow_dissector_pernet_pre_exit,
+static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.pre_exit = netns_bpf_pernet_pre_exit,
 };
 
 /**
@@ -1030,11 +1061,13 @@ bool __skb_flow_dissect(const struct net *net,
 
 	WARN_ON_ONCE(!net);
 	if (net) {
+		enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+
 		rcu_read_lock();
-		attached = rcu_dereference(init_net.flow_dissector_prog);
+		attached = rcu_dereference(init_net.bpf.progs[type]);
 
 		if (!attached)
-			attached = rcu_dereference(net->flow_dissector_prog);
+			attached = rcu_dereference(net->bpf.progs[type]);
 
 		if (attached) {
 			struct bpf_flow_keys flow_keys;
@@ -1853,6 +1886,6 @@ static int __init init_default_flow_dissectors(void)
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
 
-	return register_pernet_subsys(&flow_dissector_pernet_ops);
+	return register_pernet_subsys(&netns_bpf_pernet_ops);
 }
 core_initcall(init_default_flow_dissectors);
-- 
2.25.4



* [PATCH bpf-next v2 03/12] flow_dissector: Move out netns_bpf prog callbacks
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 02/12] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Move the functions that manage BPF programs attached to netns, and that are
not specific to the flow dissector, to a dedicated module,
kernel/bpf/net_namespace.c.

The set of functions will grow with the addition of bpf_link support for
netns-attached programs. This patch prepares the ground by creating a place
for it.

This is a code move with no functional changes intended.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/flow_dissector.h |   6 ++
 kernel/bpf/Makefile          |   1 +
 kernel/bpf/net_namespace.c   | 133 +++++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c    | 125 ++------------------------------
 4 files changed, 144 insertions(+), 121 deletions(-)
 create mode 100644 kernel/bpf/net_namespace.c

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 628383915827..9af143760e35 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -8,6 +8,8 @@
 #include <linux/string.h>
 #include <uapi/linux/if_ether.h>
 
+struct bpf_prog;
+struct net;
 struct sk_buff;
 
 /**
@@ -357,4 +359,8 @@ flow_dissector_init_keys(struct flow_dissector_key_control *key_control,
 	memset(key_basic, 0, sizeof(*key_basic));
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog);
+#endif /* CONFIG_BPF_SYSCALL */
+
 #endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 8fca02f64811..1131a921e1a6 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -13,6 +13,7 @@ ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
+obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
 endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
new file mode 100644
index 000000000000..b37d81450c3a
--- /dev/null
+++ b/kernel/bpf/net_namespace.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <net/net_namespace.h>
+
+/*
+ * Functions to manage BPF programs attached to netns
+ */
+
+/* Protects updates to netns_bpf */
+DEFINE_MUTEX(netns_bpf_mutex);
+
+int netns_bpf_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	u32 prog_id, prog_cnt = 0, flags = 0;
+	enum netns_bpf_attach_type type;
+	struct bpf_prog *attached;
+	struct net *net;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	type = to_netns_bpf_attach_type(attr->query.attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	rcu_read_lock();
+	attached = rcu_dereference(net->bpf.progs[type]);
+	if (attached) {
+		prog_cnt = 1;
+		prog_id = attached->aux->id;
+	}
+	rcu_read_unlock();
+
+	put_net(net);
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	enum netns_bpf_attach_type type;
+	struct net *net;
+	int ret;
+
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	net = current->nsproxy->net_ns;
+	mutex_lock(&netns_bpf_mutex);
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		ret = flow_dissector_bpf_prog_attach(net, prog);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+/* Must be called with netns_bpf_mutex held. */
+static int __netns_bpf_prog_detach(struct net *net,
+				   enum netns_bpf_attach_type type)
+{
+	struct bpf_prog *attached;
+
+	attached = rcu_dereference_protected(net->bpf.progs[type],
+					     lockdep_is_held(&netns_bpf_mutex));
+	if (!attached)
+		return -ENOENT;
+	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+	bpf_prog_put(attached);
+	return 0;
+}
+
+int netns_bpf_prog_detach(const union bpf_attr *attr)
+{
+	enum netns_bpf_attach_type type;
+	int ret;
+
+	type = to_netns_bpf_attach_type(attr->attach_type);
+	if (type < 0)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
+	mutex_unlock(&netns_bpf_mutex);
+
+	return ret;
+}
+
+static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
+{
+	enum netns_bpf_attach_type type;
+
+	mutex_lock(&netns_bpf_mutex);
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
+		__netns_bpf_prog_detach(net, type);
+	mutex_unlock(&netns_bpf_mutex);
+}
+
+static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
+	.pre_exit = netns_bpf_pernet_pre_exit,
+};
+
+static int __init netns_bpf_init(void)
+{
+	return register_pernet_subsys(&netns_bpf_pernet_ops);
+}
+
+subsys_initcall(netns_bpf_init);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 91e2f5a1f8ad..9a42ac5114e3 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -33,9 +33,6 @@
 #endif
 #include <linux/bpf-netns.h>
 
-/* Protects updates to netns_bpf */
-DEFINE_MUTEX(netns_bpf_mutex);
-
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
 {
@@ -72,52 +69,8 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
-int netns_bpf_prog_query(const union bpf_attr *attr,
-			 union bpf_attr __user *uattr)
-{
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
-	enum netns_bpf_attach_type type;
-	struct bpf_prog *attached;
-	struct net *net;
-
-	if (attr->query.query_flags)
-		return -EINVAL;
-
-	type = to_netns_bpf_attach_type(attr->query.attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	net = get_net_ns_by_fd(attr->query.target_fd);
-	if (IS_ERR(net))
-		return PTR_ERR(net);
-
-	rcu_read_lock();
-	attached = rcu_dereference(net->bpf.progs[type]);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
-
-	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
-}
-
-static int flow_dissector_bpf_prog_attach(struct net *net,
-					  struct bpf_prog *prog)
+#ifdef CONFIG_BPF_SYSCALL
+int flow_dissector_bpf_prog_attach(struct net *net, struct bpf_prog *prog)
 {
 	enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
 	struct bpf_prog *attached;
@@ -155,76 +108,7 @@ static int flow_dissector_bpf_prog_attach(struct net *net,
 		bpf_prog_put(attached);
 	return 0;
 }
-
-int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
-{
-	enum netns_bpf_attach_type type;
-	struct net *net;
-	int ret;
-
-	type = to_netns_bpf_attach_type(attr->attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	net = current->nsproxy->net_ns;
-	mutex_lock(&netns_bpf_mutex);
-	switch (type) {
-	case NETNS_BPF_FLOW_DISSECTOR:
-		ret = flow_dissector_bpf_prog_attach(net, prog);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-	mutex_unlock(&netns_bpf_mutex);
-
-	return ret;
-}
-
-/* Must be called with netns_bpf_mutex held. */
-static int __netns_bpf_prog_detach(struct net *net,
-				   enum netns_bpf_attach_type type)
-{
-	struct bpf_prog *attached;
-
-	attached = rcu_dereference_protected(net->bpf.progs[type],
-					     lockdep_is_held(&netns_bpf_mutex));
-	if (!attached)
-		return -ENOENT;
-	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
-	bpf_prog_put(attached);
-	return 0;
-}
-
-int netns_bpf_prog_detach(const union bpf_attr *attr)
-{
-	enum netns_bpf_attach_type type;
-	int ret;
-
-	type = to_netns_bpf_attach_type(attr->attach_type);
-	if (type < 0)
-		return -EINVAL;
-
-	mutex_lock(&netns_bpf_mutex);
-	ret = __netns_bpf_prog_detach(current->nsproxy->net_ns, type);
-	mutex_unlock(&netns_bpf_mutex);
-
-	return ret;
-}
-
-static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
-{
-	enum netns_bpf_attach_type type;
-
-	mutex_lock(&netns_bpf_mutex);
-	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
-		__netns_bpf_prog_detach(net, type);
-	mutex_unlock(&netns_bpf_mutex);
-}
-
-static struct pernet_operations netns_bpf_pernet_ops __net_initdata = {
-	.pre_exit = netns_bpf_pernet_pre_exit,
-};
+#endif /* CONFIG_BPF_SYSCALL */
 
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
@@ -1885,7 +1769,6 @@ static int __init init_default_flow_dissectors(void)
 	skb_flow_dissector_init(&flow_keys_basic_dissector,
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
-
-	return register_pernet_subsys(&netns_bpf_pernet_ops);
+	return 0;
 }
 core_initcall(init_default_flow_dissectors);
-- 
2.25.4



* [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 03/12] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:30   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update Jakub Sitnicki
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Extend the bpf() syscall subcommands that operate on bpf_link, that is
LINK_CREATE, LINK_UPDATE, and OBJ_GET_INFO, to accept attach types tied to
network namespaces (only flow dissector at the moment).

Link-based and prog-based attachment can be used interchangeably, but only
one can exist at a time. Attempts to attach a link when a prog is already
attached directly, and the other way around, will be met with -EEXIST.
Attempts to detach a program while a link is attached result in -EINVAL.

Attachment of multiple links of the same attach type to one netns is not
supported, with the intention to lift this restriction when a use-case
presents itself. Because of that, link create returns -E2BIG when trying to
create another netns link while one already exists.

Link-based attachments to netns don't keep the netns alive by holding a ref
to it. Instead, links get auto-detached from the netns when the latter is
being destroyed, using a pernet pre_exit callback.

When auto-detached, the link lives in a defunct state for as long as there
are open FDs for it. -ENOLINK is returned if a user tries to update a
defunct link.

Because a bpf_link to netns doesn't hold a ref to struct net, special care
is taken when releasing, updating, or filling link info. The netns might be
getting torn down while any of these link operations are in progress. That
is why auto-detach and update/release/fill_info are synchronized by the
same mutex. Also, link ops always have to check whether auto-detach has
already happened and whether the netns is still alive (refcnt > 0).
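
To illustrate (not part of the patch), a sketch of the expected error
semantics as seen from user space through the low-level libbpf wrappers;
prog_fd is assumed to hold a loaded flow dissector program, and setup/error
handling is trimmed:

  int netns_fd = open("/proc/self/ns/net", O_RDONLY);
  int link_fd;

  /* Direct prog attach to the current netns succeeds, ... */
  bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
  /* ... so creating a link for the same attach type fails: */
  assert(bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR, NULL) < 0 &&
         errno == EEXIST);

  /* The other way around: with a link attached, ... */
  bpf_prog_detach2(prog_fd, 0, BPF_FLOW_DISSECTOR);
  link_fd = bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR, NULL);
  /* ... a direct detach is refused, ... */
  assert(bpf_prog_detach2(prog_fd, 0, BPF_FLOW_DISSECTOR) < 0 &&
         errno == EINVAL);
  /* ... and a second link for this netns is rejected for now. */
  assert(bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR, NULL) < 0 &&
         errno == E2BIG);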

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf-netns.h      |   8 ++
 include/linux/bpf_types.h      |   3 +
 include/net/netns/bpf.h        |   1 +
 include/uapi/linux/bpf.h       |   5 +
 kernel/bpf/net_namespace.c     | 244 ++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c           |   3 +
 tools/include/uapi/linux/bpf.h |   5 +
 7 files changed, 267 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
index f3aec3d79824..4052d649f36d 100644
--- a/include/linux/bpf-netns.h
+++ b/include/linux/bpf-netns.h
@@ -34,6 +34,8 @@ int netns_bpf_prog_query(const union bpf_attr *attr,
 int netns_bpf_prog_attach(const union bpf_attr *attr,
 			  struct bpf_prog *prog);
 int netns_bpf_prog_detach(const union bpf_attr *attr);
+int netns_bpf_link_create(const union bpf_attr *attr,
+			  struct bpf_prog *prog);
 #else
 static inline int netns_bpf_prog_query(const union bpf_attr *attr,
 				       union bpf_attr __user *uattr)
@@ -51,6 +53,12 @@ static inline int netns_bpf_prog_detach(const union bpf_attr *attr)
 {
 	return -EOPNOTSUPP;
 }
+
+static inline int netns_bpf_link_create(const union bpf_attr *attr,
+					struct bpf_prog *prog)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif /* _BPF_NETNS_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa8e1b552acd..a18ae82a298a 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -126,3 +126,6 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
 BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
 #endif
 BPF_LINK_TYPE(BPF_LINK_TYPE_ITER, iter)
+#ifdef CONFIG_NET
+BPF_LINK_TYPE(BPF_LINK_TYPE_NETNS, netns)
+#endif
diff --git a/include/net/netns/bpf.h b/include/net/netns/bpf.h
index a858d1c5b166..a8dce2a380c8 100644
--- a/include/net/netns/bpf.h
+++ b/include/net/netns/bpf.h
@@ -12,6 +12,7 @@ struct bpf_prog;
 
 struct netns_bpf {
 	struct bpf_prog __rcu *progs[MAX_NETNS_BPF_ATTACH_TYPE];
+	struct bpf_link *links[MAX_NETNS_BPF_ATTACH_TYPE];
 };
 
 #endif /* __NETNS_BPF_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 974ca6e948e3..d73d3d6596da 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -236,6 +236,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TRACING = 2,
 	BPF_LINK_TYPE_CGROUP = 3,
 	BPF_LINK_TYPE_ITER = 4,
+	BPF_LINK_TYPE_NETNS = 5,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -3835,6 +3836,10 @@ struct bpf_link_info {
 			__u64 cgroup_id;
 			__u32 attach_type;
 		} cgroup;
+		struct  {
+			__u32 netns_ino;
+			__u32 attach_type;
+		} netns;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index b37d81450c3a..78cf061f8179 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -8,9 +8,140 @@
  * Functions to manage BPF programs attached to netns
  */
 
+struct bpf_netns_link {
+	struct bpf_link	link;
+	enum bpf_attach_type type;
+	enum netns_bpf_attach_type netns_type;
+
+	/* We don't hold a ref to net in order to auto-detach the link
+	 * when netns is going away. Instead we rely on pernet
+	 * pre_exit callback to clear this pointer. Must be accessed
+	 * with netns_bpf_mutex held.
+	 */
+	struct net *net;
+};
+
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
+/* Must be called with netns_bpf_mutex held. */
+static void __net_exit bpf_netns_link_auto_detach(struct bpf_link *link)
+{
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
+
+	net_link->net = NULL;
+}
+
+static void bpf_netns_link_release(struct bpf_link *link)
+{
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
+	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct net *net;
+
+	/* Link auto-detached by dying netns. */
+	if (!net_link->net)
+		return;
+
+	mutex_lock(&netns_bpf_mutex);
+
+	/* Recheck after potential sleep. We can race with cleanup_net
+	 * here, but if we see a non-NULL struct net pointer pre_exit
+	 * has not happened yet and will block on netns_bpf_mutex.
+	 */
+	net = net_link->net;
+	if (!net)
+		goto out_unlock;
+
+	net->bpf.links[type] = NULL;
+	RCU_INIT_POINTER(net->bpf.progs[type], NULL);
+
+out_unlock:
+	mutex_unlock(&netns_bpf_mutex);
+}
+
+static void bpf_netns_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
+
+	kfree(net_link);
+}
+
+static int bpf_netns_link_update_prog(struct bpf_link *link,
+				      struct bpf_prog *new_prog,
+				      struct bpf_prog *old_prog)
+{
+	struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
+	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct net *net;
+	int ret = 0;
+
+	if (old_prog && old_prog != link->prog)
+		return -EPERM;
+	if (new_prog->type != link->prog->type)
+		return -EINVAL;
+
+	mutex_lock(&netns_bpf_mutex);
+
+	net = net_link->net;
+	if (!net || !check_net(net)) {
+		/* Link auto-detached or netns dying */
+		ret = -ENOLINK;
+		goto out_unlock;
+	}
+
+	old_prog = xchg(&link->prog, new_prog);
+	rcu_assign_pointer(net->bpf.progs[type], new_prog);
+	bpf_prog_put(old_prog);
+
+out_unlock:
+	mutex_unlock(&netns_bpf_mutex);
+	return ret;
+}
+
+static int bpf_netns_link_fill_info(const struct bpf_link *link,
+				    struct bpf_link_info *info)
+{
+	const struct bpf_netns_link *net_link =
+		container_of(link, struct bpf_netns_link, link);
+	unsigned int inum = 0;
+	struct net *net;
+
+	mutex_lock(&netns_bpf_mutex);
+	net = net_link->net;
+	if (net && check_net(net))
+		inum = net->ns.inum;
+	mutex_unlock(&netns_bpf_mutex);
+
+	info->netns.netns_ino = inum;
+	info->netns.attach_type = net_link->type;
+	return 0;
+}
+
+static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
+				       struct seq_file *seq)
+{
+	struct bpf_link_info info = {};
+
+	bpf_netns_link_fill_info(link, &info);
+	seq_printf(seq,
+		   "netns_ino:\t%u\n"
+		   "attach_type:\t%u\n",
+		   info.netns.netns_ino,
+		   info.netns.attach_type);
+}
+
+static const struct bpf_link_ops bpf_netns_link_ops = {
+	.release = bpf_netns_link_release,
+	.dealloc = bpf_netns_link_dealloc,
+	.update_prog = bpf_netns_link_update_prog,
+	.fill_link_info = bpf_netns_link_fill_info,
+	.show_fdinfo = bpf_netns_link_show_fdinfo,
+};
+
 int netns_bpf_prog_query(const union bpf_attr *attr,
 			 union bpf_attr __user *uattr)
 {
@@ -67,6 +198,13 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	net = current->nsproxy->net_ns;
 	mutex_lock(&netns_bpf_mutex);
+
+	/* Attaching prog directly is not compatible with links */
+	if (net->bpf.links[type]) {
+		ret = -EEXIST;
+		goto out_unlock;
+	}
+
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
 		ret = flow_dissector_bpf_prog_attach(net, prog);
@@ -75,6 +213,7 @@ int netns_bpf_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 		ret = -EINVAL;
 		break;
 	}
+out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 
 	return ret;
@@ -86,6 +225,10 @@ static int __netns_bpf_prog_detach(struct net *net,
 {
 	struct bpf_prog *attached;
 
+	/* Progs attached via links cannot be detached */
+	if (net->bpf.links[type])
+		return -EINVAL;
+
 	attached = rcu_dereference_protected(net->bpf.progs[type],
 					     lockdep_is_held(&netns_bpf_mutex));
 	if (!attached)
@@ -111,13 +254,110 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 	return ret;
 }
 
+static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
+				 enum netns_bpf_attach_type type)
+{
+	struct bpf_prog *prog;
+	int err;
+
+	mutex_lock(&netns_bpf_mutex);
+
+	/* Allow attaching only one prog or link for now */
+	if (net->bpf.links[type]) {
+		err = -E2BIG;
+		goto out_unlock;
+	}
+	/* Links are not compatible with attaching prog directly */
+	prog = rcu_dereference_protected(net->bpf.progs[type],
+					 lockdep_is_held(&netns_bpf_mutex));
+	if (prog) {
+		err = -EEXIST;
+		goto out_unlock;
+	}
+
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		err = flow_dissector_bpf_prog_attach(net, link->prog);
+		break;
+	default:
+		err = -EINVAL;
+		break;
+	}
+	if (err)
+		goto out_unlock;
+
+	net->bpf.links[type] = link;
+
+out_unlock:
+	mutex_unlock(&netns_bpf_mutex);
+	return err;
+}
+
+int netns_bpf_link_create(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	enum netns_bpf_attach_type netns_type;
+	struct bpf_link_primer link_primer;
+	struct bpf_netns_link *net_link;
+	enum bpf_attach_type type;
+	struct net *net;
+	int err;
+
+	if (attr->link_create.flags)
+		return -EINVAL;
+
+	type = attr->link_create.attach_type;
+	netns_type = to_netns_bpf_attach_type(type);
+	if (netns_type < 0)
+		return -EINVAL;
+
+	net = get_net_ns_by_fd(attr->link_create.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	net_link = kzalloc(sizeof(*net_link), GFP_USER);
+	if (!net_link) {
+		err = -ENOMEM;
+		goto out_put_net;
+	}
+	bpf_link_init(&net_link->link, BPF_LINK_TYPE_NETNS,
+		      &bpf_netns_link_ops, prog);
+	net_link->net = net;
+	net_link->type = type;
+	net_link->netns_type = netns_type;
+
+	err = bpf_link_prime(&net_link->link, &link_primer);
+	if (err) {
+		kfree(net_link);
+		goto out_put_net;
+	}
+
+	err = netns_bpf_link_attach(net, &net_link->link, netns_type);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		goto out_put_net;
+	}
+
+	put_net(net);
+	return bpf_link_settle(&link_primer);
+
+out_put_net:
+	put_net(net);
+	return err;
+}
+
 static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 {
 	enum netns_bpf_attach_type type;
+	struct bpf_link *link;
 
 	mutex_lock(&netns_bpf_mutex);
-	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++)
-		__netns_bpf_prog_detach(net, type);
+	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
+		link = net->bpf.links[type];
+		if (link)
+			bpf_netns_link_auto_detach(link);
+		else
+			__netns_bpf_prog_detach(net, type);
+	}
 	mutex_unlock(&netns_bpf_mutex);
 }
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5b2ae498cc3f..988ec43a4f39 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3887,6 +3887,9 @@ static int link_create(union bpf_attr *attr)
 	case BPF_PROG_TYPE_TRACING:
 		ret = tracing_bpf_link_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+		ret = netns_bpf_link_create(attr, prog);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 974ca6e948e3..d73d3d6596da 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -236,6 +236,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_TRACING = 2,
 	BPF_LINK_TYPE_CGROUP = 3,
 	BPF_LINK_TYPE_ITER = 4,
+	BPF_LINK_TYPE_NETNS = 5,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -3835,6 +3836,10 @@ struct bpf_link_info {
 			__u64 cgroup_id;
 			__u32 attach_type;
 		} cgroup;
+		struct  {
+			__u32 netns_ino;
+			__u32 attach_type;
+		} netns;
 	};
 } __attribute__((aligned(8)));
 
-- 
2.25.4



* [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:31   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Lorenz Bauer

Failure to update a bpf_link because it has been auto-detached by a dying
cgroup currently results in an EINVAL error, even though the arguments
passed to the bpf() syscall are not wrong.

bpf_links attached to a netns will return ENOLINK in this case, which
carries the message that the link is no longer attached to anything.

Change cgroup bpf_links to do the same to keep the uAPI errors consistent.
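
For illustration only, a caller can now handle a stale link uniformly for
both cgroup and netns links; link_fd, new_prog_fd, and the two handlers
below are placeholders:

  if (bpf_link_update(link_fd, new_prog_fd, NULL)) {
          if (errno == ENOLINK)
                  /* target cgroup or netns is gone, link is defunct */
                  recreate_link();        /* placeholder handler */
          else
                  /* genuinely bad arguments, wrong old prog, etc. */
                  report_error(errno);    /* placeholder handler */
  }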

Fixes: 0c991ebc8c69 ("bpf: Implement bpf_prog replacement for an active bpf_cgroup_link")
Suggested-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 kernel/bpf/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5c0e964105ac..fdf7836750a3 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -595,7 +595,7 @@ static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
 	mutex_lock(&cgroup_mutex);
 	/* link might have been auto-released by dying cgroup, so fail */
 	if (!cg_link->cgroup) {
-		ret = -EINVAL;
+		ret = -ENOLINK;
 		goto out_unlock;
 	}
 	if (old_prog && link->prog != old_prog) {
-- 
2.25.4



* [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (4 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:32   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type Jakub Sitnicki
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Add bpf_program__attach_netns(), which uses the LINK_CREATE subcommand to
create an FD-based kernel bpf_link for attach types tied to a network
namespace, that is BPF_FLOW_DISSECTOR for the moment.
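
Minimal usage sketch (assuming prog points to an already loaded program
from a SEC("flow_dissector") section; error handling abbreviated):

  int netns_fd = open("/proc/self/ns/net", O_RDONLY);
  struct bpf_link *link;

  link = bpf_program__attach_netns(prog, netns_fd);
  if (libbpf_get_error(link)) {
          /* e.g. -EEXIST if a prog is already attached directly */
          close(netns_fd);
          return -1;
  }

  /* prog stays attached to the netns for the lifetime of the link */

  bpf_link__destroy(link); /* detaches the prog from the netns */
  close(netns_fd);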

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/lib/bpf/libbpf.c   | 23 ++++++++++++++++++-----
 tools/lib/bpf/libbpf.h   |  2 ++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 74d967619dcf..b8af2f91b7c3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7894,8 +7894,9 @@ static struct bpf_link *attach_iter(const struct bpf_sec_def *sec,
 	return bpf_program__attach_iter(prog, NULL);
 }
 
-struct bpf_link *
-bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
+static struct bpf_link *
+bpf_program__attach_fd(struct bpf_program *prog, int target_fd,
+		       const char *target_name)
 {
 	enum bpf_attach_type attach_type;
 	char errmsg[STRERR_BUFSIZE];
@@ -7915,12 +7916,12 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
 	link->detach = &bpf_link__detach_fd;
 
 	attach_type = bpf_program__get_expected_attach_type(prog);
-	link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
+	link_fd = bpf_link_create(prog_fd, target_fd, attach_type, NULL);
 	if (link_fd < 0) {
 		link_fd = -errno;
 		free(link);
-		pr_warn("program '%s': failed to attach to cgroup: %s\n",
-			bpf_program__title(prog, false),
+		pr_warn("program '%s': failed to attach to %s: %s\n",
+			bpf_program__title(prog, false), target_name,
 			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
 		return ERR_PTR(link_fd);
 	}
@@ -7928,6 +7929,18 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
 	return link;
 }
 
+struct bpf_link *
+bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
+{
+	return bpf_program__attach_fd(prog, cgroup_fd, "cgroup");
+}
+
+struct bpf_link *
+bpf_program__attach_netns(struct bpf_program *prog, int netns_fd)
+{
+	return bpf_program__attach_fd(prog, netns_fd, "netns");
+}
+
 struct bpf_link *
 bpf_program__attach_iter(struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8528a02d5af8..334437af3014 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -253,6 +253,8 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_lsm(struct bpf_program *prog);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_netns(struct bpf_program *prog, int netns_fd);
 
 struct bpf_map;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c18860200abb..f732c77b7ed0 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -262,6 +262,7 @@ LIBBPF_0.0.9 {
 		bpf_link_get_fd_by_id;
 		bpf_link_get_next_id;
 		bpf_program__attach_iter;
+		bpf_program__attach_netns;
 		perf_buffer__consume;
 		ring_buffer__add;
 		ring_buffer__consume;
-- 
2.25.4



* [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (5 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:35   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links Jakub Sitnicki
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team, Andrii Nakryiko

Code for printing the link attach_type is duplicated in a couple of places,
and will likely be duplicated for future link types as well. Create helpers
to prevent the duplication.

Suggested-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/bpf/bpftool/link.c | 44 ++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
index 670a561dc31b..1ff416eff3d7 100644
--- a/tools/bpf/bpftool/link.c
+++ b/tools/bpf/bpftool/link.c
@@ -62,6 +62,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
 	jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
 }
 
+static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)
+{
+	if (attach_type < ARRAY_SIZE(attach_type_name))
+		jsonw_string_field(wtr, "attach_type",
+				   attach_type_name[attach_type]);
+	else
+		jsonw_uint_field(wtr, "attach_type", attach_type);
+}
+
 static int get_prog_info(int prog_id, struct bpf_prog_info *info)
 {
 	__u32 len = sizeof(*info);
@@ -105,22 +114,13 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
 			jsonw_uint_field(json_wtr, "prog_type",
 					 prog_info.type);
 
-		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
-			jsonw_string_field(json_wtr, "attach_type",
-			       attach_type_name[info->tracing.attach_type]);
-		else
-			jsonw_uint_field(json_wtr, "attach_type",
-					 info->tracing.attach_type);
+		show_link_attach_type_json(info->tracing.attach_type,
+					   json_wtr);
 		break;
 	case BPF_LINK_TYPE_CGROUP:
 		jsonw_lluint_field(json_wtr, "cgroup_id",
 				   info->cgroup.cgroup_id);
-		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
-			jsonw_string_field(json_wtr, "attach_type",
-			       attach_type_name[info->cgroup.attach_type]);
-		else
-			jsonw_uint_field(json_wtr, "attach_type",
-					 info->cgroup.attach_type);
+		show_link_attach_type_json(info->cgroup.attach_type, json_wtr);
 		break;
 	default:
 		break;
@@ -153,6 +153,14 @@ static void show_link_header_plain(struct bpf_link_info *info)
 	printf("prog %u  ", info->prog_id);
 }
 
+static void show_link_attach_type_plain(__u32 attach_type)
+{
+	if (attach_type < ARRAY_SIZE(attach_type_name))
+		printf("attach_type %s  ", attach_type_name[attach_type]);
+	else
+		printf("attach_type %u  ", attach_type);
+}
+
 static int show_link_close_plain(int fd, struct bpf_link_info *info)
 {
 	struct bpf_prog_info prog_info;
@@ -176,19 +184,11 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
 		else
 			printf("\n\tprog_type %u  ", prog_info.type);
 
-		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
-			printf("attach_type %s  ",
-			       attach_type_name[info->tracing.attach_type]);
-		else
-			printf("attach_type %u  ", info->tracing.attach_type);
+		show_link_attach_type_plain(info->tracing.attach_type);
 		break;
 	case BPF_LINK_TYPE_CGROUP:
 		printf("\n\tcgroup_id %zu  ", (size_t)info->cgroup.cgroup_id);
-		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
-			printf("attach_type %s  ",
-			       attach_type_name[info->cgroup.attach_type]);
-		else
-			printf("attach_type %u  ", info->cgroup.attach_type);
+		show_link_attach_type_plain(info->cgroup.attach_type);
 		break;
 	default:
 		break;
-- 
2.25.4



* [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (6 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:38   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 09/12] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Make `bpf link show` aware of new link type, that is links attached to
netns. When listing netns-attached links, display netns inode number as its
identifier and link attach type.

Sample session:

  # readlink /proc/self/ns/net
  net:[4026532251]
  # bpftool prog show
  357: flow_dissector  tag a04f5eef06a7f555  gpl
          loaded_at 2020-05-30T16:53:51+0200  uid 0
          xlated 16B  jited 37B  memlock 4096B
  358: flow_dissector  tag a04f5eef06a7f555  gpl
          loaded_at 2020-05-30T16:53:51+0200  uid 0
          xlated 16B  jited 37B  memlock 4096B
  # bpftool link show
  108: netns  prog 357
          netns_ino 4026532251  attach_type flow_dissector
  # bpftool link -jp show
  [{
          "id": 108,
          "type": "netns",
          "prog_id": 357,
          "netns_ino": 4026532251,
          "attach_type": "flow_dissector"
      }
  ]

  (... after netns is gone ...)

  # bpftool link show
  108: netns  prog 357
          netns_ino 0  attach_type flow_dissector
  # bpftool link -jp show
  [{
          "id": 108,
          "type": "netns",
          "prog_id": 357,
          "netns_ino": 0,
          "attach_type": "flow_dissector"
      }
  ]

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/bpf/bpftool/link.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
index 1ff416eff3d7..fca57ee8fafe 100644
--- a/tools/bpf/bpftool/link.c
+++ b/tools/bpf/bpftool/link.c
@@ -17,6 +17,7 @@ static const char * const link_type_name[] = {
 	[BPF_LINK_TYPE_TRACING]			= "tracing",
 	[BPF_LINK_TYPE_CGROUP]			= "cgroup",
 	[BPF_LINK_TYPE_ITER]			= "iter",
+	[BPF_LINK_TYPE_NETNS]			= "netns",
 };
 
 static int link_parse_fd(int *argc, char ***argv)
@@ -122,6 +123,11 @@ static int show_link_close_json(int fd, struct bpf_link_info *info)
 				   info->cgroup.cgroup_id);
 		show_link_attach_type_json(info->cgroup.attach_type, json_wtr);
 		break;
+	case BPF_LINK_TYPE_NETNS:
+		jsonw_uint_field(json_wtr, "netns_ino",
+				 info->netns.netns_ino);
+		show_link_attach_type_json(info->netns.attach_type, json_wtr);
+		break;
 	default:
 		break;
 	}
@@ -190,6 +196,10 @@ static int show_link_close_plain(int fd, struct bpf_link_info *info)
 		printf("\n\tcgroup_id %zu  ", (size_t)info->cgroup.cgroup_id);
 		show_link_attach_type_plain(info->cgroup.attach_type);
 		break;
+	case BPF_LINK_TYPE_NETNS:
+		printf("\n\tnetns_ino %u  ", info->netns.netns_ino);
+		show_link_attach_type_plain(info->netns.attach_type);
+		break;
 	default:
 		break;
 	}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v2 09/12] selftests/bpf: Add tests for attaching bpf_link to netns
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (7 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 10/12] selftests/bpf, flow_dissector: Close TAP device FD after the test Jakub Sitnicki
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Extend the existing test case for flow dissector attachment to cover:

 - link creation,
 - link updates,
 - link info querying,
 - mixing links with direct prog attachment.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../bpf/prog_tests/flow_dissector_reattach.c  | 588 ++++++++++++++++--
 1 file changed, 551 insertions(+), 37 deletions(-)
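
For context, the new subtests drive roughly the following low-level bpf_link
flow (prog_fd, prog2_fd and netns_fd stand in for descriptors the test sets
up):

  DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
  DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
  int link_fd;

  /* attach prog_fd to the netns through a link */
  link_fd = bpf_link_create(prog_fd, netns_fd, BPF_FLOW_DISSECTOR,
                            &create_opts);

  /* atomically replace the attached program, verifying the old one */
  update_opts.flags = BPF_F_REPLACE;
  update_opts.old_prog_fd = prog_fd;
  bpf_link_update(link_fd, prog2_fd, &update_opts);

  /* closing the last link FD detaches the program from the netns */
  close(link_fd);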

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
index 1f51ba66b98b..15cb554a66d8 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector_reattach.c
@@ -11,6 +11,7 @@
 #include <fcntl.h>
 #include <sched.h>
 #include <stdbool.h>
+#include <sys/stat.h>
 #include <unistd.h>
 
 #include <linux/bpf.h>
@@ -18,21 +19,30 @@
 
 #include "test_progs.h"
 
-static bool is_attached(int netns)
+static int init_net = -1;
+
+static __u32 query_attached_prog_id(int netns)
 {
-	__u32 cnt;
+	__u32 prog_ids[1] = {};
+	__u32 prog_cnt = ARRAY_SIZE(prog_ids);
 	int err;
 
-	err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL, NULL, &cnt);
+	err = bpf_prog_query(netns, BPF_FLOW_DISSECTOR, 0, NULL,
+			     prog_ids, &prog_cnt);
 	if (CHECK_FAIL(err)) {
 		perror("bpf_prog_query");
-		return true; /* fail-safe */
+		return 0;
 	}
 
-	return cnt > 0;
+	return prog_cnt == 1 ? prog_ids[0] : 0;
+}
+
+static bool prog_is_attached(int netns)
+{
+	return query_attached_prog_id(netns) > 0;
 }
 
-static int load_prog(void)
+static int load_prog(enum bpf_prog_type type)
 {
 	struct bpf_insn prog[] = {
 		BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
@@ -40,61 +50,566 @@ static int load_prog(void)
 	};
 	int fd;
 
-	fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
-			      ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
+	fd = bpf_load_program(type, prog, ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
 	if (CHECK_FAIL(fd < 0))
 		perror("bpf_load_program");
 
 	return fd;
 }
 
-static void do_flow_dissector_reattach(void)
+static __u32 query_prog_id(int prog)
 {
-	int prog_fd[2] = { -1, -1 };
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	int err;
 
-	prog_fd[0] = load_prog();
-	if (prog_fd[0] < 0)
-		return;
+	err = bpf_obj_get_info_by_fd(prog, &info, &info_len);
+	if (CHECK_FAIL(err || info_len != sizeof(info))) {
+		perror("bpf_obj_get_info_by_fd");
+		return 0;
+	}
 
-	prog_fd[1] = load_prog();
-	if (prog_fd[1] < 0)
-		goto out_close;
+	return info.id;
+}
+
+static int unshare_net(int old_net)
+{
+	int err, new_net;
 
-	err = bpf_prog_attach(prog_fd[0], 0, BPF_FLOW_DISSECTOR, 0);
+	err = unshare(CLONE_NEWNET);
 	if (CHECK_FAIL(err)) {
-		perror("bpf_prog_attach-0");
-		goto out_close;
+		perror("unshare(CLONE_NEWNET)");
+		return -1;
+	}
+	new_net = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(new_net < 0)) {
+		perror("open(/proc/self/ns/net)");
+		setns(old_net, CLONE_NEWNET);
+		return -1;
 	}
+	return new_net;
+}
+
+static void test_prog_attach_prog_attach(int netns, int prog1, int prog2)
+{
+	int err;
+
+	err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_attach(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
 
 	/* Expect success when attaching a different program */
-	err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
+	err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
 	if (CHECK_FAIL(err)) {
-		perror("bpf_prog_attach-1");
+		perror("bpf_prog_attach(prog2) #1");
 		goto out_detach;
 	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
 
 	/* Expect failure when attaching the same program twice */
-	err = bpf_prog_attach(prog_fd[1], 0, BPF_FLOW_DISSECTOR, 0);
+	err = bpf_prog_attach(prog2, 0, BPF_FLOW_DISSECTOR, 0);
 	if (CHECK_FAIL(!err || errno != EINVAL))
-		perror("bpf_prog_attach-2");
+		perror("bpf_prog_attach(prog2) #2");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
 
 out_detach:
 	err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
 	if (CHECK_FAIL(err))
 		perror("bpf_prog_detach");
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_create_link_create(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int link1, link2;
+
+	link1 = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link1 < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect failure creating link when another link exists */
+	errno = 0;
+	link2 = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link2 != -1 || errno != E2BIG))
+		perror("bpf_prog_attach(prog2) expected E2BIG");
+	if (link2 != -1)
+		close(link2);
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(link1);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_prog_attach_link_create(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	err = bpf_prog_attach(prog1, -1, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_attach(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect failure creating link when prog attached */
+	errno = 0;
+	link = bpf_link_create(prog2, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link != -1 || errno != EEXIST))
+		perror("bpf_link_create(prog2) expected EEXIST");
+	if (link != -1)
+		close(link);
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(err))
+		perror("bpf_prog_detach");
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_create_prog_attach(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect failure attaching prog when link exists */
+	errno = 0;
+	err = bpf_prog_attach(prog2, -1, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(!err || errno != EEXIST))
+		perror("bpf_prog_attach(prog2) expected EEXIST");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_create_prog_detach(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect failure detaching prog when link exists */
+	errno = 0;
+	err = bpf_prog_detach(-1, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(!err || errno != EINVAL))
+		perror("bpf_prog_detach expected EINVAL");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_prog_attach_detach_query(int netns, int prog1, int prog2)
+{
+	int err;
+
+	err = bpf_prog_attach(prog1, 0, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_attach(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	err = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_prog_detach");
+		return;
+	}
+
+	/* Expect no prog attached after successful detach */
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_create_close_query(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	int link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(link);
+	/* Expect no prog attached after closing last link FD */
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_update_no_old_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect success replacing the prog when old prog not specified */
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err))
+		perror("bpf_link_update");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
+
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_update_replace_old_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
 
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect update with F_REPLACE and old prog specified to succeed */
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = prog1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err))
+		perror("bpf_link_update");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog2));
+
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_update_invalid_opts(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect update to fail w/ old prog FD but w/o F_REPLACE */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = prog1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL)) {
+		perror("bpf_link_update expected EINVAL");
+		goto out_close;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect update to fail on old prog FD mismatch */
+	errno = 0;
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = prog2;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EPERM)) {
+		perror("bpf_link_update expected EPERM");
+		goto out_close;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect update to fail for invalid old prog FD */
+	errno = 0;
+	update_opts.flags = BPF_F_REPLACE;
+	update_opts.old_prog_fd = -1;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EBADF)) {
+		perror("bpf_link_update expected EBADF");
+		goto out_close;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect update to fail with invalid flags */
+	errno = 0;
+	update_opts.flags = BPF_F_ALLOW_MULTI;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL))
+		perror("bpf_link_update expected EINVAL");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+out_close:
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_update_invalid_prog(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link, prog3;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	/* Expect failure when new prog FD is not valid */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, -1, &update_opts);
+	if (CHECK_FAIL(!err || errno != EBADF)) {
+		perror("bpf_link_update expected EINVAL");
+		goto out_close_link;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	prog3 = load_prog(BPF_PROG_TYPE_SOCKET_FILTER);
+	if (prog3 < 0)
+		goto out_close_link;
+
+	/* Expect failure when new prog FD type doesn't match */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog3, &update_opts);
+	if (CHECK_FAIL(!err || errno != EINVAL))
+		perror("bpf_link_update expected EINVAL");
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(prog3);
+out_close_link:
+	close(link);
+	CHECK_FAIL(prog_is_attached(netns));
+}
+
+static void test_link_update_netns_gone(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	int err, link, old_net;
+
+	old_net = netns;
+	netns = unshare_net(old_net);
+	if (netns < 0)
+		return;
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		return;
+	}
+	CHECK_FAIL(query_attached_prog_id(netns) != query_prog_id(prog1));
+
+	close(netns);
+	err = setns(old_net, CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		perror("setns(CLONE_NEWNET)");
+		close(link);
+		return;
+	}
+
+	/* Expect failure when netns destroyed */
+	errno = 0;
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(!err || errno != ENOLINK))
+		perror("bpf_link_update");
+
+	close(link);
+}
+
+static void test_link_get_info(int netns, int prog1, int prog2)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, create_opts);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, update_opts);
+	struct bpf_link_info info = {};
+	struct stat netns_stat = {};
+	__u32 info_len, link_id;
+	int err, link, old_net;
+
+	old_net = netns;
+	netns = unshare_net(old_net);
+	if (netns < 0)
+		return;
+
+	err = fstat(netns, &netns_stat);
+	if (CHECK_FAIL(err)) {
+		perror("stat(netns)");
+		goto out_resetns;
+	}
+
+	link = bpf_link_create(prog1, netns, BPF_FLOW_DISSECTOR, &create_opts);
+	if (CHECK_FAIL(link < 0)) {
+		perror("bpf_link_create(prog1)");
+		goto out_resetns;
+	}
+
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(link, &info, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info");
+		goto out_unlink;
+	}
+	CHECK_FAIL(info_len != sizeof(info));
+
+	/* Expect link info to be sane and match prog and netns details */
+	CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
+	CHECK_FAIL(info.id == 0);
+	CHECK_FAIL(info.prog_id != query_prog_id(prog1));
+	CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
+	CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
+
+	update_opts.flags = 0;
+	update_opts.old_prog_fd = 0;
+	err = bpf_link_update(link, prog2, &update_opts);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_link_update(prog2)");
+		goto out_unlink;
+	}
+
+	link_id = info.id;
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(link, &info, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info");
+		goto out_unlink;
+	}
+	CHECK_FAIL(info_len != sizeof(info));
+
+	/* Expect no info change after update except in prog id */
+	CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
+	CHECK_FAIL(info.id != link_id);
+	CHECK_FAIL(info.prog_id != query_prog_id(prog2));
+	CHECK_FAIL(info.netns.netns_ino != netns_stat.st_ino);
+	CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
+
+	/* Leave netns the link is attached to and close last FD to it */
+	err = setns(old_net, CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		perror("setns(NEWNET)");
+		goto out_unlink;
+	}
+	close(netns);
+	old_net = -1;
+	netns = -1;
+
+	info_len = sizeof(info);
+	err = bpf_obj_get_info_by_fd(link, &info, &info_len);
+	if (CHECK_FAIL(err)) {
+		perror("bpf_obj_get_info");
+		goto out_unlink;
+	}
+	CHECK_FAIL(info_len != sizeof(info));
+
+	/* Expect netns_ino to change to 0 */
+	CHECK_FAIL(info.type != BPF_LINK_TYPE_NETNS);
+	CHECK_FAIL(info.id != link_id);
+	CHECK_FAIL(info.prog_id != query_prog_id(prog2));
+	CHECK_FAIL(info.netns.netns_ino != 0);
+	CHECK_FAIL(info.netns.attach_type != BPF_FLOW_DISSECTOR);
+
+out_unlink:
+	close(link);
+out_resetns:
+	if (old_net != -1)
+		setns(old_net, CLONE_NEWNET);
+	if (netns != -1)
+		close(netns);
+}
+
+static void run_tests(int netns)
+{
+	struct test {
+		const char *test_name;
+		void (*test_func)(int netns, int prog1, int prog2);
+	} tests[] = {
+		{ "prog attach, prog attach",
+		  test_prog_attach_prog_attach },
+		{ "link create, link create",
+		  test_link_create_link_create },
+		{ "prog attach, link create",
+		  test_prog_attach_link_create },
+		{ "link create, prog attach",
+		  test_link_create_prog_attach },
+		{ "link create, prog detach",
+		  test_link_create_prog_detach },
+		{ "prog attach, detach, query",
+		  test_prog_attach_detach_query },
+		{ "link create, close, query",
+		  test_link_create_close_query },
+		{ "link update no old prog",
+		  test_link_update_no_old_prog },
+		{ "link update with replace old prog",
+		  test_link_update_replace_old_prog },
+		{ "link update invalid opts",
+		  test_link_update_invalid_opts },
+		{ "link update invalid prog",
+		  test_link_update_invalid_prog },
+		{ "link update netns gone",
+		  test_link_update_netns_gone },
+		{ "link get info",
+		  test_link_get_info },
+	};
+	int i, progs[2] = { -1, -1 };
+	char test_name[80];
+
+	for (i = 0; i < ARRAY_SIZE(progs); i++) {
+		progs[i] = load_prog(BPF_PROG_TYPE_FLOW_DISSECTOR);
+		if (progs[i] < 0)
+			goto out_close;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		snprintf(test_name, sizeof(test_name),
+			 "flow dissector %s%s",
+			 tests[i].test_name,
+			 netns == init_net ? " (init_net)" : "");
+		if (test__start_subtest(test_name))
+			tests[i].test_func(netns, progs[0], progs[1]);
+	}
 out_close:
-	close(prog_fd[1]);
-	close(prog_fd[0]);
+	for (i = 0; i < ARRAY_SIZE(progs); i++) {
+		if (progs[i] != -1)
+			CHECK_FAIL(close(progs[i]));
+	}
 }
 
 void test_flow_dissector_reattach(void)
 {
-	int init_net, self_net, err;
+	int err, new_net, saved_net;
 
-	self_net = open("/proc/self/ns/net", O_RDONLY);
-	if (CHECK_FAIL(self_net < 0)) {
+	saved_net = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(saved_net < 0)) {
 		perror("open(/proc/self/ns/net");
 		return;
 	}
@@ -111,30 +626,29 @@ void test_flow_dissector_reattach(void)
 		goto out_close;
 	}
 
-	if (is_attached(init_net)) {
+	if (prog_is_attached(init_net)) {
 		test__skip();
 		printf("Can't test with flow dissector attached to init_net\n");
 		goto out_setns;
 	}
 
 	/* First run tests in root network namespace */
-	do_flow_dissector_reattach();
+	run_tests(init_net);
 
 	/* Then repeat tests in a non-root namespace */
-	err = unshare(CLONE_NEWNET);
-	if (CHECK_FAIL(err)) {
-		perror("unshare(CLONE_NEWNET)");
+	new_net = unshare_net(init_net);
+	if (new_net < 0)
 		goto out_setns;
-	}
-	do_flow_dissector_reattach();
+	run_tests(new_net);
+	close(new_net);
 
 out_setns:
 	/* Move back to netns we started in. */
-	err = setns(self_net, CLONE_NEWNET);
+	err = setns(saved_net, CLONE_NEWNET);
 	if (CHECK_FAIL(err))
 		perror("setns(/proc/self/ns/net)");
 
 out_close:
 	close(init_net);
-	close(self_net);
+	close(saved_net);
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v2 10/12] selftests/bpf, flow_dissector: Close TAP device FD after the test
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (8 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 09/12] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-05-31  8:28 ` [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton Jakub Sitnicki
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

test_flow_dissector leaves a TAP device after it's finished, potentially
interfering with other tests that will run after it. Fix it by closing the
TAP descriptor on cleanup.

Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/flow_dissector.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index 2301c4d3ecec..ef5aab2f60b5 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -524,6 +524,7 @@ void test_flow_dissector(void)
 		CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
 	}
 
+	close(tap_fd);
 	bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
 	bpf_object__close(obj);
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (9 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 10/12] selftests/bpf, flow_dissector: Close TAP device FD after the test Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:42   ` Andrii Nakryiko
  2020-05-31  8:28 ` [PATCH bpf-next v2 12/12] selftests/bpf: Extend test_flow_dissector to cover link creation Jakub Sitnicki
  2020-06-01 22:26 ` [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Alexei Starovoitov
  12 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Switch flow dissector test setup from custom BPF object loader to BPF
skeleton to save boilerplate and prepare for testing higher-level API for
attaching flow dissector with bpf_link.

To avoid depending on program order in the BPF object when populating the
flow dissector PROG_ARRAY map, change the program section names to contain
the program index into the map. This follows the example set by tailcall
tests.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/prog_tests/flow_dissector.c | 50 +++++++++++++++++--
 tools/testing/selftests/bpf/progs/bpf_flow.c  | 20 ++++----
 2 files changed, 55 insertions(+), 15 deletions(-)
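
With the skeleton in place, the test setup boils down to roughly this
(error handling elided; prog_fd, keys_fd and err are locals in the test):

  struct bpf_flow *skel;

  skel = bpf_flow__open_and_load();
  prog_fd = bpf_program__fd(skel->progs._dissect);
  keys_fd = bpf_map__fd(skel->maps.last_dissection);
  /* populate the tail call map from the "flow_dissector/<N>" programs */
  err = init_prog_array(skel->obj, skel->maps.jmp_table);

  /* ... run the tests ... */

  bpf_flow__destroy(skel);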

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index ef5aab2f60b5..b6370c0b3b7a 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -6,6 +6,8 @@
 #include <linux/if_tun.h>
 #include <sys/uio.h>
 
+#include "bpf_flow.skel.h"
+
 #ifndef IP_MF
 #define IP_MF 0x2000
 #endif
@@ -444,17 +446,54 @@ static int ifup(const char *ifname)
 	return 0;
 }
 
+static int init_prog_array(struct bpf_object *obj, struct bpf_map *prog_array)
+{
+	int i, err, map_fd, prog_fd;
+	struct bpf_program *prog;
+	char prog_name[32];
+
+	map_fd = bpf_map__fd(prog_array);
+	if (map_fd < 0)
+		return -1;
+
+	for (i = 0; i < bpf_map__def(prog_array)->max_entries; i++) {
+		snprintf(prog_name, sizeof(prog_name), "flow_dissector/%i", i);
+
+		prog = bpf_object__find_program_by_title(obj, prog_name);
+		if (!prog)
+			return -1;
+
+		prog_fd = bpf_program__fd(prog);
+		if (prog_fd < 0)
+			return -1;
+
+		err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+		if (err)
+			return -1;
+	}
+	return 0;
+}
+
 void test_flow_dissector(void)
 {
 	int i, err, prog_fd, keys_fd = -1, tap_fd;
-	struct bpf_object *obj;
+	struct bpf_flow *skel;
 	__u32 duration = 0;
 
-	err = bpf_flow_load(&obj, "./bpf_flow.o", "flow_dissector",
-			    "jmp_table", "last_dissection", &prog_fd, &keys_fd);
-	if (CHECK_FAIL(err))
+	skel = bpf_flow__open_and_load();
+	if (CHECK(!skel, "skel", "failed to open/load skeleton\n"))
 		return;
 
+	prog_fd = bpf_program__fd(skel->progs._dissect);
+	if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
+		goto out_destroy_skel;
+	keys_fd = bpf_map__fd(skel->maps.last_dissection);
+	if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
+		goto out_destroy_skel;
+	err = init_prog_array(skel->obj, skel->maps.jmp_table);
+	if (CHECK(err, "init_prog_array", "err %d\n", err))
+		goto out_destroy_skel;
+
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
 		struct bpf_flow_keys flow_keys;
 		struct bpf_prog_test_run_attr tattr = {
@@ -526,5 +565,6 @@ void test_flow_dissector(void)
 
 	close(tap_fd);
 	bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
-	bpf_object__close(obj);
+out_destroy_skel:
+	bpf_flow__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
index 9941f0ba471e..de6de9221518 100644
--- a/tools/testing/selftests/bpf/progs/bpf_flow.c
+++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
@@ -20,20 +20,20 @@
 #include <bpf/bpf_endian.h>
 
 int _version SEC("version") = 1;
-#define PROG(F) SEC(#F) int bpf_func_##F
+#define PROG(F) PROG_(F, _##F)
+#define PROG_(NUM, NAME) SEC("flow_dissector/"#NUM) int bpf_func##NAME
 
 /* These are the identifiers of the BPF programs that will be used in tail
  * calls. Name is limited to 16 characters, with the terminating character and
  * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
  */
-enum {
-	IP,
-	IPV6,
-	IPV6OP,	/* Destination/Hop-by-Hop Options IPv6 Extension header */
-	IPV6FR,	/* Fragmentation IPv6 Extension Header */
-	MPLS,
-	VLAN,
-};
+#define IP		0
+#define IPV6		1
+#define IPV6OP		2 /* Destination/Hop-by-Hop Options IPv6 Ext. Header */
+#define IPV6FR		3 /* Fragmentation IPv6 Extension Header */
+#define MPLS		4
+#define VLAN		5
+#define MAX_PROG	6
 
 #define IP_MF		0x2000
 #define IP_OFFSET	0x1FFF
@@ -59,7 +59,7 @@ struct frag_hdr {
 
 struct {
 	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
-	__uint(max_entries, 8);
+	__uint(max_entries, MAX_PROG);
 	__uint(key_size, sizeof(__u32));
 	__uint(value_size, sizeof(__u32));
 } jmp_table SEC(".maps");
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v2 12/12] selftests/bpf: Extend test_flow_dissector to cover link creation
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (10 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton Jakub Sitnicki
@ 2020-05-31  8:28 ` Jakub Sitnicki
  2020-06-01 22:26 ` [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Alexei Starovoitov
  12 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-05-31  8:28 UTC (permalink / raw)
  To: bpf; +Cc: netdev, kernel-team

Extend the existing flow_dissector test case to run tests once using direct
prog attachment, and then a second time using indirect attachment
via link.

The intention is to exercise the newly added high-level API for attaching
programs to network namespace with links (bpf_program__attach_netns).

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/prog_tests/flow_dissector.c | 115 +++++++++++++-----
 1 file changed, 82 insertions(+), 33 deletions(-)
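
The link-based variant amounts to roughly the following, where skel is the
bpf_flow skeleton from the previous patch (error handling elided):

  struct bpf_link *link;
  int net_fd;

  net_fd = open("/proc/self/ns/net", O_RDONLY);
  link = bpf_program__attach_netns(skel->progs._dissect, net_fd);
  if (!IS_ERR(link)) {
          /* ... run the skb-less dissection tests ... */
          bpf_link__destroy(link);  /* detaches the flow dissector */
  }
  close(net_fd);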

diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
index b6370c0b3b7a..ea14e3ece812 100644
--- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
+++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
@@ -103,6 +103,7 @@ struct test {
 
 #define VLAN_HLEN	4
 
+static __u32 duration;
 struct test tests[] = {
 	{
 		.name = "ipv4",
@@ -474,11 +475,87 @@ static int init_prog_array(struct bpf_object *obj, struct bpf_map *prog_array)
 	return 0;
 }
 
+static void run_tests_skb_less(int tap_fd, struct bpf_map *keys)
+{
+	int i, err, keys_fd;
+
+	keys_fd = bpf_map__fd(keys);
+	if (CHECK(keys_fd < 0, "bpf_map__fd", "err %d\n", keys_fd))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		/* Keep in sync with 'flags' from eth_get_headlen. */
+		__u32 eth_get_headlen_flags =
+			BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
+		struct bpf_prog_test_run_attr tattr = {};
+		struct bpf_flow_keys flow_keys = {};
+		__u32 key = (__u32)(tests[i].keys.sport) << 16 |
+			    tests[i].keys.dport;
+
+		/* For skb-less case we can't pass input flags; run
+		 * only the tests that have a matching set of flags.
+		 */
+
+		if (tests[i].flags != eth_get_headlen_flags)
+			continue;
+
+		err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
+		CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
+
+		err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
+		CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
+
+		CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
+		CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
+
+		err = bpf_map_delete_elem(keys_fd, &key);
+		CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
+	}
+}
+
+static void test_skb_less_prog_attach(struct bpf_flow *skel, int tap_fd)
+{
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(skel->progs._dissect);
+	if (CHECK(prog_fd < 0, "bpf_program__fd", "err %d\n", prog_fd))
+		return;
+
+	err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
+	if (CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno))
+		return;
+
+	run_tests_skb_less(tap_fd, skel->maps.last_dissection);
+
+	err = bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
+	CHECK(err, "bpf_prog_detach", "err %d errno %d\n", err, errno);
+}
+
+static void test_skb_less_link_create(struct bpf_flow *skel, int tap_fd)
+{
+	struct bpf_link *link;
+	int err, net_fd;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK(net_fd < 0, "open(/proc/self/ns/net)", "err %d\n", errno))
+		return;
+
+	link = bpf_program__attach_netns(skel->progs._dissect, net_fd);
+	if (CHECK(IS_ERR(link), "attach_netns", "err %ld\n", PTR_ERR(link)))
+		goto out_close;
+
+	run_tests_skb_less(tap_fd, skel->maps.last_dissection);
+
+	err = bpf_link__destroy(link);
+	CHECK(err, "bpf_link__destroy", "err %d\n", err);
+out_close:
+	close(net_fd);
+}
+
 void test_flow_dissector(void)
 {
 	int i, err, prog_fd, keys_fd = -1, tap_fd;
 	struct bpf_flow *skel;
-	__u32 duration = 0;
 
 	skel = bpf_flow__open_and_load();
 	if (CHECK(!skel, "skel", "failed to open/load skeleton\n"))
@@ -526,45 +603,17 @@ void test_flow_dissector(void)
 	 * via BPF map in this case.
 	 */
 
-	err = bpf_prog_attach(prog_fd, 0, BPF_FLOW_DISSECTOR, 0);
-	CHECK(err, "bpf_prog_attach", "err %d errno %d\n", err, errno);
-
 	tap_fd = create_tap("tap0");
 	CHECK(tap_fd < 0, "create_tap", "tap_fd %d errno %d\n", tap_fd, errno);
 	err = ifup("tap0");
 	CHECK(err, "ifup", "err %d errno %d\n", err, errno);
 
-	for (i = 0; i < ARRAY_SIZE(tests); i++) {
-		/* Keep in sync with 'flags' from eth_get_headlen. */
-		__u32 eth_get_headlen_flags =
-			BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
-		struct bpf_prog_test_run_attr tattr = {};
-		struct bpf_flow_keys flow_keys = {};
-		__u32 key = (__u32)(tests[i].keys.sport) << 16 |
-			    tests[i].keys.dport;
-
-		/* For skb-less case we can't pass input flags; run
-		 * only the tests that have a matching set of flags.
-		 */
-
-		if (tests[i].flags != eth_get_headlen_flags)
-			continue;
-
-		err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
-		CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
-
-		err = bpf_map_lookup_elem(keys_fd, &key, &flow_keys);
-		CHECK_ATTR(err, tests[i].name, "bpf_map_lookup_elem %d\n", err);
-
-		CHECK_ATTR(err, tests[i].name, "skb-less err %d\n", err);
-		CHECK_FLOW_KEYS(tests[i].name, flow_keys, tests[i].keys);
-
-		err = bpf_map_delete_elem(keys_fd, &key);
-		CHECK_ATTR(err, tests[i].name, "bpf_map_delete_elem %d\n", err);
-	}
+	/* Test direct prog attachment */
+	test_skb_less_prog_attach(skel, tap_fd);
+	/* Test indirect prog attachment via link */
+	test_skb_less_link_create(skel, tap_fd);
 
 	close(tap_fd);
-	bpf_prog_detach(prog_fd, BPF_FLOW_DISSECTOR);
 out_destroy_skel:
 	bpf_flow__destroy(skel);
 }
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces
  2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
                   ` (11 preceding siblings ...)
  2020-05-31  8:28 ` [PATCH bpf-next v2 12/12] selftests/bpf: Extend test_flow_dissector to cover link creation Jakub Sitnicki
@ 2020-06-01 22:26 ` Alexei Starovoitov
  12 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2020-06-01 22:26 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Network Development, kernel-team, Andrii Nakryiko,
	Lorenz Bauer, Marek Majkowski, Stanislav Fomichev

On Sun, May 31, 2020 at 1:28 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> One of the pieces of feedback from recent review of BPF hooks for socket
> lookup [0] was that new program types should use bpf_link-based
> attachment.
>
> This series introduces new bpf_link type for attaching to network
> namespace. All link operations are supported. Errors returned from ops
> follow cgroup example. Patch 4 description goes into error semantics.
>
> The major change in v2 is a switch away from RCU to mutex-only
> synchronization. Andrii pointed out that it is not needed, and it makes
> sense to keep locking straightforward.
>
> Also, there were a couple of bugs in update_prog and fill_info initial
> implementation, one picked up by kbuild. Those are now fixed. Tests have
> been extended to cover them. Full changelog below.
>
> Series is organized as so:
>
> Patches 1-3 prepare a space in struct net to keep state for attached BPF
> programs, and massage the code in flow_dissector to make it attach type
> agnostic, to finally move it under kernel/bpf/.
>
> Patch 4, the most important one, introduces new bpf_link link type for
> attaching to network namespace.
>
> Patch 5 unifies the update error (ENOLINK) between BPF cgroup and netns.
>
> Patches 6-8 make libbpf and bpftool aware of the new link type.
>
> Patches 9-12 Add and extend tests to check that link low- and high-level
> API for operating on links to netns works as intended.
>
> Thanks to Alexei, Andrii, Lorenz, Marek, and Stanislav for feedback.
>
> -jkbs
>
> [0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/
>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Cc: Marek Majkowski <marek@cloudflare.com>
> Cc: Stanislav Fomichev <sdf@google.com>
>
> v1 -> v2:
>
> - Switch to mutex-only synchronization. Don't rely on RCU grace period
>   guarantee when accessing struct net from link release / update /
>   fill_info, and when accessing bpf_link from pernet pre_exit
>   callback. (Andrii)
> - Drop patch 1, no longer needed with mutex-only synchronization.
> - Don't leak uninitialized variable contents from fill_info callback
>   when link is in defunct state. (kbuild)
> - Make fill_info treat the link as defunct (i.e. no attached netns) when
>   struct net refcount is 0, but link has not been yet auto-detached.
> - Add missing BPF_LINK_TYPE define in bpf_types.h for new link type.
> - Fix link update_prog callback to update the prog that will run, and
>   not just the link itself.
> - Return EEXIST on prog attach when link already exists, and on link
>   create when prog is already attached directly. (Andrii)
> - Return EINVAL on prog detach when link is attached. (Andrii)
> - Fold __netns_bpf_link_attach into its only caller. (Stanislav)
> - Get rid of a wrapper around container_of() (Andrii)
> - Use rcu_dereference_protected instead of rcu_access_pointer on
>   update-side. (Stanislav)
> - Make return-on-success from netns_bpf_link_create less
>   confusing. (Andrii)
> - Adapt bpf_link for cgroup to return ENOLINK when updating a defunct
>   link. (Andrii, Alexei)
> - Order new exported symbols in libbpf.map alphabetically (Andrii)
> - Keep libbpf's "failed to attach link" warning message clear as to what
>   we failed to attach to (cgroup vs netns). (Andrii)
> - Extract helpers for printing link attach type. (bpftool, Andrii)
> - Switch flow_dissector tests to BPF skeleton and extend them to
>   exercise link-based flow dissector attachment. (Andrii)
> - Harden flow dissector attachment tests with prog query checks after
>   prog attach/detach, or link create/update/close.
> - Extend flow dissector tests to cover fill_info for defunct links.
> - Rebase onto recent bpf-next

I really like the set. Applied. Thanks!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace
  2020-05-31  8:28 ` [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
@ 2020-06-01 22:30   ` Andrii Nakryiko
  2020-06-02  9:30     ` Jakub Sitnicki
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:30 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Extend bpf() syscall subcommands that operate on bpf_link, that is
> LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
> network namespaces (only flow dissector at the moment).
>
> Link-based and prog-based attachment can be used interchangeably, but only
> one can exist at a time. Attempts to attach a link when a prog is already
> attached directly, and the other way around, will be met with -EEXIST.
> Attempts to detach a program when link exists result in -EINVAL.
>
> Attachment of multiple links of same attach type to one netns is not
> supported with the intention to lift the restriction when a use-case
> presents itself. Because of that link create returns -E2BIG when trying to
> create another netns link, when one already exists.
>
> Link-based attachments to netns don't keep a netns alive by holding a ref
> to it. Instead links get auto-detached from netns when the latter is being
> destroyed, using a pernet pre_exit callback.
>
> When auto-detached, link lives in defunct state as long there are open FDs
> for it. -ENOLINK is returned if a user tries to update a defunct link.
>
> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> taken when releasing, updating, or filling link info. The netns might be
> getting torn down when any of these link operations are in progress. That
> is why auto-detach and update/release/fill_info are synchronized by the
> same mutex. Also, link ops have to always check if auto-detach has not
> happened yet and if netns is still alive (refcnt > 0).
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/linux/bpf-netns.h      |   8 ++
>  include/linux/bpf_types.h      |   3 +
>  include/net/netns/bpf.h        |   1 +
>  include/uapi/linux/bpf.h       |   5 +
>  kernel/bpf/net_namespace.c     | 244 ++++++++++++++++++++++++++++++++-
>  kernel/bpf/syscall.c           |   3 +
>  tools/include/uapi/linux/bpf.h |   5 +
>  7 files changed, 267 insertions(+), 2 deletions(-)
>

[...]

> +
> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> +                                     struct bpf_prog *new_prog,
> +                                     struct bpf_prog *old_prog)
> +{
> +       struct bpf_netns_link *net_link =
> +               container_of(link, struct bpf_netns_link, link);
> +       enum netns_bpf_attach_type type = net_link->netns_type;
> +       struct net *net;
> +       int ret = 0;
> +
> +       if (old_prog && old_prog != link->prog)
> +               return -EPERM;
> +       if (new_prog->type != link->prog->type)
> +               return -EINVAL;
> +
> +       mutex_lock(&netns_bpf_mutex);
> +
> +       net = net_link->net;
> +       if (!net || !check_net(net)) {

As is, this check_net() check looks very racy. If we do worry about the
net refcnt dropping to zero, then that can happen between check_net()
and the subsequent accesses to net's fields. So if that's a possibility,
you should probably do maybe_get_net() instead.

But on the other hand, if we established that auto-detach taking the
mutex protects us from net going away, then maybe we shouldn't worry
about that at all, and thus check_net() is unnecessary and just
confuses everything.

I don't know enough about the overall net lifecycle, so I'm not sure
which one it is. But the way it is right now still looks suspicious to me.
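
Something like this is what I mean, in case the refcnt really can drop
to zero here (untested, just to illustrate the idea):

  net = net_link->net;
  if (net)
          net = maybe_get_net(net); /* take a ref only if refcnt > 0 */
  if (!net) {
          ret = -ENOLINK;
          goto out_unlock;
  }

  /* ... update link->prog and net->bpf.progs[type] ... */

  put_net(net);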

> +               /* Link auto-detached or netns dying */
> +               ret = -ENOLINK;
> +               goto out_unlock;
> +       }
> +
> +       old_prog = xchg(&link->prog, new_prog);
> +       rcu_assign_pointer(net->bpf.progs[type], new_prog);
> +       bpf_prog_put(old_prog);
> +
> +out_unlock:
> +       mutex_unlock(&netns_bpf_mutex);
> +       return ret;
> +}
> +
> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
> +                                   struct bpf_link_info *info)
> +{
> +       const struct bpf_netns_link *net_link =
> +               container_of(link, struct bpf_netns_link, link);
> +       unsigned int inum = 0;
> +       struct net *net;
> +
> +       mutex_lock(&netns_bpf_mutex);
> +       net = net_link->net;
> +       if (net && check_net(net))
> +               inum = net->ns.inum;
> +       mutex_unlock(&netns_bpf_mutex);
> +
> +       info->netns.netns_ino = inum;
> +       info->netns.attach_type = net_link->type;
> +       return 0;
> +}
> +
> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
> +                                      struct seq_file *seq)
> +{
> +       struct bpf_link_info info = {};

initialization here is probably not necessary, as long as you access
only fields that fill_info initializes.

> +
> +       bpf_netns_link_fill_info(link, &info);
> +       seq_printf(seq,
> +                  "netns_ino:\t%u\n"
> +                  "attach_type:\t%u\n",
> +                  info.netns.netns_ino,
> +                  info.netns.attach_type);
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update
  2020-05-31  8:28 ` [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update Jakub Sitnicki
@ 2020-06-01 22:31   ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:31 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team, Lorenz Bauer

On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Failure to update a bpf_link because it has been auto-detached by a dying
> cgroup currently results in EINVAL error, even though the arguments passed
> to bpf() syscall are not wrong.
>
> bpf_links attaching to netns in this case will return ENOLINK, which
> carries the message that the link is no longer attached to anything.
>
> Change cgroup bpf_links to do the same to keep the uAPI errors consistent.
>
> Fixes: 0c991ebc8c69 ("bpf: Implement bpf_prog replacement for an active bpf_cgroup_link")
> Suggested-by: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Looks good, thanks!

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  kernel/bpf/cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 5c0e964105ac..fdf7836750a3 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -595,7 +595,7 @@ static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
>         mutex_lock(&cgroup_mutex);
>         /* link might have been auto-released by dying cgroup, so fail */
>         if (!cg_link->cgroup) {
> -               ret = -EINVAL;
> +               ret = -ENOLINK;
>                 goto out_unlock;
>         }
>         if (old_prog && link->prog != old_prog) {
> --
> 2.25.4
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment
  2020-05-31  8:28 ` [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
@ 2020-06-01 22:32   ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:32 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add bpf_program__attach_nets(), which uses LINK_CREATE subcommand to create

typo: nets -> netns

> an FD-based kernel bpf_link, for attach types tied to network namespace,
> that is BPF_FLOW_DISSECTOR for the moment.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Looks good.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/lib/bpf/libbpf.c   | 23 ++++++++++++++++++-----
>  tools/lib/bpf/libbpf.h   |  2 ++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 21 insertions(+), 5 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type
  2020-05-31  8:28 ` [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type Jakub Sitnicki
@ 2020-06-01 22:35   ` Andrii Nakryiko
  2020-06-02  9:37     ` Jakub Sitnicki
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:35 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Sun, May 31, 2020 at 1:32 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Code for printing link attach_type is duplicated in a couple of places, and
> likely will be duplicated for future link types as well. Create helpers to
> prevent duplication.
>
> Suggested-by: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM, minor nit below.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/bpf/bpftool/link.c | 44 ++++++++++++++++++++--------------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
> index 670a561dc31b..1ff416eff3d7 100644
> --- a/tools/bpf/bpftool/link.c
> +++ b/tools/bpf/bpftool/link.c
> @@ -62,6 +62,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
>         jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
>  }
>
> +static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)

nit: if you look at jsonw_uint_field/jsonw_string_field, they accept
json_writer_t as the first argument, because they are sort of working on
an "object" json_writer_t. I think that's good and consistent. No big
deal, but if you can adjust it for consistency, it would be good.
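
I.e. something along the lines of:

  static void show_link_attach_type_json(json_writer_t *wtr,
                                         __u32 attach_type)

and then passing json_wtr as the first argument at the call sites.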

> +{
> +       if (attach_type < ARRAY_SIZE(attach_type_name))
> +               jsonw_string_field(wtr, "attach_type",
> +                                  attach_type_name[attach_type]);
> +       else
> +               jsonw_uint_field(wtr, "attach_type", attach_type);
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links
  2020-05-31  8:28 ` [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links Jakub Sitnicki
@ 2020-06-01 22:38   ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:38 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Sun, May 31, 2020 at 1:31 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Make `bpf link show` aware of new link type, that is links attached to
> netns. When listing netns-attached links, display netns inode number as its
> identifier and link attach type.
>
> Sample session:
>
>   # readlink /proc/self/ns/net
>   net:[4026532251]
>   # bpftool prog show
>   357: flow_dissector  tag a04f5eef06a7f555  gpl
>           loaded_at 2020-05-30T16:53:51+0200  uid 0
>           xlated 16B  jited 37B  memlock 4096B
>   358: flow_dissector  tag a04f5eef06a7f555  gpl
>           loaded_at 2020-05-30T16:53:51+0200  uid 0
>           xlated 16B  jited 37B  memlock 4096B
>   # bpftool link show
>   108: netns  prog 357
>           netns_ino 4026532251  attach_type flow_dissector
>   # bpftool link -jp show
>   [{
>           "id": 108,
>           "type": "netns",
>           "prog_id": 357,
>           "netns_ino": 4026532251,
>           "attach_type": "flow_dissector"
>       }
>   ]
>
>   (... after netns is gone ...)
>
>   # bpftool link show
>   108: netns  prog 357
>           netns_ino 0  attach_type flow_dissector
>   # bpftool link -jp show
>   [{
>           "id": 108,
>           "type": "netns",
>           "prog_id": 357,
>           "netns_ino": 0,
>           "attach_type": "flow_dissector"
>       }
>   ]
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM. Not sure if bpftool allows filtering progs/maps by type, but
probably we should eventually add link type filtering ("show me only
netns links")...

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/bpf/bpftool/link.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>

[...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton
  2020-05-31  8:28 ` [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton Jakub Sitnicki
@ 2020-06-01 22:42   ` Andrii Nakryiko
  2020-06-02  9:46     ` Jakub Sitnicki
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-01 22:42 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Switch flow dissector test setup from custom BPF object loader to BPF
> skeleton to save boilerplate and prepare for testing higher-level API for
> attaching flow dissector with bpf_link.
>
> To avoid depending on program order in the BPF object when populating the
> flow dissector PROG_ARRAY map, change the program section names to contain
> the program index into the map. This follows the example set by tailcall
> tests.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  .../selftests/bpf/prog_tests/flow_dissector.c | 50 +++++++++++++++++--
>  tools/testing/selftests/bpf/progs/bpf_flow.c  | 20 ++++----
>  2 files changed, 55 insertions(+), 15 deletions(-)
>

[...]

> diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
> index 9941f0ba471e..de6de9221518 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_flow.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
> @@ -20,20 +20,20 @@
>  #include <bpf/bpf_endian.h>
>
>  int _version SEC("version") = 1;
> -#define PROG(F) SEC(#F) int bpf_func_##F
> +#define PROG(F) PROG_(F, _##F)
> +#define PROG_(NUM, NAME) SEC("flow_dissector/"#NUM) int bpf_func##NAME
>
>  /* These are the identifiers of the BPF programs that will be used in tail
>   * calls. Name is limited to 16 characters, with the terminating character and
>   * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
>   */
> -enum {
> -       IP,
> -       IPV6,
> -       IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
> -       IPV6FR, /* Fragmentation IPv6 Extension Header */
> -       MPLS,
> -       VLAN,
> -};

not clear why? just add MAX_PROG after VLAN?
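
I.e. why not keep the enum and just do:

  enum {
          IP,
          IPV6,
          IPV6OP,  /* Destination/Hop-by-Hop Options IPv6 Extension header */
          IPV6FR,  /* Fragmentation IPv6 Extension Header */
          MPLS,
          VLAN,
          MAX_PROG,
  };

which would still work for __uint(max_entries, MAX_PROG)?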

> +#define IP             0
> +#define IPV6           1
> +#define IPV6OP         2 /* Destination/Hop-by-Hop Options IPv6 Ext. Header */
> +#define IPV6FR         3 /* Fragmentation IPv6 Extension Header */
> +#define MPLS           4
> +#define VLAN           5
> +#define MAX_PROG       6
>
>  #define IP_MF          0x2000
>  #define IP_OFFSET      0x1FFF
> @@ -59,7 +59,7 @@ struct frag_hdr {
>
>  struct {
>         __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> -       __uint(max_entries, 8);
> +       __uint(max_entries, MAX_PROG);
>         __uint(key_size, sizeof(__u32));
>         __uint(value_size, sizeof(__u32));
>  } jmp_table SEC(".maps");
> --
> 2.25.4
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace
  2020-06-01 22:30   ` Andrii Nakryiko
@ 2020-06-02  9:30     ` Jakub Sitnicki
  2020-06-02 17:38       ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-06-02  9:30 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Tue, Jun 02, 2020 at 12:30 AM CEST, Andrii Nakryiko wrote:
> On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Extend bpf() syscall subcommands that operate on bpf_link, that is
>> LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
>> network namespaces (only flow dissector at the moment).
>>
>> Link-based and prog-based attachment can be used interchangeably, but only
>> one can exist at a time. Attempts to attach a link when a prog is already
>> attached directly, and the other way around, will be met with -EEXIST.
>> Attempts to detach a program when link exists result in -EINVAL.
>>
>> Attachment of multiple links of same attach type to one netns is not
>> supported with the intention to lift the restriction when a use-case
>> presents itself. Because of that link create returns -E2BIG when trying to
>> create another netns link, when one already exists.
>>
>> Link-based attachments to netns don't keep a netns alive by holding a ref
>> to it. Instead links get auto-detached from netns when the latter is being
>> destroyed, using a pernet pre_exit callback.
>>
>> When auto-detached, link lives in defunct state as long as there are open
>> FDs for it. -ENOLINK is returned if a user tries to update a defunct link.
>>
>> Because bpf_link to netns doesn't hold a ref to struct net, special care is
>> taken when releasing, updating, or filling link info. The netns might be
>> getting torn down when any of these link operations are in progress. That
>> is why auto-detach and update/release/fill_info are synchronized by the
>> same mutex. Also, link ops have to always check if auto-detach has not
>> happened yet and if netns is still alive (refcnt > 0).
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/linux/bpf-netns.h      |   8 ++
>>  include/linux/bpf_types.h      |   3 +
>>  include/net/netns/bpf.h        |   1 +
>>  include/uapi/linux/bpf.h       |   5 +
>>  kernel/bpf/net_namespace.c     | 244 ++++++++++++++++++++++++++++++++-
>>  kernel/bpf/syscall.c           |   3 +
>>  tools/include/uapi/linux/bpf.h |   5 +
>>  7 files changed, 267 insertions(+), 2 deletions(-)
>>
>
> [...]
>
>> +
>> +static int bpf_netns_link_update_prog(struct bpf_link *link,
>> +                                     struct bpf_prog *new_prog,
>> +                                     struct bpf_prog *old_prog)
>> +{
>> +       struct bpf_netns_link *net_link =
>> +               container_of(link, struct bpf_netns_link, link);
>> +       enum netns_bpf_attach_type type = net_link->netns_type;
>> +       struct net *net;
>> +       int ret = 0;
>> +
>> +       if (old_prog && old_prog != link->prog)
>> +               return -EPERM;
>> +       if (new_prog->type != link->prog->type)
>> +               return -EINVAL;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +
>> +       net = net_link->net;
>> +       if (!net || !check_net(net)) {
>
> As is, this check_net() check looks very racy. Because if we do worry
> about net refcnt dropping to zero, then that can happen between
> check_net() and accessing net fields. So if that's a possibility, you
> should probably do maybe_get_net() instead.
>
> But on the other hand, if we established that auto-detach taking a
> mutex protects us from net going away, then maybe we shouldn't worry
> at all about that, and thus check_net() is unnecessary and just
> unnecessarily confusing everything.
>
> I don't know enough overall net lifecycle, so I'm not sure which one
> it is. But the way it is right now still looks suspicious to me.

The story behind the additional "!check_net(net)" test (in update_prog
and in fill_info) is that without it, user-space will see the link as
defunct only after some delay from the moment the last ref to the netns
is gone (for instance, when the last process has left the netns).

That is because netns is being destroyed from a workqueue [0].

This unpredictable delay makes testing the uAPI harder, and I expect it
would make using it harder as well. Since we can do better and check the
refcnt to detect "early" that the netns is no good any more, that's what
we do.

At least that was my thinking here. I was hoping the comment below the
check would be enough. Didn't mean to cause confusion.

So there is no race, the locking scheme you suggested holds.

[0] https://elixir.bootlin.com/linux/latest/source/net/core/net_namespace.c#L644

>
>> +               /* Link auto-detached or netns dying */
>> +               ret = -ENOLINK;
>> +               goto out_unlock;
>> +       }
>> +
>> +       old_prog = xchg(&link->prog, new_prog);
>> +       rcu_assign_pointer(net->bpf.progs[type], new_prog);
>> +       bpf_prog_put(old_prog);
>> +
>> +out_unlock:
>> +       mutex_unlock(&netns_bpf_mutex);
>> +       return ret;
>> +}
>> +
>> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
>> +                                   struct bpf_link_info *info)
>> +{
>> +       const struct bpf_netns_link *net_link =
>> +               container_of(link, struct bpf_netns_link, link);
>> +       unsigned int inum = 0;
>> +       struct net *net;
>> +
>> +       mutex_lock(&netns_bpf_mutex);
>> +       net = net_link->net;
>> +       if (net && check_net(net))
>> +               inum = net->ns.inum;
>> +       mutex_unlock(&netns_bpf_mutex);
>> +
>> +       info->netns.netns_ino = inum;
>> +       info->netns.attach_type = net_link->type;
>> +       return 0;
>> +}
>> +
>> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
>> +                                      struct seq_file *seq)
>> +{
>> +       struct bpf_link_info info = {};
>
> initialization here is probably not necessary, as long as you access
> only fields that fill_info initializes.

True. I figured it's better to be safe than to leak stack contents if
bpf_netns_link_fill_info ever gets broken.

>
>> +
>> +       bpf_netns_link_fill_info(link, &info);
>> +       seq_printf(seq,
>> +                  "netns_ino:\t%u\n"
>> +                  "attach_type:\t%u\n",
>> +                  info.netns.netns_ino,
>> +                  info.netns.attach_type);
>> +}
>> +
>
> [...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type
  2020-06-01 22:35   ` Andrii Nakryiko
@ 2020-06-02  9:37     ` Jakub Sitnicki
  2020-06-02 17:39       ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Sitnicki @ 2020-06-02  9:37 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Tue, Jun 02, 2020 at 12:35 AM CEST, Andrii Nakryiko wrote:
> On Sun, May 31, 2020 at 1:32 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Code for printing link attach_type is duplicated in a couple of places, and
>> likely will be duplicated for future link types as well. Create helpers to
>> prevent duplication.
>>
>> Suggested-by: Andrii Nakryiko <andriin@fb.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>
> LGTM, minor nit below.
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>
>
>>  tools/bpf/bpftool/link.c | 44 ++++++++++++++++++++--------------------
>>  1 file changed, 22 insertions(+), 22 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
>> index 670a561dc31b..1ff416eff3d7 100644
>> --- a/tools/bpf/bpftool/link.c
>> +++ b/tools/bpf/bpftool/link.c
>> @@ -62,6 +62,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
>>         jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
>>  }
>>
>> +static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)
>
> nit: if you look at jsonw_uint_field/jsonw_string_field, they accept
> json_write_t as a first argument, because they are sort of working on
> "object" json_writer_t. I think that's good and consistent. No big
> deal, but if you can adjust it for consistency, it would be good.

I followed show_link_header_json example here. I'm guessing the
intention was to keep show_link_header_json and show_link_header_plain
consistent, as the former takes an extra arg (wtr).

>
>> +{
>> +       if (attach_type < ARRAY_SIZE(attach_type_name))
>> +               jsonw_string_field(wtr, "attach_type",
>> +                                  attach_type_name[attach_type]);
>> +       else
>> +               jsonw_uint_field(wtr, "attach_type", attach_type);
>> +}
>> +
>
> [...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton
  2020-06-01 22:42   ` Andrii Nakryiko
@ 2020-06-02  9:46     ` Jakub Sitnicki
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Sitnicki @ 2020-06-02  9:46 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Networking, kernel-team

On Tue, Jun 02, 2020 at 12:42 AM CEST, Andrii Nakryiko wrote:
> On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Switch flow dissector test setup from custom BPF object loader to BPF
>> skeleton to save boilerplate and prepare for testing higher-level API for
>> attaching flow dissector with bpf_link.
>>
>> To avoid depending on program order in the BPF object when populating the
>> flow dissector PROG_ARRAY map, change the program section names to contain
>> the program index into the map. This follows the example set by tailcall
>> tests.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  .../selftests/bpf/prog_tests/flow_dissector.c | 50 +++++++++++++++++--
>>  tools/testing/selftests/bpf/progs/bpf_flow.c  | 20 ++++----
>>  2 files changed, 55 insertions(+), 15 deletions(-)
>>
>
> [...]
>
>> diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
>> index 9941f0ba471e..de6de9221518 100644
>> --- a/tools/testing/selftests/bpf/progs/bpf_flow.c
>> +++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
>> @@ -20,20 +20,20 @@
>>  #include <bpf/bpf_endian.h>
>>
>>  int _version SEC("version") = 1;
>> -#define PROG(F) SEC(#F) int bpf_func_##F
>> +#define PROG(F) PROG_(F, _##F)
>> +#define PROG_(NUM, NAME) SEC("flow_dissector/"#NUM) int bpf_func##NAME
>>
>>  /* These are the identifiers of the BPF programs that will be used in tail
>>   * calls. Name is limited to 16 characters, with the terminating character and
>>   * bpf_func_ above, we have only 6 to work with, anything after will be cropped.
>>   */
>> -enum {
>> -       IP,
>> -       IPV6,
>> -       IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
>> -       IPV6FR, /* Fragmentation IPv6 Extension Header */
>> -       MPLS,
>> -       VLAN,
>> -};
>
> not clear why? just add MAX_PROG after VLAN?

I wanted to change section names to:

  "flow_dissector/0"
  "flow_dissector/1"
  ...

while keeping the corresponding function names as:

  bpf_func_IP
  bpf_func_IPV6
  ...

For that I needed the preprocessor to know the value of the constant.
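
To make it concrete: with IP defined as 0, a program definition like

  PROG(IP)(struct __sk_buff *skb)

should now expand to roughly

  SEC("flow_dissector/0") int bpf_func_IP(struct __sk_buff *skb)

whereas with the enum, the # operator in PROG_() would have stringified
the enumerator name itself and produced "flow_dissector/IP", because
the preprocessor never sees enumerator values.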

>
>> +#define IP             0
>> +#define IPV6           1
>> +#define IPV6OP         2 /* Destination/Hop-by-Hop Options IPv6 Ext. Header */
>> +#define IPV6FR         3 /* Fragmentation IPv6 Extension Header */
>> +#define MPLS           4
>> +#define VLAN           5
>> +#define MAX_PROG       6
>>
>>  #define IP_MF          0x2000
>>  #define IP_OFFSET      0x1FFF
>> @@ -59,7 +59,7 @@ struct frag_hdr {
>>
>>  struct {
>>         __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
>> -       __uint(max_entries, 8);
>> +       __uint(max_entries, MAX_PROG);
>>         __uint(key_size, sizeof(__u32));
>>         __uint(value_size, sizeof(__u32));
>>  } jmp_table SEC(".maps");
>> --
>> 2.25.4
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace
  2020-06-02  9:30     ` Jakub Sitnicki
@ 2020-06-02 17:38       ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-02 17:38 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team

On Tue, Jun 2, 2020 at 2:30 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Tue, Jun 02, 2020 at 12:30 AM CEST, Andrii Nakryiko wrote:
> > On Sun, May 31, 2020 at 1:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Extend bpf() syscall subcommands that operate on bpf_link, that is
> >> LINK_CREATE, LINK_UPDATE, OBJ_GET_INFO, to accept attach types tied to
> >> network namespaces (only flow dissector at the moment).
> >>
> >> Link-based and prog-based attachment can be used interchangeably, but only
> >> one can exist at a time. Attempts to attach a link when a prog is already
> >> attached directly, and the other way around, will be met with -EEXIST.
> >> Attempts to detach a program when link exists result in -EINVAL.
> >>
> >> Attachment of multiple links of same attach type to one netns is not
> >> supported with the intention to lift the restriction when a use-case
> >> presents itself. Because of that link create returns -E2BIG when trying to
> >> create another netns link, when one already exists.
> >>
> >> Link-based attachments to netns don't keep a netns alive by holding a ref
> >> to it. Instead links get auto-detached from netns when the latter is being
> >> destroyed, using a pernet pre_exit callback.
> >>
> >> When auto-detached, link lives in defunct state as long as there are open
> >> FDs for it. -ENOLINK is returned if a user tries to update a defunct link.
> >>
> >> Because bpf_link to netns doesn't hold a ref to struct net, special care is
> >> taken when releasing, updating, or filling link info. The netns might be
> >> getting torn down when any of these link operations are in progress. That
> >> is why auto-detach and update/release/fill_info are synchronized by the
> >> same mutex. Also, link ops have to always check if auto-detach has not
> >> happened yet and if netns is still alive (refcnt > 0).
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >>  include/linux/bpf-netns.h      |   8 ++
> >>  include/linux/bpf_types.h      |   3 +
> >>  include/net/netns/bpf.h        |   1 +
> >>  include/uapi/linux/bpf.h       |   5 +
> >>  kernel/bpf/net_namespace.c     | 244 ++++++++++++++++++++++++++++++++-
> >>  kernel/bpf/syscall.c           |   3 +
> >>  tools/include/uapi/linux/bpf.h |   5 +
> >>  7 files changed, 267 insertions(+), 2 deletions(-)
> >>
> >
> > [...]
> >
> >> +
> >> +static int bpf_netns_link_update_prog(struct bpf_link *link,
> >> +                                     struct bpf_prog *new_prog,
> >> +                                     struct bpf_prog *old_prog)
> >> +{
> >> +       struct bpf_netns_link *net_link =
> >> +               container_of(link, struct bpf_netns_link, link);
> >> +       enum netns_bpf_attach_type type = net_link->netns_type;
> >> +       struct net *net;
> >> +       int ret = 0;
> >> +
> >> +       if (old_prog && old_prog != link->prog)
> >> +               return -EPERM;
> >> +       if (new_prog->type != link->prog->type)
> >> +               return -EINVAL;
> >> +
> >> +       mutex_lock(&netns_bpf_mutex);
> >> +
> >> +       net = net_link->net;
> >> +       if (!net || !check_net(net)) {
> >
> > As is, this check_net() check looks very racy. Because if we do worry
> > about net refcnt dropping to zero, then that can happen between
> > check_net() and accessing net fields. So if that's a possibility, you
> > should probably do maybe_get_net() instead.
> >
> > But on the other hand, if we established that auto-detach taking a
> > mutex protects us from net going away, then maybe we shouldn't worry
> > at all about that, and thus check_net() is unnecessary and just
> > unnecessarily confusing everything.
> >
> > I don't know enough overall net lifecycle, so I'm not sure which one
> > it is. But the way it is right now still looks suspicious to me.
>
> The story behind the additional "!check_net(net)" test (in update_prog
> and in fill_info) is that without it, user-space will see the link as
> defunct only after some delay from the moment the last ref to the netns
> is gone (for instance, when the last process has left the netns).
>
> That is because netns is being destroyed from a workqueue [0].

Ok, so it's not strictly necessary in terms of correctness, but just
nicer behavior, and it's still safe to access net even if its refcnt
drops to zero in the meantime. Ok, thanks for elaborating.

>
> This unpredictable delay makes testing the uAPI harder, and I expect it
> would make using it harder as well. Since we can do better and check the
> refcnt to detect "early" that the netns is no good any more, that's what
> we do.
>
> At least that was my thinking here. I was hoping the comment below the
> check would be enough. Didn't mean to cause confusion.
>
> So there is no race, the locking scheme you suggested holds.
>
> [0] https://elixir.bootlin.com/linux/latest/source/net/core/net_namespace.c#L644
>
> >
> >> +               /* Link auto-detached or netns dying */
> >> +               ret = -ENOLINK;
> >> +               goto out_unlock;
> >> +       }
> >> +
> >> +       old_prog = xchg(&link->prog, new_prog);
> >> +       rcu_assign_pointer(net->bpf.progs[type], new_prog);
> >> +       bpf_prog_put(old_prog);
> >> +
> >> +out_unlock:
> >> +       mutex_unlock(&netns_bpf_mutex);
> >> +       return ret;
> >> +}
> >> +
> >> +static int bpf_netns_link_fill_info(const struct bpf_link *link,
> >> +                                   struct bpf_link_info *info)
> >> +{
> >> +       const struct bpf_netns_link *net_link =
> >> +               container_of(link, struct bpf_netns_link, link);
> >> +       unsigned int inum = 0;
> >> +       struct net *net;
> >> +
> >> +       mutex_lock(&netns_bpf_mutex);
> >> +       net = net_link->net;
> >> +       if (net && check_net(net))
> >> +               inum = net->ns.inum;
> >> +       mutex_unlock(&netns_bpf_mutex);
> >> +
> >> +       info->netns.netns_ino = inum;
> >> +       info->netns.attach_type = net_link->type;
> >> +       return 0;
> >> +}
> >> +
> >> +static void bpf_netns_link_show_fdinfo(const struct bpf_link *link,
> >> +                                      struct seq_file *seq)
> >> +{
> >> +       struct bpf_link_info info = {};
> >
> > initialization here is probably not necessary, as long as you access
> > only fields that fill_info initializes.
>
> True. I figured it's better to be safe than to leak stack contents if
> bpf_netns_link_fill_info ever gets broken.
>
> >
> >> +
> >> +       bpf_netns_link_fill_info(link, &info);
> >> +       seq_printf(seq,
> >> +                  "netns_ino:\t%u\n"
> >> +                  "attach_type:\t%u\n",
> >> +                  info.netns.netns_ino,
> >> +                  info.netns.attach_type);
> >> +}
> >> +
> >
> > [...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type
  2020-06-02  9:37     ` Jakub Sitnicki
@ 2020-06-02 17:39       ` Andrii Nakryiko
  0 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-06-02 17:39 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf, Networking, kernel-team, Andrii Nakryiko

On Tue, Jun 2, 2020 at 2:37 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Tue, Jun 02, 2020 at 12:35 AM CEST, Andrii Nakryiko wrote:
> > On Sun, May 31, 2020 at 1:32 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Code for printing link attach_type is duplicated in a couple of places, and
> >> likely will be duplicated for future link types as well. Create helpers to
> >> prevent duplication.
> >>
> >> Suggested-by: Andrii Nakryiko <andriin@fb.com>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >
> > LGTM, minor nit below.
> >
> > Acked-by: Andrii Nakryiko <andriin@fb.com>
> >
> >>  tools/bpf/bpftool/link.c | 44 ++++++++++++++++++++--------------------
> >>  1 file changed, 22 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
> >> index 670a561dc31b..1ff416eff3d7 100644
> >> --- a/tools/bpf/bpftool/link.c
> >> +++ b/tools/bpf/bpftool/link.c
> >> @@ -62,6 +62,15 @@ show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
> >>         jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
> >>  }
> >>
> >> +static void show_link_attach_type_json(__u32 attach_type, json_writer_t *wtr)
> >
> > nit: if you look at jsonw_uint_field/jsonw_string_field, they accept
> > json_write_t as a first argument, because they are sort of working on
> > "object" json_writer_t. I think that's good and consistent. No big
> > deal, but if you can adjust it for consistency, it would be good.
>
> I followed show_link_header_json example here. I'm guessing the
> intention was to keep show_link_header_json and show_link_header_plain
> consistent, as the former takes an extra arg (wtr).

It's fine, it's a minor point, even though this order feels backwards to me :)

>
> >
> >> +{
> >> +       if (attach_type < ARRAY_SIZE(attach_type_name))
> >> +               jsonw_string_field(wtr, "attach_type",
> >> +                                  attach_type_name[attach_type]);
> >> +       else
> >> +               jsonw_uint_field(wtr, "attach_type", attach_type);
> >> +}
> >> +
> >
> > [...]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-06-02 17:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-31  8:28 [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 01/12] flow_dissector: Pull locking up from prog attach callback Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 02/12] net: Introduce netns_bpf for BPF programs attached to netns Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 03/12] flow_dissector: Move out netns_bpf prog callbacks Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 04/12] bpf: Add link-based BPF program attachment to network namespace Jakub Sitnicki
2020-06-01 22:30   ` Andrii Nakryiko
2020-06-02  9:30     ` Jakub Sitnicki
2020-06-02 17:38       ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 05/12] bpf, cgroup: Return ENOLINK for auto-detached links on update Jakub Sitnicki
2020-06-01 22:31   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 06/12] libbpf: Add support for bpf_link-based netns attachment Jakub Sitnicki
2020-06-01 22:32   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 07/12] bpftool: Extract helpers for showing link attach type Jakub Sitnicki
2020-06-01 22:35   ` Andrii Nakryiko
2020-06-02  9:37     ` Jakub Sitnicki
2020-06-02 17:39       ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 08/12] bpftool: Support link show for netns-attached links Jakub Sitnicki
2020-06-01 22:38   ` Andrii Nakryiko
2020-05-31  8:28 ` [PATCH bpf-next v2 09/12] selftests/bpf: Add tests for attaching bpf_link to netns Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 10/12] selftests/bpf, flow_dissector: Close TAP device FD after the test Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 11/12] selftests/bpf: Convert test_flow_dissector to use BPF skeleton Jakub Sitnicki
2020-06-01 22:42   ` Andrii Nakryiko
2020-06-02  9:46     ` Jakub Sitnicki
2020-05-31  8:28 ` [PATCH bpf-next v2 12/12] selftests/bpf: Extend test_flow_dissector to cover link creation Jakub Sitnicki
2020-06-01 22:26 ` [PATCH bpf-next v2 00/12] Link-based program attachment to network namespaces Alexei Starovoitov
