All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v5] bpf: add helper to compare network namespaces
@ 2017-02-16  1:29 David Ahern
  2017-02-16  3:24 ` Eric W. Biederman
  2017-02-16 10:08 ` Daniel Borkmann
  0 siblings, 2 replies; 8+ messages in thread
From: David Ahern @ 2017-02-16  1:29 UTC (permalink / raw)
  To: netdev, davem; +Cc: ast, daniel, tj, luto, ebiederm, David Ahern

In cases where bpf programs are looking at sockets and packets
that belong to different netns, it could be useful to compare the
network namespace of the socket or packet

Introduce bpf_sk_netns_cmp and bpf_skb_netns_cmp helpers to compare
network namespace of the socket or skb to the namespace parameters
in a prorgam.

For example to disallow raw sockets in all non-init netns
the bpf_type_cgroup_sock program can do:
if (sk->type == SOCK_RAW && !bpf_sk_netns_cmp(sk, 0x3, 0xf0000075))
  return 0;

where 0x3 and 0xf0000075 are the st_dev and st_ino of /proc/pid/ns/net.

Note that all bpf programs types are global. The same socket filter
program can be attached to sockets in different netns,
just like cls_bpf can see ingress/egress packets of multiple
net_devices in different netns. The cgroup_bpf programs are
the most exposed to sockets and devices across netns,
but the need to identify netns applies to all.
For example, if bpf_type_cgroup_skb didn't exist the system wide
monitoring daemon could have used ld_preload mechanism and
attached the same program to see traffic from applications
across netns. Therefore make bpf_sk_netns_cmp() helper available
to all network related bpf program types.

For socket, cls_bpf and cgroup_skb programs this helper
can be considered a new feature, whereas for cgroup_sock
programs that modify sk->bound_dev_if (like 'ip vrf' does)
it's a bug fix, since 'ip vrf' needs to be netns aware.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v2->v3: build bot complained. s/static/static inline/. no other changes.
v3->v4: fixed fallthrough case. Thanks Daniel.
v4->v5: dsa converted netns_id as a u64 to netns_cmp with individual dev
        and inode number. Updated samples test for sock bind.

 fs/nsfs.c                      |   7 ++
 include/linux/proc_ns.h        |   2 +
 include/net/net_namespace.h    |  11 +++
 include/uapi/linux/bpf.h       |  11 ++-
 net/core/filter.c              |  50 ++++++++++++-
 samples/bpf/bpf_helpers.h      |   2 +
 samples/bpf/test_cgrp2_sock.c  | 156 +++++++++++++++++++++++++++++++++++++----
 samples/bpf/test_cgrp2_sock.sh |  18 ++---
 8 files changed, 233 insertions(+), 24 deletions(-)

diff --git a/fs/nsfs.c b/fs/nsfs.c
index 8c9fb29c6673..c335f513d467 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -49,6 +49,13 @@ static void nsfs_evict(struct inode *inode)
 	ns->ops->put(ns);
 }
 
+int ns_cmp(struct ns_common *ns, u64 dev, u64 ino)
+{
+	u64 ns_dev = new_encode_dev(nsfs_mnt->mnt_sb->s_dev);
+
+	return dev == ns_dev && ino == ns->inum;
+}
+
 static void *__ns_get_path(struct path *path, struct ns_common *ns)
 {
 	struct vfsmount *mnt = nsfs_mnt;
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 12cb8bd81d2d..5d962eda8686 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -76,6 +76,8 @@ extern struct file *proc_ns_fget(int fd);
 extern void *ns_get_path(struct path *path, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops);
 
+extern int ns_cmp(struct ns_common *ns, u64 dev, u64 ino);
+
 extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
 			const struct proc_ns_operations *ns_ops);
 extern void nsfs_init(void);
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index af8fe8a909dc..70d1d6d473ae 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -28,6 +28,7 @@
 #include <net/netns/xfrm.h>
 #include <net/netns/mpls.h>
 #include <linux/ns_common.h>
+#include <linux/proc_ns.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
 
@@ -215,6 +216,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 
 void net_drop_ns(void *);
 
+static inline int netns_cmp(struct net *net, u64 dev, u64 ino)
+{
+	return ns_cmp(&net->ns, dev, ino);
+}
+
 #else
 
 static inline struct net *get_net(struct net *net)
@@ -237,6 +243,11 @@ int net_eq(const struct net *net1, const struct net *net2)
 	return 1;
 }
 
+static inline int netns_cmp(struct net *net, u64 dev, u64 ino)
+{
+	return 1;
+}
+
 #define net_drop_ns NULL
 #endif
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d2b0ac799d03..0a59dd182916 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -437,6 +437,14 @@ union bpf_attr {
  *     @xdp_md: pointer to xdp_md
  *     @delta: An positive/negative integer to be added to xdp_md.data
  *     Return: 0 on success or negative on error
+ *
+ * int bpf_sk_netns_cmp(ctx, dev, ino)
+ *     Compare the network namespace for sk or skb against the given
+ *     device and inode number.
+ *     @ctx: pointer to struct sock or struct __sk_buff
+ *     @dev: unsigned long device id for namespace
+ *     @ino: unsigned long inode for namespace
+ *     Return: 1 on match, 0 if no match and -1 on error
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -483,7 +491,8 @@ union bpf_attr {
 	FN(set_hash_invalid),		\
 	FN(get_numa_node_id),		\
 	FN(skb_change_head),		\
-	FN(xdp_adjust_head),
+	FN(xdp_adjust_head),		\
+	FN(sk_netns_cmp),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 1969b3f118c1..69918f8c79ee 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -52,6 +52,7 @@
 #include <net/dst_metadata.h>
 #include <net/dst.h>
 #include <net/sock_reuseport.h>
+#include <linux/proc_ns.h>
 
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
@@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
 	.arg5_type	= ARG_CONST_STACK_SIZE,
 };
 
+BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk,  u64, ns_dev, u64, ns_ino)
+{
+	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
+}
+
+static const struct bpf_func_proto bpf_sk_netns_cmp_proto = {
+	.func		= bpf_sk_netns_cmp,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_skb_netns_cmp, struct sk_buff *, skb,  u64, ns_dev, u64, ns_ino)
+{
+	struct net_device *dev = skb->dev;
+
+	if (!dev)
+		return -EINVAL;
+
+	return netns_cmp(dev_net(dev), ns_dev, ns_ino);
+}
+
+static const struct bpf_func_proto bpf_skb_netns_cmp_proto = {
+	.func		= bpf_skb_netns_cmp,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 sk_filter_func_proto(enum bpf_func_id func_id)
 {
@@ -2617,9 +2651,12 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 		return &bpf_tail_call_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_sk_netns_cmp:
+		return &bpf_skb_netns_cmp_proto;
 	case BPF_FUNC_trace_printk:
 		if (capable(CAP_SYS_ADMIN))
 			return bpf_get_trace_printk_proto();
+		/* fallthrough */
 	default:
 		return NULL;
 	}
@@ -2700,6 +2737,17 @@ xdp_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
+cg_sock_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_sk_netns_cmp:
+		return &bpf_sk_netns_cmp_proto;
+	default:
+		return sk_filter_func_proto(func_id);
+	}
+}
+
+static const struct bpf_func_proto *
 cg_skb_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -3255,7 +3303,7 @@ static const struct bpf_verifier_ops lwt_xmit_ops = {
 };
 
 static const struct bpf_verifier_ops cg_sock_ops = {
-	.get_func_proto		= sk_filter_func_proto,
+	.get_func_proto		= cg_sock_func_proto,
 	.is_valid_access	= sock_filter_is_valid_access,
 	.convert_ctx_access	= sock_filter_convert_ctx_access,
 };
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index faaffe2e139a..679cf1496c37 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -94,6 +94,8 @@ static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
 	(void *) BPF_FUNC_skb_under_cgroup;
 static int (*bpf_skb_change_head)(void *, int len, int flags) =
 	(void *) BPF_FUNC_skb_change_head;
+static unsigned long long (*bpf_sk_netns_cmp)(void *) =
+	(void *) BPF_FUNC_sk_netns_cmp;
 
 #if defined(__x86_64__)
 
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
index c3cfb23e23b5..c29514101a46 100644
--- a/samples/bpf/test_cgrp2_sock.c
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -10,6 +10,7 @@
 
 #define _GNU_SOURCE
 
+#include <sys/stat.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <stddef.h>
@@ -25,13 +26,24 @@
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
-static int prog_load(int idx)
+static int prog_load(int idx, __u64 dev, __u64 ino)
 {
 	struct bpf_insn prog[] = {
-		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
-		BPF_MOV64_IMM(BPF_REG_3, idx),
-		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
-		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* save sk ctx to r6 */
+
+		/* compare network namespace context for socket; r1 = ctx */
+		BPF_LD_IMM64(BPF_REG_2, dev),
+		BPF_LD_IMM64(BPF_REG_3, ino),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_sk_netns_cmp),
+		/* if no match skip setting sk_bound_dev_if */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+
+		/* set sk_bound_dev_if for socket */
+		BPF_MOV64_IMM(BPF_REG_2, idx),
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_2,
+			    offsetof(struct bpf_sock, bound_dev_if)),
+
 		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
 		BPF_EXIT_INSN(),
 	};
@@ -41,33 +53,51 @@ static int prog_load(int idx)
 				"GPL", 0, bpf_log_buf, BPF_LOG_BUF_SIZE);
 }
 
-static int usage(const char *argv0)
+/* return namespace dev and inode */
+static int get_netns(pid_t pid, __u64 *ns_dev, __u64 *ns_ino)
 {
-	printf("Usage: %s cg-path device-index\n", argv0);
-	return EXIT_FAILURE;
+	char path[64];
+	struct stat st;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/net", pid);
+
+	if (stat(path, &st) != 0)
+		return -1;
+
+	*ns_dev = st.st_dev;
+	*ns_ino = st.st_ino;
+
+	return 0;
 }
 
-int main(int argc, char **argv)
+static int bind_prog(const char *cpath, const char *dev)
 {
 	int cg_fd, prog_fd, ret;
 	unsigned int idx;
+	__u64 ns_dev, ns_ino;
 
-	if (argc < 2)
-		return usage(argv[0]);
+	if (!dev)
+		return 1;
 
-	idx = if_nametoindex(argv[2]);
+	idx = if_nametoindex(dev);
 	if (!idx) {
 		printf("Invalid device name\n");
 		return EXIT_FAILURE;
 	}
 
-	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (get_netns(getpid(), &ns_dev, &ns_ino)) {
+		fprintf(stderr,
+			"Failed to read network namespace data\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(cpath, O_DIRECTORY | O_RDONLY);
 	if (cg_fd < 0) {
 		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
 		return EXIT_FAILURE;
 	}
 
-	prog_fd = prog_load(idx);
+	prog_fd = prog_load(idx, ns_dev, ns_ino);
 	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
 
 	if (prog_fd < 0) {
@@ -84,3 +114,101 @@ int main(int argc, char **argv)
 
 	return EXIT_SUCCESS;
 }
+
+static int socket_test(int family, const char *dev, int is_negative)
+{
+	unsigned int idx;
+	socklen_t optlen;
+	char name[16];
+	int sd, rc;
+
+	if (!dev)
+		return 1;
+
+	if (!is_negative) {
+		idx = if_nametoindex(dev);
+		if (!idx) {
+			printf("Invalid device name\n");
+			return EXIT_FAILURE;
+		}
+	}
+
+	sd = socket(family, SOCK_DGRAM, 0);
+	if (sd < 0)
+		return 1;
+
+	name[0] = '\0';
+	optlen = sizeof(name);
+	rc = getsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, name, &optlen);
+
+	close(sd);
+	if (rc) {
+		printf("getsockopt(SO_BINDTODEVICE) failed\n");
+		return 1;
+	}
+
+	printf("%s socket bound to \"%s\", checking against \"%s\", neg test %d\n",
+		family == PF_INET ? "ipv4" : "ipv6",
+		name, dev, is_negative);
+
+	if (strcmp(name, dev) && !is_negative) {
+		printf("socket not bound to device as expected\n");
+		return 1;
+	}
+
+	if (!strcmp(name, dev) && is_negative) {
+		printf("socket is bound to device when not expected\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+static int usage(const char *argv0)
+{
+	printf("Usage: %s -c cg-path -d device-index -4 -6 -n\n", argv0);
+	return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+	const char *dev = NULL, *cpath = NULL;
+	int do_ipv4 = 0, do_ipv6 = 0, is_negative = 0;
+	int rc;
+
+	extern char *optarg;
+
+	while ((rc = getopt(argc, argv, "d:c:46in")) > 0) {
+		switch (rc) {
+		case 'd':
+			dev = optarg;
+			break;
+		case 'c':
+			cpath = optarg;
+			break;
+		case '4':
+			do_ipv4 = 1;
+			break;
+		case '6':
+			do_ipv6 = 1;
+			break;
+		case 'n':
+			is_negative = 1;
+			break;
+		default:
+			usage(argv[0]);
+			return 1;
+		}
+	}
+
+	if (cpath && bind_prog(cpath, dev))
+		return 1;
+
+	if (do_ipv4 && socket_test(PF_INET, dev, is_negative))
+		return 1;
+
+	if (do_ipv6 && socket_test(PF_INET6, dev, is_negative))
+		return 1;
+
+	return EXIT_SUCCESS;
+}
diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
index 925fd467c7cc..bab5185e46f8 100755
--- a/samples/bpf/test_cgrp2_sock.sh
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -8,11 +8,8 @@ function config_device {
 	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
 	ip netns exec at_ns0 ip addr add 2401:db00::1/64 dev veth0 nodad
 	ip netns exec at_ns0 ip link set dev veth0 up
-	ip link add foo type vrf table 1234
-	ip link set foo up
 	ip addr add 172.16.1.101/24 dev veth0b
 	ip addr add 2401:db00::2/64 dev veth0b nodad
-	ip link set veth0b master foo
 }
 
 function attach_bpf {
@@ -20,28 +17,33 @@ function attach_bpf {
 	mkdir -p /tmp/cgroupv2
 	mount -t cgroup2 none /tmp/cgroupv2
 	mkdir -p /tmp/cgroupv2/foo
-	test_cgrp2_sock /tmp/cgroupv2/foo foo
+	test_cgrp2_sock -c /tmp/cgroupv2/foo -d veth0b
 	echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
 }
 
 function cleanup {
 	set +ex
+	ip link del veth0b
 	ip netns delete at_ns0
-	ip link del veth0
-	ip link del foo
 	umount /tmp/cgroupv2
 	rm -rf /tmp/cgroupv2
 	set -ex
 }
 
 function do_test {
-	ping -c1 -w1 172.16.1.100
-	ping6 -c1 -w1 2401:db00::1
+	test_cgrp2_sock -4 -d veth0b
+	test_cgrp2_sock -6 -d veth0b
+}
+
+function do_neg_test {
+	ip netns exec at_ns0 test_cgrp2_sock -4 -n -d veth0b
+	ip netns exec at_ns0 test_cgrp2_sock -6 -n -d veth0b
 }
 
 cleanup 2>/dev/null
 config_device
 attach_bpf
 do_test
+do_neg_test
 cleanup
 echo "*** PASS ***"
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-16  1:29 [PATCH net v5] bpf: add helper to compare network namespaces David Ahern
@ 2017-02-16  3:24 ` Eric W. Biederman
  2017-02-16 10:08 ` Daniel Borkmann
  1 sibling, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2017-02-16  3:24 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, ast, daniel, tj, luto

David Ahern <dsa@cumulusnetworks.com> writes:

> In cases where bpf programs are looking at sockets and packets
> that belong to different netns, it could be useful to compare the
> network namespace of the socket or packet
>
> Introduce bpf_sk_netns_cmp and bpf_skb_netns_cmp helpers to compare
> network namespace of the socket or skb to the namespace parameters
> in a prorgam.

The code for bpf_sk_netns_cmp looks good.

I don't feel comfortable with bpf_skb_netns_cmp.  There are two
issues:

  (a) skb->dev is not reliably set and does not have reliable semantics
  in different parts of the network stack.  Making bpf_skb_netns_cmp not
  work reliably for output packets for example.

  (b) Every path that processes a network packet in the network stack
  now has a struct net passed in the function arguments.  Either
  directly or in a function that is passed through.  Making it
  unambiguous and simple to get the struct net.

So I recommend a function bpf_context_cmp.  Which looks a net passed
into the bpf filter of skbs.  That should be 100% reliable and quite
straight forward to implement today.

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-16  1:29 [PATCH net v5] bpf: add helper to compare network namespaces David Ahern
  2017-02-16  3:24 ` Eric W. Biederman
@ 2017-02-16 10:08 ` Daniel Borkmann
  2017-02-17  4:01   ` David Ahern
  2017-02-20  4:17   ` Eric W. Biederman
  1 sibling, 2 replies; 8+ messages in thread
From: Daniel Borkmann @ 2017-02-16 10:08 UTC (permalink / raw)
  To: David Ahern, netdev, davem; +Cc: ast, tj, luto, ebiederm

On 02/16/2017 02:29 AM, David Ahern wrote:
> In cases where bpf programs are looking at sockets and packets
> that belong to different netns, it could be useful to compare the
> network namespace of the socket or packet
>
> Introduce bpf_sk_netns_cmp and bpf_skb_netns_cmp helpers to compare
> network namespace of the socket or skb to the namespace parameters
> in a prorgam.
>
> For example to disallow raw sockets in all non-init netns
> the bpf_type_cgroup_sock program can do:
> if (sk->type == SOCK_RAW && !bpf_sk_netns_cmp(sk, 0x3, 0xf0000075))
>    return 0;
>
> where 0x3 and 0xf0000075 are the st_dev and st_ino of /proc/pid/ns/net.
>
> Note that all bpf programs types are global. The same socket filter
> program can be attached to sockets in different netns,
> just like cls_bpf can see ingress/egress packets of multiple
> net_devices in different netns. The cgroup_bpf programs are
> the most exposed to sockets and devices across netns,
> but the need to identify netns applies to all.
> For example, if bpf_type_cgroup_skb didn't exist the system wide
> monitoring daemon could have used ld_preload mechanism and
> attached the same program to see traffic from applications
> across netns. Therefore make bpf_sk_netns_cmp() helper available
> to all network related bpf program types.
>
> For socket, cls_bpf and cgroup_skb programs this helper
> can be considered a new feature, whereas for cgroup_sock
> programs that modify sk->bound_dev_if (like 'ip vrf' does)
> it's a bug fix, since 'ip vrf' needs to be netns aware.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v2->v3: build bot complained. s/static/static inline/. no other changes.
> v3->v4: fixed fallthrough case. Thanks Daniel.
> v4->v5: dsa converted netns_id as a u64 to netns_cmp with individual dev
>          and inode number. Updated samples test for sock bind.
[...]
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 8c9fb29c6673..c335f513d467 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -49,6 +49,13 @@ static void nsfs_evict(struct inode *inode)
>   	ns->ops->put(ns);
>   }
>
> +int ns_cmp(struct ns_common *ns, u64 dev, u64 ino)
> +{
> +	u64 ns_dev = new_encode_dev(nsfs_mnt->mnt_sb->s_dev);
> +
> +	return dev == ns_dev && ino == ns->inum;
> +}
> +
[...]
>   void net_drop_ns(void *);
>
> +static inline int netns_cmp(struct net *net, u64 dev, u64 ino)
> +{
> +	return ns_cmp(&net->ns, dev, ino);
> +}
> +
>   #else
[...]
>
>   /**
>    *	sk_filter_trim_cap - run a packet through a socket filter
> @@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
>   	.arg5_type	= ARG_CONST_STACK_SIZE,
>   };
>
> +BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk,  u64, ns_dev, u64, ns_ino)
> +{
> +	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
> +}

Is there anything that speaks against doing the comparison itself
outside of the helper? Meaning, the helper would get a buffer
passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
and fills both out with the netns info belonging to the sk/skb.

And the program can use that to make a match, or to use the
struct itself as a map lookup key (which in the previous patch
was also possible). Right now, such flexibility would be lost
for map usage with the above bpf_sk{,b}_netns_cmp() membership
test that checks only against one instance of ns_dev/ns_ino.

A helper used as bpf_get_netns_sk(sk, &buff, sizeof(buff)) resp.
bpf_get_netns_skb(skb, &buff, sizeof(buff)) can also still be
extended in future should something really change, f.e. we know
that the passed size must be 16 byte today, and anything else
would be rejected for now.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-16 10:08 ` Daniel Borkmann
@ 2017-02-17  4:01   ` David Ahern
  2017-02-17 14:15     ` Daniel Borkmann
  2017-02-20  4:17   ` Eric W. Biederman
  1 sibling, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-02-17  4:01 UTC (permalink / raw)
  To: Daniel Borkmann, netdev, davem; +Cc: ast, tj, luto, ebiederm

On 2/16/17 3:08 AM, Daniel Borkmann wrote:
> Is there anything that speaks against doing the comparison itself
> outside of the helper? Meaning, the helper would get a buffer
> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
> and fills both out with the netns info belonging to the sk/skb.

How do you handle CONFIG_NET_NS not set?

You call something like bpf_get_netns_id(sk, &foo), it has to exist
regardless of the config. What should it return if netns is disabled?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-17  4:01   ` David Ahern
@ 2017-02-17 14:15     ` Daniel Borkmann
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Borkmann @ 2017-02-17 14:15 UTC (permalink / raw)
  To: David Ahern, netdev, davem; +Cc: ast, tj, luto, ebiederm

On 02/17/2017 05:01 AM, David Ahern wrote:
> On 2/16/17 3:08 AM, Daniel Borkmann wrote:
>> Is there anything that speaks against doing the comparison itself
>> outside of the helper? Meaning, the helper would get a buffer
>> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
>> and fills both out with the netns info belonging to the sk/skb.
>
> How do you handle CONFIG_NET_NS not set?
>
> You call something like bpf_get_netns_id(sk, &foo), it has to exist
> regardless of the config. What should it return if netns is disabled?

In rough semi pseudo code, it could look like the below. In case we have
!CONFIG_NET_NS then that would be hidden behind the netns_get_kstat() (or
whatever function name it has) as a static inline that just returns an
error such as -ENOTSUPP.

If the entity installing the program is aware that CONFIG_NET_NS is set
and that the input is exactly of size struct bpf_netns, then it can also
skip the error test in the BPF program itself. Anyway, just a thought
so that the helper could be more flexible and used as a key for lookups
on maps, too ...

BPF_CALL_3(bpf_sk_netns_get, struct sock *, sk,  struct bpf_netns *, ns,
            u32, size)
{
         struct ns_kstat tmp;
         int ret;

         if (unlikely(size != sizeof(struct bpf_netns)))
                 return -EINVAL;

         ret = netns_get_kstat(sock_net(sk), &tmp);
         if (unlikely(ret))
                 return ret;

         ns->dev = tmp.dev;
         ns->ino = tmp.ino;

         return 0;
}

static const struct bpf_func_proto bpf_sk_netns_get_proto = {
         .func           = bpf_sk_netns_get,
         .gpl_only       = false,
         .ret_type       = RET_INTEGER,
         .arg1_type      = ARG_PTR_TO_CTX,
         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
         .arg3_type      = ARG_CONST_SIZE,
};

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-16 10:08 ` Daniel Borkmann
  2017-02-17  4:01   ` David Ahern
@ 2017-02-20  4:17   ` Eric W. Biederman
  2017-02-23  3:28     ` David Ahern
  1 sibling, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2017-02-20  4:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, davem, ast, tj, luto

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 02/16/2017 02:29 AM, David Ahern wrote:
>> In cases where bpf programs are looking at sockets and packets
>> that belong to different netns, it could be useful to compare the
>> network namespace of the socket or packet
>>
>> Introduce bpf_sk_netns_cmp and bpf_skb_netns_cmp helpers to compare
>> network namespace of the socket or skb to the namespace parameters
>> in a prorgam.
>>
>> For example to disallow raw sockets in all non-init netns
>> the bpf_type_cgroup_sock program can do:
>> if (sk->type == SOCK_RAW && !bpf_sk_netns_cmp(sk, 0x3, 0xf0000075))
>>    return 0;
>>
>> where 0x3 and 0xf0000075 are the st_dev and st_ino of /proc/pid/ns/net.
>>
>> Note that all bpf programs types are global. The same socket filter
>> program can be attached to sockets in different netns,
>> just like cls_bpf can see ingress/egress packets of multiple
>> net_devices in different netns. The cgroup_bpf programs are
>> the most exposed to sockets and devices across netns,
>> but the need to identify netns applies to all.
>> For example, if bpf_type_cgroup_skb didn't exist the system wide
>> monitoring daemon could have used ld_preload mechanism and
>> attached the same program to see traffic from applications
>> across netns. Therefore make bpf_sk_netns_cmp() helper available
>> to all network related bpf program types.
>>
>> For socket, cls_bpf and cgroup_skb programs this helper
>> can be considered a new feature, whereas for cgroup_sock
>> programs that modify sk->bound_dev_if (like 'ip vrf' does)
>> it's a bug fix, since 'ip vrf' needs to be netns aware.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>> ---
>> v2->v3: build bot complained. s/static/static inline/. no other changes.
>> v3->v4: fixed fallthrough case. Thanks Daniel.
>> v4->v5: dsa converted netns_id as a u64 to netns_cmp with individual dev
>>          and inode number. Updated samples test for sock bind.
> [...]
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index 8c9fb29c6673..c335f513d467 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -49,6 +49,13 @@ static void nsfs_evict(struct inode *inode)
>>   	ns->ops->put(ns);
>>   }
>>
>> +int ns_cmp(struct ns_common *ns, u64 dev, u64 ino)
>> +{
>> +	u64 ns_dev = new_encode_dev(nsfs_mnt->mnt_sb->s_dev);
>> +
>> +	return dev == ns_dev && ino == ns->inum;
>> +}
>> +
> [...]
>>   void net_drop_ns(void *);
>>
>> +static inline int netns_cmp(struct net *net, u64 dev, u64 ino)
>> +{
>> +	return ns_cmp(&net->ns, dev, ino);
>> +}
>> +
>>   #else
> [...]
>>
>>   /**
>>    *	sk_filter_trim_cap - run a packet through a socket filter
>> @@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
>>   	.arg5_type	= ARG_CONST_STACK_SIZE,
>>   };
>>
>> +BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk,  u64, ns_dev, u64, ns_ino)
>> +{
>> +	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
>> +}
>
> Is there anything that speaks against doing the comparison itself
> outside of the helper? Meaning, the helper would get a buffer
> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
> and fills both out with the netns info belonging to the sk/skb.

Yes.  The dev/ino pair is not necessarily unique so it is not at all
clear that the returned value would be what the program is expecting.

I can see some fancy optimizations that perhaps compile a dev/ino pair
into a network namespace pointer, and fail the compilation otherwise.
But short of that I believe we need to have the comparision in the
helper.

> And the program can use that to make a match, or to use the
> struct itself as a map lookup key (which in the previous patch
> was also possible). Right now, such flexibility would be lost
> for map usage with the above bpf_sk{,b}_netns_cmp() membership
> test that checks only against one instance of ns_dev/ns_ino.

I don't have a clue how you could use maps in this context, although I
can see where it could be desirable.  A bpf filter program that collects
counts per network namespace, or per a set of known network namespaces
feels reasonable.

> A helper used as bpf_get_netns_sk(sk, &buff, sizeof(buff)) resp.
> bpf_get_netns_skb(skb, &buff, sizeof(buff)) can also still be
> extended in future should something really change, f.e. we know
> that the passed size must be 16 byte today, and anything else
> would be rejected for now.

I am not particularly concerned about the size, but the fact that
dev/ino is not unique globally for a network namespace seems very
challenging for your design.

Now if someone is willing to limit installation of these bpf programs
to the initial instances of all namespaces we can make some assumptions
but I haven't seen any willingness to limit things in such a way that
we can make assumptions about the user space context we are talking
about.

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-20  4:17   ` Eric W. Biederman
@ 2017-02-23  3:28     ` David Ahern
  2017-02-23 14:55       ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-02-23  3:28 UTC (permalink / raw)
  To: Eric W. Biederman, Daniel Borkmann; +Cc: netdev, davem, ast, tj, luto

On 2/19/17 9:17 PM, Eric W. Biederman wrote:
>>> @@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
>>>   	.arg5_type	= ARG_CONST_STACK_SIZE,
>>>   };
>>>
>>> +BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk,  u64, ns_dev, u64, ns_ino)
>>> +{
>>> +	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
>>> +}
>>
>> Is there anything that speaks against doing the comparison itself
>> outside of the helper? Meaning, the helper would get a buffer
>> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
>> and fills both out with the netns info belonging to the sk/skb.
> 
> Yes.  The dev/ino pair is not necessarily unique so it is not at all
> clear that the returned value would be what the program is expecting.

How does the comparison inside a helper change the fact that a dev and
inode number are compared? ie., inside or outside of a helper, the end
result is that a bpf program has a dev/inode pair that is compared to
that of a socket or skb.

Ideally, it would be nice to have a bpf equivalent to net_eq(), but it
is not possible from a practical perspective to have bpf programs load a
namespace reference (address really) from a given pid or fd.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v5] bpf: add helper to compare network namespaces
  2017-02-23  3:28     ` David Ahern
@ 2017-02-23 14:55       ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2017-02-23 14:55 UTC (permalink / raw)
  To: David Ahern; +Cc: Daniel Borkmann, netdev, davem, ast, tj, luto

David Ahern <dsa@cumulusnetworks.com> writes:

> On 2/19/17 9:17 PM, Eric W. Biederman wrote:
>>>> @@ -2597,6 +2598,39 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
>>>>   	.arg5_type	= ARG_CONST_STACK_SIZE,
>>>>   };
>>>>
>>>> +BPF_CALL_3(bpf_sk_netns_cmp, struct sock *, sk,  u64, ns_dev, u64, ns_ino)
>>>> +{
>>>> +	return netns_cmp(sock_net(sk), ns_dev, ns_ino);
>>>> +}
>>>
>>> Is there anything that speaks against doing the comparison itself
>>> outside of the helper? Meaning, the helper would get a buffer
>>> passed from stack f.e. struct foo { u64 ns_dev; u64 ns_ino; }
>>> and fills both out with the netns info belonging to the sk/skb.
>> 
>> Yes.  The dev/ino pair is not necessarily unique so it is not at all
>> clear that the returned value would be what the program is expecting.
>
> How does the comparison inside a helper change the fact that a dev and
> inode number are compared? ie., inside or outside of a helper, the end
> result is that a bpf program has a dev/inode pair that is compared to
> that of a socket or skb.

With the comparison inside a helper if the kernel has more than one
dev+inode that maps to the same network namespace (as we had just
recently until the inodes were moved from proc to nsfs) then the helper
can lookup the the dev+inode and see which network namespace it maps
to and then compare network namespaces.  So logically the helper really
is doing more than more than comparing dev+inode.

With the helper doing the comparison the kernel implementation details
can change and everything will continue to work.

> Ideally, it would be nice to have a bpf equivalent to net_eq(), but it
> is not possible from a practical perspective to have bpf programs load a
> namespace reference (address really) from a given pid or fd.

Which is why I am not at all keen on support for maps etc.  It is not
clear how to do something more elegant.

If there was an environmental restriction on the bpf program where we
knew all references had to be from the perspective of the initial set of
namespaces there would be a unique dev+inode we could deal with.  But
again that obvious solution that works so often elsewhere appears to be
a non-starter here.

Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-02-23 15:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-16  1:29 [PATCH net v5] bpf: add helper to compare network namespaces David Ahern
2017-02-16  3:24 ` Eric W. Biederman
2017-02-16 10:08 ` Daniel Borkmann
2017-02-17  4:01   ` David Ahern
2017-02-17 14:15     ` Daniel Borkmann
2017-02-20  4:17   ` Eric W. Biederman
2017-02-23  3:28     ` David Ahern
2017-02-23 14:55       ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.