* [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if
@ 2016-10-27  0:58 David Ahern
  2016-10-27  0:58 ` [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type David Ahern
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every IPv4/IPv6-capable
program supports it. Even for programs that do support it, the option
requires CAP_NET_RAW, which in practice means starting the process as
root and is not desirable from a general security perspective.

This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup:

    https://www.mail-archive.com/netdev@vger.kernel.org/msg134028.html

to provide a capability to set sk_bound_dev_if for all AF_INET{6}
sockets opened by a process in a cgroup when the sockets are allocated.

This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.

v2
- addressed Daniel's comments: dropped the bpf_sock_store_u32 helper
  and used bpf_prog_run_save_cb on the code move

- picked up Mickaël Salaün's subtype patch with a few small tweaks

- removed new prog type in favor of a subtype on the BPF_PROG_TYPE_CGROUP
  from Daniel Mack's patch set

- moved the filter hook from sk_alloc to inet{6}_create


David Ahern (5):
  bpf: Refactor cgroups code in prep for new type
  bpf: Add eBPF program subtype and is_valid_subtype() verifier
  bpf: Add new cgroup attach type to enable sock modifications
  samples: bpf: Add prog_subtype to bpf_prog_load
  samples: bpf: add userspace example for modifying sk_bound_dev_if

 include/linux/bpf.h             |   7 ++-
 include/linux/filter.h          |   3 +-
 include/uapi/linux/bpf.h        |  15 +++++-
 kernel/bpf/cgroup.c             |  36 +++++++++++--
 kernel/bpf/syscall.c            |  11 ++--
 kernel/bpf/verifier.c           |  10 +++-
 kernel/trace/bpf_trace.c        |  16 ++++--
 net/core/filter.c               | 115 +++++++++++++++++++++++++++++++++-------
 net/ipv4/af_inet.c              |   4 ++
 net/ipv6/af_inet6.c             |   3 ++
 samples/bpf/Makefile            |   2 +
 samples/bpf/bpf_load.c          |   2 +-
 samples/bpf/fds_example.c       |   2 +-
 samples/bpf/libbpf.c            |   5 +-
 samples/bpf/libbpf.h            |   3 +-
 samples/bpf/sock_example.c      |   2 +-
 samples/bpf/test_cgrp2_attach.c |   4 +-
 samples/bpf/test_cgrp2_sock.c   |  84 +++++++++++++++++++++++++++++
 18 files changed, 280 insertions(+), 44 deletions(-)
 create mode 100644 samples/bpf/test_cgrp2_sock.c

-- 
2.1.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
@ 2016-10-27  0:58 ` David Ahern
  2016-10-31 16:58   ` David Miller
  2016-10-27  0:58 ` [PATCH net-next 2/5] bpf: Add eBPF program subtype and is_valid_subtype() verifier David Ahern
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Code move and rename only; no functional change intended.

v2
- fix bpf_prog_run_clear_cb to bpf_prog_run_save_cb as caught by Daniel

- rename BPF_PROG_TYPE_CGROUP_SKB and its cg_skb functions to
  BPF_PROG_TYPE_CGROUP and cgroup

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/uapi/linux/bpf.h        |  2 +-
 kernel/bpf/cgroup.c             | 27 ++++++++++++++++++++++-----
 kernel/bpf/syscall.c            |  2 +-
 net/core/filter.c               | 14 +++++++-------
 samples/bpf/test_cgrp2_attach.c |  4 ++--
 5 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6b62ee9a2f78..73da296c2125 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -98,7 +98,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_TRACEPOINT,
 	BPF_PROG_TYPE_XDP,
 	BPF_PROG_TYPE_PERF_EVENT,
-	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP,
 };
 
 enum bpf_attach_type {
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0ab43f264b0..d5746aec8f34 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,6 +117,19 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 	}
 }
 
+static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb,
+				       struct bpf_prog *prog)
+{
+	unsigned int offset = skb->data - skb_network_header(skb);
+	int ret;
+
+	__skb_push(skb, offset);
+	ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
+	__skb_pull(skb, offset);
+
+	return ret;
+}
+
 /**
  * __cgroup_bpf_run_filter() - Run a program for packet filtering
 * @sk: The socket sending or receiving traffic
@@ -153,11 +166,15 @@ int __cgroup_bpf_run_filter(struct sock *sk,
 
 	prog = rcu_dereference(cgrp->bpf.effective[type]);
 	if (prog) {
-		unsigned int offset = skb->data - skb_network_header(skb);
-
-		__skb_push(skb, offset);
-		ret = bpf_prog_run_save_cb(prog, skb) == 1 ? 0 : -EPERM;
-		__skb_pull(skb, offset);
+		switch (type) {
+		case BPF_CGROUP_INET_INGRESS:
+		case BPF_CGROUP_INET_EGRESS:
+			ret = __cgroup_bpf_run_filter_skb(skb, prog);
+			break;
+		/* make gcc happy else complains about missing enum value */
+		default:
+			return 0;
+		}
 	}
 
 	rcu_read_unlock();
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1814c010ace6..3c4c9ed2d576 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -841,7 +841,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
-					 BPF_PROG_TYPE_CGROUP_SKB);
+					 BPF_PROG_TYPE_CGROUP);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 4552b8c93b99..61e229c3a862 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2583,7 +2583,7 @@ xdp_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
-cg_skb_func_proto(enum bpf_func_id func_id)
+cgroup_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -2955,8 +2955,8 @@ static const struct bpf_verifier_ops xdp_ops = {
 	.convert_ctx_access	= xdp_convert_ctx_access,
 };
 
-static const struct bpf_verifier_ops cg_skb_ops = {
-	.get_func_proto		= cg_skb_func_proto,
+static const struct bpf_verifier_ops cgroup_ops = {
+	.get_func_proto		= cgroup_func_proto,
 	.is_valid_access	= sk_filter_is_valid_access,
 	.convert_ctx_access	= sk_filter_convert_ctx_access,
 };
@@ -2981,9 +2981,9 @@ static struct bpf_prog_type_list xdp_type __read_mostly = {
 	.type	= BPF_PROG_TYPE_XDP,
 };
 
-static struct bpf_prog_type_list cg_skb_type __read_mostly = {
-	.ops	= &cg_skb_ops,
-	.type	= BPF_PROG_TYPE_CGROUP_SKB,
+static struct bpf_prog_type_list cgroup_type __read_mostly = {
+	.ops	= &cgroup_ops,
+	.type	= BPF_PROG_TYPE_CGROUP,
 };
 
 static int __init register_sk_filter_ops(void)
@@ -2992,7 +2992,7 @@ static int __init register_sk_filter_ops(void)
 	bpf_register_prog_type(&sched_cls_type);
 	bpf_register_prog_type(&sched_act_type);
 	bpf_register_prog_type(&xdp_type);
-	bpf_register_prog_type(&cg_skb_type);
+	bpf_register_prog_type(&cgroup_type);
 
 	return 0;
 }
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 63ef2083f766..468e68e6d511 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -69,8 +69,8 @@ static int prog_load(int map_fd, int verdict)
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB,
-			     prog, sizeof(prog), "GPL", 0);
+	return bpf_prog_load(BPF_PROG_TYPE_CGROUP,
+			     prog, sizeof(prog), "GPL", 0, NULL);
 }
 
 static int usage(const char *argv0)
-- 
2.1.4


* [PATCH net-next 2/5] bpf: Add eBPF program subtype and is_valid_subtype() verifier
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
  2016-10-27  0:58 ` [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-10-27  0:58 ` David Ahern
  2016-10-27  0:58 ` [PATCH v2 net-next 3/5] bpf: Add new cgroup attach type to enable sock modifications David Ahern
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev
  Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern,
	Mickaël Salaün

The program subtype makes it possible to apply different fine-grained
static verifications to a single program type.

The struct bpf_verifier_ops gets a new optional callback:
is_valid_subtype(). It is called at the beginning of eBPF program
verification to check whether the (optional) program subtype is valid.

For now, only cgroup eBPF programs use a program subtype, but other
program types could use one in the future.

Cf. the next commit to see how the subtype is used by the cgroup
program type.

Changes since v3:
* remove the "origin" field
* add an "option" field
* cleanup comments

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
[ Patch from Mickaël modified to add subtype arg to pe_prog_is_valid_access
  which was missing, removed the landlock references since they are not
  relevant for this use case and added cgroup struct to subtype, and
  fixed some checkpatch errors ]

---
 include/linux/bpf.h      |  7 +++++--
 include/linux/filter.h   |  1 +
 include/uapi/linux/bpf.h |  8 ++++++++
 kernel/bpf/syscall.c     |  7 +++++--
 kernel/bpf/verifier.c    | 10 ++++++++--
 kernel/trace/bpf_trace.c | 16 +++++++++++-----
 net/core/filter.c        | 26 ++++++++++++++++----------
 7 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index edcd96ded8aa..1c602a0938e8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -152,18 +152,21 @@ struct bpf_prog;
 
 struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
-	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+	const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+					union bpf_prog_subtype *prog_subtype);
 
 	/* return true if 'size' wide access at offset 'off' within bpf_context
 	 * with 'type' (read or write) is allowed
 	 */
 	bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-				enum bpf_reg_type *reg_type);
+				enum bpf_reg_type *reg_type,
+				union bpf_prog_subtype *prog_subtype);
 	int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
 			    const struct bpf_prog *prog);
 	u32 (*convert_ctx_access)(enum bpf_access_type type, int dst_reg,
 				  int src_reg, int ctx_off,
 				  struct bpf_insn *insn, struct bpf_prog *prog);
+	bool (*is_valid_subtype)(union bpf_prog_subtype *prog_subtype);
 };
 
 struct bpf_prog_type_list {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c521adfe..88470cdd3ee1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -406,6 +406,7 @@ struct bpf_prog {
 	kmemcheck_bitfield_end(meta);
 	u32			len;		/* Number of filter blocks */
 	enum bpf_prog_type	type;		/* Type of BPF program */
+	union bpf_prog_subtype	subtype;	/* For fine-grained verifications */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 73da296c2125..160c24ffdce2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -118,6 +118,13 @@ enum bpf_attach_type {
 
 #define BPF_F_NO_PREALLOC	(1U << 0)
 
+union bpf_prog_subtype {
+	struct {
+		__u32 sock:1,    /* ebpf prog applies to sock struct */
+		      reserved:31;
+	} cgroup;
+} __attribute__((aligned(8)));
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -146,6 +153,7 @@ union bpf_attr {
 		__u32		log_size;	/* size of user buffer */
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
+		union bpf_prog_subtype prog_subtype;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3c4c9ed2d576..fbf81156e49d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -578,7 +578,9 @@ static void fixup_bpf_calls(struct bpf_prog *prog)
 				continue;
 			}
 
-			fn = prog->aux->ops->get_func_proto(insn->imm);
+			fn = prog->aux->ops->get_func_proto(insn->imm,
+							   &prog->subtype);
+
 			/* all functions that have prototype and verifier allowed
 			 * programs to call them, must be real in-kernel functions
 			 */
@@ -716,7 +718,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD kern_version
+#define	BPF_PROG_LOAD_LAST_FIELD prog_subtype
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -774,6 +776,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 	err = find_prog_type(type, prog);
 	if (err < 0)
 		goto free_prog;
+	prog->subtype = attr->prog_subtype;
 
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 846d7ceaf202..e9c660363c41 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -655,7 +655,8 @@ static int check_ctx_access(struct bpf_verifier_env *env, int off, int size,
 		return 0;
 
 	if (env->prog->aux->ops->is_valid_access &&
-	    env->prog->aux->ops->is_valid_access(off, size, t, reg_type)) {
+	    env->prog->aux->ops->is_valid_access(off, size, t, reg_type,
+						&env->prog->subtype)) {
 		/* remember the offset of last byte accessed in ctx */
 		if (env->prog->aux->max_ctx_offset < off + size)
 			env->prog->aux->max_ctx_offset = off + size;
@@ -1177,7 +1178,8 @@ static int check_call(struct bpf_verifier_env *env, int func_id)
 	}
 
 	if (env->prog->aux->ops->get_func_proto)
-		fn = env->prog->aux->ops->get_func_proto(func_id);
+		fn = env->prog->aux->ops->get_func_proto(func_id,
+							&env->prog->subtype);
 
 	if (!fn) {
 		verbose("unknown func %d\n", func_id);
@@ -3076,6 +3078,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 	if ((*prog)->len <= 0 || (*prog)->len > BPF_MAXINSNS)
 		return -E2BIG;
 
+	if ((*prog)->aux->ops->is_valid_subtype &&
+	    !(*prog)->aux->ops->is_valid_subtype(&(*prog)->subtype))
+		return -EINVAL;
+
 	/* 'struct bpf_verifier_env' can be global, but since it's not small,
 	 * allocate/free it every time bpf_check() is called
 	 */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fa77311dadb2..922bf7b99811 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -437,7 +437,9 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
 	}
 }
 
-static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
+static const
+struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id,
+					      union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -451,7 +453,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 
 /* bpf+kprobe programs can access fields of 'struct pt_regs' */
 static bool kprobe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-					enum bpf_reg_type *reg_type)
+					enum bpf_reg_type *reg_type,
+					union bpf_prog_subtype *prog_subtype)
 {
 	if (off < 0 || off >= sizeof(struct pt_regs))
 		return false;
@@ -519,7 +522,8 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
-static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
+static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id,
+		union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -532,7 +536,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 }
 
 static bool tp_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-				    enum bpf_reg_type *reg_type)
+				    enum bpf_reg_type *reg_type,
+				    union bpf_prog_subtype *prog_subtype)
 {
 	if (off < sizeof(void *) || off >= PERF_MAX_TRACE_SIZE)
 		return false;
@@ -554,7 +559,8 @@ static struct bpf_prog_type_list tracepoint_tl = {
 };
 
 static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
-				    enum bpf_reg_type *reg_type)
+				    enum bpf_reg_type *reg_type,
+				    union bpf_prog_subtype *prog_subtype)
 {
 	if (off < 0 || off >= sizeof(struct bpf_perf_event_data))
 		return false;
diff --git a/net/core/filter.c b/net/core/filter.c
index 61e229c3a862..4207ab2e56ba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2483,7 +2483,8 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
 };
 
 static const struct bpf_func_proto *
-sk_filter_func_proto(enum bpf_func_id func_id)
+sk_filter_func_proto(enum bpf_func_id func_id,
+		     union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -2511,7 +2512,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
-tc_cls_act_func_proto(enum bpf_func_id func_id)
+tc_cls_act_func_proto(enum bpf_func_id func_id,
+		      union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -2565,12 +2567,12 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return sk_filter_func_proto(func_id);
+		return sk_filter_func_proto(func_id, prog_subtype);
 	}
 }
 
 static const struct bpf_func_proto *
-xdp_func_proto(enum bpf_func_id func_id)
+xdp_func_proto(enum bpf_func_id func_id, union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -2578,18 +2580,19 @@ xdp_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_get_smp_processor_id:
 		return &bpf_get_smp_processor_id_proto;
 	default:
-		return sk_filter_func_proto(func_id);
+		return sk_filter_func_proto(func_id, prog_subtype);
 	}
 }
 
 static const struct bpf_func_proto *
-cgroup_func_proto(enum bpf_func_id func_id)
+cgroup_func_proto(enum bpf_func_id func_id,
+		  union bpf_prog_subtype *prog_subtype)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_skb_load_bytes_proto;
 	default:
-		return sk_filter_func_proto(func_id);
+		return sk_filter_func_proto(func_id, prog_subtype);
 	}
 }
 
@@ -2608,7 +2611,8 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 
 static bool sk_filter_is_valid_access(int off, int size,
 				      enum bpf_access_type type,
-				      enum bpf_reg_type *reg_type)
+				      enum bpf_reg_type *reg_type,
+				      union bpf_prog_subtype *prog_subtype)
 {
 	switch (off) {
 	case offsetof(struct __sk_buff, tc_classid):
@@ -2671,7 +2675,8 @@ static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
 
 static bool tc_cls_act_is_valid_access(int off, int size,
 				       enum bpf_access_type type,
-				       enum bpf_reg_type *reg_type)
+				       enum bpf_reg_type *reg_type,
+				       union bpf_prog_subtype *prog_subtype)
 {
 	if (type == BPF_WRITE) {
 		switch (off) {
@@ -2714,7 +2719,8 @@ static bool __is_valid_xdp_access(int off, int size,
 
 static bool xdp_is_valid_access(int off, int size,
 				enum bpf_access_type type,
-				enum bpf_reg_type *reg_type)
+				enum bpf_reg_type *reg_type,
+				union bpf_prog_subtype *prog_subtype)
 {
 	if (type == BPF_WRITE)
 		return false;
-- 
2.1.4


* [PATCH v2 net-next 3/5] bpf: Add new cgroup attach type to enable sock modifications
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
  2016-10-27  0:58 ` [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type David Ahern
  2016-10-27  0:58 ` [PATCH net-next 2/5] bpf: Add eBPF program subtype and is_valid_subtype() verifier David Ahern
@ 2016-10-27  0:58 ` David Ahern
  2016-10-27  0:58 ` [PATCH net-next 4/5] samples: bpf: Add prog_subtype to bpf_prog_load David Ahern
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Allow BPF_PROG_TYPE_CGROUP programs with cgroup.sock subtype to modify
sk_bound_dev_if for newly created AF_INET or AF_INET6 sockets. The program
can be attached to a cgroup using attach type BPF_CGROUP_INET_SOCK. The
cgroup verifier ops are updated to handle the sock offsets as well as the
existing skb accesses.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

v2
- dropped the bpf_sock_store_u32 helper
- dropped the new prog type BPF_PROG_TYPE_CGROUP_SOCK
- moved valid access and context conversion to use subtype
- dropped CREATE from BPF_CGROUP_INET_SOCK and related function names
- moved running of filter from sk_alloc to inet{6}_create

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/linux/filter.h   |  2 +-
 include/uapi/linux/bpf.h |  5 ++++
 kernel/bpf/cgroup.c      |  9 ++++++
 kernel/bpf/syscall.c     |  2 ++
 net/core/filter.c        | 77 ++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/af_inet.c       |  4 +++
 net/ipv6/af_inet6.c      |  3 ++
 7 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 88470cdd3ee1..ffde714f3a98 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -409,7 +409,7 @@ struct bpf_prog {
 	union bpf_prog_subtype	subtype;	/* For fine-grained verifications */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
-	unsigned int		(*bpf_func)(const struct sk_buff *skb,
+	unsigned int		(*bpf_func)(const void *ctx,
 					    const struct bpf_insn *filter);
 	/* Instructions for interpreter */
 	union {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 160c24ffdce2..546e84b1792f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -104,6 +104,7 @@ enum bpf_prog_type {
 enum bpf_attach_type {
 	BPF_CGROUP_INET_INGRESS,
 	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -532,6 +533,10 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+struct bpf_sock {
+	__u32 bound_dev_if;
+};
+
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
  * return codes are reserved for future use. Unknown return codes will result
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index d5746aec8f34..796e39aa28f5 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,6 +117,12 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 	}
 }
 
+static int __cgroup_bpf_run_filter_sock(struct sock *sk,
+					struct bpf_prog *prog)
+{
+	return prog->bpf_func(sk, prog->insnsi) == 1 ? 0 : -EPERM;
+}
+
 static int __cgroup_bpf_run_filter_skb(struct sk_buff *skb,
 				       struct bpf_prog *prog)
 {
@@ -171,6 +177,9 @@ int __cgroup_bpf_run_filter(struct sock *sk,
 		case BPF_CGROUP_INET_EGRESS:
 			ret = __cgroup_bpf_run_filter_skb(skb, prog);
 			break;
+		case BPF_CGROUP_INET_SOCK:
+			ret = __cgroup_bpf_run_filter_sock(sk, prog);
+			break;
 		/* make gcc happy else complains about missing enum value */
 		default:
 			return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fbf81156e49d..bc3be0b19b57 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -843,6 +843,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK:
 		prog = bpf_prog_get_type(attr->attach_bpf_fd,
 					 BPF_PROG_TYPE_CGROUP);
 		if (IS_ERR(prog))
@@ -880,6 +881,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK:
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
diff --git a/net/core/filter.c b/net/core/filter.c
index 4207ab2e56ba..7193eb7fe892 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2634,6 +2634,40 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool sock_filter_is_valid_access(int off, int size,
+					enum bpf_access_type type)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_sock, bound_dev_if):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	if (off < 0 || off + size > sizeof(struct bpf_sock))
+		return false;
+
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+
+	return true;
+}
+
+static bool cgroup_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   enum bpf_reg_type *reg_type,
+				   union bpf_prog_subtype *prog_subtype)
+{
+	if (prog_subtype->cgroup.sock)
+		return sock_filter_is_valid_access(off, size, type);
+
+	return sk_filter_is_valid_access(off, size, type, reg_type,
+					 prog_subtype);
+}
+
 static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
 			       const struct bpf_prog *prog)
 {
@@ -2894,6 +2928,45 @@ static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 	return insn - insn_buf;
 }
 
+static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
+					  int dst_reg, int src_reg,
+					  int ctx_off,
+					  struct bpf_insn *insn_buf,
+					  struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (ctx_off) {
+	case offsetof(struct bpf_sock, bound_dev_if):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_bound_dev_if) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg,
+					offsetof(struct sock, sk_bound_dev_if));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+				      offsetof(struct sock, sk_bound_dev_if));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+static u32 cgroup_convert_ctx_access(enum bpf_access_type type, int dst_reg,
+				     int src_reg, int ctx_off,
+				     struct bpf_insn *insn_buf,
+				     struct bpf_prog *prog)
+{
+	union bpf_prog_subtype *prog_subtype = &prog->subtype;
+
+	if (prog_subtype->cgroup.sock)
+		return sock_filter_convert_ctx_access(type, dst_reg, src_reg,
+						      ctx_off, insn_buf, prog);
+
+	return sk_filter_convert_ctx_access(type, dst_reg, src_reg, ctx_off,
+					    insn_buf, prog);
+}
+
 static u32 tc_cls_act_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					 int src_reg, int ctx_off,
 					 struct bpf_insn *insn_buf,
@@ -2963,8 +3036,8 @@ static const struct bpf_verifier_ops xdp_ops = {
 
 static const struct bpf_verifier_ops cgroup_ops = {
 	.get_func_proto		= cgroup_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
-	.convert_ctx_access	= sk_filter_convert_ctx_access,
+	.is_valid_access	= cgroup_is_valid_access,
+	.convert_ctx_access	= cgroup_convert_ctx_access,
 };
 
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1effc986739e..c0934f7483cb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -377,6 +377,10 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 		if (err)
 			sk_common_release(sk);
 	}
+
+	if (!kern)
+		cgroup_bpf_run_filter(sk, NULL, BPF_CGROUP_INET_SOCK);
+
 out:
 	return err;
 out_rcu_unlock:
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 46ad699937fd..c499ae3c472e 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -257,6 +257,9 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 			goto out;
 		}
 	}
+
+	if (!kern)
+		cgroup_bpf_run_filter(sk, NULL, BPF_CGROUP_INET_SOCK);
 out:
 	return err;
 out_rcu_unlock:
-- 
2.1.4


* [PATCH net-next 4/5] samples: bpf: Add prog_subtype to bpf_prog_load
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
                   ` (2 preceding siblings ...)
  2016-10-27  0:58 ` [PATCH v2 net-next 3/5] bpf: Add new cgroup attach type to enable sock modifications David Ahern
@ 2016-10-27  0:58 ` David Ahern
  2016-10-27  0:58 ` [PATCH v2 net-next 5/5] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
  2016-10-31 17:01 ` [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Miller
  5 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Add a bpf_prog_subtype argument to bpf_prog_load. If the argument is
non-NULL, it is copied into the attr passed to the bpf system call.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 samples/bpf/bpf_load.c     | 2 +-
 samples/bpf/fds_example.c  | 2 +-
 samples/bpf/libbpf.c       | 5 ++++-
 samples/bpf/libbpf.h       | 3 ++-
 samples/bpf/sock_example.c | 2 +-
 5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 97913e109b14..634fcd7f7498 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -77,7 +77,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 		return -1;
 	}
 
-	fd = bpf_prog_load(prog_type, prog, size, license, kern_version);
+	fd = bpf_prog_load(prog_type, prog, size, license, kern_version, NULL);
 	if (fd < 0) {
 		printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
 		return -1;
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 625e797be6ef..df38b68f3586 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -59,7 +59,7 @@ static int bpf_prog_create(const char *object)
 		return prog_fd[0];
 	} else {
 		return bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
-				     insns, sizeof(insns), "GPL", 0);
+				     insns, sizeof(insns), "GPL", 0, NULL);
 	}
 }
 
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 9ce707bf02a7..53987b1ab191 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -82,7 +82,8 @@ char bpf_log_buf[LOG_BUF_SIZE];
 
 int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, int prog_len,
-		  const char *license, int kern_version)
+		  const char *license, int kern_version,
+		  union bpf_prog_subtype *prog_subtype)
 {
 	union bpf_attr attr = {
 		.prog_type = prog_type,
@@ -98,6 +99,8 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	 * padding is zero initialized
 	 */
 	attr.kern_version = kern_version;
+	if (prog_subtype)
+		attr.prog_subtype = *prog_subtype;
 
 	bpf_log_buf[0] = 0;
 
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index d0a799a52eaf..949c89c85c70 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -13,7 +13,8 @@ int bpf_get_next_key(int fd, void *key, void *next_key);
 
 int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, int insn_len,
-		  const char *license, int kern_version);
+		  const char *license, int kern_version,
+		  union bpf_prog_subtype *prog_subtype);
 
 int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
 int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 28b60baa9fa8..521f918ab34d 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -56,7 +56,7 @@ static int test_sock(void)
 	};
 
 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
-				"GPL", 0);
+				"GPL", 0, NULL);
 	if (prog_fd < 0) {
 		printf("failed to load prog '%s'\n", strerror(errno));
 		goto cleanup;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 net-next 5/5] samples: bpf: add userspace example for modifying sk_bound_dev_if
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
                   ` (3 preceding siblings ...)
  2016-10-27  0:58 ` [PATCH net-next 4/5] samples: bpf: Add prog_subtype to bpf_prog_load David Ahern
@ 2016-10-27  0:58 ` David Ahern
  2016-10-31 17:01 ` [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Miller
  5 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2016-10-27  0:58 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.
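A hedged usage sketch of the new sample (the cgroup mount point, group
name, and the use of "lo" are illustrative assumptions, not part of
this patch; attaching requires root and a kernel with this series
applied):

```shell
# Resolve the ifindex of the device to bind sockets to ("lo" is purely
# illustrative; a VRF device would be used in practice).
idx=$(cat /sys/class/net/lo/ifindex)

# Create a cgroup, attach the bpf program to it, then move a shell into
# the group; AF_INET{6} sockets opened by that shell are then created
# with sk_bound_dev_if set to $idx.
mkdir -p /sys/fs/cgroup/test
./test_cgrp2_sock /sys/fs/cgroup/test "$idx"
echo $$ > /sys/fs/cgroup/test/cgroup.procs
```

The cgroup path would be under whatever cgroup v2 hierarchy the system
mounts; the path above is only a placeholder.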

v2
- removed bpf_sock_store_u32 references
- changed BPF_CGROUP_INET_SOCK_CREATE to BPF_CGROUP_INET_SOCK
- remove BPF_PROG_TYPE_CGROUP_SOCK prog type and add prog_subtype

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 samples/bpf/Makefile          |  2 ++
 samples/bpf/test_cgrp2_sock.c | 84 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_sock.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 2624d5d7ce8b..ec4ef37a2dbc 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -22,6 +22,7 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += test_cgrp2_attach
+hostprogs-y += test_cgrp2_sock
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
@@ -48,6 +49,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
+test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
new file mode 100644
index 000000000000..9cc3307052e0
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -0,0 +1,84 @@
+/* eBPF example program:
+ *
+ * - Loads eBPF program
+ *
+ *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
+ *   sockets opened by processes in the cgroup.
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/bpf.h>
+
+#include "libbpf.h"
+
+static int prog_load(int idx)
+{
+	struct bpf_insn prog[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+		BPF_MOV64_IMM(BPF_REG_3, idx),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+	union bpf_prog_subtype prog_subtype = { .cgroup.sock = 1 };
+
+	return bpf_prog_load(BPF_PROG_TYPE_CGROUP, prog, sizeof(prog), "GPL",
+			     0, &prog_subtype);
+}
+
+static int usage(const char *argv0)
+{
+	printf("Usage: %s <cg-path> <device-index>\n", argv0);
+	return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+	int cg_fd, prog_fd, ret;
+	int idx = 0;
+
+	if (argc < 3)
+		return usage(argv[0]);
+
+	idx = atoi(argv[2]);
+	if (!idx) {
+		printf("Invalid device index\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (cg_fd < 0) {
+		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	prog_fd = prog_load(idx);
+	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
+
+	if (prog_fd < 0) {
+		printf("Failed to load prog: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK);
+	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK);
+	if (ret < 0) {
+		printf("Failed to attach prog to cgroup: '%s'\n",
+		       strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	return EXIT_SUCCESS;
+}
-- 
2.1.4

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-27  0:58 ` [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type David Ahern
@ 2016-10-31 16:58   ` David Miller
  2016-10-31 17:00     ` Daniel Mack
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2016-10-31 16:58 UTC (permalink / raw)
  To: dsa; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

From: David Ahern <dsa@cumulusnetworks.com>
Date: Wed, 26 Oct 2016 17:58:38 -0700

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6b62ee9a2f78..73da296c2125 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -98,7 +98,7 @@ enum bpf_prog_type {
>  	BPF_PROG_TYPE_TRACEPOINT,
>  	BPF_PROG_TYPE_XDP,
>  	BPF_PROG_TYPE_PERF_EVENT,
> -	BPF_PROG_TYPE_CGROUP_SKB,
> +	BPF_PROG_TYPE_CGROUP,
>  };
>  
>  enum bpf_attach_type {

If we do this then the cgroup-bpf series should use this value rather than
changing it after-the-fact in your series here.

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-31 16:58   ` David Miller
@ 2016-10-31 17:00     ` Daniel Mack
  2016-10-31 17:05       ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Mack @ 2016-10-31 17:00 UTC (permalink / raw)
  To: David Miller, dsa; +Cc: netdev, ast, daniel, maheshb, tgraf

On 10/31/2016 05:58 PM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Wed, 26 Oct 2016 17:58:38 -0700
> 
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 6b62ee9a2f78..73da296c2125 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -98,7 +98,7 @@ enum bpf_prog_type {
>>  	BPF_PROG_TYPE_TRACEPOINT,
>>  	BPF_PROG_TYPE_XDP,
>>  	BPF_PROG_TYPE_PERF_EVENT,
>> -	BPF_PROG_TYPE_CGROUP_SKB,
>> +	BPF_PROG_TYPE_CGROUP,
>>  };
>>  
>>  enum bpf_attach_type {
> 
> If we do this then the cgroup-bpf series should use this value rather than
> changing it after-the-fact in your series here.
> 

Yeah, I'm confused too. I changed that name in my v7 from
BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's (Ahern)
request. Why is it now renamed again?


Thanks,
Daniel

* Re: [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if
  2016-10-27  0:58 [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Ahern
                   ` (4 preceding siblings ...)
  2016-10-27  0:58 ` [PATCH v2 net-next 5/5] samples: bpf: add userspace example for modifying sk_bound_dev_if David Ahern
@ 2016-10-31 17:01 ` David Miller
  2016-10-31 17:16   ` David Ahern
  5 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2016-10-31 17:01 UTC (permalink / raw)
  To: dsa; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

From: David Ahern <dsa@cumulusnetworks.com>
Date: Wed, 26 Oct 2016 17:58:37 -0700

> The recently added VRF support in Linux leverages the bind-to-device
> API for programs to specify an L3 domain for a socket. While
> SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
> program has support for it. Even for those programs that do support it,
> the API requires processes to be started as root (CAP_NET_RAW) which
> is not desirable from a general security perspective.
> 
> This patch set leverages Daniel Mack's work to attach bpf programs to
> a cgroup:
> 
>     https://www.mail-archive.com/netdev@vger.kernel.org/msg134028.html
> 
> to provide a capability to set sk_bound_dev_if for all AF_INET{6}
> sockets opened by a process in a cgroup when the sockets are allocated.
> 
> This capability enables running any program in a VRF context and is key
> to deploying Management VRF, a fundamental configuration for networking
> gear, with any Linux OS installation.

Ok, after some review I think I understand what's going on here.

It would initially seem simpler to just support forced sk_bound_dev_if
in cgroups.  But I think I understand why you may have gone this way:

1) The cgroup-bpf code already has the cgroup hierarchy propagation
   logic.

2) There may be use cases for doing things with other sock members.

With respect to #2, do you know of any such planned use cases already?

Also, any reason why you don't allow the cgroup bpf sk filter to return
an error code so that the sock creation could be cancelled if the eBPF
program desires that?  It could be useful, I suppose.

Thanks.

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-31 17:00     ` Daniel Mack
@ 2016-10-31 17:05       ` David Ahern
  2016-10-31 17:16         ` Daniel Mack
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2016-10-31 17:05 UTC (permalink / raw)
  To: Daniel Mack, David Miller; +Cc: netdev, ast, daniel, maheshb, tgraf

On 10/31/16 11:00 AM, Daniel Mack wrote:
> On 10/31/2016 05:58 PM, David Miller wrote:
>> From: David Ahern <dsa@cumulusnetworks.com>
>> Date: Wed, 26 Oct 2016 17:58:38 -0700
>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 6b62ee9a2f78..73da296c2125 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -98,7 +98,7 @@ enum bpf_prog_type {
>>>  	BPF_PROG_TYPE_TRACEPOINT,
>>>  	BPF_PROG_TYPE_XDP,
>>>  	BPF_PROG_TYPE_PERF_EVENT,
>>> -	BPF_PROG_TYPE_CGROUP_SKB,
>>> +	BPF_PROG_TYPE_CGROUP,
>>>  };
>>>  
>>>  enum bpf_attach_type {
>>
>> If we do this then the cgroup-bpf series should use this value rather than
>> changing it after-the-fact in your series here.
>>
> 
> Yeah, I'm confused too. I changed that name in my v7 from
> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's (Ahern)
> request. Why is it now renamed again?

Thomas pushed back on adding another program type in favor of using
subtypes. So this makes the program type generic to CGROUP, and patch 2
in this v2 set adds Mickaël's subtype patch, with the socket mangling
done that way in patch 3.

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-31 17:05       ` David Ahern
@ 2016-10-31 17:16         ` Daniel Mack
  2016-10-31 17:49           ` Thomas Graf
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Mack @ 2016-10-31 17:16 UTC (permalink / raw)
  To: David Ahern, David Miller; +Cc: netdev, ast, daniel, maheshb, tgraf

On 10/31/2016 06:05 PM, David Ahern wrote:
> On 10/31/16 11:00 AM, Daniel Mack wrote:
>> On 10/31/2016 05:58 PM, David Miller wrote:
>>> From: David Ahern <dsa@cumulusnetworks.com>
>>> Date: Wed, 26 Oct 2016 17:58:38 -0700
>>> 
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 6b62ee9a2f78..73da296c2125 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -98,7 +98,7 @@ enum bpf_prog_type {
>>>>  	BPF_PROG_TYPE_TRACEPOINT,
>>>>  	BPF_PROG_TYPE_XDP,
>>>>  	BPF_PROG_TYPE_PERF_EVENT,
>>>> -	BPF_PROG_TYPE_CGROUP_SKB,
>>>> +	BPF_PROG_TYPE_CGROUP,
>>>>  };
>>>> 
>>>> enum bpf_attach_type {
>>> 
>>> If we do this then the cgroup-bpf series should use this value
>>> rather than changing it after-the-fact in your series here.
>>> 
>> 
>> Yeah, I'm confused too. I changed that name in my v7 from 
>> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's
>> (Ahern) request. Why is it now renamed again?
> 
> Thomas pushed back on adding another program type in favor of using
> subtypes. So this makes the program type generic to CGROUP and patch
> 2 in this v2 set added Mickaël's subtype patch with the socket
> mangling done that way in patch 3.
> 

Fine for me. I can change it around again.


Thanks,
Daniel

* Re: [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if
  2016-10-31 17:01 ` [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if David Miller
@ 2016-10-31 17:16   ` David Ahern
  2016-10-31 17:46     ` Thomas Graf
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2016-10-31 17:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf

On 10/31/16 11:01 AM, David Miller wrote:
> From: David Ahern <dsa@cumulusnetworks.com>
> Date: Wed, 26 Oct 2016 17:58:37 -0700
> 
>> The recently added VRF support in Linux leverages the bind-to-device
>> API for programs to specify an L3 domain for a socket. While
>> SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
>> program has support for it. Even for those programs that do support it,
>> the API requires processes to be started as root (CAP_NET_RAW) which
>> is not desirable from a general security perspective.
>>
>> This patch set leverages Daniel Mack's work to attach bpf programs to
>> a cgroup:
>>
>>     https://www.mail-archive.com/netdev@vger.kernel.org/msg134028.html
>>
>> to provide a capability to set sk_bound_dev_if for all AF_INET{6}
>> sockets opened by a process in a cgroup when the sockets are allocated.
>>
>> This capability enables running any program in a VRF context and is key
>> to deploying Management VRF, a fundamental configuration for networking
>> gear, with any Linux OS installation.
> 
> Ok, after some review I think I understand what's going on here.
> 
> It would initially seem simpler to just support forced sk_bound_dev_if
> in cgroups.  But I think I understand why you may have gone this way:

That's what the l3mdev cgroup patch does -- force the sk_bound_dev_if
for sockets. Tejun pushed back on adding new controllers, and
cgroup+bpf is another way to accomplish the end goal. The key is using
the cgroup infrastructure for parent-child inheritance of the policy,
holding the policy "data" to be applied, tracking which processes are
in a group, which group a specific process belongs to, and so on. No
need to reinvent that part.

> 
> 1) The cgroup-bpf code already has the cgroup hierarchy propagation
>    logic.
> 
> 2) There may be use cases for doing things with other sock members.
> 
> With respect to #2, do you know of any such planned use cases already?

One suggested use case is the local port binding limitations that
Mahesh and Anoop were looking into.

> 
> Also, any reason why you don't allow the cgroup bpf sk filter to return
> an error code so that the sock creation could be cancelled if the eBPF
> program desires that?  It could be useful, I suppose.

My first draft of this feature had that, but I removed it for
simplicity for now. Can certainly add it back.

* Re: [PATCH v2 net-next 0/5] Add bpf support to set sk_bound_dev_if
  2016-10-31 17:16   ` David Ahern
@ 2016-10-31 17:46     ` Thomas Graf
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Graf @ 2016-10-31 17:46 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, netdev, daniel, ast, daniel, maheshb

On 10/31/16 at 11:16am, David Ahern wrote:
> On 10/31/16 11:01 AM, David Miller wrote:
> > Also, any reason why you don't allow the cgroup bpf sk filter to return
> > an error code so that the sock creation could be cancelled if the eBPF
> > program desires that?  It could be useful, I suppose.
> 
> My first draft of this feature had that, but I removed it for
> simplicity for now. Can certainly add it back.

We're trying to standardize on common return codes for all program
types. The lwt bpf series defines BPF_ codes which are compatible
with TC_ACT_* values to make lwt_bpf and cls_bpf compatible. Would
be great to use the same return codes and implement the ones that
make sense.

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-31 17:16         ` Daniel Mack
@ 2016-10-31 17:49           ` Thomas Graf
  2016-11-14  3:51             ` David Ahern
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Graf @ 2016-10-31 17:49 UTC (permalink / raw)
  To: Daniel Mack; +Cc: David Ahern, David Miller, netdev, ast, daniel, maheshb

On 10/31/16 at 06:16pm, Daniel Mack wrote:
> On 10/31/2016 06:05 PM, David Ahern wrote:
> > On 10/31/16 11:00 AM, Daniel Mack wrote:
> >> Yeah, I'm confused too. I changed that name in my v7 from 
> >> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's
> >> (Ahern) request. Why is it now renamed again?
> > 
> > Thomas pushed back on adding another program type in favor of using
> > subtypes. So this makes the program type generic to CGROUP and patch
> > 2 in this v2 set added Mickaël's subtype patch with the socket
> > mangling done that way in patch 3.
> > 
> 
> Fine for me. I can change it around again.

I would like to hear from Daniel B and Alexei as well. We need to
decide whether to use subtypes consistently and treat prog types as
something more high level or whether to bluntly introduce a new prog
type for every distinct set of verifier limits. I will change lwt_bpf
as well accordingly.

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-10-31 17:49           ` Thomas Graf
@ 2016-11-14  3:51             ` David Ahern
  2016-11-15  2:23               ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: David Ahern @ 2016-11-14  3:51 UTC (permalink / raw)
  To: Thomas Graf, Daniel Mack; +Cc: David Miller, netdev, ast, daniel, maheshb

On 10/31/16 11:49 AM, Thomas Graf wrote:
> On 10/31/16 at 06:16pm, Daniel Mack wrote:
>> On 10/31/2016 06:05 PM, David Ahern wrote:
>>> On 10/31/16 11:00 AM, Daniel Mack wrote:
>>>> Yeah, I'm confused too. I changed that name in my v7 from 
>>>> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's
>>>> (Ahern) request. Why is it now renamed again?
>>>
>>> Thomas pushed back on adding another program type in favor of using
>>> subtypes. So this makes the program type generic to CGROUP and patch
>>> 2 in this v2 set added Mickaël's subtype patch with the socket
>>> mangling done that way in patch 3.
>>>
>>
>> Fine for me. I can change it around again.
> 
> I would like to hear from Daniel B and Alexei as well. We need to
> decide whether to use subtypes consistently and treat prog types as
> something more high level or whether to bluntly introduce a new prog
> type for every distinct set of verifier limits. I will change lwt_bpf
> as well accordingly.
> 

Alexei / Daniel - any comments/preferences on subtypes vs program types?

* Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
  2016-11-14  3:51             ` David Ahern
@ 2016-11-15  2:23               ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2016-11-15  2:23 UTC (permalink / raw)
  To: David Ahern, Thomas Graf, Daniel Mack
  Cc: David Miller, netdev, daniel, maheshb

On 11/13/16 7:51 PM, David Ahern wrote:
> On 10/31/16 11:49 AM, Thomas Graf wrote:
>> On 10/31/16 at 06:16pm, Daniel Mack wrote:
>>> On 10/31/2016 06:05 PM, David Ahern wrote:
>>>> On 10/31/16 11:00 AM, Daniel Mack wrote:
>>>>> Yeah, I'm confused too. I changed that name in my v7 from
>>>>> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's
>>>>> (Ahern) request. Why is it now renamed again?
>>>>
>>>> Thomas pushed back on adding another program type in favor of using
>>>> subtypes. So this makes the program type generic to CGROUP and patch
>>>> 2 in this v2 set added Mickaël's subtype patch with the socket
>>>> mangling done that way in patch 3.
>>>>
>>>
>>> Fine for me. I can change it around again.
>>
>> I would like to hear from Daniel B and Alexei as well. We need to
>> decide whether to use subtypes consistently and treat prog types as
>> something more high level or whether to bluntly introduce a new prog
>> type for every distinct set of verifier limits. I will change lwt_bpf
>> as well accordingly.
>>
>
> Alexei / Daniel - any comments/preferences on subtypes vs program types?

looks like in this particular case it's better to use different program
types, since they all serve different purposes and the context is
different. Daniel Mack's programs can stay BPF_PROG_TYPE_CGROUP_SKB and
operate on skb, whereas your ifindex-changing programs can be
BPF_PROG_TYPE_CGROUP_SOCK.

Regarding DanielM's patches: Dave's comment regarding skb->sk vs sk
made us think hard about it. We looked into tunnels and in the end
realized that skb->sk won't fly, since the program will be called
multiple times on tx for every ip_output. And since we were not sure
that using 'sk' would fix it, we had to do a bunch of tests with
different tunnels and analyze the code to make sure it's all good.
We'll repost soon.
