* [PATCH v6 0/6] Add eBPF hooks for cgroups
@ 2016-09-19 16:43 Daniel Mack
  2016-09-19 16:43 ` [PATCH v6 1/6] bpf: add new prog type for cgroup socket filtering Daniel Mack
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:43 UTC (permalink / raw)
  To: htejun, daniel, ast
  Cc: davem, kafai, fw, pablo, harald, netdev, sargun, cgroups, Daniel Mack

This is v6 of the patch set to allow eBPF programs for network
filtering and accounting to be attached to cgroups, so that they apply
to all sockets of all tasks placed in that cgroup. The logic can also
be extended to other cgroup-based eBPF use cases.


Changes from v5:

* The eBPF programs now operate on L3 rather than on L2 of the packets,
  and the egress hooks were moved from __dev_queue_xmit() to
  ip*_output().

* For BPF_PROG_TYPE_CGROUP_SOCKET, disallow direct access to the skb
  through BPF_LD_[ABS|IND] instructions, but hook up the
  bpf_skb_load_bytes() access helper instead. Thanks to Daniel Borkmann
  for the help.


Changes from v4:

* Plug an skb leak when dropping packets due to eBPF verdicts in
  __dev_queue_xmit(). Spotted by Daniel Borkmann.

* Check for sk_fullsock(sk) in __cgroup_bpf_run_filter() so we don't
  operate on timewait or request sockets. Suggested by Daniel Borkmann.

* Add missing @parent parameter in kerneldoc of __cgroup_bpf_update().
  Spotted by Rami Rosen.

* Include linux/jump_label.h from bpf-cgroup.h to fix a kbuild error.


Changes from v3:

* Dropped the _FILTER suffix from BPF_PROG_TYPE_CGROUP_SOCKET_FILTER,
  renamed BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS to
  BPF_CGROUP_INET_{IN,E}GRESS and alias BPF_MAX_ATTACH_TYPE to
  __BPF_MAX_ATTACH_TYPE, as suggested by Daniel Borkmann.

* Dropped the attach_flags member from the anonymous struct for BPF
  attach operations in union bpf_attr. They can be added later on via
  CHECK_ATTR. Requested by Daniel Borkmann and Alexei.

* Release old_prog at the end of __cgroup_bpf_update rather than at
  the beginning to fix a race gap between program updates and their
  users. Spotted by Daniel Borkmann.

* Plugged an skb leak when dropping packets on the egress path.
  Spotted by Daniel Borkmann.

* Add cgroups@vger.kernel.org to the loop, as suggested by Rami Rosen.

* Some minor coding style adaptations not worth mentioning in particular.


Changes from v2:

* Fixed the RCU locking details Tejun pointed out.

* Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler.


Changes from v1:

* Moved all bpf-specific cgroup code into its own file, and stubbed
  out related functions for !CONFIG_CGROUP_BPF as static inline nops.
  This way, the call sites are not cluttered with #ifdef guards while
  the feature remains compile-time configurable.

* Implemented the new scheme proposed by Tejun. Per cgroup, store one
  set of pointers for programs that are pinned to the cgroup, and one
  for programs that are effective for it. When a program is attached or
  detached, the change is propagated to all the cgroup's descendants. If
  a subcgroup has its own pinned program, skip the whole subbranch in
  order to allow delegation models.

* The hookup for egress packets is now done from __dev_queue_xmit().

* A static key is now used in both the ingress and egress fast paths
  to keep performance penalties close to zero if the feature is
  not in use.

* Overall cleanup to make the accessors use the program arrays.
  This should make it much easier to add new program types, which
  will then automatically follow the pinned vs. effective logic.

* Fixed locking issues, as pointed out by Eric Dumazet and Alexei
  Starovoitov. Changes to the program array are now done with
  xchg() and are protected by cgroup_mutex.

* eBPF programs are now expected to return 1 to let the packet pass,
  not >= 0. Pointed out by Alexei.

* Operation is now limited to INET sockets, so local AF_UNIX sockets
  are not affected. The enum members are renamed accordingly. In case
  other socket families should be supported, this can be extended in
  the future.

* The sample program learned to support both ingress and egress, and
  can now optionally make the eBPF program drop packets by making it
  return 0.


As always, feedback is much appreciated.

Thanks,
Daniel


Daniel Mack (6):
  bpf: add new prog type for cgroup socket filtering
  cgroup: add support for eBPF programs
  bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
  net: filter: run cgroup eBPF ingress programs
  net: ipv4, ipv6: run cgroup eBPF egress programs
  samples: bpf: add userspace example for attaching eBPF programs to
    cgroups

 include/linux/bpf-cgroup.h      |  71 +++++++++++++++++
 include/linux/cgroup-defs.h     |   4 +
 include/uapi/linux/bpf.h        |  17 ++++
 init/Kconfig                    |  12 +++
 kernel/bpf/Makefile             |   1 +
 kernel/bpf/cgroup.c             | 166 ++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c            |  81 ++++++++++++++++++++
 kernel/cgroup.c                 |  18 +++++
 net/core/filter.c               |  27 +++++++
 net/ipv4/ip_output.c            |  15 ++++
 net/ipv6/ip6_output.c           |   8 ++
 samples/bpf/Makefile            |   2 +
 samples/bpf/libbpf.c            |  21 +++++
 samples/bpf/libbpf.h            |   3 +
 samples/bpf/test_cgrp2_attach.c | 147 +++++++++++++++++++++++++++++++++++
 15 files changed, 593 insertions(+)
 create mode 100644 include/linux/bpf-cgroup.h
 create mode 100644 kernel/bpf/cgroup.c
 create mode 100644 samples/bpf/test_cgrp2_attach.c

-- 
2.5.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v6 1/6] bpf: add new prog type for cgroup socket filtering
  2016-09-19 16:43 [PATCH v6 0/6] Add eBPF hooks for cgroups Daniel Mack
@ 2016-09-19 16:43 ` Daniel Mack
  2016-09-19 16:43 ` [PATCH v6 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands Daniel Mack
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:43 UTC (permalink / raw)
  To: htejun, daniel, ast
  Cc: davem, kafai, fw, pablo, harald, netdev, sargun, cgroups, Daniel Mack

This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that
it does not allow BPF_LD_[ABS|IND] instructions and hooks up the
bpf_skb_load_bytes() helper.

Programs of this type will be attached to cgroups for network filtering
and accounting.
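
For illustration, a minimal sketch of such a program in restricted C,
assuming a clang/LLVM BPF target (the in-tree sample added in patch 6/6
builds its program from raw BPF instructions instead; the section name
and helper declaration below follow the samples/bpf style and are
illustrative only). The data seen by the program starts at the network
header (L3), as described in the cover letter:

  #include <linux/bpf.h>
  #include <linux/in.h>
  #include <linux/ip.h>

  /* helper declaration in the samples/bpf style; the skb payload is read
   * via bpf_skb_load_bytes(), since BPF_LD_[ABS|IND] is not allowed here */
  static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) =
          (void *) BPF_FUNC_skb_load_bytes;

  __attribute__((section("cgroup_sock"), used))
  int drop_udp(struct __sk_buff *skb)
  {
          struct iphdr iph;

          /* offset 0 is the start of the IPv4 header */
          if (bpf_skb_load_bytes(skb, 0, &iph, sizeof(iph)) < 0)
                  return 1;       /* on error, let the packet pass */

          /* return 1 to let the packet pass, any other value drops it */
          return iph.protocol == IPPROTO_UDP ? 0 : 1;
  }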

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 include/uapi/linux/bpf.h |  9 +++++++++
 net/core/filter.c        | 23 +++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f896dfa..55f815e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -96,8 +96,17 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_TRACEPOINT,
 	BPF_PROG_TYPE_XDP,
 	BPF_PROG_TYPE_PERF_EVENT,
+	BPF_PROG_TYPE_CGROUP_SOCKET,
 };
 
+enum bpf_attach_type {
+	BPF_CGROUP_INET_INGRESS,
+	BPF_CGROUP_INET_EGRESS,
+	__MAX_BPF_ATTACH_TYPE
+};
+
+#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
+
 #define BPF_PSEUDO_MAP_FD	1
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
diff --git a/net/core/filter.c b/net/core/filter.c
index 298b146..e46c98e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2496,6 +2496,17 @@ xdp_func_proto(enum bpf_func_id func_id)
 	}
 }
 
+static const struct bpf_func_proto *
+cg_sk_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_skb_load_bytes:
+		return &bpf_skb_load_bytes_proto;
+	default:
+		return sk_filter_func_proto(func_id);
+	}
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
 	if (off < 0 || off >= sizeof(struct __sk_buff))
@@ -2818,6 +2829,12 @@ static const struct bpf_verifier_ops xdp_ops = {
 	.convert_ctx_access	= xdp_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops cg_sk_ops = {
+	.get_func_proto		= cg_sk_func_proto,
+	.is_valid_access	= sk_filter_is_valid_access,
+	.convert_ctx_access	= sk_filter_convert_ctx_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops	= &sk_filter_ops,
 	.type	= BPF_PROG_TYPE_SOCKET_FILTER,
@@ -2838,12 +2855,18 @@ static struct bpf_prog_type_list xdp_type __read_mostly = {
 	.type	= BPF_PROG_TYPE_XDP,
 };
 
+static struct bpf_prog_type_list cg_sk_type __read_mostly = {
+	.ops	= &cg_sk_ops,
+	.type	= BPF_PROG_TYPE_CGROUP_SOCKET,
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
 	bpf_register_prog_type(&sched_cls_type);
 	bpf_register_prog_type(&sched_act_type);
 	bpf_register_prog_type(&xdp_type);
+	bpf_register_prog_type(&cg_sk_type);
 
 	return 0;
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 2/6] cgroup: add support for eBPF programs
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
@ 2016-09-19 16:43   ` Daniel Mack
  2016-09-19 16:43   ` [PATCH v6 4/6] net: filter: run cgroup eBPF ingress programs Daniel Mack
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:43 UTC (permalink / raw)
  To: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Mack

This patch adds two sets of eBPF program pointers to struct cgroup:
one set for programs that are directly pinned to the cgroup, and one
for programs that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

  A - B - C
        \ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.
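
To make the propagation concrete, here is an illustrative sequence for
the hierarchy above (ingress programs only; P1 and P2 are hypothetical
programs, not part of this patch):

  attach P1 to B:    effective(B) = effective(C) = effective(D) = effective(E) = P1
  attach P2 to D:    effective(D) = effective(E) = P2, while B and C keep P1
  detach P2 from D:  D and E fall back to B's effective program, i.e. P1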

Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.

Signed-off-by: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
---
 include/linux/bpf-cgroup.h  |  71 +++++++++++++++++++
 include/linux/cgroup-defs.h |   4 ++
 init/Kconfig                |  12 ++++
 kernel/bpf/Makefile         |   1 +
 kernel/bpf/cgroup.c         | 166 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/cgroup.c             |  18 +++++
 6 files changed, 272 insertions(+)
 create mode 100644 include/linux/bpf-cgroup.h
 create mode 100644 kernel/bpf/cgroup.c

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
new file mode 100644
index 0000000..fc076de
--- /dev/null
+++ b/include/linux/bpf-cgroup.h
@@ -0,0 +1,71 @@
+#ifndef _BPF_CGROUP_H
+#define _BPF_CGROUP_H
+
+#include <linux/bpf.h>
+#include <linux/jump_label.h>
+#include <uapi/linux/bpf.h>
+
+struct sock;
+struct cgroup;
+struct sk_buff;
+
+#ifdef CONFIG_CGROUP_BPF
+
+extern struct static_key_false cgroup_bpf_enabled_key;
+#define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
+
+struct cgroup_bpf {
+	/*
+	 * Store two sets of bpf_prog pointers, one for programs that are
+	 * pinned directly to this cgroup, and one for those that are effective
+	 * when this cgroup is accessed.
+	 */
+	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
+	struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
+};
+
+void cgroup_bpf_put(struct cgroup *cgrp);
+void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
+
+void __cgroup_bpf_update(struct cgroup *cgrp,
+			 struct cgroup *parent,
+			 struct bpf_prog *prog,
+			 enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
+void cgroup_bpf_update(struct cgroup *cgrp,
+		       struct bpf_prog *prog,
+		       enum bpf_attach_type type);
+
+int __cgroup_bpf_run_filter(struct sock *sk,
+			    struct sk_buff *skb,
+			    enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled */
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+					struct sk_buff *skb,
+					enum bpf_attach_type type)
+{
+	if (cgroup_bpf_enabled)
+		return __cgroup_bpf_run_filter(sk, skb, type);
+
+	return 0;
+}
+
+#else
+
+struct cgroup_bpf {};
+static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
+				      struct cgroup *parent) {}
+
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+					struct sk_buff *skb,
+					enum bpf_attach_type type)
+{
+	return 0;
+}
+
+#endif /* CONFIG_CGROUP_BPF */
+
+#endif /* _BPF_CGROUP_H */
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5b17de6..861b467 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -16,6 +16,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/workqueue.h>
+#include <linux/bpf-cgroup.h>
 
 #ifdef CONFIG_CGROUPS
 
@@ -300,6 +301,9 @@ struct cgroup {
 	/* used to schedule release agent */
 	struct work_struct release_agent_work;
 
+	/* used to store eBPF programs */
+	struct cgroup_bpf bpf;
+
 	/* ids of the ancestors at each level including self */
 	int ancestor_ids[];
 };
diff --git a/init/Kconfig b/init/Kconfig
index cac3f09..71c71b0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1144,6 +1144,18 @@ config CGROUP_PERF
 
 	  Say N if unsure.
 
+config CGROUP_BPF
+	bool "Support for eBPF programs attached to cgroups"
+	depends on BPF_SYSCALL && SOCK_CGROUP_DATA
+	help
+	  Allow attaching eBPF programs to a cgroup using the bpf(2)
+	  syscall command BPF_PROG_ATTACH.
+
+	  In which context these programs are accessed depends on the type
+	  of attachment. For instance, programs that are attached using
+	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
+	  inet sockets.
+
 config CGROUP_DEBUG
 	bool "Example controller"
 	default n
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index eed911d..b22256b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif
+obj-$(CONFIG_CGROUP_BPF) += cgroup.o
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
new file mode 100644
index 0000000..81b96a4
--- /dev/null
+++ b/kernel/bpf/cgroup.c
@@ -0,0 +1,166 @@
+/*
+ * Functions to manage eBPF programs attached to cgroups
+ *
+ * Copyright (c) 2016 Daniel Mack
+ *
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License.  See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/atomic.h>
+#include <linux/cgroup.h>
+#include <linux/slab.h>
+#include <linux/bpf.h>
+#include <linux/bpf-cgroup.h>
+#include <net/sock.h>
+
+DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
+EXPORT_SYMBOL(cgroup_bpf_enabled_key);
+
+/**
+ * cgroup_bpf_put() - put references of all bpf programs
+ * @cgrp: the cgroup to modify
+ */
+void cgroup_bpf_put(struct cgroup *cgrp)
+{
+	unsigned int type;
+
+	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.prog); type++) {
+		struct bpf_prog *prog = cgrp->bpf.prog[type];
+
+		if (prog) {
+			bpf_prog_put(prog);
+			static_branch_dec(&cgroup_bpf_enabled_key);
+		}
+	}
+}
+
+/**
+ * cgroup_bpf_inherit() - inherit effective programs from parent
+ * @cgrp: the cgroup to modify
+ * @parent: the parent to inherit from
+ */
+void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
+{
+	unsigned int type;
+
+	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.effective); type++) {
+		struct bpf_prog *e;
+
+		e = rcu_dereference_protected(parent->bpf.effective[type],
+					      lockdep_is_held(&cgroup_mutex));
+		rcu_assign_pointer(cgrp->bpf.effective[type], e);
+	}
+}
+
+/**
+ * __cgroup_bpf_update() - Update the pinned program of a cgroup, and
+ *                         propagate the change to descendants
+ * @cgrp: The cgroup which descendants to traverse
+ * @parent: The parent of @cgrp, or %NULL if @cgrp is the root
+ * @prog: A new program to pin
+ * @type: Type of pinning operation (ingress/egress)
+ *
+ * Each cgroup has a set of two pointers for bpf programs: one for the eBPF
+ * programs it owns, and one for the programs that are effective for it.
+ *
+ * If @prog is not %NULL, this function attaches a new program to the cgroup and
+ * releases the one that is currently attached, if any. @prog is then made
+ * the effective program of type @type in that cgroup.
+ *
+ * If @prog is %NULL, the currently attached program of type @type is released,
+ * and the effective program of the parent cgroup (if any) is inherited to
+ * @cgrp.
+ *
+ * Then, the descendants of @cgrp are walked and the effective program for
+ * each of them is set to the effective program of @cgrp unless the
+ * descendant has its own program attached, in which case the subbranch is
+ * skipped. This ensures that delegated subcgroups with own programs are left
+ * untouched.
+ *
+ * Must be called with cgroup_mutex held.
+ */
+void __cgroup_bpf_update(struct cgroup *cgrp,
+			 struct cgroup *parent,
+			 struct bpf_prog *prog,
+			 enum bpf_attach_type type)
+{
+	struct bpf_prog *old_prog, *effective;
+	struct cgroup_subsys_state *pos;
+
+	old_prog = xchg(cgrp->bpf.prog + type, prog);
+
+	effective = (!prog && parent) ?
+		rcu_dereference_protected(parent->bpf.effective[type],
+					  lockdep_is_held(&cgroup_mutex)) :
+		prog;
+
+	css_for_each_descendant_pre(pos, &cgrp->self) {
+		struct cgroup *desc = container_of(pos, struct cgroup, self);
+
+		/* skip the subtree if the descendant has its own program */
+		if (desc->bpf.prog[type] && desc != cgrp)
+			pos = css_rightmost_descendant(pos);
+		else
+			rcu_assign_pointer(desc->bpf.effective[type],
+					   effective);
+	}
+
+	if (prog)
+		static_branch_inc(&cgroup_bpf_enabled_key);
+
+	if (old_prog) {
+		bpf_prog_put(old_prog);
+		static_branch_dec(&cgroup_bpf_enabled_key);
+	}
+}
+
+/**
+ * __cgroup_bpf_run_filter() - Run a program for packet filtering
+ * @sk: The socket sending or receiving traffic
+ * @skb: The skb that is being sent or received
+ * @type: The type of program to be executed
+ *
+ * If no socket is passed, or the socket is not of type INET or INET6,
+ * this function does nothing and returns 0.
+ *
+ * The program type passed in via @type must be suitable for network
+ * filtering. No further check is performed to assert that.
+ *
+ * This function will return %-EPERM if an attached program was found and
+ * it returned != 1 during execution. In all other cases, 0 is returned.
+ */
+int __cgroup_bpf_run_filter(struct sock *sk,
+			    struct sk_buff *skb,
+			    enum bpf_attach_type type)
+{
+	struct bpf_prog *prog;
+	struct cgroup *cgrp;
+	int ret = 0;
+
+	if (!sk || !sk_fullsock(sk))
+		return 0;
+
+	if (sk->sk_family != AF_INET &&
+	    sk->sk_family != AF_INET6)
+		return 0;
+
+	cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+
+	rcu_read_lock();
+
+	prog = rcu_dereference(cgrp->bpf.effective[type]);
+	if (prog) {
+		unsigned int offset = skb->data - skb_network_header(skb);
+
+		__skb_push(skb, offset);
+		ret = bpf_prog_run_clear_cb(prog, skb) == 1 ? 0 : -EPERM;
+		__skb_pull(skb, offset);
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d1c51b7..57ade89 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5038,6 +5038,8 @@ static void css_release_work_fn(struct work_struct *work)
 		if (cgrp->kn)
 			RCU_INIT_POINTER(*(void __rcu __force **)&cgrp->kn->priv,
 					 NULL);
+
+		cgroup_bpf_put(cgrp);
 	}
 
 	mutex_unlock(&cgroup_mutex);
@@ -5245,6 +5247,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	if (!cgroup_on_dfl(cgrp))
 		cgrp->subtree_control = cgroup_control(cgrp);
 
+	if (parent)
+		cgroup_bpf_inherit(cgrp, parent);
+
 	cgroup_propagate_control(cgrp);
 
 	/* @cgrp doesn't have dir yet so the following will only create csses */
@@ -6417,6 +6422,19 @@ static __init int cgroup_namespaces_init(void)
 }
 subsys_initcall(cgroup_namespaces_init);
 
+#ifdef CONFIG_CGROUP_BPF
+void cgroup_bpf_update(struct cgroup *cgrp,
+		       struct bpf_prog *prog,
+		       enum bpf_attach_type type)
+{
+	struct cgroup *parent = cgroup_parent(cgrp);
+
+	mutex_lock(&cgroup_mutex);
+	__cgroup_bpf_update(cgrp, parent, prog, type);
+	mutex_unlock(&cgroup_mutex);
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 #ifdef CONFIG_CGROUP_DEBUG
 static struct cgroup_subsys_state *
 debug_css_alloc(struct cgroup_subsys_state *parent_css)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
  2016-09-19 16:43 [PATCH v6 0/6] Add eBPF hooks for cgroups Daniel Mack
  2016-09-19 16:43 ` [PATCH v6 1/6] bpf: add new prog type for cgroup socket filtering Daniel Mack
@ 2016-09-19 16:43 ` Daniel Mack
  2016-09-19 16:44 ` [PATCH v6 6/6] samples: bpf: add userspace example for attaching eBPF programs to cgroups Daniel Mack
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:43 UTC (permalink / raw)
  To: htejun, daniel, ast
  Cc: davem, kafai, fw, pablo, harald, netdev, sargun, cgroups, Daniel Mack

Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
BPF_PROG_DETACH which allow attaching and detaching eBPF programs
to a target.

On the API level, the target could be anything that has an fd in
userspace, hence the field in union bpf_attr is called 'target_fd'.

When called with BPF_CGROUP_INET_{IN,E}GRESS, the target is
expected to be a valid file descriptor of a cgroup v2 directory which
has the bpf controller enabled. These are the only use-cases
implemented by this patch at this point, but more can be added.

If a program of the given type already exists in the given cgroup,
the program is swapped atomically, so userspace does not have to drop
an existing program first before installing a new one, which would
otherwise leave a gap in which no program is attached.

For more information on the propagation logic to subcgroups, please
refer to the bpf cgroup controller implementation.

The API is guarded by CAP_NET_ADMIN.
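
For reference, a hedged sketch of the expected userspace usage in C; it
mirrors the bpf_prog_attach() wrapper that patch 6/6 adds to
samples/bpf/libbpf.c (the function name and the omitted error handling
here are illustrative only):

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* attach a BPF_PROG_TYPE_CGROUP_SOCKET program to a cgroup v2 directory fd */
  static int attach_prog_to_cgroup(int prog_fd, int cgroup_fd,
                                   enum bpf_attach_type type)
  {
          union bpf_attr attr = {
                  .target_fd     = cgroup_fd, /* fd of the cgroup v2 directory */
                  .attach_bpf_fd = prog_fd,
                  .attach_type   = type,      /* BPF_CGROUP_INET_INGRESS or _EGRESS */
          };

          return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
  }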

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 include/uapi/linux/bpf.h |  8 +++++
 kernel/bpf/syscall.c     | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 55f815e..7cd3616 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -73,6 +73,8 @@ enum bpf_cmd {
 	BPF_PROG_LOAD,
 	BPF_OBJ_PIN,
 	BPF_OBJ_GET,
+	BPF_PROG_ATTACH,
+	BPF_PROG_DETACH,
 };
 
 enum bpf_map_type {
@@ -150,6 +152,12 @@ union bpf_attr {
 		__aligned_u64	pathname;
 		__u32		bpf_fd;
 	};
+
+	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+		__u32		target_fd;	/* container object to attach to */
+		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		__u32		attach_type;
+	};
 } __attribute__((aligned(8)));
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962..1a8592a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -822,6 +822,77 @@ static int bpf_obj_get(const union bpf_attr *attr)
 	return bpf_obj_get_user(u64_to_ptr(attr->pathname));
 }
 
+#ifdef CONFIG_CGROUP_BPF
+
+#define BPF_PROG_ATTACH_LAST_FIELD attach_type
+
+static int bpf_prog_attach(const union bpf_attr *attr)
+{
+	struct bpf_prog *prog;
+	struct cgroup *cgrp;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (CHECK_ATTR(BPF_PROG_ATTACH))
+		return -EINVAL;
+
+	switch (attr->attach_type) {
+	case BPF_CGROUP_INET_INGRESS:
+	case BPF_CGROUP_INET_EGRESS:
+		prog = bpf_prog_get_type(attr->attach_bpf_fd,
+					 BPF_PROG_TYPE_CGROUP_SOCKET);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		cgrp = cgroup_get_from_fd(attr->target_fd);
+		if (IS_ERR(cgrp)) {
+			bpf_prog_put(prog);
+			return PTR_ERR(cgrp);
+		}
+
+		cgroup_bpf_update(cgrp, prog, attr->attach_type);
+		cgroup_put(cgrp);
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+#define BPF_PROG_DETACH_LAST_FIELD attach_type
+
+static int bpf_prog_detach(const union bpf_attr *attr)
+{
+	struct cgroup *cgrp;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (CHECK_ATTR(BPF_PROG_DETACH))
+		return -EINVAL;
+
+	switch (attr->attach_type) {
+	case BPF_CGROUP_INET_INGRESS:
+	case BPF_CGROUP_INET_EGRESS:
+		cgrp = cgroup_get_from_fd(attr->target_fd);
+		if (IS_ERR(cgrp))
+			return PTR_ERR(cgrp);
+
+		cgroup_bpf_update(cgrp, NULL, attr->attach_type);
+		cgroup_put(cgrp);
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+#endif /* CONFIG_CGROUP_BPF */
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -888,6 +959,16 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_OBJ_GET:
 		err = bpf_obj_get(&attr);
 		break;
+
+#ifdef CONFIG_CGROUP_BPF
+	case BPF_PROG_ATTACH:
+		err = bpf_prog_attach(&attr);
+		break;
+	case BPF_PROG_DETACH:
+		err = bpf_prog_detach(&attr);
+		break;
+#endif
+
 	default:
 		err = -EINVAL;
 		break;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 4/6] net: filter: run cgroup eBPF ingress programs
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  2016-09-19 16:43   ` [PATCH v6 2/6] cgroup: add support for eBPF programs Daniel Mack
@ 2016-09-19 16:43   ` Daniel Mack
  2016-09-19 16:44   ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs Daniel Mack
  2016-10-21  5:32     ` David Ahern
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:43 UTC (permalink / raw)
  To: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Mack

If the cgroup associated with the receiving socket has eBPF programs
installed, run them from sk_filter_trim_cap().

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the skb through bpf_skb_load_bytes(), and the payload starts at the
network headers (L3).

Note that cgroup_bpf_run_filter() is stubbed out as a static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key, so the
cost is close to zero if the feature is unused.

Signed-off-by: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
---
 net/core/filter.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index e46c98e..ce6e527 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -78,6 +78,10 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
 	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
 		return -ENOMEM;
 
+	err = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_INGRESS);
+	if (err)
+		return err;
+
 	err = security_sock_rcv_skb(sk, skb);
 	if (err)
 		return err;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  2016-09-19 16:43   ` [PATCH v6 2/6] cgroup: add support for eBPF programs Daniel Mack
  2016-09-19 16:43   ` [PATCH v6 4/6] net: filter: run cgroup eBPF ingress programs Daniel Mack
@ 2016-09-19 16:44   ` Daniel Mack
  2016-09-19 19:19     ` Pablo Neira Ayuso
       [not found]     ` <1474303441-3745-6-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  2016-10-21  5:32     ` David Ahern
  3 siblings, 2 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:44 UTC (permalink / raw)
  To: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Mack

If the cgroup associated with the sending socket has eBPF programs
installed, run them from ip_output(), ip6_output() and
ip_mc_output().

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the skb through bpf_skb_load_bytes(), and the payload starts at the
network headers (L3).

Note that cgroup_bpf_run_filter() is stubbed out as a static inline nop
for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key, so the
cost is close to zero if the feature is unused.

Signed-off-by: Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
---
 net/ipv4/ip_output.c  | 15 +++++++++++++++
 net/ipv6/ip6_output.c |  8 ++++++++
 2 files changed, 23 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 05d1058..3ca3d7a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -74,6 +74,7 @@
 #include <net/checksum.h>
 #include <net/inetpeer.h>
 #include <net/lwtunnel.h>
+#include <linux/bpf-cgroup.h>
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/netfilter_bridge.h>
@@ -303,6 +304,7 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct rtable *rt = skb_rtable(skb);
 	struct net_device *dev = rt->dst.dev;
+	int ret;
 
 	/*
 	 *	If the indicated interface is up and running, send the packet.
@@ -312,6 +314,12 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
 
+	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
+	if (ret) {
+		kfree_skb(skb);
+		return ret;
+	}
+
 	/*
 	 *	Multicasts are looped back for other local users
 	 */
@@ -364,12 +372,19 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct net_device *dev = skb_dst(skb)->dev;
+	int ret;
 
 	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
 
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
 
+	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
+	if (ret) {
+		kfree_skb(skb);
+		return ret;
+	}
+
 	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
 			    net, sk, skb, NULL, dev,
 			    ip_finish_output,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..5dc90aa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -39,6 +39,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#include <linux/bpf-cgroup.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv6.h>
 
@@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct net_device *dev = skb_dst(skb)->dev;
 	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
+	int ret;
 
 	if (unlikely(idev->cnf.disable_ipv6)) {
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
@@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return 0;
 	}
 
+	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
+	if (ret) {
+		kfree_skb(skb);
+		return ret;
+	}
+
 	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING,
 			    net, sk, skb, NULL, dev,
 			    ip6_finish_output,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 6/6] samples: bpf: add userspace example for attaching eBPF programs to cgroups
  2016-09-19 16:43 [PATCH v6 0/6] Add eBPF hooks for cgroups Daniel Mack
  2016-09-19 16:43 ` [PATCH v6 1/6] bpf: add new prog type for cgroup socket filtering Daniel Mack
  2016-09-19 16:43 ` [PATCH v6 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands Daniel Mack
@ 2016-09-19 16:44 ` Daniel Mack
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 16:44 UTC (permalink / raw)
  To: htejun, daniel, ast
  Cc: davem, kafai, fw, pablo, harald, netdev, sargun, cgroups, Daniel Mack

Add a simple userspace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:

 * Create arraymap in kernel with 4 byte keys and 8 byte values

 * Load eBPF program

   The eBPF program accesses the map passed in to store two pieces of
   information. The number of invocations of the program, which maps
   to the number of packets received, is stored to key 0. Key 1 is
   incremented on each iteration by the number of bytes stored in
   the skb.

 * Detach any eBPF program previously attached to the cgroup

 * Attach the new program to the cgroup using BPF_PROG_ATTACH

 * Once a second, read map[0] and map[1] to see how many bytes and
   packets were seen on any socket of tasks in the given cgroup.

The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.

libbpf gained two new wrappers for the new syscall commands.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 samples/bpf/Makefile            |   2 +
 samples/bpf/libbpf.c            |  21 ++++++
 samples/bpf/libbpf.h            |   3 +
 samples/bpf/test_cgrp2_attach.c | 147 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_attach.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 12b7304..e4cdc74 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -22,6 +22,7 @@ hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
+hostprogs-y += test_cgrp2_attach
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
@@ -49,6 +50,7 @@ spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 9969e35..9ce707b 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -104,6 +104,27 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
+int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+	union bpf_attr attr = {
+		.target_fd = target_fd,
+		.attach_bpf_fd = prog_fd,
+		.attach_type = type,
+	};
+
+	return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
+}
+
+int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
+{
+	union bpf_attr attr = {
+		.target_fd = target_fd,
+		.attach_type = type,
+	};
+
+	return syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
+}
+
 int bpf_obj_pin(int fd, const char *pathname)
 {
 	union bpf_attr attr = {
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 364582b..f973241 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -15,6 +15,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, int insn_len,
 		  const char *license, int kern_version);
 
+int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
+int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
+
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
 
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
new file mode 100644
index 0000000..19e4ec0
--- /dev/null
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -0,0 +1,147 @@
+/* eBPF example program:
+ *
+ * - Creates arraymap in kernel with 4 bytes keys and 8 byte values
+ *
+ * - Loads eBPF program
+ *
+ *   The eBPF program accesses the map passed in to store two pieces of
+ *   information. The number of invocations of the program, which maps
+ *   to the number of packets received, is stored to key 0. Key 1 is
+ *   incremented on each iteration by the number of bytes stored in
+ *   the skb.
+ *
+ * - Detaches any eBPF program previously attached to the cgroup
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ *
+ * - Every second, reads map[0] and map[1] to see how many bytes and
+ *   packets were seen on any socket of tasks in the given cgroup.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+
+#include <linux/bpf.h>
+
+#include "libbpf.h"
+
+enum {
+	MAP_KEY_PACKETS,
+	MAP_KEY_BYTES,
+};
+
+static int prog_load(int map_fd, int verdict)
+{
+	struct bpf_insn prog[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* save r6 so it's not clobbered by BPF_CALL */
+
+		/* Count packets */
+		BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_PACKETS), /* r0 = 0 */
+		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
+		BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* load map fd to r1 */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+		BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
+		BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+
+		/* Count bytes */
+		BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_BYTES), /* r0 = 1 */
+		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
+		BPF_LD_MAP_FD(BPF_REG_1, map_fd),
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6, offsetof(struct __sk_buff, len)), /* r1 = skb->len */
+		BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+
+		BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+
+	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCKET,
+			     prog, sizeof(prog), "GPL", 0);
+}
+
+static int usage(const char *argv0)
+{
+	printf("Usage: %s <cg-path> <egress|ingress> [drop]\n", argv0);
+	return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+	int cg_fd, map_fd, prog_fd, key, ret;
+	long long pkt_cnt, byte_cnt;
+	enum bpf_attach_type type;
+	int verdict = 1;
+
+	if (argc < 3)
+		return usage(argv[0]);
+
+	if (strcmp(argv[2], "ingress") == 0)
+		type = BPF_CGROUP_INET_INGRESS;
+	else if (strcmp(argv[2], "egress") == 0)
+		type = BPF_CGROUP_INET_EGRESS;
+	else
+		return usage(argv[0]);
+
+	if (argc > 3 && strcmp(argv[3], "drop") == 0)
+		verdict = 0;
+
+	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (cg_fd < 0) {
+		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY,
+				sizeof(key), sizeof(byte_cnt),
+				256, 0);
+	if (map_fd < 0) {
+		printf("Failed to create map: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	prog_fd = prog_load(map_fd, verdict);
+	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
+
+	if (prog_fd < 0) {
+		printf("Failed to load prog: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	ret = bpf_prog_detach(cg_fd, type);
+	printf("bpf_prog_detach() returned '%s' (%d)\n", strerror(errno), errno);
+
+	ret = bpf_prog_attach(prog_fd, cg_fd, type);
+	if (ret < 0) {
+		printf("Failed to attach prog to cgroup: '%s'\n",
+		       strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	while (1) {
+		key = MAP_KEY_PACKETS;
+		assert(bpf_lookup_elem(map_fd, &key, &pkt_cnt) == 0);
+
+		key = MAP_KEY_BYTES;
+		assert(bpf_lookup_elem(map_fd, &key, &byte_cnt) == 0);
+
+		printf("cgroup received %lld packets, %lld bytes\n",
+		       pkt_cnt, byte_cnt);
+		sleep(1);
+	}
+
+	return EXIT_SUCCESS;
+}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 16:44   ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs Daniel Mack
@ 2016-09-19 19:19     ` Pablo Neira Ayuso
  2016-09-19 19:30       ` Daniel Mack
  2016-09-19 20:13       ` Alexei Starovoitov
       [not found]     ` <1474303441-3745-6-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  1 sibling, 2 replies; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-19 19:19 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun, daniel, ast, davem, kafai, fw, harald, netdev, sargun, cgroups

On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..5dc90aa 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -39,6 +39,7 @@
>  #include <linux/module.h>
>  #include <linux/slab.h>
>  
> +#include <linux/bpf-cgroup.h>
>  #include <linux/netfilter.h>
>  #include <linux/netfilter_ipv6.h>
>  
> @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
>  	struct net_device *dev = skb_dst(skb)->dev;
>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> +	int ret;
>  
>  	if (unlikely(idev->cnf.disable_ipv6)) {
>  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
> @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  		return 0;
>  	}
>  
> +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
> +	if (ret) {
> +		kfree_skb(skb);
> +		return ret;
> +	}

1) If your goal is to filter packets, why so late? The sooner you
   enforce your policy, the less cycles you waste.

Actually, did you look at Google's approach to this problem?  They
want to control this at socket level, so you restrict what the process
can actually bind. That is enforcing the policy way before you even
send packets. On top of that, what they submitted is infrastructured
so any process with CAP_NET_ADMIN can access that policy that is being
applied and fetch a readable policy through kernel interface.

2) This will turn the stack into a nightmare to debug I predict. If
   any process with CAP_NET_ADMIN can potentially attach bpf blobs
   via these hooks, we will have to include in the network stack
   traveling documentation something like: "Probably you have to check
   that your orchestrator is not dropping your packets for some
   reason". So I wonder how users will debug this and how the policy that
   your orchestrator applies will be exposed to userspace.

>  	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING,
>  			    net, sk, skb, NULL, dev,
>  			    ip6_finish_output,
> -- 
> 2.5.5
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 19:19     ` Pablo Neira Ayuso
@ 2016-09-19 19:30       ` Daniel Mack
       [not found]         ` <ac88bb4c-ab7c-1f74-c7fd-79e523b50ae4-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  2016-09-19 20:13       ` Alexei Starovoitov
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 19:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun, daniel, ast, davem, kafai, fw, harald, netdev, sargun, cgroups

On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 6001e78..5dc90aa 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -39,6 +39,7 @@
>>  #include <linux/module.h>
>>  #include <linux/slab.h>
>>  
>> +#include <linux/bpf-cgroup.h>
>>  #include <linux/netfilter.h>
>>  #include <linux/netfilter_ipv6.h>
>>  
>> @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>  {
>>  	struct net_device *dev = skb_dst(skb)->dev;
>>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
>> +	int ret;
>>  
>>  	if (unlikely(idev->cnf.disable_ipv6)) {
>>  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
>> @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>>  		return 0;
>>  	}
>>  
>> +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
>> +	if (ret) {
>> +		kfree_skb(skb);
>> +		return ret;
>> +	}
> 
> 1) If your goal is to filter packets, why so late? The sooner you
>    enforce your policy, the less cycles you waste.
> 
> Actually, did you look at Google's approach to this problem?  They
> want to control this at socket level, so you restrict what the process
> can actually bind. That is enforcing the policy way before you even
> send packets. On top of that, what they submitted is infrastructured
> so any process with CAP_NET_ADMIN can access that policy that is being
> applied and fetch a readable policy through kernel interface.

Yes, I've seen what they propose, but I want this approach to support
accounting, and so the code has to look at each and every packet in
order to count bytes and packets. Do you know of any better place to put
the hook then?

That said, I can well imagine more hooks types that also operate at port
bind time. That would be easy to add on top.

> 2) This will turn the stack into a nightmare to debug I predict. If
>    any process with CAP_NET_ADMIN can potentially attach bpf blobs
>    via these hooks, we will have to include in the network stack
>    traveling documentation something like: "Probably you have to check
>    that your orchestrator is not dropping your packets for some
>    reason". So I wonder how users will debug this and how the policy that
>    your orchestrator applies will be exposed to userspace.

Sure, every new limitation mechanism adds another knob to look at if
things don't work. But apart from taking care at userspace level to make
the behavior as obvious as possible, I'm open to suggestions of how to
improve the transparency of attached eBPF programs on the kernel side.


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 19:19     ` Pablo Neira Ayuso
  2016-09-19 19:30       ` Daniel Mack
@ 2016-09-19 20:13       ` Alexei Starovoitov
  2016-09-19 20:39         ` Pablo Neira Ayuso
       [not found]         ` <20160919201322.GA84770-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
  1 sibling, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2016-09-19 20:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Sep 19, 2016 at 09:19:10PM +0200, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index 6001e78..5dc90aa 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/module.h>
> >  #include <linux/slab.h>
> >  
> > +#include <linux/bpf-cgroup.h>
> >  #include <linux/netfilter.h>
> >  #include <linux/netfilter_ipv6.h>
> >  
> > @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  {
> >  	struct net_device *dev = skb_dst(skb)->dev;
> >  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> > +	int ret;
> >  
> >  	if (unlikely(idev->cnf.disable_ipv6)) {
> >  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
> > @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  		return 0;
> >  	}
> >  
> > +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
> > +	if (ret) {
> > +		kfree_skb(skb);
> > +		return ret;
> > +	}
> 
> 1) If your goal is to filter packets, why so late? The sooner you
>    enforce your policy, the less cycles you waste.
> 
> Actually, did you look at Google's approach to this problem?  They
> want to control this at socket level, so you restrict what the process
> can actually bind. That is enforcing the policy way before you even
> send packets. On top of that, what they submitted is infrastructured
> so any process with CAP_NET_ADMIN can access that policy that is being
> applied and fetch a readable policy through kernel interface.
> 
> 2) This will turn the stack into a nightmare to debug I predict. If
>    any process with CAP_NET_ADMIN can potentially attach bpf blobs
>    via these hooks, we will have to include in the network stack

a process without CAP_NET_ADMIN can attach bpf blobs to
system calls via seccomp. bpf is already used for security and policing.

>    traveling documentation something like: "Probably you have to check
>    that your orchestrator is not dropping your packets for some
>    reason". So I wonder how users will debug this and how the policy that
>    your orchestrator applies will be exposed to userspace.

as far as bpf debuggability/visibility there are various efforts on the way:
for kernel side:
- ksym for jit-ed programs
- hash sum for prog code
- compact type information for maps and various pretty printers
- data flow analysis of the programs
for user space:
- from bpf asm reconstruct the program in the high level language
  (there is p4 to bpf, this effort is about bpf to p4)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]         ` <ac88bb4c-ab7c-1f74-c7fd-79e523b50ae4-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
@ 2016-09-19 20:35           ` Pablo Neira Ayuso
  2016-09-19 20:56             ` Daniel Mack
  0 siblings, 1 reply; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-19 20:35 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Mon, Sep 19, 2016 at 09:30:02PM +0200, Daniel Mack wrote:
> On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:
> > On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:
> >> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >> index 6001e78..5dc90aa 100644
> >> --- a/net/ipv6/ip6_output.c
> >> +++ b/net/ipv6/ip6_output.c
> >> @@ -39,6 +39,7 @@
> >>  #include <linux/module.h>
> >>  #include <linux/slab.h>
> >>  
> >> +#include <linux/bpf-cgroup.h>
> >>  #include <linux/netfilter.h>
> >>  #include <linux/netfilter_ipv6.h>
> >>  
> >> @@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  {
> >>  	struct net_device *dev = skb_dst(skb)->dev;
> >>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> >> +	int ret;
> >>  
> >>  	if (unlikely(idev->cnf.disable_ipv6)) {
> >>  		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
> >> @@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >>  		return 0;
> >>  	}
> >>  
> >> +	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
> >> +	if (ret) {
> >> +		kfree_skb(skb);
> >> +		return ret;
> >> +	}
> > 
> > 1) If your goal is to filter packets, why so late? The sooner you
> >    enforce your policy, the less cycles you waste.
> > 
> > Actually, did you look at Google's approach to this problem?  They
> > want to control this at socket level, so you restrict what the process
> > can actually bind. That is enforcing the policy way before you even
> > send packets. On top of that, what they submitted is infrastructured
> > so any process with CAP_NET_ADMIN can access that policy that is being
> > applied and fetch a readable policy through kernel interface.
> 
> Yes, I've seen what they propose, but I want this approach to support
> accounting, and so the code has to look at each and every packet in
> order to count bytes and packets. Do you know of any better place to put
> the hook then?

Accounting is part of the usecase that fits into the "network
introspection" idea that has been mentioned here, so you can achieve
this by adding a hook that returns no verdict, so this becomes similar
to the tracing infrastructure.

> That said, I can well imagine more hooks types that also operate at port
> bind time. That would be easy to add on top.

Filtering packets with cgroups is braindead.

You have the means to ensure that processes send no packets via
restricting port binding, there is no reason to do this any later for
locally generated traffic.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 20:13       ` Alexei Starovoitov
@ 2016-09-19 20:39         ` Pablo Neira Ayuso
       [not found]         ` <20160919201322.GA84770-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
  1 sibling, 0 replies; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-19 20:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Mack, htejun, daniel, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups

On Mon, Sep 19, 2016 at 01:13:27PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 19, 2016 at 09:19:10PM +0200, Pablo Neira Ayuso wrote:
[...]
> > 2) This will turn the stack into a nightmare to debug I predict. If
> >    any process with CAP_NET_ADMIN can potentially attach bpf blobs
> >    via these hooks, we will have to include in the network stack
> 
> a process without CAP_NET_ADMIN can attach bpf blobs to
> system calls via seccomp. bpf is already used for security and policing.

That is a local mechanism, it applies to parent process and child
processes, just like SO_ATTACH_FILTER.

The usecase that we're discussing here enforces a global policy.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 20:35           ` Pablo Neira Ayuso
@ 2016-09-19 20:56             ` Daniel Mack
  2016-09-20 14:29               ` Pablo Neira Ayuso
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Mack @ 2016-09-19 20:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 09/19/2016 10:35 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 09:30:02PM +0200, Daniel Mack wrote:
>> On 09/19/2016 09:19 PM, Pablo Neira Ayuso wrote:

>>> Actually, did you look at Google's approach to this problem?  They
>>> want to control this at socket level, so you restrict what the process
>>> can actually bind. That is enforcing the policy way before you even
>>> send packets. On top of that, what they submitted is infrastructured
>>> so any process with CAP_NET_ADMIN can access that policy that is being
>>> applied and fetch a readable policy through kernel interface.
>>
>> Yes, I've seen what they propose, but I want this approach to support
>> accounting, and so the code has to look at each and every packet in
>> order to count bytes and packets. Do you know of any better place to put
>> the hook then?
> 
> Accounting is part of the usecase that fits into the "network
> introspection" idea that has been mentioned here, so you can achieve
> this by adding a hook that returns no verdict, so this becomes similar
> to the tracing infrastructure.

Why would we artificially limit the use cases of this implementation
when, the way it stands, both filtering and introspection are possible?

> Filtering packets with cgroups is braindead.

Filtering is done via eBPF, and cgroups are just the containers. I don't
see what's brain-dead in that approach. After all, accessing the cgroup
once we have a local socket is really fast, so the idea is kinda obvious.

> You have the means to ensure that processes send no packets via
> restricting port binding, there is no reason to do this any later for
> locally generated traffic.

Yes, restricting port binding can be done on top, if people are worried
about the performance overhead of a per-packet program.



Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]         ` <20160919201322.GA84770-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
@ 2016-09-19 21:28           ` Thomas Graf
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Graf @ 2016-09-19 21:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Pablo Neira Ayuso, Daniel Mack, htejun-b10kYP2dOMg,
	daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 09/19/16 at 01:13pm, Alexei Starovoitov wrote:
> as far as bpf debuggability/visibility there are various efforts on the way:
> for kernel side:
> - ksym for jit-ed programs
> - hash sum for prog code
> - compact type information for maps and various pretty printers
> - data flow analysis of the programs

We made a generic map tool available here as well:
https://github.com/cilium/cilium/blob/master/bpf/go/map_ctrl.go

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]     ` <1474303441-3745-6-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
@ 2016-09-20  5:44       ` kbuild test robot
  0 siblings, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2016-09-20  5:44 UTC (permalink / raw)
  To: Daniel Mack
  Cc: kbuild-all-JC7UmRfGjtg, htejun-b10kYP2dOMg,
	daniel-FeC+5ew28dpmcu3hnIyYJQ, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Daniel Mack

[-- Attachment #1: Type: text/plain, Size: 1039 bytes --]

Hi Daniel,

[auto build test ERROR on next-20160919]
[cannot apply to linus/master linux/master net/master v4.8-rc7 v4.8-rc6 v4.8-rc5 v4.8-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Daniel-Mack/Add-eBPF-hooks-for-cgroups/20160920-010551
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

>> ERROR: "__cgroup_bpf_run_filter" [net/ipv6/ipv6.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56527 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-19 20:56             ` Daniel Mack
@ 2016-09-20 14:29               ` Pablo Neira Ayuso
  2016-09-20 16:43                 ` Daniel Mack
  2016-09-20 16:53                 ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF " Thomas Graf
  0 siblings, 2 replies; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-20 14:29 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun, daniel, ast, davem, kafai, fw, harald, netdev, sargun, cgroups

On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
[...]
> Why would we artificially limit the use-cases of this implementation if
> the way it stands, both filtering and introspection are possible?

Why should we place infrastructure in the kernel to filter packets so
late, and why at postrouting btw, when we can do this way earlier
before any packet is actually sent? No performance impact, no need for
skbuff allocation and *no cycles wasted to evaluate if every packet is
wanted or not*.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-20 14:29               ` Pablo Neira Ayuso
@ 2016-09-20 16:43                 ` Daniel Mack
       [not found]                   ` <6584b975-fa3e-8d98-f0c7-a2c6b194b2b6-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
  2016-09-20 16:53                 ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF " Thomas Graf
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Mack @ 2016-09-20 16:43 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hi Pablo,

On 09/20/2016 04:29 PM, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
> [...]
>> Why would we artificially limit the use-cases of this implementation if
>> the way it stands, both filtering and introspection are possible?
> 
> Why should we place infrastructure in the kernel to filter packets so
> late, and why at postrouting btw, when we can do this way earlier
> before any packet is actually sent?

The point is that from an application's perspective, restricting the
ability to bind a port and dropping packets that are being sent is a
very different thing. Applications will start to behave differently if
they can't bind to a port, and that's something we do not want to happen.

Looking at packets and making a verdict on them is the only way to
implement what we have in mind. Given that's in line with what netfilter
does, it can't be all that wrong, can it?

> No performance impact, no need for
> skbuff allocation and *no cycles wasted to evaluate if every packet is
> wanted or not*.

Hmm, not sure why this keeps coming up. As I said - for accounting,
there is no other option than to look at every packet and its size.

Regarding the performance concerns, are you saying a netfilter based
implementation that uses counters for that purpose would be more
efficient? Because in my tests, just loading the netfilter modules with
no rules in place at all has more impact than running the code from 6/6
on every packet.

As stated before, I see no reason why we shouldn't have a netfilter
based implementation that can achieve the same, function-wise. And I
would also like to compare their throughput.


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-20 14:29               ` Pablo Neira Ayuso
  2016-09-20 16:43                 ` Daniel Mack
@ 2016-09-20 16:53                 ` Thomas Graf
  1 sibling, 0 replies; 29+ messages in thread
From: Thomas Graf @ 2016-09-20 16:53 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 09/20/16 at 04:29pm, Pablo Neira Ayuso wrote:
> On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
> [...]
> > Why would we artificially limit the use-cases of this implementation if
> > the way it stands, both filtering and introspection are possible?
> 
> Why should we place infrastructure in the kernel to filter packets so
> late, and why at postrouting btw, when we can do this way earlier
> before any packet is actually sent? No performance impact, no need for
> skbuff allocation and *no cycles wasted to evaluate if every packet is
> wanted or not*.

I won't argue against filtering at the socket level; it is very valuable.
A difference is the transparency of enforcement. Dropping at the
networking level is accounted for in the usual counters and well
understood by admins. "Dropping" at socket bind level would either
involve returning an error to the application (which may change the
application's behaviour) or require a new form of accounting.

I also want to see the packet size, the selected source address and the
outgoing device, and to attach tunnel key metadata to the skb based on
the cgroup. None of these are available at the socket level.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]                   ` <6584b975-fa3e-8d98-f0c7-a2c6b194b2b6-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
@ 2016-09-21 15:45                     ` Pablo Neira Ayuso
  2016-09-21 18:48                       ` Thomas Graf
  0 siblings, 1 reply; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-21 15:45 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hi Daniel,

On Tue, Sep 20, 2016 at 06:43:35PM +0200, Daniel Mack wrote:
> Hi Pablo,
> 
> On 09/20/2016 04:29 PM, Pablo Neira Ayuso wrote:
> > On Mon, Sep 19, 2016 at 10:56:14PM +0200, Daniel Mack wrote:
> > [...]
> >> Why would we artificially limit the use-cases of this implementation if
> >> the way it stands, both filtering and introspection are possible?
> > 
> > Why should we place infrastructure in the kernel to filter packets so
> > late, and why at postrouting btw, when we can do this way earlier
> > before any packet is actually sent?
> 
> The point is that from an application's perspective, restricting the
> ability to bind a port and dropping packets that are being sent is a
> very different thing. Applications will start to behave differently if
> they can't bind to a port, and that's something we do not want to happen.

What exactly is the problem? Applications are not checking the return
value from bind()? They should be fixed. If you want to collect
statistics, I see no reason why you couldn't collect them for every
EACCES returned by a bind() call.

> Looking at packets and making a verdict on them is the only way to
> implement what we have in mind. Given that's in line with what netfilter
> does, it can't be all that wrong, can it?

That output hook was added ~20 years ago... At that time we didn't
have anything better than dropping locally generated packets. Today we
can probably do something better.

> > No performance impact, no need for
> > skbuff allocation and *no cycles wasted to evaluate if every packet is
> > wanted or not*.
> 
> Hmm, not sure why this keeps coming up. As I said - for accounting,
> there is no other option than to look at every packet and its size.
> 
> Regarding the performance concerns, are you saying a netfilter based
> implementation that uses counters for that purpose would be more
> efficient?

> Because in my tests, just loading the netfilter modules with no
> rules in place at all has more impact than running the code from 6/6
> on every packet.

You must be talking about iptables. When did you test this? We now have
on-demand per-netns hook registration; in any case, in nftables you only
register what you need.

Every time you mention performance, it sounds like there is no room to
improve what we have... and we do have room and ideas to get this flying
faster, while keeping in mind good integration with our generic network
stack and extensible interfaces, which is important too. On top of that,
I have started working on some preliminary patches to add an nftables
JIT and will be talking about this during the NetDev 1.2
netfilter/nftables workshop. I would expect numbers close to what you're
observing with this solution.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-21 15:45                     ` Pablo Neira Ayuso
@ 2016-09-21 18:48                       ` Thomas Graf
       [not found]                         ` <20160921184827.GA15732-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Graf @ 2016-09-21 18:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun, daniel, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups

On 09/21/16 at 05:45pm, Pablo Neira Ayuso wrote:
> On Tue, Sep 20, 2016 at 06:43:35PM +0200, Daniel Mack wrote:
> > The point is that from an application's perspective, restricting the
> > ability to bind a port and dropping packets that are being sent is a
> > very different thing. Applications will start to behave differently if
> > they can't bind to a port, and that's something we do not want to happen.
> 
> What is exactly the problem? Applications are not checking for return
> value from bind? They should be fixed. If you want to collect
> statistics, I see no reason why you couldn't collect them for every
> EACCESS on each bind() call.

It's not about applications not checking the return value of bind().
Unfortunately, many applications (or the respective libraries they use)
retry on connect() failure but handle bind() errors as a hard failure
and exit. Yes, it's an application or library bug, but these
applications have very specific expectations about how something fails.
Sometimes even going from drop to RST will break applications.
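
To make that pattern concrete, here is a minimal user-space sketch
(purely illustrative, not taken from any real application; addresses,
ports and the retry loop are arbitrary) of code that retries connect()
but treats a bind() error as fatal:

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in local = {
                .sin_family = AF_INET,
                .sin_port   = htons(5000),      /* fixed source port */
        };
        struct sockaddr_in remote = {
                .sin_family = AF_INET,
                .sin_port   = htons(80),
        };
        int fd, i;

        remote.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
                exit(EXIT_FAILURE);

        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
                /* hard failure: many applications simply log and exit */
                fprintf(stderr, "bind: %s\n", strerror(errno));
                exit(EXIT_FAILURE);
        }

        for (i = 0; i < 5; i++) {
                /* connect() failures, in contrast, are commonly retried
                 * (simplified here for illustration) */
                if (connect(fd, (struct sockaddr *)&remote,
                            sizeof(remote)) == 0)
                        break;
                sleep(1);
        }

        close(fd);
        return 0;
}

A policy that suddenly makes bind() return EACCES terminates such a
program, whereas dropping its packets later on would not.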

Paranoia speaking: by returning errors where no error was returned
before, undefined behaviour occurs. In Murphy speak: things break.

This is a given and we can't fix it from the kernel side. Returning
errors at the system call level has many benefits, but it's not always
an option.

Adding the late hook does not prevent filtering at the socket layer from
also being added. I think we need both.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]                         ` <20160921184827.GA15732-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
@ 2016-09-22  9:21                           ` Pablo Neira Ayuso
  2016-09-22  9:54                             ` Thomas Graf
  0 siblings, 1 reply; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-22  9:21 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Wed, Sep 21, 2016 at 08:48:27PM +0200, Thomas Graf wrote:
> On 09/21/16 at 05:45pm, Pablo Neira Ayuso wrote:
> > On Tue, Sep 20, 2016 at 06:43:35PM +0200, Daniel Mack wrote:
> > > The point is that from an application's perspective, restricting the
> > > ability to bind a port and dropping packets that are being sent is a
> > > very different thing. Applications will start to behave differently if
> > > they can't bind to a port, and that's something we do not want to happen.
> > 
> > What is exactly the problem? Applications are not checking for return
> > value from bind? They should be fixed. If you want to collect
> > statistics, I see no reason why you couldn't collect them for every
> > EACCESS on each bind() call.
> 
> It's not about applications not checking the return value of bind().
> Unfortunately, many applications (or the respective libraries they use)
> retry on connect() failure but handle bind() errors as a hard failure
> and exit. Yes, it's an application or library bug but these
> applications have very specific exceptions how something fails.
> Sometimes even going from drop to RST will break applications.
> 
> Paranoia speaking: by returning errors where no error was returned
> before, undefined behaviour occurs. In Murphy speak: things break.
> 
> This is given and we can't fix it from the kernel side. Returning at
> system call level has many benefits but it's not always an option.
> 
> Adding the late hook does not prevent filtering at socket layer to
> also be added. I think we need both.

I have a hard time buying this new specific hook. I think we should
shift the focus of this debate; this is my proposal to untangle it:

You add a net/netfilter/nft_bpf.c expression that allows you to run
bpf programs from nf_tables. This expression can either run bpf
programs in a similar fashion to tc+bpf or run the bpf program that
you have attached to the cgroup.

To achieve this, I'd suggest you also add a new bpf chain type. That
new chain type would basically provide raw access to the netfilter
hooks via the nf_tables netlink interface. This bpf chain would
exclusively take rules that use this new bpf expression.
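
Purely as a sketch of the idea (hypothetical code, not part of any
existing tree; the nft_bpf names are invented here and the rule netlink
attributes, program lookup and reference counting are left out), the
eval step of such an expression could look roughly like this:

#include <linux/filter.h>
#include <linux/netfilter.h>
#include <net/netfilter/nf_tables.h>

struct nft_bpf {
        struct bpf_prog *prog;  /* resolved at rule insertion time */
};

static void nft_bpf_eval(const struct nft_expr *expr,
                         struct nft_regs *regs,
                         const struct nft_pktinfo *pkt)
{
        const struct nft_bpf *priv = nft_expr_priv(expr);

        /* Run the attached BPF program on the skb and translate its
         * return value into an nf_tables verdict. */
        if (BPF_PROG_RUN(priv->prog, pkt->skb))
                regs->verdict.code = NFT_CONTINUE;
        else
                regs->verdict.code = NF_DROP;
}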

I see good things on this proposal:

* This is consistent to what we offer via tc+bpf.

* It becomes easily visible to the user that a bpf program is running
  in the packet path, or that any cgroup+bpf filtering is going on. Thus,
  no matter what those orchestrators do, this filtering becomes
  visible to sysadmins who are familiar with the existing command-line
  tooling.

* You get access to all of the existing netfilter hooks in one go.

A side note on this: I would suggest this conversation focus on
discussing aspects at a slightly higher level rather than counting raw
load and store instructions... I think this effort requires looking
at the whole forest instead of barking at one single tree. Genericity
always comes at a slight cost, and to all the programmability fans
here, please remember that we have a generic stack on our hands after
all. So let's try to accommodate these new requirements in a way that
makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-22  9:21                           ` Pablo Neira Ayuso
@ 2016-09-22  9:54                             ` Thomas Graf
       [not found]                               ` <20160922095411.GA5654-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Graf @ 2016-09-22  9:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Daniel Mack, htejun, daniel, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups

On 09/22/16 at 11:21am, Pablo Neira Ayuso wrote:
> I have a hard time to buy this new specific hook, I think we should
> shift focus of this debate, this is my proposal to untangle this:
> 
> You add a net/netfilter/nft_bpf.c expression that allows you to run
> bpf programs from nf_tables. This expression can either run bpf
> programs in a similar fashion to tc+bpf or run the bpf program that
> you have attached to the cgroup.

So for every packet processed, you want to require the user to load
and run an (unJITed) nft program acting as a wrapper to run a JITed
BPF program? What is the benefit of this model compared to what Daniel
is proposing? The hooking point is the same. This only introduces
additional per-packet overhead in the fast path. Am I missing something?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]                               ` <20160922095411.GA5654-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
@ 2016-09-22 12:05                                 ` Pablo Neira Ayuso
  2016-09-22 15:12                                   ` Daniel Borkmann
  0 siblings, 1 reply; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-22 12:05 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Sep 22, 2016 at 11:54:11AM +0200, Thomas Graf wrote:
> On 09/22/16 at 11:21am, Pablo Neira Ayuso wrote:
> > I have a hard time to buy this new specific hook, I think we should
> > shift focus of this debate, this is my proposal to untangle this:
> >
> > You add a net/netfilter/nft_bpf.c expression that allows you to run
> > bpf programs from nf_tables. This expression can either run bpf
> > programs in a similar fashion to tc+bpf or run the bpf program that
> > you have attached to the cgroup.
>
> So for every packet processed, you want to require the user to load
> and run a (unJITed) nft program acting as a wrapper to run a JITed
> BPF program? What it the benefit of this model compared to what Daniel
> is proposing? The hooking point is the same. This only introduces
> additional per packet overhead in the fast path. Am I missing
> something?

Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
In your case, you have to add a new chain type:

static const struct nf_chain_type nft_chain_bpf = {
        .name           = "bpf",
        .type           = NFT_CHAIN_T_BPF,
        ...
        .hooks          = {
                [NF_INET_LOCAL_IN]      = nft_do_bpf,
                [NF_INET_LOCAL_OUT]     = nft_do_bpf,
                [NF_INET_FORWARD]       = nft_do_bpf,
                [NF_INET_PRE_ROUTING]   = nft_do_bpf,
                [NF_INET_POST_ROUTING]  = nft_do_bpf,
        },
};

nft_do_bpf() is the raw netfilter hook that you register; this hook
will simply iterate over the list of bpf filters and run them.
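
As a rough sketch (hypothetical code; the nft_bpf_chain/nft_bpf_elem
structures and the drop-on-zero convention below are assumptions made
only for illustration), nft_do_bpf() could look something like:

#include <linux/filter.h>
#include <linux/list.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>

struct nft_bpf_elem {
        struct list_head list;
        struct bpf_prog *prog;          /* one attached program per rule */
};

struct nft_bpf_chain {
        struct list_head progs;
};

static unsigned int nft_do_bpf(void *priv, struct sk_buff *skb,
                               const struct nf_hook_state *state)
{
        struct nft_bpf_chain *chain = priv;
        struct nft_bpf_elem *elem;

        /* Walk the per-chain list of BPF programs and drop the packet
         * as soon as one of them rejects it. A return value of 0 is
         * treated as "drop" here; the exact convention would have to
         * be defined by the chain type. */
        list_for_each_entry(elem, &chain->progs, list) {
                if (BPF_PROG_RUN(elem->prog, skb) == 0)
                        return NF_DROP;
        }

        return NF_ACCEPT;
}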

This new chain is created on demand, so no overhead if not needed, eg.

nft add table bpf
nft add chain bpf input { type bpf hook output priority 0\; }

Then, you add a rule for each bpf program you want to run, just like
tc+bpf.

Benefits are, rewording previous email:

* You get access to all of the existing netfilter hooks in one go
  to run bpf programs. No need for specific redundant hooks. This
  provides raw access to the netfilter hook; you define the little
  code that your hook runs before your bpf program invocation. So there
  is *no need to bloat the stack with more hooks, we use what we
  have.*

* This is consistent to what we offer via tc+bpf, similar design idea.
  Users are already familiar with this approach.

* It becomes easily visible to the user that a bpf program is running
  from wherever in the packet path, so from a sysadmin perspective it
  is easy to dump the configuration via the netlink interface using the
  existing tooling in case troubleshooting is required.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
  2016-09-22 12:05                                 ` Pablo Neira Ayuso
@ 2016-09-22 15:12                                   ` Daniel Borkmann
       [not found]                                     ` <57E3F4F9.70300-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Borkmann @ 2016-09-22 15:12 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Thomas Graf
  Cc: Daniel Mack, htejun, ast, davem, kafai, fw, harald, netdev,
	sargun, cgroups

On 09/22/2016 02:05 PM, Pablo Neira Ayuso wrote:
> On Thu, Sep 22, 2016 at 11:54:11AM +0200, Thomas Graf wrote:
>> On 09/22/16 at 11:21am, Pablo Neira Ayuso wrote:
>>> I have a hard time to buy this new specific hook, I think we should
>>> shift focus of this debate, this is my proposal to untangle this:
>>>
>>> You add a net/netfilter/nft_bpf.c expression that allows you to run
>>> bpf programs from nf_tables. This expression can either run bpf
>>> programs in a similar fashion to tc+bpf or run the bpf program that
>>> you have attached to the cgroup.
>>
>> So for every packet processed, you want to require the user to load
>> and run a (unJITed) nft program acting as a wrapper to run a JITed
>> BPF program? What it the benefit of this model compared to what Daniel
>> is proposing? The hooking point is the same. This only introduces
>> additional per packet overhead in the fast path. Am I missing
>> something?
>
> Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
> In your case, you have to add a new chain type:
>
> static const struct nf_chain_type nft_chain_bpf = {
>          .name           = "bpf",
>          .type           = NFT_CHAIN_T_BPF,
>          ...
>          .hooks          = {
>                  [NF_INET_LOCAL_IN]      = nft_do_bpf,
>                  [NF_INET_LOCAL_OUT]     = nft_do_bpf,
>                  [NF_INET_FORWARD]       = nft_do_bpf,
>                  [NF_INET_PRE_ROUTING]   = nft_do_bpf,
>                  [NF_INET_POST_ROUTING]  = nft_do_bpf,
>          },
> };
>
> nft_do_bpf() is the raw netfilter hook that you register, this hook
> will just execute to iterate over the list of bpf filters and run
> them.
>
> This new chain is created on demand, so no overhead if not needed, eg.
>
> nft add table bpf
> nft add chain bpf input { type bpf hook output priority 0\; }
>
> Then, you add a rule for each bpf program you want to run, just like
> tc+bpf.

But from a high-level point of view, this sounds like a huge hack to me,
in the sense that nft as a bytecode engine (I mean the whole
architecture) calls into another bytecode engine such as BPF as an
extension. The BPF code invoked from there doesn't use any of the
features of nft besides being called from the hook, yet nft becomes a
hard dependency that has to be compiled into the kernel when the only
thing that is needed is to load and execute a BPF program that can
already do everything needed by itself.

Also, if we look at history, such things have been tried in the past -
just take tc calling into iptables as an example - and I get goosebumps
(and probably you do as well ;)), as the result looks worse than it
would have if the two had just been kept separate.
I don't quite understand this detour. I mean, I would understand it if
the idea were that nft uses BPF as an intermediate representation to
make use of all the already existing JIT/offloading features developed
over the years, so as not to duplicate all that work for arch folks,
but that is off-topic in this discussion. I was hoping that nft would
try to avoid some of those exotic modules we have from xt; I would
consider xt_bpf (no offense ;)) one of them, in the sense that in the
xt context/model back then it made sense, as most xt modules were kind
of hard-coded, but that was overcome with nft itself. So I'm not really
keen on the nft_bpf idea, especially as it doesn't use anything other
than the hooks themselves for what is proposed. Really, these are two
different worlds with different programming models and use cases, and
it doesn't really make sense to brute-force them together.

> Benefits are, rewording previous email:
>
> * You get access to all of the existing netfilter hooks in one go
>    to run bpf programs. No need for specific redundant hooks. This
>    provides raw access to the netfilter hook, you define the little
>    code that your hook runs before you bpf run invocation. So there
>    is *no need to bloat the stack with more hooks, we use what we
>    have.*

But this also doesn't really address the fundamental underlying problem
that is discussed here. nft doesn't even have cgroups v2 support, only
xt_cgroup has it so far, and even if it did, this model would still have
a scalability issue compared with what is being proposed by Daniel,
since you still need to test linearly for cgroups v2 membership, whereas
in the proposed set it's an integral part of cgroups and can be extended
further, also for non-networking users of this facility.
Or would the idea be that the current netfilter hooks are redone in
a way that makes them generic enough for any other user to make use
of them independent of netfilter?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
       [not found]                                     ` <57E3F4F9.70300-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
@ 2016-09-22 15:53                                       ` Daniel Mack
  2016-09-23 13:17                                       ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf " Pablo Neira Ayuso
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Mack @ 2016-09-22 15:53 UTC (permalink / raw)
  To: Daniel Borkmann, Pablo Neira Ayuso, Thomas Graf
  Cc: htejun-b10kYP2dOMg, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 09/22/2016 05:12 PM, Daniel Borkmann wrote:
> On 09/22/2016 02:05 PM, Pablo Neira Ayuso wrote:

>> Benefits are, rewording previous email:
>>
>> * You get access to all of the existing netfilter hooks in one go
>>    to run bpf programs. No need for specific redundant hooks. This
>>    provides raw access to the netfilter hook, you define the little
>>    code that your hook runs before you bpf run invocation. So there
>>    is *no need to bloat the stack with more hooks, we use what we
>>    have.*
> 
> But also this doesn't really address the fundamental underlying problem
> that is discussed here. nft doesn't even have cgroups v2 support, only
> xt_cgroups has it so far, but even if it would have it, then it's still
> a scalability issue that this model has over what is being proposed by
> Daniel, since you still need to test linearly wrt cgroups v2 membership,
> whereas in the set that is proposed it's integral part of cgroups and can
> be extended further, also for non-networking users to use this facility.
> Or would the idea be that the current netfilter hooks would be redone in
> a way that they are generic enough so that any other user could make use
> of it independent of netfilter?

Yes, that part I don't understand either.

Pablo, could you outline in more detail (in terms of syscalls, commands,
resulting nftables layout etc) how your proposed model would support
having per-cgroup byte and packet counters for both ingress and egress,
and filtering at least for ingress?

And how would that mitigate the race gaps you have been worried about,
between cgroup creation and filters taking effect for a task?


Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf egress programs
       [not found]                                     ` <57E3F4F9.70300-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
  2016-09-22 15:53                                       ` Daniel Mack
@ 2016-09-23 13:17                                       ` Pablo Neira Ayuso
  2016-09-26 10:10                                         ` Daniel Borkmann
  1 sibling, 1 reply; 29+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-23 13:17 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Thomas Graf, Daniel Mack, htejun-b10kYP2dOMg, ast-b10kYP2dOMg,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, harald-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, sargun-GaZTRHToo+CzQB+pC5nmwQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Thu, Sep 22, 2016 at 05:12:57PM +0200, Daniel Borkmann wrote:
> On 09/22/2016 02:05 PM, Pablo Neira Ayuso wrote:
[...]
> >Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
> >In your case, you have to add a new chain type:
> >
> >static const struct nf_chain_type nft_chain_bpf = {
> >         .name           = "bpf",
> >         .type           = NFT_CHAIN_T_bpf,
> >         ...
> >         .hooks          = {
> >                 [NF_INET_LOCAL_IN]      = nft_do_bpf,
> >                 [NF_INET_LOCAL_OUT]     = nft_do_bpf,
> >                 [NF_INET_FORWARD]       = nft_do_bpf,
> >                 [NF_INET_PRE_ROUTING]   = nft_do_bpf,
> >                 [NF_INET_POST_ROUTING]  = nft_do_bpf,
> >         },
> >};
> >
> >nft_do_bpf() is the raw netfilter hook that you register, this hook
> >will just execute to iterate over the list of bpf filters and run
> >them.
> >
> >This new chain is created on demand, so no overhead if not needed, eg.
> >
> >nft add table bpf
> >nft add chain bpf input { type bpf hook output priority 0\; }
> >
> >Then, you add a rule for each bpf program you want to run, just like
> >tc+bpf.
> 
> But from a high-level point of view, this sounds like a huge hack to me,
> in the sense that nft as a bytecode engine (from whole architecture I
> mean) calls into another bytecode engine such as bpf as an extension.

nft is not only a bytecode engine; it provides a netlink socket
interface to register hooks (from the user's perspective, these are
called basechains). It provides the infrastructure that you are indeed
lacking, and it addresses the concerns I mentioned about the visibility
of the global policy that you want to apply on the packet path.

As I explained, you can potentially add any basechain type with
specific semantics. Proposed semantics for this bpf chain would be:

1) You can use any of the existing netfilter hooks.
2) You can only run bpf programs from there. There is no chance for
   the user to mix nftables with the bpf VM.

> And bpf code from there isn't using any of the features from nft
> besides being invoked from the hook

I think there's a misunderstanding here.

You will not run nft_do_chain(), so you don't waste cycles running what
is specific to nftables. You will just run nft_do_bpf(), which does only
what you want to run for each packet. Thus, you have control over what
nft_do_bpf() does and decide what that function spends cycles on.

> [...] I was hoping that nft would try to avoid some of those exotic
> modules we have from xt, I would consider xt_bpf (no offense ;))

This has nothing to do with it. In xt_bpf you waste cycles running
code that is specific to iptables; what I propose would not, just the
generic hook code and then your code.

[...]
> >Benefits are, rewording previous email:
> >
> >* You get access to all of the existing netfilter hooks in one go
> >   to run bpf programs. No need for specific redundant hooks. This
> >   provides raw access to the netfilter hook, you define the little
> >   code that your hook runs before you bpf run invocation. So there
> >   is *no need to bloat the stack with more hooks, we use what we
> >   have.*
> 
> But also this doesn't really address the fundamental underlying problem
> that is discussed here. nft doesn't even have cgroups v2 support.

You don't need native cgroups v2 support in nft; you just run bpf
programs from the native bpf basechain type. So whatever bpf supports,
you can do.

If you take this approach, you will instead get access to all of the
existing hooks to run bpf programs; this includes arp and bridge, and
you can potentially run filters for both ip and ip6 through our inet
family.

[...]
> Or would the idea be that the current netfilter hooks would be redone in
> a way that they are generic enough so that any other user could make use
> of it independent of netfilter?

Redone? Why? What do you need, a rename?

Dependencies are very few: CONFIG_NETFILTER for the hooks,
CONFIG_NF_TABLES to obtain the netlink interface to load the bpf
programs, and CONFIG_NF_TABLES_BPF to define the bpf basechain type
semantics to run bpf programs from there. It's actually very little
boilerplate code.

Other than that, I can predict where you're going: you will end up
adding a hook just before/after every one of the existing netfilter
hooks, and that is really nonsense to me. Why bloat the stack with more
hooks? Use what is already available.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf egress programs
  2016-09-23 13:17                                       ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf " Pablo Neira Ayuso
@ 2016-09-26 10:10                                         ` Daniel Borkmann
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Borkmann @ 2016-09-26 10:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Thomas Graf, Daniel Mack, htejun, ast, davem, kafai, fw, harald,
	netdev, sargun, cgroups

On 09/23/2016 03:17 PM, Pablo Neira Ayuso wrote:
> On Thu, Sep 22, 2016 at 05:12:57PM +0200, Daniel Borkmann wrote:
>> On 09/22/2016 02:05 PM, Pablo Neira Ayuso wrote:
> [...]
>>> Have a look at net/ipv4/netfilter/nft_chain_route_ipv4.c for instance.
>>> In your case, you have to add a new chain type:
>>>
>>> static const struct nf_chain_type nft_chain_bpf = {
>>>          .name           = "bpf",
>>>          .type           = NFT_CHAIN_T_bpf,
>>>          ...
>>>          .hooks          = {
>>>                  [NF_INET_LOCAL_IN]      = nft_do_bpf,
>>>                  [NF_INET_LOCAL_OUT]     = nft_do_bpf,
>>>                  [NF_INET_FORWARD]       = nft_do_bpf,
>>>                  [NF_INET_PRE_ROUTING]   = nft_do_bpf,
>>>                  [NF_INET_POST_ROUTING]  = nft_do_bpf,
>>>          },
>>> };
>>>
>>> nft_do_bpf() is the raw netfilter hook that you register, this hook
>>> will just execute to iterate over the list of bpf filters and run
>>> them.
>>>
>>> This new chain is created on demand, so no overhead if not needed, eg.
>>>
>>> nft add table bpf
>>> nft add chain bpf input { type bpf hook output priority 0\; }
>>>
>>> Then, you add a rule for each bpf program you want to run, just like
>>> tc+bpf.
>>
>> But from a high-level point of view, this sounds like a huge hack to me,
>> in the sense that nft as a bytecode engine (from whole architecture I
>> mean) calls into another bytecode engine such as bpf as an extension.
>
> nft is not only bytecode engine, it provides a netlink socket
> interface to register hooks (from user perspective, these are called
> basechain). It is providing the infrastructure that you're lacking
> indeed and addressing the concerns I mentioned about the visibility of
> the global policy that you want to apply on the packet path.
>
> As I explained you can potentially add any basechain type with
> specific semantics. Proposed semantics for this bpf chain would be:
>
> 1) You can use any of the existing netfilter hooks.
> 2) You can only run bpf program from there. No chance for the user
>     can mix nftables with bpf VM.
>
>> And bpf code from there isn't using any of the features from nft
>> besides being invoked from the hook
>
> I think there's a misunderstading here.
>
> You will not run nft_do_chain(), you don't waste cycles to run what is
> specific to nftables. You will just run nft_do_bpf() which will just
> do what you want to run for each packet. Thus, you have control on
> what nft_do_bpf() does and decide on what that function spend cycles
> on.
>
>> [...] I was hoping that nft would try to avoid some of those exotic
>> modules we have from xt, I would consider xt_bpf (no offense ;))
>
> This has nothing to do with it. In xt_bpf you waste cycles running
> code that is specific to iptables, what I propose would not, just the
> generic hook code and then your code.
>
> [...]
>>> Benefits are, rewording previous email:
>>>
>>> * You get access to all of the existing netfilter hooks in one go
>>>    to run bpf programs. No need for specific redundant hooks. This
>>>    provides raw access to the netfilter hook, you define the little
>>>    code that your hook runs before you bpf run invocation. So there
>>>    is *no need to bloat the stack with more hooks, we use what we
>>>    have.*
>>
>> But also this doesn't really address the fundamental underlying problem
>> that is discussed here. nft doesn't even have cgroups v2 support.
>
> You don't need native cgroups v2 support in nft, you just run bpf
> programs from the native bpf basechain type. So whatever bpf supports,
> you can do it.

Yes, and I'm saying that the existing netfilter hooks alone still
don't address the underlying issue, as mentioned before. On the ingress
side you still need some form of hook where you get the final sk
(I mean the non-early-demuxed ones); nothing has changed on that issue
as far as I can see.

> Instead, if you take this approach, you will get access to all of the
> existing hooks to run bpf programs, this includes arp, bridge and
> potentially run filters for both ip and ip6 through our inet family.
>
> [...]
>> Or would the idea be that the current netfilter hooks would be redone in
>> a way that they are generic enough so that any other user could make use
>> of it independent of netfilter?
>
> Redone? Why? What do you need, a rename?
>
> Dependencies are very few: CONFIG_NETFILTER for the hooks,
> CONFIG_NF_TABLES to obtain the netlink interface to load the bpf
> programs and CONFIG_NF_TABLES_BPF to define the bpf basechain type
> semantics to run bpf programs from there. It's actually very little
> boilerplate code.

Still not really keen on this idea. You still need an additional,
non-symmetric hook for sk input; we add a dependency that forces
CONFIG_NETFILTER and CONFIG_NF_TABLES where nothing beyond the hook
point itself is really needed; and while the boilerplate code for
integrating/adding the BPF basechain may be okay'ish in size on the
kernel side, you still need the whole front-end infrastructure with
the ELF loader etc. that we've added over time into tc. So what you
would end up with is a duplicate of the infrastructure you already
have with tc + BPF.

Therefore my question was in the direction of making the hook generic,
in the sense that we could just add a new parent id to sch_clsact for
clsact_find_tcf(); then, out of all the available generic hooks, the
two needed here could be addressed from sch_clsact as well, without
the detour of adding entire infrastructure around nft, since this is
already available from tc by just going via tc_classify() as we do in
functions such as sch_handle_ingress() and sch_handle_egress(). This
still gives you the visibility and infrastructure you are asking for,
and it avoids duplicating the same functionality from tc over into nft.
It would also help interactions with the already existing
'tc filter add ... {ingress,egress} bpf ..' parents, since from a BPF
programmability perspective it would reuse the same existing loader
code, thus easing map sharing, and allow program code to be reused in
the same object file.

> Other than that, I can predict where you're going: You will end up
> adding a hook just before/after every of the existing netfilter hooks,
> and that is really nonsense to me.

I don't think that is needed here.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 0/6] Add eBPF hooks for cgroups
       [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
@ 2016-10-21  5:32     ` David Ahern
  2016-09-19 16:43   ` [PATCH v6 4/6] net: filter: run cgroup eBPF ingress programs Daniel Mack
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: David Ahern @ 2016-10-21  5:32 UTC (permalink / raw)
  To: Daniel Mack, htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, Pablo Neira Ayuso
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA

On 9/19/16 10:43 AM, Daniel Mack wrote:
> This is v6 of the patch set to allow eBPF programs for network
> filtering and accounting to be attached to cgroups, so that they apply
> to all sockets of all tasks placed in that cgroup. The logic also
> allows to be extendeded for other cgroup based eBPF logic.
> 
> 
> Changes from v5:
> 
> * The eBPF programs now operate on L3 rather than on L2 of the packets,
>   and the egress hooks were moved from __dev_queue_xmit() to
>   ip*_output().
> 
> * For BPF_PROG_TYPE_CGROUP_SOCKET, disallow direct access to the skb
>   through BPF_LD_[ABS|IND] instructions, but hook up the
>   bpf_skb_load_bytes() access helper instead. Thanks to Daniel Borkmann
>   for the help.

It's been a month since the last response or update to this series. Any progress in resolving the resistance to hook locations?

As I mentioned in Tokyo, I need a solution for VRF that allows running processes in a VRF context -- meaning a process inherits a default sk_bound_dev_if for any AF_INET{6} sockets it opens. Right now we (Cumulus) are using an l3mdev cgroup, something that Tejun pushed back on earlier this year. I strongly believe that cgroups provide the right infrastructure for this feature, and I am looking at options. I'm sure a few people will chuckle about this, but I do have another solution that leverages this patchset -- a bpf program on a cgroup that sets sk_bound_dev_if. So, what's the likelihood of this patchset making 4.10 (or any other release)?
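
Purely to illustrate that last idea (nothing below exists in this
series; the attach point, the bpf_sock_ctx layout, the section name and
the VRF ifindex are all hypothetical), such a program could look
roughly like:

#include <linux/types.h>

/* Hypothetical context for a socket-creation hook; no such structure
 * exists in this patch set. */
struct bpf_sock_ctx {
        __u32 family;
        __u32 type;
        __u32 bound_dev_if;     /* assumed writable: default binding */
};

#define VRF_IFINDEX 42          /* ifindex of the VRF device, assumed known */

__attribute__((section("cgroup/sock"), used))
int set_vrf_binding(struct bpf_sock_ctx *ctx)
{
        /* Pin every new AF_INET/AF_INET6 socket in this cgroup to the
         * VRF device. */
        if (ctx->family == 2 /* AF_INET */ ||
            ctx->family == 10 /* AF_INET6 */)
                ctx->bound_dev_if = VRF_IFINDEX;

        return 1;       /* allow socket creation */
}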

Thanks,
David

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-10-21  5:32 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-19 16:43 [PATCH v6 0/6] Add eBPF hooks for cgroups Daniel Mack
2016-09-19 16:43 ` [PATCH v6 1/6] bpf: add new prog type for cgroup socket filtering Daniel Mack
2016-09-19 16:43 ` [PATCH v6 3/6] bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands Daniel Mack
2016-09-19 16:44 ` [PATCH v6 6/6] samples: bpf: add userspace example for attaching eBPF programs to cgroups Daniel Mack
     [not found] ` <1474303441-3745-1-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
2016-09-19 16:43   ` [PATCH v6 2/6] cgroup: add support for eBPF programs Daniel Mack
2016-09-19 16:43   ` [PATCH v6 4/6] net: filter: run cgroup eBPF ingress programs Daniel Mack
2016-09-19 16:44   ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs Daniel Mack
2016-09-19 19:19     ` Pablo Neira Ayuso
2016-09-19 19:30       ` Daniel Mack
     [not found]         ` <ac88bb4c-ab7c-1f74-c7fd-79e523b50ae4-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
2016-09-19 20:35           ` Pablo Neira Ayuso
2016-09-19 20:56             ` Daniel Mack
2016-09-20 14:29               ` Pablo Neira Ayuso
2016-09-20 16:43                 ` Daniel Mack
     [not found]                   ` <6584b975-fa3e-8d98-f0c7-a2c6b194b2b6-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
2016-09-21 15:45                     ` Pablo Neira Ayuso
2016-09-21 18:48                       ` Thomas Graf
     [not found]                         ` <20160921184827.GA15732-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
2016-09-22  9:21                           ` Pablo Neira Ayuso
2016-09-22  9:54                             ` Thomas Graf
     [not found]                               ` <20160922095411.GA5654-4EA/1caXOu0mYvmMESoHnA@public.gmane.org>
2016-09-22 12:05                                 ` Pablo Neira Ayuso
2016-09-22 15:12                                   ` Daniel Borkmann
     [not found]                                     ` <57E3F4F9.70300-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
2016-09-22 15:53                                       ` Daniel Mack
2016-09-23 13:17                                       ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup ebpf " Pablo Neira Ayuso
2016-09-26 10:10                                         ` Daniel Borkmann
2016-09-20 16:53                 ` [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF " Thomas Graf
2016-09-19 20:13       ` Alexei Starovoitov
2016-09-19 20:39         ` Pablo Neira Ayuso
     [not found]         ` <20160919201322.GA84770-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>
2016-09-19 21:28           ` Thomas Graf
     [not found]     ` <1474303441-3745-6-git-send-email-daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>
2016-09-20  5:44       ` kbuild test robot
2016-10-21  5:32   ` [PATCH v6 0/6] Add eBPF hooks for cgroups David Ahern
