* [RFC PATCH net-next 0/2] bpf: Allow retrieval of ebpf filters
@ 2017-02-03 20:38 David Ahern
  2017-02-03 20:38 ` [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions David Ahern
  2017-02-03 20:38 ` [RFC PATCH net-next 2/2] bpf: Add support to retrieve program attached to a cgroup David Ahern
  0 siblings, 2 replies; 15+ messages in thread
From: David Ahern @ 2017-02-03 20:38 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel; +Cc: roopa, David Ahern

As mentioned at netconf in October, insight into bpf filters is an
essential part of debugging and verifying a particular networking
configuration. For example, classic bpf filters can be returned for
packet sockets as part of the sock_diag infrastructure and the
PACKET_DIAG_FILTER attribute. This capability is leveraged by
'ss --bpf' to dump the filter when requested.
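
For reference, retrieving a classic bpf filter today is driven from
userspace roughly as follows -- a sock_diag dump request for packet
sockets with PACKET_SHOW_FILTER set; the kernel answers with a
PACKET_DIAG_FILTER attribute holding the struct sock_filter array.
This is an illustrative sketch only (reply parsing and error handling
omitted), not part of this series:

#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/packet_diag.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct {
		struct nlmsghdr nlh;
		struct packet_diag_req req;
	} msg;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);

	memset(&msg, 0, sizeof(msg));
	msg.nlh.nlmsg_len    = sizeof(msg);
	msg.nlh.nlmsg_type   = SOCK_DIAG_BY_FAMILY;
	msg.nlh.nlmsg_flags  = NLM_F_REQUEST | NLM_F_DUMP;
	msg.req.sdiag_family = AF_PACKET;
	msg.req.pdiag_show   = PACKET_SHOW_FILTER;

	send(fd, &msg, sizeof(msg), 0);
	/* recv() loop omitted: each reply carries a struct packet_diag_msg
	 * followed by attributes, including PACKET_DIAG_FILTER with the
	 * struct sock_filter array of the attached classic bpf filter. */
	close(fd);
	return 0;
}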

This series adds similar support to ebpf, starting with filters
attached to a cgroup. The first patch saves the original bpf
instructions in a manner similar to classic bpf. The second patch
allows the retrieval of filters applied to a cgroup.

David Ahern (2):
  bpf: Save original ebpf instructions
  bpf: Add support to retrieve program attached to a cgroup

 include/linux/bpf-cgroup.h |  7 ++++
 include/linux/filter.h     |  5 ++-
 include/uapi/linux/bpf.h   |  9 +++++
 kernel/bpf/cgroup.c        | 31 +++++++++++++++
 kernel/bpf/syscall.c       | 97 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/cgroup.c            | 12 ++++++
 6 files changed, 160 insertions(+), 1 deletion(-)

-- 
2.1.4


* [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-03 20:38 [RFC PATCH net-next 0/2] bpf: Allow retrieval of ebpf filters David Ahern
@ 2017-02-03 20:38 ` David Ahern
  2017-02-03 21:09   ` Daniel Borkmann
  2017-02-03 20:38 ` [RFC PATCH net-next 2/2] bpf: Add support to retrieve program attached to a cgroup David Ahern
  1 sibling, 1 reply; 15+ messages in thread
From: David Ahern @ 2017-02-03 20:38 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel; +Cc: roopa, David Ahern

Similar to classic bpf, support saving original ebpf instructions

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/linux/filter.h |  5 ++++-
 kernel/bpf/syscall.c   | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index e4eb2546339a..86b1e3624463 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -391,7 +391,10 @@ struct compat_sock_fprog {
 
 struct sock_fprog_kern {
 	u16			len;
-	struct sock_filter	*filter;
+	union {
+		struct sock_filter	*filter;
+		struct bpf_insn		*insn;
+	};
 };
 
 struct bpf_binary_header {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 08a4d287226b..95e640a3ed99 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -694,11 +694,43 @@ static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
 	free_uid(user);
 }
 
+static int bpf_prog_store_orig_insn(struct bpf_prog *prog)
+{
+	u32 isize = bpf_prog_insn_size(prog);
+	struct sock_fprog_kern *fkprog;
+
+	prog->orig_prog = kmalloc(sizeof(*fkprog), GFP_KERNEL);
+	if (!prog->orig_prog)
+		return -ENOMEM;
+
+	fkprog = prog->orig_prog;
+	fkprog->len = prog->len;
+
+	fkprog->insn = kmemdup(prog->insnsi, isize, GFP_KERNEL | __GFP_NOWARN);
+	if (!fkprog->insn) {
+		kfree(prog->orig_prog);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void bpf_release_orig_insn(struct bpf_prog *fp)
+{
+	struct sock_fprog_kern *fprog = fp->orig_prog;
+
+	if (fprog) {
+		kfree(fprog->insn);
+		kfree(fprog);
+	}
+}
+
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 {
 	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
 
 	free_used_maps(aux);
+	bpf_release_orig_insn(aux->prog);
 	bpf_prog_uncharge_memlock(aux->prog);
 	bpf_prog_free(aux->prog);
 }
@@ -885,6 +917,10 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err < 0)
 		goto free_prog;
 
+	err = bpf_prog_store_orig_insn(prog);
+	if (err < 0)
+		goto free_prog;
+
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
 	if (err < 0)
-- 
2.1.4


* [RFC PATCH net-next 2/2] bpf: Add support to retrieve program attached to a cgroup
  2017-02-03 20:38 [RFC PATCH net-next 0/2] bpf: Allow retrieval of ebpf filters David Ahern
  2017-02-03 20:38 ` [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions David Ahern
@ 2017-02-03 20:38 ` David Ahern
  1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2017-02-03 20:38 UTC (permalink / raw)
  To: netdev, alexei.starovoitov, daniel; +Cc: roopa, David Ahern

Add support to ebpf to retrieve a program attached to a cgroup.
This is done using a new bpf_cmd, BPF_PROG_GET_ATTACH, and
associated struct in bpf_attr.

This allows a program to verify a bpf filter attached to a
cgroup as well as verify the lack of a filter - which can be
just as relevant when debugging a problem.
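
For illustration, a rough sketch of the intended userspace usage; it
assumes the uapi additions below are present in the installed headers
and trims error handling (the cgroup fd is obtained by open()ing the
cgroup2 directory, as with BPF_PROG_ATTACH):

#include <linux/bpf.h>
#include <sys/syscall.h>
#include <string.h>
#include <unistd.h>

/* On success *cnt holds the number of instructions copied back;
 * 0 means no program is attached for this attach type. */
static int cgroup_prog_get(int cgrp_fd, enum bpf_attach_type type,
			   struct bpf_insn *insns, __u32 *cnt, __u32 *ptype)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.target_get_fd   = cgrp_fd;
	attr.attach_type_get = type;
	attr.insn_cnt_get    = *cnt;	/* capacity of the insns buffer */
	attr.insns_get       = (__u64)(unsigned long)insns;

	if (syscall(__NR_bpf, BPF_PROG_GET_ATTACH, &attr, sizeof(attr)))
		return -1;

	*cnt   = attr.insn_cnt_get;
	*ptype = attr.prog_type_get;
	return 0;
}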

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/linux/bpf-cgroup.h |  7 ++++++
 include/uapi/linux/bpf.h   |  9 +++++++
 kernel/bpf/cgroup.c        | 31 +++++++++++++++++++++++
 kernel/bpf/syscall.c       | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/cgroup.c            | 12 +++++++++
 5 files changed, 120 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 92bc89ae7e20..5a9fde760332 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -36,6 +36,13 @@ void cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type);
 
+struct bpf_prog *__cgroup_bpf_get(struct cgroup *cgrp,
+				  enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_get() protected by cgroup_mutex */
+struct bpf_prog *cgroup_bpf_get(struct cgroup *cgrp,
+				enum bpf_attach_type type);
+
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
 				enum bpf_attach_type type);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e07fd5a324e6..f321b96459e2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -81,6 +81,7 @@ enum bpf_cmd {
 	BPF_OBJ_GET,
 	BPF_PROG_ATTACH,
 	BPF_PROG_DETACH,
+	BPF_PROG_GET_ATTACH,
 };
 
 enum bpf_map_type {
@@ -179,6 +180,14 @@ union bpf_attr {
 		__u32		attach_bpf_fd;	/* eBPF program to attach */
 		__u32		attach_type;
 	};
+
+	struct { /* anonymous struct used by BPF_PROG_GET_ATTACH command */
+		__u32		target_get_fd;	/* container object attached to */
+		__u32		attach_type_get;
+		__u32		prog_type_get;	/* one of enum bpf_prog_type */
+		__u32		insn_cnt_get;
+		__aligned_u64	insns_get;
+	};
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a515f7b007c6..7d9f12606939 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,6 +117,37 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 	}
 }
 
+struct bpf_prog *__cgroup_bpf_get(struct cgroup *cgrp,
+				  enum bpf_attach_type type)
+{
+	struct bpf_prog *cgrp_prog, *prog;
+	struct bpf_insn *insns;
+	u32 len;
+
+	cgrp_prog = rcu_dereference_protected(cgrp->bpf.effective[type],
+					      lockdep_is_held(&cgroup_mutex));
+	if (!cgrp_prog)
+		return NULL;
+
+	if (cgrp_prog->orig_prog) {
+		len = cgrp_prog->orig_prog->len;
+		insns = cgrp_prog->orig_prog->insn;
+	} else {
+		len = cgrp_prog->len;
+		insns = cgrp_prog->insnsi;
+	}
+
+	prog = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
+	if (!prog)
+		return ERR_PTR(-ENOMEM);
+
+	prog->len = len;
+	memcpy(prog->insns, insns, bpf_prog_insn_size(prog));
+	prog->type = cgrp_prog->type;
+
+	return prog;
+}
+
 /**
  * __cgroup_bpf_run_filter_skb() - Run a program for packet filtering
 * @sk: The socket sending or receiving traffic
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 95e640a3ed99..ad211e5ccaae 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1043,6 +1043,64 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 
 	return 0;
 }
+
+#define BPF_PROG_GET_ATTACH_LAST_FIELD insns_get
+
+static int bpf_prog_get_attach(union bpf_attr *attr,
+			       union bpf_attr __user *uattr)
+{
+	struct bpf_prog *prog;
+	struct cgroup *cgrp;
+	u32 ptype;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (CHECK_ATTR(BPF_PROG_GET_ATTACH))
+		return -EINVAL;
+
+	if (attr->attach_type_get >= MAX_BPF_ATTACH_TYPE)
+		return -EINVAL;
+
+	cgrp = cgroup_get_from_fd(attr->target_get_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+
+	prog = cgroup_bpf_get(cgrp, attr->attach_type_get);
+	cgroup_put(cgrp);
+
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	/* no program means nothing to copy */
+	if (!prog) {
+		u32 zero = 0;
+
+		if (copy_to_user(&uattr->insn_cnt_get, &zero, sizeof(u32)) ||
+		    copy_to_user(&uattr->prog_type_get, &zero, sizeof(u32)))
+			return -EFAULT;
+
+		return 0;
+	}
+
+	err = -E2BIG;
+	if (attr->insn_cnt_get < prog->len)
+		goto out;
+
+	err = 0;
+	ptype = prog->type;
+	if (copy_to_user(&uattr->insn_cnt_get, &prog->len, sizeof(u32)) ||
+	    copy_to_user(&uattr->prog_type_get, &ptype, sizeof(u32))    ||
+	    copy_to_user(u64_to_user_ptr(attr->insns_get), prog->insns,
+			 bpf_prog_insn_size(prog))) {
+		err = -EFAULT;
+	}
+out:
+	bpf_prog_free(prog);
+	return err;
+}
+
 #endif /* CONFIG_CGROUP_BPF */
 
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
@@ -1119,6 +1177,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_DETACH:
 		err = bpf_prog_detach(&attr);
 		break;
+	case BPF_PROG_GET_ATTACH:
+		err = bpf_prog_get_attach(&attr, uattr);
+		break;
 #endif
 
 	default:
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2ee9ec3051b2..860f639a405e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -6511,6 +6511,18 @@ void cgroup_bpf_update(struct cgroup *cgrp,
 	__cgroup_bpf_update(cgrp, parent, prog, type);
 	mutex_unlock(&cgroup_mutex);
 }
+
+struct bpf_prog *cgroup_bpf_get(struct cgroup *cgrp,
+				enum bpf_attach_type type)
+{
+	struct bpf_prog *prog;
+
+	mutex_lock(&cgroup_mutex);
+	prog = __cgroup_bpf_get(cgrp, type);
+	mutex_unlock(&cgroup_mutex);
+
+	return prog;
+}
 #endif /* CONFIG_CGROUP_BPF */
 
 #ifdef CONFIG_CGROUP_DEBUG
-- 
2.1.4


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-03 20:38 ` [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions David Ahern
@ 2017-02-03 21:09   ` Daniel Borkmann
  2017-02-03 22:28     ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2017-02-03 21:09 UTC (permalink / raw)
  To: David Ahern, netdev, alexei.starovoitov; +Cc: roopa

On 02/03/2017 09:38 PM, David Ahern wrote:
> Similar to classic bpf, support saving original ebpf instructions
>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Not convinced that this is in the right direction, this not only *significantly*
increases mem footprint for each and every program, but also when you dump this,
then map references from relocs inside the insns are meaningless (f.e. what about
prog arrays used in tail calls?), so things like criu also won't be able to use
this kind of interface for dump and restore. If it's just for debugging, then
why not extend the existing tracing infrastructure around bpf that was started
with intention to gain more visibility.

Thanks,
Daniel


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-03 21:09   ` Daniel Borkmann
@ 2017-02-03 22:28     ` David Ahern
  2017-02-06 10:56       ` Quentin Monnet
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2017-02-03 22:28 UTC (permalink / raw)
  To: Daniel Borkmann, netdev, alexei.starovoitov; +Cc: roopa

On 2/3/17 2:09 PM, Daniel Borkmann wrote:
> On 02/03/2017 09:38 PM, David Ahern wrote:
>> Similar to classic bpf, support saving original ebpf instructions
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> 
> Not convinced that this is in the right direction, this not only *significantly*
> increases mem footprint for each and every program, but also when you dump this,
> then map references from relocs inside the insns are meaningless (f.e. what about
> prog arrays used in tail calls?), so things like criu also won't be able to use
> this kind of interface for dump and restore. If it's just for debugging, then
> why not extend the existing tracing infrastructure around bpf that was started
> with intention to gain more visibility.


Yes, saving the original bpf increases the memory footprint. If you noticed, a kmemdup is used for the exact instruction size (no page round up). Right now programs are limited to a single page, so worst case  is an extra page per program. I am open to other suggestions. For example, bpf_prog is rounded up to a page which means there could be room at the end of the page for the original instructions. This is definitely true for the ip vrf programs which will be < 32 instructions even with the namespace checking and the conversions done kernel side.

Tracepoints will not solve the problem for me for a number of reasons. Tracepoints have to be hit to return data, and there is no way the tracepoint can return relevant information for me to verify that the correct filter was downloaded. I want the original code. I want to audit what was installed. In my case there could be N VRFs, and I want 'ip vrf' or ifupdown2 or any other command to be able to verify that each cgroup has the correct program, and to verify that the default VRF does *not* have a program installed.

Generically, the bpf code might contain relative data but that's for the user or decoder program to deal with. Surely there is no harm in returning the original, downloaded bpf code to a properly privileged process. If I am debugging some weird network behavior, I want to be able to determine what bpf code is running where and to see what it is doing to whatever degree possible. Saving the original code is the first part of this.


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-03 22:28     ` David Ahern
@ 2017-02-06 10:56       ` Quentin Monnet
  2017-02-06 14:13         ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: Quentin Monnet @ 2017-02-06 10:56 UTC (permalink / raw)
  To: David Ahern, Daniel Borkmann, netdev, alexei.starovoitov; +Cc: roopa

Hi Daniel, David,

2017-02-03 (15:28 -0700) ~ David Ahern <dsa@cumulusnetworks.com>
> On 2/3/17 2:09 PM, Daniel Borkmann wrote:
>> On 02/03/2017 09:38 PM, David Ahern wrote:
>>> Similar to classic bpf, support saving original ebpf instructions
>>>
>>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>>
>> Not convinced that this is in the right direction, this not only *significantly*
>> increases mem footprint for each and every program, but also when you dump this,
>> then map references from relocs inside the insns are meaningless (f.e. what about
>> prog arrays used in tail calls?), so things like criu also won't be able to use
>> this kind of interface for dump and restore. If it's just for debugging, then
>> why not extend the existing tracing infrastructure around bpf that was started
>> with intention to gain more visibility.
> 
> 
> Yes, saving the original bpf increases the memory footprint. If you noticed, a kmemdup is used for the exact instruction size (no page round up). Right now programs are limited to a single page, so worst case  is an extra page per program. I am open to other suggestions. For example, bpf_prog is rounded up to a page which means there could be room at the end of the page for the original instructions. This is definitely true for the ip vrf programs which will be < 32 instructions even with the namespace checking and the conversions done kernel side.
> 
> Tracepoints will not solve the problem for me for a number of reasons. Tracepoints have to be hit to return data, and there is no way the tracepoint can return relevant information for me to verify that the correct filter was downloaded. I want the original code. I want to audit what was installed. In my case there could be N VRFs, and I want 'ip vrf' or ifupdown2 or any other command to be able to verify that each cgroup has the correct program, and to verify that the default VRF does *not* have a program installed.
> 
> Generically, the bpf code might contain relative data but that's for the user or decoder program to deal with. Surely there is no harm in returning the original, downloaded bpf code to a properly privileged process. If I am debugging some weird network behavior, I want to be able to determine what bpf code is running where and to see what it is doing to whatever degree possible. Saving the original code is the first part of this.
> 

This reminds me of that patchset I sent some time ago [1] to dump a
program, and which was rejected for the same reasons. I'd like to stand
with David and confirm that I am interested as well in a solution that
would allow to dump the bytecode without relying on tracepoints.

Quentin

[1] https://www.spinics.net/lists/netdev/msg373259.html


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-06 10:56       ` Quentin Monnet
@ 2017-02-06 14:13         ` Daniel Borkmann
  2017-02-06 19:21           ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2017-02-06 14:13 UTC (permalink / raw)
  To: Quentin Monnet, David Ahern, netdev, alexei.starovoitov; +Cc: roopa

On 02/06/2017 11:56 AM, Quentin Monnet wrote:
> 2017-02-03 (15:28 -0700) ~ David Ahern <dsa@cumulusnetworks.com>
>> On 2/3/17 2:09 PM, Daniel Borkmann wrote:
>>> On 02/03/2017 09:38 PM, David Ahern wrote:
>>>> Similar to classic bpf, support saving original ebpf instructions
>>>>
>>>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>>>
>>> Not convinced that this is in the right direction, this not only *significantly*
>>> increases mem footprint for each and every program, but also when you dump this,
>>> then map references from relocs inside the insns are meaningless (f.e. what about
>>> prog arrays used in tail calls?), so things like criu also won't be able to use
>>> this kind of interface for dump and restore. If it's just for debugging, then
>>> why not extend the existing tracing infrastructure around bpf that was started
>>> with intention to gain more visibility.
>>
>> Yes, saving the original bpf increases the memory footprint. If you noticed, a kmemdup is used for the exact instruction size (no page round up). Right now programs are limited to a single page, so worst case  is an extra page per program. I am open to other suggestions. For example, bpf_prog is rounded up to a page which means there could be room at the end of the page for the original instructions. This is definitely true for the ip vrf programs which will be < 32 instructions even with the namespace checking and the conversions done kernel side.
>>
>> Tracepoints will not solve the problem for me for a number of reasons. Tracepoints have to be hit to return data, and there is no way the tracepoint can return relevant information for me to verify that the correct filter was downloaded. I want the original code. I want to audit what was installed. In my case there could be N VRFs, and I want 'ip vrf' or ifupdown2 or any other command to be able to verify that each cgroup has the correct program, and to verify that the default VRF does *not* have a program installed.
>>
>> Generically, the bpf code might contain relative data but that's for the user or decoder program to deal with. Surely there is no harm in returning the original, downloaded bpf code to a properly privileged process. If I am debugging some weird network behavior, I want to be able to determine what bpf code is running where and to see what it is doing to whatever degree possible. Saving the original code is the first part of this.

Well, worst case cost would be ~8 additional pages per program that
are very rarely used; assume you want to attach a prog per netdevice,
or worse, one for ingress, one for egress ... and yet we already
complain that netdevice itself is way too big and needs a diet ...
That makes it much much worse, though. Using the remainder of the
used page is thus not enough, I'm afraid.

But also then, what do you do with 4k insns (or multiple of these in
case of tail calls), is there a decompiler in the works? Understandably
that for vrf it might be trivial to just read disasm, but that's just
one very specific use-case. I can see the point of dumping for the
sake of CRIU use-case given that cBPF is supported there in the past,
and CRIU strives to catch up with all kind of kernel APIs. It's
restricted wrt restore to either the same kernel version or newer
kernels, but that's generic issue for CRIU. So far I'm not aware of
user requests for CRIU support, but it could come in future, though.
Thus, should we have some dump interface it definitely needs to be
i) generic and ii) usable for CRIU, so we don't end up with multiple
dump mechanisms due to one API not being designed good enough to
already cover it.

Dump result would need to look a bit similar to what we have in ELF
file, that is, you'd need to construct map descriptions and reference
them in the insn sequence similar to relocations in the loader. I
haven't looked into whether it's feasible, but we should definitely
evaluate possibilities to transform the post-verifier insn sequence
back into a pre-verifier one for dumping, so we don't need to pay the
huge additional cost (also a major fraction of the insns might be
just duplicated here as they weren't rewritten). Perhaps it could
be done with the help of small annotations, or some sort of diff
that we only need to store. Not saying it's trivial, but we should
definitely try hard first and design it carefully if we want to
support such API.
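
(To make the reloc point concrete: a map reference in the instruction
stream, as the loader emits it via the usual BPF_LD_MAP_FD() style
construction, looks roughly like the sketch below. The verifier
rewrites the immediate to the in-kernel map pointer, which is why a
post-verifier dump no longer carries anything a loader could relocate
against. Sketch only, the fd value is made up:)

#include <linux/bpf.h>

struct bpf_insn ld_map_fd[2] = {
	{ .code    = BPF_LD | BPF_DW | BPF_IMM,
	  .dst_reg = BPF_REG_1,
	  .src_reg = BPF_PSEUDO_MAP_FD,	/* marks the imm64 as a map fd */
	  .imm     = 42 },		/* fd from BPF_MAP_CREATE; low 32 bits */
	{ .code    = 0, .imm = 0 },	/* second half of the imm64 pair */
};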

> This reminds me of that patchset I sent some time ago [1] to dump a
> program, and which was rejected for the same reasons. I'd like to stand
> with David and confirm that I am interested as well in a solution that
> would allow to dump the bytecode without relying on tracepoints.
>
> Quentin
>
> [1] https://www.spinics.net/lists/netdev/msg373259.html
>


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-06 14:13         ` Daniel Borkmann
@ 2017-02-06 19:21           ` Alexei Starovoitov
  2017-02-07 17:22             ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2017-02-06 19:21 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Quentin Monnet, David Ahern, netdev, roopa

On Mon, Feb 06, 2017 at 03:13:15PM +0100, Daniel Borkmann wrote:
> On 02/06/2017 11:56 AM, Quentin Monnet wrote:
> >2017-02-03 (15:28 -0700) ~ David Ahern <dsa@cumulusnetworks.com>
> >>On 2/3/17 2:09 PM, Daniel Borkmann wrote:
> >>>On 02/03/2017 09:38 PM, David Ahern wrote:
> >>>>Similar to classic bpf, support saving original ebpf instructions
> >>>>
> >>>>Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> >>>
> >>>Not convinced that this is in the right direction, this not only *significantly*
> >>>increases mem footprint for each and every program, but also when you dump this,
> >>>then map references from relocs inside the insns are meaningless (f.e. what about
> >>>prog arrays used in tail calls?), so things like criu also won't be able to use
> >>>this kind of interface for dump and restore. If it's just for debugging, then
> >>>why not extend the existing tracing infrastructure around bpf that was started
> >>>with intention to gain more visibility.
> >>
> >>Yes, saving the original bpf increases the memory footprint. If you noticed, a kmemdup is used for the exact instruction size (no page round up). Right now programs are limited to a single page, so worst case  is an extra page per program. I am open to other suggestions. For example, bpf_prog is rounded up to a page which means there could be room at the end of the page for the original instructions. This is definitely true for the ip vrf programs which will be < 32 instructions even with the namespace checking and the conversions done kernel side.
> >>
> >>Tracepoints will not solve the problem for me for a number of reasons. Tracepoints have to be hit to return data, and there is no way the tracepoint can return relevant information for me to verify that the correct filter was downloaded. I want the original code. I want to audit what was installed. In my case there could be N VRFs, and I want 'ip vrf' or ifupdown2 or any other command to be able to verify that each cgroup has the correct program, and to verify that the default VRF does *not* have a program installed.
> >>
> >>Generically, the bpf code might contain relative data but that's for the user or decoder program to deal with. Surely there is no harm in returning the original, downloaded bpf code to a properly privileged process. If I am debugging some weird network behavior, I want to be able to determine what bpf code is running where and to see what it is doing to whatever degree possible. Saving the original code is the first part of this.
> 
> Well, worst case cost would be ~8 additional pages per program that
> are very rarely used; assume you want to attach a prog per netdevice,
> or worse, one for ingress, one for egress ... and yet we already
> complain that netdevice itself is way too big and needs a diet ...
> That makes it much much worse, though. Using the remainder of the
> used page is thus not enough, I'm afraid.
> 
> But also then, what do you do with 4k insns (or multiple of these in
> case of tail calls), is there a decompiler in the works? Understandably
> that for vrf it might be trivial to just read disasm, but that's just
> one very specific use-case. I can see the point of dumping for the
> sake of CRIU use-case given that cBPF is supported there in the past,
> and CRIU strives to catch up with all kind of kernel APIs. It's
> restricted wrt restore to either the same kernel version or newer
> kernels, but that's generic issue for CRIU. So far I'm not aware of
> user requests for CRIU support, but it could come in future, though.
> Thus, should we have some dump interface it definitely needs to be
> i) generic and ii) usable for CRIU, so we don't end up with multiple
> dump mechanisms due to one API not being designed good enough to
> already cover it.
> 
> Dump result would need to look a bit similar to what we have in ELF
> file, that is, you'd need to construct map descriptions and reference
> them in the insn sequence similar to relocations in the loader. I
> haven't looked into whether it's feasible, but we should definitely
> evaluate possibilities to transform the post-verifier insn sequence
> back into a pre-verifier one for dumping, so we don't need to pay the
> huge additional cost (also a major fraction of the insns might be
> just duplicated here as they weren't rewritten). Perhaps it could
> be done with the help of small annotations, or some sort of diff
> that we only need to store. Not saying it's trivial, but we should
> definitely try hard first and design it carefully if we want to
> support such API.

+1

such instruction dump is meaningful only for stateless programs.
Off the top of my head, the minimum requirement is that they
shouldn't use maps and no tailcalls.
Also the extra memory is the concern, so I think we need a flag
for BPF_PROG_LOAD command like 'store stateless original prog'.
Then at load time we can check that maps are not used and so on.
The get_attach approach from patch 2 is imo too specific.
I think two independent commands would be better.
One to return new prog_fd from attachment point and
second to do the actual instruction dump based on given prog_fd.
Most of the programs today are stateful, so I'm reluctant to add
all that complexity for a small fraction of programs, so I would
like to understand the motivation first.
'ss --bpf' is imo part of the answer. It would need to print
attachment info as well, right? and this is for debugging only?
In case of 'ip vrf' the programs are trivial, so it's really
to debug 'ip vrf' only? and tracepoints somehow not enough?
Or there is more to it that I'm missing?


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-06 19:21           ` Alexei Starovoitov
@ 2017-02-07 17:22             ` David Ahern
  2017-02-08 10:52               ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2017-02-07 17:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: Quentin Monnet, netdev, roopa

On 2/6/17 12:21 PM, Alexei Starovoitov wrote:
>> Well, worst case cost would be ~8 additional pages per program that
>> are very rarely used; assume you want to attach a prog per netdevice,
>> or worse, one for ingress, one for egress ... and yet we already
>> complain that netdevice itself is way too big and needs a diet ...
>> That makes it much much worse, though. Using the remainder of the
>> used page is thus not enough, I'm afraid.

sure, 8 bytes per instruction, 4k instructions so worst case is ~8 pages.

>> But also then, what do you do with 4k insns (or multiple of these in
>> case of tail calls), is there a decompiler in the works? Understandably
>> that for vrf it might be trivial to just read disasm, but that's just
>> one very specific use-case. I can see the point of dumping for the

Dumping bpf bytecode is no different than dumping asm with objdump. To
those who know/learn how to read it and make sense of the asm, it is an
important tool. I contend the ability to retrieve and dump BPF code is
equally important.


>> sake of CRIU use-case given that cBPF is supported there in the past,
>> and CRIU strives to catch up with all kind of kernel APIs. It's
>> restricted wrt restore to either the same kernel version or newer
>> kernels, but that's generic issue for CRIU. So far I'm not aware of
>> user requests for CRIU support, but it could come in future, though.
>> Thus, should we have some dump interface it definitely needs to be
>> i) generic and ii) usable for CRIU, so we don't end up with multiple
>> dump mechanisms due to one API not being designed good enough to
>> already cover it.
>>
>> Dump result would need to look a bit similar to what we have in ELF
>> file, that is, you'd need to construct map descriptions and reference
>> them in the insn sequence similar to relocations in the loader. I

That is a userspace problem, not a kernel problem. This discussion is on
the API to return BPF code to userspace. How that program is presented
to the user is up to the tools.


>> haven't looked into whether it's feasible, but we should definitely
>> evaluate possibilities to transform the post-verifier insn sequence
>> back into a pre-verifier one for dumping, so we don't need to pay the
>> huge additional cost (also a major fraction of the insns might be
>> just duplicated here as they weren't rewritten). Perhaps it could
>> be done with the help of small annotations, or some sort of diff
>> that we only need to store. Not saying it's trivial, but we should
>> definitely try hard first and design it carefully if we want to
>> support such API.

I looked at a reverse_convert_ctx_access at the end of last week. I only
have code to reverse sock_filter_convert_ctx_access, but scanning the
other convert functions nothing jumped out suggesting it is not doable.
Saving the original bpf code is the simplest solution, hence I decided
to use it for this discussion.


> +1
> 
> such instruction dump is meaningful only for stateless programs.
> Off the top of my head, the minimum requirement is that they
> shouldn't use maps and no tailcalls.
> Also the extra memory is the concern, so I think we need a flag
> for BPF_PROG_LOAD command like 'store stateless original prog'.
> Then at load time we can check that maps are not used and so on.
> The get_attach approach from patch 2 is imo too specific.
> I think two independent commands would be better.
> One to return new prog_fd from attachment point and
> second to do the actual instruction dump based on given prog_fd.
> Most of the programs today are stateful, so I'm reluctant to add
> all that complexity for small fraction of programs, so I would
> like to understand the motivation first.
> 'ss --bpf' is imo part of the answer. It would need to print
> attachment info as well, right? and this is for debugging only?
> In case of 'ip vrf' the programs are trivial, so it's really
> to debug 'ip vrf' only? and tracepoints somehow not enough?
> Or there is more to it that I'm missing?
> 

My motivation for pursuing this discussion now is to verify cgroups are
correctly configured on running systems. But, I am also interested in
the generic OS case and being able to debug and understand networking
problems because of bugs in BPF code.

For example, consider the case of enterprise distributions used by
companies as part of their platforms and products with BPF programs
coming from the OS vendor, the hardware vendors, and custom, locally
developed programs (remember that BPF store comment I made in October? I
was only half-joking). As an open OS, you have to anticipate programs
coming from multiple sources, and those programs can and will have bugs.
Further, people make mistakes; people forget commands that were run. It
is fairly easy to see that the wrong bpf program could get attached to
some object.

In both the VRF/cgroup case and Quentin's case, we are talking about
retrieving programs attached to specific objects, not just retrieving
some BPF code that has been loaded into the kernel. bpf programs are
loaded using the bpf system call, but they are attached to objects using
different APIs. Given that, programs attached to specific objects need
to be retrieved with object-specific APIs. No?
e.g., In Quentin's patch, the bpf code is added as part of the tc dump
which seems appropriate. For the cgroup case, programs are attached to
cgroups using the bpf system call hence an extension to it is needed to
retrieve the attached program.

Tracepoints are wrong here for multiple reasons: tracepoints do not and
should not look at the bpf code, and auditing / verifying a config can
not rely on executing code that can, hopefully, trigger the tracepoint
to return some information that might shed light on a problem. That is
not a reliable mechanism for verifying something is (or is not in the
case of the default VRF) properly configured.


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-07 17:22             ` David Ahern
@ 2017-02-08 10:52               ` Daniel Borkmann
  2017-02-08 19:40                 ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2017-02-08 10:52 UTC (permalink / raw)
  To: David Ahern, Alexei Starovoitov; +Cc: Quentin Monnet, netdev, roopa

On 02/07/2017 06:22 PM, David Ahern wrote:
> On 2/6/17 12:21 PM, Alexei Starovoitov wrote:
>>> Well, worst case cost would be ~8 additional pages per program that
>>> are very rarely used; assume you want to attach a prog per netdevice,
>>> or worse, one for ingress, one for egress ... and yet we already
>>> complain that netdevice itself is way too big and needs a diet ...
>>> That makes it much much worse, though. Using the remainder of the
>>> used page is thus not enough, I'm afraid.
>
> sure, 8 bytes per instruction, 4k instructions so worst case is ~8 pages.
>
>>> But also then, what do you do with 4k insns (or multiple of these in
>>> case of tail calls), is there a decompiler in the works? Understandably
>>> that for vrf it might be trivial to just read disasm, but that's just
>>> one very specific use-case. I can see the point of dumping for the
>
> Dumping bpf bytecode is no different than dumping asm with objdump. To
> those who know/learn how to read it and make sense of the asm, it is an
> important tool. I contend the ability to retrieve and dump BPF code is
> equally important.

I agree that dumping might be useful particularly with further
tooling on top (f.e. decompiler), but in any case the interface for
that needs to be designed carefully and generic, so it covers various
use cases (meaning also CRIU). Above, I was rather referring to what
an admin can make sense of when you have worst-case 4k insns with
complex control flow and you just dump the asm (or in case of [1] it
was pure opcodes) in, say, ss or tc; goal was on debugability and
the people debugging are not always kernel devs. Actually currently,
for cBPF dumps it looks like this in ss. Can you tell me what these
11 insns do? Likely you can, but can a normal admin?

# ss -0 -b
Netid  Recv-Q Send-Q                                       Local Address:Port                                                        Peer Address:Port
p_raw  0      0                                                        *:em1                                                                *
	bpf filter (11):  0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6 17, 0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68, 0x06 0 0 4294967295, 0x06 0 0 0,

Certainly that can and should be further improved (f.e. see bpf_disasm()
in tools/net/bpf_dbg.c) to make it actually more readable w/o any
external tooling that does not ship with iproute2. That could help
already, but is still hard. Oddly enough, for cBPF everyone was fine
so far with plain opcode dump it seems?! For eBPF, it could be same
kind of disasm that people are used from verifier with further debug
help when it comes to context access and helpers. So I'm not against
this, but if it's just plain opcodes smashed to the user w/o further
tooling/infra on top aiding the reverse engineering process, it's
pretty much obfuscated and unusable. Thus, both need to be provided
here. Admittedly, I don't know how your iproute2 plans looked like.

   [1] https://github.com/6WIND/iproute2/commit/288af23f4dc0adac6c288b3f70ae25997ea5bebf

>>> sake of CRIU use-case given that cBPF is supported there in the past,
>>> and CRIU strives to catch up with all kind of kernel APIs. It's
>>> restricted wrt restore to either the same kernel version or newer
>>> kernels, but that's generic issue for CRIU. So far I'm not aware of
>>> user requests for CRIU support, but it could come in future, though.
>>> Thus, should we have some dump interface it definitely needs to be
>>> i) generic and ii) usable for CRIU, so we don't end up with multiple
>>> dump mechanisms due to one API not being designed good enough to
>>> already cover it.
>>>
>>> Dump result would need to look a bit similar to what we have in ELF
>>> file, that is, you'd need to construct map descriptions and reference
>>> them in the insn sequence similar to relocations in the loader. I
>
> That is a userspace problem, not a kernel problem. This discussion is on
> the API to return BPF code to userspace. How that program is presented
> to the user is up to the tools.

It's not a user space problem. When you want to CRIU your program,
then you also need the correlation to maps that are used in the program.
So the dump needs to export both, the map meta data and references to
it in the insns stream where src_reg is BPF_PSEUDO_MAP_FD. With map meta
data, I mean the attrs used to originally create the map via bpf(2), so
that you can later restore the same thing (taking the actual map data
aside for now as also not part of that specific api). Since a bpf prog
has all the maps referenced in its aux data, it should be doable.
Perhaps in case the map was pinned, that meta data should also contain
dev/inode of the pinned map. There's still the issue of generating a
snapshot of map data itself, and to extract/dump prog insns from a tail
call map, though.

>>> haven't looked into whether it's feasible, but we should definitely
>>> evaluate possibilities to transform the post-verifier insn sequence
>>> back into a pre-verifier one for dumping, so we don't need to pay the
>>> huge additional cost (also a major fraction of the insns might be
>>> just duplicated here as they weren't rewritten). Perhaps it could
>>> be done with the help of small annotations, or some sort of diff
>>> that we only need to store. Not saying it's trivial, but we should
>>> definitely try hard first and design it carefully if we want to
>>> support such API.
>
> I looked at a reverse_convert_ctx_access at the end of last week. I only
> have code to reverse sock_filter_convert_ctx_access, but scanning the
> other convert functions nothing jumped out suggesting it is not doable.

Ok.

> Saving the original bpf code is the simplest solution, hence I decided
> to use it for this discussion.

Simplest but with downside of huge worst-case cost, see my earlier
comparison to bloating net devices, where the progs are eventually
attached to, at least in tc case. So let's find ways to avoid this big
overhead when designing this API.

>> +1
>>
>> such instruction dump is meaningful only for stateless programs.
>> Off the top of my head, the minimum requirement is that they
>> shouldn't use maps and no tailcalls.
>> Also the extra memory is the concern, so I think we need a flag
>> for BPF_PROG_LOAD command like 'store stateless original prog'.
>> Then at load time we can check that maps are not used and so on.
>> The get_attach approach from patch 2 is imo too specific.
>> I think two independent commands would be better.
>> One to return new prog_fd from attachment point and
>> second to do the actual instruction dump based on given prog_fd.
>> Most of the programs today are stateful, so I'm reluctant to add
>> all that complexity for small fraction of programs, so I would
>> like to understand the motivation first.
>> 'ss --bpf' is imo part of the answer. It would need to print
>> attachment info as well, right? and this is for debugging only?
>> In case of 'ip vrf' the programs are trivial, so it's really
>> to debug 'ip vrf' only? and tracepoints somehow not enough?
>> Or there is more to it that I'm missing?
>
> My motivation for pursuing this discussion now is to verify cgroups are
> correctly configured on running systems. But, I am also interested in
> the generic OS case and being able to debug and understand networking
> problems because of bugs in BPF code.
>
> For example, consider the case of enterprise distributions used by
> companies as part of their platforms and products with BPF programs
> coming from the OS vendor, the hardware vendors, and custom, locally
> developed programs (remember that BPF store comment I made in October? I
> was only half-joking). As an open OS, you have to anticipate programs
> coming from multiple sources, and those programs can and will have bugs.
> Further, people make mistakes; people forget commands that were run. It
> is fairly easy to see that the wrong bpf program could get attached to
> some object.
>
> In both the VRF/cgroup case and Quentin's case, we are talking about
> retrieving programs attached to specific objects, not just retrieving
> some BPF code that has been loaded into the kernel. bpf programs are
> loaded using the bpf system call, but they are attached to objects using
> different APIs. Given that, retrieval of the programs attached to
> specific objects need to be retrieved with object specific APIs. No?
> e.g., In Quentin's patch, the bpf code is added as part of the tc dump
> which seems appropriate. For the cgroup case, programs are attached to
> cgroups using the bpf system call hence an extension to it is needed to
> retrieve the attached program.
>
> Tracepoints are wrong here for multiple reasons: tracepoints do not and
> should not look at the bpf code, and auditing / verifying a config can
> not rely on executing code that can, hopefully, trigger the tracepoint
> to return some information that might shed light on a problem. That is
> not a reliable mechanism for verifying something is (or is not in the
> case of the default VRF) properly configured.
>


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-08 10:52               ` Daniel Borkmann
@ 2017-02-08 19:40                 ` David Ahern
  2017-02-09  1:28                   ` David Ahern
  2017-02-09 11:25                   ` Daniel Borkmann
  0 siblings, 2 replies; 15+ messages in thread
From: David Ahern @ 2017-02-08 19:40 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: Quentin Monnet, netdev, roopa

On 2/8/17 3:52 AM, Daniel Borkmann wrote:
> I agree that dumping might be useful particularly with further
> tooling on top (f.e. decompiler), but in any case the interface for
> that needs to be designed carefully and generic, so it covers various
> use cases (meaning also CRIU). Above, I was rather referring to what
> an admin can make sense of when you have worst-case 4k insns with
> complex control flow and you just dump the asm (or in case of [1] it
> was pure opcodes) in, say, ss or tc; goal was on debugability and
> the people debugging are not always kernel devs. Actually currently,
> for cBPF dumps it looks like this in ss. Can you tell me what these
> 11 insns do? Likely you can, but can a normal admin?
> 
> # ss -0 -b
> Netid  Recv-Q Send-Q                                       Local
> Address:Port                                                        Peer
> Address:Port
> p_raw  0      0                                                       
> *:em1                                                                *
>     bpf filter (11):  0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6
> 17, 0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68,
> 0x06 0 0 4294967295, 0x06 0 0 0,

Yes, that byte stream needs some pretty printing. It is not as easily
decipherable, but this form is (quick, ballpark conversion by hand to
get the intent):


0x28 0 0 12         BPF_LD  | BPF_ABS | BPF_H     0, 0, 12
-- load 2 bytes at offset 12 of ethernet header (h_proto)

0x15 0 8 2048               | BPF_JMP |           0, 8, 0x0800
-- if protocol is not ipv4 jump forward 8 (exit prog 0)

0x30 0 0 23         BPF_LD  | BPF_ABS | BPF_B     0, 0, 23
-- load byte 9 of ipv4 hdr (ipv4 protocol)

0x15 0 6 17                 | BPF_JMP | BPF_B     0, 6, 17
-- if protocol is not 17 (udp) jump forward 6 (exit program)

0x28 0 0 20         BPF_LD  | BPF_ABS | BPF_H     0, 0, 20
-- load 2-bytes at offset 6 of ipv4 hdr (frag offset)

0x45 4 0 8191       BPF_LDX | BPF_IND | BPF_JMP   4, 0, 0x1fff
-- frag_offset & 0x1fff. if not 0 jump forward 4

0xb1 0 0 14         BPF_LDX | BPF_MSH | BPF_B     0, 0, 14
-- load ipv4 header length

0x48 0 0 16                 | BPF_IND | BPF_H     0, 0, 16
-- load 2-bytes at offset 2 of ipv4 hdr (tot_len)

0x15 0 1 68         BPF_LDX | BPF_ALU | BPF_B     0, 1, 68
-- jump to exit 0 if total ipv4 payload length is not 68

0x06 0 0 4294967295 BPF_RET                       0, 0, 0xffffffff
-- exit -1 / 0xffffffff

0x06 0 0 0          BPF_RET                       0, 0, 0
-- exit 0


So basically, match ipv4/udp packets that are not fragments with a total
ip length of 68. Or roughly this tcpdump filter:

'ether[12:2] == 0x800 && ip[9] == 17 && (ip[6:2] & 0x1fff == 0) &&
ip[2:2] == 68'



> 
> Certainly that can and should be further improved (f.e. see bpf_disasm()
> in tools/net/bpf_dbg.c) to make it actually more readable w/o any
> external tooling that does not ship with iproute2. That could help
> already, but is still hard. Oddly enough, for cBPF everyone was fine
> so far with plain opcode dump it seems?! For eBPF, it could be same
> kind of disasm that people are used from verifier with further debug
> help when it comes to context access and helpers. So I'm not against
> this, but if it's just plain opcodes smashed to the user w/o further
> tooling/infra on top aiding the reverse engineering process, it's
> pretty much obfuscated and unusable. Thus, both needs to be provided
> here. Admittedly, I don't know how your iproute2 plans looked like.

It's not rocket science. We should be able to write tools that do the
same for bpf as objdump does for assembly. It is a matter of someone
having the need and taking the initiative. BTW, the bpf option was added
to ss by Nicolas, so a history of 6wind saying 'give me the bpf'.


> It's not a user space problem. When you want to CRIU your program,
> then you also need the correlation to maps that are used in the program.
> So the dump needs to export both, the map meta data and references to
> it in the insns stream where src_reg is BPF_PSEUDO_MAP_FD. With map meta
> data, I mean the attrs used to originally create the map via bpf(2), so
> that you can later restore the same thing (taking the actual map data
> aside for now as also not part of that specific api). Since a bpf prog
> has all the maps referenced in its aux data, it should be doable.
> Perhaps in case the map was pinned, that meta data should also contain
> dev/inode of the pinned map. There's still the issue of generating a
> snapshot of map data itself, and to extract/dump prog insns from a tail
> call map, though.

I have not used CRIU and have no idea what it needs here. At a high
level it would seem to be a completely different use case -- perhaps
wanting all of the loaded programs and their attach points.  What I am
referring to and what 6wind has expressed an interest in is pulling
programs attached to specific objects, so the overlap with CRIU would be
just the intent to retrieve bpf programs.


>>>> haven't looked into whether it's feasible, but we should definitely
>>>> evaluate possibilities to transform the post-verifier insn sequence
>>>> back into a pre-verifier one for dumping, so we don't need to pay the
>>>> huge additional cost (also a major fraction of the insns might be
>>>> just duplicated here as they weren't rewritten). Perhaps it could
>>>> be done with the help of small annotations, or some sort of diff
>>>> that we only need to store. Not saying it's trivial, but we should
>>>> definitely try hard first and design it carefully if we want to
>>>> support such API.
>>
>> I looked at a reverse_convert_ctx_access at the end of last week. I only
>> have code to reverse sock_filter_convert_ctx_access, but scanning the
>> other convert functions nothing jumped out suggesting it is not doable.
> 
> Ok.
> 
>> Saving the original bpf code is the simplest solution, hence I decided
>> to use it for this discussion.
> 
> Simplest but with downside of huge worst-case cost, see my earlier
> comparison to bloating net devices, where the progs are eventually
> attached to, at least in tc case. So lets find ways to avoid this big
> overhead when designing this API.

I saw the comparison and I agree on avoiding the memory waste.

So next steps ?

1. Looking at reverse functions for the convert accesses done by the
verifier. Programs returned to userspace would be reversed

2. API for retrieving programs attached to objects. We can discuss this
further at netdev if you are going to be there, but for starters:
   - for existing rtnetlink dumps why can't the bpf be attached as a
netlink attribute as 6wind proposed? It is an attribute of that object.

   - Alternatively, the attach is always done by passing the FD as an
attribute, so the netlink dump could attach an fd to the running
program, return the FD as an attribute and the bpf program is retrieved
from the fd. This is a major departure from how dumps work with
processing attributes and needing to attach open files to a process will
be problematic. Integrating the bpf into the dump is a natural fit.

   - for cgroups the attach/detach go through the bpf syscall. There is
no rtnetlink here and Alexei resisted any cgroup based files for
accessing the filters. This suggests retrieving cgroup programs has to
go through bpf syscall as I proposed.

3. tools to pretty print the bpf


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-08 19:40                 ` David Ahern
@ 2017-02-09  1:28                   ` David Ahern
  2017-02-09 11:25                   ` Daniel Borkmann
  1 sibling, 0 replies; 15+ messages in thread
From: David Ahern @ 2017-02-09  1:28 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: Quentin Monnet, netdev, roopa

On 2/8/17 12:40 PM, David Ahern wrote:
> On 2/8/17 3:52 AM, Daniel Borkmann wrote:
>> for cBPF dumps it looks like this in ss. Can you tell me what these
>> 11 insns do? Likely you can, but can a normal admin?
>>
>> # ss -0 -b
>> Netid  Recv-Q Send-Q                                       Local
>> Address:Port                                                        Peer
>> Address:Port
>> p_raw  0      0                                                       
>> *:em1                                                                *
>>     bpf filter (11):  0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6
>> 17, 0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68,
>> 0x06 0 0 4294967295, 0x06 0 0 0,
> 
...

> 
> It's not rocket science. We should be able to write tools that do the
> same for bpf as objdump does for assembly. It is a matter of someone
> having the need and taking the initiative. BTW, the bpf option was added

Just a couple of hours of hacking this afternoon, leveraging some of
the verifier code in the kernel, and the above bpf filter comes out in
more human-friendly terms:

BPF_LD  | BPF_ABS  | BPF_H       0xc    :  val = *(u16 *)skb[12]
BPF_JMP | BPF_JEQ  | BPF_K  0  8 0x800  :  if !(val == 0x800) goto pc+8
BPF_LD  | BPF_ABS  | BPF_B       0x17   :  val = *(u8 *)skb[23]
BPF_JMP | BPF_JEQ  | BPF_K  0  6 0x11   :  if !(val == 0x11) goto pc+6
BPF_LD  | BPF_ABS  | BPF_H       0x14   :  val = *(u16 *)skb[20]
BPF_JMP | BPF_JSET | BPF_K  4  0 0x1fff :  if ((val & 0x1fff) != 0) goto pc+4
BPF_LDX | BPF_MSH  | BPF_B       0xe    :
BPF_LD  | BPF_IND  | BPF_H       0x10   :  val = *(u16 *)skb[16]
BPF_JMP | BPF_JEQ  | BPF_K  0  1 0x44   :  if !(val == 0x44) goto pc+1
BPF_RET ffffffff                        :  ret ffffffff
BPF_RET 0                               :  ret 0

(long lines so I chopped the reprint of the hex on the left)

That said, verifying that the program attached to a cgroup is correct
for a VRF does not require it to be pretty printed or viewed by humans.
I can automate the checks on namespace id and device index.


* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-08 19:40                 ` David Ahern
  2017-02-09  1:28                   ` David Ahern
@ 2017-02-09 11:25                   ` Daniel Borkmann
  2017-02-10  5:22                     ` Alexei Starovoitov
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2017-02-09 11:25 UTC (permalink / raw)
  To: David Ahern, Alexei Starovoitov; +Cc: Quentin Monnet, netdev, roopa

On 02/08/2017 08:40 PM, David Ahern wrote:
> On 2/8/17 3:52 AM, Daniel Borkmann wrote:
>> I agree that dumping might be useful particularly with further
>> tooling on top (f.e. decompiler), but in any case the interface for
>> that needs to be designed carefully and generic, so it covers various
>> use cases (meaning also CRIU). Above, I was rather referring to what
>> an admin can make sense of when you have worst-case 4k insns with
>> complex control flow and you just dump the asm (or in case of [1] it
>> was pure opcodes) in, say, ss or tc; goal was on debugability and
>> the people debugging are not always kernel devs. Actually currently,
>> for cBPF dumps it looks like this in ss. Can you tell me what these
>> 11 insns do? Likely you can, but can a normal admin?
>>
>> # ss -0 -b
>> Netid  Recv-Q Send-Q  Local Address:Port  Peer Address:Port
>> p_raw  0      0       *:em1               *
>>      bpf filter (11):  0x28 0 0 12, 0x15 0 8 2048, 0x30 0 0 23, 0x15 0 6 17,
>>      0x28 0 0 20, 0x45 4 0 8191, 0xb1 0 0 14, 0x48 0 0 16, 0x15 0 1 68,
>>      0x06 0 0 4294967295, 0x06 0 0 0,
>
> Yes, that byte stream needs some pretty printing. It is not as easily
> decipherable, but this form is (quick, ballpark conversion by hand to
> get the intent):
>
> 0x28 0 0 12         BPF_LD  | BPF_ABS | BPF_H     0, 0, 12
> -- load 2 bytes at offset 12 of ethernet header (h_proto)
>
> 0x15 0 8 2048               | BPF_JMP |           0, 8, 0x0800
> -- if protocol is not ipv4 jump forward 8 (exit prog 0)
>
> 0x30 0 0 23         BPF_LD  | BPF_ABS | BPF_B     0, 0, 23
> -- load byte 9 of ipv4 hdr (ipv4 protocol)
>
> 0x15 0 6 17                 | BPF_JMP | BPF_B     0, 6, 17
> -- if protocol is not 17 (udp) jump forward 6 (exit program)
>
> 0x28 0 0 20         BPF_LD  | BPF_ABS | BPF_H     0, 0, 20
> -- load 2-bytes at offset 6 of ipv4 hdr (frag offset)
>
> 0x45 4 0 8191       BPF_LDX | BPF_IND | BPF_JMP   4, 0, 0x1fff
> -- frag_offset & 0x1fff. if not 0 jump forward 4
>
> 0xb1 0 0 14         BPF_LDX | BPF_MSH | BPF_B     0, 0, 14
> -- load ipv4 header length
>
> 0x48 0 0 16                 | BPF_IND | BPF_H     0, 0, 16
> -- load 2-bytes at offset 2 of ipv4 hdr (tot_len)
>
> 0x15 0 1 68         BPF_LDX | BPF_ALU | BPF_B     0, 1, 68
> -- jump to exit 0 if total ipv4 payload length is not 68
>
> 0x06 0 0 4294967295 BPF_RET                       0, 0, 0xffffffff
> -- exit -1 / 0xffffffff
>
> 0x06 0 0 0          BPF_RET                       0, 0, 0
> -- exit 0
>
> So basically, match ipv4/udp packets that are not fragments with a total
> ip length of 68. Or roughly this tcpdump filter:
>
> 'ether[12:2] == 0x800 && ip[9] == 17 && (ip[6:2] & 0x1fff == 0) &&
> ip[2:2] == 68'
>
>> Certainly that can and should be further improved (f.e. see bpf_disasm()
>> in tools/net/bpf_dbg.c) to make it actually more readable w/o any
>> external tooling that does not ship with iproute2. That could help
>> already, but is still hard. Oddly enough, for cBPF everyone was fine
>> so far with plain opcode dump it seems?! For eBPF, it could be same
>> kind of disasm that people are used from verifier with further debug
>> help when it comes to context access and helpers. So I'm not against
>> this, but if it's just plain opcodes smashed to the user w/o further
>> tooling/infra on top aiding the reverse engineering process, it's
>> pretty much obfuscated and unusable. Thus, both need to be provided
>> here. Admittedly, I don't know how your iproute2 plans looked like.
>
> It's not rocket science. We should be able to write tools that do the
> same for bpf as objdump does for assembly. It is a matter of someone
> having the need and taking the initiative. BTW, the bpf option was added

Right, as I said, we should add disasm output similar to what the
verifier resp. llvm-objdump generates for eBPF, so that it's easier
to follow for people already familiar with that. F.e. that could be
dumped if
someone calls `tc -d filter show dev foo ingress|egress` and bpf is
present, the disasm could be in lib/bpf.c and called from the various
users in iproute2, incl. vrf, xdp, etc. On top of that, perhaps -dd
(or whatever fits) option where the disasm and related opcodes are
shown for each line as well (I mean similar to `bpf_jit_disasm -o`).
Having that would make debugging/introspection easier, agree.
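
To sketch the shape of that shared helper (name and signature are
invented here for illustration, nothing like this exists in lib/bpf.c
yet):

#include <stdio.h>
#include <stdbool.h>
#include <linux/bpf.h>

/* hypothetical iproute2 lib/bpf.c helper: print a verifier-style
 * disassembly of an eBPF prog; with opcodes == true also show the raw
 * insn bytes per line (the -dd idea, similar to bpf_jit_disasm -o).
 * Callers would be tc, ip vrf and the xdp code paths.
 */
void bpf_print_asm(FILE *fp, const struct bpf_insn *insns,
                   unsigned int insn_cnt, bool opcodes);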

> to ss by Nicolas, so a history of 6wind saying 'give me the bpf'.

Right, that then gets pretty-printed or processed outside of iproute2
in some third-party application, I presume. We should have something
that ships with iproute2 natively instead.

>> It's not a user space problem. When you want to CRIU your program,
>> then you also need the correlation to maps that are used in the program.
>> So the dump needs to export both, the map meta data and references to
>> it in the insns stream where src_reg is BPF_PSEUDO_MAP_FD. With map meta
>> data, I mean the attrs used to originally create the map via bpf(2), so
>> that you can later restore the same thing (taking the actual map data
>> aside for now as also not part of that specific api). Since a bpf prog
>> has all the maps referenced in its aux data, it should be doable.
>> Perhaps in case the map was pinned, that meta data should also contain
>> dev/inode of the pinned map. There's still the issue of generating a
>> snapshot of map data itself, and to extract/dump prog insns from a tail
>> call map, though.
>
> I have not used CRIU and have no idea what it needs here. At a high
> level it would seem to be a completely different use case -- perhaps
> wanting all of the loaded programs and their attach points.  What I am
> referring to and what 6wind has expressed an interest in is pulling
> programs attached to specific objects, so the overlap with CRIU would be
> just the intent to retrieve bpf programs.

Correct, the overlap both use-cases share is the dump itself. It needs
to be in such a condition for CRIU, that it can be reloaded eventually,
hence references to maps need to be reconstructable, which means we
need to define a common format for the dump (probably as TLVs), that
contains an array of map definitions, where the insns need to reference
them (instead of having meaningless/stale fd numbers in imm). If you look
at union bpf_attr {}, we have the anonymous struct for BPF_MAP_CREATE,
so the map definition would need to be an array of these, perhaps with a
unique id value for each, that is then placed into the related imms of
the instruction sequence where maps are present. TLVs likely make sense
since union bpf_attr {} can get extended as needed in future, and thus
also the members for map creation, so we need something extensible for
the dump as well w/o breaking existing apps. That would be a start that
can address both use-cases, so we don't end up having two dump APIs/formats
in future. At the same time, it also helps showing this info to the
user for introspection.
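
To sketch what such a blob could look like (record types, struct names
and members below are purely illustrative, not a proposed uapi):

#include <linux/types.h>

/* illustrative only: a TLV stream with one map-definition record per
 * referenced map plus one record for the insn stream; the imm of each
 * BPF_PSEUDO_MAP_FD load carries the map record's id instead of a
 * (stale) fd number.
 */
struct bpf_dump_tlv {
        __u32 type;             /* e.g. DUMP_MAP_DEF or DUMP_INSNS */
        __u32 len;              /* length of data[] in bytes */
        __u8  data[];
};

struct bpf_dump_map_def {
        __u32 id;               /* referenced from insn imm fields */
        /* mirrors the BPF_MAP_CREATE attrs of union bpf_attr: */
        __u32 map_type;
        __u32 key_size;
        __u32 value_size;
        __u32 max_entries;
        __u32 map_flags;
};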

>>>>> haven't looked into whether it's feasible, but we should definitely
>>>>> evaluate possibilities to transform the post-verifier insn sequence
>>>>> back into a pre-verifier one for dumping, so we don't need to pay the
>>>>> huge additional cost (also a major fraction of the insns might be
>>>>> just duplicated here as they weren't rewritten). Perhaps it could
>>>>> be done with the help of small annotations, or some sort of diff
>>>>> that we only need to store. Not saying it's trivial, but we should
>>>>> definitely try hard first and design it carefully if we want to
>>>>> support such API.
>>>
>>> I looked at a reverse_convert_ctx_access at the end of last week. I only
>>> have code to reverse sock_filter_convert_ctx_access, but scanning the
>>> other convert functions nothing jumped out suggesting it is not doable.
>>
>> Ok.
>>
>>> Saving the original bpf code is the simplest solution, hence I decided
>>> to use it for this discussion.
>>
>> Simplest but with downside of huge worst-case cost, see my earlier
>> comparison to bloating net devices, where the progs are eventually
>> attached to, at least in the tc case. So let's find ways to avoid this big
>> overhead when designing this API.
>
> I saw the comparison and I agree on avoiding the memory waste.
>
> So next steps ?
>
> 1. Looking at reverse functions for the convert accesses done by the
> verifier. Programs returned to userspace would be reversed

Yes.

> 2. API for retrieving programs attached to objects. We can discuss this
> further at netdev if you are going to be there, but for starters:

Will be there.

>     - for existing rtnetlink dumps why can't the bpf be attached as a
> netlink attribute as 6wind proposed? It is an attribute of that object.
>
>     - Alternatively, the attach is always done by passing the FD as an
> attribute, so the netlink dump could attach an fd to the running
> program, return the FD as an attribute and the bpf program is retrieved
> from the fd. This is a major departure from how dumps work with
> processing attributes and needing to attach open files to a process will
> be problematic. Integrating the bpf into the dump is a natural fit.

Right, I think it's a natural fit to place it into the various points/
places where it's attached to, as we're stuck with that anyway for the
attachment part. Meaning in cls_bpf, it would go as a mem blob into the
netlink attribute. There would need to be a common BPF core helper that
the various subsystem users call in order to generate that mentioned
output format, and that resulting mem blob is then stuck into either
nlattr, mem provided by syscall, etc.
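
As a sketch of that helper and one caller (function name, TCA_BPF_DUMP
and the exact signature are made up here; the point is a single
serializer shared by all attach points):

/* hypothetical common helper: serialize a prog into the shared dump
 * format discussed above; the caller kfree()s *blob.
 */
int bpf_prog_dump_blob(const struct bpf_prog *prog, void **blob, u32 *len);

/* e.g. a cls_bpf dump path could then do: */
static int cls_bpf_put_dump(struct sk_buff *skb, struct bpf_prog *filter)
{
        void *blob;
        u32 len;
        int err;

        err = bpf_prog_dump_blob(filter, &blob, &len);
        if (err)
                return err;
        err = nla_put(skb, TCA_BPF_DUMP, len, blob);
        kfree(blob);
        return err;
}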

>     - for cgroups the attach/detach go through the bpf syscall. There is
> no rtnetlink here and Alexei resisted any cgroup based files for
> accessing the filters. This suggests retrieving cgroup programs has to
> go through bpf syscall as I proposed.
>
> 3. tools to pretty print the bpf
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-09 11:25                   ` Daniel Borkmann
@ 2017-02-10  5:22                     ` Alexei Starovoitov
  2017-02-10 22:45                       ` Daniel Borkmann
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2017-02-10  5:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, Quentin Monnet, netdev, roopa

On Thu, Feb 09, 2017 at 12:25:37PM +0100, Daniel Borkmann wrote:
>
> Correct, the overlap both use-cases share is the dump itself. It needs
> to be in such a condition for CRIU, that it can be reloaded eventually,

I don't think it makes sense to drag criu into this discussion.
I expressed my take on criu in the other thread. tldr:
bpf is a graph of dependencies between programs, maps, applications
and kernel events. So to save/restore this graph one would need to solve
very hard problems of stopping multiple applications at once,
stopping kernel events and so on. I don't think it's worth going that route.

> >    - Alternatively, the attach is always done by passing the FD as an
> >attribute, so the netlink dump could attach an fd to the running
> >program, return the FD as an attribute and the bpf program is retrieved
> >from the fd. This is a major departure from how dumps work with
> >processing attributes and needing to attach open files to a process will
> >be problematic. Integrating the bpf into the dump is a natural fit.
> 
> Right, I think it's a natural fit to place it into the various points/
> places where it's attached to, as we're stuck with that anyway for the
> attachment part. Meaning in cls_bpf, it would go as a mem blob into the
> netlink attribute. There would need to be a common BPF core helper that
> the various subsystem users call in order to generate that mentioned
> output format, and that resulting mem blob is then stuck into either
> nlattr, mem provided by syscall, etc.

I think if we use ten different ways to dump it, it will
complicate the user space tooling.
I'd rather see one way of doing it via new syscall command.
Pass prog_fd and it will return insns in some form.

Here is more concrete proposal:
- add two flags to PROG_LOAD:
  BPF_F_ENFORCE_STATELESS - it will require verifier to check that program
  doesn't use maps and any other global state (doesn't use bpf_redirect,
  doesn't use bpf_set_tunnel_key and tunnel_opt)
  This will ensure that program is stateless and pure instruction
  dump is meaningful. For 'ip vrf' case it will be enough.
  BPF_F_ALLOW_DUMP - it will save original program, so in the common
  case we wouldn't need to waste memory to save program

- add new bpf syscall command BPF_PROG_DUMP
  input: prog_fd, output: insns
  it will work right away with OBJ_GET command and the user will
  be able to dump stateless programs pinned in bpffs

- add appropriate interfaces for different attach points to return prog_fd:
  for cgroup it will be new BPF_PROG_GET command.
  for socket it will be new getsockopt. (Actually BPF_PROG_GET can work
  for sockets too and probably better).
  for xdp and tc we need to find a way to return prog_fd.
  netlink is no good, since it would be very weird to install fd
  and return it async in netlink body. We can simply say that
  whoever wants to dump programs need to first pin them in bpffs
  and then attach to tc/xdp. iproute2 already does it anyway.
  Realistically tc/xdp programs are almost always stateful, so
  dump won't be available for them anyway.

If in the future we will discover magic way of restoring maps,
we can relax prog loading part and allow BPF_F_ALLOW_DUMP to be
used independently of BPF_F_ENFORCE_STATELESS flag.
(In the beginning these two flags would need to be used together).
Also we'll be able to extend BPF_PROG_DUMP independently
of attachment points. If we introduce something like global
variable section or read-only string section (like some folks asked)
we can dump it back. Insns will not be the only section.
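
Roughly, from user space the flow could look like this (everything
below - the commands, flags and attr fields - is just the proposal
spelled out, none of it exists today):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* assumes the prog was loaded with
 *   prog_flags = BPF_F_ALLOW_DUMP | BPF_F_ENFORCE_STATELESS
 */
static int dump_cgroup_prog(int cgroup_fd, struct bpf_insn *insns)
{
        union bpf_attr attr = {};
        int prog_fd;

        /* 1. fd for the prog attached to the cgroup (proposed
         *    BPF_PROG_GET; target_fd is the cgroup directory fd)
         */
        attr.target_fd   = cgroup_fd;
        attr.attach_type = BPF_CGROUP_INET_SOCK_CREATE;
        prog_fd = syscall(__NR_bpf, BPF_PROG_GET, &attr, sizeof(attr));
        if (prog_fd < 0)
                return prog_fd;

        /* 2. dump its insns (proposed BPF_PROG_DUMP; the .dump section
         *    of union bpf_attr and its field names are hypothetical)
         */
        memset(&attr, 0, sizeof(attr));
        attr.dump.prog_fd  = prog_fd;
        attr.dump.insns    = (__u64)(unsigned long)insns;
        attr.dump.insn_cnt = BPF_MAXINSNS;
        return syscall(__NR_bpf, BPF_PROG_DUMP, &attr, sizeof(attr));
}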

My main point is that right now I'd really like to avoid
dealing with stateful bits (maps, etc), since there is no
good way of dumping it, while stateless will be enough
for 'ip vrf' and simple programs.

Thoughts?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
  2017-02-10  5:22                     ` Alexei Starovoitov
@ 2017-02-10 22:45                       ` Daniel Borkmann
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Borkmann @ 2017-02-10 22:45 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Ahern, Quentin Monnet, netdev, roopa

On 02/10/2017 06:22 AM, Alexei Starovoitov wrote:
> On Thu, Feb 09, 2017 at 12:25:37PM +0100, Daniel Borkmann wrote:
>>
>> Correct, the overlap both use-cases share is the dump itself. It needs
>> to be in such a condition for CRIU, that it can be reloaded eventually,
>
> I don't think it makes sense to drag criu into this discussion.
> I expressed my take on criu in the other thread. tldr:
> bpf is a graph of dependencies between programs, maps, applications
> and kernel events. So to save/restore this graph one would need to solve
> very hard problems of stopping multiple applications at once,
> stopping kernel events and so on. I don't think it's worth going that route.

Definitely not straightforward, fully agree. Worst-case you probably
need to go via stop_machine() (like in the ftrace case when it modifies
code) in order to get a globally consistent snapshot at a specific time.
Sounds ugly. Or, taking small steps first, tail calls etc. would not be
supported; then you would need to tackle progs and generic maps. For the
progs part it could be a very similar interface at least, thus I'm saying
it would be good if it's designed to be extendable in that regard.

>>>     - Alternatively, the attach is always done by passing the FD as an
>>> attribute, so the netlink dump could attach an fd to the running
>>> program, return the FD as an attribute and the bpf program is retrieved
>>> from the fd. This is a major departure from how dumps work with
>>> processing attributes and needing to attach open files to a process will
>>> be problematic. Integrating the bpf into the dump is a natural fit.
>>
>> Right, I think it's a natural fit to place it into the various points/
>> places where it's attached to, as we're stuck with that anyway for the
>> attachment part. Meaning in cls_bpf, it would go as a mem blob into the
>> netlink attribute. There would need to be a common BPF core helper that
>> the various subsystem users call in order to generate that mentioned
>> output format, and that resulting mem blob is then stuck into either
>> nlattr, mem provided by syscall, etc.
>
> I think if we use ten different ways to dump it, it will
> complicate the user space tooling.
> I'd rather see one way of doing it via new syscall command.
> Pass prog_fd and it will return insns in some form.
>
> Here is more concrete proposal:
> - add two flags to PROG_LOAD:
>    BPF_F_ENFORCE_STATELESS - it will require verifier to check that program
>    doesn't use maps and any other global state (doesn't use bpf_redirect,
>    doesn't use bpf_set_tunnel_key and tunnel_opt)
>    This will ensure that program is stateless and pure instruction
>    dump is meaningful. For 'ip vrf' case it will be enough.

I don't think such a flag will be needed from a uapi pov. The verifier
can just set a flag like that in the bpf_prog aux bits while verifying ...

>    BPF_F_ALLOW_DUMP - it will save original program, so in the common
>    case we wouldn't need to waste memory to save program

... and when that one is passed and the prog has state, then it gets
rejected. Effectively, both flags are saying the same thing. The plus
side is that you don't waste any resources when the flag is not set, but
the problem I see is that BPF_F_ALLOW_DUMP requires explicit cooperation
from a process; when used for introspection, doing that transparently
instead might be more desirable. The problem is that even when
transparent, we have the mentioned limitations, so someone who doesn't
want to cooperate could then just use f.e. an empty tail call map on
exit and that would be enough to make the dump unsupported again. But
also with just BPF_F_ALLOW_DUMP, I can foresee that in half a year or
so people will request that the dump should be possible also without
BPF_F_ALLOW_DUMP explicitly set.

> - add new bpf syscall command BPF_PROG_DUMP
>    input: prog_fd, output: insns
>    it will work right away with OBJ_GET command and the user will
>    be able to dump stateless programs pinned in bpffs

(And with that it requires really cooperation by design.)

> - add appropriate interfaces for different attach points to return prog_fd:
>    for cgroup it will be new BPF_PROG_GET command.
>    for socket it will be new getsockopt. (Actually BPF_PROG_GET can work
>    for sockets too and probably better).

I assume you mean BPF_PROG_DUMP above, right? Yeah, for them it's
not that difficult, agree.

>    for xdp and tc we need to find a way to return prog_fd.
>    netlink is no good, since it would be very weird to install fd
>    and return it async in netlink body. We can simply say that
>    whoever wants to dump programs need to first pin them in bpffs
>    and then attach to tc/xdp. iproute2 already does it anyway.
>    Realistically tc/xdp programs are almost always stateful, so
>    dump won't be available for them anyway.

Right, but if it's just for introspection, I still think that the
format I described earlier could work. Meaning for maps, you dump
all the params used to create the map along with refs to where they
are used; that would allow tc/xdp to dump it at least. It still
wouldn't support tail calls, but you would see that a tail call map
is used somewhere. It would still be possible to move all the logic
behind the tail call to defeat it, but we could make it so that user
space can retrieve progs from map slots by returning a new fd, which
can then itself be used to do the dump again.

> If in the future we will discover magic way of restoring maps,
> we can relax prog loading part and allow BPF_F_ALLOW_DUMP to be
> used independently of BPF_F_ENFORCE_STATELESS flag.
> (In the beginning these two flags would need to be used together).
> Also we'll be able to extend BPF_PROG_DUMP independently
> of attachment points. If we introduce something like global
> variable section or read-only string section (like some folks asked)
> we can dump it back. Insns will not be the only section.
>
> My main point is that right now I'd really like to avoid
> dealing with stateful bits (maps, etc), since there is no
> good way of dumping it, while stateless will be enough

With dumping the data, agree, but as mentioned you could at least
dump the related map attributes that are immutable during the lifetime
of a map, which can help the introspection side a bit.

> for 'ip vrf' and simple programs.
>
> Thoughts?
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-10 22:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03 20:38 [RFC PATCH net-next 0/2] bpf: Allow retrieval of ebpf filters David Ahern
2017-02-03 20:38 ` [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions David Ahern
2017-02-03 21:09   ` Daniel Borkmann
2017-02-03 22:28     ` David Ahern
2017-02-06 10:56       ` Quentin Monnet
2017-02-06 14:13         ` Daniel Borkmann
2017-02-06 19:21           ` Alexei Starovoitov
2017-02-07 17:22             ` David Ahern
2017-02-08 10:52               ` Daniel Borkmann
2017-02-08 19:40                 ` David Ahern
2017-02-09  1:28                   ` David Ahern
2017-02-09 11:25                   ` Daniel Borkmann
2017-02-10  5:22                     ` Alexei Starovoitov
2017-02-10 22:45                       ` Daniel Borkmann
2017-02-03 20:38 ` [RFC PATCH net-next 2/2] bpf: Add support to retrieve program attached to a cgroup David Ahern
