* [PATCH net-next 0/9] BPF updates
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev

We sat down and heavily reworked the whole previous patchset from
v10 [1] to address all comments/concerns. This patchset therefore
*replaces* the internal BPF interpreter with the new layout as
discussed in [1], and migrates some exotic callers to proper use of
the BPF API for a transparent upgrade. All other callers that already
use the BPF API as intended need no further changes to run on the new
internals. We also removed the sysctl knob entirely and do not expose
any structure to userland, so that implementation details reside in
kernel space only. Since we are replacing the interpreter, we had to
migrate seccomp in the same patch as the interpreter so as not to
break anything. When attaching a new filter, the flow is as follows:
i) test whether the JIT compiler is enabled and can compile the user
BPF, ii) if so, use the JIT image, iii) if not, transparently migrate
the filter into the new representation and run it in the interpreter.
Also, we have dropped the JIT flag from the len attribute and made
that change the initial patch of this series, as Pablo suggested in
the last feedback, thanks.
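
To illustrate what "need no further changes" means in practice, here
is a minimal user space example of the classic, unchanged API (purely
illustrative, not part of this series); the filter accepts IPv4
frames on a packet socket and would simply be run by the new
internals behind the scenes:

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

int main(void)
{
	/* Classic BPF: ldh [12]; jeq #ETH_P_IP, accept, drop */
	struct sock_filter code[] = {
		{ 0x28, 0, 0, 12         },	/* ldh [12]               */
		{ 0x15, 0, 1, ETH_P_IP   },	/* jeq #0x800, jt 2, jf 3 */
		{ 0x06, 0, 0, 0xffffffff },	/* ret #-1 (accept)       */
		{ 0x06, 0, 0, 0          },	/* ret #0  (drop)         */
	};
	struct sock_fprog prog = {
		.len    = sizeof(code) / sizeof(code[0]),
		.filter = code,
	};
	/* Needs CAP_NET_RAW. */
	int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (sock < 0 || setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER,
				   &prog, sizeof(prog)) < 0) {
		perror("socket/setsockopt");
		return 1;
	}
	/* recv() on 'sock' now only sees frames the filter accepts. */
	close(sock);
	return 0;
}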

We did extensive testing of BPF and seccomp on the new interpreter
itself and also on the user ABIs and could not find any issues; the
new performance numbers posted in patch 8 are also still the same.

Please find more details in the patches themselves.

For all the previous history from v1 to v10, see [1]. We decided to
drop the v11 tag since we have thoroughly reworked the set, but all
previous feedback is of course included.

Rebased to latest net-next.

Thanks!

  [1] http://thread.gmane.org/gmane.linux.kernel/1665858

Alexei Starovoitov (2):
  net: filter: rework/optimize internal BPF interpreter's instruction set
  doc: filter: extend BPF documentation to document new internals

Daniel Borkmann (7):
  net: filter: add jited flag to indicate jit compiled filters
  net: filter: keep original BPF program around
  net: filter: move filter accounting to filter core
  net: ptp: use sk_unattached_filter_create() for BPF
  net: ptp: do not reimplement PTP/BPF classifier
  net: ppp: use sk_unattached_filter api
  net: isdn: use sk_unattached_filter api

 Documentation/networking/filter.txt                |  147 ++
 arch/arm/net/bpf_jit_32.c                          |    3 +-
 arch/powerpc/net/bpf_jit_comp.c                    |    3 +-
 arch/s390/net/bpf_jit_comp.c                       |    5 +-
 arch/sparc/net/bpf_jit_comp.c                      |    3 +-
 arch/x86/net/bpf_jit_comp.c                        |    3 +-
 drivers/isdn/i4l/isdn_ppp.c                        |   61 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   |   11 +-
 drivers/net/ethernet/ti/cpts.c                     |   10 +-
 drivers/net/ethernet/xscale/ixp4xx_eth.c           |   11 +-
 drivers/net/ppp/ppp_generic.c                      |   60 +-
 include/linux/filter.h                             |  110 +-
 include/linux/isdn_ppp.h                           |    5 +-
 include/linux/ptp_classify.h                       |   14 +-
 include/linux/seccomp.h                            |    1 -
 include/net/sock.h                                 |   27 -
 kernel/seccomp.c                                   |  119 +-
 net/core/filter.c                                  | 1523 ++++++++++++++++----
 net/core/sock_diag.c                               |   23 +-
 net/core/timestamping.c                            |   27 +-
 20 files changed, 1630 insertions(+), 536 deletions(-)

-- 
1.7.11.7


* [PATCH net-next 1/9] net: filter: add jited flag to indicate jit compiled filters
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Pablo Neira Ayuso

This patch adds a jited flag into the sk_filter struct in order to
indicate whether a filter is currently JIT compiled or not. The size
of sk_filter is not expanded, as the 32 bit 'len' member allows upper
bits to be reused: a filter can currently only grow as large as
BPF_MAXINSNS.

Therefore, there is also enough room in the 'len' field for other
flags that may be needed in the future. The jited flag also allows
for having alternative interpreter functions running, as we can
currently only detect JIT compiled filters by testing that
fp->bpf_func does not equal the address of sk_run_filter().

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
---
 arch/arm/net/bpf_jit_32.c       | 3 ++-
 arch/powerpc/net/bpf_jit_comp.c | 3 ++-
 arch/s390/net/bpf_jit_comp.c    | 5 ++++-
 arch/sparc/net/bpf_jit_comp.c   | 3 ++-
 arch/x86/net/bpf_jit_comp.c     | 3 ++-
 include/linux/filter.h          | 3 ++-
 net/core/filter.c               | 1 +
 7 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 271b5e9..e72ff51 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -925,6 +925,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 		bpf_jit_dump(fp->len, alloc_size, 2, ctx.target);
 
 	fp->bpf_func = (void *)ctx.target;
+	fp->jited = 1;
 out:
 	kfree(ctx.offsets);
 	return;
@@ -932,7 +933,7 @@ out:
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter)
+	if (fp->jited)
 		module_free(NULL, fp->bpf_func);
 	kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 555034f..c0c5fcb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -689,6 +689,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 		((u64 *)image)[0] = (u64)code_base;
 		((u64 *)image)[1] = local_paca->kernel_toc;
 		fp->bpf_func = (void *)image;
+		fp->jited = 1;
 	}
 out:
 	kfree(addrs);
@@ -697,7 +698,7 @@ out:
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter)
+	if (fp->jited)
 		module_free(NULL, fp->bpf_func);
 	kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 708d60e..dd2d9b3 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -877,6 +877,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 	if (jit.start) {
 		set_memory_ro((unsigned long)header, header->pages);
 		fp->bpf_func = (void *) jit.start;
+		fp->jited = 1;
 	}
 out:
 	kfree(addrs);
@@ -887,10 +888,12 @@ void bpf_jit_free(struct sk_filter *fp)
 	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
 	struct bpf_binary_header *header = (void *)addr;
 
-	if (fp->bpf_func == sk_run_filter)
+	if (!fp->jited)
 		goto free_filter;
+
 	set_memory_rw(addr, header->pages);
 	module_free(NULL, header);
+
 free_filter:
 	kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 01fe994..8c01be6 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -809,6 +809,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
 	if (image) {
 		bpf_flush_icache(image, image + proglen);
 		fp->bpf_func = (void *)image;
+		fp->jited = 1;
 	}
 out:
 	kfree(addrs);
@@ -817,7 +818,7 @@ out:
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter)
+	if (fp->jited)
 		module_free(NULL, fp->bpf_func);
 	kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 4ed75dd..7fa182c 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,6 +772,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 		bpf_flush_icache(header, image + proglen);
 		set_memory_ro((unsigned long)header, header->pages);
 		fp->bpf_func = (void *)image;
+		fp->jited = 1;
 	}
 out:
 	kfree(addrs);
@@ -791,7 +792,7 @@ static void bpf_jit_free_deferred(struct work_struct *work)
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter) {
+	if (fp->jited) {
 		INIT_WORK(&fp->work, bpf_jit_free_deferred);
 		schedule_work(&fp->work);
 	} else {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e568c8e..e65e230 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -25,7 +25,8 @@ struct sock;
 struct sk_filter
 {
 	atomic_t		refcnt;
-	unsigned int         	len;	/* Number of filter blocks */
+	u32			jited:1,	/* Is our filter JIT'ed? */
+				len:31;		/* Number of filter blocks */
 	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
diff --git a/net/core/filter.c b/net/core/filter.c
index ad30d62..2874cc8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -646,6 +646,7 @@ static int __sk_prepare_filter(struct sk_filter *fp)
 	int err;
 
 	fp->bpf_func = sk_run_filter;
+	fp->jited = 0;
 
 	err = sk_chk_filter(fp->insns, fp->len);
 	if (err)
-- 
1.7.11.7


* [PATCH net-next 2/9] net: filter: keep original BPF program around
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Pavel Emelyanov

In order to open up the possibility to internally transform a BPF
program into an alternative and possibly non-trivially reversible
representation, we need to keep the original BPF program around, so
that it can be passed back to user space without the need for a
complex decoder.

The reason for that use case resides in commit a8fc92778080
("sk-filter: Add ability to get socket filter program (v2)"), that
is, the ability to retrieve the currently attached BPF filter from a
given socket, which is used mainly by the checkpoint-restore project,
for example.

Therefore, we add two helpers, sk_{store,release}_orig_filter(), for
taking care of that. In the sk_unattached_filter_create() case, there
is no such possibility/requirement to retrieve a loaded BPF program,
so we can spare ourselves the work in that case.

This approach will simplify and slightly speed up both the
sk_get_filter() and sock_diag_put_filterinfo() handlers, as we no
longer need to successively decode filters through
sk_decode_filter(). As we still need sk_decode_filter() later on, we
keep it around.
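
For orientation, the retrieval path that this enables looks roughly
as follows from user space (a rough sketch, not part of this series;
it assumes the SO_GET_FILTER getsockopt() from the commit above,
where the length is counted in filter blocks rather than bytes, as
can also be seen in sk_get_filter() below):

#include <stdlib.h>
#include <sys/socket.h>
#include <linux/filter.h>

#ifndef SO_GET_FILTER
#define SO_GET_FILTER	SO_ATTACH_FILTER /* same value in the uapi headers */
#endif

/* Sketch: fetch the classic BPF program attached to 'fd'. On
 * success, *insns holds *len filter blocks; the caller frees it.
 */
static int dump_filter(int fd, struct sock_filter **insns, socklen_t *len)
{
	*len = 0;

	/* A first call with *len == 0 only asks for the number of blocks. */
	if (getsockopt(fd, SOL_SOCKET, SO_GET_FILTER, NULL, len) < 0)
		return -1;

	*insns = calloc(*len, sizeof(**insns));
	if (!*insns)
		return -1;

	/* The second call copies back the originally attached program. */
	return getsockopt(fd, SOL_SOCKET, SO_GET_FILTER, *insns, len);
}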

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
---
 include/linux/filter.h | 15 +++++++--
 net/core/filter.c      | 86 ++++++++++++++++++++++++++++++++++++++++----------
 net/core/sock_diag.c   | 23 ++++++--------
 3 files changed, 93 insertions(+), 31 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index e65e230..93a9792 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -19,14 +19,19 @@ struct compat_sock_fprog {
 };
 #endif
 
+struct sock_fprog_kern {
+	u16			len;
+	struct sock_filter	*filter;
+};
+
 struct sk_buff;
 struct sock;
 
-struct sk_filter
-{
+struct sk_filter {
 	atomic_t		refcnt;
 	u32			jited:1,	/* Is our filter JIT'ed? */
 				len:31;		/* Number of filter blocks */
+	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
@@ -42,14 +47,20 @@ static inline unsigned int sk_filter_size(unsigned int proglen)
 		   offsetof(struct sk_filter, insns[proglen]));
 }
 
+#define sk_filter_proglen(fprog)			\
+		(fprog->len * sizeof(fprog->filter[0]))
+
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);
 extern unsigned int sk_run_filter(const struct sk_buff *skb,
 				  const struct sock_filter *filter);
+
 extern int sk_unattached_filter_create(struct sk_filter **pfp,
 				       struct sock_fprog *fprog);
 extern void sk_unattached_filter_destroy(struct sk_filter *fp);
+
 extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 extern int sk_detach_filter(struct sock *sk);
+
 extern int sk_chk_filter(struct sock_filter *filter, unsigned int flen);
 extern int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, unsigned len);
 extern void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to);
diff --git a/net/core/filter.c b/net/core/filter.c
index 2874cc8..f19c078 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -629,6 +629,37 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
 }
 EXPORT_SYMBOL(sk_chk_filter);
 
+static int sk_store_orig_filter(struct sk_filter *fp,
+				const struct sock_fprog *fprog)
+{
+	unsigned int fsize = sk_filter_proglen(fprog);
+	struct sock_fprog_kern *fkprog;
+
+	fp->orig_prog = kmalloc(sizeof(*fkprog), GFP_KERNEL);
+	if (!fp->orig_prog)
+		return -ENOMEM;
+
+	fkprog = fp->orig_prog;
+	fkprog->len = fprog->len;
+	fkprog->filter = kmemdup(fp->insns, fsize, GFP_KERNEL);
+	if (!fkprog->filter) {
+		kfree(fp->orig_prog);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void sk_release_orig_filter(struct sk_filter *fp)
+{
+	struct sock_fprog_kern *fprog = fp->orig_prog;
+
+	if (fprog) {
+		kfree(fprog->filter);
+		kfree(fprog);
+	}
+}
+
 /**
  * 	sk_filter_release_rcu - Release a socket filter by rcu_head
  *	@rcu: rcu_head that contains the sk_filter to free
@@ -637,6 +668,7 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 {
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
+	sk_release_orig_filter(fp);
 	bpf_jit_free(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
@@ -669,8 +701,8 @@ static int __sk_prepare_filter(struct sk_filter *fp)
 int sk_unattached_filter_create(struct sk_filter **pfp,
 				struct sock_fprog *fprog)
 {
+	unsigned int fsize = sk_filter_proglen(fprog);
 	struct sk_filter *fp;
-	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
 	int err;
 
 	/* Make sure new filter is there and in the right amounts. */
@@ -680,10 +712,16 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 	fp = kmalloc(sk_filter_size(fprog->len), GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
+
 	memcpy(fp->insns, fprog->filter, fsize);
 
 	atomic_set(&fp->refcnt, 1);
 	fp->len = fprog->len;
+	/* Since unattached filters are not copied back to user
+	 * space through sk_get_filter(), we do not need to hold
+	 * a copy here, and can spare us the work.
+	 */
+	fp->orig_prog = NULL;
 
 	err = __sk_prepare_filter(fp);
 	if (err)
@@ -716,7 +754,7 @@ EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
 int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 {
 	struct sk_filter *fp, *old_fp;
-	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int fsize = sk_filter_proglen(fprog);
 	unsigned int sk_fsize = sk_filter_size(fprog->len);
 	int err;
 
@@ -730,6 +768,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
+
 	if (copy_from_user(fp->insns, fprog->filter, fsize)) {
 		sock_kfree_s(sk, fp, sk_fsize);
 		return -EFAULT;
@@ -738,6 +777,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	atomic_set(&fp->refcnt, 1);
 	fp->len = fprog->len;
 
+	err = sk_store_orig_filter(fp, fprog);
+	if (err) {
+		sk_filter_uncharge(sk, fp);
+		return -ENOMEM;
+	}
+
 	err = __sk_prepare_filter(fp);
 	if (err) {
 		sk_filter_uncharge(sk, fp);
@@ -750,6 +795,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 
 	if (old_fp)
 		sk_filter_uncharge(sk, old_fp);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sk_attach_filter);
@@ -769,6 +815,7 @@ int sk_detach_filter(struct sock *sk)
 		sk_filter_uncharge(sk, filter);
 		ret = 0;
 	}
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(sk_detach_filter);
@@ -851,34 +898,41 @@ void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to)
 	to->k = filt->k;
 }
 
-int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf, unsigned int len)
+int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
+		  unsigned int len)
 {
+	struct sock_fprog_kern *fprog;
 	struct sk_filter *filter;
-	int i, ret;
+	int ret = 0;
 
 	lock_sock(sk);
 	filter = rcu_dereference_protected(sk->sk_filter,
-			sock_owned_by_user(sk));
-	ret = 0;
+					   sock_owned_by_user(sk));
 	if (!filter)
 		goto out;
-	ret = filter->len;
+
+	/* We're copying the filter that has been originally attached,
+	 * so no conversion/decode needed anymore.
+	 */
+	fprog = filter->orig_prog;
+
+	ret = fprog->len;
 	if (!len)
+		/* User space only enquires number of filter blocks. */
 		goto out;
+
 	ret = -EINVAL;
-	if (len < filter->len)
+	if (len < fprog->len)
 		goto out;
 
 	ret = -EFAULT;
-	for (i = 0; i < filter->len; i++) {
-		struct sock_filter fb;
-
-		sk_decode_filter(&filter->insns[i], &fb);
-		if (copy_to_user(&ubuf[i], &fb, sizeof(fb)))
-			goto out;
-	}
+	if (copy_to_user(ubuf, fprog->filter, sk_filter_proglen(fprog)))
+		goto out;
 
-	ret = filter->len;
+	/* Instead of bytes, the API requests to return the number
+	 * of filter blocks.
+	 */
+	ret = fprog->len;
 out:
 	release_sock(sk);
 	return ret;
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index a0e9cf6..d7af188 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -52,9 +52,10 @@ EXPORT_SYMBOL_GPL(sock_diag_put_meminfo);
 int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk,
 			     struct sk_buff *skb, int attrtype)
 {
-	struct nlattr *attr;
+	struct sock_fprog_kern *fprog;
 	struct sk_filter *filter;
-	unsigned int len;
+	struct nlattr *attr;
+	unsigned int flen;
 	int err = 0;
 
 	if (!ns_capable(user_ns, CAP_NET_ADMIN)) {
@@ -63,24 +64,20 @@ int sock_diag_put_filterinfo(struct user_namespace *user_ns, struct sock *sk,
 	}
 
 	rcu_read_lock();
-
 	filter = rcu_dereference(sk->sk_filter);
-	len = filter ? filter->len * sizeof(struct sock_filter) : 0;
+	if (!filter)
+		goto out;
 
-	attr = nla_reserve(skb, attrtype, len);
+	fprog = filter->orig_prog;
+	flen = sk_filter_proglen(fprog);
+
+	attr = nla_reserve(skb, attrtype, flen);
 	if (attr == NULL) {
 		err = -EMSGSIZE;
 		goto out;
 	}
 
-	if (filter) {
-		struct sock_filter *fb = (struct sock_filter *)nla_data(attr);
-		int i;
-
-		for (i = 0; i < filter->len; i++, fb++)
-			sk_decode_filter(&filter->insns[i], fb);
-	}
-
+	memcpy(nla_data(attr), fprog->filter, flen);
 out:
 	rcu_read_unlock();
 	return err;
-- 
1.7.11.7


* [PATCH net-next 3/9] net: filter: move filter accounting to filter core
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Pavel Emelyanov

This patch basically does two things: i) it removes the extern
keyword from include/linux/filter.h to be more consistent with the
rest of Joe's changes, and ii) it moves filter accounting into the
filter core framework.

Filter accounting, mainly done through sk_filter_{un,}charge(), takes
care of the case when sockets are being cloned through
sk_clone_lock(), so that removal of the filter on one socket won't
result in eviction while it is still referenced by the other.

These functions actually belong to net/core/filter.c and not
include/net/sock.h, as we want to keep all of that in a central
place. They are also not in the fast path, so uninlining them is fine
and even allows us to get rid of sk_filter_release_rcu()'s
EXPORT_SYMBOL and a forward declaration.
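
As a reminder of why the charge/uncharge pair matters for clones, the
clone path does roughly the following (simplified from
sk_clone_lock(), not a verbatim copy): the clone takes its own
reference and its own omem accounting, so detaching the filter from
one socket cannot free it while the other socket still runs it.

	struct sk_filter *filter;

	filter = rcu_dereference_protected(newsk->sk_filter, 1);
	if (filter != NULL)
		sk_filter_charge(newsk, filter);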

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
---
 include/linux/filter.h | 30 +++++++++++++++++-------------
 include/net/sock.h     | 27 ---------------------------
 net/core/filter.c      | 27 +++++++++++++++++++++++++--
 3 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 93a9792..9bde3ed 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -50,28 +50,32 @@ static inline unsigned int sk_filter_size(unsigned int proglen)
 #define sk_filter_proglen(fprog)			\
 		(fprog->len * sizeof(fprog->filter[0]))
 
-extern int sk_filter(struct sock *sk, struct sk_buff *skb);
-extern unsigned int sk_run_filter(const struct sk_buff *skb,
-				  const struct sock_filter *filter);
+int sk_filter(struct sock *sk, struct sk_buff *skb);
+unsigned int sk_run_filter(const struct sk_buff *skb,
+			   const struct sock_filter *filter);
 
-extern int sk_unattached_filter_create(struct sk_filter **pfp,
-				       struct sock_fprog *fprog);
-extern void sk_unattached_filter_destroy(struct sk_filter *fp);
+int sk_unattached_filter_create(struct sk_filter **pfp,
+				struct sock_fprog *fprog);
+void sk_unattached_filter_destroy(struct sk_filter *fp);
 
-extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
-extern int sk_detach_filter(struct sock *sk);
+int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
+int sk_detach_filter(struct sock *sk);
 
-extern int sk_chk_filter(struct sock_filter *filter, unsigned int flen);
-extern int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, unsigned len);
-extern void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to);
+int sk_chk_filter(struct sock_filter *filter, unsigned int flen);
+int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
+		  unsigned int len);
+void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to);
+
+void sk_filter_charge(struct sock *sk, struct sk_filter *fp);
+void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp);
 
 #ifdef CONFIG_BPF_JIT
 #include <stdarg.h>
 #include <linux/linkage.h>
 #include <linux/printk.h>
 
-extern void bpf_jit_compile(struct sk_filter *fp);
-extern void bpf_jit_free(struct sk_filter *fp);
+void bpf_jit_compile(struct sk_filter *fp);
+void bpf_jit_free(struct sk_filter *fp);
 
 static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
 				u32 pass, void *image)
diff --git a/include/net/sock.h b/include/net/sock.h
index 625e65b..696227c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1621,33 +1621,6 @@ void sk_common_release(struct sock *sk);
 /* Initialise core socket variables */
 void sock_init_data(struct socket *sock, struct sock *sk);
 
-void sk_filter_release_rcu(struct rcu_head *rcu);
-
-/**
- *	sk_filter_release - release a socket filter
- *	@fp: filter to remove
- *
- *	Remove a filter from a socket and release its resources.
- */
-
-static inline void sk_filter_release(struct sk_filter *fp)
-{
-	if (atomic_dec_and_test(&fp->refcnt))
-		call_rcu(&fp->rcu, sk_filter_release_rcu);
-}
-
-static inline void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
-{
-	atomic_sub(sk_filter_size(fp->len), &sk->sk_omem_alloc);
-	sk_filter_release(fp);
-}
-
-static inline void sk_filter_charge(struct sock *sk, struct sk_filter *fp)
-{
-	atomic_inc(&fp->refcnt);
-	atomic_add(sk_filter_size(fp->len), &sk->sk_omem_alloc);
-}
-
 /*
  * Socket reference counting postulates.
  *
diff --git a/net/core/filter.c b/net/core/filter.c
index f19c078..976edc6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -664,14 +664,37 @@ static void sk_release_orig_filter(struct sk_filter *fp)
  * 	sk_filter_release_rcu - Release a socket filter by rcu_head
  *	@rcu: rcu_head that contains the sk_filter to free
  */
-void sk_filter_release_rcu(struct rcu_head *rcu)
+static void sk_filter_release_rcu(struct rcu_head *rcu)
 {
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
 	sk_release_orig_filter(fp);
 	bpf_jit_free(fp);
 }
-EXPORT_SYMBOL(sk_filter_release_rcu);
+
+/**
+ *	sk_filter_release - release a socket filter
+ *	@fp: filter to remove
+ *
+ *	Remove a filter from a socket and release its resources.
+ */
+static void sk_filter_release(struct sk_filter *fp)
+{
+	if (atomic_dec_and_test(&fp->refcnt))
+		call_rcu(&fp->rcu, sk_filter_release_rcu);
+}
+
+void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp)
+{
+	atomic_sub(sk_filter_size(fp->len), &sk->sk_omem_alloc);
+	sk_filter_release(fp);
+}
+
+void sk_filter_charge(struct sock *sk, struct sk_filter *fp)
+{
+	atomic_inc(&fp->refcnt);
+	atomic_add(sk_filter_size(fp->len), &sk->sk_omem_alloc);
+}
 
 static int __sk_prepare_filter(struct sk_filter *fp)
 {
-- 
1.7.11.7


* [PATCH net-next 4/9] net: ptp: use sk_unattached_filter_create() for BPF
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Richard Cochran, Jiri Benc

This patch migrates an open-coded sk_run_filter() implementation to
proper use of the BPF API, that is, sk_unattached_filter_create().
This migration is needed, as we will be internally transforming the
filter to a different representation, and the code therefore needs to
be decoupled from direct access to the filter's instructions.

It is okay to do so, as skb_timestamping_init() is called during
initialization of the network stack in core initcall via sock_init().
This effectively also allows PTP filters to be JIT compiled if
bpf_jit_enable is set.

For better readability, some newlines are introduced as well; also,
ptp_classify.h is only used in kernel space, so its __KERNEL__ guard
can be dropped.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
---
 include/linux/ptp_classify.h |  4 ----
 net/core/timestamping.c      | 21 ++++++++++++++-------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h
index 1dc420b..3decfa4 100644
--- a/include/linux/ptp_classify.h
+++ b/include/linux/ptp_classify.h
@@ -27,11 +27,7 @@
 #include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <linux/filter.h>
-#ifdef __KERNEL__
 #include <linux/in.h>
-#else
-#include <netinet/in.h>
-#endif
 
 #define PTP_CLASS_NONE  0x00 /* not a PTP event message */
 #define PTP_CLASS_V1    0x01 /* protocol version 1 */
diff --git a/net/core/timestamping.c b/net/core/timestamping.c
index 661b5a4..837485d 100644
--- a/net/core/timestamping.c
+++ b/net/core/timestamping.c
@@ -23,16 +23,13 @@
 #include <linux/skbuff.h>
 #include <linux/export.h>
 
-static struct sock_filter ptp_filter[] = {
-	PTP_FILTER
-};
+static struct sk_filter *ptp_insns __read_mostly;
 
 static unsigned int classify(const struct sk_buff *skb)
 {
-	if (likely(skb->dev &&
-		   skb->dev->phydev &&
+	if (likely(skb->dev && skb->dev->phydev &&
 		   skb->dev->phydev->drv))
-		return sk_run_filter(skb, ptp_filter);
+		return SK_RUN_FILTER(ptp_insns, skb);
 	else
 		return PTP_CLASS_NONE;
 }
@@ -60,11 +57,13 @@ void skb_clone_tx_timestamp(struct sk_buff *skb)
 		if (likely(phydev->drv->txtstamp)) {
 			if (!atomic_inc_not_zero(&sk->sk_refcnt))
 				return;
+
 			clone = skb_clone(skb, GFP_ATOMIC);
 			if (!clone) {
 				sock_put(sk);
 				return;
 			}
+
 			clone->sk = sk;
 			phydev->drv->txtstamp(phydev, clone, type);
 		}
@@ -89,12 +88,15 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
 	}
 
 	*skb_hwtstamps(skb) = *hwtstamps;
+
 	serr = SKB_EXT_ERR(skb);
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
 	skb->sk = NULL;
+
 	err = sock_queue_err_skb(sk, skb);
+
 	sock_put(sk);
 	if (err)
 		kfree_skb(skb);
@@ -135,5 +137,10 @@ EXPORT_SYMBOL_GPL(skb_defer_rx_timestamp);
 
 void __init skb_timestamping_init(void)
 {
-	BUG_ON(sk_chk_filter(ptp_filter, ARRAY_SIZE(ptp_filter)));
+	struct sock_filter ptp_filter[] = { PTP_FILTER };
+	struct sock_fprog ptp_prog = {
+		.len = ARRAY_SIZE(ptp_filter), .filter = ptp_filter,
+	};
+
+	BUG_ON(sk_unattached_filter_create(&ptp_insns, &ptp_prog));
 }
-- 
1.7.11.7


* [PATCH net-next 5/9] net: ptp: do not reimplement PTP/BPF classifier
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Richard Cochran, Jiri Benc

There are currently the pch_gbe, cpts, and ixp4xx_eth drivers that
open-code and reimplement a BPF classifier for the PTP protocol.
Since all of them effectively do the very same thing and load the
very same PTP/BPF filter, we can consolidate that code by introducing
ptp_classify_raw() in the time-stamping core framework, which can
then be used by drivers.

As drivers get initialized after bootstrapping the core networking
subsystem, they can make use of ptp_insns wrapped through
ptp_classify_raw(), which allows us to simplify and remove the PTP
classifier setup code in these drivers.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 11 +----------
 drivers/net/ethernet/ti/cpts.c                       | 10 +---------
 drivers/net/ethernet/xscale/ixp4xx_eth.c             | 11 +----------
 include/linux/ptp_classify.h                         | 10 ++--------
 net/core/timestamping.c                              |  8 +++++++-
 5 files changed, 12 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 464e910..73e6683 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -120,10 +120,6 @@ static void pch_gbe_mdio_write(struct net_device *netdev, int addr, int reg,
 			       int data);
 static void pch_gbe_set_multi(struct net_device *netdev);
 
-static struct sock_filter ptp_filter[] = {
-	PTP_FILTER
-};
-
 static int pch_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
 {
 	u8 *data = skb->data;
@@ -131,7 +127,7 @@ static int pch_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
 	u16 *hi, *id;
 	u32 lo;
 
-	if (sk_run_filter(skb, ptp_filter) == PTP_CLASS_NONE)
+	if (ptp_classify_raw(skb) == PTP_CLASS_NONE)
 		return 0;
 
 	offset = ETH_HLEN + IPV4_HLEN(data) + UDP_HLEN;
@@ -2635,11 +2631,6 @@ static int pch_gbe_probe(struct pci_dev *pdev,
 
 	adapter->ptp_pdev = pci_get_bus_and_slot(adapter->pdev->bus->number,
 					       PCI_DEVFN(12, 4));
-	if (ptp_filter_init(ptp_filter, ARRAY_SIZE(ptp_filter))) {
-		dev_err(&pdev->dev, "Bad ptp filter\n");
-		ret = -EINVAL;
-		goto err_free_netdev;
-	}
 
 	netdev->netdev_ops = &pch_gbe_netdev_ops;
 	netdev->watchdog_timeo = PCH_GBE_WATCHDOG_PERIOD;
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index 8c351f1..fd31546 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -31,10 +31,6 @@
 
 #ifdef CONFIG_TI_CPTS
 
-static struct sock_filter ptp_filter[] = {
-	PTP_FILTER
-};
-
 #define cpts_read32(c, r)	__raw_readl(&c->reg->r)
 #define cpts_write32(c, v, r)	__raw_writel(v, &c->reg->r)
 
@@ -300,7 +296,7 @@ static u64 cpts_find_ts(struct cpts *cpts, struct sk_buff *skb, int ev_type)
 	u64 ns = 0;
 	struct cpts_event *event;
 	struct list_head *this, *next;
-	unsigned int class = sk_run_filter(skb, ptp_filter);
+	unsigned int class = ptp_classify_raw(skb);
 	unsigned long flags;
 	u16 seqid;
 	u8 mtype;
@@ -371,10 +367,6 @@ int cpts_register(struct device *dev, struct cpts *cpts,
 	int err, i;
 	unsigned long flags;
 
-	if (ptp_filter_init(ptp_filter, ARRAY_SIZE(ptp_filter))) {
-		pr_err("cpts: bad ptp filter\n");
-		return -EINVAL;
-	}
 	cpts->info = cpts_info;
 	cpts->clock = ptp_clock_register(&cpts->info, dev);
 	if (IS_ERR(cpts->clock)) {
diff --git a/drivers/net/ethernet/xscale/ixp4xx_eth.c b/drivers/net/ethernet/xscale/ixp4xx_eth.c
index 25283f1..f7e0f0f 100644
--- a/drivers/net/ethernet/xscale/ixp4xx_eth.c
+++ b/drivers/net/ethernet/xscale/ixp4xx_eth.c
@@ -256,10 +256,6 @@ static int ports_open;
 static struct port *npe_port_tab[MAX_NPES];
 static struct dma_pool *dma_pool;
 
-static struct sock_filter ptp_filter[] = {
-	PTP_FILTER
-};
-
 static int ixp_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
 {
 	u8 *data = skb->data;
@@ -267,7 +263,7 @@ static int ixp_ptp_match(struct sk_buff *skb, u16 uid_hi, u32 uid_lo, u16 seqid)
 	u16 *hi, *id;
 	u32 lo;
 
-	if (sk_run_filter(skb, ptp_filter) != PTP_CLASS_V1_IPV4)
+	if (ptp_classify_raw(skb) != PTP_CLASS_V1_IPV4)
 		return 0;
 
 	offset = ETH_HLEN + IPV4_HLEN(data) + UDP_HLEN;
@@ -1413,11 +1409,6 @@ static int eth_init_one(struct platform_device *pdev)
 	char phy_id[MII_BUS_ID_SIZE + 3];
 	int err;
 
-	if (ptp_filter_init(ptp_filter, ARRAY_SIZE(ptp_filter))) {
-		pr_err("ixp4xx_eth: bad ptp filter\n");
-		return -EINVAL;
-	}
-
 	if (!(dev = alloc_etherdev(sizeof(struct port))))
 		return -ENOMEM;
 
diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h
index 3decfa4..6d3b0a2 100644
--- a/include/linux/ptp_classify.h
+++ b/include/linux/ptp_classify.h
@@ -80,14 +80,6 @@
 #define OP_RETA	(BPF_RET | BPF_A)
 #define OP_RETK	(BPF_RET | BPF_K)
 
-static inline int ptp_filter_init(struct sock_filter *f, int len)
-{
-	if (OP_LDH == f[0].code)
-		return sk_chk_filter(f, len);
-	else
-		return 0;
-}
-
 #define PTP_FILTER \
 	{OP_LDH,	0,   0, OFF_ETYPE		}, /*              */ \
 	{OP_JEQ,	0,  12, ETH_P_IP		}, /* f goto L20   */ \
@@ -133,4 +125,6 @@ static inline int ptp_filter_init(struct sock_filter *f, int len)
 	{OP_RETA,	0,   0, 0			}, /*              */ \
 /*L6x*/	{OP_RETK,	0,   0, PTP_CLASS_NONE		},
 
+unsigned int ptp_classify_raw(const struct sk_buff *skb);
+
 #endif
diff --git a/net/core/timestamping.c b/net/core/timestamping.c
index 837485d..5347ce4 100644
--- a/net/core/timestamping.c
+++ b/net/core/timestamping.c
@@ -25,11 +25,17 @@
 
 static struct sk_filter *ptp_insns __read_mostly;
 
+unsigned int ptp_classify_raw(const struct sk_buff *skb)
+{
+	return SK_RUN_FILTER(ptp_insns, skb);
+}
+EXPORT_SYMBOL_GPL(ptp_classify_raw);
+
 static unsigned int classify(const struct sk_buff *skb)
 {
 	if (likely(skb->dev && skb->dev->phydev &&
 		   skb->dev->phydev->drv))
-		return SK_RUN_FILTER(ptp_insns, skb);
+		return ptp_classify_raw(skb);
 	else
 		return PTP_CLASS_NONE;
 }
-- 
1.7.11.7


* [PATCH net-next 6/9] net: ppp: use sk_unattached_filter api
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Paul Mackerras, linux-ppp

The ppp driver currently has two open-coded BPF filters in use,
namely pass_filter and active_filter. Migrate both to proper use of
the sk_unattached_filter_{create,destroy}() API, so that the actual
BPF code is decoupled from direct access, and the filters can be JIT
compiled as a side effect by the internal filter compiler.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
---
 drivers/net/ppp/ppp_generic.c | 60 +++++++++++++++++++++++++++++--------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 72ff14b..e3923eb 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -143,9 +143,8 @@ struct ppp {
 	struct sk_buff_head mrq;	/* MP: receive reconstruction queue */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
-	struct sock_filter *pass_filter;	/* filter for packets to pass */
-	struct sock_filter *active_filter;/* filter for pkts to reset idle */
-	unsigned pass_len, active_len;
+	struct sk_filter *pass_filter;	/* filter for packets to pass */
+	struct sk_filter *active_filter;/* filter for pkts to reset idle */
 #endif /* CONFIG_PPP_FILTER */
 	struct net	*ppp_net;	/* the net we belong to */
 	struct ppp_link_stats stats64;	/* 64 bit network stats */
@@ -755,28 +754,42 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PPPIOCSPASS:
 	{
 		struct sock_filter *code;
+
 		err = get_filter(argp, &code);
 		if (err >= 0) {
+			struct sock_fprog fprog = {
+				.len = err,
+				.filter = code,
+			};
+
 			ppp_lock(ppp);
-			kfree(ppp->pass_filter);
-			ppp->pass_filter = code;
-			ppp->pass_len = err;
+			if (ppp->pass_filter)
+				sk_unattached_filter_destroy(ppp->pass_filter);
+			err = sk_unattached_filter_create(&ppp->pass_filter,
+							  &fprog);
+			kfree(code);
 			ppp_unlock(ppp);
-			err = 0;
 		}
 		break;
 	}
 	case PPPIOCSACTIVE:
 	{
 		struct sock_filter *code;
+
 		err = get_filter(argp, &code);
 		if (err >= 0) {
+			struct sock_fprog fprog = {
+				.len = err,
+				.filter = code,
+			};
+
 			ppp_lock(ppp);
-			kfree(ppp->active_filter);
-			ppp->active_filter = code;
-			ppp->active_len = err;
+			if (ppp->active_filter)
+				sk_unattached_filter_destroy(ppp->active_filter);
+			err = sk_unattached_filter_create(&ppp->active_filter,
+							  &fprog);
+			kfree(code);
 			ppp_unlock(ppp);
-			err = 0;
 		}
 		break;
 	}
@@ -1184,7 +1197,7 @@ ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
 		   a four-byte PPP header on each packet */
 		*skb_push(skb, 2) = 1;
 		if (ppp->pass_filter &&
-		    sk_run_filter(skb, ppp->pass_filter) == 0) {
+		    SK_RUN_FILTER(ppp->pass_filter, skb) == 0) {
 			if (ppp->debug & 1)
 				netdev_printk(KERN_DEBUG, ppp->dev,
 					      "PPP: outbound frame "
@@ -1194,7 +1207,7 @@ ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
 		}
 		/* if this packet passes the active filter, record the time */
 		if (!(ppp->active_filter &&
-		      sk_run_filter(skb, ppp->active_filter) == 0))
+		      SK_RUN_FILTER(ppp->active_filter, skb) == 0))
 			ppp->last_xmit = jiffies;
 		skb_pull(skb, 2);
 #else
@@ -1818,7 +1831,7 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
 
 			*skb_push(skb, 2) = 0;
 			if (ppp->pass_filter &&
-			    sk_run_filter(skb, ppp->pass_filter) == 0) {
+			    SK_RUN_FILTER(ppp->pass_filter, skb) == 0) {
 				if (ppp->debug & 1)
 					netdev_printk(KERN_DEBUG, ppp->dev,
 						      "PPP: inbound frame "
@@ -1827,7 +1840,7 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
 				return;
 			}
 			if (!(ppp->active_filter &&
-			      sk_run_filter(skb, ppp->active_filter) == 0))
+			      SK_RUN_FILTER(ppp->active_filter, skb) == 0))
 				ppp->last_recv = jiffies;
 			__skb_pull(skb, 2);
 		} else
@@ -2672,6 +2685,10 @@ ppp_create_interface(struct net *net, int unit, int *retp)
 	ppp->minseq = -1;
 	skb_queue_head_init(&ppp->mrq);
 #endif /* CONFIG_PPP_MULTILINK */
+#ifdef CONFIG_PPP_FILTER
+	ppp->pass_filter = NULL;
+	ppp->active_filter = NULL;
+#endif /* CONFIG_PPP_FILTER */
 
 	/*
 	 * drum roll: don't forget to set
@@ -2802,10 +2819,15 @@ static void ppp_destroy_interface(struct ppp *ppp)
 	skb_queue_purge(&ppp->mrq);
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
-	kfree(ppp->pass_filter);
-	ppp->pass_filter = NULL;
-	kfree(ppp->active_filter);
-	ppp->active_filter = NULL;
+	if (ppp->pass_filter) {
+		sk_unattached_filter_destroy(ppp->pass_filter);
+		ppp->pass_filter = NULL;
+	}
+
+	if (ppp->active_filter) {
+		sk_unattached_filter_destroy(ppp->active_filter);
+		ppp->active_filter = NULL;
+	}
 #endif /* CONFIG_PPP_FILTER */
 
 	kfree_skb(ppp->xmit_pending);
-- 
1.7.11.7


* [PATCH net-next 7/9] net: isdn: use sk_unattached_filter api
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Karsten Keil, isdn4linux

As with ppp, we need to migrate the ISDN/PPP code to make use of the
sk_unattached_filter API in order to decouple it from direct filter
structure access. By using sk_unattached_filter_{create,destroy}(),
we also allow for the possibility of JIT compiling the filters for
faster filter verdicts.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: isdn4linux@listserv.isdn4linux.de
---
 drivers/isdn/i4l/isdn_ppp.c | 61 ++++++++++++++++++++++++++++++---------------
 include/linux/isdn_ppp.h    |  5 ++--
 2 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c
index 38ceac5..a5da511 100644
--- a/drivers/isdn/i4l/isdn_ppp.c
+++ b/drivers/isdn/i4l/isdn_ppp.c
@@ -378,10 +378,15 @@ isdn_ppp_release(int min, struct file *file)
 	is->slcomp = NULL;
 #endif
 #ifdef CONFIG_IPPP_FILTER
-	kfree(is->pass_filter);
-	is->pass_filter = NULL;
-	kfree(is->active_filter);
-	is->active_filter = NULL;
+	if (is->pass_filter) {
+		sk_unattached_filter_destroy(is->pass_filter);
+		is->pass_filter = NULL;
+	}
+
+	if (is->active_filter) {
+		sk_unattached_filter_destroy(is->active_filter);
+		is->active_filter = NULL;
+	}
 #endif
 
 /* TODO: if this was the previous master: link the stuff to the new master */
@@ -629,25 +634,41 @@ isdn_ppp_ioctl(int min, struct file *file, unsigned int cmd, unsigned long arg)
 #ifdef CONFIG_IPPP_FILTER
 	case PPPIOCSPASS:
 	{
+		struct sock_fprog fprog;
 		struct sock_filter *code;
-		int len = get_filter(argp, &code);
+		int err, len = get_filter(argp, &code);
+
 		if (len < 0)
 			return len;
-		kfree(is->pass_filter);
-		is->pass_filter = code;
-		is->pass_len = len;
-		break;
+
+		fprog.len = len;
+		fprog.filter = code;
+
+		if (is->pass_filter)
+			sk_unattached_filter_destroy(is->pass_filter);
+		err = sk_unattached_filter_create(&is->pass_filter, &fprog);
+		kfree(code);
+
+		return err;
 	}
 	case PPPIOCSACTIVE:
 	{
+		struct sock_fprog fprog;
 		struct sock_filter *code;
-		int len = get_filter(argp, &code);
+		int err, len = get_filter(argp, &code);
+
 		if (len < 0)
 			return len;
-		kfree(is->active_filter);
-		is->active_filter = code;
-		is->active_len = len;
-		break;
+
+		fprog.len = len;
+		fprog.filter = code;
+
+		if (is->active_filter)
+			sk_unattached_filter_destroy(is->active_filter);
+		err = sk_unattached_filter_create(&is->active_filter, &fprog);
+		kfree(code);
+
+		return err;
 	}
 #endif /* CONFIG_IPPP_FILTER */
 	default:
@@ -1147,14 +1168,14 @@ isdn_ppp_push_higher(isdn_net_dev *net_dev, isdn_net_local *lp, struct sk_buff *
 	}
 
 	if (is->pass_filter
-	    && sk_run_filter(skb, is->pass_filter) == 0) {
+	    && SK_RUN_FILTER(is->pass_filter, skb) == 0) {
 		if (is->debug & 0x2)
 			printk(KERN_DEBUG "IPPP: inbound frame filtered.\n");
 		kfree_skb(skb);
 		return;
 	}
 	if (!(is->active_filter
-	      && sk_run_filter(skb, is->active_filter) == 0)) {
+	      && SK_RUN_FILTER(is->active_filter, skb) == 0)) {
 		if (is->debug & 0x2)
 			printk(KERN_DEBUG "IPPP: link-active filter: resetting huptimer.\n");
 		lp->huptimer = 0;
@@ -1293,14 +1314,14 @@ isdn_ppp_xmit(struct sk_buff *skb, struct net_device *netdev)
 	}
 
 	if (ipt->pass_filter
-	    && sk_run_filter(skb, ipt->pass_filter) == 0) {
+	    && SK_RUN_FILTER(ipt->pass_filter, skb) == 0) {
 		if (ipt->debug & 0x4)
 			printk(KERN_DEBUG "IPPP: outbound frame filtered.\n");
 		kfree_skb(skb);
 		goto unlock;
 	}
 	if (!(ipt->active_filter
-	      && sk_run_filter(skb, ipt->active_filter) == 0)) {
+	      && SK_RUN_FILTER(ipt->active_filter, skb) == 0)) {
 		if (ipt->debug & 0x4)
 			printk(KERN_DEBUG "IPPP: link-active filter: resetting huptimer.\n");
 		lp->huptimer = 0;
@@ -1490,9 +1511,9 @@ int isdn_ppp_autodial_filter(struct sk_buff *skb, isdn_net_local *lp)
 	}
 
 	drop |= is->pass_filter
-		&& sk_run_filter(skb, is->pass_filter) == 0;
+		&& SK_RUN_FILTER(is->pass_filter, skb) == 0;
 	drop |= is->active_filter
-		&& sk_run_filter(skb, is->active_filter) == 0;
+		&& SK_RUN_FILTER(is->active_filter, skb) == 0;
 
 	skb_push(skb, IPPP_MAX_HEADER - 4);
 	return drop;
diff --git a/include/linux/isdn_ppp.h b/include/linux/isdn_ppp.h
index d5f62bc..8e10f57 100644
--- a/include/linux/isdn_ppp.h
+++ b/include/linux/isdn_ppp.h
@@ -180,9 +180,8 @@ struct ippp_struct {
   struct slcompress *slcomp;
 #endif
 #ifdef CONFIG_IPPP_FILTER
-  struct sock_filter *pass_filter;	/* filter for packets to pass */
-  struct sock_filter *active_filter;	/* filter for pkts to reset idle */
-  unsigned pass_len, active_len;
+  struct sk_filter *pass_filter;   /* filter for packets to pass */
+  struct sk_filter *active_filter; /* filter for pkts to reset idle */
 #endif
   unsigned long debug;
   struct isdn_ppp_compressor *compressor,*decompressor;
-- 
1.7.11.7


* [PATCH net-next 8/9] net: filter: rework/optimize internal BPF interpreter's instruction set
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem
  Cc: ast, netdev, Hagen Paul Pfeifer, Kees Cook, Paul Moore,
	Ingo Molnar, H. Peter Anvin, linux-kernel

From: Alexei Starovoitov <ast@plumgrid.com>

This patch replaces/reworks the kernel-internal BPF interpreter with
an optimized BPF instruction set format that is modelled more closely
on native instruction sets and is designed to be JITed with a one to
one mapping. Thus, the new interpreter is noticeably faster than the
current implementation of sk_run_filter(), mainly for two reasons:

1. Fall-through jumps:

  BPF jump instructions are forced to take either the 'true' or the
  'false' branch, which causes a branch-miss penalty. The new BPF
  jump instructions have only one branch and fall through otherwise,
  which fits the CPU branch predictor logic better. `perf stat` shows
  a drastic difference in branch-misses between the old and the new
  code.

2. Jump-threaded implementation of the interpreter vs. a switch
   statement:

  Instead of a single tablejump at the top of a 'switch' statement,
  gcc will now generate multiple tablejump instructions, which helps
  the CPU branch predictor logic.
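
  As a generic illustration of the technique (a toy sketch, not the
  kernel interpreter itself): a jump-threaded interpreter replicates
  the dispatch at the end of every handler via computed goto (a GCC
  extension), so each handler ends in its own indirect jump that the
  branch predictor can track separately, instead of all opcodes
  funneling through a single jump at the top of a switch:

#include <stdio.h>

enum { OP_ADD, OP_SUB, OP_HALT };

static int run(const unsigned char *pc)
{
	/* One dispatch table, but the 'goto *' is duplicated per handler. */
	static const void *jt[] = { &&do_add, &&do_sub, &&do_halt };
	int acc = 0;

	goto *jt[*pc];
do_add:
	acc += 1;
	goto *jt[*++pc];	/* re-dispatch: separate indirect jump */
do_sub:
	acc -= 1;
	goto *jt[*++pc];	/* re-dispatch: separate indirect jump */
do_halt:
	return acc;
}

int main(void)
{
	const unsigned char prog[] = { OP_ADD, OP_ADD, OP_SUB, OP_HALT };

	printf("%d\n", run(prog));	/* prints 1 */
	return 0;
}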

In short, the internal format extends BPF in the following ways (more
details can be found in the appended documentation; a rough sketch of
the resulting instruction layout follows the list):

  - Number of registers increase from 2 to 10
  - Register width increases from 32-bit to 64-bit
  - Conditional jt/jf targets replaced with jt/fall-through,
    and forward/backward jumps now possible as well
  - Adds signed > and >= insns
  - 16 4-byte stack slots for register spill-fill replaced
    with up to 512 bytes of multi-use stack space
  - Introduction of bpf_call insn and register passing convention
    for zero overhead calls from/to other kernel functions
  - Adds arithmetic right shift insn
  - Adds swab insns for 32/64-bit
  - Adds atomic_add insn
  - Old tax/txa insns are replaced with 'mov dst,src' insn
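
  For orientation, each internal instruction becomes a fixed-size
  64 bit encoding roughly along the following lines (a sketch of the
  layout described above; the diff to include/linux/filter.h and the
  documentation added in patch 9 are authoritative, and the naming
  there may differ in detail):

	struct sock_filter_int {
		__u8	code;		/* opcode */
		__u8	a_reg:4;	/* destination register */
		__u8	x_reg:4;	/* source register */
		__s16	off;		/* signed jump offset, fall-through if not taken */
		__s32	imm;		/* signed immediate constant */
	};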

Note that the verification of filters is still being done through
sk_chk_filter(), so filters from user or kernel space are verified in
the same way as we do now. We reuse the current BPF JIT compilers in
a way that this upgrade would even be fine as is, but it nevertheless
allows for a successive upgrade of the BPF JIT compilers to the new
format. The migration to the internal instruction set is done after
probing for JIT compilation, so in case a JIT compiler is able to
create a native opcode image, we use that, and in all other cases we
do a follow-up migration of the BPF program's instruction set, so
that it can be transparently run in the new interpreter.

Performance of two BPF filters generated by libpcap and bpf_asm,
respectively, was measured on x86_64, i386 and arm32 (other libpcap
programs show similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
   ((tcp[12]&0xf0)>>2)) != 0)' -dd

Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
same SKB (cache-hit) or 10k SKBs (cache-miss); time in nsec per
call, smaller is better:
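
For reference, the timed call is essentially the in-kernel
SK_RUN_FILTER() invocation on a prepared filter; a hedged sketch of
how such a filter is set up and run (the 'insns' array and 'skb' are
assumed to exist, error handling trimmed):

  struct sock_fprog fprog = {
          .len    = ARRAY_SIZE(insns),    /* e.g. tcpdump -dd output */
          .filter = insns,
  };
  struct sk_filter *fp;
  unsigned int res;

  if (sk_unattached_filter_create(&fp, &fprog) == 0) {
          res = SK_RUN_FILTER(fp, skb);   /* this is what gets timed */
          sk_unattached_filter_destroy(fp);
  }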

--x86_64--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF      90       101        192       202
new BPF      31        71         47        97
old BPF jit  12        34         17        44
new BPF jit TBD

--i386--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     107       136        227       252
new BPF      40       119         69       172

--arm32--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     202       300        475       540
new BPF     180       270        330       470
old BPF jit  26       182         37       202
new BPF jit TBD

Thus, without changing any userland BPF filters, applications on
top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
classifier, netfilter's xt_bpf, team driver's load-balancing mode,
and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need to
convert seccomp BPF in the same step to make use of the new internal
structure, since seccomp makes use of lower-level API details without
being decoupled through higher-level calls like
sk_unattached_filter_{create,destroy}(), for example.

Just as with normal socket filtering, seccomp BPF also experiences a
time-to-verdict speedup:

05-sim-long_jumps.c of libseccomp was used as a micro-benchmark:

  seccomp_rule_add_exact(ctx,...
  seccomp_rule_add_exact(ctx,...

  rc = seccomp_load(ctx);

  for (i = 0; i < 10000000; i++)
     syscall(199, 100);

'short filter' has 2 rules
'large filter' has 200 rules
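
A hedged, self-contained approximation of that benchmark shape (the
real 05-sim-long_jumps.c in libseccomp differs in its exact rule set;
the syscalls chosen here are illustrative only, link with -lseccomp):

  #include <seccomp.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  int main(void)
  {
          scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
          long i;

          /* 2 rules for the 'short filter' flavour; the 'large
           * filter' variant adds ~200 of these so the whole chain
           * is walked before the default verdict.
           */
          seccomp_rule_add_exact(ctx, SCMP_ACT_TRAP, SCMP_SYS(ptrace), 0);
          seccomp_rule_add_exact(ctx, SCMP_ACT_TRAP, SCMP_SYS(acct), 0);

          if (seccomp_load(ctx) < 0)
                  return 1;

          /* time-to-verdict dominated loop */
          for (i = 0; i < 10000000; i++)
                  syscall(SYS_getuid);

          return 0;
  }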

'short filter' performance is slightly better on x86_64/i386/arm32
'large filter' is much faster on x86_64 and i386 and shows no
               difference on arm32

--x86_64-- short filter
old BPF: 2.7 sec
 39.12%  bench  libc-2.15.so       [.] syscall
  8.10%  bench  [kernel.kallsyms]  [k] sk_run_filter
  6.31%  bench  [kernel.kallsyms]  [k] system_call
  5.59%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  4.37%  bench  [kernel.kallsyms]  [k] trace_hardirqs_off_caller
  3.70%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.67%  bench  [kernel.kallsyms]  [k] lock_is_held
  3.03%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
new BPF: 2.58 sec
 42.05%  bench  libc-2.15.so       [.] syscall
  6.91%  bench  [kernel.kallsyms]  [k] system_call
  6.25%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  6.07%  bench  [kernel.kallsyms]  [k] __secure_computing
  5.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp

--arm32-- short filter
old BPF: 4.0 sec
 39.92%  bench  [kernel.kallsyms]  [k] vector_swi
 16.60%  bench  [kernel.kallsyms]  [k] sk_run_filter
 14.66%  bench  libc-2.17.so       [.] syscall
  5.42%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  5.10%  bench  [kernel.kallsyms]  [k] __secure_computing
new BPF: 3.7 sec
 35.93%  bench  [kernel.kallsyms]  [k] vector_swi
 21.89%  bench  libc-2.17.so       [.] syscall
 13.45%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
  6.25%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.96%  bench  [kernel.kallsyms]  [k] syscall_trace_exit

--x86_64-- large filter
old BPF: 8.6 seconds
    73.38%    bench  [kernel.kallsyms]  [k] sk_run_filter
    10.70%    bench  libc-2.15.so       [.] syscall
     5.09%    bench  [kernel.kallsyms]  [k] seccomp_bpf_load
     1.97%    bench  [kernel.kallsyms]  [k] system_call
new BPF: 5.7 seconds
    66.20%    bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
    16.75%    bench  libc-2.15.so       [.] syscall
     3.31%    bench  [kernel.kallsyms]  [k] system_call
     2.88%    bench  [kernel.kallsyms]  [k] __secure_computing

--i386-- large filter
old BPF: 5.4 sec
new BPF: 3.8 sec

--arm32-- large filter
old BPF: 13.5 sec
 73.88%  bench  [kernel.kallsyms]  [k] sk_run_filter
 10.29%  bench  [kernel.kallsyms]  [k] vector_swi
  6.46%  bench  libc-2.17.so       [.] syscall
  2.94%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  1.19%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.87%  bench  [kernel.kallsyms]  [k] sys_getuid
new BPF: 13.5 sec
 76.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
 10.98%  bench  [kernel.kallsyms]  [k] vector_swi
  5.87%  bench  libc-2.17.so       [.] syscall
  1.77%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.93%  bench  [kernel.kallsyms]  [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new
internal BPF performance is better than the old one. Performance
gains will be even higher when BPF JIT is committed for the
new structure, which is planned in future work (as successive
JIT migrations).

BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann.

References: http://thread.gmane.org/gmane.linux.kernel/1665858
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
---
 v1 -> v10 history at:
  - http://thread.gmane.org/gmane.linux.kernel/1665858

 include/linux/filter.h  |   66 ++-
 include/linux/seccomp.h |    1 -
 kernel/seccomp.c        |  119 ++--
 net/core/filter.c       | 1415 +++++++++++++++++++++++++++++++++++++----------
 4 files changed, 1229 insertions(+), 372 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9bde3ed..3ea12fa 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -9,13 +9,50 @@
 #include <linux/workqueue.h>
 #include <uapi/linux/filter.h>
 
-#ifdef CONFIG_COMPAT
-/*
- * A struct sock_filter is architecture independent.
+/* Internally used and optimized filter representation with extended
+ * instruction set based on top of classic BPF.
  */
+
+/* instruction classes */
+#define BPF_ALU64	0x07	/* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW		0x18	/* double word */
+#define BPF_XADD	0xc0	/* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV		0xb0	/* mov reg to reg */
+#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
+#define BPF_BSWAP	0xd0	/* swap 4 or 8 bytes of 64-bit register */
+
+#define BPF_JNE		0x50	/* jump != */
+#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
+#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
+#define BPF_CALL	0x80	/* function call */
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG	11
+
+/* BPF program can access up to 512 bytes of stack space. */
+#define MAX_BPF_STACK	512
+
+/* Context and stack frame pointer register positions. */
+#define CTX_REG		1
+#define FP_REG		10
+
+struct sock_filter_int {
+	__u8	code;		/* opcode */
+	__u8	a_reg:4;	/* dest register */
+	__u8	x_reg:4;	/* source register */
+	__s16	off;		/* signed offset */
+	__s32	imm;		/* signed immediate constant */
+};
+
+#ifdef CONFIG_COMPAT
+/* A struct sock_filter is architecture independent. */
 struct compat_sock_fprog {
 	u16		len;
-	compat_uptr_t	filter;		/* struct sock_filter * */
+	compat_uptr_t	filter;	/* struct sock_filter * */
 };
 #endif
 
@@ -26,6 +63,7 @@ struct sock_fprog_kern {
 
 struct sk_buff;
 struct sock;
+struct seccomp_data;
 
 struct sk_filter {
 	atomic_t		refcnt;
@@ -34,9 +72,10 @@ struct sk_filter {
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
-					    const struct sock_filter *filter);
+					    const struct sock_filter_int *filter);
 	union {
-		struct sock_filter     	insns[0];
+		struct sock_filter	insns[0];
+		struct sock_filter_int	insnsi[0];
 		struct work_struct	work;
 	};
 };
@@ -50,9 +89,18 @@ static inline unsigned int sk_filter_size(unsigned int proglen)
 #define sk_filter_proglen(fprog)			\
 		(fprog->len * sizeof(fprog->filter[0]))
 
+#define SK_RUN_FILTER(filter, ctx)			\
+		(*filter->bpf_func)(ctx, filter->insnsi)
+
 int sk_filter(struct sock *sk, struct sk_buff *skb);
-unsigned int sk_run_filter(const struct sk_buff *skb,
-			   const struct sock_filter *filter);
+
+u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
+			      const struct sock_filter_int *insni);
+u32 sk_run_filter_int_skb(const struct sk_buff *ctx,
+			  const struct sock_filter_int *insni);
+
+int sk_convert_filter(struct sock_filter *prog, int len,
+		      struct sock_filter_int *new_prog, int *new_len);
 
 int sk_unattached_filter_create(struct sk_filter **pfp,
 				struct sock_fprog *fprog);
@@ -86,7 +134,6 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
 		print_hex_dump(KERN_ERR, "JIT code: ", DUMP_PREFIX_OFFSET,
 			       16, 1, image, proglen, false);
 }
-#define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
 #include <linux/slab.h>
 static inline void bpf_jit_compile(struct sk_filter *fp)
@@ -96,7 +143,6 @@ static inline void bpf_jit_free(struct sk_filter *fp)
 {
 	kfree(fp);
 }
-#define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
 #endif
 
 static inline int bpf_tell_extensions(void)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 6f19cfd..4054b09 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -76,7 +76,6 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp_filter(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
-extern u32 seccomp_bpf_load(int off);
 #else  /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp_filter(struct task_struct *tsk)
 {
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b7a1004..4f18e75 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -55,60 +55,33 @@ struct seccomp_filter {
 	atomic_t usage;
 	struct seccomp_filter *prev;
 	unsigned short len;  /* Instruction count */
-	struct sock_filter insns[];
+	struct sock_filter_int insnsi[];
 };
 
 /* Limit any path through the tree to 256KB worth of instructions. */
 #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
 
-/**
- * get_u32 - returns a u32 offset into data
- * @data: a unsigned 64 bit value
- * @index: 0 or 1 to return the first or second 32-bits
- *
- * This inline exists to hide the length of unsigned long.  If a 32-bit
- * unsigned long is passed in, it will be extended and the top 32-bits will be
- * 0. If it is a 64-bit unsigned long, then whatever data is resident will be
- * properly returned.
- *
+/*
  * Endianness is explicitly ignored and left for BPF program authors to manage
  * as per the specific architecture.
  */
-static inline u32 get_u32(u64 data, int index)
+static void populate_seccomp_data(struct seccomp_data *sd)
 {
-	return ((u32 *)&data)[index];
-}
+	struct task_struct *task = current;
+	struct pt_regs *regs = task_pt_regs(task);
 
-/* Helper for bpf_load below. */
-#define BPF_DATA(_name) offsetof(struct seccomp_data, _name)
-/**
- * bpf_load: checks and returns a pointer to the requested offset
- * @off: offset into struct seccomp_data to load from
- *
- * Returns the requested 32-bits of data.
- * seccomp_check_filter() should assure that @off is 32-bit aligned
- * and not out of bounds.  Failure to do so is a BUG.
- */
-u32 seccomp_bpf_load(int off)
-{
-	struct pt_regs *regs = task_pt_regs(current);
-	if (off == BPF_DATA(nr))
-		return syscall_get_nr(current, regs);
-	if (off == BPF_DATA(arch))
-		return syscall_get_arch(current, regs);
-	if (off >= BPF_DATA(args[0]) && off < BPF_DATA(args[6])) {
-		unsigned long value;
-		int arg = (off - BPF_DATA(args[0])) / sizeof(u64);
-		int index = !!(off % sizeof(u64));
-		syscall_get_arguments(current, regs, arg, 1, &value);
-		return get_u32(value, index);
-	}
-	if (off == BPF_DATA(instruction_pointer))
-		return get_u32(KSTK_EIP(current), 0);
-	if (off == BPF_DATA(instruction_pointer) + sizeof(u32))
-		return get_u32(KSTK_EIP(current), 1);
-	/* seccomp_check_filter should make this impossible. */
-	BUG();
+	sd->nr = syscall_get_nr(task, regs);
+	sd->arch = syscall_get_arch(task, regs);
+
+	/* Unroll syscall_get_args to help gcc on arm. */
+	syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
+	syscall_get_arguments(task, regs, 1, 1, (unsigned long *) &sd->args[1]);
+	syscall_get_arguments(task, regs, 2, 1, (unsigned long *) &sd->args[2]);
+	syscall_get_arguments(task, regs, 3, 1, (unsigned long *) &sd->args[3]);
+	syscall_get_arguments(task, regs, 4, 1, (unsigned long *) &sd->args[4]);
+	syscall_get_arguments(task, regs, 5, 1, (unsigned long *) &sd->args[5]);
+
+	sd->instruction_pointer = KSTK_EIP(task);
 }
 
 /**
@@ -133,17 +106,17 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 
 		switch (code) {
 		case BPF_S_LD_W_ABS:
-			ftest->code = BPF_S_ANC_SECCOMP_LD_W;
+			ftest->code = BPF_LDX | BPF_W | BPF_ABS;
 			/* 32-bit aligned and not out of bounds. */
 			if (k >= sizeof(struct seccomp_data) || k & 3)
 				return -EINVAL;
 			continue;
 		case BPF_S_LD_W_LEN:
-			ftest->code = BPF_S_LD_IMM;
+			ftest->code = BPF_LD | BPF_IMM;
 			ftest->k = sizeof(struct seccomp_data);
 			continue;
 		case BPF_S_LDX_W_LEN:
-			ftest->code = BPF_S_LDX_IMM;
+			ftest->code = BPF_LDX | BPF_IMM;
 			ftest->k = sizeof(struct seccomp_data);
 			continue;
 		/* Explicitly include allowed calls. */
@@ -185,6 +158,7 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 		case BPF_S_JMP_JGT_X:
 		case BPF_S_JMP_JSET_K:
 		case BPF_S_JMP_JSET_X:
+			sk_decode_filter(ftest, ftest);
 			continue;
 		default:
 			return -EINVAL;
@@ -202,18 +176,21 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 static u32 seccomp_run_filters(int syscall)
 {
 	struct seccomp_filter *f;
+	struct seccomp_data sd;
 	u32 ret = SECCOMP_RET_ALLOW;
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
 	if (WARN_ON(current->seccomp.filter == NULL))
 		return SECCOMP_RET_KILL;
 
+	populate_seccomp_data(&sd);
+
 	/*
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (f = current->seccomp.filter; f; f = f->prev) {
-		u32 cur_ret = sk_run_filter(NULL, f->insns);
+		u32 cur_ret = sk_run_filter_int_seccomp(&sd, f->insnsi);
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
 			ret = cur_ret;
 	}
@@ -231,6 +208,8 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 	struct seccomp_filter *filter;
 	unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
 	unsigned long total_insns = fprog->len;
+	struct sock_filter *fp;
+	int new_len;
 	long ret;
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
@@ -252,28 +231,43 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 				     CAP_SYS_ADMIN) != 0)
 		return -EACCES;
 
-	/* Allocate a new seccomp_filter */
-	filter = kzalloc(sizeof(struct seccomp_filter) + fp_size,
-			 GFP_KERNEL|__GFP_NOWARN);
-	if (!filter)
+	fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN);
+	if (!fp)
 		return -ENOMEM;
-	atomic_set(&filter->usage, 1);
-	filter->len = fprog->len;
 
 	/* Copy the instructions from fprog. */
 	ret = -EFAULT;
-	if (copy_from_user(filter->insns, fprog->filter, fp_size))
-		goto fail;
+	if (copy_from_user(fp, fprog->filter, fp_size))
+		goto free_prog;
 
 	/* Check and rewrite the fprog via the skb checker */
-	ret = sk_chk_filter(filter->insns, filter->len);
+	ret = sk_chk_filter(fp, fprog->len);
 	if (ret)
-		goto fail;
+		goto free_prog;
 
 	/* Check and rewrite the fprog for seccomp use */
-	ret = seccomp_check_filter(filter->insns, filter->len);
+	ret = seccomp_check_filter(fp, fprog->len);
+	if (ret)
+		goto free_prog;
+
+	/* Convert 'sock_filter' insns to 'sock_filter_int' insns */
+	ret = sk_convert_filter(fp, fprog->len, NULL, &new_len);
+	if (ret)
+		goto free_prog;
+
+	/* Allocate a new seccomp_filter */
+	filter = kzalloc(sizeof(struct seccomp_filter) +
+			 sizeof(struct sock_filter_int) * new_len,
+			 GFP_KERNEL|__GFP_NOWARN);
+	if (!filter)
+		goto free_prog;
+
+	ret = sk_convert_filter(fp, fprog->len, filter->insnsi, &new_len);
 	if (ret)
-		goto fail;
+		goto free_filter;
+
+	atomic_set(&filter->usage, 1);
+	filter->len = new_len;
 
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
@@ -282,8 +276,11 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 	filter->prev = current->seccomp.filter;
 	current->seccomp.filter = filter;
 	return 0;
-fail:
+
+free_filter:
 	kfree(filter);
+free_prog:
+	kfree(fp);
 	return ret;
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 976edc6..683f1e8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1,11 +1,16 @@
 /*
  * Linux Socket Filter - Kernel level socket filtering
  *
- * Author:
- *     Jay Schulist <jschlst@samba.org>
+ * Based on the design of the Berkeley Packet Filter. The new
+ * internal format has been designed by PLUMgrid:
  *
- * Based on the design of:
- *     - The Berkeley Packet Filter
+ *	Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ *
+ * Authors:
+ *
+ *	Jay Schulist <jschlst@samba.org>
+ *	Alexei Starovoitov <ast@plumgrid.com>
+ *	Daniel Borkmann <dborkman@redhat.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -35,6 +40,7 @@
 #include <linux/timer.h>
 #include <asm/uaccess.h>
 #include <asm/unaligned.h>
+#include <asm/byteorder.h>
 #include <linux/filter.h>
 #include <linux/ratelimit.h>
 #include <linux/seccomp.h>
@@ -108,304 +114,1002 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sk_filter);
 
+/* Base function for offset calculation. Needs to go into .text section,
+ * therefore keeping it non-static as well; will also be used by JITs
+ * anyway later on, so do not let the compiler omit it.
+ */
+noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	return 0;
+}
+
 /**
- *	sk_run_filter - run a filter on a socket
- *	@skb: buffer to run the filter on
+ *	__sk_run_filter - run a filter on a given context
+ *	@ctx: buffer to run the filter on
  *	@fentry: filter to apply
  *
- * Decode and apply filter instructions to the skb->data.
- * Return length to keep, 0 for none. @skb is the data we are
- * filtering, @filter is the array of filter instructions.
- * Because all jumps are guaranteed to be before last instruction,
- * and last instruction guaranteed to be a RET, we dont need to check
- * flen. (We used to pass to this function the length of filter)
+ * Decode and apply filter instructions to the skb->data. Return length to
+ * keep, 0 for none. @ctx is the data we are operating on, @filter is the
+ * array of filter instructions.
  */
-unsigned int sk_run_filter(const struct sk_buff *skb,
-			   const struct sock_filter *fentry)
+unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
 {
+	u64 stack[MAX_BPF_STACK / sizeof(u64)];
+	u64 regs[MAX_BPF_REG], tmp;
 	void *ptr;
-	u32 A = 0;			/* Accumulator */
-	u32 X = 0;			/* Index Register */
-	u32 mem[BPF_MEMWORDS];		/* Scratch Memory Store */
-	u32 tmp;
-	int k;
+	int off;
+
+#define K insn->imm
+#define A regs[insn->a_reg]
+#define X regs[insn->x_reg]
+
+#define CONT	 ({insn++; goto select_insn; })
+#define CONT_JMP ({insn++; goto select_insn; })
+
+	static const void *jumptable[256] = {
+		[0 ... 255] = &&default_label,
+		/* Overwrite non-defaults ... */
+#define DL(A, B, C)	[A|B|C] = &&A##_##B##_##C
+		DL(BPF_ALU, BPF_ADD, BPF_X),
+		DL(BPF_ALU, BPF_ADD, BPF_K),
+		DL(BPF_ALU, BPF_SUB, BPF_X),
+		DL(BPF_ALU, BPF_SUB, BPF_K),
+		DL(BPF_ALU, BPF_AND, BPF_X),
+		DL(BPF_ALU, BPF_AND, BPF_K),
+		DL(BPF_ALU, BPF_OR, BPF_X),
+		DL(BPF_ALU, BPF_OR, BPF_K),
+		DL(BPF_ALU, BPF_LSH, BPF_X),
+		DL(BPF_ALU, BPF_LSH, BPF_K),
+		DL(BPF_ALU, BPF_RSH, BPF_X),
+		DL(BPF_ALU, BPF_RSH, BPF_K),
+		DL(BPF_ALU, BPF_XOR, BPF_X),
+		DL(BPF_ALU, BPF_XOR, BPF_K),
+		DL(BPF_ALU, BPF_MUL, BPF_X),
+		DL(BPF_ALU, BPF_MUL, BPF_K),
+		DL(BPF_ALU, BPF_MOV, BPF_X),
+		DL(BPF_ALU, BPF_MOV, BPF_K),
+		DL(BPF_ALU, BPF_DIV, BPF_X),
+		DL(BPF_ALU, BPF_DIV, BPF_K),
+		DL(BPF_ALU, BPF_MOD, BPF_X),
+		DL(BPF_ALU, BPF_MOD, BPF_K),
+		DL(BPF_ALU, BPF_BSWAP, BPF_X),
+		DL(BPF_ALU, BPF_NEG, 0),
+		DL(BPF_ALU64, BPF_ADD, BPF_X),
+		DL(BPF_ALU64, BPF_ADD, BPF_K),
+		DL(BPF_ALU64, BPF_SUB, BPF_X),
+		DL(BPF_ALU64, BPF_SUB, BPF_K),
+		DL(BPF_ALU64, BPF_AND, BPF_X),
+		DL(BPF_ALU64, BPF_AND, BPF_K),
+		DL(BPF_ALU64, BPF_OR, BPF_X),
+		DL(BPF_ALU64, BPF_OR, BPF_K),
+		DL(BPF_ALU64, BPF_LSH, BPF_X),
+		DL(BPF_ALU64, BPF_LSH, BPF_K),
+		DL(BPF_ALU64, BPF_RSH, BPF_X),
+		DL(BPF_ALU64, BPF_RSH, BPF_K),
+		DL(BPF_ALU64, BPF_XOR, BPF_X),
+		DL(BPF_ALU64, BPF_XOR, BPF_K),
+		DL(BPF_ALU64, BPF_MUL, BPF_X),
+		DL(BPF_ALU64, BPF_MUL, BPF_K),
+		DL(BPF_ALU64, BPF_MOV, BPF_X),
+		DL(BPF_ALU64, BPF_MOV, BPF_K),
+		DL(BPF_ALU64, BPF_ARSH, BPF_X),
+		DL(BPF_ALU64, BPF_ARSH, BPF_K),
+		DL(BPF_ALU64, BPF_DIV, BPF_X),
+		DL(BPF_ALU64, BPF_DIV, BPF_K),
+		DL(BPF_ALU64, BPF_MOD, BPF_X),
+		DL(BPF_ALU64, BPF_MOD, BPF_K),
+		DL(BPF_ALU64, BPF_BSWAP, BPF_X),
+		DL(BPF_ALU64, BPF_NEG, 0),
+		DL(BPF_JMP, BPF_CALL, 0),
+		DL(BPF_JMP, BPF_JA, 0),
+		DL(BPF_JMP, BPF_JEQ, BPF_X),
+		DL(BPF_JMP, BPF_JEQ, BPF_K),
+		DL(BPF_JMP, BPF_JNE, BPF_X),
+		DL(BPF_JMP, BPF_JNE, BPF_K),
+		DL(BPF_JMP, BPF_JGT, BPF_X),
+		DL(BPF_JMP, BPF_JGT, BPF_K),
+		DL(BPF_JMP, BPF_JGE, BPF_X),
+		DL(BPF_JMP, BPF_JGE, BPF_K),
+		DL(BPF_JMP, BPF_JSGT, BPF_X),
+		DL(BPF_JMP, BPF_JSGT, BPF_K),
+		DL(BPF_JMP, BPF_JSGE, BPF_X),
+		DL(BPF_JMP, BPF_JSGE, BPF_K),
+		DL(BPF_JMP, BPF_JSET, BPF_X),
+		DL(BPF_JMP, BPF_JSET, BPF_K),
+		DL(BPF_STX, BPF_MEM, BPF_B),
+		DL(BPF_STX, BPF_MEM, BPF_H),
+		DL(BPF_STX, BPF_MEM, BPF_W),
+		DL(BPF_STX, BPF_MEM, BPF_DW),
+		DL(BPF_ST, BPF_MEM, BPF_B),
+		DL(BPF_ST, BPF_MEM, BPF_H),
+		DL(BPF_ST, BPF_MEM, BPF_W),
+		DL(BPF_ST, BPF_MEM, BPF_DW),
+		DL(BPF_LDX, BPF_MEM, BPF_B),
+		DL(BPF_LDX, BPF_MEM, BPF_H),
+		DL(BPF_LDX, BPF_MEM, BPF_W),
+		DL(BPF_LDX, BPF_MEM, BPF_DW),
+		DL(BPF_STX, BPF_XADD, BPF_W),
+		DL(BPF_STX, BPF_XADD, BPF_DW),
+		DL(BPF_LD, BPF_ABS, BPF_W),
+		DL(BPF_LD, BPF_ABS, BPF_H),
+		DL(BPF_LD, BPF_ABS, BPF_B),
+		DL(BPF_LD, BPF_IND, BPF_W),
+		DL(BPF_LD, BPF_IND, BPF_H),
+		DL(BPF_LD, BPF_IND, BPF_B),
+		DL(BPF_RET, BPF_K, 0),
+#undef DL
+	};
 
-	/*
-	 * Process array of filter instructions.
-	 */
-	for (;; fentry++) {
-#if defined(CONFIG_X86_32)
-#define	K (fentry->k)
-#else
-		const u32 K = fentry->k;
-#endif
-
-		switch (fentry->code) {
-		case BPF_S_ALU_ADD_X:
-			A += X;
-			continue;
-		case BPF_S_ALU_ADD_K:
-			A += K;
-			continue;
-		case BPF_S_ALU_SUB_X:
-			A -= X;
-			continue;
-		case BPF_S_ALU_SUB_K:
-			A -= K;
-			continue;
-		case BPF_S_ALU_MUL_X:
-			A *= X;
-			continue;
-		case BPF_S_ALU_MUL_K:
-			A *= K;
-			continue;
-		case BPF_S_ALU_DIV_X:
-			if (X == 0)
-				return 0;
-			A /= X;
-			continue;
-		case BPF_S_ALU_DIV_K:
-			A /= K;
-			continue;
-		case BPF_S_ALU_MOD_X:
-			if (X == 0)
-				return 0;
-			A %= X;
-			continue;
-		case BPF_S_ALU_MOD_K:
-			A %= K;
-			continue;
-		case BPF_S_ALU_AND_X:
-			A &= X;
-			continue;
-		case BPF_S_ALU_AND_K:
-			A &= K;
-			continue;
-		case BPF_S_ALU_OR_X:
-			A |= X;
-			continue;
-		case BPF_S_ALU_OR_K:
-			A |= K;
-			continue;
-		case BPF_S_ANC_ALU_XOR_X:
-		case BPF_S_ALU_XOR_X:
-			A ^= X;
-			continue;
-		case BPF_S_ALU_XOR_K:
-			A ^= K;
-			continue;
-		case BPF_S_ALU_LSH_X:
-			A <<= X;
-			continue;
-		case BPF_S_ALU_LSH_K:
-			A <<= K;
-			continue;
-		case BPF_S_ALU_RSH_X:
-			A >>= X;
-			continue;
-		case BPF_S_ALU_RSH_K:
-			A >>= K;
-			continue;
-		case BPF_S_ALU_NEG:
-			A = -A;
-			continue;
-		case BPF_S_JMP_JA:
-			fentry += K;
-			continue;
-		case BPF_S_JMP_JGT_K:
-			fentry += (A > K) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JGE_K:
-			fentry += (A >= K) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JEQ_K:
-			fentry += (A == K) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JSET_K:
-			fentry += (A & K) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JGT_X:
-			fentry += (A > X) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JGE_X:
-			fentry += (A >= X) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JEQ_X:
-			fentry += (A == X) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_JMP_JSET_X:
-			fentry += (A & X) ? fentry->jt : fentry->jf;
-			continue;
-		case BPF_S_LD_W_ABS:
-			k = K;
-load_w:
-			ptr = load_pointer(skb, k, 4, &tmp);
-			if (ptr != NULL) {
-				A = get_unaligned_be32(ptr);
-				continue;
-			}
-			return 0;
-		case BPF_S_LD_H_ABS:
-			k = K;
-load_h:
-			ptr = load_pointer(skb, k, 2, &tmp);
-			if (ptr != NULL) {
-				A = get_unaligned_be16(ptr);
-				continue;
+	regs[FP_REG]  = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
+	regs[CTX_REG] = (u64) (unsigned long) ctx;
+
+select_insn:
+	goto *jumptable[insn->code];
+
+	/* ALU */
+#define ALU(OPCODE, OP)			\
+	BPF_ALU64_##OPCODE##_BPF_X:	\
+		A = A OP X;		\
+		CONT;			\
+	BPF_ALU_##OPCODE##_BPF_X:	\
+		A = (u32) A OP (u32) X;	\
+		CONT;			\
+	BPF_ALU64_##OPCODE##_BPF_K:	\
+		A = A OP K;		\
+		CONT;			\
+	BPF_ALU_##OPCODE##_BPF_K:	\
+		A = (u32) A OP (u32) K;	\
+		CONT;
+
+	ALU(BPF_ADD,  +)
+	ALU(BPF_SUB,  -)
+	ALU(BPF_AND,  &)
+	ALU(BPF_OR,   |)
+	ALU(BPF_LSH, <<)
+	ALU(BPF_RSH, >>)
+	ALU(BPF_XOR,  ^)
+	ALU(BPF_MUL,  *)
+#undef ALU
+	BPF_ALU_BPF_NEG_0:
+		A = (u32) -A;
+		CONT;
+	BPF_ALU64_BPF_NEG_0:
+		A = -A;
+		CONT;
+	BPF_ALU_BPF_MOV_BPF_X:
+		A = (u32) X;
+		CONT;
+	BPF_ALU_BPF_MOV_BPF_K:
+		A = (u32) K;
+		CONT;
+	BPF_ALU64_BPF_MOV_BPF_X:
+		A = X;
+		CONT;
+	BPF_ALU64_BPF_MOV_BPF_K:
+		A = K;
+		CONT;
+	BPF_ALU64_BPF_ARSH_BPF_X:
+		(*(s64 *) &A) >>= X;
+		CONT;
+	BPF_ALU64_BPF_ARSH_BPF_K:
+		(*(s64 *) &A) >>= K;
+		CONT;
+	BPF_ALU64_BPF_MOD_BPF_X:
+		tmp = A;
+		if (X)
+			A = do_div(tmp, X);
+		CONT;
+	BPF_ALU_BPF_MOD_BPF_X:
+		tmp = (u32) A;
+		if (X)
+			A = do_div(tmp, (u32) X);
+		CONT;
+	BPF_ALU64_BPF_MOD_BPF_K:
+		tmp = A;
+		if (K)
+			A = do_div(tmp, K);
+		CONT;
+	BPF_ALU_BPF_MOD_BPF_K:
+		tmp = (u32) A;
+		if (K)
+			A = do_div(tmp, (u32) K);
+		CONT;
+	BPF_ALU64_BPF_DIV_BPF_X:
+		if (X)
+			do_div(A, X);
+		CONT;
+	BPF_ALU_BPF_DIV_BPF_X:
+		tmp = (u32) A;
+		if (X)
+			do_div(tmp, (u32) X);
+		A = (u32) tmp;
+		CONT;
+	BPF_ALU64_BPF_DIV_BPF_K:
+		if (K)
+			do_div(A, K);
+		CONT;
+	BPF_ALU_BPF_DIV_BPF_K:
+		tmp = (u32) A;
+		if (K)
+			do_div(tmp, (u32) K);
+		A = (u32) tmp;
+		CONT;
+	BPF_ALU_BPF_BSWAP_BPF_X:
+		A = swab32(A);
+		CONT;
+	BPF_ALU64_BPF_BSWAP_BPF_X:
+		A = swab64(A);
+		CONT;
+
+	/* CALL */
+	BPF_JMP_BPF_CALL_0:
+		regs[0] = (__bpf_call_base + insn->imm)(regs[1], regs[2],
+							regs[3], regs[4],
+							regs[5]);
+		CONT;
+
+	/* JMP */
+	BPF_JMP_BPF_JA_0:
+		insn += insn->off;
+		CONT;
+	BPF_JMP_BPF_JEQ_BPF_X:
+		if (A == X) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JEQ_BPF_K:
+		if (A == K) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JNE_BPF_X:
+		if (A != X) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JNE_BPF_K:
+		if (A != K) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JGT_BPF_X:
+		if (A > X) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JGT_BPF_K:
+		if (A > K) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JGE_BPF_X:
+		if (A >= X) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JGE_BPF_K:
+		if (A >= K) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSGT_BPF_X:
+		if (((s64)A) > ((s64)X)) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSGT_BPF_K:
+		if (((s64)A) > ((s64)K)) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSGE_BPF_X:
+		if (((s64)A) >= ((s64)X)) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSGE_BPF_K:
+		if (((s64)A) >= ((s64)K)) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSET_BPF_X:
+		if (A & X) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+	BPF_JMP_BPF_JSET_BPF_K:
+		if (A & K) {
+			insn += insn->off;
+			CONT_JMP;
+		}
+		CONT;
+
+	/* STX and ST and LDX*/
+#define LDST(SIZEOP, SIZE)					\
+	BPF_STX_BPF_MEM_##SIZEOP:				\
+		*(SIZE *)(unsigned long) (A + insn->off) = X;	\
+		CONT;						\
+	BPF_ST_BPF_MEM_##SIZEOP:				\
+		*(SIZE *)(unsigned long) (A + insn->off) = K;	\
+		CONT;						\
+	BPF_LDX_BPF_MEM_##SIZEOP:				\
+		A = *(SIZE *)(unsigned long) (X + insn->off);	\
+		CONT;
+
+	LDST(BPF_B,   u8)
+	LDST(BPF_H,  u16)
+	LDST(BPF_W,  u32)
+	LDST(BPF_DW, u64)
+#undef LDST
+	BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
+		atomic_add((u32) X, (atomic_t *)(unsigned long)
+			   (A + insn->off));
+		CONT;
+	BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
+		atomic64_add((u64) X, (atomic64_t *)(unsigned long)
+			     (A + insn->off));
+		CONT;
+	BPF_LD_BPF_ABS_BPF_W: /* A = *(u32 *)(ctx + K) */
+		off = K;
+load_word:
+		/* BPF_LD + BPF_ABS and BPF_LD + BPF_IND insns are only
+		 * appearing in the programs where ctx == skb.
+		 */
+		ptr = load_pointer((struct sk_buff *) ctx, off, 4, &tmp);
+		if (likely(ptr != NULL)) {
+			A = get_unaligned_be32(ptr);
+			CONT;
+		}
+		return 0;
+	BPF_LD_BPF_ABS_BPF_H: /* A = *(u16 *)(ctx + K) */
+		off = K;
+load_half:
+		ptr = load_pointer((struct sk_buff *) ctx, off, 2, &tmp);
+		if (likely(ptr != NULL)) {
+			A = get_unaligned_be16(ptr);
+			CONT;
+		}
+		return 0;
+
+	BPF_LD_BPF_ABS_BPF_B: /* A = *(u8 *)(ctx + K) */
+		off = K;
+load_byte:
+		ptr = load_pointer((struct sk_buff *) ctx, off, 1, &tmp);
+		if (likely(ptr != NULL)) {
+			A = *(u8 *)ptr;
+			CONT;
+		}
+		return 0;
+	BPF_LD_BPF_IND_BPF_W: /* A = *(u32 *)(ctx + X + K) */
+		off = K + X;
+		goto load_word;
+	BPF_LD_BPF_IND_BPF_H: /* A = *(u16 *)(ctx + X + K) */
+		off = K + X;
+		goto load_half;
+	BPF_LD_BPF_IND_BPF_B: /* A = *(u8 *)(ctx + X + K) */
+		off = K + X;
+		goto load_byte;
+
+	/* RET */
+	BPF_RET_BPF_K_0:
+		return regs[0 /* R0 */];
+
+	default_label:
+		/* If we ever reach this, we have a bug somewhere. */
+		WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
+		return 0;
+#undef CONT_JMP
+#undef CONT
+#undef A
+#undef X
+#undef K
+}
+
+u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
+			      const struct sock_filter_int *insni)
+    __attribute__ ((alias ("__sk_run_filter")));
+
+u32 sk_run_filter_int_skb(const struct sk_buff *ctx,
+			  const struct sock_filter_int *insni)
+    __attribute__ ((alias ("__sk_run_filter")));
+EXPORT_SYMBOL_GPL(sk_run_filter_int_skb);
+
+/* Helper to find the offset of pkt_type in sk_buff structure. We want
+ * to make sure it's still a 3-bit field starting at a byte boundary;
+ * taken from arch/x86/net/bpf_jit_comp.c.
+ */
+#define PKT_TYPE_MAX	7
+static unsigned int pkt_type_offset(void)
+{
+	struct sk_buff skb_probe = { .pkt_type = ~0, };
+	u8 *ct = (u8 *) &skb_probe;
+	unsigned int off;
+
+	for (off = 0; off < sizeof(struct sk_buff); off++) {
+		if (ct[off] == PKT_TYPE_MAX)
+			return off;
+	}
+
+	pr_err_once("Please fix %s, as pkt_type couldn't be found!\n", __func__);
+	return -1;
+}
+
+static u64 __skb_get_pay_offset(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+{
+	struct sk_buff *skb = (struct sk_buff *)(long) ctx;
+
+	return __skb_get_poff(skb);
+}
+
+static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+{
+	struct sk_buff *skb = (struct sk_buff *)(long) ctx;
+	struct nlattr *nla;
+
+	if (skb_is_nonlinear(skb))
+		return 0;
+
+	if (A > skb->len - sizeof(struct nlattr))
+		return 0;
+
+	nla = nla_find((struct nlattr *) &skb->data[A], skb->len - A, X);
+	if (nla)
+		return (void *) nla - (void *) skb->data;
+
+	return 0;
+}
+
+static u64 __skb_get_nlattr_nest(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+{
+	struct sk_buff *skb = (struct sk_buff *)(long) ctx;
+	struct nlattr *nla;
+
+	if (skb_is_nonlinear(skb))
+		return 0;
+
+	if (A > skb->len - sizeof(struct nlattr))
+		return 0;
+
+	nla = (struct nlattr *) &skb->data[A];
+	if (nla->nla_len > A - skb->len)
+		return 0;
+
+	nla = nla_find_nested(nla, X);
+	if (nla)
+		return (void *) nla - (void *) skb->data;
+
+	return 0;
+}
+
+static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
+{
+	return raw_smp_processor_id();
+}
+
+/* Register mappings for user programs. */
+#define A_REG		6
+#define X_REG		7
+#define TMP_REG		8
+
+static bool convert_bpf_extensions(struct sock_filter *fp,
+				   struct sock_filter_int **insnp)
+{
+	struct sock_filter_int *insn = *insnp;
+
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, protocol) != 2);
+
+		insn->code = BPF_LDX | BPF_MEM | BPF_H;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, protocol);
+#ifdef  __LITTLE_ENDIAN
+		insn++;
+
+		/* A = swab32(A) */
+		insn->code = BPF_ALU | BPF_BSWAP | BPF_X;
+		insn->a_reg = A_REG;
+		insn++;
+
+		/* A >>= 16 */
+		insn->code = BPF_ALU | BPF_RSH | BPF_K;
+		insn->a_reg = A_REG;
+		insn->imm = 16;
+#endif /* __LITTLE_ENDIAN */
+		break;
+
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+		insn->code = BPF_LDX | BPF_MEM | BPF_B;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = pkt_type_offset();
+		if (insn->off < 0)
+			return false;
+		insn++;
+
+		insn->code = BPF_ALU | BPF_AND | BPF_K;
+		insn->a_reg = A_REG;
+		insn->imm = PKT_TYPE_MAX;
+		break;
+
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+		if (FIELD_SIZEOF(struct sk_buff, dev) == 8)
+			insn->code = BPF_LDX | BPF_MEM | BPF_DW;
+		else
+			insn->code = BPF_LDX | BPF_MEM | BPF_W;
+		insn->a_reg = TMP_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, dev);
+		insn++;
+
+		insn->code = BPF_JMP | BPF_JNE | BPF_K;
+		insn->a_reg = TMP_REG;
+		insn->imm = 0;
+		insn->off = 1;
+		insn++;
+
+		insn->code = BPF_RET | BPF_K;
+		insn++;
+
+		BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, ifindex) != 4);
+		BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, type) != 2);
+
+		insn->a_reg = A_REG;
+		insn->x_reg = TMP_REG;
+
+		if (fp->k == SKF_AD_OFF + SKF_AD_IFINDEX) {
+			insn->code = BPF_LDX | BPF_MEM | BPF_W;
+			insn->off = offsetof(struct net_device, ifindex);
+		} else {
+			insn->code = BPF_LDX | BPF_MEM | BPF_H;
+			insn->off = offsetof(struct net_device, type);
+		}
+		break;
+
+	case SKF_AD_OFF + SKF_AD_MARK:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
+
+		insn->code = BPF_LDX | BPF_MEM | BPF_W;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, mark);
+		break;
+
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, rxhash) != 4);
+
+		insn->code = BPF_LDX | BPF_MEM | BPF_W;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, rxhash);
+		break;
+
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, queue_mapping) != 2);
+
+		insn->code = BPF_LDX | BPF_MEM | BPF_H;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, queue_mapping);
+		break;
+
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
+
+		insn->code = BPF_LDX | BPF_MEM | BPF_H;
+		insn->a_reg = A_REG;
+		insn->x_reg = CTX_REG;
+		insn->off = offsetof(struct sk_buff, vlan_tci);
+		insn++;
+
+		BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
+
+		if (fp->k == SKF_AD_OFF + SKF_AD_VLAN_TAG) {
+			insn->code = BPF_ALU | BPF_AND | BPF_K;
+			insn->a_reg = A_REG;
+			insn->imm = ~VLAN_TAG_PRESENT;
+		} else {
+			insn->code = BPF_ALU | BPF_RSH | BPF_K;
+			insn->a_reg = A_REG;
+			insn->imm = 12;
+			insn++;
+
+			insn->code = BPF_ALU | BPF_AND | BPF_K;
+			insn->a_reg = A_REG;
+			insn->imm = 1;
+		}
+		break;
+
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+		/* Save ctx */
+		insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+		insn->a_reg = TMP_REG;
+		insn->x_reg = CTX_REG;
+		insn++;
+
+		/* arg2 = A */
+		insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+		insn->a_reg = 2;
+		insn->x_reg = A_REG;
+		insn++;
+
+		/* arg3 = X */
+		insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+		insn->a_reg = 3;
+		insn->x_reg = X_REG;
+		insn++;
+
+		/* Emit call(ctx, arg2=A, arg3=X) */
+		insn->code = BPF_JMP | BPF_CALL;
+		/* Re: sparse ... Share your drugs? High on caffeine ... ;-) */
+		switch (fp->k) {
+		case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+			insn->imm = __skb_get_pay_offset - __bpf_call_base;
+			break;
+		case SKF_AD_OFF + SKF_AD_NLATTR:
+			insn->imm = __skb_get_nlattr - __bpf_call_base;
+			break;
+		case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+			insn->imm = __skb_get_nlattr_nest - __bpf_call_base;
+			break;
+		case SKF_AD_OFF + SKF_AD_CPU:
+			insn->imm = __get_raw_cpu_id - __bpf_call_base;
+			break;
+		}
+		insn++;
+
+		/* Restore ctx */
+		insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+		insn->a_reg = CTX_REG;
+		insn->x_reg = TMP_REG;
+		insn++;
+
+		/* Move ret value into A_REG */
+		insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+		insn->a_reg = A_REG;
+		insn->x_reg = 0;
+		break;
+
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		insn->code = BPF_ALU | BPF_XOR | BPF_X;
+		insn->a_reg = A_REG;
+		insn->x_reg = X_REG;
+		break;
+
+	default:
+		/* This is just a dummy call to avoid letting the compiler
+		 * evict __bpf_call_base() as an optimization. Placed here
+		 * where no-one bothers.
+		 */
+		BUG_ON(__bpf_call_base(0, 0, 0, 0, 0) != 0);
+		return false;
+	}
+
+	*insnp = insn;
+	return true;
+}
+
+/**
+ *	sk_convert_filter - convert filter program
+ *	@prog: the user passed filter program
+ *	@len: the length of the user passed filter program
+ *	@new_prog: buffer where converted program will be stored
+ *	@new_len: pointer to store length of converted program
+ *
+ * Remap 'sock_filter' style BPF instruction set to 'sock_filter_int' style.
+ * Conversion workflow:
+ *
+ * 1) First pass for calculating the new program length:
+ *   sk_convert_filter(old_prog, old_len, NULL, &new_len)
+ *
+ * 2) 2nd pass to remap in two passes: 1st pass finds new
+ *    jump offsets, 2nd pass remapping:
+ *   new_prog = kmalloc(sizeof(struct sock_filter_int) * new_len);
+ *   sk_convert_filter(old_prog, old_len, new_prog, &new_len);
+ *
+ * User BPF's register A is mapped to our BPF register 6, user BPF
+ * register X is mapped to BPF register 7; frame pointer is always
+ * register 10; Context 'void *ctx' is stored in register 1, that is,
+ * for socket filters: ctx == 'struct sk_buff *', for seccomp:
+ * ctx == 'struct seccomp_data *'.
+ */
+int sk_convert_filter(struct sock_filter *prog, int len,
+		      struct sock_filter_int *new_prog, int *new_len)
+{
+	int new_flen = 0, pass = 0, target, i;
+	struct sock_filter_int *new_insn;
+	struct sock_filter *fp;
+	int *addrs = NULL;
+	u8 bpf_src;
+
+	BUILD_BUG_ON(BPF_MEMWORDS * sizeof(u32) > MAX_BPF_STACK);
+	BUILD_BUG_ON(FP_REG + 1 != MAX_BPF_REG);
+
+	if (len <= 0 || len >= BPF_MAXINSNS)
+		return -EINVAL;
+
+	if (new_prog) {
+		addrs = kzalloc(len * sizeof(*addrs), GFP_KERNEL);
+		if (!addrs)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	for (i = 0; i < len; fp++, i++) {
+		struct sock_filter_int tmp_insns[6] = { };
+		struct sock_filter_int *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+		/* All arithmetic insns and skb loads map as-is. */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* Check for overloaded BPF extension and
+			 * directly convert it if found, otherwise
+			 * just move on with mapping.
+			 */
+			if (BPF_CLASS(fp->code) == BPF_LD &&
+			    BPF_MODE(fp->code) == BPF_ABS &&
+			    convert_bpf_extensions(fp, &insn))
+				break;
+
+			insn->code = fp->code;
+			insn->a_reg = A_REG;
+			insn->x_reg = X_REG;
+			insn->imm = fp->k;
+			break;
+
+		/* Jump opcodes map as-is, but offsets need adjustment. */
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+#define EMIT_JMP							\
+	do {								\
+		if (target >= len || target < 0)			\
+			goto err;					\
+		insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0;	\
+		/* Adjust pc relative offset for 2nd or 3rd insn. */	\
+		insn->off -= insn - tmp_insns;				\
+	} while (0)
+
+			EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				insn->code = BPF_ALU | BPF_MOV | BPF_K;
+				insn->a_reg = TMP_REG;
+				insn->imm = fp->k;
+				insn++;
+
+				insn->a_reg = A_REG;
+				insn->x_reg = TMP_REG;
+				bpf_src = BPF_X;
+			} else {
+				insn->a_reg = A_REG;
+				insn->x_reg = X_REG;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
 			}
-			return 0;
-		case BPF_S_LD_B_ABS:
-			k = K;
-load_b:
-			ptr = load_pointer(skb, k, 1, &tmp);
-			if (ptr != NULL) {
-				A = *(u8 *)ptr;
-				continue;
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				EMIT_JMP;
+				break;
 			}
-			return 0;
-		case BPF_S_LD_W_LEN:
-			A = skb->len;
-			continue;
-		case BPF_S_LDX_W_LEN:
-			X = skb->len;
-			continue;
-		case BPF_S_LD_W_IND:
-			k = X + K;
-			goto load_w;
-		case BPF_S_LD_H_IND:
-			k = X + K;
-			goto load_h;
-		case BPF_S_LD_B_IND:
-			k = X + K;
-			goto load_b;
-		case BPF_S_LDX_B_MSH:
-			ptr = load_pointer(skb, K, 1, &tmp);
-			if (ptr != NULL) {
-				X = (*(u8 *)ptr & 0xf) << 2;
-				continue;
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | BPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				EMIT_JMP;
+				break;
 			}
-			return 0;
-		case BPF_S_LD_IMM:
-			A = K;
-			continue;
-		case BPF_S_LDX_IMM:
-			X = K;
-			continue;
-		case BPF_S_LD_MEM:
-			A = mem[K];
-			continue;
-		case BPF_S_LDX_MEM:
-			X = mem[K];
-			continue;
-		case BPF_S_MISC_TAX:
-			X = A;
-			continue;
-		case BPF_S_MISC_TXA:
-			A = X;
-			continue;
-		case BPF_S_RET_K:
-			return K;
-		case BPF_S_RET_A:
-			return A;
-		case BPF_S_ST:
-			mem[K] = A;
-			continue;
-		case BPF_S_STX:
-			mem[K] = X;
-			continue;
-		case BPF_S_ANC_PROTOCOL:
-			A = ntohs(skb->protocol);
-			continue;
-		case BPF_S_ANC_PKTTYPE:
-			A = skb->pkt_type;
-			continue;
-		case BPF_S_ANC_IFINDEX:
-			if (!skb->dev)
-				return 0;
-			A = skb->dev->ifindex;
-			continue;
-		case BPF_S_ANC_MARK:
-			A = skb->mark;
-			continue;
-		case BPF_S_ANC_QUEUE:
-			A = skb->queue_mapping;
-			continue;
-		case BPF_S_ANC_HATYPE:
-			if (!skb->dev)
-				return 0;
-			A = skb->dev->type;
-			continue;
-		case BPF_S_ANC_RXHASH:
-			A = skb->rxhash;
-			continue;
-		case BPF_S_ANC_CPU:
-			A = raw_smp_processor_id();
-			continue;
-		case BPF_S_ANC_VLAN_TAG:
-			A = vlan_tx_tag_get(skb);
-			continue;
-		case BPF_S_ANC_VLAN_TAG_PRESENT:
-			A = !!vlan_tx_tag_present(skb);
-			continue;
-		case BPF_S_ANC_PAY_OFFSET:
-			A = __skb_get_poff(skb);
-			continue;
-		case BPF_S_ANC_NLATTR: {
-			struct nlattr *nla;
-
-			if (skb_is_nonlinear(skb))
-				return 0;
-			if (A > skb->len - sizeof(struct nlattr))
-				return 0;
-
-			nla = nla_find((struct nlattr *)&skb->data[A],
-				       skb->len - A, X);
-			if (nla)
-				A = (void *)nla - (void *)skb->data;
-			else
-				A = 0;
-			continue;
-		}
-		case BPF_S_ANC_NLATTR_NEST: {
-			struct nlattr *nla;
-
-			if (skb_is_nonlinear(skb))
-				return 0;
-			if (A > skb->len - sizeof(struct nlattr))
-				return 0;
-
-			nla = (struct nlattr *)&skb->data[A];
-			if (nla->nla_len > A - skb->len)
-				return 0;
-
-			nla = nla_find_nested(nla, X);
-			if (nla)
-				A = (void *)nla - (void *)skb->data;
-			else
-				A = 0;
-			continue;
-		}
-#ifdef CONFIG_SECCOMP_FILTER
-		case BPF_S_ANC_SECCOMP_LD_W:
-			A = seccomp_bpf_load(fentry->k);
-			continue;
-#endif
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			EMIT_JMP;
+			break;
+
+		/* ldxb 4 * ([14] & 0xf) is remapped into 3 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			insn->code = BPF_LD | BPF_ABS | BPF_B;
+			insn->a_reg = X_REG;
+			insn->imm = fp->k;
+			insn++;
+
+			insn->code = BPF_ALU | BPF_AND | BPF_K;
+			insn->a_reg = X_REG;
+			insn->imm = 0xf;
+			insn++;
+
+			insn->code = BPF_ALU | BPF_LSH | BPF_K;
+			insn->a_reg = X_REG;
+			insn->imm = 2;
+			break;
+
+		/* RET_K, RET_A are remapped into 2 insns. */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			insn->code = BPF_ALU | BPF_MOV |
+				     (BPF_RVAL(fp->code) == BPF_K ?
+				      BPF_K : BPF_X);
+			insn->a_reg = 0;
+			insn->x_reg = A_REG;
+			insn->imm = fp->k;
+			insn++;
+
+			insn->code = BPF_RET | BPF_K;
+			break;
+
+		/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			insn->code = BPF_STX | BPF_MEM | BPF_W;
+			insn->a_reg = FP_REG;
+			insn->x_reg = fp->code == BPF_ST ? A_REG : X_REG;
+			insn->off = -(BPF_MEMWORDS - fp->k) * 4;
+			break;
+
+		/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			insn->code = BPF_LDX | BPF_MEM | BPF_W;
+			insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
+				      A_REG : X_REG;
+			insn->x_reg = FP_REG;
+			insn->off = -(BPF_MEMWORDS - fp->k) * 4;
+			break;
+
+		/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			insn->code = BPF_ALU | BPF_MOV | BPF_K;
+			insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
+				      A_REG : X_REG;
+			insn->imm = fp->k;
+			break;
+
+		/* X = A */
+		case BPF_MISC | BPF_TAX:
+			insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+			insn->a_reg = X_REG;
+			insn->x_reg = A_REG;
+			break;
+
+		/* A = X */
+		case BPF_MISC | BPF_TXA:
+			insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
+			insn->a_reg = A_REG;
+			insn->x_reg = X_REG;
+			break;
+
+		/* A = skb->len or X = skb->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			insn->code = BPF_LDX | BPF_MEM | BPF_W;
+			insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
+				      A_REG : X_REG;
+			insn->x_reg = CTX_REG;
+			insn->off = offsetof(struct sk_buff, len);
+			break;
+
+		/* access seccomp_data fields */
+		case BPF_LDX | BPF_ABS | BPF_W:
+			insn->code = BPF_LDX | BPF_MEM | BPF_W;
+			insn->a_reg = A_REG;
+			insn->x_reg = CTX_REG;
+			insn->off = fp->k;
+			break;
+
 		default:
-			WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
-				       fentry->code, fentry->jt,
-				       fentry->jf, fentry->k);
-			return 0;
+			goto err;
 		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+
+		new_insn += insn - tmp_insns;
 	}
 
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if (new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+
+		goto do_pass;
+	}
+
+	kfree(addrs);
+	BUG_ON(*new_len != new_flen);
 	return 0;
+err:
+	kfree(addrs);
+	return -EINVAL;
 }
-EXPORT_SYMBOL(sk_run_filter);
 
-/*
- * Security :
+/* Security:
+ *
  * A BPF program is able to use 16 cells of memory to store intermediate
- * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
+ * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter()).
+ *
  * As we dont want to clear mem[] array for each packet going through
  * sk_run_filter(), we check that filter loaded by user never try to read
  * a cell if not previously written, and we check all branches to be sure
@@ -696,19 +1400,130 @@ void sk_filter_charge(struct sock *sk, struct sk_filter *fp)
 	atomic_add(sk_filter_size(fp->len), &sk->sk_omem_alloc);
 }
 
-static int __sk_prepare_filter(struct sk_filter *fp)
+static struct sk_filter *__sk_migrate_realloc(struct sk_filter *fp,
+					      struct sock *sk,
+					      unsigned int len)
+{
+	struct sk_filter *fp_new;
+
+	if (sk == NULL)
+		return krealloc(fp, len, GFP_KERNEL);
+
+	fp_new = sock_kmalloc(sk, len, GFP_KERNEL);
+	if (fp_new) {
+		memcpy(fp_new, fp, sizeof(struct sk_filter));
+		/* As we're keeping orig_prog in fp_new,
+		 * we need to make sure we're not evicting it
+		 * from the old fp.
+		 */
+		fp->orig_prog = NULL;
+		sk_filter_uncharge(sk, fp);
+	}
+
+	return fp_new;
+}
+
+static struct sk_filter *__sk_migrate_filter(struct sk_filter *fp,
+					     struct sock *sk)
+{
+	struct sock_filter *old_prog;
+	struct sk_filter *old_fp;
+	int i, err, new_len, old_len = fp->len;
+
+	/* We are free to overwrite insns et al right here as it
+	 * won't be used at this point in time anymore internally
+	 * after the migration to the internal BPF instruction
+	 * representation.
+	 */
+	BUILD_BUG_ON(sizeof(struct sock_filter) !=
+		     sizeof(struct sock_filter_int));
+
+	/* For now, we need to unfiddle BPF_S_* identifiers in place.
+	 * This can sooner or later on be subject to removal, e.g. when
+	 * JITs have been converted.
+	 */
+	for (i = 0; i < fp->len; i++)
+		sk_decode_filter(&fp->insns[i], &fp->insns[i]);
+
+	/* Conversion cannot happen on overlapping memory areas,
+	 * so we need to keep the user BPF around until the 2nd
+	 * pass. At this time, the user BPF is stored in fp->insns.
+	 */
+	old_prog = kmemdup(fp->insns, old_len * sizeof(struct sock_filter),
+			   GFP_KERNEL);
+	if (!old_prog) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	/* 1st pass: calculate the new program length. */
+	err = sk_convert_filter(old_prog, old_len, NULL, &new_len);
+	if (err)
+		goto out_err_free;
+
+	/* Expand fp for appending the new filter representation. */
+	old_fp = fp;
+	fp = __sk_migrate_realloc(old_fp, sk, sk_filter_size(new_len));
+	if (!fp) {
+		/* The old_fp is still around in case we couldn't
+		 * allocate new memory, so uncharge on that one.
+		 */
+		fp = old_fp;
+		err = -ENOMEM;
+		goto out_err_free;
+	}
+
+	fp->bpf_func = sk_run_filter_int_skb;
+	fp->len = new_len;
+
+	/* 2nd pass: remap sock_filter insns into sock_filter_int insns. */
+	err = sk_convert_filter(old_prog, old_len, fp->insnsi, &new_len);
+	if (err)
+		/* 2nd sk_convert_filter() can fail only if it fails
+		 * to allocate memory, remapping must succeed. Note,
+		 * that at this time old_fp has already been released
+		 * by __sk_migrate_realloc().
+		 */
+		goto out_err_free;
+
+	kfree(old_prog);
+	return fp;
+
+out_err_free:
+	kfree(old_prog);
+out_err:
+	/* Rollback filter setup. */
+	if (sk != NULL)
+		sk_filter_uncharge(sk, fp);
+	else
+		kfree(fp);
+	return ERR_PTR(err);
+}
+
+static struct sk_filter *__sk_prepare_filter(struct sk_filter *fp,
+					     struct sock *sk)
 {
 	int err;
 
-	fp->bpf_func = sk_run_filter;
+	fp->bpf_func = NULL;
 	fp->jited = 0;
 
 	err = sk_chk_filter(fp->insns, fp->len);
 	if (err)
-		return err;
+		return ERR_PTR(err);
 
+	/* Probe if we can JIT compile the filter and if so, do
+	 * the compilation of the filter.
+	 */
 	bpf_jit_compile(fp);
-	return 0;
+
+	/* JIT compiler couldn't process this filter, so do the
+	 * internal BPF translation for the optimized interpreter.
+	 */
+	if (!fp->jited)
+		fp = __sk_migrate_filter(fp, sk);
+
+	return fp;
 }
 
 /**
@@ -726,7 +1541,6 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 {
 	unsigned int fsize = sk_filter_proglen(fprog);
 	struct sk_filter *fp;
-	int err;
 
 	/* Make sure new filter is there and in the right amounts. */
 	if (fprog->filter == NULL)
@@ -746,15 +1560,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 	 */
 	fp->orig_prog = NULL;
 
-	err = __sk_prepare_filter(fp);
-	if (err)
-		goto free_mem;
+	/* __sk_prepare_filter() already takes care of uncharging
+	 * memory in case something goes wrong.
+	 */
+	fp = __sk_prepare_filter(fp, NULL);
+	if (IS_ERR(fp))
+		return PTR_ERR(fp);
 
 	*pfp = fp;
 	return 0;
-free_mem:
-	kfree(fp);
-	return err;
 }
 EXPORT_SYMBOL_GPL(sk_unattached_filter_create);
 
@@ -806,11 +1620,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 		return -ENOMEM;
 	}
 
-	err = __sk_prepare_filter(fp);
-	if (err) {
-		sk_filter_uncharge(sk, fp);
-		return err;
-	}
+	/* __sk_prepare_filter() already takes care of uncharging
+	 * memory in case something goes wrong.
+	 */
+	fp = __sk_prepare_filter(fp, sk);
+	if (IS_ERR(fp))
+		return PTR_ERR(fp);
 
 	old_fp = rcu_dereference_protected(sk->sk_filter,
 					   sock_owned_by_user(sk));
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 9/9] doc: filter: extend BPF documentation to document new internals
  2014-03-21 12:20 [PATCH net-next 0/9] BPF updates Daniel Borkmann
                   ` (7 preceding siblings ...)
  2014-03-21 12:20 ` [PATCH net-next 8/9] net: filter: rework/optimize internal BPF interpreter's instruction set Daniel Borkmann
@ 2014-03-21 12:20 ` Daniel Borkmann
  8 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2014-03-21 12:20 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev

From: Alexei Starovoitov <ast@plumgrid.com>

Further extend the current BPF documentation to document new BPF
engine internals. Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 Documentation/networking/filter.txt | 147 ++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index a06b48d..13a58d5 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -546,6 +546,152 @@ ffffffffa0069c8f + <x>:
 For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provides a useful
 toolchain for developing and testing the kernel's JIT compiler.
 
+BPF kernel internals
+--------------------
+Internally for its interpreter the kernel uses a different BPF instruction
+set format with underlying principles similar to the BPF described in the
+previous paragraphs. However, the instruction set format is modeled closer
+to the underlying architecture's instruction set, so that better performance
+can be achieved (more details later).
+
+It is designed to be JITed with one to one mapping, which can also open up
+the possibility for GCC/LLVM compilers to generate optimized BPF code through
+a BPF backend that performs almost as fast as natively compiled code.
+
+The new instruction set was originally designed with the possible goal in
+mind to write programs in "restricted C" and compile into BPF with an optional
+GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
+minimal performance overhead over two steps, that is, C -> BPF -> native code.
+
+Currently, the new format is being used for running user BPF programs, which
+include seccomp BPF, classic socket filters, the cls_bpf traffic classifier,
+the team driver's classifier for its load-balancing mode, netfilter's xt_bpf
+extension, the PTP dissector/classifier, and much more. They are all internally
+converted by the kernel into the new instruction set representation and run
+in the extended interpreter. For in-kernel handlers, this all works
+transparently by using sk_unattached_filter_create() for setting up the
+filter and sk_unattached_filter_destroy() for destroying it. The macro
+SK_RUN_FILTER(filter, ctx) transparently invokes the right BPF function to
+run the filter. 'filter' is a pointer to the struct sk_filter we got from
+sk_unattached_filter_create(), and 'ctx' is the given context (e.g. the skb
+pointer).
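
A minimal usage sketch of that flow follows; the function and variable names are
illustrative, the classic filter is just an "accept everything" program, and
error handling is trimmed:

  #include <linux/filter.h>
  #include <linux/kernel.h>
  #include <linux/skbuff.h>

  static struct sk_filter *example_fp;

  static int example_setup(void)
  {
          /* Classic BPF: unconditionally accept the whole packet. */
          struct sock_filter insns[] = {
                  BPF_STMT(BPF_RET | BPF_K, 0xffff),
          };
          struct sock_fprog fprog = {
                  .len    = ARRAY_SIZE(insns),
                  .filter = insns,
          };

          /* Checks the classic program, JITs it if possible, otherwise
           * converts it to the internal format for the new interpreter.
           */
          return sk_unattached_filter_create(&example_fp, &fprog);
  }

  static unsigned int example_run(const struct sk_buff *skb)
  {
          /* Dispatches to the JIT image or to __sk_run_filter(). */
          return SK_RUN_FILTER(example_fp, skb);
  }

  static void example_teardown(void)
  {
          sk_unattached_filter_destroy(example_fp);
  }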
+
+Currently, for JITing, the user BPF format is being used and the current BPF
+JIT compilers are reused whenever possible. In other words, we do not (yet!)
+perform JIT compilation in the new layout; however, future work will
+successively migrate traditional JIT compilers to the new instruction format
+as well, so that they will profit from the very same benefits. So, when
+speaking about JIT in the following, a JIT compiler (TBD) for the new
+instruction format is meant in this context.
+
+The internal format extends BPF in the following way:
+
+- Number of registers increases from 2 to 10:
+
+  The old format had two registers, A and X, and a hidden frame pointer. The
+  new layout extends this to 10 internal registers and a read-only frame
+  pointer. Since 64-bit CPUs pass arguments to functions via registers, the
+  number of args from a BPF program to an in-kernel function is restricted
+  to 5, and one register is used to accept the return value from an in-kernel
+  function. Natively, x86_64 passes the first 6 arguments in registers,
+  aarch64/sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6
+  callee-saved registers, and aarch64/sparcv9/mips64 have 11 or more
+  callee-saved registers.
+
+  Therefore, BPF calling convention is defined as:
+
+    * R0	- return value from in-kernel function
+    * R1 - R5	- arguments from BPF program to in-kernel function
+    * R6 - R9	- callee saved registers that in-kernel function will preserve
+    * R10	- read-only frame pointer to access stack
+
+  Thus, all BPF registers map one-to-one to HW registers on x86_64, aarch64,
+  etc., and the BPF calling convention maps directly to the ABIs used by the
+  kernel on 64-bit architectures (a short sketch of a matching in-kernel
+  helper follows this list).
+
+  On 32-bit architectures, a JIT may map programs that use only 32-bit
+  arithmetic and let more complex programs be interpreted.
+
+  R0 - R5 are scratch registers, and a BPF program needs to spill/fill them
+  if necessary across calls. Note that there is only one BPF program (== one
+  BPF main routine), and it cannot call other BPF functions; it can only call
+  predefined in-kernel functions.
+
+- Register width increases from 32-bit to 64-bit:
+
+  Still, the semantics of the original 32-bit ALU operations are preserved
+  via 32-bit subregisters. All BPF registers are 64-bit with 32-bit lower
+  subregisters that zero-extend into 64-bit if they are being written to.
+  That behavior maps directly to x86_64 and arm64 subregister definition, but
+  makes other JITs more difficult.
+
+  32-bit architectures run 64-bit internal BPF programs via the interpreter.
+  Their JITs may convert BPF programs that only use 32-bit subregisters into
+  the native instruction set and let the rest be interpreted.
+
+  Operation is 64-bit because, on 64-bit architectures, pointers are also
+  64-bit wide, and we want to pass 64-bit values in/out of kernel functions.
+  32-bit BPF registers would otherwise require defining a register-pair ABI;
+  thus, a direct BPF register to HW register mapping would not be possible,
+  and the JIT would need to do combine/split/move operations for every
+  register in and out of the function, which is complex, bug-prone and slow.
+  Another reason is the use of atomic 64-bit counters.
+
+- Conditional jt/jf targets replaced with jt/fall-through, and forward/backward
+  jumps now possible:
+
+  While the original design has constructs such as "if (cond) jump_true;
+  else jump_false;", they are replaced with alternative constructs like
+  "if (cond) jump_true; /* else fall-through */".
+
+  The new BPF format also allows forward and backward jumps, for two
+  reasons: i) to reduce the branch mis-predict penalty, the compiler can move
+  cold basic blocks out of the fall-through path, and ii) to reduce code
+  duplication that would be hard to avoid if only forward jumps were available.
+
+- Adds signed > and >= insns
+
+- 16 4-byte stack slots for register spill-fill replaced with up to 512 bytes
+  of multi-use stack space
+
+- Introduces bpf_call insn and register passing convention for zero overhead
+  calls from/to other kernel functions:
+
+  After a kernel function call, R1 - R5 are reset to unreadable and R0
+  contains the return value of the function. Since R6 - R9 are callee-saved,
+  their state is preserved across the call.
+
+- Adds arithmetic right shift insn
+
+- Adds swab insns for 32/64-bit
+
+  The new BPF format doesn't have a pre-defined endianness, so as not to
+  favor one architecture over another. Therefore, a bswap insn is available.
+  Original BPF doesn't have such an insn and does the bswap as part of the
+  sk_load_word call, which is often unnecessary if we only want to compare a
+  value with a constant.
+
+- Adds atomic_add insn
+
+- Old tax/txa insns are replaced with 'mov dst,src' insn
+
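
As a concrete illustration of the calling convention above, an in-kernel
function reachable via bpf_call takes up to five u64 arguments and returns one
u64, just like __bpf_call_base from patch 8/9 of this series; the helper below
is invented for illustration only:

  #include <linux/types.h>

  /* Shape of a callee under the R1 - R5 / R0 convention; the helper name
   * and body are made up for illustration.
   */
  static u64 example_helper(u64 r1_ctx, u64 r2, u64 r3, u64 r4, u64 r5)
  {
          /* r1_ctx carries the context (e.g. the skb), r2 - r5 further args. */
          return r2 + r3;
  }

  /* The interpreter's BPF_CALL handling in patch 8/9 then boils down to an
   * indirect call through a relative offset stored in insn->imm:
   *
   *     regs[0] = (__bpf_call_base + insn->imm)(regs[1], regs[2],
   *                                             regs[3], regs[4], regs[5]);
   *
   * so R1 - R5/R0 follow the native 64-bit calling convention directly,
   * while R6 - R9 survive the call untouched.
   */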
+Also in the new design, BPF is limited to 4096 insns, which means that any
+program will terminate quickly and will only call a fixed number of kernel
+functions. Original BPF and the new format are two-operand instruction
+formats, which helps to do a one-to-one mapping between a BPF insn and an
+x86 insn during JIT.
+
+The input context pointer for invoking the interpreter function is generic;
+its content is defined by the specific use case. For seccomp, register R1
+points to seccomp_data; for converted BPF filters, R1 points to an skb.
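
Concretely, patch 8/9 of this series exposes these two views of the single
interpreter as aliases that differ only in their context type:

  u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
                                const struct sock_filter_int *insni);
  u32 sk_run_filter_int_skb(const struct sk_buff *ctx,
                            const struct sock_filter_int *insni);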
+
+A program that is translated internally consists of the following elements:
+
+  op:16, jt:8, jf:8, k:32    ==>    op:8, a_reg:4, x_reg:4, off:16, imm:32
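
Expressed as C, this is the 8-byte struct sock_filter_int introduced in
patch 8/9 of this series:

  struct sock_filter_int {
          __u8    code;           /* opcode */
          __u8    a_reg:4;        /* dest register */
          __u8    x_reg:4;        /* source register */
          __s16   off;            /* signed offset */
          __s32   imm;            /* signed immediate constant */
  };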
+
+Just like the original BPF, the new format runs within a controlled environment,
+is deterministic and the kernel can easily prove that. The safety of the program
+can be determined in two steps: the first step does a depth-first search to
+disallow loops and performs other CFG validation; the second step starts from
+the first insn and descends all possible paths. It simulates execution of every
+insn and observes the state change of registers and stack.
+
 Misc
 ----
 
@@ -561,3 +707,4 @@ the underlying architecture.
 
 Jay Schulist <jschlst@samba.org>
 Daniel Borkmann <dborkman@redhat.com>
+Alexei Starovoitov <ast@plumgrid.com>
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 8/9] net: filter: rework/optimize internal BPF interpreter's instruction set
  2014-03-21 12:20 ` [PATCH net-next 8/9] net: filter: rework/optimize internal BPF interpreter's instruction set Daniel Borkmann
@ 2014-03-21 15:40   ` Kees Cook
  0 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2014-03-21 15:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Alexei Starovoitov, netdev, Hagen Paul Pfeifer,
	Paul Moore, Ingo Molnar, H. Peter Anvin, LKML, Will Drewry

On Fri, Mar 21, 2014 at 6:20 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
>
> This patch replaces/reworks the kernel-internal BPF interpreter with
> an optimized BPF instruction set format that is modelled to more
> closely mimic native instruction sets and is designed to be JITed with
> a one-to-one mapping. Thus, the new interpreter is noticeably faster than the
> current implementation of sk_run_filter(); mainly for two reasons:
>
> 1. Fall-through jumps:
>
>   BPF jump instructions are forced to take either the 'true' or
>   'false' branch, which causes a branch-miss penalty. The new BPF jump
>   instructions have only one branch and fall through otherwise,
>   which fits the CPU branch predictor logic better. `perf stat`
>   shows a drastic difference in branch-misses between the old and
>   new code.
>
> 2. Jump-threaded implementation of interpreter vs switch
>    statement:
>
>   Instead of a single tablejump at the top of the 'switch' statement,
>   gcc will now generate multiple tablejump instructions, which
>   helps the CPU branch predictor logic.
>
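
For readers less familiar with the technique: with GCC's labels-as-values
extension, each handler re-dispatches through a table of label addresses
instead of looping back to one central switch, so the indirect branches are
spread out and predicted independently. A minimal standalone sketch with toy
opcodes (unrelated to the kernel's actual opcode encoding):

  /* Toy jump-threaded dispatcher; needs GCC/clang (computed goto). */
  #include <stdio.h>

  enum { OP_INC, OP_DBL, OP_END };

  static int run(const unsigned char *pc)
  {
          static const void *jumptable[] = {
                  [OP_INC] = &&do_inc,
                  [OP_DBL] = &&do_dbl,
                  [OP_END] = &&do_end,
          };
          int acc = 0;

  /* Each handler ends with its own table-based dispatch. */
  #define CONT do { pc++; goto *jumptable[*pc]; } while (0)

          goto *jumptable[*pc];
  do_inc:
          acc += 1;
          CONT;
  do_dbl:
          acc *= 2;
          CONT;
  do_end:
          return acc;
  #undef CONT
  }

  int main(void)
  {
          const unsigned char prog[] = { OP_INC, OP_INC, OP_DBL, OP_END };

          printf("%d\n", run(prog));      /* prints 4 */
          return 0;
  }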
> In short, the internal format extends BPF in the following way (more
> details can be taken from the appended documentation):
>
>   - Number of registers increase from 2 to 10
>   - Register width increases from 32-bit to 64-bit
>   - Conditional jt/jf targets replaced with jt/fall-through,
>     and forward/backward jumps now possible as well
>   - Adds signed > and >= insns
>   - 16 4-byte stack slots for register spill-fill replaced
>     with up to 512 bytes of multi-use stack space
>   - Introduction of bpf_call insn and register passing convention
>     for zero overhead calls from/to other kernel functions
>   - Adds arithmetic right shift insn
>   - Adds swab insns for 32/64-bit
>   - Adds atomic_add insn
>   - Old tax/txa insns are replaced with 'mov dst,src' insn
>
> Note that the verification of filters is still being done through
> sk_chk_filter(), so filters from user- or kernel space are verified
> in the same way as we do now. We reuse current BPF JIT compilers
> in a way that this upgrade would even be fine as is, but nevertheless
> allows for a successive upgrade of BPF JIT compilers to the new
> format. The internal instruction set migration is being done after
> the probing for JIT compilation, so in case JIT compilers are able
> to create a native opcode image, we're going to use that, and in all
> other cases we're doing a follow-up migration of the BPF program's
> instruction set, so that it can be transparently run in the new
> interpreter.
>
> Performance of two BPF filters, generated by libpcap and bpf_asm
> respectively, was measured on x86_64, i386 and arm32 (other libpcap
> programs show similar performance differences):
>
> fprog #1 is taken from Documentation/networking/filter.txt:
> tcpdump -i eth0 port 22 -dd
>
> fprog #2 is taken from 'man tcpdump':
> tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
>    ((tcp[12]&0xf0)>>2)) != 0)' -dd
>
> Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
> same SKB (cache-hit) or 10k SKBs (cache-miss); time in nsec per
> call, smaller is better:
>
> --x86_64--
>          fprog #1  fprog #1   fprog #2  fprog #2
>          cache-hit cache-miss cache-hit cache-miss
> old BPF      90       101        192       202
> new BPF      31        71         47        97
> old BPF jit  12        34         17        44
> new BPF jit TBD
>
> --i386--
>          fprog #1  fprog #1   fprog #2  fprog #2
>          cache-hit cache-miss cache-hit cache-miss
> old BPF     107       136        227       252
> new BPF      40       119         69       172
>
> --arm32--
>          fprog #1  fprog #1   fprog #2  fprog #2
>          cache-hit cache-miss cache-hit cache-miss
> old BPF     202       300        475       540
> new BPF     180       270        330       470
> old BPF jit  26       182         37       202
> new BPF jit TBD
>
> Thus, without changing any userland BPF filters, applications on
> top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
> classifier, netfilter's xt_bpf, team driver's load-balancing mode,
> and many more will have better interpreter filtering performance.
>
> While we are replacing the internal BPF interpreter, we also need
> to convert seccomp BPF in the same step to make use of the new
> internal structure since it makes use of lower-level API details
> without being further decoupled through higher-level calls like
> sk_unattached_filter_{create,destroy}(), for example.
>
> Just as for normal socket filtering, also seccomp BPF experiences
> a time-to-verdict speedup:
>
> 05-sim-long_jumps.c of libseccomp was used as micro-benchmark:
>
>   seccomp_rule_add_exact(ctx,...
>   seccomp_rule_add_exact(ctx,...
>
>   rc = seccomp_load(ctx);
>
>   for (i = 0; i < 10000000; i++)
>      syscall(199, 100);
>
> 'short filter' has 2 rules
> 'large filter' has 200 rules
>
> 'short filter' performance is slightly better on x86_64/i386/arm32
> 'large filter' is much faster on x86_64 and i386 and shows no
>                difference on arm32
>
> --x86_64-- short filter
> old BPF: 2.7 sec
>  39.12%  bench  libc-2.15.so       [.] syscall
>   8.10%  bench  [kernel.kallsyms]  [k] sk_run_filter
>   6.31%  bench  [kernel.kallsyms]  [k] system_call
>   5.59%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
>   4.37%  bench  [kernel.kallsyms]  [k] trace_hardirqs_off_caller
>   3.70%  bench  [kernel.kallsyms]  [k] __secure_computing
>   3.67%  bench  [kernel.kallsyms]  [k] lock_is_held
>   3.03%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
> new BPF: 2.58 sec
>  42.05%  bench  libc-2.15.so       [.] syscall
>   6.91%  bench  [kernel.kallsyms]  [k] system_call
>   6.25%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
>   6.07%  bench  [kernel.kallsyms]  [k] __secure_computing
>   5.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
>
> --arm32-- short filter
> old BPF: 4.0 sec
>  39.92%  bench  [kernel.kallsyms]  [k] vector_swi
>  16.60%  bench  [kernel.kallsyms]  [k] sk_run_filter
>  14.66%  bench  libc-2.17.so       [.] syscall
>   5.42%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
>   5.10%  bench  [kernel.kallsyms]  [k] __secure_computing
> new BPF: 3.7 sec
>  35.93%  bench  [kernel.kallsyms]  [k] vector_swi
>  21.89%  bench  libc-2.17.so       [.] syscall
>  13.45%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
>   6.25%  bench  [kernel.kallsyms]  [k] __secure_computing
>   3.96%  bench  [kernel.kallsyms]  [k] syscall_trace_exit
>
> --x86_64-- large filter
> old BPF: 8.6 seconds
>     73.38%    bench  [kernel.kallsyms]  [k] sk_run_filter
>     10.70%    bench  libc-2.15.so       [.] syscall
>      5.09%    bench  [kernel.kallsyms]  [k] seccomp_bpf_load
>      1.97%    bench  [kernel.kallsyms]  [k] system_call
> new BPF: 5.7 seconds
>     66.20%    bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
>     16.75%    bench  libc-2.15.so       [.] syscall
>      3.31%    bench  [kernel.kallsyms]  [k] system_call
>      2.88%    bench  [kernel.kallsyms]  [k] __secure_computing
>
> --i386-- large filter
> old BPF: 5.4 sec
> new BPF: 3.8 sec
>
> --arm32-- large filter
> old BPF: 13.5 sec
>  73.88%  bench  [kernel.kallsyms]  [k] sk_run_filter
>  10.29%  bench  [kernel.kallsyms]  [k] vector_swi
>   6.46%  bench  libc-2.17.so       [.] syscall
>   2.94%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
>   1.19%  bench  [kernel.kallsyms]  [k] __secure_computing
>   0.87%  bench  [kernel.kallsyms]  [k] sys_getuid
> new BPF: 13.5 sec
>  76.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
>  10.98%  bench  [kernel.kallsyms]  [k] vector_swi
>   5.87%  bench  libc-2.17.so       [.] syscall
>   1.77%  bench  [kernel.kallsyms]  [k] __secure_computing
>   0.93%  bench  [kernel.kallsyms]  [k] sys_getuid
>
> BPF filters generated by seccomp are very branchy, so the new
> internal BPF performance is better than the old one. Performance
> gains will be even higher when BPF JIT is committed for the
> new structure, which is planned in future work (as successive
> JIT migrations).
>
> BPF has also been stress-tested with trinity's BPF fuzzer.
>
> Joint work with Daniel Borkmann.
>
> References: http://thread.gmane.org/gmane.linux.kernel/1665858
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <pmoore@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: H. Peter Anvin <hpa@linux.intel.com>
> Cc: linux-kernel@vger.kernel.org

This looks great, thanks for all the seccomp testing!

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  v1 -> v10 history at:
>   - http://thread.gmane.org/gmane.linux.kernel/1665858
>
>  include/linux/filter.h  |   66 ++-
>  include/linux/seccomp.h |    1 -
>  kernel/seccomp.c        |  119 ++--
>  net/core/filter.c       | 1415 +++++++++++++++++++++++++++++++++++++----------
>  4 files changed, 1229 insertions(+), 372 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 9bde3ed..3ea12fa 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -9,13 +9,50 @@
>  #include <linux/workqueue.h>
>  #include <uapi/linux/filter.h>
>
> -#ifdef CONFIG_COMPAT
> -/*
> - * A struct sock_filter is architecture independent.
> +/* Internally used and optimized filter representation with extended
> + * instruction set based on top of classic BPF.
>   */
> +
> +/* instruction classes */
> +#define BPF_ALU64      0x07    /* alu mode in double word width */
> +
> +/* ld/ldx fields */
> +#define BPF_DW         0x18    /* double word */
> +#define BPF_XADD       0xc0    /* exclusive add */
> +
> +/* alu/jmp fields */
> +#define BPF_MOV                0xb0    /* mov reg to reg */
> +#define BPF_ARSH       0xc0    /* sign extending arithmetic shift right */
> +#define BPF_BSWAP      0xd0    /* swap 4 or 8 bytes of 64-bit register */
> +
> +#define BPF_JNE                0x50    /* jump != */
> +#define BPF_JSGT       0x60    /* SGT is signed '>', GT in x86 */
> +#define BPF_JSGE       0x70    /* SGE is signed '>=', GE in x86 */
> +#define BPF_CALL       0x80    /* function call */
> +
> +/* BPF has 10 general purpose 64-bit registers and stack frame. */
> +#define MAX_BPF_REG    11
> +
> +/* BPF program can access up to 512 bytes of stack space. */
> +#define MAX_BPF_STACK  512
> +
> +/* Context and stack frame pointer register positions. */
> +#define CTX_REG                1
> +#define FP_REG         10
> +
> +struct sock_filter_int {
> +       __u8    code;           /* opcode */
> +       __u8    a_reg:4;        /* dest register */
> +       __u8    x_reg:4;        /* source register */
> +       __s16   off;            /* signed offset */
> +       __s32   imm;            /* signed immediate constant */
> +};
> +
> +#ifdef CONFIG_COMPAT
> +/* A struct sock_filter is architecture independent. */
>  struct compat_sock_fprog {
>         u16             len;
> -       compat_uptr_t   filter;         /* struct sock_filter * */
> +       compat_uptr_t   filter; /* struct sock_filter * */
>  };
>  #endif
>
> @@ -26,6 +63,7 @@ struct sock_fprog_kern {
>
>  struct sk_buff;
>  struct sock;
> +struct seccomp_data;
>
>  struct sk_filter {
>         atomic_t                refcnt;
> @@ -34,9 +72,10 @@ struct sk_filter {
>         struct sock_fprog_kern  *orig_prog;     /* Original BPF program */
>         struct rcu_head         rcu;
>         unsigned int            (*bpf_func)(const struct sk_buff *skb,
> -                                           const struct sock_filter *filter);
> +                                           const struct sock_filter_int *filter);
>         union {
> -               struct sock_filter      insns[0];
> +               struct sock_filter      insns[0];
> +               struct sock_filter_int  insnsi[0];
>                 struct work_struct      work;
>         };
>  };
> @@ -50,9 +89,18 @@ static inline unsigned int sk_filter_size(unsigned int proglen)
>  #define sk_filter_proglen(fprog)                       \
>                 (fprog->len * sizeof(fprog->filter[0]))
>
> +#define SK_RUN_FILTER(filter, ctx)                     \
> +               (*filter->bpf_func)(ctx, filter->insnsi)
> +
>  int sk_filter(struct sock *sk, struct sk_buff *skb);
> -unsigned int sk_run_filter(const struct sk_buff *skb,
> -                          const struct sock_filter *filter);
> +
> +u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
> +                             const struct sock_filter_int *insni);
> +u32 sk_run_filter_int_skb(const struct sk_buff *ctx,
> +                         const struct sock_filter_int *insni);
> +
> +int sk_convert_filter(struct sock_filter *prog, int len,
> +                     struct sock_filter_int *new_prog, int *new_len);
>
>  int sk_unattached_filter_create(struct sk_filter **pfp,
>                                 struct sock_fprog *fprog);
> @@ -86,7 +134,6 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
>                 print_hex_dump(KERN_ERR, "JIT code: ", DUMP_PREFIX_OFFSET,
>                                16, 1, image, proglen, false);
>  }
> -#define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
>  #else
>  #include <linux/slab.h>
>  static inline void bpf_jit_compile(struct sk_filter *fp)
> @@ -96,7 +143,6 @@ static inline void bpf_jit_free(struct sk_filter *fp)
>  {
>         kfree(fp);
>  }
> -#define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
>  #endif
>
>  static inline int bpf_tell_extensions(void)
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 6f19cfd..4054b09 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -76,7 +76,6 @@ static inline int seccomp_mode(struct seccomp *s)
>  #ifdef CONFIG_SECCOMP_FILTER
>  extern void put_seccomp_filter(struct task_struct *tsk);
>  extern void get_seccomp_filter(struct task_struct *tsk);
> -extern u32 seccomp_bpf_load(int off);
>  #else  /* CONFIG_SECCOMP_FILTER */
>  static inline void put_seccomp_filter(struct task_struct *tsk)
>  {
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b7a1004..4f18e75 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -55,60 +55,33 @@ struct seccomp_filter {
>         atomic_t usage;
>         struct seccomp_filter *prev;
>         unsigned short len;  /* Instruction count */
> -       struct sock_filter insns[];
> +       struct sock_filter_int insnsi[];
>  };
>
>  /* Limit any path through the tree to 256KB worth of instructions. */
>  #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter))
>
> -/**
> - * get_u32 - returns a u32 offset into data
> - * @data: a unsigned 64 bit value
> - * @index: 0 or 1 to return the first or second 32-bits
> - *
> - * This inline exists to hide the length of unsigned long.  If a 32-bit
> - * unsigned long is passed in, it will be extended and the top 32-bits will be
> - * 0. If it is a 64-bit unsigned long, then whatever data is resident will be
> - * properly returned.
> - *
> +/*
>   * Endianness is explicitly ignored and left for BPF program authors to manage
>   * as per the specific architecture.
>   */
> -static inline u32 get_u32(u64 data, int index)
> +static void populate_seccomp_data(struct seccomp_data *sd)
>  {
> -       return ((u32 *)&data)[index];
> -}
> +       struct task_struct *task = current;
> +       struct pt_regs *regs = task_pt_regs(task);
>
> -/* Helper for bpf_load below. */
> -#define BPF_DATA(_name) offsetof(struct seccomp_data, _name)
> -/**
> - * bpf_load: checks and returns a pointer to the requested offset
> - * @off: offset into struct seccomp_data to load from
> - *
> - * Returns the requested 32-bits of data.
> - * seccomp_check_filter() should assure that @off is 32-bit aligned
> - * and not out of bounds.  Failure to do so is a BUG.
> - */
> -u32 seccomp_bpf_load(int off)
> -{
> -       struct pt_regs *regs = task_pt_regs(current);
> -       if (off == BPF_DATA(nr))
> -               return syscall_get_nr(current, regs);
> -       if (off == BPF_DATA(arch))
> -               return syscall_get_arch(current, regs);
> -       if (off >= BPF_DATA(args[0]) && off < BPF_DATA(args[6])) {
> -               unsigned long value;
> -               int arg = (off - BPF_DATA(args[0])) / sizeof(u64);
> -               int index = !!(off % sizeof(u64));
> -               syscall_get_arguments(current, regs, arg, 1, &value);
> -               return get_u32(value, index);
> -       }
> -       if (off == BPF_DATA(instruction_pointer))
> -               return get_u32(KSTK_EIP(current), 0);
> -       if (off == BPF_DATA(instruction_pointer) + sizeof(u32))
> -               return get_u32(KSTK_EIP(current), 1);
> -       /* seccomp_check_filter should make this impossible. */
> -       BUG();
> +       sd->nr = syscall_get_nr(task, regs);
> +       sd->arch = syscall_get_arch(task, regs);
> +
> +       /* Unroll syscall_get_args to help gcc on arm. */
> +       syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
> +       syscall_get_arguments(task, regs, 1, 1, (unsigned long *) &sd->args[1]);
> +       syscall_get_arguments(task, regs, 2, 1, (unsigned long *) &sd->args[2]);
> +       syscall_get_arguments(task, regs, 3, 1, (unsigned long *) &sd->args[3]);
> +       syscall_get_arguments(task, regs, 4, 1, (unsigned long *) &sd->args[4]);
> +       syscall_get_arguments(task, regs, 5, 1, (unsigned long *) &sd->args[5]);
> +
> +       sd->instruction_pointer = KSTK_EIP(task);
>  }
>
>  /**
> @@ -133,17 +106,17 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>
>                 switch (code) {
>                 case BPF_S_LD_W_ABS:
> -                       ftest->code = BPF_S_ANC_SECCOMP_LD_W;
> +                       ftest->code = BPF_LDX | BPF_W | BPF_ABS;
>                         /* 32-bit aligned and not out of bounds. */
>                         if (k >= sizeof(struct seccomp_data) || k & 3)
>                                 return -EINVAL;
>                         continue;
>                 case BPF_S_LD_W_LEN:
> -                       ftest->code = BPF_S_LD_IMM;
> +                       ftest->code = BPF_LD | BPF_IMM;
>                         ftest->k = sizeof(struct seccomp_data);
>                         continue;
>                 case BPF_S_LDX_W_LEN:
> -                       ftest->code = BPF_S_LDX_IMM;
> +                       ftest->code = BPF_LDX | BPF_IMM;
>                         ftest->k = sizeof(struct seccomp_data);
>                         continue;
>                 /* Explicitly include allowed calls. */
> @@ -185,6 +158,7 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>                 case BPF_S_JMP_JGT_X:
>                 case BPF_S_JMP_JSET_K:
>                 case BPF_S_JMP_JSET_X:
> +                       sk_decode_filter(ftest, ftest);
>                         continue;
>                 default:
>                         return -EINVAL;
> @@ -202,18 +176,21 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>  static u32 seccomp_run_filters(int syscall)
>  {
>         struct seccomp_filter *f;
> +       struct seccomp_data sd;
>         u32 ret = SECCOMP_RET_ALLOW;
>
>         /* Ensure unexpected behavior doesn't result in failing open. */
>         if (WARN_ON(current->seccomp.filter == NULL))
>                 return SECCOMP_RET_KILL;
>
> +       populate_seccomp_data(&sd);
> +
>         /*
>          * All filters in the list are evaluated and the lowest BPF return
>          * value always takes priority (ignoring the DATA).
>          */
>         for (f = current->seccomp.filter; f; f = f->prev) {
> -               u32 cur_ret = sk_run_filter(NULL, f->insns);
> +               u32 cur_ret = sk_run_filter_int_seccomp(&sd, f->insnsi);
>                 if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
>                         ret = cur_ret;
>         }
> @@ -231,6 +208,8 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
>         struct seccomp_filter *filter;
>         unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
>         unsigned long total_insns = fprog->len;
> +       struct sock_filter *fp;
> +       int new_len;
>         long ret;
>
>         if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
> @@ -252,28 +231,43 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
>                                      CAP_SYS_ADMIN) != 0)
>                 return -EACCES;
>
> -       /* Allocate a new seccomp_filter */
> -       filter = kzalloc(sizeof(struct seccomp_filter) + fp_size,
> -                        GFP_KERNEL|__GFP_NOWARN);
> -       if (!filter)
> +       fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN);
> +       if (!fp)
>                 return -ENOMEM;
> -       atomic_set(&filter->usage, 1);
> -       filter->len = fprog->len;
>
>         /* Copy the instructions from fprog. */
>         ret = -EFAULT;
> -       if (copy_from_user(filter->insns, fprog->filter, fp_size))
> -               goto fail;
> +       if (copy_from_user(fp, fprog->filter, fp_size))
> +               goto free_prog;
>
>         /* Check and rewrite the fprog via the skb checker */
> -       ret = sk_chk_filter(filter->insns, filter->len);
> +       ret = sk_chk_filter(fp, fprog->len);
>         if (ret)
> -               goto fail;
> +               goto free_prog;
>
>         /* Check and rewrite the fprog for seccomp use */
> -       ret = seccomp_check_filter(filter->insns, filter->len);
> +       ret = seccomp_check_filter(fp, fprog->len);
> +       if (ret)
> +               goto free_prog;
> +
> +       /* Convert 'sock_filter' insns to 'sock_filter_int' insns */
> +       ret = sk_convert_filter(fp, fprog->len, NULL, &new_len);
> +       if (ret)
> +               goto free_prog;
> +
> +       /* Allocate a new seccomp_filter */
> +       filter = kzalloc(sizeof(struct seccomp_filter) +
> +                        sizeof(struct sock_filter_int) * new_len,
> +                        GFP_KERNEL|__GFP_NOWARN);
> +       if (!filter)
> +               goto free_prog;
> +
> +       ret = sk_convert_filter(fp, fprog->len, filter->insnsi, &new_len);
>         if (ret)
> -               goto fail;
> +               goto free_filter;
> +
> +       atomic_set(&filter->usage, 1);
> +       filter->len = new_len;
>
>         /*
>          * If there is an existing filter, make it the prev and don't drop its
> @@ -282,8 +276,11 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
>         filter->prev = current->seccomp.filter;
>         current->seccomp.filter = filter;
>         return 0;
> -fail:
> +
> +free_filter:
>         kfree(filter);
> +free_prog:
> +       kfree(fp);
>         return ret;
>  }
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 976edc6..683f1e8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1,11 +1,16 @@
>  /*
>   * Linux Socket Filter - Kernel level socket filtering
>   *
> - * Author:
> - *     Jay Schulist <jschlst@samba.org>
> + * Based on the design of the Berkeley Packet Filter. The new
> + * internal format has been designed by PLUMgrid:
>   *
> - * Based on the design of:
> - *     - The Berkeley Packet Filter
> + *     Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
> + *
> + * Authors:
> + *
> + *     Jay Schulist <jschlst@samba.org>
> + *     Alexei Starovoitov <ast@plumgrid.com>
> + *     Daniel Borkmann <dborkman@redhat.com>
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -35,6 +40,7 @@
>  #include <linux/timer.h>
>  #include <asm/uaccess.h>
>  #include <asm/unaligned.h>
> +#include <asm/byteorder.h>
>  #include <linux/filter.h>
>  #include <linux/ratelimit.h>
>  #include <linux/seccomp.h>
> @@ -108,304 +114,1002 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(sk_filter);
>
> +/* Base function for offset calculation. Needs to go into .text section,
> + * therefore keeping it non-static as well; will also be used by JITs
> + * anyway later on, so do not let the compiler omit it.
> + */
> +noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> +{
> +       return 0;
> +}
> +
>  /**
> - *     sk_run_filter - run a filter on a socket
> - *     @skb: buffer to run the filter on
> + *     __sk_run_filter - run a filter on a given context
> + *     @ctx: buffer to run the filter on
>   *     @fentry: filter to apply
>   *
> - * Decode and apply filter instructions to the skb->data.
> - * Return length to keep, 0 for none. @skb is the data we are
> - * filtering, @filter is the array of filter instructions.
> - * Because all jumps are guaranteed to be before last instruction,
> - * and last instruction guaranteed to be a RET, we dont need to check
> - * flen. (We used to pass to this function the length of filter)
> + * Decode and apply filter instructions to the skb->data. Return length to
> + * keep, 0 for none. @ctx is the data we are operating on, @filter is the
> + * array of filter instructions.
>   */
> -unsigned int sk_run_filter(const struct sk_buff *skb,
> -                          const struct sock_filter *fentry)
> +unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
>  {
> +       u64 stack[MAX_BPF_STACK / sizeof(u64)];
> +       u64 regs[MAX_BPF_REG], tmp;
>         void *ptr;
> -       u32 A = 0;                      /* Accumulator */
> -       u32 X = 0;                      /* Index Register */
> -       u32 mem[BPF_MEMWORDS];          /* Scratch Memory Store */
> -       u32 tmp;
> -       int k;
> +       int off;
> +
> +#define K insn->imm
> +#define A regs[insn->a_reg]
> +#define X regs[insn->x_reg]
> +
> +#define CONT    ({insn++; goto select_insn; })
> +#define CONT_JMP ({insn++; goto select_insn; })
> +
> +       static const void *jumptable[256] = {
> +               [0 ... 255] = &&default_label,
> +               /* Overwrite non-defaults ... */
> +#define DL(A, B, C)    [A|B|C] = &&A##_##B##_##C
> +               DL(BPF_ALU, BPF_ADD, BPF_X),
> +               DL(BPF_ALU, BPF_ADD, BPF_K),
> +               DL(BPF_ALU, BPF_SUB, BPF_X),
> +               DL(BPF_ALU, BPF_SUB, BPF_K),
> +               DL(BPF_ALU, BPF_AND, BPF_X),
> +               DL(BPF_ALU, BPF_AND, BPF_K),
> +               DL(BPF_ALU, BPF_OR, BPF_X),
> +               DL(BPF_ALU, BPF_OR, BPF_K),
> +               DL(BPF_ALU, BPF_LSH, BPF_X),
> +               DL(BPF_ALU, BPF_LSH, BPF_K),
> +               DL(BPF_ALU, BPF_RSH, BPF_X),
> +               DL(BPF_ALU, BPF_RSH, BPF_K),
> +               DL(BPF_ALU, BPF_XOR, BPF_X),
> +               DL(BPF_ALU, BPF_XOR, BPF_K),
> +               DL(BPF_ALU, BPF_MUL, BPF_X),
> +               DL(BPF_ALU, BPF_MUL, BPF_K),
> +               DL(BPF_ALU, BPF_MOV, BPF_X),
> +               DL(BPF_ALU, BPF_MOV, BPF_K),
> +               DL(BPF_ALU, BPF_DIV, BPF_X),
> +               DL(BPF_ALU, BPF_DIV, BPF_K),
> +               DL(BPF_ALU, BPF_MOD, BPF_X),
> +               DL(BPF_ALU, BPF_MOD, BPF_K),
> +               DL(BPF_ALU, BPF_BSWAP, BPF_X),
> +               DL(BPF_ALU, BPF_NEG, 0),
> +               DL(BPF_ALU64, BPF_ADD, BPF_X),
> +               DL(BPF_ALU64, BPF_ADD, BPF_K),
> +               DL(BPF_ALU64, BPF_SUB, BPF_X),
> +               DL(BPF_ALU64, BPF_SUB, BPF_K),
> +               DL(BPF_ALU64, BPF_AND, BPF_X),
> +               DL(BPF_ALU64, BPF_AND, BPF_K),
> +               DL(BPF_ALU64, BPF_OR, BPF_X),
> +               DL(BPF_ALU64, BPF_OR, BPF_K),
> +               DL(BPF_ALU64, BPF_LSH, BPF_X),
> +               DL(BPF_ALU64, BPF_LSH, BPF_K),
> +               DL(BPF_ALU64, BPF_RSH, BPF_X),
> +               DL(BPF_ALU64, BPF_RSH, BPF_K),
> +               DL(BPF_ALU64, BPF_XOR, BPF_X),
> +               DL(BPF_ALU64, BPF_XOR, BPF_K),
> +               DL(BPF_ALU64, BPF_MUL, BPF_X),
> +               DL(BPF_ALU64, BPF_MUL, BPF_K),
> +               DL(BPF_ALU64, BPF_MOV, BPF_X),
> +               DL(BPF_ALU64, BPF_MOV, BPF_K),
> +               DL(BPF_ALU64, BPF_ARSH, BPF_X),
> +               DL(BPF_ALU64, BPF_ARSH, BPF_K),
> +               DL(BPF_ALU64, BPF_DIV, BPF_X),
> +               DL(BPF_ALU64, BPF_DIV, BPF_K),
> +               DL(BPF_ALU64, BPF_MOD, BPF_X),
> +               DL(BPF_ALU64, BPF_MOD, BPF_K),
> +               DL(BPF_ALU64, BPF_BSWAP, BPF_X),
> +               DL(BPF_ALU64, BPF_NEG, 0),
> +               DL(BPF_JMP, BPF_CALL, 0),
> +               DL(BPF_JMP, BPF_JA, 0),
> +               DL(BPF_JMP, BPF_JEQ, BPF_X),
> +               DL(BPF_JMP, BPF_JEQ, BPF_K),
> +               DL(BPF_JMP, BPF_JNE, BPF_X),
> +               DL(BPF_JMP, BPF_JNE, BPF_K),
> +               DL(BPF_JMP, BPF_JGT, BPF_X),
> +               DL(BPF_JMP, BPF_JGT, BPF_K),
> +               DL(BPF_JMP, BPF_JGE, BPF_X),
> +               DL(BPF_JMP, BPF_JGE, BPF_K),
> +               DL(BPF_JMP, BPF_JSGT, BPF_X),
> +               DL(BPF_JMP, BPF_JSGT, BPF_K),
> +               DL(BPF_JMP, BPF_JSGE, BPF_X),
> +               DL(BPF_JMP, BPF_JSGE, BPF_K),
> +               DL(BPF_JMP, BPF_JSET, BPF_X),
> +               DL(BPF_JMP, BPF_JSET, BPF_K),
> +               DL(BPF_STX, BPF_MEM, BPF_B),
> +               DL(BPF_STX, BPF_MEM, BPF_H),
> +               DL(BPF_STX, BPF_MEM, BPF_W),
> +               DL(BPF_STX, BPF_MEM, BPF_DW),
> +               DL(BPF_ST, BPF_MEM, BPF_B),
> +               DL(BPF_ST, BPF_MEM, BPF_H),
> +               DL(BPF_ST, BPF_MEM, BPF_W),
> +               DL(BPF_ST, BPF_MEM, BPF_DW),
> +               DL(BPF_LDX, BPF_MEM, BPF_B),
> +               DL(BPF_LDX, BPF_MEM, BPF_H),
> +               DL(BPF_LDX, BPF_MEM, BPF_W),
> +               DL(BPF_LDX, BPF_MEM, BPF_DW),
> +               DL(BPF_STX, BPF_XADD, BPF_W),
> +               DL(BPF_STX, BPF_XADD, BPF_DW),
> +               DL(BPF_LD, BPF_ABS, BPF_W),
> +               DL(BPF_LD, BPF_ABS, BPF_H),
> +               DL(BPF_LD, BPF_ABS, BPF_B),
> +               DL(BPF_LD, BPF_IND, BPF_W),
> +               DL(BPF_LD, BPF_IND, BPF_H),
> +               DL(BPF_LD, BPF_IND, BPF_B),
> +               DL(BPF_RET, BPF_K, 0),
> +#undef DL
> +       };
>
> -       /*
> -        * Process array of filter instructions.
> -        */
> -       for (;; fentry++) {
> -#if defined(CONFIG_X86_32)
> -#define        K (fentry->k)
> -#else
> -               const u32 K = fentry->k;
> -#endif
> -
> -               switch (fentry->code) {
> -               case BPF_S_ALU_ADD_X:
> -                       A += X;
> -                       continue;
> -               case BPF_S_ALU_ADD_K:
> -                       A += K;
> -                       continue;
> -               case BPF_S_ALU_SUB_X:
> -                       A -= X;
> -                       continue;
> -               case BPF_S_ALU_SUB_K:
> -                       A -= K;
> -                       continue;
> -               case BPF_S_ALU_MUL_X:
> -                       A *= X;
> -                       continue;
> -               case BPF_S_ALU_MUL_K:
> -                       A *= K;
> -                       continue;
> -               case BPF_S_ALU_DIV_X:
> -                       if (X == 0)
> -                               return 0;
> -                       A /= X;
> -                       continue;
> -               case BPF_S_ALU_DIV_K:
> -                       A /= K;
> -                       continue;
> -               case BPF_S_ALU_MOD_X:
> -                       if (X == 0)
> -                               return 0;
> -                       A %= X;
> -                       continue;
> -               case BPF_S_ALU_MOD_K:
> -                       A %= K;
> -                       continue;
> -               case BPF_S_ALU_AND_X:
> -                       A &= X;
> -                       continue;
> -               case BPF_S_ALU_AND_K:
> -                       A &= K;
> -                       continue;
> -               case BPF_S_ALU_OR_X:
> -                       A |= X;
> -                       continue;
> -               case BPF_S_ALU_OR_K:
> -                       A |= K;
> -                       continue;
> -               case BPF_S_ANC_ALU_XOR_X:
> -               case BPF_S_ALU_XOR_X:
> -                       A ^= X;
> -                       continue;
> -               case BPF_S_ALU_XOR_K:
> -                       A ^= K;
> -                       continue;
> -               case BPF_S_ALU_LSH_X:
> -                       A <<= X;
> -                       continue;
> -               case BPF_S_ALU_LSH_K:
> -                       A <<= K;
> -                       continue;
> -               case BPF_S_ALU_RSH_X:
> -                       A >>= X;
> -                       continue;
> -               case BPF_S_ALU_RSH_K:
> -                       A >>= K;
> -                       continue;
> -               case BPF_S_ALU_NEG:
> -                       A = -A;
> -                       continue;
> -               case BPF_S_JMP_JA:
> -                       fentry += K;
> -                       continue;
> -               case BPF_S_JMP_JGT_K:
> -                       fentry += (A > K) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JGE_K:
> -                       fentry += (A >= K) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JEQ_K:
> -                       fentry += (A == K) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JSET_K:
> -                       fentry += (A & K) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JGT_X:
> -                       fentry += (A > X) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JGE_X:
> -                       fentry += (A >= X) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JEQ_X:
> -                       fentry += (A == X) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_JMP_JSET_X:
> -                       fentry += (A & X) ? fentry->jt : fentry->jf;
> -                       continue;
> -               case BPF_S_LD_W_ABS:
> -                       k = K;
> -load_w:
> -                       ptr = load_pointer(skb, k, 4, &tmp);
> -                       if (ptr != NULL) {
> -                               A = get_unaligned_be32(ptr);
> -                               continue;
> -                       }
> -                       return 0;
> -               case BPF_S_LD_H_ABS:
> -                       k = K;
> -load_h:
> -                       ptr = load_pointer(skb, k, 2, &tmp);
> -                       if (ptr != NULL) {
> -                               A = get_unaligned_be16(ptr);
> -                               continue;
> +       regs[FP_REG]  = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
> +       regs[CTX_REG] = (u64) (unsigned long) ctx;
> +
> +select_insn:
> +       goto *jumptable[insn->code];
> +
> +       /* ALU */
> +#define ALU(OPCODE, OP)                        \
> +       BPF_ALU64_##OPCODE##_BPF_X:     \
> +               A = A OP X;             \
> +               CONT;                   \
> +       BPF_ALU_##OPCODE##_BPF_X:       \
> +               A = (u32) A OP (u32) X; \
> +               CONT;                   \
> +       BPF_ALU64_##OPCODE##_BPF_K:     \
> +               A = A OP K;             \
> +               CONT;                   \
> +       BPF_ALU_##OPCODE##_BPF_K:       \
> +               A = (u32) A OP (u32) K; \
> +               CONT;
> +
> +       ALU(BPF_ADD,  +)
> +       ALU(BPF_SUB,  -)
> +       ALU(BPF_AND,  &)
> +       ALU(BPF_OR,   |)
> +       ALU(BPF_LSH, <<)
> +       ALU(BPF_RSH, >>)
> +       ALU(BPF_XOR,  ^)
> +       ALU(BPF_MUL,  *)
> +#undef ALU
> +       BPF_ALU_BPF_NEG_0:
> +               A = (u32) -A;
> +               CONT;
> +       BPF_ALU64_BPF_NEG_0:
> +               A = -A;
> +               CONT;
> +       BPF_ALU_BPF_MOV_BPF_X:
> +               A = (u32) X;
> +               CONT;
> +       BPF_ALU_BPF_MOV_BPF_K:
> +               A = (u32) K;
> +               CONT;
> +       BPF_ALU64_BPF_MOV_BPF_X:
> +               A = X;
> +               CONT;
> +       BPF_ALU64_BPF_MOV_BPF_K:
> +               A = K;
> +               CONT;
> +       BPF_ALU64_BPF_ARSH_BPF_X:
> +               (*(s64 *) &A) >>= X;
> +               CONT;
> +       BPF_ALU64_BPF_ARSH_BPF_K:
> +               (*(s64 *) &A) >>= K;
> +               CONT;
> +       BPF_ALU64_BPF_MOD_BPF_X:
> +               tmp = A;
> +               if (X)
> +                       A = do_div(tmp, X);
> +               CONT;
> +       BPF_ALU_BPF_MOD_BPF_X:
> +               tmp = (u32) A;
> +               if (X)
> +                       A = do_div(tmp, (u32) X);
> +               CONT;
> +       BPF_ALU64_BPF_MOD_BPF_K:
> +               tmp = A;
> +               if (K)
> +                       A = do_div(tmp, K);
> +               CONT;
> +       BPF_ALU_BPF_MOD_BPF_K:
> +               tmp = (u32) A;
> +               if (K)
> +                       A = do_div(tmp, (u32) K);
> +               CONT;
> +       BPF_ALU64_BPF_DIV_BPF_X:
> +               if (X)
> +                       do_div(A, X);
> +               CONT;
> +       BPF_ALU_BPF_DIV_BPF_X:
> +               tmp = (u32) A;
> +               if (X)
> +                       do_div(tmp, (u32) X);
> +               A = (u32) tmp;
> +               CONT;
> +       BPF_ALU64_BPF_DIV_BPF_K:
> +               if (K)
> +                       do_div(A, K);
> +               CONT;
> +       BPF_ALU_BPF_DIV_BPF_K:
> +               tmp = (u32) A;
> +               if (K)
> +                       do_div(tmp, (u32) K);
> +               A = (u32) tmp;
> +               CONT;
> +       BPF_ALU_BPF_BSWAP_BPF_X:
> +               A = swab32(A);
> +               CONT;
> +       BPF_ALU64_BPF_BSWAP_BPF_X:
> +               A = swab64(A);
> +               CONT;
> +
> +       /* CALL */
> +       BPF_JMP_BPF_CALL_0:
> +               regs[0] = (__bpf_call_base + insn->imm)(regs[1], regs[2],
> +                                                       regs[3], regs[4],
> +                                                       regs[5]);
> +               CONT;
> +
> +       /* JMP */
> +       BPF_JMP_BPF_JA_0:
> +               insn += insn->off;
> +               CONT;
> +       BPF_JMP_BPF_JEQ_BPF_X:
> +               if (A == X) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JEQ_BPF_K:
> +               if (A == K) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JNE_BPF_X:
> +               if (A != X) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JNE_BPF_K:
> +               if (A != K) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JGT_BPF_X:
> +               if (A > X) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JGT_BPF_K:
> +               if (A > K) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JGE_BPF_X:
> +               if (A >= X) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JGE_BPF_K:
> +               if (A >= K) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSGT_BPF_X:
> +               if (((s64)A) > ((s64)X)) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSGT_BPF_K:
> +               if (((s64)A) > ((s64)K)) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSGE_BPF_X:
> +               if (((s64)A) >= ((s64)X)) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSGE_BPF_K:
> +               if (((s64)A) >= ((s64)K)) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSET_BPF_X:
> +               if (A & X) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +       BPF_JMP_BPF_JSET_BPF_K:
> +               if (A & K) {
> +                       insn += insn->off;
> +                       CONT_JMP;
> +               }
> +               CONT;
> +
> +       /* STX and ST and LDX*/
> +#define LDST(SIZEOP, SIZE)                                     \
> +       BPF_STX_BPF_MEM_##SIZEOP:                               \
> +               *(SIZE *)(unsigned long) (A + insn->off) = X;   \
> +               CONT;                                           \
> +       BPF_ST_BPF_MEM_##SIZEOP:                                \
> +               *(SIZE *)(unsigned long) (A + insn->off) = K;   \
> +               CONT;                                           \
> +       BPF_LDX_BPF_MEM_##SIZEOP:                               \
> +               A = *(SIZE *)(unsigned long) (X + insn->off);   \
> +               CONT;
> +
> +       LDST(BPF_B,   u8)
> +       LDST(BPF_H,  u16)
> +       LDST(BPF_W,  u32)
> +       LDST(BPF_DW, u64)
> +#undef LDST
> +       BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
> +               atomic_add((u32) X, (atomic_t *)(unsigned long)
> +                          (A + insn->off));
> +               CONT;
> +       BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
> +               atomic64_add((u64) X, (atomic64_t *)(unsigned long)
> +                            (A + insn->off));
> +               CONT;
> +       BPF_LD_BPF_ABS_BPF_W: /* A = *(u32 *)(ctx + K) */
> +               off = K;
> +load_word:
> +               /* BPF_LD + BPD_ABS and BPF_LD + BPF_IND insns are only
> +                * appearing in the programs where ctx == skb.
> +                */
> +               ptr = load_pointer((struct sk_buff *) ctx, off, 4, &tmp);
> +               if (likely(ptr != NULL)) {
> +                       A = get_unaligned_be32(ptr);
> +                       CONT;
> +               }
> +               return 0;
> +       BPF_LD_BPF_ABS_BPF_H: /* A = *(u16 *)(ctx + K) */
> +               off = K;
> +load_half:
> +               ptr = load_pointer((struct sk_buff *) ctx, off, 2, &tmp);
> +               if (likely(ptr != NULL)) {
> +                       A = get_unaligned_be16(ptr);
> +                       CONT;
> +               }
> +               return 0;
> +
> +       BPF_LD_BPF_ABS_BPF_B: /* A = *(u8 *)(ctx + K) */
> +               off = K;
> +load_byte:
> +               ptr = load_pointer((struct sk_buff *) ctx, off, 1, &tmp);
> +               if (likely(ptr != NULL)) {
> +                       A = *(u8 *)ptr;
> +                       CONT;
> +               }
> +               return 0;
> +       BPF_LD_BPF_IND_BPF_W: /* A = *(u32 *)(ctx + X + K) */
> +               off = K + X;
> +               goto load_word;
> +       BPF_LD_BPF_IND_BPF_H: /* A = *(u16 *)(ctx + X + K) */
> +               off = K + X;
> +               goto load_half;
> +       BPF_LD_BPF_IND_BPF_B: /* A = *(u8 *)(ctx + X + K) */
> +               off = K + X;
> +               goto load_byte;
> +
> +       /* RET */
> +       BPF_RET_BPF_K_0:
> +               return regs[0 /* R0 */];
> +
> +       default_label:
> +               /* If we ever reach this, we have a bug somewhere. */
> +               WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
> +               return 0;
> +#undef CONT_JMP
> +#undef CONT
> +#undef A
> +#undef X
> +#undef K
> +}
> +
> +u32 sk_run_filter_int_seccomp(const struct seccomp_data *ctx,
> +                             const struct sock_filter_int *insni)
> +    __attribute__ ((alias ("__sk_run_filter")));
> +
> +u32 sk_run_filter_int_skb(const struct sk_buff *ctx,
> +                         const struct sock_filter_int *insni)
> +    __attribute__ ((alias ("__sk_run_filter")));
> +EXPORT_SYMBOL_GPL(sk_run_filter_int_skb);
> +
> +/* Helper to find the offset of pkt_type in sk_buff structure. We want
> + * to make sure its still a 3bit field starting at a byte boundary;
> + * taken from arch/x86/net/bpf_jit_comp.c.
> + */
> +#define PKT_TYPE_MAX   7
> +static unsigned int pkt_type_offset(void)
> +{
> +       struct sk_buff skb_probe = { .pkt_type = ~0, };
> +       u8 *ct = (u8 *) &skb_probe;
> +       unsigned int off;
> +
> +       for (off = 0; off < sizeof(struct sk_buff); off++) {
> +               if (ct[off] == PKT_TYPE_MAX)
> +                       return off;
> +       }
> +
> +       pr_err_once("Please fix %s, as pkt_type couldn't be found!\n", __func__);
> +       return -1;
> +}
> +
> +static u64 __skb_get_pay_offset(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
> +{
> +       struct sk_buff *skb = (struct sk_buff *)(long) ctx;
> +
> +       return __skb_get_poff(skb);
> +}
> +
> +static u64 __skb_get_nlattr(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
> +{
> +       struct sk_buff *skb = (struct sk_buff *)(long) ctx;
> +       struct nlattr *nla;
> +
> +       if (skb_is_nonlinear(skb))
> +               return 0;
> +
> +       if (A > skb->len - sizeof(struct nlattr))
> +               return 0;
> +
> +       nla = nla_find((struct nlattr *) &skb->data[A], skb->len - A, X);
> +       if (nla)
> +               return (void *) nla - (void *) skb->data;
> +
> +       return 0;
> +}
> +
> +static u64 __skb_get_nlattr_nest(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
> +{
> +       struct sk_buff *skb = (struct sk_buff *)(long) ctx;
> +       struct nlattr *nla;
> +
> +       if (skb_is_nonlinear(skb))
> +               return 0;
> +
> +       if (A > skb->len - sizeof(struct nlattr))
> +               return 0;
> +
> +       nla = (struct nlattr *) &skb->data[A];
> +       if (nla->nla_len > A - skb->len)
> +               return 0;
> +
> +       nla = nla_find_nested(nla, X);
> +       if (nla)
> +               return (void *) nla - (void *) skb->data;
> +
> +       return 0;
> +}
> +
> +static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
> +{
> +       return raw_smp_processor_id();
> +}
> +
> +/* Register mappings for user programs. */
> +#define A_REG          6
> +#define X_REG          7
> +#define TMP_REG                8
> +
> +static bool convert_bpf_extensions(struct sock_filter *fp,
> +                                  struct sock_filter_int **insnp)
> +{
> +       struct sock_filter_int *insn = *insnp;
> +
> +       switch (fp->k) {
> +       case SKF_AD_OFF + SKF_AD_PROTOCOL:
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, protocol) != 2);
> +
> +               insn->code = BPF_LDX | BPF_MEM | BPF_H;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, protocol);
> +#ifdef  __LITTLE_ENDIAN
> +               insn++;
> +
> +               /* A = swab32(A) */
> +               insn->code = BPF_ALU | BPF_BSWAP | BPF_X;
> +               insn->a_reg = A_REG;
> +               insn++;
> +
> +               /* A >>= 16 */
> +               insn->code = BPF_ALU | BPF_RSH | BPF_K;
> +               insn->a_reg = A_REG;
> +               insn->imm = 16;
> +#endif /* __LITTLE_ENDIAN */
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_PKTTYPE:
> +               insn->code = BPF_LDX | BPF_MEM | BPF_B;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = pkt_type_offset();
> +               if (insn->off < 0)
> +                       return false;
> +               insn++;
> +
> +               insn->code = BPF_ALU | BPF_AND | BPF_K;
> +               insn->a_reg = A_REG;
> +               insn->imm = PKT_TYPE_MAX;
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_IFINDEX:
> +       case SKF_AD_OFF + SKF_AD_HATYPE:
> +               if (FIELD_SIZEOF(struct sk_buff, dev) == 8)
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_DW;
> +               else
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +               insn->a_reg = TMP_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, dev);
> +               insn++;
> +
> +               insn->code = BPF_JMP | BPF_JNE | BPF_K;
> +               insn->a_reg = TMP_REG;
> +               insn->imm = 0;
> +               insn->off = 1;
> +               insn++;
> +
> +               insn->code = BPF_RET | BPF_K;
> +               insn++;
> +
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, ifindex) != 4);
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct net_device, type) != 2);
> +
> +               insn->a_reg = A_REG;
> +               insn->x_reg = TMP_REG;
> +
> +               if (fp->k == SKF_AD_OFF + SKF_AD_IFINDEX) {
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +                       insn->off = offsetof(struct net_device, ifindex);
> +               } else {
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_H;
> +                       insn->off = offsetof(struct net_device, type);
> +               }
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_MARK:
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, mark) != 4);
> +
> +               insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, mark);
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_RXHASH:
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, rxhash) != 4);
> +
> +               insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, rxhash);
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_QUEUE:
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, queue_mapping) != 2);
> +
> +               insn->code = BPF_LDX | BPF_MEM | BPF_H;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, queue_mapping);
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_VLAN_TAG:
> +       case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 2);
> +
> +               insn->code = BPF_LDX | BPF_MEM | BPF_H;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = CTX_REG;
> +               insn->off = offsetof(struct sk_buff, vlan_tci);
> +               insn++;
> +
> +               BUILD_BUG_ON(VLAN_TAG_PRESENT != 0x1000);
> +
> +               if (fp->k == SKF_AD_OFF + SKF_AD_VLAN_TAG) {
> +                       insn->code = BPF_ALU | BPF_AND | BPF_K;
> +                       insn->a_reg = A_REG;
> +                       insn->imm = ~VLAN_TAG_PRESENT;
> +               } else {
> +                       insn->code = BPF_ALU | BPF_RSH | BPF_K;
> +                       insn->a_reg = A_REG;
> +                       insn->imm = 12;
> +                       insn++;
> +
> +                       insn->code = BPF_ALU | BPF_AND | BPF_K;
> +                       insn->a_reg = A_REG;
> +                       insn->imm = 1;
> +               }
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
> +       case SKF_AD_OFF + SKF_AD_NLATTR:
> +       case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
> +       case SKF_AD_OFF + SKF_AD_CPU:
> +               /* Save ctx */
> +               insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +               insn->a_reg = TMP_REG;
> +               insn->x_reg = CTX_REG;
> +               insn++;
> +
> +               /* arg2 = A */
> +               insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +               insn->a_reg = 2;
> +               insn->x_reg = A_REG;
> +               insn++;
> +
> +               /* arg3 = X */
> +               insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +               insn->a_reg = 3;
> +               insn->x_reg = X_REG;
> +               insn++;
> +
> +               /* Emit call(ctx, arg2=A, arg3=X) */
> +               insn->code = BPF_JMP | BPF_CALL;
> +               /* imm = offset of the helper from __bpf_call_base. */
> +               switch (fp->k) {
> +               case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
> +                       insn->imm = __skb_get_pay_offset - __bpf_call_base;
> +                       break;
> +               case SKF_AD_OFF + SKF_AD_NLATTR:
> +                       insn->imm = __skb_get_nlattr - __bpf_call_base;
> +                       break;
> +               case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
> +                       insn->imm = __skb_get_nlattr_nest - __bpf_call_base;
> +                       break;
> +               case SKF_AD_OFF + SKF_AD_CPU:
> +                       insn->imm = __get_raw_cpu_id - __bpf_call_base;
> +                       break;
> +               }
> +               insn++;
> +
> +               /* Restore ctx */
> +               insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +               insn->a_reg = CTX_REG;
> +               insn->x_reg = TMP_REG;
> +               insn++;
> +
> +               /* Move ret value into A_REG */
> +               insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = 0;
> +               break;
> +
> +       case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
> +               insn->code = BPF_ALU | BPF_XOR | BPF_X;
> +               insn->a_reg = A_REG;
> +               insn->x_reg = X_REG;
> +               break;
> +
> +       default:
> +               /* This is just a dummy call to avoid letting the compiler
> +                * evict __bpf_call_base() as an optimization. Placed here
> +                * where no-one bothers.
> +                */
> +               BUG_ON(__bpf_call_base(0, 0, 0, 0, 0) != 0);
> +               return false;
> +       }
> +
> +       *insnp = insn;
> +       return true;
> +}
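
For reference, the kind of classic BPF program that exercises this conversion
path is an ancillary load, e.g. of SKF_AD_CPU; convert_bpf_extensions() turns
the load below into the save-ctx / set-args / BPF_CALL / restore sequence from
the SKF_AD_CPU case above (illustrative snippet only, not taken from the patch):

  struct sock_filter prog[] = {
          /* A = raw_smp_processor_id() via the CPU ancillary offset */
          BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_CPU),
          /* return A */
          BPF_STMT(BPF_RET | BPF_A, 0),
  };
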
> +
> +/**
> + *     sk_convert_filter - convert filter program
> + *     @prog: the user passed filter program
> + *     @len: the length of the user passed filter program
> + *     @new_prog: buffer where converted program will be stored
> + *     @new_len: pointer to store length of converted program
> + *
> + * Remap 'sock_filter' style BPF instruction set to 'sock_filter_int' style.
> + * Conversion workflow:
> + *
> + * 1) First pass for calculating the new program length:
> + *   sk_convert_filter(old_prog, old_len, NULL, &new_len)
> + *
> + * 2) 2nd call remaps in two internal passes: the 1st pass finds the
> + *    new jump offsets, the 2nd pass does the actual remapping:
> + *   new_prog = kmalloc(sizeof(struct sock_filter_int) * new_len);
> + *   sk_convert_filter(old_prog, old_len, new_prog, &new_len);
> + *
> + * User BPF's register A is mapped to our BPF register 6, user BPF
> + * register X is mapped to BPF register 7; frame pointer is always
> + * register 10; Context 'void *ctx' is stored in register 1, that is,
> + * for socket filters: ctx == 'struct sk_buff *', for seccomp:
> + * ctx == 'struct seccomp_data *'.
> + */
> +int sk_convert_filter(struct sock_filter *prog, int len,
> +                     struct sock_filter_int *new_prog, int *new_len)
> +{
> +       int new_flen = 0, pass = 0, target, i;
> +       struct sock_filter_int *new_insn;
> +       struct sock_filter *fp;
> +       int *addrs = NULL;
> +       u8 bpf_src;
> +
> +       BUILD_BUG_ON(BPF_MEMWORDS * sizeof(u32) > MAX_BPF_STACK);
> +       BUILD_BUG_ON(FP_REG + 1 != MAX_BPF_REG);
> +
> +       if (len <= 0 || len >= BPF_MAXINSNS)
> +               return -EINVAL;
> +
> +       if (new_prog) {
> +               addrs = kzalloc(len * sizeof(*addrs), GFP_KERNEL);
> +               if (!addrs)
> +                       return -ENOMEM;
> +       }
> +
> +do_pass:
> +       new_insn = new_prog;
> +       fp = prog;
> +
> +       for (i = 0; i < len; fp++, i++) {
> +               struct sock_filter_int tmp_insns[6] = { };
> +               struct sock_filter_int *insn = tmp_insns;
> +
> +               if (addrs)
> +                       addrs[i] = new_insn - new_prog;
> +
> +               switch (fp->code) {
> +               /* All arithmetic insns and skb loads map as-is. */
> +               case BPF_ALU | BPF_ADD | BPF_X:
> +               case BPF_ALU | BPF_ADD | BPF_K:
> +               case BPF_ALU | BPF_SUB | BPF_X:
> +               case BPF_ALU | BPF_SUB | BPF_K:
> +               case BPF_ALU | BPF_AND | BPF_X:
> +               case BPF_ALU | BPF_AND | BPF_K:
> +               case BPF_ALU | BPF_OR | BPF_X:
> +               case BPF_ALU | BPF_OR | BPF_K:
> +               case BPF_ALU | BPF_LSH | BPF_X:
> +               case BPF_ALU | BPF_LSH | BPF_K:
> +               case BPF_ALU | BPF_RSH | BPF_X:
> +               case BPF_ALU | BPF_RSH | BPF_K:
> +               case BPF_ALU | BPF_XOR | BPF_X:
> +               case BPF_ALU | BPF_XOR | BPF_K:
> +               case BPF_ALU | BPF_MUL | BPF_X:
> +               case BPF_ALU | BPF_MUL | BPF_K:
> +               case BPF_ALU | BPF_DIV | BPF_X:
> +               case BPF_ALU | BPF_DIV | BPF_K:
> +               case BPF_ALU | BPF_MOD | BPF_X:
> +               case BPF_ALU | BPF_MOD | BPF_K:
> +               case BPF_ALU | BPF_NEG:
> +               case BPF_LD | BPF_ABS | BPF_W:
> +               case BPF_LD | BPF_ABS | BPF_H:
> +               case BPF_LD | BPF_ABS | BPF_B:
> +               case BPF_LD | BPF_IND | BPF_W:
> +               case BPF_LD | BPF_IND | BPF_H:
> +               case BPF_LD | BPF_IND | BPF_B:
> +                       /* Check for overloaded BPF extension and
> +                        * directly convert it if found, otherwise
> +                        * just move on with mapping.
> +                        */
> +                       if (BPF_CLASS(fp->code) == BPF_LD &&
> +                           BPF_MODE(fp->code) == BPF_ABS &&
> +                           convert_bpf_extensions(fp, &insn))
> +                               break;
> +
> +                       insn->code = fp->code;
> +                       insn->a_reg = A_REG;
> +                       insn->x_reg = X_REG;
> +                       insn->imm = fp->k;
> +                       break;
> +
> +               /* Jump opcodes map as-is, but offsets need adjustment. */
> +               case BPF_JMP | BPF_JA:
> +                       target = i + fp->k + 1;
> +                       insn->code = fp->code;
> +#define EMIT_JMP                                                       \
> +       do {                                                            \
> +               if (target >= len || target < 0)                        \
> +                       goto err;                                       \
> +               insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0;   \
> +               /* Adjust pc relative offset for 2nd or 3rd insn. */    \
> +               insn->off -= insn - tmp_insns;                          \
> +       } while (0)
> +
> +                       EMIT_JMP;
> +                       break;
> +
> +               case BPF_JMP | BPF_JEQ | BPF_K:
> +               case BPF_JMP | BPF_JEQ | BPF_X:
> +               case BPF_JMP | BPF_JSET | BPF_K:
> +               case BPF_JMP | BPF_JSET | BPF_X:
> +               case BPF_JMP | BPF_JGT | BPF_K:
> +               case BPF_JMP | BPF_JGT | BPF_X:
> +               case BPF_JMP | BPF_JGE | BPF_K:
> +               case BPF_JMP | BPF_JGE | BPF_X:
> +                       if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
> +                               /* BPF immediates are signed, zero extend
> +                                * immediate into tmp register and use it
> +                                * in compare insn.
> +                                */
> +                               insn->code = BPF_ALU | BPF_MOV | BPF_K;
> +                               insn->a_reg = TMP_REG;
> +                               insn->imm = fp->k;
> +                               insn++;
> +
> +                               insn->a_reg = A_REG;
> +                               insn->x_reg = TMP_REG;
> +                               bpf_src = BPF_X;
> +                       } else {
> +                               insn->a_reg = A_REG;
> +                               insn->x_reg = X_REG;
> +                               insn->imm = fp->k;
> +                               bpf_src = BPF_SRC(fp->code);
>                         }
> -                       return 0;
> -               case BPF_S_LD_B_ABS:
> -                       k = K;
> -load_b:
> -                       ptr = load_pointer(skb, k, 1, &tmp);
> -                       if (ptr != NULL) {
> -                               A = *(u8 *)ptr;
> -                               continue;
> +
> +                       /* Common case where 'jump_false' is next insn. */
> +                       if (fp->jf == 0) {
> +                               insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
> +                               target = i + fp->jt + 1;
> +                               EMIT_JMP;
> +                               break;
>                         }
> -                       return 0;
> -               case BPF_S_LD_W_LEN:
> -                       A = skb->len;
> -                       continue;
> -               case BPF_S_LDX_W_LEN:
> -                       X = skb->len;
> -                       continue;
> -               case BPF_S_LD_W_IND:
> -                       k = X + K;
> -                       goto load_w;
> -               case BPF_S_LD_H_IND:
> -                       k = X + K;
> -                       goto load_h;
> -               case BPF_S_LD_B_IND:
> -                       k = X + K;
> -                       goto load_b;
> -               case BPF_S_LDX_B_MSH:
> -                       ptr = load_pointer(skb, K, 1, &tmp);
> -                       if (ptr != NULL) {
> -                               X = (*(u8 *)ptr & 0xf) << 2;
> -                               continue;
> +
> +                       /* Convert JEQ into JNE when 'jump_true' is next insn. */
> +                       if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
> +                               insn->code = BPF_JMP | BPF_JNE | bpf_src;
> +                               target = i + fp->jf + 1;
> +                               EMIT_JMP;
> +                               break;
>                         }
> -                       return 0;
> -               case BPF_S_LD_IMM:
> -                       A = K;
> -                       continue;
> -               case BPF_S_LDX_IMM:
> -                       X = K;
> -                       continue;
> -               case BPF_S_LD_MEM:
> -                       A = mem[K];
> -                       continue;
> -               case BPF_S_LDX_MEM:
> -                       X = mem[K];
> -                       continue;
> -               case BPF_S_MISC_TAX:
> -                       X = A;
> -                       continue;
> -               case BPF_S_MISC_TXA:
> -                       A = X;
> -                       continue;
> -               case BPF_S_RET_K:
> -                       return K;
> -               case BPF_S_RET_A:
> -                       return A;
> -               case BPF_S_ST:
> -                       mem[K] = A;
> -                       continue;
> -               case BPF_S_STX:
> -                       mem[K] = X;
> -                       continue;
> -               case BPF_S_ANC_PROTOCOL:
> -                       A = ntohs(skb->protocol);
> -                       continue;
> -               case BPF_S_ANC_PKTTYPE:
> -                       A = skb->pkt_type;
> -                       continue;
> -               case BPF_S_ANC_IFINDEX:
> -                       if (!skb->dev)
> -                               return 0;
> -                       A = skb->dev->ifindex;
> -                       continue;
> -               case BPF_S_ANC_MARK:
> -                       A = skb->mark;
> -                       continue;
> -               case BPF_S_ANC_QUEUE:
> -                       A = skb->queue_mapping;
> -                       continue;
> -               case BPF_S_ANC_HATYPE:
> -                       if (!skb->dev)
> -                               return 0;
> -                       A = skb->dev->type;
> -                       continue;
> -               case BPF_S_ANC_RXHASH:
> -                       A = skb->rxhash;
> -                       continue;
> -               case BPF_S_ANC_CPU:
> -                       A = raw_smp_processor_id();
> -                       continue;
> -               case BPF_S_ANC_VLAN_TAG:
> -                       A = vlan_tx_tag_get(skb);
> -                       continue;
> -               case BPF_S_ANC_VLAN_TAG_PRESENT:
> -                       A = !!vlan_tx_tag_present(skb);
> -                       continue;
> -               case BPF_S_ANC_PAY_OFFSET:
> -                       A = __skb_get_poff(skb);
> -                       continue;
> -               case BPF_S_ANC_NLATTR: {
> -                       struct nlattr *nla;
> -
> -                       if (skb_is_nonlinear(skb))
> -                               return 0;
> -                       if (A > skb->len - sizeof(struct nlattr))
> -                               return 0;
> -
> -                       nla = nla_find((struct nlattr *)&skb->data[A],
> -                                      skb->len - A, X);
> -                       if (nla)
> -                               A = (void *)nla - (void *)skb->data;
> -                       else
> -                               A = 0;
> -                       continue;
> -               }
> -               case BPF_S_ANC_NLATTR_NEST: {
> -                       struct nlattr *nla;
> -
> -                       if (skb_is_nonlinear(skb))
> -                               return 0;
> -                       if (A > skb->len - sizeof(struct nlattr))
> -                               return 0;
> -
> -                       nla = (struct nlattr *)&skb->data[A];
> -                       if (nla->nla_len > A - skb->len)
> -                               return 0;
> -
> -                       nla = nla_find_nested(nla, X);
> -                       if (nla)
> -                               A = (void *)nla - (void *)skb->data;
> -                       else
> -                               A = 0;
> -                       continue;
> -               }
> -#ifdef CONFIG_SECCOMP_FILTER
> -               case BPF_S_ANC_SECCOMP_LD_W:
> -                       A = seccomp_bpf_load(fentry->k);
> -                       continue;
> -#endif
> +
> +                       /* Other jumps are mapped into two insns: Jxx and JA. */
> +                       target = i + fp->jt + 1;
> +                       insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
> +                       EMIT_JMP;
> +                       insn++;
> +
> +                       insn->code = BPF_JMP | BPF_JA;
> +                       target = i + fp->jf + 1;
> +                       EMIT_JMP;
> +                       break;
> +
> +               /* ldxb 4 * ([14] & 0xf) is remapped into 3 insns. */
> +               case BPF_LDX | BPF_MSH | BPF_B:
> +                       insn->code = BPF_LD | BPF_ABS | BPF_B;
> +                       insn->a_reg = X_REG;
> +                       insn->imm = fp->k;
> +                       insn++;
> +
> +                       insn->code = BPF_ALU | BPF_AND | BPF_K;
> +                       insn->a_reg = X_REG;
> +                       insn->imm = 0xf;
> +                       insn++;
> +
> +                       insn->code = BPF_ALU | BPF_LSH | BPF_K;
> +                       insn->a_reg = X_REG;
> +                       insn->imm = 2;
> +                       break;
> +
> +               /* RET_K, RET_A are remapped into 2 insns. */
> +               case BPF_RET | BPF_A:
> +               case BPF_RET | BPF_K:
> +                       insn->code = BPF_ALU | BPF_MOV |
> +                                    (BPF_RVAL(fp->code) == BPF_K ?
> +                                     BPF_K : BPF_X);
> +                       insn->a_reg = 0;
> +                       insn->x_reg = A_REG;
> +                       insn->imm = fp->k;
> +                       insn++;
> +
> +                       insn->code = BPF_RET | BPF_K;
> +                       break;
> +
> +               /* Store to stack. */
> +               case BPF_ST:
> +               case BPF_STX:
> +                       insn->code = BPF_STX | BPF_MEM | BPF_W;
> +                       insn->a_reg = FP_REG;
> +                       insn->x_reg = fp->code == BPF_ST ? A_REG : X_REG;
> +                       insn->off = -(BPF_MEMWORDS - fp->k) * 4;
> +                       break;
> +
> +               /* Load from stack. */
> +               case BPF_LD | BPF_MEM:
> +               case BPF_LDX | BPF_MEM:
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +                       insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
> +                                     A_REG : X_REG;
> +                       insn->x_reg = FP_REG;
> +                       insn->off = -(BPF_MEMWORDS - fp->k) * 4;
> +                       break;
> +
> +               /* A = K or X = K */
> +               case BPF_LD | BPF_IMM:
> +               case BPF_LDX | BPF_IMM:
> +                       insn->code = BPF_ALU | BPF_MOV | BPF_K;
> +                       insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
> +                                     A_REG : X_REG;
> +                       insn->imm = fp->k;
> +                       break;
> +
> +               /* X = A */
> +               case BPF_MISC | BPF_TAX:
> +                       insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +                       insn->a_reg = X_REG;
> +                       insn->x_reg = A_REG;
> +                       break;
> +
> +               /* A = X */
> +               case BPF_MISC | BPF_TXA:
> +                       insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> +                       insn->a_reg = A_REG;
> +                       insn->x_reg = X_REG;
> +                       break;
> +
> +               /* A = skb->len or X = skb->len */
> +               case BPF_LD | BPF_W | BPF_LEN:
> +               case BPF_LDX | BPF_W | BPF_LEN:
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +                       insn->a_reg = BPF_CLASS(fp->code) == BPF_LD ?
> +                                     A_REG : X_REG;
> +                       insn->x_reg = CTX_REG;
> +                       insn->off = offsetof(struct sk_buff, len);
> +                       break;
> +
> +               /* access seccomp_data fields */
> +               case BPF_LDX | BPF_ABS | BPF_W:
> +                       insn->code = BPF_LDX | BPF_MEM | BPF_W;
> +                       insn->a_reg = A_REG;
> +                       insn->x_reg = CTX_REG;
> +                       insn->off = fp->k;
> +                       break;
> +
>                 default:
> -                       WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
> -                                      fentry->code, fentry->jt,
> -                                      fentry->jf, fentry->k);
> -                       return 0;
> +                       goto err;
>                 }
> +
> +               insn++;
> +               if (new_prog)
> +                       memcpy(new_insn, tmp_insns,
> +                              sizeof(*insn) * (insn - tmp_insns));
> +
> +               new_insn += insn - tmp_insns;
>         }
>
> +       if (!new_prog) {
> +               /* Only calculating new length. */
> +               *new_len = new_insn - new_prog;
> +               return 0;
> +       }
> +
> +       pass++;
> +       if (new_flen != new_insn - new_prog) {
> +               new_flen = new_insn - new_prog;
> +               if (pass > 2)
> +                       goto err;
> +
> +               goto do_pass;
> +       }
> +
> +       kfree(addrs);
> +       BUG_ON(*new_len != new_flen);
>         return 0;
> +err:
> +       kfree(addrs);
> +       return -EINVAL;
>  }
> -EXPORT_SYMBOL(sk_run_filter);
>
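
A minimal caller of sk_convert_filter() following the two-step workflow
documented above would look roughly like this (sketch only; old_prog/old_len
are assumed to hold the classic filter, error unwinding trimmed —
__sk_migrate_filter() further down is the real in-tree user):

  struct sock_filter_int *new_prog;
  int new_len, err;

  /* 1st call: only compute the length of the converted program. */
  err = sk_convert_filter(old_prog, old_len, NULL, &new_len);
  if (err)
          return err;

  new_prog = kmalloc(new_len * sizeof(*new_prog), GFP_KERNEL);
  if (!new_prog)
          return -ENOMEM;

  /* 2nd call: do the actual remapping into the new buffer. */
  err = sk_convert_filter(old_prog, old_len, new_prog, &new_len);
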
> -/*
> - * Security :
> +/* Security:
> + *
>   * A BPF program is able to use 16 cells of memory to store intermediate
> - * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter())
> + * values (check u32 mem[BPF_MEMWORDS] in sk_run_filter()).
> + *
>   * As we dont want to clear mem[] array for each packet going through
>   * sk_run_filter(), we check that filter loaded by user never try to read
>   * a cell if not previously written, and we check all branches to be sure
> @@ -696,19 +1400,130 @@ void sk_filter_charge(struct sock *sk, struct sk_filter *fp)
>         atomic_add(sk_filter_size(fp->len), &sk->sk_omem_alloc);
>  }
>
> -static int __sk_prepare_filter(struct sk_filter *fp)
> +static struct sk_filter *__sk_migrate_realloc(struct sk_filter *fp,
> +                                             struct sock *sk,
> +                                             unsigned int len)
> +{
> +       struct sk_filter *fp_new;
> +
> +       if (sk == NULL)
> +               return krealloc(fp, len, GFP_KERNEL);
> +
> +       fp_new = sock_kmalloc(sk, len, GFP_KERNEL);
> +       if (fp_new) {
> +               memcpy(fp_new, fp, sizeof(struct sk_filter));
> +               /* As we're keeping orig_prog around in fp_new,
> +                * we need to make sure we're not evicting it
> +                * from the old fp.
> +                */
> +               fp->orig_prog = NULL;
> +               sk_filter_uncharge(sk, fp);
> +       }
> +
> +       return fp_new;
> +}
> +
> +static struct sk_filter *__sk_migrate_filter(struct sk_filter *fp,
> +                                            struct sock *sk)
> +{
> +       struct sock_filter *old_prog;
> +       struct sk_filter *old_fp;
> +       int i, err, new_len, old_len = fp->len;
> +
> +       /* We are free to overwrite insns et al right here as they
> +        * won't be used anymore internally
> +        * after the migration to the internal BPF instruction
> +        * representation.
> +        */
> +       BUILD_BUG_ON(sizeof(struct sock_filter) !=
> +                    sizeof(struct sock_filter_int));
> +
> +       /* For now, we need to unfiddle BPF_S_* identifiers in place.
> +        * This can sooner or later be subject to removal, e.g. when
> +        * JITs have been converted.
> +        */
> +       for (i = 0; i < fp->len; i++)
> +               sk_decode_filter(&fp->insns[i], &fp->insns[i]);
> +
> +       /* Conversion cannot happen on overlapping memory areas,
> +        * so we need to keep the user BPF around until the 2nd
> +        * pass. At this time, the user BPF is stored in fp->insns.
> +        */
> +       old_prog = kmemdup(fp->insns, old_len * sizeof(struct sock_filter),
> +                          GFP_KERNEL);
> +       if (!old_prog) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       /* 1st pass: calculate the new program length. */
> +       err = sk_convert_filter(old_prog, old_len, NULL, &new_len);
> +       if (err)
> +               goto out_err_free;
> +
> +       /* Expand fp for appending the new filter representation. */
> +       old_fp = fp;
> +       fp = __sk_migrate_realloc(old_fp, sk, sk_filter_size(new_len));
> +       if (!fp) {
> +               /* The old_fp is still around in case we couldn't
> +                * allocate new memory, so uncharge on that one.
> +                */
> +               fp = old_fp;
> +               err = -ENOMEM;
> +               goto out_err_free;
> +       }
> +
> +       fp->bpf_func = sk_run_filter_int_skb;
> +       fp->len = new_len;
> +
> +       /* 2nd pass: remap sock_filter insns into sock_filter_int insns. */
> +       err = sk_convert_filter(old_prog, old_len, fp->insnsi, &new_len);
> +       if (err)
> +               /* 2nd sk_convert_filter() can fail only if it fails
> +                * to allocate memory; remapping must succeed. Note
> +                * that at this time old_fp has already been released
> +                * by __sk_migrate_realloc().
> +                */
> +               goto out_err_free;
> +
> +       kfree(old_prog);
> +       return fp;
> +
> +out_err_free:
> +       kfree(old_prog);
> +out_err:
> +       /* Rollback filter setup. */
> +       if (sk != NULL)
> +               sk_filter_uncharge(sk, fp);
> +       else
> +               kfree(fp);
> +       return ERR_PTR(err);
> +}
> +
> +static struct sk_filter *__sk_prepare_filter(struct sk_filter *fp,
> +                                            struct sock *sk)
>  {
>         int err;
>
> -       fp->bpf_func = sk_run_filter;
> +       fp->bpf_func = NULL;
>         fp->jited = 0;
>
>         err = sk_chk_filter(fp->insns, fp->len);
>         if (err)
> -               return err;
> +               return ERR_PTR(err);
>
> +       /* Probe if we can JIT compile the filter and if so, do
> +        * the compilation of the filter.
> +        */
>         bpf_jit_compile(fp);
> -       return 0;
> +
> +       /* JIT compiler couldn't process this filter, so do the
> +        * internal BPF translation for the optimized interpreter.
> +        */
> +       if (!fp->jited)
> +               fp = __sk_migrate_filter(fp, sk);
> +
> +       return fp;
>  }
>
>  /**
> @@ -726,7 +1541,6 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
>  {
>         unsigned int fsize = sk_filter_proglen(fprog);
>         struct sk_filter *fp;
> -       int err;
>
>         /* Make sure new filter is there and in the right amounts. */
>         if (fprog->filter == NULL)
> @@ -746,15 +1560,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
>          */
>         fp->orig_prog = NULL;
>
> -       err = __sk_prepare_filter(fp);
> -       if (err)
> -               goto free_mem;
> +       /* __sk_prepare_filter() already takes care of uncharging
> +        * memory in case something goes wrong.
> +        */
> +       fp = __sk_prepare_filter(fp, NULL);
> +       if (IS_ERR(fp))
> +               return PTR_ERR(fp);
>
>         *pfp = fp;
>         return 0;
> -free_mem:
> -       kfree(fp);
> -       return err;
>  }
>  EXPORT_SYMBOL_GPL(sk_unattached_filter_create);
>
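
With the error-pointer handling in __sk_prepare_filter(), a kernel-side user of
the unattached API stays as simple as before; roughly (sketch, everything apart
from the filter.h API calls is made up):

  struct sock_filter insns[] = {
          BPF_STMT(BPF_RET | BPF_K, 0xffff),      /* accept packet */
  };
  struct sock_fprog prog = {
          .len = ARRAY_SIZE(insns), .filter = insns,
  };
  struct sk_filter *fp;
  int err;

  err = sk_unattached_filter_create(&fp, &prog);
  if (err)
          return err;
  /* ... run via SK_RUN_FILTER(fp, skb), and later release the
   * filter again with sk_unattached_filter_destroy(fp).
   */
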
> @@ -806,11 +1620,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>                 return -ENOMEM;
>         }
>
> -       err = __sk_prepare_filter(fp);
> -       if (err) {
> -               sk_filter_uncharge(sk, fp);
> -               return err;
> -       }
> +       /* __sk_prepare_filter() already takes care of uncharging
> +        * memory in case something goes wrong.
> +        */
> +       fp = __sk_prepare_filter(fp, sk);
> +       if (IS_ERR(fp))
> +               return PTR_ERR(fp);
>
>         old_fp = rcu_dereference_protected(sk->sk_filter,
>                                            sock_owned_by_user(sk));
> --
> 1.7.11.7
>



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 4/9] net: ptp: use sk_unattached_filter_create() for BPF
  2014-03-21 12:20 ` [PATCH net-next 4/9] net: ptp: use sk_unattached_filter_create() for BPF Daniel Borkmann
@ 2014-03-24 22:39   ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2014-03-24 22:39 UTC (permalink / raw)
  To: dborkman; +Cc: ast, netdev, richard.cochran, jbenc

From: Daniel Borkmann <dborkman@redhat.com>
Date: Fri, 21 Mar 2014 13:20:13 +0100

> @@ -135,5 +137,10 @@ EXPORT_SYMBOL_GPL(skb_defer_rx_timestamp);
>  
>  void __init skb_timestamping_init(void)
>  {
> -	BUG_ON(sk_chk_filter(ptp_filter, ARRAY_SIZE(ptp_filter)));
> +	struct sock_filter ptp_filter[] = { PTP_FILTER };

Perhaps you want this to be static or static const?  Why copy
PTP_FILTER onto the stack at run time?

> +	struct sock_fprog ptp_prog = {
> +		.len = ARRAY_SIZE(ptp_filter), .filter = ptp_filter,
> +	};
> +
> +	BUG_ON(sk_unattached_filter_create(&ptp_insns, &ptp_prog));
>  }
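
One way to do that (just a sketch, not from the thread) would be to keep the
template out of the stack frame entirely; since skb_timestamping_init() is
__init and sk_unattached_filter_create() copies the program, the array could
even live in .init.data:

        static struct sock_filter ptp_filter[] __initdata = { PTP_FILTER };
        struct sock_fprog ptp_prog = {
                .len = ARRAY_SIZE(ptp_filter), .filter = ptp_filter,
        };

        BUG_ON(sk_unattached_filter_create(&ptp_insns, &ptp_prog));
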

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-03-24 22:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-21 12:20 [PATCH net-next 0/9] BPF updates Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 1/9] net: filter: add jited flag to indicate jit compiled filters Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 2/9] net: filter: keep original BPF program around Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 3/9] net: filter: move filter accounting to filter core Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 4/9] net: ptp: use sk_unattached_filter_create() for BPF Daniel Borkmann
2014-03-24 22:39   ` David Miller
2014-03-21 12:20 ` [PATCH net-next 5/9] net: ptp: do not reimplement PTP/BPF classifier Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 6/9] net: ppp: use sk_unattached_filter api Daniel Borkmann
2014-03-21 12:20   ` Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 7/9] net: isdn: " Daniel Borkmann
2014-03-21 12:20 ` [PATCH net-next 8/9] net: filter: rework/optimize internal BPF interpreter's instruction set Daniel Borkmann
2014-03-21 15:40   ` Kees Cook
2014-03-21 12:20 ` [PATCH net-next 9/9] doc: filter: extend BPF documentation to document new internals Daniel Borkmann
