* [RFC 00/12] BPF hardware offload via cls_bpf
@ 2016-06-01 16:50 Jakub Kicinski
  2016-06-01 16:50 ` [RFC 01/12] add basic register-field manipulation macros Jakub Kicinski
                   ` (11 more replies)
  0 siblings, 12 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Hi!

In recent months a lot of progress has been made on offloading
simpler TC classifiers.  There is also growing interest in using
BPF for generic high-speed packet processing in the kernel.
It seems beneficial to tie those two trends together and think
about hardware offloads of BPF programs.  This patch set presents
such an offload for Netronome smart NICs.  cls_bpf is extended
with hardware offload capabilities and the NFP driver gets a JIT
translator which, in the presence of capable firmware, can be used
to offload the BPF program onto the card.

The BPF JIT implementation is not 100% complete (e.g. some
instructions are missing) but it is functional - in fact, I
implemented only the bits which clang/cBPF used in my test
programs.  Encouragingly, it should be possible to offload most
(if not all) advanced BPF features onto the NIC - including
packet modification, maps, tunnel encap/decap, etc.

Examples of tests I used:

Basic eBPF/C/clang apps:
  __section_cls_entry
  int cls_entry(struct __sk_buff *skb)
  {
	if (load_byte(skb, 0) != 0x0)
		return 0;

	if (load_byte(skb, 4) != 0x1)
		return 0;

	skb->mark = 0xcafe;

	if (load_byte(skb, 50) != 0xff)
		return 0;

	return ~0U;
  }

tcpdump-generated filters, for instance:
  dst 10.1.255.255 and \
  tcp and \
  (port 90 or port 91) and \
  tcp[tcpflags] & tcp-syn != 0


The first patch is not really related, but the others depend on it;
I hope to post it separately soon.


Jakub Kicinski (12):
  add basic register-field manipulation macros
  net: cls_bpf: add hardware offload
  net: cls_bpf: limit hardware offload by software-only flag
  net: cls_bpf: add support for marking filters as hardware-only
  nfp: add BPF to NFP code translator
  nfp: add hardware cls_bpf offload
  nfp: add skb mark support to the bpf offload
  net: cls_bpf: allow offloaded filters to update stats
  nfp: report statistics of offloaded filters
  nfp: bpf: optimize register init
  nfp: bpf: add register rename
  nfp: bpf: add denser mode of execution

 drivers/net/ethernet/netronome/nfp/Makefile        |    4 +-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h       |  191 ++++
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h       |  136 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   | 1027 ++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |   34 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |   41 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   21 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |    3 +
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  256 +++++
 include/linux/bitfield.h                           |   89 ++
 include/linux/netdevice.h                          |    2 +
 include/net/pkt_cls.h                              |   16 +
 include/uapi/linux/pkt_cls.h                       |    1 +
 net/sched/cls_bpf.c                                |  115 ++-
 14 files changed, 1928 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_asm.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
 create mode 100644 include/linux/bitfield.h

-- 
1.9.1


* [RFC 01/12] add basic register-field manipulation macros
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 20:15   ` Hannes Frederic Sowa
  2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev
  Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski, Ivo van Doorn,
	Felix Fietkau

C bitfields are problematic and best avoided.  Developers
interacting with hardware registers find themselves searching
for easy-to-use alternatives.  A common approach is to define
structures or sets of macros containing a mask and shift pair.
Operations on the register are then performed as follows:

 field = (reg >> shift) & mask;

 reg &= ~(mask << shift);
 reg |= (field & mask) << shift;

Defining the shift and mask separately is tedious.  Ivo van Doorn
came up with the idea of computing them at compilation time
based on a single shifted mask (later refined by Felix), which
can be used like this:

 field = FIELD_GET(REG_FIELD, reg);

 reg &= ~REG_FIELD;
 reg |= FIELD_PUT(REG_FIELD, field);

The FIELD_{GET,PUT} macros take care of finding the appropriate
shift based on a compile-time ffs operation.

GENMASK can be used to define registers (which is usually
less error-prone and easier to match with datasheets).
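
For example, a 2-bit mode field occupying bits 5:4 of a purely
hypothetical control register could be defined and used like this:

 #define REG_CTRL_MODE	GENMASK(5, 4)

 mode = FIELD_GET(REG_CTRL_MODE, reg);

 reg &= ~REG_CTRL_MODE;
 reg |= FIELD_PUT(REG_CTRL_MODE, 2);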

This approach is the most convenient I've seen, so to limit code
duplication let's move the macros to a global header file.

Compared to Felix Fietkau's version I:
 - edited the comments a bit;
 - gave the auxiliary macros a _bf_ prefix;
 - dropped the UL specifier from 1 in _bf_low_bits;
 - renamed FIELD_SET to FIELD_PUT;
 - added 64-bit versions.

CC: Ivo van Doorn <IvDoorn@gmail.com>
CC: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 include/linux/bitfield.h | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 include/linux/bitfield.h

diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
new file mode 100644
index 000000000000..3815c81f5b10
--- /dev/null
+++ b/include/linux/bitfield.h
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2014 Felix Fietkau <nbd@openwrt.org>
+ * Copyright (C) 2004 - 2009 Ivo van Doorn <IvDoorn@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LINUX_BITFIELD_H
+#define _LINUX_BITFIELD_H
+
+#include <asm/types.h>
+
+/*
+ * Macros to find first set bit in a variable.
+ * These macros behave the same as the __ffs() functions, but with the key
+ * difference that the computation is done at compile time, not at run time.
+ */
+#define compile_ffs2(__x) \
+	__builtin_choose_expr(((__x) & 0x1), 0, 1)
+
+#define compile_ffs4(__x) \
+	__builtin_choose_expr(((__x) & 0x3), \
+			      (compile_ffs2((__x))), \
+			      (compile_ffs2((__x) >> 2) + 2))
+
+#define compile_ffs8(__x) \
+	__builtin_choose_expr(((__x) & 0xf), \
+			      (compile_ffs4((__x))), \
+			      (compile_ffs4((__x) >> 4) + 4))
+
+#define compile_ffs16(__x) \
+	__builtin_choose_expr(((__x) & 0xff), \
+			      (compile_ffs8((__x))), \
+			      (compile_ffs8((__x) >> 8) + 8))
+
+#define compile_ffs32(__x) \
+	__builtin_choose_expr(((__x) & 0xffff), \
+			      (compile_ffs16((__x))), \
+			      (compile_ffs16((__x) >> 16) + 16))
+
+#define compile_ffs64(__x) \
+	__builtin_choose_expr(((__x) & 0xffffffff), \
+			      (compile_ffs32((__x))), \
+			      (compile_ffs32(((__x) >> 16) >> 16) + 32))
+
+/*
+ * Macros to validate that the mask given contains a contiguous set of bits.
+ * Note that we cannot use the is_power_of_2() function since this check
+ * must be done at compile-time.
+ */
+#define _bf_is_power_of_two(x)	( !((x) & ((x) - 1)) )
+#define _bf_low_bits(x)		( ((x) - 1) & ~(x) )
+#define _bf_is_valid_mask(x)	_bf_is_power_of_two(1U + (x) + _bf_low_bits(x))
+
+#define _BF_FIELD_CHECK(_mask)						\
+	BUILD_BUG_ON(!(_mask) || !_bf_is_valid_mask(_mask))
+
+#define FIELD_PUT(_mask, _val)						\
+	({								\
+		_BF_FIELD_CHECK(_mask);					\
+		(((u32)(_val)) << compile_ffs32(_mask)) & (_mask);	\
+	})
+
+#define FIELD_GET(_mask, _val)						\
+	({								\
+		_BF_FIELD_CHECK(_mask);					\
+		(u32)(((_val) & (_mask)) >> compile_ffs32(_mask));	\
+	})
+
+#define FIELD_PUT64(_mask, _val)					\
+	({								\
+		_BF_FIELD_CHECK(_mask);					\
+		(((u64)(_val)) << compile_ffs64(_mask)) & (_mask);	\
+	})
+
+#define FIELD_GET64(_mask, _val)					\
+	({								\
+		_BF_FIELD_CHECK(_mask);					\
+		(u64)(((_val) & (_mask)) >> compile_ffs64(_mask));	\
+	})
+
+#endif
-- 
1.9.1


* [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
  2016-06-01 16:50 ` [RFC 01/12] add basic register-field manipulation macros Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 17:13   ` John Fastabend
                     ` (2 more replies)
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
                   ` (9 subsequent siblings)
  11 siblings, 3 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

This patch adds hardware offload capability to the cls_bpf classifier,
similar to what has been done with U32 and flower.
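
For reference, a driver implementing the new hook would dispatch on
the setup type and command roughly as below (a minimal sketch; the
foo_* names are hypothetical, only the types and enums come from
this patch):

 static int foo_setup_tc(struct net_device *dev, u32 handle,
 			 __be16 proto, struct tc_to_netdev *tc)
 {
 	if (tc->type != TC_SETUP_CLSBPF)
 		return -EINVAL;

 	switch (tc->cls_bpf->command) {
 	case TC_CLSBPF_ADD:
 	case TC_CLSBPF_REPLACE:
 		return foo_install_prog(dev, tc->cls_bpf->filter);
 	case TC_CLSBPF_DESTROY:
 		return foo_remove_prog(dev);
 	}

 	return -EINVAL;
 }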

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 include/linux/netdevice.h |  2 ++
 include/net/pkt_cls.h     | 14 ++++++++++
 net/sched/cls_bpf.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929ce8157..6c8364b8aae9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -785,6 +785,7 @@ enum {
 	TC_SETUP_MQPRIO,
 	TC_SETUP_CLSU32,
 	TC_SETUP_CLSFLOWER,
+	TC_SETUP_CLSBPF,
 };
 
 struct tc_cls_u32_offload;
@@ -795,6 +796,7 @@ struct tc_to_netdev {
 		u8 tc;
 		struct tc_cls_u32_offload *cls_u32;
 		struct tc_cls_flower_offload *cls_flower;
+		struct tc_cls_bpf_offload *cls_bpf;
 	};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0f7efa88f210..10b7e8cdc98c 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -438,4 +438,18 @@ struct tc_cls_flower_offload {
 	struct tcf_exts *exts;
 };
 
+enum tc_clsbpf_command {
+	TC_CLSBPF_ADD,
+	TC_CLSBPF_REPLACE,
+	TC_CLSBPF_DESTROY,
+};
+
+struct tc_cls_bpf_offload {
+	enum tc_clsbpf_command command;
+	struct tcf_exts *exts;
+	struct bpf_prog *filter;
+	const char *name;
+	bool exts_integrated;
+};
+
 #endif
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 7b342c779da7..b7c4c6dd6ad6 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -39,6 +39,7 @@ struct cls_bpf_prog {
 	struct list_head link;
 	struct tcf_result res;
 	bool exts_integrated;
+	bool offloaded;
 	struct tcf_exts exts;
 	u32 handle;
 	union {
@@ -140,6 +141,72 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
 	return !prog->bpf_ops;
 }
 
+static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
+			       enum tc_clsbpf_command cmd)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct tc_cls_bpf_offload bpf_offload = {};
+	struct tc_to_netdev offload;
+
+	offload.type = TC_SETUP_CLSBPF;
+	offload.cls_bpf = &bpf_offload;
+
+	bpf_offload.command = cmd;
+	bpf_offload.exts = &prog->exts;
+	bpf_offload.filter = prog->filter;
+	bpf_offload.name = prog->bpf_name;
+	bpf_offload.exts_integrated = prog->exts_integrated;
+
+	return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					     tp->protocol, &offload);
+}
+
+static void cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
+			    struct cls_bpf_prog *oldprog)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+	struct cls_bpf_prog *obj = prog;
+	enum tc_clsbpf_command cmd;
+
+	if (oldprog && oldprog->offloaded) {
+		if (tc_should_offload(dev, 0)) {
+			cmd = TC_CLSBPF_REPLACE;
+		} else {
+			obj = oldprog;
+			cmd = TC_CLSBPF_DESTROY;
+		}
+	} else {
+		if (!tc_should_offload(dev, 0))
+			return;
+		cmd = TC_CLSBPF_ADD;
+	}
+
+	if (cls_bpf_offload_cmd(tp, obj, cmd))
+		return;
+
+	obj->offloaded = true;
+	if (oldprog)
+		oldprog->offloaded = false;
+}
+
+static void cls_bpf_stop_offload(struct tcf_proto *tp,
+				 struct cls_bpf_prog *prog)
+{
+	struct net_device *dev = tp->q->dev_queue->dev;
+
+	if (!prog->offloaded)
+		return;
+	if (WARN_ON(!tc_should_offload(dev, 0)))
+		return;
+
+	if (cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY)) {
+		pr_err("Stopping hardware offload failed!\n");
+		return;
+	}
+
+	prog->offloaded = false;
+}
+
 static int cls_bpf_init(struct tcf_proto *tp)
 {
 	struct cls_bpf_head *head;
@@ -179,6 +246,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
 {
 	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) arg;
 
+	cls_bpf_stop_offload(tp, prog);
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
 	call_rcu(&prog->rcu, __cls_bpf_delete_prog);
@@ -195,6 +263,7 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
 		return false;
 
 	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+		cls_bpf_stop_offload(tp, prog);
 		list_del_rcu(&prog->link);
 		tcf_unbind_filter(tp, &prog->res);
 		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
@@ -415,6 +484,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		goto errout;
 
+	cls_bpf_offload(tp, prog, oldprog);
+
 	if (oldprog) {
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
-- 
1.9.1


* [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
  2016-06-01 16:50 ` [RFC 01/12] add basic register-field manipulation macros Jakub Kicinski
  2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 17:16   ` John Fastabend
                     ` (3 more replies)
  2016-06-01 16:50 ` [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only Jakub Kicinski
                   ` (8 subsequent siblings)
  11 siblings, 4 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
Unlike U32 and flower, cls_bpf already has some netlink
flags defined, so I chose to create a new attribute to be
able to use the same flag values as the above.

Unknown flags are ignored and not reported upon dump.
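
As an illustration only (not part of this patch), a userspace
request built with libmnl would carry the generic flags in the
new attribute like so:

 mnl_attr_put_u32(nlh, TCA_BPF_GEN_TCA_FLAGS, TCA_CLS_FLAGS_SKIP_HW);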

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 include/net/pkt_cls.h        |  1 +
 include/uapi/linux/pkt_cls.h |  1 +
 net/sched/cls_bpf.c          | 16 ++++++++++++++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 10b7e8cdc98c..e6b3dfb3159b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -450,6 +450,7 @@ struct tc_cls_bpf_offload {
 	struct bpf_prog *filter;
 	const char *name;
 	bool exts_integrated;
+	u32 gen_flags;
 };
 
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index f4297c8a42fe..93a86edf3bd8 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -395,6 +395,7 @@ enum {
 	TCA_BPF_FD,
 	TCA_BPF_NAME,
 	TCA_BPF_FLAGS,
+	TCA_BPF_GEN_TCA_FLAGS,
 	__TCA_BPF_MAX,
 };
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index b7c4c6dd6ad6..9f1bc37dcbbc 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -27,6 +27,8 @@ MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
 MODULE_DESCRIPTION("TC BPF based classifier");
 
 #define CLS_BPF_NAME_LEN	256
+#define CLS_BPF_SUPPORTED_GEN_FLAGS		\
+	TCA_CLS_FLAGS_SKIP_HW
 
 struct cls_bpf_head {
 	struct list_head plist;
@@ -40,6 +42,7 @@ struct cls_bpf_prog {
 	struct tcf_result res;
 	bool exts_integrated;
 	bool offloaded;
+	u32 gen_flags;
 	struct tcf_exts exts;
 	u32 handle;
 	union {
@@ -55,6 +58,7 @@ struct cls_bpf_prog {
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
 	[TCA_BPF_FLAGS]		= { .type = NLA_U32 },
+	[TCA_BPF_GEN_TCA_FLAGS]	= { .type = NLA_U32 },
 	[TCA_BPF_FD]		= { .type = NLA_U32 },
 	[TCA_BPF_NAME]		= { .type = NLA_NUL_STRING, .len = CLS_BPF_NAME_LEN },
 	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
@@ -156,6 +160,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	bpf_offload.filter = prog->filter;
 	bpf_offload.name = prog->bpf_name;
 	bpf_offload.exts_integrated = prog->exts_integrated;
+	bpf_offload.gen_flags = prog->gen_flags;
 
 	return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
 					     tp->protocol, &offload);
@@ -169,14 +174,14 @@ static void cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	enum tc_clsbpf_command cmd;
 
 	if (oldprog && oldprog->offloaded) {
-		if (tc_should_offload(dev, 0)) {
+		if (tc_should_offload(dev, prog->gen_flags)) {
 			cmd = TC_CLSBPF_REPLACE;
 		} else {
 			obj = oldprog;
 			cmd = TC_CLSBPF_DESTROY;
 		}
 	} else {
-		if (!tc_should_offload(dev, 0))
+		if (!tc_should_offload(dev, prog->gen_flags))
 			return;
 		cmd = TC_CLSBPF_ADD;
 	}
@@ -378,6 +383,7 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 {
 	bool is_bpf, is_ebpf, have_exts = false;
 	struct tcf_exts exts;
+	u32 gen_flags = 0;
 	int ret;
 
 	is_bpf = tb[TCA_BPF_OPS_LEN] && tb[TCA_BPF_OPS];
@@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 
 		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
 	}
+	if (tb[TCA_BPF_GEN_TCA_FLAGS])
+		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
 
 	prog->exts_integrated = have_exts;
+	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;
 
 	ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog) :
 		       cls_bpf_prog_from_efd(tb, prog, tp);
@@ -568,6 +577,9 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
 	if (bpf_flags && nla_put_u32(skb, TCA_BPF_FLAGS, bpf_flags))
 		goto nla_put_failure;
+	if (prog->gen_flags &&
+	    nla_put_u32(skb, TCA_BPF_GEN_TCA_FLAGS, prog->gen_flags))
+		goto nla_put_failure;
 
 	nla_nest_end(skb, nest);
 
-- 
1.9.1


* [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (2 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 17:19   ` John Fastabend
  2016-06-01 19:57   ` Daniel Borkmann
  2016-06-01 16:50 ` [RFC 05/12] nfp: add BPF to NFP code translator Jakub Kicinski
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_SW flag.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 net/sched/cls_bpf.c | 39 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 9f1bc37dcbbc..1083910cebaf 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -28,7 +28,7 @@ MODULE_DESCRIPTION("TC BPF based classifier");
 
 #define CLS_BPF_NAME_LEN	256
 #define CLS_BPF_SUPPORTED_GEN_FLAGS		\
-	TCA_CLS_FLAGS_SKIP_HW
+	(TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW)
 
 struct cls_bpf_head {
 	struct list_head plist;
@@ -98,7 +98,9 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 		qdisc_skb_cb(skb)->tc_classid = prog->res.classid;
 
-		if (at_ingress) {
+		if (tc_skip_sw(prog->gen_flags)) {
+			filter_res = 0;
+		} else if (at_ingress) {
 			/* It is safe to push/pull even if skb_shared() */
 			__skb_push(skb, skb->mac_len);
 			bpf_compute_data_end(skb);
@@ -166,32 +168,42 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 					     tp->protocol, &offload);
 }
 
-static void cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
-			    struct cls_bpf_prog *oldprog)
+static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
+			   struct cls_bpf_prog *oldprog)
 {
 	struct net_device *dev = tp->q->dev_queue->dev;
 	struct cls_bpf_prog *obj = prog;
 	enum tc_clsbpf_command cmd;
+	bool skip_sw;
+	int ret;
+
+	skip_sw = tc_skip_sw(prog->gen_flags) ||
+		(oldprog && tc_skip_sw(oldprog->gen_flags));
 
 	if (oldprog && oldprog->offloaded) {
 		if (tc_should_offload(dev, prog->gen_flags)) {
 			cmd = TC_CLSBPF_REPLACE;
-		} else {
+		} else if (!tc_skip_sw(prog->gen_flags)) {
 			obj = oldprog;
 			cmd = TC_CLSBPF_DESTROY;
+		} else {
+			return -EINVAL;
 		}
 	} else {
 		if (!tc_should_offload(dev, prog->gen_flags))
-			return;
+			return skip_sw ? -EINVAL : 0;
 		cmd = TC_CLSBPF_ADD;
 	}
 
-	if (cls_bpf_offload_cmd(tp, obj, cmd))
-		return;
+	ret = cls_bpf_offload_cmd(tp, obj, cmd);
+	if (ret)
+		return skip_sw ? ret : 0;
 
 	obj->offloaded = true;
 	if (oldprog)
 		oldprog->offloaded = false;
+
+	return 0;
 }
 
 static void cls_bpf_stop_offload(struct tcf_proto *tp,
@@ -406,8 +418,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
 
 		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
 	}
-	if (tb[TCA_BPF_GEN_TCA_FLAGS])
+	if (tb[TCA_BPF_GEN_TCA_FLAGS]) {
 		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
+		if (!tc_flags_valid(gen_flags))
+			return -EINVAL;
+	}
 
 	prog->exts_integrated = have_exts;
 	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;
@@ -493,7 +508,11 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	if (ret < 0)
 		goto errout;
 
-	cls_bpf_offload(tp, prog, oldprog);
+	ret = cls_bpf_offload(tp, prog, oldprog);
+	if (ret) {
+		cls_bpf_delete_prog(tp, prog);
+		return ret;
+	}
 
 	if (oldprog) {
 		list_replace_rcu(&oldprog->link, &prog->link);
-- 
1.9.1


* [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (3 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 20:03   ` Daniel Borkmann
  2016-06-01 16:50 ` [RFC 06/12] nfp: add hardware cls_bpf offload Jakub Kicinski
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Add a translator for JITing eBPF to operations which
can be executed on the NFP's programmable engines.
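
The expected calling convention (exercised by the offload code added
later in the series) is roughly the following, where code_mem,
start_off, tgt_out, tgt_abort and code_sz are illustrative names for
the firmware-provided code store parameters:

 struct nfp_bpf_result res;
 int err;

 err = nfp_bpf_jit(filter, code_mem, start_off, tgt_out, tgt_abort,
 		   code_sz, &res);
 if (err)
 	return err;
 /* res.n_instr NFP instructions are now in code_mem */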

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile      |   3 +-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h     | 191 +++++
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h     | 112 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 937 +++++++++++++++++++++++
 4 files changed, 1242 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_asm.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 68178819ff12..46648404e750 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_NFP_NETVF)	+= nfp_netvf.o
 nfp_netvf-objs := \
 	    nfp_net_common.o \
 	    nfp_net_ethtool.o \
-	    nfp_netvf_main.o
+	    nfp_netvf_main.o \
+	    nfp_bpf_jit.o
 
 nfp_netvf-$(CONFIG_NFP_NET_DEBUG) += nfp_net_debugfs.o
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
new file mode 100644
index 000000000000..8ea1c35665ee
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -0,0 +1,191 @@
+/*
+ * Copyright (C) 2016 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_ASM_H__
+#define __NFP_ASM_H__ 1
+
+#include "nfp_bpf.h"
+
+#define REG_NONE	0
+#define RE_REG_IMM(x)	(0x20 | ((x) & 0x1f) | (((x) & 0x60) << 1))
+#define RE_REG_IMM_MAX	0x07fULL
+#define RE_REG_NO_DST	0x20
+#define RE_REG_XFR	0x80
+#define UR_REG_IMM	0x300
+#define UR_REG_IMM_MAX	0x0ffULL
+#define UR_REG_NO_DST	0x300
+#define UR_REG_XFR	0x180
+
+#define OP_BR_BASE	0x0d800000020ULL
+#define OP_BR_BASE_MASK	0x0f8000c3ce0ULL
+#define OP_BR_MASK	0x0000000001fULL
+#define OP_BR_EV_PIP	0x00000000300ULL
+#define OP_BR_CSS	0x0000003c000ULL
+#define OP_BR_ADDR_LO	0x007ffc00000ULL
+#define OP_BR_ADDR_HI	0x10000000000ULL
+
+enum br_mask {
+	BR_BEQ = 0x00,
+	BR_BNE = 0x01,
+	BR_BHS = 0x04,
+	BR_BLO = 0x05,
+	BR_BGE = 0x08,
+	BR_UNC = 0x18,
+};
+
+enum br_ev_pip {
+	BR_EV_PIP_UNCOND = 0,
+	BR_EV_PIP_COND = 1,
+};
+
+enum br_ctx_signal_state {
+	BR_CSS_NONE = 2,
+};
+
+#define OP_IMMED_A_SRC	0x000000003ffULL
+#define OP_IMMED_B_SRC	0x000000ffc00ULL
+#define OP_IMMED_IMM	0x0000ff00000ULL
+#define OP_IMMED_WIDTH	0x00060000000ULL
+#define OP_IMMED_INV	0x00080000000ULL
+#define OP_IMMED_SHIFT	0x00600000000ULL
+#define OP_IMMED_BASE	0x0f000000000ULL
+#define OP_IMMED_WR_AB	0x20000000000ULL
+
+enum immed_width {
+	IMMED_WIDTH_ALL = 0,
+	IMMED_WIDTH_BYTE = 1,
+	IMMED_WIDTH_WORD = 2,
+};
+
+enum immed_shift {
+	IMMED_SHIFT_0B = 0,
+	IMMED_SHIFT_1B = 1,
+	IMMED_SHIFT_2B = 2,
+};
+
+#define OP_SHF_BASE	0x08000000000ULL
+#define OP_SHF_A_SRC	0x000000000ffULL
+#define OP_SHF_SC	0x00000000300ULL
+#define OP_SHF_B_SRC	0x0000003fc00ULL
+#define OP_SHF_I8	0x00000040000ULL
+#define OP_SHF_SW	0x00000080000ULL
+#define OP_SHF_DST	0x0000ff00000ULL
+#define OP_SHF_SHIFT	0x001f0000000ULL
+#define OP_SHF_OP	0x00e00000000ULL
+#define OP_SHF_DST_AB	0x01000000000ULL
+#define OP_SHF_WR_AB	0x20000000000ULL
+
+enum shf_op {
+	SHF_OP_NONE = 0,
+	SHF_OP_AND = 2,
+};
+
+enum shf_sc {
+	SHF_SC_R_ROT = 0,
+	SHF_SC_R_SHF = 1,
+	SHF_SC_L_SHF = 2,
+	SHF_SC_R_DSHF = 3,
+};
+
+#define OP_ALU_A_SRC	0x000000003ffULL
+#define OP_ALU_B_SRC	0x000000ffc00ULL
+#define OP_ALU_DST	0x0003ff00000ULL
+#define OP_ALU_SW	0x00040000000ULL
+#define OP_ALU_OP	0x00f80000000ULL
+#define OP_ALU_DST_AB	0x01000000000ULL
+#define OP_ALU_BASE	0x0a000000000ULL
+#define OP_ALU_WR_AB	0x20000000000ULL
+
+enum alu_op {
+	ALU_OP_NONE = 0x00,
+	ALU_OP_ADD = 0x01,
+	ALU_OP_AND = 0x08,
+	ALU_OP_SUB = 0x15,
+	ALU_OP_XOR = 0x18,
+};
+
+enum alu_dst_ab {
+	ALU_DST_A = 0,
+	ALU_DST_B = 1,
+};
+
+#define OP_CMD_A_SRC	 0x000000000ffULL
+#define OP_CMD_CTX	 0x00000000300ULL
+#define OP_CMD_B_SRC	 0x0000003fc00ULL
+#define OP_CMD_TOKEN	 0x000000c0000ULL
+#define OP_CMD_XFER	 0x00001f00000ULL
+#define OP_CMD_CNT	 0x0000e000000ULL
+#define OP_CMD_SIG	 0x000f0000000ULL
+#define OP_CMD_TGT_CMD	 0x07f00000000ULL
+#define OP_CMD_MODE	0x1c0000000000ULL
+
+struct cmd_tgt_act {
+	u8 token;
+	u8 tgt_cmd;
+};
+
+enum cmd_tgt_map {
+	CMD_TGT_READ8,
+	CMD_TGT_WRITE8,
+	CMD_TGT_READ_LE,
+	CMD_TGT_READ_SWAP_LE,
+	__CMD_TGT_MAP_SIZE,
+};
+
+enum cmd_mode {
+	CMD_MODE_40b_AB	= 0,
+	CMD_MODE_40b_BA	= 1,
+	CMD_MODE_32b	= 4,
+};
+
+enum cmd_ctx_swap {
+	CMD_CTX_SWAP = 0,
+	CMD_CTX_NO_SWAP = 3,
+};
+
+#define OP_LCSR_BASE	0x0fc00000000ULL
+#define OP_LCSR_A_SRC	0x000000003ffULL
+#define OP_LCSR_B_SRC	0x000000ffc00ULL
+#define OP_LCSR_WRITE	0x00000200000ULL
+#define OP_LCSR_ADDR	0x001ffc00000ULL
+
+enum lcsr_wr_src {
+	LCSR_WR_AREG,
+	LCSR_WR_BREG,
+	LCSR_WR_IMM,
+};
+
+#define OP_CARB_BASE	0x0e000000000ULL
+#define OP_CARB_OR	0x00000010000ULL
+
+#endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf.h b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
new file mode 100644
index 000000000000..8fa9ff28ba80
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
@@ -0,0 +1,112 @@
+/*
+ * Copyright (C) 2016 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_BPF_H__
+#define __NFP_BPF_H__ 1
+
+#include <linux/bitfield.h>
+#include <linux/bpf.h>
+#include <linux/list.h>
+#include <linux/types.h>
+
+#define FIT_FIELD(mask, val)  (!((((u64)(val)) << compile_ffs64(mask)) & ~(mask)))
+
+/* For branch fixup logic use the top byte of the branch instruction as scratch
+ * area.  Remember to clear this before sending instructions to HW!
+ */
+#define OP_BR_SPECIAL	0xff00000000000000ULL
+
+enum br_special {
+	OP_BR_NORMAL = 0,
+	OP_BR_GO_OUT,
+	OP_BR_GO_ABORT,
+};
+
+struct nfp_prog;
+struct nfp_insn_meta;
+typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
+
+/**
+ * struct nfp_insn_meta - BPF instruction wrapper
+ * @insn: BPF instruction
+ * @off: index of first generated machine instruction (in nfp_prog.prog)
+ * @skip: skip this instruction (optimized out)
+ * @double_cb: callback for second part of the instruction
+ * @l: link on nfp_prog->insns list
+ */
+struct nfp_insn_meta {
+	struct bpf_insn insn;
+	unsigned int off;
+	bool skip;
+	instr_cb_t double_cb;
+
+	struct list_head l;
+};
+
+/**
+ * struct nfp_prog - nfp BPF program
+ * @prog: machine code
+ * @prog_len: number of valid instructions in @prog array
+ * @__prog_alloc_len: alloc size of @prog array
+ * @start_off: address of the first instruction in the memory
+ * @tgt_out: jump target for normal exit
+ * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
+ * @n_translated: number of successfully translated instructions (for errors)
+ * @error: error code if something went wrong
+ * @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
+ */
+struct nfp_prog {
+	u64 *prog;
+	unsigned int prog_len;
+	unsigned int __prog_alloc_len;
+
+	unsigned int start_off;
+	unsigned int tgt_out;
+	unsigned int tgt_abort;
+
+	unsigned int n_translated;
+	int error;
+
+	struct list_head insns;
+};
+
+struct nfp_bpf_result {
+	unsigned int n_instr;
+};
+
+int
+nfp_bpf_jit(struct bpf_prog *filter, void *prog, unsigned int prog_start,
+	    unsigned int tgt_out, unsigned int tgt_abort,
+	    unsigned int prog_sz, struct nfp_bpf_result *res);
+
+#endif
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
new file mode 100644
index 000000000000..d7eecfceba5c
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -0,0 +1,937 @@
+/*
+ * Copyright (C) 2016 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
+
+#include "nfp_asm.h"
+#include "nfp_bpf.h"
+
+#define REG_PKT_N	31
+#define REG_PKT_BANK	ALU_DST_A
+#define REG_LEN_N	31
+#define REG_LEN_BANK	ALU_DST_B
+
+#define REG_IMM0_N	30 /* Bank AB */
+#define REG_QNUM	29 /* Bank AB */
+
+/* --- NFP prog --- */
+/* The foreach "multiple" entries macros provide pos and next<n> pointers.
+ * It's safe to modify the next pointers (but not pos).
+ */
+#define nfp_for_each_insn_walk2(nfp_prog, pos, next)			\
+	for (pos = list_first_entry(&(nfp_prog)->insns, typeof(*pos), l), \
+	     next = list_next_entry(pos, l);			\
+	     &(nfp_prog)->insns != &pos->l &&			\
+	     &(nfp_prog)->insns != &next->l;			\
+	     pos = nfp_meta_next(pos),				\
+	     next = nfp_meta_next(pos))
+
+#define nfp_for_each_insn_walk3(nfp_prog, pos, next, next2)		\
+	for (pos = list_first_entry(&(nfp_prog)->insns, typeof(*pos), l), \
+	     next = list_next_entry(pos, l),			\
+	     next2 = list_next_entry(next, l);			\
+	     &(nfp_prog)->insns != &pos->l &&			\
+	     &(nfp_prog)->insns != &next->l &&			\
+	     &(nfp_prog)->insns != &next2->l;			\
+	     pos = nfp_meta_next(pos),				\
+	     next = nfp_meta_next(pos),				\
+	     next2 = nfp_meta_next(next))
+
+#define nfp_meta_next(meta)	list_next_entry(meta, l)
+#define nfp_meta_prev(meta)	list_prev_entry(meta, l)
+
+static bool
+nfp_meta_has_next(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return meta->l.next != &nfp_prog->insns;
+}
+
+static bool
+nfp_meta_has_prev(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return meta->l.prev != &nfp_prog->insns;
+}
+
+static void nfp_prog_free(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta, *tmp;
+
+	list_for_each_entry_safe(meta, tmp, &nfp_prog->insns, l) {
+		list_del(&meta->l);
+		kfree(meta);
+	}
+	kfree(nfp_prog);
+}
+
+static struct nfp_insn_meta *
+nfp_prog_new_instr(struct nfp_prog *nfp_prog, const struct bpf_insn *insn)
+{
+	struct nfp_insn_meta *meta;
+
+	meta = kzalloc(sizeof(*meta), GFP_KERNEL);
+	if (!meta)
+		return NULL;
+
+	meta->insn = *insn;
+
+	list_add_tail(&meta->l, &nfp_prog->insns);
+
+	return meta;
+}
+
+static void nfp_prog_push(struct nfp_prog *nfp_prog, u64 insn)
+{
+	if (nfp_prog->__prog_alloc_len == nfp_prog->prog_len) {
+		nfp_prog->error = -ENOSPC;
+		return;
+	}
+
+	nfp_prog->prog[nfp_prog->prog_len] = insn;
+	nfp_prog->prog_len++;
+}
+
+static unsigned int nfp_prog_current_offset(struct nfp_prog *nfp_prog)
+{
+	return nfp_prog->start_off + nfp_prog->prog_len;
+}
+
+static unsigned int
+nfp_prog_offset_to_index(struct nfp_prog *nfp_prog, unsigned int offset)
+{
+	return offset - nfp_prog->start_off;
+}
+
+/* --- Emitters --- */
+static const struct cmd_tgt_act cmd_tgt_act[__CMD_TGT_MAP_SIZE] = {
+	[CMD_TGT_WRITE8] =		{ 0x00, 0x42 },
+	[CMD_TGT_READ8] =		{ 0x01, 0x43 },
+	[CMD_TGT_READ_LE] =		{ 0x01, 0x40 },
+	[CMD_TGT_READ_SWAP_LE] =	{ 0x03, 0x40 },
+};
+
+static void
+__emit_cmd(struct nfp_prog *nfp_prog, enum cmd_tgt_map op,
+	   u8 mode, u8 xfer, u8 areg, u8 breg, u8 size, bool sync)
+{
+	enum cmd_ctx_swap ctx;
+	u64 insn;
+
+	if (sync)
+		ctx = CMD_CTX_SWAP;
+	else
+		ctx = CMD_CTX_NO_SWAP;
+
+	insn =	FIELD_PUT64(OP_CMD_A_SRC, areg) |
+		FIELD_PUT64(OP_CMD_CTX, ctx) |
+		FIELD_PUT64(OP_CMD_B_SRC, breg) |
+		FIELD_PUT64(OP_CMD_TOKEN, cmd_tgt_act[op].token) |
+		FIELD_PUT64(OP_CMD_XFER, xfer) |
+		FIELD_PUT64(OP_CMD_CNT, size) |
+		FIELD_PUT64(OP_CMD_SIG, sync) |
+		FIELD_PUT64(OP_CMD_TGT_CMD, cmd_tgt_act[op].tgt_cmd) |
+		FIELD_PUT64(OP_CMD_MODE, mode);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static void
+__emit_br(struct nfp_prog *nfp_prog, enum br_mask mask, enum br_ev_pip ev_pip,
+	  enum br_ctx_signal_state css, u16 addr)
+{
+	u16 addr_lo, addr_hi;
+	u64 insn;
+
+	addr_lo = addr & (OP_BR_ADDR_LO >> compile_ffs64(OP_BR_ADDR_LO));
+	addr_hi = addr != addr_lo;
+
+	insn = OP_BR_BASE |
+		FIELD_PUT64(OP_BR_MASK, mask) |
+		FIELD_PUT64(OP_BR_EV_PIP, ev_pip) |
+		FIELD_PUT64(OP_BR_CSS, css) |
+		FIELD_PUT64(OP_BR_ADDR_LO, addr_lo) |
+		FIELD_PUT64(OP_BR_ADDR_HI, addr_hi);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static bool pack_immed(u32 imm, u16 *val, enum immed_shift *shift)
+{
+	if (!(imm & 0xffff0000)) {
+		*val = imm;
+		*shift = IMMED_SHIFT_0B;
+	} else if (!(imm & 0xff0000ff)) {
+		*val = imm >> 8;
+		*shift = IMMED_SHIFT_1B;
+	} else if (!(imm & 0x0000ffff)) {
+		*val = imm >> 16;
+		*shift = IMMED_SHIFT_2B;
+	} else {
+		return false;
+	}
+
+	return true;
+}
+
+static void
+__emit_immed(struct nfp_prog *nfp_prog, u8 dst, enum alu_dst_ab dst_ab,
+	     u32 imm, bool wr_both)
+{
+	enum immed_shift shift;
+	bool invert = false;
+	u16 areg, breg;
+	u64 insn;
+	u16 val;
+
+	if (!pack_immed(imm, &val, &shift)) {
+		if (!pack_immed(~imm, &val, &shift)) {
+			nfp_prog->error = -EFAULT; /* TODO */
+			return;
+		}
+		invert = true;
+	}
+
+	if (dst_ab == ALU_DST_A) {
+		areg = dst;
+		breg = UR_REG_IMM | (val & 0xff);
+	} else {
+		areg = UR_REG_IMM | (val & 0xff);
+		breg = dst;
+	}
+
+	insn = OP_IMMED_BASE |
+		FIELD_PUT64(OP_IMMED_A_SRC, areg) |
+		FIELD_PUT64(OP_IMMED_B_SRC, breg) |
+		FIELD_PUT64(OP_IMMED_IMM, (val >> 8) & 0xff) |
+		FIELD_PUT64(OP_IMMED_WIDTH, IMMED_WIDTH_ALL) |
+		FIELD_PUT64(OP_IMMED_INV, invert) |
+		FIELD_PUT64(OP_IMMED_SHIFT, shift) |
+		FIELD_PUT64(OP_IMMED_WR_AB, wr_both);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static void
+__emit_shf(struct nfp_prog *nfp_prog, u16 dst, enum alu_dst_ab dst_ab,
+	   enum shf_sc sc, u8 shift,
+	   u16 areg, enum shf_op op, u16 breg, bool sw, bool wr_both)
+{
+	u64 insn;
+
+	if (!FIT_FIELD(OP_SHF_SHIFT, shift)) {
+		nfp_prog->error = -EFAULT;
+		return;
+	}
+
+	if (sc == SHF_SC_L_SHF) {
+		shift = 32 - shift;
+	} else if (sc != SHF_SC_R_SHF) {
+		nfp_prog->error = -EFAULT; /* TODO */
+		return;
+	}
+
+	insn = OP_SHF_BASE |
+		FIELD_PUT64(OP_SHF_A_SRC, areg) |
+		FIELD_PUT64(OP_SHF_SC, sc) |
+		FIELD_PUT64(OP_SHF_B_SRC, breg) |
+		FIELD_PUT64(OP_SHF_SW, sw) |
+		FIELD_PUT64(OP_SHF_DST, dst) |
+		FIELD_PUT64(OP_SHF_SHIFT, shift) |
+		FIELD_PUT64(OP_SHF_OP, op) |
+		FIELD_PUT64(OP_SHF_DST_AB, dst_ab) |
+		FIELD_PUT64(OP_SHF_WR_AB, wr_both);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+static void
+__emit_alu(struct nfp_prog *nfp_prog, u16 dst, enum alu_dst_ab dst_ab,
+	   u16 areg, enum alu_op op, u16 breg, bool swap, bool wr_both)
+{
+	u64 insn;
+
+	insn = OP_ALU_BASE |
+		FIELD_PUT64(OP_ALU_A_SRC, areg) |
+		FIELD_PUT64(OP_ALU_B_SRC, breg) |
+		FIELD_PUT64(OP_ALU_DST, dst) |
+		FIELD_PUT64(OP_ALU_SW, swap) |
+		FIELD_PUT64(OP_ALU_OP, op) |
+		FIELD_PUT64(OP_ALU_DST_AB, dst_ab) |
+		FIELD_PUT64(OP_ALU_WR_AB, wr_both);
+
+	nfp_prog_push(nfp_prog, insn);
+}
+
+/* --- Wrappers --- */
+/* ur_load_imm_any() - encode immediate or use tmp register (unrestricted)
+ * If the @imm is small enough, encode it directly in the operand and return;
+ * otherwise load @imm into a spare register and return its encoding.
+ */
+static u16
+ur_load_imm_any(struct nfp_prog *nfp_prog, u32 imm, u16 reg,
+		enum alu_dst_ab alu_dst)
+{
+	if (FIT_FIELD(UR_REG_IMM_MAX, imm))
+		return UR_REG_IMM | imm;
+
+	__emit_immed(nfp_prog, reg, alu_dst, imm, false);
+	return reg;
+}
+
+/* re_load_imm_any() - encode immediate or use tmp register (restricted)
+ * If the @imm is small enough, encode it directly in the operand and return;
+ * otherwise load @imm into a spare register and return its encoding.
+ */
+static u16
+re_load_imm_any(struct nfp_prog *nfp_prog, u32 imm, u16 reg,
+		enum alu_dst_ab alu_dst)
+{
+	if (FIT_FIELD(RE_REG_IMM_MAX, imm))
+		return RE_REG_IMM(imm);
+
+	__emit_immed(nfp_prog, reg, alu_dst, imm, false);
+	return reg;
+}
+
+static void wrp_br(struct nfp_prog *nfp_prog, enum br_mask mask, u16 addr)
+{
+	__emit_br(nfp_prog, mask,
+		  mask != BR_UNC ? BR_EV_PIP_COND : BR_EV_PIP_UNCOND,
+		  BR_CSS_NONE, addr);
+}
+
+static void
+wrp_br_special(struct nfp_prog *nfp_prog, enum br_mask mask,
+	       enum br_special special)
+{
+	wrp_br(nfp_prog, mask, 0);
+
+	nfp_prog->prog[nfp_prog->prog_len - 1] |=
+		FIELD_PUT64(OP_BR_SPECIAL, special);
+}
+
+static void wrp_reg_mov(struct nfp_prog *nfp_prog, u16 dst, u16 src)
+{
+	__emit_alu(nfp_prog, dst, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
+		   false, true);
+}
+
+static void wrp_reg_xor(struct nfp_prog *nfp_prog, u16 dst, u16 src)
+{
+	__emit_alu(nfp_prog, dst, ALU_DST_A, dst, ALU_OP_XOR, src, false, true);
+}
+
+static int
+construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset,
+		      u16 src, bool src_valid, u8 size)
+{
+	unsigned int i;
+	u16 shift, sz;
+	u16 imm_reg;
+
+	/* We load the value from the address indicated in @offset and then
+	 * shift out the data we don't need.  Note: this is big endian!
+	 */
+	sz = size < 4 ? 4 : size;
+	shift = size < 4 ? 4 - size : 0;
+
+	if (src_valid) {
+		/* Calculate the true offset (src_reg + imm) */
+		imm_reg = ur_load_imm_any(nfp_prog, offset,
+					  REG_IMM0_N, ALU_DST_B);
+		__emit_alu(nfp_prog, REG_IMM0_N, ALU_DST_A,
+			   src, ALU_OP_ADD, imm_reg, false, true);
+		/* Check packet length (size guaranteed to fit b/c it's u8) */
+		__emit_alu(nfp_prog, REG_IMM0_N, ALU_DST_A,
+			   REG_IMM0_N, ALU_OP_ADD, UR_REG_IMM | size,
+			   false, false);
+		__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
+			   REG_IMM0_N, ALU_OP_SUB, REG_LEN_N, true, false);
+		wrp_br_special(nfp_prog, BR_BLO, OP_BR_GO_ABORT);
+		/* Load data */
+		__emit_cmd(nfp_prog, CMD_TGT_READ8, CMD_MODE_32b, 0,
+			   REG_PKT_N, REG_IMM0_N, sz - 1, true);
+	} else {
+		/* Check packet length */
+		imm_reg = ur_load_imm_any(nfp_prog, offset + size,
+					  REG_IMM0_N, ALU_DST_A);
+		__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
+			   imm_reg, ALU_OP_SUB, REG_LEN_N, true, false);
+		wrp_br_special(nfp_prog, BR_BLO, OP_BR_GO_ABORT);
+		/* Load data */
+		imm_reg = re_load_imm_any(nfp_prog, offset,
+					  REG_IMM0_N, ALU_DST_B);
+		__emit_cmd(nfp_prog, CMD_TGT_READ8, CMD_MODE_32b, 0,
+			   REG_PKT_N, imm_reg, sz - 1, true);
+	}
+
+	i = 0;
+	if (shift)
+		__emit_shf(nfp_prog, 0, ALU_DST_A, SHF_SC_R_SHF, shift * 8,
+			   REG_NONE, SHF_OP_NONE, RE_REG_XFR | 0, false, true);
+	else
+		for (; i * 4 < size; i++)
+			wrp_reg_mov(nfp_prog, i, UR_REG_XFR | i);
+
+	if (i < 2)
+		__emit_immed(nfp_prog, 1, ALU_DST_A, 0, true);
+
+	return 0;
+}
+
+static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
+{
+	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
+}
+
+static int
+construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
+		 enum alu_op alu_op, bool sw)
+{
+	u16 imm_reg;
+
+	imm_reg = ur_load_imm_any(nfp_prog, imm, REG_IMM0_N, ALU_DST_B);
+
+	__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
+		   dst, alu_op, imm_reg, sw, false);
+	wrp_br(nfp_prog, br, off);
+
+	return 0;
+}
+
+static int
+wrp_jmp_imm(struct nfp_prog *nfp_prog, const struct bpf_insn *insn,
+	    enum br_mask mask, enum alu_op alu_op, bool sw)
+{
+	if (insn->off < 0) /* TODO */
+		return -ENOTSUPP;
+	construct_br_imm(nfp_prog, insn->imm, insn->dst_reg * 2,
+			 mask, insn->off, alu_op, sw);
+	return 0;
+}
+
+static int
+wrp_jmp_reg(struct nfp_prog *nfp_prog, const struct bpf_insn *insn,
+	    enum br_mask mask, enum alu_op alu_op, bool sw)
+{
+	if (insn->off < 0) /* TODO */
+		return -ENOTSUPP;
+	__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
+		   insn->src_reg * 2, alu_op, insn->dst_reg * 2, sw, false);
+	wrp_br(nfp_prog, mask, insn->off);
+
+	return 0;
+}
+
+/* --- Callbacks --- */
+static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	wrp_reg_mov(nfp_prog, insn->dst_reg * 2, insn->src_reg * 2);
+	wrp_reg_mov(nfp_prog, insn->dst_reg * 2 + 1, insn->src_reg * 2 + 1);
+
+	return 0;
+}
+
+static int xor_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	wrp_reg_xor(nfp_prog, insn->dst_reg * 2, insn->src_reg * 2);
+	wrp_reg_xor(nfp_prog, insn->dst_reg * 2 + 1, insn->src_reg * 2 + 1);
+
+	return 0;
+}
+
+static int data_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ld(nfp_prog, meta->insn.imm, 1);
+}
+
+static int data_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ld(nfp_prog, meta->insn.imm, 2);
+}
+
+static int data_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ld(nfp_prog, meta->insn.imm, 4);
+}
+
+static int data_ind_ld1(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+				     meta->insn.src_reg * 2, true, 1);
+}
+
+static int data_ind_ld2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+				     meta->insn.src_reg * 2, true, 2);
+}
+
+static int data_ind_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return construct_data_ind_ld(nfp_prog, meta->insn.imm,
+				     meta->insn.src_reg * 2, true, 4);
+}
+
+static int mem_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	if (meta->insn.off == offsetof(struct sk_buff, len))
+		__emit_alu(nfp_prog, meta->insn.dst_reg * 2, ALU_DST_A,
+			   REG_NONE, ALU_OP_NONE, REG_LEN_N, false, true);
+	else
+		return -ENOTSUPP;
+
+	return 0;
+}
+
+static int imm_ld8_part2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	__emit_immed(nfp_prog, nfp_meta_prev(meta)->insn.dst_reg * 2 + 1,
+		     ALU_DST_A, meta->insn.imm, true);
+
+	return 0;
+}
+
+static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	meta->double_cb = imm_ld8_part2;
+	__emit_immed(nfp_prog, insn->dst_reg * 2, ALU_DST_A, insn->imm, true);
+
+	return 0;
+}
+
+static int and_immX(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+	u16 imm_reg;
+
+	imm_reg = ur_load_imm_any(nfp_prog, insn->imm, REG_IMM0_N, ALU_DST_B);
+
+	__emit_alu(nfp_prog, insn->dst_reg * 2, ALU_DST_A,
+		   insn->dst_reg * 2, ALU_OP_AND, imm_reg, false, true);
+	/* Zero the upper part - imm is just 32b */
+	if (BPF_CLASS(insn->code) == BPF_ALU64)
+		__emit_immed(nfp_prog, insn->dst_reg * 2 + 1,
+			     ALU_DST_A, 0, true);
+
+	return 0;
+}
+
+static int shl_imm32(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	__emit_shf(nfp_prog, insn->dst_reg * 2, ALU_DST_A,
+		   SHF_SC_L_SHF, insn->imm,
+		   REG_NONE, SHF_OP_NONE, insn->dst_reg * 2, false, true);
+
+	return 0;
+}
+
+static int mov_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	const struct bpf_insn *insn = &meta->insn;
+
+	__emit_immed(nfp_prog, insn->dst_reg * 2, ALU_DST_A, insn->imm, true);
+	/* We need to zero-extend the movs according to ABI */
+	__emit_immed(nfp_prog, insn->dst_reg * 2 + 1, ALU_DST_A, 0, true);
+
+	return 0;
+}
+
+/* Note to self - 'ja' is unconditional jump in BPF speak, not Jump Above... */
+static int ja_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	if (meta->insn.off < 0) /* TODO */
+		return -ENOTSUPP;
+	wrp_br(nfp_prog, BR_UNC, meta->insn.off);
+
+	return 0;
+}
+
+static int jeq_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_imm(nfp_prog, &meta->insn, BR_BEQ, ALU_OP_SUB, false);
+}
+
+static int jgt_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_imm(nfp_prog, &meta->insn, BR_BLO, ALU_OP_SUB, false);
+}
+
+static int jge_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_imm(nfp_prog, &meta->insn, BR_BHS, ALU_OP_SUB, true);
+}
+
+static int jset_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_imm(nfp_prog, &meta->insn, BR_BNE, ALU_OP_AND, false);
+}
+
+static int jne_imm(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_imm(nfp_prog, &meta->insn, BR_BNE, ALU_OP_SUB, false);
+}
+
+static int jeq_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_reg(nfp_prog, &meta->insn, BR_BEQ, ALU_OP_SUB, false);
+}
+
+static int jgt_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_reg(nfp_prog, &meta->insn, BR_BLO, ALU_OP_SUB, false);
+}
+
+static int jge_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_reg(nfp_prog, &meta->insn, BR_BHS, ALU_OP_SUB, true);
+}
+
+static int jset_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_reg(nfp_prog, &meta->insn, BR_BNE, ALU_OP_AND, false);
+}
+
+static int jne_reg(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	return wrp_jmp_reg(nfp_prog, &meta->insn, BR_BNE, ALU_OP_SUB, false);
+}
+
+static int goto_out(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	wrp_br_special(nfp_prog, BR_UNC, OP_BR_GO_OUT);
+
+	return 0;
+}
+
+static const instr_cb_t instr_cb[256] = {
+	[BPF_ALU64 | BPF_MOV | BPF_X] =	mov_reg64,
+	[BPF_ALU64 | BPF_MOV | BPF_K] =	mov_imm, /* imm is always 32b */
+	[BPF_ALU64 | BPF_XOR | BPF_X] =	xor_reg64,
+	[BPF_ALU64 | BPF_AND | BPF_K] =	and_immX,
+	[BPF_ALU | BPF_MOV | BPF_K] =	mov_imm,
+	[BPF_ALU | BPF_AND | BPF_K] =	and_immX,
+	[BPF_ALU | BPF_LSH | BPF_K] =	shl_imm32,
+	[BPF_LDX | BPF_MEM | BPF_W] =	mem_ld4,
+	[BPF_LD | BPF_IMM | BPF_DW] =	imm_ld8,
+	[BPF_LD | BPF_ABS | BPF_B] =	data_ld1,
+	[BPF_LD | BPF_ABS | BPF_H] =	data_ld2,
+	[BPF_LD | BPF_ABS | BPF_W] =	data_ld4,
+	[BPF_LD | BPF_IND | BPF_B] =	data_ind_ld1,
+	[BPF_LD | BPF_IND | BPF_H] =	data_ind_ld2,
+	[BPF_LD | BPF_IND | BPF_W] =	data_ind_ld4,
+	[BPF_JMP | BPF_JA | BPF_K] =	ja_imm,
+	[BPF_JMP | BPF_JEQ | BPF_K] =	jeq_imm,
+	[BPF_JMP | BPF_JGT | BPF_K] =	jgt_imm,
+	[BPF_JMP | BPF_JGE | BPF_K] =	jge_imm,
+	[BPF_JMP | BPF_JSET | BPF_K] =	jset_imm,
+	[BPF_JMP | BPF_JNE | BPF_K] =	jne_imm,
+	[BPF_JMP | BPF_JEQ | BPF_X] =	jeq_reg,
+	[BPF_JMP | BPF_JGT | BPF_X] =	jgt_reg,
+	[BPF_JMP | BPF_JGE | BPF_X] =	jge_reg,
+	[BPF_JMP | BPF_JSET | BPF_X] =	jset_reg,
+	[BPF_JMP | BPF_JNE | BPF_X] =	jne_reg,
+	[BPF_JMP | BPF_EXIT] =		goto_out,
+};
+
+/* --- Misc code --- */
+static void br_set_offset(u64 *instr, u16 offset)
+{
+	u16 addr_lo, addr_hi;
+
+	addr_lo = offset & (OP_BR_ADDR_LO >> compile_ffs64(OP_BR_ADDR_LO));
+	addr_hi = offset != addr_lo;
+	*instr &= ~(OP_BR_ADDR_HI | OP_BR_ADDR_LO);
+	*instr |= FIELD_PUT64(OP_BR_ADDR_HI, addr_hi);
+	*instr |= FIELD_PUT64(OP_BR_ADDR_LO, addr_lo);
+}
+
+/* --- Assembler logic --- */
+static int nfp_fixup_branches(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta, *next;
+	u32 off, br_idx;
+
+	nfp_for_each_insn_walk2(nfp_prog, meta, next) {
+		if (BPF_CLASS(meta->insn.code) != BPF_JMP)
+			continue;
+
+		br_idx = nfp_prog_offset_to_index(nfp_prog, next->off) - 1;
+		if ((nfp_prog->prog[br_idx] & OP_BR_BASE_MASK) != OP_BR_BASE) {
+			pr_err("Fixup found block not ending in branch %d %02x %016llx!!\n",
+			       br_idx, meta->insn.code, nfp_prog->prog[br_idx]);
+			return -ELOOP;
+		}
+		/* Leave special branches for later */
+		if (FIELD_GET64(OP_BR_SPECIAL, nfp_prog->prog[br_idx]))
+			continue;
+
+		/* Find the target offset in assembler realm */
+		off = meta->insn.off;
+		if (!off) {
+			pr_err("Fixup found zero offset!!\n");
+			return -ELOOP;
+		}
+
+		while (off && nfp_meta_has_next(nfp_prog, next)) {
+			next = nfp_meta_next(next);
+			off--;
+		}
+		if (off) {
+			pr_err("Fixup found too large jump!! %d\n", off);
+			return -ELOOP;
+		}
+
+		if (next->skip) {
+			pr_err("Branch landing on removed instruction!!\n");
+			return -ELOOP;
+		}
+
+		br_set_offset(&nfp_prog->prog[br_idx], next->off);
+	}
+
+	/* Fixup 'goto out's separately, they can be scattered around */
+	for (br_idx = 0; br_idx < nfp_prog->prog_len; br_idx++) {
+		enum br_special special;
+
+		if ((nfp_prog->prog[br_idx] & OP_BR_BASE_MASK) != OP_BR_BASE)
+			continue;
+
+		special = FIELD_GET64(OP_BR_SPECIAL, nfp_prog->prog[br_idx]);
+		switch (special) {
+		case OP_BR_NORMAL:
+			break;
+		case OP_BR_GO_OUT:
+			br_set_offset(&nfp_prog->prog[br_idx],
+				      nfp_prog->tgt_out);
+			break;
+		case OP_BR_GO_ABORT:
+			br_set_offset(&nfp_prog->prog[br_idx],
+				      nfp_prog->tgt_abort);
+			break;
+		}
+
+		nfp_prog->prog[br_idx] &= ~OP_BR_SPECIAL;
+	}
+
+	return 0;
+}
+
+static int nfp_translate(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta;
+	int err;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		instr_cb_t cb = instr_cb[meta->insn.code];
+
+		meta->off = nfp_prog_current_offset(nfp_prog);
+
+		if (meta->skip) {
+			nfp_prog->n_translated++;
+			continue;
+		}
+
+		if (nfp_meta_has_prev(nfp_prog, meta) &&
+		    nfp_meta_prev(meta)->double_cb)
+			cb = nfp_meta_prev(meta)->double_cb;
+		if (!cb)
+			return -ENOENT;
+		err = cb(nfp_prog, meta);
+		if (err)
+			return err;
+
+		if (nfp_prog->error)
+			return nfp_prog->error;
+
+		nfp_prog->n_translated++;
+	}
+
+	return nfp_fixup_branches(nfp_prog);
+}
+
+static int
+nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
+		 unsigned int cnt)
+{
+	unsigned int i;
+
+	for (i = 0; i < cnt; i++)
+		if (!nfp_prog_new_instr(nfp_prog, prog + i))
+			return -ENOMEM;
+
+	return 0;
+}
+
+/* --- Optimizations --- */
+/* Remove masking after load since our load guarantees this is not needed */
+static void nfp_bpf_opt_ld_mask(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta1, *meta2;
+	const s32 exp_mask[] = {
+		[BPF_B] = 0x000000ffU,
+		[BPF_H] = 0x0000ffffU,
+		[BPF_W] = 0xffffffffU,
+	};
+
+	nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) {
+		struct bpf_insn insn, next;
+
+		insn = meta1->insn;
+		next = meta2->insn;
+
+		if (BPF_CLASS(insn.code) != BPF_LD)
+			continue;
+		if (BPF_MODE(insn.code) != BPF_ABS &&
+		    BPF_MODE(insn.code) != BPF_IND)
+			continue;
+
+		if (next.code != (BPF_ALU64 | BPF_AND | BPF_K))
+			continue;
+
+		if (!exp_mask[BPF_SIZE(insn.code)])
+			continue;
+		if (exp_mask[BPF_SIZE(insn.code)] != next.imm)
+			continue;
+
+		if (next.src_reg || next.dst_reg)
+			continue;
+
+		meta2->skip = true;
+	}
+}
+
+static void nfp_bpf_opt_ld_shift(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta1, *meta2, *meta3;
+
+	nfp_for_each_insn_walk3(nfp_prog, meta1, meta2, meta3) {
+		struct bpf_insn insn, next1, next2;
+
+		insn = meta1->insn;
+		next1 = meta2->insn;
+		next2 = meta3->insn;
+
+		if (BPF_CLASS(insn.code) != BPF_LD)
+			continue;
+		if (BPF_MODE(insn.code) != BPF_ABS &&
+		    BPF_MODE(insn.code) != BPF_IND)
+			continue;
+		if (BPF_SIZE(insn.code) != BPF_W)
+			continue;
+
+		if (!(next1.code == (BPF_LSH | BPF_K | BPF_ALU64) &&
+		      next2.code == (BPF_RSH | BPF_K | BPF_ALU64)) &&
+		    !(next1.code == (BPF_RSH | BPF_K | BPF_ALU64) &&
+		      next2.code == (BPF_LSH | BPF_K | BPF_ALU64)))
+			continue;
+
+		if (next1.src_reg || next1.dst_reg ||
+		    next2.src_reg || next2.dst_reg)
+			continue;
+
+		if (next1.imm != 0x20 || next2.imm != 0x20)
+			continue;
+
+		meta2->skip = true;
+		meta3->skip = true;
+	}
+}
+
+static void nfp_bpf_optimize(struct nfp_prog *nfp_prog)
+{
+	nfp_bpf_opt_ld_mask(nfp_prog);
+	nfp_bpf_opt_ld_shift(nfp_prog);
+}
+
+/**
+ * nfp_bpf_jit() - translate BPF code into NFP assembly
+ * @filter:	kernel BPF filter struct
+ * @prog_mem:	memory to store assembler instructions
+ * @prog_start:	offset of the first instruction when loaded
+ * @tgt_out:	where to jump on clean exit
+ * @tgt_abort:	where to jump on abort (i.e. access beyond end of packet)
+ * @prog_sz:	size of @prog_mem in instructions
+ * @res:	achieved parameters of the translation (e.g. instruction count)
+ */
+int
+nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
+	    unsigned int tgt_out, unsigned int tgt_abort,
+	    unsigned int prog_sz, struct nfp_bpf_result *res)
+{
+	struct nfp_prog *nfp_prog;
+	int ret;
+
+	/* TODO: maybe make this dependent on bpf_jit_enable? */
+
+	nfp_prog = kzalloc(sizeof(*nfp_prog), GFP_KERNEL);
+	if (!nfp_prog)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&nfp_prog->insns);
+	nfp_prog->start_off = prog_start;
+	nfp_prog->tgt_out = tgt_out;
+	nfp_prog->tgt_abort = tgt_abort;
+
+	ret = nfp_prog_prepare(nfp_prog, filter->insnsi, filter->len);
+	if (ret)
+		goto out;
+
+	nfp_bpf_optimize(nfp_prog);
+
+	nfp_prog->prog = prog_mem;
+	nfp_prog->__prog_alloc_len = prog_sz;
+
+	ret = nfp_translate(nfp_prog);
+	if (ret) {
+		pr_err("Translation failed with error %d (translated: %u)\n",
+		       ret, nfp_prog->n_translated);
+		ret = -EINVAL;
+	}
+
+	res->n_instr = nfp_prog->prog_len;
+out:
+	nfp_prog_free(nfp_prog);
+
+	return ret;
+}
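
For reference, the expected calling sequence looks roughly like the
sketch below.  This is illustrative only -- the real caller arrives in
the next patch, and buffer allocation/DMA handling is elided here
(load_onto_card() is a hypothetical stand-in):

	struct nfp_bpf_result res;
	void *code;	/* DMA-coherent buffer of prog_sz * 8 bytes */
	int err;

	err = nfp_bpf_jit(filter, code, start_off, tgt_out, tgt_abort,
			  prog_sz, &res);
	if (!err)
		/* res.n_instr instruction words are now in the buffer */
		load_onto_card(code, res.n_instr);	/* hypothetical */
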
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (4 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 05/12] nfp: add BPF to NFP code translator Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 20:20   ` Daniel Borkmann
  2016-06-01 23:03   ` Daniel Borkmann
  2016-06-01 16:50 ` [RFC 07/12] nfp: add skb mark support to the bpf offload Jakub Kicinski
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Add hardware cls_bpf offload on our smart NICs.  Detect whether
capable firmware is loaded and, if so, use the just-added translator
to JIT the BPF code and load it onto the card's programmable engines.
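
Condensed, the load sequence implemented here is as follows (a sketch
of the steps only, error handling elided; the full version lives in
nfp_net_bpf_load_and_start() below):

	/* 1) point the FW at the JITed code and ask it to load it */
	nn_writel(nn, NFP_NET_CFG_BPF_SIZE, n_instr * sizeof(u64));
	nn_writeq(nn, NFP_NET_CFG_BPF_ADDR, dma_addr);
	nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_BPF);

	/* 2) enable passing packets through the BPF function */
	nn->ctrl |= NFP_NET_CFG_CTRL_BPF;
	nn_writel(nn, NFP_NET_CFG_CTRL, nn->ctrl);
	nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_GEN);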

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile        |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |  13 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |  30 +++-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  18 +-
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   | 191 +++++++++++++++++++++
 5 files changed, 249 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 46648404e750..6d1a83cd17e8 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_NFP_NETVF)	+= nfp_netvf.o
 nfp_netvf-objs := \
 	    nfp_net_common.o \
 	    nfp_net_ethtool.o \
+	    nfp_net_offload.o \
 	    nfp_netvf_main.o \
 	    nfp_bpf_jit.o
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index e744acc18ef4..784287e9ccaa 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -413,6 +413,7 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
  * @is_vf:              Is the driver attached to a VF?
  * @is_nfp3200:         Is the driver for a NFP-3200 card?
  * @fw_loaded:          Is the firmware loaded?
+ * @bpf_offload_skip_sw:    Offloaded BPF program will not be rerun by cls_bpf
  * @ctrl:               Local copy of the control register/word.
  * @fl_bufsz:           Currently configured size of the freelist buffers
  * @rx_offset:		Offset in the RX buffers where packet data starts
@@ -473,6 +474,7 @@ struct nfp_net {
 	unsigned is_vf:1;
 	unsigned is_nfp3200:1;
 	unsigned fw_loaded:1;
+	unsigned bpf_offload_skip_sw:1;
 
 	u32 ctrl;
 	u32 fl_bufsz;
@@ -566,7 +568,12 @@ static inline void nn_writeb(struct nfp_net *nn, int off, u8 val)
 	writeb(val, nn->ctrl_bar + off);
 }
 
-/* NFP-3200 can't handle 16-bit accesses too well - hence no readw/writew */
+/* NFP-3200 can't handle 16-bit accesses too well */
+static inline u16 nn_readw(struct nfp_net *nn, int off)
+{
+	WARN_ON_ONCE(nn->is_nfp3200);
+	return readw(nn->ctrl_bar + off);
+}
 
 static inline u32 nn_readl(struct nfp_net *nn, int off)
 {
@@ -734,6 +741,10 @@ int nfp_net_irqs_alloc(struct nfp_net *nn);
 void nfp_net_irqs_disable(struct nfp_net *nn);
 int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt);
 
+int
+nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
+		    struct tc_cls_bpf_offload *cls_bpf);
+
 #ifdef CONFIG_NFP_NET_DEBUG
 void nfp_net_debugfs_create(void);
 void nfp_net_debugfs_destroy(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index fa47c14c743a..c5acdf703b7f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -61,6 +61,7 @@
 
 #include <linux/ktime.h>
 
+#include <net/pkt_cls.h>
 #include <net/vxlan.h>
 
 #include "nfp_net_ctrl.h"
@@ -2386,6 +2387,21 @@ static struct rtnl_link_stats64 *nfp_net_stat64(struct net_device *netdev,
 	return stats;
 }
 
+static int
+nfp_net_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+		 struct tc_to_netdev *tc)
+{
+	struct nfp_net *nn = netdev_priv(netdev);
+
+	if (TC_H_MAJ(handle) != TC_H_MAJ(TC_H_INGRESS))
+		return -EINVAL;
+
+	if (tc->type == TC_SETUP_CLSBPF && nn->cap & NFP_NET_CFG_CTRL_BPF)
+		return nfp_net_bpf_offload(nn, handle, proto, tc->cls_bpf);
+
+	return -EINVAL;
+}
+
 static int nfp_net_set_features(struct net_device *netdev,
 				netdev_features_t features)
 {
@@ -2440,6 +2456,11 @@ static int nfp_net_set_features(struct net_device *netdev,
 			new_ctrl &= ~NFP_NET_CFG_CTRL_GATHER;
 	}
 
+	if (changed & NETIF_F_HW_TC && nn->ctrl & NFP_NET_CFG_CTRL_BPF) {
+		nn_err(nn, "Cannot disable HW TC offload while in use\n");
+		return -EBUSY;
+	}
+
 	nn_dbg(nn, "Feature change 0x%llx -> 0x%llx (changed=0x%llx)\n",
 	       netdev->features, features, changed);
 
@@ -2583,6 +2604,7 @@ static const struct net_device_ops nfp_net_netdev_ops = {
 	.ndo_stop		= nfp_net_netdev_close,
 	.ndo_start_xmit		= nfp_net_tx,
 	.ndo_get_stats64	= nfp_net_stat64,
+	.ndo_setup_tc		= nfp_net_setup_tc,
 	.ndo_tx_timeout		= nfp_net_tx_timeout,
 	.ndo_set_rx_mode	= nfp_net_set_rx_mode,
 	.ndo_change_mtu		= nfp_net_change_mtu,
@@ -2608,7 +2630,7 @@ void nfp_net_info(struct nfp_net *nn)
 		nn->fw_ver.resv, nn->fw_ver.class,
 		nn->fw_ver.major, nn->fw_ver.minor,
 		nn->max_mtu);
-	nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+	nn_info(nn, "CAP: %#x %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
 		nn->cap,
 		nn->cap & NFP_NET_CFG_CTRL_PROMISC  ? "PROMISC "  : "",
 		nn->cap & NFP_NET_CFG_CTRL_L2BC     ? "L2BCFILT " : "",
@@ -2625,7 +2647,8 @@ void nfp_net_info(struct nfp_net *nn)
 		nn->cap & NFP_NET_CFG_CTRL_MSIXAUTO ? "AUTOMASK " : "",
 		nn->cap & NFP_NET_CFG_CTRL_IRQMOD   ? "IRQMOD "   : "",
 		nn->cap & NFP_NET_CFG_CTRL_VXLAN    ? "VXLAN "    : "",
-		nn->cap & NFP_NET_CFG_CTRL_NVGRE    ? "NVGRE "	  : "");
+		nn->cap & NFP_NET_CFG_CTRL_NVGRE    ? "NVGRE "	  : "",
+		nn->cap & NFP_NET_CFG_CTRL_BPF      ? "BPF "	  : "");
 }
 
 /**
@@ -2793,6 +2816,9 @@ int nfp_net_netdev_init(struct net_device *netdev)
 
 	netdev->features = netdev->hw_features;
 
+	if (nn->cap & NFP_NET_CFG_CTRL_BPF)
+		netdev->hw_features |= NETIF_F_HW_TC;
+
 	/* Advertise but disable TSO by default. */
 	netdev->features &= ~(NETIF_F_TSO | NETIF_F_TSO6);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index ad6c4e31cedd..c26449655419 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -123,6 +123,7 @@
 #define   NFP_NET_CFG_CTRL_L2SWITCH_LOCAL (0x1 << 23) /* Switch to local */
 #define   NFP_NET_CFG_CTRL_VXLAN	  (0x1 << 24) /* VXLAN tunnel support */
 #define   NFP_NET_CFG_CTRL_NVGRE	  (0x1 << 25) /* NVGRE tunnel support */
+#define   NFP_NET_CFG_CTRL_BPF		  (0x1 << 27) /* BPF offload */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
 #define   NFP_NET_CFG_UPDATE_RING         (0x1 <<  1) /* Ring config change */
@@ -134,6 +135,7 @@
 #define   NFP_NET_CFG_UPDATE_RESET        (0x1 <<  7) /* Update due to FLR */
 #define   NFP_NET_CFG_UPDATE_IRQMOD       (0x1 <<  8) /* IRQ mod change */
 #define   NFP_NET_CFG_UPDATE_VXLAN	  (0x1 <<  9) /* VXLAN port change */
+#define   NFP_NET_CFG_UPDATE_BPF	  (0x1 << 10) /* BPF program load */
 #define   NFP_NET_CFG_UPDATE_ERR          (0x1 << 31) /* A error occurred */
 #define NFP_NET_CFG_TXRS_ENABLE         0x0008
 #define NFP_NET_CFG_RXRS_ENABLE         0x0010
@@ -196,7 +198,21 @@
 #define NFP_NET_CFG_VXLAN_SZ		  0x0008
 
 /**
- * 64B reserved for future use (0x0080 - 0x00c0)
+ * NFP6000 - BPF loading
+ * @NFP_NET_CFG_BPF_SIZE:	Size of the JITed BPF code in bytes
+ * @NFP_NET_CFG_BPF_ADDR:	DMA address of the buffer with JITed BPF code
+ * @NFP_NET_CFG_BPF_START:	Offset at which BPF will be loaded
+ * @NFP_NET_CFG_BPF_MAX_LEN:	Maximum size of JITed BPF code in bytes
+ */
+#define NFP_NET_CFG_BPF_MAX_LEN		0x0080
+#define NFP_NET_CFG_BPF_START		0x0082
+#define NFP_NET_CFG_BPF_TGT_OUT		0x0084
+#define NFP_NET_CFG_BPF_TGT_ABORT	0x0086
+#define NFP_NET_CFG_BPF_SIZE		0x0088
+#define NFP_NET_CFG_BPF_ADDR		0x008c
+
+/**
+ * 48B reserved for future use (0x0090 - 0x00c0)
  */
 #define NFP_NET_CFG_RESERVED            0x0080
 #define NFP_NET_CFG_RESERVED_SZ         0x0040
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
new file mode 100644
index 000000000000..dd2e428fd35a
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
@@ -0,0 +1,191 @@
+/*
+ * Copyright (C) 2016 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      1. Redistributions of source code must retain the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer.
+ *
+ *      2. Redistributions in binary form must reproduce the above
+ *         copyright notice, this list of conditions and the following
+ *         disclaimer in the documentation and/or other materials
+ *         provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+/*
+ * nfp_net_offload.c
+ * Netronome network device driver: TC offload functions for PF and VF
+ */
+
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/pci.h>
+
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gact.h>
+
+#include "nfp_bpf.h"
+#include "nfp_net_ctrl.h"
+#include "nfp_net.h"
+
+static int
+nfp_net_bpf_offload_prepare(struct nfp_net *nn,
+			    struct tc_cls_bpf_offload *cls_bpf,
+			    struct nfp_bpf_result *res,
+			    void **code, dma_addr_t *dma_addr, u16 max_instr)
+{
+	unsigned int code_sz = max_instr * sizeof(u64);
+	u16 start_off, tgt_out, tgt_abort;
+	const struct tc_action *a;
+	int err;
+
+	if (tc_no_actions(cls_bpf->exts))
+		return -EINVAL;
+
+	tc_for_each_action(a, cls_bpf->exts) {
+		if (!is_tcf_gact_shot(a))
+			return -EINVAL;
+	}
+
+	if (cls_bpf->exts_integrated)
+		return -EINVAL;
+
+	start_off = nn_readw(nn, NFP_NET_CFG_BPF_START);
+	tgt_out = nn_readw(nn, NFP_NET_CFG_BPF_TGT_OUT);
+	tgt_abort = nn_readw(nn, NFP_NET_CFG_BPF_TGT_ABORT);
+
+	*code = dma_zalloc_coherent(&nn->pdev->dev, code_sz, dma_addr,
+				    GFP_KERNEL);
+	if (!*code)
+		return -ENOMEM;
+
+	err = nfp_bpf_jit(cls_bpf->filter, *code, start_off, tgt_out, tgt_abort,
+			  max_instr, res);
+	if (err)
+		goto out;
+
+	return 0;
+
+out:
+	dma_free_coherent(&nn->pdev->dev, code_sz, *code, *dma_addr);
+	return err;
+}
+
+static void
+nfp_net_bpf_load_and_start(struct nfp_net *nn, u32 tc_flags,
+			   void *code, dma_addr_t dma_addr,
+			   unsigned int code_sz, unsigned int n_instr)
+{
+	int err;
+
+	nn->bpf_offload_skip_sw = !!(tc_flags & TCA_CLS_FLAGS_SKIP_SW);
+
+	nn_writel(nn, NFP_NET_CFG_BPF_SIZE, n_instr * sizeof(u64));
+	nn_writeq(nn, NFP_NET_CFG_BPF_ADDR, dma_addr);
+
+	/* Load up the JITed code */
+	nn_info(nn, "Reloading BPF code (%d instr)\n", n_instr);
+	err = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_BPF);
+	if (err)
+		nn_err(nn, "FW command error while loading BPF: %d\n", err);
+
+	/* Enable passing packets through BPF function */
+	nn->ctrl |= NFP_NET_CFG_CTRL_BPF;
+	nn_writel(nn, NFP_NET_CFG_CTRL, nn->ctrl);
+	err = nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_GEN);
+	if (err)
+		nn_err(nn, "FW command error while enabling BPF: %d\n", err);
+
+	dma_free_coherent(&nn->pdev->dev, code_sz, code, dma_addr);
+}
+
+static int nfp_net_bpf_stop(struct nfp_net *nn)
+{
+	if (!(nn->ctrl & NFP_NET_CFG_CTRL_BPF))
+		return 0;
+
+	nn->ctrl &= ~NFP_NET_CFG_CTRL_BPF;
+	nn_writel(nn, NFP_NET_CFG_CTRL, nn->ctrl);
+
+	nn->bpf_offload_skip_sw = 0;
+
+	return nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_GEN);
+}
+
+int
+nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
+		    struct tc_cls_bpf_offload *cls_bpf)
+{
+	struct nfp_bpf_result res;
+	dma_addr_t dma_addr;
+	u16 max_instr;
+	void *code;
+	int err;
+
+	max_instr = nn_readw(nn, NFP_NET_CFG_BPF_MAX_LEN);
+
+	switch (cls_bpf->command) {
+	case TC_CLSBPF_REPLACE:
+		/* There is nothing stopping us from implementing seamless
+		 * replace but the simple method of loading I adopted in
+		 * the firmware does not handle atomic replace (i.e. we have to
+		 * stop the BPF offload and re-enable it).  Leaking-in a few
+		 * frames which didn't have BPF applied in the hardware should
+		 * be fine if software fallback is available, though.
+		 */
+		if (nn->bpf_offload_skip_sw) {
+			nn_info(nn, "can't do atomic replace, yet, sorry!\n");
+			return -EBUSY;
+		}
+
+		err = nfp_net_bpf_offload_prepare(nn, cls_bpf, &res, &code,
+						  &dma_addr, max_instr);
+		if (err)
+			return err;
+
+		nfp_net_bpf_stop(nn);
+		nfp_net_bpf_load_and_start(nn, cls_bpf->gen_flags, code,
+					   dma_addr, max_instr * sizeof(u64),
+					   res.n_instr);
+		return 0;
+
+	case TC_CLSBPF_ADD:
+		if (nn->ctrl & NFP_NET_CFG_CTRL_BPF)
+			return -EBUSY;
+
+		err = nfp_net_bpf_offload_prepare(nn, cls_bpf, &res, &code,
+						  &dma_addr, max_instr);
+		if (err)
+			return err;
+
+		nfp_net_bpf_load_and_start(nn, cls_bpf->gen_flags, code,
+					   dma_addr, max_instr * sizeof(u64),
+					   res.n_instr);
+		return 0;
+
+	case TC_CLSBPF_DESTROY:
+		return nfp_net_bpf_stop(nn);
+
+	default:
+		return -ENOTSUPP;
+	}
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 07/12] nfp: add skb mark support to the bpf offload
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (5 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 06/12] nfp: add hardware cls_bpf offload Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 21:56   ` Alexei Starovoitov
  2016-06-01 16:50 ` [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats Jakub Kicinski
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

The skb mark should be set in a designated register; the FW will
prepend it to the packet for us.
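
On the host side recovering the mark then amounts to the helper
sketched below (illustrative only, not part of this patch -- the
actual change folds the logic into nfp_net_set_hash()):

	#include <asm/unaligned.h>

	/* FW writes the 32-bit mark big-endian, directly in front of
	 * the packet data, and flags its presence via PCIE_DESC_RX_BPF.
	 */
	static u32 nfp_rx_prepended_mark(const void *pkt_data)
	{
		return get_unaligned_be32(pkt_data - 4);
	}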

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c    | 20 ++++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_net.h        |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  8 +++++++-
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index d7eecfceba5c..b31e673a6fe8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -46,6 +46,8 @@
 
 #define REG_IMM0_N	30 /* Bank AB */
 #define REG_QNUM	29 /* Bank AB */
+#define REG_MARK	28 /* Bank A */
+#define REG_MARK_STS	28 /* Bank B */
 
 /* --- NFP prog --- */
 /* Foreach "multiple" entries macros provide pos and next<n> pointers.
@@ -416,6 +418,15 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
 	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
 }
 
+static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
+{
+	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
+		   false, false);
+	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
+
+	return 0;
+}
+
 static int
 construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
 		 enum alu_op alu_op, bool sw)
@@ -538,6 +549,14 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return 0;
 }
 
+static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	if (meta->insn.off == offsetof(struct sk_buff, mark))
+		return wrp_skb_mark(nfp_prog, meta->insn.src_reg * 2);
+
+	return -ENOTSUPP;
+}
+
 static int and_immX(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	const struct bpf_insn *insn = &meta->insn;
@@ -660,6 +679,7 @@ static const instr_cb_t instr_cb[256] = {
 	[BPF_LD | BPF_IND | BPF_B] =	data_ind_ld1,
 	[BPF_LD | BPF_IND | BPF_H] =	data_ind_ld2,
 	[BPF_LD | BPF_IND | BPF_W] =	data_ind_ld4,
+	[BPF_STX | BPF_MEM | BPF_W] =	mem_stx4,
 	[BPF_JMP | BPF_JA | BPF_K] =	ja_imm,
 	[BPF_JMP | BPF_JEQ | BPF_K] =	jeq_imm,
 	[BPF_JMP | BPF_JGT | BPF_K] =	jgt_imm,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 784287e9ccaa..5df713b8f7c5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -220,7 +220,7 @@ struct nfp_net_tx_ring {
 #define PCIE_DESC_RX_I_TCP_CSUM_OK	cpu_to_le16(BIT(11))
 #define PCIE_DESC_RX_I_UDP_CSUM		cpu_to_le16(BIT(10))
 #define PCIE_DESC_RX_I_UDP_CSUM_OK	cpu_to_le16(BIT(9))
-#define PCIE_DESC_RX_SPARE		cpu_to_le16(BIT(8))
+#define PCIE_DESC_RX_BPF		cpu_to_le16(BIT(8))
 #define PCIE_DESC_RX_EOP		cpu_to_le16(BIT(7))
 #define PCIE_DESC_RX_IP4_CSUM		cpu_to_le16(BIT(6))
 #define PCIE_DESC_RX_IP4_CSUM_OK	cpu_to_le16(BIT(5))
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index c5acdf703b7f..706927b94e28 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1307,12 +1307,18 @@ static void nfp_net_set_hash(struct net_device *netdev, struct sk_buff *skb,
 			     struct nfp_net_rx_desc *rxd)
 {
 	struct nfp_net_rx_hash *rx_hash;
+	void *data = skb->data;
+
+	if (rxd->rxd.flags & PCIE_DESC_RX_BPF) {
+		skb->mark = get_unaligned_be32(data - 4);
+		data -= 4;
+	}
 
 	if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS) ||
 	    !(netdev->features & NETIF_F_RXHASH))
 		return;
 
-	rx_hash = (struct nfp_net_rx_hash *)(skb->data - sizeof(*rx_hash));
+	rx_hash = (struct nfp_net_rx_hash *)(data - sizeof(*rx_hash));
 
 	switch (be32_to_cpu(rx_hash->hash_type)) {
 	case NFP_NET_RSS_IPV4:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (6 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 07/12] nfp: add skb mark support to the bpf offload Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 17:20   ` John Fastabend
  2016-06-01 22:09   ` Daniel Borkmann
  2016-06-01 16:50 ` [RFC 09/12] nfp: report statistics of offloaded filters Jakub Kicinski
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Call into offloaded filters to update stats.
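
For a driver this is just one more command to handle in its cls_bpf
offload entry point; schematically (a sketch with an assumed helper
name -- the NFP implementation follows in the next patch):

	switch (cls_bpf->command) {
	case TC_CLSBPF_STATS:
		/* push HW counters into the filter's TC actions */
		return foo_bpf_stats_update(dev, cls_bpf);
	/* ... ADD/REPLACE/DESTROY handled as before ... */
	}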

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 include/net/pkt_cls.h |  1 +
 net/sched/cls_bpf.c   | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e6b3dfb3159b..7a7bae628d1a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -442,6 +442,7 @@ enum tc_clsbpf_command {
 	TC_CLSBPF_ADD,
 	TC_CLSBPF_REPLACE,
 	TC_CLSBPF_DESTROY,
+	TC_CLSBPF_STATS,
 };
 
 struct tc_cls_bpf_offload {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 1083910cebaf..66215d9d6c9c 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -224,6 +224,15 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp,
 	prog->offloaded = false;
 }
 
+static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
+					 struct cls_bpf_prog *prog)
+{
+	if (!prog->offloaded)
+		return;
+
+	cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_STATS);
+}
+
 static int cls_bpf_init(struct tcf_proto *tp)
 {
 	struct cls_bpf_head *head;
@@ -574,6 +583,8 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 
 	tm->tcm_handle = prog->handle;
 
+	cls_bpf_offload_update_stats(tp, prog);
+
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
 		goto nla_put_failure;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 09/12] nfp: report statistics of offloaded filters
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (7 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 16:50 ` [RFC 10/12] nfp: bpf: optimize register init Jakub Kicinski
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

Periodically read the filter drop counts from the BAR and report
them to the appropriate action on user request, i.e. when the
filter is dumped.
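
The counters read from the device are cumulative, so only the delta
since the previous dump is pushed into the actions; condensed from
nfp_net_bpf_stats_update() below (the ETH_HLEN adjustment presumably
matches the software path, where stats are taken after the MAC header
has been pulled):

	pkts  = nn->rx_filter.pkts  - nn->rx_filter_prev.pkts;
	bytes = nn->rx_filter.bytes - nn->rx_filter_prev.bytes;
	bytes -= pkts * ETH_HLEN;	/* HW counts L2 bytes, SW doesn't */
	nn->rx_filter_prev = nn->rx_filter;

	tc_for_each_action(a, cls_bpf->exts)
		tcf_action_stats_update(a, bytes, pkts,
					nn->rx_filter_change);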

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h       | 19 +++++++
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |  3 +
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  3 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  3 +
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   | 64 ++++++++++++++++++++++
 5 files changed, 92 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 5df713b8f7c5..fe6333220d97 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -62,6 +62,9 @@
 /* Max time to wait for NFP to respond on updates (in seconds) */
 #define NFP_NET_POLL_TIMEOUT	5
 
+/* Interval for reading offloaded filter stats */
+#define NFP_NET_STAT_POLL_IVL	msecs_to_jiffies(500)
+
 /* Bar allocation */
 #define NFP_NET_CRTL_BAR	0
 #define NFP_NET_Q0_BAR		2
@@ -405,6 +408,11 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
 	       fw_ver->minor == minor;
 }
 
+struct nfp_stat_pair {
+	u64 pkts;
+	u64 bytes;
+};
+
 /**
  * struct nfp_net - NFP network device structure
  * @pdev:               Backpointer to PCI device
@@ -428,6 +436,11 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
  * @rss_cfg:            RSS configuration
  * @rss_key:            RSS secret key
  * @rss_itbl:           RSS indirection table
+ * @rx_filter:		Filter offload statistics - dropped packets/bytes
+ * @rx_filter_prev:	Filter offload statistics - values from previous update
+ * @rx_filter_change:	Jiffies when statistics last changed
+ * @rx_filter_stats_timer:  Timer for polling filter offload statistics
+ * @rx_filter_lock:	Lock protecting timer state changes (teardown)
  * @max_tx_rings:       Maximum number of TX rings supported by the Firmware
  * @max_rx_rings:       Maximum number of RX rings supported by the Firmware
  * @num_tx_rings:       Currently configured number of TX rings
@@ -504,6 +517,11 @@ struct nfp_net {
 	u8 rss_key[NFP_NET_CFG_RSS_KEY_SZ];
 	u8 rss_itbl[NFP_NET_CFG_RSS_ITBL_SZ];
 
+	struct nfp_stat_pair rx_filter, rx_filter_prev;
+	unsigned long rx_filter_change;
+	struct timer_list rx_filter_stats_timer;
+	spinlock_t rx_filter_lock;
+
 	int max_tx_rings;
 	int max_rx_rings;
 
@@ -741,6 +759,7 @@ int nfp_net_irqs_alloc(struct nfp_net *nn);
 void nfp_net_irqs_disable(struct nfp_net *nn);
 int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt);
 
+void nfp_net_filter_stats_timer(unsigned long data);
 int
 nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
 		    struct tc_cls_bpf_offload *cls_bpf);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 706927b94e28..6364e97bac66 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2697,10 +2697,13 @@ struct nfp_net *nfp_net_netdev_alloc(struct pci_dev *pdev,
 	nn->rxd_cnt = NFP_NET_RX_DESCS_DEFAULT;
 
 	spin_lock_init(&nn->reconfig_lock);
+	spin_lock_init(&nn->rx_filter_lock);
 	spin_lock_init(&nn->link_status_lock);
 
 	setup_timer(&nn->reconfig_timer,
 		    nfp_net_reconfig_timer, (unsigned long)nn);
+	setup_timer(&nn->rx_filter_stats_timer,
+		    nfp_net_filter_stats_timer, (unsigned long)nn);
 
 	return nn;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index c26449655419..f66b81b5d749 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -319,6 +319,9 @@
 #define NFP_NET_CFG_STATS_TX_MC_FRAMES  (NFP_NET_CFG_STATS_BASE + 0x80)
 #define NFP_NET_CFG_STATS_TX_BC_FRAMES  (NFP_NET_CFG_STATS_BASE + 0x88)
 
+#define NFP_NET_CFG_STATS_FILTER_FRAMES (NFP_NET_CFG_STATS_BASE + 0x90)
+#define NFP_NET_CFG_STATS_FILTER_BYTES  (NFP_NET_CFG_STATS_BASE + 0x98)
+
 /**
  * Per ring stats (0x1000 - 0x1800)
  * options, 64bit per entry
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index ccfef1f17627..a3a27b9917af 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -107,6 +107,9 @@ static const struct _nfp_net_et_stats nfp_net_et_stats[] = {
 	{"dev_tx_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_FRAMES)},
 	{"dev_tx_mc_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_MC_FRAMES)},
 	{"dev_tx_bc_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_BC_FRAMES)},
+
+	{"dev_filter_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_FILTER_FRAMES)},
+	{"dev_filter_bytes", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_FILTER_BYTES)},
 };
 
 #define NN_ET_GLOBAL_STATS_LEN ARRAY_SIZE(nfp_net_et_stats)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
index dd2e428fd35a..caada9d61913 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
@@ -39,6 +39,8 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/pci.h>
+#include <linux/jiffies.h>
+#include <linux/timer.h>
 
 #include <net/pkt_cls.h>
 #include <net/tc_act/tc_gact.h>
@@ -47,6 +49,58 @@
 #include "nfp_net_ctrl.h"
 #include "nfp_net.h"
 
+void nfp_net_filter_stats_timer(unsigned long data)
+{
+	struct nfp_net *nn = (void *)data;
+	struct nfp_stat_pair latest;
+
+	spin_lock_bh(&nn->rx_filter_lock);
+
+	if (nn->ctrl & NFP_NET_CFG_CTRL_BPF)
+		mod_timer(&nn->rx_filter_stats_timer,
+			  jiffies + NFP_NET_STAT_POLL_IVL);
+
+	spin_unlock_bh(&nn->rx_filter_lock);
+
+	latest.pkts = nn_readq(nn, NFP_NET_CFG_STATS_FILTER_FRAMES);
+	latest.bytes = nn_readq(nn, NFP_NET_CFG_STATS_FILTER_BYTES);
+
+	if (latest.pkts != nn->rx_filter.pkts)
+		nn->rx_filter_change = jiffies;
+
+	nn->rx_filter = latest;
+}
+
+static void nfp_net_bpf_stats_reset(struct nfp_net *nn)
+{
+	nn->rx_filter.pkts = nn_readq(nn, NFP_NET_CFG_STATS_FILTER_FRAMES);
+	nn->rx_filter.bytes = nn_readq(nn, NFP_NET_CFG_STATS_FILTER_BYTES);
+	nn->rx_filter_prev = nn->rx_filter;
+	nn->rx_filter_change = jiffies;
+}
+
+static int
+nfp_net_bpf_stats_update(struct nfp_net *nn, struct tc_cls_bpf_offload *cls_bpf)
+{
+	struct tc_action *a;
+	u64 bytes, pkts;
+
+	pkts = nn->rx_filter.pkts - nn->rx_filter_prev.pkts;
+	bytes = nn->rx_filter.bytes - nn->rx_filter_prev.bytes;
+	bytes -= pkts * ETH_HLEN;
+
+	nn->rx_filter_prev = nn->rx_filter;
+
+	preempt_disable();
+
+	tc_for_each_action(a, cls_bpf->exts)
+		tcf_action_stats_update(a, bytes, pkts, nn->rx_filter_change);
+
+	preempt_enable();
+
+	return 0;
+}
+
 static int
 nfp_net_bpf_offload_prepare(struct nfp_net *nn,
 			    struct tc_cls_bpf_offload *cls_bpf,
@@ -116,6 +170,9 @@ nfp_net_bpf_load_and_start(struct nfp_net *nn, u32 tc_flags,
 		nn_err(nn, "FW command error while enabling BPF: %d\n", err);
 
 	dma_free_coherent(&nn->pdev->dev, code_sz, code, dma_addr);
+
+	nfp_net_bpf_stats_reset(nn);
+	mod_timer(&nn->rx_filter_stats_timer, jiffies + NFP_NET_STAT_POLL_IVL);
 }
 
 static int nfp_net_bpf_stop(struct nfp_net *nn)
@@ -123,11 +180,15 @@ static int nfp_net_bpf_stop(struct nfp_net *nn)
 	if (!(nn->ctrl & NFP_NET_CFG_CTRL_BPF))
 		return 0;
 
+	spin_lock_bh(&nn->rx_filter_lock);
 	nn->ctrl &= ~NFP_NET_CFG_CTRL_BPF;
+	spin_unlock_bh(&nn->rx_filter_lock);
 	nn_writel(nn, NFP_NET_CFG_CTRL, nn->ctrl);
 
 	nn->bpf_offload_skip_sw = 0;
 
+	del_timer_sync(&nn->rx_filter_stats_timer);
+
 	return nfp_net_reconfig(nn, NFP_NET_CFG_UPDATE_GEN);
 }
 
@@ -185,6 +246,9 @@ nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
 	case TC_CLSBPF_DESTROY:
 		return nfp_net_bpf_stop(nn);
 
+	case TC_CLSBPF_STATS:
+		return nfp_net_bpf_stats_update(nn, cls_bpf);
+
 	default:
 		return -ENOTSUPP;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 10/12] nfp: bpf: optimize register init
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (8 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 09/12] nfp: report statistics of offloaded filters Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 16:50 ` [RFC 11/12] nfp: bpf: add register rename Jakub Kicinski
  2016-06-01 16:50 ` [RFC 12/12] nfp: bpf: add denser mode of execution Jakub Kicinski
  11 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

We can skip the move of the skb pointer to R6 since this pointer
has no meaning on the card.
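
The pattern being skipped is the standard LLVM prologue of programs
that use the skb context; expressed as a struct bpf_insn it looks
like this (illustrative only):

	/* "r6 = r1" -- save the ctx/skb pointer across helper calls */
	struct bpf_insn prologue_mov = {
		.code    = BPF_ALU64 | BPF_MOV | BPF_X,
		.dst_reg = BPF_REG_6,
		.src_reg = BPF_REG_1,
	};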

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index b31e673a6fe8..a83e7404598f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -824,6 +824,29 @@ nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
 }
 
 /* --- Optimizations --- */
+static void nfp_bpf_opt_reg_init(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		struct bpf_insn insn = meta->insn;
+
+		/* Programs converted from cBPF start with register xoring */
+		if (insn.code == (BPF_ALU64 | BPF_XOR | BPF_X) &&
+		    insn.src_reg == insn.dst_reg)
+			continue;
+
+		/* Programs start with R6 = R1 but we ignore the skb pointer */
+		if (insn.code == (BPF_ALU64 | BPF_MOV | BPF_X) &&
+		    insn.src_reg == 1 && insn.dst_reg == 6)
+			meta->skip = true;
+
+		/* Return as soon as something doesn't match */
+		if (!meta->skip)
+			return;
+	}
+}
+
 /* Remove masking after load since our load guarantees this is not needed */
 static void nfp_bpf_opt_ld_mask(struct nfp_prog *nfp_prog)
 {
@@ -900,6 +923,7 @@ static void nfp_bpf_opt_ld_shift(struct nfp_prog *nfp_prog)
 
 static void nfp_bpf_optimize(struct nfp_prog *nfp_prog)
 {
+	nfp_bpf_opt_reg_init(nfp_prog);
 	nfp_bpf_opt_ld_mask(nfp_prog);
 	nfp_bpf_opt_ld_shift(nfp_prog);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 11/12] nfp: bpf: add register rename
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (9 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 10/12] nfp: bpf: optimize register init Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 16:50 ` [RFC 12/12] nfp: bpf: add denser mode of execution Jakub Kicinski
  11 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

If we use resources more wisely we may be able to use the
programmable engines in a denser fashion.  Try to squeeze the
registers the program uses together into the low numbers.
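
As a worked example (register numbers illustrative): if a program only
ever touches R0, R1 and R7, the mapping built below becomes

	tgt_reg[0] = 0, tgt_reg[1] = 1, tgt_reg[7] = 2

so only three registers remain in use and the high register numbers
are freed up for the denser execution mode added in the next patch.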

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 51 +++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index a83e7404598f..b4dd04f4c653 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -847,6 +847,42 @@ static void nfp_bpf_opt_reg_init(struct nfp_prog *nfp_prog)
 	}
 }
 
+/* Try to rename registers so that program uses only low ones */
+static int nfp_bpf_opt_reg_rename(struct nfp_prog *nfp_prog)
+{
+	bool reg_used[__MAX_BPF_REG] = {};
+	u8 tgt_reg[__MAX_BPF_REG] = {};
+	struct nfp_insn_meta *meta;
+	unsigned int i, j;
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		if (meta->skip)
+			continue;
+
+		reg_used[meta->insn.src_reg] = true;
+		reg_used[meta->insn.dst_reg] = true;
+	}
+
+	if (reg_used[BPF_REG_10]) {
+		pr_err("Detected use of stack ptr\n");
+		return -EINVAL;
+	}
+
+	for (i = 0, j = 0; i < ARRAY_SIZE(tgt_reg); i++) {
+		if (!reg_used[i])
+			continue;
+
+		tgt_reg[i] = j++;
+	}
+
+	list_for_each_entry(meta, &nfp_prog->insns, l) {
+		meta->insn.src_reg = tgt_reg[meta->insn.src_reg];
+		meta->insn.dst_reg = tgt_reg[meta->insn.dst_reg];
+	}
+
+	return 0;
+}
+
 /* Remove masking after load since our load guarantees this is not needed */
 static void nfp_bpf_opt_ld_mask(struct nfp_prog *nfp_prog)
 {
@@ -921,11 +957,20 @@ static void nfp_bpf_opt_ld_shift(struct nfp_prog *nfp_prog)
 	}
 }
 
-static void nfp_bpf_optimize(struct nfp_prog *nfp_prog)
+static int nfp_bpf_optimize(struct nfp_prog *nfp_prog)
 {
+	int ret;
+
 	nfp_bpf_opt_reg_init(nfp_prog);
+
+	ret = nfp_bpf_opt_reg_rename(nfp_prog);
+	if (ret)
+		return ret;
+
 	nfp_bpf_opt_ld_mask(nfp_prog);
 	nfp_bpf_opt_ld_shift(nfp_prog);
+
+	return 0;
 }
 
 /**
@@ -961,7 +1006,9 @@ nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
 	if (ret)
 		goto out;
 
-	nfp_bpf_optimize(nfp_prog);
+	ret = nfp_bpf_optimize(nfp_prog);
+	if (ret)
+		goto out;
 
 	nfp_prog->prog = prog_mem;
 	nfp_prog->__prog_alloc_len = prog_sz;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [RFC 12/12] nfp: bpf: add denser mode of execution
  2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
                   ` (10 preceding siblings ...)
  2016-06-01 16:50 ` [RFC 11/12] nfp: bpf: add register rename Jakub Kicinski
@ 2016-06-01 16:50 ` Jakub Kicinski
  2016-06-01 22:01   ` Alexei Starovoitov
  11 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 16:50 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, dinan.gunawardena, Jakub Kicinski

If the BPF program uses fewer than 7 registers the programmable
engines can process twice as many packets in parallel.  Signal
this denser mode of operation to the FW by setting the lowest
bit in the DMA address of the machine code buffer.
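
The machine code buffer holds u64 instruction words and is thus at
least 8-byte aligned, which leaves the low bits of its DMA address
free to carry the flag; schematically (mirroring the write below):

	/* bit 0 of the BPF_ADDR word doubles as the dense-mode flag */
	nn_writeq(nn, NFP_NET_CFG_BPF_ADDR, dma_addr | dense_mode);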

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h       | 24 ++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   | 51 +++++++++++-----------
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  9 ++--
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf.h b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
index 8fa9ff28ba80..bec766cd072f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
@@ -52,6 +52,24 @@ enum br_special {
 	OP_BR_GO_ABORT,
 };
 
+enum static_regs {
+	STATIC_REG_PKT		= 1,
+#define REG_PKT_BANK	ALU_DST_A
+	STATIC_REG_LEN		= 1,
+#define REG_LEN_BANK	ALU_DST_B
+	STATIC_REG_IMM		= 2, /* Bank AB */
+	STATIC_REG_QNUM		= 3, /* Bank AB */
+	STATIC_REG_MARK		= 4, /* Bank A */
+	STATIC_REG_MARK_SET	= 4, /* Bank B */
+};
+
+#define r_pkt(np)	((np)->regs_per_thread - STATIC_REG_PKT)
+#define r_len(np)	((np)->regs_per_thread - STATIC_REG_LEN)
+#define r_imm(np)	((np)->regs_per_thread - STATIC_REG_IMM)
+#define r_qnum(np)	((np)->regs_per_thread - STATIC_REG_QNUM)
+#define r_mark(np)	((np)->regs_per_thread - STATIC_REG_MARK)
+#define r_mark_s(np)	((np)->regs_per_thread - STATIC_REG_MARK_SET)
+
 struct nfp_prog;
 struct nfp_insn_meta;
 typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
@@ -78,6 +96,8 @@ struct nfp_insn_meta {
  * @prog: machine code
  * @prog_len: number of valid instructions in @prog array
  * @__prog_alloc_len: alloc size of @prog array
+ * @num_regs: number of registers used by this program
+ * @regs_per_thread: number of basic registers allocated per thread
  * @start_off: address of the first instruction in the memory
  * @tgt_out: jump target for normal exit
  * @tgt_abort: jump target for abort (e.g. access outside of packet buffer)
@@ -90,6 +110,9 @@ struct nfp_prog {
 	unsigned int prog_len;
 	unsigned int __prog_alloc_len;
 
+	unsigned int num_regs;
+	unsigned int regs_per_thread;
+
 	unsigned int start_off;
 	unsigned int tgt_out;
 	unsigned int tgt_abort;
@@ -102,6 +125,7 @@ struct nfp_prog {
 
 struct nfp_bpf_result {
 	unsigned int n_instr;
+	bool dense_mode;
 };
 
 int
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index b4dd04f4c653..666dee54fe77 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -39,16 +39,6 @@
 #include "nfp_asm.h"
 #include "nfp_bpf.h"
 
-#define REG_PKT_N	31
-#define REG_PKT_BANK	ALU_DST_A
-#define REG_LEN_N	31
-#define REG_LEN_BANK	ALU_DST_B
-
-#define REG_IMM0_N	30 /* Bank AB */
-#define REG_QNUM	29 /* Bank AB */
-#define REG_MARK	28 /* Bank A */
-#define REG_MARK_STS	28 /* Bank B */
-
 /* --- NFP prog --- */
 /* Foreach "multiple" entries macros provide pos and next<n> pointers.
  * It's safe to modify the next pointers (but not pos).
@@ -372,31 +362,32 @@ construct_data_ind_ld(struct nfp_prog *nfp_prog, u16 offset,
 	if (src_valid) {
 		/* Calculate the true offset (src_reg + imm) */
 		imm_reg = ur_load_imm_any(nfp_prog, offset,
-					  REG_IMM0_N, ALU_DST_B);
-		__emit_alu(nfp_prog, REG_IMM0_N, ALU_DST_A,
+					  r_imm(nfp_prog), ALU_DST_B);
+		__emit_alu(nfp_prog, r_imm(nfp_prog), ALU_DST_A,
 			   src, ALU_OP_ADD, imm_reg, false, true);
 		/* Check packet length (size guaranteed to fit b/c it's u8) */
-		__emit_alu(nfp_prog, REG_IMM0_N, ALU_DST_A,
-			   REG_IMM0_N, ALU_OP_ADD, UR_REG_IMM | size,
+		__emit_alu(nfp_prog, r_imm(nfp_prog), ALU_DST_A,
+			   r_imm(nfp_prog), ALU_OP_ADD, UR_REG_IMM | size,
 			   false, false);
 		__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
-			   REG_IMM0_N, ALU_OP_SUB, REG_LEN_N, true, false);
+			   r_imm(nfp_prog), ALU_OP_SUB, r_len(nfp_prog),
+			   true, false);
 		wrp_br_special(nfp_prog, BR_BLO, OP_BR_GO_ABORT);
 		/* Load data */
 		__emit_cmd(nfp_prog, CMD_TGT_READ8, CMD_MODE_32b, 0,
-			   REG_PKT_N, REG_IMM0_N, sz - 1, true);
+			   r_pkt(nfp_prog), r_imm(nfp_prog), sz - 1, true);
 	} else {
 		/* Check packet length */
 		imm_reg = ur_load_imm_any(nfp_prog, offset + size,
-					  REG_IMM0_N, ALU_DST_A);
+					  r_imm(nfp_prog), ALU_DST_A);
 		__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
-			   imm_reg, ALU_OP_SUB, REG_LEN_N, true, false);
+			   imm_reg, ALU_OP_SUB, r_len(nfp_prog), true, false);
 		wrp_br_special(nfp_prog, BR_BLO, OP_BR_GO_ABORT);
 		/* Load data */
 		imm_reg = re_load_imm_any(nfp_prog, offset,
-					  REG_IMM0_N, ALU_DST_B);
+					  r_imm(nfp_prog), ALU_DST_B);
 		__emit_cmd(nfp_prog, CMD_TGT_READ8, CMD_MODE_32b, 0,
-			   REG_PKT_N, imm_reg, sz - 1, true);
+			   r_pkt(nfp_prog), imm_reg, sz - 1, true);
 	}
 
 	i = 0;
@@ -420,9 +411,9 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
 
 static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
 {
-	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
-		   false, false);
-	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
+	__emit_alu(nfp_prog, r_mark(nfp_prog), ALU_DST_A,
+		   REG_NONE, ALU_OP_NONE, src, false, false);
+	__emit_immed(nfp_prog, r_mark_s(nfp_prog), ALU_DST_B, 1, false);
 
 	return 0;
 }
@@ -433,7 +424,7 @@ construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
 {
 	u16 imm_reg;
 
-	imm_reg = ur_load_imm_any(nfp_prog, imm, REG_IMM0_N, ALU_DST_B);
+	imm_reg = ur_load_imm_any(nfp_prog, imm, r_imm(nfp_prog), ALU_DST_B);
 
 	__emit_alu(nfp_prog, UR_REG_NO_DST, ALU_DST_A,
 		   dst, alu_op, imm_reg, sw, false);
@@ -524,7 +515,7 @@ static int mem_ld4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
 	if (meta->insn.off == offsetof(struct sk_buff, len))
 		__emit_alu(nfp_prog, meta->insn.dst_reg * 2, ALU_DST_A,
-			   REG_NONE, ALU_OP_NONE, REG_LEN_N, false, true);
+			   REG_NONE, ALU_OP_NONE, r_len(nfp_prog), false, true);
 	else
 		return -ENOTSUPP;
 
@@ -562,7 +553,8 @@ static int and_immX(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	const struct bpf_insn *insn = &meta->insn;
 	u16 imm_reg;
 
-	imm_reg = ur_load_imm_any(nfp_prog, insn->imm, REG_IMM0_N, ALU_DST_B);
+	imm_reg = ur_load_imm_any(nfp_prog, insn->imm,
+				  r_imm(nfp_prog), ALU_DST_B);
 
 	__emit_alu(nfp_prog, insn->dst_reg * 2, ALU_DST_A,
 		   insn->dst_reg * 2, ALU_OP_AND, imm_reg, false, true);
@@ -874,6 +866,7 @@ static int nfp_bpf_opt_reg_rename(struct nfp_prog *nfp_prog)
 
 		tgt_reg[i] = j++;
 	}
+	nfp_prog->num_regs = j;
 
 	list_for_each_entry(meta, &nfp_prog->insns, l) {
 		meta->insn.src_reg = tgt_reg[meta->insn.src_reg];
@@ -1010,6 +1003,11 @@ nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
 	if (ret)
 		goto out;
 
+	if (nfp_prog->num_regs <= 6)
+		nfp_prog->regs_per_thread = 16;
+	else
+		nfp_prog->regs_per_thread = 32;
+
 	nfp_prog->prog = prog_mem;
 	nfp_prog->__prog_alloc_len = prog_sz;
 
@@ -1021,6 +1019,7 @@ nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
 	}
 
 	res->n_instr = nfp_prog->prog_len;
+	res->dense_mode = nfp_prog->num_regs <= 6;
 out:
 	nfp_prog_free(nfp_prog);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
index caada9d61913..62c7725e489c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
@@ -147,14 +147,15 @@ out:
 static void
 nfp_net_bpf_load_and_start(struct nfp_net *nn, u32 tc_flags,
 			   void *code, dma_addr_t dma_addr,
-			   unsigned int code_sz, unsigned int n_instr)
+			   unsigned int code_sz, unsigned int n_instr,
+			   bool dense_mode)
 {
 	int err;
 
 	nn->bpf_offload_skip_sw = !!(tc_flags & TCA_CLS_FLAGS_SKIP_SW);
 
 	nn_writel(nn, NFP_NET_CFG_BPF_SIZE, n_instr * sizeof(u64));
-	nn_writeq(nn, NFP_NET_CFG_BPF_ADDR, dma_addr);
+	nn_writeq(nn, NFP_NET_CFG_BPF_ADDR, dma_addr | dense_mode);
 
 	/* Load up the JITed code */
 	nn_info(nn, "Reloading BPF code (%d instr)\n", n_instr);
@@ -226,7 +227,7 @@ nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
 		nfp_net_bpf_stop(nn);
 		nfp_net_bpf_load_and_start(nn, cls_bpf->gen_flags, code,
 					   dma_addr, max_instr * sizeof(u64),
-					   res.n_instr);
+					   res.n_instr, res.dense_mode);
 		return 0;
 
 	case TC_CLSBPF_ADD:
@@ -240,7 +241,7 @@ nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
 
 		nfp_net_bpf_load_and_start(nn, cls_bpf->gen_flags, code,
 					   dma_addr, max_instr * sizeof(u64),
-					   res.n_instr);
+					   res.n_instr, res.dense_mode);
 		return 0;
 
 	case TC_CLSBPF_DESTROY:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
@ 2016-06-01 17:13   ` John Fastabend
  2016-06-01 20:59     ` Jakub Kicinski
  2016-06-01 19:34   ` Daniel Borkmann
  2016-06-02  7:17   ` Jiri Pirko
  2 siblings, 1 reply; 54+ messages in thread
From: John Fastabend @ 2016-06-01 17:13 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, daniel, dinan.gunawardena

On 16-06-01 09:50 AM, Jakub Kicinski wrote:
> This patch adds hardware offload capability to cls_bpf classifier,
> similar to what has been done with U32 and flower.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---


Nice!

[...]

> +static void cls_bpf_stop_offload(struct tcf_proto *tp,
> +				 struct cls_bpf_prog *prog)
> +{
> +	struct net_device *dev = tp->q->dev_queue->dev;
> +
> +	if (!prog->offloaded)
> +		return;
> +	if (WARN_ON(!tc_should_offload(dev, 0)))
> +		return;

This WARN_ON is a bit concerning; it looks like you can get
a program stuck in hardware but removed from the software
stack. Any idea why this could happen? I think it is better
to solve the root problem, or just remove this if it's debug code.

One thought is you need to block disabling the ethtool flag
if the hardware has running eBPF code. I haven't got to the
driver patches yet, though, so I'm not sure if you did this or
not. And now that I think about it, I'd better go check the
other drivers.

> +
> +	if (cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY)) {
> +		pr_err("Stopping hardware offload failed!\n");
> +		return;
> +	}
> +
> +	prog->offloaded = false;
> +}
> +
>  static int cls_bpf_init(struct tcf_proto *tp)
>  {
>  	struct cls_bpf_head *head;
> @@ -179,6 +246,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
>  {
>  	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) arg;
>  
> +	cls_bpf_stop_offload(tp, prog);
>  	list_del_rcu(&prog->link);
>  	tcf_unbind_filter(tp, &prog->res);
>  	call_rcu(&prog->rcu, __cls_bpf_delete_prog);
> @@ -195,6 +263,7 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
>  		return false;
>  
>  	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
> +		cls_bpf_stop_offload(tp, prog);
>  		list_del_rcu(&prog->link);
>  		tcf_unbind_filter(tp, &prog->res);
>  		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
> @@ -415,6 +484,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>  	if (ret < 0)
>  		goto errout;
>  
> +	cls_bpf_offload(tp, prog, oldprog);
> +
>  	if (oldprog) {
>  		list_replace_rcu(&oldprog->link, &prog->link);
>  		tcf_unbind_filter(tp, &oldprog->res);
> 


Otherwise this looks really good.

.John

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
@ 2016-06-01 17:16   ` John Fastabend
  2016-06-01 17:16   ` John Fastabend
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-01 17:16 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, daniel, dinan.gunawardena

On 16-06-01 09:50 AM, Jakub Kicinski wrote:
> Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
> Unlike U32 and flower, cls_bpf already has some netlink
> flags defined.  I chose to create a new attribute to be
> able to use the same flag values as the above.
> 
> Unknown flags are ignored and not reported upon dump.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---

Seems reasonable to me. Adding support for the SKIP_SW flag
as well would be helpful. It looks like you could do this
fairly easily by checking the offload boolean. Anyway, that
is another patch of course.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
  2016-06-01 17:16   ` John Fastabend
@ 2016-06-01 17:16   ` John Fastabend
  2016-06-01 19:40   ` Daniel Borkmann
  2016-06-02  7:24   ` Jiri Pirko
  3 siblings, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-01 17:16 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, daniel, dinan.gunawardena

On 16-06-01 09:50 AM, Jakub Kicinski wrote:
> Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
> Unlike U32 and flower, cls_bpf already has some netlink
> flags defined.  I chose to create a new attribute to be
> able to use the same flag values as the above.
> 
> Unknown flags are ignored and not reported upon dump.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---

Seems reasonable to me.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only
  2016-06-01 16:50 ` [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only Jakub Kicinski
@ 2016-06-01 17:19   ` John Fastabend
  2016-06-01 19:57   ` Daniel Borkmann
  1 sibling, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-01 17:19 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, daniel, dinan.gunawardena

On 16-06-01 09:50 AM, Jakub Kicinski wrote:
> Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_SW flag.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---

Looks good to me. Glad to see this worked out without too much pain
on the classifier side at least.

Acked-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats
  2016-06-01 16:50 ` [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats Jakub Kicinski
@ 2016-06-01 17:20   ` John Fastabend
  2016-06-01 22:09   ` Daniel Borkmann
  1 sibling, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-01 17:20 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, daniel, dinan.gunawardena

On 16-06-01 09:50 AM, Jakub Kicinski wrote:
> Call into offloaded filters to update stats.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---


Acked-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
  2016-06-01 17:13   ` John Fastabend
@ 2016-06-01 19:34   ` Daniel Borkmann
  2016-06-02  7:17   ` Jiri Pirko
  2 siblings, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 19:34 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> This patch adds hardware offload capability to cls_bpf classifier,
> similar to what has been done with U32 and flower.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

Nice! This is truly awesome to see this work progressing!

Apart from John's comment made earlier:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
  2016-06-01 17:16   ` John Fastabend
  2016-06-01 17:16   ` John Fastabend
@ 2016-06-01 19:40   ` Daniel Borkmann
  2016-06-01 21:05     ` Jakub Kicinski
  2016-06-02  7:24   ` Jiri Pirko
  3 siblings, 1 reply; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 19:40 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
> Unlike U32 and flower, cls_bpf already has some netlink
> flags defined.  I chose to create a new attribute to be
> able to use the same flag values as the above.

Yeah, that's totally fine to make it a new flag attribute.

> Unknown flags are ignored and not reported upon dump.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

[...]
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index f4297c8a42fe..93a86edf3bd8 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -395,6 +395,7 @@ enum {
>   	TCA_BPF_FD,
>   	TCA_BPF_NAME,
>   	TCA_BPF_FLAGS,
> +	TCA_BPF_GEN_TCA_FLAGS,

Small nit for the non-RFC set: I'd simply name that TCA_BPF_FLAGS_GEN.

>   	__TCA_BPF_MAX,
>   };
>
[...]
> @@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
>
>   		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
>   	}
> +	if (tb[TCA_BPF_GEN_TCA_FLAGS])
> +		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
>
>   	prog->exts_integrated = have_exts;
> +	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;

Invalid flags should probably be rejected here with -EINVAL or something.
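
Something along these lines perhaps (sketch):

	if (tb[TCA_BPF_GEN_TCA_FLAGS]) {
		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
		if (gen_flags & ~CLS_BPF_SUPPORTED_GEN_FLAGS)
			return -EINVAL;
	}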

>   	ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog) :
>   		       cls_bpf_prog_from_efd(tb, prog, tp);
> @@ -568,6 +577,9 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
>   		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
>   	if (bpf_flags && nla_put_u32(skb, TCA_BPF_FLAGS, bpf_flags))
>   		goto nla_put_failure;
> +	if (prog->gen_flags &&
> +	    nla_put_u32(skb, TCA_BPF_GEN_TCA_FLAGS, prog->gen_flags))
> +		goto nla_put_failure;
>
>   	nla_nest_end(skb, nest);
>
>

Otherwise looks good:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only
  2016-06-01 16:50 ` [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only Jakub Kicinski
  2016-06-01 17:19   ` John Fastabend
@ 2016-06-01 19:57   ` Daniel Borkmann
  1 sibling, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 19:57 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_SW flag.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---
>   net/sched/cls_bpf.c | 39 +++++++++++++++++++++++++++++----------
>   1 file changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 9f1bc37dcbbc..1083910cebaf 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -28,7 +28,7 @@ MODULE_DESCRIPTION("TC BPF based classifier");
>
>   #define CLS_BPF_NAME_LEN	256
>   #define CLS_BPF_SUPPORTED_GEN_FLAGS		\
> -	TCA_CLS_FLAGS_SKIP_HW
> +	(TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW)
>
>   struct cls_bpf_head {
>   	struct list_head plist;
> @@ -98,7 +98,9 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>
>   		qdisc_skb_cb(skb)->tc_classid = prog->res.classid;
>
> -		if (at_ingress) {
> +		if (tc_skip_sw(prog->gen_flags)) {
> +			filter_res = 0;

Seems okay to me, depending on the working mode of cls_bpf (da or non-da),
this either means TC_ACT_OK (== 0) or 'not classified'.

> +		} else if (at_ingress) {
>   			/* It is safe to push/pull even if skb_shared() */
>   			__skb_push(skb, skb->mac_len);
>   			bpf_compute_data_end(skb);

[...]
> @@ -406,8 +418,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
>
>   		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
>   	}
> -	if (tb[TCA_BPF_GEN_TCA_FLAGS])
> +	if (tb[TCA_BPF_GEN_TCA_FLAGS]) {
>   		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
> +		if (!tc_flags_valid(gen_flags))
> +			return -EINVAL;
> +	}

Resolves my earlier comment, okay to put it here after supporting all flags.

>   	prog->exts_integrated = have_exts;
>   	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;
> @@ -493,7 +508,11 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>   	if (ret < 0)
>   		goto errout;

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 16:50 ` [RFC 05/12] nfp: add BPF to NFP code translator Jakub Kicinski
@ 2016-06-01 20:03   ` Daniel Borkmann
  2016-06-01 20:09     ` John Fastabend
  2016-06-01 20:15     ` Alexei Starovoitov
  0 siblings, 2 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 20:03 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Add translator for JITing eBPF to operations which
> can be executed on NFP's programmable engines.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
[...]
> +int
> +nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
> +	    unsigned int tgt_out, unsigned int tgt_abort,
> +	    unsigned int prog_sz, struct nfp_bpf_result *res)
> +{
> +	struct nfp_prog *nfp_prog;
> +	int ret;
> +
> +	/* TODO: maybe make this dependent on bpf_jit_enable? */

Probably makes sense to leave it independent from this.

Maybe that would rather be an ethtool flag/setting?

> +	nfp_prog = kzalloc(sizeof(*nfp_prog), GFP_KERNEL);
> +	if (!nfp_prog)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&nfp_prog->insns);
> +	nfp_prog->start_off = prog_start;
> +	nfp_prog->tgt_out = tgt_out;
> +	nfp_prog->tgt_abort = tgt_abort;
> +
> +	ret = nfp_prog_prepare(nfp_prog, filter->insnsi, filter->len);
> +	if (ret)
> +		goto out;
> +
> +	nfp_bpf_optimize(nfp_prog);
> +
> +	nfp_prog->prog = prog_mem;
> +	nfp_prog->__prog_alloc_len = prog_sz;
> +
> +	ret = nfp_translate(nfp_prog);
> +	if (ret) {
> +		pr_err("Translation failed with error %d (translated: %u)\n",
> +		       ret, nfp_prog->n_translated);
> +		ret = -EINVAL;
> +	}
> +
> +	res->n_instr = nfp_prog->prog_len;
> +out:
> +	nfp_prog_free(nfp_prog);
> +
> +	return ret;
> +}
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 20:03   ` Daniel Borkmann
@ 2016-06-01 20:09     ` John Fastabend
  2016-06-01 20:15     ` Alexei Starovoitov
  1 sibling, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-01 20:09 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 16-06-01 01:03 PM, Daniel Borkmann wrote:
> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>> Add translator for JITing eBPF to operations which
>> can be executed on NFP's programmable engines.
>>
>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> [...]
>> +int
>> +nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int
>> prog_start,
>> +        unsigned int tgt_out, unsigned int tgt_abort,
>> +        unsigned int prog_sz, struct nfp_bpf_result *res)
>> +{
>> +    struct nfp_prog *nfp_prog;
>> +    int ret;
>> +
>> +    /* TODO: maybe make this dependent on bpf_jit_enable? */
> 
> Probably makes sense to leave it independent from this.

Agreed, we have an ethtool flag for the other offloads; I think this
should just follow the same semantics and get offloaded if the flag
is set.

> 
> Maybe that would rather be an ethtool flag/setting?
> 

I would think using the same flag as we did for u32 and flower
should be fine.
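
I.e. gate the offload on the existing feature bit the user toggles with:

  ethtool -K eth3 hw-tc-offload on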

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 01/12] add basic register-field manipulation macros
  2016-06-01 16:50 ` [RFC 01/12] add basic register-field manipulation macros Jakub Kicinski
@ 2016-06-01 20:15   ` Hannes Frederic Sowa
  2016-06-01 23:08     ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Hannes Frederic Sowa @ 2016-06-01 20:15 UTC (permalink / raw)
  To: Jakub Kicinski, netdev
  Cc: ast, daniel, dinan.gunawardena, Ivo van Doorn, Felix Fietkau

Hello,

On Wed, Jun 1, 2016, at 18:50, Jakub Kicinski wrote:
> C bitfields are problematic and best avoided.  Developers
> interacting with hardware registers find themselves searching
> for easy-to-use alternatives.  A common approach is to define
> structures or sets of macros containing a mask and shift pair.
> Operations on the register are then performed as follows:
> 
>  field = (reg >> shift) & mask;
> 
>  reg &= ~(mask << shift);
>  reg |= (field & mask) << shift;
> 
> Defining shift and mask separately is tedious.  Ivo van Doorn
> came up with an idea of computing them at compilation time
> based on a single shifted mask (later refined by Felix) which
> can be used like this:
> 
>  field = FIELD_GET(REG_FIELD, reg);
> 
>  reg &= ~REG_FIELD;
>  reg |= FIELD_PUT(REG_FIELD, field);
> 
> FIELD_{GET,PUT} macros take care of finding out what the
> appropriate shift is based on a compile-time ffs operation.
> 
> GENMASK can be used to define registers (which is usually
> less error-prone and easier to match with datasheets).
> 
> This approach is the most convenient I've seen, so to limit code
> duplication let's move the macros to a global header file.
> 
> Compared to Felix Fietkau's version I:
>  - edited the comments a bit;
>  - gave auxiliary macros _bf_ prefix;
>  - dropped the UL specifier from 1 in _bf_low_bits,
>  - renamed the FIELD_SET to FIELD_PUT;
>  - added 64bit versions.
> 
> CC: Ivo van Doorn <IvDoorn@gmail.com>
> CC: Felix Fietkau <nbd@openwrt.org>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---
>  include/linux/bitfield.h | 89 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
>  create mode 100644 include/linux/bitfield.h
> 
> diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
> new file mode 100644
> index 000000000000..3815c81f5b10
> --- /dev/null
> +++ b/include/linux/bitfield.h
> @@ -0,0 +1,89 @@
> +/*
> + * Copyright (C) 2014 Felix Fietkau <nbd@openwrt.org>
> + * Copyright (C) 2004 - 2009 Ivo van Doorn <IvDoorn@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _LINUX_BITFIELD_H
> +#define _LINUX_BITFIELD_H
> +
> +#include <asm/types.h>
> +
> +/*
> + * Macros to find first set bit in a variable.
> + * These macros behave the same as the __ffs() functions but the most
> + * important difference is that this is done during compile-time rather
> + * than run-time.
> + */
> +#define compile_ffs2(__x) \
> +       __builtin_choose_expr(((__x) & 0x1), 0, 1)
> +
> +#define compile_ffs4(__x) \
> +       __builtin_choose_expr(((__x) & 0x3), \
> +                             (compile_ffs2((__x))), \
> +                             (compile_ffs2((__x) >> 2) + 2))
> +
> +#define compile_ffs8(__x) \
> +       __builtin_choose_expr(((__x) & 0xf), \
> +                             (compile_ffs4((__x))), \
> +                             (compile_ffs4((__x) >> 4) + 4))
> +
> +#define compile_ffs16(__x) \
> +       __builtin_choose_expr(((__x) & 0xff), \
> +                             (compile_ffs8((__x))), \
> +                             (compile_ffs8((__x) >> 8) + 8))
> +
> +#define compile_ffs32(__x) \
> +       __builtin_choose_expr(((__x) & 0xffff), \
> +                             (compile_ffs16((__x))), \
> +                             (compile_ffs16((__x) >> 16) + 16))
> +
> +#define compile_ffs64(__x) \
> +       __builtin_choose_expr(((__x) & 0xffffffff), \
> +                             (compile_ffs32((__x))), \
> +                             (compile_ffs32(((__x) >> 16) >> 16) + 32))

I wonder if this can already be done with __builtin_ffs/__builtin_ffsl.

So the macro would only need to do:

__builtin_choose_expr(__builtin_types_compatible_p(typeof(x), long),
		      __builtin_ffsl(x),
		      __builtin_choose_expr(__builtin_types_compatible_p(typeof(x), int),
					    __builtin_ffs(x), (void)0))

Probably you can get away with the long version only.

Probably adding ffs and ffsl to the generic headers and using them would
also be worthwhile.

Otherwise you should be able to express all of this via ilog2 in nearly
one line?
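
Roughly (untested), given that (x) & -(x) isolates the lowest set bit
and ilog2() from <linux/log2.h> folds for constants:

  #define compile_ffs(x)	ilog2((x) & -(x))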

> +
> +/*
> + * Macros to validate that the mask given contains a contiguous set of
> + * bits.
> + * Note that we cannot use the is_power_of_2() function since this check
> + * must be done at compile-time.
> + */
> +#define _bf_is_power_of_two(x) ( !((x) & ((x) - 1)) )

We already provide a macro is_power_of_2.

> +#define _bf_low_bits(x)                ( ((x) - 1) & ~(x) )

GENMASK?

> +#define _bf_is_valid_mask(x)   _bf_is_power_of_two(1U + (x) + _bf_low_bits(x))
> +
> +#define _BF_FIELD_CHECK(_mask)                                         \
> +       BUILD_BUG_ON(!(_mask) || !_bf_is_valid_mask(_mask))
> +
> +#define FIELD_PUT(_mask, _val)                                         \
> +       ({                                                              \
> +               _BF_FIELD_CHECK(_mask);                                 \
> +               (((u32)(_val)) << compile_ffs32(_mask)) & _mask;        \
> +       })
> +
> +#define FIELD_GET(_mask, _val)                                         \
> +       ({                                                              \
> +               _BF_FIELD_CHECK(_mask);                                 \
> +               (u32)(((_val) & _mask) >> compile_ffs32(_mask));        \
> +       })
> +
> +#define FIELD_PUT64(_mask, _val)                                       \
> +       ({                                                              \
> +               _BF_FIELD_CHECK(_mask);                                 \
> +               (((u64)(_val)) << compile_ffs64(_mask)) & _mask;        \
> +       })
> +
> +#define FIELD_GET64(_mask, _val)                                       \
> +       ({                                                              \
> +               _BF_FIELD_CHECK(_mask);                                 \
> +               (u64)(((_val) & _mask) >> compile_ffs64(_mask));        \
> +       })
> +
> +#endif

Looks good!

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 20:03   ` Daniel Borkmann
  2016-06-01 20:09     ` John Fastabend
@ 2016-06-01 20:15     ` Alexei Starovoitov
  2016-06-01 21:23       ` Jakub Kicinski
  2016-06-02 16:21       ` John Fastabend
  1 sibling, 2 replies; 54+ messages in thread
From: Alexei Starovoitov @ 2016-06-01 20:15 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Jakub Kicinski, netdev, ast, dinan.gunawardena

On Wed, Jun 01, 2016 at 10:03:04PM +0200, Daniel Borkmann wrote:
> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> >Add translator for JITing eBPF to operations which
> >can be executed on NFP's programmable engines.
> >
> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> >Reviewed-by: Simon Horman <simon.horman@netronome.com>
> [...]
> >+int
> >+nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
> >+	    unsigned int tgt_out, unsigned int tgt_abort,
> >+	    unsigned int prog_sz, struct nfp_bpf_result *res)
> >+{
> >+	struct nfp_prog *nfp_prog;
> >+	int ret;
> >+
> >+	/* TODO: maybe make this dependent on bpf_jit_enable? */
> 
> Probably makes sense to leave it independent from this.
> 
> Maybe that would rather be an ethtool flag/setting?

Agree that it should be independent of bpf_jit_enable,
since that's a very different JIT. The whole point of hw offload
is that bpf is translated into something hw understands natively.
Gating it by a sysctl or another flag doesn't make much sense to me.
In this case the user will say 'offload tc+cls_bpf into the nic'
and the nic should either do it or not. No need for an ethtool flag either.
One can argue that bpf_jit_enable=2 was useful for debugging
of the JIT itself, but it looks like it was only used by jit developers
like us, and we would be fine with a temp printk while debugging.
At least there was never a case where the jit had a bug and we would
ask a person reporting it to send us back jit_enable=2 output.
We cannot remove it now, but I wouldn't simply copy the behavior here.
So I'm suggesting not to use bpf_jit_enable, either 1 or 2, at all.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 16:50 ` [RFC 06/12] nfp: add hardware cls_bpf offload Jakub Kicinski
@ 2016-06-01 20:20   ` Daniel Borkmann
  2016-06-01 20:52     ` Alexei Starovoitov
  2016-06-01 23:03   ` Daniel Borkmann
  1 sibling, 1 reply; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 20:20 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Add hardware cls_bpf offload on our smart NICs.  Detect if
> capable firmware is loaded and use it to load the code JITed
> with just added translator onto programmable engines.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
[...]
> +static int
> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
> +			    struct tc_cls_bpf_offload *cls_bpf,
> +			    struct nfp_bpf_result *res,
> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
> +{
> +	unsigned int code_sz = max_instr * sizeof(u64);
> +	u16 start_off, tgt_out, tgt_abort;
> +	const struct tc_action *a;
> +	int err;
> +
> +	if (tc_no_actions(cls_bpf->exts))
> +		return -EINVAL;
> +
> +	tc_for_each_action(a, cls_bpf->exts) {
> +		if (!is_tcf_gact_shot(a))
> +			return -EINVAL;
> +	}
> +
> +	if (cls_bpf->exts_integrated)
> +		return -EINVAL;

So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
Direct-action is I would say the most typical way to run cls_bpf as it allows
you to more naturally and efficiently code programs in the sense that classification
and action are already combined in a single program, so there's no additional
overhead of a linear action chain required, and a single program can already
have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
usually enough to run a single cls_bpf instance, for example, on sch_clsact
ingress or egress parent, nothing more than that to get the job done. I think
the cls_bpf->exts_integrated test could probably come first and if it's false,
we'd need to walk the actions?
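
I.e. something like (sketch):

	if (!cls_bpf->exts_integrated) {
		if (tc_no_actions(cls_bpf->exts))
			return -EINVAL;

		tc_for_each_action(a, cls_bpf->exts)
			if (!is_tcf_gact_shot(a))
				return -EINVAL;
	}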

> +	start_off = nn_readw(nn, NFP_NET_CFG_BPF_START);
> +	tgt_out = nn_readw(nn, NFP_NET_CFG_BPF_TGT_OUT);
> +	tgt_abort = nn_readw(nn, NFP_NET_CFG_BPF_TGT_ABORT);
> +
> +	*code = dma_zalloc_coherent(&nn->pdev->dev, code_sz, dma_addr,
> +				    GFP_KERNEL);
> +	if (!*code)
> +		return -ENOMEM;
> +
> +	err = nfp_bpf_jit(cls_bpf->filter, *code, start_off, tgt_out, tgt_abort,
> +			  max_instr, res);
> +	if (err)
> +		goto out;
> +
> +	return 0;
> +
> +out:
> +	dma_free_coherent(&nn->pdev->dev, code_sz, *code, *dma_addr);
> +	return err;
> +}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 20:20   ` Daniel Borkmann
@ 2016-06-01 20:52     ` Alexei Starovoitov
  2016-06-01 21:15       ` Jakub Kicinski
                         ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Alexei Starovoitov @ 2016-06-01 20:52 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Jakub Kicinski, netdev, ast, dinan.gunawardena

On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> >Add hardware cls_bpf offload on our smart NICs.  Detect if
> >capable firmware is loaded and use it to load the code JITed
> >with just added translator onto programmable engines.
> >
> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> >Reviewed-by: Simon Horman <simon.horman@netronome.com>
> [...]
> >+static int
> >+nfp_net_bpf_offload_prepare(struct nfp_net *nn,
> >+			    struct tc_cls_bpf_offload *cls_bpf,
> >+			    struct nfp_bpf_result *res,
> >+			    void **code, dma_addr_t *dma_addr, u16 max_instr)
> >+{
> >+	unsigned int code_sz = max_instr * sizeof(u64);
> >+	u16 start_off, tgt_out, tgt_abort;
> >+	const struct tc_action *a;
> >+	int err;
> >+
> >+	if (tc_no_actions(cls_bpf->exts))
> >+		return -EINVAL;
> >+
> >+	tc_for_each_action(a, cls_bpf->exts) {
> >+		if (!is_tcf_gact_shot(a))
> >+			return -EINVAL;
> >+	}
> >+
> >+	if (cls_bpf->exts_integrated)
> >+		return -EINVAL;
> 
> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
> Direct-action is I would say the most typical way to run cls_bpf as it allows
> you to more naturally and efficiently code programs in the sense that classification
> and action are already combined in a single program, so there's no additional
> overhead of a linear action chain required, and a single program can already
> have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
> usually enough to run a single cls_bpf instance, for example, on sch_clsact
> ingress or egress parent, nothing more than that to get the job done. I think
> the cls_bpf->exts_integrated test could probably come first and if it's false,
> we'd need to walk the actions?

I think it makes sense to offload da mode only. Doing tc_for_each_action
walk like above is ok, but the number of bpf programs with only separate
gact is diminishingly small and we don't recommend to use it anymore.
That's the stuff we used when da wasn't available.
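
(da mode being what you get from e.g.
 'tc filter add dev em1 ingress bpf da obj prog.o'.)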

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-01 17:13   ` John Fastabend
@ 2016-06-01 20:59     ` Jakub Kicinski
  0 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 20:59 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, ast, daniel, dinan.gunawardena

On Wed, 1 Jun 2016 10:13:48 -0700, John Fastabend wrote:
> > +static void cls_bpf_stop_offload(struct tcf_proto *tp,
> > +				 struct cls_bpf_prog *prog)
> > +{
> > +	struct net_device *dev = tp->q->dev_queue->dev;
> > +
> > +	if (!prog->offloaded)
> > +		return;
> > +	if (WARN_ON(!tc_should_offload(dev, 0)))
> > +		return;  
> 
> This warn on is a bit concerning; it looks like you can get
> a program stuck in hardware but removed from the software
> stack. Any idea why this could happen? I think it is better
> to solve the root problem or just remove this if it's dbg code.
> 
> One thought is you need to block disabling the ethtool flag
> if the hardware has running ebpf codes. Haven't got to the
> driver patches yet though so not sure if you did this or not.
> And now that I think about it I better go check the other
> drivers.

Yes, I do refuse clearing the ethtool flag if offload is active.
I put the warning there in an attempt to document that "this should
never happen".  I'll drop it in the next revision.
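
I.e. roughly (sketch; the 'bpf_offload_active' state bit is made up):

	/* in the driver's .ndo_set_features callback */
	if (!(features & NETIF_F_HW_TC) && nn->bpf_offload_active)
		return -EBUSY;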

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 19:40   ` Daniel Borkmann
@ 2016-06-01 21:05     ` Jakub Kicinski
  2016-06-01 21:21       ` Daniel Borkmann
  0 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 21:05 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, ast, dinan.gunawardena

On Wed, 01 Jun 2016 21:40:23 +0200, Daniel Borkmann wrote:
> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > index f4297c8a42fe..93a86edf3bd8 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -395,6 +395,7 @@ enum {
> >   	TCA_BPF_FD,
> >   	TCA_BPF_NAME,
> >   	TCA_BPF_FLAGS,
> > +	TCA_BPF_GEN_TCA_FLAGS,  
> 
> Small nit for the non-RFC set: I'd simply name that TCA_BPF_FLAGS_GEN.

OK!

> > @@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
> >
> >   		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
> >   	}
> > +	if (tb[TCA_BPF_GEN_TCA_FLAGS])
> > +		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
> >
> >   	prog->exts_integrated = have_exts;
> > +	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;  
> 
> Invalid flags should probably be rejected here with -EINVAL or something.

Indeed, that would be more in line with what is done for "the other"
flags attribute, but not so much with how flower and u32 handle
flags. I like the stricter approach better, though, so unless someone
speaks up I'll do as you suggest.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 20:52     ` Alexei Starovoitov
@ 2016-06-01 21:15       ` Jakub Kicinski
  2016-06-01 21:51         ` Alexei Starovoitov
  2016-06-01 21:16       ` Daniel Borkmann
  2016-06-01 21:36       ` John Fastabend
  2 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 21:15 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, netdev, ast, dinan.gunawardena

On Wed, 1 Jun 2016 13:52:01 -0700, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
> > On 06/01/2016 06:50 PM, Jakub Kicinski wrote:  
> > >Add hardware cls_bpf offload on our smart NICs.  Detect if
> > >capable firmware is loaded and use it to load the code JITed
> > >with just added translator onto programmable engines.
> > >
> > >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > >Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > >Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> > [...]  
> > >+static int
> > >+nfp_net_bpf_offload_prepare(struct nfp_net *nn,
> > >+			    struct tc_cls_bpf_offload *cls_bpf,
> > >+			    struct nfp_bpf_result *res,
> > >+			    void **code, dma_addr_t *dma_addr, u16 max_instr)
> > >+{
> > >+	unsigned int code_sz = max_instr * sizeof(u64);
> > >+	u16 start_off, tgt_out, tgt_abort;
> > >+	const struct tc_action *a;
> > >+	int err;
> > >+
> > >+	if (tc_no_actions(cls_bpf->exts))
> > >+		return -EINVAL;
> > >+
> > >+	tc_for_each_action(a, cls_bpf->exts) {
> > >+		if (!is_tcf_gact_shot(a))
> > >+			return -EINVAL;
> > >+	}
> > >+
> > >+	if (cls_bpf->exts_integrated)
> > >+		return -EINVAL;  
> > 
> > So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
> > Direct-action is I would say the most typical way to run cls_bpf as it allows
> > you to more naturally and efficiently code programs in the sense that classification
> > and action are already combined in a single program, so there's no additional
> > overhead of a linear action chain required, and a single program can already
> > have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
> > usually enough to run a single cls_bpf instance, for example, on sch_clsact
> > ingress or egress parent, nothing more than that to get the job done. I think
> > the cls_bpf->exts_integrated test could probably come first and if it's false,
> > we'd need to walk the actions?  
> 
> I think it makes sense to offload da mode only. Doing tc_for_each_action
> walk like above is ok, but the number of bpf programs with only separate
> gact is diminishingly small and we don't recommend to use it anymore.
> That's the stuff we used when da wasn't available.
 
Let me make sure I understand - to get da offloaded I would have to
check that all return values are either OK or SHOT?  That was my
understanding, but since it would make the initial submission more
complex I decided against it...

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 20:52     ` Alexei Starovoitov
  2016-06-01 21:15       ` Jakub Kicinski
@ 2016-06-01 21:16       ` Daniel Borkmann
  2016-06-01 21:36       ` John Fastabend
  2 siblings, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 21:16 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Jakub Kicinski, netdev, ast, dinan.gunawardena

On 06/01/2016 10:52 PM, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>> Add hardware cls_bpf offload on our smart NICs.  Detect if
>>> capable firmware is loaded and use it to load the code JITed
>>> with just added translator onto programmable engines.
>>>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>> [...]
>>> +static int
>>> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
>>> +			    struct tc_cls_bpf_offload *cls_bpf,
>>> +			    struct nfp_bpf_result *res,
>>> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
>>> +{
>>> +	unsigned int code_sz = max_instr * sizeof(u64);
>>> +	u16 start_off, tgt_out, tgt_abort;
>>> +	const struct tc_action *a;
>>> +	int err;
>>> +
>>> +	if (tc_no_actions(cls_bpf->exts))
>>> +		return -EINVAL;
>>> +
>>> +	tc_for_each_action(a, cls_bpf->exts) {
>>> +		if (!is_tcf_gact_shot(a))
>>> +			return -EINVAL;
>>> +	}
>>> +
>>> +	if (cls_bpf->exts_integrated)
>>> +		return -EINVAL;
>>
>> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
>> Direct-action is I would say the most typical way to run cls_bpf as it allows
>> you to more naturally and efficiently code programs in the sense that classification
>> and action are already combined in a single program, so there's no additional
>> overhead of a linear action chain required, and a single program can already
>> have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
>> usually enough to run a single cls_bpf instance, for example, on sch_clsact
>> ingress or egress parent, nothing more than that to get the job done. I think
>> the cls_bpf->exts_integrated test could probably come first and if it's false,
>> we'd need to walk the actions?
>
> I think it makes sense to offload da mode only. Doing tc_for_each_action
> walk like above is ok, but the number of bpf programs with only separate
> gact is diminishingly small and we don't recommend to use it anymore.
> That's the stuff we used when da wasn't available.

Yeah, that makes sense to me; I presume that would also be easier to manage
due to it all being self-contained.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 21:05     ` Jakub Kicinski
@ 2016-06-01 21:21       ` Daniel Borkmann
  2016-06-01 21:26         ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 21:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, dinan.gunawardena

On 06/01/2016 11:05 PM, Jakub Kicinski wrote:
> On Wed, 01 Jun 2016 21:40:23 +0200, Daniel Borkmann wrote:
[...]
>>> @@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
>>>
>>>    		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
>>>    	}
>>> +	if (tb[TCA_BPF_GEN_TCA_FLAGS])
>>> +		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
>>>
>>>    	prog->exts_integrated = have_exts;
>>> +	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;
>>
>> Invalid flags should probably be rejected here with -EINVAL or something.
>
> Indeed, that would be more in line with what is done for "the other"
> flags attribute, but not so much with how flower and u32 handle
> flags. I like the stricter approach better, though, so unless someone
> speaks up I'll do as you suggest.

If I see this correctly, in patch 4 you're already following up on that
with the tc_flags_valid() check, it's probably okay to leave it as-is then.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 20:15     ` Alexei Starovoitov
@ 2016-06-01 21:23       ` Jakub Kicinski
  2016-06-02 16:21       ` John Fastabend
  1 sibling, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 21:23 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Daniel Borkmann, netdev, ast, dinan.gunawardena

On Wed, 1 Jun 2016 13:15:48 -0700, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:03:04PM +0200, Daniel Borkmann wrote:
> > On 06/01/2016 06:50 PM, Jakub Kicinski wrote:  
> > >Add translator for JITing eBPF to operations which
> > >can be executed on NFP's programmable engines.
> > >
> > >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > >Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > >Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> > [...]  
> > >+int
> > >+nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
> > >+	    unsigned int tgt_out, unsigned int tgt_abort,
> > >+	    unsigned int prog_sz, struct nfp_bpf_result *res)
> > >+{
> > >+	struct nfp_prog *nfp_prog;
> > >+	int ret;
> > >+
> > >+	/* TODO: maybe make this dependent on bpf_jit_enable? */  
> > 
> > Probably makes sense to leave it independent from this.
> > 
> > Maybe that would rather be an ethtool flag/setting?  
> 
> Agree that it should be independent of bpf_jit_enable,
> since that's a very different JIT. The whole point of hw offload
> is that bpf is translated into something hw understands natively.
> Gating it by a sysctl or another flag doesn't make much sense to me.
> In this case the user will say 'offload tc+cls_bpf into the nic'
> and the nic should either do it or not. No need for an ethtool flag either.
> One can argue that bpf_jit_enable=2 was useful for debugging
> of the JIT itself, but it looks like it was only used by jit developers
> like us, and we would be fine with a temp printk while debugging.
> At least there was never a case where the jit had a bug and we would
> ask a person reporting it to send us back jit_enable=2 output.
> We cannot remove it now, but I wouldn't simply copy the behavior here.
> So I'm suggesting not to use bpf_jit_enable, either 1 or 2, at all.

That clarifies things a lot, thanks!  I was indeed mostly hoping to use
this to control debug prints, but as you said those are mostly useful
for the developers so I didn't even include the debug/trace infra here.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 21:21       ` Daniel Borkmann
@ 2016-06-01 21:26         ` Jakub Kicinski
  2016-06-01 21:31           ` Daniel Borkmann
  0 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 21:26 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, ast, dinan.gunawardena

On Wed, 01 Jun 2016 23:21:40 +0200, Daniel Borkmann wrote:
> On 06/01/2016 11:05 PM, Jakub Kicinski wrote:
> > On Wed, 01 Jun 2016 21:40:23 +0200, Daniel Borkmann wrote:  
> [...]
> >>> @@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
> >>>
> >>>    		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
> >>>    	}
> >>> +	if (tb[TCA_BPF_GEN_TCA_FLAGS])
> >>> +		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
> >>>
> >>>    	prog->exts_integrated = have_exts;
> >>> +	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;  
> >>
> >> Invalid flags should probably be rejected here with -EINVAL or something.  
> >
> > Indeed, that would be more in line with what is done for "the other"
> > flags attribute, but not so much with how flower and u32 handle
> > flags. I like the stricter approach better, though, so unless someone
> > speaks up I'll do as you suggest.  
> 
> If I see this correctly, in patch 4 you're already following up on that
> with the tc_flags_valid() check, it's probably okay to leave it as-is then.

My concern was that if someone adds a new flag for u32/flower
tc_flags_valid() will have to accept it but cls_bpf will ignore it.  So
I went with clearing things we don't support so that the user can at
least see in tc show that the flags he threw at us did not stick...

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 21:26         ` Jakub Kicinski
@ 2016-06-01 21:31           ` Daniel Borkmann
  0 siblings, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 21:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, dinan.gunawardena

On 06/01/2016 11:26 PM, Jakub Kicinski wrote:
> On Wed, 01 Jun 2016 23:21:40 +0200, Daniel Borkmann wrote:
>> On 06/01/2016 11:05 PM, Jakub Kicinski wrote:
>>> On Wed, 01 Jun 2016 21:40:23 +0200, Daniel Borkmann wrote:
>> [...]
>>>>> @@ -400,8 +406,11 @@ static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
>>>>>
>>>>>     		have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
>>>>>     	}
>>>>> +	if (tb[TCA_BPF_GEN_TCA_FLAGS])
>>>>> +		gen_flags = nla_get_u32(tb[TCA_BPF_GEN_TCA_FLAGS]);
>>>>>
>>>>>     	prog->exts_integrated = have_exts;
>>>>> +	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;
>>>>
>>>> Invalid flags should probably be rejected here with -EINVAL or something.
>>>
>>> Indeed, that would be more in line with what is done for "the other"
>>> flags attribute, but not so much with how flower and u32 handle
>>> flags. I like the stricter approach better, though, so unless someone
>>> speaks up I'll do as you suggest.
>>
>> If I see this correctly, in patch 4 you're already following up on that
>> with the tc_flags_valid() check, it's probably okay to leave it as-is then.
>
> My concern was that if someone adds a new flag for u32/flower
> tc_flags_valid() will have to accept it but cls_bpf will ignore it.  So
> I went with clearing things we don't support so that the user can at
> least see in tc show that the flags he threw at us did not stick...

Ok, then doing so is fine. You could probably add that as a comment there
for the rejection.
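
E.g.:

	/* Mask out unsupported flags so the user can see in the dump
	 * which flags did not take effect, rather than failing hard.
	 */
	prog->gen_flags = gen_flags & CLS_BPF_SUPPORTED_GEN_FLAGS;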

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 20:52     ` Alexei Starovoitov
  2016-06-01 21:15       ` Jakub Kicinski
  2016-06-01 21:16       ` Daniel Borkmann
@ 2016-06-01 21:36       ` John Fastabend
  2016-06-02  6:57         ` Jiri Pirko
  2 siblings, 1 reply; 54+ messages in thread
From: John Fastabend @ 2016-06-01 21:36 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Jakub Kicinski, netdev, ast, dinan.gunawardena

On 16-06-01 01:52 PM, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>> Add hardware cls_bpf offload on our smart NICs.  Detect if
>>> capable firmware is loaded and use it to load the code JITed
>>> with just added translator onto programmable engines.
>>>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>> [...]
>>> +static int
>>> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
>>> +			    struct tc_cls_bpf_offload *cls_bpf,
>>> +			    struct nfp_bpf_result *res,
>>> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
>>> +{
>>> +	unsigned int code_sz = max_instr * sizeof(u64);
>>> +	u16 start_off, tgt_out, tgt_abort;
>>> +	const struct tc_action *a;
>>> +	int err;
>>> +
>>> +	if (tc_no_actions(cls_bpf->exts))
>>> +		return -EINVAL;
>>> +
>>> +	tc_for_each_action(a, cls_bpf->exts) {
>>> +		if (!is_tcf_gact_shot(a))
>>> +			return -EINVAL;
>>> +	}
>>> +
>>> +	if (cls_bpf->exts_integrated)
>>> +		return -EINVAL;
>>
>> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
>> Direct-action is I would say the most typical way to run cls_bpf as it allows
>> you to more naturally and efficiently code programs in the sense that classification
>> and action are already combined in a single program, so there's no additional
>> overhead of a linear action chain required, and a single program can already
>> have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
>> usually enough to run a single cls_bpf instance, for example, on sch_clsact
>> ingress or egress parent, nothing more than that to get the job done. I think
>> the cls_bpf->exts_integrated test could probably come first and if it's false,
>> we'd need to walk the actions?
> 
> I think it makes sense to offload da mode only. Doing tc_for_each_action
> walk like above is ok, but the number of bpf programs with only separate
> gact is diminishingly small and we don't recommend to use it anymore.
> That's the stuff we used when da wasn't available.
> 

+1 I've been using da mode only as well.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 21:15       ` Jakub Kicinski
@ 2016-06-01 21:51         ` Alexei Starovoitov
  0 siblings, 0 replies; 54+ messages in thread
From: Alexei Starovoitov @ 2016-06-01 21:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, netdev, ast, dinan.gunawardena

On Wed, Jun 01, 2016 at 10:15:57PM +0100, Jakub Kicinski wrote:
> On Wed, 1 Jun 2016 13:52:01 -0700, Alexei Starovoitov wrote:
> > On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
> > > On 06/01/2016 06:50 PM, Jakub Kicinski wrote:  
> > > >Add hardware cls_bpf offload on our smart NICs.  Detect if
> > > >capable firmware is loaded and use it to load the code JITed
> > > >with just added translator onto programmable engines.
> > > >
> > > >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > > >Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > > >Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> > > [...]  
> > > >+static int
> > > >+nfp_net_bpf_offload_prepare(struct nfp_net *nn,
> > > >+			    struct tc_cls_bpf_offload *cls_bpf,
> > > >+			    struct nfp_bpf_result *res,
> > > >+			    void **code, dma_addr_t *dma_addr, u16 max_instr)
> > > >+{
> > > >+	unsigned int code_sz = max_instr * sizeof(u64);
> > > >+	u16 start_off, tgt_out, tgt_abort;
> > > >+	const struct tc_action *a;
> > > >+	int err;
> > > >+
> > > >+	if (tc_no_actions(cls_bpf->exts))
> > > >+		return -EINVAL;
> > > >+
> > > >+	tc_for_each_action(a, cls_bpf->exts) {
> > > >+		if (!is_tcf_gact_shot(a))
> > > >+			return -EINVAL;
> > > >+	}
> > > >+
> > > >+	if (cls_bpf->exts_integrated)
> > > >+		return -EINVAL;  
> > > 
> > > So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
> > > Direct-action is I would say the most typical way to run cls_bpf as it allows
> > > you to more naturally and efficiently code programs in the sense that classification
> > > and action are already combined in a single program, so there's no additional
> > > overhead of a linear action chain required, and a single program can already
> > > have multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...), so that it is
> > > usually enough to run a single cls_bpf instance, for example, on sch_clsact
> > > ingress or egress parent, nothing more than that to get the job done. I think
> > > the cls_bpf->exts_integrated test could probably come first and if it's false,
> > > we'd need to walk the actions?  
> > 
> > I think it makes sense to offload da mode only. Doing tc_for_each_action
> > walk like above is ok, but the number of bpf programs with only separate
> > gact is diminishingly small and we don't recommend to use it anymore.
> > That's the stuff we used when da wasn't available.
>  
> Let me make sure I understand - to get da offloaded I would have to
> check that all return values are either OK or SHOT?  That was my

yes, but the jit needs to check all return values anyway and do even
more checks depending on how actions are configured, what the default
result is, then re-jit the whole thing if the default and/or act is changed, etc.
imo da is actually much simpler to jit.
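
On the jit side that can be as simple as refusing verdicts the datapath
can't express when translating an exit with a known return value -- a
sketch, with the surrounding plumbing omitted:

	switch (imm) {	/* constant R0 at BPF_EXIT */
	case TC_ACT_OK:
	case TC_ACT_SHOT:
		break;
	default:
		return -EOPNOTSUPP;	/* leave the filter in software */
	}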

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 07/12] nfp: add skb mark support to the bpf offload
  2016-06-01 16:50 ` [RFC 07/12] nfp: add skb mark support to the bpf offload Jakub Kicinski
@ 2016-06-01 21:56   ` Alexei Starovoitov
  2016-06-01 22:19     ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Alexei Starovoitov @ 2016-06-01 21:56 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, dinan.gunawardena

On Wed, Jun 01, 2016 at 05:50:09PM +0100, Jakub Kicinski wrote:
> Skb marking should be set in designated register, FW will
> prepend it to the packet for us.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---
>  drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c    | 20 ++++++++++++++++++++
>  drivers/net/ethernet/netronome/nfp/nfp_net.h        |  2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  8 +++++++-
>  3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> index d7eecfceba5c..b31e673a6fe8 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> @@ -46,6 +46,8 @@
>  
>  #define REG_IMM0_N	30 /* Bank AB */
>  #define REG_QNUM	29 /* Bank AB */
> +#define REG_MARK	28 /* Bank A */
> +#define REG_MARK_STS	28 /* Bank B */
>  
>  /* --- NFP prog --- */
>  /* Foreach "multiple" entries macros provide pos and next<n> pointers.
> @@ -416,6 +418,15 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
>  	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
>  }
>  
> +static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
> +{
> +	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
> +		   false, false);
> +	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
> +
> +	return 0;
> +}
> +
>  static int
>  construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
>  		 enum alu_op alu_op, bool sw)
> @@ -538,6 +549,14 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>  	return 0;
>  }
>  
> +static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> +{
> +	if (meta->insn.off == offsetof(struct sk_buff, mark))
> +		return wrp_skb_mark(nfp_prog, meta->insn.src_reg * 2);

couldn't figure out from the diff or commit log...
what is the meaning of 'skb->mark' for nfp?
Looks like it's writing into magic register and fw will do something
with that register?
'mark' is packet metadata. Could you explain how it's passing
this metadata? Is it on the wire as well or somehow in the wire
only between two nfps?
Looks like interesting feature.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 12/12] nfp: bpf: add denser mode of execution
  2016-06-01 16:50 ` [RFC 12/12] nfp: bpf: add denser mode of execution Jakub Kicinski
@ 2016-06-01 22:01   ` Alexei Starovoitov
  2016-06-01 22:47     ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Alexei Starovoitov @ 2016-06-01 22:01 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, dinan.gunawardena

On Wed, Jun 01, 2016 at 05:50:14PM +0100, Jakub Kicinski wrote:
> If BPF uses less than 7 registers programmable engines
> can process twice as many packets in parallel.  Signal
> this denser mode of operation to FW by setting the lowest
> bit in DMA address of the machine code buffer.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

wow. that sounds pretty cool.
I was about to ask whether we can help HW to be more efficient
by doing something on bpf side like annotating the registers or
adding 'hw_only' registers...
but it looks like fewer registers are actually good, since the NFP jit
can parallelize it? Truly wow.
if you can share the hw architecture details and explain more
on how this 'dense_mode' works, would be awesome.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats
  2016-06-01 16:50 ` [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats Jakub Kicinski
  2016-06-01 17:20   ` John Fastabend
@ 2016-06-01 22:09   ` Daniel Borkmann
  1 sibling, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 22:09 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Call into offloaded filters to update stats.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 07/12] nfp: add skb mark support to the bpf offload
  2016-06-01 21:56   ` Alexei Starovoitov
@ 2016-06-01 22:19     ` Jakub Kicinski
  2016-06-01 22:30       ` Daniel Borkmann
  0 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 22:19 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, ast, daniel, dinan.gunawardena

On Wed, 1 Jun 2016 14:56:26 -0700, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 05:50:09PM +0100, Jakub Kicinski wrote:
> > Skb marking should be set in designated register, FW will
> > prepend it to the packet for us.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > Reviewed-by: Simon Horman <simon.horman@netronome.com>
> > ---
> >  drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c    | 20 ++++++++++++++++++++
> >  drivers/net/ethernet/netronome/nfp/nfp_net.h        |  2 +-
> >  drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  8 +++++++-
> >  3 files changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> > index d7eecfceba5c..b31e673a6fe8 100644
> > --- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> > +++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> > @@ -46,6 +46,8 @@
> >  
> >  #define REG_IMM0_N	30 /* Bank AB */
> >  #define REG_QNUM	29 /* Bank AB */
> > +#define REG_MARK	28 /* Bank A */
> > +#define REG_MARK_STS	28 /* Bank B */
> >  
> >  /* --- NFP prog --- */
> >  /* Foreach "multiple" entries macros provide pos and next<n> pointers.
> > @@ -416,6 +418,15 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
> >  	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
> >  }
> >  
> > +static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
> > +{
> > +	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
> > +		   false, false);
> > +	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
> > +
> > +	return 0;
> > +}
> > +
> >  static int
> >  construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
> >  		 enum alu_op alu_op, bool sw)
> > @@ -538,6 +549,14 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> >  	return 0;
> >  }
> >  
> > +static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> > +{
> > +	if (meta->insn.off == offsetof(struct sk_buff, mark))
> > +		return wrp_skb_mark(nfp_prog, meta->insn.src_reg * 2);  
> 
> couldn't figure out from the diff or commit log...
> what is the meaning of 'skb->mark' for nfp?
> Looks like it's writing into magic register and fw will do something
> with that register?
> 'mark' is packet metadata. Could you explain how it's passing
> this metadata? Is it on the wire as well or somehow in the wire
> only between two nfps?
> Looks like interesting feature.

Oh, it's not a magic register, it's just an "API" I have between the BPF
and the datapath firmware.  Whatever is put in that register will be
prepended to the packet (if the mark status register is set).

I was actually pondering making the prepend TC IFE-compatible. Not sure
if it's worth the overhead but definitely something that can be done if
there is interest.
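
On the host side consuming the prepend would then be just (sketch; the
4-byte big-endian layout is an assumption of mine):

	/* FW put the mark in front of the frame data */
	skb->mark = get_unaligned_be32(skb->data);
	skb_pull(skb, 4);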

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 07/12] nfp: add skb mark support to the bpf offload
  2016-06-01 22:19     ` Jakub Kicinski
@ 2016-06-01 22:30       ` Daniel Borkmann
  2016-06-01 23:01         ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 22:30 UTC (permalink / raw)
  To: Jakub Kicinski, Alexei Starovoitov; +Cc: netdev, ast, dinan.gunawardena

On 06/02/2016 12:19 AM, Jakub Kicinski wrote:
> On Wed, 1 Jun 2016 14:56:26 -0700, Alexei Starovoitov wrote:
>> On Wed, Jun 01, 2016 at 05:50:09PM +0100, Jakub Kicinski wrote:
>>> Skb marking should be set in designated register, FW will
>>> prepend it to the packet for us.
>>>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>>> ---
>>>   drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c    | 20 ++++++++++++++++++++
>>>   drivers/net/ethernet/netronome/nfp/nfp_net.h        |  2 +-
>>>   drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  8 +++++++-
>>>   3 files changed, 28 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
>>> index d7eecfceba5c..b31e673a6fe8 100644
>>> --- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
>>> +++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
>>> @@ -46,6 +46,8 @@
>>>
>>>   #define REG_IMM0_N	30 /* Bank AB */
>>>   #define REG_QNUM	29 /* Bank AB */
>>> +#define REG_MARK	28 /* Bank A */
>>> +#define REG_MARK_STS	28 /* Bank B */
>>>
>>>   /* --- NFP prog --- */
>>>   /* Foreach "multiple" entries macros provide pos and next<n> pointers.
>>> @@ -416,6 +418,15 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
>>>   	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
>>>   }
>>>
>>> +static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
>>> +{
>>> +	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
>>> +		   false, false);
>>> +	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>   static int
>>>   construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
>>>   		 enum alu_op alu_op, bool sw)
>>> @@ -538,6 +549,14 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>>>   	return 0;
>>>   }
>>>
>>> +static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
>>> +{
>>> +	if (meta->insn.off == offsetof(struct sk_buff, mark))
>>> +		return wrp_skb_mark(nfp_prog, meta->insn.src_reg * 2);
>>
>> couldn't figure out from the diff or commit log...
>> what is the meaning of 'skb->mark' for nfp?
>> Looks like it's writing into a magic register and fw will do something
>> with that register?
>> 'mark' is packet metadata. Could you explain how it's passing
>> this metadata? Is it on the wire as well, or somehow on the wire
>> only between two nfps?
>> Looks like an interesting feature.
>
> Oh, it's not a magic register, it's just an "API" I have between the BPF
> and the datapath firmware.  Whatever is put in that register will be
> prepended to the packet (if the mark status register is set).

That is very useful indeed!

Btw, do you plan to later also add something similar to TC_ACT_REDIRECT,
e.g. to push the packet back out of the same or a different NIC port
without leaving the HW?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 12/12] nfp: bpf: add denser mode of execution
  2016-06-01 22:01   ` Alexei Starovoitov
@ 2016-06-01 22:47     ` Jakub Kicinski
  0 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 22:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, ast, daniel, dinan.gunawardena

On Wed, 1 Jun 2016 15:01:16 -0700, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 05:50:14PM +0100, Jakub Kicinski wrote:
> > If BPF uses fewer than 7 registers, the programmable engines
> > can process twice as many packets in parallel.  Signal
> > this denser mode of operation to FW by setting the lowest
> > bit in the DMA address of the machine code buffer.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> 
> wow. that sounds pretty cool.
> I was about to ask whether we can help the HW to be more efficient
> by doing something on the bpf side, like annotating the registers or
> adding 'hw_only' registers...
> but it looks like fewer registers is actually good, since the NFP jit
> can parallelize it? Truly wow.
> if you can share the hw architecture details and explain more
> on how this 'dense_mode' works, that would be awesome.
 
I think the best source of information about the card and its
internals would be the open-nfp website [1].  Regarding optimizations,
there are definitely things which could be done.  The most obvious is
helping 32-bit JITs in general: annotations of which registers are
32-bit only, which operations need zero-extending, etc.  There is some
state in the verifier that would be useful here as well.  Knowing which
registers contain the skb pointer, for instance, would help to ignore
any operations on them (since, as exemplified by the simple optimization
in patch 10, the skb pointer has no meaning for the HW).  I'm sure more
such things will come up with time.

[1] http://open-nfp.org/resources/
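
As a rough sketch of the kind of annotation meant here (hypothetical
helper, not part of this series), a 32-bit JIT could then skip pure
moves of registers known to hold the skb pointer:

  #include <linux/filter.h>

  /* True if the insn merely copies a register known to hold the skb
   * pointer -- such moves can be dropped, since the skb pointer has
   * no meaning for the NFP engines (cf. patch 10). */
  static bool nfp_insn_is_skb_ptr_mov(const struct bpf_insn *insn,
				      const bool *reg_is_skb_ptr)
  {
	return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) &&
	       reg_is_skb_ptr[insn->src_reg];
  }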

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 07/12] nfp: add skb mark support to the bpf offload
  2016-06-01 22:30       ` Daniel Borkmann
@ 2016-06-01 23:01         ` Jakub Kicinski
  0 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 23:01 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, netdev, ast, dinan.gunawardena

On Thu, 02 Jun 2016 00:30:07 +0200, Daniel Borkmann wrote:
> On 06/02/2016 12:19 AM, Jakub Kicinski wrote:
> > On Wed, 1 Jun 2016 14:56:26 -0700, Alexei Starovoitov wrote:  
> >> On Wed, Jun 01, 2016 at 05:50:09PM +0100, Jakub Kicinski wrote:  
> >>> Skb marking should be set in a designated register; FW will
> >>> prepend it to the packet for us.
> >>>
> >>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> >>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> >>> ---
> >>>   drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c    | 20 ++++++++++++++++++++
> >>>   drivers/net/ethernet/netronome/nfp/nfp_net.h        |  2 +-
> >>>   drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  8 +++++++-
> >>>   3 files changed, 28 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> >>> index d7eecfceba5c..b31e673a6fe8 100644
> >>> --- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> >>> +++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
> >>> @@ -46,6 +46,8 @@
> >>>
> >>>   #define REG_IMM0_N	30 /* Bank AB */
> >>>   #define REG_QNUM	29 /* Bank AB */
> >>> +#define REG_MARK	28 /* Bank A */
> >>> +#define REG_MARK_STS	28 /* Bank B */
> >>>
> >>>   /* --- NFP prog --- */
> >>>   /* Foreach "multiple" entries macros provide pos and next<n> pointers.
> >>> @@ -416,6 +418,15 @@ static int construct_data_ld(struct nfp_prog *nfp_prog, u16 offset, u8 size)
> >>>   	return construct_data_ind_ld(nfp_prog, offset, 0, false, size);
> >>>   }
> >>>
> >>> +static int wrp_skb_mark(struct nfp_prog *nfp_prog, u16 src)
> >>> +{
> >>> +	__emit_alu(nfp_prog, REG_MARK, ALU_DST_A, REG_NONE, ALU_OP_NONE, src,
> >>> +		   false, false);
> >>> +	__emit_immed(nfp_prog, REG_MARK_STS, ALU_DST_B, 1, false);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>   static int
> >>>   construct_br_imm(struct nfp_prog *nfp_prog, u32 imm, u16 dst, u8 br, u16 off,
> >>>   		 enum alu_op alu_op, bool sw)
> >>> @@ -538,6 +549,14 @@ static int imm_ld8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> >>>   	return 0;
> >>>   }
> >>>
> >>> +static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
> >>> +{
> >>> +	if (meta->insn.off == offsetof(struct sk_buff, mark))
> >>> +		return wrp_skb_mark(nfp_prog, meta->insn.src_reg * 2);  
> >>
> >> couldn't figure out from the diff or commit log...
> >> what is the meaning of 'skb->mark' for nfp?
> >> Looks like it's writing into a magic register and fw will do something
> >> with that register?
> >> 'mark' is packet metadata. Could you explain how it's passing
> >> this metadata? Is it on the wire as well, or somehow on the wire
> >> only between two nfps?
> >> Looks like an interesting feature.
> >
> > Oh, it's not a magic register, it's just an "API" I have between the BPF
> > and the datapath firmware.  Whatever is put in that register will be
> > prepended to the packet (if the mark status register is set).  
> 
> That is very useful indeed!
> 
> Btw, do you plan to later also add something similar to TC_ACT_REDIRECT,
> e.g. to push the packet back out of the same or a different NIC port
> without leaving the HW?

I don't see any reason why we wouldn't be able to do that :)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 16:50 ` [RFC 06/12] nfp: add hardware cls_bpf offload Jakub Kicinski
  2016-06-01 20:20   ` Daniel Borkmann
@ 2016-06-01 23:03   ` Daniel Borkmann
  1 sibling, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-01 23:03 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast, dinan.gunawardena, john.fastabend

On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
> Add hardware cls_bpf offload on our smart NICs.  Detect if
> capable firmware is loaded and use it to load the code JITed
> with the just-added translator onto the programmable engines.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>

[...]
> @@ -2386,6 +2387,21 @@ static struct rtnl_link_stats64 *nfp_net_stat64(struct net_device *netdev,
>   	return stats;
>   }
>
> +static int
> +nfp_net_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
> +		 struct tc_to_netdev *tc)
> +{
> +	struct nfp_net *nn = netdev_priv(netdev);
> +
> +	if (TC_H_MAJ(handle) != TC_H_MAJ(TC_H_INGRESS))
> +		return -EINVAL;

General question (maybe also to John, since this construct is used elsewhere too):
does this handle the case with sch_clsact, since they share the same major code?
E.g. I have the subclass with minor number TC_H_MIN_INGRESS offloaded, but can I
still use the TC_H_MIN_EGRESS part in SW at the same time? Do we make sure to
separate that? If not, should this info be passed via tc_to_netdev?
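
E.g., if the parent classid were passed down via tc_to_netdev (purely
a sketch, no such field exists in this series), the driver could do:

	/* 'parent' is hypothetical, carried in struct tc_to_netdev */
	if (TC_H_MIN(parent) == TC_H_MIN_EGRESS)
		return -EOPNOTSUPP;	/* only the ingress hook is offloaded */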

> +	if (tc->type == TC_SETUP_CLSBPF && nn->cap & NFP_NET_CFG_CTRL_BPF)
> +		return nfp_net_bpf_offload(nn, handle, proto, tc->cls_bpf);
> +
> +	return -EINVAL;
> +}
> +
>   static int nfp_net_set_features(struct net_device *netdev,
>   				netdev_features_t features)
>   {
> @@ -2440,6 +2456,11 @@ static int nfp_net_set_features(struct net_device *netdev,
>   			new_ctrl &= ~NFP_NET_CFG_CTRL_GATHER;
>   	}
>
> +	if (changed & NETIF_F_HW_TC && nn->ctrl & NFP_NET_CFG_CTRL_BPF) {
> +		nn_err(nn, "Cannot disable HW TC offload while in use\n");
> +		return -EBUSY;
> +	}
> +

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 01/12] add basic register-field manipulation macros
  2016-06-01 20:15   ` Hannes Frederic Sowa
@ 2016-06-01 23:08     ` Jakub Kicinski
  2016-06-02 12:01       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-01 23:08 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, ast, daniel, dinan.gunawardena, Ivo van Doorn, Felix Fietkau

On Wed, 01 Jun 2016 22:15:36 +0200, Hannes Frederic Sowa wrote:
> Hello,
> 
> On Wed, Jun 1, 2016, at 18:50, Jakub Kicinski wrote:
> > C bitfields are problematic and best avoided.  Developers
> > interacting with hardware registers find themselves searching
> > for easy-to-use alternatives.  A common approach is to define
> > structures or sets of macros containing a mask and shift pair.
> > Operations on the register are then performed as follows:
> > 
> >  field = (reg >> shift) & mask;
> > 
> >  reg &= ~(mask << shift);
> >  reg |= (field & mask) << shift;
> > 
> > Defining shift and mask separately is tedious.  Ivo van Doorn
> > came up with an idea of computing them at compilation time
> > based on a single shifted mask (later refined by Felix) which
> > can be used like this:
> > 
> >  field = FIELD_GET(REG_FIELD, reg);
> > 
> >  reg &= ~REG_FIELD;
> >  reg |= FIELD_PUT(REG_FIELD, field);
> > 
> > FIELD_{GET,PUT} macros take care of finding out what the
> > appropriate shift is, based on a compile-time ffs operation.
> > 
> > GENMASK can be used to define registers (which is usually
> > less error-prone and easier to match with datasheets).
> > 
> > This approach is the most convenient I've seen, so to limit code
> > duplication let's move the macros to a global header file.
> > 
> > Compared to Felix Fietkau's version I:
> >  - edited the comments a bit;
> >  - gave auxiliary macros _bf_ prefix;
> >  - dropped the UL specifier from 1 in _bf_low_bits,
> >  - renamed the FIELD_SET to FIELD_PUT;
> >  - added 64bit versions.
> > 
> > CC: Ivo van Doorn <IvDoorn@gmail.com>
> > CC: Felix Fietkau <nbd@openwrt.org>
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> > Reviewed-by: Simon Horman <simon.horman@netronome.com>
> > ---
> >  include/linux/bitfield.h | 89
> >  ++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 89 insertions(+)
> >  create mode 100644 include/linux/bitfield.h
> > 
> > diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
> > new file mode 100644
> > index 000000000000..3815c81f5b10
> > --- /dev/null
> > +++ b/include/linux/bitfield.h
> > @@ -0,0 +1,89 @@
> > +/*
> > + * Copyright (C) 2014 Felix Fietkau <nbd@openwrt.org>
> > + * Copyright (C) 2004 - 2009 Ivo van Doorn <IvDoorn@gmail.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2
> > + * as published by the Free Software Foundation
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +
> > +#ifndef _LINUX_BITFIELD_H
> > +#define _LINUX_BITFIELD_H
> > +
> > +#include <asm/types.h>
> > +
> > +/*
> > + * Macros to find first set bit in a variable.
> > + * These macros behave the same as the __ffs() functions but the most
> > important
> > + * difference that this is done during compile-time rather than
> > run-time.
> > + */
> > +#define compile_ffs2(__x) \
> > +       __builtin_choose_expr(((__x) & 0x1), 0, 1)
> > +
> > +#define compile_ffs4(__x) \
> > +       __builtin_choose_expr(((__x) & 0x3), \
> > +                             (compile_ffs2((__x))), \
> > +                             (compile_ffs2((__x) >> 2) + 2))
> > +
> > +#define compile_ffs8(__x) \
> > +       __builtin_choose_expr(((__x) & 0xf), \
> > +                             (compile_ffs4((__x))), \
> > +                             (compile_ffs4((__x) >> 4) + 4))
> > +
> > +#define compile_ffs16(__x) \
> > +       __builtin_choose_expr(((__x) & 0xff), \
> > +                             (compile_ffs8((__x))), \
> > +                             (compile_ffs8((__x) >> 8) + 8))
> > +
> > +#define compile_ffs32(__x) \
> > +       __builtin_choose_expr(((__x) & 0xffff), \
> > +                             (compile_ffs16((__x))), \
> > +                             (compile_ffs16((__x) >> 16) + 16))
> > +
> > +#define compile_ffs64(__x) \
> > +       __builtin_choose_expr(((__x) & 0xffffffff), \
> > +                             (compile_ffs32((__x))), \
> > +                             (compile_ffs32(((__x) >> 16) >> 16) + 32))  
> 
> I wonder if this can already be done with __builtin_ffs/__builtin_ffsl.
> 
> So the macro would only need to do:
> 
> __builtin_choose_expr(__builtin_types_compatible_p(typeof(x), long),
> 	__builtin_ffsl(x),
> 	__builtin_choose_expr(__builtin_types_compatible_p(typeof(x), int),
> 		__builtin_ffs(x), (void)0))
> 
> Probably you can get away with the long version only.
> 
> Probably adding ffs and ffsl to the generic headers and using them would
> also be worthwhile.
> 
> Otherwise you should be able to express all of this via ilog2 in nearly
> one line?

Yes, the emphasis here is to be able to do it all at compilation time
and throw an error if there is something wrong with the mask.  I should
make that clear in the commit message and/or a comment.
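
For the record, a minimal sketch of such a check (_bf_shift and
_bf_is_contiguous are illustrative names, and this is not necessarily
the patch's actual FIELD_GET body; compile_ffs32(),
_bf_is_power_of_two() and the kernel's BUILD_BUG_ON_ZERO() do exist):

  #define _bf_shift(mask)		compile_ffs32(mask)
  #define _bf_is_contiguous(mask) \
	_bf_is_power_of_two((((mask) >> _bf_shift(mask)) + 1))

  /* The BUILD_BUG_ON_ZERO() term evaluates to 0 at run time but
   * breaks the build when the mask is not one contiguous set of bits. */
  #define FIELD_GET(mask, reg) \
	(BUILD_BUG_ON_ZERO(!_bf_is_contiguous(mask)) + \
	 (((reg) & (mask)) >> _bf_shift(mask)))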

> > +
> > +/*
> > + * Macros to validate that the mask given contains a contiguous set of
> > bits.
> > + * Note that we cannot use the is_power_of_2() function since this check
> > + * must be done at compile-time.
> > + */
> > +#define _bf_is_power_of_two(x) ( !((x) & ((x) - 1)) )  
> 
> We already provide a macro is_power_of_2.

Same here, is_power_of_2() is a static inline, unfortunately.  I tried
using it but it breaks compile-time checking.

> > +#define _bf_low_bits(x)                ( ((x) - 1) & ~(x) )  
> 
> GENMASK?

How do I use GENMASK to convert:
0x00ffff00
to
0x000000ff
?
Using the compile_time_ffs() macros?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-01 21:36       ` John Fastabend
@ 2016-06-02  6:57         ` Jiri Pirko
  2016-06-02 12:13           ` Jakub Kicinski
  0 siblings, 1 reply; 54+ messages in thread
From: Jiri Pirko @ 2016-06-02  6:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski, netdev, ast,
	dinan.gunawardena

Wed, Jun 01, 2016 at 11:36:48PM CEST, john.fastabend@gmail.com wrote:
>On 16-06-01 01:52 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
>>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>>> Add hardware cls_bpf offload on our smart NICs.  Detect if
>>>> capable firmware is loaded and use it to load the code JITed
>>>> with the just-added translator onto the programmable engines.
>>>>
>>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>>> [...]
>>>> +static int
>>>> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
>>>> +			    struct tc_cls_bpf_offload *cls_bpf,
>>>> +			    struct nfp_bpf_result *res,
>>>> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
>>>> +{
>>>> +	unsigned int code_sz = max_instr * sizeof(u64);
>>>> +	u16 start_off, tgt_out, tgt_abort;
>>>> +	const struct tc_action *a;
>>>> +	int err;
>>>> +
>>>> +	if (tc_no_actions(cls_bpf->exts))
>>>> +		return -EINVAL;
>>>> +
>>>> +	tc_for_each_action(a, cls_bpf->exts) {
>>>> +		if (!is_tcf_gact_shot(a))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (cls_bpf->exts_integrated)
>>>> +		return -EINVAL;
>>>
>>> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
>>> Direct-action is, I would say, the most typical way to run cls_bpf, as it
>>> allows you to code programs more naturally and efficiently: classification
>>> and action are already combined in a single program, so there's no overhead
>>> of an additional linear action chain, and a single program can already have
>>> multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...). It is therefore
>>> usually enough to run a single cls_bpf instance, for example on the sch_clsact
>>> ingress or egress parent, nothing more, to get the job done. I think the
>>> cls_bpf->exts_integrated test could probably come first, and if it's false,
>>> we'd need to walk the actions?
>> 
>> I think it makes sense to offload da mode only. Doing a tc_for_each_action
>> walk like the above is ok, but the number of bpf programs with only a
>> separate gact is vanishingly small and we don't recommend using it anymore.
>> That's the stuff we used when da wasn't available.
>> 
>
>+1 I've been using da mode only as well.

I also think we should support offload for da mode only for cls_bpf.
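
That would mean the check in nfp_net_bpf_offload_prepare() quoted
above flips around, roughly (a sketch, not the final code):

	if (!cls_bpf->exts_integrated)
		return -EINVAL;	/* offload direct-action programs only */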

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
  2016-06-01 17:13   ` John Fastabend
  2016-06-01 19:34   ` Daniel Borkmann
@ 2016-06-02  7:17   ` Jiri Pirko
  2016-06-02 12:07     ` Jakub Kicinski
  2 siblings, 1 reply; 54+ messages in thread
From: Jiri Pirko @ 2016-06-02  7:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, dinan.gunawardena

Wed, Jun 01, 2016 at 06:50:04PM CEST, jakub.kicinski@netronome.com wrote:
>This patch adds hardware offload capability to cls_bpf classifier,
>similar to what have been done with U32 and flower.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>Reviewed-by: Simon Horman <simon.horman@netronome.com>
>---
> include/linux/netdevice.h |  2 ++
> include/net/pkt_cls.h     | 14 ++++++++++
> net/sched/cls_bpf.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 87 insertions(+)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index f45929ce8157..6c8364b8aae9 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -785,6 +785,7 @@ enum {
> 	TC_SETUP_MQPRIO,
> 	TC_SETUP_CLSU32,
> 	TC_SETUP_CLSFLOWER,
>+	TC_SETUP_CLSBPF,
> };
> 
> struct tc_cls_u32_offload;
>@@ -795,6 +796,7 @@ struct tc_to_netdev {
> 		u8 tc;
> 		struct tc_cls_u32_offload *cls_u32;
> 		struct tc_cls_flower_offload *cls_flower;
>+		struct tc_cls_bpf_offload *cls_bpf;
> 	};
> };
> 
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 0f7efa88f210..10b7e8cdc98c 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -438,4 +438,18 @@ struct tc_cls_flower_offload {
> 	struct tcf_exts *exts;
> };
> 
>+enum tc_clsbpf_command {
>+	TC_CLSBPF_ADD,
>+	TC_CLSBPF_REPLACE,
>+	TC_CLSBPF_DESTROY,
>+};
>+
>+struct tc_cls_bpf_offload {
>+	enum tc_clsbpf_command command;
>+	struct tcf_exts *exts;
>+	struct bpf_prog *filter;
>+	const char *name;
>+	bool exts_integrated;
>+};
>+
> #endif
>diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>index 7b342c779da7..b7c4c6dd6ad6 100644
>--- a/net/sched/cls_bpf.c
>+++ b/net/sched/cls_bpf.c
>@@ -39,6 +39,7 @@ struct cls_bpf_prog {
> 	struct list_head link;
> 	struct tcf_result res;
> 	bool exts_integrated;
>+	bool offloaded;
> 	struct tcf_exts exts;
> 	u32 handle;
> 	union {
>@@ -140,6 +141,72 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
> 	return !prog->bpf_ops;
> }
> 
>+static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
>+			       enum tc_clsbpf_command cmd)
>+{
>+	struct net_device *dev = tp->q->dev_queue->dev;
>+	struct tc_cls_bpf_offload bpf_offload = {};
>+	struct tc_to_netdev offload;
>+
>+	offload.type = TC_SETUP_CLSBPF;
>+	offload.cls_bpf = &bpf_offload;
>+
>+	bpf_offload.command = cmd;
>+	bpf_offload.exts = &prog->exts;
>+	bpf_offload.filter = prog->filter;
>+	bpf_offload.name = prog->bpf_name;
>+	bpf_offload.exts_integrated = prog->exts_integrated;
>+
>+	return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
>+					     tp->protocol, &offload);
>+}
>+
>+static void cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
>+			    struct cls_bpf_prog *oldprog)
>+{
>+	struct net_device *dev = tp->q->dev_queue->dev;
>+	struct cls_bpf_prog *obj = prog;
>+	enum tc_clsbpf_command cmd;
>+
>+	if (oldprog && oldprog->offloaded) {
>+		if (tc_should_offload(dev, 0)) {
>+			cmd = TC_CLSBPF_REPLACE;
>+		} else {
>+			obj = oldprog;
>+			cmd = TC_CLSBPF_DESTROY;
>+		}
>+	} else {
>+		if (!tc_should_offload(dev, 0))
>+			return;
>+		cmd = TC_CLSBPF_ADD;
>+	}
>+
>+	if (cls_bpf_offload_cmd(tp, obj, cmd))
>+		return;
>+
>+	obj->offloaded = true;
>+	if (oldprog)
>+		oldprog->offloaded = false;
>+}
>+
>+static void cls_bpf_stop_offload(struct tcf_proto *tp,
>+				 struct cls_bpf_prog *prog)
>+{
>+	struct net_device *dev = tp->q->dev_queue->dev;
>+
>+	if (!prog->offloaded)
>+		return;
>+	if (WARN_ON(!tc_should_offload(dev, 0)))
>+		return;
>+
>+	if (cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY)) {

Please do:
err = cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
if (err)



>+		pr_err("Stopping hardware offload failed!\n");
>+		return;
>+	}
>+
>+	prog->offloaded = false;
>+}
>+
> static int cls_bpf_init(struct tcf_proto *tp)
> {
> 	struct cls_bpf_head *head;
>@@ -179,6 +246,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
> {
> 	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) arg;
> 
>+	cls_bpf_stop_offload(tp, prog);
> 	list_del_rcu(&prog->link);
> 	tcf_unbind_filter(tp, &prog->res);
> 	call_rcu(&prog->rcu, __cls_bpf_delete_prog);
>@@ -195,6 +263,7 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
> 		return false;
> 
> 	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
>+		cls_bpf_stop_offload(tp, prog);
> 		list_del_rcu(&prog->link);
> 		tcf_unbind_filter(tp, &prog->res);
> 		call_rcu(&prog->rcu, __cls_bpf_delete_prog);
>@@ -415,6 +484,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
> 	if (ret < 0)
> 		goto errout;
> 
>+	cls_bpf_offload(tp, prog, oldprog);
>+
> 	if (oldprog) {
> 		list_replace_rcu(&oldprog->link, &prog->link);
> 		tcf_unbind_filter(tp, &oldprog->res);
>-- 
>1.9.1
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag
  2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
                     ` (2 preceding siblings ...)
  2016-06-01 19:40   ` Daniel Borkmann
@ 2016-06-02  7:24   ` Jiri Pirko
  3 siblings, 0 replies; 54+ messages in thread
From: Jiri Pirko @ 2016-06-02  7:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, dinan.gunawardena

Wed, Jun 01, 2016 at 06:50:05PM CEST, jakub.kicinski@netronome.com wrote:
>Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
>Unlike U32 and flower cls_bpf already has some netlink
>flags defined.  I chose to create a new attribute to be
>able to use the same flag values as the above.
>
>Unknown flags are ignored and not reported upon dump.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>Reviewed-by: Simon Horman <simon.horman@netronome.com>


Reviewed-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 01/12] add basic register-field manipulation macros
  2016-06-01 23:08     ` Jakub Kicinski
@ 2016-06-02 12:01       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 54+ messages in thread
From: Hannes Frederic Sowa @ 2016-06-02 12:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, ast, daniel, dinan.gunawardena, Ivo van Doorn, Felix Fietkau

On 02.06.2016 01:08, Jakub Kicinski wrote:
> On Wed, 01 Jun 2016 22:15:36 +0200, Hannes Frederic Sowa wrote:
>> Hello,
>>
>> On Wed, Jun 1, 2016, at 18:50, Jakub Kicinski wrote:
>>> C bitfields are problematic and best avoided.  Developers
>>> interacting with hardware registers find themselves searching
>>> for easy-to-use alternatives.  A common approach is to define
>>> structures or sets of macros containing a mask and shift pair.
>>> Operations on the register are then performed as follows:
>>>
>>>  field = (reg >> shift) & mask;
>>>
>>>  reg &= ~(mask << shift);
>>>  reg |= (field & mask) << shift;
>>>
>>> Defining shift and mask separately is tedious.  Ivo van Doorn
>>> came up with an idea of computing them at compilation time
>>> based on a single shifted mask (later refined by Felix) which
>>> can be used like this:
>>>
>>>  field = FIELD_GET(REG_FIELD, reg);
>>>
>>>  reg &= ~REG_FIELD;
>>>  reg |= FIELD_PUT(REG_FIELD, field);
>>>
>>> FIELD_{GET,PUT} macros take care of finding out what the
>>> appropriate shift is, based on a compile-time ffs operation.
>>>
>>> GENMASK can be used to define registers (which is usually
>>> less error-prone and easier to match with datasheets).
>>>
>>> This approach is the most convenient I've seen, so to limit code
>>> duplication let's move the macros to a global header file.
>>>
>>> Compared to Felix Fietkau's version I:
>>>  - edited the comments a bit;
>>>  - gave auxiliary macros _bf_ prefix;
>>>  - dropped the UL specifier from 1 in _bf_low_bits,
>>>  - renamed the FIELD_SET to FIELD_PUT;
>>>  - added 64bit versions.
>>>
>>> CC: Ivo van Doorn <IvDoorn@gmail.com>
>>> CC: Felix Fietkau <nbd@openwrt.org>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>>> ---
>>>  include/linux/bitfield.h | 89
>>>  ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 89 insertions(+)
>>>  create mode 100644 include/linux/bitfield.h
>>>
>>> diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
>>> new file mode 100644
>>> index 000000000000..3815c81f5b10
>>> --- /dev/null
>>> +++ b/include/linux/bitfield.h
>>> @@ -0,0 +1,89 @@
>>> +/*
>>> + * Copyright (C) 2014 Felix Fietkau <nbd@openwrt.org>
>>> + * Copyright (C) 2004 - 2009 Ivo van Doorn <IvDoorn@gmail.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2
>>> + * as published by the Free Software Foundation
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + */
>>> +
>>> +#ifndef _LINUX_BITFIELD_H
>>> +#define _LINUX_BITFIELD_H
>>> +
>>> +#include <asm/types.h>
>>> +
>>> +/*
>>> + * Macros to find first set bit in a variable.
>>> + * These macros behave the same as the __ffs() functions but the most
>>> important
>>> + * difference that this is done during compile-time rather than
>>> run-time.
>>> + */
>>> +#define compile_ffs2(__x) \
>>> +       __builtin_choose_expr(((__x) & 0x1), 0, 1)
>>> +
>>> +#define compile_ffs4(__x) \
>>> +       __builtin_choose_expr(((__x) & 0x3), \
>>> +                             (compile_ffs2((__x))), \
>>> +                             (compile_ffs2((__x) >> 2) + 2))
>>> +
>>> +#define compile_ffs8(__x) \
>>> +       __builtin_choose_expr(((__x) & 0xf), \
>>> +                             (compile_ffs4((__x))), \
>>> +                             (compile_ffs4((__x) >> 4) + 4))
>>> +
>>> +#define compile_ffs16(__x) \
>>> +       __builtin_choose_expr(((__x) & 0xff), \
>>> +                             (compile_ffs8((__x))), \
>>> +                             (compile_ffs8((__x) >> 8) + 8))
>>> +
>>> +#define compile_ffs32(__x) \
>>> +       __builtin_choose_expr(((__x) & 0xffff), \
>>> +                             (compile_ffs16((__x))), \
>>> +                             (compile_ffs16((__x) >> 16) + 16))
>>> +
>>> +#define compile_ffs64(__x) \
>>> +       __builtin_choose_expr(((__x) & 0xffffffff), \
>>> +                             (compile_ffs32((__x))), \
>>> +                             (compile_ffs32(((__x) >> 16) >> 16) + 32))  
>>
>> I wonder if this can already be done with __builtin_ffs/__builtin_ffsl.
>>
>> So the macro would only need to do:
>>
>> __builtin_choose_expr(__builtin_types_compatible_p(typeof(x), long),
>> 	__builtin_ffsl(x),
>> 	__builtin_choose_expr(__builtin_types_compatible_p(typeof(x), int),
>> 		__builtin_ffs(x), (void)0))
>>
>> Probably you can get away with the long version only.
>>
>> Probably adding ffs and ffsl to the generic headers and using them would
>> also be worthwhile.
>>
>> Otherwise you should be able to express all of this via ilog2 in nearly
>> one line?
> 
> Yes, the emphasis here is to be able to do it all at compilation time
> and throw an error if there is something wrong with the mask.  I should
> make that clear in the commit message and/or a comment.

__builtin_ffs{,l} does constant folding in the compiler; it is already
used as such in the kernel. Also, ilog2 is specifically designed to be
usable in constant-folding expansion; that's why it is so huge. ;)

>>> +
>>> +/*
>>> + * Macros to validate that the mask given contains a contiguous set of
>>> bits.
>>> + * Note that we cannot use the is_power_of_2() function since this check
>>> + * must be done at compile-time.
>>> + */
>>> +#define _bf_is_power_of_two(x) ( !((x) & ((x) - 1)) )  
>>
>> We already provide a macro is_power_of_2.
> 
> Same here, is_power_of_2() is a static inline, unfortunately.  I tried
> using it but it breaks compile-time checking.

Other existing code in the kernel would not work if this static inline
function were not constant-folded away.

> 
>>> +#define _bf_low_bits(x)                ( ((x) - 1) & ~(x) )  
>>
>> GENMASK?
> 
> How do I use GENMASK to convert:
> 0x00ffff00
> to
> 0x000000ff
> ?
> Using the compile_time_ffs() macros?
> 

Okay, probably a bit far fetched: ;)

at compile time:

GENMASK(__builtin_ctz(x) - 1, 0)	/* x = 0x00ffff00: ctz = 8 -> GENMASK(7, 0) = 0xff */

Bye,
Hannes

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 02/12] net: cls_bpf: add hardware offload
  2016-06-02  7:17   ` Jiri Pirko
@ 2016-06-02 12:07     ` Jakub Kicinski
  0 siblings, 0 replies; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-02 12:07 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, ast, daniel, dinan.gunawardena

On Thu, 2 Jun 2016 09:17:15 +0200, Jiri Pirko wrote:
> >+static void cls_bpf_stop_offload(struct tcf_proto *tp,
> >+				 struct cls_bpf_prog *prog)
> >+{
> >+	struct net_device *dev = tp->q->dev_queue->dev;
> >+
> >+	if (!prog->offloaded)
> >+		return;
> >+	if (WARN_ON(!tc_should_offload(dev, 0)))
> >+		return;
> >+
> >+	if (cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY)) {  
> 
> Please do:
> err = cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
> if (err)

Sure!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-02  6:57         ` Jiri Pirko
@ 2016-06-02 12:13           ` Jakub Kicinski
  2016-06-02 12:30             ` Daniel Borkmann
  0 siblings, 1 reply; 54+ messages in thread
From: Jakub Kicinski @ 2016-06-02 12:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann, netdev, ast,
	dinan.gunawardena

On Thu, 2 Jun 2016 08:57:48 +0200, Jiri Pirko wrote:
> Wed, Jun 01, 2016 at 11:36:48PM CEST, john.fastabend@gmail.com wrote:
> >On 16-06-01 01:52 PM, Alexei Starovoitov wrote:  
> >> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:  
> >>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:  
> >>>> Add hardware cls_bpf offload on our smart NICs.  Detect if
> >>>> capable firmware is loaded and use it to load the code JITed
> >>>> with the just-added translator onto the programmable engines.
> >>>>
> >>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
> >>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>  
> >>> [...]  
> >>>> +static int
> >>>> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
> >>>> +			    struct tc_cls_bpf_offload *cls_bpf,
> >>>> +			    struct nfp_bpf_result *res,
> >>>> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
> >>>> +{
> >>>> +	unsigned int code_sz = max_instr * sizeof(u64);
> >>>> +	u16 start_off, tgt_out, tgt_abort;
> >>>> +	const struct tc_action *a;
> >>>> +	int err;
> >>>> +
> >>>> +	if (tc_no_actions(cls_bpf->exts))
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	tc_for_each_action(a, cls_bpf->exts) {
> >>>> +		if (!is_tcf_gact_shot(a))
> >>>> +			return -EINVAL;
> >>>> +	}
> >>>> +
> >>>> +	if (cls_bpf->exts_integrated)
> >>>> +		return -EINVAL;  
> >>>
> >>> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
> >>> Direct-action is, I would say, the most typical way to run cls_bpf, as it
> >>> allows you to code programs more naturally and efficiently: classification
> >>> and action are already combined in a single program, so there's no overhead
> >>> of an additional linear action chain, and a single program can already have
> >>> multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...). It is therefore
> >>> usually enough to run a single cls_bpf instance, for example on the sch_clsact
> >>> ingress or egress parent, nothing more, to get the job done. I think the
> >>> cls_bpf->exts_integrated test could probably come first, and if it's false,
> >>> we'd need to walk the actions?
> >> 
> >> I think it makes sense to offload da mode only. Doing a tc_for_each_action
> >> walk like the above is ok, but the number of bpf programs with only a
> >> separate gact is vanishingly small and we don't recommend using it anymore.
> >> That's the stuff we used when da wasn't available.
> >>   
> >
> >+1 I've been using da mode only as well.  
> 
> I also think we should support offload for da mode only for cls_bpf

First of all thanks everyone for the reviews and suggestions!

I will definitely do da in the next revision, but I'm not sure we
should do only da.  As far as I can tell there are no statistics when
da mode is used.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 06/12] nfp: add hardware cls_bpf offload
  2016-06-02 12:13           ` Jakub Kicinski
@ 2016-06-02 12:30             ` Daniel Borkmann
  0 siblings, 0 replies; 54+ messages in thread
From: Daniel Borkmann @ 2016-06-02 12:30 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: John Fastabend, Alexei Starovoitov, netdev, ast, dinan.gunawardena

On 06/02/2016 02:13 PM, Jakub Kicinski wrote:
> On Thu, 2 Jun 2016 08:57:48 +0200, Jiri Pirko wrote:
>> Wed, Jun 01, 2016 at 11:36:48PM CEST, john.fastabend@gmail.com wrote:
>>> On 16-06-01 01:52 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jun 01, 2016 at 10:20:54PM +0200, Daniel Borkmann wrote:
>>>>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>>>>> Add hardware cls_bpf offload on our smart NICs.  Detect if
>>>>>> capable firmware is loaded and use it to load the code JITed
>>>>>> with the just-added translator onto the programmable engines.
>>>>>>
>>>>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>>>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>>>>> [...]
>>>>>> +static int
>>>>>> +nfp_net_bpf_offload_prepare(struct nfp_net *nn,
>>>>>> +			    struct tc_cls_bpf_offload *cls_bpf,
>>>>>> +			    struct nfp_bpf_result *res,
>>>>>> +			    void **code, dma_addr_t *dma_addr, u16 max_instr)
>>>>>> +{
>>>>>> +	unsigned int code_sz = max_instr * sizeof(u64);
>>>>>> +	u16 start_off, tgt_out, tgt_abort;
>>>>>> +	const struct tc_action *a;
>>>>>> +	int err;
>>>>>> +
>>>>>> +	if (tc_no_actions(cls_bpf->exts))
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	tc_for_each_action(a, cls_bpf->exts) {
>>>>>> +		if (!is_tcf_gact_shot(a))
>>>>>> +			return -EINVAL;
>>>>>> +	}
>>>>>> +
>>>>>> +	if (cls_bpf->exts_integrated)
>>>>>> +		return -EINVAL;
>>>>>
>>>>> So cls_bpf has two working modes as mentioned: da (direct-action) and non-da.
>>>>> Direct-action is, I would say, the most typical way to run cls_bpf, as it
>>>>> allows you to code programs more naturally and efficiently: classification
>>>>> and action are already combined in a single program, so there's no overhead
>>>>> of an additional linear action chain, and a single program can already have
>>>>> multiple action code outcomes (TC_ACT_OK, TC_ACT_SHOT, ...). It is therefore
>>>>> usually enough to run a single cls_bpf instance, for example on the sch_clsact
>>>>> ingress or egress parent, nothing more, to get the job done. I think the
>>>>> cls_bpf->exts_integrated test could probably come first, and if it's false,
>>>>> we'd need to walk the actions?
>>>>
>>>> I think it makes sense to offload da mode only. Doing a tc_for_each_action
>>>> walk like the above is ok, but the number of bpf programs with only a
>>>> separate gact is vanishingly small and we don't recommend using it anymore.
>>>> That's the stuff we used when da wasn't available.
>>>
>>> +1 I've been using da mode only as well.
>>
>> I also think we should support offload for da mode only for cls_bpf
>
> First of all thanks everyone for the reviews and suggestions!
>
> I will definitely do da in the next revision, but I'm not sure we
> should do only da.  As far as I can tell there are no statistics when
> da mode is used.

Well, you still have the (qdisc) drop counter, but other than that, you can
implement your own stats via BPF maps for whatever event the user programs it
to count stats for. But I presume that would be a later step, if you can in
fact support that.

But at the same time there should also be nothing that prevents you from adding
a new netlink attribute to the cls_bpf dump part that could export generic hw
stats related to the offloaded program (like passes, drops, etc.) to tc, and if
it's present and da is enabled, tc dumps them too?
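
Something along these lines, perhaps (TCA_BPF_HW_STATS and the struct
are made-up names here, purely a sketch):

	struct tc_bpf_hw_stats {
		__u64 packets;	/* packets seen by the offloaded program */
		__u64 drops;	/* TC_ACT_SHOT taken in HW */
	};

	/* in cls_bpf_dump(), if the filter is offloaded: */
	if (nla_put(skb, TCA_BPF_HW_STATS, sizeof(hw), &hw))
		goto nla_put_failure;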

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [RFC 05/12] nfp: add BPF to NFP code translator
  2016-06-01 20:15     ` Alexei Starovoitov
  2016-06-01 21:23       ` Jakub Kicinski
@ 2016-06-02 16:21       ` John Fastabend
  1 sibling, 0 replies; 54+ messages in thread
From: John Fastabend @ 2016-06-02 16:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Jakub Kicinski, netdev, ast, dinan.gunawardena

On 16-06-01 01:15 PM, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:03:04PM +0200, Daniel Borkmann wrote:
>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>> Add translator for JITing eBPF to operations which
>>> can be executed on NFP's programmable engines.
>>>
>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Reviewed-by: Dinan Gunawardena <dgunawardena@netronome.com>
>>> Reviewed-by: Simon Horman <simon.horman@netronome.com>
>> [...]
>>> +int
>>> +nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
>>> +	    unsigned int tgt_out, unsigned int tgt_abort,
>>> +	    unsigned int prog_sz, struct nfp_bpf_result *res)
>>> +{
>>> +	struct nfp_prog *nfp_prog;
>>> +	int ret;
>>> +
>>> +	/* TODO: maybe make this dependent on bpf_jit_enable? */
>>
>> Probably makes sense to leave it independent from this.
>>
>> Maybe that would rather be an ethtool flag/setting?
> 
> Agree that it should be independent of bpf_jit_enable,
> since that's a very different JIT. The whole point of hw offload
> is that bpf is translated into something the hw understands natively.
> Gating it by a sysctl or another flag doesn't make much sense to me.
> In this case the user will say 'do offload tc+cls_bpf into a nic'
> and the nic should either do it or not. No need for an ethtool flag either.
> One can argue that bpf_jit_enable=2 was useful for debugging
> of the JIT itself, but it looks like it was only used by jit developers
> like us, and we would be fine with a temp printk while debugging.
> At least there was never a case where the jit had a bug and we would
> ask a person reporting a bug to send us back jit_enable=2 output.
> We cannot remove it now, but I wouldn't simply copy the behavior here.
> So I'm suggesting not to use bpf_jit_enable, either 1 or 2, at all.
> 

In the default case (no flags to the tc command) the tc filter tries
to load itself into the hardware. The ethtool flag is there to
enable/disable this default behavior. The alternative to the default
load-into-hardware behavior is to specify explicitly from userspace
that you want offload, using the 'do offload tc+cls_bpf' approach you
note. This was the default behavior folks wanted at the netdev
conference, so I added it, even though for many of my use cases users
specify explicitly whether they want offload or not.
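
(For reference, the knob meant here is the hw-tc-offload netdev
feature, i.e. 'ethtool -K <dev> hw-tc-offload on|off', backed by
NETIF_F_HW_TC -- which is also what patch 6 checks before refusing to
disable the feature while a filter is loaded.)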

Thanks,
John

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread

Thread overview: 54+ messages
2016-06-01 16:50 [RFC 00/12] BPF hardware offload via cls_bpf Jakub Kicinski
2016-06-01 16:50 ` [RFC 01/12] add basic register-field manipulation macros Jakub Kicinski
2016-06-01 20:15   ` Hannes Frederic Sowa
2016-06-01 23:08     ` Jakub Kicinski
2016-06-02 12:01       ` Hannes Frederic Sowa
2016-06-01 16:50 ` [RFC 02/12] net: cls_bpf: add hardware offload Jakub Kicinski
2016-06-01 17:13   ` John Fastabend
2016-06-01 20:59     ` Jakub Kicinski
2016-06-01 19:34   ` Daniel Borkmann
2016-06-02  7:17   ` Jiri Pirko
2016-06-02 12:07     ` Jakub Kicinski
2016-06-01 16:50 ` [RFC 03/12] net: cls_bpf: limit hardware offload by software-only flag Jakub Kicinski
2016-06-01 17:16   ` John Fastabend
2016-06-01 17:16   ` John Fastabend
2016-06-01 19:40   ` Daniel Borkmann
2016-06-01 21:05     ` Jakub Kicinski
2016-06-01 21:21       ` Daniel Borkmann
2016-06-01 21:26         ` Jakub Kicinski
2016-06-01 21:31           ` Daniel Borkmann
2016-06-02  7:24   ` Jiri Pirko
2016-06-01 16:50 ` [RFC 04/12] net: cls_bpf: add support for marking filters as hardware-only Jakub Kicinski
2016-06-01 17:19   ` John Fastabend
2016-06-01 19:57   ` Daniel Borkmann
2016-06-01 16:50 ` [RFC 05/12] nfp: add BPF to NFP code translator Jakub Kicinski
2016-06-01 20:03   ` Daniel Borkmann
2016-06-01 20:09     ` John Fastabend
2016-06-01 20:15     ` Alexei Starovoitov
2016-06-01 21:23       ` Jakub Kicinski
2016-06-02 16:21       ` John Fastabend
2016-06-01 16:50 ` [RFC 06/12] nfp: add hardware cls_bpf offload Jakub Kicinski
2016-06-01 20:20   ` Daniel Borkmann
2016-06-01 20:52     ` Alexei Starovoitov
2016-06-01 21:15       ` Jakub Kicinski
2016-06-01 21:51         ` Alexei Starovoitov
2016-06-01 21:16       ` Daniel Borkmann
2016-06-01 21:36       ` John Fastabend
2016-06-02  6:57         ` Jiri Pirko
2016-06-02 12:13           ` Jakub Kicinski
2016-06-02 12:30             ` Daniel Borkmann
2016-06-01 23:03   ` Daniel Borkmann
2016-06-01 16:50 ` [RFC 07/12] nfp: add skb mark support to the bpf offload Jakub Kicinski
2016-06-01 21:56   ` Alexei Starovoitov
2016-06-01 22:19     ` Jakub Kicinski
2016-06-01 22:30       ` Daniel Borkmann
2016-06-01 23:01         ` Jakub Kicinski
2016-06-01 16:50 ` [RFC 08/12] net: cls_bpf: allow offloaded filters to update stats Jakub Kicinski
2016-06-01 17:20   ` John Fastabend
2016-06-01 22:09   ` Daniel Borkmann
2016-06-01 16:50 ` [RFC 09/12] nfp: report statistics of offloaded filters Jakub Kicinski
2016-06-01 16:50 ` [RFC 10/12] nfp: bpf: optimize register init Jakub Kicinski
2016-06-01 16:50 ` [RFC 11/12] nfp: bpf: add register rename Jakub Kicinski
2016-06-01 16:50 ` [RFC 12/12] nfp: bpf: add denser mode of execution Jakub Kicinski
2016-06-01 22:01   ` Alexei Starovoitov
2016-06-01 22:47     ` Jakub Kicinski
