All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers
@ 2017-02-15  8:52 Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping Or Gerlitz
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Added two new flags, "in hw" and "not in hw" such that user space 
can determine if a filter is actually offloaded to hw. The "in hw" 
UAPI semantics was chosen so it's similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

As an example, add one vlan push + fwd rule, one matchall rule and one u32 rule 
without any flags, and another vlan + fwd skip_sw rule, such that the different TC 
classifier attempt to offload all of them -- all over mlx5 SRIOV VF rep:

# tc filter add dev eth2_0 protocol ip parent ffff: 
	flower skip_sw indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac e4:1d:2d:a5:f3:9d 
	action vlan push id 52 action mirred egress redirect dev eth2

# tc filter add dev eth2_0 protocol ip parent ffff: 
	flower indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:51 
	action vlan push id 53 action mirred egress redirect dev eth2

# tc filter add dev eth2_0 parent ffff: matchall action mirred egress mirror dev veth1

# tc filter add dev eth2_0 parent ffff: protocol ip prio 99 handle 800:0:1 
	u32 ht 800: flowid 800:1 match ip src 192.168.1.0/24 action drop

Since that VF rep doesn't offload matchall/u32 and can currently offload
only one vlan push rule we expect three of the rules not to be offloaded:

# tc filter show dev eth2_0 parent ffff:

filter protocol ip pref 99 u32 
filter protocol ip pref 99 u32 fh 800: ht divisor 1 
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 flowid 800:1 not in_hw 
  match c0a80100/ffffff00 at 12
	action order 1: gact action drop
	 random type none pass val 0
	 index 8 ref 1 bind 1
 
filter protocol all pref 49150 matchall 
filter protocol all pref 49150 matchall handle 0x1 
  not in_hw
	action order 1: mirred (Egress Mirror to device veth1) pipe
 	index 27 ref 1 bind 1
 
filter protocol ip pref 49151 flower 
filter protocol ip pref 49151 flower handle 0x1 
  indev eth2_0
  dst_mac e4:11:22:33:44:51
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  not in_hw
	action order 1:  vlan push id 53 protocol 802.1Q priority 0 pipe
	 index 20 ref 1 bind 1
 
	action order 2: mirred (Egress Redirect to device eth2) stolen
 	index 26 ref 1 bind 1
 
filter protocol ip pref 49152 flower 
filter protocol ip pref 49152 flower handle 0x1 
  indev eth2_0
  dst_mac e4:1d:2d:a5:f3:9d
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  skip_sw
  in_hw
	action order 1:  vlan push id 52 protocol 802.1Q priority 0 pipe
	 index 19 ref 1 bind 1
 
	action order 2: mirred (Egress Redirect to device eth2) stolen
 	index 25 ref 1 bind 1

v2 --> v3 changes:

 - fixed the matchall dump flags patch to do proper checks (Jakub)
 - added the same proper checks to flower where they were missing 
 - that flower patch was added as #1 and hence all the other patches are offed-by-one
 
v1 --> v2 changes:
 - applied feedback from Jakub and Dave -- where none of the skip flags were set, 
   the suggested approach didn't allow user space to distringuish between old kernel
   to a case when offloading to HW worked fine.

Or Gerlitz (6):
  net/sched: cls_matchall: Dump skip flags
  net/sched: Reflect HW offload status
  net/sched: cls_flower: Reflect HW offload status
  net/sched: cls_matchall: Reflect HW offloading status
  net/sched: cls_u32: Reflect HW offload status
  net/sched: cls_bpf: Reflect HW offload status

 include/net/pkt_cls.h        |  5 +++++
 include/uapi/linux/pkt_cls.h |  6 ++++--
 net/sched/cls_bpf.c          | 13 +++++++++++--
 net/sched/cls_flower.c       |  5 +++++
 net/sched/cls_matchall.c     | 14 ++++++++++++--
 net/sched/cls_u32.c          | 10 ++++++++++
 6 files changed, 47 insertions(+), 6 deletions(-)

-- 
2.3.7

*** BLURB HERE ***

Or Gerlitz (7):
  net/sched: cls_flower: Properly handle classifier flags dumping
  net/sched: cls_matchall: Dump the classifier flags
  net/sched: Reflect HW offload status
  net/sched: cls_flower: Reflect HW offload status
  net/sched: cls_matchall: Reflect HW offloading status
  net/sched: cls_u32: Reflect HW offload status
  net/sched: cls_bpf: Reflect HW offload status

 include/net/pkt_cls.h        |  5 +++++
 include/uapi/linux/pkt_cls.h |  6 ++++--
 net/sched/cls_bpf.c          | 13 +++++++++++--
 net/sched/cls_flower.c       |  8 +++++++-
 net/sched/cls_matchall.c     | 15 +++++++++++++--
 net/sched/cls_u32.c          | 10 ++++++++++
 6 files changed, 50 insertions(+), 7 deletions(-)

-- 
2.3.7

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 2/7] net/sched: cls_matchall: Dump the classifier flags Or Gerlitz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

Dump the classifier flags only if non zero and make sure to check
the return status of the handler that puts them into the netlink msg.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 net/sched/cls_flower.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0826c8e..850d982 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1229,7 +1229,8 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
 		goto nla_put_failure;
 
-	nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
+	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
+		goto nla_put_failure;
 
 	if (tcf_exts_dump(skb, &f->exts))
 		goto nla_put_failure;
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 2/7] net/sched: cls_matchall: Dump the classifier flags
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 3/7] net/sched: Reflect HW offload status Or Gerlitz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

The classifier flags are not dumped to user-space, do that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
---
 net/sched/cls_matchall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index f2141cb..35ef1c1 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -244,6 +244,9 @@ static int mall_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 	    nla_put_u32(skb, TCA_MATCHALL_CLASSID, head->res.classid))
 		goto nla_put_failure;
 
+	if (head->flags && nla_put_u32(skb, TCA_MATCHALL_FLAGS, head->flags))
+		goto nla_put_failure;
+
 	if (tcf_exts_dump(skb, &head->exts))
 		goto nla_put_failure;
 
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 2/7] net/sched: cls_matchall: Dump the classifier flags Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15 16:19   ` Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 4/7] net/sched: cls_flower: " Or Gerlitz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If none of these two flags are set, this signals running
over older kernel.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h        | 5 +++++
 include/uapi/linux/pkt_cls.h | 6 ++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 71b266c..15cfe15 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -475,6 +475,11 @@ static inline bool tc_flags_valid(u32 flags)
 	return true;
 }
 
+static inline bool tc_in_hw(u32 flags)
+{
+	return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
+}
+
 enum tc_fl_command {
 	TC_CLSFLOWER_REPLACE,
 	TC_CLSFLOWER_DESTROY,
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 345551e..7a69f2a 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -103,8 +103,10 @@ enum {
 #define TCA_POLICE_MAX (__TCA_POLICE_MAX - 1)
 
 /* tca flags definitions */
-#define TCA_CLS_FLAGS_SKIP_HW	(1 << 0)
-#define TCA_CLS_FLAGS_SKIP_SW	(1 << 1)
+#define TCA_CLS_FLAGS_SKIP_HW	(1 << 0) /* don't offload filter to HW */
+#define TCA_CLS_FLAGS_SKIP_SW	(1 << 1) /* don't use filter in SW */
+#define TCA_CLS_FLAGS_IN_HW	(1 << 2) /* filter is offloaded to HW */
+#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3) /* filter isn't offloaded to HW */
 
 /* U32 filters */
 
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 4/7] net/sched: cls_flower: Reflect HW offload status
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
                   ` (2 preceding siblings ...)
  2017-02-15  8:52 ` [PATCH net-next V3 3/7] net/sched: Reflect HW offload status Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status Or Gerlitz
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

Flower support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 850d982..9270c5b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -273,6 +273,8 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 
 	err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
 					    tc);
+	if (!err)
+		f->flags |= TCA_CLS_FLAGS_IN_HW;
 
 	if (tc_skip_sw(f->flags))
 		return err;
@@ -912,6 +914,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 			goto errout;
 	}
 
+	if (!(tc_in_hw(fnew->flags)))
+		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
 	if (fold) {
 		if (!tc_skip_sw(fold->flags))
 			rhashtable_remove_fast(&head->ht, &fold->ht_node,
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
                   ` (3 preceding siblings ...)
  2017-02-15  8:52 ` [PATCH net-next V3 4/7] net/sched: cls_flower: " Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15 18:15   ` David Miller
  2017-02-15  8:52 ` [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status Or Gerlitz
  2017-02-15  8:52 ` [PATCH net-next V3 7/7] net/sched: cls_bpf: " Or Gerlitz
  6 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

Matchall support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_matchall.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 35ef1c1..14de5b7 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -56,6 +56,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	struct net_device *dev = tp->q->dev_queue->dev;
 	struct tc_to_netdev offload;
 	struct tc_cls_matchall_offload mall_offload = {0};
+	int err;
 
 	offload.type = TC_SETUP_MATCHALL;
 	offload.cls_mall = &mall_offload;
@@ -63,8 +64,12 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	offload.cls_mall->exts = &head->exts;
 	offload.cls_mall->cookie = cookie;
 
-	return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
-					     &offload);
+	err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
+					    &offload);
+	if (!err)
+		head->flags |= TCA_CLS_FLAGS_IN_HW;
+
+	return err;
 }
 
 static void mall_destroy_hw_filter(struct tcf_proto *tp,
@@ -194,6 +199,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
 		}
 	}
 
+	if (!(tc_in_hw(new->flags)))
+		new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
 	*arg = (unsigned long) head;
 	rcu_assign_pointer(tp->root, new);
 	if (head)
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
                   ` (4 preceding siblings ...)
  2017-02-15  8:52 ` [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15 18:16   ` David Miller
  2017-02-15  8:52 ` [PATCH net-next V3 7/7] net/sched: cls_bpf: " Or Gerlitz
  6 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

U32 support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
---
 net/sched/cls_u32.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index a6ec3e4b..8c6cc39 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -523,6 +523,10 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 
 	err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
 					    tp->protocol, &offload);
+
+	if (!err)
+		n->flags |= TCA_CLS_FLAGS_IN_HW;
+
 	if (tc_skip_sw(flags))
 		return err;
 
@@ -895,6 +899,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 			return err;
 		}
 
+		if (!(tc_in_hw(new->flags)))
+			new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
 		u32_replace_knode(tp, tp_c, new);
 		tcf_unbind_filter(tp, &n->res);
 		call_rcu(&n->rcu, u32_delete_key_rcu);
@@ -1014,6 +1021,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		if (err)
 			goto errhw;
 
+		if (!(tc_in_hw(n->flags)))
+			n->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
 		ins = &ht->ht[TC_U32_HASH(handle)];
 		for (pins = rtnl_dereference(*ins); pins;
 		     ins = &pins->next, pins = rtnl_dereference(*ins))
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next V3 7/7] net/sched: cls_bpf: Reflect HW offload status
  2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
                   ` (5 preceding siblings ...)
  2017-02-15  8:52 ` [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status Or Gerlitz
@ 2017-02-15  8:52 ` Or Gerlitz
  2017-02-15 18:16   ` David Miller
  6 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15  8:52 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jakub Kicinski, John Fastabend, Jamal Hadi Salim, Roi Dayan,
	Jiri Pirko, netdev, Or Gerlitz

BPF classifier support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 net/sched/cls_bpf.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index d9c9701..61a5b33 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -148,6 +148,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	struct net_device *dev = tp->q->dev_queue->dev;
 	struct tc_cls_bpf_offload bpf_offload = {};
 	struct tc_to_netdev offload;
+	int err;
 
 	offload.type = TC_SETUP_CLSBPF;
 	offload.cls_bpf = &bpf_offload;
@@ -159,8 +160,13 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	bpf_offload.exts_integrated = prog->exts_integrated;
 	bpf_offload.gen_flags = prog->gen_flags;
 
-	return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
-					     tp->protocol, &offload);
+	err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					    tp->protocol, &offload);
+
+	if (!err && (cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE))
+		prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+
+	return err;
 }
 
 static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
@@ -511,6 +517,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		return ret;
 	}
 
+	if (!(tc_in_hw(prog->gen_flags)))
+		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
 	if (oldprog) {
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
-- 
2.3.7

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15  8:52 ` [PATCH net-next V3 3/7] net/sched: Reflect HW offload status Or Gerlitz
@ 2017-02-15 16:19   ` Or Gerlitz
  2017-02-15 18:28     ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15 16:19 UTC (permalink / raw)
  To: David Miller, Jamal Hadi Salim, Jiri Pirko
  Cc: Jakub Kicinski, John Fastabend, Roi Dayan, Linux Netdev List,
	Hadar Hen Zion, Amir Vadai, Ido Schimmel

On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using "both" policy (where none
> of skip_sw or skip_hw flags are set by user-space).

> Add two new flags, "in hw" and "not in hw" such that user
> space can determine if a filter is actually offloaded to
> hw or not. The "in hw" UAPI semantics was chosen so it's
> similar to the "skip hw" flag logic.

To make things a bit more clear, the semantics of the "in hw"
thing relates to the time of dumping the rule.

Currently in all of the offloading drivers/cases, when the driver returns
success for the ndo_setup_tc call, the flow is offloaded to hw.

But moving fwd that might change, a flow might be not offloaded to HW
on some window of time.

The coming up example, is support for neigh updates w.r.t to IP tunnel
encapsulation offloads in mlx5 SRIOV switchdev mode.

Today we offload tunnel encap flow only if the kernel has valid neigh
to the tunnel destination. Under the works is a code to offload/un-offload
the flow to/from HW when the neigh becomes valid/invalid or goes through
hardware address change etc.

So what I basically suggest here is to enhance that future mlx5 series
with patches
under which the dump code of all the classifiers will invoke the
tc_setup_ndo with
a fourth sub command (today there are add/del/stats) which will return
the actual
"in hw" status.

This is aligned with the general architecture/approach in the kernel
for switchdev and other
offloads.

Note that this future change doesn't change the UAPI, it will still
have two values, "in hw"
and "not in hw". The values with this series are the actual values and
later with that change,
they will keep being the actual values, just the kernel method to
retrieve them will be different.


Or.

> If none of these two flags are set, this signals running
> over older kernel.

> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> Reviewed-by: Amir Vadai <amir@vadai.me>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/pkt_cls.h        | 5 +++++
>  include/uapi/linux/pkt_cls.h | 6 ++++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 71b266c..15cfe15 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -475,6 +475,11 @@ static inline bool tc_flags_valid(u32 flags)
>         return true;
>  }
>
> +static inline bool tc_in_hw(u32 flags)
> +{
> +       return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
> +}
> +
>  enum tc_fl_command {
>         TC_CLSFLOWER_REPLACE,
>         TC_CLSFLOWER_DESTROY,
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 345551e..7a69f2a 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -103,8 +103,10 @@ enum {
>  #define TCA_POLICE_MAX (__TCA_POLICE_MAX - 1)
>
>  /* tca flags definitions */
> -#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0)
> -#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1)
> +#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0) /* don't offload filter to HW */
> +#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1) /* don't use filter in SW */
> +#define TCA_CLS_FLAGS_IN_HW    (1 << 2) /* filter is offloaded to HW */
> +#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3) /* filter isn't offloaded to HW */

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status
  2017-02-15  8:52 ` [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status Or Gerlitz
@ 2017-02-15 18:15   ` David Miller
  2017-02-15 21:40     ` Or Gerlitz
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2017-02-15 18:15 UTC (permalink / raw)
  To: ogerlitz; +Cc: jakub.kicinski, john.r.fastabend, jhs, roid, jiri, netdev

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 15 Feb 2017 10:52:35 +0200

> @@ -194,6 +199,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
>  		}
>  	}
>  
> +	if (!(tc_in_hw(new->flags)))
> +		new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;

Too many parenthesis, please make this:

	if (!tc_in_hw(new->flags))
		new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status
  2017-02-15  8:52 ` [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status Or Gerlitz
@ 2017-02-15 18:16   ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2017-02-15 18:16 UTC (permalink / raw)
  To: ogerlitz; +Cc: jakub.kicinski, john.r.fastabend, jhs, roid, jiri, netdev

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 15 Feb 2017 10:52:36 +0200

> @@ -895,6 +899,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>  			return err;
>  		}
>  
> +		if (!(tc_in_hw(new->flags)))

Less parenthesis, please.

> @@ -1014,6 +1021,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>  		if (err)
>  			goto errhw;
>  
> +		if (!(tc_in_hw(n->flags)))

Likewise.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 7/7] net/sched: cls_bpf: Reflect HW offload status
  2017-02-15  8:52 ` [PATCH net-next V3 7/7] net/sched: cls_bpf: " Or Gerlitz
@ 2017-02-15 18:16   ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2017-02-15 18:16 UTC (permalink / raw)
  To: ogerlitz; +Cc: jakub.kicinski, john.r.fastabend, jhs, roid, jiri, netdev

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 15 Feb 2017 10:52:37 +0200

> @@ -511,6 +517,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>  		return ret;
>  	}
>  
> +	if (!(tc_in_hw(prog->gen_flags)))

Again, less parenthesis.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15 16:19   ` Or Gerlitz
@ 2017-02-15 18:28     ` Jakub Kicinski
  2017-02-15 21:56       ` Or Gerlitz
  2017-02-16  8:17       ` Ido Schimmel
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2017-02-15 18:28 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, Jamal Hadi Salim, Jiri Pirko, John Fastabend,
	Roi Dayan, Linux Netdev List, Hadar Hen Zion, Amir Vadai,
	Ido Schimmel

On Wed, 15 Feb 2017 18:19:17 +0200, Or Gerlitz wrote:
> On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> > Currently there is no way of querying whether a filter is
> > offloaded to HW or not when using "both" policy (where none
> > of skip_sw or skip_hw flags are set by user-space).  
> 
> > Add two new flags, "in hw" and "not in hw" such that user
> > space can determine if a filter is actually offloaded to
> > hw or not. The "in hw" UAPI semantics was chosen so it's
> > similar to the "skip hw" flag logic.  
> 
> To make things a bit more clear, the semantics of the "in hw"
> thing relates to the time of dumping the rule.
> 
> Currently in all of the offloading drivers/cases, when the driver returns
> success for the ndo_setup_tc call, the flow is offloaded to hw.
> 
> But moving fwd that might change, a flow might be not offloaded to HW
> on some window of time.
> 
> The coming up example, is support for neigh updates w.r.t to IP tunnel
> encapsulation offloads in mlx5 SRIOV switchdev mode.
> 
> Today we offload tunnel encap flow only if the kernel has valid neigh
> to the tunnel destination. Under the works is a code to offload/un-offload
> the flow to/from HW when the neigh becomes valid/invalid or goes through
> hardware address change etc.
> 
> So what I basically suggest here is to enhance that future mlx5 series
> with patches
> under which the dump code of all the classifiers will invoke the
> tc_setup_ndo with
> a fourth sub command (today there are add/del/stats) which will return
> the actual
> "in hw" status.
> 
> This is aligned with the general architecture/approach in the kernel
> for switchdev and other
> offloads.
> 
> Note that this future change doesn't change the UAPI, it will still
> have two values, "in hw"
> and "not in hw". The values with this series are the actual values and
> later with that change,
> they will keep being the actual values, just the kernel method to
> retrieve them will be different.

Thanks for explaining this, I was actually hoping to take this in
slightly different direction.

What worries me is that the moment we started offloading packet
modification we run at the risk of modifying packets twice.  This used
to be a problem only for eBPF but now mlx5 can also offload things like
ttl decrement.  For filters which have no skip_* flags set and get
offloaded if packet doesn't get redirected or modified significantly it
will match the filter both in HW and on the host and therefore have the
actions applied twice.  And it will get counted twice.  (It seems nobody
ever raised this so perhaps I'm mistaken in thinking that this can
happen?)

Back to your patch set, I was hoping we will be able to use the new
IN_HW flag to skip filters in software even if they don't have skip_sw
set.  If we need to eject actions from HW based on external events, that
obviously complicates things.  Three trivial solutions to the problem
I could think of are:
 - pass a pointer to filter's flags down to the driver and let it do
   atomic bitops? on it to set/clear the IN_HW status;
 - add a way of driver calling back into TC;
 - use one of recently freed skb TC bits to mark packets which were
   supposed to be processed in HW by could not as needing software
   fallback (I think this could work for you without parsing the packet
   in the driver, you could replace the tunnel action with mark action
   and leave the matching rule in HW classifier/TCAM; for BPF I have a
   descriptor flag telling me if offloaded BPF completed successfully).

FWIW this patch set looks good to me!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status
  2017-02-15 18:15   ` David Miller
@ 2017-02-15 21:40     ` Or Gerlitz
  0 siblings, 0 replies; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15 21:40 UTC (permalink / raw)
  To: David Miller
  Cc: Or Gerlitz, Jakub Kicinski, John Fastabend, Jamal Hadi Salim,
	Roi Dayan, Jiri Pirko, Linux Netdev List

On Wed, Feb 15, 2017 at 8:15 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <ogerlitz@mellanox.com>
> Date: Wed, 15 Feb 2017 10:52:35 +0200
>
>> @@ -194,6 +199,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
>>               }
>>       }
>>
>> +     if (!(tc_in_hw(new->flags)))
>> +             new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>
> Too many parenthesis, please make this:
>
>         if (!tc_in_hw(new->flags))
>                 new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;

OKay, Dave, I will fix that along with the other places

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15 18:28     ` Jakub Kicinski
@ 2017-02-15 21:56       ` Or Gerlitz
  2017-02-15 23:07         ` Jakub Kicinski
  2017-02-16  8:17       ` Ido Schimmel
  1 sibling, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2017-02-15 21:56 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, Jamal Hadi Salim, Jiri Pirko, John Fastabend,
	Roi Dayan, Linux Netdev List, Hadar Hen Zion, Amir Vadai,
	Ido Schimmel

On Wed, Feb 15, 2017 at 8:28 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Wed, 15 Feb 2017 18:19:17 +0200, Or Gerlitz wrote:
>> On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>>> Currently there is no way of querying whether a filter is
>>> offloaded to HW or not when using "both" policy (where none
>>> of skip_sw or skip_hw flags are set by user-space).

>>> Add two new flags, "in hw" and "not in hw" such that user
>>> space can determine if a filter is actually offloaded to
>>> hw or not. The "in hw" UAPI semantics was chosen so it's
>>> similar to the "skip hw" flag logic.

>> To make things a bit more clear, the semantics of the "in hw"
>> thing relates to the time of dumping the rule.

>> Note that this future change doesn't change the UAPI, it will still
>> have two values, "in hw" and "not in hw".
>> The values with this series are the actual values and later with that change,
>> they will keep being the actual values, just the kernel method to
>> retrieve them will be different.

> Thanks for explaining this, I was actually hoping to take this in
> slightly different direction.

> What worries me is that the moment we started offloading packet
> modification we run at the risk of modifying packets twice.  This used
> to be a problem only for eBPF but now mlx5 can also offload things like
> ttl decrement.

hey, did you broke into my private git... joking, we submitted the tc
pedit enhancements for easy enablement of hw offloading but not yet
the offloading bits... but this is indeed coming up.

When the HW terminates packets that were matched by an offloaded
filter - e.g drop, encap/decap and/or push/pop vlan + fwd to wire/VM,
the SW TC DP (data path) will not see them and the problem doesn't
exist. In the case of non SRIOV NIC header re-write on RX, indeed, the
packet will be seen by the SW DP and if the SW control plane
programmed the rule to the SW DP too they could have a problem.

Re your TTL example, if we're offloading router, with matching on L2
src/dst addresses following by HW re-write of the L2 addresses and
decrements of TTL, the SW DP will not match and the action will not be
carried out twice.

Same for offloading NAT, if the matching is on L3/L4 addresses and
then the HW action is to re-write them, the SW DP will not match and
not re-write again.

So this is not very common to happen, but possible -- user space
application can avoid that by having such filters be set to only one
of the DPs, SW or HW.

> For filters which have no skip_* flags set and get
> offloaded if packet doesn't get redirected or modified significantly it
> will match the filter both in HW and on the host and therefore have the
> actions applied twice.  And it will get counted twice.  (It seems nobody
> ever raised this so perhaps I'm mistaken in thinking that this can happen?)

see above this is very rare and donno if there are real life examples
for such cases, but if it happens, yes, will be counted twice.

> Back to your patch set, I was hoping we will be able to use the new
> IN_HW flag to skip filters in software even if they don't have skip_sw
> set.  If we need to eject actions from HW based on external events, that
> obviously complicates things.  Three trivial solutions to the problem

can you clarify what is the precise problem you refer to? is that the
two DPs working on the same packet you mentioned above?

> I could think of are:
>  - pass a pointer to filter's flags down to the driver and let it do
>    atomic bitops? on it to set/clear the IN_HW status;
>  - add a way of driver calling back into TC;
>  - use one of recently freed skb TC bits to mark packets which were
>    supposed to be processed in HW by could not as needing software
>    fallback (I think this could work for you without parsing the packet
>    in the driver, you could replace the tunnel action with mark action
>    and leave the matching rule in HW classifier/TCAM; for BPF I have a
>    descriptor flag telling me if offloaded BPF completed successfully).


> FWIW this patch set looks good to me!

thanks, so you are okay we move forward with this series and if needed
address the matters you raised here in follow ups?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15 21:56       ` Or Gerlitz
@ 2017-02-15 23:07         ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2017-02-15 23:07 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David Miller, Jamal Hadi Salim, Jiri Pirko, John Fastabend,
	Roi Dayan, Linux Netdev List, Hadar Hen Zion, Amir Vadai,
	Ido Schimmel

On Wed, 15 Feb 2017 23:56:53 +0200, Or Gerlitz wrote:
> On Wed, Feb 15, 2017 at 8:28 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> > On Wed, 15 Feb 2017 18:19:17 +0200, Or Gerlitz wrote:  
> >> On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:  
> >>> Currently there is no way of querying whether a filter is
> >>> offloaded to HW or not when using "both" policy (where none
> >>> of skip_sw or skip_hw flags are set by user-space).  
> 
> >>> Add two new flags, "in hw" and "not in hw" such that user
> >>> space can determine if a filter is actually offloaded to
> >>> hw or not. The "in hw" UAPI semantics was chosen so it's
> >>> similar to the "skip hw" flag logic.  
> 
> >> To make things a bit more clear, the semantics of the "in hw"
> >> thing relates to the time of dumping the rule.  
> 
> >> Note that this future change doesn't change the UAPI, it will still
> >> have two values, "in hw" and "not in hw".
> >> The values with this series are the actual values and later with that change,
> >> they will keep being the actual values, just the kernel method to
> >> retrieve them will be different.  
> 
> > Thanks for explaining this, I was actually hoping to take this in
> > slightly different direction.  
> 
> > What worries me is that the moment we started offloading packet
> > modification we run at the risk of modifying packets twice.  This used
> > to be a problem only for eBPF but now mlx5 can also offload things like
> > ttl decrement.  
> 
> hey, did you broke into my private git... joking, we submitted the tc
> pedit enhancements for easy enablement of hw offloading but not yet
> the offloading bits... but this is indeed coming up.
> 
> When the HW terminates packets that were matched by an offloaded
> filter - e.g drop, encap/decap and/or push/pop vlan + fwd to wire/VM,
> the SW TC DP (data path) will not see them and the problem doesn't
> exist. In the case of non SRIOV NIC header re-write on RX, indeed, the
> packet will be seen by the SW DP and if the SW control plane
> programmed the rule to the SW DP too they could have a problem.
> 
> Re your TTL example, if we're offloading router, with matching on L2
> src/dst addresses following by HW re-write of the L2 addresses and
> decrements of TTL, the SW DP will not match and the action will not be
> carried out twice.
> 
> Same for offloading NAT, if the matching is on L3/L4 addresses and
> then the HW action is to re-write them, the SW DP will not match and
> not re-write again.

Indeed, but also there could be a catch-all rule at the end of ruleset
to drop unmatched packets (firewall+NAT scenario).  I admit, I don't
have great practical examples at the moment.

> So this is not very common to happen, but possible -- user space
> application can avoid that by having such filters be set to only one
> of the DPs, SW or HW.

Yes, I agree this should not commonly happen in simple SR-IOV
forwarding.  But I feel like this is a fundamental problem with the
opportunistic offload model we currently have.  We should guarantee
actions will not be performed twice if the operation of passing packets
through the ruleset is not idempotent.  I think with the flags you are
proposing in your current series we can easily achieve that by skipping
in software DP rules which are in_hw, but we need to make sure that the
in_hw check can be performed efficiently.

> > For filters which have no skip_* flags set and get
> > offloaded if packet doesn't get redirected or modified significantly it
> > will match the filter both in HW and on the host and therefore have the
> > actions applied twice.  And it will get counted twice.  (It seems nobody
> > ever raised this so perhaps I'm mistaken in thinking that this can happen?)  
> 
> see above this is very rare and donno if there are real life examples
> for such cases, but if it happens, yes, will be counted twice.
> 
> > Back to your patch set, I was hoping we will be able to use the new
> > IN_HW flag to skip filters in software even if they don't have skip_sw
> > set.  If we need to eject actions from HW based on external events, that
> > obviously complicates things.  Three trivial solutions to the problem  
> 
> can you clarify what is the precise problem you refer to? is that the
> two DPs working on the same packet you mentioned above?

Yes.

> > I could think of are:
> >  - pass a pointer to filter's flags down to the driver and let it do
> >    atomic bitops? on it to set/clear the IN_HW status;
> >  - add a way of driver calling back into TC;
> >  - use one of recently freed skb TC bits to mark packets which were
> >    supposed to be processed in HW by could not as needing software
> >    fallback (I think this could work for you without parsing the packet
> >    in the driver, you could replace the tunnel action with mark action
> >    and leave the matching rule in HW classifier/TCAM; for BPF I have a
> >    descriptor flag telling me if offloaded BPF completed successfully).  
> 
> 
> > FWIW this patch set looks good to me!  
> 
> thanks, so you are okay we move forward with this series and if needed
> address the matters you raised here in follow ups?

Yes, 100%.  I am only commenting on the idea of extending the in_hw
querying down to the driver in the future.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-15 18:28     ` Jakub Kicinski
  2017-02-15 21:56       ` Or Gerlitz
@ 2017-02-16  8:17       ` Ido Schimmel
  2017-02-17  4:23         ` Jakub Kicinski
  1 sibling, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2017-02-16  8:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Or Gerlitz, David Miller, Jamal Hadi Salim, Jiri Pirko,
	John Fastabend, Roi Dayan, Linux Netdev List, Hadar Hen Zion,
	Amir Vadai, Ido Schimmel

On Wed, Feb 15, 2017 at 10:28:21AM -0800, Jakub Kicinski wrote:
> What worries me is that the moment we started offloading packet
> modification we run at the risk of modifying packets twice.  This used
> to be a problem only for eBPF but now mlx5 can also offload things like
> ttl decrement.  For filters which have no skip_* flags set and get
> offloaded if packet doesn't get redirected or modified significantly it
> will match the filter both in HW and on the host and therefore have the
> actions applied twice.  And it will get counted twice.  (It seems nobody
> ever raised this so perhaps I'm mistaken in thinking that this can
> happen?)

FWIW, we already have that problem with bridge offload. There are some
packets that you can easily forward in hardware, but still wants the
software bridge to receive a copy. IGMP queries for example. These
should be flooded to all ports in the bridge, so we do the forwarding in
hardware, but send a copy to the bridge driver for it to mark the
receiving port as an mrouter port.

To prevent the packet from being flooded twice, we set
'skb->offload_fwd_mark' inside the driver and have the bridge driver
check it during its egress check. It's a bit more involved if you've
several ASICs in the same bridge, but that's the gist. See commit
6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked
devices") for more details.

> Back to your patch set, I was hoping we will be able to use the new
> IN_HW flag to skip filters in software even if they don't have skip_sw
> set.  If we need to eject actions from HW based on external events, that
> obviously complicates things.  Three trivial solutions to the problem
> I could think of are:

[...]

>  - use one of recently freed skb TC bits to mark packets which were
>    supposed to be processed in HW by could not as needing software
>    fallback (I think this could work for you without parsing the packet
>    in the driver, you could replace the tunnel action with mark action
>    and leave the matching rule in HW classifier/TCAM; for BPF I have a
>    descriptor flag telling me if offloaded BPF completed successfully).

This is similar to what I described above. The tricky part is correctly
marking the packet. In mlxsw, for each received packet we get the trap
ID in the DMA descriptor, so we can easily determine whether we should
set 'skb->offload_fwd_mark' or not.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status
  2017-02-16  8:17       ` Ido Schimmel
@ 2017-02-17  4:23         ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2017-02-17  4:23 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Or Gerlitz, David Miller, Jamal Hadi Salim, Jiri Pirko,
	John Fastabend, Roi Dayan, Linux Netdev List, Hadar Hen Zion,
	Amir Vadai, Ido Schimmel

On Thu, 16 Feb 2017 10:17:44 +0200, Ido Schimmel wrote:
> On Wed, Feb 15, 2017 at 10:28:21AM -0800, Jakub Kicinski wrote:
> > What worries me is that the moment we started offloading packet
> > modification we run at the risk of modifying packets twice.  This used
> > to be a problem only for eBPF but now mlx5 can also offload things like
> > ttl decrement.  For filters which have no skip_* flags set and get
> > offloaded if packet doesn't get redirected or modified significantly it
> > will match the filter both in HW and on the host and therefore have the
> > actions applied twice.  And it will get counted twice.  (It seems nobody
> > ever raised this so perhaps I'm mistaken in thinking that this can
> > happen?)  
> 
> FWIW, we already have that problem with bridge offload. There are some
> packets that you can easily forward in hardware, but still wants the
> software bridge to receive a copy. IGMP queries for example. These
> should be flooded to all ports in the bridge, so we do the forwarding in
> hardware, but send a copy to the bridge driver for it to mark the
> receiving port as an mrouter port.
> 
> To prevent the packet from being flooded twice, we set
> 'skb->offload_fwd_mark' inside the driver and have the bridge driver
> check it during its egress check. It's a bit more involved if you've
> several ASICs in the same bridge, but that's the gist. See commit
> 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked
> devices") for more details.

Thanks for mentioning 'skb->offload_fwd_mark'.  It took me a bit to wrap 
my head around it :)
 
> > Back to your patch set, I was hoping we will be able to use the new
> > IN_HW flag to skip filters in software even if they don't have skip_sw
> > set.  If we need to eject actions from HW based on external events, that
> > obviously complicates things.  Three trivial solutions to the problem
> > I could think of are:  
> 
> [...]
> 
> >  - use one of recently freed skb TC bits to mark packets which were
> >    supposed to be processed in HW by could not as needing software
> >    fallback (I think this could work for you without parsing the packet
> >    in the driver, you could replace the tunnel action with mark action
> >    and leave the matching rule in HW classifier/TCAM; for BPF I have a
> >    descriptor flag telling me if offloaded BPF completed successfully).  
> 
> This is similar to what I described above. The tricky part is correctly
> marking the packet. In mlxsw, for each received packet we get the trap
> ID in the DMA descriptor, so we can easily determine whether we should
> set 'skb->offload_fwd_mark' or not.

After trying to compare in bridging offload to TC I'm beginning to
think that skb bit may not actually work in TC case.  If I understand
correctly the bridge forwarding is all or nothing, while there may be
multiple TC rules and single packet may have had some of them applied
but not others*.  And because the TC rules have to be attached directly
to the egress port to get offloaded today we don't have the stacked
device problem your commit solves.  We could try to make drivers mark
packets with id of last executed filter... but that would take a lot of
skb space :S  Luckily AFAIK TC filters still can't be shared across
devices so perhaps making the driver update TC on which filters are in
hardware is not such a bad idea.

* which makes me think about the fact that our TC offloads today don't
  respect ordering of rules in TC.  Ugh.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-02-17  4:24 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-15  8:52 [PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers Or Gerlitz
2017-02-15  8:52 ` [PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping Or Gerlitz
2017-02-15  8:52 ` [PATCH net-next V3 2/7] net/sched: cls_matchall: Dump the classifier flags Or Gerlitz
2017-02-15  8:52 ` [PATCH net-next V3 3/7] net/sched: Reflect HW offload status Or Gerlitz
2017-02-15 16:19   ` Or Gerlitz
2017-02-15 18:28     ` Jakub Kicinski
2017-02-15 21:56       ` Or Gerlitz
2017-02-15 23:07         ` Jakub Kicinski
2017-02-16  8:17       ` Ido Schimmel
2017-02-17  4:23         ` Jakub Kicinski
2017-02-15  8:52 ` [PATCH net-next V3 4/7] net/sched: cls_flower: " Or Gerlitz
2017-02-15  8:52 ` [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status Or Gerlitz
2017-02-15 18:15   ` David Miller
2017-02-15 21:40     ` Or Gerlitz
2017-02-15  8:52 ` [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status Or Gerlitz
2017-02-15 18:16   ` David Miller
2017-02-15  8:52 ` [PATCH net-next V3 7/7] net/sched: cls_bpf: " Or Gerlitz
2017-02-15 18:16   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.