* [PATCH net-next 00/13] Handle multi chain hardware misses
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Hi David/Jakub/Saeed,

TC multi chain configuration can cause offloaded tc chains to miss in
hardware after jumping to some chain. In such cases, software should
continue from the chain that missed in hardware, as the hardware may
already have manipulated the packet and updated some counters.

The first patch enables tc classification to start from a specified chain by
re-using the existing TC_SKB_EXT skb extension.

The next six patches are the Mellanox driver implementation of the miss
path. The driver loads the last processed chain id from a hardware
register (reg_c0, read back via the flow_tag) and stores it in the
TC_SKB_EXT skb extension for continued processing in software.
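
As a rough sketch (illustrative names only; the real implementation is
in the mlx5e rx patches below), the miss-path restore looks like:

    /* Sketch: chain_mapping is the chain-id mapping context created by
     * the driver, cqe_flow_tag is the tag read back from the CQE.
     */
    static void restore_tc_chain(struct sk_buff *skb,
                                 struct mapping_ctx *chain_mapping,
                                 u32 cqe_flow_tag)
    {
            struct tc_skb_ext *ext;
            u32 chain = 0;

            /* reverse the reg_c0 tag back to the 32bit chain id */
            if (mapping_find(chain_mapping, cqe_flow_tag, &chain) || !chain)
                    return;

            ext = skb_ext_add(skb, TC_SKB_EXT);
            if (!ext)
                    return;

            ext->chain = chain;
    }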

The final six patches introduce the Mellanox driver implementation for
restoring tunnel metadata when the packet was decapsulated on the first
chain hop. Early decapsulation creates two issues:
1. The outer headers are not available in later chains.
2. If the hardware misses on a later chain, the packet reaches software
   without the tunnel header, so software matches on the tunnel info
   will fail.

Address these issues by mapping a unique id to each tunnel info. The id
is stored in a hardware register (reg_c1) when the packet is
decapsulated. On miss, the id is used to restore the tunnel info
metadata on the skb.
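
Roughly, using the mapping interface added in patch 2 (a sketch;
tunnel_mapping, tun_id and the reg_c1 rewrite step are illustrative):

    /* decap setup: map the parsed tunnel info to a unique id */
    err = mapping_add(tunnel_mapping, &tun_info, &tun_id);
    if (err)
            return err;
    /* ... write tun_id into reg_c1 as part of the decap rule ... */

    /* miss path: reverse the id back to the tunnel info */
    err = mapping_find(tunnel_mapping, tun_id, &tun_info);
    if (err)
            return;
    /* ... rebuild a metadata dst from tun_info and attach it to
     * the skb so software tunnel matches can succeed ...
     */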

Note that miss path handling of multi-chain rules is required
infrastructure for connection tracking hardware offload. The connection
tracking offload series will follow this one.

Paul Blakey (12):
  net/mlx5: Add new driver lib for mapping unique ids to data
  net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
  net/mlx5: E-Switch, Get reg_c0 value on CQE
  net/mlx5: E-Switch, Mark miss packets with new chain id mapping
  net/mlx5e: Rx, Split rep rx mpwqe handler from nic
  net/mlx5: E-Switch, Restore chain id on miss
  net/mlx5e: Allow re-allocating mod header actions
  net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module
  net/mlx5e: Disallow inserting vxlan/vlan egress rules without
    decap/pop
  net/mlx5e: Support inner header rewrite with goto action
  net/mlx5: E-Switch, Get reg_c1 value on miss
  net/mlx5e: Restore tunnel metadata on miss

Vlad Buslov (1):
  net: sched: support skb chain ext in tc classification path

 drivers/infiniband/hw/mlx5/main.c                  |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    | 112 ++-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.h    |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |   7 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  66 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 817 ++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  45 ++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  16 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 253 ++++++-
 .../mellanox/mlx5/core/eswitch_offloads_chains.c   | 130 +++-
 .../mellanox/mlx5/core/eswitch_offloads_chains.h   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   4 +-
 .../net/ethernet/mellanox/mlx5/core/lib/mapping.c  | 362 +++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/mapping.h  |  31 +
 include/linux/mlx5/eswitch.h                       |  11 +-
 include/net/pkt_cls.h                              |  17 +-
 include/net/sch_generic.h                          |   6 +-
 net/core/dev.c                                     |   6 +-
 net/sched/cls_api.c                                |  64 +-
 net/sched/sch_atm.c                                |   2 +-
 net/sched/sch_cake.c                               |   2 +-
 net/sched/sch_cbq.c                                |   2 +-
 net/sched/sch_drr.c                                |   2 +-
 net/sched/sch_dsmark.c                             |   2 +-
 net/sched/sch_fq_codel.c                           |   2 +-
 net/sched/sch_generic.c                            |   3 +-
 net/sched/sch_hfsc.c                               |   3 +-
 net/sched/sch_htb.c                                |   3 +-
 net/sched/sch_ingress.c                            |   5 +-
 net/sched/sch_multiq.c                             |   2 +-
 net/sched/sch_prio.c                               |   2 +-
 net/sched/sch_qfq.c                                |   2 +-
 net/sched/sch_sfb.c                                |   2 +-
 net/sched/sch_sfq.c                                |   2 +-
 36 files changed, 1742 insertions(+), 257 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.h

-- 
1.8.3.1



* [PATCH net-next 01/13] net: sched: support skb chain ext in tc classification path
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

From: Vlad Buslov <vladbu@mellanox.com>

To handle the case where we offload tc chains and might have a miss in
hardware after jumping to some chain, we want to continue in the correct
chain in software rather than restart classification from the beginning.
Tc classification will start from the chain specified in the TC_SKB_EXT
skb extension, which can be set by drivers.

Implement a tcf_classify_ingress() wrapper that looks up the first tp on
the chain specified by the skb extension and calls tcf_classify() with
the result. The wrapper implementation requires obtaining the ingress
queue block and a helper function to look up a chain by id on the
ingress block. Implement the required functions in the following way:

- Extend the tcf_chain_head_change_t function typedef with an additional
tcf_block argument and modify mini_Qdisc to include a pointer to the
tcf_block. Set the block pointer passed as an argument to the ingress
chain head change callback as mini_Qdisc->block, and read it in
tcf_classify_ingress() to obtain the ingress block.

- In order to allow searching for a chain by index from atomic context,
implement tcf_chain_lookup_rcu(), which looks up a chain by index under
rcu read lock protection. Change the tcf_block->chain_list type to an
rcu list to allow the tcf_chain_lookup_rcu() implementation. Use this
new helper to obtain a chain by index from the ingress block in
tcf_classify_ingress().

Pass the tp list of the chain obtained by the new functionality
described above to tcf_classify(). Extend tcf_classify() with an
'orig_tp' argument, which is needed to correctly implement the
TC_ACT_RECLASSIFY action when tp was substituted by
tcf_classify_ingress() according to the skb chain extension:
reclassification must restart from the original ingress tp, not from
the substituted chain's tp.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     | 17 ++++++++++++-
 include/net/sch_generic.h |  6 +++--
 net/core/dev.c            |  6 +++--
 net/sched/cls_api.c       | 64 ++++++++++++++++++++++++++++++++++++++++-------
 net/sched/sch_atm.c       |  2 +-
 net/sched/sch_cake.c      |  2 +-
 net/sched/sch_cbq.c       |  2 +-
 net/sched/sch_drr.c       |  2 +-
 net/sched/sch_dsmark.c    |  2 +-
 net/sched/sch_fq_codel.c  |  2 +-
 net/sched/sch_generic.c   |  3 ++-
 net/sched/sch_hfsc.c      |  3 ++-
 net/sched/sch_htb.c       |  3 ++-
 net/sched/sch_ingress.c   |  5 ++--
 net/sched/sch_multiq.c    |  2 +-
 net/sched/sch_prio.c      |  2 +-
 net/sched/sch_qfq.c       |  2 +-
 net/sched/sch_sfb.c       |  2 +-
 net/sched/sch_sfq.c       |  2 +-
 19 files changed, 99 insertions(+), 30 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 47b115e..b2fe323 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -71,7 +71,12 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 }
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
-		 struct tcf_result *res, bool compat_mode);
+		 const struct tcf_proto *orig_tp, struct tcf_result *res,
+		 bool compat_mode);
+int tcf_classify_ingress(struct sk_buff *skb,
+			 const struct tcf_block *ingress_block,
+			 const struct tcf_proto *tp, struct tcf_result *res,
+			 bool compat_mode);
 
 #else
 static inline bool tcf_block_shared(struct tcf_block *block)
@@ -129,10 +134,20 @@ void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 }
 
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			       const struct tcf_proto *orig_tp,
 			       struct tcf_result *res, bool compat_mode)
 {
 	return TC_ACT_UNSPEC;
 }
+
+static inline int tcf_classify_ingress(struct sk_buff *skb,
+				       const struct tcf_block *ingress_block,
+				       const struct tcf_proto *tp,
+				       struct tcf_result *res, bool compat_mode)
+{
+	return TC_ACT_UNSPEC;
+}
+
 #endif
 
 static inline unsigned long
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf8..d86bba1 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -381,7 +381,8 @@ struct qdisc_skb_cb {
 	unsigned char		data[QDISC_CB_PRIV_LEN];
 };
 
-typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
+typedef void tcf_chain_head_change_t(struct tcf_block *block,
+				     struct tcf_proto *tp_head, void *priv);
 
 struct tcf_chain {
 	/* Protects filter_chain. */
@@ -1268,6 +1269,7 @@ static inline void psched_ratecfg_getrate(struct tc_ratespec *res,
  */
 struct mini_Qdisc {
 	struct tcf_proto *filter_list;
+	struct tcf_block *block;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	struct rcu_head rcu;
@@ -1291,7 +1293,7 @@ struct mini_Qdisc_pair {
 };
 
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
-			  struct tcf_proto *tp_head);
+			  struct tcf_block *block, struct tcf_proto *tp_head);
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 6368c94..214a5dfd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3743,7 +3743,8 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
+	switch (tcf_classify(skb, miniq->filter_list, miniq->filter_list,
+			     &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
@@ -4810,7 +4811,8 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 	skb->tc_at_ingress = 1;
 	mini_qdisc_bstats_cpu_update(miniq, skb);
 
-	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
+	switch (tcf_classify_ingress(skb, miniq->block, miniq->filter_list,
+				     &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 76e0d12..b5314af 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -22,6 +22,7 @@
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
 #include <linux/jhash.h>
+#include <linux/rculist.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -354,7 +355,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 	if (!chain)
 		return NULL;
-	list_add_tail(&chain->list, &block->chain_list);
+	list_add_tail_rcu(&chain->list, &block->chain_list);
 	mutex_init(&chain->filter_chain_lock);
 	chain->block = block;
 	chain->index = chain_index;
@@ -365,10 +366,12 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 }
 
 static void tcf_chain_head_change_item(struct tcf_filter_chain_list_item *item,
+				       struct tcf_block *block,
 				       struct tcf_proto *tp_head)
 {
 	if (item->chain_head_change)
-		item->chain_head_change(tp_head, item->chain_head_change_priv);
+		item->chain_head_change(block, tp_head,
+					item->chain_head_change_priv);
 }
 
 static void tcf_chain0_head_change(struct tcf_chain *chain,
@@ -382,7 +385,7 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 
 	mutex_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
-		tcf_chain_head_change_item(item, tp_head);
+		tcf_chain_head_change_item(item, block, tp_head);
 	mutex_unlock(&block->lock);
 }
 
@@ -394,7 +397,7 @@ static bool tcf_chain_detach(struct tcf_chain *chain)
 
 	ASSERT_BLOCK_LOCKED(block);
 
-	list_del(&chain->list);
+	list_del_rcu(&chain->list);
 	if (!chain->index)
 		block->chain0.chain = NULL;
 
@@ -453,6 +456,20 @@ static struct tcf_chain *tcf_chain_lookup(struct tcf_block *block,
 	return NULL;
 }
 
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+static struct tcf_chain *tcf_chain_lookup_rcu(const struct tcf_block *block,
+					      u32 chain_index)
+{
+	struct tcf_chain *chain;
+
+	list_for_each_entry_rcu(chain, &block->chain_list, list) {
+		if (chain->index == chain_index)
+			return chain;
+	}
+	return NULL;
+}
+#endif
+
 static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
 			   u32 seq, u16 flags, int event, bool unicast);
 
@@ -822,7 +839,7 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 
 		tp_head = tcf_chain_dereference(chain0->filter_chain, chain0);
 		if (tp_head)
-			tcf_chain_head_change_item(item, tp_head);
+			tcf_chain_head_change_item(item, block, tp_head);
 
 		mutex_lock(&block->lock);
 		list_add(&item->list, &block->chain0.filter_chain_list);
@@ -847,7 +864,7 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 		    (item->chain_head_change == ei->chain_head_change &&
 		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
 			if (block->chain0.chain)
-				tcf_chain_head_change_item(item, NULL);
+				tcf_chain_head_change_item(item, NULL, NULL);
 			list_del(&item->list);
 			mutex_unlock(&block->lock);
 
@@ -1384,7 +1401,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
 
-static void tcf_chain_head_change_dflt(struct tcf_proto *tp_head, void *priv)
+static void tcf_chain_head_change_dflt(struct tcf_block *block,
+				       struct tcf_proto *tp_head, void *priv)
 {
 	struct tcf_proto __rcu **p_filter_chain = priv;
 
@@ -1560,11 +1578,11 @@ static int tcf_block_setup(struct tcf_block *block,
  * specific classifiers.
  */
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
-		 struct tcf_result *res, bool compat_mode)
+		 const struct tcf_proto *orig_tp, struct tcf_result *res,
+		 bool compat_mode)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	const int max_reclassify_loop = 4;
-	const struct tcf_proto *orig_tp = tp;
 	const struct tcf_proto *first_tp;
 	int limit = 0;
 
@@ -1621,6 +1639,34 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 }
 EXPORT_SYMBOL(tcf_classify);
 
+int tcf_classify_ingress(struct sk_buff *skb,
+			 const struct tcf_block *ingress_block,
+			 const struct tcf_proto *tp, struct tcf_result *res,
+			 bool compat_mode)
+{
+	const struct tcf_proto *orig_tp = tp;
+
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	{
+		struct tc_skb_ext *ext = skb_ext_find(skb, TC_SKB_EXT);
+
+		if (ext && ext->chain && ingress_block) {
+			struct tcf_chain *fchain;
+
+			fchain = tcf_chain_lookup_rcu(ingress_block,
+						      ext->chain);
+			if (!fchain)
+				return TC_ACT_UNSPEC;
+
+			tp = rcu_dereference_bh(fchain->filter_chain);
+		}
+	}
+#endif
+
+	return tcf_classify(skb, tp, orig_tp, res, compat_mode);
+}
+EXPORT_SYMBOL(tcf_classify_ingress);
+
 struct tcf_chain_info {
 	struct tcf_proto __rcu **pprev;
 	struct tcf_proto __rcu *next;
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index f4f9b8c..f01d5881 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -393,7 +393,7 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		list_for_each_entry(flow, &p->flows, list) {
 			fl = rcu_dereference_bh(flow->filter_list);
 			if (fl) {
-				result = tcf_classify(skb, fl, &res, true);
+				result = tcf_classify(skb, fl, fl, &res, true);
 				if (result < 0)
 					continue;
 				flow = (struct atm_flow_data *)res.class;
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 1496e87..d7c9ae7 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1600,7 +1600,7 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 		goto hash;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tcf_classify(skb, filter, &res, false);
+	result = tcf_classify(skb, filter, filter, &res, false);
 
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 39b427d..06670b4 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -228,7 +228,7 @@ struct cbq_sched_data {
 		/*
 		 * Step 2+n. Apply classifier.
 		 */
-		result = tcf_classify(skb, fl, &res, true);
+		result = tcf_classify(skb, fl, fl, &res, true);
 		if (!fl || result < 0)
 			goto fallback;
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 07a2b0b..cf4dc9a 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -316,7 +316,7 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
-	result = tcf_classify(skb, fl, &res, false);
+	result = tcf_classify(skb, fl, fl, &res, false);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 05605b3..75cbe92 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -241,7 +241,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	else {
 		struct tcf_result res;
 		struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
-		int result = tcf_classify(skb, fl, &res, false);
+		int result = tcf_classify(skb, fl, fl, &res, false);
 
 		pr_debug("result %d class 0x%04x\n", result, res.classid);
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 968519f..2229720 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -91,7 +91,7 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 		return fq_codel_hash(q, skb) + 1;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tcf_classify(skb, filter, &res, false);
+	result = tcf_classify(skb, filter, filter, &res, false);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6c9595f..1bfddc4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1355,7 +1355,7 @@ static void mini_qdisc_rcu_func(struct rcu_head *head)
 }
 
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
-			  struct tcf_proto *tp_head)
+			  struct tcf_block *block, struct tcf_proto *tp_head)
 {
 	/* Protected with chain0->filter_chain_lock.
 	 * Can't access chain directly because tp_head can be NULL.
@@ -1380,6 +1380,7 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 	 */
 	rcu_barrier();
 	miniq->filter_list = tp_head;
+	miniq->block = block;
 	rcu_assign_pointer(*miniqp->p_miniq, miniq);
 
 	if (miniq_old)
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 433f219..317a864 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1129,7 +1129,8 @@ struct hfsc_sched {
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	head = &q->root;
 	tcf = rcu_dereference_bh(q->root.filter_list);
-	while (tcf && (result = tcf_classify(skb, tcf, &res, false)) >= 0) {
+	while (tcf && (result = tcf_classify(skb, tcf, tcf, &res, false))
+	       >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_QUEUED:
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 8184c87..124b1b0 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -232,7 +232,8 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 	}
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	while (tcf && (result = tcf_classify(skb, tcf, &res, false)) >= 0) {
+	while (tcf && (result = tcf_classify(skb, tcf, tcf, &res, false))
+	       >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_QUEUED:
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index bf56aa5..5b0fe98 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -52,11 +52,12 @@ static struct tcf_block *ingress_tcf_block(struct Qdisc *sch, unsigned long cl,
 	return q->block;
 }
 
-static void clsact_chain_head_change(struct tcf_proto *tp_head, void *priv)
+static void clsact_chain_head_change(struct tcf_block *block,
+				     struct tcf_proto *tp_head, void *priv)
 {
 	struct mini_Qdisc_pair *miniqp = priv;
 
-	mini_qdisc_pair_swap(miniqp, tp_head);
+	mini_qdisc_pair_swap(miniqp, block, tp_head);
 };
 
 static void ingress_ingress_block_set(struct Qdisc *sch, u32 block_index)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 1330ad2..f2ca000 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -36,7 +36,7 @@ struct multiq_sched_data {
 	int err;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	err = tcf_classify(skb, fl, &res, false);
+	err = tcf_classify(skb, fl, fl, &res, false);
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 6479417..a28d05a 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -39,7 +39,7 @@ struct prio_sched_data {
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	if (TC_H_MAJ(skb->priority) != sch->handle) {
 		fl = rcu_dereference_bh(q->filter_list);
-		err = tcf_classify(skb, fl, &res, false);
+		err = tcf_classify(skb, fl, fl, &res, false);
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 0b05ac7..e2033e5 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -691,7 +691,7 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
-	result = tcf_classify(skb, fl, &res, false);
+	result = tcf_classify(skb, fl, fl, &res, false);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 4074c50..63672f3 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -257,7 +257,7 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 	struct tcf_result res;
 	int result;
 
-	result = tcf_classify(skb, fl, &res, false);
+	result = tcf_classify(skb, fl, fl, &res, false);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c787d4d..8763f09 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -178,7 +178,7 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 		return sfq_hash(q, skb) + 1;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tcf_classify(skb, fl, &res, false);
+	result = tcf_classify(skb, fl, fl, &res, false);
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-- 
1.8.3.1



* [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mapping unique ids to data
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Add a new interface for mapping data to a given id range (up to max_id),
and back again. It supports variable-sized data, different allocators,
and read/write locks.

The mapping interface also supports delaying a mapping's removal via a
workqueue. This is for cases where the mapping needs a grace period
during which it can still be found, for example for packets arriving
from hardware that were marked by a rule with an old mapping that no
longer exists.

We also provide a first implementation of the interface, idr_mapping,
which uses an idr for the allocator and a mutex lock for writes
(add/del), but not for find().
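
For example, a caller might use it like this (a sketch; the 0xFFFF
max_id and the u32 payload are just for illustration):

    struct mapping_ctx *ctx;
    u32 data = 42, out = 0, id;
    int err;

    /* u32-sized payload, ids in [1, 0xFFFF], delayed removal on */
    ctx = mapping_idr_create(sizeof(data), 0xFFFF, true);
    if (IS_ERR(ctx))
            return PTR_ERR(ctx);

    err = mapping_add(ctx, &data, &id);        /* data -> id (refcounted) */
    if (!err)
            err = mapping_find(ctx, id, &out); /* id -> copy of data */

    mapping_remove(ctx, id);  /* freed after MAPPING_GRACE_PERIOD */
    mapping_idr_destroy(ctx);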

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/lib/mapping.c  | 362 +++++++++++++++++++++
 .../net/ethernet/mellanox/mlx5/core/lib/mapping.h  |  31 ++
 3 files changed, 394 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d3e06ce..e84d6d0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -34,7 +34,7 @@ mlx5_core-$(CONFIG_MLX5_EN_ARFS)     += en_arfs.o
 mlx5_core-$(CONFIG_MLX5_EN_RXNFC)    += en_fs_ethtool.o
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
 mlx5_core-$(CONFIG_MLX5_ESWITCH)     += en_rep.o en_tc.o en/tc_tun.o lib/port_tun.o lag_mp.o \
-					lib/geneve.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
+					lib/geneve.o lib/mapping.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
 					en/tc_tun_geneve.o diag/en_tc_tracepoint.o
 mlx5_core-$(CONFIG_PCI_HYPERV_INTERFACE) += en/hv_vhca_stats.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.c
new file mode 100644
index 0000000..1c25223
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.c
@@ -0,0 +1,362 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2018 Mellanox Technologies */
+
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/idr.h>
+#include <linux/hashtable.h>
+
+#include "mapping.h"
+
+#define MAPPING_GRACE_PERIOD 2000
+
+struct mapping_item {
+	struct hlist_node node;
+	int cnt;
+	u32 id;
+
+	char data[0]; /* Must be last for correct allocation */
+};
+
+struct mapping_ctx_ops {
+	struct mapping_item * (*alloc_item)(struct mapping_ctx *ctx);
+	void (*free_item)(struct mapping_ctx *ctx, struct mapping_item *mi);
+
+	int (*assign_id)(struct mapping_ctx *ctx, struct mapping_item *mi);
+	struct mapping_item * (*find_id)(struct mapping_ctx *ctx, u32 id);
+
+	void (*lock)(struct mapping_ctx *ctx, bool write);
+	void (*unlock)(struct mapping_ctx *ctx, bool write);
+};
+
+struct mapping_ctx {
+	unsigned long max_id;
+	size_t data_size;
+
+	const struct mapping_ctx_ops *ops;
+
+	DECLARE_HASHTABLE(ht, 8);
+};
+
+int
+mapping_add(struct mapping_ctx *ctx, void *data, u32 *id)
+{
+	struct mapping_item *mi;
+	u32 hash_key;
+	int err;
+
+	if (ctx->ops->lock)
+		ctx->ops->lock(ctx, true);
+
+	hash_key = jhash(data, ctx->data_size, 0);
+	hash_for_each_possible(ctx->ht, mi, node, hash_key) {
+		if (!memcmp(data, mi->data, ctx->data_size))
+			goto attach;
+	}
+
+	mi = ctx->ops->alloc_item(ctx);
+	if (IS_ERR(mi)) {
+		err = PTR_ERR(mi);
+		goto err_alloc;
+	}
+
+	memcpy(mi->data, data, ctx->data_size);
+	hash_add(ctx->ht, &mi->node, hash_key);
+
+	err = ctx->ops->assign_id(ctx, mi);
+	if (err)
+		goto err_assign;
+
+attach:
+	++mi->cnt;
+	*id = mi->id;
+
+	if (ctx->ops->lock)
+		ctx->ops->unlock(ctx, true);
+
+	return 0;
+
+err_assign:
+	hash_del(&mi->node);
+	ctx->ops->free_item(ctx, mi);
+err_alloc:
+	if (ctx->ops->lock)
+		ctx->ops->unlock(ctx, true);
+	return err;
+}
+
+int
+mapping_remove(struct mapping_ctx *ctx, u32 id)
+{
+	struct mapping_item *mi;
+	int err = -ENOENT;
+
+	if (ctx->ops->lock)
+		ctx->ops->lock(ctx, true);
+
+	mi = ctx->ops->find_id(ctx, id);
+	if (!mi)
+		goto out;
+	err = 0;
+
+	if (--mi->cnt > 0)
+		goto out;
+
+	hash_del(&mi->node);
+
+	ctx->ops->free_item(ctx, mi);
+
+out:
+	if (ctx->ops->lock)
+		ctx->ops->unlock(ctx, true);
+
+	return err;
+}
+
+int
+mapping_find(struct mapping_ctx *ctx, u32 id, void *data)
+{
+	struct mapping_item *mi;
+	int err = -ENOENT;
+
+	if (ctx->ops->lock)
+		ctx->ops->lock(ctx, false);
+
+	mi = ctx->ops->find_id(ctx, id);
+	if (!mi)
+		goto err_find;
+
+	memcpy(data, mi->data, ctx->data_size);
+	err = 0;
+
+err_find:
+	if (ctx->ops->lock)
+		ctx->ops->unlock(ctx, false);
+	return err;
+}
+
+static void
+mapping_ctx_init(struct mapping_ctx *ctx, size_t data_size, u32 max_id,
+		 const struct mapping_ctx_ops *ops)
+{
+	ctx->data_size = data_size;
+	ctx->max_id = max_id;
+	ctx->ops = ops;
+}
+
+struct mapping_idr_ctx {
+	struct mapping_ctx ctx;
+
+	struct idr idr;
+	struct mutex lock; /* guards the idr */
+
+	bool delayed_removal;
+	struct delayed_work dwork;
+	struct list_head pending_list;
+	spinlock_t pending_list_lock; /* guards pending list */
+};
+
+struct mapping_idr_item {
+	struct rcu_head rcu;
+	struct list_head list;
+	typeof(jiffies) timeout;
+
+	struct mapping_item item; /* Must be last for correct allocation */
+};
+
+static struct mapping_item *
+mapping_idr_alloc_item(struct mapping_ctx *ctx)
+{
+	struct mapping_idr_item *dmi;
+
+	dmi = kzalloc(sizeof(*dmi) + ctx->data_size, GFP_KERNEL);
+	return dmi ? &dmi->item : ERR_PTR(-ENOMEM);
+}
+
+static void
+mapping_idr_remove_and_free(struct mapping_idr_ctx *dctx,
+			    struct mapping_idr_item *dmi)
+{
+	idr_remove(&dctx->idr, dmi->item.id);
+	kfree_rcu(dmi, rcu);
+}
+
+static void
+mapping_idr_free_item(struct mapping_ctx *ctx, struct mapping_item *mi)
+{
+	struct mapping_idr_ctx *dctx;
+	struct mapping_idr_item *dmi;
+
+	dctx = container_of(ctx, struct mapping_idr_ctx, ctx);
+	dmi = container_of(mi, struct mapping_idr_item, item);
+
+	if (!mi->id) {
+		/* Not added to idr yet, we can free directly */
+		kfree(dmi);
+		return;
+	}
+
+	if (dctx->delayed_removal) {
+		dmi->timeout =
+			jiffies + msecs_to_jiffies(MAPPING_GRACE_PERIOD);
+
+		spin_lock(&dctx->pending_list_lock);
+		list_add_tail(&dmi->list, &dctx->pending_list);
+		spin_unlock(&dctx->pending_list_lock);
+
+		schedule_delayed_work(&dctx->dwork, MAPPING_GRACE_PERIOD);
+		return;
+	}
+
+	mapping_idr_remove_and_free(dctx, dmi);
+}
+
+static int
+mapping_idr_assign_id(struct mapping_ctx *ctx, struct mapping_item *mi)
+{
+	struct mapping_idr_ctx *dctx;
+	u32 max_id, index = 1;
+	int err;
+
+	max_id = ctx->max_id ? ctx->max_id : UINT_MAX;
+
+	dctx = container_of(ctx, struct mapping_idr_ctx, ctx);
+	err = idr_alloc_u32(&dctx->idr, mi, &index, max_id, GFP_KERNEL);
+	if (err)
+		return err;
+
+	mi->id = index;
+
+	return 0;
+}
+
+static struct mapping_item *
+mapping_idr_find_id(struct mapping_ctx *ctx, u32 id)
+{
+	struct mapping_idr_ctx *dctx;
+
+	dctx = container_of(ctx, struct mapping_idr_ctx, ctx);
+	return idr_find(&dctx->idr, id);
+}
+
+static void
+mapping_idr_lock(struct mapping_ctx *ctx, bool write)
+{
+	struct mapping_idr_ctx *dctx;
+
+	if (!write) {
+		rcu_read_lock();
+		return;
+	}
+
+	dctx = container_of(ctx, struct mapping_idr_ctx, ctx);
+	mutex_lock(&dctx->lock);
+}
+
+static void
+mapping_idr_unlock(struct mapping_ctx *ctx, bool write)
+{
+	struct mapping_idr_ctx *dctx;
+
+	if (!write) {
+		rcu_read_unlock();
+		return;
+	}
+
+	dctx = container_of(ctx, struct mapping_idr_ctx, ctx);
+	mutex_unlock(&dctx->lock);
+}
+
+static const struct mapping_ctx_ops idr_ops = {
+	.alloc_item = mapping_idr_alloc_item,
+	.free_item = mapping_idr_free_item,
+	.assign_id = mapping_idr_assign_id,
+	.find_id = mapping_idr_find_id,
+	.lock = mapping_idr_lock,
+	.unlock = mapping_idr_unlock,
+};
+
+static void
+mapping_idr_work_handler(struct work_struct *work)
+{
+	typeof(jiffies) min_timeout = 0, now = jiffies;
+	struct mapping_idr_item *dmi, *next;
+	struct mapping_idr_ctx *dctx;
+	LIST_HEAD(pending_items);
+
+	dctx = container_of(work, struct mapping_idr_ctx, dwork.work);
+
+	spin_lock(&dctx->pending_list_lock);
+	list_for_each_entry_safe(dmi, next, &dctx->pending_list, list) {
+		if (time_after(now, dmi->timeout))
+			list_move(&dmi->list, &pending_items);
+		else if (!min_timeout ||
+			 time_before(dmi->timeout, min_timeout))
+			min_timeout = dmi->timeout;
+	}
+	spin_unlock(&dctx->pending_list_lock);
+
+	list_for_each_entry_safe(dmi, next, &pending_items, list)
+		mapping_idr_remove_and_free(dctx, dmi);
+
+	if (min_timeout)
+		schedule_delayed_work(&dctx->dwork, abs(min_timeout - now));
+}
+
+static void
+mapping_idr_flush_work(struct mapping_idr_ctx *dctx)
+{
+	struct mapping_idr_item *dmi;
+
+	if (!dctx->delayed_removal)
+		return;
+
+	spin_lock(&dctx->pending_list_lock);
+	list_for_each_entry(dmi, &dctx->pending_list, list)
+		dmi->timeout = jiffies;
+	spin_unlock(&dctx->pending_list_lock);
+
+	/* Queue again, so we'll clean the pending list */
+	schedule_delayed_work(&dctx->dwork, 0);
+
+	/* Wait for queued work to be finished */
+	flush_delayed_work(&dctx->dwork);
+}
+
+struct mapping_ctx *
+mapping_idr_create(size_t data_size, u32 max_id, bool delayed_removal)
+{
+	struct mapping_idr_ctx *dctx;
+
+	dctx = kzalloc(sizeof(*dctx), GFP_KERNEL);
+	if (!dctx)
+		return ERR_PTR(-ENOMEM);
+
+	mapping_ctx_init(&dctx->ctx, data_size, max_id, &idr_ops);
+
+	if (delayed_removal) {
+		INIT_DELAYED_WORK(&dctx->dwork, mapping_idr_work_handler);
+		INIT_LIST_HEAD(&dctx->pending_list);
+		spin_lock_init(&dctx->pending_list_lock);
+	}
+
+	dctx->delayed_removal = delayed_removal;
+	idr_init(&dctx->idr);
+	mutex_init(&dctx->lock);
+
+	return &dctx->ctx;
+}
+
+void
+mapping_idr_destroy(struct mapping_ctx *ctx)
+{
+	struct mapping_idr_ctx *dctx = container_of(ctx,
+						    struct mapping_idr_ctx,
+						    ctx);
+
+	mapping_idr_flush_work(dctx);
+	idr_destroy(&dctx->idr);
+	mutex_destroy(&dctx->lock);
+
+	kfree(dctx);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.h
new file mode 100644
index 0000000..3704205
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/mapping.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2019 Mellanox Technologies */
+
+#ifndef __MLX5_MAPPING_H__
+#define __MLX5_MAPPING_H__
+
+struct mapping_ctx;
+
+int
+mapping_add(struct mapping_ctx *ctx, void *data, u32 *id);
+int
+mapping_remove(struct mapping_ctx *ctx, u32 id);
+int
+mapping_find(struct mapping_ctx *ctx, u32 id, void *data);
+
+/* mapping_idr uses an idr to map data to ids in add(), and for find().
+ * For locking, it uses a mutex for add()/remove(). find() uses
+ * rcu_read_lock().
+ * Choosing delayed_removal postpones the removal of a previously mapped
+ * id by MAPPING_GRACE_PERIOD milliseconds.
+ * This is to avoid races against hardware, where we mark the packet in
+ * hardware with a previous id, and quick remove() and add() reusing the same
+ * previous id. Then find() will get the new mapping instead of the old
+ * which was used to mark the packet.
+ */
+struct mapping_ctx *
+mapping_idr_create(size_t data_size, u32 max_id, bool delayed_removal);
+void
+mapping_idr_destroy(struct mapping_ctx *ctx);
+
+#endif /* __MLX5_MAPPING_H__ */
-- 
1.8.3.1



* [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Multi chain support requires the miss path to continue processing from
the last chain id, and for that we need to save the chain miss tag (a
mapping for the 32bit chain id) on reg_c0; this will be added in a
following patch.

Currently reg_c0 is used exclusively to store the source port metadata,
occupying all 32 bits: it is built from 16 bits of vhca_id and 16 bits
of vport number.

We will move this source port metadata to the upper 16 bits and leave
the lower bits for the chain miss tag. We compress the reg_c0 source
port metadata to 16 bits by taking 8 bits from the vhca_id and 8 bits
from the vport number.

Since we compress the vport number to 8 bits statically, and leave the
two top ids for the special PF/ECPF numbers, we will only support a
maximum of 254 vports with this strategy.
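
Concretely, the new packing is (a worked example with made-up values):

    /* reg_c0 = <vhca_id(8) | vport(8) | chain_tag(16)>
     *
     * e.g. vhca_id = 0x25, vport = 0x03:
     *     val    = (0x25 << 8) | 0x03  = 0x2503
     *     reg_c0 = val << 16           = 0x25030000
     *
     * leaving bits 15..0 free for the chain miss tag.
     */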

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c                  |  3 +-
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 81 +++++++++++++++++++---
 include/linux/mlx5/eswitch.h                       | 11 ++-
 3 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 90489c5..844351c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3535,7 +3535,8 @@ static void mlx5_ib_set_rule_source_port(struct mlx5_ib_dev *dev,
 		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
 				    misc_parameters_2);
 
-		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
+		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
+			 mlx5_eswitch_get_vport_metadata_mask());
 	} else {
 		misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
 				    misc_parameters);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index a6d0b62..873b19c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -50,6 +50,19 @@
 #define MLX5_ESW_MISS_FLOWS (2)
 #define UPLINK_REP_INDEX 0
 
+/* Reg C0 usage:
+ * Reg C0 = < VHCA_ID_BITS(8) | VPORT BITS(8) | CHAIN_TAG(16) >
+ *
+ * Highest 8 bits of the reg c0 is the vhca_id, next 8 bits is vport_num,
+ * the rest (lowest 16 bits) is left for tc chain tag restoration.
+ * VHCA_ID + VPORT comprise the SOURCE_PORT matching.
+ */
+#define VHCA_ID_BITS 8
+#define VPORT_BITS 8
+#define SOURCE_PORT_METADATA_BITS (VHCA_ID_BITS + VPORT_BITS)
+#define SOURCE_PORT_METADATA_OFFSET (32 - SOURCE_PORT_METADATA_BITS)
+#define CHAIN_TAG_METADATA_BITS (32 - SOURCE_PORT_METADATA_BITS)
+
 static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
 						     u16 vport_num)
 {
@@ -85,7 +98,8 @@ static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
 								   attr->in_rep->vport));
 
 		misc2 = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2);
-		MLX5_SET_TO_ONES(fte_match_set_misc2, misc2, metadata_reg_c_0);
+		MLX5_SET(fte_match_set_misc2, misc2, metadata_reg_c_0,
+			 mlx5_eswitch_get_vport_metadata_mask());
 
 		spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
 		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters);
@@ -621,7 +635,8 @@ static void peer_miss_rules_setup(struct mlx5_eswitch *esw,
 	if (mlx5_eswitch_vport_match_metadata_enabled(esw)) {
 		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
 				    misc_parameters_2);
-		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
+		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
+			 mlx5_eswitch_get_vport_metadata_mask());
 
 		spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
 	} else {
@@ -851,8 +866,9 @@ static void esw_set_flow_group_source_port(struct mlx5_eswitch *esw,
 			 match_criteria_enable,
 			 MLX5_MATCH_MISC_PARAMETERS_2);
 
-		MLX5_SET_TO_ONES(fte_match_param, match_criteria,
-				 misc_parameters_2.metadata_reg_c_0);
+		MLX5_SET(fte_match_param, match_criteria,
+			 misc_parameters_2.metadata_reg_c_0,
+			 mlx5_eswitch_get_vport_metadata_mask());
 	} else {
 		MLX5_SET(create_flow_group_in, flow_group_in,
 			 match_criteria_enable,
@@ -1134,7 +1150,8 @@ struct mlx5_flow_handle *
 			 mlx5_eswitch_get_vport_metadata_for_match(esw, vport));
 
 		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2);
-		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
+		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
+			 mlx5_eswitch_get_vport_metadata_mask());
 
 		spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
 	} else {
@@ -1604,11 +1621,17 @@ static int esw_vport_add_ingress_acl_modify_metadata(struct mlx5_eswitch *esw,
 	static const struct mlx5_flow_spec spec = {};
 	struct mlx5_flow_act flow_act = {};
 	int err = 0;
+	u32 key;
+
+	key = mlx5_eswitch_get_vport_metadata_for_match(esw, vport->vport);
+	key >>= SOURCE_PORT_METADATA_OFFSET;
 
 	MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET);
-	MLX5_SET(set_action_in, action, field, MLX5_ACTION_IN_FIELD_METADATA_REG_C_0);
-	MLX5_SET(set_action_in, action, data,
-		 mlx5_eswitch_get_vport_metadata_for_match(esw, vport->vport));
+	MLX5_SET(set_action_in, action, field,
+		 MLX5_ACTION_IN_FIELD_METADATA_REG_C_0);
+	MLX5_SET(set_action_in, action, data, key);
+	MLX5_SET(set_action_in, action, offset, SOURCE_PORT_METADATA_OFFSET);
+	MLX5_SET(set_action_in, action, length, SOURCE_PORT_METADATA_BITS);
 
 	vport->ingress.offloads.modify_metadata =
 		mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_ESW_INGRESS,
@@ -2465,9 +2488,47 @@ bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
 }
 EXPORT_SYMBOL(mlx5_eswitch_vport_match_metadata_enabled);
 
-u32 mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
+u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
 					      u16 vport_num)
 {
-	return ((MLX5_CAP_GEN(esw->dev, vhca_id) & 0xffff) << 16) | vport_num;
+	u32 vport_num_mask = GENMASK(VPORT_BITS - 1, 0);
+	u32 vhca_id_mask = GENMASK(VHCA_ID_BITS - 1, 0);
+	u32 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
+	u32 val;
+
+	/* Make sure the vhca_id fits the VHCA_ID_BITS */
+	WARN_ON_ONCE(vhca_id >= BIT(VHCA_ID_BITS));
+
+	/* Trim vhca_id to VHCA_ID_BITS */
+	vhca_id &= vhca_id_mask;
+
+	/* Make sure pf and ecpf map to end of VPORT_BITS range so they
+	 * don't overlap with VF numbers, and themselves, after trimming.
+	 */
+	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) <
+		     vport_num_mask - 1);
+	WARN_ON_ONCE((MLX5_VPORT_ECPF & vport_num_mask) <
+		     vport_num_mask - 1);
+	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) ==
+		     (MLX5_VPORT_ECPF & vport_num_mask));
+
+	/* Make sure that the VF vport_num fits VPORT_BITS and don't
+	 * overlap with pf and ecpf.
+	 */
+	if (vport_num != MLX5_VPORT_UPLINK &&
+	    vport_num != MLX5_VPORT_ECPF)
+		WARN_ON_ONCE(vport_num >= vport_num_mask - 1);
+
+	/* We can now trim vport_num to VPORT_BITS */
+	vport_num &= vport_num_mask;
+
+	val = (vhca_id << VPORT_BITS) | vport_num;
+	return val << (32 - SOURCE_PORT_METADATA_BITS);
 }
 EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match);
+
+u32 mlx5_eswitch_get_vport_metadata_mask(void)
+{
+	return GENMASK(31, 32 - SOURCE_PORT_METADATA_BITS);
+}
+EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_mask);
diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
index 98e667b..080b67c 100644
--- a/include/linux/mlx5/eswitch.h
+++ b/include/linux/mlx5/eswitch.h
@@ -71,8 +71,9 @@ enum devlink_eswitch_encap_mode
 mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev);
 
 bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
-u32 mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
+u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
 					      u16 vport_num);
+u32 mlx5_eswitch_get_vport_metadata_mask(void);
 u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw);
 #else  /* CONFIG_MLX5_ESWITCH */
 
@@ -94,11 +95,17 @@ static inline u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
 };
 
 static inline u32
-mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
+mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
 					  int vport_num)
 {
 	return 0;
 };
+
+static inline u32
+mlx5_eswitch_get_vport_metadata_mask(void)
+{
+	return 0;
+}
 #endif /* CONFIG_MLX5_ESWITCH */
 
 #endif
-- 
1.8.3.1



* [PATCH net-next-mlx5 04/13] net/mlx5: E-Switch, Get reg_c0 value on CQE
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

On the RX side, create a restore table in the OFFLOADS namespace. This
table matches on every reg_c0 value we will use and sets the matched
value as the flow_tag. The flow tag can then be read on the CQE.

As there is no copy action from reg_c0 to the flow tag, we have to set
the flow tag explicitly. We add an API so callers can add all the used
reg_c0 values (tags), and for each of those we add a restore rule.

This will be used in a following patch to save the miss chain mapping
tag on reg_c0 and, from it, restore the tc chain on the skb.
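
A caller would then do something like the following for each tag it
hands out (a sketch; tag allocation itself is outside this patch):

    struct mlx5_flow_handle *rule;

    if (tag > esw_get_max_restore_tag(esw))
            return -EINVAL;

    rule = esw_add_restore_rule(esw, tag);
    if (IS_ERR(rule))
            return PTR_ERR(rule);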

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  15 +++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 143 +++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   4 +-
 3 files changed, 151 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 4472710..cc446ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -189,6 +189,9 @@ struct mlx5_eswitch_fdb {
 };
 
 struct mlx5_esw_offload {
+	struct mlx5_flow_table *ft_offloads_restore;
+	struct mlx5_flow_group *restore_group;
+
 	struct mlx5_flow_table *ft_offloads;
 	struct mlx5_flow_group *vport_rx_group;
 	struct mlx5_eswitch_rep *vport_reps;
@@ -623,6 +626,12 @@ struct mlx5_vport *__must_check
 esw_vport_destroy_offloads_acl_tables(struct mlx5_eswitch *esw,
 				      struct mlx5_vport *vport);
 
+struct mlx5_flow_handle *
+esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag);
+
+u32
+esw_get_max_restore_tag(struct mlx5_eswitch *esw);
+
 #else  /* CONFIG_MLX5_ESWITCH */
 /* eswitch API stubs */
 static inline int  mlx5_eswitch_init(struct mlx5_core_dev *dev) { return 0; }
@@ -638,6 +647,12 @@ static inline const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev)
 
 static inline void mlx5_eswitch_update_num_of_vfs(struct mlx5_eswitch *esw, const int num_vfs) {}
 
+static struct mlx5_flow_handle *
+esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 #endif /* CONFIG_MLX5_ESWITCH */
 
 #endif /* __MLX5_ESWITCH_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 873b19c..d6c0850 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -62,6 +62,7 @@
 #define SOURCE_PORT_METADATA_BITS (VHCA_ID_BITS + VPORT_BITS)
 #define SOURCE_PORT_METADATA_OFFSET (32 - SOURCE_PORT_METADATA_BITS)
 #define CHAIN_TAG_METADATA_BITS (32 - SOURCE_PORT_METADATA_BITS)
+#define CHAIN_TAG_METADATA_MASK GENMASK(CHAIN_TAG_METADATA_BITS - 1, 0)
 
 static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
 						     u16 vport_num)
@@ -851,6 +852,49 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 	return err;
 }
 
+struct mlx5_flow_handle *
+esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag)
+{
+	struct mlx5_flow_act flow_act = { .flags = FLOW_ACT_NO_APPEND, };
+	struct mlx5_flow_table *ft = esw->offloads.ft_offloads_restore;
+	struct mlx5_flow_context *flow_context;
+	struct mlx5_flow_spec s, *spec = &s;
+	struct mlx5_flow_handle *flow_rule;
+	struct mlx5_flow_destination dest;
+	void *misc;
+
+	memset(spec, 0, sizeof(*spec));
+	misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+			    misc_parameters_2);
+	MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
+		 CHAIN_TAG_METADATA_MASK);
+	misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
+			    misc_parameters_2);
+	MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0, tag);
+	spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+
+	flow_context = &spec->flow_context;
+	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
+	flow_context->flow_tag = tag;
+	dest.type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
+	dest.ft = esw->offloads.ft_offloads;
+	flow_rule = mlx5_add_flow_rules(ft, spec, &flow_act, &dest, 1);
+
+	if (IS_ERR(flow_rule))
+		esw_warn(esw->dev,
+			 "Failed to create restore rule for tag: %d, err(%d)\n",
+			 tag, (int)PTR_ERR(flow_rule));
+
+	return flow_rule;
+}
+
+u32
+esw_get_max_restore_tag(struct mlx5_eswitch *esw)
+{
+	return CHAIN_TAG_METADATA_MASK;
+}
+
 #define MAX_PF_SQ 256
 #define MAX_SQ_NVPORTS 32
 
@@ -1073,6 +1117,7 @@ static int esw_create_offloads_table(struct mlx5_eswitch *esw, int nvports)
 	}
 
 	ft_attr.max_fte = nvports + MLX5_ESW_MISS_FLOWS;
+	ft_attr.prio = 1;
 
 	ft_offloads = mlx5_create_flow_table(ns, &ft_attr);
 	if (IS_ERR(ft_offloads)) {
@@ -1177,6 +1222,81 @@ struct mlx5_flow_handle *
 	return flow_rule;
 }
 
+static void esw_destroy_restore_table(struct mlx5_eswitch *esw)
+{
+	struct mlx5_esw_offload *offloads = &esw->offloads;
+
+	mlx5_destroy_flow_group(offloads->restore_group);
+	mlx5_destroy_flow_table(offloads->ft_offloads_restore);
+}
+
+static int esw_create_restore_table(struct mlx5_eswitch *esw)
+{
+	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+	struct mlx5_flow_table_attr ft_attr = {};
+	struct mlx5_core_dev *dev = esw->dev;
+	struct mlx5_flow_namespace *ns;
+	void *match_criteria, *misc;
+	struct mlx5_flow_table *ft;
+	struct mlx5_flow_group *g;
+	u32 *flow_group_in;
+	int err = 0;
+
+	ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_OFFLOADS);
+	if (!ns) {
+		esw_warn(esw->dev, "Failed to get offloads flow namespace\n");
+		return -EOPNOTSUPP;
+	}
+
+	flow_group_in = kvzalloc(inlen, GFP_KERNEL);
+	if (!flow_group_in) {
+		err = -ENOMEM;
+		goto out_free;
+	}
+
+	ft_attr.max_fte = 1 << CHAIN_TAG_METADATA_BITS;
+	ft = mlx5_create_flow_table(ns, &ft_attr);
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
+		esw_warn(esw->dev, "Failed to create restore table, err %d\n",
+			 err);
+		goto out_free;
+	}
+
+	memset(flow_group_in, 0, inlen);
+	match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in,
+				      match_criteria);
+	misc = MLX5_ADDR_OF(fte_match_param, match_criteria,
+			    misc_parameters_2);
+
+	MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
+		 CHAIN_TAG_METADATA_MASK);
+	MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 0);
+	MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index,
+		 ft_attr.max_fte - 1);
+	MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
+		 MLX5_MATCH_MISC_PARAMETERS_2);
+	g = mlx5_create_flow_group(ft, flow_group_in);
+	if (IS_ERR(g)) {
+		err = PTR_ERR(g);
+		esw_warn(dev, "Failed to create restore flow group, err: %d\n",
+			 err);
+		goto err_group;
+	}
+
+	esw->offloads.ft_offloads_restore = ft;
+	esw->offloads.restore_group = g;
+
+	return 0;
+
+err_group:
+	mlx5_destroy_flow_table(ft);
+out_free:
+	kvfree(flow_group_in);
+
+	return err;
+}
+
 static int esw_offloads_start(struct mlx5_eswitch *esw,
 			      struct netlink_ext_ack *extack)
 {
@@ -1934,13 +2054,17 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw)
 	if (err)
 		return err;
 
-	err = esw_create_offloads_fdb_tables(esw, total_vports);
+	err = esw_create_offloads_table(esw, total_vports);
 	if (err)
-		goto create_fdb_err;
+		goto create_offloads_err;
 
-	err = esw_create_offloads_table(esw, total_vports);
+	err = esw_create_restore_table(esw);
 	if (err)
-		goto create_ft_err;
+		goto create_restore_err;
+
+	err = esw_create_offloads_fdb_tables(esw, total_vports);
+	if (err)
+		goto create_fdb_err;
 
 	err = esw_create_vport_rx_group(esw, total_vports);
 	if (err)
@@ -1949,12 +2073,12 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw)
 	return 0;
 
 create_fg_err:
-	esw_destroy_offloads_table(esw);
-
-create_ft_err:
 	esw_destroy_offloads_fdb_tables(esw);
-
 create_fdb_err:
+	esw_destroy_restore_table(esw);
+create_restore_err:
+	esw_destroy_offloads_table(esw);
+create_offloads_err:
 	esw_destroy_uplink_offloads_acl_tables(esw);
 
 	return err;
@@ -1963,8 +2087,9 @@ static int esw_offloads_steering_init(struct mlx5_eswitch *esw)
 static void esw_offloads_steering_cleanup(struct mlx5_eswitch *esw)
 {
 	esw_destroy_vport_rx_group(esw);
-	esw_destroy_offloads_table(esw);
 	esw_destroy_offloads_fdb_tables(esw);
+	esw_destroy_restore_table(esw);
+	esw_destroy_offloads_table(esw);
 	esw_destroy_uplink_offloads_acl_tables(esw);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c7a16ae..4b2e7e1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -111,8 +111,8 @@
 #define ANCHOR_MIN_LEVEL (BY_PASS_MIN_LEVEL + 1)
 
 #define OFFLOADS_MAX_FT 1
-#define OFFLOADS_NUM_PRIOS 1
-#define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + 1)
+#define OFFLOADS_NUM_PRIOS 2
+#define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS)
 
 #define LAG_PRIO_NUM_LEVELS 1
 #define LAG_NUM_PRIOS 1
-- 
1.8.3.1



* [PATCH net-next-mlx5 05/13] net/mlx5: E-Switch, Mark miss packets with new chain id mapping
From: Paul Blakey @ 2020-01-21 16:16 UTC
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Currently, if we miss in hardware after jumping to some chain,
we continue in chain 0 in software. This is wrong, and with the new
tc skb extension we can now restore the chain id on the skb, so
tc can continue in the correct chain.

To restore the chain id in software after a miss in hardware, we
create a mapping between the 32bit chain ids and 16bit values that
fit in reg_c0 (which survives loopback). We then mark packets that
miss on some chain with that chain's mapped value in their reg_c0
field. This mapping supports up to 64K concurrent chains.

The register survives loopback and reaches the CQE's flow_tag via
the eswitch restore rules.

In the next commit, we will reverse the mapping of the value we got
on the CQE back to a chain id, and tell tc to continue in the sw
chain where we left off via the tc skb extension.
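
Roughly, the lifetime of a chain tag looks like this (illustrative
sketch only, not part of the patch; mapping_add()/mapping_find() are
the mapping lib API from the first patch, and ctx stands for the
chains' mapping context):

	u32 chain = 3000, tag, restored;
	int err;

	/* rule insertion time: allocate a 16bit tag for the chain */
	err = mapping_add(ctx, &chain, &tag);      /* e.g. tag == 5 */

	/* hw: the chain miss rule sets reg_c0 = tag, and the restore
	 * rule copies reg_c0 to the CQE flow_tag across loopback
	 */

	/* rx time: reverse the mapping and hand the chain back to tc */
	err = mapping_find(ctx, tag, &restored);   /* restored == 3000 */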

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |   8 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    |  12 ++
 .../mellanox/mlx5/core/eswitch_offloads_chains.c   | 130 ++++++++++++++++++++-
 .../mellanox/mlx5/core/eswitch_offloads_chains.h   |   4 +-
 4 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 26f559b..427432f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -153,6 +153,14 @@ struct mlx5e_tc_flow_parse_attr {
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
 #define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(16)
 
+struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
+	[CHAIN_TO_REG] = {
+		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_0,
+		.moffset = 0,
+		.mlen = 2,
+	},
+};
+
 struct mlx5e_hairpin {
 	struct mlx5_hairpin *pair;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 262cdb7..e2dbbae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -91,6 +91,18 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 
 void mlx5e_tc_reoffload_flows_work(struct work_struct *work);
 
+enum mlx5e_tc_attr_to_reg {
+	CHAIN_TO_REG,
+};
+
+struct mlx5e_tc_attr_to_reg_mapping {
+	int mfield; /* rewrite field */
+	int moffset; /* offset of mfield */
+	int mlen; /* bytes to rewrite/match */
+};
+
+extern struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[];
+
 bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 				    struct net_device *out_dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
index 3a60eb5..4f9b896 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.c
@@ -6,14 +6,17 @@
 #include <linux/mlx5/fs.h>
 
 #include "eswitch_offloads_chains.h"
+#include "lib/mapping.h"
 #include "mlx5_core.h"
 #include "fs_core.h"
 #include "eswitch.h"
 #include "en.h"
+#include "en_tc.h"
 
 #define esw_chains_priv(esw) ((esw)->fdb_table.offloads.esw_chains_priv)
 #define esw_chains_lock(esw) (esw_chains_priv(esw)->lock)
 #define esw_chains_ht(esw) (esw_chains_priv(esw)->chains_ht)
+#define esw_chains_mapping(esw) (esw_chains_priv(esw)->chains_mapping)
 #define esw_prios_ht(esw) (esw_chains_priv(esw)->prios_ht)
 #define fdb_pool_left(esw) (esw_chains_priv(esw)->fdb_left)
 #define tc_slow_fdb(esw) ((esw)->fdb_table.offloads.slow_fdb)
@@ -44,6 +47,7 @@ struct mlx5_esw_chains_priv {
 	struct mutex lock;
 
 	struct mlx5_flow_table *tc_end_fdb;
+	struct mapping_ctx *chains_mapping;
 
 	int fdb_left[ARRAY_SIZE(ESW_POOLS)];
 };
@@ -54,9 +58,12 @@ struct fdb_chain {
 	u32 chain;
 
 	int ref;
+	int id;
 
 	struct mlx5_eswitch *esw;
 	struct list_head prios_list;
+	struct mlx5_flow_handle *restore_rule;
+	struct mlx5_modify_hdr *miss_modify_hdr;
 };
 
 struct fdb_prio_key {
@@ -255,6 +262,70 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	mlx5_destroy_flow_table(fdb);
 }
 
+static int
+create_fdb_chain_restore(struct fdb_chain *fdb_chain)
+{
+	char modact[MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)];
+	struct mlx5_eswitch *esw = fdb_chain->esw;
+	struct mlx5_modify_hdr *mod_hdr;
+	u32 index;
+	int err;
+
+	if (fdb_chain->chain == mlx5_esw_chains_get_ft_chain(esw))
+		return 0;
+
+	err = mapping_add(esw_chains_mapping(esw), &fdb_chain->chain, &index);
+	if (err)
+		return err;
+	if (index == MLX5_FS_DEFAULT_FLOW_TAG) {
+		/* we got the special default flow tag id, so we won't know
+		 * if we actually marked the packet with the restore rule
+		 * we create.
+		 *
+		 * This case isn't possible with MLX5_FS_DEFAULT_FLOW_TAG = 0.
+		 */
+		err = mapping_add(esw_chains_mapping(esw),
+				  &fdb_chain->chain, &index);
+		mapping_remove(esw_chains_mapping(esw),
+			       MLX5_FS_DEFAULT_FLOW_TAG);
+		if (err)
+			return err;
+	}
+
+	fdb_chain->id = index;
+
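+	/* build a single set action: write the mapped id to reg_c0 */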
+	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
+	MLX5_SET(set_action_in, modact, field,
+		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].mfield);
+	MLX5_SET(set_action_in, modact, offset,
+		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].moffset * 8);
+	MLX5_SET(set_action_in, modact, length,
+		 mlx5e_tc_attr_to_reg_mappings[CHAIN_TO_REG].mlen * 8);
+	MLX5_SET(set_action_in, modact, data, fdb_chain->id);
+	mod_hdr = mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_FDB,
+					   1, modact);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		goto err_mod_hdr;
+	}
+	fdb_chain->miss_modify_hdr = mod_hdr;
+
+	fdb_chain->restore_rule = esw_add_restore_rule(esw, fdb_chain->id);
+	if (IS_ERR(fdb_chain->restore_rule)) {
+		err = PTR_ERR(fdb_chain->restore_rule);
+		goto err_rule;
+	}
+
+	return 0;
+
+err_rule:
+	mlx5_modify_header_dealloc(esw->dev, fdb_chain->miss_modify_hdr);
+err_mod_hdr:
+	/* Datapath can't find this mapping, so we can safely remove it */
+	mapping_remove(esw_chains_mapping(esw), fdb_chain->id);
+	return err;
+}
+
 static struct fdb_chain *
 mlx5_esw_chains_create_fdb_chain(struct mlx5_eswitch *esw, u32 chain)
 {
@@ -269,6 +340,10 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	fdb_chain->chain = chain;
 	INIT_LIST_HEAD(&fdb_chain->prios_list);
 
+	err = create_fdb_chain_restore(fdb_chain);
+	if (err)
+		goto err_restore;
+
 	err = rhashtable_insert_fast(&esw_chains_ht(esw), &fdb_chain->node,
 				     chain_params);
 	if (err)
@@ -277,6 +352,12 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	return fdb_chain;
 
 err_insert:
+	if (fdb_chain->chain != mlx5_esw_chains_get_ft_chain(esw)) {
+		mlx5_del_flow_rules(fdb_chain->restore_rule);
+		mlx5_modify_header_dealloc(esw->dev,
+					   fdb_chain->miss_modify_hdr);
+	}
+err_restore:
 	kvfree(fdb_chain);
 	return ERR_PTR(err);
 }
@@ -288,6 +369,15 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 
 	rhashtable_remove_fast(&esw_chains_ht(esw), &fdb_chain->node,
 			       chain_params);
+
+	if (fdb_chain->chain != mlx5_esw_chains_get_ft_chain(esw)) {
+		mlx5_del_flow_rules(fdb_chain->restore_rule);
+		mlx5_modify_header_dealloc(esw->dev,
+					   fdb_chain->miss_modify_hdr);
+
+		mapping_remove(esw_chains_mapping(esw), fdb_chain->id);
+	}
+
 	kvfree(fdb_chain);
 }
 
@@ -310,10 +400,12 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 }
 
 static struct mlx5_flow_handle *
-mlx5_esw_chains_add_miss_rule(struct mlx5_flow_table *fdb,
+mlx5_esw_chains_add_miss_rule(struct fdb_chain *fdb_chain,
+			      struct mlx5_flow_table *fdb,
 			      struct mlx5_flow_table *next_fdb)
 {
 	static const struct mlx5_flow_spec spec = {};
+	struct mlx5_eswitch *esw = fdb_chain->esw;
 	struct mlx5_flow_destination dest = {};
 	struct mlx5_flow_act act = {};
 
@@ -322,6 +414,11 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	dest.type  = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE;
 	dest.ft = next_fdb;
 
+	if (fdb_chain->chain != mlx5_esw_chains_get_ft_chain(esw)) {
+		act.modify_hdr = fdb_chain->miss_modify_hdr;
+		act.action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+	}
+
 	return mlx5_add_flow_rules(fdb, &spec, &act, &dest, 1);
 }
 
@@ -345,7 +442,8 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	list_for_each_entry_continue_reverse(pos,
 					     &fdb_chain->prios_list,
 					     list) {
-		miss_rules[n] = mlx5_esw_chains_add_miss_rule(pos->fdb,
+		miss_rules[n] = mlx5_esw_chains_add_miss_rule(fdb_chain,
+							      pos->fdb,
 							      next_fdb);
 		if (IS_ERR(miss_rules[n])) {
 			err = PTR_ERR(miss_rules[n]);
@@ -459,7 +557,7 @@ static unsigned int mlx5_esw_chains_get_level_range(struct mlx5_eswitch *esw)
 	}
 
 	/* Add miss rule to next_fdb */
-	miss_rule = mlx5_esw_chains_add_miss_rule(fdb, next_fdb);
+	miss_rule = mlx5_esw_chains_add_miss_rule(fdb_chain, fdb, next_fdb);
 	if (IS_ERR(miss_rule)) {
 		err = PTR_ERR(miss_rule);
 		goto err_miss_rule;
@@ -624,6 +722,7 @@ struct mlx5_flow_table *
 	struct mlx5_esw_chains_priv *chains_priv;
 	struct mlx5_core_dev *dev = esw->dev;
 	u32 max_flow_counter, fdb_max;
+	struct mapping_ctx *mapping;
 	int err;
 
 	chains_priv = kzalloc(sizeof(*chains_priv), GFP_KERNEL);
@@ -660,10 +759,20 @@ struct mlx5_flow_table *
 	if (err)
 		goto init_prios_ht_err;
 
+	mapping = mapping_idr_create(sizeof(u32),
+				     esw_get_max_restore_tag(esw), true);
+	if (IS_ERR(mapping)) {
+		err = PTR_ERR(mapping);
+		goto mapping_err;
+	}
+	esw_chains_mapping(esw) = mapping;
+
 	mutex_init(&esw_chains_lock(esw));
 
 	return 0;
 
+mapping_err:
+	rhashtable_destroy(&esw_prios_ht(esw));
 init_prios_ht_err:
 	rhashtable_destroy(&esw_chains_ht(esw));
 init_chains_ht_err:
@@ -675,6 +784,7 @@ struct mlx5_flow_table *
 mlx5_esw_chains_cleanup(struct mlx5_eswitch *esw)
 {
 	mutex_destroy(&esw_chains_lock(esw));
+	mapping_idr_destroy(esw_chains_mapping(esw));
 	rhashtable_destroy(&esw_prios_ht(esw));
 	rhashtable_destroy(&esw_chains_ht(esw));
 
@@ -756,3 +866,17 @@ struct mlx5_flow_table *
 	mlx5_esw_chains_close(esw);
 	mlx5_esw_chains_cleanup(esw);
 }
+
+int mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag,
+				   u32 *chain)
+{
+	int err;
+
+	err = mapping_find(esw_chains_mapping(esw), tag, chain);
+	if (err) {
+		esw_warn(esw->dev, "Can't find chain for tag: %d\n", tag);
+		return -ENOENT;
+	}
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
index 2e13097..da45e49 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_chains.h
@@ -26,5 +26,7 @@ struct mlx5_flow_table *
 int mlx5_esw_chains_create(struct mlx5_eswitch *esw);
 void mlx5_esw_chains_destroy(struct mlx5_eswitch *esw);
 
-#endif /* __ML5_ESW_CHAINS_H__ */
+int
+mlx5_eswitch_get_chain_for_tag(struct mlx5_eswitch *esw, u32 tag, u32 *chain);
 
+#endif /* __ML5_ESW_CHAINS_H__ */
-- 
1.8.3.1



* [PATCH net-next-mlx5 06/13] net/mlx5e: Rx, Split rep rx mpwqe handler from nic
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (4 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 05/13] net/mlx5: E-Switch, Mark miss packets with new chain id mapping Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 07/13] net/mlx5: E-Switch, Restore chain id on miss Paul Blakey
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Copy the current rep mpwqe rx handler, which is also used by the nic
profile, into a rep-only version. In the next patch, we will add
rep-specific logic to the rep profile rx handler only.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c  | 54 ++++++++++++++++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 446eb4d..f33b865 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1823,7 +1823,7 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
 	.update_rx		= mlx5e_update_rep_rx,
 	.update_stats           = mlx5e_rep_update_hw_counters,
 	.rx_handlers.handle_rx_cqe       = mlx5e_handle_rx_cqe_rep,
-	.rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq,
+	.rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq_rep,
 	.max_tc			= 1,
 	.rq_groups		= MLX5E_NUM_RQ_GROUPS(REGULAR),
 };
@@ -1841,7 +1841,7 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
 	.update_stats           = mlx5e_uplink_rep_update_hw_counters,
 	.update_carrier	        = mlx5e_update_carrier,
 	.rx_handlers.handle_rx_cqe       = mlx5e_handle_rx_cqe_rep,
-	.rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq,
+	.rx_handlers.handle_rx_cqe_mpwqe = mlx5e_handle_rx_cqe_mpwrq_rep,
 	.max_tc			= MLX5E_MAX_NUM_TC,
 	.rq_groups		= MLX5E_NUM_RQ_GROUPS(REGULAR),
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index 31f83c8..5e29141 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -190,6 +190,8 @@ struct mlx5e_rep_sq {
 void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv);
 
 void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
+void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq,
+				   struct mlx5_cqe64 *cqe);
 
 int mlx5e_rep_encap_entry_attach(struct mlx5e_priv *priv,
 				 struct mlx5e_encap_entry *e);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 9e99601..ad84a55 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1230,6 +1230,60 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 wq_cyc_pop:
 	mlx5_wq_cyc_pop(wq);
 }
+
+void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq,
+				   struct mlx5_cqe64 *cqe)
+{
+	u16 cstrides       = mpwrq_get_cqe_consumed_strides(cqe);
+	u16 wqe_id         = be16_to_cpu(cqe->wqe_id);
+	struct mlx5e_mpw_info *wi = &rq->mpwqe.info[wqe_id];
+	u16 stride_ix      = mpwrq_get_cqe_stride_index(cqe);
+	u32 wqe_offset     = stride_ix << rq->mpwqe.log_stride_sz;
+	u32 head_offset    = wqe_offset & (PAGE_SIZE - 1);
+	u32 page_idx       = wqe_offset >> PAGE_SHIFT;
+	struct mlx5e_rx_wqe_ll *wqe;
+	struct mlx5_wq_ll *wq;
+	struct sk_buff *skb;
+	u16 cqe_bcnt;
+
+	wi->consumed_strides += cstrides;
+
+	if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
+		trigger_report(rq, cqe);
+		rq->stats->wqe_err++;
+		goto mpwrq_cqe_out;
+	}
+
+	if (unlikely(mpwrq_is_filler_cqe(cqe))) {
+		struct mlx5e_rq_stats *stats = rq->stats;
+
+		stats->mpwqe_filler_cqes++;
+		stats->mpwqe_filler_strides += cstrides;
+		goto mpwrq_cqe_out;
+	}
+
+	cqe_bcnt = mpwrq_get_cqe_byte_cnt(cqe);
+
+	skb = INDIRECT_CALL_2(rq->mpwqe.skb_from_cqe_mpwrq,
+			      mlx5e_skb_from_cqe_mpwrq_linear,
+			      mlx5e_skb_from_cqe_mpwrq_nonlinear,
+			      rq, wi, cqe_bcnt, head_offset, page_idx);
+	if (!skb)
+		goto mpwrq_cqe_out;
+
+	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+
+	napi_gro_receive(rq->cq.napi, skb);
+
+mpwrq_cqe_out:
+	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
+		return;
+
+	wq  = &rq->mpwqe.wq;
+	wqe = mlx5_wq_ll_get_wqe(wq, wqe_id);
+	mlx5e_free_rx_mpwqe(rq, wi, true);
+	mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
+}
 #endif
 
 struct sk_buff *
-- 
1.8.3.1



* [PATCH net-next-mlx5 07/13] net/mlx5: E-Switch, Restore chain id on miss
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (5 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 06/13] net/mlx5e: Rx, Split rep rx mpwqe handler from nic Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 08/13] net/mlx5e: Allow re-allocating mod header actions Paul Blakey
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Chain ids are mapped to the lower 16 bits of reg_c0, and after
loopback are copied to the CQE's flow_tag via the eswitch restore
rules.

To let tc continue in the correct chain, we find the corresponding
chain id in the eswitch chain id <-> reg C mapping, and set the SKB's
tc extension chain to it.

That tells tc to continue processing from the restored chain.
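
Condensed, the restore path now looks like this (illustrative trace,
using the names this series adds):

	hw:  miss in chain N -> miss rule sets reg_c0 = tag(N)
	     loopback        -> restore rule copies reg_c0 to the CQE flow_tag
	sw:  rep rx handler  -> mlx5e_tc_rep_update_skb():
		reg_c0 = be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK;
		mlx5_eswitch_get_chain_for_tag(esw, reg_c0, &chain);
		skb_ext_add(skb, TC_SKB_EXT)->chain = chain;
	tc:  resumes classification from 'chain' instead of chain 0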

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |  6 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 43 +++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h |  2 ++
 3 files changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ad84a55..4402a53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1223,6 +1223,9 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	if (rep->vlan && skb_vlan_tag_present(skb))
 		skb_vlan_pop(skb);
 
+	if (!mlx5e_tc_rep_update_skb(cqe, skb))
+		goto free_wqe;
+
 	napi_gro_receive(rq->cq.napi, skb);
 
 free_wqe:
@@ -1273,6 +1276,9 @@ void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq,
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 
+	if (!mlx5e_tc_rep_update_skb(cqe, skb))
+		goto mpwrq_cqe_out;
+
 	napi_gro_receive(rq->cq.napi, skb);
 
 mpwrq_cqe_out:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 427432f..f8a3b9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4290,3 +4290,46 @@ void mlx5e_tc_reoffload_flows_work(struct work_struct *work)
 	}
 	mutex_unlock(&rpriv->unready_flows_lock);
 }
+
+bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe,
+			     struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	struct tc_skb_ext *tc_skb_ext;
+	struct mlx5_eswitch *esw;
+	struct mlx5e_priv *priv;
+	u32 chain = 0, reg_c0;
+	int err;
+
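+	/* the default flow tag means no restore rule marked this packet */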
+	reg_c0 = (be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK);
+	if (reg_c0 == MLX5_FS_DEFAULT_FLOW_TAG)
+		reg_c0 = 0;
+
+	if (!reg_c0)
+		return true;
+
+	priv = netdev_priv(skb->dev);
+	esw = priv->mdev->priv.eswitch;
+
+	err = mlx5_eswitch_get_chain_for_tag(esw, reg_c0, &chain);
+	if (err) {
+		netdev_dbg(priv->netdev,
+			   "Couldn't find chain for chain tag: %d, err: %d\n",
+			   reg_c0, err);
+		return false;
+	}
+
+	if (!chain)
+		return true;
+
+	tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
+	if (!tc_skb_ext) {
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	tc_skb_ext->chain = chain;
+#endif /* CONFIG_NET_TC_SKB_EXT */
+
+	return true;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index e2dbbae..9d5fcf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -106,6 +106,8 @@ struct mlx5e_tc_attr_to_reg_mapping {
 bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 				    struct net_device *out_dev);
 
+bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
+
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
-- 
1.8.3.1



* [PATCH net-next-mlx5 08/13] net/mlx5e: Allow re-allocating mod header actions
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (6 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 07/13] net/mlx5: E-Switch, Restore chain id on miss Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 09/13] net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module Paul Blakey
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Currently, the size of the mod header actions array is deduced from
the number of parsed TC header rewrite actions. However, mod header
actions are also used for setting HW register values. Support dynamic
reallocation of the mod header actions array as a pre-step for adding
HW register mod actions.
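
With this change, the caller-side contract is roughly the following
(illustrative sketch, using the names added below):

	struct mlx5e_tc_mod_hdr_acts *mod_acts = &parse_attr->mod_hdr_acts;
	int act_sz = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
	char *action;
	int err;

	/* make sure there is room for one more action; grows on demand */
	err = alloc_mod_hdr_actions(priv->mdev, namespace, mod_acts);
	if (err)
		return err;

	/* write the new action at the tail, then commit it */
	action = (char *)mod_acts->actions + mod_acts->num_actions * act_sz;
	MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET);
	mod_acts->num_actions++;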

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 120 +++++++++++++-----------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h |  11 +++
 2 files changed, 76 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index f8a3b9c..bc2d71a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -144,9 +144,7 @@ struct mlx5e_tc_flow_parse_attr {
 	const struct ip_tunnel_info *tun_info[MLX5_MAX_FLOW_FWD_VPORTS];
 	struct net_device *filter_dev;
 	struct mlx5_flow_spec spec;
-	int num_mod_hdr_actions;
-	int max_mod_hdr_actions;
-	void *mod_hdr_actions;
+	struct mlx5e_tc_mod_hdr_acts mod_hdr_acts;
 	int mirred_ifindex[MLX5_MAX_FLOW_FWD_VPORTS];
 };
 
@@ -369,10 +367,10 @@ static int mlx5e_attach_mod_hdr(struct mlx5e_priv *priv,
 	struct mod_hdr_key key;
 	u32 hash_key;
 
-	num_actions  = parse_attr->num_mod_hdr_actions;
+	num_actions  = parse_attr->mod_hdr_acts.num_actions;
 	actions_size = MLX5_MH_ACT_SZ * num_actions;
 
-	key.actions = parse_attr->mod_hdr_actions;
+	key.actions = parse_attr->mod_hdr_acts.actions;
 	key.num_actions = num_actions;
 
 	hash_key = hash_mod_hdr_info(&key);
@@ -962,7 +960,7 @@ static void mlx5e_hairpin_flow_del(struct mlx5e_priv *priv,
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
 		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
 		flow_act.modify_hdr = attr->modify_hdr;
-		kfree(parse_attr->mod_hdr_actions);
+		dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 		if (err)
 			return err;
 	}
@@ -1228,7 +1226,7 @@ static void remove_unready_flow(struct mlx5e_tc_flow *flow)
 
 	if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) {
 		err = mlx5e_attach_mod_hdr(priv, flow, parse_attr);
-		kfree(parse_attr->mod_hdr_actions);
+		dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 		if (err)
 			return err;
 	}
@@ -2352,25 +2350,26 @@ static bool cmp_val_mask(void *valp, void *maskp, void *matchvalp,
 	OFFLOAD(UDP_DPORT, 16, U16_MAX, udp.dest,   0, udp_dport),
 };
 
-/* On input attr->max_mod_hdr_actions tells how many HW actions can be parsed at
- * max from the SW pedit action. On success, attr->num_mod_hdr_actions
- * says how many HW actions were actually parsed.
- */
-static int offload_pedit_fields(struct pedit_headers_action *hdrs,
+static int offload_pedit_fields(struct mlx5e_priv *priv,
+				int namespace,
+				struct pedit_headers_action *hdrs,
 				struct mlx5e_tc_flow_parse_attr *parse_attr,
 				u32 *action_flags,
 				struct netlink_ext_ack *extack)
 {
 	struct pedit_headers *set_masks, *add_masks, *set_vals, *add_vals;
-	int i, action_size, nactions, max_actions, first, last, next_z;
+	int i, action_size, first, last, next_z;
 	void *headers_c, *headers_v, *action, *vals_p;
 	u32 *s_masks_p, *a_masks_p, s_mask, a_mask;
+	struct mlx5e_tc_mod_hdr_acts *mod_acts;
 	struct mlx5_fields *f;
 	unsigned long mask;
 	__be32 mask_be32;
 	__be16 mask_be16;
+	int err;
 	u8 cmd;
 
+	mod_acts = &parse_attr->mod_hdr_acts;
 	headers_c = get_match_headers_criteria(*action_flags, &parse_attr->spec);
 	headers_v = get_match_headers_value(*action_flags, &parse_attr->spec);
 
@@ -2380,11 +2379,6 @@ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
 	add_vals = &hdrs[1].vals;
 
 	action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
-	action = parse_attr->mod_hdr_actions +
-		 parse_attr->num_mod_hdr_actions * action_size;
-
-	max_actions = parse_attr->max_mod_hdr_actions;
-	nactions = parse_attr->num_mod_hdr_actions;
 
 	for (i = 0; i < ARRAY_SIZE(fields); i++) {
 		bool skip;
@@ -2410,13 +2404,6 @@ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
 			return -EOPNOTSUPP;
 		}
 
-		if (nactions == max_actions) {
-			NL_SET_ERR_MSG_MOD(extack,
-					   "too many pedit actions, can't offload");
-			printk(KERN_WARNING "mlx5: parsed %d pedit actions, can't do more\n", nactions);
-			return -EOPNOTSUPP;
-		}
-
 		skip = false;
 		if (s_mask) {
 			void *match_mask = headers_c + f->match_offset;
@@ -2463,6 +2450,18 @@ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
 			return -EOPNOTSUPP;
 		}
 
+		err = alloc_mod_hdr_actions(priv->mdev, namespace, mod_acts);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "too many pedit actions, can't offload");
+			mlx5_core_warn(priv->mdev,
+				       "mlx5: parsed %d pedit actions, can't do more\n",
+				       mod_acts->num_actions);
+			return err;
+		}
+
+		action = mod_acts->actions +
+			 (mod_acts->num_actions * action_size);
 		MLX5_SET(set_action_in, action, action_type, cmd);
 		MLX5_SET(set_action_in, action, field, f->field);
 
@@ -2485,11 +2484,9 @@ static int offload_pedit_fields(struct pedit_headers_action *hdrs,
 		else if (f->field_bsize == 8)
 			MLX5_SET(set_action_in, action, data, *(u8 *)vals_p >> first);
 
-		action += action_size;
-		nactions++;
+		++mod_acts->num_actions;
 	}
 
-	parse_attr->num_mod_hdr_actions = nactions;
 	return 0;
 }
 
@@ -2502,29 +2499,48 @@ static int mlx5e_flow_namespace_max_modify_action(struct mlx5_core_dev *mdev,
 		return MLX5_CAP_FLOWTABLE_NIC_RX(mdev, max_modify_header_actions);
 }
 
-static int alloc_mod_hdr_actions(struct mlx5e_priv *priv,
-				 struct pedit_headers_action *hdrs,
-				 int namespace,
-				 struct mlx5e_tc_flow_parse_attr *parse_attr)
+int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev,
+			  int namespace,
+			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
 {
-	int nkeys, action_size, max_actions;
+	int action_size, new_num_actions, max_hw_actions;
+	size_t new_sz, old_sz;
+	void *ret;
 
-	nkeys = hdrs[TCA_PEDIT_KEY_EX_CMD_SET].pedits +
-		hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits;
-	action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
+	if (mod_hdr_acts->num_actions < mod_hdr_acts->max_actions)
+		return 0;
 
-	max_actions = mlx5e_flow_namespace_max_modify_action(priv->mdev, namespace);
-	/* can get up to crazingly 16 HW actions in 32 bits pedit SW key */
-	max_actions = min(max_actions, nkeys * 16);
+	action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
 
-	parse_attr->mod_hdr_actions = kcalloc(max_actions, action_size, GFP_KERNEL);
-	if (!parse_attr->mod_hdr_actions)
+	max_hw_actions = mlx5e_flow_namespace_max_modify_action(mdev,
+								namespace);
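+	/* grow geometrically from 1, capped at the namespace HW limit */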
+	new_num_actions = min(max_hw_actions,
+			      mod_hdr_acts->actions ?
+			      mod_hdr_acts->max_actions * 2 : 1);
+	if (mod_hdr_acts->max_actions == new_num_actions)
+		return -ENOSPC;
+
+	new_sz = action_size * new_num_actions;
+	old_sz = mod_hdr_acts->max_actions * action_size;
+	ret = krealloc(mod_hdr_acts->actions, new_sz, GFP_KERNEL);
+	if (!ret)
 		return -ENOMEM;
 
-	parse_attr->max_mod_hdr_actions = max_actions;
+	memset(ret + old_sz, 0, new_sz - old_sz);
+	mod_hdr_acts->actions = ret;
+	mod_hdr_acts->max_actions = new_num_actions;
+
 	return 0;
 }
 
+void dealloc_mod_hdr_actions(struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts)
+{
+	kfree(mod_hdr_acts->actions);
+	mod_hdr_acts->actions = NULL;
+	mod_hdr_acts->num_actions = 0;
+	mod_hdr_acts->max_actions = 0;
+}
+
 static const struct pedit_headers zero_masks = {};
 
 static int parse_tc_pedit_action(struct mlx5e_priv *priv,
@@ -2577,13 +2593,8 @@ static int alloc_tc_pedit_action(struct mlx5e_priv *priv, int namespace,
 	int err;
 	u8 cmd;
 
-	if (!parse_attr->mod_hdr_actions) {
-		err = alloc_mod_hdr_actions(priv, hdrs, namespace, parse_attr);
-		if (err)
-			goto out_err;
-	}
-
-	err = offload_pedit_fields(hdrs, parse_attr, action_flags, extack);
+	err = offload_pedit_fields(priv, namespace, hdrs, parse_attr,
+				   action_flags, extack);
 	if (err < 0)
 		goto out_dealloc_parsed_actions;
 
@@ -2603,8 +2614,7 @@ static int alloc_tc_pedit_action(struct mlx5e_priv *priv, int namespace,
 	return 0;
 
 out_dealloc_parsed_actions:
-	kfree(parse_attr->mod_hdr_actions);
-out_err:
+	dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 	return err;
 }
 
@@ -2937,9 +2947,9 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv,
 		/* in case all pedit actions are skipped, remove the MOD_HDR
 		 * flag.
 		 */
-		if (parse_attr->num_mod_hdr_actions == 0) {
+		if (parse_attr->mod_hdr_acts.num_actions == 0) {
 			action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
-			kfree(parse_attr->mod_hdr_actions);
+			dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 		}
 	}
 
@@ -3525,9 +3535,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 		 * flag. we might have set split_count either by pedit or
 		 * pop/push. if there is no pop/push either, reset it too.
 		 */
-		if (parse_attr->num_mod_hdr_actions == 0) {
+		if (parse_attr->mod_hdr_acts.num_actions == 0) {
 			action &= ~MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
-			kfree(parse_attr->mod_hdr_actions);
+			dealloc_mod_hdr_actions(&parse_attr->mod_hdr_acts);
 			if (!((action & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
 			      (action & MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH)))
 				attr->split_count = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 9d5fcf6..3848ec7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -108,6 +108,17 @@ bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 
 bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
 
+struct mlx5e_tc_mod_hdr_acts {
+	int num_actions;
+	int max_actions;
+	void *actions;
+};
+
+int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev,
+			  int namespace,
+			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
+void dealloc_mod_hdr_actions(struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
+
 #else /* CONFIG_MLX5_ESWITCH */
 static inline int  mlx5e_tc_nic_init(struct mlx5e_priv *priv) { return 0; }
 static inline void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv) {}
-- 
1.8.3.1



* [PATCH net-next-mlx5 09/13] net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (7 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 08/13] net/mlx5e: Allow re-allocating mod header actions Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 10/13] net/mlx5e: Disallow inserting vxlan/vlan egress rules without decap/pop Paul Blakey
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Currently, tunnel parsing is split between en_tc and tc_tun. The next
patch will replace matching on the tunnel fields with a register match
and will not need this parsing.

Move the tunnel parsing logic to tc_tun as a pre-step for skipping
it in the next patch.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.c    | 112 ++++++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun.h    |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    | 109 +-------------------
 3 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
index af4ebd2..608d0e07c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c
@@ -469,10 +469,15 @@ int mlx5e_tc_tun_parse(struct net_device *filter_dev,
 		       struct mlx5e_priv *priv,
 		       struct mlx5_flow_spec *spec,
 		       struct flow_cls_offload *f,
-		       void *headers_c,
-		       void *headers_v, u8 *match_level)
+		       u8 *match_level)
 {
 	struct mlx5e_tc_tunnel *tunnel = mlx5e_get_tc_tun(filter_dev);
+	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+	void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+				       outer_headers);
+	void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
+				       outer_headers);
+	struct netlink_ext_ack *extack = f->common.extack;
 	int err = 0;
 
 	if (!tunnel) {
@@ -499,6 +504,109 @@ int mlx5e_tc_tun_parse(struct net_device *filter_dev,
 			goto out;
 	}
 
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
+		struct flow_match_control match;
+		u16 addr_type;
+
+		flow_rule_match_enc_control(rule, &match);
+		addr_type = match.key->addr_type;
+
+		/* For tunnel addr_type, same key ids are used as for non-tunnel */
+		if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
+			struct flow_match_ipv4_addrs match;
+
+			flow_rule_match_enc_ipv4_addrs(rule, &match);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 src_ipv4_src_ipv6.ipv4_layout.ipv4,
+				 ntohl(match.mask->src));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 src_ipv4_src_ipv6.ipv4_layout.ipv4,
+				 ntohl(match.key->src));
+
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
+				 dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
+				 ntohl(match.mask->dst));
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
+				 dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
+				 ntohl(match.key->dst));
+
+			MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
+					 ethertype);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+				 ETH_P_IP);
+		} else if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
+			struct flow_match_ipv6_addrs match;
+
+			flow_rule_match_enc_ipv6_addrs(rule, &match);
+			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+			       &match.mask->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
+								   ipv6));
+			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+			       &match.key->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
+								  ipv6));
+
+			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
+					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+			       &match.mask->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
+								   ipv6));
+			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+			       &match.key->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
+								  ipv6));
+
+			MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
+					 ethertype);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
+				 ETH_P_IPV6);
+		}
+	}
+
+	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IP)) {
+		struct flow_match_ip match;
+
+		flow_rule_match_enc_ip(rule, &match);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_ecn,
+			 match.mask->tos & 0x3);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_ecn,
+			 match.key->tos & 0x3);
+
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_dscp,
+			 match.mask->tos >> 2);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_dscp,
+			 match.key->tos  >> 2);
+
+		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ttl_hoplimit,
+			 match.mask->ttl);
+		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ttl_hoplimit,
+			 match.key->ttl);
+
+		if (match.mask->ttl &&
+		    !MLX5_CAP_ESW_FLOWTABLE_FDB
+			(priv->mdev,
+			 ft_field_support.outer_ipv4_ttl)) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Matching on TTL is not supported");
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+	}
+
+	/* Enforce DMAC when offloading incoming tunneled flows.
+	 * Flow counters require a match on the DMAC.
+	 */
+	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_47_16);
+	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_15_0);
+	ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
+				     dmac_47_16), priv->netdev->dev_addr);
+
+	/* let software handle IP fragments */
+	MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
+	MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag, 0);
+
+	return 0;
+
 out:
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h
index 6f9a78c..1630f0e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.h
@@ -76,8 +76,7 @@ int mlx5e_tc_tun_parse(struct net_device *filter_dev,
 		       struct mlx5e_priv *priv,
 		       struct mlx5_flow_spec *spec,
 		       struct flow_cls_offload *f,
-		       void *headers_c,
-		       void *headers_v, u8 *match_level);
+		       u8 *match_level);
 
 int mlx5e_tc_tun_parse_udp_ports(struct mlx5e_priv *priv,
 				 struct mlx5_flow_spec *spec,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index bc2d71a..71c4e78 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1677,122 +1677,15 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
 			     struct net_device *filter_dev, u8 *match_level)
 {
 	struct netlink_ext_ack *extack = f->common.extack;
-	void *headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
-				       outer_headers);
-	void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value,
-				       outer_headers);
-	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
 	int err;
 
-	err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f,
-				 headers_c, headers_v, match_level);
+	err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f, match_level);
 	if (err) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "failed to parse tunnel attributes");
 		return err;
 	}
 
-	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL)) {
-		struct flow_match_control match;
-		u16 addr_type;
-
-		flow_rule_match_enc_control(rule, &match);
-		addr_type = match.key->addr_type;
-
-		/* For tunnel addr_type used same key id`s as for non-tunnel */
-		if (addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
-			struct flow_match_ipv4_addrs match;
-
-			flow_rule_match_enc_ipv4_addrs(rule, &match);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
-				 src_ipv4_src_ipv6.ipv4_layout.ipv4,
-				 ntohl(match.mask->src));
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 src_ipv4_src_ipv6.ipv4_layout.ipv4,
-				 ntohl(match.key->src));
-
-			MLX5_SET(fte_match_set_lyr_2_4, headers_c,
-				 dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
-				 ntohl(match.mask->dst));
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v,
-				 dst_ipv4_dst_ipv6.ipv4_layout.ipv4,
-				 ntohl(match.key->dst));
-
-			MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
-					 ethertype);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
-				 ETH_P_IP);
-		} else if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
-			struct flow_match_ipv6_addrs match;
-
-			flow_rule_match_enc_ipv6_addrs(rule, &match);
-			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
-					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
-			       &match.mask->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
-								   ipv6));
-			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
-					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
-			       &match.key->src, MLX5_FLD_SZ_BYTES(ipv6_layout,
-								  ipv6));
-
-			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_c,
-					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
-			       &match.mask->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
-								   ipv6));
-			memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
-					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
-			       &match.key->dst, MLX5_FLD_SZ_BYTES(ipv6_layout,
-								  ipv6));
-
-			MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c,
-					 ethertype);
-			MLX5_SET(fte_match_set_lyr_2_4, headers_v, ethertype,
-				 ETH_P_IPV6);
-		}
-	}
-
-	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ENC_IP)) {
-		struct flow_match_ip match;
-
-		flow_rule_match_enc_ip(rule, &match);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_ecn,
-			 match.mask->tos & 0x3);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_ecn,
-			 match.key->tos & 0x3);
-
-		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ip_dscp,
-			 match.mask->tos >> 2);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ip_dscp,
-			 match.key->tos  >> 2);
-
-		MLX5_SET(fte_match_set_lyr_2_4, headers_c, ttl_hoplimit,
-			 match.mask->ttl);
-		MLX5_SET(fte_match_set_lyr_2_4, headers_v, ttl_hoplimit,
-			 match.key->ttl);
-
-		if (match.mask->ttl &&
-		    !MLX5_CAP_ESW_FLOWTABLE_FDB
-			(priv->mdev,
-			 ft_field_support.outer_ipv4_ttl)) {
-			NL_SET_ERR_MSG_MOD(extack,
-					   "Matching on TTL is not supported");
-			return -EOPNOTSUPP;
-		}
-
-	}
-
-	/* Enforce DMAC when offloading incoming tunneled flows.
-	 * Flow counters require a match on the DMAC.
-	 */
-	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_47_16);
-	MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, dmac_15_0);
-	ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, headers_v,
-				     dmac_47_16), priv->netdev->dev_addr);
-
-	/* let software handle IP fragments */
-	MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
-	MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag, 0);
-
 	return 0;
 }
 
-- 
1.8.3.1



* [PATCH net-next-mlx5 10/13] net/mlx5e: Disallow inserting vxlan/vlan egress rules without decap/pop
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (8 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 09/13] net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 11/13] net/mlx5e: Support inner header rewrite with goto action Paul Blakey
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

Currently, rules on tunnel devices can be offloaded without decap action
when a vlan pop action exists. Similarly, the driver will offload rules
on vlan interfaces with no pop action when a decap action exists.

Disallow the faulty behavior by checking that vlan egress rules pop
or drop, and that vxlan egress rules decap, as intended.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 71c4e78..af7c917 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2636,6 +2636,8 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 				    struct mlx5e_tc_flow *flow,
 				    struct netlink_ext_ack *extack)
 {
+	struct net_device *filter_dev = parse_attr->filter_dev;
+	bool drop_action, decap_action, pop_action;
 	u32 actions;
 
 	if (mlx5e_is_eswitch_flow(flow))
@@ -2643,11 +2645,19 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 	else
 		actions = flow->nic_attr->action;
 
-	if (flow_flag_test(flow, EGRESS) &&
-	    !((actions & MLX5_FLOW_CONTEXT_ACTION_DECAP) ||
-	      (actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP) ||
-	      (actions & MLX5_FLOW_CONTEXT_ACTION_DROP)))
-		return false;
+	drop_action = actions & MLX5_FLOW_CONTEXT_ACTION_DROP;
+	decap_action = actions & MLX5_FLOW_CONTEXT_ACTION_DECAP;
+	pop_action = actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
+
+	if (flow_flag_test(flow, EGRESS) && !drop_action) {
+		/* If no drop, we must decap (vxlan) or pop (vlan) */
+		if (mlx5e_get_tc_tun(filter_dev)) {
+			if (!decap_action)
+				return false;
+		} else if (is_vlan_dev(filter_dev)) {
+			if (!pop_action)
+				return false;
+		} else {
+			return false; /* Sanity */
+		}
+	}
 
 	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
 		return modify_header_match_supported(&parse_attr->spec,
-- 
1.8.3.1



* [PATCH net-next-mlx5 11/13] net/mlx5e: Support inner header rewrite with goto action
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (9 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 10/13] net/mlx5e: Disallow inserting vxlan/vlan egress rules without decap/pop Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 12/13] net/mlx5: E-Switch, Get reg_c1 value on miss Paul Blakey
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

The hardware supports header rewrite of outer headers only.
To perform header rewrite on inner headers, we must first
decapsulate the packet.

Currently, the hardware decap action is explicitly set by the tc
tunnel_key unset action. However, with the goto action the user won't
use the tunnel_key unset action. In addition, header rewrite actions
will not apply to the inner header, as they do in the software model.

To support this, we will map each tunnel match seen on a tc rule to
a unique tunnel id, implicitly add a decap action on tc chain 0 flows,
and mark the packets with this unique tunnel id. Tunnel matches on
the decapsulated tunnel on later chains will match on this unique id
instead of on the actual packet headers.

We will also use this mapping to restore the tunnel info metadata
on miss.
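
The resulting id encoding is (constants as defined in this patch):

	/* tunnel_id: | 6 bits tun_info id | 2 bits enc_opts id | */
	value = tun_id << ENC_OPTS_BITS | enc_opts_id;

	/* rules that don't care about tunnel options get enc_opts_id 0
	 * and mask the enc_opts bits out of the match:
	 */
	mask = enc_opts_id ? TUNNEL_ID_MASK :
			     (TUNNEL_ID_MASK & ~ENC_OPTS_BITS_MASK);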

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c  | 473 ++++++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h  |  13 +
 3 files changed, 446 insertions(+), 45 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index 5e29141..3849f06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -80,6 +80,11 @@ struct mlx5_rep_uplink_priv {
 	struct mutex                unready_flows_lock;
 	struct list_head            unready_flows;
 	struct work_struct          reoffload_flows_work;
+
+	/* maps tun_info to a unique id */
+	struct mapping_ctx *tunnel_mapping;
+	/* maps tun_enc_opts to a unique id */
+	struct mapping_ctx *tunnel_enc_opts_mapping;
 };
 
 struct mlx5e_rep_priv {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index af7c917..9f8ff40 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -57,8 +57,11 @@
 #include "en/tc_tun.h"
 #include "lib/devcom.h"
 #include "lib/geneve.h"
+#include "lib/mapping.h"
 #include "diag/en_tc_tracepoint.h"
 
+#define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)
+
 struct mlx5_nic_flow_attr {
 	u32 action;
 	u32 flow_tag;
@@ -134,6 +137,8 @@ struct mlx5e_tc_flow {
 	refcount_t		refcnt;
 	struct rcu_head		rcu_head;
 	struct completion	init_done;
+	int tunnel_id; /* the mapped tunnel id of this flow */
+
 	union {
 		struct mlx5_esw_flow_attr esw_attr[0];
 		struct mlx5_nic_flow_attr nic_attr[0];
@@ -151,14 +156,105 @@ struct mlx5e_tc_flow_parse_attr {
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
 #define MLX5E_TC_TABLE_MAX_GROUP_SIZE BIT(16)
 
+struct tunnel_match_key {
+	struct flow_dissector_key_control enc_control;
+	struct flow_dissector_key_keyid enc_key_id;
+	struct flow_dissector_key_ports enc_tp;
+	struct flow_dissector_key_ip enc_ip;
+	union {
+		struct flow_dissector_key_ipv4_addrs enc_ipv4;
+		struct flow_dissector_key_ipv6_addrs enc_ipv6;
+	};
+
+	int filter_ifindex;
+};
+
+/* Tunnel_id mapping is TUNNEL_INFO_BITS + ENC_OPTS_BITS.
+ * Upper TUNNEL_INFO_BITS for general tunnel info.
+ * Lower ENC_OPTS_BITS bits for enc_opts.
+ */
+#define TUNNEL_INFO_BITS 6
+#define TUNNEL_INFO_BITS_MASK GENMASK(TUNNEL_INFO_BITS - 1, 0)
+#define ENC_OPTS_BITS 2
+#define ENC_OPTS_BITS_MASK GENMASK(ENC_OPTS_BITS - 1, 0)
+#define TUNNEL_ID_BITS (TUNNEL_INFO_BITS + ENC_OPTS_BITS)
+#define TUNNEL_ID_MASK GENMASK(TUNNEL_ID_BITS - 1, 0)
+
 struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[] = {
 	[CHAIN_TO_REG] = {
 		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_0,
 		.moffset = 0,
 		.mlen = 2,
 	},
+	[TUNNEL_TO_REG] = {
+		.mfield = MLX5_ACTION_IN_FIELD_METADATA_REG_C_1,
+		.moffset = 3,
+		.mlen = 1,
+		.soffset = MLX5_BYTE_OFF(fte_match_param,
+					 misc_parameters_2.metadata_reg_c_1),
+	},
 };
 
+static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow);
+
+void
+mlx5e_tc_match_to_reg_match(struct mlx5_flow_spec *spec,
+			    enum mlx5e_tc_attr_to_reg type,
+			    u32 data,
+			    u32 mask)
+{
+	int soffset = mlx5e_tc_attr_to_reg_mappings[type].soffset;
+	int match_len = mlx5e_tc_attr_to_reg_mappings[type].mlen;
+	void *headers_c = spec->match_criteria;
+	void *headers_v = spec->match_value;
+	void *fmask, *fval;
+
+	fmask = headers_c + soffset;
+	fval = headers_v + soffset;
+
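+	/* the register value is matched in network byte order */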
+	mask = cpu_to_be32(mask) >> (32 - (match_len * 8));
+	data = cpu_to_be32(data) >> (32 - (match_len * 8));
+
+	memcpy(fmask, &mask, match_len);
+	memcpy(fval, &data, match_len);
+
+	spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
+}
+
+int
+mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
+			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
+			  enum mlx5e_tc_attr_to_reg type,
+			  u32 data)
+{
+	int moffset = mlx5e_tc_attr_to_reg_mappings[type].moffset;
+	int mfield = mlx5e_tc_attr_to_reg_mappings[type].mfield;
+	int mlen = mlx5e_tc_attr_to_reg_mappings[type].mlen;
+	char *modact;
+	int err;
+
+	err = alloc_mod_hdr_actions(mdev, MLX5_FLOW_NAMESPACE_FDB,
+				    mod_hdr_acts);
+	if (err)
+		return err;
+
+	modact = mod_hdr_acts->actions +
+		 (mod_hdr_acts->num_actions * MLX5_MH_ACT_SZ);
+
+	/* Firmware has 5bit length field and 0 means 32bits */
+	if (mlen == 4)
+		mlen = 0;
+
+	MLX5_SET(set_action_in, modact, action_type, MLX5_ACTION_TYPE_SET);
+	MLX5_SET(set_action_in, modact, field, mfield);
+	MLX5_SET(set_action_in, modact, offset, moffset * 8);
+	MLX5_SET(set_action_in, modact, length, mlen * 8);
+	MLX5_SET(set_action_in, modact, data, data);
+	mod_hdr_acts->num_actions++;
+
+	return 0;
+}
+
 struct mlx5e_hairpin {
 	struct mlx5_hairpin *pair;
 
@@ -216,8 +312,6 @@ struct mlx5e_mod_hdr_entry {
 	int compl_result;
 };
 
-#define MLX5_MH_ACT_SZ MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)
-
 static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 			      struct mlx5e_tc_flow *flow);
 
@@ -1281,6 +1375,8 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
 	struct mlx5_esw_flow_attr slow_attr;
 	int out_index;
 
+	mlx5e_put_flow_tunnel_id(flow);
+
 	if (flow_flag_test(flow, NOT_READY)) {
 		remove_unready_flow(flow);
 		kvfree(attr->parse_attr);
@@ -1670,46 +1766,271 @@ static void mlx5e_tc_del_flow(struct mlx5e_priv *priv,
 	}
 }
 
+static int flow_has_tc_fwd_action(struct flow_cls_offload *f)
+{
+	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+	struct flow_action *flow_action = &rule->action;
+	const struct flow_action_entry *act;
+	int i;
+
+	flow_action_for_each(i, act, flow_action) {
+		switch (act->id) {
+		case FLOW_ACTION_GOTO:
+			return true;
+		default:
+			continue;
+		}
+	}
+
+	return false;
+}
+
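+/* A chain > 0 tunnel match is rewritten to a match on the mapped
+ * tunnel id, so enc_opts must be either all-don't-care or an exact
+ * full match to be representable by a single id.
+ */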
+static int
+enc_opts_is_dont_care_or_full_match(struct mlx5e_priv *priv,
+				    struct flow_dissector_key_enc_opts *opts,
+				    struct netlink_ext_ack *extack,
+				    bool *dont_care)
+{
+	struct geneve_opt *opt;
+	int off = 0;
+
+	*dont_care = true;
+
+	while (opts->len > off) {
+		opt = (struct geneve_opt *)&opts->data[off];
+
+		if (!(*dont_care) || opt->opt_class || opt->type ||
+		    memchr_inv(opt->opt_data, 0, opt->length * 4)) {
+			*dont_care = false;
+
+			if (opt->opt_class != U16_MAX ||
+			    opt->type != U8_MAX ||
+			    memchr_inv(opt->opt_data, 0xFF,
+				       opt->length * 4)) {
+				NL_SET_ERR_MSG(extack,
+					       "Partial match of tunnel options in chain > 0 isn't supported");
+				netdev_warn(priv->netdev,
+					    "Partial match of tunnel options in chain > 0 isn't supported");
+				return -EOPNOTSUPP;
+			}
+		}
+
+		off += sizeof(struct geneve_opt) + opt->length * 4;
+	}
+
+	return 0;
+}
+
+#define COPY_DISSECTOR(rule, diss_key, dst)\
+({ \
+	struct flow_rule *__rule = (rule);\
+	typeof(dst) __dst = dst;\
+\
+	memcpy(__dst,\
+	       skb_flow_dissector_target(__rule->match.dissector,\
+					 diss_key,\
+					 __rule->match.key),\
+	       sizeof(*__dst));\
+})
+
+static int mlx5e_get_flow_tunnel_id(struct mlx5e_priv *priv,
+				    struct mlx5e_tc_flow *flow,
+				    struct flow_cls_offload *f,
+				    struct net_device *filter_dev)
+{
+	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+	struct netlink_ext_ack *extack = f->common.extack;
+	struct mlx5_esw_flow_attr *attr = flow->esw_attr;
+	struct flow_match_enc_opts enc_opts_match;
+	struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	bool enc_opts_is_dont_care = true;
+	struct tunnel_match_key tunnel_key;
+	u32 tun_id, enc_opts_id = 0;
+	struct mlx5_eswitch *esw;
+	u32 value, mask;
+	int err;
+
+	esw = priv->mdev->priv.eswitch;
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+
+	memset(&tunnel_key, 0, sizeof(tunnel_key));
+	COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_CONTROL,
+		       &tunnel_key.enc_control);
+	if (tunnel_key.enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS)
+		COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS,
+			       &tunnel_key.enc_ipv4);
+	else
+		COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS,
+			       &tunnel_key.enc_ipv6);
+	COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_IP, &tunnel_key.enc_ip);
+	COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_PORTS,
+		       &tunnel_key.enc_tp);
+	COPY_DISSECTOR(rule, FLOW_DISSECTOR_KEY_ENC_KEYID,
+		       &tunnel_key.enc_key_id);
+	tunnel_key.filter_ifindex = filter_dev->ifindex;
+
+	err = mapping_add(uplink_priv->tunnel_mapping, &tunnel_key, &tun_id);
+	if (err)
+		return err;
+
+	flow_rule_match_enc_opts(rule, &enc_opts_match);
+	err = enc_opts_is_dont_care_or_full_match(priv,
+						  enc_opts_match.mask,
+						  extack,
+						  &enc_opts_is_dont_care);
+	if (err)
+		goto err_enc_opts;
+
+	if (!enc_opts_is_dont_care) {
+		err = mapping_add(uplink_priv->tunnel_enc_opts_mapping,
+				  enc_opts_match.key, &enc_opts_id);
+		if (err)
+			goto err_enc_opts;
+	}
+
+	value = tun_id << ENC_OPTS_BITS | enc_opts_id;
+	mask = enc_opts_id ? TUNNEL_ID_MASK :
+			     (TUNNEL_ID_MASK & ~ENC_OPTS_BITS_MASK);
+
+	if (attr->chain) {
+		mlx5e_tc_match_to_reg_match(&attr->parse_attr->spec,
+					    TUNNEL_TO_REG, value, mask);
+	} else {
+		mod_hdr_acts = &attr->parse_attr->mod_hdr_acts;
+		err = mlx5e_tc_match_to_reg_set(priv->mdev,
+						mod_hdr_acts,
+						TUNNEL_TO_REG, value);
+		if (err)
+			goto err_set;
+
+		attr->action |= MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+	}
+
+	flow->tunnel_id = value;
+	return 0;
+
+err_set:
+	if (enc_opts_id)
+		mapping_remove(uplink_priv->tunnel_enc_opts_mapping,
+			       enc_opts_id);
+err_enc_opts:
+	mapping_remove(uplink_priv->tunnel_mapping, tun_id);
+	return err;
+}
+
+static void mlx5e_put_flow_tunnel_id(struct mlx5e_tc_flow *flow)
+{
+	u32 enc_opts_id = flow->tunnel_id & ENC_OPTS_BITS_MASK;
+	u32 tun_id = flow->tunnel_id >> ENC_OPTS_BITS;
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	struct mlx5_eswitch *esw;
+
+	esw = flow->priv->mdev->priv.eswitch;
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+
+	if (tun_id)
+		mapping_remove(uplink_priv->tunnel_mapping, tun_id);
+	if (enc_opts_id)
+		mapping_remove(uplink_priv->tunnel_enc_opts_mapping,
+			       enc_opts_id);
+}
 
 static int parse_tunnel_attr(struct mlx5e_priv *priv,
+			     struct mlx5e_tc_flow *flow,
 			     struct mlx5_flow_spec *spec,
 			     struct flow_cls_offload *f,
-			     struct net_device *filter_dev, u8 *match_level)
+			     struct net_device *filter_dev,
+			     u8 *match_level,
+			     bool *match_inner)
 {
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
 	struct netlink_ext_ack *extack = f->common.extack;
+	bool needs_mapping, sets_mapping;
 	int err;
 
-	err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f, match_level);
-	if (err) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "failed to parse tunnel attributes");
-		return err;
+	if (!mlx5e_is_eswitch_flow(flow))
+		return -EOPNOTSUPP;
+
+	needs_mapping = !!flow->esw_attr->chain;
+	sets_mapping = !flow->esw_attr->chain && flow_has_tc_fwd_action(f);
+	*match_inner = !needs_mapping;
+
+	if ((needs_mapping || sets_mapping) &&
+	    !mlx5_eswitch_vport_match_metadata_enabled(esw)) {
+		NL_SET_ERR_MSG(extack,
+			       "Chains on tunnel devices isn't supported without register metadata support");
+		netdev_warn(priv->netdev,
+			    "Chains on tunnel devices isn't supported without register metadata support");
+		return -EOPNOTSUPP;
 	}
 
-	return 0;
+	if (!flow->esw_attr->chain) {
+		err = mlx5e_tc_tun_parse(filter_dev, priv, spec, f,
+					 match_level);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Failed to parse tunnel attributes");
+			netdev_warn(priv->netdev,
+				    "Failed to parse tunnel attributes");
+			return err;
+		}
+
+		flow->esw_attr->action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
+	}
+
+	if (!needs_mapping && !sets_mapping)
+		return 0;
+
+	return mlx5e_get_flow_tunnel_id(priv, flow, f, filter_dev);
 }
 
-static void *get_match_headers_criteria(u32 flags,
-					struct mlx5_flow_spec *spec)
+static void *get_match_inner_headers_criteria(struct mlx5_flow_spec *spec)
 {
-	return (flags & MLX5_FLOW_CONTEXT_ACTION_DECAP) ?
-		MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
-			     inner_headers) :
-		MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
-			     outer_headers);
+	return MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+			    inner_headers);
+}
+
+static void *get_match_inner_headers_value(struct mlx5_flow_spec *spec)
+{
+	return MLX5_ADDR_OF(fte_match_param, spec->match_value,
+			    inner_headers);
+}
+
+static void *get_match_outer_headers_criteria(struct mlx5_flow_spec *spec)
+{
+	return MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
+			    outer_headers);
+}
+
+static void *get_match_outer_headers_value(struct mlx5_flow_spec *spec)
+{
+	return MLX5_ADDR_OF(fte_match_param, spec->match_value,
+			    outer_headers);
 }
 
 static void *get_match_headers_value(u32 flags,
 				     struct mlx5_flow_spec *spec)
 {
 	return (flags & MLX5_FLOW_CONTEXT_ACTION_DECAP) ?
-		MLX5_ADDR_OF(fte_match_param, spec->match_value,
-			     inner_headers) :
-		MLX5_ADDR_OF(fte_match_param, spec->match_value,
-			     outer_headers);
+		get_match_inner_headers_value(spec) :
+		get_match_outer_headers_value(spec);
+}
+
+static void *get_match_headers_criteria(u32 flags,
+					struct mlx5_flow_spec *spec)
+{
+	return (flags & MLX5_FLOW_CONTEXT_ACTION_DECAP) ?
+		get_match_inner_headers_criteria(spec) :
+		get_match_outer_headers_criteria(spec);
 }
 
 static int __parse_cls_flower(struct mlx5e_priv *priv,
+			      struct mlx5e_tc_flow *flow,
 			      struct mlx5_flow_spec *spec,
 			      struct flow_cls_offload *f,
 			      struct net_device *filter_dev,
@@ -1729,6 +2050,7 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 	u16 addr_type = 0;
 	u8 ip_proto = 0;
 	u8 *match_level;
+	int err;
 
 	match_level = outer_match_level;
 
@@ -1758,18 +2080,22 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 	}
 
 	if (mlx5e_get_tc_tun(filter_dev)) {
-		if (parse_tunnel_attr(priv, spec, f, filter_dev,
-				      outer_match_level))
-			return -EOPNOTSUPP;
+		bool match_inner = false;
 
-		/* At this point, header pointers should point to the inner
-		 * headers, outer header were already set by parse_tunnel_attr
-		 */
-		match_level = inner_match_level;
-		headers_c = get_match_headers_criteria(MLX5_FLOW_CONTEXT_ACTION_DECAP,
-						       spec);
-		headers_v = get_match_headers_value(MLX5_FLOW_CONTEXT_ACTION_DECAP,
-						    spec);
+		err = parse_tunnel_attr(priv, flow, spec, f, filter_dev,
+					outer_match_level, &match_inner);
+		if (err)
+			return err;
+
+		if (match_inner) {
+			/* header pointers should point to the inner headers
+			 * if the packet was decapsulated already.
+			 * outer headers are set by parse_tunnel_attr.
+			 */
+			match_level = inner_match_level;
+			headers_c = get_match_inner_headers_criteria(spec);
+			headers_v = get_match_inner_headers_value(spec);
+		}
 	}
 
 	if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -2082,8 +2408,8 @@ static int parse_cls_flower(struct mlx5e_priv *priv,
 	inner_match_level = MLX5_MATCH_NONE;
 	outer_match_level = MLX5_MATCH_NONE;
 
-	err = __parse_cls_flower(priv, spec, f, filter_dev, &inner_match_level,
-				 &outer_match_level);
+	err = __parse_cls_flower(priv, flow, spec, f, filter_dev,
+				 &inner_match_level, &outer_match_level);
 	non_tunnel_match_level = (inner_match_level == MLX5_MATCH_NONE) ?
 				 outer_match_level : inner_match_level;
 
@@ -2637,7 +2963,7 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 				    struct netlink_ext_ack *extack)
 {
 	struct net_device *filter_dev = parse_attr->filter_dev;
-	bool drop_action, decap_action, pop_action;
+	bool drop_action, pop_action;
 	u32 actions;
 
 	if (mlx5e_is_eswitch_flow(flow))
@@ -2646,17 +2972,15 @@ static bool actions_match_supported(struct mlx5e_priv *priv,
 		actions = flow->nic_attr->action;
 
 	drop_action = actions & MLX5_FLOW_CONTEXT_ACTION_DROP;
-	decap_action = actions & MLX5_FLOW_CONTEXT_ACTION_DECAP;
 	pop_action = actions & MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
 
 	if (flow_flag_test(flow, EGRESS) && !drop_action) {
-		/* If no drop, we must decap (vxlan) or pop (vlan) */
-		if (mlx5e_get_tc_tun(filter_dev) && !decap_action)
-			return false;
-		else if (is_vlan_dev(filter_dev) && !pop_action)
+		/* We only support filters on tunnel device, or on vlan
+		 * devices if they have pop/drop action
+		 */
+		if (!mlx5e_get_tc_tun(filter_dev) &&
+		    (!is_vlan_dev(filter_dev) || !pop_action))
 			return false;
-		else
-			return false; /* Sanity */
 	}
 
 	if (actions & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR)
@@ -3209,9 +3533,9 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 	int ifindexes[MLX5_MAX_FLOW_FWD_VPORTS];
 	bool ft_flow = mlx5e_is_ft_flow(flow);
 	const struct flow_action_entry *act;
+	bool encap = false, decap = false;
+	u32 action = attr->action;
 	int err, i, if_count = 0;
-	bool encap = false;
-	u32 action = 0;
 
 	if (!flow_action_has_entries(flow_action))
 		return -EINVAL;
@@ -3388,7 +3712,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 			attr->split_count = attr->out_count;
 			break;
 		case FLOW_ACTION_TUNNEL_DECAP:
-			action |= MLX5_FLOW_CONTEXT_ACTION_DECAP;
+			decap = true;
 			break;
 		case FLOW_ACTION_GOTO: {
 			u32 dest_chain = act->chain_index;
@@ -3452,6 +3776,22 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv,
 		return -EOPNOTSUPP;
 
 	if (attr->dest_chain) {
+		if (decap) {
+			/* It can be supported if we'll create a mapping for
+			 * the tunnel device only (without tunnel), and set
+			 * this tunnel id with this decap flow.
+			 *
+			 * On restore (miss), we'll just set this saved tunnel
+			 * device.
+			 */
+
+			NL_SET_ERR_MSG(extack,
+				       "Decap with goto isn't supported");
+			netdev_warn(priv->netdev,
+				    "Decap with goto isn't supported");
+			return -EOPNOTSUPP;
+		}
+
 		if (attr->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
 			NL_SET_ERR_MSG(extack, "Mirroring goto chain rules isn't supported");
 			return -EOPNOTSUPP;
@@ -4166,12 +4506,55 @@ void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv)
 
 int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
 {
-	return rhashtable_init(tc_ht, &tc_ht_params);
+	const size_t sz_enc_opts = sizeof(struct flow_dissector_key_enc_opts);
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *priv;
+	struct mapping_ctx *mapping;
+	int err;
+
+	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
+	priv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
+
+	mapping = mapping_idr_create(sizeof(struct tunnel_match_key),
+				     TUNNEL_INFO_BITS_MASK, true);
+	if (IS_ERR(mapping)) {
+		err = PTR_ERR(mapping);
+		goto err_tun_mapping;
+	}
+	uplink_priv->tunnel_mapping = mapping;
+
+	mapping = mapping_idr_create(sz_enc_opts, ENC_OPTS_BITS_MASK, true);
+	if (IS_ERR(mapping)) {
+		err = PTR_ERR(mapping);
+		goto err_enc_opts_mapping;
+	}
+	uplink_priv->tunnel_enc_opts_mapping = mapping;
+
+	err = rhashtable_init(tc_ht, &tc_ht_params);
+	if (err)
+		goto err_ht_init;
+
+	return err;
+
+err_ht_init:
+	mapping_idr_destroy(uplink_priv->tunnel_enc_opts_mapping);
+err_enc_opts_mapping:
+	mapping_idr_destroy(uplink_priv->tunnel_mapping);
+err_tun_mapping:
+	netdev_warn(priv->netdev,
+		    "Failed to initialize tc (eswitch), err: %d", err);
+	return err;
 }
 
 void mlx5e_tc_esw_cleanup(struct rhashtable *tc_ht)
 {
+	struct mlx5_rep_uplink_priv *uplink_priv;
+
 	rhashtable_free_and_destroy(tc_ht, _mlx5e_tc_del_flow, NULL);
+
+	uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
+	mapping_idr_destroy(uplink_priv->tunnel_enc_opts_mapping);
+	mapping_idr_destroy(uplink_priv->tunnel_mapping);
 }
 
 int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 3848ec7..2fab76b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -93,12 +93,15 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 
 enum mlx5e_tc_attr_to_reg {
 	CHAIN_TO_REG,
+	TUNNEL_TO_REG,
 };
 
 struct mlx5e_tc_attr_to_reg_mapping {
 	int mfield; /* rewrite field */
 	int moffset; /* offset of mfield */
 	int mlen; /* bytes to rewrite/match */
+
+	int soffset; /* offset of spec for match */
 };
 
 extern struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[];
@@ -114,6 +117,16 @@ struct mlx5e_tc_mod_hdr_acts {
 	void *actions;
 };
 
+int mlx5e_tc_match_to_reg_set(struct mlx5_core_dev *mdev,
+			      struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts,
+			      enum mlx5e_tc_attr_to_reg type,
+			      u32 data);
+
+void mlx5e_tc_match_to_reg_match(struct mlx5_flow_spec *spec,
+				 enum mlx5e_tc_attr_to_reg type,
+				 u32 data,
+				 u32 mask);
+
 int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev,
 			  int namespace,
 			  struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts);
-- 
1.8.3.1
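
To make the tunnel id encoding used by mlx5e_get_flow_tunnel_id() and
mlx5e_put_flow_tunnel_id() above concrete, here is a small standalone
sketch of the pack/unpack round trip (the 12-bit split here is an
assumption for illustration; the driver sizes ENC_OPTS_BITS so that the
combined value fits the register):

#include <assert.h>
#include <stdint.h>

#define ENC_OPTS_BITS		12
#define ENC_OPTS_BITS_MASK	((1u << ENC_OPTS_BITS) - 1)

int main(void)
{
	uint32_t tun_id = 5, enc_opts_id = 3;

	/* pack: tunnel mapping id in the high bits, enc opts id low */
	uint32_t value = tun_id << ENC_OPTS_BITS | enc_opts_id;

	/* unpack, as done on the teardown and miss paths */
	assert((value >> ENC_OPTS_BITS) == tun_id);
	assert((value & ENC_OPTS_BITS_MASK) == enc_opts_id);
	return 0;
}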


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next-mlx5 12/13] net/mlx5: E-Switch, Get reg_c1 value on miss
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (10 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 11/13] net/mlx5e: Support inner header rewrite with goto action Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 16:16 ` [PATCH net-next-mlx5 13/13] net/mlx5e: Restore tunnel metadata " Paul Blakey
  2020-01-21 21:18 ` [PATCH net-next 00/13] Handle multi chain hardware misses Saeed Mahameed
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

The HW model implicitly decapsulates tunnels on chain 0 and sets reg_c1
with the mapped tunnel id. On miss, the packet does not have the outer
header and the driver restores the tunnel information from the tunnel id.

Getting the reg_c1 value in software requires enabling reg_c1 loopback
and copying reg_c1 to reg_b. reg_b comes up on the CQE as
cqe->imm_inval_pkey.

Use the reg_c0 restoration rules to also copy reg_c1 to reg_b.
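
On the receive side (added in the following patch), the restored value
can then be read straight off the CQE; roughly:

	/* reg_c1 was copied to reg_b by the restore rule; reg_b
	 * surfaces on the CQE as imm_inval_pkey
	 */
	u32 reg_c1 = be32_to_cpu(cqe->imm_inval_pkey);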

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  1 +
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 31 +++++++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index cc446ba..1597cfe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -191,6 +191,7 @@ struct mlx5_eswitch_fdb {
 struct mlx5_esw_offload {
 	struct mlx5_flow_table *ft_offloads_restore;
 	struct mlx5_flow_group *restore_group;
+	struct mlx5_modify_hdr *restore_copy_hdr_id;
 
 	struct mlx5_flow_table *ft_offloads;
 	struct mlx5_flow_group *vport_rx_group;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index d6c0850..6048d8b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -612,9 +612,11 @@ static int esw_set_passing_vport_metadata(struct mlx5_eswitch *esw, bool enable)
 					 esw_vport_context.fdb_to_vport_reg_c_id);
 
 	if (enable)
-		fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0;
+		fdb_to_vport_reg_c_id |= MLX5_FDB_TO_VPORT_REG_C_0 |
+					 MLX5_FDB_TO_VPORT_REG_C_1;
 	else
-		fdb_to_vport_reg_c_id &= ~MLX5_FDB_TO_VPORT_REG_C_0;
+		fdb_to_vport_reg_c_id &= ~(MLX5_FDB_TO_VPORT_REG_C_0 |
+					   MLX5_FDB_TO_VPORT_REG_C_1);
 
 	MLX5_SET(modify_esw_vport_context_in, in,
 		 esw_vport_context.fdb_to_vport_reg_c_id, fdb_to_vport_reg_c_id);
@@ -872,7 +874,9 @@ struct mlx5_flow_handle *
 			    misc_parameters_2);
 	MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0, tag);
 	spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
-	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+			  MLX5_FLOW_CONTEXT_ACTION_MOD_HDR;
+	flow_act.modify_hdr = esw->offloads.restore_copy_hdr_id;
 
 	flow_context = &spec->flow_context;
 	flow_context->flags |= FLOW_CONTEXT_HAS_TAG;
@@ -1226,16 +1230,19 @@ static void esw_destroy_restore_table(struct mlx5_eswitch *esw)
 {
 	struct mlx5_esw_offload *offloads = &esw->offloads;
 
+	mlx5_modify_header_dealloc(esw->dev, offloads->restore_copy_hdr_id);
 	mlx5_destroy_flow_group(offloads->restore_group);
 	mlx5_destroy_flow_table(offloads->ft_offloads_restore);
 }
 
 static int esw_create_restore_table(struct mlx5_eswitch *esw)
 {
+	u8 modact[MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto)] = {};
 	int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
 	struct mlx5_flow_table_attr ft_attr = {};
 	struct mlx5_core_dev *dev = esw->dev;
 	struct mlx5_flow_namespace *ns;
+	struct mlx5_modify_hdr *mod_hdr;
 	void *match_criteria, *misc;
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_group *g;
@@ -1284,11 +1291,29 @@ static int esw_create_restore_table(struct mlx5_eswitch *esw)
 		goto err_group;
 	}
 
+	MLX5_SET(copy_action_in, modact, action_type, MLX5_ACTION_TYPE_COPY);
+	MLX5_SET(copy_action_in, modact, src_field,
+		 MLX5_ACTION_IN_FIELD_METADATA_REG_C_1);
+	MLX5_SET(copy_action_in, modact, dst_field,
+		 MLX5_ACTION_IN_FIELD_METADATA_REG_B);
+	mod_hdr = mlx5_modify_header_alloc(esw->dev,
+					   MLX5_FLOW_NAMESPACE_KERNEL, 1,
+					   modact);
+	if (IS_ERR(mod_hdr)) {
+		err = PTR_ERR(mod_hdr);
+		esw_warn(dev, "Failed to create restore mod header, err: %d\n",
+			 err);
+		goto err_mod_hdr;
+	}
+
 	esw->offloads.ft_offloads_restore = ft;
 	esw->offloads.restore_group = g;
+	esw->offloads.restore_copy_hdr_id = mod_hdr;
 
 	return 0;
 
+err_mod_hdr:
+	mlx5_destroy_flow_group(g);
 err_group:
 	mlx5_destroy_flow_table(ft);
 out_free:
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next-mlx5 13/13] net/mlx5e: Restore tunnel metadata on miss
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (11 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 12/13] net/mlx5: E-Switch, Get reg_c1 value on miss Paul Blakey
@ 2020-01-21 16:16 ` Paul Blakey
  2020-01-21 21:18 ` [PATCH net-next 00/13] Handle multi chain hardware misses Saeed Mahameed
  13 siblings, 0 replies; 23+ messages in thread
From: Paul Blakey @ 2020-01-21 16:16 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko

In a tunnel and chains setup, we decapsulate the packet on the first
chain hop. If we miss on a later chain, the packet comes up without the
tunnel header, so it won't be taken by the tunnel device automatically.
The tunnel device is what fills in the tunnel metadata, so further tc
tunnel matches won't work.

To fix that:
On miss, we get the tunnel mapping id, which was set on the chain 0 rule
that decapsulated the packet. This rule matched the tunnel outer
headers. From the tunnel mapping id, we look up the tunnel matches and
restore the equivalent tunnel info metadata dst on the skb.
We also set skb->dev to the relevant device (the tunnel device), so
further tc processing can be done on that device.
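
Condensed, the restore sequence on the miss path looks roughly like
this (error handling trimmed; see mlx5e_restore_tunnel() in the diff
below for the real thing):

	/* split reg_c1's tunnel_id back into the two mapping ids */
	enc_opts_id = tunnel_id & ENC_OPTS_BITS_MASK;
	tun_id = tunnel_id >> ENC_OPTS_BITS;

	/* recover the tunnel match key saved at offload time */
	mapping_find(uplink_priv->tunnel_mapping, tun_id, &key);

	/* rebuild the metadata dst the tunnel device would have set */
	tun_dst = tun_rx_dst(enc_opts.len);
	ip_tunnel_key_init(&tun_dst->u.tun_info.key,
			   key.enc_ipv4.src, key.enc_ipv4.dst,
			   key.enc_ip.tos, key.enc_ip.ttl, 0 /* label */,
			   key.enc_tp.src, key.enc_tp.dst,
			   key32_to_tunnel_id(key.enc_key_id.keyid),
			   TUNNEL_KEY);
	skb_dst_set(skb, (struct dst_entry *)tun_dst);

	/* hand further tc processing to the tunnel device */
	skb->dev = dev_get_by_index(&init_net, key.filter_ifindex);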

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |  10 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 108 ++++++++++++++++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h |   9 +-
 3 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 4402a53..59d01a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1191,6 +1191,7 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	struct mlx5e_priv *priv = netdev_priv(netdev);
 	struct mlx5e_rep_priv *rpriv  = priv->ppriv;
 	struct mlx5_eswitch_rep *rep = rpriv->rep;
+	struct mlx5e_tc_update_priv tc_priv = {};
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 	struct mlx5e_wqe_frag_info *wi;
 	struct sk_buff *skb;
@@ -1223,11 +1224,13 @@ void mlx5e_handle_rx_cqe_rep(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe)
 	if (rep->vlan && skb_vlan_tag_present(skb))
 		skb_vlan_pop(skb);
 
-	if (!mlx5e_tc_rep_update_skb(cqe, skb))
+	if (!mlx5e_tc_rep_update_skb(cqe, skb, &tc_priv))
 		goto free_wqe;
 
 	napi_gro_receive(rq->cq.napi, skb);
 
+	mlx5_tc_rep_post_napi_receive(&tc_priv);
+
 free_wqe:
 	mlx5e_free_rx_wqe(rq, wi, true);
 wq_cyc_pop:
@@ -1244,6 +1247,7 @@ void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq,
 	u32 wqe_offset     = stride_ix << rq->mpwqe.log_stride_sz;
 	u32 head_offset    = wqe_offset & (PAGE_SIZE - 1);
 	u32 page_idx       = wqe_offset >> PAGE_SHIFT;
+	struct mlx5e_tc_update_priv tc_priv = {};
 	struct mlx5e_rx_wqe_ll *wqe;
 	struct mlx5_wq_ll *wq;
 	struct sk_buff *skb;
@@ -1276,11 +1280,13 @@ void mlx5e_handle_rx_cqe_mpwrq_rep(struct mlx5e_rq *rq,
 
 	mlx5e_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
 
-	if (!mlx5e_tc_rep_update_skb(cqe, skb))
+	if (!mlx5e_tc_rep_update_skb(cqe, skb, &tc_priv))
 		goto mpwrq_cqe_out;
 
 	napi_gro_receive(rq->cq.napi, skb);
 
+	mlx5_tc_rep_post_napi_receive(&tc_priv);
+
 mpwrq_cqe_out:
 	if (likely(wi->consumed_strides < rq->mpwqe.num_strides))
 		return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9f8ff40..d8d028f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4587,19 +4587,100 @@ void mlx5e_tc_reoffload_flows_work(struct work_struct *work)
 	mutex_unlock(&rpriv->unready_flows_lock);
 }
 
+static bool mlx5e_restore_tunnel(struct mlx5e_priv *priv, struct sk_buff *skb,
+				 struct mlx5e_tc_update_priv *tc_priv,
+				 u32 tunnel_id)
+{
+	struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+	struct flow_dissector_key_enc_opts enc_opts = {};
+	struct mlx5_rep_uplink_priv *uplink_priv;
+	struct mlx5e_rep_priv *uplink_rpriv;
+	struct metadata_dst *tun_dst;
+	struct tunnel_match_key key;
+	u32 tun_id, enc_opts_id;
+	struct net_device *dev;
+	int err;
+
+	enc_opts_id = tunnel_id & ENC_OPTS_BITS_MASK;
+	tun_id = tunnel_id >> ENC_OPTS_BITS;
+
+	if (!tun_id)
+		return true;
+
+	uplink_rpriv = mlx5_eswitch_get_uplink_priv(esw, REP_ETH);
+	uplink_priv = &uplink_rpriv->uplink_priv;
+
+	err = mapping_find(uplink_priv->tunnel_mapping, tun_id, &key);
+	if (err) {
+		WARN_ON_ONCE(true);
+		netdev_dbg(priv->netdev,
+			   "Couldn't find tunnel for tun_id: %d, err: %d\n",
+			   tun_id, err);
+		return false;
+	}
+
+	if (enc_opts_id) {
+		err = mapping_find(uplink_priv->tunnel_enc_opts_mapping,
+				   enc_opts_id, &enc_opts);
+		if (err) {
+			netdev_dbg(priv->netdev,
+				   "Couldn't find tunnel (opts) for tun_id: %d, err: %d\n",
+				   enc_opts_id, err);
+			return false;
+		}
+	}
+
+	tun_dst = tun_rx_dst(enc_opts.len);
+	if (!tun_dst) {
+		WARN_ON_ONCE(true);
+		return false;
+	}
+
+	ip_tunnel_key_init(&tun_dst->u.tun_info.key,
+			   key.enc_ipv4.src, key.enc_ipv4.dst,
+			   key.enc_ip.tos, key.enc_ip.ttl,
+			   0, /* label */
+			   key.enc_tp.src, key.enc_tp.dst,
+			   key32_to_tunnel_id(key.enc_key_id.keyid),
+			   TUNNEL_KEY);
+
+	if (enc_opts.len)
+		ip_tunnel_info_opts_set(&tun_dst->u.tun_info, enc_opts.data,
+					enc_opts.len, enc_opts.dst_opt_type);
+
+	skb_dst_set(skb, (struct dst_entry *)tun_dst);
+	dev = dev_get_by_index(&init_net, key.filter_ifindex);
+	if (!dev) {
+		netdev_dbg(priv->netdev,
+			   "Couldn't find tunnel device with ifindex: %d\n",
+			   key.filter_ifindex);
+		return false;
+	}
+
+	/* Set tun_dev so we do dev_put() after datapath */
+	tc_priv->tun_dev = dev;
+
+	skb->dev = dev;
+
+	return true;
+}
+
 bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe,
-			     struct sk_buff *skb)
+			     struct sk_buff *skb,
+			     struct mlx5e_tc_update_priv *tc_priv)
 {
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	u32 chain = 0, reg_c0, reg_c1, tunnel_id;
 	struct tc_skb_ext *tc_skb_ext;
 	struct mlx5_eswitch *esw;
 	struct mlx5e_priv *priv;
-	u32 chain = 0, reg_c0;
+	int tunnel_moffset;
 	int err;
 
 	reg_c0 = (be32_to_cpu(cqe->sop_drop_qpn) & MLX5E_TC_FLOW_ID_MASK);
 	if (reg_c0 == MLX5_FS_DEFAULT_FLOW_TAG)
 		reg_c0 = 0;
+	reg_c1 = be32_to_cpu(cqe->imm_inval_pkey);
 
 	if (!reg_c0)
 		return true;
@@ -4615,17 +4696,26 @@ bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe,
 		return false;
 	}
 
-	if (!chain)
-		return true;
+	if (chain) {
+		tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
+		if (!tc_skb_ext) {
+			WARN_ON(1);
+			return false;
+		}
 
-	tc_skb_ext = skb_ext_add(skb, TC_SKB_EXT);
-	if (!tc_skb_ext) {
-		WARN_ON_ONCE(1);
-		return false;
+		tc_skb_ext->chain = chain;
 	}
 
-	tc_skb_ext->chain = chain;
+	tunnel_moffset = mlx5e_tc_attr_to_reg_mappings[TUNNEL_TO_REG].moffset;
+	tunnel_id = reg_c1 >> (8 * tunnel_moffset);
+	return mlx5e_restore_tunnel(priv, skb, tc_priv, tunnel_id);
 #endif /* CONFIG_NET_TC_SKB_EXT */
 
 	return true;
 }
+
+void mlx5_tc_rep_post_napi_receive(struct mlx5e_tc_update_priv *tc_priv)
+{
+	if (tc_priv->tun_dev)
+		dev_put(tc_priv->tun_dev);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 2fab76b..21cbde4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -109,7 +109,14 @@ struct mlx5e_tc_attr_to_reg_mapping {
 bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 				    struct net_device *out_dev);
 
-bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb);
+struct mlx5e_tc_update_priv {
+	struct net_device *tun_dev;
+};
+
+bool mlx5e_tc_rep_update_skb(struct mlx5_cqe64 *cqe, struct sk_buff *skb,
+			     struct mlx5e_tc_update_priv *tc_priv);
+
+void mlx5_tc_rep_post_napi_receive(struct mlx5e_tc_update_priv *tc_priv);
 
 struct mlx5e_tc_mod_hdr_acts {
 	int num_actions;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mappings unique ids to data
  2020-01-21 16:16 ` [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mappings unique ids to data Paul Blakey
@ 2020-01-21 19:04   ` Leon Romanovsky
  2020-01-22 12:17     ` Paul Blakey
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2020-01-21 19:04 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko

On Tue, Jan 21, 2020 at 06:16:11PM +0200, Paul Blakey wrote:
> Add a new interface for mapping data to a given id range (max_id),
> and back again. It supports variable-sized data, different
> allocators, and read/write locks.
>
> This mapping interface also supports delaying the mapping removal via
> a workqueue. This is for cases where we need the mapping to have
> some grace period in regards to finding it back again, for example
> for packets arriving from hardware that were marked by a rule
> with an old mapping that no longer exists.
>
> We also provide a first implementation of the interface, idr_mapping,
> which uses idr for the allocator and a mutex lock for writes
> (add/del, but not for find).
>
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> Reviewed-by: Mark Bloch <markb@mellanox.com>
> ---

I have many issues with this patch, but the two main ones are:
1. This is a general implementation without proper documentation and
tests, which doesn't belong in driver code.
2. It looks very similar to already existing code, for example xarray.
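
For reference, the existing xarray id-allocation interface already
covers the id <-> pointer part; a minimal sketch (illustrative only):

	#include <linux/xarray.h>

	static DEFINE_XARRAY_ALLOC(mapping_xa);

	/* allocate a unique id for @data in [0, max_id] */
	static int example_map(void *data, u32 max_id, u32 *id)
	{
		return xa_alloc(&mapping_xa, id, data,
				XA_LIMIT(0, max_id), GFP_KERNEL);
	}

	/* ... and map the id back to the data */
	static void *example_unmap(u32 id)
	{
		return xa_load(&mapping_xa, id);
	}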

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
  2020-01-21 16:16 ` [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits Paul Blakey
@ 2020-01-21 19:08   ` Leon Romanovsky
  2020-01-22 13:42     ` Paul Blakey
  0 siblings, 1 reply; 23+ messages in thread
From: Leon Romanovsky @ 2020-01-21 19:08 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko

On Tue, Jan 21, 2020 at 06:16:12PM +0200, Paul Blakey wrote:
> Multi chain support requires the miss path to continue the processing
> from the last chain id, and for that we need to save the chain
> miss tag (a mapping for 32bit chain id) on reg_c0 which will
> come in a next patch.
>
> Currently reg_c0 is exclusively used to store the source port
> metadata, giving it 32bit, it is created from 16bits of vhca_id,
> and 16bits of vport number.
>
> We will move this source port metadata to upper 16bits, and leave the
> lower bits for the chain miss tag. We compress the reg_c0 source port
> metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
> the vport number.
>
> Since we compress the vport number to 8bits statically, and leave two
> top ids for special PF/ECPF numbers, we will only support a max of 254
> vports with this strategy.
>
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> Reviewed-by: Mark Bloch <markb@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/main.c                  |  3 +-
>  .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 81 +++++++++++++++++++---
>  include/linux/mlx5/eswitch.h                       | 11 ++-
>  3 files changed, 82 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 90489c5..844351c 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -3535,7 +3535,8 @@ static void mlx5_ib_set_rule_source_port(struct mlx5_ib_dev *dev,
>  		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
>  				    misc_parameters_2);
>
> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
> +		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
> +			 mlx5_eswitch_get_vport_metadata_mask());
>  	} else {
>  		misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
>  				    misc_parameters);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index a6d0b62..873b19c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -50,6 +50,19 @@
>  #define MLX5_ESW_MISS_FLOWS (2)
>  #define UPLINK_REP_INDEX 0
>
> +/* Reg C0 usage:
> + * Reg C0 = < VHCA_ID_BITS(8) | VPORT BITS(8) | CHAIN_TAG(16) >
> + *
> + * Highest 8 bits of the reg c0 is the vhca_id, next 8 bits is vport_num,
> + * the rest (lowest 16 bits) is left for tc chain tag restoration.
> + * VHCA_ID + VPORT comprise the SOURCE_PORT matching.
> + */
> +#define VHCA_ID_BITS 8
> +#define VPORT_BITS 8
> +#define SOURCE_PORT_METADATA_BITS (VHCA_ID_BITS + VPORT_BITS)
> +#define SOURCE_PORT_METADATA_OFFSET (32 - SOURCE_PORT_METADATA_BITS)
> +#define CHAIN_TAG_METADATA_BITS (32 - SOURCE_PORT_METADATA_BITS)
> +
>  static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
>  						     u16 vport_num)
>  {
> @@ -85,7 +98,8 @@ static struct mlx5_eswitch_rep *mlx5_eswitch_get_rep(struct mlx5_eswitch *esw,
>  								   attr->in_rep->vport));
>
>  		misc2 = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2);
> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc2, metadata_reg_c_0);
> +		MLX5_SET(fte_match_set_misc2, misc2, metadata_reg_c_0,
> +			 mlx5_eswitch_get_vport_metadata_mask());
>
>  		spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
>  		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters);
> @@ -621,7 +635,8 @@ static void peer_miss_rules_setup(struct mlx5_eswitch *esw,
>  	if (mlx5_eswitch_vport_match_metadata_enabled(esw)) {
>  		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
>  				    misc_parameters_2);
> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
> +		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
> +			 mlx5_eswitch_get_vport_metadata_mask());
>
>  		spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
>  	} else {
> @@ -851,8 +866,9 @@ static void esw_set_flow_group_source_port(struct mlx5_eswitch *esw,
>  			 match_criteria_enable,
>  			 MLX5_MATCH_MISC_PARAMETERS_2);
>
> -		MLX5_SET_TO_ONES(fte_match_param, match_criteria,
> -				 misc_parameters_2.metadata_reg_c_0);
> +		MLX5_SET(fte_match_param, match_criteria,
> +			 misc_parameters_2.metadata_reg_c_0,
> +			 mlx5_eswitch_get_vport_metadata_mask());
>  	} else {
>  		MLX5_SET(create_flow_group_in, flow_group_in,
>  			 match_criteria_enable,
> @@ -1134,7 +1150,8 @@ struct mlx5_flow_handle *
>  			 mlx5_eswitch_get_vport_metadata_for_match(esw, vport));
>
>  		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, misc_parameters_2);
> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
> +		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
> +			 mlx5_eswitch_get_vport_metadata_mask());
>
>  		spec->match_criteria_enable = MLX5_MATCH_MISC_PARAMETERS_2;
>  	} else {
> @@ -1604,11 +1621,17 @@ static int esw_vport_add_ingress_acl_modify_metadata(struct mlx5_eswitch *esw,
>  	static const struct mlx5_flow_spec spec = {};
>  	struct mlx5_flow_act flow_act = {};
>  	int err = 0;
> +	u32 key;
> +
> +	key = mlx5_eswitch_get_vport_metadata_for_match(esw, vport->vport);
> +	key >>= SOURCE_PORT_METADATA_OFFSET;
>
>  	MLX5_SET(set_action_in, action, action_type, MLX5_ACTION_TYPE_SET);
> -	MLX5_SET(set_action_in, action, field, MLX5_ACTION_IN_FIELD_METADATA_REG_C_0);
> -	MLX5_SET(set_action_in, action, data,
> -		 mlx5_eswitch_get_vport_metadata_for_match(esw, vport->vport));
> +	MLX5_SET(set_action_in, action, field,
> +		 MLX5_ACTION_IN_FIELD_METADATA_REG_C_0);
> +	MLX5_SET(set_action_in, action, data, key);
> +	MLX5_SET(set_action_in, action, offset, SOURCE_PORT_METADATA_OFFSET);
> +	MLX5_SET(set_action_in, action, length, SOURCE_PORT_METADATA_BITS);
>
>  	vport->ingress.offloads.modify_metadata =
>  		mlx5_modify_header_alloc(esw->dev, MLX5_FLOW_NAMESPACE_ESW_INGRESS,
> @@ -2465,9 +2488,47 @@ bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw)
>  }
>  EXPORT_SYMBOL(mlx5_eswitch_vport_match_metadata_enabled);
>
> -u32 mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
> +u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>  					      u16 vport_num)
>  {
> -	return ((MLX5_CAP_GEN(esw->dev, vhca_id) & 0xffff) << 16) | vport_num;
> +	u32 vport_num_mask = GENMASK(VPORT_BITS - 1, 0);
> +	u32 vhca_id_mask = GENMASK(VHCA_ID_BITS - 1, 0);
> +	u32 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
> +	u32 val;
> +
> +	/* Make sure the vhca_id fits the VHCA_ID_BITS */
> +	WARN_ON_ONCE(vhca_id >= BIT(VHCA_ID_BITS));
> +
> +	/* Trim vhca_id to VHCA_ID_BITS */
> +	vhca_id &= vhca_id_mask;
> +
> +	/* Make sure pf and ecpf map to end of VPORT_BITS range so they
> +	 * don't overlap with VF numbers, and themselves, after trimming.
> +	 */
> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) <
> +		     vport_num_mask - 1);
> +	WARN_ON_ONCE((MLX5_VPORT_ECPF & vport_num_mask) <
> +		     vport_num_mask - 1);
> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) ==
> +		     (MLX5_VPORT_ECPF & vport_num_mask));
> +
> +	/* Make sure that the VF vport_num fits VPORT_BITS and don't
> +	 * overlap with pf and ecpf.
> +	 */
> +	if (vport_num != MLX5_VPORT_UPLINK &&
> +	    vport_num != MLX5_VPORT_ECPF)
> +		WARN_ON_ONCE(vport_num >= vport_num_mask - 1);
> +
> +	/* We can now trim vport_num to VPORT_BITS */
> +	vport_num &= vport_num_mask;
> +
> +	val = (vhca_id << VPORT_BITS) | vport_num;
> +	return val << (32 - SOURCE_PORT_METADATA_BITS);
>  }
>  EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match);
> +
> +u32 mlx5_eswitch_get_vport_metadata_mask(void)
> +{
> +	return GENMASK(31, 32 - SOURCE_PORT_METADATA_BITS);
> +}
> +EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_mask);

This function can easily be inlined in the .h file and actually does
nothing except return 0xFFFF.
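
Something like the following in include/linux/mlx5/eswitch.h would do,
assuming the SOURCE_PORT_METADATA_BITS define moves there as well
(sketch of the function quoted above, made static inline):

	static inline u32 mlx5_eswitch_get_vport_metadata_mask(void)
	{
		return GENMASK(31, 32 - SOURCE_PORT_METADATA_BITS);
	}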

> diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
> index 98e667b..080b67c 100644
> --- a/include/linux/mlx5/eswitch.h
> +++ b/include/linux/mlx5/eswitch.h
> @@ -71,8 +71,9 @@ enum devlink_eswitch_encap_mode
>  mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev);
>
>  bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
> -u32 mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
> +u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>  					      u16 vport_num);
> +u32 mlx5_eswitch_get_vport_metadata_mask(void);
>  u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw);
>  #else  /* CONFIG_MLX5_ESWITCH */
>
> @@ -94,11 +95,17 @@ static inline u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
>  };
>
>  static inline u32
> -mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
> +mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>  					  int vport_num)
>  {
>  	return 0;
>  };
> +
> +static inline u32
> +mlx5_eswitch_get_vport_metadata_mask(void)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_MLX5_ESWITCH */
>
>  #endif
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 00/13] Handle multi chain hardware misses
  2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
                   ` (12 preceding siblings ...)
  2020-01-21 16:16 ` [PATCH net-next-mlx5 13/13] net/mlx5e: Restore tunnel metadata " Paul Blakey
@ 2020-01-21 21:18 ` Saeed Mahameed
  2020-01-23  9:54   ` David Miller
  13 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2020-01-21 21:18 UTC (permalink / raw)
  To: Vlad Buslov, netdev, jiri, Paul Blakey, Oz Shlomo, jakub.kicinski, davem

On Tue, 2020-01-21 at 18:16 +0200, Paul Blakey wrote:
> Note that miss path handling of multi-chain rules is a required
> infrastructure
> for connection tracking hardware offload. The connection tracking
> offload
> series will follow this one.

Hi Dave and Jakub,

As Paul explained this is part one of two parts series,

Assuming the review goes smoothly, I would like to suggest the
following acceptance options:

option 1) I can create a separate side branch for connection tracking
offload and once Paul submits the final patch of this feature and the
mailing list review is complete, I can send you a full pull request
with everything included.

option 2) you apply both patchsets directly to net-next individually
(the normal process).

Please let me know what works better for you.

Personally I prefer option 1) so we won't end up stuck with only one
half of the connection tracking series if the review of the 2nd part
doesn't go as planned.

Thanks,
Saeed

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mappings unique ids to data
  2020-01-21 19:04   ` Leon Romanovsky
@ 2020-01-22 12:17     ` Paul Blakey
  2020-01-22 13:51       ` Leon Romanovsky
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Blakey @ 2020-01-22 12:17 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko


On 1/21/2020 9:04 PM, Leon Romanovsky wrote:
> On Tue, Jan 21, 2020 at 06:16:11PM +0200, Paul Blakey wrote:
>> Add a new interface for mapping data to a given id range (max_id),
>> and back again. It supports variable-sized data, different
>> allocators, and read/write locks.
>>
>> This mapping interface also supports delaying the mapping removal via
>> a workqueue. This is for cases where we need the mapping to have
>> some grace period in regards to finding it back again, for example
>> for packets arriving from hardware that were marked by a rule
>> with an old mapping that no longer exists.
>>
>> We also provide a first implementation of the interface, idr_mapping,
>> which uses idr for the allocator and a mutex lock for writes
>> (add/del, but not for find).
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
>> Reviewed-by: Mark Bloch <markb@mellanox.com>
>> ---
> I have many issues with this patch, but the two main ones are:
> 1. This is a general implementation without proper documentation and
> tests, which doesn't belong in driver code.
> 2. It looks very similar to already existing code, for example xarray.
>
> Thanks
This data structure uses idr (currently a wrapper for xarray) but also
a hash table, a refcount, and generic allocators.
The hashtable is used on top of the idr to find whether data added to
the mapping already exists; if it does, it updates a refcount.
We also have some special delayed removal for our use case.
The addition over xarray is the translation from data to a hash
function. That is something that doesn't exist there and needs extra
code. IDR was chosen as a simplified interface to xarray, and it is
good enough for our case.

mlx5 is the first user of such a library; once other users arrive, we
will be happy to collaborate on making it generic.
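
A rough sketch of the shape being described (the struct and field names
here are invented for illustration, not the actual patch 02 code):

	#include <linux/refcount.h>
	#include <linux/types.h>
	#include <linux/workqueue.h>

	struct mapping_item {
		struct hlist_node node;  /* hashtable: data -> existing item */
		refcount_t refcnt;       /* equal data shares one id */
		u32 id;                  /* allocator-assigned id, carried by HW */
		struct delayed_work del_work; /* delayed (grace period) removal */
		int len;                 /* size of the variable-sized key */
		char data[];             /* the key itself */
	};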

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
  2020-01-21 19:08   ` Leon Romanovsky
@ 2020-01-22 13:42     ` Paul Blakey
  2020-01-22 13:50       ` Leon Romanovsky
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Blakey @ 2020-01-22 13:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko


On 1/21/2020 9:08 PM, Leon Romanovsky wrote:
> On Tue, Jan 21, 2020 at 06:16:12PM +0200, Paul Blakey wrote:
>> Multi chain support requires the miss path to continue the processing
>> from the last chain id, and for that we need to save the chain
>> miss tag (a mapping for 32bit chain id) on reg_c0 which will
>> come in a next patch.
>>
>> Currently reg_c0 is exclusively used to store the source port
>> metadata, giving it 32bit, it is created from 16bits of vhca_id,
>> and 16bits of vport number.
>>
>> We will move this source port metadata to upper 16bits, and leave the
>> lower bits for the chain miss tag. We compress the reg_c0 source port
>> metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
>> the vport number.
>>
>> Since we compress the vport number to 8bits statically, and leave two
>> top ids for special PF/ECPF numbers, we will only support a max of 254
>> vports with this strategy.
>>
>> Signed-off-by: Paul Blakey <paulb@mellanox.com>
>> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
>> Reviewed-by: Mark Bloch <markb@mellanox.com>
>> ---
>>   drivers/infiniband/hw/mlx5/main.c                  |  3 +-
>>   .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 81 +++++++++++++++++++---
>>   include/linux/mlx5/eswitch.h                       | 11 ++-
>>   3 files changed, 82 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
>> index 90489c5..844351c 100644
>> --- a/drivers/infiniband/hw/mlx5/main.c
>> +++ b/drivers/infiniband/hw/mlx5/main.c
>> @@ -3535,7 +3535,8 @@ static void mlx5_ib_set_rule_source_port(struct mlx5_ib_dev *dev,
>>   		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
>>   				    misc_parameters_2);
>>
>> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
>> +		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
>> +			 mlx5_eswitch_get_vport_metadata_mask());
>>   	} else {
>>   		misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
>>   				    misc_parameters);
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>> index a6d0b62..873b19c 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>> @@ -50,6 +50,19 @@
>>   #define MLX5_ESW_MISS_FLOWS (2)
>>   #define UPLINK_REP_INDEX 0
>>
>> +/* Reg C0 usage:
>> + * Reg C0 = < VHCA_ID_BITS(8) | VPORT BITS(8) | CHAIN_TAG(16) >
>> + *
>> + * Highest 8 bits of the reg c0 is the vhca_id, next 8 bits is vport_num,
>> + * the rest (lowest 16 bits) is left for tc chain tag restoration.
>> + * VHCA_ID + VPORT comprise the SOURCE_PORT matching.
>> + */
>> +#define VHCA_ID_BITS 8
>> +#define VPORT_BITS 8
>> +#define SOURCE_PORT_METADATA_BITS (VHCA_ID_BITS + VPORT_BITS)
>> +#define SOURCE_PORT_METADATA_OFFSET (32 - SOURCE_PORT_METADATA_BITS)
>> +#define CHAIN_TAG_METADATA_BITS (32 - SOURCE_PORT_METADATA_BITS)
>> +

[...]


>> +u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>>   					      u16 vport_num)
>>   {
>> -	return ((MLX5_CAP_GEN(esw->dev, vhca_id) & 0xffff) << 16) | vport_num;
>> +	u32 vport_num_mask = GENMASK(VPORT_BITS - 1, 0);
>> +	u32 vhca_id_mask = GENMASK(VHCA_ID_BITS - 1, 0);
>> +	u32 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
>> +	u32 val;
>> +
>> +	/* Make sure the vhca_id fits the VHCA_ID_BITS */
>> +	WARN_ON_ONCE(vhca_id >= BIT(VHCA_ID_BITS));
>> +
>> +	/* Trim vhca_id to VHCA_ID_BITS */
>> +	vhca_id &= vhca_id_mask;
>> +
>> +	/* Make sure pf and ecpf map to end of VPORT_BITS range so they
>> +	 * don't overlap with VF numbers, and themselves, after trimming.
>> +	 */
>> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) <
>> +		     vport_num_mask - 1);
>> +	WARN_ON_ONCE((MLX5_VPORT_ECPF & vport_num_mask) <
>> +		     vport_num_mask - 1);
>> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) ==
>> +		     (MLX5_VPORT_ECPF & vport_num_mask));
>> +
>> +	/* Make sure that the VF vport_num fits VPORT_BITS and don't
>> +	 * overlap with pf and ecpf.
>> +	 */
>> +	if (vport_num != MLX5_VPORT_UPLINK &&
>> +	    vport_num != MLX5_VPORT_ECPF)
>> +		WARN_ON_ONCE(vport_num >= vport_num_mask - 1);
>> +
>> +	/* We can now trim vport_num to VPORT_BITS */
>> +	vport_num &= vport_num_mask;
>> +
>> +	val = (vhca_id << VPORT_BITS) | vport_num;
>> +	return val << (32 - SOURCE_PORT_METADATA_BITS);
>>   }
>>   EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match);
>> +
>> +u32 mlx5_eswitch_get_vport_metadata_mask(void)
>> +{
>> +	return GENMASK(31, 32 - SOURCE_PORT_METADATA_BITS);
>> +}
>> +EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_mask);
> This function can easily be inlined in the .h file and actually does
> nothing except return 0xFFFF.

We will move this and the relevant defines to the .h file, and remove
the exported symbol.
  

>> diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h
>> index 98e667b..080b67c 100644
>> --- a/include/linux/mlx5/eswitch.h
>> +++ b/include/linux/mlx5/eswitch.h
>> @@ -71,8 +71,9 @@ enum devlink_eswitch_encap_mode
>>   mlx5_eswitch_get_encap_mode(const struct mlx5_core_dev *dev);
>>
>>   bool mlx5_eswitch_vport_match_metadata_enabled(const struct mlx5_eswitch *esw);
>> -u32 mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
>> +u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>>   					      u16 vport_num);
>> +u32 mlx5_eswitch_get_vport_metadata_mask(void);
>>   u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw);
>>   #else  /* CONFIG_MLX5_ESWITCH */
>>
>> @@ -94,11 +95,17 @@ static inline u8 mlx5_eswitch_mode(struct mlx5_eswitch *esw)
>>   };
>>
>>   static inline u32
>> -mlx5_eswitch_get_vport_metadata_for_match(const struct mlx5_eswitch *esw,
>> +mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
>>   					  int vport_num)
>>   {
>>   	return 0;
>>   };
>> +
>> +static inline u32
>> +mlx5_eswitch_get_vport_metadata_mask(void)
>> +{
>> +	return 0;
>> +}
>>   #endif /* CONFIG_MLX5_ESWITCH */
>>
>>   #endif
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits
  2020-01-22 13:42     ` Paul Blakey
@ 2020-01-22 13:50       ` Leon Romanovsky
  0 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2020-01-22 13:50 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko

On Wed, Jan 22, 2020 at 01:42:33PM +0000, Paul Blakey wrote:
>
> On 1/21/2020 9:08 PM, Leon Romanovsky wrote:
> > On Tue, Jan 21, 2020 at 06:16:12PM +0200, Paul Blakey wrote:
> >> Multi chain support requires the miss path to continue the processing
> >> from the last chain id, and for that we need to save the chain
> >> miss tag (a mapping for 32bit chain id) on reg_c0 which will
> >> come in a next patch.
> >>
> >> Currently reg_c0 is exclusively used to store the source port
> >> metadata, giving it 32bit, it is created from 16bits of vhca_id,
> >> and 16bits of vport number.
> >>
> >> We will move this source port metadata to upper 16bits, and leave the
> >> lower bits for the chain miss tag. We compress the reg_c0 source port
> >> metadata to 16bits by taking 8 bits from vhca_id, and 8bits from
> >> the vport number.
> >>
> >> Since we compress the vport number to 8bits statically, and leave two
> >> top ids for special PF/ECPF numbers, we will only support a max of 254
> >> vports with this strategy.
> >>
> >> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> >> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> >> Reviewed-by: Mark Bloch <markb@mellanox.com>
> >> ---
> >>   drivers/infiniband/hw/mlx5/main.c                  |  3 +-
> >>   .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 81 +++++++++++++++++++---
> >>   include/linux/mlx5/eswitch.h                       | 11 ++-
> >>   3 files changed, 82 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> >> index 90489c5..844351c 100644
> >> --- a/drivers/infiniband/hw/mlx5/main.c
> >> +++ b/drivers/infiniband/hw/mlx5/main.c
> >> @@ -3535,7 +3535,8 @@ static void mlx5_ib_set_rule_source_port(struct mlx5_ib_dev *dev,
> >>   		misc = MLX5_ADDR_OF(fte_match_param, spec->match_criteria,
> >>   				    misc_parameters_2);
> >>
> >> -		MLX5_SET_TO_ONES(fte_match_set_misc2, misc, metadata_reg_c_0);
> >> +		MLX5_SET(fte_match_set_misc2, misc, metadata_reg_c_0,
> >> +			 mlx5_eswitch_get_vport_metadata_mask());
> >>   	} else {
> >>   		misc = MLX5_ADDR_OF(fte_match_param, spec->match_value,
> >>   				    misc_parameters);
> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >> index a6d0b62..873b19c 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> >> @@ -50,6 +50,19 @@
> >>   #define MLX5_ESW_MISS_FLOWS (2)
> >>   #define UPLINK_REP_INDEX 0
> >>
> >> +/* Reg C0 usage:
> >> + * Reg C0 = < VHCA_ID_BITS(8) | VPORT BITS(8) | CHAIN_TAG(16) >
> >> + *
> >> + * Highest 8 bits of the reg c0 is the vhca_id, next 8 bits is vport_num,
> >> + * the rest (lowest 16 bits) is left for tc chain tag restoration.
> >> + * VHCA_ID + VPORT comprise the SOURCE_PORT matching.
> >> + */
> >> +#define VHCA_ID_BITS 8
> >> +#define VPORT_BITS 8
> >> +#define SOURCE_PORT_METADATA_BITS (VHCA_ID_BITS + VPORT_BITS)
> >> +#define SOURCE_PORT_METADATA_OFFSET (32 - SOURCE_PORT_METADATA_BITS)
> >> +#define CHAIN_TAG_METADATA_BITS (32 - SOURCE_PORT_METADATA_BITS)
> >> +
>
> [...]
>
>
> >> +u32 mlx5_eswitch_get_vport_metadata_for_match(struct mlx5_eswitch *esw,
> >>   					      u16 vport_num)
> >>   {
> >> -	return ((MLX5_CAP_GEN(esw->dev, vhca_id) & 0xffff) << 16) | vport_num;
> >> +	u32 vport_num_mask = GENMASK(VPORT_BITS - 1, 0);
> >> +	u32 vhca_id_mask = GENMASK(VHCA_ID_BITS - 1, 0);
> >> +	u32 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id);
> >> +	u32 val;
> >> +
> >> +	/* Make sure the vhca_id fits the VHCA_ID_BITS */
> >> +	WARN_ON_ONCE(vhca_id >= BIT(VHCA_ID_BITS));
> >> +
> >> +	/* Trim vhca_id to VHCA_ID_BITS */
> >> +	vhca_id &= vhca_id_mask;
> >> +
> >> +	/* Make sure pf and ecpf map to end of VPORT_BITS range so they
> >> +	 * don't overlap with VF numbers, and themselves, after trimming.
> >> +	 */
> >> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) <
> >> +		     vport_num_mask - 1);
> >> +	WARN_ON_ONCE((MLX5_VPORT_ECPF & vport_num_mask) <
> >> +		     vport_num_mask - 1);
> >> +	WARN_ON_ONCE((MLX5_VPORT_UPLINK & vport_num_mask) ==
> >> +		     (MLX5_VPORT_ECPF & vport_num_mask));
> >> +
> >> +	/* Make sure that the VF vport_num fits VPORT_BITS and don't
> >> +	 * overlap with pf and ecpf.
> >> +	 */
> >> +	if (vport_num != MLX5_VPORT_UPLINK &&
> >> +	    vport_num != MLX5_VPORT_ECPF)
> >> +		WARN_ON_ONCE(vport_num >= vport_num_mask - 1);
> >> +
> >> +	/* We can now trim vport_num to VPORT_BITS */
> >> +	vport_num &= vport_num_mask;
> >> +
> >> +	val = (vhca_id << VPORT_BITS) | vport_num;
> >> +	return val << (32 - SOURCE_PORT_METADATA_BITS);
> >>   }
> >>   EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_for_match);
> >> +
> >> +u32 mlx5_eswitch_get_vport_metadata_mask(void)
> >> +{
> >> +	return GENMASK(31, 32 - SOURCE_PORT_METADATA_BITS);
> >> +}
> >> +EXPORT_SYMBOL(mlx5_eswitch_get_vport_metadata_mask);
> > This function can easily be inlined in the .h file and actually does
> > nothing except return 0xFFFF.
>
> We will move this and the relevant defines to the .h file, and remove
> the exported symbol.

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mappings unique ids to data
  2020-01-22 12:17     ` Paul Blakey
@ 2020-01-22 13:51       ` Leon Romanovsky
  0 siblings, 0 replies; 23+ messages in thread
From: Leon Romanovsky @ 2020-01-22 13:51 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko

On Wed, Jan 22, 2020 at 12:17:44PM +0000, Paul Blakey wrote:
>
> On 1/21/2020 9:04 PM, Leon Romanovsky wrote:
> > On Tue, Jan 21, 2020 at 06:16:11PM +0200, Paul Blakey wrote:
> >> Add a new interface for mapping data to a given id range (max_id),
> >> and back again. It supports variable-sized data, different
> >> allocators, and read/write locks.
> >>
> >> This mapping interface also supports delaying the mapping removal via
> >> a workqueue. This is for cases where we need the mapping to have
> >> some grace period in regards to finding it back again, for example
> >> for packets arriving from hardware that were marked by a rule
> >> with an old mapping that no longer exists.
> >>
> >> We also provide a first implementation of the interface, idr_mapping,
> >> which uses idr for the allocator and a mutex lock for writes
> >> (add/del, but not for find).
> >>
> >> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> >> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> >> Reviewed-by: Mark Bloch <markb@mellanox.com>
> >> ---
> > I have many issues with this patch, but two main are:
> > 1. This is general implementation without proper documentation and test
> > which doesn't belong to driver code.
> > 2. It looks very similar to already existing code, for example xarray.
> >
> > Thanks
> This data structure uses idr (currently a wrapper for xarray) but also
> a hash table, a refcount, and generic allocators.
> The hashtable is used on top of the idr to find whether data added to
> the mapping already exists; if it does, it updates a refcount.
> We also have some special delayed removal for our use case.
> The addition over xarray is the translation from data to a hash
> function. That is something that doesn't exist there and needs extra
> code. IDR was chosen as a simplified interface to xarray, and it is
> good enough for our case.
>
> mlx5 is the first user of such a library; once other users arrive, we
> will be happy to collaborate on making it generic.

Makes sense, thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next 00/13] Handle multi chain hardware misses
  2020-01-21 21:18 ` [PATCH net-next 00/13] Handle multi chain hardware misses Saeed Mahameed
@ 2020-01-23  9:54   ` David Miller
  2020-01-24 20:26     ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2020-01-23  9:54 UTC (permalink / raw)
  To: saeedm; +Cc: vladbu, netdev, jiri, paulb, ozsh, jakub.kicinski

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Tue, 21 Jan 2020 21:18:21 +0000

> On Tue, 2020-01-21 at 18:16 +0200, Paul Blakey wrote:
>> Note that miss path handling of multi-chain rules is a required
>> infrastructure
>> for connection tracking hardware offload. The connection tracking
>> offload
>> series will follow this one.
> 
> Hi Dave and Jakub,
> 
> As Paul explained, this is part one of a two-part series.
> 
> Assuming the review goes through with no issues, I would like to
> suggest the following acceptance options:
> 
> option 1) I can create a separate side branch for connection tracking
> offload, and once Paul submits the final patch of this feature and the
> mailing list review is complete, I can send you a full pull request
> with everything included.
> 
> option 2) you apply both patchsets directly to net-next individually
> (the normal process).
> 
> Please let me know what works better for you.
> 
> Personally I prefer option 1), so we won't end up stuck with only one
> half of the connection tracking series if the review of the 2nd part
> doesn't go as planned.

I'm fine with option #1 and will wait for that to appear in one of
your future pull requests.  It looks like patch #1 got some feedback
and needs some modifications first though.



* Re: [PATCH net-next 00/13] Handle multi chain hardware misses
  2020-01-23  9:54   ` David Miller
@ 2020-01-24 20:26     ` Saeed Mahameed
  0 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2020-01-24 20:26 UTC (permalink / raw)
  To: davem, kuba; +Cc: Oz Shlomo, jiri, netdev, Vlad Buslov, Paul Blakey

On Thu, 2020-01-23 at 10:54 +0100, David Miller wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Tue, 21 Jan 2020 21:18:21 +0000
> 
> > On Tue, 2020-01-21 at 18:16 +0200, Paul Blakey wrote:
> >> Note that miss path handling of multi-chain rules is a required
> >> infrastructure
> >> for connection tracking hardware offload. The connection tracking
> >> offload
> >> series will follow this one.
> > 
> > Hi Dave and Jakub,
> > 
> > As Paul explained, this is part one of a two-part series.
> > 
> > Assuming the review goes through with no issues, I would like to
> > suggest the following acceptance options:
> > 
> > option 1) I can create a separate side branch for connection tracking
> > offload, and once Paul submits the final patch of this feature and
> > the mailing list review is complete, I can send you a full pull
> > request with everything included.
> > 
> > option 2) you apply both patchsets directly to net-next individually
> > (the normal process).
> > 
> > Please let me know what works better for you.
> > 
> > Personally I prefer option 1), so we won't end up stuck with only one
> > half of the connection tracking series if the review of the 2nd part
> > doesn't go as planned.
> 
> I'm fine with option #1 and will wait for that to appear in one of

Cool, will do option #1 then.

> your future pull requests.  It looks like patch #1 got some feedback
> and needs some modifications first though.
> 

Yes, Paul will send v3, and I will wait for all the needed ACKs and
reviews for this patchset and the ones to follow.

Thanks,
Saeed.


Thread overview: 23+ messages
2020-01-21 16:16 [PATCH net-next 00/13] Handle multi chain hardware misses Paul Blakey
2020-01-21 16:16 ` [PATCH net-next 01/13] net: sched: support skb chain ext in tc classification path Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 02/13] net/mlx5: Add new driver lib for mappings unique ids to data Paul Blakey
2020-01-21 19:04   ` Leon Romanovsky
2020-01-22 12:17     ` Paul Blakey
2020-01-22 13:51       ` Leon Romanovsky
2020-01-21 16:16 ` [PATCH net-next-mlx5 03/13] net/mlx5: E-Switch, Move source port on reg_c0 to the upper 16 bits Paul Blakey
2020-01-21 19:08   ` Leon Romanovsky
2020-01-22 13:42     ` Paul Blakey
2020-01-22 13:50       ` Leon Romanovsky
2020-01-21 16:16 ` [PATCH net-next-mlx5 04/13] net/mlx5: E-Switch, Get reg_c0 value on CQE Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 05/13] net/mlx5: E-Switch, Mark miss packets with new chain id mapping Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 06/13] net/mlx5e: Rx, Split rep rx mpwqe handler from nic Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 07/13] net/mlx5: E-Switch, Restore chain id on miss Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 08/13] net/mlx5e: Allow re-allocating mod header actions Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 09/13] net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 10/13] net/mlx5e: Disallow inserting vxlan/vlan egress rules without decap/pop Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 11/13] net/mlx5e: Support inner header rewrite with goto action Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 12/13] net/mlx5: E-Switch, Get reg_c1 value on miss Paul Blakey
2020-01-21 16:16 ` [PATCH net-next-mlx5 13/13] net/mlx5e: Restore tunnel metadata " Paul Blakey
2020-01-21 21:18 ` [PATCH net-next 00/13] Handle multi chain hardware misses Saeed Mahameed
2020-01-23  9:54   ` David Miller
2020-01-24 20:26     ` Saeed Mahameed
