* [PATCH v2 00/15] mlx5 Rx tunnel offloading
@ 2018-04-10 13:34 Xueming Li
  2018-04-10 13:34 ` [PATCH v2 01/15] net/mlx5: support 16 hardware priorities Xueming Li
                   ` (14 more replies)
  0 siblings, 15 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

v2:
- Split into two series: public API and mlx5; this one is the second.
- Rebased on Adrien's rte flow overhaul:
  http://www.dpdk.org/ml/archives/dev/2018-April/095774.html
v1:
- Support new tunnel types MPLS-in-GRE and MPLS-in-UDP
- Remove deprecation notice for RSS level

This patchset supports mlx5 Rx tunnel checksum, inner RSS, and inner ptype offloading for the following tunnel types (a usage sketch follows the list):
- Standard VXLAN
- L3 VXLAN (no inner ethernet header)
- VXLAN-GPE
- MPLS-in-GRE
- MPLS-in-UDP
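
For example, once this series is applied, a VXLAN flow with inner RSS
could be created along the following lines (a minimal sketch assuming
the rte_flow_action_rss layout from the 18.05 rte_flow overhaul; port
and queue numbers are illustrative):

  #include <rte_ethdev.h>
  #include <rte_flow.h>

  static struct rte_flow *
  vxlan_inner_rss(uint16_t port_id, struct rte_flow_error *err)
  {
          /* Pattern: eth / ipv4 / udp / vxlan / end. */
          static const struct rte_flow_item pattern[] = {
                  { .type = RTE_FLOW_ITEM_TYPE_ETH },
                  { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
                  { .type = RTE_FLOW_ITEM_TYPE_UDP },
                  { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
                  { .type = RTE_FLOW_ITEM_TYPE_END },
          };
          static const uint16_t queues[] = { 1, 2 };
          const struct rte_flow_action_rss rss = {
                  .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
                  .level = 1, /* hash on inner headers */
                  .types = ETH_RSS_IP,
                  /* .key/.key_len left zero: PMD default RSS key. */
                  .queue_num = 2,
                  .queue = queues,
          };
          const struct rte_flow_action actions[] = {
                  { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
          };
          const struct rte_flow_attr attr = { .ingress = 1 };

          return rte_flow_create(port_id, &attr, pattern, actions, err);
  }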


Xueming Li (15):
  net/mlx5: support 16 hardware priorities
  net/mlx5: support GRE tunnel flow
  net/mlx5: support L3 vxlan flow
  net/mlx5: support Rx tunnel type identification
  net/mlx5: support tunnel inner checksum offloads
  net/mlx5: split flow RSS handling logic
  net/mlx5: support tunnel RSS level
  net/mlx5: add hardware flow debug dump
  net/mlx5: introduce VXLAN-GPE tunnel type
  net/mlx5: allow flow tunnel ID 0 with outer pattern
  net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
  doc: update mlx5 guide on tunnel offloading
  net/mlx5: setup RSS flow regardless of queue count
  net/mlx5: fix invalid flow item check
  net/mlx5: support RSS configuration in isolated mode

 doc/guides/nics/mlx5.rst              |   4 +-
 drivers/net/mlx5/Makefile             |   7 +-
 drivers/net/mlx5/mlx5.c               |  29 ++
 drivers/net/mlx5/mlx5.h               |   9 +
 drivers/net/mlx5/mlx5_flow.c          | 920 ++++++++++++++++++++++++++++------
 drivers/net/mlx5/mlx5_glue.c          |  16 +
 drivers/net/mlx5/mlx5_glue.h          |   8 +
 drivers/net/mlx5/mlx5_rxq.c           |  80 ++-
 drivers/net/mlx5/mlx5_rxtx.c          |  33 +-
 drivers/net/mlx5/mlx5_rxtx.h          |  11 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  21 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  17 +-
 drivers/net/mlx5/mlx5_trigger.c       |   8 -
 drivers/net/mlx5/mlx5_utils.h         |   6 +
 14 files changed, 951 insertions(+), 218 deletions(-)

-- 
2.13.3


* [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 14:41   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 02/15] net/mlx5: support GRE tunnel flow Xueming Li
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Adjust the flow priority mapping to adapt to the new hardware support
for 16 Verbs flow priorities (a mapping sketch follows the list):
0-3: RTE FLOW tunnel rule
4-7: RTE FLOW non-tunnel rule
8-15: PMD control flow
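
A sketch (not part of the patch) of the resulting mapping on devices
exposing 16 Verbs priorities, mirroring mlx5_flow_update_priority()
below:

  static unsigned int
  verbs_priority(unsigned int rte_prio, int tunnel, unsigned int sub_prio)
  {
          unsigned int prio = rte_prio * 8; /* 8 Verbs slots per level */

          if (!tunnel)
                  prio += 4; /* non-tunnel rules use the upper half */
          /* sub_prio is 0-2 depending on the hash Rx queue type. */
          return prio + sub_prio;
  }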

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5.c         |  10 ++++
 drivers/net/mlx5/mlx5.h         |   8 +++
 drivers/net/mlx5/mlx5_flow.c    | 107 ++++++++++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_trigger.c |   8 ---
 4 files changed, 100 insertions(+), 33 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cfab55897..a1f2799e5 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -197,6 +197,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
 		priv->txqs_n = 0;
 		priv->txqs = NULL;
 	}
+	mlx5_flow_delete_drop_queue(dev);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
@@ -993,6 +994,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 		mlx5_set_link_up(eth_dev);
 		/* Store device configuration on private structure. */
 		priv->config = config;
+		/* Create drop queue. */
+		err = mlx5_flow_create_drop_queue(eth_dev);
+		if (err) {
+			DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
+				eth_dev->data->port_id, strerror(rte_errno));
+			goto port_error;
+		}
+		/* Supported flow priority number detection. */
+		mlx5_flow_priorities_detect(eth_dev);
 		continue;
 port_error:
 		if (priv)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 63b24e6bb..708272f6d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -89,6 +89,8 @@ struct mlx5_dev_config {
 	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
 	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
 	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
+	unsigned int flow_priority_shift; /* Non-tunnel flow priority shift. */
+	unsigned int control_flow_priority; /* Control flow priority. */
 	unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */
 	unsigned int ind_table_max_size; /* Maximum indirection table size. */
 	int txq_inline; /* Maximum packet size for inlining. */
@@ -105,6 +107,11 @@ enum mlx5_verbs_alloc_type {
 	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
 };
 
+/* 8 Verbs priorities per flow. */
+#define MLX5_VERBS_FLOW_PRIO_8 8
+/* 4 Verbs priorities per flow. */
+#define MLX5_VERBS_FLOW_PRIO_4 4
+
 /**
  * Verbs allocator needs a context to know in the callback which kind of
  * resources it is allocating.
@@ -253,6 +260,7 @@ int mlx5_traffic_restart(struct rte_eth_dev *dev);
 
 /* mlx5_flow.c */
 
+void mlx5_flow_priorities_detect(struct rte_eth_dev *dev);
 int mlx5_flow_validate(struct rte_eth_dev *dev,
 		       const struct rte_flow_attr *attr,
 		       const struct rte_flow_item items[],
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 288610620..394760418 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -32,9 +32,6 @@
 #include "mlx5_prm.h"
 #include "mlx5_glue.h"
 
-/* Define minimal priority for control plane flows. */
-#define MLX5_CTRL_FLOW_PRIORITY 4
-
 /* Internet Protocol versions. */
 #define MLX5_IPV4 4
 #define MLX5_IPV6 6
@@ -129,7 +126,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_SRC_PORT_TCP |
 				IBV_RX_HASH_DST_PORT_TCP),
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
-		.flow_priority = 1,
+		.flow_priority = 0,
 		.ip_version = MLX5_IPV4,
 	},
 	[HASH_RXQ_UDPV4] = {
@@ -138,7 +135,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_SRC_PORT_UDP |
 				IBV_RX_HASH_DST_PORT_UDP),
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
-		.flow_priority = 1,
+		.flow_priority = 0,
 		.ip_version = MLX5_IPV4,
 	},
 	[HASH_RXQ_IPV4] = {
@@ -146,7 +143,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_DST_IPV4),
 		.dpdk_rss_hf = (ETH_RSS_IPV4 |
 				ETH_RSS_FRAG_IPV4),
-		.flow_priority = 2,
+		.flow_priority = 1,
 		.ip_version = MLX5_IPV4,
 	},
 	[HASH_RXQ_TCPV6] = {
@@ -155,7 +152,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_SRC_PORT_TCP |
 				IBV_RX_HASH_DST_PORT_TCP),
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_TCP,
-		.flow_priority = 1,
+		.flow_priority = 0,
 		.ip_version = MLX5_IPV6,
 	},
 	[HASH_RXQ_UDPV6] = {
@@ -164,7 +161,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_SRC_PORT_UDP |
 				IBV_RX_HASH_DST_PORT_UDP),
 		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_UDP,
-		.flow_priority = 1,
+		.flow_priority = 0,
 		.ip_version = MLX5_IPV6,
 	},
 	[HASH_RXQ_IPV6] = {
@@ -172,13 +169,13 @@ const struct hash_rxq_init hash_rxq_init[] = {
 				IBV_RX_HASH_DST_IPV6),
 		.dpdk_rss_hf = (ETH_RSS_IPV6 |
 				ETH_RSS_FRAG_IPV6),
-		.flow_priority = 2,
+		.flow_priority = 1,
 		.ip_version = MLX5_IPV6,
 	},
 	[HASH_RXQ_ETH] = {
 		.hash_fields = 0,
 		.dpdk_rss_hf = 0,
-		.flow_priority = 3,
+		.flow_priority = 2,
 	},
 };
 
@@ -536,6 +533,8 @@ mlx5_flow_item_validate(const struct rte_flow_item *item,
 /**
  * Extract attribute to the parser.
  *
+ * @param dev
+ *   Pointer to Ethernet device.
  * @param[in] attr
  *   Flow rule attributes.
  * @param[out] error
@@ -545,9 +544,12 @@ mlx5_flow_item_validate(const struct rte_flow_item *item,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
+mlx5_flow_convert_attributes(struct rte_eth_dev *dev,
+			     const struct rte_flow_attr *attr,
 			     struct rte_flow_error *error)
 {
+	struct priv *priv = dev->data->dev_private;
+
 	if (attr->group) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -555,7 +557,7 @@ mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
 				   "groups are not supported");
 		return -rte_errno;
 	}
-	if (attr->priority && attr->priority != MLX5_CTRL_FLOW_PRIORITY) {
+	if (attr->priority > priv->config.control_flow_priority) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
 				   NULL,
@@ -900,30 +902,38 @@ mlx5_flow_convert_allocate(unsigned int size, struct rte_flow_error *error)
  * Make inner packet matching with an higher priority from the non Inner
  * matching.
  *
+ * @param dev
+ *   Pointer to Ethernet device.
  * @param[in, out] parser
  *   Internal parser structure.
  * @param attr
  *   User flow attribute.
  */
 static void
-mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
+mlx5_flow_update_priority(struct rte_eth_dev *dev,
+			  struct mlx5_flow_parse *parser,
 			  const struct rte_flow_attr *attr)
 {
+	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
+	uint16_t priority;
 
+	if (priv->config.flow_priority_shift == 1)
+		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
+	else
+		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
+	if (!parser->inner)
+		priority += priv->config.flow_priority_shift;
 	if (parser->drop) {
-		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
-			attr->priority +
-			hash_rxq_init[HASH_RXQ_ETH].flow_priority;
+		parser->queue[HASH_RXQ_ETH].ibv_attr->priority = priority +
+				hash_rxq_init[HASH_RXQ_ETH].flow_priority;
 		return;
 	}
 	for (i = 0; i != hash_rxq_init_n; ++i) {
-		if (parser->queue[i].ibv_attr) {
-			parser->queue[i].ibv_attr->priority =
-				attr->priority +
-				hash_rxq_init[i].flow_priority -
-				(parser->inner ? 1 : 0);
-		}
+		if (!parser->queue[i].ibv_attr)
+			continue;
+		parser->queue[i].ibv_attr->priority = priority +
+				hash_rxq_init[i].flow_priority;
 	}
 }
 
@@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 		.layer = HASH_RXQ_ETH,
 		.mark_id = MLX5_FLOW_MARK_DEFAULT,
 	};
-	ret = mlx5_flow_convert_attributes(attr, error);
+	ret = mlx5_flow_convert_attributes(dev, attr, error);
 	if (ret)
 		return ret;
 	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
@@ -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 	 */
 	if (!parser->drop)
 		mlx5_flow_convert_finalise(parser);
-	mlx5_flow_update_priority(parser, attr);
+	mlx5_flow_update_priority(dev, parser, attr);
 exit_free:
 	/* Only verification is expected, all resources should be released. */
 	if (!parser->create) {
@@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 	const struct rte_flow_attr attr = {
 		.ingress = 1,
-		.priority = MLX5_CTRL_FLOW_PRIORITY,
+		.priority = priv->config.control_flow_priority,
 	};
 	struct rte_flow_item items[] = {
 		{
@@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
 	}
 	return 0;
 }
+
+/**
+ * Detect number of Verbs flow priorities supported.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ */
+void
+mlx5_flow_priorities_detect(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+	uint32_t verb_priorities = MLX5_VERBS_FLOW_PRIO_8 * 2;
+	struct {
+		struct ibv_flow_attr attr;
+		struct ibv_flow_spec_eth eth;
+		struct ibv_flow_spec_action_drop drop;
+	} flow_attr = {
+		.attr = {
+			.num_of_specs = 2,
+			.priority = verb_priorities - 1,
+		},
+		.eth = {
+			.type = IBV_FLOW_SPEC_ETH,
+			.size = sizeof(struct ibv_flow_spec_eth),
+		},
+		.drop = {
+			.size = sizeof(struct ibv_flow_spec_action_drop),
+			.type = IBV_FLOW_SPEC_ACTION_DROP,
+		},
+	};
+	struct ibv_flow *flow;
+
+	if (priv->config.control_flow_priority)
+		return;
+	flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
+				      &flow_attr.attr);
+	if (flow) {
+		priv->config.flow_priority_shift = MLX5_VERBS_FLOW_PRIO_8 / 2;
+		claim_zero(mlx5_glue->destroy_flow(flow));
+	} else {
+		priv->config.flow_priority_shift = 1;
+		verb_priorities = verb_priorities / 2;
+	}
+	priv->config.control_flow_priority = 1;
+	DRV_LOG(INFO, "port %u Verbs flow priorities: %d",
+		dev->data->port_id, verb_priorities);
+}
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6bb4ffb14..d80a2e688 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->dev_started = 1;
-	ret = mlx5_flow_create_drop_queue(dev);
-	if (ret) {
-		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
-			dev->data->port_id, strerror(rte_errno));
-		goto error;
-	}
 	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx queues",
 		dev->data->port_id);
 	rte_mempool_walk(mlx5_mp2mr_iter, priv);
@@ -202,7 +196,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
 	mlx5_traffic_disable(dev);
 	mlx5_txq_stop(dev);
 	mlx5_rxq_stop(dev);
-	mlx5_flow_delete_drop_queue(dev);
 	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
@@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
 	mlx5_rxq_stop(dev);
 	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv->mr))
 		mlx5_mr_release(mr);
-	mlx5_flow_delete_drop_queue(dev);
 }
 
 /**
-- 
2.13.3


* [PATCH v2 02/15] net/mlx5: support GRE tunnel flow
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
  2018-04-10 13:34 ` [PATCH v2 01/15] net/mlx5: support 16 hardware priorities Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 13:34 ` [PATCH v2 03/15] net/mlx5: support L3 vxlan flow Xueming Li
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Support GRE tunnel flows.
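
For instance, this makes a pattern like the following valid (an
illustrative sketch; actions and masks omitted):

  /* eth / ipv4 / gre / ipv4 : GRE tunnel carrying an inner IPv4. */
  const struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
          { .type = RTE_FLOW_ITEM_TYPE_GRE },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 }, /* inner header */
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };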

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 69 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 394760418..026952b46 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -87,6 +87,11 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 		       const void *default_mask,
 		       struct mlx5_flow_data *data);
 
+static int
+mlx5_flow_create_gre(const struct rte_flow_item *item,
+		       const void *default_mask,
+		       struct mlx5_flow_data *data);
+
 struct mlx5_flow_parse;
 
 static void
@@ -229,6 +234,10 @@ struct rte_flow {
 		__VA_ARGS__, RTE_FLOW_ITEM_TYPE_END, \
 	}
 
+#define IS_TUNNEL(type) ( \
+	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
+	(type) == RTE_FLOW_ITEM_TYPE_GRE)
+
 /** Structure to generate a simple graph of layers supported by the NIC. */
 struct mlx5_flow_items {
 	/** List of possible actions for these items. */
@@ -282,7 +291,8 @@ static const enum rte_flow_action_type valid_actions[] = {
 static const struct mlx5_flow_items mlx5_flow_items[] = {
 	[RTE_FLOW_ITEM_TYPE_END] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
-			       RTE_FLOW_ITEM_TYPE_VXLAN),
+			       RTE_FLOW_ITEM_TYPE_VXLAN,
+			       RTE_FLOW_ITEM_TYPE_GRE),
 	},
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VLAN,
@@ -314,7 +324,8 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV4] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_UDP,
-			       RTE_FLOW_ITEM_TYPE_TCP),
+			       RTE_FLOW_ITEM_TYPE_TCP,
+			       RTE_FLOW_ITEM_TYPE_GRE),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_ipv4){
 			.hdr = {
@@ -331,7 +342,8 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV6] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_UDP,
-			       RTE_FLOW_ITEM_TYPE_TCP),
+			       RTE_FLOW_ITEM_TYPE_TCP,
+			       RTE_FLOW_ITEM_TYPE_GRE),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_ipv6){
 			.hdr = {
@@ -384,6 +396,19 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.convert = mlx5_flow_create_tcp,
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
+	[RTE_FLOW_ITEM_TYPE_GRE] = {
+		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
+			       RTE_FLOW_ITEM_TYPE_IPV4,
+			       RTE_FLOW_ITEM_TYPE_IPV6),
+		.actions = valid_actions,
+		.mask = &(const struct rte_flow_item_gre){
+			.protocol = -1,
+		},
+		.default_mask = &rte_flow_item_gre_mask,
+		.mask_sz = sizeof(struct rte_flow_item_gre),
+		.convert = mlx5_flow_create_gre,
+		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
+	},
 	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH),
 		.actions = valid_actions,
@@ -399,7 +424,7 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 
 /** Structure to pass to the conversion function. */
 struct mlx5_flow_parse {
-	uint32_t inner; /**< Set once VXLAN is encountered. */
+	uint32_t inner; /**< Verbs value, set once tunnel is encountered. */
 	uint32_t create:1;
 	/**< Whether resources should remain after a validate. */
 	uint32_t drop:1; /**< Target is a drop queue. */
@@ -832,13 +857,13 @@ mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
 					      cur_item->mask_sz);
 		if (ret)
 			goto exit_item_not_supported;
-		if (items->type == RTE_FLOW_ITEM_TYPE_VXLAN) {
+		if (IS_TUNNEL(items->type)) {
 			if (parser->inner) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
 						   items,
-						   "cannot recognize multiple"
-						   " VXLAN encapsulations");
+						   "Cannot recognize multiple"
+						   " tunnel encapsulations.");
 				return -rte_errno;
 			}
 			parser->inner = IBV_FLOW_SPEC_INNER;
@@ -1634,6 +1659,36 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 }
 
 /**
+ * Convert GRE item to Verbs specification.
+ *
+ * @param item[in]
+ *   Item specification.
+ * @param default_mask[in]
+ *   Default bit-masks to use when item->mask is not provided.
+ * @param data[in, out]
+ *   User structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_create_gre(const struct rte_flow_item *item __rte_unused,
+		     const void *default_mask __rte_unused,
+		     struct mlx5_flow_data *data)
+{
+	struct mlx5_flow_parse *parser = data->parser;
+	unsigned int size = sizeof(struct ibv_flow_spec_tunnel);
+	struct ibv_flow_spec_tunnel tunnel = {
+		.type = parser->inner | IBV_FLOW_SPEC_VXLAN_TUNNEL,
+		.size = size,
+	};
+
+	parser->inner = IBV_FLOW_SPEC_INNER;
+	mlx5_flow_create_copy(parser, &tunnel, size);
+	return 0;
+}
+
+/**
  * Convert mark/flag action to Verbs specification.
  *
  * @param parser
-- 
2.13.3


* [PATCH v2 03/15] net/mlx5: support L3 vxlan flow
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
  2018-04-10 13:34 ` [PATCH v2 01/15] net/mlx5: support 16 hardware priorities Xueming Li
  2018-04-10 13:34 ` [PATCH v2 02/15] net/mlx5: support GRE tunnel flow Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 14:53   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification Xueming Li
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch adds L3 VXLAN support: compared to the standard VXLAN
protocol, the packet carries no inner L2 header.
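
An L3 VXLAN pattern would hence look as follows (illustrative sketch):

  /* eth / ipv4 / udp / vxlan / ipv4 : the VXLAN item is followed
   * directly by the inner IPv4 item, without an inner ETH item. */
  const struct rte_flow_item pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
          { .type = RTE_FLOW_ITEM_TYPE_UDP },
          { .type = RTE_FLOW_ITEM_TYPE_VXLAN },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };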

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 026952b46..870d05250 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -410,7 +410,9 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
 	},
 	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH),
+		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
+			       RTE_FLOW_ITEM_TYPE_IPV4,
+			       RTE_FLOW_ITEM_TYPE_IPV6),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_vxlan){
 			.vni = "\xff\xff\xff",
-- 
2.13.3


* [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (2 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 03/15] net/mlx5: support L3 vxlan flow Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 15:17   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads Xueming Li
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch introduces tunnel type identification based on flow rules.
If flows of multiple tunnel types are built on the same queue,
RTE_PTYPE_TUNNEL_MASK is returned as the tunnel packet type; bits in
the flow mark can then be used as a tunnel type identifier.
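
On the Rx side, an application could consume the reported tunnel type
along these lines (a sketch, not part of the patch):

  #include <rte_mbuf.h>

  static void
  inspect_tunnel_ptype(const struct rte_mbuf *m)
  {
          uint32_t tun = m->packet_type & RTE_PTYPE_TUNNEL_MASK;

          if (!tun)
                  return; /* not a tunneled packet */
          if (tun == RTE_PTYPE_TUNNEL_MASK) {
                  /*
                   * Flows of multiple tunnel types target this queue;
                   * tell the exact type apart by other means, e.g. a
                   * per-flow MARK action (m->hash.fdir.hi when
                   * PKT_RX_FDIR_ID is set in m->ol_flags).
                   */
          } else {
                  /* tun is RTE_PTYPE_TUNNEL_VXLAN, _GRE, etc. */
          }
  }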

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c          | 125 +++++++++++++++++++++++++++++-----
 drivers/net/mlx5/mlx5_rxq.c           |  11 ++-
 drivers/net/mlx5/mlx5_rxtx.c          |  12 ++--
 drivers/net/mlx5/mlx5_rxtx.h          |   9 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  21 +++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  17 +++--
 6 files changed, 157 insertions(+), 38 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 870d05250..65d7a9b62 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -222,6 +222,7 @@ struct rte_flow {
 	struct rte_flow_action_rss rss_conf; /**< RSS configuration */
 	uint16_t (*queues)[]; /**< Queues indexes to use. */
 	uint8_t rss_key[40]; /**< copy of the RSS key. */
+	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
 	struct ibv_counter_set *cs; /**< Holds the counters for the rule. */
 	struct mlx5_flow_counter_stats counter_stats;/**<The counter stats. */
 	struct mlx5_flow frxq[RTE_DIM(hash_rxq_init)];
@@ -238,6 +239,19 @@ struct rte_flow {
 	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
 	(type) == RTE_FLOW_ITEM_TYPE_GRE)
 
+const uint32_t flow_ptype[] = {
+	[RTE_FLOW_ITEM_TYPE_VXLAN] = RTE_PTYPE_TUNNEL_VXLAN,
+	[RTE_FLOW_ITEM_TYPE_GRE] = RTE_PTYPE_TUNNEL_GRE,
+};
+
+#define PTYPE_IDX(t) ((RTE_PTYPE_TUNNEL_MASK & (t)) >> 12)
+
+const uint32_t ptype_ext[] = {
+	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
+					      RTE_PTYPE_L4_UDP,
+	[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)] = RTE_PTYPE_TUNNEL_GRE,
+};
+
 /** Structure to generate a simple graph of layers supported by the NIC. */
 struct mlx5_flow_items {
 	/** List of possible actions for these items. */
@@ -437,6 +451,7 @@ struct mlx5_flow_parse {
 	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queues indexes to use. */
 	uint8_t rss_key[40]; /**< copy of the RSS key. */
 	enum hash_rxq_type layer; /**< Last pattern layer detected. */
+	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
 	struct ibv_counter_set *cs; /**< Holds the counter set for the rule */
 	struct {
 		struct ibv_flow_attr *ibv_attr;
@@ -860,7 +875,7 @@ mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
 		if (ret)
 			goto exit_item_not_supported;
 		if (IS_TUNNEL(items->type)) {
-			if (parser->inner) {
+			if (parser->tunnel) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
 						   items,
@@ -869,6 +884,7 @@ mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
 				return -rte_errno;
 			}
 			parser->inner = IBV_FLOW_SPEC_INNER;
+			parser->tunnel = flow_ptype[items->type];
 		}
 		if (parser->drop) {
 			parser->queue[HASH_RXQ_ETH].offset += cur_item->dst_sz;
@@ -1165,6 +1181,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 	}
 	/* Third step. Conversion parse, fill the specifications. */
 	parser->inner = 0;
+	parser->tunnel = 0;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
 		struct mlx5_flow_data data = {
 			.parser = parser,
@@ -1633,6 +1650,7 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 
 	id.vni[0] = 0;
 	parser->inner = IBV_FLOW_SPEC_INNER;
+	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)];
 	if (spec) {
 		if (!mask)
 			mask = default_mask;
@@ -1686,6 +1704,7 @@ mlx5_flow_create_gre(const struct rte_flow_item *item __rte_unused,
 	};
 
 	parser->inner = IBV_FLOW_SPEC_INNER;
+	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
 	mlx5_flow_create_copy(parser, &tunnel, size);
 	return 0;
 }
@@ -1864,7 +1883,8 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
 				      parser->rss_conf.key_len,
 				      hash_fields,
 				      parser->rss_conf.queue,
-				      parser->rss_conf.queue_num);
+				      parser->rss_conf.queue_num,
+				      parser->tunnel);
 		if (flow->frxq[i].hrxq)
 			continue;
 		flow->frxq[i].hrxq =
@@ -1873,7 +1893,8 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
 				      parser->rss_conf.key_len,
 				      hash_fields,
 				      parser->rss_conf.queue,
-				      parser->rss_conf.queue_num);
+				      parser->rss_conf.queue_num,
+				      parser->tunnel);
 		if (!flow->frxq[i].hrxq) {
 			return rte_flow_error_set(error, ENOMEM,
 						  RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -1885,6 +1906,40 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
 }
 
 /**
+ * RXQ update after flow rule creation.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to the flow rule.
+ */
+static void
+mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow *flow)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int i;
+
+	if (!dev->data->dev_started)
+		return;
+	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
+		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
+						 [(*flow->queues)[i]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
+
+		rxq_data->mark |= flow->mark;
+		if (!tunnel)
+			continue;
+		rxq_ctrl->tunnel_types[tunnel] += 1;
+		if (rxq_data->tunnel != flow->tunnel)
+			rxq_data->tunnel = rxq_data->tunnel ?
+					   RTE_PTYPE_TUNNEL_MASK :
+					   flow->tunnel;
+	}
+}
+
+/**
  * Complete flow rule creation.
  *
  * @param dev
@@ -1944,12 +1999,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev *dev,
 				   NULL, "internal error in flow creation");
 		goto error;
 	}
-	for (i = 0; i != parser->rss_conf.queue_num; ++i) {
-		struct mlx5_rxq_data *q =
-			(*priv->rxqs)[parser->rss_conf.queue[i]];
-
-		q->mark |= parser->mark;
-	}
+	mlx5_flow_create_update_rxqs(dev, flow);
 	return 0;
 error:
 	ret = rte_errno; /* Save rte_errno before cleanup. */
@@ -2022,6 +2072,7 @@ mlx5_flow_list_create(struct rte_eth_dev *dev,
 	}
 	/* Copy configuration. */
 	flow->queues = (uint16_t (*)[])(flow + 1);
+	flow->tunnel = parser.tunnel;
 	flow->rss_conf = (struct rte_flow_action_rss){
 		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
 		.level = 0,
@@ -2113,9 +2164,38 @@ mlx5_flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (flow->drop || !flow->mark)
+	if (flow->drop || !dev->data->dev_started)
 		goto free;
-	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
+	for (i = 0; flow->tunnel && i != flow->rss_conf.queue_num; ++i) {
+		/* Update queue tunnel type. */
+		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
+						 [(*flow->queues)[i]];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
+		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
+
+		RTE_ASSERT(rxq_ctrl->tunnel_types[tunnel] > 0);
+		rxq_ctrl->tunnel_types[tunnel] -= 1;
+		if (!rxq_ctrl->tunnel_types[tunnel]) {
+			/* Update tunnel type. */
+			uint8_t j;
+			uint8_t types = 0;
+			uint8_t last;
+
+			for (j = 0; j < RTE_DIM(rxq_ctrl->tunnel_types); j++)
+				if (rxq_ctrl->tunnel_types[j]) {
+					types += 1;
+					last = j;
+				}
+			/* Keep same if more than one tunnel types left. */
+			if (types == 1)
+				rxq_data->tunnel = ptype_ext[last];
+			else if (types == 0)
+				/* No tunnel type left. */
+				rxq_data->tunnel = 0;
+		}
+	}
+	for (i = 0; flow->mark && i != flow->rss_conf.queue_num; ++i) {
 		struct rte_flow *tmp;
 		int mark = 0;
 
@@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *flow;
+	unsigned int i;
 
 	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
-		unsigned int i;
 		struct mlx5_ind_table_ibv *ind_tbl = NULL;
 
 		if (flow->drop) {
@@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list)
 		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data->port_id,
 			(void *)flow);
 	}
+	/* Cleanup Rx queue tunnel info. */
+	for (i = 0; i != priv->rxqs_n; ++i) {
+		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
+		struct mlx5_rxq_ctrl *rxq_ctrl =
+			container_of(q, struct mlx5_rxq_ctrl, rxq);
+
+		memset((void *)rxq_ctrl->tunnel_types, 0,
+		       sizeof(rxq_ctrl->tunnel_types));
+		q->tunnel = 0;
+	}
 }
 
 /**
@@ -2429,7 +2519,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
 					      flow->rss_conf.key_len,
 					      hash_rxq_init[i].hash_fields,
 					      flow->rss_conf.queue,
-					      flow->rss_conf.queue_num);
+					      flow->rss_conf.queue_num,
+					      flow->tunnel);
 			if (flow->frxq[i].hrxq)
 				goto flow_create;
 			flow->frxq[i].hrxq =
@@ -2437,7 +2528,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
 					      flow->rss_conf.key_len,
 					      hash_rxq_init[i].hash_fields,
 					      flow->rss_conf.queue,
-					      flow->rss_conf.queue_num);
+					      flow->rss_conf.queue_num,
+					      flow->tunnel);
 			if (!flow->frxq[i].hrxq) {
 				DRV_LOG(DEBUG,
 					"port %u flow %p cannot be applied",
@@ -2459,10 +2551,7 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
 			DRV_LOG(DEBUG, "port %u flow %p applied",
 				dev->data->port_id, (void *)flow);
 		}
-		if (!flow->mark)
-			continue;
-		for (i = 0; i != flow->rss_conf.queue_num; ++i)
-			(*priv->rxqs)[flow->rss_conf.queue[i]]->mark = 1;
+		mlx5_flow_create_update_rxqs(dev, flow);
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1e4354ab3..351acfc0f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
  *   first queue index will be taken for the indirection table.
  * @param queues_n
  *   Number of queues.
+ * @param tunnel
+ *   Tunnel type.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
@@ -1394,7 +1396,7 @@ struct mlx5_hrxq *
 mlx5_hrxq_new(struct rte_eth_dev *dev,
 	      const uint8_t *rss_key, uint32_t rss_key_len,
 	      uint64_t hash_fields,
-	      const uint16_t *queues, uint32_t queues_n)
+	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_hrxq *hrxq;
@@ -1438,6 +1440,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 	hrxq->qp = qp;
 	hrxq->rss_key_len = rss_key_len;
 	hrxq->hash_fields = hash_fields;
+	hrxq->tunnel = tunnel;
 	memcpy(hrxq->rss_key, rss_key, rss_key_len);
 	rte_atomic32_inc(&hrxq->refcnt);
 	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next);
@@ -1466,6 +1469,8 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
  *   first queue index will be taken for the indirection table.
  * @param queues_n
  *   Number of queues.
+ * @param tunnel
+ *   Tunnel type.
  *
  * @return
  *   An hash Rx queue on success.
@@ -1474,7 +1479,7 @@ struct mlx5_hrxq *
 mlx5_hrxq_get(struct rte_eth_dev *dev,
 	      const uint8_t *rss_key, uint32_t rss_key_len,
 	      uint64_t hash_fields,
-	      const uint16_t *queues, uint32_t queues_n)
+	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_hrxq *hrxq;
@@ -1489,6 +1494,8 @@ mlx5_hrxq_get(struct rte_eth_dev *dev,
 			continue;
 		if (hrxq->hash_fields != hash_fields)
 			continue;
+		if (hrxq->tunnel != tunnel)
+			continue;
 		ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
 		if (!ind_tbl)
 			continue;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 1f422c70b..d061dfc8a 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -34,7 +34,7 @@
 #include "mlx5_prm.h"
 
 static __rte_always_inline uint32_t
-rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe);
+rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe);
 
 static __rte_always_inline int
 mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
@@ -125,12 +125,14 @@ mlx5_set_ptype_table(void)
 	(*p)[0x8a] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
 		     RTE_PTYPE_L4_UDP;
 	/* Tunneled - L3 */
+	(*p)[0x40] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
 	(*p)[0x41] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L4_NONFRAG;
 	(*p)[0x42] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L4_NONFRAG;
+	(*p)[0xc0] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
 	(*p)[0xc1] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
 		     RTE_PTYPE_INNER_L4_NONFRAG;
@@ -1577,6 +1579,8 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 /**
  * Translate RX completion flags to packet type.
  *
+ * @param[in] rxq
+ *   Pointer to RX queue structure.
  * @param[in] cqe
  *   Pointer to CQE.
  *
@@ -1586,7 +1590,7 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   Packet type for struct rte_mbuf.
  */
 static inline uint32_t
-rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
+rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
 {
 	uint8_t idx;
 	uint8_t pinfo = cqe->pkt_info;
@@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
 	 * bit[7] = outer_l3_type
 	 */
 	idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
-	return mlx5_ptype_table[idx];
+	return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));
 }
 
 /**
@@ -1833,7 +1837,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			pkt = seg;
 			assert(len >= (rxq->crc_present << 2));
 			/* Update packet information. */
-			pkt->packet_type = rxq_cq_to_pkt_type(cqe);
+			pkt->packet_type = rxq_cq_to_pkt_type(rxq, cqe);
 			pkt->ol_flags = 0;
 			if (rss_hash_res && rxq->rss_hash) {
 				pkt->hash.rss = rss_hash_res;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index a702cb603..6866f6818 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -104,6 +104,7 @@ struct mlx5_rxq_data {
 	void *cq_uar; /* CQ user access region. */
 	uint32_t cqn; /* CQ number. */
 	uint8_t cq_arm_sn; /* CQ arm seq number. */
+	uint32_t tunnel; /* Tunnel information. */
 } __rte_cache_aligned;
 
 /* Verbs Rx queue elements. */
@@ -125,6 +126,7 @@ struct mlx5_rxq_ctrl {
 	struct mlx5_rxq_ibv *ibv; /* Verbs elements. */
 	struct mlx5_rxq_data rxq; /* Data path structure. */
 	unsigned int socket; /* CPU socket ID for allocations. */
+	uint32_t tunnel_types[16]; /* Tunnel type counter. */
 	unsigned int irq:1; /* Whether IRQ is enabled. */
 	uint16_t idx; /* Queue index. */
 };
@@ -145,6 +147,7 @@ struct mlx5_hrxq {
 	struct mlx5_ind_table_ibv *ind_table; /* Indirection table. */
 	struct ibv_qp *qp; /* Verbs queue pair. */
 	uint64_t hash_fields; /* Verbs Hash fields. */
+	uint32_t tunnel; /* Tunnel type. */
 	uint32_t rss_key_len; /* Hash key length in bytes. */
 	uint8_t rss_key[]; /* Hash key. */
 };
@@ -248,11 +251,13 @@ int mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev);
 struct mlx5_hrxq *mlx5_hrxq_new(struct rte_eth_dev *dev,
 				const uint8_t *rss_key, uint32_t rss_key_len,
 				uint64_t hash_fields,
-				const uint16_t *queues, uint32_t queues_n);
+				const uint16_t *queues, uint32_t queues_n,
+				uint32_t tunnel);
 struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint8_t *rss_key, uint32_t rss_key_len,
 				uint64_t hash_fields,
-				const uint16_t *queues, uint32_t queues_n);
+				const uint16_t *queues, uint32_t queues_n,
+				uint32_t tunnel);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_ibv_verify(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index bbe1818ef..9f9136108 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -551,6 +551,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
 	const uint64x1_t mbuf_init = vld1_u64(&rxq->mbuf_initializer);
 	const uint64x1_t r32_mask = vcreate_u64(0xffffffff);
 	uint64x2_t rearm0, rearm1, rearm2, rearm3;
+	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
 
 	if (rxq->mark) {
 		const uint32x4_t ft_def = vdupq_n_u32(MLX5_FLOW_MARK_DEFAULT);
@@ -583,14 +584,18 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
 	ptype = vshrn_n_u32(ptype_info, 10);
 	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
 	ptype = vorr_u16(ptype, op_err);
-	pkts[0]->packet_type =
-		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 6)];
-	pkts[1]->packet_type =
-		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 4)];
-	pkts[2]->packet_type =
-		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 2)];
-	pkts[3]->packet_type =
-		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 0)];
+	pt_idx0 = vget_lane_u8(vreinterpret_u8_u16(ptype), 6);
+	pt_idx1 = vget_lane_u8(vreinterpret_u8_u16(ptype), 4);
+	pt_idx2 = vget_lane_u8(vreinterpret_u8_u16(ptype), 2);
+	pt_idx3 = vget_lane_u8(vreinterpret_u8_u16(ptype), 0);
+	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
+			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
+	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
+			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
+	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
+			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
+	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
+			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
 	/* Fill flags for checksum and VLAN. */
 	pinfo = vandq_u32(ptype_info, ptype_ol_mask);
 	pinfo = vreinterpretq_u32_u8(
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index c088bcb51..d2492481d 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -542,6 +542,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
 	const __m128i mbuf_init =
 		_mm_loadl_epi64((__m128i *)&rxq->mbuf_initializer);
 	__m128i rearm0, rearm1, rearm2, rearm3;
+	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
 
 	/* Extract pkt_info field. */
 	pinfo0 = _mm_unpacklo_epi32(cqes[0], cqes[1]);
@@ -595,10 +596,18 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
 	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
 	op_err = _mm_srli_epi16(op_err, 8);
 	ptype = _mm_or_si128(ptype, op_err);
-	pkts[0]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 0)];
-	pkts[1]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 2)];
-	pkts[2]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 4)];
-	pkts[3]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 6)];
+	pt_idx0 = _mm_extract_epi8(ptype, 0);
+	pt_idx1 = _mm_extract_epi8(ptype, 2);
+	pt_idx2 = _mm_extract_epi8(ptype, 4);
+	pt_idx3 = _mm_extract_epi8(ptype, 6);
+	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
+			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
+	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
+			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
+	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
+			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
+	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
+			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
 	/* Fill flags for checksum and VLAN. */
 	pinfo = _mm_and_si128(pinfo, ptype_ol_mask);
 	pinfo = _mm_shuffle_epi8(cv_flag_sel, pinfo);
-- 
2.13.3


* [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (3 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 15:27   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 06/15] net/mlx5: split flow RSS handling logic Xueming Li
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch supports tunnel inner checksum offloads. Once a tunnel flow
is created and a tunnel packet type (RTE_PTYPE_TUNNEL_xxx) is
identified, PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent the
checksum result of the inner headers; the outer L3 and L4 header
checksums are always valid as soon as a tunnel is identified. If no
tunnel is identified, PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX
represent the checksum result of the outer L3 and L4 headers.
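
The resulting semantics on the application side could be summarized as
in this sketch (illustration only, not part of the patch):

  #include <rte_mbuf.h>

  static void
  check_rx_csum(const struct rte_mbuf *m)
  {
          int tunneled = (m->packet_type & RTE_PTYPE_TUNNEL_MASK) != 0;

          if ((m->ol_flags & PKT_RX_IP_CKSUM_MASK) ==
              PKT_RX_IP_CKSUM_GOOD) {
                  if (tunneled) {
                          /* Inner L3 checksum good; outer L3/L4 are
                           * implicitly valid. */
                  } else {
                          /* Outer L3 checksum good. */
                  }
          }
          /* PKT_RX_L4_CKSUM_* follows the same inner/outer logic. */
  }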

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c |  7 +++++--
 drivers/net/mlx5/mlx5_rxq.c  |  2 --
 drivers/net/mlx5/mlx5_rxtx.c | 18 ++++--------------
 drivers/net/mlx5/mlx5_rxtx.h |  1 -
 4 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 65d7a9b62..b3ad6dc85 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -829,6 +829,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 /**
  * Validate items.
  *
+ * @param dev
+ *   Pointer to Ethernet device.
  * @param[in] items
  *   Pattern specification (list terminated by the END pattern item).
  * @param[out] error
@@ -840,7 +842,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
+mlx5_flow_convert_items_validate(struct rte_eth_dev *dev __rte_unused,
+				 const struct rte_flow_item items[],
 				 struct rte_flow_error *error,
 				 struct mlx5_flow_parse *parser)
 {
@@ -1146,7 +1149,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
 	if (ret)
 		return ret;
-	ret = mlx5_flow_convert_items_validate(items, error, parser);
+	ret = mlx5_flow_convert_items_validate(dev, items, error, parser);
 	if (ret)
 		return ret;
 	mlx5_flow_convert_finalise(parser);
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 351acfc0f..073732e16 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1045,8 +1045,6 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	}
 	/* Toggle RX checksum offload if hardware supports it. */
 	tmpl->rxq.csum = !!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM);
-	tmpl->rxq.csum_l2tun = (!!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM) &&
-				priv->config.tunnel_en);
 	tmpl->rxq.hw_timestamp = !!(conf->offloads & DEV_RX_OFFLOAD_TIMESTAMP);
 	/* Configure VLAN stripping. */
 	tmpl->rxq.vlan_strip = !!(conf->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index d061dfc8a..285b2dbf0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -41,7 +41,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
 		 uint16_t cqe_cnt, uint32_t *rss_hash);
 
 static __rte_always_inline uint32_t
-rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe);
+rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe);
 
 uint32_t mlx5_ptype_table[] __rte_cache_aligned = {
 	[0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */
@@ -1728,8 +1728,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
 /**
  * Translate RX completion flags to offload flags.
  *
- * @param[in] rxq
- *   Pointer to RX queue structure.
  * @param[in] cqe
  *   Pointer to CQE.
  *
@@ -1737,7 +1735,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
  *   Offload flags (ol_flags) for struct rte_mbuf.
  */
 static inline uint32_t
-rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
+rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe)
 {
 	uint32_t ol_flags = 0;
 	uint16_t flags = rte_be_to_cpu_16(cqe->hdr_type_etc);
@@ -1749,14 +1747,6 @@ rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
 		TRANSPOSE(flags,
 			  MLX5_CQE_RX_L4_HDR_VALID,
 			  PKT_RX_L4_CKSUM_GOOD);
-	if ((cqe->pkt_info & MLX5_CQE_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
-		ol_flags |=
-			TRANSPOSE(flags,
-				  MLX5_CQE_RX_L3_HDR_VALID,
-				  PKT_RX_IP_CKSUM_GOOD) |
-			TRANSPOSE(flags,
-				  MLX5_CQE_RX_L4_HDR_VALID,
-				  PKT_RX_L4_CKSUM_GOOD);
 	return ol_flags;
 }
 
@@ -1855,8 +1845,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 						mlx5_flow_mark_get(mark);
 				}
 			}
-			if (rxq->csum | rxq->csum_l2tun)
-				pkt->ol_flags |= rxq_cq_to_ol_flags(rxq, cqe);
+			if (rxq->csum)
+				pkt->ol_flags |= rxq_cq_to_ol_flags(cqe);
 			if (rxq->vlan_strip &&
 			    (cqe->hdr_type_etc &
 			     rte_cpu_to_be_16(MLX5_CQE_VLAN_STRIPPED))) {
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 6866f6818..d35605b55 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -77,7 +77,6 @@ struct rxq_zip {
 /* RX queue descriptor. */
 struct mlx5_rxq_data {
 	unsigned int csum:1; /* Enable checksum offloading. */
-	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
 	unsigned int hw_timestamp:1; /* Enable HW timestamp. */
 	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
 	unsigned int crc_present:1; /* CRC must be subtracted. */
-- 
2.13.3


* [PATCH v2 06/15] net/mlx5: split flow RSS handling logic
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (4 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 15:28   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 07/15] net/mlx5: support tunnel RSS level Xueming Li
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch splits the flow RSS hash field handling logic out into a
dedicated function.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 97 +++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 42 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index b3ad6dc85..64658bc0e 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -992,13 +992,6 @@ mlx5_flow_update_priority(struct rte_eth_dev *dev,
 static void
 mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 {
-	const unsigned int ipv4 =
-		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
-	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 : HASH_RXQ_TCPV6;
-	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
-	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 : HASH_RXQ_TCPV4;
-	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 : HASH_RXQ_IPV4;
-	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
 	unsigned int i;
 
 	/* Remove any other flow not matching the pattern. */
@@ -1011,40 +1004,6 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 		}
 		return;
 	}
-	if (parser->layer == HASH_RXQ_ETH) {
-		goto fill;
-	} else {
-		/*
-		 * This layer becomes useless as the pattern define under
-		 * layers.
-		 */
-		rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
-		parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
-	}
-	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4. */
-	for (i = ohmin; i != (ohmax + 1); ++i) {
-		if (!parser->queue[i].ibv_attr)
-			continue;
-		rte_free(parser->queue[i].ibv_attr);
-		parser->queue[i].ibv_attr = NULL;
-	}
-	/* Remove impossible flow according to the RSS configuration. */
-	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
-	    parser->rss_conf.types) {
-		/* Remove any other flow. */
-		for (i = hmin; i != (hmax + 1); ++i) {
-			if ((i == parser->layer) ||
-			     (!parser->queue[i].ibv_attr))
-				continue;
-			rte_free(parser->queue[i].ibv_attr);
-			parser->queue[i].ibv_attr = NULL;
-		}
-	} else  if (!parser->queue[ip].ibv_attr) {
-		/* no RSS possible with the current configuration. */
-		parser->rss_conf.queue_num = 1;
-		return;
-	}
-fill:
 	/*
 	 * Fill missing layers in verbs specifications, or compute the correct
 	 * offset to allocate the memory space for the attributes and
@@ -1107,6 +1066,56 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 }
 
 /**
+ * Update flows according to pattern and RSS hash fields.
+ *
+ * @param[in, out] parser
+ *   Internal parser structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
+{
+	const unsigned int ipv4 =
+		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
+	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 : HASH_RXQ_TCPV6;
+	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
+	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 : HASH_RXQ_TCPV4;
+	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 : HASH_RXQ_IPV4;
+	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
+	unsigned int i;
+
+	if (parser->layer == HASH_RXQ_ETH)
+		return 0;
+	/* This layer becomes useless as the pattern define under layers. */
+	rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
+	parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
+	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4. */
+	for (i = ohmin; i != (ohmax + 1); ++i) {
+		if (!parser->queue[i].ibv_attr)
+			continue;
+		rte_free(parser->queue[i].ibv_attr);
+		parser->queue[i].ibv_attr = NULL;
+	}
+	/* Remove impossible flow according to the RSS configuration. */
+	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
+	    parser->rss_conf.types) {
+		/* Remove any other flow. */
+		for (i = hmin; i != (hmax + 1); ++i) {
+			if (i == parser->layer || !parser->queue[i].ibv_attr)
+				continue;
+			rte_free(parser->queue[i].ibv_attr);
+			parser->queue[i].ibv_attr = NULL;
+		}
+	} else if (!parser->queue[ip].ibv_attr) {
+		/* no RSS possible with the current configuration. */
+		parser->rss_conf.queue_num = 1;
+	}
+	return 0;
+}
+
+/**
  * Validate and convert a flow supported by the NIC.
  *
  * @param dev
@@ -1214,6 +1223,10 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 	 * configuration.
 	 */
-	if (!parser->drop)
+	if (!parser->drop) {
+		ret = mlx5_flow_convert_rss(parser);
+		if (ret)
+			goto exit_free;
 		mlx5_flow_convert_finalise(parser);
+	}
 	mlx5_flow_update_priority(dev, parser, attr);
 exit_free:
-- 
2.13.3


* [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (5 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 06/15] net/mlx5: split flow RSS handling logic Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
       [not found]   ` <20180411085529.ecxuku77hg3mkybl@laranjeiro-vm.dev.6wind.com>
  2018-04-10 13:34 ` [PATCH v2 08/15] net/mlx5: add hardware flow debug dump Xueming Li
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

The tunnel RSS level of the flow RSS action lets the user choose
whether the RSS hash is calculated on inner or outer packet fields.
Testpmd flow command examples:

GRE flow inner RSS:
  flow create 0 ingress pattern eth / ipv4 proto is 47 / gre / end
actions rss queues 1 2 end level 1 / end

GRE tunnel flow outer RSS:
  flow create 0 ingress pattern eth / ipv4 proto is 47 / gre / end
actions rss queues 1 2 end level 0 / end
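
The inner RSS example corresponds roughly to this action configuration
in C (a sketch using the 18.05 rte_flow_action_rss layout; in this
series, level 0 selects outer RSS and level 1 inner RSS):

  const uint16_t queues[] = { 1, 2 };
  const struct rte_flow_action_rss rss = {
          .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
          .level = 1, /* inner RSS; 0 would request outer RSS */
          .types = ETH_RSS_IP,
          .queue_num = 2,
          .queue = queues,
  };

Keeping the level in the generic RSS action, rather than in a
PMD-specific attribute, follows the rte_flow overhaul referenced in
the cover letter.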

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/Makefile    |   2 +-
 drivers/net/mlx5/mlx5_flow.c | 249 ++++++++++++++++++++++++++++++-------------
 drivers/net/mlx5/mlx5_glue.c |  16 +++
 drivers/net/mlx5/mlx5_glue.h |   8 ++
 drivers/net/mlx5/mlx5_rxq.c  |  46 +++++++-
 drivers/net/mlx5/mlx5_rxtx.h |   5 +-
 6 files changed, 246 insertions(+), 80 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index ae118ad33..f9a6c460b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -35,7 +35,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_pmd_mlx5.a
 LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
 LIB_GLUE_BASE = librte_pmd_mlx5_glue.so
-LIB_GLUE_VERSION = 18.02.0
+LIB_GLUE_VERSION = 18.05.0
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 64658bc0e..66c7d7993 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -113,6 +113,7 @@ enum hash_rxq_type {
 	HASH_RXQ_UDPV6,
 	HASH_RXQ_IPV6,
 	HASH_RXQ_ETH,
+	HASH_RXQ_TUNNEL,
 };
 
 /* Initialization data for hash RX queue. */
@@ -451,6 +452,7 @@ struct mlx5_flow_parse {
 	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queues indexes to use. */
 	uint8_t rss_key[40]; /**< copy of the RSS key. */
 	enum hash_rxq_type layer; /**< Last pattern layer detected. */
+	enum hash_rxq_type out_layer; /**< Last outer pattern layer detected. */
 	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
 	struct ibv_counter_set *cs; /**< Holds the counter set for the rule */
 	struct {
@@ -458,6 +460,7 @@ struct mlx5_flow_parse {
 		/**< Pointer to Verbs attributes. */
 		unsigned int offset;
 		/**< Current position or total size of the attribute. */
+		uint64_t hash_fields; /**< Verbs hash fields. */
 	} queue[RTE_DIM(hash_rxq_init)];
 };
 
@@ -698,7 +701,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 						   " function is Toeplitz");
 				return -rte_errno;
 			}
-			if (rss->level) {
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+			if (parser->rss_conf.level > 0) {
 				rte_flow_error_set(error, EINVAL,
 						   RTE_FLOW_ERROR_TYPE_ACTION,
 						   actions,
@@ -706,6 +710,15 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 						   " level is not supported");
 				return -rte_errno;
 			}
+#endif
+			if (parser->rss_conf.level > 1) {
+				rte_flow_error_set(error, EINVAL,
+						   RTE_FLOW_ERROR_TYPE_ACTION,
+						   actions,
+						   "RSS encapsulation level"
+						   " > 1 is not supported");
+				return -rte_errno;
+			}
 			if (rss->types & MLX5_RSS_HF_MASK) {
 				rte_flow_error_set(error, EINVAL,
 						   RTE_FLOW_ERROR_TYPE_ACTION,
@@ -756,7 +769,7 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 			}
 			parser->rss_conf = (struct rte_flow_action_rss){
 				.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
-				.level = 0,
+				.level = rss->level,
 				.types = rss->types,
 				.key_len = rss_key_len,
 				.queue_num = rss->queue_num,
@@ -842,11 +855,12 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_flow_convert_items_validate(struct rte_eth_dev *dev __rte_unused,
+mlx5_flow_convert_items_validate(struct rte_eth_dev *dev,
 				 const struct rte_flow_item items[],
 				 struct rte_flow_error *error,
 				 struct mlx5_flow_parse *parser)
 {
+	struct priv *priv = dev->data->dev_private;
 	const struct mlx5_flow_items *cur_item = mlx5_flow_items;
 	unsigned int i;
 	int ret = 0;
@@ -886,6 +900,14 @@ mlx5_flow_convert_items_validate(struct rte_eth_dev *dev __rte_unused,
 						   " tunnel encapsulations.");
 				return -rte_errno;
 			}
+			if (!priv->config.tunnel_en &&
+			    parser->rss_conf.level) {
+				rte_flow_error_set(error, ENOTSUP,
+					RTE_FLOW_ERROR_TYPE_ITEM,
+					items,
+					"Tunnel offloading not enabled");
+				return -rte_errno;
+			}
 			parser->inner = IBV_FLOW_SPEC_INNER;
 			parser->tunnel = flow_ptype[items->type];
 		}
@@ -993,7 +1015,11 @@ static void
 mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 {
 	unsigned int i;
+	uint32_t inner = parser->inner;
 
+	/* Don't create extra flows for outer RSS. */
+	if (parser->tunnel && !parser->rss_conf.level)
+		return;
 	/* Remove any other flow not matching the pattern. */
 	if (parser->rss_conf.queue_num == 1 && !parser->rss_conf.types) {
 		for (i = 0; i != hash_rxq_init_n; ++i) {
@@ -1014,23 +1040,25 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 			struct ibv_flow_spec_ipv4_ext ipv4;
 			struct ibv_flow_spec_ipv6 ipv6;
 			struct ibv_flow_spec_tcp_udp udp_tcp;
+			struct ibv_flow_spec_eth eth;
 		} specs;
 		void *dst;
 		uint16_t size;
 
 		if (i == parser->layer)
 			continue;
-		if (parser->layer == HASH_RXQ_ETH) {
+		if (parser->layer == HASH_RXQ_ETH ||
+		    parser->layer == HASH_RXQ_TUNNEL) {
 			if (hash_rxq_init[i].ip_version == MLX5_IPV4) {
 				size = sizeof(struct ibv_flow_spec_ipv4_ext);
 				specs.ipv4 = (struct ibv_flow_spec_ipv4_ext){
-					.type = IBV_FLOW_SPEC_IPV4_EXT,
+					.type = inner | IBV_FLOW_SPEC_IPV4_EXT,
 					.size = size,
 				};
 			} else {
 				size = sizeof(struct ibv_flow_spec_ipv6);
 				specs.ipv6 = (struct ibv_flow_spec_ipv6){
-					.type = IBV_FLOW_SPEC_IPV6,
+					.type = inner | IBV_FLOW_SPEC_IPV6,
 					.size = size,
 				};
 			}
@@ -1047,7 +1075,7 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 		    (i == HASH_RXQ_UDPV6) || (i == HASH_RXQ_TCPV6)) {
 			size = sizeof(struct ibv_flow_spec_tcp_udp);
 			specs.udp_tcp = (struct ibv_flow_spec_tcp_udp) {
-				.type = ((i == HASH_RXQ_UDPV4 ||
+				.type = inner | ((i == HASH_RXQ_UDPV4 ||
 					  i == HASH_RXQ_UDPV6) ?
 					 IBV_FLOW_SPEC_UDP :
 					 IBV_FLOW_SPEC_TCP),
@@ -1068,6 +1096,8 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
 /**
  * Update flows according to pattern and RSS hash fields.
  *
+ * @param dev
+ *   Pointer to Ethernet device.
  * @param[in, out] parser
  *   Internal parser structure.
  *
@@ -1075,20 +1105,63 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
+mlx5_flow_convert_rss(struct rte_eth_dev *dev, struct mlx5_flow_parse *parser)
 {
-	const unsigned int ipv4 =
+	unsigned int ipv4 =
 		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
 	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 : HASH_RXQ_TCPV6;
 	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
 	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 : HASH_RXQ_TCPV4;
 	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 : HASH_RXQ_IPV4;
-	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
+	enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
 	unsigned int i;
+	int found = 0;
 
-	if (parser->layer == HASH_RXQ_ETH)
+	/*
+	 * Outer RSS.
+	 * HASH_RXQ_ETH is the only rule kept, since a tunneled packet
+	 * matching it must also match the outer pattern.
+	 */
+	if (parser->tunnel && !parser->rss_conf.level) {
+		/* Remove flows other than default. */
+		for (i = 0; i != hash_rxq_init_n - 1; ++i) {
+			rte_free(parser->queue[i].ibv_attr);
+			parser->queue[i].ibv_attr = NULL;
+		}
+		ipv4 = hash_rxq_init[parser->out_layer].ip_version == MLX5_IPV4;
+		ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
+		if (hash_rxq_init[parser->out_layer].dpdk_rss_hf &
+		    parser->rss_conf.types) {
+			parser->queue[HASH_RXQ_ETH].hash_fields =
+				hash_rxq_init[parser->out_layer].hash_fields;
+		} else if (ip && (hash_rxq_init[ip].dpdk_rss_hf &
+		    parser->rss_conf.types)) {
+			parser->queue[HASH_RXQ_ETH].hash_fields =
+				hash_rxq_init[ip].hash_fields;
+		} else if (parser->rss_conf.types) {
+			DRV_LOG(WARNING,
+				"port %u rss outer hash function doesn't match"
+				" pattern", dev->data->port_id);
+		}
+		return 0;
+	}
+	if (parser->layer == HASH_RXQ_ETH || parser->layer == HASH_RXQ_TUNNEL) {
+		/* Remove unused flows according to hash function. */
+		for (i = 0; i != hash_rxq_init_n - 1; ++i) {
+			if (!parser->queue[i].ibv_attr)
+				continue;
+			if (hash_rxq_init[i].dpdk_rss_hf &
+			    parser->rss_conf.types) {
+				parser->queue[i].hash_fields =
+					hash_rxq_init[i].hash_fields;
+				continue;
+			}
+			rte_free(parser->queue[i].ibv_attr);
+			parser->queue[i].ibv_attr = NULL;
+		}
 		return 0;
-	/* This layer becomes useless as the pattern define under layers. */
+	}
+	/* Remove ETH layer flow. */
 	rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
 	parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
 	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4. */
@@ -1098,9 +1171,52 @@ mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
 		rte_free(parser->queue[i].ibv_attr);
 		parser->queue[i].ibv_attr = NULL;
 	}
-	/* Remove impossible flow according to the RSS configuration. */
-	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
-	    parser->rss_conf.types) {
+	/*
+	 * Keep L4 flows as IP pattern has to support L4 RSS.
+	 * Otherwise, only keep the flow that match the pattern.
+	 */
+	if (parser->layer != ip) {
+		/* Only keep the flow that match the pattern. */
+		for (i = hmin; i != (hmax + 1); ++i) {
+			if (i == parser->layer)
+				continue;
+			rte_free(parser->queue[i].ibv_attr);
+			parser->queue[i].ibv_attr = NULL;
+		}
+	}
+	if (parser->rss_conf.types) {
+		/* Remove impossible flow according to the RSS configuration. */
+		for (i = hmin; i != (hmax + 1); ++i) {
+			if (!parser->queue[i].ibv_attr)
+				continue;
+			if (parser->rss_conf.types &
+			    hash_rxq_init[i].dpdk_rss_hf) {
+				parser->queue[i].hash_fields =
+					hash_rxq_init[i].hash_fields;
+				found = 1;
+				continue;
+			}
+			/* L4 flow could be used for L3 RSS. */
+			if (i == parser->layer && i < ip &&
+			    (hash_rxq_init[ip].dpdk_rss_hf &
+			     parser->rss_conf.types)) {
+				parser->queue[i].hash_fields =
+					hash_rxq_init[ip].hash_fields;
+				found = 1;
+				continue;
+			}
+			/* L3 flow and L4 hash: non-rss L3 flow. */
+			if (i == parser->layer && i == ip && found)
+				/* IP pattern and L4 HF. */
+				continue;
+			rte_free(parser->queue[i].ibv_attr);
+			parser->queue[i].ibv_attr = NULL;
+		}
+		if (!found)
+			DRV_LOG(WARNING,
+				"port %u rss hash function doesn't match "
+				"pattern", dev->data->port_id);
+	} else {
 		/* Remove any other flow. */
 		for (i = hmin; i != (hmax + 1); ++i) {
 			if (i == parser->layer || !parser->queue[i].ibv_attr)
@@ -1108,8 +1224,6 @@ mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
 			rte_free(parser->queue[i].ibv_attr);
 			parser->queue[i].ibv_attr = NULL;
 		}
-	} else if (!parser->queue[ip].ibv_attr) {
-		/* no RSS possible with the current configuration. */
 		parser->rss_conf.queue_num = 1;
 	}
 	return 0;
@@ -1179,10 +1293,6 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 		for (i = 0; i != hash_rxq_init_n; ++i) {
 			unsigned int offset;
 
-			if (!(parser->rss_conf.types &
-			      hash_rxq_init[i].dpdk_rss_hf) &&
-			    (i != HASH_RXQ_ETH))
-				continue;
 			offset = parser->queue[i].offset;
 			parser->queue[i].ibv_attr =
 				mlx5_flow_convert_allocate(offset, error);
@@ -1194,6 +1304,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 	/* Third step. Conversion parse, fill the specifications. */
 	parser->inner = 0;
 	parser->tunnel = 0;
+	parser->layer = HASH_RXQ_ETH;
 	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
 		struct mlx5_flow_data data = {
 			.parser = parser,
@@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
 		if (ret)
 			goto exit_free;
 	}
-	if (parser->mark)
-		mlx5_flow_create_flag_mark(parser, parser->mark_id);
-	if (parser->count && parser->create) {
-		mlx5_flow_create_count(dev, parser);
-		if (!parser->cs)
-			goto exit_count_error;
-	}
 	/*
 	 * Last step. Complete missing specification to reach the RSS
 	 * configuration.
 	 */
 	if (!parser->drop)
-		ret = mlx5_flow_convert_rss(parser);
+		ret = mlx5_flow_convert_rss(dev, parser);
 		if (ret)
 			goto exit_free;
 		mlx5_flow_convert_finalise(parser);
 	mlx5_flow_update_priority(dev, parser, attr);
+	if (parser->mark)
+		mlx5_flow_create_flag_mark(parser, parser->mark_id);
+	if (parser->count && parser->create) {
+		mlx5_flow_create_count(dev, parser);
+		if (!parser->cs)
+			goto exit_count_error;
+	}
 exit_free:
 	/* Only verification is expected, all resources should be released. */
 	if (!parser->create) {
@@ -1275,17 +1386,11 @@ mlx5_flow_create_copy(struct mlx5_flow_parse *parser, void *src,
 	for (i = 0; i != hash_rxq_init_n; ++i) {
 		if (!parser->queue[i].ibv_attr)
 			continue;
-		/* Specification must be the same l3 type or none. */
-		if (parser->layer == HASH_RXQ_ETH ||
-		    (hash_rxq_init[parser->layer].ip_version ==
-		     hash_rxq_init[i].ip_version) ||
-		    (hash_rxq_init[i].ip_version == 0)) {
-			dst = (void *)((uintptr_t)parser->queue[i].ibv_attr +
-					parser->queue[i].offset);
-			memcpy(dst, src, size);
-			++parser->queue[i].ibv_attr->num_of_specs;
-			parser->queue[i].offset += size;
-		}
+		dst = (void *)((uintptr_t)parser->queue[i].ibv_attr +
+				parser->queue[i].offset);
+		memcpy(dst, src, size);
+		++parser->queue[i].ibv_attr->num_of_specs;
+		parser->queue[i].offset += size;
 	}
 }
 
@@ -1316,9 +1421,7 @@ mlx5_flow_create_eth(const struct rte_flow_item *item,
 		.size = eth_size,
 	};
 
-	/* Don't update layer for the inner pattern. */
-	if (!parser->inner)
-		parser->layer = HASH_RXQ_ETH;
+	parser->layer = HASH_RXQ_ETH;
 	if (spec) {
 		unsigned int i;
 
@@ -1431,9 +1534,7 @@ mlx5_flow_create_ipv4(const struct rte_flow_item *item,
 		.size = ipv4_size,
 	};
 
-	/* Don't update layer for the inner pattern. */
-	if (!parser->inner)
-		parser->layer = HASH_RXQ_IPV4;
+	parser->layer = HASH_RXQ_IPV4;
 	if (spec) {
 		if (!mask)
 			mask = default_mask;
@@ -1486,9 +1587,7 @@ mlx5_flow_create_ipv6(const struct rte_flow_item *item,
 		.size = ipv6_size,
 	};
 
-	/* Don't update layer for the inner pattern. */
-	if (!parser->inner)
-		parser->layer = HASH_RXQ_IPV6;
+	parser->layer = HASH_RXQ_IPV6;
 	if (spec) {
 		unsigned int i;
 		uint32_t vtc_flow_val;
@@ -1561,13 +1660,10 @@ mlx5_flow_create_udp(const struct rte_flow_item *item,
 		.size = udp_size,
 	};
 
-	/* Don't update layer for the inner pattern. */
-	if (!parser->inner) {
-		if (parser->layer == HASH_RXQ_IPV4)
-			parser->layer = HASH_RXQ_UDPV4;
-		else
-			parser->layer = HASH_RXQ_UDPV6;
-	}
+	if (parser->layer == HASH_RXQ_IPV4)
+		parser->layer = HASH_RXQ_UDPV4;
+	else
+		parser->layer = HASH_RXQ_UDPV6;
 	if (spec) {
 		if (!mask)
 			mask = default_mask;
@@ -1610,13 +1706,10 @@ mlx5_flow_create_tcp(const struct rte_flow_item *item,
 		.size = tcp_size,
 	};
 
-	/* Don't update layer for the inner pattern. */
-	if (!parser->inner) {
-		if (parser->layer == HASH_RXQ_IPV4)
-			parser->layer = HASH_RXQ_TCPV4;
-		else
-			parser->layer = HASH_RXQ_TCPV6;
-	}
+	if (parser->layer == HASH_RXQ_IPV4)
+		parser->layer = HASH_RXQ_TCPV4;
+	else
+		parser->layer = HASH_RXQ_TCPV6;
 	if (spec) {
 		if (!mask)
 			mask = default_mask;
@@ -1666,6 +1759,8 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 	id.vni[0] = 0;
 	parser->inner = IBV_FLOW_SPEC_INNER;
 	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)];
+	parser->out_layer = parser->layer;
+	parser->layer = HASH_RXQ_TUNNEL;
 	if (spec) {
 		if (!mask)
 			mask = default_mask;
@@ -1720,6 +1815,8 @@ mlx5_flow_create_gre(const struct rte_flow_item *item __rte_unused,
 
 	parser->inner = IBV_FLOW_SPEC_INNER;
 	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
+	parser->out_layer = parser->layer;
+	parser->layer = HASH_RXQ_TUNNEL;
 	mlx5_flow_create_copy(parser, &tunnel, size);
 	return 0;
 }
@@ -1883,33 +1980,33 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
 	unsigned int i;
 
 	for (i = 0; i != hash_rxq_init_n; ++i) {
-		uint64_t hash_fields;
-
 		if (!parser->queue[i].ibv_attr)
 			continue;
 		flow->frxq[i].ibv_attr = parser->queue[i].ibv_attr;
 		parser->queue[i].ibv_attr = NULL;
-		hash_fields = hash_rxq_init[i].hash_fields;
+		flow->frxq[i].hash_fields = parser->queue[i].hash_fields;
 		if (!priv->dev->data->dev_started)
 			continue;
 		flow->frxq[i].hrxq =
 			mlx5_hrxq_get(dev,
 				      parser->rss_conf.key,
 				      parser->rss_conf.key_len,
-				      hash_fields,
+				      flow->frxq[i].hash_fields,
 				      parser->rss_conf.queue,
 				      parser->rss_conf.queue_num,
-				      parser->tunnel);
+				      parser->tunnel,
+				      parser->rss_conf.level);
 		if (flow->frxq[i].hrxq)
 			continue;
 		flow->frxq[i].hrxq =
 			mlx5_hrxq_new(dev,
 				      parser->rss_conf.key,
 				      parser->rss_conf.key_len,
-				      hash_fields,
+				      flow->frxq[i].hash_fields,
 				      parser->rss_conf.queue,
 				      parser->rss_conf.queue_num,
-				      parser->tunnel);
+				      parser->tunnel,
+				      parser->rss_conf.level);
 		if (!flow->frxq[i].hrxq) {
 			return rte_flow_error_set(error, ENOMEM,
 						  RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -2006,7 +2103,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev *dev,
 		DRV_LOG(DEBUG, "port %u %p type %d QP %p ibv_flow %p",
 			dev->data->port_id,
 			(void *)flow, i,
-			(void *)flow->frxq[i].hrxq,
+			(void *)flow->frxq[i].hrxq->qp,
 			(void *)flow->frxq[i].ibv_flow);
 	}
 	if (!flows_n) {
@@ -2532,19 +2629,21 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
 			flow->frxq[i].hrxq =
 				mlx5_hrxq_get(dev, flow->rss_conf.key,
 					      flow->rss_conf.key_len,
-					      hash_rxq_init[i].hash_fields,
+					      flow->frxq[i].hash_fields,
 					      flow->rss_conf.queue,
 					      flow->rss_conf.queue_num,
-					      flow->tunnel);
+					      flow->tunnel,
+					      flow->rss_conf.level);
 			if (flow->frxq[i].hrxq)
 				goto flow_create;
 			flow->frxq[i].hrxq =
 				mlx5_hrxq_new(dev, flow->rss_conf.key,
 					      flow->rss_conf.key_len,
-					      hash_rxq_init[i].hash_fields,
+					      flow->frxq[i].hash_fields,
 					      flow->rss_conf.queue,
 					      flow->rss_conf.queue_num,
-					      flow->tunnel);
+					      flow->tunnel,
+					      flow->rss_conf.level);
 			if (!flow->frxq[i].hrxq) {
 				DRV_LOG(DEBUG,
 					"port %u flow %p cannot be applied",
diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
index be684d378..6874aa32a 100644
--- a/drivers/net/mlx5/mlx5_glue.c
+++ b/drivers/net/mlx5/mlx5_glue.c
@@ -313,6 +313,21 @@ mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj, uint64_t obj_type)
 	return mlx5dv_init_obj(obj, obj_type);
 }
 
+static struct ibv_qp *
+mlx5_glue_dv_create_qp(struct ibv_context *context,
+		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		       struct mlx5dv_qp_init_attr *dv_qp_init_attr)
+{
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
+#else
+	(void)context;
+	(void)qp_init_attr_ex;
+	(void)dv_qp_init_attr;
+	return NULL;
+#endif
+}
+
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.version = MLX5_GLUE_VERSION,
 	.fork_init = mlx5_glue_fork_init,
@@ -356,4 +371,5 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
 	.dv_query_device = mlx5_glue_dv_query_device,
 	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
 	.dv_init_obj = mlx5_glue_dv_init_obj,
+	.dv_create_qp = mlx5_glue_dv_create_qp,
 };
diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
index b5efee3b6..841363872 100644
--- a/drivers/net/mlx5/mlx5_glue.h
+++ b/drivers/net/mlx5/mlx5_glue.h
@@ -31,6 +31,10 @@ struct ibv_counter_set_init_attr;
 struct ibv_query_counter_set_attr;
 #endif
 
+#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+struct mlx5dv_qp_init_attr;
+#endif
+
 /* LIB_GLUE_VERSION must be updated every time this structure is modified. */
 struct mlx5_glue {
 	const char *version;
@@ -106,6 +110,10 @@ struct mlx5_glue {
 				   enum mlx5dv_set_ctx_attr_type type,
 				   void *attr);
 	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
+	struct ibv_qp *(*dv_create_qp)
+		(struct ibv_context *context,
+		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
+		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
 };
 
 const struct mlx5_glue *mlx5_glue;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 073732e16..6e5565fb2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
  *   Number of queues.
  * @param tunnel
  *   Tunnel type.
+ * @param rss_level
+ *   RSS hash on tunnel level.
  *
  * @return
  *   The Verbs object initialised, NULL otherwise and rte_errno is set.
@@ -1394,13 +1396,17 @@ struct mlx5_hrxq *
 mlx5_hrxq_new(struct rte_eth_dev *dev,
 	      const uint8_t *rss_key, uint32_t rss_key_len,
 	      uint64_t hash_fields,
-	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
+	      const uint16_t *queues, uint32_t queues_n,
+	      uint32_t tunnel, uint32_t rss_level)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_hrxq *hrxq;
 	struct mlx5_ind_table_ibv *ind_tbl;
 	struct ibv_qp *qp;
 	int err;
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	struct mlx5dv_qp_init_attr qp_init_attr = {0};
+#endif
 
 	queues_n = hash_fields ? queues_n : 1;
 	ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
@@ -1410,6 +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 		rte_errno = ENOMEM;
 		return NULL;
 	}
+#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
+	if (tunnel) {
+		qp_init_attr.comp_mask =
+				MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
+		qp_init_attr.create_flags = MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
+	}
+	qp = mlx5_glue->dv_create_qp(
+		priv->ctx,
+		&(struct ibv_qp_init_attr_ex){
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.comp_mask =
+				IBV_QP_INIT_ATTR_PD |
+				IBV_QP_INIT_ATTR_IND_TABLE |
+				IBV_QP_INIT_ATTR_RX_HASH,
+			.rx_hash_conf = (struct ibv_rx_hash_conf){
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = rss_key_len,
+				.rx_hash_key = (void *)(uintptr_t)rss_key,
+				.rx_hash_fields_mask = hash_fields |
+					(tunnel && rss_level ?
+					(uint32_t)IBV_RX_HASH_INNER : 0),
+			},
+			.rwq_ind_tbl = ind_tbl->ind_table,
+			.pd = priv->pd,
+		},
+		&qp_init_attr);
+#else
 	qp = mlx5_glue->create_qp_ex
 		(priv->ctx,
 		 &(struct ibv_qp_init_attr_ex){
@@ -1427,6 +1460,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 			.rwq_ind_tbl = ind_tbl->ind_table,
 			.pd = priv->pd,
 		 });
+#endif
 	if (!qp) {
 		rte_errno = errno;
 		goto error;
@@ -1439,6 +1473,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 	hrxq->rss_key_len = rss_key_len;
 	hrxq->hash_fields = hash_fields;
 	hrxq->tunnel = tunnel;
+	hrxq->rss_level = rss_level;
 	memcpy(hrxq->rss_key, rss_key, rss_key_len);
 	rte_atomic32_inc(&hrxq->refcnt);
 	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next);
@@ -1448,6 +1483,8 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 	return hrxq;
 error:
 	err = rte_errno; /* Save rte_errno before cleanup. */
+	DRV_LOG(ERR, "port %u: Error creating Hash Rx queue",
+		dev->data->port_id);
 	mlx5_ind_table_ibv_release(dev, ind_tbl);
 	if (qp)
 		claim_zero(mlx5_glue->destroy_qp(qp));
@@ -1469,6 +1506,8 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
  *   Number of queues.
  * @param tunnel
  *   Tunnel type.
+ * @param rss_level
+ *   RSS hash on tunnel level.
  *
  * @return
  *   An hash Rx queue on success.
@@ -1477,7 +1516,8 @@ struct mlx5_hrxq *
 mlx5_hrxq_get(struct rte_eth_dev *dev,
 	      const uint8_t *rss_key, uint32_t rss_key_len,
 	      uint64_t hash_fields,
-	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
+	      const uint16_t *queues, uint32_t queues_n,
+	      uint32_t tunnel, uint32_t rss_level)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx5_hrxq *hrxq;
@@ -1494,6 +1534,8 @@ mlx5_hrxq_get(struct rte_eth_dev *dev,
 			continue;
 		if (hrxq->tunnel != tunnel)
 			continue;
+		if (hrxq->rss_level != rss_level)
+			continue;
 		ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
 		if (!ind_tbl)
 			continue;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d35605b55..62cf55109 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -147,6 +147,7 @@ struct mlx5_hrxq {
 	struct ibv_qp *qp; /* Verbs queue pair. */
 	uint64_t hash_fields; /* Verbs Hash fields. */
 	uint32_t tunnel; /* Tunnel type. */
+	uint32_t rss_level; /* RSS on tunnel level. */
 	uint32_t rss_key_len; /* Hash key length in bytes. */
 	uint8_t rss_key[]; /* Hash key. */
 };
@@ -251,12 +252,12 @@ struct mlx5_hrxq *mlx5_hrxq_new(struct rte_eth_dev *dev,
 				const uint8_t *rss_key, uint32_t rss_key_len,
 				uint64_t hash_fields,
 				const uint16_t *queues, uint32_t queues_n,
-				uint32_t tunnel);
+				uint32_t tunnel, uint32_t rss_level);
 struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
 				const uint8_t *rss_key, uint32_t rss_key_len,
 				uint64_t hash_fields,
 				const uint16_t *queues, uint32_t queues_n,
-				uint32_t tunnel);
+				uint32_t tunnel, uint32_t rss_level);
 int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
 int mlx5_hrxq_ibv_verify(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 08/15] net/mlx5: add hardware flow debug dump
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (6 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 07/15] net/mlx5: support tunnel RSS level Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 13:34 ` [PATCH v2 09/15] net/mlx5: introduce VXLAN-GPE tunnel type Xueming Li
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Dump Verbs flow details, including the flow spec types and sizes, for
debugging purposes.
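
For instance, with the dump in place, a rule carrying three Verbs specs
logs a line along these lines, wrapped here for readability (all values
below are invented for illustration):

  port 0 Verbs flow 0x7f5a2c0012c0 type 1: hrxq:0x7f5a2c002000
  qp:0x7f5a2c003000 ind:0x7f5a2c004000, hash:c3/4 specs:3(96),
  priority:1, type:0, flags:0, comp_mask:0 specs: 20(16) 30(40) 42(16)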

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c  | 68 ++++++++++++++++++++++++++++++++++++-------
 drivers/net/mlx5/mlx5_rxq.c   | 25 +++++++++++++---
 drivers/net/mlx5/mlx5_utils.h |  6 ++++
 3 files changed, 85 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 66c7d7993..70718c9fe 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2052,6 +2052,57 @@ mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow *flow)
 }
 
 /**
+ * Dump flow hash RX queue detail.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param flow
+ *   Pointer to the rte_flow.
+ * @param i
+ *   Hash RX queue index.
+ */
+static void
+mlx5_flow_dump(struct rte_eth_dev *dev __rte_unused,
+	       struct rte_flow *flow __rte_unused,
+	       unsigned int i __rte_unused)
+{
+#ifndef NDEBUG
+	uintptr_t spec_ptr;
+	uint16_t j;
+	char buf[256];
+	uint8_t off;
+
+	spec_ptr = (uintptr_t)(flow->frxq[i].ibv_attr + 1);
+	for (j = 0, off = 0; j < flow->frxq[i].ibv_attr->num_of_specs;
+	     j++) {
+		struct ibv_flow_spec *spec = (void *)spec_ptr;
+		off += sprintf(buf + off, " %x(%hu)", spec->hdr.type,
+			       spec->hdr.size);
+		spec_ptr += spec->hdr.size;
+	}
+	DRV_LOG(DEBUG,
+		"port %u Verbs flow %p type %u: hrxq:%p qp:%p ind:%p, hash:%lx/%u"
+		" specs:%hhu(%hu), priority:%hu, type:%d, flags:%x,"
+		" comp_mask:%x specs:%s",
+		dev->data->port_id, (void *)flow, i,
+		(void *)flow->frxq[i].hrxq,
+		(void *)flow->frxq[i].hrxq->qp,
+		(void *)flow->frxq[i].hrxq->ind_table,
+		flow->frxq[i].hash_fields |
+		(flow->tunnel &&
+		 flow->rss_conf.rss_level ? (uint32_t)IBV_RX_HASH_INNER : 0),
+		flow->queues_n,
+		flow->frxq[i].ibv_attr->num_of_specs,
+		flow->frxq[i].ibv_attr->size,
+		flow->frxq[i].ibv_attr->priority,
+		flow->frxq[i].ibv_attr->type,
+		flow->frxq[i].ibv_attr->flags,
+		flow->frxq[i].ibv_attr->comp_mask,
+		buf);
+#endif
+}
+
+/**
  * Complete flow rule creation.
  *
  * @param dev
@@ -2093,6 +2144,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev *dev,
 		flow->frxq[i].ibv_flow =
 			mlx5_glue->create_flow(flow->frxq[i].hrxq->qp,
 					       flow->frxq[i].ibv_attr);
+		mlx5_flow_dump(dev, flow, i);
 		if (!flow->frxq[i].ibv_flow) {
 			rte_flow_error_set(error, ENOMEM,
 					   RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -2100,11 +2152,6 @@ mlx5_flow_create_action_queue(struct rte_eth_dev *dev,
 			goto error;
 		}
 		++flows_n;
-		DRV_LOG(DEBUG, "port %u %p type %d QP %p ibv_flow %p",
-			dev->data->port_id,
-			(void *)flow, i,
-			(void *)flow->frxq[i].hrxq->qp,
-			(void *)flow->frxq[i].ibv_flow);
 	}
 	if (!flows_n) {
 		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -2646,24 +2693,25 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
 					      flow->rss_conf.level);
 			if (!flow->frxq[i].hrxq) {
 				DRV_LOG(DEBUG,
-					"port %u flow %p cannot be applied",
+					"port %u flow %p cannot create hash"
+					" rxq",
 					dev->data->port_id, (void *)flow);
 				rte_errno = EINVAL;
 				return -rte_errno;
 			}
 flow_create:
+			mlx5_flow_dump(dev, flow, i);
 			flow->frxq[i].ibv_flow =
 				mlx5_glue->create_flow(flow->frxq[i].hrxq->qp,
 						       flow->frxq[i].ibv_attr);
 			if (!flow->frxq[i].ibv_flow) {
 				DRV_LOG(DEBUG,
-					"port %u flow %p cannot be applied",
-					dev->data->port_id, (void *)flow);
+					"port %u flow %p type %u cannot be"
+					" applied",
+					dev->data->port_id, (void *)flow, i);
 				rte_errno = EINVAL;
 				return -rte_errno;
 			}
-			DRV_LOG(DEBUG, "port %u flow %p applied",
-				dev->data->port_id, (void *)flow);
 		}
 		mlx5_flow_create_update_rxqs(dev, flow);
 	}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 6e5565fb2..423d3272e 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1259,9 +1259,9 @@ mlx5_ind_table_ibv_new(struct rte_eth_dev *dev, const uint16_t *queues,
 	}
 	rte_atomic32_inc(&ind_tbl->refcnt);
 	LIST_INSERT_HEAD(&priv->ind_tbls, ind_tbl, next);
-	DRV_LOG(DEBUG, "port %u indirection table %p: refcnt %d",
-		dev->data->port_id, (void *)ind_tbl,
-		rte_atomic32_read(&ind_tbl->refcnt));
+	DEBUG("port %u new indirection table %p: queues:%u refcnt:%d",
+	      dev->data->port_id, (void *)ind_tbl, 1 << wq_n,
+	      rte_atomic32_read(&ind_tbl->refcnt));
 	return ind_tbl;
 error:
 	rte_free(ind_tbl);
@@ -1330,9 +1330,12 @@ mlx5_ind_table_ibv_release(struct rte_eth_dev *dev,
 	DRV_LOG(DEBUG, "port %u indirection table %p: refcnt %d",
 		((struct priv *)dev->data->dev_private)->port,
 		(void *)ind_tbl, rte_atomic32_read(&ind_tbl->refcnt));
-	if (rte_atomic32_dec_and_test(&ind_tbl->refcnt))
+	if (rte_atomic32_dec_and_test(&ind_tbl->refcnt)) {
 		claim_zero(mlx5_glue->destroy_rwq_ind_table
 			   (ind_tbl->ind_table));
+		DEBUG("port %u delete indirection table %p: queues: %u",
+		      dev->data->port_id, (void *)ind_tbl, ind_tbl->queues_n);
+	}
 	for (i = 0; i != ind_tbl->queues_n; ++i)
 		claim_nonzero(mlx5_rxq_release(dev, ind_tbl->queues[i]));
 	if (!rte_atomic32_read(&ind_tbl->refcnt)) {
@@ -1442,6 +1445,12 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 			.pd = priv->pd,
 		},
 		&qp_init_attr);
+	DEBUG("port %u new QP:%p ind_tbl:%p hash_fields:0x%lx tunnel:0x%x"
+	      " level:%hhu dv_attr:comp_mask:0x%lx create_flags:0x%x",
+	      dev->data->port_id, (void *)qp, (void *)ind_tbl,
+	      (tunnel && rss_level ? (uint32_t)IBV_RX_HASH_INNER : 0) |
+	      hash_fields, tunnel, rss_level,
+	      qp_init_attr.comp_mask, qp_init_attr.create_flags);
 #else
 	qp = mlx5_glue->create_qp_ex
 		(priv->ctx,
@@ -1460,6 +1469,10 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
 			.rwq_ind_tbl = ind_tbl->ind_table,
 			.pd = priv->pd,
 		 });
+	DEBUG("port %u new QP:%p ind_tbl:%p hash_fields:0x%lx tunnel:0x%x"
+	      " level:%hhu",
+	      dev->data->port_id, (void *)qp, (void *)ind_tbl,
+	      hash_fields, tunnel, rss_level);
 #endif
 	if (!qp) {
 		rte_errno = errno;
@@ -1571,6 +1584,10 @@ mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hrxq)
 		(void *)hrxq, rte_atomic32_read(&hrxq->refcnt));
 	if (rte_atomic32_dec_and_test(&hrxq->refcnt)) {
 		claim_zero(mlx5_glue->destroy_qp(hrxq->qp));
+		DEBUG("port %u delete QP %p: hash: 0x%lx, tunnel:"
+		      " 0x%x, level: %hhu",
+		      dev->data->port_id, (void *)hrxq, hrxq->hash_fields,
+		      hrxq->tunnel, hrxq->rss_level);
 		mlx5_ind_table_ibv_release(dev, hrxq->ind_table);
 		LIST_REMOVE(hrxq, next);
 		rte_free(hrxq);
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 85d2aae2b..9a3181b1f 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -103,16 +103,22 @@ extern int mlx5_logtype;
 /* claim_zero() does not perform any check when debugging is disabled. */
 #ifndef NDEBUG
 
+#define DEBUG(...) DRV_LOG(DEBUG, __VA_ARGS__)
 #define claim_zero(...) assert((__VA_ARGS__) == 0)
 #define claim_nonzero(...) assert((__VA_ARGS__) != 0)
 
 #else /* NDEBUG */
 
+#define DEBUG(...) (void)0
 #define claim_zero(...) (__VA_ARGS__)
 #define claim_nonzero(...) (__VA_ARGS__)
 
 #endif /* NDEBUG */
 
+#define INFO(...) DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) DRV_LOG(ERR, __VA_ARGS__)
+
 /* Convenience macros for accessing mbuf fields. */
 #define NEXT(m) ((m)->next)
 #define DATA_LEN(m) ((m)->data_len)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 09/15] net/mlx5: introduce VXLAN-GPE tunnel type
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (7 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 08/15] net/mlx5: add hardware flow debug dump Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 13:34 ` [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern Xueming Li
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Add support for the VXLAN-GPE tunnel type to rte_flow.
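
A testpmd rule matching VXLAN-GPE on its IANA-assigned UDP port (4790)
with inner RSS could look as follows (queue and VNI values are
hypothetical):

  flow create 0 ingress pattern eth / ipv4 / udp dst is 4790 /
vxlan-gpe vni is 100 / end actions rss queues 0 1 end level 1 / end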

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 95 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rxtx.c |  3 +-
 2 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 70718c9fe..857b8b716 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -88,6 +88,11 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 		       struct mlx5_flow_data *data);
 
 static int
+mlx5_flow_create_vxlan_gpe(const struct rte_flow_item *item,
+			   const void *default_mask,
+			   struct mlx5_flow_data *data);
+
+static int
 mlx5_flow_create_gre(const struct rte_flow_item *item,
 		       const void *default_mask,
 		       struct mlx5_flow_data *data);
@@ -238,10 +243,12 @@ struct rte_flow {
 
 #define IS_TUNNEL(type) ( \
 	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
+	(type) == RTE_FLOW_ITEM_TYPE_VXLAN_GPE || \
 	(type) == RTE_FLOW_ITEM_TYPE_GRE)
 
 const uint32_t flow_ptype[] = {
 	[RTE_FLOW_ITEM_TYPE_VXLAN] = RTE_PTYPE_TUNNEL_VXLAN,
+	[RTE_FLOW_ITEM_TYPE_VXLAN_GPE] = RTE_PTYPE_TUNNEL_VXLAN_GPE,
 	[RTE_FLOW_ITEM_TYPE_GRE] = RTE_PTYPE_TUNNEL_GRE,
 };
 
@@ -250,6 +257,8 @@ const uint32_t flow_ptype[] = {
 const uint32_t ptype_ext[] = {
 	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
 					      RTE_PTYPE_L4_UDP,
+	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN_GPE)]	= RTE_PTYPE_TUNNEL_VXLAN_GPE |
+						  RTE_PTYPE_L4_UDP,
 	[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)] = RTE_PTYPE_TUNNEL_GRE,
 };
 
@@ -307,6 +316,7 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 	[RTE_FLOW_ITEM_TYPE_END] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
 			       RTE_FLOW_ITEM_TYPE_VXLAN,
+			       RTE_FLOW_ITEM_TYPE_VXLAN_GPE,
 			       RTE_FLOW_ITEM_TYPE_GRE),
 	},
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
@@ -385,7 +395,8 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_ipv6),
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VXLAN),
+		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VXLAN,
+			       RTE_FLOW_ITEM_TYPE_VXLAN_GPE),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
@@ -437,6 +448,19 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.convert = mlx5_flow_create_vxlan,
 		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
 	},
+	[RTE_FLOW_ITEM_TYPE_VXLAN_GPE] = {
+		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
+			       RTE_FLOW_ITEM_TYPE_IPV4,
+			       RTE_FLOW_ITEM_TYPE_IPV6),
+		.actions = valid_actions,
+		.mask = &(const struct rte_flow_item_vxlan_gpe){
+			.vni = "\xff\xff\xff",
+		},
+		.default_mask = &rte_flow_item_vxlan_gpe_mask,
+		.mask_sz = sizeof(struct rte_flow_item_vxlan_gpe),
+		.convert = mlx5_flow_create_vxlan_gpe,
+		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
+	},
 };
 
 /** Structure to pass to the conversion function. */
@@ -1789,6 +1813,75 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 }
 
 /**
+ * Convert VXLAN-GPE item to Verbs specification.
+ *
+ * @param item[in]
+ *   Item specification.
+ * @param default_mask[in]
+ *   Default bit-masks to use when item->mask is not provided.
+ * @param data[in, out]
+ *   User structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_create_vxlan_gpe(const struct rte_flow_item *item,
+			   const void *default_mask,
+			   struct mlx5_flow_data *data)
+{
+	const struct rte_flow_item_vxlan_gpe *spec = item->spec;
+	const struct rte_flow_item_vxlan_gpe *mask = item->mask;
+	struct mlx5_flow_parse *parser = data->parser;
+	unsigned int size = sizeof(struct ibv_flow_spec_tunnel);
+	struct ibv_flow_spec_tunnel vxlan = {
+		.type = parser->inner | IBV_FLOW_SPEC_VXLAN_TUNNEL,
+		.size = size,
+	};
+	union vni {
+		uint32_t vlan_id;
+		uint8_t vni[4];
+	} id;
+
+	id.vni[0] = 0;
+	parser->inner = IBV_FLOW_SPEC_INNER;
+	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN_GPE)];
+	parser->out_layer = parser->layer;
+	parser->layer = HASH_RXQ_TUNNEL;
+	if (spec) {
+		if (!mask)
+			mask = default_mask;
+		memcpy(&id.vni[1], spec->vni, 3);
+		vxlan.val.tunnel_id = id.vlan_id;
+		memcpy(&id.vni[1], mask->vni, 3);
+		vxlan.mask.tunnel_id = id.vlan_id;
+		if (spec->protocol)
+			return rte_flow_error_set(data->error, EINVAL,
+						  RTE_FLOW_ERROR_TYPE_ITEM,
+						  item,
+						  "VxLAN-GPE protocol not"
+						  " supported");
+		/* Remove unwanted bits from values. */
+		vxlan.val.tunnel_id &= vxlan.mask.tunnel_id;
+	}
+	/*
+	 * Tunnel id 0 is equivalent to not adding a VXLAN layer; if only this
+	 * layer is defined in the Verbs specification it is interpreted as
+	 * wildcard and all packets will match this rule, if it follows a full
+	 * stack layer (ex: eth / ipv4 / udp), all packets matching the layers
+	 * before will also match this rule.
+	 * To avoid such a situation, VNI 0 is currently refused.
+	 */
+	/* Only allow tunnel w/o tunnel id pattern after proper outer spec. */
+	if (parser->out_layer == HASH_RXQ_ETH && !vxlan.val.tunnel_id)
+		return rte_flow_error_set(data->error, EINVAL,
+					  RTE_FLOW_ERROR_TYPE_ITEM,
+					  item,
+					  "VxLAN-GPE vni cannot be 0");
+	mlx5_flow_create_copy(parser, &vxlan, size);
+	return 0;
+}
+
+/**
  * Convert GRE item to Verbs specification.
  *
  * @param item[in]
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 285b2dbf0..c9342d659 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -466,8 +466,7 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			uint8_t vlan_sz =
 				(buf->ol_flags & PKT_TX_VLAN_PKT) ? 4 : 0;
 			const uint64_t is_tunneled =
-				buf->ol_flags & (PKT_TX_TUNNEL_GRE |
-						 PKT_TX_TUNNEL_VXLAN);
+				buf->ol_flags & (PKT_TX_TUNNEL_MASK);
 
 			tso_header_sz = buf->l2_len + vlan_sz +
 					buf->l3_len + buf->l4_len;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (8 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 09/15] net/mlx5: introduce VXLAN-GPE tunnel type Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-11 12:25   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 11/15] net/mlx5: support MPLS-in-GRE and MPLS-in-UDP Xueming Li
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

A tunnel pattern without a tunnel ID could match any non-tunneled packet;
this patch allows such a pattern once a proper outer spec precedes it.
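
For example, a VXLAN rule with a full outer stack but no VNI, such as
the testpmd rule below (4789 is the IANA VXLAN port, the queue index is
hypothetical), is now accepted, while a bare "eth / vxlan" pattern
without an outer spec is still refused:

  flow create 0 ingress pattern eth / ipv4 / udp dst is 4789 / vxlan /
end actions queue index 1 / end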

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 857b8b716..58d437308 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1803,7 +1803,8 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 	 * before will also match this rule.
 	 * To avoid such situation, VNI 0 is currently refused.
 	 */
-	if (!vxlan.val.tunnel_id)
+	/* Only allow tunnel w/o tunnel id pattern after proper outer spec. */
+	if (parser->out_layer == HASH_RXQ_ETH && !vxlan.val.tunnel_id)
 		return rte_flow_error_set(data->error, EINVAL,
 					  RTE_FLOW_ERROR_TYPE_ITEM,
 					  item,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 11/15] net/mlx5: support MPLS-in-GRE and MPLS-in-UDP
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (9 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 13:34 ` [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading Xueming Li
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch adds support for the new tunnel types MPLS-in-GRE and
MPLS-in-UDP. Flow pattern examples:
  ipv4 proto is 47 / gre proto is 0x8847 / mpls / end
  ipv4 / udp dst is 6635 / mpls / end
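
In testpmd, complete rules along these lines could exercise both
variants (the label, queue and level values, and the exact item field
keywords, are hypothetical):

  flow create 0 ingress pattern eth / ipv4 proto is 47 / gre proto is
0x8847 / mpls label is 3 / end actions rss queues 0 1 end level 1 / end
  flow create 0 ingress pattern eth / ipv4 / udp dst is 6635 / mpls /
end actions rss queues 0 1 end level 1 / end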

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/Makefile    |   5 ++
 drivers/net/mlx5/mlx5.c      |  15 +++++
 drivers/net/mlx5/mlx5.h      |   1 +
 drivers/net/mlx5/mlx5_flow.c | 148 ++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index f9a6c460b..33553483e 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -131,6 +131,11 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		enum MLX5DV_CONTEXT_MASK_TUNNEL_OFFLOADS \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_IBV_DEVICE_MPLS_SUPPORT \
+		infiniband/verbs.h \
+		enum IBV_FLOW_SPEC_MPLS \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_IBV_WQ_FLAG_RX_END_PADDING \
 		infiniband/verbs.h \
 		enum IBV_WQ_FLAG_RX_END_PADDING \
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a1f2799e5..2124439b3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -613,6 +613,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	unsigned int mps;
 	unsigned int cqe_comp;
 	unsigned int tunnel_en = 0;
+	unsigned int mpls_en = 0;
 	int idx;
 	int i;
 	struct mlx5dv_context attrs_out = {0};
@@ -719,12 +720,25 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			      MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_VXLAN) &&
 			     (attrs_out.tunnel_offloads_caps &
 			      MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_GRE));
+#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
+		mpls_en = ((attrs_out.tunnel_offloads_caps &
+			    MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_MPLS_GRE) &&
+			   (attrs_out.tunnel_offloads_caps &
+			    MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_MPLS_UDP) &&
+			   (attrs_out.tunnel_offloads_caps &
+			  MLX5DV_RAW_PACKET_CAP_TUNNELED_OFFLOAD_CTRL_DW_MPLS));
+#endif
 	}
 	DRV_LOG(DEBUG, "tunnel offloading is %ssupported",
 		tunnel_en ? "" : "not ");
+	DRV_LOG(DEBUG, "MPLS over GRE/UDP offloading is %ssupported",
+		mpls_en ? "" : "not ");
 #else
 	DRV_LOG(WARNING,
 		"tunnel offloading disabled due to old OFED/rdma-core version");
+	DRV_LOG(WARNING,
+		"MPLS over GRE/UDP offloading disabled due to old"
+		" OFED/rdma-core version or firmware configuration");
 #endif
 	if (mlx5_glue->query_device_ex(attr_ctx, NULL, &device_attr)) {
 		err = errno;
@@ -748,6 +762,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			.cqe_comp = cqe_comp,
 			.mps = mps,
 			.tunnel_en = tunnel_en,
+			.mpls_en = mpls_en,
 			.tx_vec_en = 1,
 			.rx_vec_en = 1,
 			.mpw_hdr_dseg = 0,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 708272f6d..1868abd8d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -81,6 +81,7 @@ struct mlx5_dev_config {
 	unsigned int vf:1; /* This is a VF. */
 	unsigned int mps:2; /* Multi-packet send supported mode. */
 	unsigned int tunnel_en:1;
+	unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */
 	/* Whether tunnel stateless offloads are supported. */
 	unsigned int flow_counter_en:1; /* Whether flow counter is supported. */
 	unsigned int cqe_comp:1; /* CQE compression is enabled. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 58d437308..5784f2ee0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -97,6 +97,11 @@ mlx5_flow_create_gre(const struct rte_flow_item *item,
 		       const void *default_mask,
 		       struct mlx5_flow_data *data);
 
+static int
+mlx5_flow_create_mpls(const struct rte_flow_item *item,
+		      const void *default_mask,
+		      struct mlx5_flow_data *data);
+
 struct mlx5_flow_parse;
 
 static void
@@ -244,12 +249,14 @@ struct rte_flow {
 #define IS_TUNNEL(type) ( \
 	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
 	(type) == RTE_FLOW_ITEM_TYPE_VXLAN_GPE || \
+	(type) == RTE_FLOW_ITEM_TYPE_MPLS || \
 	(type) == RTE_FLOW_ITEM_TYPE_GRE)
 
 const uint32_t flow_ptype[] = {
 	[RTE_FLOW_ITEM_TYPE_VXLAN] = RTE_PTYPE_TUNNEL_VXLAN,
 	[RTE_FLOW_ITEM_TYPE_VXLAN_GPE] = RTE_PTYPE_TUNNEL_VXLAN_GPE,
 	[RTE_FLOW_ITEM_TYPE_GRE] = RTE_PTYPE_TUNNEL_GRE,
+	[RTE_FLOW_ITEM_TYPE_MPLS] = RTE_PTYPE_TUNNEL_MPLS_IN_GRE,
 };
 
 #define PTYPE_IDX(t) ((RTE_PTYPE_TUNNEL_MASK & (t)) >> 12)
@@ -260,6 +267,10 @@ const uint32_t ptype_ext[] = {
 	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN_GPE)]	= RTE_PTYPE_TUNNEL_VXLAN_GPE |
 						  RTE_PTYPE_L4_UDP,
 	[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)] = RTE_PTYPE_TUNNEL_GRE,
+	[PTYPE_IDX(RTE_PTYPE_TUNNEL_MPLS_IN_GRE)] =
+		RTE_PTYPE_TUNNEL_MPLS_IN_GRE,
+	[PTYPE_IDX(RTE_PTYPE_TUNNEL_MPLS_IN_UDP)] =
+		RTE_PTYPE_TUNNEL_MPLS_IN_GRE | RTE_PTYPE_L4_UDP,
 };
 
 /** Structure to generate a simple graph of layers supported by the NIC. */
@@ -396,7 +407,8 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VXLAN,
-			       RTE_FLOW_ITEM_TYPE_VXLAN_GPE),
+			       RTE_FLOW_ITEM_TYPE_VXLAN_GPE,
+			       RTE_FLOW_ITEM_TYPE_MPLS),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
@@ -425,7 +437,8 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 	[RTE_FLOW_ITEM_TYPE_GRE] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
 			       RTE_FLOW_ITEM_TYPE_IPV4,
-			       RTE_FLOW_ITEM_TYPE_IPV6),
+			       RTE_FLOW_ITEM_TYPE_IPV6,
+			       RTE_FLOW_ITEM_TYPE_MPLS),
 		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_gre){
 			.protocol = -1,
@@ -433,7 +446,11 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.default_mask = &rte_flow_item_gre_mask,
 		.mask_sz = sizeof(struct rte_flow_item_gre),
 		.convert = mlx5_flow_create_gre,
+#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
+		.dst_sz = sizeof(struct ibv_flow_spec_gre),
+#else
 		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
+#endif
 	},
 	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
 		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
@@ -461,6 +478,21 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 		.convert = mlx5_flow_create_vxlan_gpe,
 		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
 	},
+	[RTE_FLOW_ITEM_TYPE_MPLS] = {
+		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
+			       RTE_FLOW_ITEM_TYPE_IPV4,
+			       RTE_FLOW_ITEM_TYPE_IPV6),
+		.actions = valid_actions,
+		.mask = &(const struct rte_flow_item_mpls){
+			.label_tc_s = "\xff\xff\xf0",
+		},
+		.default_mask = &rte_flow_item_mpls_mask,
+		.mask_sz = sizeof(struct rte_flow_item_mpls),
+		.convert = mlx5_flow_create_mpls,
+#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
+		.dst_sz = sizeof(struct ibv_flow_spec_mpls),
+#endif
+	},
 };
 
 /** Structure to pass to the conversion function. */
@@ -916,7 +948,9 @@ mlx5_flow_convert_items_validate(struct rte_eth_dev *dev,
 		if (ret)
 			goto exit_item_not_supported;
 		if (IS_TUNNEL(items->type)) {
-			if (parser->tunnel) {
+			if (parser->tunnel &&
+			   !(parser->tunnel == RTE_PTYPE_TUNNEL_GRE &&
+			     items->type == RTE_FLOW_ITEM_TYPE_MPLS)) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
 						   items,
@@ -924,6 +958,16 @@ mlx5_flow_convert_items_validate(struct rte_eth_dev *dev,
 						   " tunnel encapsulations.");
 				return -rte_errno;
 			}
+			if (items->type == RTE_FLOW_ITEM_TYPE_MPLS &&
+			    !priv->config.mpls_en) {
+				rte_flow_error_set(error, ENOTSUP,
+						   RTE_FLOW_ERROR_TYPE_ITEM,
+						   items,
+						   "MPLS not supported or"
+						   " disabled in firmware"
+						   " configuration.");
+				return -rte_errno;
+			}
 			if (!priv->config.tunnel_en &&
 			    parser->rss_conf.level) {
 				rte_flow_error_set(error, ENOTSUP,
@@ -1883,6 +1927,80 @@ mlx5_flow_create_vxlan_gpe(const struct rte_flow_item *item,
 }
 
 /**
+ * Convert MPLS item to Verbs specification.
+ * Tunnel types currently supported are MPLS-in-GRE and MPLS-in-UDP.
+ *
+ * @param item[in]
+ *   Item specification.
+ * @param default_mask[in]
+ *   Default bit-masks to use when item->mask is not provided.
+ * @param data[in, out]
+ *   User structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx5_flow_create_mpls(const struct rte_flow_item *item __rte_unused,
+		      const void *default_mask __rte_unused,
+		      struct mlx5_flow_data *data __rte_unused)
+{
+#ifndef HAVE_IBV_DEVICE_MPLS_SUPPORT
+	return rte_flow_error_set(data->error, EINVAL,
+				  RTE_FLOW_ERROR_TYPE_ITEM,
+				  item,
+				  "MPLS not supported by driver");
+#else
+	unsigned int i;
+	const struct rte_flow_item_mpls *spec = item->spec;
+	const struct rte_flow_item_mpls *mask = item->mask;
+	struct mlx5_flow_parse *parser = data->parser;
+	unsigned int size = sizeof(struct ibv_flow_spec_mpls);
+	struct ibv_flow_spec_mpls mpls = {
+		.type = IBV_FLOW_SPEC_MPLS,
+		.size = size,
+	};
+	union tag {
+		uint32_t tag;
+		uint8_t label[4];
+	} id;
+
+	id.tag = 0;
+	parser->inner = IBV_FLOW_SPEC_INNER;
+	if (parser->layer == HASH_RXQ_UDPV4 ||
+	    parser->layer == HASH_RXQ_UDPV6) {
+		parser->tunnel =
+			ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_MPLS_IN_UDP)];
+		parser->out_layer = parser->layer;
+	} else {
+		parser->tunnel =
+			ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_MPLS_IN_GRE)];
+	}
+	parser->layer = HASH_RXQ_TUNNEL;
+	if (spec) {
+		if (!mask)
+			mask = default_mask;
+		memcpy(&id.label[1], spec->label_tc_s, 3);
+		id.label[0] = spec->ttl;
+		mpls.val.tag = id.tag;
+		memcpy(&id.label[1], mask->label_tc_s, 3);
+		id.label[0] = mask->ttl;
+		mpls.mask.tag = id.tag;
+		/* Remove unwanted bits from values. */
+		mpls.val.tag &= mpls.mask.tag;
+	}
+	mlx5_flow_create_copy(parser, &mpls, size);
+	for (i = 0; i != hash_rxq_init_n; ++i) {
+		if (!parser->queue[i].ibv_attr)
+			continue;
+		parser->queue[i].ibv_attr->flags |=
+			IBV_FLOW_ATTR_FLAGS_ORDERED_SPEC_LIST;
+	}
+	return 0;
+#endif
+}
+
+/**
  * Convert GRE item to Verbs specification.
  *
  * @param item[in]
@@ -1901,16 +2019,40 @@ mlx5_flow_create_gre(const struct rte_flow_item *item __rte_unused,
 		     struct mlx5_flow_data *data)
 {
 	struct mlx5_flow_parse *parser = data->parser;
+#ifndef HAVE_IBV_DEVICE_MPLS_SUPPORT
 	unsigned int size = sizeof(struct ibv_flow_spec_tunnel);
 	struct ibv_flow_spec_tunnel tunnel = {
 		.type = parser->inner | IBV_FLOW_SPEC_VXLAN_TUNNEL,
 		.size = size,
 	};
+#else
+	const struct rte_flow_item_gre *spec = item->spec;
+	const struct rte_flow_item_gre *mask = item->mask;
+	unsigned int size = sizeof(struct ibv_flow_spec_gre);
+	struct ibv_flow_spec_gre tunnel = {
+		.type = parser->inner | IBV_FLOW_SPEC_GRE,
+		.size = size,
+	};
+#endif
 
 	parser->inner = IBV_FLOW_SPEC_INNER;
 	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
 	parser->out_layer = parser->layer;
 	parser->layer = HASH_RXQ_TUNNEL;
+#ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
+	if (spec) {
+		if (!mask)
+			mask = default_mask;
+		tunnel.val.c_ks_res0_ver = spec->c_rsvd0_ver;
+		tunnel.val.protocol = spec->protocol;
+		tunnel.mask.c_ks_res0_ver = mask->c_rsvd0_ver;
+		tunnel.mask.protocol = mask->protocol;
+		/* Remove unwanted bits from values. */
+		tunnel.val.c_ks_res0_ver &= tunnel.mask.c_ks_res0_ver;
+		tunnel.val.protocol &= tunnel.mask.protocol;
+		tunnel.val.key &= tunnel.mask.key;
+	}
+#endif
 	mlx5_flow_create_copy(parser, &tunnel, size);
 	return 0;
 }
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (10 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 11/15] net/mlx5: support MPLS-in-GRE and MPLS-in-UDP Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-11 12:32   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count Xueming Li
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Remove the outdated tunnel limitations and document the new hardware
tunnel offload features.

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 doc/guides/nics/mlx5.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b1bab2ce2..c256f85f3 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -100,12 +100,12 @@ Features
 - RX interrupts.
 - Statistics query including Basic, Extended and per queue.
 - Rx HW timestamp.
+- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLS-in-GRE, MPLS-in-UDP.
+- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification.
 
 Limitations
 -----------
 
-- Inner RSS for VXLAN frames is not supported yet.
-- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
 - For secondary process:
 
   - Forked secondary process not supported.
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (11 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-11 12:37   ` Nélio Laranjeiro
  2018-04-10 13:34 ` [PATCH v2 14/15] net/mlx5: fix invalid flow item check Xueming Li
  2018-04-10 13:34 ` [PATCH v2 15/15] net/mlx5: support RSS configuration in isolated mode Xueming Li
  14 siblings, 1 reply; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

In some environments it is desirable to have the NIC perform RSS
normally on the packet regardless of the number of queues configured.
The RSS hash result that is stored in the mbuf can then be used by
the application to make decisions about how to distribute workloads
to threads, secondary processes, or even virtual machines if the
application is a virtual switch.
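
A minimal sketch of an application consuming that hash inside an Rx
burst loop, even with a single Rx queue (PKT_RX_RSS_HASH marks a valid
hash; nb_workers and the dispatch helper are hypothetical):

  struct rte_mbuf *m = pkts[i];

  if (m->ol_flags & PKT_RX_RSS_HASH) {
          /* Hash computed by the NIC even with one configured queue. */
          unsigned int worker = m->hash.rss % nb_workers;

          dispatch_to_worker(worker, m); /* hypothetical helper */
  }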

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 71 +++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 41 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 5784f2ee0..9efe00086 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1252,48 +1252,37 @@ mlx5_flow_convert_rss(struct rte_eth_dev *dev, struct mlx5_flow_parse *parser)
 			parser->queue[i].ibv_attr = NULL;
 		}
 	}
-	if (parser->rss_conf.types) {
-		/* Remove impossible flow according to the RSS configuration. */
-		for (i = hmin; i != (hmax + 1); ++i) {
-			if (!parser->queue[i].ibv_attr)
-				continue;
-			if (parser->rss_conf.types &
-			    hash_rxq_init[i].dpdk_rss_hf) {
-				parser->queue[i].hash_fields =
-					hash_rxq_init[i].hash_fields;
-				found = 1;
-				continue;
-			}
-			/* L4 flow could be used for L3 RSS. */
-			if (i == parser->layer && i < ip &&
-			    (hash_rxq_init[ip].dpdk_rss_hf &
-			     parser->rss_conf.types)) {
-				parser->queue[i].hash_fields =
-					hash_rxq_init[ip].hash_fields;
-				found = 1;
-				continue;
-			}
-			/* L3 flow and L4 hash: non-rss L3 flow. */
-			if (i == parser->layer && i == ip && found)
-				/* IP pattern and L4 HF. */
-				continue;
-			rte_free(parser->queue[i].ibv_attr);
-			parser->queue[i].ibv_attr = NULL;
+	/* Remove impossible flow according to the RSS configuration. */
+	for (i = hmin; i != (hmax + 1); ++i) {
+		if (!parser->queue[i].ibv_attr)
+			continue;
+		if (parser->rss_conf.types &
+		    hash_rxq_init[i].dpdk_rss_hf) {
+			parser->queue[i].hash_fields =
+				hash_rxq_init[i].hash_fields;
+			found = 1;
+			continue;
 		}
-		if (!found)
-			DRV_LOG(WARNING,
-				"port %u rss hash function doesn't match "
-				"pattern", dev->data->port_id);
-	} else {
-		/* Remove any other flow. */
-		for (i = hmin; i != (hmax + 1); ++i) {
-			if (i == parser->layer || !parser->queue[i].ibv_attr)
-				continue;
-			rte_free(parser->queue[i].ibv_attr);
-			parser->queue[i].ibv_attr = NULL;
+		/* L4 flow could be used for L3 RSS. */
+		if (i == parser->layer && i < ip &&
+		    (hash_rxq_init[ip].dpdk_rss_hf &
+		     parser->rss_conf.types)) {
+			parser->queue[i].hash_fields =
+				hash_rxq_init[ip].hash_fields;
+			found = 1;
+			continue;
 		}
-		parser->rss_conf.queue_num = 1;
+		/* L3 flow and L4 hash: non-rss L3 flow. */
+		if (i == parser->layer && i == ip && found)
+			/* IP pattern and L4 HF. */
+			continue;
+		rte_free(parser->queue[i].ibv_attr);
+		parser->queue[i].ibv_attr = NULL;
 	}
+	if (!found)
+		DRV_LOG(WARNING,
+			"port %u rss hash function doesn't match "
+			"pattern", dev->data->port_id);
 	return 0;
 }
 
@@ -2326,8 +2315,8 @@ mlx5_flow_dump(struct rte_eth_dev *dev __rte_unused,
 		(void *)flow->frxq[i].hrxq->ind_table,
 		flow->frxq[i].hash_fields |
 		(flow->tunnel &&
-		 flow->rss_conf.rss_level ? (uint32_t)IBV_RX_HASH_INNER : 0),
-		flow->queues_n,
+		 flow->rss_conf.level ? (uint32_t)IBV_RX_HASH_INNER : 0),
+		flow->rss_conf.queue_num,
 		flow->frxq[i].ibv_attr->num_of_specs,
 		flow->frxq[i].ibv_attr->size,
 		flow->frxq[i].ibv_attr->priority,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 14/15] net/mlx5: fix invalid flow item check
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (12 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  2018-04-10 13:34 ` [PATCH v2 15/15] net/mlx5: support RSS configuration in isolated mode Xueming Li
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

This patch fixes an invalid flow item check: when an unsupported item
was found, the error path jumped to the exit label without setting a
negative errno return value.

Fixes: 4f1a88e3f9b0 ("net/mlx5: standardize on negative errno values")
Cc: nelio.laranjeiro@6wind.com

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5_flow.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 9efe00086..e2ed675c0 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -939,8 +939,10 @@ mlx5_flow_convert_items_validate(struct rte_eth_dev *dev,
 				break;
 			}
 		}
-		if (!token)
+		if (!token) {
+			ret = -ENOTSUP;
 			goto exit_item_not_supported;
+		}
 		cur_item = token;
 		ret = mlx5_flow_item_validate(items,
 					      (const uint8_t *)cur_item->mask,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 15/15] net/mlx5: support RSS configuration in isolated mode
  2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
                   ` (13 preceding siblings ...)
  2018-04-10 13:34 ` [PATCH v2 14/15] net/mlx5: fix invalid flow item check Xueming Li
@ 2018-04-10 13:34 ` Xueming Li
  14 siblings, 0 replies; 43+ messages in thread
From: Xueming Li @ 2018-04-10 13:34 UTC (permalink / raw)
  To: Nelio Laranjeiro, Shahaf Shuler; +Cc: Xueming Li, dev

Enable RSS-related configuration in isolated mode: register the RETA
update/query and RSS hash update/query callbacks in the isolated
operations structure.
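
As a usage sketch (assuming a started port and already prepared
reta_conf, reta_size and rss_conf), the RSS entry points now remain
reachable while flow isolation is on:

    struct rte_flow_error flow_err;

    /* Enter isolated mode, then reconfigure RSS as usual. */
    rte_flow_isolate(port_id, 1, &flow_err);
    rte_eth_dev_rss_reta_update(port_id, reta_conf, reta_size);
    rte_eth_dev_rss_hash_update(port_id, &rss_conf);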

Signed-off-by: Xueming Li <xuemingl@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 2124439b3..ad36d51c6 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -333,6 +333,10 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.mtu_set = mlx5_dev_set_mtu,
 	.vlan_strip_queue_set = mlx5_vlan_strip_queue_set,
 	.vlan_offload_set = mlx5_vlan_offload_set,
+	.reta_update = mlx5_dev_rss_reta_update,
+	.reta_query = mlx5_dev_rss_reta_query,
+	.rss_hash_update = mlx5_rss_hash_update,
+	.rss_hash_conf_get = mlx5_rss_hash_conf_get,
 	.filter_ctrl = mlx5_dev_filter_ctrl,
 	.rx_descriptor_status = mlx5_rx_descriptor_status,
 	.tx_descriptor_status = mlx5_tx_descriptor_status,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-10 13:34 ` [PATCH v2 01/15] net/mlx5: support 16 hardware priorities Xueming Li
@ 2018-04-10 14:41   ` Nélio Laranjeiro
  2018-04-10 15:22     ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-10 14:41 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> Adjust the flow priority mapping to adapt to the new hardware support
> for 16 Verbs flow priorities:
> 0-3: RTE FLOW tunnel rule
> 4-7: RTE FLOW non-tunnel rule
> 8-15: PMD control flow

This commit log is misleading: the number of priorities depends on the
Mellanox OFED installed, it is not yet available in the upstream Linux
kernel nor in the current Mellanox OFED GA.

What happens when that number of priorities is not available, does it
remove a functionality?  Will it collide with other flows?

> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.c         |  10 ++++
>  drivers/net/mlx5/mlx5.h         |   8 +++
>  drivers/net/mlx5/mlx5_flow.c    | 107 ++++++++++++++++++++++++++++++----------
>  drivers/net/mlx5/mlx5_trigger.c |   8 ---
>  4 files changed, 100 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index cfab55897..a1f2799e5 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -197,6 +197,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
>  		priv->txqs_n = 0;
>  		priv->txqs = NULL;
>  	}
> +	mlx5_flow_delete_drop_queue(dev);
>
>  	if (priv->pd != NULL) {
>  		assert(priv->ctx != NULL);
>  		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
> @@ -993,6 +994,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  		mlx5_set_link_up(eth_dev);
>  		/* Store device configuration on private structure. */
>  		priv->config = config;
> +		/* Create drop queue. */
> +		err = mlx5_flow_create_drop_queue(eth_dev);
> +		if (err) {
> +			DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> +				eth_dev->data->port_id, strerror(rte_errno));
> +			goto port_error;
> +		}
> +		/* Supported flow priority number detection. */
> +		mlx5_flow_priorities_detect(eth_dev);
>  		continue;
>  port_error:
>  		if (priv)
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 63b24e6bb..708272f6d 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -89,6 +89,8 @@ struct mlx5_dev_config {
>  	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
>  	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
>  	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
> +	unsigned int flow_priority_shift; /* Non-tunnel flow priority shift. */
> +	unsigned int control_flow_priority; /* Control flow priority. */
>  	unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */
>  	unsigned int ind_table_max_size; /* Maximum indirection table size. */
>  	int txq_inline; /* Maximum packet size for inlining. */
> @@ -105,6 +107,11 @@ enum mlx5_verbs_alloc_type {
>  	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
>  };
>  
> +/* 8 Verbs priorities per flow. */
> +#define MLX5_VERBS_FLOW_PRIO_8 8
> +/* 4 Verbs priorities per flow. */
> +#define MLX5_VERBS_FLOW_PRIO_4 4
> +
>  /**
>   * Verbs allocator needs a context to know in the callback which kind of
>   * resources it is allocating.
> @@ -253,6 +260,7 @@ int mlx5_traffic_restart(struct rte_eth_dev *dev);
>  
>  /* mlx5_flow.c */
>  
> +void mlx5_flow_priorities_detect(struct rte_eth_dev *dev);
>  int mlx5_flow_validate(struct rte_eth_dev *dev,
>  		       const struct rte_flow_attr *attr,
>  		       const struct rte_flow_item items[],
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 288610620..394760418 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -32,9 +32,6 @@
>  #include "mlx5_prm.h"
>  #include "mlx5_glue.h"
>  
> -/* Define minimal priority for control plane flows. */
> -#define MLX5_CTRL_FLOW_PRIORITY 4
> -
>  /* Internet Protocol versions. */
>  #define MLX5_IPV4 4
>  #define MLX5_IPV6 6
> @@ -129,7 +126,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_SRC_PORT_TCP |
>  				IBV_RX_HASH_DST_PORT_TCP),
>  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
> -		.flow_priority = 1,
> +		.flow_priority = 0,
>  		.ip_version = MLX5_IPV4,
>  	},
>  	[HASH_RXQ_UDPV4] = {
> @@ -138,7 +135,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_SRC_PORT_UDP |
>  				IBV_RX_HASH_DST_PORT_UDP),
>  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
> -		.flow_priority = 1,
> +		.flow_priority = 0,
>  		.ip_version = MLX5_IPV4,
>  	},
>  	[HASH_RXQ_IPV4] = {
> @@ -146,7 +143,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_DST_IPV4),
>  		.dpdk_rss_hf = (ETH_RSS_IPV4 |
>  				ETH_RSS_FRAG_IPV4),
> -		.flow_priority = 2,
> +		.flow_priority = 1,
>  		.ip_version = MLX5_IPV4,
>  	},
>  	[HASH_RXQ_TCPV6] = {
> @@ -155,7 +152,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_SRC_PORT_TCP |
>  				IBV_RX_HASH_DST_PORT_TCP),
>  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_TCP,
> -		.flow_priority = 1,
> +		.flow_priority = 0,
>  		.ip_version = MLX5_IPV6,
>  	},
>  	[HASH_RXQ_UDPV6] = {
> @@ -164,7 +161,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_SRC_PORT_UDP |
>  				IBV_RX_HASH_DST_PORT_UDP),
>  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_UDP,
> -		.flow_priority = 1,
> +		.flow_priority = 0,
>  		.ip_version = MLX5_IPV6,
>  	},
>  	[HASH_RXQ_IPV6] = {
> @@ -172,13 +169,13 @@ const struct hash_rxq_init hash_rxq_init[] = {
>  				IBV_RX_HASH_DST_IPV6),
>  		.dpdk_rss_hf = (ETH_RSS_IPV6 |
>  				ETH_RSS_FRAG_IPV6),
> -		.flow_priority = 2,
> +		.flow_priority = 1,
>  		.ip_version = MLX5_IPV6,
>  	},
>  	[HASH_RXQ_ETH] = {
>  		.hash_fields = 0,
>  		.dpdk_rss_hf = 0,
> -		.flow_priority = 3,
> +		.flow_priority = 2,
>  	},
>  };

If the number of priorities remains 8, you are removing the priority for
the tunnel flows introduced by 
commit 749365717f5c ("net/mlx5: change tunnel flow priority")

Please keep this functionality when this patch fails to get the expected
16 Verbs priorities.

> @@ -536,6 +533,8 @@ mlx5_flow_item_validate(const struct rte_flow_item *item,
>  /**
>   * Extract attribute to the parser.
>   *
> + * @param dev
> + *   Pointer to Ethernet device.
>   * @param[in] attr
>   *   Flow rule attributes.
>   * @param[out] error
> @@ -545,9 +544,12 @@ mlx5_flow_item_validate(const struct rte_flow_item *item,
>   *   0 on success, a negative errno value otherwise and rte_errno is set.
>   */
>  static int
> -mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
> +mlx5_flow_convert_attributes(struct rte_eth_dev *dev,
> +			     const struct rte_flow_attr *attr,
>  			     struct rte_flow_error *error)
>  {
> +	struct priv *priv = dev->data->dev_private;
> +
>  	if (attr->group) {
>  		rte_flow_error_set(error, ENOTSUP,
>  				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
> @@ -555,7 +557,7 @@ mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
>  				   "groups are not supported");
>  		return -rte_errno;
>  	}
> -	if (attr->priority && attr->priority != MLX5_CTRL_FLOW_PRIORITY) {
> +	if (attr->priority > priv->config.control_flow_priority) {
>  		rte_flow_error_set(error, ENOTSUP,
>  				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
>  				   NULL,
> @@ -900,30 +902,38 @@ mlx5_flow_convert_allocate(unsigned int size, struct rte_flow_error *error)
>   * Make inner packet matching with an higher priority from the non Inner
>   * matching.
>   *
> + * @param dev
> + *   Pointer to Ethernet device.
>   * @param[in, out] parser
>   *   Internal parser structure.
>   * @param attr
>   *   User flow attribute.
>   */
>  static void
> -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> +			  struct mlx5_flow_parse *parser,
>  			  const struct rte_flow_attr *attr)
>  {
> +	struct priv *priv = dev->data->dev_private;
>  	unsigned int i;
> +	uint16_t priority;
>  
> +	if (priv->config.flow_priority_shift == 1)
> +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> +	else
> +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> +	if (!parser->inner)
> +		priority += priv->config.flow_priority_shift;
>  	if (parser->drop) {
> -		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> -			attr->priority +
> -			hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> +		parser->queue[HASH_RXQ_ETH].ibv_attr->priority = priority +
> +				hash_rxq_init[HASH_RXQ_ETH].flow_priority;
>  		return;
>  	}
>  	for (i = 0; i != hash_rxq_init_n; ++i) {
> -		if (parser->queue[i].ibv_attr) {
> -			parser->queue[i].ibv_attr->priority =
> -				attr->priority +
> -				hash_rxq_init[i].flow_priority -
> -				(parser->inner ? 1 : 0);
> -		}
> +		if (!parser->queue[i].ibv_attr)
> +			continue;
> +		parser->queue[i].ibv_attr->priority = priority +
> +				hash_rxq_init[i].flow_priority;
>  	}
>  }
>  
> @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
>  		.layer = HASH_RXQ_ETH,
>  		.mark_id = MLX5_FLOW_MARK_DEFAULT,
>  	};
> -	ret = mlx5_flow_convert_attributes(attr, error);
> +	ret = mlx5_flow_convert_attributes(dev, attr, error);
>  	if (ret)
>  		return ret;
>  	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
> @@ -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
>  	 */
>  	if (!parser->drop)
>  		mlx5_flow_convert_finalise(parser);
> -	mlx5_flow_update_priority(parser, attr);
> +	mlx5_flow_update_priority(dev, parser, attr);
>  exit_free:
>  	/* Only verification is expected, all resources should be released. */
>  	if (!parser->create) {
> @@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
>  	struct priv *priv = dev->data->dev_private;
>  	const struct rte_flow_attr attr = {
>  		.ingress = 1,
> -		.priority = MLX5_CTRL_FLOW_PRIORITY,
> +		.priority = priv->config.control_flow_priority,
>  	};
>  	struct rte_flow_item items[] = {
>  		{
> @@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
>  	}
>  	return 0;
>  }
> +
> +/**
> + * Detect number of Verbs flow priorities supported.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + */
> +void
> +mlx5_flow_priorities_detect(struct rte_eth_dev *dev)
> +{
> +	struct priv *priv = dev->data->dev_private;
> +	uint32_t verb_priorities = MLX5_VERBS_FLOW_PRIO_8 * 2;
> +	struct {
> +		struct ibv_flow_attr attr;
> +		struct ibv_flow_spec_eth eth;
> +		struct ibv_flow_spec_action_drop drop;
> +	} flow_attr = {
> +		.attr = {
> +			.num_of_specs = 2,
> +			.priority = verb_priorities - 1,
> +		},
> +		.eth = {
> +			.type = IBV_FLOW_SPEC_ETH,
> +			.size = sizeof(struct ibv_flow_spec_eth),
> +		},
> +		.drop = {
> +			.size = sizeof(struct ibv_flow_spec_action_drop),
> +			.type = IBV_FLOW_SPEC_ACTION_DROP,
> +		},
> +	};
> +	struct ibv_flow *flow;
> +
> +	if (priv->config.control_flow_priority)
> +		return;
> +	flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
> +				      &flow_attr.attr);
> +	if (flow) {
> +		priv->config.flow_priority_shift = MLX5_VERBS_FLOW_PRIO_8 / 2;
> +		claim_zero(mlx5_glue->destroy_flow(flow));
> +	} else {
> +		priv->config.flow_priority_shift = 1;
> +		verb_priorities = verb_priorities / 2;
> +	}
> +	priv->config.control_flow_priority = 1;
> +	DRV_LOG(INFO, "port %u Verbs flow priorities: %d",
> +		dev->data->port_id, verb_priorities);
> +}
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 6bb4ffb14..d80a2e688 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	int ret;
>  
>  	dev->data->dev_started = 1;
> -	ret = mlx5_flow_create_drop_queue(dev);
> -	if (ret) {
> -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> -			dev->data->port_id, strerror(rte_errno));
> -		goto error;
> -	}
>  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx queues",
>  		dev->data->port_id);
>  	rte_mempool_walk(mlx5_mp2mr_iter, priv);
> @@ -202,7 +196,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	mlx5_traffic_disable(dev);
>  	mlx5_txq_stop(dev);
>  	mlx5_rxq_stop(dev);
> -	mlx5_flow_delete_drop_queue(dev);
>  	rte_errno = ret; /* Restore rte_errno. */
>  	return -rte_errno;
>  }
> @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
>  	mlx5_rxq_stop(dev);
>  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv->mr))
>  		mlx5_mr_release(mr);
> -	mlx5_flow_delete_drop_queue(dev);
>  }
>  
>  /**
> -- 
> 2.13.3

I have a few concerns on this: mlx5_pci_probe() will also probe any
underlying Verbs device, and in the near future the representors
associated with a VF.
Making such a detection should only be done once by the PF, and I also
wonder whether it is possible to create such a drop action in a
representor directly using Verbs.

Another concern: this patch will be reverted at some point, when those
16 priorities are always available.  It will be easier to remove a
single detection function than to search for all these modifications.

I would suggest having a standalone mlx5_flow_priorities_detect() which
creates and deletes all the resources needed for this detection.
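
Something along these lines (only a rough sketch of the suggested
shape, reusing the helpers already present in this patch; flow_attr is
the attribute structure from the hunk above):

    void
    mlx5_flow_priorities_detect(struct rte_eth_dev *dev)
    {
            struct priv *priv = dev->data->dev_private;
            struct ibv_flow *flow;

            if (priv->config.control_flow_priority)
                    return;
            /* Create the drop queue only for the probe duration. */
            if (mlx5_flow_create_drop_queue(dev))
                    return;
            /* flow_attr: as defined in the hunk above. */
            flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
                                          &flow_attr.attr);
            if (flow)
                    claim_zero(mlx5_glue->destroy_flow(flow));
            /* ... derive flow_priority_shift and
             * control_flow_priority from the result ... */
            mlx5_flow_delete_drop_queue(dev);
    }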

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 03/15] net/mlx5: support L3 vxlan flow
  2018-04-10 13:34 ` [PATCH v2 03/15] net/mlx5: support L3 vxlan flow Xueming Li
@ 2018-04-10 14:53   ` Nélio Laranjeiro
  0 siblings, 0 replies; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-10 14:53 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:03PM +0800, Xueming Li wrote:
> This patch add L3 vxlan support, no inner L2 header comparing to
> standard vxlan protocol.
>
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 026952b46..870d05250 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -410,7 +410,9 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
>  		.dst_sz = sizeof(struct ibv_flow_spec_tunnel),
>  	},
>  	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
> -		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH),
> +		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
> +			       RTE_FLOW_ITEM_TYPE_IPV4,
> +			       RTE_FLOW_ITEM_TYPE_IPV6),
>  		.actions = valid_actions,
>  		.mask = &(const struct rte_flow_item_vxlan){
>  			.vni = "\xff\xff\xff",
> -- 
> 2.13.3


As a v3 is necessary anyway, can you also uppercase "vxlan" in the
commit title?

It also deserves a comment in the code itself; currently this looks like
a bug, as the RFC [1] implies an inner Ethernet layer.  I suppose there
is a use case for such a modification, but as it is not explained, that
is just a supposition.
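
For instance (the wording is only a suggestion):

    [RTE_FLOW_ITEM_TYPE_VXLAN] = {
            /*
             * L3 VXLAN: no inner Ethernet header, IPv4/IPv6
             * directly follows the VXLAN header, unlike RFC 7348.
             */
            .items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH,
                           RTE_FLOW_ITEM_TYPE_IPV4,
                           RTE_FLOW_ITEM_TYPE_IPV6),
            ...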

Thanks,

[1] https://tools.ietf.org/html/rfc7348

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-10 13:34 ` [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification Xueming Li
@ 2018-04-10 15:17   ` Nélio Laranjeiro
  2018-04-11  8:11     ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-10 15:17 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> This patch introduces tunnel type identification based on flow rules.
> If flows of multiple tunnel types are built on the same queue,
> RTE_PTYPE_TUNNEL_MASK will be returned; bits in the flow mark can then
> be used as a tunnel type identifier.

I don't see anywhere in this patch where the bits are reserved to
identify a flow, nor values which can help to identify it.

Is this missing?

Anyway, we already have very few bits in the mark, making it difficult
for the user to exploit; reserving some of them again may lead to
removing mark support from the flows.
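
For reference, the consumer-side semantics this commit log describes
would be along these lines (the handle_*() helpers are hypothetical):

    uint32_t tun = m->packet_type & RTE_PTYPE_TUNNEL_MASK;

    if (tun == RTE_PTYPE_TUNNEL_VXLAN)
            handle_vxlan(m);
    else if (tun == RTE_PTYPE_TUNNEL_GRE)
            handle_gre(m);
    else if (tun == RTE_PTYPE_TUNNEL_MASK)
            /* Several tunnel types share this queue, the flow
             * mark is the only way to disambiguate. */
            handle_by_mark(m);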

> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow.c          | 125 +++++++++++++++++++++++++++++-----
>  drivers/net/mlx5/mlx5_rxq.c           |  11 ++-
>  drivers/net/mlx5/mlx5_rxtx.c          |  12 ++--
>  drivers/net/mlx5/mlx5_rxtx.h          |   9 ++-
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  21 +++---
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  17 +++--
>  6 files changed, 157 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 870d05250..65d7a9b62 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -222,6 +222,7 @@ struct rte_flow {
>  	struct rte_flow_action_rss rss_conf; /**< RSS configuration */
>  	uint16_t (*queues)[]; /**< Queues indexes to use. */
>  	uint8_t rss_key[40]; /**< copy of the RSS key. */
> +	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
>  	struct ibv_counter_set *cs; /**< Holds the counters for the rule. */
>  	struct mlx5_flow_counter_stats counter_stats;/**<The counter stats. */
>  	struct mlx5_flow frxq[RTE_DIM(hash_rxq_init)];
> @@ -238,6 +239,19 @@ struct rte_flow {
>  	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
>  	(type) == RTE_FLOW_ITEM_TYPE_GRE)
>  
> +const uint32_t flow_ptype[] = {
> +	[RTE_FLOW_ITEM_TYPE_VXLAN] = RTE_PTYPE_TUNNEL_VXLAN,
> +	[RTE_FLOW_ITEM_TYPE_GRE] = RTE_PTYPE_TUNNEL_GRE,
> +};
> +
> +#define PTYPE_IDX(t) ((RTE_PTYPE_TUNNEL_MASK & (t)) >> 12)
> +
> +const uint32_t ptype_ext[] = {
> +	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
> +					      RTE_PTYPE_L4_UDP,
> +	[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)] = RTE_PTYPE_TUNNEL_GRE,
> +};
> +
>  /** Structure to generate a simple graph of layers supported by the NIC. */
>  struct mlx5_flow_items {
>  	/** List of possible actions for these items. */
> @@ -437,6 +451,7 @@ struct mlx5_flow_parse {
>  	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queues indexes to use. */
>  	uint8_t rss_key[40]; /**< copy of the RSS key. */
>  	enum hash_rxq_type layer; /**< Last pattern layer detected. */
> +	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
>  	struct ibv_counter_set *cs; /**< Holds the counter set for the rule */
>  	struct {
>  		struct ibv_flow_attr *ibv_attr;
> @@ -860,7 +875,7 @@ mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
>  		if (ret)
>  			goto exit_item_not_supported;
>  		if (IS_TUNNEL(items->type)) {
> -			if (parser->inner) {
> +			if (parser->tunnel) {
>  				rte_flow_error_set(error, ENOTSUP,
>  						   RTE_FLOW_ERROR_TYPE_ITEM,
>  						   items,
> @@ -869,6 +884,7 @@ mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
>  				return -rte_errno;
>  			}
>  			parser->inner = IBV_FLOW_SPEC_INNER;
> +			parser->tunnel = flow_ptype[items->type];
>  		}
>  		if (parser->drop) {
>  			parser->queue[HASH_RXQ_ETH].offset += cur_item->dst_sz;
> @@ -1165,6 +1181,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
>  	}
>  	/* Third step. Conversion parse, fill the specifications. */
>  	parser->inner = 0;
> +	parser->tunnel = 0;
>  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
>  		struct mlx5_flow_data data = {
>  			.parser = parser,
> @@ -1633,6 +1650,7 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
>  
>  	id.vni[0] = 0;
>  	parser->inner = IBV_FLOW_SPEC_INNER;
> +	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)];
>  	if (spec) {
>  		if (!mask)
>  			mask = default_mask;
> @@ -1686,6 +1704,7 @@ mlx5_flow_create_gre(const struct rte_flow_item *item __rte_unused,
>  	};
>  
>  	parser->inner = IBV_FLOW_SPEC_INNER;
> +	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
>  	mlx5_flow_create_copy(parser, &tunnel, size);
>  	return 0;
>  }
> @@ -1864,7 +1883,8 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
>  				      parser->rss_conf.key_len,
>  				      hash_fields,
>  				      parser->rss_conf.queue,
> -				      parser->rss_conf.queue_num);
> +				      parser->rss_conf.queue_num,
> +				      parser->tunnel);
>  		if (flow->frxq[i].hrxq)
>  			continue;
>  		flow->frxq[i].hrxq =
> @@ -1873,7 +1893,8 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
>  				      parser->rss_conf.key_len,
>  				      hash_fields,
>  				      parser->rss_conf.queue,
> -				      parser->rss_conf.queue_num);
> +				      parser->rss_conf.queue_num,
> +				      parser->tunnel);
>  		if (!flow->frxq[i].hrxq) {
>  			return rte_flow_error_set(error, ENOMEM,
>  						  RTE_FLOW_ERROR_TYPE_HANDLE,
> @@ -1885,6 +1906,40 @@ mlx5_flow_create_action_queue_rss(struct rte_eth_dev *dev,
>  }
>  
>  /**
> + * RXQ update after flow rule creation.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param flow
> + *   Pointer to the flow rule.
> + */
> +static void
> +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow *flow)
> +{
> +	struct priv *priv = dev->data->dev_private;
> +	unsigned int i;
> +
> +	if (!dev->data->dev_started)
> +		return;
> +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> +						 [(*flow->queues)[i]];
> +		struct mlx5_rxq_ctrl *rxq_ctrl =
> +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> +
> +		rxq_data->mark |= flow->mark;
> +		if (!tunnel)
> +			continue;
> +		rxq_ctrl->tunnel_types[tunnel] += 1;

I don't understand why you need such an array: the NIC is unable to
return the tunnel type, as it reports only a single bit saying "tunnel".
Why not store the currently configured tunnel in the priv structure?

> +		if (rxq_data->tunnel != flow->tunnel)
> +			rxq_data->tunnel = rxq_data->tunnel ?
> +					   RTE_PTYPE_TUNNEL_MASK :
> +					   flow->tunnel;
> +	}
> +}
> +
> +/**
>   * Complete flow rule creation.
>   *
>   * @param dev
> @@ -1944,12 +1999,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev *dev,
>  				   NULL, "internal error in flow creation");
>  		goto error;
>  	}
> -	for (i = 0; i != parser->rss_conf.queue_num; ++i) {
> -		struct mlx5_rxq_data *q =
> -			(*priv->rxqs)[parser->rss_conf.queue[i]];
> -
> -		q->mark |= parser->mark;
> -	}
> +	mlx5_flow_create_update_rxqs(dev, flow);
>  	return 0;
>  error:
>  	ret = rte_errno; /* Save rte_errno before cleanup. */
> @@ -2022,6 +2072,7 @@ mlx5_flow_list_create(struct rte_eth_dev *dev,
>  	}
>  	/* Copy configuration. */
>  	flow->queues = (uint16_t (*)[])(flow + 1);
> +	flow->tunnel = parser.tunnel;
>  	flow->rss_conf = (struct rte_flow_action_rss){
>  		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
>  		.level = 0,
> @@ -2113,9 +2164,38 @@ mlx5_flow_list_destroy(struct rte_eth_dev *dev, struct mlx5_flows *list,
>  	struct priv *priv = dev->data->dev_private;
>  	unsigned int i;
>  
> -	if (flow->drop || !flow->mark)
> +	if (flow->drop || !dev->data->dev_started)
>  		goto free;
> -	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> +	for (i = 0; flow->tunnel && i != flow->rss_conf.queue_num; ++i) {
> +		/* Update queue tunnel type. */
> +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> +						 [(*flow->queues)[i]];
> +		struct mlx5_rxq_ctrl *rxq_ctrl =
> +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> +
> +		RTE_ASSERT(rxq_ctrl->tunnel_types[tunnel] > 0);

Use assert(), not RTE_ASSERT(), or make a patch applying such a move to
the whole PMD.

> +		rxq_ctrl->tunnel_types[tunnel] -= 1;
> +		if (!rxq_ctrl->tunnel_types[tunnel]) {
> +			/* Update tunnel type. */
> +			uint8_t j;
> +			uint8_t types = 0;
> +			uint8_t last;
> +
> +			for (j = 0; j < RTE_DIM(rxq_ctrl->tunnel_types); j++)
> +				if (rxq_ctrl->tunnel_types[j]) {
> +					types += 1;
> +					last = j;
> +				}
> +			/* Keep same if more than one tunnel types left. */
> +			if (types == 1)
> +				rxq_data->tunnel = ptype_ext[last];
> +			else if (types == 0)
> +				/* No tunnel type left. */
> +				rxq_data->tunnel = 0;
> +		}
> +	}
> +	for (i = 0; flow->mark && i != flow->rss_conf.queue_num; ++i) {
>  		struct rte_flow *tmp;
>  		int mark = 0;
>  
> @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list)
>  {
>  	struct priv *priv = dev->data->dev_private;
>  	struct rte_flow *flow;
> +	unsigned int i;
>  
>  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> -		unsigned int i;
>  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
>  
>  		if (flow->drop) {
> @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct mlx5_flows *list)
>  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data->port_id,
>  			(void *)flow);
>  	}
> +	/* Cleanup Rx queue tunnel info. */
> +	for (i = 0; i != priv->rxqs_n; ++i) {
> +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> +		struct mlx5_rxq_ctrl *rxq_ctrl =
> +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> +
> +		memset((void *)rxq_ctrl->tunnel_types, 0,
> +		       sizeof(rxq_ctrl->tunnel_types));
> +		q->tunnel = 0;
> +	}
>  }

This hunk does not handle the fact that the Rx queue array may have
holes, i.e. the application is allowed to ask for 10 queues and only
initialise some of them.  In such a situation this code will segfault.

It should only memset the Rx queues that are part of the flow, not the
others.
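
A minimal guard against the holes would be (sketch; restricting the
reset to the queues of each flow, as suggested above, would move this
into the per-flow loop instead):

    for (i = 0; i != priv->rxqs_n; ++i) {
            struct mlx5_rxq_data *q = (*priv->rxqs)[i];
            struct mlx5_rxq_ctrl *rxq_ctrl;

            if (q == NULL)
                    continue;
            rxq_ctrl = container_of(q, struct mlx5_rxq_ctrl, rxq);
            memset((void *)rxq_ctrl->tunnel_types, 0,
                   sizeof(rxq_ctrl->tunnel_types));
            q->tunnel = 0;
    }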

>  /**
> @@ -2429,7 +2519,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
>  					      flow->rss_conf.key_len,
>  					      hash_rxq_init[i].hash_fields,
>  					      flow->rss_conf.queue,
> -					      flow->rss_conf.queue_num);
> +					      flow->rss_conf.queue_num,
> +					      flow->tunnel);
>  			if (flow->frxq[i].hrxq)
>  				goto flow_create;
>  			flow->frxq[i].hrxq =
> @@ -2437,7 +2528,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
>  					      flow->rss_conf.key_len,
>  					      hash_rxq_init[i].hash_fields,
>  					      flow->rss_conf.queue,
> -					      flow->rss_conf.queue_num);
> +					      flow->rss_conf.queue_num,
> +					      flow->tunnel);
>  			if (!flow->frxq[i].hrxq) {
>  				DRV_LOG(DEBUG,
>  					"port %u flow %p cannot be applied",
> @@ -2459,10 +2551,7 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
>  			DRV_LOG(DEBUG, "port %u flow %p applied",
>  				dev->data->port_id, (void *)flow);
>  		}
> -		if (!flow->mark)
> -			continue;
> -		for (i = 0; i != flow->rss_conf.queue_num; ++i)
> -			(*priv->rxqs)[flow->rss_conf.queue[i]]->mark = 1;
> +		mlx5_flow_create_update_rxqs(dev, flow);
>  	}
>  	return 0;
>  }
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 1e4354ab3..351acfc0f 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
>   *   first queue index will be taken for the indirection table.
>   * @param queues_n
>   *   Number of queues.
> + * @param tunnel
> + *   Tunnel type.
>   *
>   * @return
>   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> @@ -1394,7 +1396,7 @@ struct mlx5_hrxq *
>  mlx5_hrxq_new(struct rte_eth_dev *dev,
>  	      const uint8_t *rss_key, uint32_t rss_key_len,
>  	      uint64_t hash_fields,
> -	      const uint16_t *queues, uint32_t queues_n)
> +	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
>  {
>  	struct priv *priv = dev->data->dev_private;
>  	struct mlx5_hrxq *hrxq;
> @@ -1438,6 +1440,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
>  	hrxq->qp = qp;
>  	hrxq->rss_key_len = rss_key_len;
>  	hrxq->hash_fields = hash_fields;
> +	hrxq->tunnel = tunnel;
>  	memcpy(hrxq->rss_key, rss_key, rss_key_len);
>  	rte_atomic32_inc(&hrxq->refcnt);
>  	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next);
> @@ -1466,6 +1469,8 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
>   *   first queue index will be taken for the indirection table.
>   * @param queues_n
>   *   Number of queues.
> + * @param tunnel
> + *   Tunnel type.
>   *
>   * @return
>   *   An hash Rx queue on success.
> @@ -1474,7 +1479,7 @@ struct mlx5_hrxq *
>  mlx5_hrxq_get(struct rte_eth_dev *dev,
>  	      const uint8_t *rss_key, uint32_t rss_key_len,
>  	      uint64_t hash_fields,
> -	      const uint16_t *queues, uint32_t queues_n)
> +	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
>  {
>  	struct priv *priv = dev->data->dev_private;
>  	struct mlx5_hrxq *hrxq;
> @@ -1489,6 +1494,8 @@ mlx5_hrxq_get(struct rte_eth_dev *dev,
>  			continue;
>  		if (hrxq->hash_fields != hash_fields)
>  			continue;
> +		if (hrxq->tunnel != tunnel)
> +			continue;
>  		ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
>  		if (!ind_tbl)
>  			continue;
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index 1f422c70b..d061dfc8a 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -34,7 +34,7 @@
>  #include "mlx5_prm.h"
>  
>  static __rte_always_inline uint32_t
> -rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe);
> +rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe);
>  
>  static __rte_always_inline int
>  mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
> @@ -125,12 +125,14 @@ mlx5_set_ptype_table(void)
>  	(*p)[0x8a] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
>  		     RTE_PTYPE_L4_UDP;
>  	/* Tunneled - L3 */
> +	(*p)[0x40] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
>  	(*p)[0x41] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L4_NONFRAG;
>  	(*p)[0x42] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L4_NONFRAG;
> +	(*p)[0xc0] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
>  	(*p)[0xc1] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
>  		     RTE_PTYPE_INNER_L4_NONFRAG;
> @@ -1577,6 +1579,8 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  /**
>   * Translate RX completion flags to packet type.
>   *
> + * @param[in] rxq
> + *   Pointer to RX queue structure.
>   * @param[in] cqe
>   *   Pointer to CQE.
>   *
> @@ -1586,7 +1590,7 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
>   *   Packet type for struct rte_mbuf.
>   */
>  static inline uint32_t
> -rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
> +rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
>  {
>  	uint8_t idx;
>  	uint8_t pinfo = cqe->pkt_info;
> @@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
>  	 * bit[7] = outer_l3_type
>  	 */
>  	idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
> -	return mlx5_ptype_table[idx];
> +	return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));
>  }
>  
>  /**
> @@ -1833,7 +1837,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  			pkt = seg;
>  			assert(len >= (rxq->crc_present << 2));
>  			/* Update packet information. */
> -			pkt->packet_type = rxq_cq_to_pkt_type(cqe);
> +			pkt->packet_type = rxq_cq_to_pkt_type(rxq, cqe);
>  			pkt->ol_flags = 0;
>  			if (rss_hash_res && rxq->rss_hash) {
>  				pkt->hash.rss = rss_hash_res;
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index a702cb603..6866f6818 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -104,6 +104,7 @@ struct mlx5_rxq_data {
>  	void *cq_uar; /* CQ user access region. */
>  	uint32_t cqn; /* CQ number. */
>  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> +	uint32_t tunnel; /* Tunnel information. */
>  } __rte_cache_aligned;
>  
>  /* Verbs Rx queue elements. */
> @@ -125,6 +126,7 @@ struct mlx5_rxq_ctrl {
>  	struct mlx5_rxq_ibv *ibv; /* Verbs elements. */
>  	struct mlx5_rxq_data rxq; /* Data path structure. */
>  	unsigned int socket; /* CPU socket ID for allocations. */
> +	uint32_t tunnel_types[16]; /* Tunnel type counter. */
>  	unsigned int irq:1; /* Whether IRQ is enabled. */
>  	uint16_t idx; /* Queue index. */
>  };
> @@ -145,6 +147,7 @@ struct mlx5_hrxq {
>  	struct mlx5_ind_table_ibv *ind_table; /* Indirection table. */
>  	struct ibv_qp *qp; /* Verbs queue pair. */
>  	uint64_t hash_fields; /* Verbs Hash fields. */
> +	uint32_t tunnel; /* Tunnel type. */
>  	uint32_t rss_key_len; /* Hash key length in bytes. */
>  	uint8_t rss_key[]; /* Hash key. */
>  };
> @@ -248,11 +251,13 @@ int mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev);
>  struct mlx5_hrxq *mlx5_hrxq_new(struct rte_eth_dev *dev,
>  				const uint8_t *rss_key, uint32_t rss_key_len,
>  				uint64_t hash_fields,
> -				const uint16_t *queues, uint32_t queues_n);
> +				const uint16_t *queues, uint32_t queues_n,
> +				uint32_t tunnel);
>  struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
>  				const uint8_t *rss_key, uint32_t rss_key_len,
>  				uint64_t hash_fields,
> -				const uint16_t *queues, uint32_t queues_n);
> +				const uint16_t *queues, uint32_t queues_n,
> +				uint32_t tunnel);
>  int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq *hxrq);
>  int mlx5_hrxq_ibv_verify(struct rte_eth_dev *dev);
>  uint64_t mlx5_get_rx_port_offloads(void);
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index bbe1818ef..9f9136108 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -551,6 +551,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
>  	const uint64x1_t mbuf_init = vld1_u64(&rxq->mbuf_initializer);
>  	const uint64x1_t r32_mask = vcreate_u64(0xffffffff);
>  	uint64x2_t rearm0, rearm1, rearm2, rearm3;
> +	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
>  
>  	if (rxq->mark) {
>  		const uint32x4_t ft_def = vdupq_n_u32(MLX5_FLOW_MARK_DEFAULT);
> @@ -583,14 +584,18 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
>  	ptype = vshrn_n_u32(ptype_info, 10);
>  	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
>  	ptype = vorr_u16(ptype, op_err);
> -	pkts[0]->packet_type =
> -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 6)];
> -	pkts[1]->packet_type =
> -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 4)];
> -	pkts[2]->packet_type =
> -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 2)];
> -	pkts[3]->packet_type =
> -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 0)];
> +	pt_idx0 = vget_lane_u8(vreinterpret_u8_u16(ptype), 6);
> +	pt_idx1 = vget_lane_u8(vreinterpret_u8_u16(ptype), 4);
> +	pt_idx2 = vget_lane_u8(vreinterpret_u8_u16(ptype), 2);
> +	pt_idx3 = vget_lane_u8(vreinterpret_u8_u16(ptype), 0);
> +	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
> +			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
> +	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
> +			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
> +	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
> +			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
> +	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
> +			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
>  	/* Fill flags for checksum and VLAN. */
>  	pinfo = vandq_u32(ptype_info, ptype_ol_mask);
>  	pinfo = vreinterpretq_u32_u8(
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> index c088bcb51..d2492481d 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> @@ -542,6 +542,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
>  	const __m128i mbuf_init =
>  		_mm_loadl_epi64((__m128i *)&rxq->mbuf_initializer);
>  	__m128i rearm0, rearm1, rearm2, rearm3;
> +	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
>  
>  	/* Extract pkt_info field. */
>  	pinfo0 = _mm_unpacklo_epi32(cqes[0], cqes[1]);
> @@ -595,10 +596,18 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
>  	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
>  	op_err = _mm_srli_epi16(op_err, 8);
>  	ptype = _mm_or_si128(ptype, op_err);
> -	pkts[0]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 0)];
> -	pkts[1]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 2)];
> -	pkts[2]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 4)];
> -	pkts[3]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 6)];
> +	pt_idx0 = _mm_extract_epi8(ptype, 0);
> +	pt_idx1 = _mm_extract_epi8(ptype, 2);
> +	pt_idx2 = _mm_extract_epi8(ptype, 4);
> +	pt_idx3 = _mm_extract_epi8(ptype, 6);
> +	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
> +			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
> +	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
> +			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
> +	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
> +			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
> +	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
> +			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
>  	/* Fill flags for checksum and VLAN. */
>  	pinfo = _mm_and_si128(pinfo, ptype_ol_mask);
>  	pinfo = _mm_shuffle_epi8(cv_flag_sel, pinfo);
> -- 
> 2.13.3

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-10 14:41   ` Nélio Laranjeiro
@ 2018-04-10 15:22     ` Xueming(Steven) Li
  2018-04-12  9:09       ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-10 15:22 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev

Hi Nelio,

> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Tuesday, April 10, 2018 10:42 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> 
> On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > Adjust the flow priority mapping to adapt to the new hardware support
> > for 16 Verbs flow priorities:
> > 0-3: RTE FLOW tunnel rule
> > 4-7: RTE FLOW non-tunnel rule
> > 8-15: PMD control flow
> 
> This commit log is misleading: the number of priorities depends on the
> Mellanox OFED installed, it is not yet available in the upstream Linux
> kernel nor in the current Mellanox OFED GA.
> 
> What happens when that number of priorities is not available, does it
> remove a functionality?  Will it collide with other flows?

If 16 priorities are not available, it simply behaves as with 8
priorities.

> 
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5.c         |  10 ++++
> >  drivers/net/mlx5/mlx5.h         |   8 +++
> >  drivers/net/mlx5/mlx5_flow.c    | 107 ++++++++++++++++++++++++++++++---
> -------
> >  drivers/net/mlx5/mlx5_trigger.c |   8 ---
> >  4 files changed, 100 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> > cfab55897..a1f2799e5 100644
> > --- a/drivers/net/mlx5/mlx5.c
> > +++ b/drivers/net/mlx5/mlx5.c
> > @@ -197,6 +197,7 @@ mlx5_dev_close(struct rte_eth_dev *dev)
> >  		priv->txqs_n = 0;
> >  		priv->txqs = NULL;
> >  	}
> > +	mlx5_flow_delete_drop_queue(dev);
> >
> >  	if (priv->pd != NULL) {
> >  		assert(priv->ctx != NULL);
> >  		claim_zero(mlx5_glue->dealloc_pd(priv->pd));
> > @@ -993,6 +994,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv
> __rte_unused,
> >  		mlx5_set_link_up(eth_dev);
> >  		/* Store device configuration on private structure. */
> >  		priv->config = config;
> > +		/* Create drop queue. */
> > +		err = mlx5_flow_create_drop_queue(eth_dev);
> > +		if (err) {
> > +			DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > +				eth_dev->data->port_id, strerror(rte_errno));
> > +			goto port_error;
> > +		}
> > +		/* Supported flow priority number detection. */
> > +		mlx5_flow_priorities_detect(eth_dev);
> >  		continue;
> >  port_error:
> >  		if (priv)
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > 63b24e6bb..708272f6d 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -89,6 +89,8 @@ struct mlx5_dev_config {
> >  	unsigned int rx_vec_en:1; /* Rx vector is enabled. */
> >  	unsigned int mpw_hdr_dseg:1; /* Enable DSEGs in the title WQEBB. */
> >  	unsigned int vf_nl_en:1; /* Enable Netlink requests in VF mode. */
> > +	unsigned int flow_priority_shift; /* Non-tunnel flow priority shift.
> */
> > +	unsigned int control_flow_priority; /* Control flow priority. */
> >  	unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */
> >  	unsigned int ind_table_max_size; /* Maximum indirection table size.
> */
> >  	int txq_inline; /* Maximum packet size for inlining. */ @@ -105,6
> > +107,11 @@ enum mlx5_verbs_alloc_type {
> >  	MLX5_VERBS_ALLOC_TYPE_RX_QUEUE,
> >  };
> >
> > +/* 8 Verbs priorities per flow. */
> > +#define MLX5_VERBS_FLOW_PRIO_8 8
> > +/* 4 Verbs priorities per flow. */
> > +#define MLX5_VERBS_FLOW_PRIO_4 4
> > +
> >  /**
> >   * Verbs allocator needs a context to know in the callback which kind
> of
> >   * resources it is allocating.
> > @@ -253,6 +260,7 @@ int mlx5_traffic_restart(struct rte_eth_dev *dev);
> >
> >  /* mlx5_flow.c */
> >
> > +void mlx5_flow_priorities_detect(struct rte_eth_dev *dev);
> >  int mlx5_flow_validate(struct rte_eth_dev *dev,
> >  		       const struct rte_flow_attr *attr,
> >  		       const struct rte_flow_item items[], diff --git
> > a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index
> > 288610620..394760418 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -32,9 +32,6 @@
> >  #include "mlx5_prm.h"
> >  #include "mlx5_glue.h"
> >
> > -/* Define minimal priority for control plane flows. */ -#define
> > MLX5_CTRL_FLOW_PRIORITY 4
> > -
> >  /* Internet Protocol versions. */
> >  #define MLX5_IPV4 4
> >  #define MLX5_IPV6 6
> > @@ -129,7 +126,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_SRC_PORT_TCP |
> >  				IBV_RX_HASH_DST_PORT_TCP),
> >  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_TCP,
> > -		.flow_priority = 1,
> > +		.flow_priority = 0,
> >  		.ip_version = MLX5_IPV4,
> >  	},
> >  	[HASH_RXQ_UDPV4] = {
> > @@ -138,7 +135,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_SRC_PORT_UDP |
> >  				IBV_RX_HASH_DST_PORT_UDP),
> >  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV4_UDP,
> > -		.flow_priority = 1,
> > +		.flow_priority = 0,
> >  		.ip_version = MLX5_IPV4,
> >  	},
> >  	[HASH_RXQ_IPV4] = {
> > @@ -146,7 +143,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_DST_IPV4),
> >  		.dpdk_rss_hf = (ETH_RSS_IPV4 |
> >  				ETH_RSS_FRAG_IPV4),
> > -		.flow_priority = 2,
> > +		.flow_priority = 1,
> >  		.ip_version = MLX5_IPV4,
> >  	},
> >  	[HASH_RXQ_TCPV6] = {
> > @@ -155,7 +152,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_SRC_PORT_TCP |
> >  				IBV_RX_HASH_DST_PORT_TCP),
> >  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_TCP,
> > -		.flow_priority = 1,
> > +		.flow_priority = 0,
> >  		.ip_version = MLX5_IPV6,
> >  	},
> >  	[HASH_RXQ_UDPV6] = {
> > @@ -164,7 +161,7 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_SRC_PORT_UDP |
> >  				IBV_RX_HASH_DST_PORT_UDP),
> >  		.dpdk_rss_hf = ETH_RSS_NONFRAG_IPV6_UDP,
> > -		.flow_priority = 1,
> > +		.flow_priority = 0,
> >  		.ip_version = MLX5_IPV6,
> >  	},
> >  	[HASH_RXQ_IPV6] = {
> > @@ -172,13 +169,13 @@ const struct hash_rxq_init hash_rxq_init[] = {
> >  				IBV_RX_HASH_DST_IPV6),
> >  		.dpdk_rss_hf = (ETH_RSS_IPV6 |
> >  				ETH_RSS_FRAG_IPV6),
> > -		.flow_priority = 2,
> > +		.flow_priority = 1,
> >  		.ip_version = MLX5_IPV6,
> >  	},
> >  	[HASH_RXQ_ETH] = {
> >  		.hash_fields = 0,
> >  		.dpdk_rss_hf = 0,
> > -		.flow_priority = 3,
> > +		.flow_priority = 2,
> >  	},
> >  };
> 
> If the number of priorities remains 8, you are removing the priority for
> the tunnel flows introduced by commit 749365717f5c ("net/mlx5: change
> tunnel flow priority")
> 
> Please keep this functionality when this patch fails to get the expected
> 16 Verbs priorities.

These priority shifts differ in the 16-priority scenario, so I changed
them into a calculation.  In mlx5_flow_priorities_detect(), the priority
shift will be 1 with 8 priorities and 4 with 16 priorities.  Please refer
to the changes in mlx5_flow_update_priority() as well.
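
Concretely, with attr->priority == 0 the mapping becomes:

    16 Verbs priorities (shift == 4):
        tunnel (inner) rules:  0 (L4 hash types), 1 (L3), 2 (ETH)
        non-tunnel rules:      4 (L4 hash types), 5 (L3), 6 (ETH)
    8 Verbs priorities (shift == 1):
        tunnel (inner) rules:  0, 1, 2
        non-tunnel rules:      1, 2, 3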

> 
> > @@ -536,6 +533,8 @@ mlx5_flow_item_validate(const struct rte_flow_item
> > *item,
> >  /**
> >   * Extract attribute to the parser.
> >   *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> >   * @param[in] attr
> >   *   Flow rule attributes.
> >   * @param[out] error
> > @@ -545,9 +544,12 @@ mlx5_flow_item_validate(const struct rte_flow_item
> *item,
> >   *   0 on success, a negative errno value otherwise and rte_errno is
> set.
> >   */
> >  static int
> > -mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
> > +mlx5_flow_convert_attributes(struct rte_eth_dev *dev,
> > +			     const struct rte_flow_attr *attr,
> >  			     struct rte_flow_error *error)  {
> > +	struct priv *priv = dev->data->dev_private;
> > +
> >  	if (attr->group) {
> >  		rte_flow_error_set(error, ENOTSUP,
> >  				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP, @@ -555,7
> +557,7 @@
> > mlx5_flow_convert_attributes(const struct rte_flow_attr *attr,
> >  				   "groups are not supported");
> >  		return -rte_errno;
> >  	}
> > -	if (attr->priority && attr->priority != MLX5_CTRL_FLOW_PRIORITY) {
> > +	if (attr->priority > priv->config.control_flow_priority) {
> >  		rte_flow_error_set(error, ENOTSUP,
> >  				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
> >  				   NULL,
> > @@ -900,30 +902,38 @@ mlx5_flow_convert_allocate(unsigned int size,
> struct rte_flow_error *error)
> >   * Make inner packet matching with an higher priority from the non
> Inner
> >   * matching.
> >   *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> >   * @param[in, out] parser
> >   *   Internal parser structure.
> >   * @param attr
> >   *   User flow attribute.
> >   */
> >  static void
> > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > +			  struct mlx5_flow_parse *parser,
> >  			  const struct rte_flow_attr *attr)  {
> > +	struct priv *priv = dev->data->dev_private;
> >  	unsigned int i;
> > +	uint16_t priority;
> >
> > +	if (priv->config.flow_priority_shift == 1)
> > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > +	else
> > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > +	if (!parser->inner)
> > +		priority += priv->config.flow_priority_shift;
> >  	if (parser->drop) {
> > -		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> > -			attr->priority +
> > -			hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > +		parser->queue[HASH_RXQ_ETH].ibv_attr->priority = priority +
> > +				hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> >  		return;
> >  	}
> >  	for (i = 0; i != hash_rxq_init_n; ++i) {
> > -		if (parser->queue[i].ibv_attr) {
> > -			parser->queue[i].ibv_attr->priority =
> > -				attr->priority +
> > -				hash_rxq_init[i].flow_priority -
> > -				(parser->inner ? 1 : 0);
> > -		}
> > +		if (!parser->queue[i].ibv_attr)
> > +			continue;
> > +		parser->queue[i].ibv_attr->priority = priority +
> > +				hash_rxq_init[i].flow_priority;
> >  	}
> >  }
> >
> > @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  		.layer = HASH_RXQ_ETH,
> >  		.mark_id = MLX5_FLOW_MARK_DEFAULT,
> >  	};
> > -	ret = mlx5_flow_convert_attributes(attr, error);
> > +	ret = mlx5_flow_convert_attributes(dev, attr, error);
> >  	if (ret)
> >  		return ret;
> >  	ret = mlx5_flow_convert_actions(dev, actions, error, parser); @@
> > -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  	 */
> >  	if (!parser->drop)
> >  		mlx5_flow_convert_finalise(parser);
> > -	mlx5_flow_update_priority(parser, attr);
> > +	mlx5_flow_update_priority(dev, parser, attr);
> >  exit_free:
> >  	/* Only verification is expected, all resources should be released.
> */
> >  	if (!parser->create) {
> > @@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
> >  	struct priv *priv = dev->data->dev_private;
> >  	const struct rte_flow_attr attr = {
> >  		.ingress = 1,
> > -		.priority = MLX5_CTRL_FLOW_PRIORITY,
> > +		.priority = priv->config.control_flow_priority,
> >  	};
> >  	struct rte_flow_item items[] = {
> >  		{
> > @@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
> >  	}
> >  	return 0;
> >  }
> > +
> > +/**
> > + * Detect number of Verbs flow priorities supported.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + */
> > +void
> > +mlx5_flow_priorities_detect(struct rte_eth_dev *dev) {
> > +	struct priv *priv = dev->data->dev_private;
> > +	uint32_t verb_priorities = MLX5_VERBS_FLOW_PRIO_8 * 2;
> > +	struct {
> > +		struct ibv_flow_attr attr;
> > +		struct ibv_flow_spec_eth eth;
> > +		struct ibv_flow_spec_action_drop drop;
> > +	} flow_attr = {
> > +		.attr = {
> > +			.num_of_specs = 2,
> > +			.priority = verb_priorities - 1,
> > +		},
> > +		.eth = {
> > +			.type = IBV_FLOW_SPEC_ETH,
> > +			.size = sizeof(struct ibv_flow_spec_eth),
> > +		},
> > +		.drop = {
> > +			.size = sizeof(struct ibv_flow_spec_action_drop),
> > +			.type = IBV_FLOW_SPEC_ACTION_DROP,
> > +		},
> > +	};
> > +	struct ibv_flow *flow;
> > +
> > +	if (priv->config.control_flow_priority)
> > +		return;
> > +	flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
> > +				      &flow_attr.attr);
> > +	if (flow) {
> > +		priv->config.flow_priority_shift = MLX5_VERBS_FLOW_PRIO_8 / 2;
> > +		claim_zero(mlx5_glue->destroy_flow(flow));
> > +	} else {
> > +		priv->config.flow_priority_shift = 1;
> > +		verb_priorities = verb_priorities / 2;
> > +	}
> > +	priv->config.control_flow_priority = 1;
> > +	DRV_LOG(INFO, "port %u Verbs flow priorities: %d",
> > +		dev->data->port_id, verb_priorities); }
> > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688 100644
> > --- a/drivers/net/mlx5/mlx5_trigger.c
> > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> >  	int ret;
> >
> >  	dev->data->dev_started = 1;
> > -	ret = mlx5_flow_create_drop_queue(dev);
> > -	if (ret) {
> > -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > -			dev->data->port_id, strerror(rte_errno));
> > -		goto error;
> > -	}
> >  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx queues",
> >  		dev->data->port_id);
> >  	rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > mlx5_dev_start(struct rte_eth_dev *dev)
> >  	mlx5_traffic_disable(dev);
> >  	mlx5_txq_stop(dev);
> >  	mlx5_rxq_stop(dev);
> > -	mlx5_flow_delete_drop_queue(dev);
> >  	rte_errno = ret; /* Restore rte_errno. */
> >  	return -rte_errno;
> >  }
> > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> >  	mlx5_rxq_stop(dev);
> >  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv->mr))
> >  		mlx5_mr_release(mr);
> > -	mlx5_flow_delete_drop_queue(dev);
> >  }
> >
> >  /**
> > --
> > 2.13.3
> 
> I have a few concerns on this: mlx5_pci_probe() will also probe any
> underlying Verbs device, and in the near future the representors
> associated with a VF.
> Making such a detection should only be done once by the PF, and I also
> wonder whether it is possible to create such a drop action in a
> representor directly using Verbs.

Then there should be some work to disable flows in representors; that
is supposed to cover this.

> 
> Another concern: this patch will be reverted at some point, when those
> 16 priorities are always available.  It will be easier to remove a
> single detection function than to search for all these modifications.
> 
> I would suggest having a standalone mlx5_flow_priorities_detect() which
> creates and deletes all the resources needed for this detection.

There is an upcoming feature to support more than 16 priorities, so the
auto-detection will be kept, IMHO.  Besides, a standalone function would
need a whole bundle of resource creation and removal; I'm not sure it is
worth duplicating that, please refer to mlx5_flow_create_drop_queue().


> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads
  2018-04-10 13:34 ` [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads Xueming Li
@ 2018-04-10 15:27   ` Nélio Laranjeiro
  2018-04-11  8:46     ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-10 15:27 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:05PM +0800, Xueming Li wrote:
> This patch supports tunnel inner checksum offloads.  By creating a
> tunnel flow, once the tunnel packet type (RTE_PTYPE_TUNNEL_xxx) is
> identified,

Where is the code creating the tunnel flow?

> PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent the checksum
> result of the inner headers; the outer L3 and L4 header checksums are
> always valid as soon as a tunnel is identified.  If no tunnel is
> identified, PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent the
> checksum result of the outer L3 and L4 headers.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow.c |  7 +++++--
>  drivers/net/mlx5/mlx5_rxq.c  |  2 --
>  drivers/net/mlx5/mlx5_rxtx.c | 18 ++++--------------
>  drivers/net/mlx5/mlx5_rxtx.h |  1 -
>  4 files changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 65d7a9b62..b3ad6dc85 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -829,6 +829,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
>  /**
>   * Validate items.
>   *
> + * @param dev
> + *   Pointer to Ethernet device.
>   * @param[in] items
>   *   Pattern specification (list terminated by the END pattern item).
>   * @param[out] error
> @@ -840,7 +842,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
>   *   0 on success, a negative errno value otherwise and rte_errno is set.
>   */
>  static int
> -mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
> +mlx5_flow_convert_items_validate(struct rte_eth_dev *dev __rte_unused,
> +				 const struct rte_flow_item items[],
>  				 struct rte_flow_error *error,
>  				 struct mlx5_flow_parse *parser)
>  {
> @@ -1146,7 +1149,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
>  	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
>  	if (ret)
>  		return ret;
> -	ret = mlx5_flow_convert_items_validate(items, error, parser);
> +	ret = mlx5_flow_convert_items_validate(dev, items, error, parser);
>  	if (ret)
>  		return ret;
>  	mlx5_flow_convert_finalise(parser);

I don't understand the necessity of the two hunks above.

> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 351acfc0f..073732e16 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -1045,8 +1045,6 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
>  	}
>  	/* Toggle RX checksum offload if hardware supports it. */
>  	tmpl->rxq.csum = !!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM);
> -	tmpl->rxq.csum_l2tun = (!!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM) &&
> -				priv->config.tunnel_en);
>  	tmpl->rxq.hw_timestamp = !!(conf->offloads & DEV_RX_OFFLOAD_TIMESTAMP);
>  	/* Configure VLAN stripping. */
>  	tmpl->rxq.vlan_strip = !!(conf->offloads & DEV_RX_OFFLOAD_VLAN_STRIP);
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index d061dfc8a..285b2dbf0 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -41,7 +41,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
>  		 uint16_t cqe_cnt, uint32_t *rss_hash);
>  
>  static __rte_always_inline uint32_t
> -rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe);
> +rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe);
>  
>  uint32_t mlx5_ptype_table[] __rte_cache_aligned = {
>  	[0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */
> @@ -1728,8 +1728,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
>  /**
>   * Translate RX completion flags to offload flags.
>   *
> - * @param[in] rxq
> - *   Pointer to RX queue structure.
>   * @param[in] cqe
>   *   Pointer to CQE.
>   *
> @@ -1737,7 +1735,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
>   *   Offload flags (ol_flags) for struct rte_mbuf.
>   */
>  static inline uint32_t
> -rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
> +rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe)
>  {
>  	uint32_t ol_flags = 0;
>  	uint16_t flags = rte_be_to_cpu_16(cqe->hdr_type_etc);
> @@ -1749,14 +1747,6 @@ rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe)
>  		TRANSPOSE(flags,
>  			  MLX5_CQE_RX_L4_HDR_VALID,
>  			  PKT_RX_L4_CKSUM_GOOD);
> -	if ((cqe->pkt_info & MLX5_CQE_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
> -		ol_flags |=
> -			TRANSPOSE(flags,
> -				  MLX5_CQE_RX_L3_HDR_VALID,
> -				  PKT_RX_IP_CKSUM_GOOD) |
> -			TRANSPOSE(flags,
> -				  MLX5_CQE_RX_L4_HDR_VALID,
> -				  PKT_RX_L4_CKSUM_GOOD);
>  	return ol_flags;
>  }
>  
> @@ -1855,8 +1845,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
>  						mlx5_flow_mark_get(mark);
>  				}
>  			}
> -			if (rxq->csum | rxq->csum_l2tun)
> -				pkt->ol_flags |= rxq_cq_to_ol_flags(rxq, cqe);
> +			if (rxq->csum)
> +				pkt->ol_flags |= rxq_cq_to_ol_flags(cqe);
>  			if (rxq->vlan_strip &&
>  			    (cqe->hdr_type_etc &
>  			     rte_cpu_to_be_16(MLX5_CQE_VLAN_STRIPPED))) {
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index 6866f6818..d35605b55 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -77,7 +77,6 @@ struct rxq_zip {
>  /* RX queue descriptor. */
>  struct mlx5_rxq_data {
>  	unsigned int csum:1; /* Enable checksum offloading. */
> -	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
>  	unsigned int hw_timestamp:1; /* Enable HW timestamp. */
>  	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
>  	unsigned int crc_present:1; /* CRC must be subtracted. */
> -- 
> 2.13.3

This last part seems to introduce a regression by removing the support
for the tunnel checksum offload.

It seems this patch is incomplete or incorrectly explained.

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 06/15] net/mlx5: split flow RSS handling logic
  2018-04-10 13:34 ` [PATCH v2 06/15] net/mlx5: split flow RSS handling logic Xueming Li
@ 2018-04-10 15:28   ` Nélio Laranjeiro
  0 siblings, 0 replies; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-10 15:28 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:06PM +0800, Xueming Li wrote:
> This patch splits out the flow RSS hash field handling logic to a
> dedicated function.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

> ---
>  drivers/net/mlx5/mlx5_flow.c | 94 +++++++++++++++++++++++++-------------------
>  1 file changed, 53 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index b3ad6dc85..64658bc0e 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -992,13 +992,6 @@ mlx5_flow_update_priority(struct rte_eth_dev *dev,
>  static void
>  mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
>  {
> -	const unsigned int ipv4 =
> -		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
> -	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 : HASH_RXQ_TCPV6;
> -	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> -	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 : HASH_RXQ_TCPV4;
> -	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 : HASH_RXQ_IPV4;
> -	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
>  	unsigned int i;
>  
>  	/* Remove any other flow not matching the pattern. */
> @@ -1011,40 +1004,6 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
>  		}
>  		return;
>  	}
> -	if (parser->layer == HASH_RXQ_ETH) {
> -		goto fill;
> -	} else {
> -		/*
> -		 * This layer becomes useless as the pattern define under
> -		 * layers.
> -		 */
> -		rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
> -		parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
> -	}
> -	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4. */
> -	for (i = ohmin; i != (ohmax + 1); ++i) {
> -		if (!parser->queue[i].ibv_attr)
> -			continue;
> -		rte_free(parser->queue[i].ibv_attr);
> -		parser->queue[i].ibv_attr = NULL;
> -	}
> -	/* Remove impossible flow according to the RSS configuration. */
> -	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
> -	    parser->rss_conf.types) {
> -		/* Remove any other flow. */
> -		for (i = hmin; i != (hmax + 1); ++i) {
> -			if ((i == parser->layer) ||
> -			     (!parser->queue[i].ibv_attr))
> -				continue;
> -			rte_free(parser->queue[i].ibv_attr);
> -			parser->queue[i].ibv_attr = NULL;
> -		}
> -	} else  if (!parser->queue[ip].ibv_attr) {
> -		/* no RSS possible with the current configuration. */
> -		parser->rss_conf.queue_num = 1;
> -		return;
> -	}
> -fill:
>  	/*
>  	 * Fill missing layers in verbs specifications, or compute the correct
>  	 * offset to allocate the memory space for the attributes and
> @@ -1107,6 +1066,56 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
>  }
>  
>  /**
> + * Update flows according to pattern and RSS hash fields.
> + *
> + * @param[in, out] parser
> + *   Internal parser structure.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +static int
> +mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
> +{
> +	const unsigned int ipv4 =
> +		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
> +	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 : HASH_RXQ_TCPV6;
> +	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> +	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 : HASH_RXQ_TCPV4;
> +	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 : HASH_RXQ_IPV4;
> +	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> +	unsigned int i;
> +
> +	if (parser->layer == HASH_RXQ_ETH)
> +		return 0;
> +	/* This layer becomes useless as the pattern define under layers. */
> +	rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
> +	parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
> +	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4. */
> +	for (i = ohmin; i != (ohmax + 1); ++i) {
> +		if (!parser->queue[i].ibv_attr)
> +			continue;
> +		rte_free(parser->queue[i].ibv_attr);
> +		parser->queue[i].ibv_attr = NULL;
> +	}
> +	/* Remove impossible flow according to the RSS configuration. */
> +	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
> +	    parser->rss_conf.types) {
> +		/* Remove any other flow. */
> +		for (i = hmin; i != (hmax + 1); ++i) {
> +			if (i == parser->layer || !parser->queue[i].ibv_attr)
> +				continue;
> +			rte_free(parser->queue[i].ibv_attr);
> +			parser->queue[i].ibv_attr = NULL;
> +		}
> +	} else if (!parser->queue[ip].ibv_attr) {
> +		/* no RSS possible with the current configuration. */
> +		parser->rss_conf.queue_num = 1;
> +	}
> +	return 0;
> +}
> +
> +/**
>   * Validate and convert a flow supported by the NIC.
>   *
>   * @param dev
> @@ -1214,6 +1223,9 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
>  	 * configuration.
>  	 */
>  	if (!parser->drop)
> +		ret = mlx5_flow_convert_rss(parser);
> +		if (ret)
> +			goto exit_free;
>  		mlx5_flow_convert_finalise(parser);
>  	mlx5_flow_update_priority(dev, parser, attr);
>  exit_free:
> -- 
> 2.13.3
> 

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-10 15:17   ` Nélio Laranjeiro
@ 2018-04-11  8:11     ` Xueming(Steven) Li
  2018-04-12  9:50       ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-11  8:11 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev

Hi Nelio,

> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Tuesday, April 10, 2018 11:17 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> identification
> 
> On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > This patch introduced tunnel type identification based on flow rules.
> > If flows of multiple tunnel types built on same queue,
> > RTE_PTYPE_TUNNEL_MASK will be returned, bits in flow mark could be
> > used as tunnel type identifier.
> 
> I don't see anywhere in this patch where the bits are reserved to identify
> a flow, nor values which can help to identify it.
> 
> Is this missing?
> 
> Anyway we already have very few bits in the mark, making it difficult for
> the user to use; reserving some again may lead to removing the mark
> support from the flows.

Not all users will use multiple tunnel types; reserving mark bits for this
is not included in this patch set and is left to the user's decision. I'll
update the comments to make this clear. An example is sketched below.
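
For example, a user who does need to tell tunnel types apart could dedicate
distinct mark values per tunnel type when creating the flows. A minimal
sketch (MARK_VXLAN/MARK_GRE are made-up IDs, the VXLAN/GRE pattern items
are omitted):

	enum { MARK_VXLAN = 1, MARK_GRE = 2 };
	uint16_t queues[] = { 0, 1 };
	struct rte_flow_action_rss rss = {
		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
		.types = ETH_RSS_IP,
		.queue_num = 2,
		.queue = queues,
	};
	struct rte_flow_action_mark mark = { .id = MARK_VXLAN };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
		{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

On Rx, when PKT_RX_FDIR_ID is set in mbuf->ol_flags, the application reads
mbuf->hash.fdir.hi to recover MARK_VXLAN or MARK_GRE, even when the queue
itself only reports RTE_PTYPE_TUNNEL_MASK.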

> 
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_flow.c          | 125
> +++++++++++++++++++++++++++++-----
> >  drivers/net/mlx5/mlx5_rxq.c           |  11 ++-
> >  drivers/net/mlx5/mlx5_rxtx.c          |  12 ++--
> >  drivers/net/mlx5/mlx5_rxtx.h          |   9 ++-
> >  drivers/net/mlx5/mlx5_rxtx_vec_neon.h |  21 +++---
> > drivers/net/mlx5/mlx5_rxtx_vec_sse.h  |  17 +++--
> >  6 files changed, 157 insertions(+), 38 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_flow.c
> > b/drivers/net/mlx5/mlx5_flow.c index 870d05250..65d7a9b62 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -222,6 +222,7 @@ struct rte_flow {
> >  	struct rte_flow_action_rss rss_conf; /**< RSS configuration */
> >  	uint16_t (*queues)[]; /**< Queues indexes to use. */
> >  	uint8_t rss_key[40]; /**< copy of the RSS key. */
> > +	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
> >  	struct ibv_counter_set *cs; /**< Holds the counters for the rule. */
> >  	struct mlx5_flow_counter_stats counter_stats;/**<The counter stats.
> */
> >  	struct mlx5_flow frxq[RTE_DIM(hash_rxq_init)]; @@ -238,6 +239,19 @@
> > struct rte_flow {
> >  	(type) == RTE_FLOW_ITEM_TYPE_VXLAN || \
> >  	(type) == RTE_FLOW_ITEM_TYPE_GRE)
> >
> > +const uint32_t flow_ptype[] = {
> > +	[RTE_FLOW_ITEM_TYPE_VXLAN] = RTE_PTYPE_TUNNEL_VXLAN,
> > +	[RTE_FLOW_ITEM_TYPE_GRE] = RTE_PTYPE_TUNNEL_GRE, };
> > +
> > +#define PTYPE_IDX(t) ((RTE_PTYPE_TUNNEL_MASK & (t)) >> 12)
> > +
> > +const uint32_t ptype_ext[] = {
> > +	[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
> > +					      RTE_PTYPE_L4_UDP,
> > +	[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)] = RTE_PTYPE_TUNNEL_GRE, };
> > +
> >  /** Structure to generate a simple graph of layers supported by the
> > NIC. */  struct mlx5_flow_items {
> >  	/** List of possible actions for these items. */ @@ -437,6 +451,7 @@
> > struct mlx5_flow_parse {
> >  	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queues indexes to use.
> */
> >  	uint8_t rss_key[40]; /**< copy of the RSS key. */
> >  	enum hash_rxq_type layer; /**< Last pattern layer detected. */
> > +	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
> >  	struct ibv_counter_set *cs; /**< Holds the counter set for the rule
> */
> >  	struct {
> >  		struct ibv_flow_attr *ibv_attr;
> > @@ -860,7 +875,7 @@ mlx5_flow_convert_items_validate(const struct
> rte_flow_item items[],
> >  		if (ret)
> >  			goto exit_item_not_supported;
> >  		if (IS_TUNNEL(items->type)) {
> > -			if (parser->inner) {
> > +			if (parser->tunnel) {
> >  				rte_flow_error_set(error, ENOTSUP,
> >  						   RTE_FLOW_ERROR_TYPE_ITEM,
> >  						   items,
> > @@ -869,6 +884,7 @@ mlx5_flow_convert_items_validate(const struct
> rte_flow_item items[],
> >  				return -rte_errno;
> >  			}
> >  			parser->inner = IBV_FLOW_SPEC_INNER;
> > +			parser->tunnel = flow_ptype[items->type];
> >  		}
> >  		if (parser->drop) {
> >  			parser->queue[HASH_RXQ_ETH].offset += cur_item->dst_sz;
> @@ -1165,6
> > +1181,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  	}
> >  	/* Third step. Conversion parse, fill the specifications. */
> >  	parser->inner = 0;
> > +	parser->tunnel = 0;
> >  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
> >  		struct mlx5_flow_data data = {
> >  			.parser = parser,
> > @@ -1633,6 +1650,7 @@ mlx5_flow_create_vxlan(const struct
> > rte_flow_item *item,
> >
> >  	id.vni[0] = 0;
> >  	parser->inner = IBV_FLOW_SPEC_INNER;
> > +	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)];
> >  	if (spec) {
> >  		if (!mask)
> >  			mask = default_mask;
> > @@ -1686,6 +1704,7 @@ mlx5_flow_create_gre(const struct rte_flow_item
> *item __rte_unused,
> >  	};
> >
> >  	parser->inner = IBV_FLOW_SPEC_INNER;
> > +	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
> >  	mlx5_flow_create_copy(parser, &tunnel, size);
> >  	return 0;
> >  }
> > @@ -1864,7 +1883,8 @@ mlx5_flow_create_action_queue_rss(struct
> rte_eth_dev *dev,
> >  				      parser->rss_conf.key_len,
> >  				      hash_fields,
> >  				      parser->rss_conf.queue,
> > -				      parser->rss_conf.queue_num);
> > +				      parser->rss_conf.queue_num,
> > +				      parser->tunnel);
> >  		if (flow->frxq[i].hrxq)
> >  			continue;
> >  		flow->frxq[i].hrxq =
> > @@ -1873,7 +1893,8 @@ mlx5_flow_create_action_queue_rss(struct
> rte_eth_dev *dev,
> >  				      parser->rss_conf.key_len,
> >  				      hash_fields,
> >  				      parser->rss_conf.queue,
> > -				      parser->rss_conf.queue_num);
> > +				      parser->rss_conf.queue_num,
> > +				      parser->tunnel);
> >  		if (!flow->frxq[i].hrxq) {
> >  			return rte_flow_error_set(error, ENOMEM,
> >  						  RTE_FLOW_ERROR_TYPE_HANDLE,
> > @@ -1885,6 +1906,40 @@ mlx5_flow_create_action_queue_rss(struct
> > rte_eth_dev *dev,  }
> >
> >  /**
> > + * RXQ update after flow rule creation.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + * @param flow
> > + *   Pointer to the flow rule.
> > + */
> > +static void
> > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow
> > +*flow) {
> > +	struct priv *priv = dev->data->dev_private;
> > +	unsigned int i;
> > +
> > +	if (!dev->data->dev_started)
> > +		return;
> > +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > +						 [(*flow->queues)[i]];
> > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > +
> > +		rxq_data->mark |= flow->mark;
> > +		if (!tunnel)
> > +			continue;
> > +		rxq_ctrl->tunnel_types[tunnel] += 1;
> 
> I don't understand why you need such an array, as the NIC is unable to
> return the tunnel type: it returns only one bit saying tunnel.
> Why not store the currently configured tunnel in the priv structure?

This array is used to count the tunnel types bound to a queue: if only one
tunnel type is bound, ptype will report that tunnel type; TUNNEL MASK (the
max value) will be returned if multiple types are bound to a queue.

A flow RSS action specifies the queues bound to a tunnel, so we can't
assume all queues have the same tunnel types; that is why this is a
per-queue structure. See the condensed sketch below.
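
Condensed from the hunk above, the per-queue accounting is:

	rxq_ctrl->tunnel_types[PTYPE_IDX(flow->tunnel)] += 1;
	if (rxq_data->tunnel != flow->tunnel)
		rxq_data->tunnel = rxq_data->tunnel ?
				   RTE_PTYPE_TUNNEL_MASK : /* Several types. */
				   flow->tunnel; /* Single type: report it. */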


> 
> > +		if (rxq_data->tunnel != flow->tunnel)
> > +			rxq_data->tunnel = rxq_data->tunnel ?
> > +					   RTE_PTYPE_TUNNEL_MASK :
> > +					   flow->tunnel;
> > +	}
> > +}
> > +
> > +/**
> >   * Complete flow rule creation.
> >   *
> >   * @param dev
> > @@ -1944,12 +1999,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev
> *dev,
> >  				   NULL, "internal error in flow creation");
> >  		goto error;
> >  	}
> > -	for (i = 0; i != parser->rss_conf.queue_num; ++i) {
> > -		struct mlx5_rxq_data *q =
> > -			(*priv->rxqs)[parser->rss_conf.queue[i]];
> > -
> > -		q->mark |= parser->mark;
> > -	}
> > +	mlx5_flow_create_update_rxqs(dev, flow);
> >  	return 0;
> >  error:
> >  	ret = rte_errno; /* Save rte_errno before cleanup. */ @@ -2022,6
> > +2072,7 @@ mlx5_flow_list_create(struct rte_eth_dev *dev,
> >  	}
> >  	/* Copy configuration. */
> >  	flow->queues = (uint16_t (*)[])(flow + 1);
> > +	flow->tunnel = parser.tunnel;
> >  	flow->rss_conf = (struct rte_flow_action_rss){
> >  		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
> >  		.level = 0,
> > @@ -2113,9 +2164,38 @@ mlx5_flow_list_destroy(struct rte_eth_dev *dev,
> struct mlx5_flows *list,
> >  	struct priv *priv = dev->data->dev_private;
> >  	unsigned int i;
> >
> > -	if (flow->drop || !flow->mark)
> > +	if (flow->drop || !dev->data->dev_started)
> >  		goto free;
> > -	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > +	for (i = 0; flow->tunnel && i != flow->rss_conf.queue_num; ++i) {
> > +		/* Update queue tunnel type. */
> > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > +						 [(*flow->queues)[i]];
> > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > +
> > +		RTE_ASSERT(rxq_ctrl->tunnel_types[tunnel] > 0);
> 
> use assert() not RTE_ASSERT() or make a patch to make such move in the
> whole PMD.
> 
> > +		rxq_ctrl->tunnel_types[tunnel] -= 1;
> > +		if (!rxq_ctrl->tunnel_types[tunnel]) {
> > +			/* Update tunnel type. */
> > +			uint8_t j;
> > +			uint8_t types = 0;
> > +			uint8_t last;
> > +
> > +			for (j = 0; j < RTE_DIM(rxq_ctrl->tunnel_types); j++)
> > +				if (rxq_ctrl->tunnel_types[j]) {
> > +					types += 1;
> > +					last = j;
> > +				}
> > +			/* Keep same if more than one tunnel types left. */
> > +			if (types == 1)
> > +				rxq_data->tunnel = ptype_ext[last];
> > +			else if (types == 0)
> > +				/* No tunnel type left. */
> > +				rxq_data->tunnel = 0;
> > +		}
> > +	}
> > +	for (i = 0; flow->mark && i != flow->rss_conf.queue_num; ++i) {
> >  		struct rte_flow *tmp;
> >  		int mark = 0;
> >
> > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> > mlx5_flows *list)  {
> >  	struct priv *priv = dev->data->dev_private;
> >  	struct rte_flow *flow;
> > +	unsigned int i;
> >
> >  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > -		unsigned int i;
> >  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
> >
> >  		if (flow->drop) {
> > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> mlx5_flows *list)
> >  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data->port_id,
> >  			(void *)flow);
> >  	}
> > +	/* Cleanup Rx queue tunnel info. */
> > +	for (i = 0; i != priv->rxqs_n; ++i) {
> > +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> > +
> > +		memset((void *)rxq_ctrl->tunnel_types, 0,
> > +		       sizeof(rxq_ctrl->tunnel_types));
> > +		q->tunnel = 0;
> > +	}
> >  }
> 
> This hunk does not handle the fact that the Rx queue array may have some
> holes, i.e. the application is allowed to ask for 10 queues and only
> initialise some.  In such a situation this code will segfault.

In other words, "q" could be NULL, correct? I'll add a check for this, as
sketched below. BTW, there should be an action item to add such a check in
RSS/queue flow creation.
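
The check could look like this (a sketch of the updated cleanup loop):

	for (i = 0; i != priv->rxqs_n; ++i) {
		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
		struct mlx5_rxq_ctrl *rxq_ctrl;

		if (!q) /* The Rx queue array may have holes. */
			continue;
		rxq_ctrl = container_of(q, struct mlx5_rxq_ctrl, rxq);
		memset((void *)rxq_ctrl->tunnel_types, 0,
		       sizeof(rxq_ctrl->tunnel_types));
		q->tunnel = 0;
	}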

> 
> It should only memset the Rx queues that are part of the flow, not the
> others.

Cleaning this up (decreasing the tunnel_types counter of each queue) per
flow would be time consuming. If an error happened, the counter would not
be cleared, and such stale state would impact the tunnel type after the
port is started again.

> 
> >  /**
> > @@ -2429,7 +2519,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct
> mlx5_flows *list)
> >  					      flow->rss_conf.key_len,
> >  					      hash_rxq_init[i].hash_fields,
> >  					      flow->rss_conf.queue,
> > -					      flow->rss_conf.queue_num);
> > +					      flow->rss_conf.queue_num,
> > +					      flow->tunnel);
> >  			if (flow->frxq[i].hrxq)
> >  				goto flow_create;
> >  			flow->frxq[i].hrxq =
> > @@ -2437,7 +2528,8 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct
> mlx5_flows *list)
> >  					      flow->rss_conf.key_len,
> >  					      hash_rxq_init[i].hash_fields,
> >  					      flow->rss_conf.queue,
> > -					      flow->rss_conf.queue_num);
> > +					      flow->rss_conf.queue_num,
> > +					      flow->tunnel);
> >  			if (!flow->frxq[i].hrxq) {
> >  				DRV_LOG(DEBUG,
> >  					"port %u flow %p cannot be applied", @@ -
> 2459,10 +2551,7 @@
> > mlx5_flow_start(struct rte_eth_dev *dev, struct mlx5_flows *list)
> >  			DRV_LOG(DEBUG, "port %u flow %p applied",
> >  				dev->data->port_id, (void *)flow);
> >  		}
> > -		if (!flow->mark)
> > -			continue;
> > -		for (i = 0; i != flow->rss_conf.queue_num; ++i)
> > -			(*priv->rxqs)[flow->rss_conf.queue[i]]->mark = 1;
> > +		mlx5_flow_create_update_rxqs(dev, flow);
> >  	}
> >  	return 0;
> >  }
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 1e4354ab3..351acfc0f 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> >   *   first queue index will be taken for the indirection table.
> >   * @param queues_n
> >   *   Number of queues.
> > + * @param tunnel
> > + *   Tunnel type.
> >   *
> >   * @return
> >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > @@ -1394,7 +1396,7 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > rte_eth_dev *dev,
> >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> >  	      uint64_t hash_fields,
> > -	      const uint16_t *queues, uint32_t queues_n)
> > +	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> >  {
> >  	struct priv *priv = dev->data->dev_private;
> >  	struct mlx5_hrxq *hrxq;
> > @@ -1438,6 +1440,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  	hrxq->qp = qp;
> >  	hrxq->rss_key_len = rss_key_len;
> >  	hrxq->hash_fields = hash_fields;
> > +	hrxq->tunnel = tunnel;
> >  	memcpy(hrxq->rss_key, rss_key, rss_key_len);
> >  	rte_atomic32_inc(&hrxq->refcnt);
> >  	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next); @@ -1466,6 +1469,8 @@
> > mlx5_hrxq_new(struct rte_eth_dev *dev,
> >   *   first queue index will be taken for the indirection table.
> >   * @param queues_n
> >   *   Number of queues.
> > + * @param tunnel
> > + *   Tunnel type.
> >   *
> >   * @return
> >   *   An hash Rx queue on success.
> > @@ -1474,7 +1479,7 @@ struct mlx5_hrxq *  mlx5_hrxq_get(struct
> > rte_eth_dev *dev,
> >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> >  	      uint64_t hash_fields,
> > -	      const uint16_t *queues, uint32_t queues_n)
> > +	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> >  {
> >  	struct priv *priv = dev->data->dev_private;
> >  	struct mlx5_hrxq *hrxq;
> > @@ -1489,6 +1494,8 @@ mlx5_hrxq_get(struct rte_eth_dev *dev,
> >  			continue;
> >  		if (hrxq->hash_fields != hash_fields)
> >  			continue;
> > +		if (hrxq->tunnel != tunnel)
> > +			continue;
> >  		ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
> >  		if (!ind_tbl)
> >  			continue;
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.c
> > b/drivers/net/mlx5/mlx5_rxtx.c index 1f422c70b..d061dfc8a 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > @@ -34,7 +34,7 @@
> >  #include "mlx5_prm.h"
> >
> >  static __rte_always_inline uint32_t
> > -rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe);
> > +rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct
> > +mlx5_cqe *cqe);
> >
> >  static __rte_always_inline int
> >  mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe
> > *cqe, @@ -125,12 +125,14 @@ mlx5_set_ptype_table(void)
> >  	(*p)[0x8a] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> >  		     RTE_PTYPE_L4_UDP;
> >  	/* Tunneled - L3 */
> > +	(*p)[0x40] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN;
> >  	(*p)[0x41] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L4_NONFRAG;
> >  	(*p)[0x42] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L4_NONFRAG;
> > +	(*p)[0xc0] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN;
> >  	(*p)[0xc1] = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN |
> >  		     RTE_PTYPE_INNER_L4_NONFRAG;
> > @@ -1577,6 +1579,8 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct
> > rte_mbuf **pkts, uint16_t pkts_n)
> >  /**
> >   * Translate RX completion flags to packet type.
> >   *
> > + * @param[in] rxq
> > + *   Pointer to RX queue structure.
> >   * @param[in] cqe
> >   *   Pointer to CQE.
> >   *
> > @@ -1586,7 +1590,7 @@ mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf
> **pkts, uint16_t pkts_n)
> >   *   Packet type for struct rte_mbuf.
> >   */
> >  static inline uint32_t
> > -rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
> > +rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct
> > +mlx5_cqe *cqe)
> >  {
> >  	uint8_t idx;
> >  	uint8_t pinfo = cqe->pkt_info;
> > @@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
> >  	 * bit[7] = outer_l3_type
> >  	 */
> >  	idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
> > -	return mlx5_ptype_table[idx];
> > +	return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));
> >  }
> >
> >  /**
> > @@ -1833,7 +1837,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf
> **pkts, uint16_t pkts_n)
> >  			pkt = seg;
> >  			assert(len >= (rxq->crc_present << 2));
> >  			/* Update packet information. */
> > -			pkt->packet_type = rxq_cq_to_pkt_type(cqe);
> > +			pkt->packet_type = rxq_cq_to_pkt_type(rxq, cqe);
> >  			pkt->ol_flags = 0;
> >  			if (rss_hash_res && rxq->rss_hash) {
> >  				pkt->hash.rss = rss_hash_res;
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h
> > b/drivers/net/mlx5/mlx5_rxtx.h index a702cb603..6866f6818 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -104,6 +104,7 @@ struct mlx5_rxq_data {
> >  	void *cq_uar; /* CQ user access region. */
> >  	uint32_t cqn; /* CQ number. */
> >  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> > +	uint32_t tunnel; /* Tunnel information. */
> >  } __rte_cache_aligned;
> >
> >  /* Verbs Rx queue elements. */
> > @@ -125,6 +126,7 @@ struct mlx5_rxq_ctrl {
> >  	struct mlx5_rxq_ibv *ibv; /* Verbs elements. */
> >  	struct mlx5_rxq_data rxq; /* Data path structure. */
> >  	unsigned int socket; /* CPU socket ID for allocations. */
> > +	uint32_t tunnel_types[16]; /* Tunnel type counter. */
> >  	unsigned int irq:1; /* Whether IRQ is enabled. */
> >  	uint16_t idx; /* Queue index. */
> >  };
> > @@ -145,6 +147,7 @@ struct mlx5_hrxq {
> >  	struct mlx5_ind_table_ibv *ind_table; /* Indirection table. */
> >  	struct ibv_qp *qp; /* Verbs queue pair. */
> >  	uint64_t hash_fields; /* Verbs Hash fields. */
> > +	uint32_t tunnel; /* Tunnel type. */
> >  	uint32_t rss_key_len; /* Hash key length in bytes. */
> >  	uint8_t rss_key[]; /* Hash key. */
> >  };
> > @@ -248,11 +251,13 @@ int mlx5_ind_table_ibv_verify(struct rte_eth_dev
> > *dev);  struct mlx5_hrxq *mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  				const uint8_t *rss_key, uint32_t rss_key_len,
> >  				uint64_t hash_fields,
> > -				const uint16_t *queues, uint32_t queues_n);
> > +				const uint16_t *queues, uint32_t queues_n,
> > +				uint32_t tunnel);
> >  struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
> >  				const uint8_t *rss_key, uint32_t rss_key_len,
> >  				uint64_t hash_fields,
> > -				const uint16_t *queues, uint32_t queues_n);
> > +				const uint16_t *queues, uint32_t queues_n,
> > +				uint32_t tunnel);
> >  int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq
> > *hxrq);  int mlx5_hrxq_ibv_verify(struct rte_eth_dev *dev);  uint64_t
> > mlx5_get_rx_port_offloads(void); diff --git
> > a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > index bbe1818ef..9f9136108 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > @@ -551,6 +551,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
> >  	const uint64x1_t mbuf_init = vld1_u64(&rxq->mbuf_initializer);
> >  	const uint64x1_t r32_mask = vcreate_u64(0xffffffff);
> >  	uint64x2_t rearm0, rearm1, rearm2, rearm3;
> > +	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
> >
> >  	if (rxq->mark) {
> >  		const uint32x4_t ft_def = vdupq_n_u32(MLX5_FLOW_MARK_DEFAULT);
> > @@ -583,14 +584,18 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
> >  	ptype = vshrn_n_u32(ptype_info, 10);
> >  	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
> >  	ptype = vorr_u16(ptype, op_err);
> > -	pkts[0]->packet_type =
> > -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 6)];
> > -	pkts[1]->packet_type =
> > -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 4)];
> > -	pkts[2]->packet_type =
> > -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 2)];
> > -	pkts[3]->packet_type =
> > -		mlx5_ptype_table[vget_lane_u8(vreinterpret_u8_u16(ptype), 0)];
> > +	pt_idx0 = vget_lane_u8(vreinterpret_u8_u16(ptype), 6);
> > +	pt_idx1 = vget_lane_u8(vreinterpret_u8_u16(ptype), 4);
> > +	pt_idx2 = vget_lane_u8(vreinterpret_u8_u16(ptype), 2);
> > +	pt_idx3 = vget_lane_u8(vreinterpret_u8_u16(ptype), 0);
> > +	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
> > +			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
> > +	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
> > +			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
> > +	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
> > +			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
> > +	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
> > +			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
> >  	/* Fill flags for checksum and VLAN. */
> >  	pinfo = vandq_u32(ptype_info, ptype_ol_mask);
> >  	pinfo = vreinterpretq_u32_u8(
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> > b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> > index c088bcb51..d2492481d 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
> > @@ -542,6 +542,7 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq,
> __m128i cqes[4],
> >  	const __m128i mbuf_init =
> >  		_mm_loadl_epi64((__m128i *)&rxq->mbuf_initializer);
> >  	__m128i rearm0, rearm1, rearm2, rearm3;
> > +	uint8_t pt_idx0, pt_idx1, pt_idx2, pt_idx3;
> >
> >  	/* Extract pkt_info field. */
> >  	pinfo0 = _mm_unpacklo_epi32(cqes[0], cqes[1]); @@ -595,10 +596,18 @@
> > rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i cqes[4],
> >  	/* Errored packets will have RTE_PTYPE_ALL_MASK. */
> >  	op_err = _mm_srli_epi16(op_err, 8);
> >  	ptype = _mm_or_si128(ptype, op_err);
> > -	pkts[0]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 0)];
> > -	pkts[1]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 2)];
> > -	pkts[2]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 4)];
> > -	pkts[3]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 6)];
> > +	pt_idx0 = _mm_extract_epi8(ptype, 0);
> > +	pt_idx1 = _mm_extract_epi8(ptype, 2);
> > +	pt_idx2 = _mm_extract_epi8(ptype, 4);
> > +	pt_idx3 = _mm_extract_epi8(ptype, 6);
> > +	pkts[0]->packet_type = mlx5_ptype_table[pt_idx0] |
> > +			       !!(pt_idx0 & (1 << 6)) * rxq->tunnel;
> > +	pkts[1]->packet_type = mlx5_ptype_table[pt_idx1] |
> > +			       !!(pt_idx1 & (1 << 6)) * rxq->tunnel;
> > +	pkts[2]->packet_type = mlx5_ptype_table[pt_idx2] |
> > +			       !!(pt_idx2 & (1 << 6)) * rxq->tunnel;
> > +	pkts[3]->packet_type = mlx5_ptype_table[pt_idx3] |
> > +			       !!(pt_idx3 & (1 << 6)) * rxq->tunnel;
> >  	/* Fill flags for checksum and VLAN. */
> >  	pinfo = _mm_and_si128(pinfo, ptype_ol_mask);
> >  	pinfo = _mm_shuffle_epi8(cv_flag_sel, pinfo);
> > --
> > 2.13.3
> 
> Regards,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads
  2018-04-10 15:27   ` Nélio Laranjeiro
@ 2018-04-11  8:46     ` Xueming(Steven) Li
  0 siblings, 0 replies; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-11  8:46 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev

Hi Nelio,

> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Tuesday, April 10, 2018 11:28 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 05/15] net/mlx5: support tunnel inner checksum
> offloads
> 
> On Tue, Apr 10, 2018 at 09:34:05PM +0800, Xueming Li wrote:
> > This patch support tunnel inner checksum offloads. By creating tunnel
> > flow, once tunnel packet type(RTE_PTYPE_TUNNEL_xxx) identified,
> 
> Where is the code creating the tunnel flow?

Wording issue, I'll remove "By creating tunnel flow". Also, this patch is
actually a cleanup of the tunnel checksum code; I'll update the commit log.

> 
> > PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent checksum result
> > of inner headers, outer L3 and L4 header checksum are always valid as
> > soon as tunnel identified. If no tunnel identified,
> > PKT_RX_IP_CKSUM_XXX and PKT_RX_L4_CKSUM_XXX represent checksum result
> > of outer L3 and L4 headers.
> >
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_flow.c |  7 +++++--
> > drivers/net/mlx5/mlx5_rxq.c  |  2 --  drivers/net/mlx5/mlx5_rxtx.c |
> > 18 ++++--------------  drivers/net/mlx5/mlx5_rxtx.h |  1 -
> >  4 files changed, 9 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_flow.c
> > b/drivers/net/mlx5/mlx5_flow.c index 65d7a9b62..b3ad6dc85 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -829,6 +829,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >  /**
> >   * Validate items.
> >   *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> >   * @param[in] items
> >   *   Pattern specification (list terminated by the END pattern item).
> >   * @param[out] error
> > @@ -840,7 +842,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >   *   0 on success, a negative errno value otherwise and rte_errno is
> set.
> >   */
> >  static int
> > -mlx5_flow_convert_items_validate(const struct rte_flow_item items[],
> > +mlx5_flow_convert_items_validate(struct rte_eth_dev *dev __rte_unused,
> > +				 const struct rte_flow_item items[],
> >  				 struct rte_flow_error *error,
> >  				 struct mlx5_flow_parse *parser)
> >  {
> > @@ -1146,7 +1149,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
> >  	if (ret)
> >  		return ret;
> > -	ret = mlx5_flow_convert_items_validate(items, error, parser);
> > +	ret = mlx5_flow_convert_items_validate(dev, items, error, parser);
> >  	if (ret)
> >  		return ret;
> >  	mlx5_flow_convert_finalise(parser);
> 
> I don't understand the necessity of the two hunks above.
> 
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 351acfc0f..073732e16 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -1045,8 +1045,6 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t idx,
> uint16_t desc,
> >  	}
> >  	/* Toggle RX checksum offload if hardware supports it. */
> >  	tmpl->rxq.csum = !!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM);
> > -	tmpl->rxq.csum_l2tun = (!!(conf->offloads & DEV_RX_OFFLOAD_CHECKSUM)
> &&
> > -				priv->config.tunnel_en);
> >  	tmpl->rxq.hw_timestamp = !!(conf->offloads &
> DEV_RX_OFFLOAD_TIMESTAMP);
> >  	/* Configure VLAN stripping. */
> >  	tmpl->rxq.vlan_strip = !!(conf->offloads &
> > DEV_RX_OFFLOAD_VLAN_STRIP); diff --git a/drivers/net/mlx5/mlx5_rxtx.c
> > b/drivers/net/mlx5/mlx5_rxtx.c index d061dfc8a..285b2dbf0 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > @@ -41,7 +41,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile
> struct mlx5_cqe *cqe,
> >  		 uint16_t cqe_cnt, uint32_t *rss_hash);
> >
> >  static __rte_always_inline uint32_t
> > -rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct
> > mlx5_cqe *cqe);
> > +rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe);
> >
> >  uint32_t mlx5_ptype_table[] __rte_cache_aligned = {
> >  	[0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */ @@
> > -1728,8 +1728,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq,
> > volatile struct mlx5_cqe *cqe,
> >  /**
> >   * Translate RX completion flags to offload flags.
> >   *
> > - * @param[in] rxq
> > - *   Pointer to RX queue structure.
> >   * @param[in] cqe
> >   *   Pointer to CQE.
> >   *
> > @@ -1737,7 +1735,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq,
> volatile struct mlx5_cqe *cqe,
> >   *   Offload flags (ol_flags) for struct rte_mbuf.
> >   */
> >  static inline uint32_t
> > -rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq, volatile struct
> > mlx5_cqe *cqe)
> > +rxq_cq_to_ol_flags(volatile struct mlx5_cqe *cqe)
> >  {
> >  	uint32_t ol_flags = 0;
> >  	uint16_t flags = rte_be_to_cpu_16(cqe->hdr_type_etc);
> > @@ -1749,14 +1747,6 @@ rxq_cq_to_ol_flags(struct mlx5_rxq_data *rxq,
> volatile struct mlx5_cqe *cqe)
> >  		TRANSPOSE(flags,
> >  			  MLX5_CQE_RX_L4_HDR_VALID,
> >  			  PKT_RX_L4_CKSUM_GOOD);
> > -	if ((cqe->pkt_info & MLX5_CQE_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
> > -		ol_flags |=
> > -			TRANSPOSE(flags,
> > -				  MLX5_CQE_RX_L3_HDR_VALID,
> > -				  PKT_RX_IP_CKSUM_GOOD) |
> > -			TRANSPOSE(flags,
> > -				  MLX5_CQE_RX_L4_HDR_VALID,
> > -				  PKT_RX_L4_CKSUM_GOOD);
> >  	return ol_flags;
> >  }
> >
> > @@ -1855,8 +1845,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf
> **pkts, uint16_t pkts_n)
> >  						mlx5_flow_mark_get(mark);
> >  				}
> >  			}
> > -			if (rxq->csum | rxq->csum_l2tun)
> > -				pkt->ol_flags |= rxq_cq_to_ol_flags(rxq, cqe);
> > +			if (rxq->csum)
> > +				pkt->ol_flags |= rxq_cq_to_ol_flags(cqe);
> >  			if (rxq->vlan_strip &&
> >  			    (cqe->hdr_type_etc &
> >  			     rte_cpu_to_be_16(MLX5_CQE_VLAN_STRIPPED))) { diff -
> -git
> > a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index
> > 6866f6818..d35605b55 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -77,7 +77,6 @@ struct rxq_zip {
> >  /* RX queue descriptor. */
> >  struct mlx5_rxq_data {
> >  	unsigned int csum:1; /* Enable checksum offloading. */
> > -	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
> >  	unsigned int hw_timestamp:1; /* Enable HW timestamp. */
> >  	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
> >  	unsigned int crc_present:1; /* CRC must be subtracted. */
> > --
> > 2.13.3
> 
> This last part seems to introduce a regression by removing the support for
> the tunnel checksum offload.

csum_l2tun is unused: checksum offload for tunnel and normal packets can't
be enabled separately, so only the csum field is kept. The code using
csum_l2tun was duplicated code, which is why I removed it from
rxq_cq_to_ol_flags().

> 
> It seems this patch is incomplete or incorrectly explained.
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern
  2018-04-10 13:34 ` [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern Xueming Li
@ 2018-04-11 12:25   ` Nélio Laranjeiro
  0 siblings, 0 replies; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-11 12:25 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:10PM +0800, Xueming Li wrote:
> A tunnel pattern w/o a tunnel id could match any non-tunneled packet;
> this patch allows such a pattern only after a proper outer spec.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

> ---
>  drivers/net/mlx5/mlx5_flow.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 857b8b716..58d437308 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -1803,7 +1803,8 @@ mlx5_flow_create_vxlan(const struct rte_flow_item *item,
>  	 * before will also match this rule.
>  	 * To avoid such situation, VNI 0 is currently refused.
>  	 */
> -	if (!vxlan.val.tunnel_id)
> +	/* Only allow tunnel w/o tunnel id pattern after proper outer spec. */
> +	if (parser->out_layer == HASH_RXQ_ETH && !vxlan.val.tunnel_id)
>  		return rte_flow_error_set(data->error, EINVAL,
>  					  RTE_FLOW_ERROR_TYPE_ITEM,
>  					  item,
> -- 
> 2.13.3
> 

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading
  2018-04-10 13:34 ` [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading Xueming Li
@ 2018-04-11 12:32   ` Nélio Laranjeiro
  2018-04-11 12:43     ` Thomas Monjalon
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-11 12:32 UTC (permalink / raw)
  To: Xueming Li, Thomas Monjalon; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:12PM +0800, Xueming Li wrote:
> Remove tunnel limitations, add new hardware tunnel offload features.
> 
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  doc/guides/nics/mlx5.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index b1bab2ce2..c256f85f3 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -100,12 +100,12 @@ Features
>  - RX interrupts.
>  - Statistics query including Basic, Extended and per queue.
>  - Rx HW timestamp.
> +- Tunnel types: VXLAN, L3 VXLAN, VXLAN-GPE, GRE, MPLS-in-GRE, MPLS-in-UDP.
> +- Tunnel HW offloads: packet type, inner/outer RSS, IP and UDP checksum verification.
>  
>  Limitations
>  -----------
>  
> -- Inner RSS for VXLAN frames is not supported yet.
> -- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
>  - For secondary process:
>  
>    - Forked secondary process not supported.
> -- 
> 2.13.3

Inner RSS may deserve its own entry in the features docs [1][2][3],

Thomas what do you think?

Regards,

[1] https://dpdk.org/doc/guides/nics/overview.html
[2] https://dpdk.org/browse/dpdk/tree/doc/guides/nics/features/default.ini
[3] https://dpdk.org/browse/dpdk/tree/doc/guides/nics/features/mlx5.ini

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count
  2018-04-10 13:34 ` [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count Xueming Li
@ 2018-04-11 12:37   ` Nélio Laranjeiro
  2018-04-11 13:01     ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-11 12:37 UTC (permalink / raw)
  To: Xueming Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 09:34:13PM +0800, Xueming Li wrote:
> In some environments it is desirable to have the NIC perform RSS
> normally on the packet regardless of the number of queues configured.
> The RSS hash result that is stored in the mbuf can then be used by
> the application to make decisions about how to distribute workloads
> to threads, secondary processes, or even virtual machines if the
> application is a virtual switch.
>
> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow.c | 71 +++++++++++++++++++-------------------------
>  1 file changed, 30 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 5784f2ee0..9efe00086 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -1252,48 +1252,37 @@ mlx5_flow_convert_rss(struct rte_eth_dev *dev, struct mlx5_flow_parse *parser)
>  			parser->queue[i].ibv_attr = NULL;
>  		}
>  	}
> -	if (parser->rss_conf.types) {
> -		/* Remove impossible flow according to the RSS configuration. */
> -		for (i = hmin; i != (hmax + 1); ++i) {
> -			if (!parser->queue[i].ibv_attr)
> -				continue;
> -			if (parser->rss_conf.types &
> -			    hash_rxq_init[i].dpdk_rss_hf) {
> -				parser->queue[i].hash_fields =
> -					hash_rxq_init[i].hash_fields;
> -				found = 1;
> -				continue;
> -			}
> -			/* L4 flow could be used for L3 RSS. */
> -			if (i == parser->layer && i < ip &&
> -			    (hash_rxq_init[ip].dpdk_rss_hf &
> -			     parser->rss_conf.types)) {
> -				parser->queue[i].hash_fields =
> -					hash_rxq_init[ip].hash_fields;
> -				found = 1;
> -				continue;
> -			}
> -			/* L3 flow and L4 hash: non-rss L3 flow. */
> -			if (i == parser->layer && i == ip && found)
> -				/* IP pattern and L4 HF. */
> -				continue;
> -			rte_free(parser->queue[i].ibv_attr);
> -			parser->queue[i].ibv_attr = NULL;
> +	/* Remove impossible flow according to the RSS configuration. */
> +	for (i = hmin; i != (hmax + 1); ++i) {
> +		if (!parser->queue[i].ibv_attr)
> +			continue;
> +		if (parser->rss_conf.types &
> +		    hash_rxq_init[i].dpdk_rss_hf) {
> +			parser->queue[i].hash_fields =
> +				hash_rxq_init[i].hash_fields;
> +			found = 1;
> +			continue;
>  		}
> -		if (!found)
> -			DRV_LOG(WARNING,
> -				"port %u rss hash function doesn't match "
> -				"pattern", dev->data->port_id);
> -	} else {
> -		/* Remove any other flow. */
> -		for (i = hmin; i != (hmax + 1); ++i) {
> -			if (i == parser->layer || !parser->queue[i].ibv_attr)
> -				continue;
> -			rte_free(parser->queue[i].ibv_attr);
> -			parser->queue[i].ibv_attr = NULL;
> +		/* L4 flow could be used for L3 RSS. */
> +		if (i == parser->layer && i < ip &&
> +		    (hash_rxq_init[ip].dpdk_rss_hf &
> +		     parser->rss_conf.types)) {
> +			parser->queue[i].hash_fields =
> +				hash_rxq_init[ip].hash_fields;
> +			found = 1;
> +			continue;
>  		}
> -		parser->rss_conf.queue_num = 1;
> +		/* L3 flow and L4 hash: non-rss L3 flow. */
> +		if (i == parser->layer && i == ip && found)
> +			/* IP pattern and L4 HF. */
> +			continue;
> +		rte_free(parser->queue[i].ibv_attr);
> +		parser->queue[i].ibv_attr = NULL;
>  	}
> +	if (!found)
> +		DRV_LOG(WARNING,
> +			"port %u rss hash function doesn't match "
> +			"pattern", dev->data->port_id);
>  	return 0;
>  }
>  
> @@ -2326,8 +2315,8 @@ mlx5_flow_dump(struct rte_eth_dev *dev __rte_unused,
>  		(void *)flow->frxq[i].hrxq->ind_table,
>  		flow->frxq[i].hash_fields |
>  		(flow->tunnel &&
> -		 flow->rss_conf.rss_level ? (uint32_t)IBV_RX_HASH_INNER : 0),
> -		flow->queues_n,
> +		 flow->rss_conf.level ? (uint32_t)IBV_RX_HASH_INNER : 0),
> +		flow->rss_conf.queue_num,
>  		flow->frxq[i].ibv_attr->num_of_specs,
>  		flow->frxq[i].ibv_attr->size,
>  		flow->frxq[i].ibv_attr->priority,
> -- 
> 2.13.3

It seems this code should be part of
"[PATCH v2 07/15] net/mlx5: support tunnel RSS level", as it re-works
the code added there; in addition, this feature is already present in the
tree for non-tunnel packets.

Any reason why it was not merged into the previous commit?

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading
  2018-04-11 12:32   ` Nélio Laranjeiro
@ 2018-04-11 12:43     ` Thomas Monjalon
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Monjalon @ 2018-04-11 12:43 UTC (permalink / raw)
  To: Nélio Laranjeiro, ferruh.yigit; +Cc: Xueming Li, Shahaf Shuler, dev

11/04/2018 14:32, Nélio Laranjeiro:
> On Tue, Apr 10, 2018 at 09:34:12PM +0800, Xueming Li wrote:
> > -- Inner RSS for VXLAN frames is not supported yet.
> > -- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
> >  - For secondary process:
> 
> Inner RSS may deserve its own entry in the features docs [1][2][3],
> 
> Thomas what do you think?

Yes, it looks reasonable to add inner RSS to the features list.
Ferruh, do you agree?
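
For instance, the new entry could look like the following sketch (the exact
feature name and the surrounding .ini entries are assumptions):

	--- a/doc/guides/nics/features/default.ini
	+++ b/doc/guides/nics/features/default.ini
	 RSS hash             = Y
	+Inner RSS            = Y
	 RSS key update       = Y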


> [1] https://dpdk.org/doc/guides/nics/overview.html
> [2] https://dpdk.org/browse/dpdk/tree/doc/guides/nics/features/default.ini
> [3] https://dpdk.org/browse/dpdk/tree/doc/guides/nics/features/mlx5.ini

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count
  2018-04-11 12:37   ` Nélio Laranjeiro
@ 2018-04-11 13:01     ` Xueming(Steven) Li
  0 siblings, 0 replies; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-11 13:01 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev

Hi Nelio,

> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Wednesday, April 11, 2018 8:37 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue
> count
> 
> On Tue, Apr 10, 2018 at 09:34:13PM +0800, Xueming Li wrote:
> > In some environments it is desirable to have the NIC perform RSS
> > normally on the packet regardless of the number of queues configured.
> > The RSS hash result that is stored in the mbuf can then be used by the
> > application to make decisions about how to distribute workloads to
> > threads, secondary processes, or even virtual machines if the
> > application is a virtual switch.
> >
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_flow.c | 71
> > +++++++++++++++++++-------------------------
> >  1 file changed, 30 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_flow.c
> > b/drivers/net/mlx5/mlx5_flow.c index 5784f2ee0..9efe00086 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -1252,48 +1252,37 @@ mlx5_flow_convert_rss(struct rte_eth_dev *dev,
> struct mlx5_flow_parse *parser)
> >  			parser->queue[i].ibv_attr = NULL;
> >  		}
> >  	}
> > -	if (parser->rss_conf.types) {
> > -		/* Remove impossible flow according to the RSS configuration.
> */
> > -		for (i = hmin; i != (hmax + 1); ++i) {
> > -			if (!parser->queue[i].ibv_attr)
> > -				continue;
> > -			if (parser->rss_conf.types &
> > -			    hash_rxq_init[i].dpdk_rss_hf) {
> > -				parser->queue[i].hash_fields =
> > -					hash_rxq_init[i].hash_fields;
> > -				found = 1;
> > -				continue;
> > -			}
> > -			/* L4 flow could be used for L3 RSS. */
> > -			if (i == parser->layer && i < ip &&
> > -			    (hash_rxq_init[ip].dpdk_rss_hf &
> > -			     parser->rss_conf.types)) {
> > -				parser->queue[i].hash_fields =
> > -					hash_rxq_init[ip].hash_fields;
> > -				found = 1;
> > -				continue;
> > -			}
> > -			/* L3 flow and L4 hash: non-rss L3 flow. */
> > -			if (i == parser->layer && i == ip && found)
> > -				/* IP pattern and L4 HF. */
> > -				continue;
> > -			rte_free(parser->queue[i].ibv_attr);
> > -			parser->queue[i].ibv_attr = NULL;
> > +	/* Remove impossible flow according to the RSS configuration. */
> > +	for (i = hmin; i != (hmax + 1); ++i) {
> > +		if (!parser->queue[i].ibv_attr)
> > +			continue;
> > +		if (parser->rss_conf.types &
> > +		    hash_rxq_init[i].dpdk_rss_hf) {
> > +			parser->queue[i].hash_fields =
> > +				hash_rxq_init[i].hash_fields;
> > +			found = 1;
> > +			continue;
> >  		}
> > -		if (!found)
> > -			DRV_LOG(WARNING,
> > -				"port %u rss hash function doesn't match "
> > -				"pattern", dev->data->port_id);
> > -	} else {
> > -		/* Remove any other flow. */
> > -		for (i = hmin; i != (hmax + 1); ++i) {
> > -			if (i == parser->layer || !parser->queue[i].ibv_attr)
> > -				continue;
> > -			rte_free(parser->queue[i].ibv_attr);
> > -			parser->queue[i].ibv_attr = NULL;
> > +		/* L4 flow could be used for L3 RSS. */
> > +		if (i == parser->layer && i < ip &&
> > +		    (hash_rxq_init[ip].dpdk_rss_hf &
> > +		     parser->rss_conf.types)) {
> > +			parser->queue[i].hash_fields =
> > +				hash_rxq_init[ip].hash_fields;
> > +			found = 1;
> > +			continue;
> >  		}
> > -		parser->rss_conf.queue_num = 1;
> > +		/* L3 flow and L4 hash: non-rss L3 flow. */
> > +		if (i == parser->layer && i == ip && found)
> > +			/* IP pattern and L4 HF. */
> > +			continue;
> > +		rte_free(parser->queue[i].ibv_attr);
> > +		parser->queue[i].ibv_attr = NULL;
> >  	}
> > +	if (!found)
> > +		DRV_LOG(WARNING,
> > +			"port %u rss hash function doesn't match "
> > +			"pattern", dev->data->port_id);
> >  	return 0;
> >  }
> >
> > @@ -2326,8 +2315,8 @@ mlx5_flow_dump(struct rte_eth_dev *dev
> __rte_unused,
> >  		(void *)flow->frxq[i].hrxq->ind_table,
> >  		flow->frxq[i].hash_fields |
> >  		(flow->tunnel &&
> > -		 flow->rss_conf.rss_level ? (uint32_t)IBV_RX_HASH_INNER : 0),
> > -		flow->queues_n,
> > +		 flow->rss_conf.level ? (uint32_t)IBV_RX_HASH_INNER : 0),
> > +		flow->rss_conf.queue_num,
> >  		flow->frxq[i].ibv_attr->num_of_specs,
> >  		flow->frxq[i].ibv_attr->size,
> >  		flow->frxq[i].ibv_attr->priority,
> > --
> > 2.13.3
> 
> It seems this code should be part of
> "[PATCH v2 07/15] net/mlx5: support tunnel RSS level", as it re-works the
> code added there; in addition, this feature is already present in the tree
> for non-tunnel packets.
> 
> Any reason why it was not merged into the previous commit?

This feature was developed much later than patch 7/15, but I think you are
right; I will merge them together.

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-10 15:22     ` Xueming(Steven) Li
@ 2018-04-12  9:09       ` Nélio Laranjeiro
  2018-04-12 13:43         ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-12  9:09 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Tue, Apr 10, 2018 at 03:22:46PM +0000, Xueming(Steven) Li wrote:
> Hi Nelio,
> 
> > -----Original Message-----
> > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Sent: Tuesday, April 10, 2018 10:42 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> > 
> > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > priorites support:
> > > 0-3: RTE FLOW tunnel rule
> > > 4-7: RTE FLOW non-tunnel rule
> > > 8-15: PMD control flow
> > 
> > This commit log is misleading: this amount of priorities depends on the
> > Mellanox OFED installed; it is not available in the upstream Linux kernel
> > yet nor in the current Mellanox OFED GA.
> > 
> > What happens when that amount of priorities is not available, is it
> > removing a functionality?  Will it collide with other flows?
> 
> If the 16 priorities are not available, it simply behaves as with 8 priorities.

It is not described in the commit log, please add it.

> > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
<snip/>
> > >  	},
> > >  	[HASH_RXQ_ETH] = {
> > >  		.hash_fields = 0,
> > >  		.dpdk_rss_hf = 0,
> > > -		.flow_priority = 3,
> > > +		.flow_priority = 2,
> > >  	},
> > >  };
> > 
> > If the amount of priorities remains 8, you are removing the priority for
> > the tunnel flows introduced by commit 749365717f5c ("net/mlx5: change
> > tunnel flow priority")
> > 
> > Please keep this functionality when this patch fails to get the expected
> > 16 Verbs priorities.
> 
> These priority shifts are different in the 16-priority scenario, so I changed
> it to a calculation. In function mlx5_flow_priorities_detect(), the priority
> shift will be 1 with 8 priorities and 4 in case of 16 priorities. Please
> refer to the changes in function mlx5_flow_update_priority() as well.

Please light my lamp, I don't see it...
 
<snip/>
> > >  static void
> > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > +			  struct mlx5_flow_parse *parser,
> > >  			  const struct rte_flow_attr *attr)  {
> > > +	struct priv *priv = dev->data->dev_private;
> > >  	unsigned int i;
> > > +	uint16_t priority;
> > >
> > > +	if (priv->config.flow_priority_shift == 1)
> > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > +	else
> > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > +	if (!parser->inner)
> > > +		priority += priv->config.flow_priority_shift;
> > >  	if (parser->drop) {
> > > -		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> > > -			attr->priority +
> > > -			hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > +		parser->queue[HASH_RXQ_ETH].ibv_attr->priority = priority +
> > > +				hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > >  		return;
> > >  	}
> > >  	for (i = 0; i != hash_rxq_init_n; ++i) {
> > > -		if (parser->queue[i].ibv_attr) {
> > > -			parser->queue[i].ibv_attr->priority =
> > > -				attr->priority +
> > > -				hash_rxq_init[i].flow_priority -
> > > -				(parser->inner ? 1 : 0);
> > > -		}
> > > +		if (!parser->queue[i].ibv_attr)
> > > +			continue;
> > > +		parser->queue[i].ibv_attr->priority = priority +
> > > +				hash_rxq_init[i].flow_priority;

Previous code was subtracting one from the table priorities, which start
at 1.  In the new code I don't see it.

What am I missing?

> > >  	}
> > >  }
> > >
> > > @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >  		.layer = HASH_RXQ_ETH,
> > >  		.mark_id = MLX5_FLOW_MARK_DEFAULT,
> > >  	};
> > > -	ret = mlx5_flow_convert_attributes(attr, error);
> > > +	ret = mlx5_flow_convert_attributes(dev, attr, error);
> > >  	if (ret)
> > >  		return ret;
> > >  	ret = mlx5_flow_convert_actions(dev, actions, error, parser); @@
> > > -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >  	 */
> > >  	if (!parser->drop)
> > >  		mlx5_flow_convert_finalise(parser);
> > > -	mlx5_flow_update_priority(parser, attr);
> > > +	mlx5_flow_update_priority(dev, parser, attr);
> > >  exit_free:
> > >  	/* Only verification is expected, all resources should be released.
> > */
> > >  	if (!parser->create) {
> > > @@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
> > >  	struct priv *priv = dev->data->dev_private;
> > >  	const struct rte_flow_attr attr = {
> > >  		.ingress = 1,
> > > -		.priority = MLX5_CTRL_FLOW_PRIORITY,
> > > +		.priority = priv->config.control_flow_priority,
> > >  	};
> > >  	struct rte_flow_item items[] = {
> > >  		{
> > > @@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
> > >  	}
> > >  	return 0;
> > >  }
> > > +
> > > +/**
> > > + * Detect number of Verbs flow priorities supported.
> > > + *
> > > + * @param dev
> > > + *   Pointer to Ethernet device.
> > > + */
> > > +void
> > > +mlx5_flow_priorities_detect(struct rte_eth_dev *dev) {
> > > +	struct priv *priv = dev->data->dev_private;
> > > +	uint32_t verb_priorities = MLX5_VERBS_FLOW_PRIO_8 * 2;
> > > +	struct {
> > > +		struct ibv_flow_attr attr;
> > > +		struct ibv_flow_spec_eth eth;
> > > +		struct ibv_flow_spec_action_drop drop;
> > > +	} flow_attr = {
> > > +		.attr = {
> > > +			.num_of_specs = 2,
> > > +			.priority = verb_priorities - 1,
> > > +		},
> > > +		.eth = {
> > > +			.type = IBV_FLOW_SPEC_ETH,
> > > +			.size = sizeof(struct ibv_flow_spec_eth),
> > > +		},
> > > +		.drop = {
> > > +			.size = sizeof(struct ibv_flow_spec_action_drop),
> > > +			.type = IBV_FLOW_SPEC_ACTION_DROP,
> > > +		},
> > > +	};
> > > +	struct ibv_flow *flow;
> > > +
> > > +	if (priv->config.control_flow_priority)
> > > +		return;
> > > +	flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
> > > +				      &flow_attr.attr);
> > > +	if (flow) {
> > > +		priv->config.flow_priority_shift = MLX5_VERBS_FLOW_PRIO_8 / 2;
> > > +		claim_zero(mlx5_glue->destroy_flow(flow));
> > > +	} else {
> > > +		priv->config.flow_priority_shift = 1;
> > > +		verb_priorities = verb_priorities / 2;
> > > +	}
> > > +	priv->config.control_flow_priority = 1;
> > > +	DRV_LOG(INFO, "port %u Verbs flow priorities: %d",
> > > +		dev->data->port_id, verb_priorities); }
> > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688 100644
> > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > >  	int ret;
> > >
> > >  	dev->data->dev_started = 1;
> > > -	ret = mlx5_flow_create_drop_queue(dev);
> > > -	if (ret) {
> > > -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > -			dev->data->port_id, strerror(rte_errno));
> > > -		goto error;
> > > -	}
> > >  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx queues",
> > >  		dev->data->port_id);
> > >  	rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > mlx5_dev_start(struct rte_eth_dev *dev)
> > >  	mlx5_traffic_disable(dev);
> > >  	mlx5_txq_stop(dev);
> > >  	mlx5_rxq_stop(dev);
> > > -	mlx5_flow_delete_drop_queue(dev);
> > >  	rte_errno = ret; /* Restore rte_errno. */
> > >  	return -rte_errno;
> > >  }
> > > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> > >  	mlx5_rxq_stop(dev);
> > >  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv->mr))
> > >  		mlx5_mr_release(mr);
> > > -	mlx5_flow_delete_drop_queue(dev);
> > >  }
> > >
> > >  /**
> > > --
> > > 2.13.3
> > 
> > I have a few concerns on this: mlx5_pci_probe() will also probe any under
> > layer verbs device, and in the near future the representors associated to a
> > VF.
> > Making such detection should only be done once by the PF; I also wonder if
> > it is possible to make such a drop action in a representor directly using
> > Verbs.
> 
> Then there should be some work to disable flows in representors? That is
> supposed to cover this.

The code raising another Verbs device is already present, and has been since
the first entrance of this PMD in the DPDK tree; you must respect the code
already present.
This request is not directly related to a new feature but to an existing
one, the representors being just an example.

This detection should be only done once and not for each of them.

> > Another concern is that this patch will be reverted at some point, when
> > those 16 priorities are always available.  It will be easier to remove this
> > detection function than to search for all those modifications.
> > 
> > I would suggest to have a standalone mlx5_flow_priorities_detect() which
> > creates and deletes all resources needed for this detection.
> 
> There is an upcoming new feature to support more than 16 priorities, so auto
> detection will be kept IMHO.

Only until the final priority values are backported to all the kernels we
support.  You don't see far enough into the future.

> Besides, there will be a bundle of resource creation and removal in
> this standalone function; I'm not sure it is valuable to duplicate them,
> please refer to function mlx5_flow_create_drop_queue().

You misunderstood; I am not asking you not to use the default drop
queues, but, instead of making rte_flow attributes, items and actions, to
build the Verbs specification directly on the stack.  It will be faster
than making a bunch of conversions (relying on malloc) from rte_flow to
Verbs, whereas you know exactly what it needs, i.e. 1 spec.

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-11  8:11     ` Xueming(Steven) Li
@ 2018-04-12  9:50       ` Nélio Laranjeiro
  2018-04-12 14:27         ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-12  9:50 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Wed, Apr 11, 2018 at 08:11:50AM +0000, Xueming(Steven) Li wrote:
> Hi Nelio,
> 
> > -----Original Message-----
> > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Sent: Tuesday, April 10, 2018 11:17 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > identification
> > 
> > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > This patch introduced tunnel type identification based on flow rules.
> > > If flows of multiple tunnel types are built on the same queue,
> > > RTE_PTYPE_TUNNEL_MASK will be returned; bits in the flow mark could be
> > > used as a tunnel type identifier.
> > 
> > I don't see anywhere in this patch where the bits are reserved to identify
> > a flow, nor values which can help to identify it.
> > 
> > Is this missing?
> > 
> > Anyway, we already have very few bits in the mark, making it difficult for
> > the user to use; reserving some more may lead to removing the mark
> > support from the flows.
> 
> Not all users will use multiple tunnel types; this is not included in this
> patch set and is left to the user's decision. I'll update the comments to
> make this clear.

Thanks,

> > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
<snip/>
> > >  /**
> > > + * RXQ update after flow rule creation.
> > > + *
> > > + * @param dev
> > > + *   Pointer to Ethernet device.
> > > + * @param flow
> > > + *   Pointer to the flow rule.
> > > + */
> > > +static void
> > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct rte_flow
> > > +*flow) {
> > > +	struct priv *priv = dev->data->dev_private;
> > > +	unsigned int i;
> > > +
> > > +	if (!dev->data->dev_started)
> > > +		return;
> > > +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > +						 [(*flow->queues)[i]];
> > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > +
> > > +		rxq_data->mark |= flow->mark;
> > > +		if (!tunnel)
> > > +			continue;
> > > +		rxq_ctrl->tunnel_types[tunnel] += 1;
> > 
> > I don't understand why you need such array, the NIC is unable to return
> > the tunnel type has it returns only one bit saying tunnel.
> > Why don't it store in the priv structure the current configured tunnel?
> 
> This array is used to count the tunnel types bound to a queue: if only one
> tunnel type is bound, ptype will report that tunnel type; TUNNEL_MASK (max
> value) will be returned if multiple types are bound to a queue.
> 
> A flow RSS action specifies the queues binding to the tunnel, thus we can't
> assume all queues have the same tunnel types, so this is a per-queue
> structure.

There is something I am missing here: how, in the data plane, can the PMD
understand from 1 bit which kind of tunnel the packet is matching?

<snip/>
> > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> > > mlx5_flows *list)  {
> > >  	struct priv *priv = dev->data->dev_private;
> > >  	struct rte_flow *flow;
> > > +	unsigned int i;
> > >
> > >  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > -		unsigned int i;
> > >  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > >
> > >  		if (flow->drop) {
> > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev, struct
> > mlx5_flows *list)
> > >  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data->port_id,
> > >  			(void *)flow);
> > >  	}
> > > +	/* Cleanup Rx queue tunnel info. */
> > > +	for (i = 0; i != priv->rxqs_n; ++i) {
> > > +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > +
> > > +		memset((void *)rxq_ctrl->tunnel_types, 0,
> > > +		       sizeof(rxq_ctrl->tunnel_types));
> > > +		q->tunnel = 0;
> > > +	}
> > >  }
> > 
> > This hunk does not handle the fact that the Rx queue array may have some
> > holes, i.e. the application is allowed to ask for 10 queues and only
> > initialise some.  In such a situation this code will segfault.
> 
> In other words, "q" could be NULL, correct? I'll add a check for this.

Correct.

> BTW, there should be an action item to add such a check in RSS/queue flow
> creation.

As it is the responsibility of the application/user to make rules according
to what it has configured, it has not been added.  It can still be
added, but it cannot be considered a fix.

> > It should only memset the Rx queues that are part of the flow, not the others.
> 
> Cleaning this (decreasing the tunnel_types counter of each queue) for each
> flow would be time consuming.

Considering flows already rely on syscalls to communicate with
the kernel, the extra cycle consumption to clear only the queues that are
part of this flow is negligible.

By the way, in the same function the mark is cleared only for the queues
that are part of the flow; the same loop can be used to clear that tunnel
information at the same time.

> If an error happens, the counter will not be cleared, and such a state will
> impact the tunnel type after the port starts again.

Apart from an implementation error, which other kind do you fear could
happen?

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-12  9:09       ` Nélio Laranjeiro
@ 2018-04-12 13:43         ` Xueming(Steven) Li
  2018-04-12 14:02           ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-12 13:43 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Thursday, April 12, 2018 5:09 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> 
> On Tue, Apr 10, 2018 at 03:22:46PM +0000, Xueming(Steven) Li wrote:
> > Hi Nelio,
> >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > priorities
> > >
> > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > > priorites support:
> > > > 0-3: RTE FLOW tunnel rule
> > > > 4-7: RTE FLOW non-tunnel rule
> > > > 8-15: PMD control flow
> > >
> > > This commit log is inducing people in error, this amount of priority
> > > depends on the Mellanox OFED installed, it is not available on
> > > upstream Linux kernel yet nor in the current Mellanox OFED GA.
> > >
> > > What happens when those amount of priority are not available, is it
> > > removing a functionality?  Will it collide with other flows?
> >
> > If 16  priorities not available, simply behavior as 8 priorities.
> 
> It is not described in the commit log, please add it.
> 
> > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> <snip/>
> > > >  	},
> > > >  	[HASH_RXQ_ETH] = {
> > > >  		.hash_fields = 0,
> > > >  		.dpdk_rss_hf = 0,
> > > > -		.flow_priority = 3,
> > > > +		.flow_priority = 2,
> > > >  	},
> > > >  };
> > >
> > > If the amount of priorities remains 8, you are removing the priority
> > > for the tunnel flows introduced by commit 749365717f5c ("net/mlx5:
> > > change tunnel flow priority")
> > >
> > > Please keep this functionality when this patch fails to get the
> > > expected
> > > 16 Verbs priorities.
> >
> > These priority shift are different in 16 priorities scenario, I
> > changed it to calculation. In function mlx5_flow_priorities_detect(),
> > priority shift will be 1 if 8 priorities, 4 in case of 16 priorities.
> > Please refer to changes in function mlx5_flow_update_priority() as well.
> 
> Please light my lamp, I don't see it...

Sorry, please refer to priv->config.flow_priority_shift.

> 
> <snip/>
> > > >  static void
> > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > +			  struct mlx5_flow_parse *parser,
> > > >  			  const struct rte_flow_attr *attr)  {
> > > > +	struct priv *priv = dev->data->dev_private;
> > > >  	unsigned int i;
> > > > +	uint16_t priority;
> > > >
> > > > +	if (priv->config.flow_priority_shift == 1)
> > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > +	else
> > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > +	if (!parser->inner)
> > > > +		priority += priv->config.flow_priority_shift;

Here, for a non-tunnel flow, the priority is lowered (numerically increased)
by 1 with 8 priorities, and by 4 otherwise.
I'll append a comment here; for illustration, a standalone sketch follows.
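
For illustration, a minimal standalone sketch of the mapping (the macro
values and the helper name are assumptions for the demo, not the patch code):

#include <stdio.h>

#define MLX5_VERBS_FLOW_PRIO_4 4 /* assumed value, demo only */
#define MLX5_VERBS_FLOW_PRIO_8 8 /* assumed value, demo only */

/* Hypothetical helper mirroring mlx5_flow_update_priority(): shift is 1
 * when only 8 Verbs priorities are available, 4 when 16 are available. */
static unsigned int
verbs_priority(unsigned int attr_prio, unsigned int shift, int inner)
{
	unsigned int prio;

	if (shift == 1)
		prio = attr_prio * MLX5_VERBS_FLOW_PRIO_4;
	else
		prio = attr_prio * MLX5_VERBS_FLOW_PRIO_8;
	if (!inner) /* non-tunnel rules sit below tunnel rules */
		prio += shift;
	return prio;
}

int
main(void)
{
	/* 16-priority case: tunnel rule lands at 0, non-tunnel at 4. */
	printf("tunnel=%u non-tunnel=%u\n",
	       verbs_priority(0, 4, 1), verbs_priority(0, 4, 0));
	return 0;
}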

> > > >  	if (parser->drop) {
> > > > -		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> > > > -			attr->priority +
> > > > -			hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > > +		parser->queue[HASH_RXQ_ETH].ibv_attr->priority =
> priority +
> > > > +				hash_rxq_init[HASH_RXQ_ETH].flow_priority;
> > > >  		return;
> > > >  	}
> > > >  	for (i = 0; i != hash_rxq_init_n; ++i) {
> > > > -		if (parser->queue[i].ibv_attr) {
> > > > -			parser->queue[i].ibv_attr->priority =
> > > > -				attr->priority +
> > > > -				hash_rxq_init[i].flow_priority -
> > > > -				(parser->inner ? 1 : 0);
> > > > -		}
> > > > +		if (!parser->queue[i].ibv_attr)
> > > > +			continue;
> > > > +		parser->queue[i].ibv_attr->priority = priority +
> > > > +				hash_rxq_init[i].flow_priority;
> 
> Previous code was subtracting one from the table priorities which was
> starting at 1.  In the new code I don't see it.
> 
> What I am missing?

Please refer to the new comment above around the "priority" variable calculation.

> 
> > > >  	}
> > > >  }
> > > >
> > > > @@ -1087,7 +1097,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > > >  		.layer = HASH_RXQ_ETH,
> > > >  		.mark_id = MLX5_FLOW_MARK_DEFAULT,
> > > >  	};
> > > > -	ret = mlx5_flow_convert_attributes(attr, error);
> > > > +	ret = mlx5_flow_convert_attributes(dev, attr, error);
> > > >  	if (ret)
> > > >  		return ret;
> > > >  	ret = mlx5_flow_convert_actions(dev, actions, error, parser);
> @@
> > > > -1158,7 +1168,7 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > > >  	 */
> > > >  	if (!parser->drop)
> > > >  		mlx5_flow_convert_finalise(parser);
> > > > -	mlx5_flow_update_priority(parser, attr);
> > > > +	mlx5_flow_update_priority(dev, parser, attr);
> > > >  exit_free:
> > > >  	/* Only verification is expected, all resources should be
> released.
> > > */
> > > >  	if (!parser->create) {
> > > > @@ -2450,7 +2460,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
> > > >  	struct priv *priv = dev->data->dev_private;
> > > >  	const struct rte_flow_attr attr = {
> > > >  		.ingress = 1,
> > > > -		.priority = MLX5_CTRL_FLOW_PRIORITY,
> > > > +		.priority = priv->config.control_flow_priority,
> > > >  	};
> > > >  	struct rte_flow_item items[] = {
> > > >  		{
> > > > @@ -3161,3 +3171,50 @@ mlx5_dev_filter_ctrl(struct rte_eth_dev *dev,
> > > >  	}
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +/**
> > > > + * Detect number of Verbs flow priorities supported.
> > > > + *
> > > > + * @param dev
> > > > + *   Pointer to Ethernet device.
> > > > + */
> > > > +void
> > > > +mlx5_flow_priorities_detect(struct rte_eth_dev *dev) {
> > > > +	struct priv *priv = dev->data->dev_private;
> > > > +	uint32_t verb_priorities = MLX5_VERBS_FLOW_PRIO_8 * 2;
> > > > +	struct {
> > > > +		struct ibv_flow_attr attr;
> > > > +		struct ibv_flow_spec_eth eth;
> > > > +		struct ibv_flow_spec_action_drop drop;
> > > > +	} flow_attr = {
> > > > +		.attr = {
> > > > +			.num_of_specs = 2,
> > > > +			.priority = verb_priorities - 1,
> > > > +		},
> > > > +		.eth = {
> > > > +			.type = IBV_FLOW_SPEC_ETH,
> > > > +			.size = sizeof(struct ibv_flow_spec_eth),
> > > > +		},
> > > > +		.drop = {
> > > > +			.size = sizeof(struct ibv_flow_spec_action_drop),
> > > > +			.type = IBV_FLOW_SPEC_ACTION_DROP,
> > > > +		},
> > > > +	};
> > > > +	struct ibv_flow *flow;
> > > > +
> > > > +	if (priv->config.control_flow_priority)
> > > > +		return;
> > > > +	flow = mlx5_glue->create_flow(priv->flow_drop_queue->qp,
> > > > +				      &flow_attr.attr);
> > > > +	if (flow) {
> > > > +		priv->config.flow_priority_shift =
> MLX5_VERBS_FLOW_PRIO_8 / 2;
> > > > +		claim_zero(mlx5_glue->destroy_flow(flow));
> > > > +	} else {
> > > > +		priv->config.flow_priority_shift = 1;
> > > > +		verb_priorities = verb_priorities / 2;
> > > > +	}
> > > > +	priv->config.control_flow_priority = 1;
> > > > +	DRV_LOG(INFO, "port %u Verbs flow priorities: %d",
> > > > +		dev->data->port_id, verb_priorities); }
> > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688
> > > > 100644
> > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  	int ret;
> > > >
> > > >  	dev->data->dev_started = 1;
> > > > -	ret = mlx5_flow_create_drop_queue(dev);
> > > > -	if (ret) {
> > > > -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > > -			dev->data->port_id, strerror(rte_errno));
> > > > -		goto error;
> > > > -	}
> > > >  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx
> queues",
> > > >  		dev->data->port_id);
> > > >  	rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > > mlx5_dev_start(struct rte_eth_dev *dev)
> > > >  	mlx5_traffic_disable(dev);
> > > >  	mlx5_txq_stop(dev);
> > > >  	mlx5_rxq_stop(dev);
> > > > -	mlx5_flow_delete_drop_queue(dev);
> > > >  	rte_errno = ret; /* Restore rte_errno. */
> > > >  	return -rte_errno;
> > > >  }
> > > > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> > > >  	mlx5_rxq_stop(dev);
> > > >  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv-
> >mr))
> > > >  		mlx5_mr_release(mr);
> > > > -	mlx5_flow_delete_drop_queue(dev);
> > > >  }
> > > >
> > > >  /**
> > > > --
> > > > 2.13.3
> > >
> > > I have few concerns on this, mlx5_pci_probe() will also probe any
> > > under layer verbs device, and in a near future the representors
> > > associated to a VF.
> > > Making such detection should only be done once by the PF, I also
> > > wander if it is possible to make such drop action in a representor
> > > directly using Verbs.
> >
> > Then there should be some work to disable flows in representors? that
> > supposed to cover this.
> 
> The code raising another Verbs device is already present and since the
> first entrance of this PMD in the DPDK tree, you must respect the code
> already present.
> This request is not related directly to a new feature but to an existing
> one, the representors being just an example.
> 
> This detection should be only done once and not for each of them.

Could you please elaborate on "under layer verbs device" and how to judge
the dependency on the PF? Is there a probe order between them?

BTW, the VF representor code seems to exist in 17.11, not upstream.

> 
> > > Another concern is, this patch will be reverted in some time when
> > > those
> > > 16 priority will be always available.  It will be easier to remove
> > > this detection function than searching for all those modifications.
> > >
> > > I would suggest to have a standalone mlx5_flow_priorities_detect()
> > > which creates and deletes all resources needed for this detection.
> >
> > There is an upcoming new feature to support priorities more than 16,
> > auto detection will be kept IMHO.
> 
> Until the final values of priorities will be backported to all kernels we
> support.  You don't see far enough in the future.
> 
> > Besides, there will be a bundle of resource creation and removal in
> > this standalone function, I'm not sure it valuable to duplicate them,
> > please refer to function mlx5_flow_create_drop_queue().
> 
> You misunderstood, I am not asking you to not use the default drop queues
> but instead of making an rte_flow attributes, items and actions to make
> directly the Verbs specification on the stack.  It will be faster than
> making a bunch of conversions (relying on malloc) from rte_flow to Verbs
> whereas you know exactly what it needs i.e. 1 spec.

Sorry, I'm still confused: mlx5_flow_priorities_detect() invokes
ibv_destroy_flow(), not rte_flow stuff, and no malloc at all. BTW, the mlx5
flow API bypasses Verbs flows in offline mode; we can't use it to create
flows at such a stage.

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-12 13:43         ` Xueming(Steven) Li
@ 2018-04-12 14:02           ` Nélio Laranjeiro
  2018-04-12 14:46             ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-12 14:02 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Thu, Apr 12, 2018 at 01:43:04PM +0000, Xueming(Steven) Li wrote:
> 
> 
> > -----Original Message-----
> > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Sent: Thursday, April 12, 2018 5:09 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> > 
> > On Tue, Apr 10, 2018 at 03:22:46PM +0000, Xueming(Steven) Li wrote:
> > > Hi Nelio,
> > >
> > > > -----Original Message-----
> > > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > > priorities
> > > >
> > > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > > Adjust flow priority mapping to adapt new hardware 16 verb flow
> > > > > priorites support:
> > > > > 0-3: RTE FLOW tunnel rule
> > > > > 4-7: RTE FLOW non-tunnel rule
> > > > > 8-15: PMD control flow
> > > >
> > > > This commit log is inducing people in error, this amount of priority
> > > > depends on the Mellanox OFED installed, it is not available on
> > > > upstream Linux kernel yet nor in the current Mellanox OFED GA.
> > > >
> > > > What happens when those amount of priority are not available, is it
> > > > removing a functionality?  Will it collide with other flows?
> > >
> > > If 16  priorities not available, simply behavior as 8 priorities.
> > 
> > It is not described in the commit log, please add it.
> > 
> > > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > <snip/>
> > > > >  	},
> > > > >  	[HASH_RXQ_ETH] = {
> > > > >  		.hash_fields = 0,
> > > > >  		.dpdk_rss_hf = 0,
> > > > > -		.flow_priority = 3,
> > > > > +		.flow_priority = 2,
> > > > >  	},
> > > > >  };
> > > >
> > > > If the amount of priorities remains 8, you are removing the priority
> > > > for the tunnel flows introduced by commit 749365717f5c ("net/mlx5:
> > > > change tunnel flow priority")
> > > >
> > > > Please keep this functionality when this patch fails to get the
> > > > expected
> > > > 16 Verbs priorities.
> > >
> > > These priority shift are different in 16 priorities scenario, I
> > > changed it to calculation. In function mlx5_flow_priorities_detect(),
> > > priority shift will be 1 if 8 priorities, 4 in case of 16 priorities.
> > > Please refer to changes in function mlx5_flow_update_priority() as well.
> > 
> > Please light my lamp, I don't see it...
> 
> Sorry, please refer to priv->config.flow_priority_shift.
> 
> > 
> > <snip/>
> > > > >  static void
> > > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > > +			  struct mlx5_flow_parse *parser,
> > > > >  			  const struct rte_flow_attr *attr)  {
> > > > > +	struct priv *priv = dev->data->dev_private;
> > > > >  	unsigned int i;
> > > > > +	uint16_t priority;
> > > > >
> > > > > +	if (priv->config.flow_priority_shift == 1)
> > > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > > +	else
> > > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > > +	if (!parser->inner)
> > > > > +		priority += priv->config.flow_priority_shift;
> 
> Here, if non-tunnel flow, lower(increase) 1 for 8 priorities, lower 4 otherwise.
> I'll append a comment here.

Thanks, I totally missed this one.

<snip/>
> > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688
> > > > > 100644
> > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  	int ret;
> > > > >
> > > > >  	dev->data->dev_started = 1;
> > > > > -	ret = mlx5_flow_create_drop_queue(dev);
> > > > > -	if (ret) {
> > > > > -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > > > -			dev->data->port_id, strerror(rte_errno));
> > > > > -		goto error;
> > > > > -	}
> > > > >  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx
> > queues",
> > > > >  		dev->data->port_id);
> > > > >  	rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > > > mlx5_dev_start(struct rte_eth_dev *dev)
> > > > >  	mlx5_traffic_disable(dev);
> > > > >  	mlx5_txq_stop(dev);
> > > > >  	mlx5_rxq_stop(dev);
> > > > > -	mlx5_flow_delete_drop_queue(dev);
> > > > >  	rte_errno = ret; /* Restore rte_errno. */
> > > > >  	return -rte_errno;
> > > > >  }
> > > > > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> > > > >  	mlx5_rxq_stop(dev);
> > > > >  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv-
> > >mr))
> > > > >  		mlx5_mr_release(mr);
> > > > > -	mlx5_flow_delete_drop_queue(dev);
> > > > >  }
> > > > >
> > > > >  /**
> > > > > --
> > > > > 2.13.3
> > > >
> > > > I have few concerns on this, mlx5_pci_probe() will also probe any
> > > > under layer verbs device, and in a near future the representors
> > > > associated to a VF.
> > > > Making such detection should only be done once by the PF, I also
> > > > wander if it is possible to make such drop action in a representor
> > > > directly using Verbs.
> > >
> > > Then there should be some work to disable flows in representors? that
> > > supposed to cover this.
> > 
> > The code raising another Verbs device is already present and since the
> > first entrance of this PMD in the DPDK tree, you must respect the code
> > already present.
> > This request is not related directly to a new feature but to an existing
> > one, the representors being just an example.
> > 
> > This detection should be only done once and not for each of them.
> 
> Could you please elaborate on "under layer verbs device" and how to judge
> dependency to PF, is there a probe order between them?

The place where you are adding the mlx5_flow_create_drop_queue() call in
mlx5_pci_probe() probes any device returned by the Verbs
ibv_get_device_list().

This code is also present in mlx4 where for a single PCI id there are 2
physical ports.

If the NIC handles several ibvdev (Verbs devices), it will create a new
rte_eth_dev for each one.

> BTW, VF representor code seems exists in 17.11, not upstream.
> 
> > 
> > > > Another concern is, this patch will be reverted in some time when
> > > > those
> > > > 16 priority will be always available.  It will be easier to remove
> > > > this detection function than searching for all those modifications.
> > > >
> > > > I would suggest to have a standalone mlx5_flow_priorities_detect()
> > > > which creates and deletes all resources needed for this detection.
> > >
> > > There is an upcoming new feature to support priorities more than 16,
> > > auto detection will be kept IMHO.
> > 
> > Until the final values of priorities will be backported to all kernels we
> > support.  You don't see far enough in the future.
> > 
> > > Besides, there will be a bundle of resource creation and removal in
> > > this standalone function, I'm not sure it valuable to duplicate them,
> > > please refer to function mlx5_flow_create_drop_queue().
> > 
> > You misunderstood, I am not asking you to not use the default drop queues
> > but instead of making an rte_flow attributes, items and actions to make
> > directly the Verbs specification on the stack.  It will be faster than
> > making a bunch of conversions (relying on malloc) from rte_flow to Verbs
> > whereas you know exactly what it needs i.e. 1 spec.
> 
> Sorry, still confused, mlx5_flow_priorities_detect() invokes ibv_destroy_flow(),
> not rte_flow stuff, no malloc at all. BTW, mlx5 flow api bypass verb flow in 
> offline mode, we can't use it to create flows at such stage.

Sorry I was the one confused. Priority detection is Ok.

After reading this, I suggest using a boolean in
mlx5_pci_probe() to make this detection only once; the underlying Verbs
devices will inherit that knowledge and adjust their own shift
accordingly.

What do you think?

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-12  9:50       ` Nélio Laranjeiro
@ 2018-04-12 14:27         ` Xueming(Steven) Li
  2018-04-13  8:37           ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-12 14:27 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Thursday, April 12, 2018 5:51 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> identification
> 
> On Wed, Apr 11, 2018 at 08:11:50AM +0000, Xueming(Steven) Li wrote:
> > Hi Nelio,
> >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > Sent: Tuesday, April 10, 2018 11:17 PM
> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > > identification
> > >
> > > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > > This patch introduced tunnel type identification based on flow rules.
> > > > If flows of multiple tunnel types built on same queue,
> > > > RTE_PTYPE_TUNNEL_MASK will be returned, bits in flow mark could be
> > > > used as tunnel type identifier.
> > >
> > > I don't see anywhere in this patch where the bits are reserved to
> > > identify a flow, nor values which can help to identify it.
> > >
> > > Is this missing?
> > >
> > > Anyway we have already very few bits in the mark making it difficult
> > > to be used by the user, reserving again some to may lead to remove
> > > the mark support from the flows.
> >
> > Not all users will use multiple tunnel types, this is not included in
> > this patch set and left to user decision. I'll update comments to make
> this clear.
> 
> Thanks,
> 
> > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> <snip/>
> > > >  /**
> > > > + * RXQ update after flow rule creation.
> > > > + *
> > > > + * @param dev
> > > > + *   Pointer to Ethernet device.
> > > > + * @param flow
> > > > + *   Pointer to the flow rule.
> > > > + */
> > > > +static void
> > > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct
> > > > +rte_flow
> > > > +*flow) {
> > > > +	struct priv *priv = dev->data->dev_private;
> > > > +	unsigned int i;
> > > > +
> > > > +	if (!dev->data->dev_started)
> > > > +		return;
> > > > +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > > +						 [(*flow->queues)[i]];
> > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > > > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > > +
> > > > +		rxq_data->mark |= flow->mark;
> > > > +		if (!tunnel)
> > > > +			continue;
> > > > +		rxq_ctrl->tunnel_types[tunnel] += 1;
> > >
> > > I don't understand why you need such array, the NIC is unable to
> > > return the tunnel type has it returns only one bit saying tunnel.
> > > Why don't it store in the priv structure the current configured tunnel?
> >
> > This array is used to count tunnel types bound to queue, if only one
> > tunnel type, ptype will report that tunnel type, TUNNEL MASK(max
> > value) will be returned if multiple types bound to a queue.
> >
> > Flow rss action specifies queues that binding to tunnel, thus we can't
> > assume all queues have same tunnel types, so this is a per queue
> structure.
> 
> There is something I am missing here, how in the dataplane the PMD can
> understand from 1 bit which kind of tunnel the packet is matching?

The code under this line is the answer; let me post it here:
		if (rxq_data->tunnel != flow->tunnel)
			rxq_data->tunnel = rxq_data->tunnel ?
					   RTE_PTYPE_TUNNEL_MASK :
					   flow->tunnel;
If no tunnel type is associated to the rxq, use the tunnel type from the flow.
If a different tunnel type comes from a new flow, use RTE_PTYPE_TUNNEL_MASK.
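
As a self-contained illustration of that rule (the tunnel values below are
made up; only the update logic matters):

#include <stdio.h>

/* Placeholder values standing in for RTE_PTYPE_TUNNEL_*. */
#define TUNNEL_VXLAN 0x1000u
#define TUNNEL_GRE   0x2000u
#define TUNNEL_MASK  0xf000u

/* The first tunnel flow sets the queue tunnel type; a second,
 * different type degrades the queue to the generic mask. */
static unsigned int
rxq_tunnel_update(unsigned int rxq_tunnel, unsigned int flow_tunnel)
{
	if (rxq_tunnel != flow_tunnel)
		rxq_tunnel = rxq_tunnel ? TUNNEL_MASK : flow_tunnel;
	return rxq_tunnel;
}

int
main(void)
{
	unsigned int t = 0;

	t = rxq_tunnel_update(t, TUNNEL_VXLAN); /* -> VXLAN */
	t = rxq_tunnel_update(t, TUNNEL_VXLAN); /* unchanged */
	t = rxq_tunnel_update(t, TUNNEL_GRE);   /* -> TUNNEL_MASK */
	printf("0x%x\n", t);
	return 0;
}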

> 
> <snip/>
> > > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > struct mlx5_flows *list)  {
> > > >  	struct priv *priv = dev->data->dev_private;
> > > >  	struct rte_flow *flow;
> > > > +	unsigned int i;
> > > >
> > > >  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > > -		unsigned int i;
> > > >  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > > >
> > > >  		if (flow->drop) {
> > > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > struct
> > > mlx5_flows *list)
> > > >  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data-
> >port_id,
> > > >  			(void *)flow);
> > > >  	}
> > > > +	/* Cleanup Rx queue tunnel info. */
> > > > +	for (i = 0; i != priv->rxqs_n; ++i) {
> > > > +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > > +
> > > > +		memset((void *)rxq_ctrl->tunnel_types, 0,
> > > > +		       sizeof(rxq_ctrl->tunnel_types));
> > > > +		q->tunnel = 0;
> > > > +	}
> > > >  }
> > >
> > > This hunk does not handle the fact the Rx queue array may have some
> > > holes i.e. the application is allowed to ask for 10 queues and only
> > > initialise some.  In such situation this code will segfault.
> >
> > In other words, "q" could be NULL, correct? I'll add check for this.
> 
> Correct.
> 
> > BTW, there should be an action item to add such check in rss/queue flow
> creation.
> 
> As it is the responsibility of the application/user to make rule according
> to what it has configured, it has not been added.  It can still be added,
> but it cannot be considered as a fix.
> 
> > > It should only memset the Rx queues making part of the flow not the
> others.
> >
> > Clean this(decrease tunnel_types counter of each queue) from each flow
> > would be time consuming.
> 
> Considering flows are already relying on syscall to communicate with the
> kernel, the extra cycles consumption to only clear the queues making part
> of this flow is neglectable.
> 
> By the way in the same function the mark is cleared only for the queues
> making part of the flow, the same loop can be used to clear those tunnel
> informations at the same time.
> 
> > If an error happened, counter will not be cleared and such state will
> > impact tunnel type after port start again.
> 
> Unless an implementation error which other kind of them do you fear to
> happen?

The rxq mark is simply reset to 0; this field is a counter, and the final
target is to clear the field value, so my code should be straightforward and
error free 😊

From a quick look, this function could be much simpler than what it is today:
1. clean the Verbs flow and hrxq where possible, regardless of flow type.
2. clean the rxq state: mark and tunnel_types (a rough sketch follows).
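
A rough standalone sketch of step 2 (the struct is a reduced stand-in for
the real mlx5_rxq_data/mlx5_rxq_ctrl pair, and the array size is an
assumption):

#include <string.h>

#define TUNNEL_TYPES_N 8 /* assumed size of tunnel_types[] */

struct rxq_state { /* reduced stand-in, not the real layout */
	unsigned int mark;
	unsigned int tunnel;
	unsigned short tunnel_types[TUNNEL_TYPES_N];
};

/* Unconditionally reset the per-queue flow state on stop, skipping
 * holes in the Rx queue array (q == NULL). */
static void
rxq_flow_state_clear(struct rxq_state **rxqs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i) {
		struct rxq_state *q = rxqs[i];

		if (q == NULL)
			continue;
		q->mark = 0;
		q->tunnel = 0;
		memset(q->tunnel_types, 0, sizeof(q->tunnel_types));
	}
}

int
main(void)
{
	struct rxq_state q = { .mark = 1, .tunnel = 0x1000 };
	struct rxq_state *rxqs[2] = { &q, NULL }; /* hole at index 1 */

	rxq_flow_state_clear(rxqs, 2);
	return 0;
}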

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
  2018-04-12 14:02           ` Nélio Laranjeiro
@ 2018-04-12 14:46             ` Xueming(Steven) Li
  0 siblings, 0 replies; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-12 14:46 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Thursday, April 12, 2018 10:03 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware priorities
> 
> On Thu, Apr 12, 2018 at 01:43:04PM +0000, Xueming(Steven) Li wrote:
> >
> >
> > > -----Original Message-----
> > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > Sent: Thursday, April 12, 2018 5:09 PM
> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > priorities
> > >
> > > On Tue, Apr 10, 2018 at 03:22:46PM +0000, Xueming(Steven) Li wrote:
> > > > Hi Nelio,
> > > >
> > > > > -----Original Message-----
> > > > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > Sent: Tuesday, April 10, 2018 10:42 PM
> > > > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > > > Subject: Re: [PATCH v2 01/15] net/mlx5: support 16 hardware
> > > > > priorities
> > > > >
> > > > > On Tue, Apr 10, 2018 at 09:34:01PM +0800, Xueming Li wrote:
> > > > > > Adjust flow priority mapping to adapt new hardware 16 verb
> > > > > > flow priorites support:
> > > > > > 0-3: RTE FLOW tunnel rule
> > > > > > 4-7: RTE FLOW non-tunnel rule
> > > > > > 8-15: PMD control flow
> > > > >
> > > > > This commit log is inducing people in error, this amount of
> > > > > priority depends on the Mellanox OFED installed, it is not
> > > > > available on upstream Linux kernel yet nor in the current Mellanox
> OFED GA.
> > > > >
> > > > > What happens when those amount of priority are not available, is
> > > > > it removing a functionality?  Will it collide with other flows?
> > > >
> > > > If 16  priorities not available, simply behavior as 8 priorities.
> > >
> > > It is not described in the commit log, please add it.
> > >
> > > > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > <snip/>
> > > > > >  	},
> > > > > >  	[HASH_RXQ_ETH] = {
> > > > > >  		.hash_fields = 0,
> > > > > >  		.dpdk_rss_hf = 0,
> > > > > > -		.flow_priority = 3,
> > > > > > +		.flow_priority = 2,
> > > > > >  	},
> > > > > >  };
> > > > >
> > > > > If the amount of priorities remains 8, you are removing the
> > > > > priority for the tunnel flows introduced by commit 749365717f5c
> ("net/mlx5:
> > > > > change tunnel flow priority")
> > > > >
> > > > > Please keep this functionality when this patch fails to get the
> > > > > expected
> > > > > 16 Verbs priorities.
> > > >
> > > > These priority shift are different in 16 priorities scenario, I
> > > > changed it to calculation. In function
> > > > mlx5_flow_priorities_detect(), priority shift will be 1 if 8
> priorities, 4 in case of 16 priorities.
> > > > Please refer to changes in function mlx5_flow_update_priority() as
> well.
> > >
> > > Please light my lamp, I don't see it...
> >
> > Sorry, please refer to priv->config.flow_priority_shift.
> >
> > >
> > > <snip/>
> > > > > >  static void
> > > > > > -mlx5_flow_update_priority(struct mlx5_flow_parse *parser,
> > > > > > +mlx5_flow_update_priority(struct rte_eth_dev *dev,
> > > > > > +			  struct mlx5_flow_parse *parser,
> > > > > >  			  const struct rte_flow_attr *attr)  {
> > > > > > +	struct priv *priv = dev->data->dev_private;
> > > > > >  	unsigned int i;
> > > > > > +	uint16_t priority;
> > > > > >
> > > > > > +	if (priv->config.flow_priority_shift == 1)
> > > > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_4;
> > > > > > +	else
> > > > > > +		priority = attr->priority * MLX5_VERBS_FLOW_PRIO_8;
> > > > > > +	if (!parser->inner)
> > > > > > +		priority += priv->config.flow_priority_shift;
> >
> > Here, if non-tunnel flow, lower(increase) 1 for 8 priorities, lower 4
> otherwise.
> > I'll append a comment here.
> 
> Thanks, I totally missed this one.
> 
> <snip/>
> > > > > > diff --git a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > b/drivers/net/mlx5/mlx5_trigger.c index 6bb4ffb14..d80a2e688
> > > > > > 100644
> > > > > > --- a/drivers/net/mlx5/mlx5_trigger.c
> > > > > > +++ b/drivers/net/mlx5/mlx5_trigger.c
> > > > > > @@ -148,12 +148,6 @@ mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	int ret;
> > > > > >
> > > > > >  	dev->data->dev_started = 1;
> > > > > > -	ret = mlx5_flow_create_drop_queue(dev);
> > > > > > -	if (ret) {
> > > > > > -		DRV_LOG(ERR, "port %u drop queue allocation failed: %s",
> > > > > > -			dev->data->port_id, strerror(rte_errno));
> > > > > > -		goto error;
> > > > > > -	}
> > > > > >  	DRV_LOG(DEBUG, "port %u allocating and configuring hash Rx
> > > queues",
> > > > > >  		dev->data->port_id);
> > > > > >  	rte_mempool_walk(mlx5_mp2mr_iter, priv); @@ -202,7 +196,6 @@
> > > > > > mlx5_dev_start(struct rte_eth_dev *dev)
> > > > > >  	mlx5_traffic_disable(dev);
> > > > > >  	mlx5_txq_stop(dev);
> > > > > >  	mlx5_rxq_stop(dev);
> > > > > > -	mlx5_flow_delete_drop_queue(dev);
> > > > > >  	rte_errno = ret; /* Restore rte_errno. */
> > > > > >  	return -rte_errno;
> > > > > >  }
> > > > > > @@ -237,7 +230,6 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
> > > > > >  	mlx5_rxq_stop(dev);
> > > > > >  	for (mr = LIST_FIRST(&priv->mr); mr; mr = LIST_FIRST(&priv-
> > > >mr))
> > > > > >  		mlx5_mr_release(mr);
> > > > > > -	mlx5_flow_delete_drop_queue(dev);
> > > > > >  }
> > > > > >
> > > > > >  /**
> > > > > > --
> > > > > > 2.13.3
> > > > >
> > > > > I have few concerns on this, mlx5_pci_probe() will also probe
> > > > > any under layer verbs device, and in a near future the
> > > > > representors associated to a VF.
> > > > > Making such detection should only be done once by the PF, I also
> > > > > wander if it is possible to make such drop action in a
> > > > > representor directly using Verbs.
> > > >
> > > > Then there should be some work to disable flows in representors?
> > > > that supposed to cover this.
> > >
> > > The code raising another Verbs device is already present and since
> > > the first entrance of this PMD in the DPDK tree, you must respect
> > > the code already present.
> > > This request is not related directly to a new feature but to an
> > > existing one, the representors being just an example.
> > >
> > > This detection should be only done once and not for each of them.
> >
> > Could you please elaborate on "under layer verbs device" and how to
> > judge dependency to PF, is there a probe order between them?
> 
> The place where you are adding the mlx5_flow_create_drop_queue() in
> mlx5_pci_probe() is probing any device returned by the verbs
> ibv_get_device_list().
> 
> This code is also present in mlx4 where for a single PCI id there are 2
> physical ports.
> 
> If the NIC handle several ibvdev (Verbs devices) it will create a new
> rte_eth_dev for each one.
> 
> > BTW, VF representor code seems exists in 17.11, not upstream.
> >
> > >
> > > > > Another concern is, this patch will be reverted in some time
> > > > > when those
> > > > > 16 priority will be always available.  It will be easier to
> > > > > remove this detection function than searching for all those
> modifications.
> > > > >
> > > > > I would suggest to have a standalone
> > > > > mlx5_flow_priorities_detect() which creates and deletes all
> resources needed for this detection.
> > > >
> > > > There is an upcoming new feature to support priorities more than
> > > > 16, auto detection will be kept IMHO.
> > >
> > > Until the final values of priorities will be backported to all
> > > kernels we support.  You don't see far enough in the future.
> > >
> > > > Besides, there will be a bundle of resource creation and removal
> > > > in this standalone function, I'm not sure it valuable to duplicate
> > > > them, please refer to function mlx5_flow_create_drop_queue().
> > >
> > > You misunderstood, I am not asking you to not use the default drop
> > > queues but instead of making an rte_flow attributes, items and
> > > actions to make directly the Verbs specification on the stack.  It
> > > will be faster than making a bunch of conversions (relying on
> > > malloc) from rte_flow to Verbs whereas you know exactly what it needs
> i.e. 1 spec.
> >
> > Sorry, still confused, mlx5_flow_priorities_detect() invokes
> > ibv_destroy_flow(), not rte_flow stuff, no malloc at all. BTW, mlx5
> > flow api bypass verb flow in offline mode, we can't use it to create
> flows at such stage.
> 
> Sorry I was the one confused. Priority detection is Ok.
> 
> After reading this, I'll suggest to use a boolean in the
> mlx5_pci_probe() to only make this detection once, the underlying verbs
> devices will inherit from such knowledge and adjust their own shift
> accordingly.
> 
> What do you think?

Finally got it; fortunately, Shahaf had a similar suggestion in the 17.11
review. I'll make mlx5_flow_priorities_detect() simply return the number of
supported priorities, and this number will be used so that the detection runs
only once in the loop. Many thanks for your suggestion; I was interpreting it
in an overly complex way.
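
Something along these lines (a sketch of the idea only; the detection
stand-in and the names are placeholders):

#include <stdio.h>

/* Stand-in for the real detection; assume it reports 8 or 16. */
static unsigned int
flow_priorities_detect(void)
{
	return 16;
}

int
main(void)
{
	unsigned int verb_priorities = 0;
	int i;

	/* Per-device loop as in mlx5_pci_probe(): detect once, then
	 * let every subsequent port inherit the result. */
	for (i = 0; i < 3; i++) {
		if (verb_priorities == 0)
			verb_priorities = flow_priorities_detect();
		printf("port %d: %u Verbs flow priorities\n",
		       i, verb_priorities);
	}
	return 0;
}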

BTW, if there are no other comments on this series, I'll upload a new version.

> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-12 14:27         ` Xueming(Steven) Li
@ 2018-04-13  8:37           ` Nélio Laranjeiro
  2018-04-13 12:09             ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-13  8:37 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Thu, Apr 12, 2018 at 02:27:45PM +0000, Xueming(Steven) Li wrote:
> > -----Original Message-----
> > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Sent: Thursday, April 12, 2018 5:51 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > identification
> > 
> > On Wed, Apr 11, 2018 at 08:11:50AM +0000, Xueming(Steven) Li wrote:
> > > Hi Nelio,
> > >
> > > > -----Original Message-----
> > > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > Sent: Tuesday, April 10, 2018 11:17 PM
> > > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > > > identification
> > > >
> > > > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > > > This patch introduced tunnel type identification based on flow rules.
> > > > > If flows of multiple tunnel types built on same queue,
> > > > > RTE_PTYPE_TUNNEL_MASK will be returned, bits in flow mark could be
> > > > > used as tunnel type identifier.
> > > >
> > > > I don't see anywhere in this patch where the bits are reserved to
> > > > identify a flow, nor values which can help to identify it.
> > > >
> > > > Is this missing?
> > > >
> > > > Anyway we have already very few bits in the mark making it difficult
> > > > to be used by the user, reserving again some to may lead to remove
> > > > the mark support from the flows.
> > >
> > > Not all users will use multiple tunnel types, this is not included in
> > > this patch set and left to user decision. I'll update comments to make
> > this clear.
> > 
> > Thanks,
> > 
> > > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > <snip/>
> > > > >  /**
> > > > > + * RXQ update after flow rule creation.
> > > > > + *
> > > > > + * @param dev
> > > > > + *   Pointer to Ethernet device.
> > > > > + * @param flow
> > > > > + *   Pointer to the flow rule.
> > > > > + */
> > > > > +static void
> > > > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct
> > > > > +rte_flow
> > > > > +*flow) {
> > > > > +	struct priv *priv = dev->data->dev_private;
> > > > > +	unsigned int i;
> > > > > +
> > > > > +	if (!dev->data->dev_started)
> > > > > +		return;
> > > > > +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > > > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > > > +						 [(*flow->queues)[i]];
> > > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > > > > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > > > +
> > > > > +		rxq_data->mark |= flow->mark;
> > > > > +		if (!tunnel)
> > > > > +			continue;
> > > > > +		rxq_ctrl->tunnel_types[tunnel] += 1;
> > > >
> > > > I don't understand why you need such array, the NIC is unable to
> > > > return the tunnel type has it returns only one bit saying tunnel.
> > > > Why don't it store in the priv structure the current configured tunnel?
> > >
> > > This array is used to count tunnel types bound to queue, if only one
> > > tunnel type, ptype will report that tunnel type, TUNNEL MASK(max
> > > value) will be returned if multiple types bound to a queue.
> > >
> > > Flow rss action specifies queues that binding to tunnel, thus we can't
> > > assume all queues have same tunnel types, so this is a per queue
> > structure.
> > 
> > There is something I am missing here, how in the dataplane the PMD can
> > understand from 1 bit which kind of tunnel the packet is matching?
> 
> The code under this line is answer, let me post here: 
> 		if (rxq_data->tunnel != flow->tunnel)
> 			rxq_data->tunnel = rxq_data->tunnel ?
> 					   RTE_PTYPE_TUNNEL_MASK :
> 					   flow->tunnel;
> If no tunnel type associated to rxq, use tunnel type from flow.
> If a new tunnel type from flow, use RTE_PTYPE_TUNNEL_MASK.

From my understanding, when in the same queue there are several tunnel
offloads, the mbuf ptype will contain RTE_PTYPE_TUNNEL_MASK:

   @@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
           * bit[7] = outer_l3_type
           */
          idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
  -       return mlx5_ptype_table[idx];
  +       return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));
   }


Used by the Rx burst functions:

/* Update packet information. */
 pkt->packet_type = rxq_cq_to_pkt_type(cqe);

Is this correct?
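
Restated as a standalone snippet of my reading (the table content and the
tunnel value are made up):

#include <stdio.h>

/* Branchless combination: the per-queue tunnel type is OR-ed into the
 * table ptype only when bit 6 of the completion index flags a tunnel. */
static unsigned int
cq_to_pkt_type(unsigned int idx, unsigned int rxq_tunnel,
	       const unsigned int *ptype_table)
{
	return ptype_table[idx] | rxq_tunnel * !!(idx & (1 << 6));
}

int
main(void)
{
	unsigned int table[128] = { [64] = 0x211 }; /* fake inner ptype */
	unsigned int tunnel = 0x9000;               /* fake tunnel type */

	printf("0x%x 0x%x\n",
	       cq_to_pkt_type(1, tunnel, table),   /* no tunnel bit: 0x0 */
	       cq_to_pkt_type(64, tunnel, table)); /* tunnel bit: 0x9211 */
	return 0;
}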

There is another strange point here, 

 +       [PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
 +                                             RTE_PTYPE_L4_UDP,

According to RFC 7348 [1], having a VXLAN with an outer IPv6 is
possible.  How do you handle it?
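
Such a flow could be expressed, for example, with a testpmd command along
these lines (a hypothetical example, not taken from the series):

  flow create 0 ingress pattern eth / ipv6 / udp dst is 4789 / vxlan / end
    actions rss queues 1 2 end / end

i.e. the same VXLAN pattern as elsewhere in the series, with an IPv6 outer
header.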

> > <snip/>
> > > > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > > struct mlx5_flows *list)  {
> > > > >  	struct priv *priv = dev->data->dev_private;
> > > > >  	struct rte_flow *flow;
> > > > > +	unsigned int i;
> > > > >
> > > > >  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > > > -		unsigned int i;
> > > > >  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > > > >
> > > > >  		if (flow->drop) {
> > > > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > > struct
> > > > mlx5_flows *list)
> > > > >  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data-
> > >port_id,
> > > > >  			(void *)flow);
> > > > >  	}
> > > > > +	/* Cleanup Rx queue tunnel info. */
> > > > > +	for (i = 0; i != priv->rxqs_n; ++i) {
> > > > > +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > > +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > > > +
> > > > > +		memset((void *)rxq_ctrl->tunnel_types, 0,
> > > > > +		       sizeof(rxq_ctrl->tunnel_types));
> > > > > +		q->tunnel = 0;
> > > > > +	}
> > > > >  }
> > > >
> > > > This hunk does not handle the fact that the Rx queue array may have
> > > > some holes, i.e. the application is allowed to ask for 10 queues and
> > > > only initialise some.  In such a situation this code will segfault.
> > >
> > > In other words, "q" could be NULL, correct? I'll add a check for this.
> > 
> > Correct.
> > 
> > > BTW, there should be an action item to add such a check in rss/queue
> > > flow creation.
> > 
> > As it is the responsibility of the application/user to make rules
> > according to what it has configured, it has not been added.  It can
> > still be added, but it cannot be considered a fix.
> > 
> > > > It should only memset the Rx queues making part of the flow not the
> > others.
> > >
> > > Cleaning this (decreasing the tunnel_types counter of each queue) for
> > > each flow would be time consuming.
> > 
> > Considering flows already rely on syscalls to communicate with the
> > kernel, the extra cycle consumption to clear only the queues making part
> > of this flow is negligible.
> > 
> > By the way, in the same function the mark is cleared only for the queues
> > making part of the flow; the same loop can be used to clear the tunnel
> > information at the same time.
> > 
> > > If an error happens, the counter will not be cleared and such state
> > > will impact the tunnel type after the port starts again.
> > 
> > Apart from an implementation error, which other kind do you fear could
> > happen?
> 
> The rxq mark is simply reset to 0; this field is a counter and the final
> target is to clear its value, so my code should be straightforward and
> error free 😊
> 
> From a quick look, this function could be much simpler than what it is
> today:
> 1. clean the Verbs flow and hrxq where possible, regardless of flow type.
> 2. clean the rxq state: mark and tunnel_types.

Ok.
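
A rough sketch of that simplified two-step cleanup, including the NULL
check for holes in the Rx queue array discussed above (types and fields
are taken from the quoted hunks, the rest is assumed and not the final
implementation):

    for (i = 0; i != priv->rxqs_n; ++i) {
        struct mlx5_rxq_data *q = (*priv->rxqs)[i];
        struct mlx5_rxq_ctrl *rxq_ctrl;

        if (q == NULL) /* the Rx queue array may have holes */
            continue;
        rxq_ctrl = container_of(q, struct mlx5_rxq_ctrl, rxq);
        q->mark = 0;    /* step 2: reset rxq state */
        q->tunnel = 0;
        memset((void *)rxq_ctrl->tunnel_types, 0,
               sizeof(rxq_ctrl->tunnel_types));
    }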

Thanks,

[1] https://dpdk.org/patch/37965

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification
  2018-04-13  8:37           ` Nélio Laranjeiro
@ 2018-04-13 12:09             ` Xueming(Steven) Li
  0 siblings, 0 replies; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-13 12:09 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Friday, April 13, 2018 4:38 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> identification
> 
> On Thu, Apr 12, 2018 at 02:27:45PM +0000, Xueming(Steven) Li wrote:
> > > -----Original Message-----
> > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > Sent: Thursday, April 12, 2018 5:51 PM
> > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > > identification
> > >
> > > On Wed, Apr 11, 2018 at 08:11:50AM +0000, Xueming(Steven) Li wrote:
> > > > Hi Nelio,
> > > >
> > > > > -----Original Message-----
> > > > > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > > > > Sent: Tuesday, April 10, 2018 11:17 PM
> > > > > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > > > > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > > > > Subject: Re: [PATCH v2 04/15] net/mlx5: support Rx tunnel type
> > > > > identification
> > > > >
> > > > > On Tue, Apr 10, 2018 at 09:34:04PM +0800, Xueming Li wrote:
> > > > > > This patch introduced tunnel type identification based on flow
> > > > > > rules. If flows of multiple tunnel types are built on the same
> > > > > > queue, RTE_PTYPE_TUNNEL_MASK will be returned; bits in the flow
> > > > > > mark could be used as a tunnel type identifier.
> > > > >
> > > > > I don't see anywhere in this patch where the bits are reserved
> > > > > to identify a flow, nor values which can help to identify it.
> > > > >
> > > > > Is this missing?
> > > > >
> > > > > Anyway, we already have very few bits in the mark, making it
> > > > > difficult for the user to use; reserving some of them again may
> > > > > lead to removing the mark support from the flows.
> > > >
> > > > Not all users will use multiple tunnel types; this is not included
> > > > in this patch set and is left to the user's decision. I'll update
> > > > the comments to make this clear.
> > >
> > > Thanks,
> > >
> > > > > > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > > <snip/>
> > > > > >  /**
> > > > > > + * RXQ update after flow rule creation.
> > > > > > + *
> > > > > > + * @param dev
> > > > > > + *   Pointer to Ethernet device.
> > > > > > + * @param flow
> > > > > > + *   Pointer to the flow rule.
> > > > > > + */
> > > > > > +static void
> > > > > > +mlx5_flow_create_update_rxqs(struct rte_eth_dev *dev, struct
> > > > > > +rte_flow
> > > > > > +*flow) {
> > > > > > +	struct priv *priv = dev->data->dev_private;
> > > > > > +	unsigned int i;
> > > > > > +
> > > > > > +	if (!dev->data->dev_started)
> > > > > > +		return;
> > > > > > +	for (i = 0; i != flow->rss_conf.queue_num; ++i) {
> > > > > > +		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)
> > > > > > +						 [(*flow->queues)[i]];
> > > > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > > > +			container_of(rxq_data, struct mlx5_rxq_ctrl, rxq);
> > > > > > +		uint8_t tunnel = PTYPE_IDX(flow->tunnel);
> > > > > > +
> > > > > > +		rxq_data->mark |= flow->mark;
> > > > > > +		if (!tunnel)
> > > > > > +			continue;
> > > > > > +		rxq_ctrl->tunnel_types[tunnel] += 1;
> > > > >
> > > > > I don't understand why you need such an array; the NIC is unable
> > > > > to return the tunnel type as it returns only one bit saying tunnel.
> > > > > Why not store the currently configured tunnel in the priv
> > > > > structure?
> > > >
> > > > This array is used to count the tunnel types bound to a queue: if
> > > > only one tunnel type is bound, ptype will report that tunnel type;
> > > > TUNNEL MASK (max value) will be returned if multiple types are bound
> > > > to a queue.
> > > >
> > > > The flow RSS action specifies the queues binding to the tunnel, thus
> > > > we can't assume all queues have the same tunnel types, so this is a
> > > > per-queue structure.
> > >
> > > There is something I am missing here: how, in the dataplane, can the
> > > PMD understand from 1 bit which kind of tunnel the packet is matching?
> >
> > The code below this line is the answer; let me post it here:
> > 		if (rxq_data->tunnel != flow->tunnel)
> > 			rxq_data->tunnel = rxq_data->tunnel ?
> > 					   RTE_PTYPE_TUNNEL_MASK :
> > 					   flow->tunnel;
> > If no tunnel type is associated to the rxq, use the tunnel type from
> > the flow.
> > If a different tunnel type comes from a flow, use RTE_PTYPE_TUNNEL_MASK.
> 
> From my understanding, when there are several tunnel offloads in the same
> queue, the mbuf ptype will contain RTE_PTYPE_TUNNEL_MASK:
> 
>    @@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
>            * bit[7] = outer_l3_type
>            */
>           idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
>   -       return mlx5_ptype_table[idx];
>   +       return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));
>    }
> 
> 
> Used by Rx burst functions,
> 
> /* Update packet information. */
>  pkt->packet_type = rxq_cq_to_pkt_type(cqe);
> 
> Is this correct?

You got the point.

> 
> There is another strange point here,
> 
>  +       [PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)] = RTE_PTYPE_TUNNEL_VXLAN |
>  +                                             RTE_PTYPE_L4_UDP,
> 
> According to RFC 7348 [1], having a VXLAN with an outer IPv6 is
> possible.  How do you handle it?

The answer was hidden in the code you pasted:

    @@ -1601,7 +1605,7 @@ rxq_cq_to_pkt_type(volatile struct mlx5_cqe *cqe)
            * bit[7] = outer_l3_type
            */
           idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
   -       return mlx5_ptype_table[idx];
   +       return mlx5_ptype_table[idx] | rxq->tunnel * !!(idx & (1 << 6));

As the comment says, bit 7 of the index is the outer L3 type from the CQE,
so the ptype retrieved from the mlx5_ptype_table lookup already accounts
for an outer IPv6 header.
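
A condensed illustration of how the index encodes it (the bit layout
follows the comment in the quoted hunk, while the CQE field values are
made up for the example):

    uint32_t pinfo = 0x3;    /* assumed: bit0 = tunneled, bit1 = outer L3 */
    uint32_t ptype = 0x0c00; /* made-up lower ptype bits from the CQE */
    unsigned int idx = ((pinfo & 0x3) << 6) | ((ptype & 0xfc00) >> 10);
    /* idx == 0xc3: bit[6] set = tunneled, bit[7] set = outer_l3_type,
     * so outer IPv4 and outer IPv6 select distinct table entries. */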

> 
> > > <snip/>
> > > > > > @@ -2334,9 +2414,9 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > > > struct mlx5_flows *list)  {
> > > > > >  	struct priv *priv = dev->data->dev_private;
> > > > > >  	struct rte_flow *flow;
> > > > > > +	unsigned int i;
> > > > > >
> > > > > >  	TAILQ_FOREACH_REVERSE(flow, list, mlx5_flows, next) {
> > > > > > -		unsigned int i;
> > > > > >  		struct mlx5_ind_table_ibv *ind_tbl = NULL;
> > > > > >
> > > > > >  		if (flow->drop) {
> > > > > > @@ -2382,6 +2462,16 @@ mlx5_flow_stop(struct rte_eth_dev *dev,
> > > > > > struct
> > > > > mlx5_flows *list)
> > > > > >  		DRV_LOG(DEBUG, "port %u flow %p removed", dev->data-
> > > >port_id,
> > > > > >  			(void *)flow);
> > > > > >  	}
> > > > > > +	/* Cleanup Rx queue tunnel info. */
> > > > > > +	for (i = 0; i != priv->rxqs_n; ++i) {
> > > > > > +		struct mlx5_rxq_data *q = (*priv->rxqs)[i];
> > > > > > +		struct mlx5_rxq_ctrl *rxq_ctrl =
> > > > > > +			container_of(q, struct mlx5_rxq_ctrl, rxq);
> > > > > > +
> > > > > > +		memset((void *)rxq_ctrl->tunnel_types, 0,
> > > > > > +		       sizeof(rxq_ctrl->tunnel_types));
> > > > > > +		q->tunnel = 0;
> > > > > > +	}
> > > > > >  }
> > > > >
> > > > > This hunk does not handle the fact that the Rx queue array may
> > > > > have some holes, i.e. the application is allowed to ask for 10
> > > > > queues and only initialise some.  In such a situation this code
> > > > > will segfault.
> > > >
> > > > In other words, "q" could be NULL, correct? I'll add a check for this.
> > >
> > > Correct.
> > >
> > > > BTW, there should be an action item to add such a check in rss/queue
> > > > flow creation.
> > >
> > > As it is the responsibility of the application/user to make rules
> > > according to what it has configured, it has not been added.  It can
> > > still be added, but it cannot be considered a fix.
> > >
> > > > > It should only memset the Rx queues making part of the flow not
> > > > > the
> > > others.
> > > >
> > > > Cleaning this (decreasing the tunnel_types counter of each queue)
> > > > for each flow would be time consuming.
> > >
> > > Considering flows already rely on syscalls to communicate with the
> > > kernel, the extra cycle consumption to clear only the queues making
> > > part of this flow is negligible.
> > >
> > > By the way, in the same function the mark is cleared only for the
> > > queues making part of the flow; the same loop can be used to clear the
> > > tunnel information at the same time.
> > >
> > > > If an error happens, the counter will not be cleared and such state
> > > > will impact the tunnel type after the port starts again.
> > >
> > > Apart from an implementation error, which other kind do you fear could
> > > happen?
> >
> > The rxq mark is simply reset to 0; this field is a counter and the final
> > target is to clear its value, so my code should be straightforward and
> > error free 😊
> >
> > From a quick look, this function could be much simpler than what it is
> > today:
> > 1. clean the Verbs flow and hrxq where possible, regardless of flow type.
> > 2. clean the rxq state: mark and tunnel_types.
> 
> Ok.
> 
> Thanks,
> 
> [1] https://dpdk.org/patch/37965
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
       [not found]   ` <20180411085529.ecxuku77hg3mkybl@laranjeiro-vm.dev.6wind.com>
@ 2018-04-14 12:25     ` Xueming(Steven) Li
  2018-04-16  7:14       ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-14 12:25 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Wednesday, April 11, 2018 4:55 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> 
> On Tue, Apr 10, 2018 at 09:34:07PM +0800, Xueming Li wrote:
> > Tunnel RSS level of the flow RSS action offers the user a choice to do
> > RSS hash calculation on inner or outer RSS fields. Testpmd flow command
> > examples:
> >
> > GRE flow inner RSS:
> >   flow create 0 ingress pattern eth / ipv4 proto is 47 / gre / end
> > actions rss queues 1 2 end level 1 / end
> >
> > GRE tunnel flow outer RSS:
> >   flow create 0 ingress pattern eth  / ipv4 proto is 47 / gre / end
> > actions rss queues 1 2 end level 0 / end
> >
> > Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > ---
> >  drivers/net/mlx5/Makefile    |   2 +-
> >  drivers/net/mlx5/mlx5_flow.c | 249
> > ++++++++++++++++++++++++++++++-------------
> >  drivers/net/mlx5/mlx5_glue.c |  16 +++
> >  drivers/net/mlx5/mlx5_glue.h |   8 ++
> >  drivers/net/mlx5/mlx5_rxq.c  |  46 +++++++-
> >  drivers/net/mlx5/mlx5_rxtx.h |   5 +-
> >  6 files changed, 246 insertions(+), 80 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
> > index ae118ad33..f9a6c460b 100644
> > --- a/drivers/net/mlx5/Makefile
> > +++ b/drivers/net/mlx5/Makefile
> > @@ -35,7 +35,7 @@ include $(RTE_SDK)/mk/rte.vars.mk  LIB =
> > librte_pmd_mlx5.a  LIB_GLUE = $(LIB_GLUE_BASE).$(LIB_GLUE_VERSION)
> >  LIB_GLUE_BASE = librte_pmd_mlx5_glue.so -LIB_GLUE_VERSION = 18.02.0
> > +LIB_GLUE_VERSION = 18.05.0
> >
> >  # Sources.
> >  SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c diff --git
> > a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index
> > 64658bc0e..66c7d7993 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -113,6 +113,7 @@ enum hash_rxq_type {
> >  	HASH_RXQ_UDPV6,
> >  	HASH_RXQ_IPV6,
> >  	HASH_RXQ_ETH,
> > +	HASH_RXQ_TUNNEL,
> >  };
> >
> >  /* Initialization data for hash RX queue. */ @@ -451,6 +452,7 @@
> > struct mlx5_flow_parse {
> >  	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queues indexes to use.
> */
> >  	uint8_t rss_key[40]; /**< copy of the RSS key. */
> >  	enum hash_rxq_type layer; /**< Last pattern layer detected. */
> > +	enum hash_rxq_type out_layer; /**< Last outer pattern layer
> > +detected. */
> >  	uint32_t tunnel; /**< Tunnel type of RTE_PTYPE_TUNNEL_XXX. */
> >  	struct ibv_counter_set *cs; /**< Holds the counter set for the rule
> */
> >  	struct {
> > @@ -458,6 +460,7 @@ struct mlx5_flow_parse {
> >  		/**< Pointer to Verbs attributes. */
> >  		unsigned int offset;
> >  		/**< Current position or total size of the attribute. */
> > +		uint64_t hash_fields; /**< Verbs hash fields. */
> >  	} queue[RTE_DIM(hash_rxq_init)];
> >  };
> >
> > @@ -698,7 +701,8 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >  						   " function is Toeplitz");
> >  				return -rte_errno;
> >  			}
> > -			if (rss->level) {
> > +#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > +			if (parser->rss_conf.level > 0) {
> 
> According to Adrien's API, level 0 means do whatever you want and 1 means
> outer.
> This is removing the outer RSS support.
> 
> >  				rte_flow_error_set(error, EINVAL,
> >  						   RTE_FLOW_ERROR_TYPE_ACTION,
> >  						   actions,
> > @@ -706,6 +710,15 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >  						   " level is not supported");
> >  				return -rte_errno;
> >  			}
> > +#endif
> > +			if (parser->rss_conf.level > 1) {
> > +				rte_flow_error_set(error, EINVAL,
> > +						   RTE_FLOW_ERROR_TYPE_ACTION,
> > +						   actions,
> > +						   "RSS encapsulation level"
> > +						   " > 1 is not supported");
> > +				return -rte_errno;
> > +			}
> 
> It seems the levels are wrongly used.

Thanks, updated.

> 
> >  			if (rss->types & MLX5_RSS_HF_MASK) {
> >  				rte_flow_error_set(error, EINVAL,
> >  						   RTE_FLOW_ERROR_TYPE_ACTION,
> > @@ -756,7 +769,7 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >  			}
> >  			parser->rss_conf = (struct rte_flow_action_rss){
> >  				.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
> > -				.level = 0,
> > +				.level = rss->level,
> >  				.types = rss->types,
> >  				.key_len = rss_key_len,
> >  				.queue_num = rss->queue_num,
> > @@ -842,11 +855,12 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
> >   *   0 on success, a negative errno value otherwise and rte_errno is
> set.
> >   */
> >  static int
> > -mlx5_flow_convert_items_validate(struct rte_eth_dev *dev
> > __rte_unused,
> > +mlx5_flow_convert_items_validate(struct rte_eth_dev *dev,
> >  				 const struct rte_flow_item items[],
> >  				 struct rte_flow_error *error,
> >  				 struct mlx5_flow_parse *parser)
> >  {
> > +	struct priv *priv = dev->data->dev_private;
> >  	const struct mlx5_flow_items *cur_item = mlx5_flow_items;
> >  	unsigned int i;
> >  	int ret = 0;
> > @@ -886,6 +900,14 @@ mlx5_flow_convert_items_validate(struct rte_eth_dev
> *dev __rte_unused,
> >  						   " tunnel encapsulations.");
> >  				return -rte_errno;
> >  			}
> > +			if (!priv->config.tunnel_en &&
> > +			    parser->rss_conf.level) {
> > +				rte_flow_error_set(error, ENOTSUP,
> > +					RTE_FLOW_ERROR_TYPE_ITEM,
> > +					items,
> > +					"Tunnel offloading not enabled");
> 
> I would suggest "RSS on tunnel is not supported".

Thanks, updated.

> 
> > +				return -rte_errno;
> > +			}
> >  			parser->inner = IBV_FLOW_SPEC_INNER;
> >  			parser->tunnel = flow_ptype[items->type];
> >  		}
> > @@ -993,7 +1015,11 @@ static void
> >  mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)  {
> >  	unsigned int i;
> > +	uint32_t inner = parser->inner;
> >
> > +	/* Don't create extra flows for outer RSS. */
> > +	if (parser->tunnel && !parser->rss_conf.level)
> > +		return;
> >  	/* Remove any other flow not matching the pattern. */
> >  	if (parser->rss_conf.queue_num == 1 && !parser->rss_conf.types) {
> >  		for (i = 0; i != hash_rxq_init_n; ++i) { @@ -1014,23 +1040,25
> @@
> > mlx5_flow_convert_finalise(struct mlx5_flow_parse *parser)
> >  			struct ibv_flow_spec_ipv4_ext ipv4;
> >  			struct ibv_flow_spec_ipv6 ipv6;
> >  			struct ibv_flow_spec_tcp_udp udp_tcp;
> > +			struct ibv_flow_spec_eth eth;
> >  		} specs;
> >  		void *dst;
> >  		uint16_t size;
> >
> >  		if (i == parser->layer)
> >  			continue;
> > -		if (parser->layer == HASH_RXQ_ETH) {
> > +		if (parser->layer == HASH_RXQ_ETH ||
> > +		    parser->layer == HASH_RXQ_TUNNEL) {
> >  			if (hash_rxq_init[i].ip_version == MLX5_IPV4) {
> >  				size = sizeof(struct ibv_flow_spec_ipv4_ext);
> >  				specs.ipv4 = (struct ibv_flow_spec_ipv4_ext){
> > -					.type = IBV_FLOW_SPEC_IPV4_EXT,
> > +					.type = inner | IBV_FLOW_SPEC_IPV4_EXT,
> >  					.size = size,
> >  				};
> >  			} else {
> >  				size = sizeof(struct ibv_flow_spec_ipv6);
> >  				specs.ipv6 = (struct ibv_flow_spec_ipv6){
> > -					.type = IBV_FLOW_SPEC_IPV6,
> > +					.type = inner | IBV_FLOW_SPEC_IPV6,
> >  					.size = size,
> >  				};
> >  			}
> > @@ -1047,7 +1075,7 @@ mlx5_flow_convert_finalise(struct mlx5_flow_parse
> *parser)
> >  		    (i == HASH_RXQ_UDPV6) || (i == HASH_RXQ_TCPV6)) {
> >  			size = sizeof(struct ibv_flow_spec_tcp_udp);
> >  			specs.udp_tcp = (struct ibv_flow_spec_tcp_udp) {
> > -				.type = ((i == HASH_RXQ_UDPV4 ||
> > +				.type = inner | ((i == HASH_RXQ_UDPV4 ||
> >  					  i == HASH_RXQ_UDPV6) ?
> >  					 IBV_FLOW_SPEC_UDP :
> >  					 IBV_FLOW_SPEC_TCP),
> > @@ -1068,6 +1096,8 @@ mlx5_flow_convert_finalise(struct
> > mlx5_flow_parse *parser)
> >  /**
> >   * Update flows according to pattern and RSS hash fields.
> >   *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> >   * @param[in, out] parser
> >   *   Internal parser structure.
> >   *
> > @@ -1075,20 +1105,63 @@ mlx5_flow_convert_finalise(struct
> mlx5_flow_parse *parser)
> >   *   0 on success, a negative errno value otherwise and rte_errno is
> set.
> >   */
> >  static int
> > -mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
> > +mlx5_flow_convert_rss(struct rte_eth_dev *dev, struct mlx5_flow_parse
> > +*parser)
> >  {
> > -	const unsigned int ipv4 =
> > +	unsigned int ipv4 =
> >  		hash_rxq_init[parser->layer].ip_version == MLX5_IPV4;
> >  	const enum hash_rxq_type hmin = ipv4 ? HASH_RXQ_TCPV4 :
> HASH_RXQ_TCPV6;
> >  	const enum hash_rxq_type hmax = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> >  	const enum hash_rxq_type ohmin = ipv4 ? HASH_RXQ_TCPV6 :
> HASH_RXQ_TCPV4;
> >  	const enum hash_rxq_type ohmax = ipv4 ? HASH_RXQ_IPV6 :
> HASH_RXQ_IPV4;
> > -	const enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> > +	enum hash_rxq_type ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> >  	unsigned int i;
> > +	int found = 0;
> >
> > -	if (parser->layer == HASH_RXQ_ETH)
> > +	/*
> > +	 * Outer RSS.
> > +	 * HASH_RXQ_ETH is the only rule since tunnel packet match this
> > +	 * rule must match outer pattern.
> > +	 */
> > +	if (parser->tunnel && !parser->rss_conf.level) {
> > +		/* Remove flows other than default. */
> > +		for (i = 0; i != hash_rxq_init_n - 1; ++i) {
> > +			rte_free(parser->queue[i].ibv_attr);
> > +			parser->queue[i].ibv_attr = NULL;
> > +		}
> > +		ipv4 = hash_rxq_init[parser->out_layer].ip_version ==
> MLX5_IPV4;
> > +		ip = ipv4 ? HASH_RXQ_IPV4 : HASH_RXQ_IPV6;
> > +		if (hash_rxq_init[parser->out_layer].dpdk_rss_hf &
> > +		    parser->rss_conf.types) {
> > +			parser->queue[HASH_RXQ_ETH].hash_fields =
> > +				hash_rxq_init[parser->out_layer].hash_fields;
> > +		} else if (ip && (hash_rxq_init[ip].dpdk_rss_hf &
> > +		    parser->rss_conf.types)) {
> > +			parser->queue[HASH_RXQ_ETH].hash_fields =
> > +				hash_rxq_init[ip].hash_fields;
> > +		} else if (parser->rss_conf.types) {
> > +			DRV_LOG(WARNING,
> > +				"port %u rss outer hash function doesn't match"
> > +				" pattern", dev->data->port_id);
> 
> Hash function? What do you mean?  It seems to be the layers on which the
> RSS is configured that do not match the pattern.
> 
> Honestly, if I see such a warning happening I will fully doubt that the
> rule has been taken and applied.
> "port 0 rss outer hash function doesn't match pattern" --> what will
> happen to the packets matching such a flow?  Will they be dropped?
> This is not helping at all, so please remove it.
> 
> > +		}
> > +		return 0;
> > +	}
> > +	if (parser->layer == HASH_RXQ_ETH || parser->layer ==
> HASH_RXQ_TUNNEL) {
> > +		/* Remove unused flows according to hash function. */
> > +		for (i = 0; i != hash_rxq_init_n - 1; ++i) {
> > +			if (!parser->queue[i].ibv_attr)
> > +				continue;
> > +			if (hash_rxq_init[i].dpdk_rss_hf &
> > +			    parser->rss_conf.types) {
> > +				parser->queue[i].hash_fields =
> > +					hash_rxq_init[i].hash_fields;
> > +				continue;
> > +			}
> > +			rte_free(parser->queue[i].ibv_attr);
> > +			parser->queue[i].ibv_attr = NULL;
> > +		}
> >  		return 0;
> > -	/* This layer becomes useless as the pattern define under layers. */
> > +	}
> > +	/* Remove ETH layer flow. */
> >  	rte_free(parser->queue[HASH_RXQ_ETH].ibv_attr);
> >  	parser->queue[HASH_RXQ_ETH].ibv_attr = NULL;
> >  	/* Remove opposite kind of layer e.g. IPv6 if the pattern is IPv4.
> > */ @@ -1098,9 +1171,52 @@ mlx5_flow_convert_rss(struct mlx5_flow_parse
> *parser)
> >  		rte_free(parser->queue[i].ibv_attr);
> >  		parser->queue[i].ibv_attr = NULL;
> >  	}
> > -	/* Remove impossible flow according to the RSS configuration. */
> > -	if (hash_rxq_init[parser->layer].dpdk_rss_hf &
> > -	    parser->rss_conf.types) {
> > +	/*
> > +	 * Keep L4 flows as IP pattern has to support L4 RSS.
> > +	 * Otherwise, only keep the flow that match the pattern.
> > +	 */
> 
> This comment is not clear, please re-word it.
> 
> > +	if (parser->layer != ip) {
> > +		/* Only keep the flow that match the pattern. */
> > +		for (i = hmin; i != (hmax + 1); ++i) {
> > +			if (i == parser->layer)
> > +				continue;
> > +			rte_free(parser->queue[i].ibv_attr);
> > +			parser->queue[i].ibv_attr = NULL;
> > +		}
> > +	}
> > +	if (parser->rss_conf.types) {
> > +		/* Remove impossible flow according to the RSS configuration.
> */
> > +		for (i = hmin; i != (hmax + 1); ++i) {
> > +			if (!parser->queue[i].ibv_attr)
> > +				continue;
> > +			if (parser->rss_conf.types &
> > +			    hash_rxq_init[i].dpdk_rss_hf) {
> > +				parser->queue[i].hash_fields =
> > +					hash_rxq_init[i].hash_fields;
> > +				found = 1;
> > +				continue;
> > +			}
> > +			/* L4 flow could be used for L3 RSS. */
> > +			if (i == parser->layer && i < ip &&
> > +			    (hash_rxq_init[ip].dpdk_rss_hf &
> > +			     parser->rss_conf.types)) {
> > +				parser->queue[i].hash_fields =
> > +					hash_rxq_init[ip].hash_fields;
> > +				found = 1;
> > +				continue;
> > +			}
> > +			/* L3 flow and L4 hash: non-rss L3 flow. */
> > +			if (i == parser->layer && i == ip && found)
> > +				/* IP pattern and L4 HF. */
> > +				continue;
> > +			rte_free(parser->queue[i].ibv_attr);
> > +			parser->queue[i].ibv_attr = NULL;
> > +		}
> > +		if (!found)
> > +			DRV_LOG(WARNING,
> > +				"port %u rss hash function doesn't match "
> > +				"pattern", dev->data->port_id);
> 
> Dito.
> 
> > +	} else {
> >  		/* Remove any other flow. */
> >  		for (i = hmin; i != (hmax + 1); ++i) {
> >  			if (i == parser->layer || !parser->queue[i].ibv_attr) @@
> -1108,8
> > +1224,6 @@ mlx5_flow_convert_rss(struct mlx5_flow_parse *parser)
> >  			rte_free(parser->queue[i].ibv_attr);
> >  			parser->queue[i].ibv_attr = NULL;
> >  		}
> > -	} else if (!parser->queue[ip].ibv_attr) {
> > -		/* no RSS possible with the current configuration. */
> >  		parser->rss_conf.queue_num = 1;
> >  	}
> >  	return 0;
> > @@ -1179,10 +1293,6 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  		for (i = 0; i != hash_rxq_init_n; ++i) {
> >  			unsigned int offset;
> >
> > -			if (!(parser->rss_conf.types &
> > -			      hash_rxq_init[i].dpdk_rss_hf) &&
> > -			    (i != HASH_RXQ_ETH))
> > -				continue;
> >  			offset = parser->queue[i].offset;
> >  			parser->queue[i].ibv_attr =
> >  				mlx5_flow_convert_allocate(offset, error); @@ -
> 1194,6 +1304,7 @@
> > mlx5_flow_convert(struct rte_eth_dev *dev,
> >  	/* Third step. Conversion parse, fill the specifications. */
> >  	parser->inner = 0;
> >  	parser->tunnel = 0;
> > +	parser->layer = HASH_RXQ_ETH;
> >  	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
> >  		struct mlx5_flow_data data = {
> >  			.parser = parser,
> > @@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> >  		if (ret)
> >  			goto exit_free;
> >  	}
> > -	if (parser->mark)
> > -		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > -	if (parser->count && parser->create) {
> > -		mlx5_flow_create_count(dev, parser);
> > -		if (!parser->cs)
> > -			goto exit_count_error;
> > -	}
> >  	/*
> >  	 * Last step. Complete missing specification to reach the RSS
> >  	 * configuration.
> >  	 */
> >  	if (!parser->drop)
> > -		ret = mlx5_flow_convert_rss(parser);
> > +		ret = mlx5_flow_convert_rss(dev, parser);
> >  		if (ret)
> >  			goto exit_free;
> >  		mlx5_flow_convert_finalise(parser);
> >  	mlx5_flow_update_priority(dev, parser, attr);
> > +	if (parser->mark)
> > +		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > +	if (parser->count && parser->create) {
> > +		mlx5_flow_create_count(dev, parser);
> > +		if (!parser->cs)
> > +			goto exit_count_error;
> > +	}
> 
> Why do you need to move this code?

To avoid leaking the counter resource if anything goes wrong in
mlx5_flow_convert_rss().

> 
> >  exit_free:
> >  	/* Only verification is expected, all resources should be released.
> */
> >  	if (!parser->create) {
> > @@ -1275,17 +1386,11 @@ mlx5_flow_create_copy(struct mlx5_flow_parse
> *parser, void *src,
> >  	for (i = 0; i != hash_rxq_init_n; ++i) {
> >  		if (!parser->queue[i].ibv_attr)
> >  			continue;
> > -		/* Specification must be the same l3 type or none. */
> > -		if (parser->layer == HASH_RXQ_ETH ||
> > -		    (hash_rxq_init[parser->layer].ip_version ==
> > -		     hash_rxq_init[i].ip_version) ||
> > -		    (hash_rxq_init[i].ip_version == 0)) {
> > -			dst = (void *)((uintptr_t)parser->queue[i].ibv_attr +
> > -					parser->queue[i].offset);
> > -			memcpy(dst, src, size);
> > -			++parser->queue[i].ibv_attr->num_of_specs;
> > -			parser->queue[i].offset += size;
> > -		}
> > +		dst = (void *)((uintptr_t)parser->queue[i].ibv_attr +
> > +				parser->queue[i].offset);
> > +		memcpy(dst, src, size);
> > +		++parser->queue[i].ibv_attr->num_of_specs;
> > +		parser->queue[i].offset += size;
> >  	}
> >  }
> >
> > @@ -1316,9 +1421,7 @@ mlx5_flow_create_eth(const struct rte_flow_item
> *item,
> >  		.size = eth_size,
> >  	};
> >
> > -	/* Don't update layer for the inner pattern. */
> > -	if (!parser->inner)
> > -		parser->layer = HASH_RXQ_ETH;
> > +	parser->layer = HASH_RXQ_ETH;
> >  	if (spec) {
> >  		unsigned int i;
> >
> > @@ -1431,9 +1534,7 @@ mlx5_flow_create_ipv4(const struct rte_flow_item
> *item,
> >  		.size = ipv4_size,
> >  	};
> >
> > -	/* Don't update layer for the inner pattern. */
> > -	if (!parser->inner)
> > -		parser->layer = HASH_RXQ_IPV4;
> > +	parser->layer = HASH_RXQ_IPV4;
> >  	if (spec) {
> >  		if (!mask)
> >  			mask = default_mask;
> > @@ -1486,9 +1587,7 @@ mlx5_flow_create_ipv6(const struct rte_flow_item
> *item,
> >  		.size = ipv6_size,
> >  	};
> >
> > -	/* Don't update layer for the inner pattern. */
> > -	if (!parser->inner)
> > -		parser->layer = HASH_RXQ_IPV6;
> > +	parser->layer = HASH_RXQ_IPV6;
> >  	if (spec) {
> >  		unsigned int i;
> >  		uint32_t vtc_flow_val;
> > @@ -1561,13 +1660,10 @@ mlx5_flow_create_udp(const struct rte_flow_item
> *item,
> >  		.size = udp_size,
> >  	};
> >
> > -	/* Don't update layer for the inner pattern. */
> > -	if (!parser->inner) {
> > -		if (parser->layer == HASH_RXQ_IPV4)
> > -			parser->layer = HASH_RXQ_UDPV4;
> > -		else
> > -			parser->layer = HASH_RXQ_UDPV6;
> > -	}
> > +	if (parser->layer == HASH_RXQ_IPV4)
> > +		parser->layer = HASH_RXQ_UDPV4;
> > +	else
> > +		parser->layer = HASH_RXQ_UDPV6;
> >  	if (spec) {
> >  		if (!mask)
> >  			mask = default_mask;
> > @@ -1610,13 +1706,10 @@ mlx5_flow_create_tcp(const struct rte_flow_item
> *item,
> >  		.size = tcp_size,
> >  	};
> >
> > -	/* Don't update layer for the inner pattern. */
> > -	if (!parser->inner) {
> > -		if (parser->layer == HASH_RXQ_IPV4)
> > -			parser->layer = HASH_RXQ_TCPV4;
> > -		else
> > -			parser->layer = HASH_RXQ_TCPV6;
> > -	}
> > +	if (parser->layer == HASH_RXQ_IPV4)
> > +		parser->layer = HASH_RXQ_TCPV4;
> > +	else
> > +		parser->layer = HASH_RXQ_TCPV6;
> >  	if (spec) {
> >  		if (!mask)
> >  			mask = default_mask;
> > @@ -1666,6 +1759,8 @@ mlx5_flow_create_vxlan(const struct rte_flow_item
> *item,
> >  	id.vni[0] = 0;
> >  	parser->inner = IBV_FLOW_SPEC_INNER;
> >  	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_VXLAN)];
> > +	parser->out_layer = parser->layer;
> > +	parser->layer = HASH_RXQ_TUNNEL;
> >  	if (spec) {
> >  		if (!mask)
> >  			mask = default_mask;
> > @@ -1720,6 +1815,8 @@ mlx5_flow_create_gre(const struct rte_flow_item
> > *item __rte_unused,
> >
> >  	parser->inner = IBV_FLOW_SPEC_INNER;
> >  	parser->tunnel = ptype_ext[PTYPE_IDX(RTE_PTYPE_TUNNEL_GRE)];
> > +	parser->out_layer = parser->layer;
> > +	parser->layer = HASH_RXQ_TUNNEL;
> >  	mlx5_flow_create_copy(parser, &tunnel, size);
> >  	return 0;
> >  }
> > @@ -1883,33 +1980,33 @@ mlx5_flow_create_action_queue_rss(struct
> rte_eth_dev *dev,
> >  	unsigned int i;
> >
> >  	for (i = 0; i != hash_rxq_init_n; ++i) {
> > -		uint64_t hash_fields;
> > -
> >  		if (!parser->queue[i].ibv_attr)
> >  			continue;
> >  		flow->frxq[i].ibv_attr = parser->queue[i].ibv_attr;
> >  		parser->queue[i].ibv_attr = NULL;
> > -		hash_fields = hash_rxq_init[i].hash_fields;
> > +		flow->frxq[i].hash_fields = parser->queue[i].hash_fields;
> >  		if (!priv->dev->data->dev_started)
> >  			continue;
> >  		flow->frxq[i].hrxq =
> >  			mlx5_hrxq_get(dev,
> >  				      parser->rss_conf.key,
> >  				      parser->rss_conf.key_len,
> > -				      hash_fields,
> > +				      flow->frxq[i].hash_fields,
> >  				      parser->rss_conf.queue,
> >  				      parser->rss_conf.queue_num,
> > -				      parser->tunnel);
> > +				      parser->tunnel,
> > +				      parser->rss_conf.level);
> >  		if (flow->frxq[i].hrxq)
> >  			continue;
> >  		flow->frxq[i].hrxq =
> >  			mlx5_hrxq_new(dev,
> >  				      parser->rss_conf.key,
> >  				      parser->rss_conf.key_len,
> > -				      hash_fields,
> > +				      flow->frxq[i].hash_fields,
> >  				      parser->rss_conf.queue,
> >  				      parser->rss_conf.queue_num,
> > -				      parser->tunnel);
> > +				      parser->tunnel,
> > +				      parser->rss_conf.level);
> >  		if (!flow->frxq[i].hrxq) {
> >  			return rte_flow_error_set(error, ENOMEM,
> >  						  RTE_FLOW_ERROR_TYPE_HANDLE,
> > @@ -2006,7 +2103,7 @@ mlx5_flow_create_action_queue(struct rte_eth_dev
> *dev,
> >  		DRV_LOG(DEBUG, "port %u %p type %d QP %p ibv_flow %p",
> >  			dev->data->port_id,
> >  			(void *)flow, i,
> > -			(void *)flow->frxq[i].hrxq,
> > +			(void *)flow->frxq[i].hrxq->qp,
> >  			(void *)flow->frxq[i].ibv_flow);
> >  	}
> >  	if (!flows_n) {
> > @@ -2532,19 +2629,21 @@ mlx5_flow_start(struct rte_eth_dev *dev, struct
> mlx5_flows *list)
> >  			flow->frxq[i].hrxq =
> >  				mlx5_hrxq_get(dev, flow->rss_conf.key,
> >  					      flow->rss_conf.key_len,
> > -					      hash_rxq_init[i].hash_fields,
> > +					      flow->frxq[i].hash_fields,
> >  					      flow->rss_conf.queue,
> >  					      flow->rss_conf.queue_num,
> > -					      flow->tunnel);
> > +					      flow->tunnel,
> > +					      flow->rss_conf.level);
> >  			if (flow->frxq[i].hrxq)
> >  				goto flow_create;
> >  			flow->frxq[i].hrxq =
> >  				mlx5_hrxq_new(dev, flow->rss_conf.key,
> >  					      flow->rss_conf.key_len,
> > -					      hash_rxq_init[i].hash_fields,
> > +					      flow->frxq[i].hash_fields,
> >  					      flow->rss_conf.queue,
> >  					      flow->rss_conf.queue_num,
> > -					      flow->tunnel);
> > +					      flow->tunnel,
> > +					      flow->rss_conf.level);
> >  			if (!flow->frxq[i].hrxq) {
> >  				DRV_LOG(DEBUG,
> >  					"port %u flow %p cannot be applied", diff --
> git
> > a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c index
> > be684d378..6874aa32a 100644
> > --- a/drivers/net/mlx5/mlx5_glue.c
> > +++ b/drivers/net/mlx5/mlx5_glue.c
> > @@ -313,6 +313,21 @@ mlx5_glue_dv_init_obj(struct mlx5dv_obj *obj,
> uint64_t obj_type)
> >  	return mlx5dv_init_obj(obj, obj_type);  }
> >
> > +static struct ibv_qp *
> > +mlx5_glue_dv_create_qp(struct ibv_context *context,
> > +		       struct ibv_qp_init_attr_ex *qp_init_attr_ex,
> > +		       struct mlx5dv_qp_init_attr *dv_qp_init_attr) { #ifdef
> > +HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > +	return mlx5dv_create_qp(context, qp_init_attr_ex, dv_qp_init_attr);
> > +#else
> > +	(void)context;
> > +	(void)qp_init_attr_ex;
> > +	(void)dv_qp_init_attr;
> > +	return NULL;
> > +#endif
> > +}
> > +
> >  const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
> >  	.version = MLX5_GLUE_VERSION,
> >  	.fork_init = mlx5_glue_fork_init,
> > @@ -356,4 +371,5 @@ const struct mlx5_glue *mlx5_glue = &(const struct
> mlx5_glue){
> >  	.dv_query_device = mlx5_glue_dv_query_device,
> >  	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
> >  	.dv_init_obj = mlx5_glue_dv_init_obj,
> > +	.dv_create_qp = mlx5_glue_dv_create_qp,
> >  };
> > diff --git a/drivers/net/mlx5/mlx5_glue.h
> > b/drivers/net/mlx5/mlx5_glue.h index b5efee3b6..841363872 100644
> > --- a/drivers/net/mlx5/mlx5_glue.h
> > +++ b/drivers/net/mlx5/mlx5_glue.h
> > @@ -31,6 +31,10 @@ struct ibv_counter_set_init_attr;  struct
> > ibv_query_counter_set_attr;  #endif
> >
> > +#ifndef HAVE_IBV_DEVICE_TUNNEL_SUPPORT struct mlx5dv_qp_init_attr;
> > +#endif
> > +
> >  /* LIB_GLUE_VERSION must be updated every time this structure is
> > modified. */  struct mlx5_glue {
> >  	const char *version;
> > @@ -106,6 +110,10 @@ struct mlx5_glue {
> >  				   enum mlx5dv_set_ctx_attr_type type,
> >  				   void *attr);
> >  	int (*dv_init_obj)(struct mlx5dv_obj *obj, uint64_t obj_type);
> > +	struct ibv_qp *(*dv_create_qp)
> > +		(struct ibv_context *context,
> > +		 struct ibv_qp_init_attr_ex *qp_init_attr_ex,
> > +		 struct mlx5dv_qp_init_attr *dv_qp_init_attr);
> >  };
> >
> >  const struct mlx5_glue *mlx5_glue;
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 073732e16..6e5565fb2 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> >   *   Number of queues.
> >   * @param tunnel
> >   *   Tunnel type.
> > + * @param rss_level
> > + *   RSS hash on tunnel level.
> >   *
> >   * @return
> >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > rte_eth_dev *dev,
> >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> >  	      uint64_t hash_fields,
> > -	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > +	      const uint16_t *queues, uint32_t queues_n,
> > +	      uint32_t tunnel, uint32_t rss_level)
> 
> tunnel and rss_level seem to be redundant here.
> 
> rss_level > 1 is equivalent to tunnel; there is no need to have both.

There is the case of a tunnel with outer RSS (level 1).

> 
> >  {
> >  	struct priv *priv = dev->data->dev_private;
> >  	struct mlx5_hrxq *hrxq;
> >  	struct mlx5_ind_table_ibv *ind_tbl;
> >  	struct ibv_qp *qp;
> >  	int err;
> > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > +	struct mlx5dv_qp_init_attr qp_init_attr = {0}; #endif
> >
> >  	queues_n = hash_fields ? queues_n : 1;
> >  	ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n); @@ -1410,6
> > +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  		rte_errno = ENOMEM;
> >  		return NULL;
> >  	}
> > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > +	if (tunnel) {
> > +		qp_init_attr.comp_mask =
> > +				MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
> > +		qp_init_attr.create_flags = MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
> > +	}
> > +	qp = mlx5_glue->dv_create_qp(
> > +		priv->ctx,
> > +		&(struct ibv_qp_init_attr_ex){
> > +			.qp_type = IBV_QPT_RAW_PACKET,
> > +			.comp_mask =
> > +				IBV_QP_INIT_ATTR_PD |
> > +				IBV_QP_INIT_ATTR_IND_TABLE |
> > +				IBV_QP_INIT_ATTR_RX_HASH,
> > +			.rx_hash_conf = (struct ibv_rx_hash_conf){
> > +				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
> > +				.rx_hash_key_len = rss_key_len,
> > +				.rx_hash_key = (void *)(uintptr_t)rss_key,
> > +				.rx_hash_fields_mask = hash_fields |
> > +					(tunnel && rss_level ?
> > +					(uint32_t)IBV_RX_HASH_INNER : 0),
> > +			},
> > +			.rwq_ind_tbl = ind_tbl->ind_table,
> > +			.pd = priv->pd,
> > +		},
> > +		&qp_init_attr);
> > +#else
> >  	qp = mlx5_glue->create_qp_ex
> >  		(priv->ctx,
> >  		 &(struct ibv_qp_init_attr_ex){
> > @@ -1427,6 +1460,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  			.rwq_ind_tbl = ind_tbl->ind_table,
> >  			.pd = priv->pd,
> >  		 });
> > +#endif
> >  	if (!qp) {
> >  		rte_errno = errno;
> >  		goto error;
> > @@ -1439,6 +1473,7 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  	hrxq->rss_key_len = rss_key_len;
> >  	hrxq->hash_fields = hash_fields;
> >  	hrxq->tunnel = tunnel;
> > +	hrxq->rss_level = rss_level;
> >  	memcpy(hrxq->rss_key, rss_key, rss_key_len);
> >  	rte_atomic32_inc(&hrxq->refcnt);
> >  	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next); @@ -1448,6 +1483,8 @@
> > mlx5_hrxq_new(struct rte_eth_dev *dev,
> >  	return hrxq;
> >  error:
> >  	err = rte_errno; /* Save rte_errno before cleanup. */
> > +	DRV_LOG(ERR, "port %u: Error creating Hash Rx queue",
> > +		dev->data->port_id);
> 
> Developer log, please remove it; for the user, the flow won't be created
> and the correct error will be reported.

Removed, there is already a log on the caller side.

> 
> >  	mlx5_ind_table_ibv_release(dev, ind_tbl);
> >  	if (qp)
> >  		claim_zero(mlx5_glue->destroy_qp(qp));
> > @@ -1469,6 +1506,8 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> >   *   Number of queues.
> >   * @param tunnel
> >   *   Tunnel type.
> > + * @param rss_level
> > + *   RSS hash on tunnel level
> >   *
> >   * @return
> >   *   An hash Rx queue on success.
> > @@ -1477,7 +1516,8 @@ struct mlx5_hrxq *  mlx5_hrxq_get(struct
> > rte_eth_dev *dev,
> >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> >  	      uint64_t hash_fields,
> > -	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > +	      const uint16_t *queues, uint32_t queues_n,
> > +	      uint32_t tunnel, uint32_t rss_level)
> 
> Dito.
> 
> >  {
> >  	struct priv *priv = dev->data->dev_private;
> >  	struct mlx5_hrxq *hrxq;
> > @@ -1494,6 +1534,8 @@ mlx5_hrxq_get(struct rte_eth_dev *dev,
> >  			continue;
> >  		if (hrxq->tunnel != tunnel)
> >  			continue;
> > +		if (hrxq->rss_level != rss_level)
> > +			continue;
> >  		ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n);
> >  		if (!ind_tbl)
> >  			continue;
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.h
> > b/drivers/net/mlx5/mlx5_rxtx.h index d35605b55..62cf55109 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx.h
> > @@ -147,6 +147,7 @@ struct mlx5_hrxq {
> >  	struct ibv_qp *qp; /* Verbs queue pair. */
> >  	uint64_t hash_fields; /* Verbs Hash fields. */
> >  	uint32_t tunnel; /* Tunnel type. */
> > +	uint32_t rss_level; /* RSS on tunnel level. */
> >  	uint32_t rss_key_len; /* Hash key length in bytes. */
> >  	uint8_t rss_key[]; /* Hash key. */
> >  };
> > @@ -251,12 +252,12 @@ struct mlx5_hrxq *mlx5_hrxq_new(struct rte_eth_dev
> *dev,
> >  				const uint8_t *rss_key, uint32_t rss_key_len,
> >  				uint64_t hash_fields,
> >  				const uint16_t *queues, uint32_t queues_n,
> > -				uint32_t tunnel);
> > +				uint32_t tunnel, uint32_t rss_level);
> >  struct mlx5_hrxq *mlx5_hrxq_get(struct rte_eth_dev *dev,
> >  				const uint8_t *rss_key, uint32_t rss_key_len,
> >  				uint64_t hash_fields,
> >  				const uint16_t *queues, uint32_t queues_n,
> > -				uint32_t tunnel);
> > +				uint32_t tunnel, uint32_t rss_level);
> >  int mlx5_hrxq_release(struct rte_eth_dev *dev, struct mlx5_hrxq
> > *hxrq);  int mlx5_hrxq_ibv_verify(struct rte_eth_dev *dev);  uint64_t
> > mlx5_get_rx_port_offloads(void);
> > --
> > 2.13.3
> >
> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-14 12:25     ` Xueming(Steven) Li
@ 2018-04-16  7:14       ` Nélio Laranjeiro
  2018-04-16  7:46         ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-16  7:14 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Sat, Apr 14, 2018 at 12:25:12PM +0000, Xueming(Steven) Li wrote:
>[...]
> > > @@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > >  		if (ret)
> > >  			goto exit_free;
> > >  	}
> > > -	if (parser->mark)
> > > -		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > -	if (parser->count && parser->create) {
> > > -		mlx5_flow_create_count(dev, parser);
> > > -		if (!parser->cs)
> > > -			goto exit_count_error;
> > > -	}
> > >  	/*
> > >  	 * Last step. Complete missing specification to reach the RSS
> > >  	 * configuration.
> > >  	 */
> > >  	if (!parser->drop)
> > > -		ret = mlx5_flow_convert_rss(parser);
> > > +		ret = mlx5_flow_convert_rss(dev, parser);
> > >  		if (ret)
> > >  			goto exit_free;
> > >  		mlx5_flow_convert_finalise(parser);
> > >  	mlx5_flow_update_priority(dev, parser, attr);
> > > +	if (parser->mark)
> > > +		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > +	if (parser->count && parser->create) {
> > > +		mlx5_flow_create_count(dev, parser);
> > > +		if (!parser->cs)
> > > +			goto exit_count_error;
> > > +	}
> > 
> > Why do you need to move this code?
> 
> To avoid leaking the counter resource if anything goes wrong in
> mlx5_flow_convert_rss().

Why is this modification addressed in this patch? Shouldn't it be in the
patch introducing mlx5_flow_convert_rss()?

>[...]
> > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> > >   *   Number of queues.
> > >   * @param tunnel
> > >   *   Tunnel type.
> > > + * @param rss_level
> > > + *   RSS hash on tunnel level.
> > >   *
> > >   * @return
> > >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > rte_eth_dev *dev,
> > >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> > >  	      uint64_t hash_fields,
> > > -	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > > +	      const uint16_t *queues, uint32_t queues_n,
> > > +	      uint32_t tunnel, uint32_t rss_level)
> > 
> > tunnel and rss_level seem to be redundant here.
> > 
> > rss_level > 1 is equivalent to tunnel; there is no need to have both.
> 
> There is the case of a tunnel with outer RSS (level 1).

Why can't it be handled by a regular hash Rx queue, i.e. what is the
benefit of creating a tunnel hash Rx queue to do the same job as a
legacy one?

See below,

> > >  {
> > >  	struct priv *priv = dev->data->dev_private;
> > >  	struct mlx5_hrxq *hrxq;
> > >  	struct mlx5_ind_table_ibv *ind_tbl;
> > >  	struct ibv_qp *qp;
> > >  	int err;
> > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > +	struct mlx5dv_qp_init_attr qp_init_attr = {0}; #endif
> > >
> > >  	queues_n = hash_fields ? queues_n : 1;
> > >  	ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n); @@ -1410,6
> > > +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> > >  		rte_errno = ENOMEM;
> > >  		return NULL;
> > >  	}
> > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > +	if (tunnel) {

Why not: if (rss_level > 1) ?

> > > +		qp_init_attr.comp_mask =
> > > +				MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
> > > +		qp_init_attr.create_flags = MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
> > > +	}
> > > +	qp = mlx5_glue->dv_create_qp(
> > > +		priv->ctx,
> > > +		&(struct ibv_qp_init_attr_ex){
> > > +			.qp_type = IBV_QPT_RAW_PACKET,
> > > +			.comp_mask =
> > > +				IBV_QP_INIT_ATTR_PD |
> > > +				IBV_QP_INIT_ATTR_IND_TABLE |
> > > +				IBV_QP_INIT_ATTR_RX_HASH,
> > > +			.rx_hash_conf = (struct ibv_rx_hash_conf){
> > > +				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
> > > +				.rx_hash_key_len = rss_key_len,
> > > +				.rx_hash_key = (void *)(uintptr_t)rss_key,
> > > +				.rx_hash_fields_mask = hash_fields |
> > > +					(tunnel && rss_level ?
> > > +					(uint32_t)IBV_RX_HASH_INNER : 0),
>[...]

 .rx_hash_fields_mask = hash_fields |
 ((rss_level > 1) ?
 (uint32_t)IBV_RX_HASH_INNER : 0),

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-16  7:14       ` Nélio Laranjeiro
@ 2018-04-16  7:46         ` Xueming(Steven) Li
  2018-04-16  8:09           ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-16  7:46 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Monday, April 16, 2018 3:14 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> 
> On Sat, Apr 14, 2018 at 12:25:12PM +0000, Xueming(Steven) Li wrote:
> >[...]
> > > > @@ -1211,23 +1322,23 @@ mlx5_flow_convert(struct rte_eth_dev *dev,
> > > >  		if (ret)
> > > >  			goto exit_free;
> > > >  	}
> > > > -	if (parser->mark)
> > > > -		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > > -	if (parser->count && parser->create) {
> > > > -		mlx5_flow_create_count(dev, parser);
> > > > -		if (!parser->cs)
> > > > -			goto exit_count_error;
> > > > -	}
> > > >  	/*
> > > >  	 * Last step. Complete missing specification to reach the RSS
> > > >  	 * configuration.
> > > >  	 */
> > > >  	if (!parser->drop)
> > > > -		ret = mlx5_flow_convert_rss(parser);
> > > > +		ret = mlx5_flow_convert_rss(dev, parser);
> > > >  		if (ret)
> > > >  			goto exit_free;
> > > >  		mlx5_flow_convert_finalise(parser);
> > > >  	mlx5_flow_update_priority(dev, parser, attr);
> > > > +	if (parser->mark)
> > > > +		mlx5_flow_create_flag_mark(parser, parser->mark_id);
> > > > +	if (parser->count && parser->create) {
> > > > +		mlx5_flow_create_count(dev, parser);
> > > > +		if (!parser->cs)
> > > > +			goto exit_count_error;
> > > > +	}
> > >
> > > Why do you need to move this code?
> >
> > To avoid leaking the counter resource if anything goes wrong in
> > mlx5_flow_convert_rss().
> 
> Why is this modification addressed in this patch? Shouldn't it be in the
> patch introducing mlx5_flow_convert_rss()?

Good catch, I'll update.
> 
> >[...]
> > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev
> *dev)
> > > >   *   Number of queues.
> > > >   * @param tunnel
> > > >   *   Tunnel type.
> > > > + * @param rss_level
> > > > + *   RSS hash on tunnel level.
> > > >   *
> > > >   * @return
> > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is
> set.
> > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > > rte_eth_dev *dev,
> > > >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> > > >  	      uint64_t hash_fields,
> > > > -	      const uint16_t *queues, uint32_t queues_n, uint32_t
> tunnel)
> > > > +	      const uint16_t *queues, uint32_t queues_n,
> > > > +	      uint32_t tunnel, uint32_t rss_level)
> > >
> > > tunnel and rss_level seem to be redundant here.
> > >
> > > rss_level > 1 is equivalent to tunnel; there is no need to have both.
> >
> > There is the case of a tunnel with outer RSS (level 1).
> 
> Why can't it be handled by a regular hash Rx queue, i.e. what is the
> benefit of creating a tunnel hash Rx queue to do the same job as a
> legacy one?

Tunnel checksum, ptype and RSS offloading require the QP to be created via
the DV API with tunnel offload flags.
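
Schematically, the two knobs are independent (condensed from the hunk
quoted above):

    if (tunnel) {
        /* Needed for inner checksum/ptype even with outer RSS. */
        qp_init_attr.comp_mask = MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
        qp_init_attr.create_flags = MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
    }
    ...
    .rx_hash_fields_mask = hash_fields |
        /* Inner hash only when the RSS level selects inner headers. */
        ((tunnel && rss_level) ? (uint32_t)IBV_RX_HASH_INNER : 0),

So a tunnel flow with outer RSS still needs the tunnel QP creation flag,
which is why both parameters are passed.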

> 
> See below,
> 
> > > >  {
> > > >  	struct priv *priv = dev->data->dev_private;
> > > >  	struct mlx5_hrxq *hrxq;
> > > >  	struct mlx5_ind_table_ibv *ind_tbl;
> > > >  	struct ibv_qp *qp;
> > > >  	int err;
> > > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > > +	struct mlx5dv_qp_init_attr qp_init_attr = {0}; #endif
> > > >
> > > >  	queues_n = hash_fields ? queues_n : 1;
> > > >  	ind_tbl = mlx5_ind_table_ibv_get(dev, queues, queues_n); @@
> > > > -1410,6
> > > > +1416,33 @@ mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > >  		rte_errno = ENOMEM;
> > > >  		return NULL;
> > > >  	}
> > > > +#ifdef HAVE_IBV_DEVICE_TUNNEL_SUPPORT
> > > > +	if (tunnel) {
> 
> Why not: if (rss_level > 1) ?

Besides RSS, ptype and checksum have to take advantage of tunnel offloading.

> 
> > > > +		qp_init_attr.comp_mask =
> > > > +				MLX5DV_QP_INIT_ATTR_MASK_QP_CREATE_FLAGS;
> > > > +		qp_init_attr.create_flags =
> MLX5DV_QP_CREATE_TUNNEL_OFFLOADS;
> > > > +	}
> > > > +	qp = mlx5_glue->dv_create_qp(
> > > > +		priv->ctx,
> > > > +		&(struct ibv_qp_init_attr_ex){
> > > > +			.qp_type = IBV_QPT_RAW_PACKET,
> > > > +			.comp_mask =
> > > > +				IBV_QP_INIT_ATTR_PD |
> > > > +				IBV_QP_INIT_ATTR_IND_TABLE |
> > > > +				IBV_QP_INIT_ATTR_RX_HASH,
> > > > +			.rx_hash_conf = (struct ibv_rx_hash_conf){
> > > > +				.rx_hash_function =
> IBV_RX_HASH_FUNC_TOEPLITZ,
> > > > +				.rx_hash_key_len = rss_key_len,
> > > > +				.rx_hash_key = (void *)(uintptr_t)rss_key,
> > > > +				.rx_hash_fields_mask = hash_fields |
> > > > +					(tunnel && rss_level ?
> > > > +					(uint32_t)IBV_RX_HASH_INNER : 0),
> >[...]
> 
>  .rx_hash_fields_mask = hash_fields |
>  ((rss_level > 1) ?
>  (uint32_t)IBV_RX_HASH_INNER : 0),

Thanks, rss_level has been fixed according to the new rule.

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-16  7:46         ` Xueming(Steven) Li
@ 2018-04-16  8:09           ` Nélio Laranjeiro
  2018-04-16 10:06             ` Xueming(Steven) Li
  0 siblings, 1 reply; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-16  8:09 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Mon, Apr 16, 2018 at 07:46:08AM +0000, Xueming(Steven) Li wrote:
>[...]
> > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev
> > *dev)
> > > > >   *   Number of queues.
> > > > >   * @param tunnel
> > > > >   *   Tunnel type.
> > > > > + * @param rss_level
> > > > > + *   RSS hash on tunnel level.
> > > > >   *
> > > > >   * @return
> > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is
> > set.
> > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *  mlx5_hrxq_new(struct
> > > > > rte_eth_dev *dev,
> > > > >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> > > > >  	      uint64_t hash_fields,
> > > > > -	      const uint16_t *queues, uint32_t queues_n, uint32_t
> > tunnel)
> > > > > +	      const uint16_t *queues, uint32_t queues_n,
> > > > > +	      uint32_t tunnel, uint32_t rss_level)
> > > >
> > > > tunnel and rss_level seem to be redundant here.
> > > >
> > > > rss_level > 1 is equivalent to tunnel; there is no need to have both.
> > >
> > > There is the case of a tunnel with outer RSS (level 1).
> > 
> > Why can't it be handled by a regular hash Rx queue, i.e. what is the
> > benefit of creating a tunnel hash Rx queue to do the same job as a
> > legacy one?
> 
> Tunnel checksum, ptype and RSS offloading require the QP to be created via
> the DV API with tunnel offload flags.

I was expecting such an answer; this information should be present in the
function documentation. Can you add it?

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-16  8:09           ` Nélio Laranjeiro
@ 2018-04-16 10:06             ` Xueming(Steven) Li
  2018-04-16 12:27               ` Nélio Laranjeiro
  0 siblings, 1 reply; 43+ messages in thread
From: Xueming(Steven) Li @ 2018-04-16 10:06 UTC (permalink / raw)
  To: Nélio Laranjeiro; +Cc: Shahaf Shuler, dev



> -----Original Message-----
> From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> Sent: Monday, April 16, 2018 4:09 PM
> To: Xueming(Steven) Li <xuemingl@mellanox.com>
> Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> 
> On Mon, Apr 16, 2018 at 07:46:08AM +0000, Xueming(Steven) Li wrote:
> >[...]
> > > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct
> > > > > > rte_eth_dev
> > > *dev)
> > > > > >   *   Number of queues.
> > > > > >   * @param tunnel
> > > > > >   *   Tunnel type.
> > > > > > + * @param rss_level
> > > > > > + *   RSS hash on tunnel level.
> > > > > >   *
> > > > > >   * @return
> > > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno
> is
> > > set.
> > > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *
> > > > > > mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > > > >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> > > > > >  	      uint64_t hash_fields,
> > > > > > -	      const uint16_t *queues, uint32_t queues_n, uint32_t
> > > tunnel)
> > > > > > +	      const uint16_t *queues, uint32_t queues_n,
> > > > > > +	      uint32_t tunnel, uint32_t rss_level)
> > > > >
> > > > > tunnel and rss_level seem to be redundant here.
> > > > >
> > > > > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> > > >
> > > > There is the case of a tunnel combined with outer RSS (level 1).
> > >
> > > Why can't it be handled by a regular Hash Rx queue, i.e. what is
> > > the benefit of creating a tunnel hash Rx queue to do the same job
> > > as a legacy one?
> >
> > Tunnel checksum, ptype and RSS offloading require the QP to be
> > created through the DV API with tunnel offload flags.
> 
> I was expecting such answer, such information should be present in the
> function documentation, can you add it?

You mean https://dpdk.org/doc/guides/nics/overview.html?
"Inner L3 checksum" and "Inner L4 checksum" are already defined there.
I added "Inner RSS" per your suggestion; the only thing missing is
"Inner packet type". Does that make sense?

> 
> Thanks,
> 
> --
> Nélio Laranjeiro
> 6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
  2018-04-16 10:06             ` Xueming(Steven) Li
@ 2018-04-16 12:27               ` Nélio Laranjeiro
  0 siblings, 0 replies; 43+ messages in thread
From: Nélio Laranjeiro @ 2018-04-16 12:27 UTC (permalink / raw)
  To: Xueming(Steven) Li; +Cc: Shahaf Shuler, dev

On Mon, Apr 16, 2018 at 10:06:06AM +0000, Xueming(Steven) Li wrote:
> 
> 
> > -----Original Message-----
> > From: Nélio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Sent: Monday, April 16, 2018 4:09 PM
> > To: Xueming(Steven) Li <xuemingl@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 07/15] net/mlx5: support tunnel RSS level
> > 
> > On Mon, Apr 16, 2018 at 07:46:08AM +0000, Xueming(Steven) Li wrote:
> > >[...]
> > > > > > > @@ -1386,6 +1386,8 @@ mlx5_ind_table_ibv_verify(struct rte_eth_dev *dev)
> > > > > > >   *   Number of queues.
> > > > > > >   * @param tunnel
> > > > > > >   *   Tunnel type.
> > > > > > > + * @param rss_level
> > > > > > > + *   RSS hash on tunnel level.
> > > > > > >   *
> > > > > > >   * @return
> > > > > > >   *   The Verbs object initialised, NULL otherwise and rte_errno is set.
> > > > > > > @@ -1394,13 +1396,17 @@ struct mlx5_hrxq *
> > > > > > > mlx5_hrxq_new(struct rte_eth_dev *dev,
> > > > > > >  	      const uint8_t *rss_key, uint32_t rss_key_len,
> > > > > > >  	      uint64_t hash_fields,
> > > > > > > -	      const uint16_t *queues, uint32_t queues_n, uint32_t tunnel)
> > > > > > > +	      const uint16_t *queues, uint32_t queues_n,
> > > > > > > +	      uint32_t tunnel, uint32_t rss_level)
> > > > > >
> > > > > > tunnel and rss_level seem to be redundant here.
> > > > > >
> > > > > > rss_level > 1 is equivalent to tunnel, there is no need to have both.
> > > > >
> > > > > There is the case of a tunnel combined with outer RSS (level 1).
> > > >
> > > > Why can't it be handled by a regular Hash Rx queue, i.e. what is
> > > > the benefit of creating a tunnel hash Rx queue to do the same job
> > > > as a legacy one?
> > >
> > > Tunnel checksum, ptype and RSS offloading require the QP to be
> > > created through the DV API with tunnel offload flags.
> > 
> > I was expecting such answer, such information should be present in the
> > function documentation, can you add it?
> 
> You mean https://dpdk.org/doc/guides/nics/overview.html?
> "Inner L3 checksum" and "Inner L4 checksum" are already defined there.
> I added "Inner RSS" per your suggestion; the only thing missing is
> "Inner packet type". Does that make sense?

No, I mean adding to this function's doxygen documentation the fact that
tunnel is there to enable the checksum offload, whereas rss_level is
there to enable RSS on the inner headers.
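
A sketch of what such doxygen text could look like, reusing the
parameter list from the quoted diff (the wording is illustrative, not
taken from the patch):

/**
 * Create a new Hash Rx queue.
 *
 * @param tunnel
 *   Tunnel type. Non-zero forces the QP to be created through the DV
 *   API with tunnel offload flags, so that inner checksum and packet
 *   type are reported for encapsulated traffic.
 * @param rss_level
 *   RSS hash level: 1 to hash on the outer headers, 2 to hash on the
 *   inner (encapsulated) headers.
 */
struct mlx5_hrxq *
mlx5_hrxq_new(struct rte_eth_dev *dev,
	      const uint8_t *rss_key, uint32_t rss_key_len,
	      uint64_t hash_fields,
	      const uint16_t *queues, uint32_t queues_n,
	      uint32_t tunnel, uint32_t rss_level);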

Thanks,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-04-16 12:27 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10 13:34 [PATCH v2 00/15] mlx5 Rx tunnel offloading Xueming Li
2018-04-10 13:34 ` [PATCH v2 01/15] net/mlx5: support 16 hardware priorities Xueming Li
2018-04-10 14:41   ` Nélio Laranjeiro
2018-04-10 15:22     ` Xueming(Steven) Li
2018-04-12  9:09       ` Nélio Laranjeiro
2018-04-12 13:43         ` Xueming(Steven) Li
2018-04-12 14:02           ` Nélio Laranjeiro
2018-04-12 14:46             ` Xueming(Steven) Li
2018-04-10 13:34 ` [PATCH v2 02/15] net/mlx5: support GRE tunnel flow Xueming Li
2018-04-10 13:34 ` [PATCH v2 03/15] net/mlx5: support L3 vxlan flow Xueming Li
2018-04-10 14:53   ` Nélio Laranjeiro
2018-04-10 13:34 ` [PATCH v2 04/15] net/mlx5: support Rx tunnel type identification Xueming Li
2018-04-10 15:17   ` Nélio Laranjeiro
2018-04-11  8:11     ` Xueming(Steven) Li
2018-04-12  9:50       ` Nélio Laranjeiro
2018-04-12 14:27         ` Xueming(Steven) Li
2018-04-13  8:37           ` Nélio Laranjeiro
2018-04-13 12:09             ` Xueming(Steven) Li
2018-04-10 13:34 ` [PATCH v2 05/15] net/mlx5: support tunnel inner checksum offloads Xueming Li
2018-04-10 15:27   ` Nélio Laranjeiro
2018-04-11  8:46     ` Xueming(Steven) Li
2018-04-10 13:34 ` [PATCH v2 06/15] net/mlx5: split flow RSS handling logic Xueming Li
2018-04-10 15:28   ` Nélio Laranjeiro
2018-04-10 13:34 ` [PATCH v2 07/15] net/mlx5: support tunnel RSS level Xueming Li
     [not found]   ` <20180411085529.ecxuku77hg3mkybl@laranjeiro-vm.dev.6wind.com>
2018-04-14 12:25     ` Xueming(Steven) Li
2018-04-16  7:14       ` Nélio Laranjeiro
2018-04-16  7:46         ` Xueming(Steven) Li
2018-04-16  8:09           ` Nélio Laranjeiro
2018-04-16 10:06             ` Xueming(Steven) Li
2018-04-16 12:27               ` Nélio Laranjeiro
2018-04-10 13:34 ` [PATCH v2 08/15] net/mlx5: add hardware flow debug dump Xueming Li
2018-04-10 13:34 ` [PATCH v2 09/15] net/mlx5: introduce VXLAN-GPE tunnel type Xueming Li
2018-04-10 13:34 ` [PATCH v2 10/15] net/mlx5: allow flow tunnel ID 0 with outer pattern Xueming Li
2018-04-11 12:25   ` Nélio Laranjeiro
2018-04-10 13:34 ` [PATCH v2 11/15] net/mlx5: support MPLS-in-GRE and MPLS-in-UDP Xueming Li
2018-04-10 13:34 ` [PATCH v2 12/15] doc: update mlx5 guide on tunnel offloading Xueming Li
2018-04-11 12:32   ` Nélio Laranjeiro
2018-04-11 12:43     ` Thomas Monjalon
2018-04-10 13:34 ` [PATCH v2 13/15] net/mlx5: setup RSS flow regardless of queue count Xueming Li
2018-04-11 12:37   ` Nélio Laranjeiro
2018-04-11 13:01     ` Xueming(Steven) Li
2018-04-10 13:34 ` [PATCH v2 14/15] net/mlx5: fix invalid flow item check Xueming Li
2018-04-10 13:34 ` [PATCH v2 15/15] net/mlx5: support RSS configuration in isolated mode Xueming Li
