* [PATCH v1 00/29] net/mlx4: restore PMD functionality
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

This series restores all the control path functionality removed in prior
series "net/mlx4: trim and refactor entire PMD", including:

- Promiscuous mode.
- All multicast mode.
- MAC address configuration.
- Support for multiple simultaneous MAC addresses.
- Reception of broadcast and user-defined multicast traffic.
- VLAN filters.
- RSS.

This rework also results in the following enhancements:

- Support for multiple flow rule priorities (up to 4096).
- Much more comprehensive error messages when failing to create or apply
  flow rules.
- Flow rules with the RSS action targeting disparate queues can now overlap
  (as long as they take HW limitations into account).
- RSS contexts can be created/destroyed on demand (they were previously
  fixed once and for all after applying the first flow rule).
- RSS hash key can be configured per context (see the sketch after this
  list).
- Rx objects have a smaller memory footprint.
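
As an illustration of the RSS enhancements above, an application can
now spread IPv4 traffic across two Rx queues using a rule-specific
hash key. The following sketch targets the 17.11-era flow API; port,
queue set, pattern and names are illustrative and remain subject to
the usual PMD matching constraints:

    #include <stdio.h>
    #include <stdlib.h>
    #include <rte_ethdev.h>
    #include <rte_flow.h>

    static struct rte_flow *
    example_rss_rule(uint16_t port_id)
    {
        static uint8_t key[40]; /* Rule-specific RSS hash key. */
        struct rte_eth_rss_conf rss_conf = {
            .rss_key = key,
            .rss_key_len = sizeof(key),
            .rss_hf = ETH_RSS_IPV4,
        };
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        /* The RSS action configuration ends with a flexible array. */
        struct rte_flow_action_rss *rss =
            malloc(sizeof(*rss) + 2 * sizeof(uint16_t));
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = rss },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error error;
        struct rte_flow *flow;

        if (rss == NULL)
            return NULL;
        *rss = (struct rte_flow_action_rss){
            .rss_conf = &rss_conf,
            .num = 2,
        };
        rss->queue[0] = 0;
        rss->queue[1] = 1;
        flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
        if (flow == NULL)
            fprintf(stderr, "cannot create RSS rule: %s\n",
                    error.message ? error.message : "(no message)");
        free(rss);
        return flow;
    }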

Note that it should be applied directly before the following series:

 "new mlx4 datapath bypassing ibverbs"

for which a new version, rebased on top of this one, will be submitted
soon.

Adrien Mazarguil (29):
  ethdev: expose flow API error helper
  net/mlx4: replace bit-field type
  net/mlx4: remove Rx QP initializer function
  net/mlx4: enhance header files comments
  net/mlx4: expose support for flow rule priorities
  net/mlx4: clarify flow objects naming scheme
  net/mlx4: tidy up flow rule handling code
  net/mlx4: compact flow rule error reporting
  mem: add iovec-like allocation wrappers
  net/mlx4: merge flow creation and validation code
  net/mlx4: allocate drop flow resources on demand
  net/mlx4: relax check on missing flow rule target
  net/mlx4: refactor internal flow rules
  net/mlx4: generalize flow rule priority support
  net/mlx4: simplify trigger code for flow rules
  net/mlx4: refactor flow item validation code
  net/mlx4: add MAC addresses configuration support
  net/mlx4: add VLAN filter configuration support
  net/mlx4: add flow support for multicast traffic
  net/mlx4: restore promisc and allmulti support
  net/mlx4: update Rx/Tx callbacks consistently
  net/mlx4: fix invalid errno value sign
  net/mlx4: drop live queue reconfiguration support
  net/mlx4: allocate queues and mbuf rings together
  net/mlx4: convert Rx path to work queues
  net/mlx4: remove unnecessary check
  net/mlx4: add RSS flow rule action support
  net/mlx4: disable UDP support in RSS flow rules
  net/mlx4: add RSS support outside flow API

 doc/guides/nics/features/mlx4.ini               |    6 +
 doc/guides/prog_guide/rte_flow.rst              |   23 +-
 drivers/net/mlx4/Makefile                       |    2 +-
 drivers/net/mlx4/mlx4.c                         |   71 +-
 drivers/net/mlx4/mlx4.h                         |   61 +-
 drivers/net/mlx4/mlx4_ethdev.c                  |  231 ++-
 drivers/net/mlx4/mlx4_flow.c                    | 1671 +++++++++++-------
 drivers/net/mlx4/mlx4_flow.h                    |   32 +-
 drivers/net/mlx4/mlx4_rxq.c                     |  697 ++++----
 drivers/net/mlx4/mlx4_rxtx.c                    |    2 +-
 drivers/net/mlx4/mlx4_rxtx.h                    |   34 +-
 drivers/net/mlx4/mlx4_txq.c                     |  343 ++--
 drivers/net/mlx4/mlx4_utils.h                   |   10 +-
 drivers/net/tap/tap_flow.c                      |    2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |    9 +
 lib/librte_eal/common/include/rte_malloc.h      |   85 +
 lib/librte_eal/common/rte_malloc.c              |   92 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |    9 +
 lib/librte_ether/rte_ethdev_version.map         |    1 +
 lib/librte_ether/rte_flow.c                     |   49 +-
 lib/librte_ether/rte_flow.h                     |   24 +
 lib/librte_ether/rte_flow_driver.h              |   38 -
 mk/rte.app.mk                                   |    2 +-
 23 files changed, 2173 insertions(+), 1321 deletions(-)

-- 
2.1.4

* [PATCH v1 01/29] ethdev: expose flow API error helper
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

rte_flow_error_set() is a convenient helper to initialize error objects.

Since there is no fundamental reason to prevent applications from using it,
expose it through the public interface after modifying its return value
from positive to negative. This is done for consistency with the rest of
the public interface.

Documentation is updated accordingly.
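
For instance, a flow operation that lacks support for a given feature
can now fail in a single statement, since the helper's negative return
value matches what these callbacks are expected to return (sketch;
function name and message are illustrative):

    #include <errno.h>
    #include <rte_ethdev.h>
    #include <rte_flow.h>

    static int
    example_flow_validate(struct rte_eth_dev *dev,
                          const struct rte_flow_attr *attr,
                          const struct rte_flow_item pattern[],
                          const struct rte_flow_action actions[],
                          struct rte_flow_error *error)
    {
        (void)dev;
        (void)attr;
        (void)pattern;
        (void)actions;
        /* Fills *error, sets rte_errno to ENOTSUP, returns -ENOTSUP. */
        return rte_flow_error_set(error, ENOTSUP,
                                  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
                                  NULL, "not supported");
    }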

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/prog_guide/rte_flow.rst      | 23 +++++++++++--
 drivers/net/mlx4/mlx4_flow.c            |  6 ++--
 drivers/net/tap/tap_flow.c              |  2 +-
 lib/librte_ether/rte_ethdev_version.map |  1 +
 lib/librte_ether/rte_flow.c             | 49 +++++++++++++++++++---------
 lib/librte_ether/rte_flow.h             | 24 ++++++++++++++
 lib/librte_ether/rte_flow_driver.h      | 38 ---------------------
 7 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 73f12ee..3113881 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1695,6 +1695,25 @@ freed by the application, however its pointer can be considered valid only
 as long as its associated DPDK port remains configured. Closing the
 underlying device or unloading the PMD invalidates it.
 
+Helpers
+-------
+
+Error initializer
+~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+   static inline int
+   rte_flow_error_set(struct rte_flow_error *error,
+                      int code,
+                      enum rte_flow_error_type type,
+                      const void *cause,
+                      const char *message);
+
+This function initializes ``error`` (if non-NULL) with the provided
+parameters and sets ``rte_errno`` to ``code``. A negative error ``code`` is
+then returned.
+
 Caveats
 -------
 
@@ -1760,13 +1779,11 @@ the legacy filtering framework, which should eventually disappear.
   whatsoever). They only make sure these callbacks are non-NULL or return
   the ``ENOSYS`` (function not supported) error.
 
-This interface additionally defines the following helper functions:
+This interface additionally defines the following helper function:
 
 - ``rte_flow_ops_get()``: get generic flow operations structure from a
   port.
 
-- ``rte_flow_error_set()``: initialize generic flow error structure.
-
 More will be added over time.
 
 Device compatibility
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 0885a91..018843b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -955,9 +955,9 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 		mlx4_mac_addr_del(priv);
 	} else if (mlx4_mac_addr_add(priv) < 0) {
 		priv->isolated = 1;
-		return -rte_flow_error_set(error, rte_errno,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL, "cannot leave isolated mode");
+		return rte_flow_error_set(error, rte_errno,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "cannot leave isolated mode");
 	}
 	return 0;
 }
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 28d793f..ffc0b85 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1462,7 +1462,7 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	return 0;
 error:
 	pmd->flow_isolate = 0;
-	return -rte_flow_error_set(
+	return rte_flow_error_set(
 		error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 		"TC rule creation failed");
 }
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 92c9e29..e27f596 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -193,5 +193,6 @@ DPDK_17.11 {
 
 	rte_eth_dev_pool_ops_supported;
 	rte_eth_dev_reset;
+	rte_flow_error_set;
 
 } DPDK_17.08;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index e276fb2..6659063 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -145,9 +145,9 @@ rte_flow_validate(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->validate))
 		return ops->validate(dev, attr, pattern, actions, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Create a flow rule on a given port. */
@@ -183,9 +183,9 @@ rte_flow_destroy(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->destroy))
 		return ops->destroy(dev, flow, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Destroy all flow rules associated with a port. */
@@ -200,9 +200,9 @@ rte_flow_flush(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->flush))
 		return ops->flush(dev, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Query an existing flow rule. */
@@ -220,9 +220,9 @@ rte_flow_query(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->query))
 		return ops->query(dev, flow, action, data, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Restrict ingress traffic to the defined flow rules. */
@@ -238,9 +238,28 @@ rte_flow_isolate(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->isolate))
 		return ops->isolate(dev, set, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
+}
+
+/* Initialize flow error structure. */
+int
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return -code;
 }
 
 /** Compute storage space needed by item specification. */
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index d37b0ad..a0ffb71 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -1322,6 +1322,30 @@ int
 rte_flow_isolate(uint16_t port_id, int set, struct rte_flow_error *error);
 
 /**
+ * Initialize flow error structure.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Negative error code (errno value) and rte_errno is set.
+ */
+int
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message);
+
+/**
  * Generic flow representation.
  *
  * This form is sufficient to describe an rte_flow independently from any
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
index 8573cef..254d1cb 100644
--- a/lib/librte_ether/rte_flow_driver.h
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -45,7 +45,6 @@
 
 #include <stdint.h>
 
-#include <rte_errno.h>
 #include "rte_ethdev.h"
 #include "rte_flow.h"
 
@@ -128,43 +127,6 @@ struct rte_flow_ops {
 };
 
 /**
- * Initialize generic flow error structure.
- *
- * This function also sets rte_errno to a given value.
- *
- * @param[out] error
- *   Pointer to flow error structure (may be NULL).
- * @param code
- *   Related error code (rte_errno).
- * @param type
- *   Cause field and error types.
- * @param cause
- *   Object responsible for the error.
- * @param message
- *   Human-readable error message.
- *
- * @return
- *   Error code.
- */
-static inline int
-rte_flow_error_set(struct rte_flow_error *error,
-		   int code,
-		   enum rte_flow_error_type type,
-		   const void *cause,
-		   const char *message)
-{
-	if (error) {
-		*error = (struct rte_flow_error){
-			.type = type,
-			.cause = cause,
-			.message = message,
-		};
-	}
-	rte_errno = code;
-	return code;
-}
-
-/**
  * Get generic flow operations structure from a port.
  *
  * @param port_id
-- 
2.1.4

* [PATCH v1 02/29] net/mlx4: replace bit-field type
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

Make it clear that these bit-fields are 32 bits wide.
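
A minimal sketch of the difference:

    #include <stdint.h>

    struct example {
        unsigned int a:1; /* "unsigned int" width is not fixed by C. */
        uint32_t b:1; /* Storage based on an explicitly 32-bit type. */
    };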

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 9bd2acc..71cbced 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -100,10 +100,10 @@ struct priv {
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int vf:1; /* This is a VF device. */
-	unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
-	unsigned int isolated:1; /* Toggle isolated mode. */
+	uint32_t started:1; /* Device started, flows enabled. */
+	uint32_t vf:1; /* This is a VF device. */
+	uint32_t intr_alarm:1; /* An interrupt alarm is scheduled. */
+	uint32_t isolated:1; /* Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
-- 
2.1.4

* [PATCH v1 03/29] net/mlx4: remove Rx QP initializer function
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

There is no benefit in having this as a separate function.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c | 59 ++++++++++++----------------------------
 1 file changed, 18 insertions(+), 41 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 409983f..2d54ab0 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -184,46 +184,6 @@ mlx4_rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error and rte_errno is set.
- */
-static struct ibv_qp *
-mlx4_rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-	struct ibv_qp *qp;
-	struct ibv_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-
-	qp = ibv_create_qp(priv->pd, &attr);
-	if (!qp)
-		rte_errno = errno ? errno : EINVAL;
-	return qp;
-}
-
-/**
  * Configure a Rx queue.
  *
  * @param dev
@@ -254,6 +214,7 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.socket = socket
 	};
 	struct ibv_qp_attr mod;
+	struct ibv_qp_init_attr qp_init;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret;
@@ -317,8 +278,24 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	tmpl.qp = mlx4_rxq_setup_qp(priv, tmpl.cq, desc);
+	qp_init = (struct ibv_qp_init_attr){
+		/* CQ to be associated with the send queue. */
+		.send_cq = tmpl.cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = tmpl.cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_recv_sge = 1,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+	};
+	tmpl.qp = ibv_create_qp(priv->pd, &qp_init);
 	if (tmpl.qp == NULL) {
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
-- 
2.1.4

* [PATCH v1 04/29] net/mlx4: enhance header files comments
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

Add missing comments and fix those that are not Doxygen-friendly.

Since the private structure definition is modified anyway, use this
opportunity to add the remaining missing include required by one of its
fields (sys/queue.h for LIST_HEAD()).
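
As a reminder, Doxygen only extracts comments carrying an extra
asterisk, with "<" marking trailing comments that document the
preceding member, e.g.:

    #include <stdint.h>

    struct before {
        uint16_t mtu; /* Plain comment, ignored by Doxygen. */
    };

    /** Extracted by Doxygen. */
    struct after {
        uint16_t mtu; /**< Documents the member it follows. */
    };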

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h       | 43 ++++++++++++++++++++------------------
 drivers/net/mlx4/mlx4_flow.h  |  2 ++
 drivers/net/mlx4/mlx4_rxtx.h  |  4 ++--
 drivers/net/mlx4/mlx4_utils.h |  4 ++--
 4 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 71cbced..1799951 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -36,6 +36,7 @@
 
 #include <net/if.h>
 #include <stdint.h>
+#include <sys/queue.h>
 
 /* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
@@ -51,13 +52,13 @@
 #include <rte_interrupts.h>
 #include <rte_mempool.h>
 
-/* Request send completion once in every 64 sends, might be less. */
+/** Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum size for inline data. */
+/** Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
-/*
+/**
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
  * driver to their own Memory Region (MR). This is a slow operation.
@@ -68,10 +69,10 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/* Interrupt alarm timeout value in microseconds. */
+/** Interrupt alarm timeout value in microseconds. */
 #define MLX4_INTR_ALARM_TIMEOUT 100000
 
-/* Port parameter. */
+/** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
 enum {
@@ -84,29 +85,31 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO = 0x1007,
 };
 
+/** Driver name reported to lower layers and used in log output. */
 #define MLX4_DRIVER_NAME "net_mlx4"
 
 struct rxq;
 struct txq;
 struct rte_flow;
 
+/** Private data structure. */
 struct priv {
-	struct rte_eth_dev *dev; /* Ethernet device. */
-	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
-	struct ether_addr mac; /* MAC address. */
-	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
+	struct rte_eth_dev *dev; /**< Ethernet device. */
+	struct ibv_context *ctx; /**< Verbs context. */
+	struct ibv_device_attr device_attr; /**< Device properties. */
+	struct ibv_pd *pd; /**< Protection Domain. */
+	struct ether_addr mac; /**< MAC address. */
+	struct ibv_flow *mac_flow; /**< Flow associated with MAC address. */
 	/* Device properties. */
-	uint16_t mtu; /* Configured MTU. */
-	uint8_t port; /* Physical port number. */
-	uint32_t started:1; /* Device started, flows enabled. */
-	uint32_t vf:1; /* This is a VF device. */
-	uint32_t intr_alarm:1; /* An interrupt alarm is scheduled. */
-	uint32_t isolated:1; /* Toggle isolated mode. */
-	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
-	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
-	LIST_HEAD(mlx4_flows, rte_flow) flows;
+	uint16_t mtu; /**< Configured MTU. */
+	uint8_t port; /**< Physical port number. */
+	uint32_t started:1; /**< Device started, flows enabled. */
+	uint32_t vf:1; /**< This is a VF device. */
+	uint32_t intr_alarm:1; /**< An interrupt alarm is scheduled. */
+	uint32_t isolated:1; /**< Toggle isolated mode. */
+	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
+	struct rte_flow_drop *flow_drop_queue; /**< Flow drop queue. */
+	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 };
 
 /* mlx4_ethdev.c */
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index fbb775d..459030c 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -52,6 +52,7 @@
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
+/** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
@@ -65,6 +66,7 @@ struct mlx4_flow {
 	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
 };
 
+/** Flow rule target descriptor. */
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index fec998a..365b585 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -85,8 +85,8 @@ struct rxq {
 
 /** Tx element. */
 struct txq_elt {
-	struct ibv_send_wr wr; /* Work request. */
-	struct ibv_sge sge; /* Scatter/gather element. */
+	struct ibv_send_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
 	struct rte_mbuf *buf; /**< Buffer. */
 };
 
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 0fbdc71..b9c02d5 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -49,7 +49,7 @@
  * information replace the driver name (MLX4_DRIVER_NAME) in log messages.
  */
 
-/* Return the file name part of a path. */
+/** Return the file name part of a path. */
 static inline const char *
 pmd_drv_log_basename(const char *s)
 {
@@ -98,7 +98,7 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
+/** Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
 	\
-- 
2.1.4

* [PATCH v1 05/29] net/mlx4: expose support for flow rule priorities
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

This PMD supports up to 4096 flow rule priority levels (0 to 4095).

Applications were not allowed to use them until now due to overlaps with
the default flows (e.g. MAC address, promiscuous mode).

This is not an issue in isolated mode, where such flows do not exist.
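
The capped priority level is reported through the two-level
stringification macros this patch adds to mlx4_utils.h; the extra
level lets the argument expand before being stringified (sketch with a
made-up macro):

    #include "mlx4_utils.h"

    #define EXAMPLE_LIMIT 4095

    /* Yields "maximum priority level is 4095". */
    static const char msg_expanded[] =
        "maximum priority level is " MLX4_STR_EXPAND(EXAMPLE_LIMIT);
    /* Yields "maximum priority level is EXAMPLE_LIMIT". */
    static const char msg_literal[] =
        "maximum priority level is " MLX4_STR(EXAMPLE_LIMIT);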

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c  | 20 +++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h  |  3 +++
 drivers/net/mlx4/mlx4_utils.h |  6 ++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 018843b..730249b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -597,8 +597,8 @@ mlx4_flow_prepare(struct priv *priv,
 		.queue = 0,
 		.drop = 0,
 	};
+	uint32_t priority_override = 0;
 
-	(void)priv;
 	if (attr->group) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -606,11 +606,22 @@ mlx4_flow_prepare(struct priv *priv,
 				   "groups are not supported");
 		return -rte_errno;
 	}
-	if (attr->priority) {
+	if (priv->isolated) {
+		priority_override = attr->priority;
+	} else if (attr->priority) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
 				   NULL,
-				   "priorities are not supported");
+				   "priorities are not supported outside"
+				   " isolated mode");
+		return -rte_errno;
+	}
+	if (attr->priority > MLX4_FLOW_PRIORITY_LAST) {
+		rte_flow_error_set(error, ENOTSUP,
+				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+				   NULL,
+				   "maximum priority level is "
+				   MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
 		return -rte_errno;
 	}
 	if (attr->egress) {
@@ -680,6 +691,9 @@ mlx4_flow_prepare(struct priv *priv,
 		}
 		flow->offset += cur_item->dst_sz;
 	}
+	/* Use specified priority level when in isolated mode. */
+	if (priv->isolated && flow->ibv_attr)
+		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list */
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
 		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 459030c..8ac09f1 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -52,6 +52,9 @@
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
+/** Last and lowest priority level for a flow rule. */
+#define MLX4_FLOW_PRIORITY_LAST UINT32_C(0xfff)
+
 /** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index b9c02d5..13f731a 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -104,6 +104,12 @@ pmd_drv_log_basename(const char *s)
 	\
 	snprintf(name, sizeof(name), __VA_ARGS__)
 
+/** Generate a string out of the provided arguments. */
+#define MLX4_STR(...) # __VA_ARGS__
+
+/** Similar to MLX4_STR() with enclosed macros expanded first. */
+#define MLX4_STR_EXPAND(...) MLX4_STR(__VA_ARGS__)
+
 /* mlx4_utils.c */
 
 int mlx4_fd_set_non_blocking(int fd);
-- 
2.1.4

* [PATCH v1 06/29] net/mlx4: clarify flow objects naming scheme
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

In several instances, "items" refers either to a whole flow pattern or
to a single item, while "actions" refers either to the entire list of
actions or to only one of them.

The fact that the target of a rule (struct mlx4_flow_action) is also
named "action", and that item-processing objects (struct
mlx4_flow_items) are referred to as "cur_item" ("token" in one
instance), adds to the confusion.

Use this opportunity to clarify related comments and remove the unused
valid_actions[] global, whose sole purpose was to be referred to by
item-processing objects as "actions".

This commit does not cause any functional change.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 171 ++++++++++++++++++--------------------
 drivers/net/mlx4/mlx4_flow.h |   2 +-
 2 files changed, 81 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 730249b..e5854c6 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -66,16 +66,14 @@
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
-/** Static initializer for items. */
-#define ITEMS(...) \
+/** Static initializer for a list of subsequent item types. */
+#define NEXT_ITEM(...) \
 	(const enum rte_flow_item_type []){ \
 		__VA_ARGS__, RTE_FLOW_ITEM_TYPE_END, \
 	}
 
-/** Structure to generate a simple graph of layers supported by the NIC. */
-struct mlx4_flow_items {
-	/** List of possible actions for these items. */
-	const enum rte_flow_action_type *const actions;
+/** Processor structure associated with a flow item. */
+struct mlx4_flow_proc_item {
 	/** Bit-masks corresponding to the possibilities for the item. */
 	const void *mask;
 	/**
@@ -121,8 +119,8 @@ struct mlx4_flow_items {
 		       void *data);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
-	/** List of possible following items.  */
-	const enum rte_flow_item_type *const items;
+	/** List of possible subsequent items. */
+	const enum rte_flow_item_type *const next_item;
 };
 
 struct rte_flow_drop {
@@ -130,13 +128,6 @@ struct rte_flow_drop {
 	struct ibv_cq *cq; /**< Verbs completion queue. */
 };
 
-/** Valid action for this PMD. */
-static const enum rte_flow_action_type valid_actions[] = {
-	RTE_FLOW_ACTION_TYPE_DROP,
-	RTE_FLOW_ACTION_TYPE_QUEUE,
-	RTE_FLOW_ACTION_TYPE_END,
-};
-
 /**
  * Convert Ethernet item to Verbs specification.
  *
@@ -485,14 +476,13 @@ mlx4_flow_validate_tcp(const struct rte_flow_item *item,
 }
 
 /** Graph of supported items and associated actions. */
-static const struct mlx4_flow_items mlx4_flow_items[] = {
+static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_END] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH),
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_ETH),
 	},
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VLAN,
-			       RTE_FLOW_ITEM_TYPE_IPV4),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_VLAN,
+				       RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_eth){
 			.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 			.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
@@ -504,8 +494,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_eth),
 	},
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_IPV4),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_vlan){
 		/* rte_flow_item_vlan_mask is invalid for mlx4. */
 #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
@@ -520,9 +509,8 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = 0,
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV4] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_UDP,
-			       RTE_FLOW_ITEM_TYPE_TCP),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_UDP,
+				       RTE_FLOW_ITEM_TYPE_TCP),
 		.mask = &(const struct rte_flow_item_ipv4){
 			.hdr = {
 				.src_addr = -1,
@@ -536,7 +524,6 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_ipv4),
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
-		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
 				.src_port = -1,
@@ -550,7 +537,6 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
-		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_tcp){
 			.hdr = {
 				.src_port = -1,
@@ -572,7 +558,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
  *   Pointer to private structure.
  * @param[in] attr
  *   Flow rule attributes.
- * @param[in] items
+ * @param[in] pattern
  *   Pattern specification (list terminated by the END pattern item).
  * @param[in] actions
  *   Associated actions (list terminated by the END action).
@@ -587,13 +573,15 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 static int
 mlx4_flow_prepare(struct priv *priv,
 		  const struct rte_flow_attr *attr,
-		  const struct rte_flow_item items[],
+		  const struct rte_flow_item pattern[],
 		  const struct rte_flow_action actions[],
 		  struct rte_flow_error *error,
 		  struct mlx4_flow *flow)
 {
-	const struct mlx4_flow_items *cur_item = mlx4_flow_items;
-	struct mlx4_flow_action action = {
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	const struct mlx4_flow_proc_item *proc = mlx4_flow_proc_item_list;
+	struct mlx4_flow_target target = {
 		.queue = 0,
 		.drop = 0,
 	};
@@ -638,82 +626,80 @@ mlx4_flow_prepare(struct priv *priv,
 				   "only ingress is supported");
 		return -rte_errno;
 	}
-	/* Go over items list. */
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
-		const struct mlx4_flow_items *token = NULL;
+	/* Go over pattern. */
+	for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; ++item) {
+		const struct mlx4_flow_proc_item *next = NULL;
 		unsigned int i;
 		int err;
 
-		if (items->type == RTE_FLOW_ITEM_TYPE_VOID)
+		if (item->type == RTE_FLOW_ITEM_TYPE_VOID)
 			continue;
 		/*
 		 * The nic can support patterns with NULL eth spec only
 		 * if eth is a single item in a rule.
 		 */
-		if (!items->spec &&
-			items->type == RTE_FLOW_ITEM_TYPE_ETH) {
-			const struct rte_flow_item *next = items + 1;
+		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
+			const struct rte_flow_item *next = item + 1;
 
 			if (next->type != RTE_FLOW_ITEM_TYPE_END) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
-						   items,
+						   item,
 						   "the rule requires"
 						   " an Ethernet spec");
 				return -rte_errno;
 			}
 		}
 		for (i = 0;
-		     cur_item->items &&
-		     cur_item->items[i] != RTE_FLOW_ITEM_TYPE_END;
+		     proc->next_item &&
+		     proc->next_item[i] != RTE_FLOW_ITEM_TYPE_END;
 		     ++i) {
-			if (cur_item->items[i] == items->type) {
-				token = &mlx4_flow_items[items->type];
+			if (proc->next_item[i] == item->type) {
+				next = &mlx4_flow_proc_item_list[item->type];
 				break;
 			}
 		}
-		if (!token)
+		if (!next)
 			goto exit_item_not_supported;
-		cur_item = token;
-		err = cur_item->validate(items,
-					(const uint8_t *)cur_item->mask,
-					 cur_item->mask_sz);
+		proc = next;
+		err = proc->validate(item, proc->mask, proc->mask_sz);
 		if (err)
 			goto exit_item_not_supported;
-		if (flow->ibv_attr && cur_item->convert) {
-			err = cur_item->convert(items,
-						(cur_item->default_mask ?
-						 cur_item->default_mask :
-						 cur_item->mask),
-						 flow);
+		if (flow->ibv_attr && proc->convert) {
+			err = proc->convert(item,
+					    (proc->default_mask ?
+					     proc->default_mask :
+					     proc->mask),
+					    flow);
 			if (err)
 				goto exit_item_not_supported;
 		}
-		flow->offset += cur_item->dst_sz;
+		flow->offset += proc->dst_sz;
 	}
 	/* Use specified priority level when in isolated mode. */
 	if (priv->isolated && flow->ibv_attr)
 		flow->ibv_attr->priority = priority_override;
-	/* Go over actions list */
-	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
-		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	/* Go over actions list. */
+	for (action = actions;
+	     action->type != RTE_FLOW_ACTION_TYPE_END;
+	     ++action) {
+		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
 			continue;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
-			action.drop = 1;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.drop = 1;
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
 			const struct rte_flow_action_queue *queue =
-				(const struct rte_flow_action_queue *)
-				actions->conf;
+				action->conf;
 
 			if (!queue || (queue->index >
 				       (priv->dev->data->nb_rx_queues - 1)))
 				goto exit_action_not_supported;
-			action.queue = 1;
+			target.queue = 1;
 		} else {
 			goto exit_action_not_supported;
 		}
 	}
-	if (!action.queue && !action.drop) {
+	if (!target.queue && !target.drop) {
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_HANDLE,
 				   NULL, "no valid action");
 		return -rte_errno;
@@ -721,11 +707,11 @@ mlx4_flow_prepare(struct priv *priv,
 	return 0;
 exit_item_not_supported:
 	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-			   items, "item not supported");
+			   item, "item not supported");
 	return -rte_errno;
 exit_action_not_supported:
 	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-			   actions, "action not supported");
+			   action, "action not supported");
 	return -rte_errno;
 }
 
@@ -738,14 +724,14 @@ mlx4_flow_prepare(struct priv *priv,
 static int
 mlx4_flow_validate(struct rte_eth_dev *dev,
 		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
+		   const struct rte_flow_item pattern[],
 		   const struct rte_flow_action actions[],
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 }
 
 /**
@@ -828,8 +814,8 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   Pointer to private structure.
  * @param ibv_attr
  *   Verbs flow attributes.
- * @param action
- *   Target action structure.
+ * @param target
+ *   Rule target descriptor.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
  *
@@ -837,9 +823,9 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   A flow if the rule could be created.
  */
 static struct rte_flow *
-mlx4_flow_create_action_queue(struct priv *priv,
+mlx4_flow_create_target_queue(struct priv *priv,
 			      struct ibv_flow_attr *ibv_attr,
-			      struct mlx4_flow_action *action,
+			      struct mlx4_flow_target *target,
 			      struct rte_flow_error *error)
 {
 	struct ibv_qp *qp;
@@ -853,10 +839,10 @@ mlx4_flow_create_action_queue(struct priv *priv,
 				   NULL, "cannot allocate flow memory");
 		return NULL;
 	}
-	if (action->drop) {
+	if (target->drop) {
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
-		struct rxq *rxq = priv->dev->data->rx_queues[action->queue_id];
+		struct rxq *rxq = priv->dev->data->rx_queues[target->queue_id];
 
 		qp = rxq->qp;
 		rte_flow->qp = qp;
@@ -885,17 +871,18 @@ mlx4_flow_create_action_queue(struct priv *priv,
 static struct rte_flow *
 mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
+		 const struct rte_flow_item pattern[],
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
+	const struct rte_flow_action *action;
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *rte_flow;
-	struct mlx4_flow_action action;
+	struct mlx4_flow_target target;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
 	int err;
 
-	err = mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
+	err = mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 	if (err)
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
@@ -914,31 +901,33 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.port = priv->port,
 		.flags = 0,
 	};
-	claim_zero(mlx4_flow_prepare(priv, attr, items, actions,
+	claim_zero(mlx4_flow_prepare(priv, attr, pattern, actions,
 				     error, &flow));
-	action = (struct mlx4_flow_action){
+	target = (struct mlx4_flow_target){
 		.queue = 0,
 		.drop = 0,
 	};
-	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
-		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions;
+	     action->type != RTE_FLOW_ACTION_TYPE_END;
+	     ++action) {
+		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
 			continue;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			action.queue = 1;
-			action.queue_id =
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			target.queue = 1;
+			target.queue_id =
 				((const struct rte_flow_action_queue *)
-				 actions->conf)->index;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
-			action.drop = 1;
+				 action->conf)->index;
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.drop = 1;
 		} else {
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions, "unsupported action");
+					   action, "unsupported action");
 			goto exit;
 		}
 	}
-	rte_flow = mlx4_flow_create_action_queue(priv, flow.ibv_attr,
-						 &action, error);
+	rte_flow = mlx4_flow_create_target_queue(priv, flow.ibv_attr,
+						 &target, error);
 	if (rte_flow) {
 		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
 		DEBUG("Flow created %p", (void *)rte_flow);
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 8ac09f1..358efbe 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -70,7 +70,7 @@ struct mlx4_flow {
 };
 
 /** Flow rule target descriptor. */
-struct mlx4_flow_action {
+struct mlx4_flow_target {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint32_t queue_id; /**< Identifier of the queue. */
-- 
2.1.4

* [PATCH v1 07/29] net/mlx4: tidy up flow rule handling code
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

- Remove unnecessary casts.
- Replace consecutive if/else blocks with switch statements.
- Use proper big endian definitions for mask values.
- Make end marker checks of item and action lists less verbose since they
  are explicitly documented as being equal to 0.
- Remove unnecessary NULL check on action configuration structure.

This commit does not cause any functional change.
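
For example, the VLAN TCI mask no longer needs a byte-order
conditional since RTE_BE16() converts the constant at compile time
(sketch):

    #include <rte_byteorder.h>

    /* Both definitions place the same bytes (0x0f, 0xff) in memory;
     * the second documents the wire format without a conditional. */
    #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
    static const uint16_t vid_mask_old = 0x0fff;
    #else
    static const uint16_t vid_mask_old = 0xff0f;
    #endif
    static const rte_be16_t vid_mask_new = RTE_BE16(0x0fff);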

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 115 ++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 62 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index e5854c6..fa56419 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -53,6 +53,7 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_byteorder.h>
 #include <rte_errno.h>
 #include <rte_eth_ctrl.h>
 #include <rte_ethdev.h>
@@ -108,7 +109,7 @@ struct mlx4_flow_proc_item {
 	 *   rte_flow item to convert.
 	 * @param default_mask
 	 *   Default bit-masks to use when item->mask is not provided.
-	 * @param data
+	 * @param flow
 	 *   Internal structure to store the conversion.
 	 *
 	 * @return
@@ -116,7 +117,7 @@ struct mlx4_flow_proc_item {
 	 */
 	int (*convert)(const struct rte_flow_item *item,
 		       const void *default_mask,
-		       void *data);
+		       struct mlx4_flow *flow);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -135,17 +136,16 @@ struct rte_flow_drop {
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_eth(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
 	const struct rte_flow_item_eth *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 	unsigned int i;
@@ -182,17 +182,16 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_vlan(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      void *data)
+		      struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
 	const struct rte_flow_item_vlan *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 
@@ -214,17 +213,16 @@ mlx4_flow_create_vlan(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      void *data)
+		      struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
 	const struct rte_flow_item_ipv4 *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_ipv4 *ipv4;
 	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
 
@@ -260,17 +258,16 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_udp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
 	const struct rte_flow_item_udp *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_tcp_udp *udp;
 	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
@@ -302,17 +299,16 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_tcp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
 	const struct rte_flow_item_tcp *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_tcp_udp *tcp;
 	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
@@ -496,12 +492,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_vlan){
-		/* rte_flow_item_vlan_mask is invalid for mlx4. */
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-			.tci = 0x0fff,
-#else
-			.tci = 0xff0f,
-#endif
+			/* Only TCI VID matching is supported. */
+			.tci = RTE_BE16(0x0fff),
 		},
 		.mask_sz = sizeof(struct rte_flow_item_vlan),
 		.validate = mlx4_flow_validate_vlan,
@@ -513,8 +505,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 				       RTE_FLOW_ITEM_TYPE_TCP),
 		.mask = &(const struct rte_flow_item_ipv4){
 			.hdr = {
-				.src_addr = -1,
-				.dst_addr = -1,
+				.src_addr = RTE_BE32(0xffffffff),
+				.dst_addr = RTE_BE32(0xffffffff),
 			},
 		},
 		.default_mask = &rte_flow_item_ipv4_mask,
@@ -526,8 +518,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
-				.src_port = -1,
-				.dst_port = -1,
+				.src_port = RTE_BE16(0xffff),
+				.dst_port = RTE_BE16(0xffff),
 			},
 		},
 		.default_mask = &rte_flow_item_udp_mask,
@@ -539,8 +531,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
 		.mask = &(const struct rte_flow_item_tcp){
 			.hdr = {
-				.src_port = -1,
-				.dst_port = -1,
+				.src_port = RTE_BE16(0xffff),
+				.dst_port = RTE_BE16(0xffff),
 			},
 		},
 		.default_mask = &rte_flow_item_tcp_mask,
@@ -627,7 +619,7 @@ mlx4_flow_prepare(struct priv *priv,
 		return -rte_errno;
 	}
 	/* Go over pattern. */
-	for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; ++item) {
+	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
 		unsigned int i;
 		int err;
@@ -641,7 +633,7 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
 			const struct rte_flow_item *next = item + 1;
 
-			if (next->type != RTE_FLOW_ITEM_TYPE_END) {
+			if (next->type) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
 						   item,
@@ -650,10 +642,7 @@ mlx4_flow_prepare(struct priv *priv,
 				return -rte_errno;
 			}
 		}
-		for (i = 0;
-		     proc->next_item &&
-		     proc->next_item[i] != RTE_FLOW_ITEM_TYPE_END;
-		     ++i) {
+		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
 				next = &mlx4_flow_proc_item_list[item->type];
 				break;
@@ -680,22 +669,22 @@ mlx4_flow_prepare(struct priv *priv,
 	if (priv->isolated && flow->ibv_attr)
 		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
-	for (action = actions;
-	     action->type != RTE_FLOW_ACTION_TYPE_END;
-	     ++action) {
-		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions; action->type; ++action) {
+		switch (action->type) {
+			const struct rte_flow_action_queue *queue;
+
+		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+		case RTE_FLOW_ACTION_TYPE_DROP:
 			target.drop = 1;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			const struct rte_flow_action_queue *queue =
-				action->conf;
-
-			if (!queue || (queue->index >
-				       (priv->dev->data->nb_rx_queues - 1)))
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = action->conf;
+			if (queue->index >= priv->dev->data->nb_rx_queues)
 				goto exit_action_not_supported;
 			target.queue = 1;
-		} else {
+			break;
+		default:
 			goto exit_action_not_supported;
 		}
 	}
@@ -907,19 +896,21 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.queue = 0,
 		.drop = 0,
 	};
-	for (action = actions;
-	     action->type != RTE_FLOW_ACTION_TYPE_END;
-	     ++action) {
-		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions; action->type; ++action) {
+		switch (action->type) {
+			const struct rte_flow_action_queue *queue;
+
+		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = action->conf;
 			target.queue = 1;
-			target.queue_id =
-				((const struct rte_flow_action_queue *)
-				 action->conf)->index;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.queue_id = queue->index;
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
 			target.drop = 1;
-		} else {
+			break;
+		default:
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
 					   action, "unsupported action");
-- 
2.1.4

* [PATCH v1 08/29] net/mlx4: compact flow rule error reporting
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC
  To: Ferruh Yigit; +Cc: dev

Relying on rte_errno is not necessary where the return value of
rte_flow_error_set() can be used directly.

A related minor change is switching from RTE_FLOW_ERROR_TYPE_HANDLE to
RTE_FLOW_ERROR_TYPE_UNSPECIFIED when no rte_flow handle is involved in the
error, specifically when none is allocated yet.

This commit does not cause any functional change.
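
Both forms below are equivalent now that rte_flow_error_set() returns
the negated error code; this patch switches mlx4 to the shorter one
(sketch; function names are illustrative):

    #include <errno.h>
    #include <rte_errno.h>
    #include <rte_flow.h>

    /* Before: call the helper, then fetch the code from rte_errno. */
    static int
    check_group_old(const struct rte_flow_attr *attr,
                    struct rte_flow_error *error)
    {
        if (attr->group) {
            rte_flow_error_set(error, ENOTSUP,
                               RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
                               NULL, "groups are not supported");
            return -rte_errno;
        }
        return 0;
    }

    /* After: use the helper's negative return value directly. */
    static int
    check_group_new(const struct rte_flow_attr *attr,
                    struct rte_flow_error *error)
    {
        if (attr->group)
            return rte_flow_error_set
                (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
                 NULL, "groups are not supported");
        return 0;
    }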

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 102 ++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index fa56419..000f17f 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -579,45 +579,30 @@ mlx4_flow_prepare(struct priv *priv,
 	};
 	uint32_t priority_override = 0;
 
-	if (attr->group) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-				   NULL,
-				   "groups are not supported");
-		return -rte_errno;
-	}
-	if (priv->isolated) {
+	if (attr->group)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+			 NULL, "groups are not supported");
+	if (priv->isolated)
 		priority_override = attr->priority;
-	} else if (attr->priority) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-				   NULL,
-				   "priorities are not supported outside"
-				   " isolated mode");
-		return -rte_errno;
-	}
-	if (attr->priority > MLX4_FLOW_PRIORITY_LAST) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-				   NULL,
-				   "maximum priority level is "
-				   MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
-		return -rte_errno;
-	}
-	if (attr->egress) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
-				   NULL,
-				   "egress is not supported");
-		return -rte_errno;
-	}
-	if (!attr->ingress) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
-				   NULL,
-				   "only ingress is supported");
-		return -rte_errno;
-	}
+	else if (attr->priority)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+			 NULL,
+			 "priorities are not supported outside isolated mode");
+	if (attr->priority > MLX4_FLOW_PRIORITY_LAST)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+			 NULL, "maximum priority level is "
+			 MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
+	if (attr->egress)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
+			 NULL, "egress is not supported");
+	if (!attr->ingress)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+			 NULL, "only ingress is supported");
 	/* Go over pattern. */
 	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
@@ -633,14 +618,11 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
 			const struct rte_flow_item *next = item + 1;
 
-			if (next->type) {
-				rte_flow_error_set(error, ENOTSUP,
-						   RTE_FLOW_ERROR_TYPE_ITEM,
-						   item,
-						   "the rule requires"
-						   " an Ethernet spec");
-				return -rte_errno;
-			}
+			if (next->type)
+				return rte_flow_error_set
+					(error, ENOTSUP,
+					 RTE_FLOW_ERROR_TYPE_ITEM, item,
+					 "the rule requires an Ethernet spec");
 		}
 		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
@@ -688,20 +670,17 @@ mlx4_flow_prepare(struct priv *priv,
 			goto exit_action_not_supported;
 		}
 	}
-	if (!target.queue && !target.drop) {
-		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_HANDLE,
-				   NULL, "no valid action");
-		return -rte_errno;
-	}
+	if (!target.queue && !target.drop)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "no valid action");
 	return 0;
 exit_item_not_supported:
-	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-			   item, "item not supported");
-	return -rte_errno;
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, "item not supported");
 exit_action_not_supported:
-	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-			   action, "action not supported");
-	return -rte_errno;
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				  action, "action not supported");
 }
 
 /**
@@ -824,7 +803,8 @@ mlx4_flow_create_target_queue(struct priv *priv,
 	assert(priv->ctx);
 	rte_flow = rte_calloc(__func__, 1, sizeof(*rte_flow), 0);
 	if (!rte_flow) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "cannot allocate flow memory");
 		return NULL;
 	}
@@ -841,7 +821,8 @@ mlx4_flow_create_target_queue(struct priv *priv,
 		return rte_flow;
 	rte_flow->ibv_flow = ibv_create_flow(qp, rte_flow->ibv_attr);
 	if (!rte_flow->ibv_flow) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "flow rule creation failure");
 		goto error;
 	}
@@ -876,7 +857,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
 	if (!flow.ibv_attr) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "cannot allocate ibv_attr memory");
 		return NULL;
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 09/29] mem: add iovec-like allocation wrappers
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (7 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 08/29] net/mlx4: compact flow rule error reporting Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 21:58   ` Ferruh Yigit
  2017-10-11 14:35 ` [PATCH v1 10/29] net/mlx4: merge flow creation and validation code Adrien Mazarguil
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

These wrappers implement the ability to allocate room for several disparate
objects as a single contiguous allocation while complying with their
respective alignment constraints.

This is usually more efficient than allocating and freeing them
individually if they are not expected to be reallocated with rte_realloc().

A typical use case is when several objects that cannot be dissociated must
be allocated together, as shown in the following example:

 struct b {
    ...
    struct d *d;
 };

 struct a {
     ...
     struct b *b;
     struct c *c;
 };

 struct rte_malloc_vec vec[] = {
     { .size = sizeof(struct a), .addr = &ptr_a, },
     { .size = sizeof(struct b), .addr = &ptr_b, },
     { .size = sizeof(struct c), .addr = &ptr_c, },
     { .size = sizeof(struct d), .addr = &ptr_d, },
 };

 if (!rte_mallocv(NULL, vec, RTE_DIM(vec)))
     goto error;

 struct a *a = ptr_a;

 a->b = ptr_b;
 a->c = ptr_c;
 a->b->d = ptr_d;
 ...
 rte_free(a);
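
As documented in the header below, storing NULL in the first entry turns
this into a dry run that computes the layout without allocating anything,
e.g. (illustrative):

 struct rte_malloc_vec vec[] = {
     { .size = sizeof(struct a), .addr = NULL, },
     { .size = sizeof(struct b), .addr = &ptr_b, },
 };

 /* Returns the total region size including padding; ptr_b receives the
  * offset of its object relative to zero instead of an address. */
 size_t size = rte_mallocv(NULL, vec, RTE_DIM(vec));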

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  9 ++
 lib/librte_eal/common/include/rte_malloc.h      | 85 ++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c              | 92 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  9 ++
 4 files changed, 195 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index de25582..1ab50a5 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -201,6 +201,15 @@ DPDK_17.08 {
 
 } DPDK_17.05;
 
+DPDK_17.11 {
+	global:
+
+	rte_mallocv;
+	rte_mallocv_socket;
+	rte_zmallocv;
+	rte_zmallocv_socket;
+} DPDK_17.08;
+
 EXPERIMENTAL {
 	global:
 
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 3d37f79..545697c 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -60,6 +60,13 @@ struct rte_malloc_socket_stats {
 	size_t heap_allocsz_bytes; /**< Total allocated bytes on heap */
 };
 
+/** Object description used with rte_mallocv() and similar functions. */
+struct rte_malloc_vec {
+	size_t align; /**< Alignment constraint (power of 2), 0 if unknown. */
+	size_t size; /**< Object size. */
+	void **addr; /**< Storage for allocation address. */
+};
+
 /**
  * This function allocates memory from the huge-page area of memory. The memory
  * is not cleared. In NUMA systems, the memory allocated resides on the same
@@ -335,6 +342,84 @@ rte_malloc_set_limit(const char *type, size_t max);
 phys_addr_t
 rte_malloc_virt2phy(const void *addr);
 
+/**
+ * Allocate memory once for several disparate objects.
+ *
+ * This function adds iovec-like semantics (e.g. readv()) to rte_malloc().
+ * Memory is allocated once for several contiguous objects of nonuniform
+ * sizes and alignment constraints.
+ *
+ * Each entry of @p vec describes the size, alignment constraint and
+ * provides a buffer address where the resulting object pointer must be
+ * stored.
+ *
+ * The buffer of the first entry is guaranteed to point to the beginning of
+ * the allocated region and is safe to use with rte_free().
+ *
+ * NULL buffers are silently ignored.
+ *
+ * Providing a NULL buffer in the first entry prevents this function from
+ * allocating any memory but has otherwise no effect on its behavior. In
+ * this case, the contents of remaining non-NULL buffers are updated with
+ * addresses relative to zero (i.e. offsets that would have been used during
+ * the allocation).
+ *
+ * @param[in] type
+ *   A string identifying the type of allocated objects (useful for debug
+ *   purposes, such as identifying the cause of a memory leak). Can be NULL.
+ * @param[in, out] vec
+ *   Description of objects to allocate memory for.
+ * @param cnt
+ *   Number of entries in @p vec.
+ *
+ * @return
+ *   Size in bytes of the allocated region including any padding. In case of
+ *   error, rte_errno is set, 0 is returned and NULL is stored in the
+ *   non-NULL buffers pointed to by @p vec.
+ *
+ * @see rte_malloc()
+ * @see struct rte_malloc_vec
+ */
+size_t
+rte_mallocv(const char *type, const struct rte_malloc_vec *vec,
+	    unsigned int cnt);
+
+/**
+ * Combines the semantics of rte_mallocv() with those of rte_zmalloc().
+ *
+ * @see rte_mallocv()
+ * @see rte_zmalloc()
+ */
+size_t
+rte_zmallocv(const char *type, const struct rte_malloc_vec *vec,
+	     unsigned int cnt);
+
+/**
+ * Socket-aware version of rte_mallocv().
+ *
+ * This function takes one additional parameter.
+ *
+ * @param socket
+ *   NUMA socket to allocate memory on. If SOCKET_ID_ANY is used, this
+ *   function will behave the same as rte_mallocv().
+ *
+ * @see rte_mallocv()
+ */
+size_t
+rte_mallocv_socket(const char *type, const struct rte_malloc_vec *vec,
+		   unsigned int cnt, int socket);
+
+/**
+ * Combines the semantics of rte_mallocv_socket() with those of
+ * rte_zmalloc_socket().
+ *
+ * @see rte_mallocv_socket()
+ * @see rte_zmalloc_socket()
+ */
+size_t
+rte_zmallocv_socket(const char *type, const struct rte_malloc_vec *vec,
+		    unsigned int cnt, int socket);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index d65c05a..6c7a4f7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -41,6 +41,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -265,3 +266,94 @@ rte_malloc_virt2phy(const void *addr)
 			((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 	return paddr;
 }
+
+/*
+ * Internal helper to allocate memory once for several disparate objects.
+ *
+ * The most restrictive alignment constraint for standard objects is assumed
+ * to be sizeof(double) and is used as a default value.
+ *
+ * C11 code would include stdalign.h and use alignof(max_align_t); however,
+ * we'll stick with C99 for the time being.
+ */
+static inline size_t
+rte_mallocv_inline(const char *type, const struct rte_malloc_vec *vec,
+		   unsigned int cnt, int zero, int socket)
+{
+	unsigned int i;
+	size_t size;
+	size_t least;
+	uint8_t *data = NULL;
+	int fill = !vec[0].addr;
+
+fill:
+	size = 0;
+	least = 0;
+	for (i = 0; i < cnt; ++i) {
+		size_t align = (uintptr_t)vec[i].align;
+
+		if (!align) {
+			align = sizeof(double);
+		} else if (!rte_is_power_of_2(align)) {
+			rte_errno = EINVAL;
+			goto error;
+		}
+		if (least < align)
+			least = align;
+		align = RTE_ALIGN_CEIL(size, align);
+		size = align + vec[i].size;
+		if (fill && vec[i].addr)
+			*vec[i].addr = data + align;
+	}
+	if (fill)
+		return size;
+	if (!zero)
+		data = rte_malloc_socket(type, size, least, socket);
+	else
+		data = rte_zmalloc_socket(type, size, least, socket);
+	if (data) {
+		fill = 1;
+		goto fill;
+	}
+	rte_errno = ENOMEM;
+error:
+	for (i = 0; i != cnt; ++i)
+		if (vec[i].addr)
+			*vec[i].addr = NULL;
+	return 0;
+}
+
+/* Allocate memory once for several disparate objects. */
+size_t
+rte_mallocv(const char *type, const struct rte_malloc_vec *vec,
+	    unsigned int cnt)
+{
+	return rte_mallocv_inline(type, vec, cnt, 0, SOCKET_ID_ANY);
+}
+
+/* Combines the semantics of rte_mallocv() with those of rte_zmalloc(). */
+size_t
+rte_zmallocv(const char *type, const struct rte_malloc_vec *vec,
+	     unsigned int cnt)
+{
+	return rte_mallocv_inline(type, vec, cnt, 1, SOCKET_ID_ANY);
+}
+
+/* Socket-aware version of rte_mallocv(). */
+size_t
+rte_mallocv_socket(const char *type, const struct rte_malloc_vec *vec,
+		   unsigned int cnt, int socket)
+{
+	return rte_mallocv_inline(type, vec, cnt, 0, socket);
+}
+
+/*
+ * Combines the semantics of rte_mallocv_socket() with those of
+ * rte_zmalloc_socket().
+ */
+size_t
+rte_zmallocv_socket(const char *type, const struct rte_malloc_vec *vec,
+		    unsigned int cnt, int socket)
+{
+	return rte_mallocv_inline(type, vec, cnt, 1, socket);
+}
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 146156e..d620da3 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -205,6 +205,15 @@ DPDK_17.08 {
 
 } DPDK_17.05;
 
+DPDK_17.11 {
+	global:
+
+	rte_mallocv;
+	rte_mallocv_socket;
+	rte_zmallocv;
+	rte_zmallocv_socket;
+} DPDK_17.08;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 10/29] net/mlx4: merge flow creation and validation code
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (8 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 09/29] mem: add iovec-like allocation wrappers Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 11/29] net/mlx4: allocate drop flow resources on demand Adrien Mazarguil
                   ` (19 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

These functions share a significant amount of code and require extra
internal objects to parse and build flow rule handles.

All this can be simplified by relying directly on the internal rte_flow
structure definition, whose QP pointer (destination Verbs queue) is
replaced by a DPDK queue ID and other properties, making it more versatile
without increasing its size (at least on 64-bit platforms).

This commit also gets rid of a few unnecessary debugging messages.
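
The key implementation detail is the two-pass "goto fill" pattern in
mlx4_flow_prepare(): the first pass runs against a dummy handle on the
stack to validate the rule and measure the required space, the second pass
fills the real handle once allocated. A minimal standalone sketch of the
same pattern (hypothetical names, unrelated to the PMD):

 #include <stdlib.h>
 #include <string.h>

 struct handle {
     size_t size; /* Space consumed so far. */
     char *data;  /* Destination buffer, unused by the first pass. */
 };

 static struct handle *
 prepare(const char *const *items)
 {
     struct handle temp = { .size = 0, .data = NULL };
     struct handle *h = &temp;
     unsigned int i;

 fill:
     for (i = 0; items[i]; ++i) {
         size_t len = strlen(items[i]) + 1;

         /* First pass (h == &temp) only validates and measures,
          * second pass performs the actual conversion. */
         if (h != &temp)
             memcpy(h->data + h->size, items[i], len);
         h->size += len;
     }
     if (h == &temp) {
         /* Allocate the real handle based on collected data. */
         h = malloc(sizeof(*h) + temp.size);
         if (!h)
             return NULL;
         *h = (struct handle){ .size = 0, .data = (char *)(h + 1) };
         goto fill;
     }
     return h;
 }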

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 300 ++++++++++++++++++--------------------
 drivers/net/mlx4/mlx4_flow.h |  16 +-
 2 files changed, 148 insertions(+), 168 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 000f17f..0a736b1 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -39,6 +39,7 @@
 #include <arpa/inet.h>
 #include <assert.h>
 #include <errno.h>
+#include <stdalign.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
@@ -110,14 +111,14 @@ struct mlx4_flow_proc_item {
 	 * @param default_mask
 	 *   Default bit-masks to use when item->mask is not provided.
 	 * @param flow
-	 *   Internal structure to store the conversion.
+	 *   Flow rule handle to update.
 	 *
 	 * @return
 	 *   0 on success, negative value otherwise.
 	 */
 	int (*convert)(const struct rte_flow_item *item,
 		       const void *default_mask,
-		       struct mlx4_flow *flow);
+		       struct rte_flow *flow);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -137,12 +138,12 @@ struct rte_flow_drop {
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_eth(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
 	const struct rte_flow_item_eth *mask = item->mask;
@@ -152,7 +153,7 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 2;
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
 		.size = eth_size,
@@ -183,19 +184,20 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_vlan(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      struct mlx4_flow *flow)
+		      struct rte_flow *flow)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
 	const struct rte_flow_item_vlan *mask = item->mask;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset - eth_size);
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
+		       eth_size);
 	if (!spec)
 		return 0;
 	if (!mask)
@@ -214,12 +216,12 @@ mlx4_flow_create_vlan(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      struct mlx4_flow *flow)
+		      struct rte_flow *flow)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
 	const struct rte_flow_item_ipv4 *mask = item->mask;
@@ -228,7 +230,7 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 1;
-	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
 		.size = ipv4_size,
@@ -259,12 +261,12 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_udp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
 	const struct rte_flow_item_udp *mask = item->mask;
@@ -273,7 +275,7 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 0;
-	udp = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
 		.size = udp_size,
@@ -300,12 +302,12 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_tcp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
 	const struct rte_flow_item_tcp *mask = item->mask;
@@ -314,7 +316,7 @@ mlx4_flow_create_tcp(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 0;
-	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
 		.size = tcp_size,
@@ -556,8 +558,9 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
  *   Associated actions (list terminated by the END action).
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
- * @param[in, out] flow
- *   Flow structure to update.
+ * @param[in, out] addr
+ *   Buffer where the resulting flow rule handle pointer must be stored.
+ *   If NULL, stop processing after validation stage.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
@@ -568,15 +571,13 @@ mlx4_flow_prepare(struct priv *priv,
 		  const struct rte_flow_item pattern[],
 		  const struct rte_flow_action actions[],
 		  struct rte_flow_error *error,
-		  struct mlx4_flow *flow)
+		  struct rte_flow **addr)
 {
 	const struct rte_flow_item *item;
 	const struct rte_flow_action *action;
-	const struct mlx4_flow_proc_item *proc = mlx4_flow_proc_item_list;
-	struct mlx4_flow_target target = {
-		.queue = 0,
-		.drop = 0,
-	};
+	const struct mlx4_flow_proc_item *proc;
+	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
+	struct rte_flow *flow = &temp;
 	uint32_t priority_override = 0;
 
 	if (attr->group)
@@ -603,6 +604,8 @@ mlx4_flow_prepare(struct priv *priv,
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
 			 NULL, "only ingress is supported");
+fill:
+	proc = mlx4_flow_proc_item_list;
 	/* Go over pattern. */
 	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
@@ -633,10 +636,12 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!next)
 			goto exit_item_not_supported;
 		proc = next;
-		err = proc->validate(item, proc->mask, proc->mask_sz);
-		if (err)
-			goto exit_item_not_supported;
-		if (flow->ibv_attr && proc->convert) {
+		/* Perform validation once, while handle is not allocated. */
+		if (flow == &temp) {
+			err = proc->validate(item, proc->mask, proc->mask_sz);
+			if (err)
+				goto exit_item_not_supported;
+		} else if (proc->convert) {
 			err = proc->convert(item,
 					    (proc->default_mask ?
 					     proc->default_mask :
@@ -645,10 +650,10 @@ mlx4_flow_prepare(struct priv *priv,
 			if (err)
 				goto exit_item_not_supported;
 		}
-		flow->offset += proc->dst_sz;
+		flow->ibv_attr_size += proc->dst_sz;
 	}
 	/* Use specified priority level when in isolated mode. */
-	if (priv->isolated && flow->ibv_attr)
+	if (priv->isolated && flow != &temp)
 		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
 	for (action = actions; action->type; ++action) {
@@ -658,22 +663,59 @@ mlx4_flow_prepare(struct priv *priv,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			target.drop = 1;
+			flow->drop = 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			queue = action->conf;
 			if (queue->index >= priv->dev->data->nb_rx_queues)
 				goto exit_action_not_supported;
-			target.queue = 1;
+			flow->queue = 1;
+			flow->queue_id = queue->index;
 			break;
 		default:
 			goto exit_action_not_supported;
 		}
 	}
-	if (!target.queue && !target.drop)
+	if (!flow->queue && !flow->drop)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "no valid action");
+	/* Validation ends here. */
+	if (!addr)
+		return 0;
+	if (flow == &temp) {
+		/* Allocate proper handle based on collected data. */
+		const struct rte_malloc_vec vec[] = {
+			{
+				.align = alignof(struct rte_flow),
+				.size = sizeof(*flow),
+				.addr = (void **)&flow,
+			},
+			{
+				.align = alignof(struct ibv_flow_attr),
+				.size = temp.ibv_attr_size,
+				.addr = (void **)&temp.ibv_attr,
+			},
+		};
+
+		if (!rte_zmallocv(__func__, vec, RTE_DIM(vec)))
+			return rte_flow_error_set
+				(error, rte_errno,
+				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				 "flow rule handle allocation failure");
+		/* Most fields will be updated by second pass. */
+		*flow = (struct rte_flow){
+			.ibv_attr = temp.ibv_attr,
+			.ibv_attr_size = sizeof(*flow->ibv_attr),
+		};
+		*flow->ibv_attr = (struct ibv_flow_attr){
+			.type = IBV_FLOW_ATTR_NORMAL,
+			.size = sizeof(*flow->ibv_attr),
+			.port = priv->port,
+		};
+		goto fill;
+	}
+	*addr = flow;
 	return 0;
 exit_item_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
@@ -697,9 +739,8 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, pattern, actions, error, NULL);
 }
 
 /**
@@ -776,60 +817,66 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 }
 
 /**
- * Complete flow rule creation.
+ * Toggle a configured flow rule.
  *
  * @param priv
  *   Pointer to private structure.
- * @param ibv_attr
- *   Verbs flow attributes.
- * @param target
- *   Rule target descriptor.
+ * @param flow
+ *   Flow rule handle to toggle.
+ * @param enable
+ *   Whether associated Verbs flow must be created or removed.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
  *
  * @return
- *   A flow if the rule could be created.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static struct rte_flow *
-mlx4_flow_create_target_queue(struct priv *priv,
-			      struct ibv_flow_attr *ibv_attr,
-			      struct mlx4_flow_target *target,
-			      struct rte_flow_error *error)
+static int
+mlx4_flow_toggle(struct priv *priv,
+		 struct rte_flow *flow,
+		 int enable,
+		 struct rte_flow_error *error)
 {
-	struct ibv_qp *qp;
-	struct rte_flow *rte_flow;
-
-	assert(priv->pd);
-	assert(priv->ctx);
-	rte_flow = rte_calloc(__func__, 1, sizeof(*rte_flow), 0);
-	if (!rte_flow) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "cannot allocate flow memory");
-		return NULL;
+	struct ibv_qp *qp = NULL;
+	const char *msg;
+	int err;
+
+	if (!enable) {
+		if (!flow->ibv_flow)
+			return 0;
+		claim_zero(ibv_destroy_flow(flow->ibv_flow));
+		flow->ibv_flow = NULL;
+		return 0;
 	}
-	if (target->drop) {
-		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
-	} else {
-		struct rxq *rxq = priv->dev->data->rx_queues[target->queue_id];
+	if (flow->ibv_flow)
+		return 0;
+	assert(flow->queue ^ flow->drop);
+	if (flow->queue) {
+		struct rxq *rxq;
 
+		assert(flow->queue_id < priv->dev->data->nb_rx_queues);
+		rxq = priv->dev->data->rx_queues[flow->queue_id];
+		if (!rxq) {
+			err = EINVAL;
+			msg = "target queue must be configured first";
+			goto error;
+		}
 		qp = rxq->qp;
-		rte_flow->qp = qp;
 	}
-	rte_flow->ibv_attr = ibv_attr;
-	if (!priv->started)
-		return rte_flow;
-	rte_flow->ibv_flow = ibv_create_flow(qp, rte_flow->ibv_attr);
-	if (!rte_flow->ibv_flow) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "flow rule creation failure");
-		goto error;
+	if (flow->drop) {
+		assert(priv->flow_drop_queue);
+		qp = priv->flow_drop_queue->qp;
 	}
-	return rte_flow;
+	assert(qp);
+	assert(flow->ibv_attr);
+	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
+	if (flow->ibv_flow)
+		return 0;
+	err = errno;
+	msg = "flow rule rejected by device";
 error:
-	rte_free(rte_flow);
-	return NULL;
+	return rte_flow_error_set
+		(error, err, RTE_FLOW_ERROR_TYPE_HANDLE, flow, msg);
 }
 
 /**
@@ -845,69 +892,21 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
-	const struct rte_flow_action *action;
 	struct priv *priv = dev->data->dev_private;
-	struct rte_flow *rte_flow;
-	struct mlx4_flow_target target;
-	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
+	struct rte_flow *flow;
 	int err;
 
 	err = mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 	if (err)
 		return NULL;
-	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
-	if (!flow.ibv_attr) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "cannot allocate ibv_attr memory");
-		return NULL;
+	err = mlx4_flow_toggle(priv, flow, priv->started, error);
+	if (!err) {
+		LIST_INSERT_HEAD(&priv->flows, flow, next);
+		return flow;
 	}
-	flow.offset = sizeof(struct ibv_flow_attr);
-	*flow.ibv_attr = (struct ibv_flow_attr){
-		.comp_mask = 0,
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.size = sizeof(struct ibv_flow_attr),
-		.priority = attr->priority,
-		.num_of_specs = 0,
-		.port = priv->port,
-		.flags = 0,
-	};
-	claim_zero(mlx4_flow_prepare(priv, attr, pattern, actions,
-				     error, &flow));
-	target = (struct mlx4_flow_target){
-		.queue = 0,
-		.drop = 0,
-	};
-	for (action = actions; action->type; ++action) {
-		switch (action->type) {
-			const struct rte_flow_action_queue *queue;
-
-		case RTE_FLOW_ACTION_TYPE_VOID:
-			continue;
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			queue = action->conf;
-			target.queue = 1;
-			target.queue_id = queue->index;
-			break;
-		case RTE_FLOW_ACTION_TYPE_DROP:
-			target.drop = 1;
-			break;
-		default:
-			rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   action, "unsupported action");
-			goto exit;
-		}
-	}
-	rte_flow = mlx4_flow_create_target_queue(priv, flow.ibv_attr,
-						 &target, error);
-	if (rte_flow) {
-		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
-		DEBUG("Flow created %p", (void *)rte_flow);
-		return rte_flow;
-	}
-exit:
-	rte_free(flow.ibv_attr);
+	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   error ? error->message : NULL);
+	rte_free(flow);
 	return NULL;
 }
 
@@ -939,7 +938,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy a flow.
+ * Destroy a flow rule.
  *
  * @see rte_flow_destroy()
  * @see rte_flow_ops
@@ -949,19 +948,18 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
 {
-	(void)dev;
-	(void)error;
+	struct priv *priv = dev->data->dev_private;
+	int err = mlx4_flow_toggle(priv, flow, 0, error);
+
+	if (err)
+		return err;
 	LIST_REMOVE(flow, next);
-	if (flow->ibv_flow)
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-	rte_free(flow->ibv_attr);
-	DEBUG("Flow destroyed %p", (void *)flow);
 	rte_free(flow);
 	return 0;
 }
 
 /**
- * Destroy all flows.
+ * Destroy all flow rules.
  *
  * @see rte_flow_flush()
  * @see rte_flow_ops
@@ -982,9 +980,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 }
 
 /**
- * Remove all flows.
- *
- * Called by dev_stop() to remove all flows.
+ * Disable flow rules.
  *
  * @param priv
  *   Pointer to private structure.
@@ -997,27 +993,24 @@ mlx4_flow_stop(struct priv *priv)
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-		flow->ibv_flow = NULL;
-		DEBUG("Flow %p removed", (void *)flow);
+		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
 	}
 	mlx4_flow_destroy_drop_queue(priv);
 }
 
 /**
- * Add all flows.
+ * Enable flow rules.
  *
  * @param priv
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, a errno value otherwise and rte_errno is set.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx4_flow_start(struct priv *priv)
 {
 	int ret;
-	struct ibv_qp *qp;
 	struct rte_flow *flow;
 
 	ret = mlx4_flow_create_drop_queue(priv);
@@ -1026,14 +1019,11 @@ mlx4_flow_start(struct priv *priv)
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-		qp = flow->qp ? flow->qp : priv->flow_drop_queue->qp;
-		flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
-		if (!flow->ibv_flow) {
-			DEBUG("Flow %p cannot be applied", (void *)flow);
-			rte_errno = EINVAL;
-			return rte_errno;
+		ret = mlx4_flow_toggle(priv, flow, 1, NULL);
+		if (unlikely(ret)) {
+			mlx4_flow_stop(priv);
+			return ret;
 		}
-		DEBUG("Flow %p applied", (void *)flow);
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 358efbe..68ffb33 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -60,20 +60,10 @@ struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
-	struct ibv_qp *qp; /**< Verbs queue pair. */
-};
-
-/** Structure to pass to the conversion function. */
-struct mlx4_flow {
-	struct ibv_flow_attr *ibv_attr; /**< Verbs attribute. */
-	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
-};
-
-/** Flow rule target descriptor. */
-struct mlx4_flow_target {
-	uint32_t drop:1; /**< Target is a drop queue. */
+	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-	uint32_t queue_id; /**< Identifier of the queue. */
+	uint16_t queue_id; /**< Target queue. */
 };
 
 /* mlx4_flow.c */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 11/29] net/mlx4: allocate drop flow resources on demand
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (9 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 10/29] net/mlx4: merge flow creation and validation code Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 12/29] net/mlx4: relax check on missing flow rule target Adrien Mazarguil
                   ` (18 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Verbs QP and CQ resources for drop flow rules do not need to be permanently
allocated, only when at least one rule needs them.

Besides, the name "struct rte_flow_drop" lies outside the mlx4 PMD
namespace and should never have been used in the first place. struct
rte_flow is currently the only exception to this rule.
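
The resulting mlx4_drop_get()/mlx4_drop_put() pair follows the usual
reference-counted lazy singleton pattern. Reduced to its core (sketch;
create_drop_resources()/destroy_drop_resources() stand in for the actual
Verbs CQ/QP calls and error handling is omitted):

 struct mlx4_drop *
 mlx4_drop_get(struct priv *priv)
 {
     if (priv->drop) {
         ++priv->drop->refcnt; /* Share the existing instance. */
         return priv->drop;
     }
     priv->drop = create_drop_resources(priv); /* QP + CQ, refcnt = 1. */
     return priv->drop;
 }

 void
 mlx4_drop_put(struct mlx4_drop *drop)
 {
     if (--drop->refcnt)
         return; /* Still referenced by other drop rules. */
     drop->priv->drop = NULL;
     destroy_drop_resources(drop); /* QP, CQ and the object itself. */
 }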

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h      |   3 +-
 drivers/net/mlx4/mlx4_flow.c | 138 ++++++++++++++++++++------------------
 2 files changed, 74 insertions(+), 67 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 1799951..f71679b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -88,6 +88,7 @@ enum {
 /** Driver name reported to lower layers and used in log output. */
 #define MLX4_DRIVER_NAME "net_mlx4"
 
+struct mlx4_drop;
 struct rxq;
 struct txq;
 struct rte_flow;
@@ -108,7 +109,7 @@ struct priv {
 	uint32_t intr_alarm:1; /**< An interrupt alarm is scheduled. */
 	uint32_t isolated:1; /**< Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
-	struct rte_flow_drop *flow_drop_queue; /**< Flow drop queue. */
+	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 };
 
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 0a736b1..658c92f 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -125,9 +125,12 @@ struct mlx4_flow_proc_item {
 	const enum rte_flow_item_type *const next_item;
 };
 
-struct rte_flow_drop {
-	struct ibv_qp *qp; /**< Verbs queue pair. */
-	struct ibv_cq *cq; /**< Verbs completion queue. */
+/** Shared resources for drop flow rules. */
+struct mlx4_drop {
+	struct ibv_qp *qp; /**< QP target. */
+	struct ibv_cq *cq; /**< CQ associated with above QP. */
+	struct priv *priv; /**< Back pointer to private data. */
+	uint32_t refcnt; /**< Reference count. */
 };
 
 /**
@@ -744,76 +747,73 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy a drop queue.
+ * Get a drop flow rule resources instance.
  *
  * @param priv
  *   Pointer to private structure.
+ *
+ * @return
+ *   Pointer to drop flow resources on success, NULL otherwise and rte_errno
+ *   is set.
  */
-static void
-mlx4_flow_destroy_drop_queue(struct priv *priv)
+static struct mlx4_drop *
+mlx4_drop_get(struct priv *priv)
 {
-	if (priv->flow_drop_queue) {
-		struct rte_flow_drop *fdq = priv->flow_drop_queue;
+	struct mlx4_drop *drop = priv->drop;
 
-		priv->flow_drop_queue = NULL;
-		claim_zero(ibv_destroy_qp(fdq->qp));
-		claim_zero(ibv_destroy_cq(fdq->cq));
-		rte_free(fdq);
+	if (drop) {
+		assert(drop->refcnt);
+		assert(drop->priv == priv);
+		++drop->refcnt;
+		return drop;
 	}
+	drop = rte_malloc(__func__, sizeof(*drop), 0);
+	if (!drop)
+		goto error;
+	*drop = (struct mlx4_drop){
+		.priv = priv,
+		.refcnt = 1,
+	};
+	drop->cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
+	if (!drop->cq)
+		goto error;
+	drop->qp = ibv_create_qp(priv->pd,
+				 &(struct ibv_qp_init_attr){
+					.send_cq = drop->cq,
+					.recv_cq = drop->cq,
+					.qp_type = IBV_QPT_RAW_PACKET,
+				 });
+	if (!drop->qp)
+		goto error;
+	priv->drop = drop;
+	return drop;
+error:
+	if (drop && drop->qp)
+		claim_zero(ibv_destroy_qp(drop->qp));
+	if (drop && drop->cq)
+		claim_zero(ibv_destroy_cq(drop->cq));
+	if (drop)
+		rte_free(drop);
+	rte_errno = ENOMEM;
+	return NULL;
 }
 
 /**
- * Create a single drop queue for all drop flows.
+ * Give back a drop flow rule resources instance.
  *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative value otherwise.
+ * @param drop
+ *   Pointer to drop flow rule resources.
  */
-static int
-mlx4_flow_create_drop_queue(struct priv *priv)
+static void
+mlx4_drop_put(struct mlx4_drop *drop)
 {
-	struct ibv_qp *qp;
-	struct ibv_cq *cq;
-	struct rte_flow_drop *fdq;
-
-	fdq = rte_calloc(__func__, 1, sizeof(*fdq), 0);
-	if (!fdq) {
-		ERROR("Cannot allocate memory for drop struct");
-		goto err;
-	}
-	cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
-	if (!cq) {
-		ERROR("Cannot create drop CQ");
-		goto err_create_cq;
-	}
-	qp = ibv_create_qp(priv->pd,
-			   &(struct ibv_qp_init_attr){
-				.send_cq = cq,
-				.recv_cq = cq,
-				.cap = {
-					.max_recv_wr = 1,
-					.max_recv_sge = 1,
-				},
-				.qp_type = IBV_QPT_RAW_PACKET,
-			   });
-	if (!qp) {
-		ERROR("Cannot create drop QP");
-		goto err_create_qp;
-	}
-	*fdq = (struct rte_flow_drop){
-		.qp = qp,
-		.cq = cq,
-	};
-	priv->flow_drop_queue = fdq;
-	return 0;
-err_create_qp:
-	claim_zero(ibv_destroy_cq(cq));
-err_create_cq:
-	rte_free(fdq);
-err:
-	return -1;
+	assert(drop->refcnt);
+	if (--drop->refcnt)
+		return;
+	drop->priv->drop = NULL;
+	claim_zero(ibv_destroy_qp(drop->qp));
+	claim_zero(ibv_destroy_cq(drop->cq));
+	rte_free(drop);
 }
 
 /**
@@ -846,6 +846,8 @@ mlx4_flow_toggle(struct priv *priv,
 			return 0;
 		claim_zero(ibv_destroy_flow(flow->ibv_flow));
 		flow->ibv_flow = NULL;
+		if (flow->drop)
+			mlx4_drop_put(priv->drop);
 		return 0;
 	}
 	if (flow->ibv_flow)
@@ -864,14 +866,21 @@ mlx4_flow_toggle(struct priv *priv,
 		qp = rxq->qp;
 	}
 	if (flow->drop) {
-		assert(priv->flow_drop_queue);
-		qp = priv->flow_drop_queue->qp;
+		mlx4_drop_get(priv);
+		if (!priv->drop) {
+			err = rte_errno;
+			msg = "resources for drop flow rule cannot be created";
+			goto error;
+		}
+		qp = priv->drop->qp;
 	}
 	assert(qp);
 	assert(flow->ibv_attr);
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
+	if (flow->drop)
+		mlx4_drop_put(priv->drop);
 	err = errno;
 	msg = "flow rule rejected by device";
 error:
@@ -995,7 +1004,7 @@ mlx4_flow_stop(struct priv *priv)
 	     flow = LIST_NEXT(flow, next)) {
 		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
 	}
-	mlx4_flow_destroy_drop_queue(priv);
+	assert(!priv->drop);
 }
 
 /**
@@ -1013,9 +1022,6 @@ mlx4_flow_start(struct priv *priv)
 	int ret;
 	struct rte_flow *flow;
 
-	ret = mlx4_flow_create_drop_queue(priv);
-	if (ret)
-		return -1;
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 12/29] net/mlx4: relax check on missing flow rule target
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (10 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 11/29] net/mlx4: allocate drop flow resources on demand Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 13/29] net/mlx4: refactor internal flow rules Adrien Mazarguil
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Creating a flow rule that targets a missing (unconfigured) queue is not
possible. However, nothing really prevents the destruction of a queue
with existing flow rules still pointing at it, except that the port
currently must be in a stopped state in order to avoid crashing.

The problem is that the port cannot be restarted as long as flow rules
cannot be re-applied due to missing queues. This flexibility will be
needed by subsequent work on this PMD.

Given that a PMD cannot decide on its own to remove problematic
user-defined flow rules in order to restart a port, work around this
restriction by making the affected ones drop-like, i.e. rules targeting
nonexistent queues drop packets instead.
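
In mlx4_flow_toggle() terms, the decision condenses to the following
(simplified from the diff below):

 struct rxq *rxq = NULL;

 if (flow->queue_id < priv->dev->data->nb_rx_queues)
     rxq = priv->dev->data->rx_queues[flow->queue_id];
 /* A missing target queue drops traffic implicitly. */
 flow->drop = !rxq;
 qp = rxq ? rxq->qp : priv->drop->qp;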

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 658c92f..3f97987 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -850,20 +850,24 @@ mlx4_flow_toggle(struct priv *priv,
 			mlx4_drop_put(priv->drop);
 		return 0;
 	}
-	if (flow->ibv_flow)
-		return 0;
-	assert(flow->queue ^ flow->drop);
 	if (flow->queue) {
-		struct rxq *rxq;
+		struct rxq *rxq = NULL;
 
-		assert(flow->queue_id < priv->dev->data->nb_rx_queues);
-		rxq = priv->dev->data->rx_queues[flow->queue_id];
-		if (!rxq) {
-			err = EINVAL;
-			msg = "target queue must be configured first";
-			goto error;
+		if (flow->queue_id < priv->dev->data->nb_rx_queues)
+			rxq = priv->dev->data->rx_queues[flow->queue_id];
+		if (flow->ibv_flow) {
+			if (!rxq ^ !flow->drop)
+				return 0;
+			/* Verbs flow needs updating. */
+			claim_zero(ibv_destroy_flow(flow->ibv_flow));
+			flow->ibv_flow = NULL;
+			if (flow->drop)
+				mlx4_drop_put(priv->drop);
 		}
-		qp = rxq->qp;
+		if (rxq)
+			qp = rxq->qp;
+		/* A missing target queue drops traffic implicitly. */
+		flow->drop = !rxq;
 	}
 	if (flow->drop) {
 		mlx4_drop_get(priv);
@@ -876,6 +880,8 @@ mlx4_flow_toggle(struct priv *priv,
 	}
 	assert(qp);
 	assert(flow->ibv_attr);
+	if (flow->ibv_flow)
+		return 0;
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 13/29] net/mlx4: refactor internal flow rules
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (11 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 12/29] net/mlx4: relax check on missing flow rule target Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 14/29] net/mlx4: generalize flow rule priority support Adrien Mazarguil
                   ` (16 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

When not in isolated mode, a flow rule is automatically configured by the
PMD to receive traffic addressed to the MAC address of the device. This
somewhat duplicates flow API functionality.

Remove legacy support for internal flow rules to instead handle them
through the flow API implementation.
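
Internal rules thus become ordinary rte_flow objects flagged through a
PMD-private meta pattern item; the default MAC rule built by
mlx4_flow_internal() is roughly equivalent to the following pattern
(sketch):

 struct rte_flow_item pattern[] = {
     /* PMD-private marker identifying the rule as internal. */
     { .type = MLX4_FLOW_ITEM_TYPE_INTERNAL },
     {
         .type = RTE_FLOW_ITEM_TYPE_ETH,
         .spec = &(struct rte_flow_item_eth){ .dst = priv->mac },
         .mask = &(struct rte_flow_item_eth){
             .dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
         },
     },
     { .type = RTE_FLOW_ITEM_TYPE_END },
 };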

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c      |  20 ++---
 drivers/net/mlx4/mlx4.h      |   1 -
 drivers/net/mlx4/mlx4_flow.c | 155 +++++++++++++++++++++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h |   6 ++
 drivers/net/mlx4/mlx4_rxq.c  | 117 +++-------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 -
 6 files changed, 172 insertions(+), 129 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b084903..40c0ee2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -96,8 +96,15 @@ const char *pmd_mlx4_init_params[] = {
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	return 0;
+	struct priv *priv = dev->data->dev_private;
+	int ret;
+
+	/* Prepare internal flow rules. */
+	ret = mlx4_flow_sync(priv);
+	if (ret)
+		ERROR("cannot set up internal flow rules: %s",
+		      strerror(-ret));
+	return ret;
 }
 
 /**
@@ -121,9 +128,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		return 0;
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	ret = mlx4_mac_addr_add(priv);
-	if (ret)
-		goto err;
 	ret = mlx4_intr_install(priv);
 	if (ret) {
 		ERROR("%p: interrupt handler installation failed",
@@ -139,7 +143,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	mlx4_mac_addr_del(priv);
 	priv->started = 0;
 	return ret;
 }
@@ -163,7 +166,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	priv->started = 0;
 	mlx4_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
-	mlx4_mac_addr_del(priv);
 }
 
 /**
@@ -185,7 +187,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	mlx4_mac_addr_del(priv);
+	mlx4_flow_clean(priv);
 	dev->rx_pkt_burst = mlx4_rx_burst_removed;
 	dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	for (i = 0; i != dev->data->nb_rx_queues; ++i)
@@ -542,8 +544,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
 		priv->mac = mac;
-		if (mlx4_mac_addr_add(priv))
-			goto port_error;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index f71679b..fb4708d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -100,7 +100,6 @@ struct priv {
 	struct ibv_device_attr device_attr; /**< Device properties. */
 	struct ibv_pd *pd; /**< Protection Domain. */
 	struct ether_addr mac; /**< MAC address. */
-	struct ibv_flow *mac_flow; /**< Flow associated with MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /**< Configured MTU. */
 	uint8_t port; /**< Physical port number. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 3f97987..be644a4 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -617,6 +617,10 @@ mlx4_flow_prepare(struct priv *priv,
 
 		if (item->type == RTE_FLOW_ITEM_TYPE_VOID)
 			continue;
+		if (item->type == MLX4_FLOW_ITEM_TYPE_INTERNAL) {
+			flow->internal = 1;
+			continue;
+		}
 		/*
 		 * The nic can support patterns with NULL eth spec only
 		 * if eth is a single item in a rule.
@@ -916,7 +920,17 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		return NULL;
 	err = mlx4_flow_toggle(priv, flow, priv->started, error);
 	if (!err) {
-		LIST_INSERT_HEAD(&priv->flows, flow, next);
+		struct rte_flow *curr = LIST_FIRST(&priv->flows);
+
+		/* New rules are inserted after internal ones. */
+		if (!curr || !curr->internal) {
+			LIST_INSERT_HEAD(&priv->flows, flow, next);
+		} else {
+			while (LIST_NEXT(curr, next) &&
+			       LIST_NEXT(curr, next)->internal)
+				curr = LIST_NEXT(curr, next);
+			LIST_INSERT_AFTER(curr, flow, next);
+		}
 		return flow;
 	}
 	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -941,13 +955,14 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 	if (!!enable == !!priv->isolated)
 		return 0;
 	priv->isolated = !!enable;
-	if (enable) {
-		mlx4_mac_addr_del(priv);
-	} else if (mlx4_mac_addr_add(priv) < 0) {
-		priv->isolated = 1;
+	if (mlx4_flow_sync(priv)) {
+		priv->isolated = !enable;
 		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					  NULL, "cannot leave isolated mode");
+					  NULL,
+					  enable ?
+					  "cannot enter isolated mode" :
+					  "cannot leave isolated mode");
 	}
 	return 0;
 }
@@ -974,7 +989,9 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy all flow rules.
+ * Destroy user-configured flow rules.
+ *
+ * This function skips internal flow rules.
  *
  * @see rte_flow_flush()
  * @see rte_flow_ops
@@ -984,17 +1001,133 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow *flow = LIST_FIRST(&priv->flows);
 
-	while (!LIST_EMPTY(&priv->flows)) {
-		struct rte_flow *flow;
+	while (flow) {
+		struct rte_flow *next = LIST_NEXT(flow, next);
 
-		flow = LIST_FIRST(&priv->flows);
-		mlx4_flow_destroy(dev, flow, error);
+		if (!flow->internal)
+			mlx4_flow_destroy(dev, flow, error);
+		flow = next;
 	}
 	return 0;
 }
 
 /**
+ * Generate internal flow rules.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
+{
+	struct rte_flow_attr attr = {
+		.ingress = 1,
+	};
+	struct rte_flow_item pattern[] = {
+		{
+			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &(struct rte_flow_item_eth){
+				.dst = priv->mac,
+			},
+			.mask = &(struct rte_flow_item_eth){
+				.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+			},
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_QUEUE,
+			.conf = &(struct rte_flow_action_queue){
+				.index = 0,
+			},
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	if (!mlx4_flow_create(priv->dev, &attr, pattern, actions, error))
+		return -rte_errno;
+	return 0;
+}
+
+/**
+ * Synchronize flow rules.
+ *
+ * This function synchronizes flow rules with the state of the device by
+ * taking into account isolated mode and whether target queues are
+ * configured.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_sync(struct priv *priv)
+{
+	struct rte_flow *flow;
+	int ret;
+
+	/* Internal flow rules are guaranteed to come first in the list. */
+	if (priv->isolated) {
+		/*
+		 * Get rid of them in isolated mode, stop at the first
+		 * non-internal rule found.
+		 */
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_FIRST(&priv->flows))
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, NULL));
+	} else if (!LIST_FIRST(&priv->flows) ||
+		   !LIST_FIRST(&priv->flows)->internal) {
+		/*
+		 * If the first rule is not internal outside isolated mode,
+		 * they must be added back.
+		 */
+		ret = mlx4_flow_internal(priv, NULL);
+		if (ret)
+			return ret;
+	}
+	if (priv->started)
+		return mlx4_flow_start(priv);
+	mlx4_flow_stop(priv);
+	return 0;
+}
+
+/**
+ * Clean up all flow rules.
+ *
+ * Unlike mlx4_flow_flush(), this function takes care of all remaining flow
+ * rules regardless of whether they are internal or user-configured.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+mlx4_flow_clean(struct priv *priv)
+{
+	struct rte_flow *flow;
+
+	while ((flow = LIST_FIRST(&priv->flows)))
+		mlx4_flow_destroy(priv->dev, flow, NULL);
+}
+
+/**
  * Disable flow rules.
  *
  * @param priv
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 68ffb33..c2ffa8d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -55,12 +55,16 @@
 /** Last and lowest priority level for a flow rule. */
 #define MLX4_FLOW_PRIORITY_LAST UINT32_C(0xfff)
 
+/** Meta pattern item used to distinguish internal rules. */
+#define MLX4_FLOW_ITEM_TYPE_INTERNAL ((enum rte_flow_item_type)-1)
+
 /** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
@@ -68,6 +72,8 @@ struct rte_flow {
 
 /* mlx4_flow.c */
 
+int mlx4_flow_sync(struct priv *priv);
+void mlx4_flow_clean(struct priv *priv);
 int mlx4_flow_start(struct priv *priv);
 void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 2d54ab0..7bb2f9e 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -59,6 +59,7 @@
 #include <rte_mempool.h>
 
 #include "mlx4.h"
+#include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
@@ -399,8 +400,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -rte_errno;
 		}
 		dev->data->rx_queues[idx] = NULL;
-		if (idx == 0)
-			mlx4_mac_addr_del(priv);
+		/* Disable associated flows. */
+		mlx4_flow_sync(priv);
 		mlx4_rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -419,6 +420,14 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		DEBUG("%p: adding Rx queue %p to list",
 		      (void *)dev, (void *)rxq);
 		dev->data->rx_queues[idx] = rxq;
+		/* Re-enable associated flows. */
+		ret = mlx4_flow_sync(priv);
+		if (ret) {
+			dev->data->rx_queues[idx] = NULL;
+			mlx4_rxq_cleanup(rxq);
+			rte_free(rxq);
+			return ret;
+		}
 		/* Update receive callback. */
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
@@ -446,111 +455,9 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			DEBUG("%p: removing Rx queue %p from list",
 			      (void *)priv->dev, (void *)rxq);
 			priv->dev->data->rx_queues[i] = NULL;
-			if (i == 0)
-				mlx4_mac_addr_del(priv);
 			break;
 		}
+	mlx4_flow_sync(priv);
 	mlx4_rxq_cleanup(rxq);
 	rte_free(rxq);
 }
-
-/**
- * Unregister a MAC address.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void
-mlx4_mac_addr_del(struct priv *priv)
-{
-#ifndef NDEBUG
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-#endif
-
-	if (!priv->mac_flow)
-		return;
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(priv->mac_flow));
-	priv->mac_flow = NULL;
-}
-
-/**
- * Register a MAC address.
- *
- * The MAC address is registered in queue 0.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_mac_addr_add(struct priv *priv)
-{
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-	struct rxq *rxq;
-	struct ibv_flow *flow;
-
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		return 0;
-	if (priv->isolated)
-		return 0;
-	if (priv->dev->data->rx_queues && priv->dev->data->rx_queues[0])
-		rxq = priv->dev->data->rx_queues[0];
-	else
-		return 0;
-
-	/* Allocate flow specification on the stack. */
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
-
-	if (priv->mac_flow)
-		mlx4_mac_addr_del(priv);
-	/*
-	 * No padding must be inserted by the compiler between attr and spec.
-	 * This layout is expected by libibverbs.
-	 */
-	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.priority = 3,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
-		.size = sizeof(*spec),
-		.val = {
-			.dst_mac = {
-				(*mac)[0], (*mac)[1], (*mac)[2],
-				(*mac)[3], (*mac)[4], (*mac)[5]
-			},
-		},
-		.mask = {
-			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-		}
-	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	/* Create related flow. */
-	flow = ibv_create_flow(rxq->qp, attr);
-	if (flow == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, rte_errno, strerror(errno));
-		return -rte_errno;
-	}
-	assert(priv->mac_flow == NULL);
-	priv->mac_flow = flow;
-	return 0;
-}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 365b585..7a2c982 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -128,8 +128,6 @@ int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
 void mlx4_rx_queue_release(void *dpdk_rxq);
-void mlx4_mac_addr_del(struct priv *priv);
-int mlx4_mac_addr_add(struct priv *priv);
 
 /* mlx4_rxtx.c */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 14/29] net/mlx4: generalize flow rule priority support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (12 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 13/29] net/mlx4: refactor internal flow rules Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 15/29] net/mlx4: simplify trigger code for flow rules Adrien Mazarguil
                   ` (15 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Since both internal and user-defined flow rules are handled by a common
implementation, flow rule priority overlaps are easier to detect. There is
no need to restrict the use of priority levels to isolated mode anymore.

With this patch, only the lowest priority level remains inaccessible to
users outside isolated mode.

Also, the PMD no longer automatically assigns a fixed priority level to
user-defined flow rules, which means collisions between overlapping rules
matching different numbers of protocol layers at a given priority level
won't be avoided anymore (e.g. "eth" vs. "eth / ipv4 / udp").

As a reminder, the outcome of overlapping rules at a given priority level
was, and still is, undefined territory according to the API documentation.
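
For illustration, here is a minimal caller-side sketch (not part of this
patch) of how an application can now request an arbitrary priority level
through the public flow API. It assumes a 17.11-era rte_flow API and a
started mlx4 port; the drop action and the spec-less Ethernet pattern are
arbitrary choices:

  #include <stdint.h>

  #include <rte_flow.h>

  /*
   * Illustration only: create an ingress drop rule at the given priority
   * level. With this patch, any level up to but excluding
   * MLX4_FLOW_PRIORITY_LAST is accepted outside isolated mode.
   */
  static struct rte_flow *
  drop_rule_at_priority(uint16_t port_id, uint32_t priority,
                        struct rte_flow_error *error)
  {
          const struct rte_flow_attr attr = {
                  .priority = priority,
                  .ingress = 1,
          };
          const struct rte_flow_item pattern[] = {
                  { .type = RTE_FLOW_ITEM_TYPE_ETH },
                  { .type = RTE_FLOW_ITEM_TYPE_END },
          };
          const struct rte_flow_action actions[] = {
                  { .type = RTE_FLOW_ACTION_TYPE_DROP },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
          };

          return rte_flow_create(port_id, &attr, pattern, actions, error);
  }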

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index be644a4..c4de9d9 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -155,7 +155,6 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
 	unsigned int i;
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 2;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
@@ -232,7 +231,6 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 1;
 	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
@@ -277,7 +275,6 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
 	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 0;
 	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
@@ -318,7 +315,6 @@ mlx4_flow_create_tcp(const struct rte_flow_item *item,
 	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 0;
 	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
@@ -581,19 +577,11 @@ mlx4_flow_prepare(struct priv *priv,
 	const struct mlx4_flow_proc_item *proc;
 	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
 	struct rte_flow *flow = &temp;
-	uint32_t priority_override = 0;
 
 	if (attr->group)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
 			 NULL, "groups are not supported");
-	if (priv->isolated)
-		priority_override = attr->priority;
-	else if (attr->priority)
-		return rte_flow_error_set
-			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-			 NULL,
-			 "priorities are not supported outside isolated mode");
 	if (attr->priority > MLX4_FLOW_PRIORITY_LAST)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
@@ -659,9 +647,6 @@ mlx4_flow_prepare(struct priv *priv,
 		}
 		flow->ibv_attr_size += proc->dst_sz;
 	}
-	/* Use specified priority level when in isolated mode. */
-	if (priv->isolated && flow != &temp)
-		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
 	for (action = actions; action->type; ++action) {
 		switch (action->type) {
@@ -718,6 +703,7 @@ mlx4_flow_prepare(struct priv *priv,
 		*flow->ibv_attr = (struct ibv_flow_attr){
 			.type = IBV_FLOW_ATTR_NORMAL,
 			.size = sizeof(*flow->ibv_attr),
+			.priority = attr->priority,
 			.port = priv->port,
 		};
 		goto fill;
@@ -854,6 +840,22 @@ mlx4_flow_toggle(struct priv *priv,
 			mlx4_drop_put(priv->drop);
 		return 0;
 	}
+	assert(flow->ibv_attr);
+	if (!flow->internal &&
+	    !priv->isolated &&
+	    flow->ibv_attr->priority == MLX4_FLOW_PRIORITY_LAST) {
+		if (flow->ibv_flow) {
+			claim_zero(ibv_destroy_flow(flow->ibv_flow));
+			flow->ibv_flow = NULL;
+			if (flow->drop)
+				mlx4_drop_put(priv->drop);
+		}
+		err = EACCES;
+		msg = ("priority level "
+		       MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST)
+		       " is reserved when not in isolated mode");
+		goto error;
+	}
 	if (flow->queue) {
 		struct rxq *rxq = NULL;
 
@@ -883,7 +885,6 @@ mlx4_flow_toggle(struct priv *priv,
 		qp = priv->drop->qp;
 	}
 	assert(qp);
-	assert(flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
@@ -1028,6 +1029,7 @@ static int
 mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 {
 	struct rte_flow_attr attr = {
+		.priority = MLX4_FLOW_PRIORITY_LAST,
 		.ingress = 1,
 	};
 	struct rte_flow_item pattern[] = {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 15/29] net/mlx4: simplify trigger code for flow rules
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (13 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 14/29] net/mlx4: generalize flow rule priority support Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 16/29] net/mlx4: refactor flow item validation code Adrien Mazarguil
                   ` (14 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Since the flow rule synchronization function mlx4_flow_sync() takes the
state of the device into account (i.e. whether it is started), the trigger
functions mlx4_flow_start() and mlx4_flow_stop() are redundant. Standardize
on mlx4_flow_sync().

Use this opportunity to enhance this function with better error reporting,
since an inability to start the device due to a problem with a flow rule
otherwise results in a nondescript error code.
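
The resulting log format can be seen in the diff below. As a hedged
standalone sketch of the same reporting pattern, with printf() standing in
for the PMD's logging macros:

  #include <stdio.h>
  #include <string.h>

  #include <rte_flow.h>

  /*
   * Print the detailed context carried by rte_flow_error when a flow
   * synchronization call fails; "ret" is the negative errno returned.
   */
  static void
  report_flow_error(int ret, const struct rte_flow_error *error)
  {
          printf("flow operation failed (code %d, \"%s\"),"
                 " flow error type %d, cause %p, message: %s\n",
                 -ret, strerror(-ret), (int)error->type, error->cause,
                 error->message ? error->message : "(unspecified)");
  }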

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 25 ++++++++-----
 drivers/net/mlx4/mlx4_flow.c | 76 +++++++++------------------------------
 drivers/net/mlx4/mlx4_flow.h |  4 +--
 drivers/net/mlx4/mlx4_rxq.c  | 14 ++++++--
 4 files changed, 46 insertions(+), 73 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 40c0ee2..256aa3d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -60,6 +60,7 @@
 #include <rte_ethdev.h>
 #include <rte_ethdev_pci.h>
 #include <rte_ether.h>
+#include <rte_flow.h>
 #include <rte_interrupts.h>
 #include <rte_kvargs.h>
 #include <rte_malloc.h>
@@ -97,13 +98,17 @@ static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
 	int ret;
 
 	/* Prepare internal flow rules. */
-	ret = mlx4_flow_sync(priv);
-	if (ret)
-		ERROR("cannot set up internal flow rules: %s",
-		      strerror(-ret));
+	ret = mlx4_flow_sync(priv, &error);
+	if (ret) {
+		ERROR("cannot set up internal flow rules (code %d, \"%s\"),"
+		      " flow error type %d, cause %p, message: %s",
+		      -ret, strerror(-ret), error.type, error.cause,
+		      error.message ? error.message : "(unspecified)");
+	}
 	return ret;
 }
 
@@ -122,6 +127,7 @@ static int
 mlx4_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
 	int ret;
 
 	if (priv->started)
@@ -134,10 +140,13 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		     (void *)dev);
 		goto err;
 	}
-	ret = mlx4_flow_start(priv);
+	ret = mlx4_flow_sync(priv, &error);
 	if (ret) {
-		ERROR("%p: flow start failed: %s",
-		      (void *)dev, strerror(ret));
+		ERROR("%p: cannot attach flow rules (code %d, \"%s\"),"
+		      " flow error type %d, cause %p, message: %s",
+		      (void *)dev,
+		      -ret, strerror(-ret), error.type, error.cause,
+		      error.message ? error.message : "(unspecified)");
 		goto err;
 	}
 	return 0;
@@ -164,7 +173,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	mlx4_flow_stop(priv);
+	mlx4_flow_sync(priv, NULL);
 	mlx4_intr_uninstall(priv);
 }
 
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index c4de9d9..218b23f 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -956,14 +956,9 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 	if (!!enable == !!priv->isolated)
 		return 0;
 	priv->isolated = !!enable;
-	if (mlx4_flow_sync(priv)) {
+	if (mlx4_flow_sync(priv, error)) {
 		priv->isolated = !enable;
-		return rte_flow_error_set(error, rte_errno,
-					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					  NULL,
-					  enable ?
-					  "cannot enter isolated mode" :
-					  "cannot leave isolated mode");
+		return -rte_errno;
 	}
 	return 0;
 }
@@ -1075,12 +1070,14 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
  *
  * @param priv
  *   Pointer to private structure.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx4_flow_sync(struct priv *priv)
+mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error)
 {
 	struct rte_flow *flow;
 	int ret;
@@ -1094,20 +1091,27 @@ mlx4_flow_sync(struct priv *priv)
 		for (flow = LIST_FIRST(&priv->flows);
 		     flow && flow->internal;
 		     flow = LIST_FIRST(&priv->flows))
-			claim_zero(mlx4_flow_destroy(priv->dev, flow, NULL));
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
 	} else if (!LIST_FIRST(&priv->flows) ||
 		   !LIST_FIRST(&priv->flows)->internal) {
 		/*
 		 * If the first rule is not internal outside isolated mode,
 		 * they must be added back.
 		 */
-		ret = mlx4_flow_internal(priv, NULL);
+		ret = mlx4_flow_internal(priv, error);
+		if (ret)
+			return ret;
+	}
+	/* Toggle the remaining flow rules. */
+	for (flow = LIST_FIRST(&priv->flows);
+	     flow;
+	     flow = LIST_NEXT(flow, next)) {
+		ret = mlx4_flow_toggle(priv, flow, priv->started, error);
 		if (ret)
 			return ret;
 	}
-	if (priv->started)
-		return mlx4_flow_start(priv);
-	mlx4_flow_stop(priv);
+	if (!priv->started)
+		assert(!priv->drop);
 	return 0;
 }
 
@@ -1129,52 +1133,6 @@ mlx4_flow_clean(struct priv *priv)
 		mlx4_flow_destroy(priv->dev, flow, NULL);
 }
 
-/**
- * Disable flow rules.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void
-mlx4_flow_stop(struct priv *priv)
-{
-	struct rte_flow *flow;
-
-	for (flow = LIST_FIRST(&priv->flows);
-	     flow;
-	     flow = LIST_NEXT(flow, next)) {
-		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
-	}
-	assert(!priv->drop);
-}
-
-/**
- * Enable flow rules.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_flow_start(struct priv *priv)
-{
-	int ret;
-	struct rte_flow *flow;
-
-	for (flow = LIST_FIRST(&priv->flows);
-	     flow;
-	     flow = LIST_NEXT(flow, next)) {
-		ret = mlx4_flow_toggle(priv, flow, 1, NULL);
-		if (unlikely(ret)) {
-			mlx4_flow_stop(priv);
-			return ret;
-		}
-	}
-	return 0;
-}
-
 static const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index c2ffa8d..13495d7 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -72,10 +72,8 @@ struct rte_flow {
 
 /* mlx4_flow.c */
 
-int mlx4_flow_sync(struct priv *priv);
+int mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error);
 void mlx4_flow_clean(struct priv *priv);
-int mlx4_flow_start(struct priv *priv);
-void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_type filter_type,
 		     enum rte_filter_op filter_op,
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 7bb2f9e..bcb7b94 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -54,6 +54,7 @@
 #include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_ethdev.h>
+#include <rte_flow.h>
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
@@ -401,7 +402,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		}
 		dev->data->rx_queues[idx] = NULL;
 		/* Disable associated flows. */
-		mlx4_flow_sync(priv);
+		mlx4_flow_sync(priv, NULL);
 		mlx4_rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -416,13 +417,20 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	if (ret) {
 		rte_free(rxq);
 	} else {
+		struct rte_flow_error error;
+
 		rxq->stats.idx = idx;
 		DEBUG("%p: adding Rx queue %p to list",
 		      (void *)dev, (void *)rxq);
 		dev->data->rx_queues[idx] = rxq;
 		/* Re-enable associated flows. */
-		ret = mlx4_flow_sync(priv);
+		ret = mlx4_flow_sync(priv, &error);
 		if (ret) {
+			ERROR("cannot re-attach flow rules to queue %u"
+			      " (code %d, \"%s\"), flow error type %d,"
+			      " cause %p, message: %s", idx,
+			      -ret, strerror(-ret), error.type, error.cause,
+			      error.message ? error.message : "(unspecified)");
 			dev->data->rx_queues[idx] = NULL;
 			mlx4_rxq_cleanup(rxq);
 			rte_free(rxq);
@@ -457,7 +465,7 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			priv->dev->data->rx_queues[i] = NULL;
 			break;
 		}
-	mlx4_flow_sync(priv);
+	mlx4_flow_sync(priv, NULL);
 	mlx4_rxq_cleanup(rxq);
 	rte_free(rxq);
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 16/29] net/mlx4: refactor flow item validation code
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (14 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 15/29] net/mlx4: simplify trigger code for flow rules Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 17/29] net/mlx4: add MAC addresses configuration support Adrien Mazarguil
                   ` (13 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Since flow rule validation and creation have been refactored into a common
two-pass function, having separate callback functions to validate and
convert individual items seems redundant.

The purpose of these item validation functions is to reject partial masks,
as those are not supported by hardware, before handing the item over to a
separate function that performs basic sanity checks (a sketch of the mask
test involved appears after the change list below).

The current approach and related code have the following issues:

- Lack of flow handle context in validation code requires kludges such as
  the special treatment reserved for spec-less Ethernet pattern items.
- Lack of useful error reporting; users need as much help as possible to
  understand what they did wrong, particularly when they hit hardware
  limitations that aren't mentioned by the flow API. Preventing them from
  going berserk after getting a generic "item not supported" message for no
  apparent reason is mandatory.
- Generic checks should be performed by the caller, not by item-specific
  validation functions.
- Mask checks are either missing or too lax in some cases (Ethernet, VLAN).

This commit addresses all the above by combining validation and conversion
callbacks as "merge" callbacks that take an additional error context
parameter. Also:

- Support for source MAC address matching is removed as it has no effect.
- Providing an empty mask no longer bypasses the Ethernet specification
  check that causes a rule to become promiscuous-like.
- VLAN VIDs must be matched exactly, as matching all VLAN traffic while
  excluding non-VLAN traffic is not supported.
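
As announced above, here is a standalone sketch of the wraparound test the
new merge callbacks use to reject partial masks, shown for the 16-bit L4
port masks (the IPv4 variant in the diff is identical with 32-bit types);
it is an illustration, not code from this patch:

  #include <stdint.h>

  /*
   * A 16-bit mask is acceptable only when empty (0x0000) or full
   * (0xffff). Adding 1 wraps an all-ones mask to 0 and turns an empty
   * mask into 1, so any other (partial) value compares greater than 1.
   */
  static int
  mask16_is_partial(uint16_t mask)
  {
          return (uint16_t)(mask + 1) > UINT16_C(1);
  }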

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 576 +++++++++++++++++++-------------------
 drivers/net/mlx4/mlx4_flow.h |   1 +
 2 files changed, 288 insertions(+), 289 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 218b23f..b59efe1 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -76,49 +76,17 @@
 
 /** Processor structure associated with a flow item. */
 struct mlx4_flow_proc_item {
-	/** Bit-masks corresponding to the possibilities for the item. */
-	const void *mask;
-	/**
-	 * Default bit-masks to use when item->mask is not provided. When
-	 * \default_mask is also NULL, the full supported bit-mask (\mask) is
-	 * used instead.
-	 */
-	const void *default_mask;
-	/** Bit-masks size in bytes. */
+	/** Bit-mask for fields supported by this PMD. */
+	const void *mask_support;
+	/** Bit-mask to use when @p item->mask is not provided. */
+	const void *mask_default;
+	/** Size in bytes for @p mask_support and @p mask_default. */
 	const unsigned int mask_sz;
-	/**
-	 * Check support for a given item.
-	 *
-	 * @param item[in]
-	 *   Item specification.
-	 * @param mask[in]
-	 *   Bit-masks covering supported fields to compare with spec,
-	 *   last and mask in
-	 *   \item.
-	 * @param size
-	 *   Bit-Mask size in bytes.
-	 *
-	 * @return
-	 *   0 on success, negative value otherwise.
-	 */
-	int (*validate)(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size);
-	/**
-	 * Conversion function from rte_flow to NIC specific flow.
-	 *
-	 * @param item
-	 *   rte_flow item to convert.
-	 * @param default_mask
-	 *   Default bit-masks to use when item->mask is not provided.
-	 * @param flow
-	 *   Flow rule handle to update.
-	 *
-	 * @return
-	 *   0 on success, negative value otherwise.
-	 */
-	int (*convert)(const struct rte_flow_item *item,
-		       const void *default_mask,
-		       struct rte_flow *flow);
+	/** Merge a pattern item into a flow rule handle. */
+	int (*merge)(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -134,107 +102,185 @@ struct mlx4_drop {
 };
 
 /**
- * Convert Ethernet item to Verbs specification.
+ * Merge Ethernet pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ * - Not providing @p item->spec or providing an empty @p mask->dst is
+ *   *only* supported if the rule doesn't specify additional matching
+ *   criteria (i.e. rule is promiscuous-like).
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_eth(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_eth(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
-	const struct rte_flow_item_eth *mask = item->mask;
+	const struct rte_flow_item_eth *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_eth *eth;
-	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
+	const char *msg;
 	unsigned int i;
 
+	if (!mask) {
+		flow->promisc = 1;
+	} else {
+		uint32_t sum_dst = 0;
+		uint32_t sum_src = 0;
+
+		for (i = 0; i != sizeof(mask->dst.addr_bytes); ++i) {
+			sum_dst += mask->dst.addr_bytes[i];
+			sum_src += mask->src.addr_bytes[i];
+		}
+		if (sum_src) {
+			msg = "mlx4 does not support source MAC matching";
+			goto error;
+		} else if (!sum_dst) {
+			flow->promisc = 1;
+		} else if (sum_dst != (UINT8_C(0xff) * ETHER_ADDR_LEN)) {
+			msg = "mlx4 does not support matching partial"
+				" Ethernet fields";
+			goto error;
+		}
+	}
+	if (!flow->ibv_attr)
+		return 0;
+	if (flow->promisc) {
+		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
+		return 0;
+	}
 	++flow->ibv_attr->num_of_specs;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
-		.size = eth_size,
+		.size = sizeof(*eth),
 	};
-	if (!spec) {
-		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
-		return 0;
-	}
-	if (!mask)
-		mask = default_mask;
 	memcpy(eth->val.dst_mac, spec->dst.addr_bytes, ETHER_ADDR_LEN);
-	memcpy(eth->val.src_mac, spec->src.addr_bytes, ETHER_ADDR_LEN);
 	memcpy(eth->mask.dst_mac, mask->dst.addr_bytes, ETHER_ADDR_LEN);
-	memcpy(eth->mask.src_mac, mask->src.addr_bytes, ETHER_ADDR_LEN);
 	/* Remove unwanted bits from values. */
 	for (i = 0; i < ETHER_ADDR_LEN; ++i) {
 		eth->val.dst_mac[i] &= eth->mask.dst_mac[i];
-		eth->val.src_mac[i] &= eth->mask.src_mac[i];
 	}
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert VLAN item to Verbs specification.
+ * Merge VLAN pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - Matching *all* VLAN traffic by omitting @p item->spec or providing an
+ *   empty @p item->mask would also include non-VLAN traffic. Doing so is
+ *   therefore unsupported.
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_vlan(const struct rte_flow_item *item,
-		      const void *default_mask,
-		      struct rte_flow *flow)
+mlx4_flow_merge_vlan(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
-	const struct rte_flow_item_vlan *mask = item->mask;
+	const struct rte_flow_item_vlan *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_eth *eth;
-	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
+	const char *msg;
 
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
-		       eth_size);
-	if (!spec)
+	if (!mask || !mask->tci) {
+		msg = "mlx4 cannot match all VLAN traffic while excluding"
+			" non-VLAN traffic, TCI VID must be specified";
+		goto error;
+	}
+	if (mask->tci != RTE_BE16(0x0fff)) {
+		msg = "mlx4 does not support partial TCI VID matching";
+		goto error;
+	}
+	if (!flow->ibv_attr)
 		return 0;
-	if (!mask)
-		mask = default_mask;
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
+		       sizeof(*eth));
 	eth->val.vlan_tag = spec->tci;
 	eth->mask.vlan_tag = mask->tci;
 	eth->val.vlan_tag &= eth->mask.vlan_tag;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert IPv4 item to Verbs specification.
+ * Merge IPv4 pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_ipv4(const struct rte_flow_item *item,
-		      const void *default_mask,
-		      struct rte_flow *flow)
+mlx4_flow_merge_ipv4(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
-	const struct rte_flow_item_ipv4 *mask = item->mask;
+	const struct rte_flow_item_ipv4 *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_ipv4 *ipv4;
-	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
+	const char *msg;
 
+	if (mask &&
+	    ((uint32_t)(mask->hdr.src_addr + 1) > UINT32_C(1) ||
+	     (uint32_t)(mask->hdr.dst_addr + 1) > UINT32_C(1))) {
+		msg = "mlx4 does not support matching partial IPv4 fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
-		.size = ipv4_size,
+		.size = sizeof(*ipv4),
 	};
 	if (!spec)
 		return 0;
@@ -242,8 +288,6 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		.src_ip = spec->hdr.src_addr,
 		.dst_ip = spec->hdr.dst_addr,
 	};
-	if (!mask)
-		mask = default_mask;
 	ipv4->mask = (struct ibv_flow_ipv4_filter) {
 		.src_ip = mask->hdr.src_addr,
 		.dst_ip = mask->hdr.dst_addr,
@@ -252,224 +296,188 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 	ipv4->val.src_ip &= ipv4->mask.src_ip;
 	ipv4->val.dst_ip &= ipv4->mask.dst_ip;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert UDP item to Verbs specification.
+ * Merge UDP pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_udp(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_udp(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
-	const struct rte_flow_item_udp *mask = item->mask;
+	const struct rte_flow_item_udp *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_tcp_udp *udp;
-	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
+	const char *msg;
 
+	if (!mask ||
+	    ((uint16_t)(mask->hdr.src_port + 1) > UINT16_C(1) ||
+	     (uint16_t)(mask->hdr.dst_port + 1) > UINT16_C(1))) {
+		msg = "mlx4 does not support matching partial UDP fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
-		.size = udp_size,
+		.size = sizeof(*udp),
 	};
 	if (!spec)
 		return 0;
 	udp->val.dst_port = spec->hdr.dst_port;
 	udp->val.src_port = spec->hdr.src_port;
-	if (!mask)
-		mask = default_mask;
 	udp->mask.dst_port = mask->hdr.dst_port;
 	udp->mask.src_port = mask->hdr.src_port;
 	/* Remove unwanted bits from values. */
 	udp->val.src_port &= udp->mask.src_port;
 	udp->val.dst_port &= udp->mask.dst_port;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert TCP item to Verbs specification.
+ * Merge TCP pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_tcp(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_tcp(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
-	const struct rte_flow_item_tcp *mask = item->mask;
+	const struct rte_flow_item_tcp *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_tcp_udp *tcp;
-	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
+	const char *msg;
 
+	if (!mask ||
+	    ((uint16_t)(mask->hdr.src_port + 1) > UINT16_C(1) ||
+	     (uint16_t)(mask->hdr.dst_port + 1) > UINT16_C(1))) {
+		msg = "mlx4 does not support matching partial TCP fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
-		.size = tcp_size,
+		.size = sizeof(*tcp),
 	};
 	if (!spec)
 		return 0;
 	tcp->val.dst_port = spec->hdr.dst_port;
 	tcp->val.src_port = spec->hdr.src_port;
-	if (!mask)
-		mask = default_mask;
 	tcp->mask.dst_port = mask->hdr.dst_port;
 	tcp->mask.src_port = mask->hdr.src_port;
 	/* Remove unwanted bits from values. */
 	tcp->val.src_port &= tcp->mask.src_port;
 	tcp->val.dst_port &= tcp->mask.dst_port;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Check support for a given item.
+ * Perform basic sanity checks on a pattern item.
  *
- * @param item[in]
+ * @param[in] item
  *   Item specification.
- * @param mask[in]
- *   Bit-masks covering supported fields to compare with spec, last and mask in
- *   \item.
- * @param size
- *   Bit-Mask size in bytes.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
  *
  * @return
- *   0 on success, negative value otherwise.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_item_validate(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
+mlx4_flow_item_check(const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
-	int ret = 0;
+	const uint8_t *mask;
+	unsigned int i;
 
+	/* item->last and item->mask cannot exist without item->spec. */
 	if (!item->spec && (item->mask || item->last))
-		return -1;
-	if (item->spec && !item->mask) {
-		unsigned int i;
-		const uint8_t *spec = item->spec;
-
-		for (i = 0; i < size; ++i)
-			if ((spec[i] | mask[i]) != mask[i])
-				return -1;
-	}
-	if (item->last && !item->mask) {
-		unsigned int i;
-		const uint8_t *spec = item->last;
-
-		for (i = 0; i < size; ++i)
-			if ((spec[i] | mask[i]) != mask[i])
-				return -1;
-	}
-	if (item->spec && item->last) {
-		uint8_t spec[size];
-		uint8_t last[size];
-		const uint8_t *apply = mask;
-		unsigned int i;
-
-		if (item->mask)
-			apply = item->mask;
-		for (i = 0; i < size; ++i) {
-			spec[i] = ((const uint8_t *)item->spec)[i] & apply[i];
-			last[i] = ((const uint8_t *)item->last)[i] & apply[i];
-		}
-		ret = memcmp(spec, last, size);
-	}
-	return ret;
-}
-
-static int
-mlx4_flow_validate_eth(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_eth *mask = item->mask;
-
-		if (mask->dst.addr_bytes[0] != 0xff ||
-				mask->dst.addr_bytes[1] != 0xff ||
-				mask->dst.addr_bytes[2] != 0xff ||
-				mask->dst.addr_bytes[3] != 0xff ||
-				mask->dst.addr_bytes[4] != 0xff ||
-				mask->dst.addr_bytes[5] != 0xff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_vlan(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_vlan *mask = item->mask;
-
-		if (mask->tci != 0 &&
-		    ntohs(mask->tci) != 0x0fff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_ipv4(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_ipv4 *mask = item->mask;
-
-		if (mask->hdr.src_addr != 0 &&
-		    mask->hdr.src_addr != 0xffffffff)
-			return -1;
-		if (mask->hdr.dst_addr != 0 &&
-		    mask->hdr.dst_addr != 0xffffffff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_udp(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_udp *mask = item->mask;
-
-		if (mask->hdr.src_port != 0 &&
-		    mask->hdr.src_port != 0xffff)
-			return -1;
-		if (mask->hdr.dst_port != 0 &&
-		    mask->hdr.dst_port != 0xffff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_tcp(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_tcp *mask = item->mask;
-
-		if (mask->hdr.src_port != 0 &&
-		    mask->hdr.src_port != 0xffff)
-			return -1;
-		if (mask->hdr.dst_port != 0 &&
-		    mask->hdr.dst_port != 0xffff)
-			return -1;
+		return rte_flow_error_set
+			(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item,
+			 "\"mask\" or \"last\" field provided without a"
+			 " corresponding \"spec\"");
+	/* No spec, no mask, no problem. */
+	if (!item->spec)
+		return 0;
+	mask = item->mask ?
+		(const uint8_t *)item->mask :
+		(const uint8_t *)proc->mask_default;
+	assert(mask);
+	/*
+	 * Single-pass check to make sure that:
+	 * - Mask is supported, no bits are set outside proc->mask_support.
+	 * - Both item->spec and item->last are included in mask.
+	 */
+	for (i = 0; i != proc->mask_sz; ++i) {
+		if (!mask[i])
+			continue;
+		if ((mask[i] | ((const uint8_t *)proc->mask_support)[i]) !=
+		    ((const uint8_t *)proc->mask_support)[i])
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item, "unsupported field found in \"mask\"");
+		if (item->last &&
+		    (((const uint8_t *)item->spec)[i] & mask[i]) !=
+		    (((const uint8_t *)item->last)[i] & mask[i]))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item,
+				 "range between \"spec\" and \"last\""
+				 " is larger than \"mask\"");
 	}
-	return mlx4_flow_item_validate(item, mask, size);
+	return 0;
 }
 
 /** Graph of supported items and associated actions. */
@@ -480,66 +488,62 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_VLAN,
 				       RTE_FLOW_ITEM_TYPE_IPV4),
-		.mask = &(const struct rte_flow_item_eth){
+		.mask_support = &(const struct rte_flow_item_eth){
+			/* Only destination MAC can be matched. */
 			.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-			.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 		},
-		.default_mask = &rte_flow_item_eth_mask,
+		.mask_default = &rte_flow_item_eth_mask,
 		.mask_sz = sizeof(struct rte_flow_item_eth),
-		.validate = mlx4_flow_validate_eth,
-		.convert = mlx4_flow_create_eth,
+		.merge = mlx4_flow_merge_eth,
 		.dst_sz = sizeof(struct ibv_flow_spec_eth),
 	},
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
-		.mask = &(const struct rte_flow_item_vlan){
+		.mask_support = &(const struct rte_flow_item_vlan){
 			/* Only TCI VID matching is supported. */
 			.tci = RTE_BE16(0x0fff),
 		},
+		.mask_default = &rte_flow_item_vlan_mask,
 		.mask_sz = sizeof(struct rte_flow_item_vlan),
-		.validate = mlx4_flow_validate_vlan,
-		.convert = mlx4_flow_create_vlan,
+		.merge = mlx4_flow_merge_vlan,
 		.dst_sz = 0,
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV4] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_UDP,
 				       RTE_FLOW_ITEM_TYPE_TCP),
-		.mask = &(const struct rte_flow_item_ipv4){
+		.mask_support = &(const struct rte_flow_item_ipv4){
 			.hdr = {
 				.src_addr = RTE_BE32(0xffffffff),
 				.dst_addr = RTE_BE32(0xffffffff),
 			},
 		},
-		.default_mask = &rte_flow_item_ipv4_mask,
+		.mask_default = &rte_flow_item_ipv4_mask,
 		.mask_sz = sizeof(struct rte_flow_item_ipv4),
-		.validate = mlx4_flow_validate_ipv4,
-		.convert = mlx4_flow_create_ipv4,
+		.merge = mlx4_flow_merge_ipv4,
 		.dst_sz = sizeof(struct ibv_flow_spec_ipv4),
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
-		.mask = &(const struct rte_flow_item_udp){
+		.mask_support = &(const struct rte_flow_item_udp){
 			.hdr = {
 				.src_port = RTE_BE16(0xffff),
 				.dst_port = RTE_BE16(0xffff),
 			},
 		},
-		.default_mask = &rte_flow_item_udp_mask,
+		.mask_default = &rte_flow_item_udp_mask,
 		.mask_sz = sizeof(struct rte_flow_item_udp),
-		.validate = mlx4_flow_validate_udp,
-		.convert = mlx4_flow_create_udp,
+		.merge = mlx4_flow_merge_udp,
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
-		.mask = &(const struct rte_flow_item_tcp){
+		.mask_support = &(const struct rte_flow_item_tcp){
 			.hdr = {
 				.src_port = RTE_BE16(0xffff),
 				.dst_port = RTE_BE16(0xffff),
 			},
 		},
-		.default_mask = &rte_flow_item_tcp_mask,
+		.mask_default = &rte_flow_item_tcp_mask,
 		.mask_sz = sizeof(struct rte_flow_item_tcp),
-		.validate = mlx4_flow_validate_tcp,
-		.convert = mlx4_flow_create_tcp,
+		.merge = mlx4_flow_merge_tcp,
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 };
@@ -577,6 +581,7 @@ mlx4_flow_prepare(struct priv *priv,
 	const struct mlx4_flow_proc_item *proc;
 	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
 	struct rte_flow *flow = &temp;
+	const char *msg = NULL;
 
 	if (attr->group)
 		return rte_flow_error_set
@@ -609,18 +614,11 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->internal = 1;
 			continue;
 		}
-		/*
-		 * The nic can support patterns with NULL eth spec only
-		 * if eth is a single item in a rule.
-		 */
-		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
-			const struct rte_flow_item *next = item + 1;
-
-			if (next->type)
-				return rte_flow_error_set
-					(error, ENOTSUP,
-					 RTE_FLOW_ERROR_TYPE_ITEM, item,
-					 "the rule requires an Ethernet spec");
+		if (flow->promisc) {
+			msg = "mlx4 does not support additional matching"
+				" criteria combined with indiscriminate"
+				" matching on Ethernet headers";
+			goto exit_item_not_supported;
 		}
 		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
@@ -631,19 +629,19 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!next)
 			goto exit_item_not_supported;
 		proc = next;
-		/* Perform validation once, while handle is not allocated. */
+		/*
+		 * Perform basic sanity checks only once, while handle is
+		 * not allocated.
+		 */
 		if (flow == &temp) {
-			err = proc->validate(item, proc->mask, proc->mask_sz);
+			err = mlx4_flow_item_check(item, proc, error);
 			if (err)
-				goto exit_item_not_supported;
-		} else if (proc->convert) {
-			err = proc->convert(item,
-					    (proc->default_mask ?
-					     proc->default_mask :
-					     proc->mask),
-					    flow);
+				return err;
+		}
+		if (proc->merge) {
+			err = proc->merge(flow, item, proc, error);
 			if (err)
-				goto exit_item_not_supported;
+				return err;
 		}
 		flow->ibv_attr_size += proc->dst_sz;
 	}
@@ -712,7 +710,7 @@ mlx4_flow_prepare(struct priv *priv,
 	return 0;
 exit_item_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-				  item, "item not supported");
+				  item, msg ? msg : "item not supported");
 exit_action_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				  action, "action not supported");
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 13495d7..3036ff5 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -65,6 +65,7 @@ struct rte_flow {
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
+	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 17/29] net/mlx4: add MAC addresses configuration support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (15 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 16/29] net/mlx4: refactor flow item validation code Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 18/29] net/mlx4: add VLAN filter " Adrien Mazarguil
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

This commit brings back support for configuring up to 128 MAC addresses on
a port through internal flow rules automatically generated on demand.

Unlike in its previous incarnation, the extra flow rule necessary for
broadcast traffic no longer consumes an entry from the MAC array.
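
A hedged usage sketch of the ethdev entry point wired up by this patch,
assuming a 17.11-era API; the address value is an arbitrary locally
administered example:

  #include <stdint.h>

  #include <rte_ethdev.h>
  #include <rte_ether.h>

  /*
   * Register one more unicast address through the standard ethdev API;
   * the ethdev layer picks a free index in the MAC array and this PMD
   * translates the entry into an internal flow rule once the port is
   * started. The VMDq pool argument is ignored by mlx4.
   */
  static int
  add_extra_mac(uint16_t port_id)
  {
          struct ether_addr addr = {
                  .addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
          };

          return rte_eth_dev_mac_addr_add(port_id, &addr, 0);
  }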

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4.c           |  7 ++-
 drivers/net/mlx4/mlx4.h           | 10 +++-
 drivers/net/mlx4/mlx4_ethdev.c    | 87 +++++++++++++++++++++++++++++++-
 drivers/net/mlx4/mlx4_flow.c      | 90 ++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_flow.h      |  2 +
 6 files changed, 177 insertions(+), 20 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 0812a30..d17774f 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,6 +12,7 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
+Unicast MAC filter   = Y
 SR-IOV               = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 256aa3d..99c87ff 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -221,6 +221,9 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
+	.mac_addr_remove = mlx4_mac_addr_remove,
+	.mac_addr_add = mlx4_mac_addr_add,
+	.mac_addr_set = mlx4_mac_addr_set,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
@@ -552,7 +555,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
-		priv->mac = mac;
+		priv->mac[0] = mac;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
@@ -581,7 +584,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 		eth_dev->data->dev_private = priv;
-		eth_dev->data->mac_addrs = &priv->mac;
+		eth_dev->data->mac_addrs = priv->mac;
 		eth_dev->device = &pci_dev->device;
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
 		eth_dev->device->driver = &mlx4_driver.driver;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index fb4708d..15ecd95 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -52,6 +52,9 @@
 #include <rte_interrupts.h>
 #include <rte_mempool.h>
 
+/** Maximum number of simultaneous MAC addresses. This value is arbitrary. */
+#define MLX4_MAX_MAC_ADDRESSES 128
+
 /** Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -99,7 +102,6 @@ struct priv {
 	struct ibv_context *ctx; /**< Verbs context. */
 	struct ibv_device_attr device_attr; /**< Device properties. */
 	struct ibv_pd *pd; /**< Protection Domain. */
-	struct ether_addr mac; /**< MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /**< Configured MTU. */
 	uint8_t port; /**< Physical port number. */
@@ -110,6 +112,8 @@ struct priv {
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
 	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
+	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
+	/**< Configured MAC addresses. Unused entries are zeroed. */
 };
 
 /* mlx4_ethdev.c */
@@ -120,6 +124,10 @@ int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
 int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
 int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
 int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		      uint32_t index, uint32_t vmdq);
+void mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
 int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 void mlx4_stats_reset(struct rte_eth_dev *dev);
 void mlx4_dev_infos_get(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 8962be1..52924df 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -64,9 +64,11 @@
 #include <rte_errno.h>
 #include <rte_ethdev.h>
 #include <rte_ether.h>
+#include <rte_flow.h>
 #include <rte_pci.h>
 
 #include "mlx4.h"
+#include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
@@ -518,6 +520,88 @@ mlx4_dev_set_link_up(struct rte_eth_dev *dev)
 }
 
 /**
+ * DPDK callback to remove a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param index
+ *   MAC address index.
+ */
+void
+mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+
+	if (index >= RTE_DIM(priv->mac)) {
+		rte_errno = EINVAL;
+		return;
+	}
+	memset(&priv->mac[index], 0, sizeof(priv->mac[index]));
+	if (!mlx4_flow_sync(priv, &error))
+		return;
+	ERROR("failed to synchronize flow rules after removing MAC address"
+	      " at index %d (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      index, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+}
+
+/**
+ * DPDK callback to add a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ * @param vmdq
+ *   VMDq pool index to associate address with (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		  uint32_t index, uint32_t vmdq)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+	int ret;
+
+	(void)vmdq;
+	if (index >= RTE_DIM(priv->mac)) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	memcpy(&priv->mac[index], mac_addr, sizeof(priv->mac[index]));
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("failed to synchronize flow rules after adding MAC address"
+	      " at index %d (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      index, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+	return ret;
+}
+
+/**
+ * DPDK callback to set the primary MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ */
+void
+mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	mlx4_mac_addr_add(dev, mac_addr, 0, 0);
+}
+
+/**
  * DPDK callback to get information about the device.
  *
  * @param dev
@@ -549,8 +633,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 		max = 65535;
 	info->max_rx_queues = max;
 	info->max_tx_queues = max;
-	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = 1;
+	info->max_mac_addrs = RTE_DIM(priv->mac);
 	info->rx_offload_capa = 0;
 	info->tx_offload_capa = 0;
 	if (mlx4_get_ifname(priv, &ifname) == 0)
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index b59efe1..4128437 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -58,6 +58,7 @@
 #include <rte_errno.h>
 #include <rte_eth_ctrl.h>
 #include <rte_ethdev.h>
+#include <rte_ether.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
@@ -1010,6 +1011,10 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 /**
  * Generate internal flow rules.
  *
+ * - MAC flow rules are generated from @p dev->data->mac_addrs
+ *   (@p priv->mac array).
+ * - An additional flow rule for Ethernet broadcasts is also generated.
+ *
  * @param priv
  *   Pointer to private structure.
  * @param[out] error
@@ -1025,18 +1030,18 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		.priority = MLX4_FLOW_PRIORITY_LAST,
 		.ingress = 1,
 	};
+	struct rte_flow_item_eth eth_spec;
+	const struct rte_flow_item_eth eth_mask = {
+		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	};
 	struct rte_flow_item pattern[] = {
 		{
 			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_ETH,
-			.spec = &(struct rte_flow_item_eth){
-				.dst = priv->mac,
-			},
-			.mask = &(struct rte_flow_item_eth){
-				.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-			},
+			.spec = &eth_spec,
+			.mask = &eth_mask,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -1053,10 +1058,69 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
+	struct ether_addr *rule_mac = &eth_spec.dst;
+	struct rte_flow *flow;
+	unsigned int i;
+	int err = 0;
 
-	if (!mlx4_flow_create(priv->dev, &attr, pattern, actions, error))
-		return -rte_errno;
-	return 0;
+	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
+		const struct ether_addr *mac;
+
+		/* Broadcasts are handled by an extra iteration. */
+		if (i < RTE_DIM(priv->mac))
+			mac = &priv->mac[i];
+		else
+			mac = &eth_mask.dst;
+		if (is_zero_ether_addr(mac))
+			continue;
+		/* Check if MAC flow rule is already present. */
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_NEXT(flow, next)) {
+			const struct ibv_flow_spec_eth *eth =
+				(const void *)((uintptr_t)flow->ibv_attr +
+					       sizeof(*flow->ibv_attr));
+			unsigned int j;
+
+			if (!flow->mac)
+				continue;
+			assert(flow->ibv_attr->type == IBV_FLOW_ATTR_NORMAL);
+			assert(flow->ibv_attr->num_of_specs == 1);
+			assert(eth->type == IBV_FLOW_SPEC_ETH);
+			for (j = 0; j != sizeof(mac->addr_bytes); ++j)
+				if (eth->val.dst_mac[j] != mac->addr_bytes[j] ||
+				    eth->mask.dst_mac[j] != UINT8_C(0xff) ||
+				    eth->val.src_mac[j] != UINT8_C(0x00) ||
+				    eth->mask.src_mac[j] != UINT8_C(0x00))
+					break;
+			if (j == sizeof(mac->addr_bytes))
+				break;
+		}
+		if (!flow || !flow->internal) {
+			/* Not found, create a new flow rule. */
+			memcpy(rule_mac, mac, sizeof(*mac));
+			flow = mlx4_flow_create(priv->dev, &attr, pattern,
+						actions, error);
+			if (!flow) {
+				err = -rte_errno;
+				break;
+			}
+		}
+		flow->select = 1;
+		flow->mac = 1;
+	}
+	/* Clear selection and clean up stale MAC flow rules. */
+	flow = LIST_FIRST(&priv->flows);
+	while (flow && flow->internal) {
+		struct rte_flow *next = LIST_NEXT(flow, next);
+
+		if (flow->mac && !flow->select)
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
+		else
+			flow->select = 0;
+		flow = next;
+	}
+	return err;
 }
 
 /**
@@ -1090,12 +1154,8 @@ mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error)
 		     flow && flow->internal;
 		     flow = LIST_FIRST(&priv->flows))
 			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
-	} else if (!LIST_FIRST(&priv->flows) ||
-		   !LIST_FIRST(&priv->flows)->internal) {
-		/*
-		 * If the first rule is not internal outside isolated mode,
-		 * they must be added back.
-		 */
+	} else {
+		/* Refresh internal rules. */
 		ret = mlx4_flow_internal(priv, error);
 		if (ret)
 			return ret;
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 3036ff5..fcdf461 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -64,7 +64,9 @@ struct rte_flow {
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t select:1; /**< Used by operations on the linked list. */
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
+	uint32_t mac:1; /**< Rule associated with a configured MAC address. */
 	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-- 
2.1.4


* [PATCH v1 18/29] net/mlx4: add VLAN filter configuration support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (16 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 17/29] net/mlx4: add MAC addresses configuration support Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 19/29] net/mlx4: add flow support for multicast traffic Adrien Mazarguil
                   ` (11 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

This commit brings back VLAN filter configuration support without any
artificial limitation on the number of simultaneous VLANs that can be
configured (previously 127).

Also, since it no longer relies on fixed per-queue arrays to store
potential Verbs flow handles, this version wastes a lot less memory
(previously 128 * 127 * pointer size, i.e. roughly 127 KiB per Rx queue
with 8-byte pointers, while only one queue actually had any use for this
room: the RSS parent queue).

Note that the number of internal flow rules generated still scales with
the number of configured MAC addresses multiplied by the number of
configured VLAN filters.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4.c           |  1 +
 drivers/net/mlx4/mlx4.h           |  1 +
 drivers/net/mlx4/mlx4_ethdev.c    | 42 +++++++++++++++++++++
 drivers/net/mlx4/mlx4_flow.c      | 67 ++++++++++++++++++++++++++++++++++
 5 files changed, 112 insertions(+)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index d17774f..bfe0eb1 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,6 +14,7 @@ MTU update           = Y
 Jumbo frame          = Y
 Unicast MAC filter   = Y
 SR-IOV               = Y
+VLAN filter          = Y
 Basic stats          = Y
 Stats per queue      = Y
 Other kdrv           = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 99c87ff..e25e958 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -227,6 +227,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
+	.vlan_filter_set = mlx4_vlan_filter_set,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 15ecd95..cc403ea 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -128,6 +128,7 @@ void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 		      uint32_t index, uint32_t vmdq);
 void mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
+int mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 void mlx4_stats_reset(struct rte_eth_dev *dev);
 void mlx4_dev_infos_get(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 52924df..7721f13 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -588,6 +588,48 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 }
 
 /**
+ * DPDK callback to configure a VLAN filter.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param vlan_id
+ *   VLAN ID to filter.
+ * @param on
+ *   Toggle filter.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+	unsigned int vidx = vlan_id / 64;
+	unsigned int vbit = vlan_id % 64;
+	uint64_t *v;
+	int ret;
+
+	if (vidx >= RTE_DIM(dev->data->vlan_filter_conf.ids)) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	v = &dev->data->vlan_filter_conf.ids[vidx];
+	*v &= ~(UINT64_C(1) << vbit);
+	*v |= (uint64_t)!!on << vbit;
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("failed to synchronize flow rules after %s VLAN filter on ID %u"
+	      " (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      on ? "enabling" : "disabling", vlan_id,
+	      rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+	return ret;
+}
+
+/**
  * DPDK callback to set the primary MAC address.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 4128437..47a6a6a 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1009,11 +1009,36 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 }
 
 /**
+ * Helper function to determine the next configured VLAN filter.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param vlan
+ *   VLAN ID to use as a starting point.
+ *
+ * @return
+ *   Next configured VLAN ID or a high value (>= 4096) if there is none.
+ */
+static uint16_t
+mlx4_flow_internal_next_vlan(struct priv *priv, uint16_t vlan)
+{
+	while (vlan < 4096) {
+		if (priv->dev->data->vlan_filter_conf.ids[vlan / 64] &
+		    (UINT64_C(1) << (vlan % 64)))
+			return vlan;
+		++vlan;
+	}
+	return vlan;
+}
+
+/**
  * Generate internal flow rules.
  *
  * - MAC flow rules are generated from @p dev->data->mac_addrs
  *   (@p priv->mac array).
  * - An additional flow rule for Ethernet broadcasts is also generated.
+ * - All these are per-VLAN if @p dev->data->dev_conf.rxmode.hw_vlan_filter
+ *   is enabled and VLAN filters are configured.
  *
  * @param priv
  *   Pointer to private structure.
@@ -1034,6 +1059,10 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	const struct rte_flow_item_eth eth_mask = {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	};
+	struct rte_flow_item_vlan vlan_spec;
+	const struct rte_flow_item_vlan vlan_mask = {
+		.tci = RTE_BE16(0x0fff),
+	};
 	struct rte_flow_item pattern[] = {
 		{
 			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
@@ -1044,6 +1073,10 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.mask = &eth_mask,
 		},
 		{
+			/* Replaced with VLAN if filtering is enabled. */
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
@@ -1059,10 +1092,33 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		},
 	};
 	struct ether_addr *rule_mac = &eth_spec.dst;
+	rte_be16_t *rule_vlan =
+		priv->dev->data->dev_conf.rxmode.hw_vlan_filter ?
+		&vlan_spec.tci :
+		NULL;
+	uint16_t vlan = 0;
 	struct rte_flow *flow;
 	unsigned int i;
 	int err = 0;
 
+	/*
+	 * Set up VLAN item if filtering is enabled and at least one VLAN
+	 * filter is configured.
+	 */
+	if (rule_vlan) {
+		vlan = mlx4_flow_internal_next_vlan(priv, 0);
+		if (vlan < 4096) {
+			pattern[2] = (struct rte_flow_item){
+				.type = RTE_FLOW_ITEM_TYPE_VLAN,
+				.spec = &vlan_spec,
+				.mask = &vlan_mask,
+			};
+next_vlan:
+			*rule_vlan = rte_cpu_to_be_16(vlan);
+		} else {
+			rule_vlan = NULL;
+		}
+	}
 	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
 		const struct ether_addr *mac;
 
@@ -1087,6 +1143,12 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			assert(flow->ibv_attr->type == IBV_FLOW_ATTR_NORMAL);
 			assert(flow->ibv_attr->num_of_specs == 1);
 			assert(eth->type == IBV_FLOW_SPEC_ETH);
+			if (rule_vlan &&
+			    (eth->val.vlan_tag != *rule_vlan ||
+			     eth->mask.vlan_tag != RTE_BE16(0x0fff)))
+				continue;
+			if (!rule_vlan && eth->mask.vlan_tag)
+				continue;
 			for (j = 0; j != sizeof(mac->addr_bytes); ++j)
 				if (eth->val.dst_mac[j] != mac->addr_bytes[j] ||
 				    eth->mask.dst_mac[j] != UINT8_C(0xff) ||
@@ -1109,6 +1171,11 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		flow->select = 1;
 		flow->mac = 1;
 	}
+	if (!err && rule_vlan) {
+		vlan = mlx4_flow_internal_next_vlan(priv, vlan + 1);
+		if (vlan < 4096)
+			goto next_vlan;
+	}
 	/* Clear selection and clean up stale MAC flow rules. */
 	flow = LIST_FIRST(&priv->flows);
 	while (flow && flow->internal) {
-- 
2.1.4


* [PATCH v1 19/29] net/mlx4: add flow support for multicast traffic
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (17 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 18/29] net/mlx4: add VLAN filter " Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 20/29] net/mlx4: restore promisc and allmulti support Adrien Mazarguil
                   ` (10 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Give users the ability to create flow rules that match all multicast
traffic. Like promiscuous flow rules, they come with restrictions such as
not allowing additional matching criteria.
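
For illustration, a minimal application-level sketch of such a rule,
assuming a valid port_id and targeting Rx queue 0 (both assumptions, not
part of this patch):

  /* Match all multicast traffic by using 01:00:00:00:00:00 as both
   * spec and mask, i.e. only the multicast bit is matched. */
  const struct rte_flow_attr attr = { .ingress = 1 };
  const struct rte_flow_item_eth allmulti = {
          .dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
  };
  const struct rte_flow_item pattern[] = {
          {
                  .type = RTE_FLOW_ITEM_TYPE_ETH,
                  .spec = &allmulti,
                  .mask = &allmulti,
          },
          { .type = RTE_FLOW_ITEM_TYPE_END },
  };
  const struct rte_flow_action actions[] = {
          {
                  .type = RTE_FLOW_ACTION_TYPE_QUEUE,
                  .conf = &(struct rte_flow_action_queue){ .index = 0 },
          },
          { .type = RTE_FLOW_ACTION_TYPE_END },
  };
  struct rte_flow_error error;
  struct rte_flow *flow =
          rte_flow_create(port_id, &attr, pattern, actions, &error);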

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4_flow.c      | 17 +++++++++++++++--
 drivers/net/mlx4/mlx4_flow.h      |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index bfe0eb1..9e3ba34 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,6 +13,7 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Unicast MAC filter   = Y
+Multicast MAC filter = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Basic stats          = Y
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 47a6a6a..2ff1c69 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -107,7 +107,9 @@ struct mlx4_drop {
  *
  * Additional mlx4-specific constraints on supported fields:
  *
- * - No support for partial masks.
+ * - No support for partial masks, except in the specific case of matching
+ *   all multicast traffic (@p spec->dst and @p mask->dst equal to
+ *   01:00:00:00:00:00).
  * - Not providing @p item->spec or providing an empty @p mask->dst is
  *   *only* supported if the rule doesn't specify additional matching
  *   criteria (i.e. rule is promiscuous-like).
@@ -152,6 +154,13 @@ mlx4_flow_merge_eth(struct rte_flow *flow,
 			goto error;
 		} else if (!sum_dst) {
 			flow->promisc = 1;
+		} else if (sum_dst == 1 && mask->dst.addr_bytes[0] == 1) {
+			if (!(spec->dst.addr_bytes[0] & 1)) {
+				msg = "mlx4 does not support the explicit"
+					" exclusion of all multicast traffic";
+				goto error;
+			}
+			flow->allmulti = 1;
 		} else if (sum_dst != (UINT8_C(0xff) * ETHER_ADDR_LEN)) {
 			msg = "mlx4 does not support matching partial"
 				" Ethernet fields";
@@ -164,6 +173,10 @@ mlx4_flow_merge_eth(struct rte_flow *flow,
 		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
 		return 0;
 	}
+	if (flow->allmulti) {
+		flow->ibv_attr->type = IBV_FLOW_ATTR_MC_DEFAULT;
+		return 0;
+	}
 	++flow->ibv_attr->num_of_specs;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
@@ -615,7 +628,7 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->internal = 1;
 			continue;
 		}
-		if (flow->promisc) {
+		if (flow->promisc || flow->allmulti) {
 			msg = "mlx4 does not support additional matching"
 				" criteria combined with indiscriminate"
 				" matching on Ethernet headers";
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index fcdf461..134e14d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -68,6 +68,7 @@ struct rte_flow {
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
 	uint32_t mac:1; /**< Rule associated with a configured MAC address. */
 	uint32_t promisc:1; /**< This rule matches everything. */
+	uint32_t allmulti:1; /**< This rule matches all multicast traffic. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
-- 
2.1.4


* [PATCH v1 20/29] net/mlx4: restore promisc and allmulti support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (18 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 19/29] net/mlx4: add flow support for multicast traffic Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 21/29] net/mlx4: update Rx/Tx callbacks consistently Adrien Mazarguil
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Implement promiscuous and all multicast modes through internal flow rules
automatically generated according to the configured Rx mode.
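
From the application standpoint nothing changes; the standard ethdev
calls below now merely cause the internal flow rules to be regenerated
(port_id is an assumption):

  /* Each toggle triggers mlx4_flow_sync() behind the scenes. */
  rte_eth_promiscuous_enable(port_id);   /* install catch-all rule */
  rte_eth_promiscuous_disable(port_id);  /* back to MAC/VLAN rules */
  rte_eth_allmulticast_enable(port_id);  /* catch-all multicast rule */
  rte_eth_allmulticast_disable(port_id);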

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  2 +
 drivers/net/mlx4/mlx4.c           |  4 ++
 drivers/net/mlx4/mlx4.h           |  4 ++
 drivers/net/mlx4/mlx4_ethdev.c    | 95 ++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_flow.c      | 63 +++++++++++++++++++---
 5 files changed, 162 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 9e3ba34..6f8c82a 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,6 +12,8 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
+Promiscuous mode     = Y
+Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 SR-IOV               = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e25e958..f02508a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -221,6 +221,10 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
+	.promiscuous_enable = mlx4_promiscuous_enable,
+	.promiscuous_disable = mlx4_promiscuous_disable,
+	.allmulticast_enable = mlx4_allmulticast_enable,
+	.allmulticast_disable = mlx4_allmulticast_disable,
 	.mac_addr_remove = mlx4_mac_addr_remove,
 	.mac_addr_add = mlx4_mac_addr_add,
 	.mac_addr_set = mlx4_mac_addr_set,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index cc403ea..a27399a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -124,6 +124,10 @@ int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
 int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
 int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
 int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_promiscuous_enable(struct rte_eth_dev *dev);
+void mlx4_promiscuous_disable(struct rte_eth_dev *dev);
+void mlx4_allmulticast_enable(struct rte_eth_dev *dev);
+void mlx4_allmulticast_disable(struct rte_eth_dev *dev);
 void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 		      uint32_t index, uint32_t vmdq);
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 7721f13..01fb195 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -520,6 +520,101 @@ mlx4_dev_set_link_up(struct rte_eth_dev *dev)
 }
 
 /**
+ * Supported Rx mode toggles.
+ *
+ * Even and odd values respectively stand for off and on.
+ */
+enum rxmode_toggle {
+	RXMODE_TOGGLE_PROMISC_OFF,
+	RXMODE_TOGGLE_PROMISC_ON,
+	RXMODE_TOGGLE_ALLMULTI_OFF,
+	RXMODE_TOGGLE_ALLMULTI_ON,
+};
+
+/**
+ * Helper function to toggle promiscuous and all multicast modes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param toggle
+ *   Toggle to set.
+ */
+static void
+mlx4_rxmode_toggle(struct rte_eth_dev *dev, enum rxmode_toggle toggle)
+{
+	struct priv *priv = dev->data->dev_private;
+	const char *mode;
+	struct rte_flow_error error;
+
+	switch (toggle) {
+	case RXMODE_TOGGLE_PROMISC_OFF:
+	case RXMODE_TOGGLE_PROMISC_ON:
+		mode = "promiscuous";
+		dev->data->promiscuous = toggle & 1;
+		break;
+	case RXMODE_TOGGLE_ALLMULTI_OFF:
+	case RXMODE_TOGGLE_ALLMULTI_ON:
+		mode = "all multicast";
+		dev->data->all_multicast = toggle & 1;
+		break;
+	}
+	if (!mlx4_flow_sync(priv, &error))
+		return;
+	ERROR("cannot toggle %s mode (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      mode, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+}
+
+/**
+ * DPDK callback to enable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_PROMISC_ON);
+}
+
+/**
+ * DPDK callback to disable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_PROMISC_OFF);
+}
+
+/**
+ * DPDK callback to enable all multicast mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_ALLMULTI_ON);
+}
+
+/**
+ * DPDK callback to disable all multicast mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_ALLMULTI_OFF);
+}
+
+/**
  * DPDK callback to remove a MAC address.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 2ff1c69..2d826b4 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1047,6 +1047,14 @@ mlx4_flow_internal_next_vlan(struct priv *priv, uint16_t vlan)
 /**
  * Generate internal flow rules.
  *
+ * Various flow rules are created depending on the mode the device is in:
+ *
+ * 1. Promiscuous: port MAC + catch-all (VLAN filtering is ignored).
+ * 2. All multicast: port MAC/VLAN + catch-all multicast.
+ * 3. Otherwise: port MAC/VLAN + broadcast MAC/VLAN.
+ *
+ * About MAC flow rules:
+ *
  * - MAC flow rules are generated from @p dev->data->mac_addrs
  *   (@p priv->mac array).
  * - An additional flow rule for Ethernet broadcasts is also generated.
@@ -1072,6 +1080,9 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	const struct rte_flow_item_eth eth_mask = {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	};
+	const struct rte_flow_item_eth eth_allmulti = {
+		.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	};
 	struct rte_flow_item_vlan vlan_spec;
 	const struct rte_flow_item_vlan vlan_mask = {
 		.tci = RTE_BE16(0x0fff),
@@ -1106,9 +1117,13 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	};
 	struct ether_addr *rule_mac = &eth_spec.dst;
 	rte_be16_t *rule_vlan =
-		priv->dev->data->dev_conf.rxmode.hw_vlan_filter ?
+		priv->dev->data->dev_conf.rxmode.hw_vlan_filter &&
+		!priv->dev->data->promiscuous ?
 		&vlan_spec.tci :
 		NULL;
+	int broadcast =
+		!priv->dev->data->promiscuous &&
+		!priv->dev->data->all_multicast;
 	uint16_t vlan = 0;
 	struct rte_flow *flow;
 	unsigned int i;
@@ -1132,7 +1147,7 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			rule_vlan = NULL;
 		}
 	}
-	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
+	for (i = 0; i != RTE_DIM(priv->mac) + broadcast; ++i) {
 		const struct ether_addr *mac;
 
 		/* Broadcasts are handled by an extra iteration. */
@@ -1178,23 +1193,59 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 						actions, error);
 			if (!flow) {
 				err = -rte_errno;
-				break;
+				goto error;
 			}
 		}
 		flow->select = 1;
 		flow->mac = 1;
 	}
-	if (!err && rule_vlan) {
+	if (rule_vlan) {
 		vlan = mlx4_flow_internal_next_vlan(priv, vlan + 1);
 		if (vlan < 4096)
 			goto next_vlan;
 	}
-	/* Clear selection and clean up stale MAC flow rules. */
+	/* Take care of promiscuous and all multicast flow rules. */
+	if (!broadcast) {
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_NEXT(flow, next)) {
+			if (priv->dev->data->promiscuous) {
+				if (flow->promisc)
+					break;
+			} else {
+				assert(priv->dev->data->all_multicast);
+				if (flow->allmulti)
+					break;
+			}
+		}
+		if (!flow || !flow->internal) {
+			/* Not found, create a new flow rule. */
+			if (priv->dev->data->promiscuous) {
+				pattern[1].spec = NULL;
+				pattern[1].mask = NULL;
+			} else {
+				assert(priv->dev->data->all_multicast);
+				pattern[1].spec = &eth_allmulti;
+				pattern[1].mask = &eth_allmulti;
+			}
+			pattern[2] = pattern[3];
+			flow = mlx4_flow_create(priv->dev, &attr, pattern,
+						actions, error);
+			if (!flow) {
+				err = -rte_errno;
+				goto error;
+			}
+		}
+		assert(flow->promisc || flow->allmulti);
+		flow->select = 1;
+	}
+error:
+	/* Clear selection and clean up stale internal flow rules. */
 	flow = LIST_FIRST(&priv->flows);
 	while (flow && flow->internal) {
 		struct rte_flow *next = LIST_NEXT(flow, next);
 
-		if (flow->mac && !flow->select)
+		if (!flow->select)
 			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
 		else
 			flow->select = 0;
-- 
2.1.4


* [PATCH v1 21/29] net/mlx4: update Rx/Tx callbacks consistently
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (19 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 20/29] net/mlx4: restore promisc and allmulti support Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 22/29] net/mlx4: fix invalid errno value sign Adrien Mazarguil
                   ` (8 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Although their "removed" version acts as a safety net against unexpected
bursts while queues are being modified by the control path, these
callbacks are set per device, not per queue. It therefore makes sense to
update them during start/stop/close cycles rather than at queue setup.

As a side effect, this commit addresses a bug left over from a prior
commit: bringing the link down causes the "removed" Tx callback to be
used; however, the normal callback is not restored when bringing it back
up, which prevents the application from sending traffic at all.

Updating callbacks for a link change is not necessary as bringing the
netdevice down is normally enough to prevent traffic from flowing in.
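
For clarity, a sketch of what such a "removed" callback boils down to
(the function name is illustrative; the actual implementations live in
mlx4_rxtx.c):

  /* Sketch: reject bursts while queues are absent or being modified. */
  static uint16_t
  removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  {
          (void)dpdk_txq;
          (void)pkts;
          (void)pkts_n;
          rte_errno = ENOTSUP;
          return 0; /* no packets processed */
  }

The rte_wmb() calls added by this patch ensure the data path observes a
fully updated queue state before the regular callbacks are installed, and
that queues are no longer touched once the "removed" ones are in place.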

Fixes: a4951cb98fdf ("net/mlx4: drop scatter/gather support")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c        | 11 ++++++++---
 drivers/net/mlx4/mlx4_ethdev.c |  4 ----
 drivers/net/mlx4/mlx4_rxq.c    |  2 --
 drivers/net/mlx4/mlx4_txq.c    |  2 --
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f02508a..52f8d51 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -149,6 +149,9 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		      error.message ? error.message : "(unspecified)");
 		goto err;
 	}
+	rte_wmb();
+	dev->tx_pkt_burst = mlx4_tx_burst;
+	dev->rx_pkt_burst = mlx4_rx_burst;
 	return 0;
 err:
 	/* Rollback. */
@@ -173,6 +176,9 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
+	dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	dev->rx_pkt_burst = mlx4_rx_burst_removed;
+	rte_wmb();
 	mlx4_flow_sync(priv, NULL);
 	mlx4_intr_uninstall(priv);
 }
@@ -191,14 +197,13 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (priv == NULL)
-		return;
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	mlx4_flow_clean(priv);
 	dev->rx_pkt_burst = mlx4_rx_burst_removed;
 	dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	rte_wmb();
+	mlx4_flow_clean(priv);
 	for (i = 0; i != dev->data->nb_rx_queues; ++i)
 		mlx4_rx_queue_release(dev->data->rx_queues[i]);
 	for (i = 0; i != dev->data->nb_tx_queues; ++i)
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 01fb195..ebf2339 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -467,20 +467,16 @@ mlx4_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
 static int
 mlx4_dev_set_link(struct priv *priv, int up)
 {
-	struct rte_eth_dev *dev = priv->dev;
 	int err;
 
 	if (up) {
 		err = mlx4_set_flags(priv, ~IFF_UP, IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst;
 	} else {
 		err = mlx4_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst_removed;
-		dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index bcb7b94..693db4f 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -436,8 +436,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_free(rxq);
 			return ret;
 		}
-		/* Update receive callback. */
-		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	return ret;
 }
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index e0245b0..c1fdbaf 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -438,8 +438,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		DEBUG("%p: adding Tx queue %p to list",
 		      (void *)dev, (void *)txq);
 		dev->data->tx_queues[idx] = txq;
-		/* Update send callback. */
-		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
 	return ret;
 }
-- 
2.1.4


* [PATCH v1 22/29] net/mlx4: fix invalid errno value sign
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (20 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 21/29] net/mlx4: update Rx/Tx callbacks consistently Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 23/29] net/mlx4: drop live queue reconfiguration support Adrien Mazarguil
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

The Tx queue elements allocation function sets rte_errno properly and
returns its negated value. Reassigning this negative value to rte_errno is
thus both invalid and unnecessary.
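
For context, a sketch of the PMD-wide convention this fix restores
(positive rte_errno, negative return value):

  static int
  example_alloc(void)
  {
          void *obj = rte_zmalloc("example", 64, 0);

          if (obj == NULL) {
                  rte_errno = ENOMEM; /* rte_errno is always positive */
                  return -rte_errno;  /* return value is its negation */
          }
          rte_free(obj);
          return 0;
  }

Assigning a callee's negative return value back to rte_errno, as the
removed line did, therefore stores an invalid (negative) errno.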

Fixes: c3e1f93cdf88 ("net/mlx4: standardize on negative errno values")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_txq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index c1fdbaf..3cece3e 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -340,7 +340,6 @@ mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	}
 	ret = mlx4_txq_alloc_elts(&tmpl, desc);
 	if (ret) {
-		rte_errno = ret;
 		ERROR("%p: TXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
-- 
2.1.4


* [PATCH v1 23/29] net/mlx4: drop live queue reconfiguration support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (21 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 22/29] net/mlx4: fix invalid errno value sign Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 24/29] net/mlx4: allocate queues and mbuf rings together Adrien Mazarguil
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

DPDK ensures that queue setup functions are only ever called on queues
that have never been configured or have been released beforehand.

PMDs therefore do not need to deal with the unexpected reconfiguration of
live queues, which may fail with no easy way to recover. Dropping support
for this scenario greatly simplifies the code, as allocation and setup
steps and their checks can be merged.
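
For reference, a paraphrase of the ethdev-layer behavior this patch
relies on (not the exact rte_ethdev.c code):

  /* Before invoking the PMD setup callback, ethdev releases any
   * previously configured queue at the same index. */
  if (dev->data->rx_queues[queue_id] != NULL) {
          (*dev->dev_ops->rx_queue_release)(dev->data->rx_queues[queue_id]);
          dev->data->rx_queues[queue_id] = NULL;
  }
  ret = (*dev->dev_ops->rx_queue_setup)(dev, queue_id, nb_desc,
                                        socket_id, rx_conf, mp);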

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c  | 281 ++++++++++++++------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 -
 drivers/net/mlx4/mlx4_txq.c  | 239 +++++++++++---------------------
 3 files changed, 184 insertions(+), 338 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 693db4f..30b0654 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -162,36 +162,12 @@ mlx4_rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Clean up a Rx queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param rxq
- *   Pointer to Rx queue structure.
- */
-void
-mlx4_rxq_cleanup(struct rxq *rxq)
-{
-	DEBUG("cleaning up %p", (void *)rxq);
-	mlx4_rxq_free_elts(rxq);
-	if (rxq->qp != NULL)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->cq != NULL)
-		claim_zero(ibv_destroy_cq(rxq->cq));
-	if (rxq->channel != NULL)
-		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->mr != NULL)
-		claim_zero(ibv_dereg_mr(rxq->mr));
-	memset(rxq, 0, sizeof(*rxq));
-}
-
-/**
- * Configure a Rx queue.
+ * DPDK callback to configure a Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param rxq
- *   Pointer to Rx queue structure.
+ * @param idx
+ *   Rx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
  * @param socket
@@ -204,30 +180,53 @@ mlx4_rxq_cleanup(struct rxq *rxq)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
-mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	       unsigned int socket, const struct rte_eth_rxconf *conf,
-	       struct rte_mempool *mp)
+int
+mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct rxq tmpl = {
-		.priv = priv,
-		.mp = mp,
-		.socket = socket
-	};
-	struct ibv_qp_attr mod;
-	struct ibv_qp_init_attr qp_init;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
+	uint32_t mb_len = rte_pktmbuf_data_room_size(mp);
+	struct rte_flow_error error;
+	struct rxq *rxq;
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	mb_len = rte_pktmbuf_data_room_size(mp);
-	if (desc == 0) {
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= dev->data->nb_rx_queues) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, dev->data->nb_rx_queues);
+		return -rte_errno;
+	}
+	rxq = dev->data->rx_queues[idx];
+	if (rxq) {
+		rte_errno = EEXIST;
+		ERROR("%p: Rx queue %u already configured, release it first",
+		      (void *)dev, idx);
+		return -rte_errno;
+	}
+	if (!desc) {
 		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
-		goto error;
+		return -rte_errno;
+	}
+	/* Allocate and initialize Rx queue. */
+	rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+	if (!rxq) {
+		rte_errno = ENOMEM;
+		ERROR("%p: unable to allocate queue index %u",
+		      (void *)dev, idx);
+		return -rte_errno;
 	}
+	*rxq = (struct rxq){
+		.priv = priv,
+		.mp = mp,
+		.port_id = dev->data->port_id,
+		.stats.idx = idx,
+		.socket = socket,
+	};
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
@@ -246,201 +245,115 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		     mb_len - RTE_PKTMBUF_HEADROOM);
 	}
 	/* Use the entire Rx mempool as the memory region. */
-	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
-	if (tmpl.mr == NULL) {
+	rxq->mr = mlx4_mp2mr(priv->pd, mp);
+	if (!rxq->mr) {
 		rte_errno = EINVAL;
 		ERROR("%p: MR creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
-		tmpl.channel = ibv_create_comp_channel(priv->ctx);
-		if (tmpl.channel == NULL) {
+		rxq->channel = ibv_create_comp_channel(priv->ctx);
+		if (rxq->channel == NULL) {
 			rte_errno = ENOMEM;
 			ERROR("%p: Rx interrupt completion channel creation"
 			      " failure: %s",
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
-		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+		if (mlx4_fd_set_non_blocking(rxq->channel->fd) < 0) {
 			ERROR("%p: unable to make Rx interrupt completion"
 			      " channel non-blocking: %s",
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
 	}
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
-	if (tmpl.cq == NULL) {
+	rxq->cq = ibv_create_cq(priv->ctx, desc, NULL, rxq->channel, 0);
+	if (!rxq->cq) {
 		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	qp_init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-	tmpl.qp = ibv_create_qp(priv->pd, &qp_init);
-	if (tmpl.qp == NULL) {
+	rxq->qp = ibv_create_qp
+		(priv->pd,
+		 &(struct ibv_qp_init_attr){
+			.send_cq = rxq->cq,
+			.recv_cq = rxq->cq,
+			.cap = {
+				.max_recv_wr =
+					RTE_MIN(priv->device_attr.max_qp_wr,
+						desc),
+				.max_recv_sge = 1,
+			},
+			.qp_type = IBV_QPT_RAW_PACKET,
+		 });
+	if (!rxq->qp) {
 		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
+	ret = ibv_modify_qp
+		(rxq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_rxq_alloc_elts(&tmpl, desc);
+	ret = mlx4_rxq_alloc_elts(rxq, desc);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
+	ret = ibv_post_recv(rxq->qp, &(*rxq->elts)[0].wr,
+			    &(struct ibv_recv_wr *){ NULL });
 	if (ret) {
 		rte_errno = ret;
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		ERROR("%p: ibv_post_recv() failed: %s",
 		      (void *)dev,
-		      (void *)bad_wr,
 		      strerror(rte_errno));
 		goto error;
 	}
-	mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(rxq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* Save port ID. */
-	tmpl.port_id = dev->data->port_id;
-	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	/* Clean up rxq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
-	mlx4_rxq_cleanup(rxq);
-	*rxq = tmpl;
-	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	return 0;
+	DEBUG("%p: adding Rx queue %p to list", (void *)dev, (void *)rxq);
+	dev->data->rx_queues[idx] = rxq;
+	/* Enable associated flows. */
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("cannot re-attach flow rules to queue %u"
+	      " (code %d, \"%s\"), flow error type %d, cause %p, message: %s",
+	      idx, -ret, strerror(-ret), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
 error:
+	dev->data->rx_queues[idx] = NULL;
 	ret = rte_errno;
-	mlx4_rxq_cleanup(&tmpl);
+	mlx4_rx_queue_release(rxq);
 	rte_errno = ret;
 	assert(rte_errno > 0);
 	return -rte_errno;
 }
 
 /**
- * DPDK callback to configure a Rx queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = dev->data->rx_queues[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= dev->data->nb_rx_queues) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, dev->data->nb_rx_queues);
-		return -rte_errno;
-	}
-	if (rxq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)rxq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		dev->data->rx_queues[idx] = NULL;
-		/* Disable associated flows. */
-		mlx4_flow_sync(priv, NULL);
-		mlx4_rxq_cleanup(rxq);
-	} else {
-		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
-		if (rxq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = mlx4_rxq_setup(dev, rxq, desc, socket, conf, mp);
-	if (ret) {
-		rte_free(rxq);
-	} else {
-		struct rte_flow_error error;
-
-		rxq->stats.idx = idx;
-		DEBUG("%p: adding Rx queue %p to list",
-		      (void *)dev, (void *)rxq);
-		dev->data->rx_queues[idx] = rxq;
-		/* Re-enable associated flows. */
-		ret = mlx4_flow_sync(priv, &error);
-		if (ret) {
-			ERROR("cannot re-attach flow rules to queue %u"
-			      " (code %d, \"%s\"), flow error type %d,"
-			      " cause %p, message: %s", idx,
-			      -ret, strerror(-ret), error.type, error.cause,
-			      error.message ? error.message : "(unspecified)");
-			dev->data->rx_queues[idx] = NULL;
-			mlx4_rxq_cleanup(rxq);
-			rte_free(rxq);
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
  * DPDK callback to release a Rx queue.
  *
  * @param dpdk_rxq
@@ -464,6 +377,14 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			break;
 		}
 	mlx4_flow_sync(priv, NULL);
-	mlx4_rxq_cleanup(rxq);
+	mlx4_rxq_free_elts(rxq);
+	if (rxq->qp)
+		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->cq)
+		claim_zero(ibv_destroy_cq(rxq->cq));
+	if (rxq->channel)
+		claim_zero(ibv_destroy_comp_channel(rxq->channel));
+	if (rxq->mr)
+		claim_zero(ibv_dereg_mr(rxq->mr));
 	rte_free(rxq);
 }
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 7a2c982..d62120e 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -122,7 +122,6 @@ struct txq {
 
 /* mlx4_rxq.c */
 
-void mlx4_rxq_cleanup(struct rxq *rxq);
 int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_rxconf *conf,
@@ -143,7 +142,6 @@ uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
 
 /* mlx4_txq.c */
 
-void mlx4_txq_cleanup(struct txq *txq);
 int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index 3cece3e..f102c68 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -155,34 +155,6 @@ mlx4_txq_free_elts(struct txq *txq)
 	rte_free(elts);
 }
 
-/**
- * Clean up a Tx queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param txq
- *   Pointer to Tx queue structure.
- */
-void
-mlx4_txq_cleanup(struct txq *txq)
-{
-	size_t i;
-
-	DEBUG("cleaning up %p", (void *)txq);
-	mlx4_txq_free_elts(txq);
-	if (txq->qp != NULL)
-		claim_zero(ibv_destroy_qp(txq->qp));
-	if (txq->cq != NULL)
-		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (txq->mp2mr[i].mp == NULL)
-			break;
-		assert(txq->mp2mr[i].mr != NULL);
-		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
-	}
-	memset(txq, 0, sizeof(*txq));
-}
-
 struct txq_mp2mr_mbuf_check_data {
 	int ret;
 };
@@ -242,12 +214,12 @@ mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 }
 
 /**
- * Configure a Tx queue.
+ * DPDK callback to configure a Tx queue.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param txq
- *   Pointer to Tx queue structure.
+ * @param idx
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
  * @param socket
@@ -258,190 +230,135 @@ mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
-mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	       unsigned int socket, const struct rte_eth_txconf *conf)
+int
+mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct txq tmpl = {
-		.priv = priv,
-		.socket = socket
-	};
-	union {
-		struct ibv_qp_init_attr init;
-		struct ibv_qp_attr mod;
-	} attr;
+	struct ibv_qp_init_attr qp_init_attr;
+	struct txq *txq;
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		goto error;
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= dev->data->nb_tx_queues) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, dev->data->nb_tx_queues);
+		return -rte_errno;
+	}
+	txq = dev->data->tx_queues[idx];
+	if (txq) {
+		rte_errno = EEXIST;
+		DEBUG("%p: Tx queue %u already configured, release it first",
+		      (void *)dev, idx);
+		return -rte_errno;
 	}
-	if (desc == 0) {
+	if (!desc) {
 		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
-		goto error;
+		return -rte_errno;
 	}
-	/* MRs will be registered in mp2mr[] later. */
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
-	if (tmpl.cq == NULL) {
+	/* Allocate and initialize Tx queue. */
+	txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+	if (!txq) {
+		rte_errno = ENOMEM;
+		ERROR("%p: unable to allocate queue index %u",
+		      (void *)dev, idx);
+		return -rte_errno;
+	}
+	*txq = (struct txq){
+		.priv = priv,
+		.stats.idx = idx,
+		.socket = socket,
+	};
+	txq->cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+	if (!txq->cq) {
 		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
+	qp_init_attr = (struct ibv_qp_init_attr){
+		.send_cq = txq->cq,
+		.recv_cq = txq->cq,
 		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
+			.max_send_wr =
+				RTE_MIN(priv->device_attr.max_qp_wr, desc),
 			.max_send_sge = 1,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		/*
-		 * Do *NOT* enable this, completions events are managed per
-		 * Tx burst.
-		 */
+		/* No completion events must occur by default. */
 		.sq_sig_all = 0,
 	};
-	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
-	if (tmpl.qp == NULL) {
+	txq->qp = ibv_create_qp(priv->pd, &qp_init_attr);
+	if (!txq->qp) {
 		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* ibv_create_qp() updates this value. */
-	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
+	txq->max_inline = qp_init_attr.cap.max_inline_data;
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_txq_alloc_elts(&tmpl, desc);
+	ret = mlx4_txq_alloc_elts(txq, desc);
 	if (ret) {
 		ERROR("%p: TXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	attr.mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTS,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* Clean up txq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
-	mlx4_txq_cleanup(txq);
-	*txq = tmpl;
-	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
 	/* Pre-register known mempools. */
 	rte_mempool_walk(mlx4_txq_mp2mr_iter, txq);
+	DEBUG("%p: adding Tx queue %p to list", (void *)dev, (void *)txq);
+	dev->data->tx_queues[idx] = txq;
 	return 0;
 error:
+	dev->data->tx_queues[idx] = NULL;
 	ret = rte_errno;
-	mlx4_txq_cleanup(&tmpl);
+	mlx4_tx_queue_release(txq);
 	rte_errno = ret;
 	assert(rte_errno > 0);
 	return -rte_errno;
 }
 
 /**
- * DPDK callback to configure a Tx queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Tx queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq *txq = dev->data->tx_queues[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= dev->data->nb_tx_queues) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, dev->data->nb_tx_queues);
-		return -rte_errno;
-	}
-	if (txq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)txq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		dev->data->tx_queues[idx] = NULL;
-		mlx4_txq_cleanup(txq);
-	} else {
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
-		if (txq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = mlx4_txq_setup(dev, txq, desc, socket, conf);
-	if (ret) {
-		rte_free(txq);
-	} else {
-		txq->stats.idx = idx;
-		DEBUG("%p: adding Tx queue %p to list",
-		      (void *)dev, (void *)txq);
-		dev->data->tx_queues[idx] = txq;
-	}
-	return ret;
-}
-
-/**
  * DPDK callback to release a Tx queue.
  *
  * @param dpdk_txq
@@ -464,6 +381,16 @@ mlx4_tx_queue_release(void *dpdk_txq)
 			priv->dev->data->tx_queues[i] = NULL;
 			break;
 		}
-	mlx4_txq_cleanup(txq);
+	mlx4_txq_free_elts(txq);
+	if (txq->qp)
+		claim_zero(ibv_destroy_qp(txq->qp));
+	if (txq->cq)
+		claim_zero(ibv_destroy_cq(txq->cq));
+	for (i = 0; i != RTE_DIM(txq->mp2mr); ++i) {
+		if (!txq->mp2mr[i].mp)
+			break;
+		assert(txq->mp2mr[i].mr);
+		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+	}
 	rte_free(txq);
 }
-- 
2.1.4


* [PATCH v1 24/29] net/mlx4: allocate queues and mbuf rings together
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (22 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 23/29] net/mlx4: drop live queue reconfiguration support Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 25/29] net/mlx4: convert Rx path to work queues Adrien Mazarguil
                   ` (5 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Since live Tx and Rx queues can no longer be reused without being
destroyed first, mbuf ring sizes are fixed and known from the start.

This allows each queue's data structure and its mbuf ring to be allocated
together in a single chunk, saving space and bringing them closer in
memory.
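
The allocation pattern relies on the iovec-like wrappers added earlier in
this series; a minimal sketch matching the Rx queue setup code below
(desc is the descriptor count passed to the setup callback):

  struct rxq *rxq;
  struct rxq_elt (*elts)[desc];
  struct rte_malloc_vec vec[] = {
          {
                  .align = RTE_CACHE_LINE_SIZE,
                  .size = sizeof(*rxq),  /* queue structure first */
                  .addr = (void **)&rxq,
          },
          {
                  .align = RTE_CACHE_LINE_SIZE,
                  .size = sizeof(*elts), /* mbuf ring right after it */
                  .addr = (void **)&elts,
          },
  };

  /* Single chunk; both pointers end up in adjacent memory. */
  rte_zmallocv_socket("RXQ", vec, RTE_DIM(vec), socket);
  if (rxq == NULL)
          return -rte_errno; /* the wrapper sets rte_errno on failure */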

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c  |  71 +++++++++++--------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 +
 drivers/net/mlx4/mlx4_txq.c  | 109 +++++++++++---------------------------
 3 files changed, 65 insertions(+), 117 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 30b0654..03e6af5 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -69,36 +69,30 @@
  *
  * @param rxq
  *   Pointer to Rx queue structure.
- * @param elts_n
- *   Number of elements to allocate.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
+mlx4_rxq_alloc_elts(struct rxq *rxq)
 {
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	unsigned int i;
-	struct rxq_elt (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
 
-	if (elts == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		goto error;
-	}
 	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
+	for (i = 0; i != RTE_DIM(*elts); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
 		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge *sge = &(*elts)[i].sge;
 		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
 
 		if (buf == NULL) {
+			while (i--) {
+				rte_pktmbuf_free_seg((*elts)[i].buf);
+				(*elts)[i].buf = NULL;
+			}
 			rte_errno = ENOMEM;
-			ERROR("%p: empty mbuf pool", (void *)rxq);
-			goto error;
+			return -rte_errno;
 		}
 		elt->buf = buf;
 		wr->next = &(*elts)[(i + 1)].wr;
@@ -121,21 +115,7 @@ mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	}
 	/* The last WR pointer must be NULL. */
 	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u single-segment WRs",
-	      (void *)rxq, elts_n);
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts = elts;
 	return 0;
-error:
-	if (elts != NULL) {
-		for (i = 0; (i != RTE_DIM(*elts)); ++i)
-			rte_pktmbuf_free_seg((*elts)[i].buf);
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(rte_errno > 0);
-	return -rte_errno;
 }
 
 /**
@@ -148,17 +128,15 @@ static void
 mlx4_rxq_free_elts(struct rxq *rxq)
 {
 	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != RTE_DIM(*elts)); ++i)
+	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+		if (!(*elts)[i].buf)
+			continue;
 		rte_pktmbuf_free_seg((*elts)[i].buf);
-	rte_free(elts);
+		(*elts)[i].buf = NULL;
+	}
 }
 
 /**
@@ -187,8 +165,21 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 {
 	struct priv *priv = dev->data->dev_private;
 	uint32_t mb_len = rte_pktmbuf_data_room_size(mp);
+	struct rxq_elt (*elts)[desc];
 	struct rte_flow_error error;
 	struct rxq *rxq;
+	struct rte_malloc_vec vec[] = {
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*rxq),
+			.addr = (void **)&rxq,
+		},
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*elts),
+			.addr = (void **)&elts,
+		},
+	};
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -213,9 +204,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		return -rte_errno;
 	}
 	/* Allocate and initialize Rx queue. */
-	rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+	rte_zmallocv_socket("RXQ", vec, RTE_DIM(vec), socket);
 	if (!rxq) {
-		rte_errno = ENOMEM;
 		ERROR("%p: unable to allocate queue index %u",
 		      (void *)dev, idx);
 		return -rte_errno;
@@ -224,6 +214,9 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		.priv = priv,
 		.mp = mp,
 		.port_id = dev->data->port_id,
+		.elts_n = desc,
+		.elts_head = 0,
+		.elts = elts,
 		.stats.idx = idx,
 		.socket = socket,
 	};
@@ -307,7 +300,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_rxq_alloc_elts(rxq, desc);
+	ret = mlx4_rxq_alloc_elts(rxq);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index d62120e..d90f2f9 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -81,6 +81,7 @@ struct rxq {
 	struct rxq_elt (*elts)[]; /**< Rx elements. */
 	struct mlx4_rxq_stats stats; /**< Rx queue counters. */
 	unsigned int socket; /**< CPU socket ID for allocations. */
+	uint8_t data[]; /**< Remaining queue resources. */
 };
 
 /** Tx element. */
@@ -118,6 +119,7 @@ struct txq {
 	unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
 	struct mlx4_txq_stats stats; /**< Tx queue counters. */
 	unsigned int socket; /**< CPU socket ID for allocations. */
+	uint8_t data[]; /**< Remaining queue resources. */
 };
 
 /* mlx4_rxq.c */
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index f102c68..7042cd9 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -64,59 +64,6 @@
 #include "mlx4_utils.h"
 
 /**
- * Allocate Tx queue elements.
- *
- * @param txq
- *   Pointer to Tx queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
-{
-	unsigned int i;
-	struct txq_elt (*elts)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)txq);
-		ret = ENOMEM;
-		goto error;
-	}
-	for (i = 0; (i != elts_n); ++i) {
-		struct txq_elt *elt = &(*elts)[i];
-
-		elt->buf = NULL;
-	}
-	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
-	txq->elts_n = elts_n;
-	txq->elts = elts;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	/*
-	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring.
-	 */
-	txq->elts_comp_cd_init =
-		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
-		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
-	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	assert(ret == 0);
-	return 0;
-error:
-	rte_free(elts);
-	DEBUG("%p: failed, freed everything", (void *)txq);
-	assert(ret > 0);
-	rte_errno = ret;
-	return -rte_errno;
-}
-
-/**
  * Free Tx queue elements.
  *
  * @param txq
@@ -125,34 +72,21 @@ mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 static void
 mlx4_txq_free_elts(struct txq *txq)
 {
-	unsigned int elts_n = txq->elts_n;
 	unsigned int elts_head = txq->elts_head;
 	unsigned int elts_tail = txq->elts_tail;
-	struct txq_elt (*elts)[elts_n] = txq->elts;
+	struct txq_elt (*elts)[txq->elts_n] = txq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)txq);
-	txq->elts_n = 0;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	txq->elts_comp_cd = 0;
-	txq->elts_comp_cd_init = 0;
-	txq->elts = NULL;
-	if (elts == NULL)
-		return;
 	while (elts_tail != elts_head) {
 		struct txq_elt *elt = &(*elts)[elts_tail];
 
 		assert(elt->buf != NULL);
 		rte_pktmbuf_free(elt->buf);
-#ifndef NDEBUG
-		/* Poisoning. */
-		memset(elt, 0x77, sizeof(*elt));
-#endif
-		if (++elts_tail == elts_n)
+		elt->buf = NULL;
+		if (++elts_tail == RTE_DIM(*elts))
 			elts_tail = 0;
 	}
-	rte_free(elts);
+	txq->elts_tail = txq->elts_head;
 }
 
 struct txq_mp2mr_mbuf_check_data {
@@ -235,8 +169,21 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct txq_elt (*elts)[desc];
 	struct ibv_qp_init_attr qp_init_attr;
 	struct txq *txq;
+	struct rte_malloc_vec vec[] = {
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*txq),
+			.addr = (void **)&txq,
+		},
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*elts),
+			.addr = (void **)&elts,
+		},
+	};
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -261,9 +208,8 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		return -rte_errno;
 	}
 	/* Allocate and initialize Tx queue. */
-	txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+	rte_zmallocv_socket("TXQ", vec, RTE_DIM(vec), socket);
 	if (!txq) {
-		rte_errno = ENOMEM;
 		ERROR("%p: unable to allocate queue index %u",
 		      (void *)dev, idx);
 		return -rte_errno;
@@ -272,6 +218,19 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		.priv = priv,
 		.stats.idx = idx,
 		.socket = socket,
+		.elts_n = desc,
+		.elts = elts,
+		.elts_head = 0,
+		.elts_tail = 0,
+		.elts_comp = 0,
+		/*
+		 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ
+		 * packets or at least 4 times per ring.
+		 */
+		.elts_comp_cd =
+			RTE_MIN(MLX4_PMD_TX_PER_COMP_REQ, desc / 4),
+		.elts_comp_cd_init =
+			RTE_MIN(MLX4_PMD_TX_PER_COMP_REQ, desc / 4),
 	};
 	txq->cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (!txq->cq) {
@@ -314,12 +273,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_txq_alloc_elts(txq, desc);
-	if (ret) {
-		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	ret = ibv_modify_qp
 		(txq->qp,
 		 &(struct ibv_qp_attr){
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 25/29] net/mlx4: convert Rx path to work queues
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (23 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 24/29] net/mlx4: allocate queues and mbuf rings together Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 26/29] net/mlx4: remove unnecessary check Adrien Mazarguil
                   ` (4 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Work queues (WQs) are lower-level than standard queue pairs (QPs). They are
dedicated to one traffic direction and have to be used in conjunction with
indirection tables and special "hash" QPs to get the same level of
functionality.

These extra objects, however, are the building blocks for the RSS support
brought in by subsequent commits, as a single "hash" QP can manage several
WQs through an indirection table according to a hash algorithm and other
parameters.
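
For reference, the resulting Verbs object chain for a single Rx queue can
be sketched as follows (a condensed sketch of the code added below, not a
literal copy; ctx, pd, cq and desc are assumed to be set up already and
error handling is omitted):

 struct ibv_wq *wq = ibv_create_wq
	(ctx,
	 &(struct ibv_wq_init_attr){
		.wq_type = IBV_WQT_RQ,
		.max_wr = desc,
		.max_sge = 1,
		.pd = pd,
		.cq = cq,
	 });
 /* WQs start in RESET state; switch to RDY before posting buffers. */
 ibv_modify_wq(wq,
	       &(struct ibv_wq_attr){
			.attr_mask = IBV_WQ_ATTR_STATE,
			.wq_state = IBV_WQS_RDY,
	       });
 /* Single-entry indirection table (log2(1) == 0). */
 struct ibv_rwq_ind_table *ind = ibv_create_rwq_ind_table
	(ctx,
	 &(struct ibv_rwq_ind_table_init_attr){
		.log_ind_tbl_size = 0,
		.ind_tbl = (struct ibv_wq *[]){ wq },
	 });
 /* "Hash" QP on top of the table; hashing is effectively disabled
  * since a single WQ leaves nothing to spread traffic over. */
 struct ibv_qp *qp = ibv_create_qp_ex
	(ctx,
	 &(struct ibv_qp_init_attr_ex){
		.comp_mask = (IBV_QP_INIT_ATTR_PD |
			      IBV_QP_INIT_ATTR_RX_HASH |
			      IBV_QP_INIT_ATTR_IND_TABLE),
		.qp_type = IBV_QPT_RAW_PACKET,
		.pd = pd,
		.rwq_ind_tbl = ind,
		.rx_hash_conf = {
			.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
			.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
			.rx_hash_key =
				(uint8_t [MLX4_RSS_HASH_KEY_SIZE]){ 0 },
			.rx_hash_fields_mask = 0,
		},
	 });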

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h      |  3 ++
 drivers/net/mlx4/mlx4_rxq.c  | 74 ++++++++++++++++++++++++++++++++-------
 drivers/net/mlx4/mlx4_rxtx.c |  2 +-
 drivers/net/mlx4/mlx4_rxtx.h |  2 ++
 4 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index a27399a..b04a104 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -61,6 +61,9 @@
 /** Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
+/** Fixed RSS hash key size in bytes. Cannot be modified. */
+#define MLX4_RSS_HASH_KEY_SIZE 40
+
 /**
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 03e6af5..b56f1ff 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -268,18 +268,64 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	rxq->qp = ibv_create_qp
-		(priv->pd,
-		 &(struct ibv_qp_init_attr){
-			.send_cq = rxq->cq,
-			.recv_cq = rxq->cq,
-			.cap = {
-				.max_recv_wr =
-					RTE_MIN(priv->device_attr.max_qp_wr,
-						desc),
-				.max_recv_sge = 1,
+	rxq->wq = ibv_create_wq
+		(priv->ctx,
+		 &(struct ibv_wq_init_attr){
+			.wq_type = IBV_WQT_RQ,
+			.max_wr = RTE_MIN(priv->device_attr.max_qp_wr, desc),
+			.max_sge = 1,
+			.pd = priv->pd,
+			.cq = rxq->cq,
+		 });
+	if (!rxq->wq) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: WQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = ibv_modify_wq
+		(rxq->wq,
+		 &(struct ibv_wq_attr){
+			.attr_mask = IBV_WQ_ATTR_STATE,
+			.wq_state = IBV_WQS_RDY,
+		 });
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: WQ state to IBV_WQS_RDY failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	rxq->ind = ibv_create_rwq_ind_table
+		(priv->ctx,
+		 &(struct ibv_rwq_ind_table_init_attr){
+			.log_ind_tbl_size = 0,
+			.ind_tbl = (struct ibv_wq *[]){
+				rxq->wq,
 			},
+			.comp_mask = 0,
+		 });
+	if (!rxq->ind) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: indirection table creation failure: %s",
+		      (void *)dev, strerror(errno));
+		goto error;
+	}
+	rxq->qp = ibv_create_qp_ex
+		(priv->ctx,
+		 &(struct ibv_qp_init_attr_ex){
+			.comp_mask = (IBV_QP_INIT_ATTR_PD |
+				      IBV_QP_INIT_ATTR_RX_HASH |
+				      IBV_QP_INIT_ATTR_IND_TABLE),
 			.qp_type = IBV_QPT_RAW_PACKET,
+			.pd = priv->pd,
+			.rwq_ind_tbl = rxq->ind,
+			.rx_hash_conf = {
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
+				.rx_hash_key =
+					(uint8_t [MLX4_RSS_HASH_KEY_SIZE]){ 0 },
+				.rx_hash_fields_mask = 0,
+			},
 		 });
 	if (!rxq->qp) {
 		rte_errno = errno ? errno : EINVAL;
@@ -306,8 +352,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_post_recv(rxq->qp, &(*rxq->elts)[0].wr,
-			    &(struct ibv_recv_wr *){ NULL });
+	ret = ibv_post_wq_recv(rxq->wq, &(*rxq->elts)[0].wr,
+			       &(struct ibv_recv_wr *){ NULL });
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: ibv_post_recv() failed: %s",
@@ -373,6 +419,10 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	mlx4_rxq_free_elts(rxq);
 	if (rxq->qp)
 		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->ind)
+		claim_zero(ibv_destroy_rwq_ind_table(rxq->ind));
+	if (rxq->wq)
+		claim_zero(ibv_destroy_wq(rxq->wq));
 	if (rxq->cq)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->channel)
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index b5e7777..859f1bd 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -459,7 +459,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	/* Repost WRs. */
 	*wr_next = NULL;
 	assert(wr_head);
-	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
+	ret = ibv_post_wq_recv(rxq->wq, wr_head, &wr_bad);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index d90f2f9..897fd2a 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -73,6 +73,8 @@ struct rxq {
 	struct rte_mempool *mp; /**< Memory pool for allocations. */
 	struct ibv_mr *mr; /**< Memory region (for mp). */
 	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_wq *wq; /**< Work queue. */
+	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
 	struct ibv_qp *qp; /**< Queue pair. */
 	struct ibv_comp_channel *channel; /**< Rx completion channel. */
 	unsigned int port_id; /**< Port ID for incoming packets. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 26/29] net/mlx4: remove unnecessary check
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (24 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 25/29] net/mlx4: convert Rx path to work queues Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 27/29] net/mlx4: add RSS flow rule action support Adrien Mazarguil
                   ` (3 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Device operation callbacks are not supposed to handle a missing private
data structure.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_ethdev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index ebf2339..661e252 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -750,8 +750,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	char ifname[IF_NAMESIZE];
 
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-	if (priv == NULL)
-		return;
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 27/29] net/mlx4: add RSS flow rule action support
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (25 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 26/29] net/mlx4: remove unnecessary check Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 28/29] net/mlx4: disable UDP support in RSS flow rules Adrien Mazarguil
                   ` (2 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

This patch dissociates single-queue indirection tables and hash QP objects
from Rx queue structures to relinquish their control to users through the
RSS flow rule action, while simultaneously allowing multiple queues to be
associated with RSS contexts.

Flow rules share identical RSS contexts (hashed fields, hash key, target
queues) to save on memory and other resources. The trade-off is some added
complexity due to reference counter management on RSS contexts.
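
From a flow rule's perspective, the lifecycle of a shared context can be
sketched as follows (a minimal illustration; priv and a Verbs hash fields
mask named fields are assumed to exist, error handling is omitted):

 uint16_t queue_id[] = { 0, 1, 2, 3 };
 struct mlx4_rss *rss;

 /* Rule creation: reuse a matching context or allocate a new one. */
 rss = mlx4_rss_get(priv, fields, mlx4_rss_hash_key_default,
		    RTE_DIM(queue_id), queue_id);
 /* Rule enablement: the first user instantiates the indirection table
  * and hash QP, subsequent users only bump the usage count. */
 mlx4_rss_attach(rss);
 /* ... traffic flows ... */
 /* Rule disablement: Verbs objects are destroyed with the last user. */
 mlx4_rss_detach(rss);
 /* Rule destruction: drop the reference, context is freed at zero. */
 mlx4_rss_put(rss);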

The QUEUE action is re-implemented on top of an automatically-generated
single-queue RSS context.

The following hardware limitations apply to RSS contexts:

- The number of queues in a group must be a power of two.
- Queue indices must be consecutive, for instance the [0 1 2 3] set is
  allowed, however [3 2 1 0], [0 2 1 3] and [0 0 1 1 2 3 3 3] are not.
- The first queue of a group must be aligned to a multiple of the context
  size, e.g. if queues [0 1 2 3 4] are defined globally, allowed group
  combinations are [0 1] and [2 3]; groups [1 2] and [3 4] are not
  supported.
- RSS hash key, while configurable per context, must be exactly 40 bytes
  long.
- The only supported hash algorithm is Toeplitz.
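
The sanity checks enforcing these constraints in mlx4_flow_prepare() boil
down to the following sketch, where rss points to the RSS action
configuration and not_supported stands for the error path:

 unsigned int i;

 if (!rte_is_power_of_2(rss->num))
	goto not_supported; /* Queue count must be a power of two. */
 for (i = 1; i < rss->num; ++i)
	if (rss->queue[i] - rss->queue[i - 1] != 1)
		goto not_supported; /* Indices must be consecutive. */
 if (rss->queue[0] % rss->num)
	goto not_supported; /* First queue must be aligned on a
			     * multiple of the context size. */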

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 +
 drivers/net/mlx4/Makefile         |   2 +-
 drivers/net/mlx4/mlx4.c           |  13 ++
 drivers/net/mlx4/mlx4.h           |   2 +
 drivers/net/mlx4/mlx4_ethdev.c    |   1 +
 drivers/net/mlx4/mlx4_flow.c      | 181 ++++++++++++++++++--
 drivers/net/mlx4/mlx4_flow.h      |   3 +-
 drivers/net/mlx4/mlx4_rxq.c       | 303 +++++++++++++++++++++++++--------
 drivers/net/mlx4/mlx4_rxtx.h      |  24 ++-
 mk/rte.app.mk                     |   2 +-
 10 files changed, 445 insertions(+), 87 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 6f8c82a..9750ebf 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -16,6 +16,7 @@ Promiscuous mode     = Y
 Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
+RSS hash             = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Basic stats          = Y
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 0515cd7..3b3a020 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -54,7 +54,7 @@ CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
-LDLIBS += -libverbs
+LDLIBS += -libverbs -lmlx4
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 52f8d51..0db9a19 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -50,6 +50,7 @@
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
 #include <infiniband/verbs.h>
+#include <infiniband/mlx4dv.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -99,8 +100,20 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow_error error;
+	uint8_t log2_range = rte_log2_u32(dev->data->nb_rx_queues);
 	int ret;
 
+	/* Prepare range for RSS contexts before creating the first WQ. */
+	ret = mlx4dv_set_context_attr(priv->ctx,
+				      MLX4DV_SET_CTX_ATTR_LOG_WQS_RANGE_SZ,
+				      &log2_range);
+	if (ret) {
+		ERROR("cannot set up range size for RSS context to %u"
+		      " (for %u Rx queues), error: %s",
+		      1 << log2_range, dev->data->nb_rx_queues, strerror(ret));
+		rte_errno = ret;
+		return -ret;
+	}
 	/* Prepare internal flow rules. */
 	ret = mlx4_flow_sync(priv, &error);
 	if (ret) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b04a104..f4da8c6 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -95,6 +95,7 @@ enum {
 #define MLX4_DRIVER_NAME "net_mlx4"
 
 struct mlx4_drop;
+struct mlx4_rss;
 struct rxq;
 struct txq;
 struct rte_flow;
@@ -114,6 +115,7 @@ struct priv {
 	uint32_t isolated:1; /**< Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
 	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
+	LIST_HEAD(, mlx4_rss) rss; /**< Shared targets for Rx flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
 	/**< Configured MAC addresses. Unused entries are zeroed. */
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 661e252..3623909 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -769,6 +769,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->tx_offload_capa = 0;
 	if (mlx4_get_ifname(priv, &ifname) == 0)
 		info->if_index = if_nametoindex(ifname);
+	info->hash_key_size = MLX4_RSS_HASH_KEY_SIZE;
 	info->speed_capa =
 			ETH_LINK_SPEED_1G |
 			ETH_LINK_SPEED_10G |
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 2d826b4..101f245 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -103,6 +103,62 @@ struct mlx4_drop {
 };
 
 /**
+ * Convert DPDK RSS hash fields to their Verbs equivalent.
+ *
+ * @param rss_hf
+ *   Hash fields in DPDK format (see struct rte_eth_rss_conf).
+ *
+ * @return
+ *   A valid Verbs RSS hash fields mask for mlx4 on success, (uint64_t)-1
+ *   otherwise and rte_errno is set.
+ */
+static uint64_t
+mlx4_conv_rss_hf(uint64_t rss_hf)
+{
+	enum { IPV4, IPV6, TCP, UDP, };
+	const uint64_t in[] = {
+		[IPV4] = (ETH_RSS_IPV4 |
+			  ETH_RSS_FRAG_IPV4 |
+			  ETH_RSS_NONFRAG_IPV4_TCP |
+			  ETH_RSS_NONFRAG_IPV4_UDP |
+			  ETH_RSS_NONFRAG_IPV4_OTHER),
+		[IPV6] = (ETH_RSS_IPV6 |
+			  ETH_RSS_FRAG_IPV6 |
+			  ETH_RSS_NONFRAG_IPV6_TCP |
+			  ETH_RSS_NONFRAG_IPV6_UDP |
+			  ETH_RSS_NONFRAG_IPV6_OTHER |
+			  ETH_RSS_IPV6_EX |
+			  ETH_RSS_IPV6_TCP_EX |
+			  ETH_RSS_IPV6_UDP_EX),
+		[TCP] = (ETH_RSS_NONFRAG_IPV4_TCP |
+			 ETH_RSS_NONFRAG_IPV6_TCP |
+			 ETH_RSS_IPV6_TCP_EX),
+		[UDP] = (ETH_RSS_NONFRAG_IPV4_UDP |
+			 ETH_RSS_NONFRAG_IPV6_UDP |
+			 ETH_RSS_IPV6_UDP_EX),
+	};
+	const uint64_t out[RTE_DIM(in)] = {
+		[IPV4] = IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4,
+		[IPV6] = IBV_RX_HASH_SRC_IPV6 | IBV_RX_HASH_DST_IPV6,
+		[TCP] = IBV_RX_HASH_SRC_PORT_TCP | IBV_RX_HASH_DST_PORT_TCP,
+		[UDP] = IBV_RX_HASH_SRC_PORT_UDP | IBV_RX_HASH_DST_PORT_UDP,
+	};
+	uint64_t seen = 0;
+	uint64_t conv = 0;
+	unsigned int i;
+
+	for (i = 0; i != RTE_DIM(in); ++i)
+		if (rss_hf & in[i]) {
+			seen |= rss_hf & in[i];
+			conv |= out[i];
+		}
+	if (!(rss_hf & ~seen))
+		return conv;
+	rte_errno = ENOTSUP;
+	return (uint64_t)-1;
+}
+
+/**
  * Merge Ethernet pattern item into flow rule handle.
  *
  * Additional mlx4-specific constraints on supported fields:
@@ -663,6 +719,9 @@ mlx4_flow_prepare(struct priv *priv,
 	for (action = actions; action->type; ++action) {
 		switch (action->type) {
 			const struct rte_flow_action_queue *queue;
+			const struct rte_flow_action_rss *rss;
+			const struct rte_eth_rss_conf *rss_conf;
+			unsigned int i;
 
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
@@ -670,23 +729,87 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->drop = 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			if (flow->rss)
+				break;
 			queue = action->conf;
-			if (queue->index >= priv->dev->data->nb_rx_queues)
+			flow->rss = mlx4_rss_get
+				(priv, 0, mlx4_rss_hash_key_default, 1,
+				 &queue->index);
+			if (!flow->rss) {
+				msg = "not enough resources for additional"
+					" single-queue RSS context";
+				goto exit_action_not_supported;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			if (flow->rss)
+				break;
+			rss = action->conf;
+			/* Default RSS configuration if none is provided. */
+			rss_conf =
+				rss->rss_conf ?
+				rss->rss_conf :
+				&(struct rte_eth_rss_conf){
+					.rss_key = mlx4_rss_hash_key_default,
+					.rss_key_len = MLX4_RSS_HASH_KEY_SIZE,
+					.rss_hf = (ETH_RSS_IPV4 |
+						   ETH_RSS_NONFRAG_IPV4_UDP |
+						   ETH_RSS_NONFRAG_IPV4_TCP |
+						   ETH_RSS_IPV6 |
+						   ETH_RSS_NONFRAG_IPV6_UDP |
+						   ETH_RSS_NONFRAG_IPV6_TCP),
+				};
+			/* Sanity checks. */
+			if (!rte_is_power_of_2(rss->num)) {
+				msg = "for RSS, mlx4 requires the number of"
+					" queues to be a power of two";
+				goto exit_action_not_supported;
+			}
+			if (rss_conf->rss_key_len !=
+			    sizeof(flow->rss->key)) {
+				msg = "mlx4 supports exactly one RSS hash key"
+					" length: "
+					MLX4_STR_EXPAND(MLX4_RSS_HASH_KEY_SIZE);
+				goto exit_action_not_supported;
+			}
+			for (i = 1; i < rss->num; ++i)
+				if (rss->queue[i] - rss->queue[i - 1] != 1)
+					break;
+			if (i != rss->num) {
+				msg = "mlx4 requires RSS contexts to use"
+					" consecutive queue indices only";
+				goto exit_action_not_supported;
+			}
+			if (rss->queue[0] % rss->num) {
+				msg = "mlx4 requires the first queue of a RSS"
+					" context to be aligned on a multiple"
+					" of the context size";
+				goto exit_action_not_supported;
+			}
+			flow->rss = mlx4_rss_get
+				(priv, mlx4_conv_rss_hf(rss_conf->rss_hf),
+				 rss_conf->rss_key, rss->num, rss->queue);
+			if (!flow->rss) {
+				msg = "either invalid parameters or not enough"
+					" resources for additional multi-queue"
+					" RSS context";
 				goto exit_action_not_supported;
-			flow->queue = 1;
-			flow->queue_id = queue->index;
+			}
 			break;
 		default:
 			goto exit_action_not_supported;
 		}
 	}
-	if (!flow->queue && !flow->drop)
+	if (!flow->rss && !flow->drop)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "no valid action");
 	/* Validation ends here. */
-	if (!addr)
+	if (!addr) {
+		if (flow->rss)
+			mlx4_rss_put(flow->rss);
 		return 0;
+	}
 	if (flow == &temp) {
 		/* Allocate proper handle based on collected data. */
 		const struct rte_malloc_vec vec[] = {
@@ -711,6 +834,7 @@ mlx4_flow_prepare(struct priv *priv,
 		*flow = (struct rte_flow){
 			.ibv_attr = temp.ibv_attr,
 			.ibv_attr_size = sizeof(*flow->ibv_attr),
+			.rss = temp.rss,
 		};
 		*flow->ibv_attr = (struct ibv_flow_attr){
 			.type = IBV_FLOW_ATTR_NORMAL,
@@ -727,7 +851,7 @@ mlx4_flow_prepare(struct priv *priv,
 				  item, msg ? msg : "item not supported");
 exit_action_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-				  action, "action not supported");
+				  action, msg ? msg : "action not supported");
 }
 
 /**
@@ -850,6 +974,8 @@ mlx4_flow_toggle(struct priv *priv,
 		flow->ibv_flow = NULL;
 		if (flow->drop)
 			mlx4_drop_put(priv->drop);
+		else if (flow->rss)
+			mlx4_rss_detach(flow->rss);
 		return 0;
 	}
 	assert(flow->ibv_attr);
@@ -861,6 +987,8 @@ mlx4_flow_toggle(struct priv *priv,
 			flow->ibv_flow = NULL;
 			if (flow->drop)
 				mlx4_drop_put(priv->drop);
+			else if (flow->rss)
+				mlx4_rss_detach(flow->rss);
 		}
 		err = EACCES;
 		msg = ("priority level "
@@ -868,24 +996,42 @@ mlx4_flow_toggle(struct priv *priv,
 		       " is reserved when not in isolated mode");
 		goto error;
 	}
-	if (flow->queue) {
-		struct rxq *rxq = NULL;
+	if (flow->rss) {
+		struct mlx4_rss *rss = flow->rss;
+		int missing = 0;
+		unsigned int i;
 
-		if (flow->queue_id < priv->dev->data->nb_rx_queues)
-			rxq = priv->dev->data->rx_queues[flow->queue_id];
+		/* Stop at the first nonexistent target queue. */
+		for (i = 0; i != rss->queues; ++i)
+			if (rss->queue_id[i] >=
+			    priv->dev->data->nb_rx_queues ||
+			    !priv->dev->data->rx_queues[rss->queue_id[i]]) {
+				missing = 1;
+				break;
+			}
 		if (flow->ibv_flow) {
-			if (!rxq ^ !flow->drop)
+			if (missing ^ !flow->drop)
 				return 0;
 			/* Verbs flow needs updating. */
 			claim_zero(ibv_destroy_flow(flow->ibv_flow));
 			flow->ibv_flow = NULL;
 			if (flow->drop)
 				mlx4_drop_put(priv->drop);
+			else
+				mlx4_rss_detach(rss);
+		}
+		if (!missing) {
+			err = mlx4_rss_attach(rss);
+			if (err) {
+				err = -err;
+				msg = "cannot create indirection table or hash"
+					" QP to associate flow rule with";
+				goto error;
+			}
+			qp = rss->qp;
 		}
-		if (rxq)
-			qp = rxq->qp;
 		/* A missing target queue drops traffic implicitly. */
-		flow->drop = !rxq;
+		flow->drop = missing;
 	}
 	if (flow->drop) {
 		mlx4_drop_get(priv);
@@ -904,6 +1050,8 @@ mlx4_flow_toggle(struct priv *priv,
 		return 0;
 	if (flow->drop)
 		mlx4_drop_put(priv->drop);
+	else if (flow->rss)
+		mlx4_rss_detach(flow->rss);
 	err = errno;
 	msg = "flow rule rejected by device";
 error:
@@ -946,6 +1094,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		}
 		return flow;
 	}
+	if (flow->rss)
+		mlx4_rss_put(flow->rss);
 	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			   error->message);
 	rte_free(flow);
@@ -992,6 +1142,8 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 	if (err)
 		return err;
 	LIST_REMOVE(flow, next);
+	if (flow->rss)
+		mlx4_rss_put(flow->rss);
 	rte_free(flow);
 	return 0;
 }
@@ -1320,6 +1472,7 @@ mlx4_flow_clean(struct priv *priv)
 
 	while ((flow = LIST_FIRST(&priv->flows)))
 		mlx4_flow_destroy(priv->dev, flow, NULL);
+	assert(LIST_EMPTY(&priv->rss));
 }
 
 static const struct rte_flow_ops mlx4_flow_ops = {
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 134e14d..651fd37 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -70,8 +70,7 @@ struct rte_flow {
 	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t allmulti:1; /**< This rule matches all multicast traffic. */
 	uint32_t drop:1; /**< This rule drops packets. */
-	uint32_t queue:1; /**< Target is a receive queue. */
-	uint16_t queue_id; /**< Target queue. */
+	struct mlx4_rss *rss; /**< Rx target. */
 };
 
 /* mlx4_flow.c */
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index b56f1ff..e7bde2e 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -65,6 +65,242 @@
 #include "mlx4_utils.h"
 
 /**
+ * Historical RSS hash key.
+ *
+ * This used to be the default for mlx4 in Linux before v3.19 switched to
+ * generating random hash keys through netdev_rss_key_fill().
+ *
+ * It is used in this PMD for consistency with past DPDK releases but can
+ * now be overridden through user configuration.
+ *
+ * Note: this is not const to work around API quirks.
+ */
+uint8_t
+mlx4_rss_hash_key_default[MLX4_RSS_HASH_KEY_SIZE] = {
+	0x2c, 0xc6, 0x81, 0xd1,
+	0x5b, 0xdb, 0xf4, 0xf7,
+	0xfc, 0xa2, 0x83, 0x19,
+	0xdb, 0x1a, 0x3e, 0x94,
+	0x6b, 0x9e, 0x38, 0xd9,
+	0x2c, 0x9c, 0x03, 0xd1,
+	0xad, 0x99, 0x44, 0xa7,
+	0xd9, 0x56, 0x3d, 0x59,
+	0x06, 0x3c, 0x25, 0xf3,
+	0xfc, 0x1f, 0xdc, 0x2a,
+};
+
+/**
+ * Obtain a RSS context with specified properties.
+ *
+ * Used when creating a flow rule targeting one or several Rx queues.
+ *
+ * If a matching RSS context already exists, it is returned with its
+ * reference count incremented.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param fields
+ *   Fields for RSS processing (Verbs format).
+ * @param[in] key
+ *   Hash key to use (whose size is exactly MLX4_RSS_HASH_KEY_SIZE).
+ * @param queues
+ *   Number of target queues.
+ * @param[in] queue_id
+ *   Target queues.
+ *
+ * @return
+ *   Pointer to RSS context on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx4_rss *
+mlx4_rss_get(struct priv *priv, uint64_t fields,
+	     uint8_t key[MLX4_RSS_HASH_KEY_SIZE],
+	     uint16_t queues, const uint16_t queue_id[])
+{
+	struct mlx4_rss *rss;
+	size_t queue_id_size = sizeof(queue_id[0]) * queues;
+
+	LIST_FOREACH(rss, &priv->rss, next)
+		if (fields == rss->fields &&
+		    queues == rss->queues &&
+		    !memcmp(key, rss->key, MLX4_RSS_HASH_KEY_SIZE) &&
+		    !memcmp(queue_id, rss->queue_id, queue_id_size)) {
+			++rss->refcnt;
+			return rss;
+		}
+	rss = rte_malloc(__func__, offsetof(struct mlx4_rss, queue_id) +
+			 queue_id_size, 0);
+	if (!rss)
+		goto error;
+	*rss = (struct mlx4_rss){
+		.priv = priv,
+		.refcnt = 1,
+		.usecnt = 0,
+		.qp = NULL,
+		.ind = NULL,
+		.fields = fields,
+		.queues = queues,
+	};
+	memcpy(rss->key, key, MLX4_RSS_HASH_KEY_SIZE);
+	memcpy(rss->queue_id, queue_id, queue_id_size);
+	LIST_INSERT_HEAD(&priv->rss, rss, next);
+	return rss;
+error:
+	rte_errno = ENOMEM;
+	return NULL;
+}
+
+/**
+ * Release a RSS context instance.
+ *
+ * Used when destroying a flow rule targeting one or several Rx queues.
+ *
+ * This function decrements the reference count of the context and destroys
+ * it after reaching 0. The context must have no users at this point; all
+ * prior calls to mlx4_rss_attach() must have been followed by matching
+ * calls to mlx4_rss_detach().
+ *
+ * @param rss
+ *   RSS context to release.
+ */
+void mlx4_rss_put(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	if (--rss->refcnt)
+		return;
+	assert(!rss->usecnt);
+	assert(!rss->qp);
+	assert(!rss->ind);
+	LIST_REMOVE(rss, next);
+	rte_free(rss);
+}
+
+/**
+ * Attach a user to a RSS context instance.
+ *
+ * Used when the RSS QP and indirection table objects must be instantiated,
+ * that is, when a flow rule must be enabled.
+ *
+ * This function increments the usage count of the context.
+ *
+ * @param rss
+ *   RSS context to attach to.
+ */
+int mlx4_rss_attach(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	if (rss->usecnt++) {
+		assert(rss->qp);
+		assert(rss->ind);
+		return 0;
+	}
+
+	struct ibv_wq *ind_tbl[rss->queues];
+	struct priv *priv = rss->priv;
+	const char *msg;
+	unsigned int i;
+	int ret;
+
+	if (!rte_is_power_of_2(RTE_DIM(ind_tbl))) {
+		msg = "number of RSS queues must be a power of two";
+		goto error;
+	}
+	for (i = 0; i != RTE_DIM(ind_tbl); ++i) {
+		uint16_t id = rss->queue_id[i];
+		struct rxq *rxq = NULL;
+
+		if (id < priv->dev->data->nb_rx_queues)
+			rxq = priv->dev->data->rx_queues[id];
+		if (!rxq) {
+			msg = "RSS target queue is not configured";
+			goto error;
+		}
+		ind_tbl[i] = rxq->wq;
+	}
+	rss->ind = ibv_create_rwq_ind_table
+		(priv->ctx,
+		 &(struct ibv_rwq_ind_table_init_attr){
+			.log_ind_tbl_size = rte_log2_u32(RTE_DIM(ind_tbl)),
+			.ind_tbl = ind_tbl,
+			.comp_mask = 0,
+		 });
+	if (!rss->ind) {
+		msg = "RSS indirection table creation failure";
+		goto error;
+	}
+	rss->qp = ibv_create_qp_ex
+		(priv->ctx,
+		 &(struct ibv_qp_init_attr_ex){
+			.comp_mask = (IBV_QP_INIT_ATTR_PD |
+				      IBV_QP_INIT_ATTR_RX_HASH |
+				      IBV_QP_INIT_ATTR_IND_TABLE),
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.pd = priv->pd,
+			.rwq_ind_tbl = rss->ind,
+			.rx_hash_conf = {
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
+				.rx_hash_key = rss->key,
+				.rx_hash_fields_mask = rss->fields,
+			},
+		 });
+	if (!rss->qp) {
+		msg = "RSS hash QP creation failure";
+		goto error;
+	}
+	ret = ibv_modify_qp
+		(rss->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		msg = "failed to switch RSS hash QP to INIT state";
+		goto error;
+	}
+	ret = ibv_modify_qp
+		(rss->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
+	if (ret) {
+		msg = "failed to switch RSS hash QP to RTR state";
+		goto error;
+	}
+	return 0;
+error:
+	ERROR("mlx4: %s", msg);
+	--rss->usecnt;
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Detach a user from a RSS context instance.
+ *
+ * Used when disabling (not destroying) a flow rule.
+ *
+ * This function decrements the usage count of the context and destroys
+ * usage resources after reaching 0.
+ *
+ * @param rss
+ *   RSS context to detach from.
+ */
+void mlx4_rss_detach(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	assert(rss->qp);
+	assert(rss->ind);
+	if (--rss->usecnt)
+		return;
+	claim_zero(ibv_destroy_qp(rss->qp));
+	rss->qp = NULL;
+	claim_zero(ibv_destroy_rwq_ind_table(rss->ind));
+	rss->ind = NULL;
+}
+
+/**
  * Allocate Rx queue elements.
  *
  * @param rxq
@@ -295,57 +531,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	rxq->ind = ibv_create_rwq_ind_table
-		(priv->ctx,
-		 &(struct ibv_rwq_ind_table_init_attr){
-			.log_ind_tbl_size = 0,
-			.ind_tbl = (struct ibv_wq *[]){
-				rxq->wq,
-			},
-			.comp_mask = 0,
-		 });
-	if (!rxq->ind) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: indirection table creation failure: %s",
-		      (void *)dev, strerror(errno));
-		goto error;
-	}
-	rxq->qp = ibv_create_qp_ex
-		(priv->ctx,
-		 &(struct ibv_qp_init_attr_ex){
-			.comp_mask = (IBV_QP_INIT_ATTR_PD |
-				      IBV_QP_INIT_ATTR_RX_HASH |
-				      IBV_QP_INIT_ATTR_IND_TABLE),
-			.qp_type = IBV_QPT_RAW_PACKET,
-			.pd = priv->pd,
-			.rwq_ind_tbl = rxq->ind,
-			.rx_hash_conf = {
-				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
-				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
-				.rx_hash_key =
-					(uint8_t [MLX4_RSS_HASH_KEY_SIZE]){ 0 },
-				.rx_hash_fields_mask = 0,
-			},
-		 });
-	if (!rxq->qp) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = ibv_modify_qp
-		(rxq->qp,
-		 &(struct ibv_qp_attr){
-			.qp_state = IBV_QPS_INIT,
-			.port_num = priv->port,
-		 },
-		 IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	ret = mlx4_rxq_alloc_elts(rxq);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
@@ -361,18 +546,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_modify_qp
-		(rxq->qp,
-		 &(struct ibv_qp_attr){
-			.qp_state = IBV_QPS_RTR,
-		 },
-		 IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	DEBUG("%p: adding Rx queue %p to list", (void *)dev, (void *)rxq);
 	dev->data->rx_queues[idx] = rxq;
 	/* Enable associated flows. */
@@ -417,10 +590,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		}
 	mlx4_flow_sync(priv, NULL);
 	mlx4_rxq_free_elts(rxq);
-	if (rxq->qp)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->ind)
-		claim_zero(ibv_destroy_rwq_ind_table(rxq->ind));
 	if (rxq->wq)
 		claim_zero(ibv_destroy_wq(rxq->wq));
 	if (rxq->cq)
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 897fd2a..eca966f 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -35,6 +35,7 @@
 #define MLX4_RXTX_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 /* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
@@ -74,8 +75,6 @@ struct rxq {
 	struct ibv_mr *mr; /**< Memory region (for mp). */
 	struct ibv_cq *cq; /**< Completion queue. */
 	struct ibv_wq *wq; /**< Work queue. */
-	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
-	struct ibv_qp *qp; /**< Queue pair. */
 	struct ibv_comp_channel *channel; /**< Rx completion channel. */
 	unsigned int port_id; /**< Port ID for incoming packets. */
 	unsigned int elts_n; /**< (*elts)[] length. */
@@ -86,6 +85,20 @@ struct rxq {
 	uint8_t data[]; /**< Remaining queue resources. */
 };
 
+/** Shared flow target for Rx queues. */
+struct mlx4_rss {
+	LIST_ENTRY(mlx4_rss) next; /**< Next entry in list. */
+	struct priv *priv; /**< Back pointer to private data. */
+	uint32_t refcnt; /**< Reference count for this object. */
+	uint32_t usecnt; /**< Number of users relying on @p qp and @p ind. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
+	uint64_t fields; /**< Fields for RSS processing (Verbs format). */
+	uint8_t key[MLX4_RSS_HASH_KEY_SIZE]; /**< Hash key to use. */
+	uint16_t queues; /**< Number of target queues. */
+	uint16_t queue_id[]; /**< Target queues. */
+};
+
 /** Tx element. */
 struct txq_elt {
 	struct ibv_send_wr wr; /**< Work request. */
@@ -126,6 +139,13 @@ struct txq {
 
 /* mlx4_rxq.c */
 
+extern uint8_t mlx4_rss_hash_key_default[MLX4_RSS_HASH_KEY_SIZE];
+struct mlx4_rss *mlx4_rss_get(struct priv *priv, uint64_t fields,
+			      uint8_t key[MLX4_RSS_HASH_KEY_SIZE],
+			      uint16_t queues, const uint16_t queue_id[]);
+void mlx4_rss_put(struct mlx4_rss *rss);
+int mlx4_rss_attach(struct mlx4_rss *rss);
+void mlx4_rss_detach(struct mlx4_rss *rss);
 int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_rxconf *conf,
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0b8f612..c0e0e86 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -135,7 +135,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI)        += -lrte_pmd_kni
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs -lmlx4
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MRVL_PMD)       += -lrte_pmd_mrvl -L$(LIBMUSDK_PATH)/lib -lmusdk
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 28/29] net/mlx4: disable UDP support in RSS flow rules
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (26 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 27/29] net/mlx4: add RSS flow rule action support Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-11 14:35 ` [PATCH v1 29/29] net/mlx4: add RSS support outside flow API Adrien Mazarguil
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

When UDP fields are part of the RSS hash calculation, UDP packets are
discarded (not received on any queue), likely due to an issue with the
kernel implementation.

Temporarily disable UDP RSS support until this issue is resolved.
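
Note that UDP-related rss_hf types remain accepted; they simply no longer
map to the Verbs UDP port fields. A sketch of the resulting behavior of
mlx4_conv_rss_hf():

 uint64_t fields = mlx4_conv_rss_hf(ETH_RSS_NONFRAG_IPV4_UDP);
 /* fields == (IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4):
  * IBV_RX_HASH_SRC_PORT_UDP and IBV_RX_HASH_DST_PORT_UDP are never
  * requested anymore, i.e. such traffic is spread by IP addresses
  * only. */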

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 101f245..41b7a4c 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -133,9 +133,11 @@ mlx4_conv_rss_hf(uint64_t rss_hf)
 		[TCP] = (ETH_RSS_NONFRAG_IPV4_TCP |
 			 ETH_RSS_NONFRAG_IPV6_TCP |
 			 ETH_RSS_IPV6_TCP_EX),
-		[UDP] = (ETH_RSS_NONFRAG_IPV4_UDP |
-			 ETH_RSS_NONFRAG_IPV6_UDP |
-			 ETH_RSS_IPV6_UDP_EX),
+		/*
+		 * UDP support is temporarily disabled due to an
+		 * implementation issue in the kernel.
+		 */
+		[UDP] = 0,
 	};
 	const uint64_t out[RTE_DIM(in)] = {
 		[IPV4] = IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4,
@@ -753,10 +755,8 @@ mlx4_flow_prepare(struct priv *priv,
 					.rss_key = mlx4_rss_hash_key_default,
 					.rss_key_len = MLX4_RSS_HASH_KEY_SIZE,
 					.rss_hf = (ETH_RSS_IPV4 |
-						   ETH_RSS_NONFRAG_IPV4_UDP |
 						   ETH_RSS_NONFRAG_IPV4_TCP |
 						   ETH_RSS_IPV6 |
-						   ETH_RSS_NONFRAG_IPV6_UDP |
 						   ETH_RSS_NONFRAG_IPV6_TCP),
 				};
 			/* Sanity checks. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v1 29/29] net/mlx4: add RSS support outside flow API
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (27 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 28/29] net/mlx4: disable UDP support in RSS flow rules Adrien Mazarguil
@ 2017-10-11 14:35 ` Adrien Mazarguil
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-11 14:35 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Bring back support for automatic RSS with the default flow rules when not
in isolated mode. Balancing is done according to unspecified default
settings, as was the case before this entire rework.

Since the number of queues in an RSS context is limited to powers of two,
the number of configured queues is rounded down to the previous power of
two; extra queues are silently left out of the default RSS context. This
does not prevent dedicated flow rules from targeting them.
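
A sketch of the rounding logic used by the default flow rules below;
rte_align32pow2() rounds up to the next power of two, so adding one and
shifting right yields the previous one:

 /* nb_rx_queues = 6: align32pow2(7) = 8, 8 >> 1 = 4 -> RSS on 0-3.
  * nb_rx_queues = 4: align32pow2(5) = 8, 8 >> 1 = 4 -> unchanged. */
 uint32_t queues = rte_align32pow2(nb_rx_queues + 1) >> 1;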

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 41b7a4c..eca3990 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1256,12 +1256,21 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	/*
+	 * Round number of queues down to their previous power of 2 to
+	 * comply with RSS context limitations. Extra queues silently do not
+	 * get RSS by default.
+	 */
+	uint32_t queues =
+		rte_align32pow2(priv->dev->data->nb_rx_queues + 1) >> 1;
+	alignas(struct rte_flow_action_rss) uint8_t rss_conf_data
+		[offsetof(struct rte_flow_action_rss, queue) +
+		 sizeof(((struct rte_flow_action_rss *)0)->queue[0]) * queues];
+	struct rte_flow_action_rss *rss_conf = (void *)rss_conf_data;
 	struct rte_flow_action actions[] = {
 		{
-			.type = RTE_FLOW_ACTION_TYPE_QUEUE,
-			.conf = &(struct rte_flow_action_queue){
-				.index = 0,
-			},
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = rss_conf,
 		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_END,
@@ -1281,6 +1290,13 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	unsigned int i;
 	int err = 0;
 
+	/* Prepare default RSS configuration. */
+	*rss_conf = (struct rte_flow_action_rss){
+		.rss_conf = NULL, /* Rely on default fallback settings. */
+		.num = queues,
+	};
+	for (i = 0; i != queues; ++i)
+		rss_conf->queue[i] = i;
 	/*
 	 * Set up VLAN item if filtering is enabled and at least one VLAN
 	 * filter is configured.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v1 09/29] mem: add iovec-like allocation wrappers
  2017-10-11 14:35 ` [PATCH v1 09/29] mem: add iovec-like allocation wrappers Adrien Mazarguil
@ 2017-10-11 21:58   ` Ferruh Yigit
  2017-10-11 22:00     ` Ferruh Yigit
  2017-10-12 11:07     ` Adrien Mazarguil
  0 siblings, 2 replies; 64+ messages in thread
From: Ferruh Yigit @ 2017-10-11 21:58 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev

On 10/11/2017 3:35 PM, Adrien Mazarguil wrote:
> These wrappers implement the ability to allocate room for several disparate
> objects as a single contiguous allocation while complying with their
> respective alignment constraints.
> 
> This is usually more efficient than allocating and freeing them
> individually if they are not expected to be reallocated with rte_realloc().
> 
> A typical use case is when several objects that cannot be dissociated must
> be allocated together, as shown in the following example:
> 
>  struct b {
>     ...
>     struct d *d;
>  }
> 
>  struct a {
>      ...
>      struct b *b;
>      struct c *c;
>  }
> 
>  struct rte_malloc_vec vec[] = {
>      { .size = sizeof(struct a), .addr = &ptr_a, },
>      { .size = sizeof(struct b), .addr = &ptr_b, },
>      { .size = sizeof(struct c), .addr = &ptr_c, },
>      { .size = sizeof(struct d), .addr = &ptr_d, },
>  };
> 
>  if (!rte_mallocv(NULL, vec, RTE_DIM(vec)))
>      goto error;
> 
>  struct a *a = ptr_a;
> 
>  a->b = ptr_b;
>  a->c = ptr_c;
>  a->b->d = ptr_d;
>  ...
>  rte_free(a);
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

Hi Adrien,

Why is there an eal patch in the middle of the mlx4 patchset?

I believe this shouldn't go in via next-net tree, and should be reviewed
properly.

I am being more flexible about process and timing for PMD patches,
because their scope is limited.
PMD patches can break at most the PMD itself, and if the maintainer is
sending the patch, they should know what they are doing, so the vendor
takes responsibility for their own driver. I pay the majority of my
attention to making sure it doesn't break others.

But ethdev and eal are way beyond that flexibility, because their scope
is much larger.

Can you please extract this patch from the patchset?

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v1 09/29] mem: add iovec-like allocation wrappers
  2017-10-11 21:58   ` Ferruh Yigit
@ 2017-10-11 22:00     ` Ferruh Yigit
  2017-10-12 11:07     ` Adrien Mazarguil
  1 sibling, 0 replies; 64+ messages in thread
From: Ferruh Yigit @ 2017-10-11 22:00 UTC (permalink / raw)
  To: Adrien Mazarguil, Thomas Monjalon; +Cc: dev

On 10/11/2017 10:58 PM, Ferruh Yigit wrote:
> On 10/11/2017 3:35 PM, Adrien Mazarguil wrote:
>> These wrappers implement the ability to allocate room for several disparate
>> objects as a single contiguous allocation while complying with their
>> respective alignment constraints.
>>
>> This is usually more efficient than allocating and freeing them
>> individually if they are not expected to be reallocated with rte_realloc().
>>
>> A typical use case is when several objects that cannot be dissociated must
>> be allocated together, as shown in the following example:
>>
>>  struct b {
>>     ...
>>     struct d *d;
>>  }
>>
>>  struct a {
>>      ...
>>      struct b *b;
>>      struct c *c;
>>  }
>>
>>  struct rte_malloc_vec vec[] = {
>>      { .size = sizeof(struct a), .addr = &ptr_a, },
>>      { .size = sizeof(struct b), .addr = &ptr_b, },
>>      { .size = sizeof(struct c), .addr = &ptr_c, },
>>      { .size = sizeof(struct d), .addr = &ptr_d, },
>>  };
>>
>>  if (!rte_mallocv(NULL, vec, RTE_DIM(vec)))
>>      goto error;
>>
>>  struct a *a = ptr_a;
>>
>>  a->b = ptr_b;
>>  a->c = ptr_c;
>>  a->b->d = ptr_d;
>>  ...
>>  rte_free(a);
>>
>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>> Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> 
> Hi Adrien,
> 
> Why is there an eal patch in the middle of the mlx4 patchset?
> 
> I believe this shouldn't go in via next-net tree, and should be reviewed
> properly.
> 
> I am being more flexible about process and timing for PMD patches,
> because their scope is limited.
> PMD patches can break at most the PMD itself, and if the maintainer is
> sending the patch, they should know what they are doing, so the vendor
> takes responsibility for their own driver. I pay the majority of my
> attention to making sure it doesn't break others.
> 
> But ethdev and eal are way beyond that flexibility, because their scope
> is much larger.
> 
> Can you please extract this patch from the patchset?

cc'ed Thomas.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v1 09/29] mem: add iovec-like allocation wrappers
  2017-10-11 21:58   ` Ferruh Yigit
  2017-10-11 22:00     ` Ferruh Yigit
@ 2017-10-12 11:07     ` Adrien Mazarguil
  1 sibling, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 11:07 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Thomas Monjalon

On Wed, Oct 11, 2017 at 10:58:45PM +0100, Ferruh Yigit wrote:
> On 10/11/2017 3:35 PM, Adrien Mazarguil wrote:
> > These wrappers implement the ability to allocate room for several disparate
> > objects as a single contiguous allocation while complying with their
> > respective alignment constraints.
> > 
> > This is usually more efficient than allocating and freeing them
> > individually if they are not expected to be reallocated with rte_realloc().
> > 
> > A typical use case is when several objects that cannot be dissociated must
> > be allocated together, as shown in the following example:
> > 
> >  struct b {
> >     ...
> >     struct d *d;
> >  }
> > 
> >  struct a {
> >      ...
> >      struct b *b;
> >      struct c *c;
> >  }
> > 
> >  struct rte_malloc_vec vec[] = {
> >      { .size = sizeof(struct a), .addr = &ptr_a, },
> >      { .size = sizeof(struct b), .addr = &ptr_b, },
> >      { .size = sizeof(struct c), .addr = &ptr_c, },
> >      { .size = sizeof(struct d), .addr = &ptr_d, },
> >  };
> > 
> >  if (!rte_mallocv(NULL, vec, RTE_DIM(vec)))
> >      goto error;
> > 
> >  struct a *a = ptr_a;
> > 
> >  a->b = ptr_b;
> >  a->c = ptr_c;
> >  a->b->d = ptr_d;
> >  ...
> >  rte_free(a);
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> 
> Hi Adrien,
> 
> Why is there an eal patch in the middle of the mlx4 patchset?

Well, it was probably a mistake to leave it as is; it was more or less there
from the beginning and some other stuff I intended to send also relied on
it, which is why it got promoted to EAL. This series was actually supposed
to be sent much sooner but...

Anyway, I thought it could be useful to share it through EAL, and I secretly
hoped no one would notice.

> I believe this shouldn't go in via next-net tree, and should be reviewed
> properly.
> 
> I am being more flexible about process and timing for PMD patches,
> because their scope is limited.
> PMD patches can break at most the PMD itself, and if the maintainer is
> sending the patch, they should know what they are doing, so the vendor
> takes responsibility for their own driver. I pay the majority of my
> attention to making sure it doesn't break others.
> 
> But ethdev and eal are way beyond that flexibility, because their scope
> is much larger.
> 
> Can you please extract this patch from the patchset?

I'll submit v2 shortly to address this issue. I remain open to comments on
the usefulness of these functions, which I'll eventually re-send as a
separate patch.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 00/29] net/mlx4: restore PMD functionality
  2017-10-11 14:35 [PATCH v1 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                   ` (28 preceding siblings ...)
  2017-10-11 14:35 ` [PATCH v1 29/29] net/mlx4: add RSS support outside flow API Adrien Mazarguil
@ 2017-10-12 12:19 ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 01/29] ethdev: expose flow API error helper Adrien Mazarguil
                     ` (29 more replies)
  29 siblings, 30 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

This series restores all the control path functionality removed in prior
series "net/mlx4: trim and refactor entire PMD", including:

- Promiscuous mode.
- All multicast mode.
- MAC address configuration.
- Support for multiple simultaneous MAC addresses.
- Reception of broadcast and user-defined multicast traffic.
- VLAN filters.
- RSS.

This rework also results in the following enhancements:

- Support for multiple flow rule priorities (up to 4096).
- Much more comprehensive error messages when failing to create or apply
  flow rules.
- Flow rules with the RSS action targeting disparate queues can now overlap
  (as long as they take HW limitations into account).
- RSS contexts can be created/destroyed on demand (they were previously
  fixed once and for all after applying the first flow rule).
- RSS hash key can be configured per context.
- Rx objects have a smaller memory footprint.

Note that it should be applied directly before the following series:

 "new mlx4 datapath bypassing ibverbs"

For which a new version based on top of this one will be submitted soon.

v2 changes:

- Moved new memory allocation wrappers from EAL into the mlx4 PMD.
- Re-based on latest dpdk-next-net.

Adrien Mazarguil (29):
  ethdev: expose flow API error helper
  net/mlx4: replace bit-field type
  net/mlx4: remove Rx QP initializer function
  net/mlx4: enhance header files comments
  net/mlx4: expose support for flow rule priorities
  net/mlx4: clarify flow objects naming scheme
  net/mlx4: tidy up flow rule handling code
  net/mlx4: compact flow rule error reporting
  net/mlx4: add iovec-like allocation wrappers
  net/mlx4: merge flow creation and validation code
  net/mlx4: allocate drop flow resources on demand
  net/mlx4: relax check on missing flow rule target
  net/mlx4: refactor internal flow rules
  net/mlx4: generalize flow rule priority support
  net/mlx4: simplify trigger code for flow rules
  net/mlx4: refactor flow item validation code
  net/mlx4: add MAC addresses configuration support
  net/mlx4: add VLAN filter configuration support
  net/mlx4: add flow support for multicast traffic
  net/mlx4: restore promisc and allmulti support
  net/mlx4: update Rx/Tx callbacks consistently
  net/mlx4: fix invalid errno value sign
  net/mlx4: drop live queue reconfiguration support
  net/mlx4: allocate queues and mbuf rings together
  net/mlx4: convert Rx path to work queues
  net/mlx4: remove unnecessary check
  net/mlx4: add RSS flow rule action support
  net/mlx4: disable UDP support in RSS flow rules
  net/mlx4: add RSS support outside flow API

 doc/guides/nics/features/mlx4.ini       |    6 +
 doc/guides/prog_guide/rte_flow.rst      |   23 +-
 drivers/net/mlx4/Makefile               |    2 +-
 drivers/net/mlx4/mlx4.c                 |   71 +-
 drivers/net/mlx4/mlx4.h                 |   61 +-
 drivers/net/mlx4/mlx4_ethdev.c          |  231 +++-
 drivers/net/mlx4/mlx4_flow.c            | 1671 ++++++++++++++++----------
 drivers/net/mlx4/mlx4_flow.h            |   32 +-
 drivers/net/mlx4/mlx4_rxq.c             |  697 +++++------
 drivers/net/mlx4/mlx4_rxtx.c            |    2 +-
 drivers/net/mlx4/mlx4_rxtx.h            |   34 +-
 drivers/net/mlx4/mlx4_txq.c             |  343 ++----
 drivers/net/mlx4/mlx4_utils.c           |  151 +++
 drivers/net/mlx4/mlx4_utils.h           |   25 +-
 drivers/net/tap/tap_flow.c              |    2 +-
 lib/librte_ether/rte_ethdev_version.map |    1 +
 lib/librte_ether/rte_flow.c             |   49 +-
 lib/librte_ether/rte_flow.h             |   24 +
 lib/librte_ether/rte_flow_driver.h      |   38 -
 mk/rte.app.mk                           |    2 +-
 20 files changed, 2144 insertions(+), 1321 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 01/29] ethdev: expose flow API error helper
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 02/29] net/mlx4: replace bit-field type Adrien Mazarguil
                     ` (28 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

rte_flow_error_set() is a convenient helper to initialize error objects.

Since there is no fundamental reason to prevent applications from using it,
expose it through the public interface after modifying its return value
from positive to negative. This is done for consistency with the rest of
the public interface.

Documentation is updated accordingly.
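
For illustration, an operation callback can now propagate an error in a
single statement. The following is only a sketch of the new usage
pattern, with a made-up callback name:

 static int
 example_flow_op(struct rte_eth_dev *dev, struct rte_flow_error *error)
 {
 	(void)dev;
 	/* Sets rte_errno to ENOTSUP and returns -ENOTSUP. */
 	return rte_flow_error_set(error, ENOTSUP,
 				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				  NULL, "operation not supported");
 }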

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/prog_guide/rte_flow.rst      | 23 +++++++++++--
 drivers/net/mlx4/mlx4_flow.c            |  6 ++--
 drivers/net/tap/tap_flow.c              |  2 +-
 lib/librte_ether/rte_ethdev_version.map |  1 +
 lib/librte_ether/rte_flow.c             | 49 +++++++++++++++++++---------
 lib/librte_ether/rte_flow.h             | 24 ++++++++++++++
 lib/librte_ether/rte_flow_driver.h      | 38 ---------------------
 7 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 73f12ee..3113881 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1695,6 +1695,25 @@ freed by the application, however its pointer can be considered valid only
 as long as its associated DPDK port remains configured. Closing the
 underlying device or unloading the PMD invalidates it.
 
+Helpers
+-------
+
+Error initializer
+~~~~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+   static inline int
+   rte_flow_error_set(struct rte_flow_error *error,
+                      int code,
+                      enum rte_flow_error_type type,
+                      const void *cause,
+                      const char *message);
+
+This function initializes ``error`` (if non-NULL) with the provided
+parameters and sets ``rte_errno`` to ``code``. A negative error ``code`` is
+then returned.
+
 Caveats
 -------
 
@@ -1760,13 +1779,11 @@ the legacy filtering framework, which should eventually disappear.
   whatsoever). They only make sure these callbacks are non-NULL or return
   the ``ENOSYS`` (function not supported) error.
 
-This interface additionally defines the following helper functions:
+This interface additionally defines the following helper function:
 
 - ``rte_flow_ops_get()``: get generic flow operations structure from a
   port.
 
-- ``rte_flow_error_set()``: initialize generic flow error structure.
-
 More will be added over time.
 
 Device compatibility
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 0885a91..018843b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -955,9 +955,9 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 		mlx4_mac_addr_del(priv);
 	} else if (mlx4_mac_addr_add(priv) < 0) {
 		priv->isolated = 1;
-		return -rte_flow_error_set(error, rte_errno,
-					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					   NULL, "cannot leave isolated mode");
+		return rte_flow_error_set(error, rte_errno,
+					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					  NULL, "cannot leave isolated mode");
 	}
 	return 0;
 }
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 28d793f..ffc0b85 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1462,7 +1462,7 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	return 0;
 error:
 	pmd->flow_isolate = 0;
-	return -rte_flow_error_set(
+	return rte_flow_error_set(
 		error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 		"TC rule creation failed");
 }
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 92c9e29..e27f596 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -193,5 +193,6 @@ DPDK_17.11 {
 
 	rte_eth_dev_pool_ops_supported;
 	rte_eth_dev_reset;
+	rte_flow_error_set;
 
 } DPDK_17.08;
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index e276fb2..6659063 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -145,9 +145,9 @@ rte_flow_validate(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->validate))
 		return ops->validate(dev, attr, pattern, actions, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Create a flow rule on a given port. */
@@ -183,9 +183,9 @@ rte_flow_destroy(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->destroy))
 		return ops->destroy(dev, flow, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Destroy all flow rules associated with a port. */
@@ -200,9 +200,9 @@ rte_flow_flush(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->flush))
 		return ops->flush(dev, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Query an existing flow rule. */
@@ -220,9 +220,9 @@ rte_flow_query(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->query))
 		return ops->query(dev, flow, action, data, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
 }
 
 /* Restrict ingress traffic to the defined flow rules. */
@@ -238,9 +238,28 @@ rte_flow_isolate(uint16_t port_id,
 		return -rte_errno;
 	if (likely(!!ops->isolate))
 		return ops->isolate(dev, set, error);
-	return -rte_flow_error_set(error, ENOSYS,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, rte_strerror(ENOSYS));
+	return rte_flow_error_set(error, ENOSYS,
+				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+				  NULL, rte_strerror(ENOSYS));
+}
+
+/* Initialize flow error structure. */
+int
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_flow_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return -code;
 }
 
 /** Compute storage space needed by item specification. */
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index d37b0ad..a0ffb71 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -1322,6 +1322,30 @@ int
 rte_flow_isolate(uint16_t port_id, int set, struct rte_flow_error *error);
 
 /**
+ * Initialize flow error structure.
+ *
+ * @param[out] error
+ *   Pointer to flow error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error types.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Negative error code (errno value) and rte_errno is set.
+ */
+int
+rte_flow_error_set(struct rte_flow_error *error,
+		   int code,
+		   enum rte_flow_error_type type,
+		   const void *cause,
+		   const char *message);
+
+/**
  * Generic flow representation.
  *
  * This form is sufficient to describe an rte_flow independently from any
diff --git a/lib/librte_ether/rte_flow_driver.h b/lib/librte_ether/rte_flow_driver.h
index 8573cef..254d1cb 100644
--- a/lib/librte_ether/rte_flow_driver.h
+++ b/lib/librte_ether/rte_flow_driver.h
@@ -45,7 +45,6 @@
 
 #include <stdint.h>
 
-#include <rte_errno.h>
 #include "rte_ethdev.h"
 #include "rte_flow.h"
 
@@ -128,43 +127,6 @@ struct rte_flow_ops {
 };
 
 /**
- * Initialize generic flow error structure.
- *
- * This function also sets rte_errno to a given value.
- *
- * @param[out] error
- *   Pointer to flow error structure (may be NULL).
- * @param code
- *   Related error code (rte_errno).
- * @param type
- *   Cause field and error types.
- * @param cause
- *   Object responsible for the error.
- * @param message
- *   Human-readable error message.
- *
- * @return
- *   Error code.
- */
-static inline int
-rte_flow_error_set(struct rte_flow_error *error,
-		   int code,
-		   enum rte_flow_error_type type,
-		   const void *cause,
-		   const char *message)
-{
-	if (error) {
-		*error = (struct rte_flow_error){
-			.type = type,
-			.cause = cause,
-			.message = message,
-		};
-	}
-	rte_errno = code;
-	return code;
-}
-
-/**
  * Get generic flow operations structure from a port.
  *
  * @param port_id
-- 
2.1.4

* [PATCH v2 02/29] net/mlx4: replace bit-field type
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 01/29] ethdev: expose flow API error helper Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 03/29] net/mlx4: remove Rx QP initializer function Adrien Mazarguil
                     ` (27 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Replace the unsigned int type used for bit-fields in the private
structure with uint32_t to make clear they are 32 bits wide.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 9bd2acc..71cbced 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -100,10 +100,10 @@ struct priv {
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int vf:1; /* This is a VF device. */
-	unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
-	unsigned int isolated:1; /* Toggle isolated mode. */
+	uint32_t started:1; /* Device started, flows enabled. */
+	uint32_t vf:1; /* This is a VF device. */
+	uint32_t intr_alarm:1; /* An interrupt alarm is scheduled. */
+	uint32_t isolated:1; /* Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
-- 
2.1.4

* [PATCH v2 03/29] net/mlx4: remove Rx QP initializer function
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 01/29] ethdev: expose flow API error helper Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 02/29] net/mlx4: replace bit-field type Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 04/29] net/mlx4: enhance header files comments Adrien Mazarguil
                     ` (26 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

There is no benefit in having this as a separate function.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c | 59 ++++++++++++----------------------------
 1 file changed, 18 insertions(+), 41 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 409983f..2d54ab0 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -184,46 +184,6 @@ mlx4_rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error and rte_errno is set.
- */
-static struct ibv_qp *
-mlx4_rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-	struct ibv_qp *qp;
-	struct ibv_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-
-	qp = ibv_create_qp(priv->pd, &attr);
-	if (!qp)
-		rte_errno = errno ? errno : EINVAL;
-	return qp;
-}
-
-/**
  * Configure a Rx queue.
  *
  * @param dev
@@ -254,6 +214,7 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.socket = socket
 	};
 	struct ibv_qp_attr mod;
+	struct ibv_qp_init_attr qp_init;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret;
@@ -317,8 +278,24 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	tmpl.qp = mlx4_rxq_setup_qp(priv, tmpl.cq, desc);
+	qp_init = (struct ibv_qp_init_attr){
+		/* CQ to be associated with the send queue. */
+		.send_cq = tmpl.cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = tmpl.cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_recv_sge = 1,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+	};
+	tmpl.qp = ibv_create_qp(priv->pd, &qp_init);
 	if (tmpl.qp == NULL) {
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
-- 
2.1.4

* [PATCH v2 04/29] net/mlx4: enhance header files comments
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (2 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 03/29] net/mlx4: remove Rx QP initializer function Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 05/29] net/mlx4: expose support for flow rule priorities Adrien Mazarguil
                     ` (25 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Add missing comments and fix those not Doxygen-friendly.

Since the private structure definition is modified, use this opportunity to
add one remaining missing include required by one of its fields
(sys/queue.h for LIST_HEAD()).
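
As a reminder of the Doxygen conventions involved (a generic sketch, not
code from the files below), "/**" documents the declaration that follows
while "/**<" documents the member it trails:

 /** Documents the following macro. */
 #define EXAMPLE 42

 struct example {
 	int field; /**< Documents this structure member. */
 };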

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h       | 43 ++++++++++++++++++++------------------
 drivers/net/mlx4/mlx4_flow.h  |  2 ++
 drivers/net/mlx4/mlx4_rxtx.h  |  4 ++--
 drivers/net/mlx4/mlx4_utils.h |  4 ++--
 4 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 71cbced..1799951 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -36,6 +36,7 @@
 
 #include <net/if.h>
 #include <stdint.h>
+#include <sys/queue.h>
 
 /* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
@@ -51,13 +52,13 @@
 #include <rte_interrupts.h>
 #include <rte_mempool.h>
 
-/* Request send completion once in every 64 sends, might be less. */
+/** Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum size for inline data. */
+/** Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
-/*
+/**
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
  * driver to their own Memory Region (MR). This is a slow operation.
@@ -68,10 +69,10 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/* Interrupt alarm timeout value in microseconds. */
+/** Interrupt alarm timeout value in microseconds. */
 #define MLX4_INTR_ALARM_TIMEOUT 100000
 
-/* Port parameter. */
+/** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
 enum {
@@ -84,29 +85,31 @@ enum {
 	PCI_DEVICE_ID_MELLANOX_CONNECTX3PRO = 0x1007,
 };
 
+/** Driver name reported to lower layers and used in log output. */
 #define MLX4_DRIVER_NAME "net_mlx4"
 
 struct rxq;
 struct txq;
 struct rte_flow;
 
+/** Private data structure. */
 struct priv {
-	struct rte_eth_dev *dev; /* Ethernet device. */
-	struct ibv_context *ctx; /* Verbs context. */
-	struct ibv_device_attr device_attr; /* Device properties. */
-	struct ibv_pd *pd; /* Protection Domain. */
-	struct ether_addr mac; /* MAC address. */
-	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
+	struct rte_eth_dev *dev; /**< Ethernet device. */
+	struct ibv_context *ctx; /**< Verbs context. */
+	struct ibv_device_attr device_attr; /**< Device properties. */
+	struct ibv_pd *pd; /**< Protection Domain. */
+	struct ether_addr mac; /**< MAC address. */
+	struct ibv_flow *mac_flow; /**< Flow associated with MAC address. */
 	/* Device properties. */
-	uint16_t mtu; /* Configured MTU. */
-	uint8_t port; /* Physical port number. */
-	uint32_t started:1; /* Device started, flows enabled. */
-	uint32_t vf:1; /* This is a VF device. */
-	uint32_t intr_alarm:1; /* An interrupt alarm is scheduled. */
-	uint32_t isolated:1; /* Toggle isolated mode. */
-	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
-	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
-	LIST_HEAD(mlx4_flows, rte_flow) flows;
+	uint16_t mtu; /**< Configured MTU. */
+	uint8_t port; /**< Physical port number. */
+	uint32_t started:1; /**< Device started, flows enabled. */
+	uint32_t vf:1; /**< This is a VF device. */
+	uint32_t intr_alarm:1; /**< An interrupt alarm is scheduled. */
+	uint32_t isolated:1; /**< Toggle isolated mode. */
+	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
+	struct rte_flow_drop *flow_drop_queue; /**< Flow drop queue. */
+	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 };
 
 /* mlx4_ethdev.c */
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index fbb775d..459030c 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -52,6 +52,7 @@
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
+/** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
@@ -65,6 +66,7 @@ struct mlx4_flow {
 	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
 };
 
+/** Flow rule target descriptor. */
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index fec998a..365b585 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -85,8 +85,8 @@ struct rxq {
 
 /** Tx element. */
 struct txq_elt {
-	struct ibv_send_wr wr; /* Work request. */
-	struct ibv_sge sge; /* Scatter/gather element. */
+	struct ibv_send_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
 	struct rte_mbuf *buf; /**< Buffer. */
 };
 
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 0fbdc71..b9c02d5 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -49,7 +49,7 @@
  * information replace the driver name (MLX4_DRIVER_NAME) in log messages.
  */
 
-/* Return the file name part of a path. */
+/** Return the file name part of a path. */
 static inline const char *
 pmd_drv_log_basename(const char *s)
 {
@@ -98,7 +98,7 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
+/** Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
 	\
-- 
2.1.4

* [PATCH v2 05/29] net/mlx4: expose support for flow rule priorities
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (3 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 04/29] net/mlx4: enhance header files comments Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 06/29] net/mlx4: clarify flow objects naming scheme Adrien Mazarguil
                     ` (24 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

This PMD supports up to 4096 flow rule priority levels (0 to 4095).

Applications were not allowed to use them until now due to overlaps with
the default flows (e.g. MAC address, promiscuous mode).

This is not an issue in isolated mode, where such flows do not exist.
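
As a sketch of what this enables (hypothetical snippet, assuming
isolated mode was previously enabled with rte_flow_isolate()),
applications can now request a specific level through the standard flow
rule attributes, higher values meaning lower priority:

 struct rte_flow_attr attr = {
 	.ingress = 1,
 	.priority = 42, /* Any level from 0 to 4095. */
 };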

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c  | 20 +++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h  |  3 +++
 drivers/net/mlx4/mlx4_utils.h |  6 ++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 018843b..730249b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -597,8 +597,8 @@ mlx4_flow_prepare(struct priv *priv,
 		.queue = 0,
 		.drop = 0,
 	};
+	uint32_t priority_override = 0;
 
-	(void)priv;
 	if (attr->group) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
@@ -606,11 +606,22 @@ mlx4_flow_prepare(struct priv *priv,
 				   "groups are not supported");
 		return -rte_errno;
 	}
-	if (attr->priority) {
+	if (priv->isolated) {
+		priority_override = attr->priority;
+	} else if (attr->priority) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
 				   NULL,
-				   "priorities are not supported");
+				   "priorities are not supported outside"
+				   " isolated mode");
+		return -rte_errno;
+	}
+	if (attr->priority > MLX4_FLOW_PRIORITY_LAST) {
+		rte_flow_error_set(error, ENOTSUP,
+				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+				   NULL,
+				   "maximum priority level is "
+				   MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
 		return -rte_errno;
 	}
 	if (attr->egress) {
@@ -680,6 +691,9 @@ mlx4_flow_prepare(struct priv *priv,
 		}
 		flow->offset += cur_item->dst_sz;
 	}
+	/* Use specified priority level when in isolated mode. */
+	if (priv->isolated && flow->ibv_attr)
+		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list */
 	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
 		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 459030c..8ac09f1 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -52,6 +52,9 @@
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
+/** Last and lowest priority level for a flow rule. */
+#define MLX4_FLOW_PRIORITY_LAST UINT32_C(0xfff)
+
 /** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index b9c02d5..13f731a 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -104,6 +104,12 @@ pmd_drv_log_basename(const char *s)
 	\
 	snprintf(name, sizeof(name), __VA_ARGS__)
 
+/** Generate a string out of the provided arguments. */
+#define MLX4_STR(...) # __VA_ARGS__
+
+/** Similar to MLX4_STR() with enclosed macros expanded first. */
+#define MLX4_STR_EXPAND(...) MLX4_STR(__VA_ARGS__)
+
 /* mlx4_utils.c */
 
 int mlx4_fd_set_non_blocking(int fd);
-- 
2.1.4

* [PATCH v2 06/29] net/mlx4: clarify flow objects naming scheme
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (4 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 05/29] net/mlx4: expose support for flow rule priorities Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 07/29] net/mlx4: tidy up flow rule handling code Adrien Mazarguil
                     ` (23 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

In several instances, "items" refers either to a flow pattern or a single
item, and "actions" either to the entire list of actions or only one of
them.

The fact that the target of a rule (struct mlx4_flow_action) is also
named "action" and that item-processing objects (struct mlx4_flow_items)
are called "cur_item" ("token" in one instance) contributes to the
confusion.

Use this opportunity to clarify related comments and remove the unused
valid_actions[] global, whose sole purpose is to be referred to by
item-processing objects as "actions".

This commit does not cause any functional change.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 171 ++++++++++++++++++--------------------
 drivers/net/mlx4/mlx4_flow.h |   2 +-
 2 files changed, 81 insertions(+), 92 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 730249b..e5854c6 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -66,16 +66,14 @@
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
-/** Static initializer for items. */
-#define ITEMS(...) \
+/** Static initializer for a list of subsequent item types. */
+#define NEXT_ITEM(...) \
 	(const enum rte_flow_item_type []){ \
 		__VA_ARGS__, RTE_FLOW_ITEM_TYPE_END, \
 	}
 
-/** Structure to generate a simple graph of layers supported by the NIC. */
-struct mlx4_flow_items {
-	/** List of possible actions for these items. */
-	const enum rte_flow_action_type *const actions;
+/** Processor structure associated with a flow item. */
+struct mlx4_flow_proc_item {
 	/** Bit-masks corresponding to the possibilities for the item. */
 	const void *mask;
 	/**
@@ -121,8 +119,8 @@ struct mlx4_flow_items {
 		       void *data);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
-	/** List of possible following items.  */
-	const enum rte_flow_item_type *const items;
+	/** List of possible subsequent items. */
+	const enum rte_flow_item_type *const next_item;
 };
 
 struct rte_flow_drop {
@@ -130,13 +128,6 @@ struct rte_flow_drop {
 	struct ibv_cq *cq; /**< Verbs completion queue. */
 };
 
-/** Valid action for this PMD. */
-static const enum rte_flow_action_type valid_actions[] = {
-	RTE_FLOW_ACTION_TYPE_DROP,
-	RTE_FLOW_ACTION_TYPE_QUEUE,
-	RTE_FLOW_ACTION_TYPE_END,
-};
-
 /**
  * Convert Ethernet item to Verbs specification.
  *
@@ -485,14 +476,13 @@ mlx4_flow_validate_tcp(const struct rte_flow_item *item,
 }
 
 /** Graph of supported items and associated actions. */
-static const struct mlx4_flow_items mlx4_flow_items[] = {
+static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_END] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_ETH),
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_ETH),
 	},
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_VLAN,
-			       RTE_FLOW_ITEM_TYPE_IPV4),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_VLAN,
+				       RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_eth){
 			.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 			.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
@@ -504,8 +494,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_eth),
 	},
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_IPV4),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_vlan){
 		/* rte_flow_item_vlan_mask is invalid for mlx4. */
 #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
@@ -520,9 +509,8 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = 0,
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV4] = {
-		.items = ITEMS(RTE_FLOW_ITEM_TYPE_UDP,
-			       RTE_FLOW_ITEM_TYPE_TCP),
-		.actions = valid_actions,
+		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_UDP,
+				       RTE_FLOW_ITEM_TYPE_TCP),
 		.mask = &(const struct rte_flow_item_ipv4){
 			.hdr = {
 				.src_addr = -1,
@@ -536,7 +524,6 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_ipv4),
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
-		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
 				.src_port = -1,
@@ -550,7 +537,6 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
-		.actions = valid_actions,
 		.mask = &(const struct rte_flow_item_tcp){
 			.hdr = {
 				.src_port = -1,
@@ -572,7 +558,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
  *   Pointer to private structure.
  * @param[in] attr
  *   Flow rule attributes.
- * @param[in] items
+ * @param[in] pattern
  *   Pattern specification (list terminated by the END pattern item).
  * @param[in] actions
  *   Associated actions (list terminated by the END action).
@@ -587,13 +573,15 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 static int
 mlx4_flow_prepare(struct priv *priv,
 		  const struct rte_flow_attr *attr,
-		  const struct rte_flow_item items[],
+		  const struct rte_flow_item pattern[],
 		  const struct rte_flow_action actions[],
 		  struct rte_flow_error *error,
 		  struct mlx4_flow *flow)
 {
-	const struct mlx4_flow_items *cur_item = mlx4_flow_items;
-	struct mlx4_flow_action action = {
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	const struct mlx4_flow_proc_item *proc = mlx4_flow_proc_item_list;
+	struct mlx4_flow_target target = {
 		.queue = 0,
 		.drop = 0,
 	};
@@ -638,82 +626,80 @@ mlx4_flow_prepare(struct priv *priv,
 				   "only ingress is supported");
 		return -rte_errno;
 	}
-	/* Go over items list. */
-	for (; items->type != RTE_FLOW_ITEM_TYPE_END; ++items) {
-		const struct mlx4_flow_items *token = NULL;
+	/* Go over pattern. */
+	for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; ++item) {
+		const struct mlx4_flow_proc_item *next = NULL;
 		unsigned int i;
 		int err;
 
-		if (items->type == RTE_FLOW_ITEM_TYPE_VOID)
+		if (item->type == RTE_FLOW_ITEM_TYPE_VOID)
 			continue;
 		/*
 		 * The nic can support patterns with NULL eth spec only
 		 * if eth is a single item in a rule.
 		 */
-		if (!items->spec &&
-			items->type == RTE_FLOW_ITEM_TYPE_ETH) {
-			const struct rte_flow_item *next = items + 1;
+		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
+			const struct rte_flow_item *next = item + 1;
 
 			if (next->type != RTE_FLOW_ITEM_TYPE_END) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
-						   items,
+						   item,
 						   "the rule requires"
 						   " an Ethernet spec");
 				return -rte_errno;
 			}
 		}
 		for (i = 0;
-		     cur_item->items &&
-		     cur_item->items[i] != RTE_FLOW_ITEM_TYPE_END;
+		     proc->next_item &&
+		     proc->next_item[i] != RTE_FLOW_ITEM_TYPE_END;
 		     ++i) {
-			if (cur_item->items[i] == items->type) {
-				token = &mlx4_flow_items[items->type];
+			if (proc->next_item[i] == item->type) {
+				next = &mlx4_flow_proc_item_list[item->type];
 				break;
 			}
 		}
-		if (!token)
+		if (!next)
 			goto exit_item_not_supported;
-		cur_item = token;
-		err = cur_item->validate(items,
-					(const uint8_t *)cur_item->mask,
-					 cur_item->mask_sz);
+		proc = next;
+		err = proc->validate(item, proc->mask, proc->mask_sz);
 		if (err)
 			goto exit_item_not_supported;
-		if (flow->ibv_attr && cur_item->convert) {
-			err = cur_item->convert(items,
-						(cur_item->default_mask ?
-						 cur_item->default_mask :
-						 cur_item->mask),
-						 flow);
+		if (flow->ibv_attr && proc->convert) {
+			err = proc->convert(item,
+					    (proc->default_mask ?
+					     proc->default_mask :
+					     proc->mask),
+					    flow);
 			if (err)
 				goto exit_item_not_supported;
 		}
-		flow->offset += cur_item->dst_sz;
+		flow->offset += proc->dst_sz;
 	}
 	/* Use specified priority level when in isolated mode. */
 	if (priv->isolated && flow->ibv_attr)
 		flow->ibv_attr->priority = priority_override;
-	/* Go over actions list */
-	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
-		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	/* Go over actions list. */
+	for (action = actions;
+	     action->type != RTE_FLOW_ACTION_TYPE_END;
+	     ++action) {
+		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
 			continue;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
-			action.drop = 1;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.drop = 1;
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
 			const struct rte_flow_action_queue *queue =
-				(const struct rte_flow_action_queue *)
-				actions->conf;
+				action->conf;
 
 			if (!queue || (queue->index >
 				       (priv->dev->data->nb_rx_queues - 1)))
 				goto exit_action_not_supported;
-			action.queue = 1;
+			target.queue = 1;
 		} else {
 			goto exit_action_not_supported;
 		}
 	}
-	if (!action.queue && !action.drop) {
+	if (!target.queue && !target.drop) {
 		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_HANDLE,
 				   NULL, "no valid action");
 		return -rte_errno;
@@ -721,11 +707,11 @@ mlx4_flow_prepare(struct priv *priv,
 	return 0;
 exit_item_not_supported:
 	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-			   items, "item not supported");
+			   item, "item not supported");
 	return -rte_errno;
 exit_action_not_supported:
 	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-			   actions, "action not supported");
+			   action, "action not supported");
 	return -rte_errno;
 }
 
@@ -738,14 +724,14 @@ mlx4_flow_prepare(struct priv *priv,
 static int
 mlx4_flow_validate(struct rte_eth_dev *dev,
 		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
+		   const struct rte_flow_item pattern[],
 		   const struct rte_flow_action actions[],
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 }
 
 /**
@@ -828,8 +814,8 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   Pointer to private structure.
  * @param ibv_attr
  *   Verbs flow attributes.
- * @param action
- *   Target action structure.
+ * @param target
+ *   Rule target descriptor.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
  *
@@ -837,9 +823,9 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   A flow if the rule could be created.
  */
 static struct rte_flow *
-mlx4_flow_create_action_queue(struct priv *priv,
+mlx4_flow_create_target_queue(struct priv *priv,
 			      struct ibv_flow_attr *ibv_attr,
-			      struct mlx4_flow_action *action,
+			      struct mlx4_flow_target *target,
 			      struct rte_flow_error *error)
 {
 	struct ibv_qp *qp;
@@ -853,10 +839,10 @@ mlx4_flow_create_action_queue(struct priv *priv,
 				   NULL, "cannot allocate flow memory");
 		return NULL;
 	}
-	if (action->drop) {
+	if (target->drop) {
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
-		struct rxq *rxq = priv->dev->data->rx_queues[action->queue_id];
+		struct rxq *rxq = priv->dev->data->rx_queues[target->queue_id];
 
 		qp = rxq->qp;
 		rte_flow->qp = qp;
@@ -885,17 +871,18 @@ mlx4_flow_create_action_queue(struct priv *priv,
 static struct rte_flow *
 mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
+		 const struct rte_flow_item pattern[],
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
+	const struct rte_flow_action *action;
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *rte_flow;
-	struct mlx4_flow_action action;
+	struct mlx4_flow_target target;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
 	int err;
 
-	err = mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
+	err = mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 	if (err)
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
@@ -914,31 +901,33 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.port = priv->port,
 		.flags = 0,
 	};
-	claim_zero(mlx4_flow_prepare(priv, attr, items, actions,
+	claim_zero(mlx4_flow_prepare(priv, attr, pattern, actions,
 				     error, &flow));
-	action = (struct mlx4_flow_action){
+	target = (struct mlx4_flow_target){
 		.queue = 0,
 		.drop = 0,
 	};
-	for (; actions->type != RTE_FLOW_ACTION_TYPE_END; ++actions) {
-		if (actions->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions;
+	     action->type != RTE_FLOW_ACTION_TYPE_END;
+	     ++action) {
+		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
 			continue;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			action.queue = 1;
-			action.queue_id =
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+			target.queue = 1;
+			target.queue_id =
 				((const struct rte_flow_action_queue *)
-				 actions->conf)->index;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
-			action.drop = 1;
+				 action->conf)->index;
+		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.drop = 1;
 		} else {
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions, "unsupported action");
+					   action, "unsupported action");
 			goto exit;
 		}
 	}
-	rte_flow = mlx4_flow_create_action_queue(priv, flow.ibv_attr,
-						 &action, error);
+	rte_flow = mlx4_flow_create_target_queue(priv, flow.ibv_attr,
+						 &target, error);
 	if (rte_flow) {
 		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
 		DEBUG("Flow created %p", (void *)rte_flow);
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 8ac09f1..358efbe 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -70,7 +70,7 @@ struct mlx4_flow {
 };
 
 /** Flow rule target descriptor. */
-struct mlx4_flow_action {
+struct mlx4_flow_target {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint32_t queue_id; /**< Identifier of the queue. */
-- 
2.1.4

* [PATCH v2 07/29] net/mlx4: tidy up flow rule handling code
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (5 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 06/29] net/mlx4: clarify flow objects naming scheme Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 08/29] net/mlx4: compact flow rule error reporting Adrien Mazarguil
                     ` (22 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

- Remove unnecessary casts.
- Replace consecutive if/else blocks with switch statements.
- Use proper big endian definitions for mask values (see the sketch after
  this list).
- Make end marker checks of item and action lists less verbose since they
  are explicitly documented as being equal to 0.
- Remove unnecessary NULL check on action configuration structure.
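
The endianness sketch referred to above; RTE_BE16() yields a 16-bit
big-endian constant on any host, making hand-swapped values redundant:

 /* Before: value swapped by hand depending on host byte order. */
 #if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
 	.tci = 0x0fff,
 #else
 	.tci = 0xff0f,
 #endif

 /* After: same on-wire value expressed once. */
 	.tci = RTE_BE16(0x0fff),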

This commit does not cause any functional change.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 115 ++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 62 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index e5854c6..fa56419 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -53,6 +53,7 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_byteorder.h>
 #include <rte_errno.h>
 #include <rte_eth_ctrl.h>
 #include <rte_ethdev.h>
@@ -108,7 +109,7 @@ struct mlx4_flow_proc_item {
 	 *   rte_flow item to convert.
 	 * @param default_mask
 	 *   Default bit-masks to use when item->mask is not provided.
-	 * @param data
+	 * @param flow
 	 *   Internal structure to store the conversion.
 	 *
 	 * @return
@@ -116,7 +117,7 @@ struct mlx4_flow_proc_item {
 	 */
 	int (*convert)(const struct rte_flow_item *item,
 		       const void *default_mask,
-		       void *data);
+		       struct mlx4_flow *flow);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -135,17 +136,16 @@ struct rte_flow_drop {
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_eth(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
 	const struct rte_flow_item_eth *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 	unsigned int i;
@@ -182,17 +182,16 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_vlan(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      void *data)
+		      struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
 	const struct rte_flow_item_vlan *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 
@@ -214,17 +213,16 @@ mlx4_flow_create_vlan(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      void *data)
+		      struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
 	const struct rte_flow_item_ipv4 *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_ipv4 *ipv4;
 	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
 
@@ -260,17 +258,16 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_udp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
 	const struct rte_flow_item_udp *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_tcp_udp *udp;
 	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
@@ -302,17 +299,16 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
  *   Item specification.
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
- * @param data[in, out]
- *   User structure.
+ * @param flow[in, out]
+ *   Conversion result.
  */
 static int
 mlx4_flow_create_tcp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     void *data)
+		     struct mlx4_flow *flow)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
 	const struct rte_flow_item_tcp *mask = item->mask;
-	struct mlx4_flow *flow = (struct mlx4_flow *)data;
 	struct ibv_flow_spec_tcp_udp *tcp;
 	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
@@ -496,12 +492,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
 		.mask = &(const struct rte_flow_item_vlan){
-		/* rte_flow_item_vlan_mask is invalid for mlx4. */
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-			.tci = 0x0fff,
-#else
-			.tci = 0xff0f,
-#endif
+			/* Only TCI VID matching is supported. */
+			.tci = RTE_BE16(0x0fff),
 		},
 		.mask_sz = sizeof(struct rte_flow_item_vlan),
 		.validate = mlx4_flow_validate_vlan,
@@ -513,8 +505,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 				       RTE_FLOW_ITEM_TYPE_TCP),
 		.mask = &(const struct rte_flow_item_ipv4){
 			.hdr = {
-				.src_addr = -1,
-				.dst_addr = -1,
+				.src_addr = RTE_BE32(0xffffffff),
+				.dst_addr = RTE_BE32(0xffffffff),
 			},
 		},
 		.default_mask = &rte_flow_item_ipv4_mask,
@@ -526,8 +518,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
 		.mask = &(const struct rte_flow_item_udp){
 			.hdr = {
-				.src_port = -1,
-				.dst_port = -1,
+				.src_port = RTE_BE16(0xffff),
+				.dst_port = RTE_BE16(0xffff),
 			},
 		},
 		.default_mask = &rte_flow_item_udp_mask,
@@ -539,8 +531,8 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
 		.mask = &(const struct rte_flow_item_tcp){
 			.hdr = {
-				.src_port = -1,
-				.dst_port = -1,
+				.src_port = RTE_BE16(0xffff),
+				.dst_port = RTE_BE16(0xffff),
 			},
 		},
 		.default_mask = &rte_flow_item_tcp_mask,
@@ -627,7 +619,7 @@ mlx4_flow_prepare(struct priv *priv,
 		return -rte_errno;
 	}
 	/* Go over pattern. */
-	for (item = pattern; item->type != RTE_FLOW_ITEM_TYPE_END; ++item) {
+	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
 		unsigned int i;
 		int err;
@@ -641,7 +633,7 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
 			const struct rte_flow_item *next = item + 1;
 
-			if (next->type != RTE_FLOW_ITEM_TYPE_END) {
+			if (next->type) {
 				rte_flow_error_set(error, ENOTSUP,
 						   RTE_FLOW_ERROR_TYPE_ITEM,
 						   item,
@@ -650,10 +642,7 @@ mlx4_flow_prepare(struct priv *priv,
 				return -rte_errno;
 			}
 		}
-		for (i = 0;
-		     proc->next_item &&
-		     proc->next_item[i] != RTE_FLOW_ITEM_TYPE_END;
-		     ++i) {
+		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
 				next = &mlx4_flow_proc_item_list[item->type];
 				break;
@@ -680,22 +669,22 @@ mlx4_flow_prepare(struct priv *priv,
 	if (priv->isolated && flow->ibv_attr)
 		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
-	for (action = actions;
-	     action->type != RTE_FLOW_ACTION_TYPE_END;
-	     ++action) {
-		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions; action->type; ++action) {
+		switch (action->type) {
+			const struct rte_flow_action_queue *queue;
+
+		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+		case RTE_FLOW_ACTION_TYPE_DROP:
 			target.drop = 1;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
-			const struct rte_flow_action_queue *queue =
-				action->conf;
-
-			if (!queue || (queue->index >
-				       (priv->dev->data->nb_rx_queues - 1)))
+			break;
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = action->conf;
+			if (queue->index >= priv->dev->data->nb_rx_queues)
 				goto exit_action_not_supported;
 			target.queue = 1;
-		} else {
+			break;
+		default:
 			goto exit_action_not_supported;
 		}
 	}
@@ -907,19 +896,21 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.queue = 0,
 		.drop = 0,
 	};
-	for (action = actions;
-	     action->type != RTE_FLOW_ACTION_TYPE_END;
-	     ++action) {
-		if (action->type == RTE_FLOW_ACTION_TYPE_VOID) {
+	for (action = actions; action->type; ++action) {
+		switch (action->type) {
+			const struct rte_flow_action_queue *queue;
+
+		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
+		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			queue = action->conf;
 			target.queue = 1;
-			target.queue_id =
-				((const struct rte_flow_action_queue *)
-				 action->conf)->index;
-		} else if (action->type == RTE_FLOW_ACTION_TYPE_DROP) {
+			target.queue_id = queue->index;
+			break;
+		case RTE_FLOW_ACTION_TYPE_DROP:
 			target.drop = 1;
-		} else {
+			break;
+		default:
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
 					   action, "unsupported action");
-- 
2.1.4

* [PATCH v2 08/29] net/mlx4: compact flow rule error reporting
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (6 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 07/29] net/mlx4: tidy up flow rule handling code Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 09/29] net/mlx4: add iovec-like allocation wrappers Adrien Mazarguil
                     ` (21 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Relying on rte_errno is not necessary where the return value of
rte_flow_error_set() can be used directly.

A related minor change is switching from RTE_FLOW_ERROR_TYPE_HANDLE to
RTE_FLOW_ERROR_TYPE_UNSPECIFIED when no rte_flow handle is involved in the
error, specifically when none is allocated yet.

This commit does not cause any functional change.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 102 ++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index fa56419..000f17f 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -579,45 +579,30 @@ mlx4_flow_prepare(struct priv *priv,
 	};
 	uint32_t priority_override = 0;
 
-	if (attr->group) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
-				   NULL,
-				   "groups are not supported");
-		return -rte_errno;
-	}
-	if (priv->isolated) {
+	if (attr->group)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+			 NULL, "groups are not supported");
+	if (priv->isolated)
 		priority_override = attr->priority;
-	} else if (attr->priority) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-				   NULL,
-				   "priorities are not supported outside"
-				   " isolated mode");
-		return -rte_errno;
-	}
-	if (attr->priority > MLX4_FLOW_PRIORITY_LAST) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-				   NULL,
-				   "maximum priority level is "
-				   MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
-		return -rte_errno;
-	}
-	if (attr->egress) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
-				   NULL,
-				   "egress is not supported");
-		return -rte_errno;
-	}
-	if (!attr->ingress) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
-				   NULL,
-				   "only ingress is supported");
-		return -rte_errno;
-	}
+	else if (attr->priority)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+			 NULL,
+			 "priorities are not supported outside isolated mode");
+	if (attr->priority > MLX4_FLOW_PRIORITY_LAST)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
+			 NULL, "maximum priority level is "
+			 MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST));
+	if (attr->egress)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_EGRESS,
+			 NULL, "egress is not supported");
+	if (!attr->ingress)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+			 NULL, "only ingress is supported");
 	/* Go over pattern. */
 	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
@@ -633,14 +618,11 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
 			const struct rte_flow_item *next = item + 1;
 
-			if (next->type) {
-				rte_flow_error_set(error, ENOTSUP,
-						   RTE_FLOW_ERROR_TYPE_ITEM,
-						   item,
-						   "the rule requires"
-						   " an Ethernet spec");
-				return -rte_errno;
-			}
+			if (next->type)
+				return rte_flow_error_set
+					(error, ENOTSUP,
+					 RTE_FLOW_ERROR_TYPE_ITEM, item,
+					 "the rule requires an Ethernet spec");
 		}
 		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
@@ -688,20 +670,17 @@ mlx4_flow_prepare(struct priv *priv,
 			goto exit_action_not_supported;
 		}
 	}
-	if (!target.queue && !target.drop) {
-		rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_HANDLE,
-				   NULL, "no valid action");
-		return -rte_errno;
-	}
+	if (!target.queue && !target.drop)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "no valid action");
 	return 0;
 exit_item_not_supported:
-	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-			   item, "item not supported");
-	return -rte_errno;
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, "item not supported");
 exit_action_not_supported:
-	rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-			   action, "action not supported");
-	return -rte_errno;
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
+				  action, "action not supported");
 }
 
 /**
@@ -824,7 +803,8 @@ mlx4_flow_create_target_queue(struct priv *priv,
 	assert(priv->ctx);
 	rte_flow = rte_calloc(__func__, 1, sizeof(*rte_flow), 0);
 	if (!rte_flow) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "cannot allocate flow memory");
 		return NULL;
 	}
@@ -841,7 +821,8 @@ mlx4_flow_create_target_queue(struct priv *priv,
 		return rte_flow;
 	rte_flow->ibv_flow = ibv_create_flow(qp, rte_flow->ibv_attr);
 	if (!rte_flow->ibv_flow) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "flow rule creation failure");
 		goto error;
 	}
@@ -876,7 +857,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
 	if (!flow.ibv_attr) {
-		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+		rte_flow_error_set(error, ENOMEM,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "cannot allocate ibv_attr memory");
 		return NULL;
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 09/29] net/mlx4: add iovec-like allocation wrappers
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (7 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 08/29] net/mlx4: compact flow rule error reporting Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 10/29] net/mlx4: merge flow creation and validation code Adrien Mazarguil
                     ` (20 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

These wrappers implement the ability to allocate room for several disparate
objects as a single contiguous allocation while complying with their
respective alignment constraints.

This is usually more efficient than allocating and freeing them
individually if they are not expected to be reallocated with rte_realloc().

A typical use case is when several objects that cannot be dissociated must
be allocated together, as shown in the following example:

 struct b {
     ...
     struct d *d;
 };

 struct a {
     ...
     struct b *b;
     struct c *c;
 };

 void *ptr_a, *ptr_b, *ptr_c, *ptr_d;

 struct mlx4_malloc_vec vec[] = {
     { .size = sizeof(struct a), .addr = &ptr_a, },
     { .size = sizeof(struct b), .addr = &ptr_b, },
     { .size = sizeof(struct c), .addr = &ptr_c, },
     { .size = sizeof(struct d), .addr = &ptr_d, },
 };

 if (!mlx4_mallocv(NULL, vec, RTE_DIM(vec)))
     goto error;

 struct a *a = ptr_a;

 a->b = ptr_b;
 a->c = ptr_c;
 a->b->d = ptr_d;
 ...
 rte_free(a);
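
A buffer-less first entry turns the same call into a sizing-only pass:
nothing is allocated and the remaining non-NULL buffers receive offsets
relative to a zero base (a sketch reusing the names above; "probe" and
"footprint" are illustrative):

 struct mlx4_malloc_vec probe[] = {
     { .size = sizeof(struct a), .addr = NULL, },
     { .size = sizeof(struct b), .addr = &ptr_b, },
 };
 size_t footprint = mlx4_mallocv(NULL, probe, RTE_DIM(probe));

 /* footprint is the padded total; ptr_b now holds b's offset. */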

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_utils.c | 151 +++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_utils.h |  15 ++++
 2 files changed, 166 insertions(+)

diff --git a/drivers/net/mlx4/mlx4_utils.c b/drivers/net/mlx4/mlx4_utils.c
index fcf76c9..f18c714 100644
--- a/drivers/net/mlx4/mlx4_utils.c
+++ b/drivers/net/mlx4/mlx4_utils.c
@@ -39,8 +39,12 @@
 #include <assert.h>
 #include <errno.h>
 #include <fcntl.h>
+#include <stddef.h>
+#include <stdint.h>
 
 #include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
 
 #include "mlx4_utils.h"
 
@@ -64,3 +68,150 @@ mlx4_fd_set_non_blocking(int fd)
 	rte_errno = errno;
 	return -rte_errno;
 }
+
+/**
+ * Internal helper to allocate memory once for several disparate objects.
+ *
+ * The most restrictive alignment constraint for standard objects is assumed
+ * to be sizeof(double) and is used as a default value.
+ *
+ * C11 code would include stdalign.h and use alignof(max_align_t);
+ * however, we'll stick with C99 for the time being.
+ */
+static inline size_t
+mlx4_mallocv_inline(const char *type, const struct mlx4_malloc_vec *vec,
+		    unsigned int cnt, int zero, int socket)
+{
+	unsigned int i;
+	size_t size;
+	size_t least;
+	uint8_t *data = NULL;
+	int fill = !vec[0].addr;
+
+fill:
+	size = 0;
+	least = 0;
+	for (i = 0; i < cnt; ++i) {
+		size_t align = (uintptr_t)vec[i].align;
+
+		if (!align) {
+			align = sizeof(double);
+		} else if (!rte_is_power_of_2(align)) {
+			rte_errno = EINVAL;
+			goto error;
+		}
+		if (least < align)
+			least = align;
+		align = RTE_ALIGN_CEIL(size, align);
+		size = align + vec[i].size;
+		if (fill && vec[i].addr)
+			*vec[i].addr = data + align;
+	}
+	if (fill)
+		return size;
+	if (!zero)
+		data = rte_malloc_socket(type, size, least, socket);
+	else
+		data = rte_zmalloc_socket(type, size, least, socket);
+	if (data) {
+		fill = 1;
+		goto fill;
+	}
+	rte_errno = ENOMEM;
+error:
+	for (i = 0; i != cnt; ++i)
+		if (vec[i].addr)
+			*vec[i].addr = NULL;
+	return 0;
+}
+
+/**
+ * Allocate memory once for several disparate objects.
+ *
+ * This function adds iovec-like semantics (e.g. readv()) to rte_malloc().
+ * Memory is allocated once for several contiguous objects of nonuniform
+ * sizes and alignment constraints.
+ *
+ * Each entry of @p vec describes the size, alignment constraint and
+ * provides a buffer address where the resulting object pointer must be
+ * stored.
+ *
+ * The buffer of the first entry is guaranteed to point to the beginning of
+ * the allocated region and is safe to use with rte_free().
+ *
+ * NULL buffers are silently ignored.
+ *
+ * Providing a NULL buffer in the first entry prevents this function from
+ * allocating any memory but has otherwise no effect on its behavior. In
+ * this case, the contents of remaining non-NULL buffers are updated with
+ * addresses relative to zero (i.e. offsets that would have been used during
+ * the allocation).
+ *
+ * @param[in] type
+ *   A string identifying the type of allocated objects (useful for debug
+ *   purposes, such as identifying the cause of a memory leak). Can be NULL.
+ * @param[in, out] vec
+ *   Description of objects to allocate memory for.
+ * @param cnt
+ *   Number of entries in @p vec.
+ *
+ * @return
+ *   Size in bytes of the allocated region including any padding. In case of
+ *   error, rte_errno is set, 0 is returned and NULL is stored in the
+ *   non-NULL buffers pointed by @p vec.
+ *
+ * @see struct mlx4_malloc_vec
+ * @see rte_malloc()
+ */
+size_t
+mlx4_mallocv(const char *type, const struct mlx4_malloc_vec *vec,
+	     unsigned int cnt)
+{
+	return mlx4_mallocv_inline(type, vec, cnt, 0, SOCKET_ID_ANY);
+}
+
+/**
+ * Combines the semantics of mlx4_mallocv() with those of rte_zmalloc().
+ *
+ * @see mlx4_mallocv()
+ * @see rte_zmalloc()
+ */
+size_t
+mlx4_zmallocv(const char *type, const struct mlx4_malloc_vec *vec,
+	      unsigned int cnt)
+{
+	return mlx4_mallocv_inline(type, vec, cnt, 1, SOCKET_ID_ANY);
+}
+
+/**
+ * Socket-aware version of mlx4_mallocv().
+ *
+ * This function takes one additional parameter.
+ *
+ * @param socket
+ *   NUMA socket to allocate memory on. If SOCKET_ID_ANY is used, this
+ *   function will behave the same as mlx4_mallocv().
+ *
+ * @see mlx4_mallocv()
+ * @see rte_malloc_socket()
+ */
+size_t
+mlx4_mallocv_socket(const char *type, const struct mlx4_malloc_vec *vec,
+		    unsigned int cnt, int socket)
+{
+	return mlx4_mallocv_inline(type, vec, cnt, 0, socket);
+}
+
+/**
+ * Combines the semantics of mlx4_mallocv_socket() with those of
+ * rte_zmalloc_socket().
+ *
+ * @see mlx4_mallocv_socket()
+ * @see rte_zmalloc_socket()
+ */
+size_t
+mlx4_zmallocv_socket(const char *type, const struct mlx4_malloc_vec *vec,
+		     unsigned int cnt, int socket)
+{
+	return mlx4_mallocv_inline(type, vec, cnt, 1, socket);
+}
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 13f731a..bebd4ae 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -110,8 +110,23 @@ pmd_drv_log_basename(const char *s)
 /** Similar to MLX4_STR() with enclosed macros expanded first. */
 #define MLX4_STR_EXPAND(...) MLX4_STR(__VA_ARGS__)
 
+/** Object description used with mlx4_mallocv() and similar functions. */
+struct mlx4_malloc_vec {
+	size_t align; /**< Alignment constraint (power of 2), 0 if unknown. */
+	size_t size; /**< Object size. */
+	void **addr; /**< Storage for allocation address. */
+};
+
 /* mlx4_utils.c */
 
 int mlx4_fd_set_non_blocking(int fd);
+size_t mlx4_mallocv(const char *type, const struct mlx4_malloc_vec *vec,
+		    unsigned int cnt);
+size_t mlx4_zmallocv(const char *type, const struct mlx4_malloc_vec *vec,
+		     unsigned int cnt);
+size_t mlx4_mallocv_socket(const char *type, const struct mlx4_malloc_vec *vec,
+			   unsigned int cnt, int socket);
+size_t mlx4_zmallocv_socket(const char *type, const struct mlx4_malloc_vec *vec,
+			    unsigned int cnt, int socket);
 
 #endif /* MLX4_UTILS_H_ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 10/29] net/mlx4: merge flow creation and validation code
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (8 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 09/29] net/mlx4: add iovec-like allocation wrappers Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 11/29] net/mlx4: allocate drop flow resources on demand Adrien Mazarguil
                     ` (19 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

These functions share a significant amount of code and require extra
internal objects to parse and build flow rule handles.

All this can be simplified by relying directly on the internal rte_flow
structure definition, whose QP pointer (destination Verbs queue) is
replaced by a DPDK queue ID and other properties, making it more versatile
without increasing its size (at least on 64-bit platforms).

This commit also gets rid of a few unnecessary debugging messages.
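
The merged entry point serves both purposes (a sketch; in
mlx4_flow_prepare() below, a NULL handle pointer stops processing after
the validation stage):

 struct rte_flow *flow;
 int err;

 /* Validation only. */
 err = mlx4_flow_prepare(priv, attr, pattern, actions, error, NULL);
 /* Validation, then handle allocation and a second fill pass. */
 err = mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);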

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 300 ++++++++++++++++++--------------------
 drivers/net/mlx4/mlx4_flow.h |  16 +-
 2 files changed, 148 insertions(+), 168 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 000f17f..ac66444 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -39,6 +39,7 @@
 #include <arpa/inet.h>
 #include <assert.h>
 #include <errno.h>
+#include <stdalign.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
@@ -110,14 +111,14 @@ struct mlx4_flow_proc_item {
 	 * @param default_mask
 	 *   Default bit-masks to use when item->mask is not provided.
 	 * @param flow
-	 *   Internal structure to store the conversion.
+	 *   Flow rule handle to update.
 	 *
 	 * @return
 	 *   0 on success, negative value otherwise.
 	 */
 	int (*convert)(const struct rte_flow_item *item,
 		       const void *default_mask,
-		       struct mlx4_flow *flow);
+		       struct rte_flow *flow);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -137,12 +138,12 @@ struct rte_flow_drop {
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_eth(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
 	const struct rte_flow_item_eth *mask = item->mask;
@@ -152,7 +153,7 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 2;
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
 		.size = eth_size,
@@ -183,19 +184,20 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_vlan(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      struct mlx4_flow *flow)
+		      struct rte_flow *flow)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
 	const struct rte_flow_item_vlan *mask = item->mask;
 	struct ibv_flow_spec_eth *eth;
 	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
 
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->offset - eth_size);
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
+		       eth_size);
 	if (!spec)
 		return 0;
 	if (!mask)
@@ -214,12 +216,12 @@ mlx4_flow_create_vlan(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		      const void *default_mask,
-		      struct mlx4_flow *flow)
+		      struct rte_flow *flow)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
 	const struct rte_flow_item_ipv4 *mask = item->mask;
@@ -228,7 +230,7 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 1;
-	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
 		.size = ipv4_size,
@@ -259,12 +261,12 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_udp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
 	const struct rte_flow_item_udp *mask = item->mask;
@@ -273,7 +275,7 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 0;
-	udp = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
 		.size = udp_size,
@@ -300,12 +302,12 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
  * @param default_mask[in]
  *   Default bit-masks to use when item->mask is not provided.
  * @param flow[in, out]
- *   Conversion result.
+ *   Flow rule handle to update.
  */
 static int
 mlx4_flow_create_tcp(const struct rte_flow_item *item,
 		     const void *default_mask,
-		     struct mlx4_flow *flow)
+		     struct rte_flow *flow)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
 	const struct rte_flow_item_tcp *mask = item->mask;
@@ -314,7 +316,7 @@ mlx4_flow_create_tcp(const struct rte_flow_item *item,
 
 	++flow->ibv_attr->num_of_specs;
 	flow->ibv_attr->priority = 0;
-	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->offset);
+	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
 		.size = tcp_size,
@@ -556,8 +558,9 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
  *   Associated actions (list terminated by the END action).
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
- * @param[in, out] flow
- *   Flow structure to update.
+ * @param[in, out] addr
+ *   Buffer where the resulting flow rule handle pointer must be stored.
+ *   If NULL, stop processing after validation stage.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
@@ -568,15 +571,13 @@ mlx4_flow_prepare(struct priv *priv,
 		  const struct rte_flow_item pattern[],
 		  const struct rte_flow_action actions[],
 		  struct rte_flow_error *error,
-		  struct mlx4_flow *flow)
+		  struct rte_flow **addr)
 {
 	const struct rte_flow_item *item;
 	const struct rte_flow_action *action;
-	const struct mlx4_flow_proc_item *proc = mlx4_flow_proc_item_list;
-	struct mlx4_flow_target target = {
-		.queue = 0,
-		.drop = 0,
-	};
+	const struct mlx4_flow_proc_item *proc;
+	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
+	struct rte_flow *flow = &temp;
 	uint32_t priority_override = 0;
 
 	if (attr->group)
@@ -603,6 +604,8 @@ mlx4_flow_prepare(struct priv *priv,
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
 			 NULL, "only ingress is supported");
+fill:
+	proc = mlx4_flow_proc_item_list;
 	/* Go over pattern. */
 	for (item = pattern; item->type; ++item) {
 		const struct mlx4_flow_proc_item *next = NULL;
@@ -633,10 +636,12 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!next)
 			goto exit_item_not_supported;
 		proc = next;
-		err = proc->validate(item, proc->mask, proc->mask_sz);
-		if (err)
-			goto exit_item_not_supported;
-		if (flow->ibv_attr && proc->convert) {
+		/* Perform validation once, while handle is not allocated. */
+		if (flow == &temp) {
+			err = proc->validate(item, proc->mask, proc->mask_sz);
+			if (err)
+				goto exit_item_not_supported;
+		} else if (proc->convert) {
 			err = proc->convert(item,
 					    (proc->default_mask ?
 					     proc->default_mask :
@@ -645,10 +650,10 @@ mlx4_flow_prepare(struct priv *priv,
 			if (err)
 				goto exit_item_not_supported;
 		}
-		flow->offset += proc->dst_sz;
+		flow->ibv_attr_size += proc->dst_sz;
 	}
 	/* Use specified priority level when in isolated mode. */
-	if (priv->isolated && flow->ibv_attr)
+	if (priv->isolated && flow != &temp)
 		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
 	for (action = actions; action->type; ++action) {
@@ -658,22 +663,59 @@ mlx4_flow_prepare(struct priv *priv,
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
 		case RTE_FLOW_ACTION_TYPE_DROP:
-			target.drop = 1;
+			flow->drop = 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
 			queue = action->conf;
 			if (queue->index >= priv->dev->data->nb_rx_queues)
 				goto exit_action_not_supported;
-			target.queue = 1;
+			flow->queue = 1;
+			flow->queue_id = queue->index;
 			break;
 		default:
 			goto exit_action_not_supported;
 		}
 	}
-	if (!target.queue && !target.drop)
+	if (!flow->queue && !flow->drop)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "no valid action");
+	/* Validation ends here. */
+	if (!addr)
+		return 0;
+	if (flow == &temp) {
+		/* Allocate proper handle based on collected data. */
+		const struct mlx4_malloc_vec vec[] = {
+			{
+				.align = alignof(struct rte_flow),
+				.size = sizeof(*flow),
+				.addr = (void **)&flow,
+			},
+			{
+				.align = alignof(struct ibv_flow_attr),
+				.size = temp.ibv_attr_size,
+				.addr = (void **)&temp.ibv_attr,
+			},
+		};
+
+		if (!mlx4_zmallocv(__func__, vec, RTE_DIM(vec)))
+			return rte_flow_error_set
+				(error, -rte_errno,
+				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				 "flow rule handle allocation failure");
+		/* Most fields will be updated by second pass. */
+		*flow = (struct rte_flow){
+			.ibv_attr = temp.ibv_attr,
+			.ibv_attr_size = sizeof(*flow->ibv_attr),
+		};
+		*flow->ibv_attr = (struct ibv_flow_attr){
+			.type = IBV_FLOW_ATTR_NORMAL,
+			.size = sizeof(*flow->ibv_attr),
+			.port = priv->port,
+		};
+		goto fill;
+	}
+	*addr = flow;
 	return 0;
 exit_item_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
@@ -697,9 +739,8 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, pattern, actions, error, NULL);
 }
 
 /**
@@ -776,60 +817,66 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 }
 
 /**
- * Complete flow rule creation.
+ * Toggle a configured flow rule.
  *
  * @param priv
  *   Pointer to private structure.
- * @param ibv_attr
- *   Verbs flow attributes.
- * @param target
- *   Rule target descriptor.
+ * @param flow
+ *   Flow rule handle to toggle.
+ * @param enable
+ *   Whether associated Verbs flow must be created or removed.
  * @param[out] error
  *   Perform verbose error reporting if not NULL.
  *
  * @return
- *   A flow if the rule could be created.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static struct rte_flow *
-mlx4_flow_create_target_queue(struct priv *priv,
-			      struct ibv_flow_attr *ibv_attr,
-			      struct mlx4_flow_target *target,
-			      struct rte_flow_error *error)
+static int
+mlx4_flow_toggle(struct priv *priv,
+		 struct rte_flow *flow,
+		 int enable,
+		 struct rte_flow_error *error)
 {
-	struct ibv_qp *qp;
-	struct rte_flow *rte_flow;
-
-	assert(priv->pd);
-	assert(priv->ctx);
-	rte_flow = rte_calloc(__func__, 1, sizeof(*rte_flow), 0);
-	if (!rte_flow) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "cannot allocate flow memory");
-		return NULL;
+	struct ibv_qp *qp = NULL;
+	const char *msg;
+	int err;
+
+	if (!enable) {
+		if (!flow->ibv_flow)
+			return 0;
+		claim_zero(ibv_destroy_flow(flow->ibv_flow));
+		flow->ibv_flow = NULL;
+		return 0;
 	}
-	if (target->drop) {
-		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
-	} else {
-		struct rxq *rxq = priv->dev->data->rx_queues[target->queue_id];
+	if (flow->ibv_flow)
+		return 0;
+	assert(flow->queue ^ flow->drop);
+	if (flow->queue) {
+		struct rxq *rxq;
 
+		assert(flow->queue_id < priv->dev->data->nb_rx_queues);
+		rxq = priv->dev->data->rx_queues[flow->queue_id];
+		if (!rxq) {
+			err = EINVAL;
+			msg = "target queue must be configured first";
+			goto error;
+		}
 		qp = rxq->qp;
-		rte_flow->qp = qp;
 	}
-	rte_flow->ibv_attr = ibv_attr;
-	if (!priv->started)
-		return rte_flow;
-	rte_flow->ibv_flow = ibv_create_flow(qp, rte_flow->ibv_attr);
-	if (!rte_flow->ibv_flow) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "flow rule creation failure");
-		goto error;
+	if (flow->drop) {
+		assert(priv->flow_drop_queue);
+		qp = priv->flow_drop_queue->qp;
 	}
-	return rte_flow;
+	assert(qp);
+	assert(flow->ibv_attr);
+	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
+	if (flow->ibv_flow)
+		return 0;
+	err = errno;
+	msg = "flow rule rejected by device";
 error:
-	rte_free(rte_flow);
-	return NULL;
+	return rte_flow_error_set
+		(error, err, RTE_FLOW_ERROR_TYPE_HANDLE, flow, msg);
 }
 
 /**
@@ -845,69 +892,21 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
-	const struct rte_flow_action *action;
 	struct priv *priv = dev->data->dev_private;
-	struct rte_flow *rte_flow;
-	struct mlx4_flow_target target;
-	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
+	struct rte_flow *flow;
 	int err;
 
 	err = mlx4_flow_prepare(priv, attr, pattern, actions, error, &flow);
 	if (err)
 		return NULL;
-	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
-	if (!flow.ibv_attr) {
-		rte_flow_error_set(error, ENOMEM,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "cannot allocate ibv_attr memory");
-		return NULL;
+	err = mlx4_flow_toggle(priv, flow, priv->started, error);
+	if (!err) {
+		LIST_INSERT_HEAD(&priv->flows, flow, next);
+		return flow;
 	}
-	flow.offset = sizeof(struct ibv_flow_attr);
-	*flow.ibv_attr = (struct ibv_flow_attr){
-		.comp_mask = 0,
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.size = sizeof(struct ibv_flow_attr),
-		.priority = attr->priority,
-		.num_of_specs = 0,
-		.port = priv->port,
-		.flags = 0,
-	};
-	claim_zero(mlx4_flow_prepare(priv, attr, pattern, actions,
-				     error, &flow));
-	target = (struct mlx4_flow_target){
-		.queue = 0,
-		.drop = 0,
-	};
-	for (action = actions; action->type; ++action) {
-		switch (action->type) {
-			const struct rte_flow_action_queue *queue;
-
-		case RTE_FLOW_ACTION_TYPE_VOID:
-			continue;
-		case RTE_FLOW_ACTION_TYPE_QUEUE:
-			queue = action->conf;
-			target.queue = 1;
-			target.queue_id = queue->index;
-			break;
-		case RTE_FLOW_ACTION_TYPE_DROP:
-			target.drop = 1;
-			break;
-		default:
-			rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   action, "unsupported action");
-			goto exit;
-		}
-	}
-	rte_flow = mlx4_flow_create_target_queue(priv, flow.ibv_attr,
-						 &target, error);
-	if (rte_flow) {
-		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
-		DEBUG("Flow created %p", (void *)rte_flow);
-		return rte_flow;
-	}
-exit:
-	rte_free(flow.ibv_attr);
+	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   error->message);
+	rte_free(flow);
 	return NULL;
 }
 
@@ -939,7 +938,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy a flow.
+ * Destroy a flow rule.
  *
  * @see rte_flow_destroy()
  * @see rte_flow_ops
@@ -949,19 +948,18 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
 {
-	(void)dev;
-	(void)error;
+	struct priv *priv = dev->data->dev_private;
+	int err = mlx4_flow_toggle(priv, flow, 0, error);
+
+	if (err)
+		return err;
 	LIST_REMOVE(flow, next);
-	if (flow->ibv_flow)
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-	rte_free(flow->ibv_attr);
-	DEBUG("Flow destroyed %p", (void *)flow);
 	rte_free(flow);
 	return 0;
 }
 
 /**
- * Destroy all flows.
+ * Destroy all flow rules.
  *
  * @see rte_flow_flush()
  * @see rte_flow_ops
@@ -982,9 +980,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 }
 
 /**
- * Remove all flows.
- *
- * Called by dev_stop() to remove all flows.
+ * Disable flow rules.
  *
  * @param priv
  *   Pointer to private structure.
@@ -997,27 +993,24 @@ mlx4_flow_stop(struct priv *priv)
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-		flow->ibv_flow = NULL;
-		DEBUG("Flow %p removed", (void *)flow);
+		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
 	}
 	mlx4_flow_destroy_drop_queue(priv);
 }
 
 /**
- * Add all flows.
+ * Enable flow rules.
  *
  * @param priv
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, a errno value otherwise and rte_errno is set.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
 mlx4_flow_start(struct priv *priv)
 {
 	int ret;
-	struct ibv_qp *qp;
 	struct rte_flow *flow;
 
 	ret = mlx4_flow_create_drop_queue(priv);
@@ -1026,14 +1019,11 @@ mlx4_flow_start(struct priv *priv)
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-		qp = flow->qp ? flow->qp : priv->flow_drop_queue->qp;
-		flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
-		if (!flow->ibv_flow) {
-			DEBUG("Flow %p cannot be applied", (void *)flow);
-			rte_errno = EINVAL;
-			return rte_errno;
+		ret = mlx4_flow_toggle(priv, flow, 1, NULL);
+		if (unlikely(ret)) {
+			mlx4_flow_stop(priv);
+			return ret;
 		}
-		DEBUG("Flow %p applied", (void *)flow);
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 358efbe..68ffb33 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -60,20 +60,10 @@ struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
-	struct ibv_qp *qp; /**< Verbs queue pair. */
-};
-
-/** Structure to pass to the conversion function. */
-struct mlx4_flow {
-	struct ibv_flow_attr *ibv_attr; /**< Verbs attribute. */
-	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
-};
-
-/** Flow rule target descriptor. */
-struct mlx4_flow_target {
-	uint32_t drop:1; /**< Target is a drop queue. */
+	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-	uint32_t queue_id; /**< Identifier of the queue. */
+	uint16_t queue_id; /**< Target queue. */
 };
 
 /* mlx4_flow.c */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 11/29] net/mlx4: allocate drop flow resources on demand
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (9 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 10/29] net/mlx4: merge flow creation and validation code Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 12/29] net/mlx4: relax check on missing flow rule target Adrien Mazarguil
                     ` (18 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Verbs QP and CQ resources for drop flow rules do not need to be permanently
allocated, only when at least one rule needs them.

Besides, struct rte_flow_drop is outside the mlx4 PMD namespace and
should never have been defined there. struct rte_flow is currently the
only exception to this rule.
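
Its replacement is a small reference-counted singleton; a sketch of the
mlx4_drop_get()/mlx4_drop_put() pattern implemented below:

 struct mlx4_drop *drop = mlx4_drop_get(priv); /* created on first use */

 if (drop) {
 	/* ... point drop flow rules at drop->qp ... */
 	mlx4_drop_put(drop); /* freed once the last reference is gone */
 }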

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h      |   3 +-
 drivers/net/mlx4/mlx4_flow.c | 138 ++++++++++++++++++++------------------
 2 files changed, 74 insertions(+), 67 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 1799951..f71679b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -88,6 +88,7 @@ enum {
 /** Driver name reported to lower layers and used in log output. */
 #define MLX4_DRIVER_NAME "net_mlx4"
 
+struct mlx4_drop;
 struct rxq;
 struct txq;
 struct rte_flow;
@@ -108,7 +109,7 @@ struct priv {
 	uint32_t intr_alarm:1; /**< An interrupt alarm is scheduled. */
 	uint32_t isolated:1; /**< Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
-	struct rte_flow_drop *flow_drop_queue; /**< Flow drop queue. */
+	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 };
 
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index ac66444..8f4898b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -125,9 +125,12 @@ struct mlx4_flow_proc_item {
 	const enum rte_flow_item_type *const next_item;
 };
 
-struct rte_flow_drop {
-	struct ibv_qp *qp; /**< Verbs queue pair. */
-	struct ibv_cq *cq; /**< Verbs completion queue. */
+/** Shared resources for drop flow rules. */
+struct mlx4_drop {
+	struct ibv_qp *qp; /**< QP target. */
+	struct ibv_cq *cq; /**< CQ associated with above QP. */
+	struct priv *priv; /**< Back pointer to private data. */
+	uint32_t refcnt; /**< Reference count. */
 };
 
 /**
@@ -744,76 +747,73 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy a drop queue.
+ * Get a drop flow rule resources instance.
  *
  * @param priv
  *   Pointer to private structure.
+ *
+ * @return
+ *   Pointer to drop flow resources on success, NULL otherwise and rte_errno
+ *   is set.
  */
-static void
-mlx4_flow_destroy_drop_queue(struct priv *priv)
+static struct mlx4_drop *
+mlx4_drop_get(struct priv *priv)
 {
-	if (priv->flow_drop_queue) {
-		struct rte_flow_drop *fdq = priv->flow_drop_queue;
+	struct mlx4_drop *drop = priv->drop;
 
-		priv->flow_drop_queue = NULL;
-		claim_zero(ibv_destroy_qp(fdq->qp));
-		claim_zero(ibv_destroy_cq(fdq->cq));
-		rte_free(fdq);
+	if (drop) {
+		assert(drop->refcnt);
+		assert(drop->priv == priv);
+		++drop->refcnt;
+		return drop;
 	}
+	drop = rte_malloc(__func__, sizeof(*drop), 0);
+	if (!drop)
+		goto error;
+	*drop = (struct mlx4_drop){
+		.priv = priv,
+		.refcnt = 1,
+	};
+	drop->cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
+	if (!drop->cq)
+		goto error;
+	drop->qp = ibv_create_qp(priv->pd,
+				 &(struct ibv_qp_init_attr){
+					.send_cq = drop->cq,
+					.recv_cq = drop->cq,
+					.qp_type = IBV_QPT_RAW_PACKET,
+				 });
+	if (!drop->qp)
+		goto error;
+	priv->drop = drop;
+	return drop;
+error:
+	if (drop && drop->qp)
+		claim_zero(ibv_destroy_qp(drop->qp));
+	if (drop && drop->cq)
+		claim_zero(ibv_destroy_cq(drop->cq));
+	if (drop)
+		rte_free(drop);
+	rte_errno = ENOMEM;
+	return NULL;
 }
 
 /**
- * Create a single drop queue for all drop flows.
+ * Give back a drop flow rule resources instance.
  *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative value otherwise.
+ * @param drop
+ *   Pointer to drop flow rule resources.
  */
-static int
-mlx4_flow_create_drop_queue(struct priv *priv)
+static void
+mlx4_drop_put(struct mlx4_drop *drop)
 {
-	struct ibv_qp *qp;
-	struct ibv_cq *cq;
-	struct rte_flow_drop *fdq;
-
-	fdq = rte_calloc(__func__, 1, sizeof(*fdq), 0);
-	if (!fdq) {
-		ERROR("Cannot allocate memory for drop struct");
-		goto err;
-	}
-	cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
-	if (!cq) {
-		ERROR("Cannot create drop CQ");
-		goto err_create_cq;
-	}
-	qp = ibv_create_qp(priv->pd,
-			   &(struct ibv_qp_init_attr){
-				.send_cq = cq,
-				.recv_cq = cq,
-				.cap = {
-					.max_recv_wr = 1,
-					.max_recv_sge = 1,
-				},
-				.qp_type = IBV_QPT_RAW_PACKET,
-			   });
-	if (!qp) {
-		ERROR("Cannot create drop QP");
-		goto err_create_qp;
-	}
-	*fdq = (struct rte_flow_drop){
-		.qp = qp,
-		.cq = cq,
-	};
-	priv->flow_drop_queue = fdq;
-	return 0;
-err_create_qp:
-	claim_zero(ibv_destroy_cq(cq));
-err_create_cq:
-	rte_free(fdq);
-err:
-	return -1;
+	assert(drop->refcnt);
+	if (--drop->refcnt)
+		return;
+	drop->priv->drop = NULL;
+	claim_zero(ibv_destroy_qp(drop->qp));
+	claim_zero(ibv_destroy_cq(drop->cq));
+	rte_free(drop);
 }
 
 /**
@@ -846,6 +846,8 @@ mlx4_flow_toggle(struct priv *priv,
 			return 0;
 		claim_zero(ibv_destroy_flow(flow->ibv_flow));
 		flow->ibv_flow = NULL;
+		if (flow->drop)
+			mlx4_drop_put(priv->drop);
 		return 0;
 	}
 	if (flow->ibv_flow)
@@ -864,14 +866,21 @@ mlx4_flow_toggle(struct priv *priv,
 		qp = rxq->qp;
 	}
 	if (flow->drop) {
-		assert(priv->flow_drop_queue);
-		qp = priv->flow_drop_queue->qp;
+		mlx4_drop_get(priv);
+		if (!priv->drop) {
+			err = rte_errno;
+			msg = "resources for drop flow rule cannot be created";
+			goto error;
+		}
+		qp = priv->drop->qp;
 	}
 	assert(qp);
 	assert(flow->ibv_attr);
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
+	if (flow->drop)
+		mlx4_drop_put(priv->drop);
 	err = errno;
 	msg = "flow rule rejected by device";
 error:
@@ -995,7 +1004,7 @@ mlx4_flow_stop(struct priv *priv)
 	     flow = LIST_NEXT(flow, next)) {
 		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
 	}
-	mlx4_flow_destroy_drop_queue(priv);
+	assert(!priv->drop);
 }
 
 /**
@@ -1013,9 +1022,6 @@ mlx4_flow_start(struct priv *priv)
 	int ret;
 	struct rte_flow *flow;
 
-	ret = mlx4_flow_create_drop_queue(priv);
-	if (ret)
-		return -1;
 	for (flow = LIST_FIRST(&priv->flows);
 	     flow;
 	     flow = LIST_NEXT(flow, next)) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 12/29] net/mlx4: relax check on missing flow rule target
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (10 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 11/29] net/mlx4: allocate drop flow resources on demand Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 13/29] net/mlx4: refactor internal flow rules Adrien Mazarguil
                     ` (17 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Creating a flow rule that targets a missing (unconfigured) queue is not
possible. However, nothing really prevents the destruction of a queue
with existing flow rules still pointing at it, except that the port must
currently be stopped to avoid crashing.

The problem is that the port cannot be restarted if flow rules cannot be
re-applied due to missing queues. This flexibility will be needed by
subsequent work on this PMD.

Given that a PMD cannot decide on its own to remove problematic
user-defined flow rules in order to restart a port, work around this
restriction by making the affected rules drop-like, i.e. rules targeting
nonexistent queues drop packets instead.
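
In mlx4_flow_toggle() terms, the workaround boils down to the following
check (sketch):

 struct rxq *rxq = NULL;

 if (flow->queue_id < priv->dev->data->nb_rx_queues)
 	rxq = priv->dev->data->rx_queues[flow->queue_id];
 /* A missing target queue drops traffic implicitly. */
 flow->drop = !rxq;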

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 8f4898b..669eba2 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -850,20 +850,24 @@ mlx4_flow_toggle(struct priv *priv,
 			mlx4_drop_put(priv->drop);
 		return 0;
 	}
-	if (flow->ibv_flow)
-		return 0;
-	assert(flow->queue ^ flow->drop);
 	if (flow->queue) {
-		struct rxq *rxq;
+		struct rxq *rxq = NULL;
 
-		assert(flow->queue_id < priv->dev->data->nb_rx_queues);
-		rxq = priv->dev->data->rx_queues[flow->queue_id];
-		if (!rxq) {
-			err = EINVAL;
-			msg = "target queue must be configured first";
-			goto error;
+		if (flow->queue_id < priv->dev->data->nb_rx_queues)
+			rxq = priv->dev->data->rx_queues[flow->queue_id];
+		if (flow->ibv_flow) {
+			if (!rxq ^ !flow->drop)
+				return 0;
+			/* Verbs flow needs updating. */
+			claim_zero(ibv_destroy_flow(flow->ibv_flow));
+			flow->ibv_flow = NULL;
+			if (flow->drop)
+				mlx4_drop_put(priv->drop);
 		}
-		qp = rxq->qp;
+		if (rxq)
+			qp = rxq->qp;
+		/* A missing target queue drops traffic implicitly. */
+		flow->drop = !rxq;
 	}
 	if (flow->drop) {
 		mlx4_drop_get(priv);
@@ -876,6 +880,8 @@ mlx4_flow_toggle(struct priv *priv,
 	}
 	assert(qp);
 	assert(flow->ibv_attr);
+	if (flow->ibv_flow)
+		return 0;
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 13/29] net/mlx4: refactor internal flow rules
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (11 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 12/29] net/mlx4: relax check on missing flow rule target Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 14/29] net/mlx4: generalize flow rule priority support Adrien Mazarguil
                     ` (16 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

When not in isolated mode, a flow rule is automatically configured by the
PMD to receive traffic addressed to the MAC address of the device. This
somewhat duplicates flow API functionality.

Remove legacy support for internal flow rules to instead handle them
through the flow API implementation.
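
Internal rules thus become ordinary flow rules whose pattern starts with
a PMD-private meta item (a sketch based on mlx4_flow_internal() below;
eth_spec and eth_mask stand for the device MAC spec and mask):

 struct rte_flow_item pattern[] = {
 	{ .type = MLX4_FLOW_ITEM_TYPE_INTERNAL },
 	{ .type = RTE_FLOW_ITEM_TYPE_ETH,
 	  .spec = &eth_spec, .mask = &eth_mask },
 	{ .type = RTE_FLOW_ITEM_TYPE_END },
 };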

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c      |  20 ++---
 drivers/net/mlx4/mlx4.h      |   1 -
 drivers/net/mlx4/mlx4_flow.c | 155 +++++++++++++++++++++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h |   6 ++
 drivers/net/mlx4/mlx4_rxq.c  | 117 +++-------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 -
 6 files changed, 172 insertions(+), 129 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b084903..40c0ee2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -96,8 +96,15 @@ const char *pmd_mlx4_init_params[] = {
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
-	(void)dev;
-	return 0;
+	struct priv *priv = dev->data->dev_private;
+	int ret;
+
+	/* Prepare internal flow rules. */
+	ret = mlx4_flow_sync(priv);
+	if (ret)
+		ERROR("cannot set up internal flow rules: %s",
+		      strerror(-ret));
+	return ret;
 }
 
 /**
@@ -121,9 +128,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		return 0;
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	ret = mlx4_mac_addr_add(priv);
-	if (ret)
-		goto err;
 	ret = mlx4_intr_install(priv);
 	if (ret) {
 		ERROR("%p: interrupt handler installation failed",
@@ -139,7 +143,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	mlx4_mac_addr_del(priv);
 	priv->started = 0;
 	return ret;
 }
@@ -163,7 +166,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	priv->started = 0;
 	mlx4_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
-	mlx4_mac_addr_del(priv);
 }
 
 /**
@@ -185,7 +187,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	mlx4_mac_addr_del(priv);
+	mlx4_flow_clean(priv);
 	dev->rx_pkt_burst = mlx4_rx_burst_removed;
 	dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	for (i = 0; i != dev->data->nb_rx_queues; ++i)
@@ -542,8 +544,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
 		priv->mac = mac;
-		if (mlx4_mac_addr_add(priv))
-			goto port_error;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index f71679b..fb4708d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -100,7 +100,6 @@ struct priv {
 	struct ibv_device_attr device_attr; /**< Device properties. */
 	struct ibv_pd *pd; /**< Protection Domain. */
 	struct ether_addr mac; /**< MAC address. */
-	struct ibv_flow *mac_flow; /**< Flow associated with MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /**< Configured MTU. */
 	uint8_t port; /**< Physical port number. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 669eba2..fb38179 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -617,6 +617,10 @@ mlx4_flow_prepare(struct priv *priv,
 
 		if (item->type == RTE_FLOW_ITEM_TYPE_VOID)
 			continue;
+		if (item->type == MLX4_FLOW_ITEM_TYPE_INTERNAL) {
+			flow->internal = 1;
+			continue;
+		}
 		/*
 		 * The nic can support patterns with NULL eth spec only
 		 * if eth is a single item in a rule.
@@ -916,7 +920,17 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		return NULL;
 	err = mlx4_flow_toggle(priv, flow, priv->started, error);
 	if (!err) {
-		LIST_INSERT_HEAD(&priv->flows, flow, next);
+		struct rte_flow *curr = LIST_FIRST(&priv->flows);
+
+		/* New rules are inserted after internal ones. */
+		if (!curr || !curr->internal) {
+			LIST_INSERT_HEAD(&priv->flows, flow, next);
+		} else {
+			while (LIST_NEXT(curr, next) &&
+			       LIST_NEXT(curr, next)->internal)
+				curr = LIST_NEXT(curr, next);
+			LIST_INSERT_AFTER(curr, flow, next);
+		}
 		return flow;
 	}
 	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
@@ -941,13 +955,14 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 	if (!!enable == !!priv->isolated)
 		return 0;
 	priv->isolated = !!enable;
-	if (enable) {
-		mlx4_mac_addr_del(priv);
-	} else if (mlx4_mac_addr_add(priv) < 0) {
-		priv->isolated = 1;
+	if (mlx4_flow_sync(priv)) {
+		priv->isolated = !enable;
 		return rte_flow_error_set(error, rte_errno,
 					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					  NULL, "cannot leave isolated mode");
+					  NULL,
+					  enable ?
+					  "cannot enter isolated mode" :
+					  "cannot leave isolated mode");
 	}
 	return 0;
 }
@@ -974,7 +989,9 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 }
 
 /**
- * Destroy all flow rules.
+ * Destroy user-configured flow rules.
+ *
+ * This function skips internal flow rules.
  *
  * @see rte_flow_flush()
  * @see rte_flow_ops
@@ -984,17 +1001,133 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow *flow = LIST_FIRST(&priv->flows);
 
-	while (!LIST_EMPTY(&priv->flows)) {
-		struct rte_flow *flow;
+	while (flow) {
+		struct rte_flow *next = LIST_NEXT(flow, next);
 
-		flow = LIST_FIRST(&priv->flows);
-		mlx4_flow_destroy(dev, flow, error);
+		if (!flow->internal)
+			mlx4_flow_destroy(dev, flow, error);
+		flow = next;
 	}
 	return 0;
 }
 
 /**
+ * Generate internal flow rules.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
+{
+	struct rte_flow_attr attr = {
+		.ingress = 1,
+	};
+	struct rte_flow_item pattern[] = {
+		{
+			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_ETH,
+			.spec = &(struct rte_flow_item_eth){
+				.dst = priv->mac,
+			},
+			.mask = &(struct rte_flow_item_eth){
+				.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+			},
+		},
+		{
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+	};
+	struct rte_flow_action actions[] = {
+		{
+			.type = RTE_FLOW_ACTION_TYPE_QUEUE,
+			.conf = &(struct rte_flow_action_queue){
+				.index = 0,
+			},
+		},
+		{
+			.type = RTE_FLOW_ACTION_TYPE_END,
+		},
+	};
+
+	if (!mlx4_flow_create(priv->dev, &attr, pattern, actions, error))
+		return -rte_errno;
+	return 0;
+}
+
+/**
+ * Synchronize flow rules.
+ *
+ * This function synchronizes flow rules with the state of the device by
+ * taking into account isolated mode and whether target queues are
+ * configured.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_sync(struct priv *priv)
+{
+	struct rte_flow *flow;
+	int ret;
+
+	/* Internal flow rules are guaranteed to come first in the list. */
+	if (priv->isolated) {
+		/*
+		 * Get rid of them in isolated mode, stop at the first
+		 * non-internal rule found.
+		 */
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_FIRST(&priv->flows))
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, NULL));
+	} else if (!LIST_FIRST(&priv->flows) ||
+		   !LIST_FIRST(&priv->flows)->internal) {
+		/*
+		 * Outside isolated mode, if the first rule is not internal,
+		 * the internal rules must be added back.
+		 */
+		ret = mlx4_flow_internal(priv, NULL);
+		if (ret)
+			return ret;
+	}
+	if (priv->started)
+		return mlx4_flow_start(priv);
+	mlx4_flow_stop(priv);
+	return 0;
+}
+
+/**
+ * Clean up all flow rules.
+ *
+ * Unlike mlx4_flow_flush(), this function takes care of all remaining flow
+ * rules regardless of whether they are internal or user-configured.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+mlx4_flow_clean(struct priv *priv)
+{
+	struct rte_flow *flow;
+
+	while ((flow = LIST_FIRST(&priv->flows)))
+		mlx4_flow_destroy(priv->dev, flow, NULL);
+}
+
+/**
  * Disable flow rules.
  *
  * @param priv
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 68ffb33..c2ffa8d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -55,12 +55,16 @@
 /** Last and lowest priority level for a flow rule. */
 #define MLX4_FLOW_PRIORITY_LAST UINT32_C(0xfff)
 
+/** Meta pattern item used to distinguish internal rules. */
+#define MLX4_FLOW_ITEM_TYPE_INTERNAL ((enum rte_flow_item_type)-1)
+
 /** PMD-specific (mlx4) definition of a flow rule handle. */
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
@@ -68,6 +72,8 @@ struct rte_flow {
 
 /* mlx4_flow.c */
 
+int mlx4_flow_sync(struct priv *priv);
+void mlx4_flow_clean(struct priv *priv);
 int mlx4_flow_start(struct priv *priv);
 void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 2d54ab0..7bb2f9e 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -59,6 +59,7 @@
 #include <rte_mempool.h>
 
 #include "mlx4.h"
+#include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
@@ -399,8 +400,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -rte_errno;
 		}
 		dev->data->rx_queues[idx] = NULL;
-		if (idx == 0)
-			mlx4_mac_addr_del(priv);
+		/* Disable associated flows. */
+		mlx4_flow_sync(priv);
 		mlx4_rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -419,6 +420,14 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		DEBUG("%p: adding Rx queue %p to list",
 		      (void *)dev, (void *)rxq);
 		dev->data->rx_queues[idx] = rxq;
+		/* Re-enable associated flows. */
+		ret = mlx4_flow_sync(priv);
+		if (ret) {
+			dev->data->rx_queues[idx] = NULL;
+			mlx4_rxq_cleanup(rxq);
+			rte_free(rxq);
+			return ret;
+		}
 		/* Update receive callback. */
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
@@ -446,111 +455,9 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			DEBUG("%p: removing Rx queue %p from list",
 			      (void *)priv->dev, (void *)rxq);
 			priv->dev->data->rx_queues[i] = NULL;
-			if (i == 0)
-				mlx4_mac_addr_del(priv);
 			break;
 		}
+	mlx4_flow_sync(priv);
 	mlx4_rxq_cleanup(rxq);
 	rte_free(rxq);
 }
-
-/**
- * Unregister a MAC address.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void
-mlx4_mac_addr_del(struct priv *priv)
-{
-#ifndef NDEBUG
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-#endif
-
-	if (!priv->mac_flow)
-		return;
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(priv->mac_flow));
-	priv->mac_flow = NULL;
-}
-
-/**
- * Register a MAC address.
- *
- * The MAC address is registered in queue 0.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_mac_addr_add(struct priv *priv)
-{
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-	struct rxq *rxq;
-	struct ibv_flow *flow;
-
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		return 0;
-	if (priv->isolated)
-		return 0;
-	if (priv->dev->data->rx_queues && priv->dev->data->rx_queues[0])
-		rxq = priv->dev->data->rx_queues[0];
-	else
-		return 0;
-
-	/* Allocate flow specification on the stack. */
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
-
-	if (priv->mac_flow)
-		mlx4_mac_addr_del(priv);
-	/*
-	 * No padding must be inserted by the compiler between attr and spec.
-	 * This layout is expected by libibverbs.
-	 */
-	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.priority = 3,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
-		.size = sizeof(*spec),
-		.val = {
-			.dst_mac = {
-				(*mac)[0], (*mac)[1], (*mac)[2],
-				(*mac)[3], (*mac)[4], (*mac)[5]
-			},
-		},
-		.mask = {
-			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-		}
-	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	/* Create related flow. */
-	flow = ibv_create_flow(rxq->qp, attr);
-	if (flow == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, rte_errno, strerror(errno));
-		return -rte_errno;
-	}
-	assert(priv->mac_flow == NULL);
-	priv->mac_flow = flow;
-	return 0;
-}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 365b585..7a2c982 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -128,8 +128,6 @@ int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			const struct rte_eth_rxconf *conf,
 			struct rte_mempool *mp);
 void mlx4_rx_queue_release(void *dpdk_rxq);
-void mlx4_mac_addr_del(struct priv *priv);
-int mlx4_mac_addr_add(struct priv *priv);
 
 /* mlx4_rxtx.c */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 14/29] net/mlx4: generalize flow rule priority support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (12 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 13/29] net/mlx4: refactor internal flow rules Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 15/29] net/mlx4: simplify trigger code for flow rules Adrien Mazarguil
                     ` (15 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Since both internal and user-defined flow rules are handled by a common
implementation, flow rule priority overlaps are easier to detect. There is
no need to restrict the use of priority levels to isolated mode anymore.

With this patch, only the lowest priority level remains inaccessible to
users outside isolated mode.

Also, the PMD no longer automatically assigns a fixed priority level to
user-defined flow rules, which means collisions between overlapping rules
matching a different number of protocol layers at a given priority level
won't be avoided anymore (e.g. "eth" vs. "eth / ipv4 / udp").

As a reminder, the outcome of overlapping rules for a given priority level
was, and still is, undefined territory according to API documentation.
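
For instance, both patterns below may now be created at the same
priority level outside isolated mode, and which one a UDPv4 packet
matches first is undefined (sketch):

 struct rte_flow_item eth_only[] = {
 	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
 	{ .type = RTE_FLOW_ITEM_TYPE_END },
 };
 struct rte_flow_item eth_ipv4_udp[] = {
 	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
 	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
 	{ .type = RTE_FLOW_ITEM_TYPE_UDP },
 	{ .type = RTE_FLOW_ITEM_TYPE_END },
 };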

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index fb38179..e1290a8 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -155,7 +155,6 @@ mlx4_flow_create_eth(const struct rte_flow_item *item,
 	unsigned int i;
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 2;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
@@ -232,7 +231,6 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 1;
 	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
@@ -277,7 +275,6 @@ mlx4_flow_create_udp(const struct rte_flow_item *item,
 	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 0;
 	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
@@ -318,7 +315,6 @@ mlx4_flow_create_tcp(const struct rte_flow_item *item,
 	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
 
 	++flow->ibv_attr->num_of_specs;
-	flow->ibv_attr->priority = 0;
 	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
@@ -581,19 +577,11 @@ mlx4_flow_prepare(struct priv *priv,
 	const struct mlx4_flow_proc_item *proc;
 	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
 	struct rte_flow *flow = &temp;
-	uint32_t priority_override = 0;
 
 	if (attr->group)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
 			 NULL, "groups are not supported");
-	if (priv->isolated)
-		priority_override = attr->priority;
-	else if (attr->priority)
-		return rte_flow_error_set
-			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
-			 NULL,
-			 "priorities are not supported outside isolated mode");
 	if (attr->priority > MLX4_FLOW_PRIORITY_LAST)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY,
@@ -659,9 +647,6 @@ mlx4_flow_prepare(struct priv *priv,
 		}
 		flow->ibv_attr_size += proc->dst_sz;
 	}
-	/* Use specified priority level when in isolated mode. */
-	if (priv->isolated && flow != &temp)
-		flow->ibv_attr->priority = priority_override;
 	/* Go over actions list. */
 	for (action = actions; action->type; ++action) {
 		switch (action->type) {
@@ -718,6 +703,7 @@ mlx4_flow_prepare(struct priv *priv,
 		*flow->ibv_attr = (struct ibv_flow_attr){
 			.type = IBV_FLOW_ATTR_NORMAL,
 			.size = sizeof(*flow->ibv_attr),
+			.priority = attr->priority,
 			.port = priv->port,
 		};
 		goto fill;
@@ -854,6 +840,22 @@ mlx4_flow_toggle(struct priv *priv,
 			mlx4_drop_put(priv->drop);
 		return 0;
 	}
+	assert(flow->ibv_attr);
+	if (!flow->internal &&
+	    !priv->isolated &&
+	    flow->ibv_attr->priority == MLX4_FLOW_PRIORITY_LAST) {
+		if (flow->ibv_flow) {
+			claim_zero(ibv_destroy_flow(flow->ibv_flow));
+			flow->ibv_flow = NULL;
+			if (flow->drop)
+				mlx4_drop_put(priv->drop);
+		}
+		err = EACCES;
+		msg = ("priority level "
+		       MLX4_STR_EXPAND(MLX4_FLOW_PRIORITY_LAST)
+		       " is reserved when not in isolated mode");
+		goto error;
+	}
 	if (flow->queue) {
 		struct rxq *rxq = NULL;
 
@@ -883,7 +885,6 @@ mlx4_flow_toggle(struct priv *priv,
 		qp = priv->drop->qp;
 	}
 	assert(qp);
-	assert(flow->ibv_attr);
 	if (flow->ibv_flow)
 		return 0;
 	flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
@@ -1028,6 +1029,7 @@ static int
 mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 {
 	struct rte_flow_attr attr = {
+		.priority = MLX4_FLOW_PRIORITY_LAST,
 		.ingress = 1,
 	};
 	struct rte_flow_item pattern[] = {
-- 
2.1.4


* [PATCH v2 15/29] net/mlx4: simplify trigger code for flow rules
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (13 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 14/29] net/mlx4: generalize flow rule priority support Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 16/29] net/mlx4: refactor flow item validation code Adrien Mazarguil
                     ` (14 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Since the flow rule synchronization function mlx4_flow_sync() takes the
state of the device into account (whether it is started), the trigger
functions mlx4_flow_start() and mlx4_flow_stop() are redundant. Standardize
on mlx4_flow_sync().

Take this opportunity to enhance this function with better error reporting,
since an inability to start the device due to a problem with a flow rule
otherwise results in a nondescript error code.
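
The verbose reports added below all dump the same rte_flow_error fields
next to the errno code; as a sketch, a caller-side helper along these
lines (hypothetical, not part of this patch) captures the pattern:

#include <stdio.h>
#include <string.h>
#include <rte_flow.h>

/*
 * Hypothetical helper: log a failed flow operation the way the ERROR()
 * messages below do. @p ret is the negative errno value returned by
 * mlx4_flow_sync().
 */
static void
log_flow_error(int ret, const struct rte_flow_error *error)
{
	fprintf(stderr,
		"flow failure (code %d, \"%s\"), flow error type %d,"
		" cause %p, message: %s\n",
		-ret, strerror(-ret), error->type, error->cause,
		error->message ? error->message : "(unspecified)");
}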

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 25 ++++++++-----
 drivers/net/mlx4/mlx4_flow.c | 76 +++++++++------------------------------
 drivers/net/mlx4/mlx4_flow.h |  4 +--
 drivers/net/mlx4/mlx4_rxq.c  | 14 ++++++--
 4 files changed, 46 insertions(+), 73 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 40c0ee2..256aa3d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -60,6 +60,7 @@
 #include <rte_ethdev.h>
 #include <rte_ethdev_pci.h>
 #include <rte_ether.h>
+#include <rte_flow.h>
 #include <rte_interrupts.h>
 #include <rte_kvargs.h>
 #include <rte_malloc.h>
@@ -97,13 +98,17 @@ static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
 	int ret;
 
 	/* Prepare internal flow rules. */
-	ret = mlx4_flow_sync(priv);
-	if (ret)
-		ERROR("cannot set up internal flow rules: %s",
-		      strerror(-ret));
+	ret = mlx4_flow_sync(priv, &error);
+	if (ret) {
+		ERROR("cannot set up internal flow rules (code %d, \"%s\"),"
+		      " flow error type %d, cause %p, message: %s",
+		      -ret, strerror(-ret), error.type, error.cause,
+		      error.message ? error.message : "(unspecified)");
+	}
 	return ret;
 }
 
@@ -122,6 +127,7 @@ static int
 mlx4_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
 	int ret;
 
 	if (priv->started)
@@ -134,10 +140,13 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		     (void *)dev);
 		goto err;
 	}
-	ret = mlx4_flow_start(priv);
+	ret = mlx4_flow_sync(priv, &error);
 	if (ret) {
-		ERROR("%p: flow start failed: %s",
-		      (void *)dev, strerror(ret));
+		ERROR("%p: cannot attach flow rules (code %d, \"%s\"),"
+		      " flow error type %d, cause %p, message: %s",
+		      (void *)dev,
+		      -ret, strerror(-ret), error.type, error.cause,
+		      error.message ? error.message : "(unspecified)");
 		goto err;
 	}
 	return 0;
@@ -164,7 +173,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	mlx4_flow_stop(priv);
+	mlx4_flow_sync(priv, NULL);
 	mlx4_intr_uninstall(priv);
 }
 
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index e1290a8..ec6c28f 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -956,14 +956,9 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 	if (!!enable == !!priv->isolated)
 		return 0;
 	priv->isolated = !!enable;
-	if (mlx4_flow_sync(priv)) {
+	if (mlx4_flow_sync(priv, error)) {
 		priv->isolated = !enable;
-		return rte_flow_error_set(error, rte_errno,
-					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-					  NULL,
-					  enable ?
-					  "cannot enter isolated mode" :
-					  "cannot leave isolated mode");
+		return -rte_errno;
 	}
 	return 0;
 }
@@ -1075,12 +1070,14 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
  *
  * @param priv
  *   Pointer to private structure.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
  *
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-mlx4_flow_sync(struct priv *priv)
+mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error)
 {
 	struct rte_flow *flow;
 	int ret;
@@ -1094,20 +1091,27 @@ mlx4_flow_sync(struct priv *priv)
 		for (flow = LIST_FIRST(&priv->flows);
 		     flow && flow->internal;
 		     flow = LIST_FIRST(&priv->flows))
-			claim_zero(mlx4_flow_destroy(priv->dev, flow, NULL));
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
 	} else if (!LIST_FIRST(&priv->flows) ||
 		   !LIST_FIRST(&priv->flows)->internal) {
 		/*
 		 * If the first rule is not internal outside isolated mode,
 		 * they must be added back.
 		 */
-		ret = mlx4_flow_internal(priv, NULL);
+		ret = mlx4_flow_internal(priv, error);
+		if (ret)
+			return ret;
+	}
+	/* Toggle the remaining flow rules. */
+	for (flow = LIST_FIRST(&priv->flows);
+	     flow;
+	     flow = LIST_NEXT(flow, next)) {
+		ret = mlx4_flow_toggle(priv, flow, priv->started, error);
 		if (ret)
 			return ret;
 	}
-	if (priv->started)
-		return mlx4_flow_start(priv);
-	mlx4_flow_stop(priv);
+	if (!priv->started)
+		assert(!priv->drop);
 	return 0;
 }
 
@@ -1129,52 +1133,6 @@ mlx4_flow_clean(struct priv *priv)
 		mlx4_flow_destroy(priv->dev, flow, NULL);
 }
 
-/**
- * Disable flow rules.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void
-mlx4_flow_stop(struct priv *priv)
-{
-	struct rte_flow *flow;
-
-	for (flow = LIST_FIRST(&priv->flows);
-	     flow;
-	     flow = LIST_NEXT(flow, next)) {
-		claim_zero(mlx4_flow_toggle(priv, flow, 0, NULL));
-	}
-	assert(!priv->drop);
-}
-
-/**
- * Enable flow rules.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, a negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_flow_start(struct priv *priv)
-{
-	int ret;
-	struct rte_flow *flow;
-
-	for (flow = LIST_FIRST(&priv->flows);
-	     flow;
-	     flow = LIST_NEXT(flow, next)) {
-		ret = mlx4_flow_toggle(priv, flow, 1, NULL);
-		if (unlikely(ret)) {
-			mlx4_flow_stop(priv);
-			return ret;
-		}
-	}
-	return 0;
-}
-
 static const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index c2ffa8d..13495d7 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -72,10 +72,8 @@ struct rte_flow {
 
 /* mlx4_flow.c */
 
-int mlx4_flow_sync(struct priv *priv);
+int mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error);
 void mlx4_flow_clean(struct priv *priv);
-int mlx4_flow_start(struct priv *priv);
-void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_type filter_type,
 		     enum rte_filter_op filter_op,
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 7bb2f9e..bcb7b94 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -54,6 +54,7 @@
 #include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_ethdev.h>
+#include <rte_flow.h>
 #include <rte_malloc.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
@@ -401,7 +402,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		}
 		dev->data->rx_queues[idx] = NULL;
 		/* Disable associated flows. */
-		mlx4_flow_sync(priv);
+		mlx4_flow_sync(priv, NULL);
 		mlx4_rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -416,13 +417,20 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	if (ret) {
 		rte_free(rxq);
 	} else {
+		struct rte_flow_error error;
+
 		rxq->stats.idx = idx;
 		DEBUG("%p: adding Rx queue %p to list",
 		      (void *)dev, (void *)rxq);
 		dev->data->rx_queues[idx] = rxq;
 		/* Re-enable associated flows. */
-		ret = mlx4_flow_sync(priv);
+		ret = mlx4_flow_sync(priv, &error);
 		if (ret) {
+			ERROR("cannot re-attach flow rules to queue %u"
+			      " (code %d, \"%s\"), flow error type %d,"
+			      " cause %p, message: %s", idx,
+			      -ret, strerror(-ret), error.type, error.cause,
+			      error.message ? error.message : "(unspecified)");
 			dev->data->rx_queues[idx] = NULL;
 			mlx4_rxq_cleanup(rxq);
 			rte_free(rxq);
@@ -457,7 +465,7 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			priv->dev->data->rx_queues[i] = NULL;
 			break;
 		}
-	mlx4_flow_sync(priv);
+	mlx4_flow_sync(priv, NULL);
 	mlx4_rxq_cleanup(rxq);
 	rte_free(rxq);
 }
-- 
2.1.4


* [PATCH v2 16/29] net/mlx4: refactor flow item validation code
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (14 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 15/29] net/mlx4: simplify trigger code for flow rules Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 17/29] net/mlx4: add MAC addresses configuration support Adrien Mazarguil
                     ` (13 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Since flow rule validation and creation have been refactored into a common
two-pass function, having separate callback functions to validate and
convert individual items seems redundant.

The purpose of these item validation functions is to reject partial masks
as those are not supported by hardware, before handing over the item to a
separate function that performs basic sanity checks.

The current approach and related code have the following issues:

- Lack of flow handle context in validation code requires kludges such as
  the special treatment reserved to spec-less Ethernet pattern items.
- Lack of useful error reporting; users need as much help as possible to
  understand what they did wrong, particularly when they hit hardware
  limitations that aren't mentioned by the flow API. Preventing them from
  going berserk after getting a generic "item not supported" message for no
  apparent reason is mandatory.
- Generic checks should be performed by the caller, not by item-specific
  validation functions.
- Mask checks are either missing or too lax in some cases (Ethernet, VLAN).

This commit addresses all the above by combining validation and conversion
callbacks as "merge" callbacks that take an additional error context
parameter. Also:

- Support for source MAC address matching is removed as it has no effect.
- Providing an empty mask no longer bypasses the Ethernet specification
  check that causes a rule to become promiscuous-like.
- VLAN VIDs must be matched exactly, as matching all VLAN traffic while
  excluding non-VLAN traffic is not supported.
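
As a concrete illustration, the reworked checks reject a partial IPv4
destination mask with a precise message instead of a generic "item not
supported". Hypothetical application snippet (not part of this patch;
addresses are made up and the port-id type may differ between releases):

#include <rte_byteorder.h>
#include <rte_flow.h>

static int
probe_partial_ipv4_mask(uint16_t port_id)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	const struct rte_flow_item_eth eth_spec = {
		.dst.addr_bytes = "\x00\x0a\x0b\x0c\x0d\x0e",
	};
	const struct rte_flow_item_eth eth_mask = {
		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
	};
	const struct rte_flow_item_ipv4 ipv4_spec = {
		.hdr.dst_addr = RTE_BE32(0xc6336400), /* 198.51.100.0 */
	};
	const struct rte_flow_item_ipv4 ipv4_mask = {
		.hdr.dst_addr = RTE_BE32(0xffffff00), /* /24: partial */
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH,
		  .spec = &eth_spec, .mask = &eth_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4,
		  .spec = &ipv4_spec, .mask = &ipv4_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE,
		  .conf = &(struct rte_flow_action_queue){ .index = 0 } },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_error error;

	/* Expected to fail with ENOTSUP; error.message should read
	 * "mlx4 does not support matching partial IPv4 fields". */
	return rte_flow_validate(port_id, &attr, pattern, actions, &error);
}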

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 576 +++++++++++++++++++-------------------
 drivers/net/mlx4/mlx4_flow.h |   1 +
 2 files changed, 288 insertions(+), 289 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index ec6c28f..3af83f2 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -76,49 +76,17 @@
 
 /** Processor structure associated with a flow item. */
 struct mlx4_flow_proc_item {
-	/** Bit-masks corresponding to the possibilities for the item. */
-	const void *mask;
-	/**
-	 * Default bit-masks to use when item->mask is not provided. When
-	 * \default_mask is also NULL, the full supported bit-mask (\mask) is
-	 * used instead.
-	 */
-	const void *default_mask;
-	/** Bit-masks size in bytes. */
+	/** Bit-mask for fields supported by this PMD. */
+	const void *mask_support;
+	/** Bit-mask to use when @p item->mask is not provided. */
+	const void *mask_default;
+	/** Size in bytes for @p mask_support and @p mask_default. */
 	const unsigned int mask_sz;
-	/**
-	 * Check support for a given item.
-	 *
-	 * @param item[in]
-	 *   Item specification.
-	 * @param mask[in]
-	 *   Bit-masks covering supported fields to compare with spec,
-	 *   last and mask in
-	 *   \item.
-	 * @param size
-	 *   Bit-Mask size in bytes.
-	 *
-	 * @return
-	 *   0 on success, negative value otherwise.
-	 */
-	int (*validate)(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size);
-	/**
-	 * Conversion function from rte_flow to NIC specific flow.
-	 *
-	 * @param item
-	 *   rte_flow item to convert.
-	 * @param default_mask
-	 *   Default bit-masks to use when item->mask is not provided.
-	 * @param flow
-	 *   Flow rule handle to update.
-	 *
-	 * @return
-	 *   0 on success, negative value otherwise.
-	 */
-	int (*convert)(const struct rte_flow_item *item,
-		       const void *default_mask,
-		       struct rte_flow *flow);
+	/** Merge a pattern item into a flow rule handle. */
+	int (*merge)(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error);
 	/** Size in bytes of the destination structure. */
 	const unsigned int dst_sz;
 	/** List of possible subsequent items. */
@@ -134,107 +102,185 @@ struct mlx4_drop {
 };
 
 /**
- * Convert Ethernet item to Verbs specification.
+ * Merge Ethernet pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ * - Not providing @p item->spec or providing an empty @p mask->dst is
+ *   *only* supported if the rule doesn't specify additional matching
+ *   criteria (i.e. rule is promiscuous-like).
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_eth(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_eth(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_eth *spec = item->spec;
-	const struct rte_flow_item_eth *mask = item->mask;
+	const struct rte_flow_item_eth *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_eth *eth;
-	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
+	const char *msg;
 	unsigned int i;
 
+	if (!mask) {
+		flow->promisc = 1;
+	} else {
+		uint32_t sum_dst = 0;
+		uint32_t sum_src = 0;
+
+		for (i = 0; i != sizeof(mask->dst.addr_bytes); ++i) {
+			sum_dst += mask->dst.addr_bytes[i];
+			sum_src += mask->src.addr_bytes[i];
+		}
+		if (sum_src) {
+			msg = "mlx4 does not support source MAC matching";
+			goto error;
+		} else if (!sum_dst) {
+			flow->promisc = 1;
+		} else if (sum_dst != (UINT8_C(0xff) * ETHER_ADDR_LEN)) {
+			msg = "mlx4 does not support matching partial"
+				" Ethernet fields";
+			goto error;
+		}
+	}
+	if (!flow->ibv_attr)
+		return 0;
+	if (flow->promisc) {
+		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
+		return 0;
+	}
 	++flow->ibv_attr->num_of_specs;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
 		.type = IBV_FLOW_SPEC_ETH,
-		.size = eth_size,
+		.size = sizeof(*eth),
 	};
-	if (!spec) {
-		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
-		return 0;
-	}
-	if (!mask)
-		mask = default_mask;
 	memcpy(eth->val.dst_mac, spec->dst.addr_bytes, ETHER_ADDR_LEN);
-	memcpy(eth->val.src_mac, spec->src.addr_bytes, ETHER_ADDR_LEN);
 	memcpy(eth->mask.dst_mac, mask->dst.addr_bytes, ETHER_ADDR_LEN);
-	memcpy(eth->mask.src_mac, mask->src.addr_bytes, ETHER_ADDR_LEN);
 	/* Remove unwanted bits from values. */
 	for (i = 0; i < ETHER_ADDR_LEN; ++i) {
 		eth->val.dst_mac[i] &= eth->mask.dst_mac[i];
-		eth->val.src_mac[i] &= eth->mask.src_mac[i];
 	}
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert VLAN item to Verbs specification.
+ * Merge VLAN pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - Matching *all* VLAN traffic by omitting @p item->spec or providing an
+ *   empty @p item->mask would also include non-VLAN traffic. Doing so is
+ *   therefore unsupported.
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_vlan(const struct rte_flow_item *item,
-		      const void *default_mask,
-		      struct rte_flow *flow)
+mlx4_flow_merge_vlan(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
 	const struct rte_flow_item_vlan *spec = item->spec;
-	const struct rte_flow_item_vlan *mask = item->mask;
+	const struct rte_flow_item_vlan *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_eth *eth;
-	const unsigned int eth_size = sizeof(struct ibv_flow_spec_eth);
+	const char *msg;
 
-	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
-		       eth_size);
-	if (!spec)
+	if (!mask || !mask->tci) {
+		msg = "mlx4 cannot match all VLAN traffic while excluding"
+			" non-VLAN traffic, TCI VID must be specified";
+		goto error;
+	}
+	if (mask->tci != RTE_BE16(0x0fff)) {
+		msg = "mlx4 does not support partial TCI VID matching";
+		goto error;
+	}
+	if (!flow->ibv_attr)
 		return 0;
-	if (!mask)
-		mask = default_mask;
+	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size -
+		       sizeof(*eth));
 	eth->val.vlan_tag = spec->tci;
 	eth->mask.vlan_tag = mask->tci;
 	eth->val.vlan_tag &= eth->mask.vlan_tag;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert IPv4 item to Verbs specification.
+ * Merge IPv4 pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_ipv4(const struct rte_flow_item *item,
-		      const void *default_mask,
-		      struct rte_flow *flow)
+mlx4_flow_merge_ipv4(struct rte_flow *flow,
+		     const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
 	const struct rte_flow_item_ipv4 *spec = item->spec;
-	const struct rte_flow_item_ipv4 *mask = item->mask;
+	const struct rte_flow_item_ipv4 *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_ipv4 *ipv4;
-	unsigned int ipv4_size = sizeof(struct ibv_flow_spec_ipv4);
+	const char *msg;
 
+	if (mask &&
+	    ((uint32_t)(mask->hdr.src_addr + 1) > UINT32_C(1) ||
+	     (uint32_t)(mask->hdr.dst_addr + 1) > UINT32_C(1))) {
+		msg = "mlx4 does not support matching partial IPv4 fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	ipv4 = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*ipv4 = (struct ibv_flow_spec_ipv4) {
 		.type = IBV_FLOW_SPEC_IPV4,
-		.size = ipv4_size,
+		.size = sizeof(*ipv4),
 	};
 	if (!spec)
 		return 0;
@@ -242,8 +288,6 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 		.src_ip = spec->hdr.src_addr,
 		.dst_ip = spec->hdr.dst_addr,
 	};
-	if (!mask)
-		mask = default_mask;
 	ipv4->mask = (struct ibv_flow_ipv4_filter) {
 		.src_ip = mask->hdr.src_addr,
 		.dst_ip = mask->hdr.dst_addr,
@@ -252,224 +296,188 @@ mlx4_flow_create_ipv4(const struct rte_flow_item *item,
 	ipv4->val.src_ip &= ipv4->mask.src_ip;
 	ipv4->val.dst_ip &= ipv4->mask.dst_ip;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert UDP item to Verbs specification.
+ * Merge UDP pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_udp(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_udp(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_udp *spec = item->spec;
-	const struct rte_flow_item_udp *mask = item->mask;
+	const struct rte_flow_item_udp *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_tcp_udp *udp;
-	unsigned int udp_size = sizeof(struct ibv_flow_spec_tcp_udp);
+	const char *msg;
 
+	if (!mask ||
+	    ((uint16_t)(mask->hdr.src_port + 1) > UINT16_C(1) ||
+	     (uint16_t)(mask->hdr.dst_port + 1) > UINT16_C(1))) {
+		msg = "mlx4 does not support matching partial UDP fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	udp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*udp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_UDP,
-		.size = udp_size,
+		.size = sizeof(*udp),
 	};
 	if (!spec)
 		return 0;
 	udp->val.dst_port = spec->hdr.dst_port;
 	udp->val.src_port = spec->hdr.src_port;
-	if (!mask)
-		mask = default_mask;
 	udp->mask.dst_port = mask->hdr.dst_port;
 	udp->mask.src_port = mask->hdr.src_port;
 	/* Remove unwanted bits from values. */
 	udp->val.src_port &= udp->mask.src_port;
 	udp->val.dst_port &= udp->mask.dst_port;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Convert TCP item to Verbs specification.
+ * Merge TCP pattern item into flow rule handle.
  *
- * @param item[in]
- *   Item specification.
- * @param default_mask[in]
- *   Default bit-masks to use when item->mask is not provided.
- * @param flow[in, out]
+ * Additional mlx4-specific constraints on supported fields:
+ *
+ * - No support for partial masks.
+ *
+ * @param[in, out] flow
  *   Flow rule handle to update.
+ * @param[in] item
+ *   Pattern item to merge.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_create_tcp(const struct rte_flow_item *item,
-		     const void *default_mask,
-		     struct rte_flow *flow)
+mlx4_flow_merge_tcp(struct rte_flow *flow,
+		    const struct rte_flow_item *item,
+		    const struct mlx4_flow_proc_item *proc,
+		    struct rte_flow_error *error)
 {
 	const struct rte_flow_item_tcp *spec = item->spec;
-	const struct rte_flow_item_tcp *mask = item->mask;
+	const struct rte_flow_item_tcp *mask =
+		spec ? (item->mask ? item->mask : proc->mask_default) : NULL;
 	struct ibv_flow_spec_tcp_udp *tcp;
-	unsigned int tcp_size = sizeof(struct ibv_flow_spec_tcp_udp);
+	const char *msg;
 
+	if (!mask ||
+	    ((uint16_t)(mask->hdr.src_port + 1) > UINT16_C(1) ||
+	     (uint16_t)(mask->hdr.dst_port + 1) > UINT16_C(1))) {
+		msg = "mlx4 does not support matching partial TCP fields";
+		goto error;
+	}
+	if (!flow->ibv_attr)
+		return 0;
 	++flow->ibv_attr->num_of_specs;
 	tcp = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*tcp = (struct ibv_flow_spec_tcp_udp) {
 		.type = IBV_FLOW_SPEC_TCP,
-		.size = tcp_size,
+		.size = sizeof(*tcp),
 	};
 	if (!spec)
 		return 0;
 	tcp->val.dst_port = spec->hdr.dst_port;
 	tcp->val.src_port = spec->hdr.src_port;
-	if (!mask)
-		mask = default_mask;
 	tcp->mask.dst_port = mask->hdr.dst_port;
 	tcp->mask.src_port = mask->hdr.src_port;
 	/* Remove unwanted bits from values. */
 	tcp->val.src_port &= tcp->mask.src_port;
 	tcp->val.dst_port &= tcp->mask.dst_port;
 	return 0;
+error:
+	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				  item, msg);
 }
 
 /**
- * Check support for a given item.
+ * Perform basic sanity checks on a pattern item.
  *
- * @param item[in]
+ * @param[in] item
  *   Item specification.
- * @param mask[in]
- *   Bit-masks covering supported fields to compare with spec, last and mask in
- *   \item.
- * @param size
- *   Bit-Mask size in bytes.
+ * @param[in] proc
+ *   Associated item-processing object.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
  *
  * @return
- *   0 on success, negative value otherwise.
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_flow_item_validate(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
+mlx4_flow_item_check(const struct rte_flow_item *item,
+		     const struct mlx4_flow_proc_item *proc,
+		     struct rte_flow_error *error)
 {
-	int ret = 0;
+	const uint8_t *mask;
+	unsigned int i;
 
+	/* item->last and item->mask cannot exist without item->spec. */
 	if (!item->spec && (item->mask || item->last))
-		return -1;
-	if (item->spec && !item->mask) {
-		unsigned int i;
-		const uint8_t *spec = item->spec;
-
-		for (i = 0; i < size; ++i)
-			if ((spec[i] | mask[i]) != mask[i])
-				return -1;
-	}
-	if (item->last && !item->mask) {
-		unsigned int i;
-		const uint8_t *spec = item->last;
-
-		for (i = 0; i < size; ++i)
-			if ((spec[i] | mask[i]) != mask[i])
-				return -1;
-	}
-	if (item->spec && item->last) {
-		uint8_t spec[size];
-		uint8_t last[size];
-		const uint8_t *apply = mask;
-		unsigned int i;
-
-		if (item->mask)
-			apply = item->mask;
-		for (i = 0; i < size; ++i) {
-			spec[i] = ((const uint8_t *)item->spec)[i] & apply[i];
-			last[i] = ((const uint8_t *)item->last)[i] & apply[i];
-		}
-		ret = memcmp(spec, last, size);
-	}
-	return ret;
-}
-
-static int
-mlx4_flow_validate_eth(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_eth *mask = item->mask;
-
-		if (mask->dst.addr_bytes[0] != 0xff ||
-				mask->dst.addr_bytes[1] != 0xff ||
-				mask->dst.addr_bytes[2] != 0xff ||
-				mask->dst.addr_bytes[3] != 0xff ||
-				mask->dst.addr_bytes[4] != 0xff ||
-				mask->dst.addr_bytes[5] != 0xff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_vlan(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_vlan *mask = item->mask;
-
-		if (mask->tci != 0 &&
-		    ntohs(mask->tci) != 0x0fff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_ipv4(const struct rte_flow_item *item,
-			const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_ipv4 *mask = item->mask;
-
-		if (mask->hdr.src_addr != 0 &&
-		    mask->hdr.src_addr != 0xffffffff)
-			return -1;
-		if (mask->hdr.dst_addr != 0 &&
-		    mask->hdr.dst_addr != 0xffffffff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_udp(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_udp *mask = item->mask;
-
-		if (mask->hdr.src_port != 0 &&
-		    mask->hdr.src_port != 0xffff)
-			return -1;
-		if (mask->hdr.dst_port != 0 &&
-		    mask->hdr.dst_port != 0xffff)
-			return -1;
-	}
-	return mlx4_flow_item_validate(item, mask, size);
-}
-
-static int
-mlx4_flow_validate_tcp(const struct rte_flow_item *item,
-		       const uint8_t *mask, unsigned int size)
-{
-	if (item->mask) {
-		const struct rte_flow_item_tcp *mask = item->mask;
-
-		if (mask->hdr.src_port != 0 &&
-		    mask->hdr.src_port != 0xffff)
-			return -1;
-		if (mask->hdr.dst_port != 0 &&
-		    mask->hdr.dst_port != 0xffff)
-			return -1;
+		return rte_flow_error_set
+			(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item,
+			 "\"mask\" or \"last\" field provided without a"
+			 " corresponding \"spec\"");
+	/* No spec, no mask, no problem. */
+	if (!item->spec)
+		return 0;
+	mask = item->mask ?
+		(const uint8_t *)item->mask :
+		(const uint8_t *)proc->mask_default;
+	assert(mask);
+	/*
+	 * Single-pass check to make sure that:
+	 * - Mask is supported, no bits are set outside proc->mask_support.
+	 * - Both item->spec and item->last are included in mask.
+	 */
+	for (i = 0; i != proc->mask_sz; ++i) {
+		if (!mask[i])
+			continue;
+		if ((mask[i] | ((const uint8_t *)proc->mask_support)[i]) !=
+		    ((const uint8_t *)proc->mask_support)[i])
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item, "unsupported field found in \"mask\"");
+		if (item->last &&
+		    (((const uint8_t *)item->spec)[i] & mask[i]) !=
+		    (((const uint8_t *)item->last)[i] & mask[i]))
+			return rte_flow_error_set
+				(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+				 item,
+				 "range between \"spec\" and \"last\""
+				 " is larger than \"mask\"");
 	}
-	return mlx4_flow_item_validate(item, mask, size);
+	return 0;
 }
 
 /** Graph of supported items and associated actions. */
@@ -480,66 +488,62 @@ static const struct mlx4_flow_proc_item mlx4_flow_proc_item_list[] = {
 	[RTE_FLOW_ITEM_TYPE_ETH] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_VLAN,
 				       RTE_FLOW_ITEM_TYPE_IPV4),
-		.mask = &(const struct rte_flow_item_eth){
+		.mask_support = &(const struct rte_flow_item_eth){
+			/* Only destination MAC can be matched. */
 			.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-			.src.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 		},
-		.default_mask = &rte_flow_item_eth_mask,
+		.mask_default = &rte_flow_item_eth_mask,
 		.mask_sz = sizeof(struct rte_flow_item_eth),
-		.validate = mlx4_flow_validate_eth,
-		.convert = mlx4_flow_create_eth,
+		.merge = mlx4_flow_merge_eth,
 		.dst_sz = sizeof(struct ibv_flow_spec_eth),
 	},
 	[RTE_FLOW_ITEM_TYPE_VLAN] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_IPV4),
-		.mask = &(const struct rte_flow_item_vlan){
+		.mask_support = &(const struct rte_flow_item_vlan){
 			/* Only TCI VID matching is supported. */
 			.tci = RTE_BE16(0x0fff),
 		},
+		.mask_default = &rte_flow_item_vlan_mask,
 		.mask_sz = sizeof(struct rte_flow_item_vlan),
-		.validate = mlx4_flow_validate_vlan,
-		.convert = mlx4_flow_create_vlan,
+		.merge = mlx4_flow_merge_vlan,
 		.dst_sz = 0,
 	},
 	[RTE_FLOW_ITEM_TYPE_IPV4] = {
 		.next_item = NEXT_ITEM(RTE_FLOW_ITEM_TYPE_UDP,
 				       RTE_FLOW_ITEM_TYPE_TCP),
-		.mask = &(const struct rte_flow_item_ipv4){
+		.mask_support = &(const struct rte_flow_item_ipv4){
 			.hdr = {
 				.src_addr = RTE_BE32(0xffffffff),
 				.dst_addr = RTE_BE32(0xffffffff),
 			},
 		},
-		.default_mask = &rte_flow_item_ipv4_mask,
+		.mask_default = &rte_flow_item_ipv4_mask,
 		.mask_sz = sizeof(struct rte_flow_item_ipv4),
-		.validate = mlx4_flow_validate_ipv4,
-		.convert = mlx4_flow_create_ipv4,
+		.merge = mlx4_flow_merge_ipv4,
 		.dst_sz = sizeof(struct ibv_flow_spec_ipv4),
 	},
 	[RTE_FLOW_ITEM_TYPE_UDP] = {
-		.mask = &(const struct rte_flow_item_udp){
+		.mask_support = &(const struct rte_flow_item_udp){
 			.hdr = {
 				.src_port = RTE_BE16(0xffff),
 				.dst_port = RTE_BE16(0xffff),
 			},
 		},
-		.default_mask = &rte_flow_item_udp_mask,
+		.mask_default = &rte_flow_item_udp_mask,
 		.mask_sz = sizeof(struct rte_flow_item_udp),
-		.validate = mlx4_flow_validate_udp,
-		.convert = mlx4_flow_create_udp,
+		.merge = mlx4_flow_merge_udp,
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 	[RTE_FLOW_ITEM_TYPE_TCP] = {
-		.mask = &(const struct rte_flow_item_tcp){
+		.mask_support = &(const struct rte_flow_item_tcp){
 			.hdr = {
 				.src_port = RTE_BE16(0xffff),
 				.dst_port = RTE_BE16(0xffff),
 			},
 		},
-		.default_mask = &rte_flow_item_tcp_mask,
+		.mask_default = &rte_flow_item_tcp_mask,
 		.mask_sz = sizeof(struct rte_flow_item_tcp),
-		.validate = mlx4_flow_validate_tcp,
-		.convert = mlx4_flow_create_tcp,
+		.merge = mlx4_flow_merge_tcp,
 		.dst_sz = sizeof(struct ibv_flow_spec_tcp_udp),
 	},
 };
@@ -577,6 +581,7 @@ mlx4_flow_prepare(struct priv *priv,
 	const struct mlx4_flow_proc_item *proc;
 	struct rte_flow temp = { .ibv_attr_size = sizeof(*temp.ibv_attr) };
 	struct rte_flow *flow = &temp;
+	const char *msg = NULL;
 
 	if (attr->group)
 		return rte_flow_error_set
@@ -609,18 +614,11 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->internal = 1;
 			continue;
 		}
-		/*
-		 * The nic can support patterns with NULL eth spec only
-		 * if eth is a single item in a rule.
-		 */
-		if (!item->spec && item->type == RTE_FLOW_ITEM_TYPE_ETH) {
-			const struct rte_flow_item *next = item + 1;
-
-			if (next->type)
-				return rte_flow_error_set
-					(error, ENOTSUP,
-					 RTE_FLOW_ERROR_TYPE_ITEM, item,
-					 "the rule requires an Ethernet spec");
+		if (flow->promisc) {
+			msg = "mlx4 does not support additional matching"
+				" criteria combined with indiscriminate"
+				" matching on Ethernet headers";
+			goto exit_item_not_supported;
 		}
 		for (i = 0; proc->next_item && proc->next_item[i]; ++i) {
 			if (proc->next_item[i] == item->type) {
@@ -631,19 +629,19 @@ mlx4_flow_prepare(struct priv *priv,
 		if (!next)
 			goto exit_item_not_supported;
 		proc = next;
-		/* Perform validation once, while handle is not allocated. */
+		/*
+		 * Perform basic sanity checks only once, while handle is
+		 * not allocated.
+		 */
 		if (flow == &temp) {
-			err = proc->validate(item, proc->mask, proc->mask_sz);
+			err = mlx4_flow_item_check(item, proc, error);
 			if (err)
-				goto exit_item_not_supported;
-		} else if (proc->convert) {
-			err = proc->convert(item,
-					    (proc->default_mask ?
-					     proc->default_mask :
-					     proc->mask),
-					    flow);
+				return err;
+		}
+		if (proc->merge) {
+			err = proc->merge(flow, item, proc, error);
 			if (err)
-				goto exit_item_not_supported;
+				return err;
 		}
 		flow->ibv_attr_size += proc->dst_sz;
 	}
@@ -712,7 +710,7 @@ mlx4_flow_prepare(struct priv *priv,
 	return 0;
 exit_item_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
-				  item, "item not supported");
+				  item, msg ? msg : "item not supported");
 exit_action_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 				  action, "action not supported");
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 13495d7..3036ff5 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -65,6 +65,7 @@ struct rte_flow {
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
+	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
-- 
2.1.4


* [PATCH v2 17/29] net/mlx4: add MAC addresses configuration support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (15 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 16/29] net/mlx4: refactor flow item validation code Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 18/29] net/mlx4: add VLAN filter " Adrien Mazarguil
                     ` (12 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

This commit brings back support for configuring up to 128 MAC addresses on
a port through internal flow rules automatically generated on demand.

Unlike its previous incarnation, the necessary extra flow rule for
broadcast traffic does not consume an entry from the MAC array anymore.
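
A rough usage sketch (hypothetical application code; the locally
administered MAC value is arbitrary and the port-id type may differ
between DPDK releases):

#include <rte_ethdev.h>
#include <rte_ether.h>

/*
 * Register a secondary unicast MAC address; the PMD converts it into an
 * internal flow rule the next time flow rules are synchronized. The pool
 * argument is ignored by this PMD.
 */
static int
add_secondary_mac(uint16_t port_id)
{
	struct ether_addr mac = {
		.addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
	};

	return rte_eth_dev_mac_addr_add(port_id, &mac, 0);
}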

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4.c           |  7 ++-
 drivers/net/mlx4/mlx4.h           | 10 +++-
 drivers/net/mlx4/mlx4_ethdev.c    | 87 +++++++++++++++++++++++++++++++-
 drivers/net/mlx4/mlx4_flow.c      | 90 ++++++++++++++++++++++++++++------
 drivers/net/mlx4/mlx4_flow.h      |  2 +
 6 files changed, 177 insertions(+), 20 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 0812a30..d17774f 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,6 +12,7 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
+Unicast MAC filter   = Y
 SR-IOV               = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 256aa3d..99c87ff 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -221,6 +221,9 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
+	.mac_addr_remove = mlx4_mac_addr_remove,
+	.mac_addr_add = mlx4_mac_addr_add,
+	.mac_addr_set = mlx4_mac_addr_set,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
@@ -552,7 +555,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
-		priv->mac = mac;
+		priv->mac[0] = mac;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
@@ -581,7 +584,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 		eth_dev->data->dev_private = priv;
-		eth_dev->data->mac_addrs = &priv->mac;
+		eth_dev->data->mac_addrs = priv->mac;
 		eth_dev->device = &pci_dev->device;
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
 		eth_dev->device->driver = &mlx4_driver.driver;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index fb4708d..15ecd95 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -52,6 +52,9 @@
 #include <rte_interrupts.h>
 #include <rte_mempool.h>
 
+/** Maximum number of simultaneous MAC addresses. This value is arbitrary. */
+#define MLX4_MAX_MAC_ADDRESSES 128
+
 /** Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -99,7 +102,6 @@ struct priv {
 	struct ibv_context *ctx; /**< Verbs context. */
 	struct ibv_device_attr device_attr; /**< Device properties. */
 	struct ibv_pd *pd; /**< Protection Domain. */
-	struct ether_addr mac; /**< MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /**< Configured MTU. */
 	uint8_t port; /**< Physical port number. */
@@ -110,6 +112,8 @@ struct priv {
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
 	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
+	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
+	/**< Configured MAC addresses. Unused entries are zeroed. */
 };
 
 /* mlx4_ethdev.c */
@@ -120,6 +124,10 @@ int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
 int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
 int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
 int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		      uint32_t index, uint32_t vmdq);
+void mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
 int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 void mlx4_stats_reset(struct rte_eth_dev *dev);
 void mlx4_dev_infos_get(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 8962be1..52924df 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -64,9 +64,11 @@
 #include <rte_errno.h>
 #include <rte_ethdev.h>
 #include <rte_ether.h>
+#include <rte_flow.h>
 #include <rte_pci.h>
 
 #include "mlx4.h"
+#include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
@@ -518,6 +520,88 @@ mlx4_dev_set_link_up(struct rte_eth_dev *dev)
 }
 
 /**
+ * DPDK callback to remove a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param index
+ *   MAC address index.
+ */
+void
+mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+
+	if (index >= RTE_DIM(priv->mac)) {
+		rte_errno = EINVAL;
+		return;
+	}
+	memset(&priv->mac[index], 0, sizeof(priv->mac[index]));
+	if (!mlx4_flow_sync(priv, &error))
+		return;
+	ERROR("failed to synchronize flow rules after removing MAC address"
+	      " at index %d (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      index, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+}
+
+/**
+ * DPDK callback to add a MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ * @param index
+ *   MAC address index.
+ * @param vmdq
+ *   VMDq pool index to associate address with (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+		  uint32_t index, uint32_t vmdq)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+	int ret;
+
+	(void)vmdq;
+	if (index >= RTE_DIM(priv->mac)) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	memcpy(&priv->mac[index], mac_addr, sizeof(priv->mac[index]));
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("failed to synchronize flow rules after adding MAC address"
+	      " at index %d (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      index, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+	return ret;
+}
+
+/**
+ * DPDK callback to set the primary MAC address.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mac_addr
+ *   MAC address to register.
+ */
+void
+mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	mlx4_mac_addr_add(dev, mac_addr, 0, 0);
+}
+
+/**
  * DPDK callback to get information about the device.
  *
  * @param dev
@@ -549,8 +633,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 		max = 65535;
 	info->max_rx_queues = max;
 	info->max_tx_queues = max;
-	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = 1;
+	info->max_mac_addrs = RTE_DIM(priv->mac);
 	info->rx_offload_capa = 0;
 	info->tx_offload_capa = 0;
 	if (mlx4_get_ifname(priv, &ifname) == 0)
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 3af83f2..14d2ed3 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -58,6 +58,7 @@
 #include <rte_errno.h>
 #include <rte_eth_ctrl.h>
 #include <rte_ethdev.h>
+#include <rte_ether.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
@@ -1010,6 +1011,10 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 /**
  * Generate internal flow rules.
  *
+ * - MAC flow rules are generated from @p dev->data->mac_addrs
+ *   (@p priv->mac array).
+ * - An additional flow rule for Ethernet broadcasts is also generated.
+ *
  * @param priv
  *   Pointer to private structure.
  * @param[out] error
@@ -1025,18 +1030,18 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		.priority = MLX4_FLOW_PRIORITY_LAST,
 		.ingress = 1,
 	};
+	struct rte_flow_item_eth eth_spec;
+	const struct rte_flow_item_eth eth_mask = {
+		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+	};
 	struct rte_flow_item pattern[] = {
 		{
 			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_ETH,
-			.spec = &(struct rte_flow_item_eth){
-				.dst = priv->mac,
-			},
-			.mask = &(struct rte_flow_item_eth){
-				.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
-			},
+			.spec = &eth_spec,
+			.mask = &eth_mask,
 		},
 		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
@@ -1053,10 +1058,69 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.type = RTE_FLOW_ACTION_TYPE_END,
 		},
 	};
+	struct ether_addr *rule_mac = &eth_spec.dst;
+	struct rte_flow *flow;
+	unsigned int i;
+	int err = 0;
 
-	if (!mlx4_flow_create(priv->dev, &attr, pattern, actions, error))
-		return -rte_errno;
-	return 0;
+	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
+		const struct ether_addr *mac;
+
+		/* Broadcasts are handled by an extra iteration. */
+		if (i < RTE_DIM(priv->mac))
+			mac = &priv->mac[i];
+		else
+			mac = &eth_mask.dst;
+		if (is_zero_ether_addr(mac))
+			continue;
+		/* Check if MAC flow rule is already present. */
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_NEXT(flow, next)) {
+			const struct ibv_flow_spec_eth *eth =
+				(const void *)((uintptr_t)flow->ibv_attr +
+					       sizeof(*flow->ibv_attr));
+			unsigned int j;
+
+			if (!flow->mac)
+				continue;
+			assert(flow->ibv_attr->type == IBV_FLOW_ATTR_NORMAL);
+			assert(flow->ibv_attr->num_of_specs == 1);
+			assert(eth->type == IBV_FLOW_SPEC_ETH);
+			for (j = 0; j != sizeof(mac->addr_bytes); ++j)
+				if (eth->val.dst_mac[j] != mac->addr_bytes[j] ||
+				    eth->mask.dst_mac[j] != UINT8_C(0xff) ||
+				    eth->val.src_mac[j] != UINT8_C(0x00) ||
+				    eth->mask.src_mac[j] != UINT8_C(0x00))
+					break;
+			if (j == sizeof(mac->addr_bytes))
+				break;
+		}
+		if (!flow || !flow->internal) {
+			/* Not found, create a new flow rule. */
+			memcpy(rule_mac, mac, sizeof(*mac));
+			flow = mlx4_flow_create(priv->dev, &attr, pattern,
+						actions, error);
+			if (!flow) {
+				err = -rte_errno;
+				break;
+			}
+		}
+		flow->select = 1;
+		flow->mac = 1;
+	}
+	/* Clear selection and clean up stale MAC flow rules. */
+	flow = LIST_FIRST(&priv->flows);
+	while (flow && flow->internal) {
+		struct rte_flow *next = LIST_NEXT(flow, next);
+
+		if (flow->mac && !flow->select)
+			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
+		else
+			flow->select = 0;
+		flow = next;
+	}
+	return err;
 }
 
 /**
@@ -1090,12 +1154,8 @@ mlx4_flow_sync(struct priv *priv, struct rte_flow_error *error)
 		     flow && flow->internal;
 		     flow = LIST_FIRST(&priv->flows))
 			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
-	} else if (!LIST_FIRST(&priv->flows) ||
-		   !LIST_FIRST(&priv->flows)->internal) {
-		/*
-		 * If the first rule is not internal outside isolated mode,
-		 * they must be added back.
-		 */
+	} else {
+		/* Refresh internal rules. */
 		ret = mlx4_flow_internal(priv, error);
 		if (ret)
 			return ret;
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 3036ff5..fcdf461 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -64,7 +64,9 @@ struct rte_flow {
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
 	uint32_t ibv_attr_size; /**< Size of Verbs attributes. */
+	uint32_t select:1; /**< Used by operations on the linked list. */
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
+	uint32_t mac:1; /**< Rule associated with a configured MAC address. */
 	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-- 
2.1.4


* [PATCH v2 18/29] net/mlx4: add VLAN filter configuration support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (16 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 17/29] net/mlx4: add MAC addresses configuration support Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 19/29] net/mlx4: add flow support for multicast traffic Adrien Mazarguil
                     ` (11 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

This commit brings back VLAN filter configuration support without any
artificial limitation on the number of simultaneous VLANs that can be
configured (previously 127).

Also, since it no longer relies on fixed per-queue arrays for potential
Verbs flow handle storage, this version wastes a lot less memory
(previously 128 * 127 * pointer size, i.e. about 130 kiB per Rx queue,
even though only one of them, the RSS parent queue, actually had any use
for this room).

The number of internal flow rules generated still depends on the number of
configured MAC addresses multiplied by the number of configured VLAN
filters, though.
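
A rough usage sketch (hypothetical application code; the VLAN ID and
queue counts are arbitrary):

#include <rte_ethdev.h>

/*
 * Enable hardware VLAN filtering at configure time, then whitelist
 * VLAN 42. Internal MAC/broadcast flow rules then become per-VLAN,
 * one set for each configured VLAN ID.
 */
static int
enable_vlan_42(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq)
{
	struct rte_eth_conf conf = {
		.rxmode = { .hw_vlan_filter = 1 },
	};
	int ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);

	if (ret)
		return ret;
	return rte_eth_dev_vlan_filter(port_id, 42, 1);
}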

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4.c           |  1 +
 drivers/net/mlx4/mlx4.h           |  1 +
 drivers/net/mlx4/mlx4_ethdev.c    | 42 +++++++++++++++++++++
 drivers/net/mlx4/mlx4_flow.c      | 67 ++++++++++++++++++++++++++++++++++
 5 files changed, 112 insertions(+)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index d17774f..bfe0eb1 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,6 +14,7 @@ MTU update           = Y
 Jumbo frame          = Y
 Unicast MAC filter   = Y
 SR-IOV               = Y
+VLAN filter          = Y
 Basic stats          = Y
 Stats per queue      = Y
 Other kdrv           = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 99c87ff..e25e958 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -227,6 +227,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
+	.vlan_filter_set = mlx4_vlan_filter_set,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 15ecd95..cc403ea 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -128,6 +128,7 @@ void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 		      uint32_t index, uint32_t vmdq);
 void mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr);
+int mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on);
 int mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
 void mlx4_stats_reset(struct rte_eth_dev *dev);
 void mlx4_dev_infos_get(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 52924df..7721f13 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -588,6 +588,48 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 }
 
 /**
+ * DPDK callback to configure a VLAN filter.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param vlan_id
+ *   VLAN ID to filter.
+ * @param on
+ *   Toggle filter.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_flow_error error;
+	unsigned int vidx = vlan_id / 64;
+	unsigned int vbit = vlan_id % 64;
+	uint64_t *v;
+	int ret;
+
+	if (vidx >= RTE_DIM(dev->data->vlan_filter_conf.ids)) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	v = &dev->data->vlan_filter_conf.ids[vidx];
+	*v &= ~(UINT64_C(1) << vbit);
+	*v |= (uint64_t)!!on << vbit;
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("failed to synchronize flow rules after %s VLAN filter on ID %u"
+	      " (code %d, \"%s\"), "
+	      " flow error type %d, cause %p, message: %s",
+	      on ? "enabling" : "disabling", vlan_id,
+	      rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+	return ret;
+}
+
+/**
  * DPDK callback to set the primary MAC address.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 14d2ed3..377b48b 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1009,11 +1009,36 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 }
 
 /**
+ * Helper function to determine the next configured VLAN filter.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param vlan
+ *   VLAN ID to use as a starting point.
+ *
+ * @return
+ *   Next configured VLAN ID or a high value (>= 4096) if there is none.
+ */
+static uint16_t
+mlx4_flow_internal_next_vlan(struct priv *priv, uint16_t vlan)
+{
+	while (vlan < 4096) {
+		if (priv->dev->data->vlan_filter_conf.ids[vlan / 64] &
+		    (UINT64_C(1) << (vlan % 64)))
+			return vlan;
+		++vlan;
+	}
+	return vlan;
+}
+
+/**
  * Generate internal flow rules.
  *
  * - MAC flow rules are generated from @p dev->data->mac_addrs
  *   (@p priv->mac array).
  * - An additional flow rule for Ethernet broadcasts is also generated.
+ * - All these are per-VLAN if @p dev->data->dev_conf.rxmode.hw_vlan_filter
+ *   is enabled and VLAN filters are configured.
  *
  * @param priv
  *   Pointer to private structure.
@@ -1034,6 +1059,10 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	const struct rte_flow_item_eth eth_mask = {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	};
+	struct rte_flow_item_vlan vlan_spec;
+	const struct rte_flow_item_vlan vlan_mask = {
+		.tci = RTE_BE16(0x0fff),
+	};
 	struct rte_flow_item pattern[] = {
 		{
 			.type = MLX4_FLOW_ITEM_TYPE_INTERNAL,
@@ -1044,6 +1073,10 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.mask = &eth_mask,
 		},
 		{
+			/* Replaced with VLAN if filtering is enabled. */
+			.type = RTE_FLOW_ITEM_TYPE_END,
+		},
+		{
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
@@ -1059,10 +1092,33 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		},
 	};
 	struct ether_addr *rule_mac = &eth_spec.dst;
+	rte_be16_t *rule_vlan =
+		priv->dev->data->dev_conf.rxmode.hw_vlan_filter ?
+		&vlan_spec.tci :
+		NULL;
+	uint16_t vlan = 0;
 	struct rte_flow *flow;
 	unsigned int i;
 	int err = 0;
 
+	/*
+	 * Set up VLAN item if filtering is enabled and at least one VLAN
+	 * filter is configured.
+	 */
+	if (rule_vlan) {
+		vlan = mlx4_flow_internal_next_vlan(priv, 0);
+		if (vlan < 4096) {
+			pattern[2] = (struct rte_flow_item){
+				.type = RTE_FLOW_ITEM_TYPE_VLAN,
+				.spec = &vlan_spec,
+				.mask = &vlan_mask,
+			};
+next_vlan:
+			*rule_vlan = rte_cpu_to_be_16(vlan);
+		} else {
+			rule_vlan = NULL;
+		}
+	}
 	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
 		const struct ether_addr *mac;
 
@@ -1087,6 +1143,12 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			assert(flow->ibv_attr->type == IBV_FLOW_ATTR_NORMAL);
 			assert(flow->ibv_attr->num_of_specs == 1);
 			assert(eth->type == IBV_FLOW_SPEC_ETH);
+			if (rule_vlan &&
+			    (eth->val.vlan_tag != *rule_vlan ||
+			     eth->mask.vlan_tag != RTE_BE16(0x0fff)))
+				continue;
+			if (!rule_vlan && eth->mask.vlan_tag)
+				continue;
 			for (j = 0; j != sizeof(mac->addr_bytes); ++j)
 				if (eth->val.dst_mac[j] != mac->addr_bytes[j] ||
 				    eth->mask.dst_mac[j] != UINT8_C(0xff) ||
@@ -1109,6 +1171,11 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 		flow->select = 1;
 		flow->mac = 1;
 	}
+	if (!err && rule_vlan) {
+		vlan = mlx4_flow_internal_next_vlan(priv, vlan + 1);
+		if (vlan < 4096)
+			goto next_vlan;
+	}
 	/* Clear selection and clean up stale MAC flow rules. */
 	flow = LIST_FIRST(&priv->flows);
 	while (flow && flow->internal) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 19/29] net/mlx4: add flow support for multicast traffic
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (17 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 18/29] net/mlx4: add VLAN filter " Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 20/29] net/mlx4: restore promisc and allmulti support Adrien Mazarguil
                     ` (10 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Give users the ability to create flow rules that match all multicast
traffic. Like promiscuous flow rules, they come with restrictions such as
not allowing additional matching criteria.
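
For reference, the following minimal sketch (not part of this patch)
shows a flow rule an application could create to take advantage of
this; the port ID and queue index 0 are illustrative assumptions:

  /* Match all multicast: group bit set in both spec and mask. */
  static const struct rte_flow_item_eth mcast_eth = {
          .dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
  };
  struct rte_flow_error err;
  struct rte_flow *flow = rte_flow_create
          (port_id,
           &(const struct rte_flow_attr){ .ingress = 1 },
           (const struct rte_flow_item[]){
                  {
                          .type = RTE_FLOW_ITEM_TYPE_ETH,
                          .spec = &mcast_eth,
                          .mask = &mcast_eth,
                  },
                  { .type = RTE_FLOW_ITEM_TYPE_END },
           },
           (const struct rte_flow_action[]){
                  {
                          .type = RTE_FLOW_ACTION_TYPE_QUEUE,
                          .conf = &(const struct rte_flow_action_queue){
                                  .index = 0,
                          },
                  },
                  { .type = RTE_FLOW_ACTION_TYPE_END },
           },
           &err);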

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 +
 drivers/net/mlx4/mlx4_flow.c      | 17 +++++++++++++++--
 drivers/net/mlx4/mlx4_flow.h      |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index bfe0eb1..9e3ba34 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,6 +13,7 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Unicast MAC filter   = Y
+Multicast MAC filter = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Basic stats          = Y
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 377b48b..15526af 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -107,7 +107,9 @@ struct mlx4_drop {
  *
  * Additional mlx4-specific constraints on supported fields:
  *
- * - No support for partial masks.
+ * - No support for partial masks, except in the specific case of matching
+ *   all multicast traffic (@p spec->dst and @p mask->dst equal to
+ *   01:00:00:00:00:00).
  * - Not providing @p item->spec or providing an empty @p mask->dst is
  *   *only* supported if the rule doesn't specify additional matching
  *   criteria (i.e. rule is promiscuous-like).
@@ -152,6 +154,13 @@ mlx4_flow_merge_eth(struct rte_flow *flow,
 			goto error;
 		} else if (!sum_dst) {
 			flow->promisc = 1;
+		} else if (sum_dst == 1 && mask->dst.addr_bytes[0] == 1) {
+			if (!(spec->dst.addr_bytes[0] & 1)) {
+				msg = "mlx4 does not support the explicit"
+					" exclusion of all multicast traffic";
+				goto error;
+			}
+			flow->allmulti = 1;
 		} else if (sum_dst != (UINT8_C(0xff) * ETHER_ADDR_LEN)) {
 			msg = "mlx4 does not support matching partial"
 				" Ethernet fields";
@@ -164,6 +173,10 @@ mlx4_flow_merge_eth(struct rte_flow *flow,
 		flow->ibv_attr->type = IBV_FLOW_ATTR_ALL_DEFAULT;
 		return 0;
 	}
+	if (flow->allmulti) {
+		flow->ibv_attr->type = IBV_FLOW_ATTR_MC_DEFAULT;
+		return 0;
+	}
 	++flow->ibv_attr->num_of_specs;
 	eth = (void *)((uintptr_t)flow->ibv_attr + flow->ibv_attr_size);
 	*eth = (struct ibv_flow_spec_eth) {
@@ -615,7 +628,7 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->internal = 1;
 			continue;
 		}
-		if (flow->promisc) {
+		if (flow->promisc || flow->allmulti) {
 			msg = "mlx4 does not support additional matching"
 				" criteria combined with indiscriminate"
 				" matching on Ethernet headers";
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index fcdf461..134e14d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -68,6 +68,7 @@ struct rte_flow {
 	uint32_t internal:1; /**< Internal flow rule outside isolated mode. */
 	uint32_t mac:1; /**< Rule associated with a configured MAC address. */
 	uint32_t promisc:1; /**< This rule matches everything. */
+	uint32_t allmulti:1; /**< This rule matches all multicast traffic. */
 	uint32_t drop:1; /**< This rule drops packets. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint16_t queue_id; /**< Target queue. */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 20/29] net/mlx4: restore promisc and allmulti support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (18 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 19/29] net/mlx4: add flow support for multicast traffic Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 21/29] net/mlx4: update Rx/Tx callbacks consistently Adrien Mazarguil
                     ` (9 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Implement promiscuous and all multicast support through internal flow
rules automatically generated according to the configured Rx mode.
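
A short usage sketch, assuming an already configured and started
port_id; each call below triggers regeneration of the internal flow
rules:

  rte_eth_promiscuous_enable(port_id);   /* single catch-all rule */
  rte_eth_promiscuous_disable(port_id);
  rte_eth_allmulticast_enable(port_id);  /* multicast catch-all rule */
  rte_eth_allmulticast_disable(port_id); /* back to MAC/broadcast rules */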

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  2 +
 drivers/net/mlx4/mlx4.c           |  4 ++
 drivers/net/mlx4/mlx4.h           |  4 ++
 drivers/net/mlx4/mlx4_ethdev.c    | 95 ++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_flow.c      | 63 +++++++++++++++++++---
 5 files changed, 162 insertions(+), 6 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 9e3ba34..6f8c82a 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,6 +12,8 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
+Promiscuous mode     = Y
+Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 SR-IOV               = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e25e958..f02508a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -221,6 +221,10 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
+	.promiscuous_enable = mlx4_promiscuous_enable,
+	.promiscuous_disable = mlx4_promiscuous_disable,
+	.allmulticast_enable = mlx4_allmulticast_enable,
+	.allmulticast_disable = mlx4_allmulticast_disable,
 	.mac_addr_remove = mlx4_mac_addr_remove,
 	.mac_addr_add = mlx4_mac_addr_add,
 	.mac_addr_set = mlx4_mac_addr_set,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index cc403ea..a27399a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -124,6 +124,10 @@ int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
 int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
 int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
 int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_promiscuous_enable(struct rte_eth_dev *dev);
+void mlx4_promiscuous_disable(struct rte_eth_dev *dev);
+void mlx4_allmulticast_enable(struct rte_eth_dev *dev);
+void mlx4_allmulticast_disable(struct rte_eth_dev *dev);
 void mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
 int mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 		      uint32_t index, uint32_t vmdq);
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 7721f13..01fb195 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -520,6 +520,101 @@ mlx4_dev_set_link_up(struct rte_eth_dev *dev)
 }
 
 /**
+ * Supported Rx mode toggles.
+ *
+ * Even and odd values respectively stand for off and on.
+ */
+enum rxmode_toggle {
+	RXMODE_TOGGLE_PROMISC_OFF,
+	RXMODE_TOGGLE_PROMISC_ON,
+	RXMODE_TOGGLE_ALLMULTI_OFF,
+	RXMODE_TOGGLE_ALLMULTI_ON,
+};
+
+/**
+ * Helper function to toggle promiscuous and all multicast modes.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param toggle
+ *   Toggle to set.
+ */
+static void
+mlx4_rxmode_toggle(struct rte_eth_dev *dev, enum rxmode_toggle toggle)
+{
+	struct priv *priv = dev->data->dev_private;
+	const char *mode;
+	struct rte_flow_error error;
+
+	switch (toggle) {
+	case RXMODE_TOGGLE_PROMISC_OFF:
+	case RXMODE_TOGGLE_PROMISC_ON:
+		mode = "promiscuous";
+		dev->data->promiscuous = toggle & 1;
+		break;
+	case RXMODE_TOGGLE_ALLMULTI_OFF:
+	case RXMODE_TOGGLE_ALLMULTI_ON:
+		mode = "all multicast";
+		dev->data->all_multicast = toggle & 1;
+		break;
+	}
+	if (!mlx4_flow_sync(priv, &error))
+		return;
+	ERROR("cannot toggle %s mode (code %d, \"%s\"),"
+	      " flow error type %d, cause %p, message: %s",
+	      mode, rte_errno, strerror(rte_errno), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
+}
+
+/**
+ * DPDK callback to enable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_PROMISC_ON);
+}
+
+/**
+ * DPDK callback to disable promiscuous mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_PROMISC_OFF);
+}
+
+/**
+ * DPDK callback to enable all multicast mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_ALLMULTI_ON);
+}
+
+/**
+ * DPDK callback to disable all multicast mode.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	mlx4_rxmode_toggle(dev, RXMODE_TOGGLE_ALLMULTI_OFF);
+}
+
+/**
  * DPDK callback to remove a MAC address.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 15526af..41423cd 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1047,6 +1047,14 @@ mlx4_flow_internal_next_vlan(struct priv *priv, uint16_t vlan)
 /**
  * Generate internal flow rules.
  *
+ * Various flow rules are created depending on the mode the device is in:
+ *
+ * 1. Promiscuous: port MAC + catch-all (VLAN filtering is ignored).
+ * 2. All multicast: port MAC/VLAN + catch-all multicast.
+ * 3. Otherwise: port MAC/VLAN + broadcast MAC/VLAN.
+ *
+ * About MAC flow rules:
+ *
  * - MAC flow rules are generated from @p dev->data->mac_addrs
  *   (@p priv->mac array).
  * - An additional flow rule for Ethernet broadcasts is also generated.
@@ -1072,6 +1080,9 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	const struct rte_flow_item_eth eth_mask = {
 		.dst.addr_bytes = "\xff\xff\xff\xff\xff\xff",
 	};
+	const struct rte_flow_item_eth eth_allmulti = {
+		.dst.addr_bytes = "\x01\x00\x00\x00\x00\x00",
+	};
 	struct rte_flow_item_vlan vlan_spec;
 	const struct rte_flow_item_vlan vlan_mask = {
 		.tci = RTE_BE16(0x0fff),
@@ -1106,9 +1117,13 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	};
 	struct ether_addr *rule_mac = &eth_spec.dst;
 	rte_be16_t *rule_vlan =
-		priv->dev->data->dev_conf.rxmode.hw_vlan_filter ?
+		priv->dev->data->dev_conf.rxmode.hw_vlan_filter &&
+		!priv->dev->data->promiscuous ?
 		&vlan_spec.tci :
 		NULL;
+	int broadcast =
+		!priv->dev->data->promiscuous &&
+		!priv->dev->data->all_multicast;
 	uint16_t vlan = 0;
 	struct rte_flow *flow;
 	unsigned int i;
@@ -1132,7 +1147,7 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			rule_vlan = NULL;
 		}
 	}
-	for (i = 0; i != RTE_DIM(priv->mac) + 1; ++i) {
+	for (i = 0; i != RTE_DIM(priv->mac) + broadcast; ++i) {
 		const struct ether_addr *mac;
 
 		/* Broadcasts are handled by an extra iteration. */
@@ -1178,23 +1193,59 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 						actions, error);
 			if (!flow) {
 				err = -rte_errno;
-				break;
+				goto error;
 			}
 		}
 		flow->select = 1;
 		flow->mac = 1;
 	}
-	if (!err && rule_vlan) {
+	if (rule_vlan) {
 		vlan = mlx4_flow_internal_next_vlan(priv, vlan + 1);
 		if (vlan < 4096)
 			goto next_vlan;
 	}
-	/* Clear selection and clean up stale MAC flow rules. */
+	/* Take care of promiscuous and all multicast flow rules. */
+	if (!broadcast) {
+		for (flow = LIST_FIRST(&priv->flows);
+		     flow && flow->internal;
+		     flow = LIST_NEXT(flow, next)) {
+			if (priv->dev->data->promiscuous) {
+				if (flow->promisc)
+					break;
+			} else {
+				assert(priv->dev->data->all_multicast);
+				if (flow->allmulti)
+					break;
+			}
+		}
+		if (!flow || !flow->internal) {
+			/* Not found, create a new flow rule. */
+			if (priv->dev->data->promiscuous) {
+				pattern[1].spec = NULL;
+				pattern[1].mask = NULL;
+			} else {
+				assert(priv->dev->data->all_multicast);
+				pattern[1].spec = &eth_allmulti;
+				pattern[1].mask = &eth_allmulti;
+			}
+			pattern[2] = pattern[3];
+			flow = mlx4_flow_create(priv->dev, &attr, pattern,
+						actions, error);
+			if (!flow) {
+				err = -rte_errno;
+				goto error;
+			}
+		}
+		assert(flow->promisc || flow->allmulti);
+		flow->select = 1;
+	}
+error:
+	/* Clear selection and clean up stale internal flow rules. */
 	flow = LIST_FIRST(&priv->flows);
 	while (flow && flow->internal) {
 		struct rte_flow *next = LIST_NEXT(flow, next);
 
-		if (flow->mac && !flow->select)
+		if (!flow->select)
 			claim_zero(mlx4_flow_destroy(priv->dev, flow, error));
 		else
 			flow->select = 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 21/29] net/mlx4: update Rx/Tx callbacks consistently
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (19 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 20/29] net/mlx4: restore promisc and allmulti support Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 22/29] net/mlx4: fix invalid errno value sign Adrien Mazarguil
                     ` (8 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Although their "removed" version acts as a safety against unexpected bursts
while queues are being modified by the control path, these callbacks are
set per device instead of per queue. It makes sense to update them during
start/stop/close cycles instead of queue setup.

As a side effect, this commit addresses a bug left over from a prior
commit: bringing the link down causes the "removed" Tx callback to be
used; however, the normal callback is not restored when bringing the
link back up, preventing the application from sending traffic at all.

Updating callbacks on a link change is not necessary, as bringing the
netdevice down is normally enough to prevent traffic from flowing in.
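
For illustration, the "removed" callbacks boil down to the following
simplified sketch (not the PMD's exact code): they are safe to invoke
at any time and merely report that no packets were processed.

  static uint16_t
  removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  {
          (void)dpdk_txq;
          (void)pkts;
          (void)pkts_n;
          /* No packets are ever sent while the device is unusable. */
          return 0;
  }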

Fixes: a4951cb98fdf ("net/mlx4: drop scatter/gather support")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.c        | 11 ++++++++---
 drivers/net/mlx4/mlx4_ethdev.c |  4 ----
 drivers/net/mlx4/mlx4_rxq.c    |  2 --
 drivers/net/mlx4/mlx4_txq.c    |  2 --
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f02508a..52f8d51 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -149,6 +149,9 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		      error.message ? error.message : "(unspecified)");
 		goto err;
 	}
+	rte_wmb();
+	dev->tx_pkt_burst = mlx4_tx_burst;
+	dev->rx_pkt_burst = mlx4_rx_burst;
 	return 0;
 err:
 	/* Rollback. */
@@ -173,6 +176,9 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
+	dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	dev->rx_pkt_burst = mlx4_rx_burst_removed;
+	rte_wmb();
 	mlx4_flow_sync(priv, NULL);
 	mlx4_intr_uninstall(priv);
 }
@@ -191,14 +197,13 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (priv == NULL)
-		return;
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	mlx4_flow_clean(priv);
 	dev->rx_pkt_burst = mlx4_rx_burst_removed;
 	dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	rte_wmb();
+	mlx4_flow_clean(priv);
 	for (i = 0; i != dev->data->nb_rx_queues; ++i)
 		mlx4_rx_queue_release(dev->data->rx_queues[i]);
 	for (i = 0; i != dev->data->nb_tx_queues; ++i)
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 01fb195..ebf2339 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -467,20 +467,16 @@ mlx4_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
 static int
 mlx4_dev_set_link(struct priv *priv, int up)
 {
-	struct rte_eth_dev *dev = priv->dev;
 	int err;
 
 	if (up) {
 		err = mlx4_set_flags(priv, ~IFF_UP, IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst;
 	} else {
 		err = mlx4_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst_removed;
-		dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index bcb7b94..693db4f 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -436,8 +436,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_free(rxq);
 			return ret;
 		}
-		/* Update receive callback. */
-		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	return ret;
 }
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index e0245b0..c1fdbaf 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -438,8 +438,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		DEBUG("%p: adding Tx queue %p to list",
 		      (void *)dev, (void *)txq);
 		dev->data->tx_queues[idx] = txq;
-		/* Update send callback. */
-		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
 	return ret;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 22/29] net/mlx4: fix invalid errno value sign
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (20 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 21/29] net/mlx4: update Rx/Tx callbacks consistently Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 23/29] net/mlx4: drop live queue reconfiguration support Adrien Mazarguil
                     ` (7 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

The Tx queue elements allocation function sets rte_errno properly and
returns its negated value. Reassigning this negative value to rte_errno
is thus both invalid and unnecessary.
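
In other words, the convention at the call site fixed below is
(illustrative sketch):

  ret = mlx4_txq_alloc_elts(&tmpl, desc); /* returns -rte_errno on error */
  if (ret) {
          /* Wrong: rte_errno = ret; would store a negative value. */
          ERROR("%p: TXQ allocation failed: %s",
                (void *)dev, strerror(rte_errno));
          goto error;
  }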

Fixes: c3e1f93cdf88 ("net/mlx4: standardize on negative errno values")

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_txq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index c1fdbaf..3cece3e 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -340,7 +340,6 @@ mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	}
 	ret = mlx4_txq_alloc_elts(&tmpl, desc);
 	if (ret) {
-		rte_errno = ret;
 		ERROR("%p: TXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 23/29] net/mlx4: drop live queue reconfiguration support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (21 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 22/29] net/mlx4: fix invalid errno value sign Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 24/29] net/mlx4: allocate queues and mbuf rings together Adrien Mazarguil
                     ` (6 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

DPDK ensures that setup functions are never called on queues that are
still configured, i.e. only on queues that have been released first.

PMDs therefore do not need to deal with the unexpected reconfiguration
of live queues, which may fail with no easy way to recover. Dropping
support for this scenario greatly simplifies the code, as allocation
and setup steps and their associated checks can be merged.
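
For reference, a paraphrased sketch of the ethdev-layer behavior this
change relies on (see rte_eth_rx_queue_setup(); details may differ):

  if (dev->data->rx_queues[rx_queue_id]) {
          /* Any configured queue is released before a second setup. */
          (*dev->dev_ops->rx_queue_release)
                  (dev->data->rx_queues[rx_queue_id]);
          dev->data->rx_queues[rx_queue_id] = NULL;
  }
  ret = (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
                                        socket_id, rx_conf, mp);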

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c  | 281 ++++++++++++++------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 -
 drivers/net/mlx4/mlx4_txq.c  | 239 +++++++++++---------------------
 3 files changed, 184 insertions(+), 338 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 693db4f..30b0654 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -162,36 +162,12 @@ mlx4_rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Clean up a Rx queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param rxq
- *   Pointer to Rx queue structure.
- */
-void
-mlx4_rxq_cleanup(struct rxq *rxq)
-{
-	DEBUG("cleaning up %p", (void *)rxq);
-	mlx4_rxq_free_elts(rxq);
-	if (rxq->qp != NULL)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->cq != NULL)
-		claim_zero(ibv_destroy_cq(rxq->cq));
-	if (rxq->channel != NULL)
-		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->mr != NULL)
-		claim_zero(ibv_dereg_mr(rxq->mr));
-	memset(rxq, 0, sizeof(*rxq));
-}
-
-/**
- * Configure a Rx queue.
+ * DPDK callback to configure a Rx queue.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param rxq
- *   Pointer to Rx queue structure.
+ * @param idx
+ *   Rx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
  * @param socket
@@ -204,30 +180,53 @@ mlx4_rxq_cleanup(struct rxq *rxq)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
-mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	       unsigned int socket, const struct rte_eth_rxconf *conf,
-	       struct rte_mempool *mp)
+int
+mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct rxq tmpl = {
-		.priv = priv,
-		.mp = mp,
-		.socket = socket
-	};
-	struct ibv_qp_attr mod;
-	struct ibv_qp_init_attr qp_init;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
+	uint32_t mb_len = rte_pktmbuf_data_room_size(mp);
+	struct rte_flow_error error;
+	struct rxq *rxq;
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	mb_len = rte_pktmbuf_data_room_size(mp);
-	if (desc == 0) {
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= dev->data->nb_rx_queues) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, dev->data->nb_rx_queues);
+		return -rte_errno;
+	}
+	rxq = dev->data->rx_queues[idx];
+	if (rxq) {
+		rte_errno = EEXIST;
+		ERROR("%p: Rx queue %u already configured, release it first",
+		      (void *)dev, idx);
+		return -rte_errno;
+	}
+	if (!desc) {
 		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
-		goto error;
+		return -rte_errno;
+	}
+	/* Allocate and initialize Rx queue. */
+	rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+	if (!rxq) {
+		rte_errno = ENOMEM;
+		ERROR("%p: unable to allocate queue index %u",
+		      (void *)dev, idx);
+		return -rte_errno;
 	}
+	*rxq = (struct rxq){
+		.priv = priv,
+		.mp = mp,
+		.port_id = dev->data->port_id,
+		.stats.idx = idx,
+		.socket = socket,
+	};
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
@@ -246,201 +245,115 @@ mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		     mb_len - RTE_PKTMBUF_HEADROOM);
 	}
 	/* Use the entire Rx mempool as the memory region. */
-	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
-	if (tmpl.mr == NULL) {
+	rxq->mr = mlx4_mp2mr(priv->pd, mp);
+	if (!rxq->mr) {
 		rte_errno = EINVAL;
 		ERROR("%p: MR creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
-		tmpl.channel = ibv_create_comp_channel(priv->ctx);
-		if (tmpl.channel == NULL) {
+		rxq->channel = ibv_create_comp_channel(priv->ctx);
+		if (rxq->channel == NULL) {
 			rte_errno = ENOMEM;
 			ERROR("%p: Rx interrupt completion channel creation"
 			      " failure: %s",
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
-		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+		if (mlx4_fd_set_non_blocking(rxq->channel->fd) < 0) {
 			ERROR("%p: unable to make Rx interrupt completion"
 			      " channel non-blocking: %s",
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
 	}
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
-	if (tmpl.cq == NULL) {
+	rxq->cq = ibv_create_cq(priv->ctx, desc, NULL, rxq->channel, 0);
+	if (!rxq->cq) {
 		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	qp_init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-	tmpl.qp = ibv_create_qp(priv->pd, &qp_init);
-	if (tmpl.qp == NULL) {
+	rxq->qp = ibv_create_qp
+		(priv->pd,
+		 &(struct ibv_qp_init_attr){
+			.send_cq = rxq->cq,
+			.recv_cq = rxq->cq,
+			.cap = {
+				.max_recv_wr =
+					RTE_MIN(priv->device_attr.max_qp_wr,
+						desc),
+				.max_recv_sge = 1,
+			},
+			.qp_type = IBV_QPT_RAW_PACKET,
+		 });
+	if (!rxq->qp) {
 		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
+	ret = ibv_modify_qp
+		(rxq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_rxq_alloc_elts(&tmpl, desc);
+	ret = mlx4_rxq_alloc_elts(rxq, desc);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
+	ret = ibv_post_recv(rxq->qp, &(*rxq->elts)[0].wr,
+			    &(struct ibv_recv_wr *){ NULL });
 	if (ret) {
 		rte_errno = ret;
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		ERROR("%p: ibv_post_recv() failed: %s",
 		      (void *)dev,
-		      (void *)bad_wr,
 		      strerror(rte_errno));
 		goto error;
 	}
-	mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(rxq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* Save port ID. */
-	tmpl.port_id = dev->data->port_id;
-	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	/* Clean up rxq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
-	mlx4_rxq_cleanup(rxq);
-	*rxq = tmpl;
-	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	return 0;
+	DEBUG("%p: adding Rx queue %p to list", (void *)dev, (void *)rxq);
+	dev->data->rx_queues[idx] = rxq;
+	/* Enable associated flows. */
+	ret = mlx4_flow_sync(priv, &error);
+	if (!ret)
+		return 0;
+	ERROR("cannot re-attach flow rules to queue %u"
+	      " (code %d, \"%s\"), flow error type %d, cause %p, message: %s",
+	      idx, -ret, strerror(-ret), error.type, error.cause,
+	      error.message ? error.message : "(unspecified)");
 error:
+	dev->data->rx_queues[idx] = NULL;
 	ret = rte_errno;
-	mlx4_rxq_cleanup(&tmpl);
+	mlx4_rx_queue_release(rxq);
 	rte_errno = ret;
 	assert(rte_errno > 0);
 	return -rte_errno;
 }
 
 /**
- * DPDK callback to configure a Rx queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = dev->data->rx_queues[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= dev->data->nb_rx_queues) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, dev->data->nb_rx_queues);
-		return -rte_errno;
-	}
-	if (rxq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)rxq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		dev->data->rx_queues[idx] = NULL;
-		/* Disable associated flows. */
-		mlx4_flow_sync(priv, NULL);
-		mlx4_rxq_cleanup(rxq);
-	} else {
-		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
-		if (rxq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = mlx4_rxq_setup(dev, rxq, desc, socket, conf, mp);
-	if (ret) {
-		rte_free(rxq);
-	} else {
-		struct rte_flow_error error;
-
-		rxq->stats.idx = idx;
-		DEBUG("%p: adding Rx queue %p to list",
-		      (void *)dev, (void *)rxq);
-		dev->data->rx_queues[idx] = rxq;
-		/* Re-enable associated flows. */
-		ret = mlx4_flow_sync(priv, &error);
-		if (ret) {
-			ERROR("cannot re-attach flow rules to queue %u"
-			      " (code %d, \"%s\"), flow error type %d,"
-			      " cause %p, message: %s", idx,
-			      -ret, strerror(-ret), error.type, error.cause,
-			      error.message ? error.message : "(unspecified)");
-			dev->data->rx_queues[idx] = NULL;
-			mlx4_rxq_cleanup(rxq);
-			rte_free(rxq);
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
  * DPDK callback to release a Rx queue.
  *
  * @param dpdk_rxq
@@ -464,6 +377,14 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			break;
 		}
 	mlx4_flow_sync(priv, NULL);
-	mlx4_rxq_cleanup(rxq);
+	mlx4_rxq_free_elts(rxq);
+	if (rxq->qp)
+		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->cq)
+		claim_zero(ibv_destroy_cq(rxq->cq));
+	if (rxq->channel)
+		claim_zero(ibv_destroy_comp_channel(rxq->channel));
+	if (rxq->mr)
+		claim_zero(ibv_dereg_mr(rxq->mr));
 	rte_free(rxq);
 }
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 7a2c982..d62120e 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -122,7 +122,6 @@ struct txq {
 
 /* mlx4_rxq.c */
 
-void mlx4_rxq_cleanup(struct rxq *rxq);
 int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_rxconf *conf,
@@ -143,7 +142,6 @@ uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
 
 /* mlx4_txq.c */
 
-void mlx4_txq_cleanup(struct txq *txq);
 int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_txconf *conf);
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index 3cece3e..f102c68 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -155,34 +155,6 @@ mlx4_txq_free_elts(struct txq *txq)
 	rte_free(elts);
 }
 
-/**
- * Clean up a Tx queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param txq
- *   Pointer to Tx queue structure.
- */
-void
-mlx4_txq_cleanup(struct txq *txq)
-{
-	size_t i;
-
-	DEBUG("cleaning up %p", (void *)txq);
-	mlx4_txq_free_elts(txq);
-	if (txq->qp != NULL)
-		claim_zero(ibv_destroy_qp(txq->qp));
-	if (txq->cq != NULL)
-		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (txq->mp2mr[i].mp == NULL)
-			break;
-		assert(txq->mp2mr[i].mr != NULL);
-		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
-	}
-	memset(txq, 0, sizeof(*txq));
-}
-
 struct txq_mp2mr_mbuf_check_data {
 	int ret;
 };
@@ -242,12 +214,12 @@ mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 }
 
 /**
- * Configure a Tx queue.
+ * DPDK callback to configure a Tx queue.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
- * @param txq
- *   Pointer to Tx queue structure.
+ * @param idx
+ *   Tx queue index.
  * @param desc
  *   Number of descriptors to configure in queue.
  * @param socket
@@ -258,190 +230,135 @@ mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
-mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	       unsigned int socket, const struct rte_eth_txconf *conf)
+int
+mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct txq tmpl = {
-		.priv = priv,
-		.socket = socket
-	};
-	union {
-		struct ibv_qp_init_attr init;
-		struct ibv_qp_attr mod;
-	} attr;
+	struct ibv_qp_init_attr qp_init_attr;
+	struct txq *txq;
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		goto error;
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= dev->data->nb_tx_queues) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, dev->data->nb_tx_queues);
+		return -rte_errno;
+	}
+	txq = dev->data->tx_queues[idx];
+	if (txq) {
+		rte_errno = EEXIST;
+		ERROR("%p: Tx queue %u already configured, release it first",
+		      (void *)dev, idx);
+		return -rte_errno;
 	}
-	if (desc == 0) {
+	if (!desc) {
 		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
-		goto error;
+		return -rte_errno;
 	}
-	/* MRs will be registered in mp2mr[] later. */
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
-	if (tmpl.cq == NULL) {
+	/* Allocate and initialize Tx queue. */
+	txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+	if (!txq) {
+		rte_errno = ENOMEM;
+		ERROR("%p: unable to allocate queue index %u",
+		      (void *)dev, idx);
+		return -rte_errno;
+	}
+	*txq = (struct txq){
+		.priv = priv,
+		.stats.idx = idx,
+		.socket = socket,
+	};
+	txq->cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+	if (!txq->cq) {
 		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
+	qp_init_attr = (struct ibv_qp_init_attr){
+		.send_cq = txq->cq,
+		.recv_cq = txq->cq,
 		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
+			.max_send_wr =
+				RTE_MIN(priv->device_attr.max_qp_wr, desc),
 			.max_send_sge = 1,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		/*
-		 * Do *NOT* enable this, completions events are managed per
-		 * Tx burst.
-		 */
+		/* Completion events are not requested by default. */
 		.sq_sig_all = 0,
 	};
-	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
-	if (tmpl.qp == NULL) {
+	txq->qp = ibv_create_qp(priv->pd, &qp_init_attr);
+	if (!txq->qp) {
 		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* ibv_create_qp() updates this value. */
-	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
+	txq->max_inline = qp_init_attr.cap.max_inline_data;
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_txq_alloc_elts(&tmpl, desc);
+	ret = mlx4_txq_alloc_elts(txq, desc);
 	if (ret) {
 		ERROR("%p: TXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	attr.mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	ret = ibv_modify_qp
+		(txq->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTS,
+		 },
+		 IBV_QP_STATE);
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	/* Clean up txq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
-	mlx4_txq_cleanup(txq);
-	*txq = tmpl;
-	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
 	/* Pre-register known mempools. */
 	rte_mempool_walk(mlx4_txq_mp2mr_iter, txq);
+	DEBUG("%p: adding Tx queue %p to list", (void *)dev, (void *)txq);
+	dev->data->tx_queues[idx] = txq;
 	return 0;
 error:
+	dev->data->tx_queues[idx] = NULL;
 	ret = rte_errno;
-	mlx4_txq_cleanup(&tmpl);
+	mlx4_tx_queue_release(txq);
 	rte_errno = ret;
 	assert(rte_errno > 0);
 	return -rte_errno;
 }
 
 /**
- * DPDK callback to configure a Tx queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Tx queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq *txq = dev->data->tx_queues[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= dev->data->nb_tx_queues) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, dev->data->nb_tx_queues);
-		return -rte_errno;
-	}
-	if (txq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)txq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		dev->data->tx_queues[idx] = NULL;
-		mlx4_txq_cleanup(txq);
-	} else {
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
-		if (txq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = mlx4_txq_setup(dev, txq, desc, socket, conf);
-	if (ret) {
-		rte_free(txq);
-	} else {
-		txq->stats.idx = idx;
-		DEBUG("%p: adding Tx queue %p to list",
-		      (void *)dev, (void *)txq);
-		dev->data->tx_queues[idx] = txq;
-	}
-	return ret;
-}
-
-/**
  * DPDK callback to release a Tx queue.
  *
  * @param dpdk_txq
@@ -464,6 +381,16 @@ mlx4_tx_queue_release(void *dpdk_txq)
 			priv->dev->data->tx_queues[i] = NULL;
 			break;
 		}
-	mlx4_txq_cleanup(txq);
+	mlx4_txq_free_elts(txq);
+	if (txq->qp)
+		claim_zero(ibv_destroy_qp(txq->qp));
+	if (txq->cq)
+		claim_zero(ibv_destroy_cq(txq->cq));
+	for (i = 0; i != RTE_DIM(txq->mp2mr); ++i) {
+		if (!txq->mp2mr[i].mp)
+			break;
+		assert(txq->mp2mr[i].mr);
+		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+	}
 	rte_free(txq);
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 24/29] net/mlx4: allocate queues and mbuf rings together
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (22 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 23/29] net/mlx4: drop live queue reconfiguration support Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 25/29] net/mlx4: convert Rx path to work queues Adrien Mazarguil
                     ` (5 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Since live Tx and Rx queues can no longer be reused without being
destroyed first, mbuf ring sizes are fixed and known from the start.

This allows each queue data structure and its mbuf ring to be
allocated together as a single chunk, saving space and bringing them
closer in memory.
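
The resulting layout can be summed up by the following generic sketch
(illustrative only; the PMD actually relies on its own
mlx4_zmallocv_socket() wrapper as shown in the diff below):

  struct rxq *rxq;
  struct rxq_elt (*elts)[desc];
  size_t rxq_size = RTE_ALIGN(sizeof(*rxq), RTE_CACHE_LINE_SIZE);

  /* One cache-aligned chunk holds the queue and its mbuf ring. */
  rxq = rte_zmalloc_socket("RXQ", rxq_size + sizeof(*elts),
                           RTE_CACHE_LINE_SIZE, socket);
  if (rxq)
          elts = (void *)((uintptr_t)rxq + rxq_size);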

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_rxq.c  |  71 +++++++++++--------------
 drivers/net/mlx4/mlx4_rxtx.h |   2 +
 drivers/net/mlx4/mlx4_txq.c  | 109 +++++++++++---------------------------
 3 files changed, 65 insertions(+), 117 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 30b0654..9978e5d 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -69,36 +69,30 @@
  *
  * @param rxq
  *   Pointer to Rx queue structure.
- * @param elts_n
- *   Number of elements to allocate.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
+mlx4_rxq_alloc_elts(struct rxq *rxq)
 {
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	unsigned int i;
-	struct rxq_elt (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
 
-	if (elts == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		goto error;
-	}
 	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
+	for (i = 0; i != RTE_DIM(*elts); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
 		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge *sge = &(*elts)[i].sge;
 		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
 
 		if (buf == NULL) {
+			while (i--) {
+				rte_pktmbuf_free_seg((*elts)[i].buf);
+				(*elts)[i].buf = NULL;
+			}
 			rte_errno = ENOMEM;
-			ERROR("%p: empty mbuf pool", (void *)rxq);
-			goto error;
+			return -rte_errno;
 		}
 		elt->buf = buf;
 		wr->next = &(*elts)[(i + 1)].wr;
@@ -121,21 +115,7 @@ mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	}
 	/* The last WR pointer must be NULL. */
 	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u single-segment WRs",
-	      (void *)rxq, elts_n);
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts = elts;
 	return 0;
-error:
-	if (elts != NULL) {
-		for (i = 0; (i != RTE_DIM(*elts)); ++i)
-			rte_pktmbuf_free_seg((*elts)[i].buf);
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(rte_errno > 0);
-	return -rte_errno;
 }
 
 /**
@@ -148,17 +128,15 @@ static void
 mlx4_rxq_free_elts(struct rxq *rxq)
 {
 	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != RTE_DIM(*elts)); ++i)
+	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+		if (!(*elts)[i].buf)
+			continue;
 		rte_pktmbuf_free_seg((*elts)[i].buf);
-	rte_free(elts);
+		(*elts)[i].buf = NULL;
+	}
 }
 
 /**
@@ -187,8 +165,21 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 {
 	struct priv *priv = dev->data->dev_private;
 	uint32_t mb_len = rte_pktmbuf_data_room_size(mp);
+	struct rxq_elt (*elts)[desc];
 	struct rte_flow_error error;
 	struct rxq *rxq;
+	struct mlx4_malloc_vec vec[] = {
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*rxq),
+			.addr = (void **)&rxq,
+		},
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*elts),
+			.addr = (void **)&elts,
+		},
+	};
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -213,9 +204,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		return -rte_errno;
 	}
 	/* Allocate and initialize Rx queue. */
-	rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+	mlx4_zmallocv_socket("RXQ", vec, RTE_DIM(vec), socket);
 	if (!rxq) {
-		rte_errno = ENOMEM;
 		ERROR("%p: unable to allocate queue index %u",
 		      (void *)dev, idx);
 		return -rte_errno;
@@ -224,6 +214,9 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		.priv = priv,
 		.mp = mp,
 		.port_id = dev->data->port_id,
+		.elts_n = desc,
+		.elts_head = 0,
+		.elts = elts,
 		.stats.idx = idx,
 		.socket = socket,
 	};
@@ -307,7 +300,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_rxq_alloc_elts(rxq, desc);
+	ret = mlx4_rxq_alloc_elts(rxq);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(rte_errno));
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index d62120e..d90f2f9 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -81,6 +81,7 @@ struct rxq {
 	struct rxq_elt (*elts)[]; /**< Rx elements. */
 	struct mlx4_rxq_stats stats; /**< Rx queue counters. */
 	unsigned int socket; /**< CPU socket ID for allocations. */
+	uint8_t data[]; /**< Remaining queue resources. */
 };
 
 /** Tx element. */
@@ -118,6 +119,7 @@ struct txq {
 	unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
 	struct mlx4_txq_stats stats; /**< Tx queue counters. */
 	unsigned int socket; /**< CPU socket ID for allocations. */
+	uint8_t data[]; /**< Remaining queue resources. */
 };
 
 /* mlx4_rxq.c */
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index f102c68..915f8d7 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -64,59 +64,6 @@
 #include "mlx4_utils.h"
 
 /**
- * Allocate Tx queue elements.
- *
- * @param txq
- *   Pointer to Tx queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
-{
-	unsigned int i;
-	struct txq_elt (*elts)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)txq);
-		ret = ENOMEM;
-		goto error;
-	}
-	for (i = 0; (i != elts_n); ++i) {
-		struct txq_elt *elt = &(*elts)[i];
-
-		elt->buf = NULL;
-	}
-	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
-	txq->elts_n = elts_n;
-	txq->elts = elts;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	/*
-	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring.
-	 */
-	txq->elts_comp_cd_init =
-		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
-		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
-	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	assert(ret == 0);
-	return 0;
-error:
-	rte_free(elts);
-	DEBUG("%p: failed, freed everything", (void *)txq);
-	assert(ret > 0);
-	rte_errno = ret;
-	return -rte_errno;
-}
-
-/**
  * Free Tx queue elements.
  *
  * @param txq
@@ -125,34 +72,21 @@ mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 static void
 mlx4_txq_free_elts(struct txq *txq)
 {
-	unsigned int elts_n = txq->elts_n;
 	unsigned int elts_head = txq->elts_head;
 	unsigned int elts_tail = txq->elts_tail;
-	struct txq_elt (*elts)[elts_n] = txq->elts;
+	struct txq_elt (*elts)[txq->elts_n] = txq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)txq);
-	txq->elts_n = 0;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	txq->elts_comp_cd = 0;
-	txq->elts_comp_cd_init = 0;
-	txq->elts = NULL;
-	if (elts == NULL)
-		return;
 	while (elts_tail != elts_head) {
 		struct txq_elt *elt = &(*elts)[elts_tail];
 
 		assert(elt->buf != NULL);
 		rte_pktmbuf_free(elt->buf);
-#ifndef NDEBUG
-		/* Poisoning. */
-		memset(elt, 0x77, sizeof(*elt));
-#endif
-		if (++elts_tail == elts_n)
+		elt->buf = NULL;
+		if (++elts_tail == RTE_DIM(*elts))
 			elts_tail = 0;
 	}
-	rte_free(elts);
+	txq->elts_tail = txq->elts_head;
 }
 
 struct txq_mp2mr_mbuf_check_data {
@@ -235,8 +169,21 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
+	struct txq_elt (*elts)[desc];
 	struct ibv_qp_init_attr qp_init_attr;
 	struct txq *txq;
+	struct mlx4_malloc_vec vec[] = {
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*txq),
+			.addr = (void **)&txq,
+		},
+		{
+			.align = RTE_CACHE_LINE_SIZE,
+			.size = sizeof(*elts),
+			.addr = (void **)&elts,
+		},
+	};
 	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -261,9 +208,8 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		return -rte_errno;
 	}
 	/* Allocate and initialize Tx queue. */
-	txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+	mlx4_zmallocv_socket("TXQ", vec, RTE_DIM(vec), socket);
 	if (!txq) {
-		rte_errno = ENOMEM;
 		ERROR("%p: unable to allocate queue index %u",
 		      (void *)dev, idx);
 		return -rte_errno;
@@ -272,6 +218,19 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		.priv = priv,
 		.stats.idx = idx,
 		.socket = socket,
+		.elts_n = desc,
+		.elts = elts,
+		.elts_head = 0,
+		.elts_tail = 0,
+		.elts_comp = 0,
+		/*
+		 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ
+		 * packets or at least 4 times per ring.
+		 */
+		.elts_comp_cd =
+			RTE_MIN(MLX4_PMD_TX_PER_COMP_REQ, desc / 4),
+		.elts_comp_cd_init =
+			RTE_MIN(MLX4_PMD_TX_PER_COMP_REQ, desc / 4),
 	};
 	txq->cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (!txq->cq) {
@@ -314,12 +273,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = mlx4_txq_alloc_elts(txq, desc);
-	if (ret) {
-		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	ret = ibv_modify_qp
 		(txq->qp,
 		 &(struct ibv_qp_attr){
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 25/29] net/mlx4: convert Rx path to work queues
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (23 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 24/29] net/mlx4: allocate queues and mbuf rings together Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 26/29] net/mlx4: remove unnecessary check Adrien Mazarguil
                     ` (4 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Work queues (WQs) are lower-level than standard queue pairs (QPs). They are
dedicated to one traffic direction and have to be used in conjunction with
indirection tables and special "hash" QPs to get the same level of
functionality.

These extra objects, however, are the building blocks for RSS support brought
by subsequent commits, as a single "hash" QP can manage several WQs through
an indirection table according to a hash algorithm and other parameters.
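
Condensing the diff below, the resulting object chain looks as follows
("ctx", "pd", "cq" and "desc" stand for the usual device context,
protection domain, completion queue and descriptor count; error handling
omitted):

	/* Rx-only work queue replacing the former plain QP. */
	struct ibv_wq *wq = ibv_create_wq
		(ctx,
		 &(struct ibv_wq_init_attr){
			.wq_type = IBV_WQT_RQ,
			.max_wr = desc,
			.max_sge = 1,
			.pd = pd,
			.cq = cq,
		 });
	/* Single-entry indirection table wrapping that WQ. */
	struct ibv_rwq_ind_table *ind = ibv_create_rwq_ind_table
		(ctx,
		 &(struct ibv_rwq_ind_table_init_attr){
			.log_ind_tbl_size = 0, /* 2^0 == 1 WQ. */
			.ind_tbl = (struct ibv_wq *[]){ wq },
			.comp_mask = 0,
		 });
	/* "Hash" QP dispatching traffic to the table entries. */
	struct ibv_qp *qp = ibv_create_qp_ex
		(ctx,
		 &(struct ibv_qp_init_attr_ex){
			.comp_mask = (IBV_QP_INIT_ATTR_PD |
				      IBV_QP_INIT_ATTR_RX_HASH |
				      IBV_QP_INIT_ATTR_IND_TABLE),
			.qp_type = IBV_QPT_RAW_PACKET,
			.pd = pd,
			.rwq_ind_tbl = ind,
			.rx_hash_conf = {
				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
				.rx_hash_key_len = 40,
				.rx_hash_key = (uint8_t [40]){ 0 },
				.rx_hash_fields_mask = 0, /* No RSS yet. */
			},
		 });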

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4.h      |  3 ++
 drivers/net/mlx4/mlx4_rxq.c  | 74 ++++++++++++++++++++++++++++++++-------
 drivers/net/mlx4/mlx4_rxtx.c |  2 +-
 drivers/net/mlx4/mlx4_rxtx.h |  2 ++
 4 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index a27399a..b04a104 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -61,6 +61,9 @@
 /** Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
+/** Fixed RSS hash key size in bytes. Cannot be modified. */
+#define MLX4_RSS_HASH_KEY_SIZE 40
+
 /**
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
  * from which buffers are to be transmitted will have to be mapped by this
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 9978e5d..171fe3f 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -268,18 +268,64 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	rxq->qp = ibv_create_qp
-		(priv->pd,
-		 &(struct ibv_qp_init_attr){
-			.send_cq = rxq->cq,
-			.recv_cq = rxq->cq,
-			.cap = {
-				.max_recv_wr =
-					RTE_MIN(priv->device_attr.max_qp_wr,
-						desc),
-				.max_recv_sge = 1,
+	rxq->wq = ibv_create_wq
+		(priv->ctx,
+		 &(struct ibv_wq_init_attr){
+			.wq_type = IBV_WQT_RQ,
+			.max_wr = RTE_MIN(priv->device_attr.max_qp_wr, desc),
+			.max_sge = 1,
+			.pd = priv->pd,
+			.cq = rxq->cq,
+		 });
+	if (!rxq->wq) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: WQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = ibv_modify_wq
+		(rxq->wq,
+		 &(struct ibv_wq_attr){
+			.attr_mask = IBV_WQ_ATTR_STATE,
+			.wq_state = IBV_WQS_RDY,
+		 });
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: WQ state to IBV_WQS_RDY failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	rxq->ind = ibv_create_rwq_ind_table
+		(priv->ctx,
+		 &(struct ibv_rwq_ind_table_init_attr){
+			.log_ind_tbl_size = 0,
+			.ind_tbl = (struct ibv_wq *[]){
+				rxq->wq,
 			},
+			.comp_mask = 0,
+		 });
+	if (!rxq->ind) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: indirection table creation failure: %s",
+		      (void *)dev, strerror(errno));
+		goto error;
+	}
+	rxq->qp = ibv_create_qp_ex
+		(priv->ctx,
+		 &(struct ibv_qp_init_attr_ex){
+			.comp_mask = (IBV_QP_INIT_ATTR_PD |
+				      IBV_QP_INIT_ATTR_RX_HASH |
+				      IBV_QP_INIT_ATTR_IND_TABLE),
 			.qp_type = IBV_QPT_RAW_PACKET,
+			.pd = priv->pd,
+			.rwq_ind_tbl = rxq->ind,
+			.rx_hash_conf = {
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
+				.rx_hash_key =
+					(uint8_t [MLX4_RSS_HASH_KEY_SIZE]){ 0 },
+				.rx_hash_fields_mask = 0,
+			},
 		 });
 	if (!rxq->qp) {
 		rte_errno = errno ? errno : EINVAL;
@@ -306,8 +352,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_post_recv(rxq->qp, &(*rxq->elts)[0].wr,
-			    &(struct ibv_recv_wr *){ NULL });
+	ret = ibv_post_wq_recv(rxq->wq, &(*rxq->elts)[0].wr,
+			       &(struct ibv_recv_wr *){ NULL });
 	if (ret) {
 		rte_errno = ret;
 		ERROR("%p: ibv_post_recv() failed: %s",
@@ -373,6 +419,10 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	mlx4_rxq_free_elts(rxq);
 	if (rxq->qp)
 		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->ind)
+		claim_zero(ibv_destroy_rwq_ind_table(rxq->ind));
+	if (rxq->wq)
+		claim_zero(ibv_destroy_wq(rxq->wq));
 	if (rxq->cq)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->channel)
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index b5e7777..859f1bd 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -459,7 +459,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	/* Repost WRs. */
 	*wr_next = NULL;
 	assert(wr_head);
-	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
+	ret = ibv_post_wq_recv(rxq->wq, wr_head, &wr_bad);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index d90f2f9..897fd2a 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -73,6 +73,8 @@ struct rxq {
 	struct rte_mempool *mp; /**< Memory pool for allocations. */
 	struct ibv_mr *mr; /**< Memory region (for mp). */
 	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_wq *wq; /**< Work queue. */
+	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
 	struct ibv_qp *qp; /**< Queue pair. */
 	struct ibv_comp_channel *channel; /**< Rx completion channel. */
 	unsigned int port_id; /**< Port ID for incoming packets. */
-- 
2.1.4


* [PATCH v2 26/29] net/mlx4: remove unnecessary check
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (24 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 25/29] net/mlx4: convert Rx path to work queues Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 27/29] net/mlx4: add RSS flow rule action support Adrien Mazarguil
                     ` (3 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Device operation callbacks are not supposed to handle a missing private
data structure.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_ethdev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index ebf2339..661e252 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -750,8 +750,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	char ifname[IF_NAMESIZE];
 
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-	if (priv == NULL)
-		return;
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
-- 
2.1.4


* [PATCH v2 27/29] net/mlx4: add RSS flow rule action support
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (25 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 26/29] net/mlx4: remove unnecessary check Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 28/29] net/mlx4: disable UDP support in RSS flow rules Adrien Mazarguil
                     ` (2 subsequent siblings)
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

This patch dissociates single-queue indirection tables and hash QP objects
from Rx queue structures to relinquish their control to users through the
RSS flow rule action, while simultaneously allowing multiple queues to be
associated with RSS contexts.

Flow rules share identical RSS contexts (hashed fields, hash key, target
queues) to save memory and other resources. The trade-off is some added
complexity due to reference count management on RSS contexts.
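
From a flow rule's point of view, the lifecycle of such a shared context
boils down to the following sequence (an illustrative sketch using the
helpers introduced by this patch; "fields", "key", "queues_n" and
"queue_ids" are placeholder variables):

	/* Look up or allocate a context with matching properties. */
	struct mlx4_rss *rss =
		mlx4_rss_get(priv, fields, key, queues_n, queue_ids);
	/* Enable: the first user instantiates ind. table and hash QP. */
	int ret = mlx4_rss_attach(rss);
	/* Disable: hardware objects are freed with the last user. */
	mlx4_rss_detach(rss);
	/* Destroy: context memory is freed once refcount reaches zero. */
	mlx4_rss_put(rss);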

The QUEUE action is re-implemented on top of an automatically-generated
single-queue RSS context.

The following hardware limitations apply to RSS contexts (a validation
sketch follows the list):

- The number of queues in a group must be a power of two.
- Queue indices must be consecutive; for instance the [0 1 2 3] set is
  allowed, whereas [3 2 1 0], [0 2 1 3] and [0 0 1 1 2 3 3 3] are not.
- The first queue of a group must be aligned to a multiple of the context
  size, e.g. if queues [0 1 2 3 4] are defined globally, allowed group
  combinations are [0 1] and [2 3]; groups [1 2] and [3 4] are not
  supported.
- RSS hash key, while configurable per context, must be exactly 40 bytes
  long.
- The only supported hash algorithm is Toeplitz.
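
A hypothetical stand-alone helper (not part of this patch, assuming
<stdint.h>) encoding the first three constraints:

	/* Return nonzero when a queue set forms a valid RSS group. */
	static int
	mlx4_rss_queues_ok(const uint16_t *queue, uint16_t num)
	{
		uint16_t i;

		if (!num || (num & (num - 1)))
			return 0; /* Group size must be a power of two. */
		for (i = 1; i != num; ++i)
			if (queue[i] != queue[i - 1] + 1)
				return 0; /* Indices must be consecutive. */
		if (queue[0] % num)
			return 0; /* First index aligned to group size. */
		return 1;
	}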

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 +
 drivers/net/mlx4/Makefile         |   2 +-
 drivers/net/mlx4/mlx4.c           |  13 ++
 drivers/net/mlx4/mlx4.h           |   2 +
 drivers/net/mlx4/mlx4_ethdev.c    |   1 +
 drivers/net/mlx4/mlx4_flow.c      | 181 ++++++++++++++++++--
 drivers/net/mlx4/mlx4_flow.h      |   3 +-
 drivers/net/mlx4/mlx4_rxq.c       | 303 +++++++++++++++++++++++++--------
 drivers/net/mlx4/mlx4_rxtx.h      |  24 ++-
 mk/rte.app.mk                     |   2 +-
 10 files changed, 445 insertions(+), 87 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 6f8c82a..9750ebf 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -16,6 +16,7 @@ Promiscuous mode     = Y
 Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
+RSS hash             = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Basic stats          = Y
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 0515cd7..3b3a020 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -54,7 +54,7 @@ CFLAGS += -D_BSD_SOURCE
 CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
-LDLIBS += -libverbs
+LDLIBS += -libverbs -lmlx4
 
 # A few warnings cannot be avoided in external headers.
 CFLAGS += -Wno-error=cast-qual
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 52f8d51..0db9a19 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -50,6 +50,7 @@
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
 #include <infiniband/verbs.h>
+#include <infiniband/mlx4dv.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -99,8 +100,20 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow_error error;
+	uint8_t log2_range = rte_log2_u32(dev->data->nb_rx_queues);
 	int ret;
 
+	/* Prepare range for RSS contexts before creating the first WQ. */
+	ret = mlx4dv_set_context_attr(priv->ctx,
+				      MLX4DV_SET_CTX_ATTR_LOG_WQS_RANGE_SZ,
+				      &log2_range);
+	if (ret) {
+		ERROR("cannot set up range size for RSS context to %u"
+		      " (for %u Rx queues), error: %s",
+		      1 << log2_range, dev->data->nb_rx_queues, strerror(ret));
+		rte_errno = ret;
+		return -ret;
+	}
 	/* Prepare internal flow rules. */
 	ret = mlx4_flow_sync(priv, &error);
 	if (ret) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b04a104..f4da8c6 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -95,6 +95,7 @@ enum {
 #define MLX4_DRIVER_NAME "net_mlx4"
 
 struct mlx4_drop;
+struct mlx4_rss;
 struct rxq;
 struct txq;
 struct rte_flow;
@@ -114,6 +115,7 @@ struct priv {
 	uint32_t isolated:1; /**< Toggle isolated mode. */
 	struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
 	struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
+	LIST_HEAD(, mlx4_rss) rss; /**< Shared targets for Rx flow rules. */
 	LIST_HEAD(, rte_flow) flows; /**< Configured flow rule handles. */
 	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
 	/**< Configured MAC addresses. Unused entries are zeroed. */
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 661e252..3623909 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -769,6 +769,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->tx_offload_capa = 0;
 	if (mlx4_get_ifname(priv, &ifname) == 0)
 		info->if_index = if_nametoindex(ifname);
+	info->hash_key_size = MLX4_RSS_HASH_KEY_SIZE;
 	info->speed_capa =
 			ETH_LINK_SPEED_1G |
 			ETH_LINK_SPEED_10G |
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 41423cd..2b60d76 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -103,6 +103,62 @@ struct mlx4_drop {
 };
 
 /**
+ * Convert DPDK RSS hash fields to their Verbs equivalent.
+ *
+ * @param rss_hf
+ *   Hash fields in DPDK format (see struct rte_eth_rss_conf).
+ *
+ * @return
+ *   A valid Verbs RSS hash fields mask for mlx4 on success, (uint64_t)-1
+ *   otherwise and rte_errno is set.
+ */
+static uint64_t
+mlx4_conv_rss_hf(uint64_t rss_hf)
+{
+	enum { IPV4, IPV6, TCP, UDP, };
+	const uint64_t in[] = {
+		[IPV4] = (ETH_RSS_IPV4 |
+			  ETH_RSS_FRAG_IPV4 |
+			  ETH_RSS_NONFRAG_IPV4_TCP |
+			  ETH_RSS_NONFRAG_IPV4_UDP |
+			  ETH_RSS_NONFRAG_IPV4_OTHER),
+		[IPV6] = (ETH_RSS_IPV6 |
+			  ETH_RSS_FRAG_IPV6 |
+			  ETH_RSS_NONFRAG_IPV6_TCP |
+			  ETH_RSS_NONFRAG_IPV6_UDP |
+			  ETH_RSS_NONFRAG_IPV6_OTHER |
+			  ETH_RSS_IPV6_EX |
+			  ETH_RSS_IPV6_TCP_EX |
+			  ETH_RSS_IPV6_UDP_EX),
+		[TCP] = (ETH_RSS_NONFRAG_IPV4_TCP |
+			 ETH_RSS_NONFRAG_IPV6_TCP |
+			 ETH_RSS_IPV6_TCP_EX),
+		[UDP] = (ETH_RSS_NONFRAG_IPV4_UDP |
+			 ETH_RSS_NONFRAG_IPV6_UDP |
+			 ETH_RSS_IPV6_UDP_EX),
+	};
+	const uint64_t out[RTE_DIM(in)] = {
+		[IPV4] = IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4,
+		[IPV6] = IBV_RX_HASH_SRC_IPV6 | IBV_RX_HASH_DST_IPV6,
+		[TCP] = IBV_RX_HASH_SRC_PORT_TCP | IBV_RX_HASH_DST_PORT_TCP,
+		[UDP] = IBV_RX_HASH_SRC_PORT_UDP | IBV_RX_HASH_DST_PORT_UDP,
+	};
+	uint64_t seen = 0;
+	uint64_t conv = 0;
+	unsigned int i;
+
+	for (i = 0; i != RTE_DIM(in); ++i)
+		if (rss_hf & in[i]) {
+			seen |= rss_hf & in[i];
+			conv |= out[i];
+		}
+	if (!(rss_hf & ~seen))
+		return conv;
+	rte_errno = ENOTSUP;
+	return (uint64_t)-1;
+}
+
+/**
  * Merge Ethernet pattern item into flow rule handle.
  *
  * Additional mlx4-specific constraints on supported fields:
@@ -663,6 +719,9 @@ mlx4_flow_prepare(struct priv *priv,
 	for (action = actions; action->type; ++action) {
 		switch (action->type) {
 			const struct rte_flow_action_queue *queue;
+			const struct rte_flow_action_rss *rss;
+			const struct rte_eth_rss_conf *rss_conf;
+			unsigned int i;
 
 		case RTE_FLOW_ACTION_TYPE_VOID:
 			continue;
@@ -670,23 +729,87 @@ mlx4_flow_prepare(struct priv *priv,
 			flow->drop = 1;
 			break;
 		case RTE_FLOW_ACTION_TYPE_QUEUE:
+			if (flow->rss)
+				break;
 			queue = action->conf;
-			if (queue->index >= priv->dev->data->nb_rx_queues)
+			flow->rss = mlx4_rss_get
+				(priv, 0, mlx4_rss_hash_key_default, 1,
+				 &queue->index);
+			if (!flow->rss) {
+				msg = "not enough resources for additional"
+					" single-queue RSS context";
+				goto exit_action_not_supported;
+			}
+			break;
+		case RTE_FLOW_ACTION_TYPE_RSS:
+			if (flow->rss)
+				break;
+			rss = action->conf;
+			/* Default RSS configuration if none is provided. */
+			rss_conf =
+				rss->rss_conf ?
+				rss->rss_conf :
+				&(struct rte_eth_rss_conf){
+					.rss_key = mlx4_rss_hash_key_default,
+					.rss_key_len = MLX4_RSS_HASH_KEY_SIZE,
+					.rss_hf = (ETH_RSS_IPV4 |
+						   ETH_RSS_NONFRAG_IPV4_UDP |
+						   ETH_RSS_NONFRAG_IPV4_TCP |
+						   ETH_RSS_IPV6 |
+						   ETH_RSS_NONFRAG_IPV6_UDP |
+						   ETH_RSS_NONFRAG_IPV6_TCP),
+				};
+			/* Sanity checks. */
+			if (!rte_is_power_of_2(rss->num)) {
+				msg = "for RSS, mlx4 requires the number of"
+					" queues to be a power of two";
+				goto exit_action_not_supported;
+			}
+			if (rss_conf->rss_key_len !=
+			    sizeof(flow->rss->key)) {
+				msg = "mlx4 supports exactly one RSS hash key"
+					" length: "
+					MLX4_STR_EXPAND(MLX4_RSS_HASH_KEY_SIZE);
+				goto exit_action_not_supported;
+			}
+			for (i = 1; i < rss->num; ++i)
+				if (rss->queue[i] - rss->queue[i - 1] != 1)
+					break;
+			if (i != rss->num) {
+				msg = "mlx4 requires RSS contexts to use"
+					" consecutive queue indices only";
+				goto exit_action_not_supported;
+			}
+			if (rss->queue[0] % rss->num) {
+				msg = "mlx4 requires the first queue of a RSS"
+					" context to be aligned on a multiple"
+					" of the context size";
+				goto exit_action_not_supported;
+			}
+			flow->rss = mlx4_rss_get
+				(priv, mlx4_conv_rss_hf(rss_conf->rss_hf),
+				 rss_conf->rss_key, rss->num, rss->queue);
+			if (!flow->rss) {
+				msg = "either invalid parameters or not enough"
+					" resources for additional multi-queue"
+					" RSS context";
 				goto exit_action_not_supported;
-			flow->queue = 1;
-			flow->queue_id = queue->index;
+			}
 			break;
 		default:
 			goto exit_action_not_supported;
 		}
 	}
-	if (!flow->queue && !flow->drop)
+	if (!flow->rss && !flow->drop)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 			 NULL, "no valid action");
 	/* Validation ends here. */
-	if (!addr)
+	if (!addr) {
+		if (flow->rss)
+			mlx4_rss_put(flow->rss);
 		return 0;
+	}
 	if (flow == &temp) {
 		/* Allocate proper handle based on collected data. */
 		const struct mlx4_malloc_vec vec[] = {
@@ -711,6 +834,7 @@ mlx4_flow_prepare(struct priv *priv,
 		*flow = (struct rte_flow){
 			.ibv_attr = temp.ibv_attr,
 			.ibv_attr_size = sizeof(*flow->ibv_attr),
+			.rss = temp.rss,
 		};
 		*flow->ibv_attr = (struct ibv_flow_attr){
 			.type = IBV_FLOW_ATTR_NORMAL,
@@ -727,7 +851,7 @@ mlx4_flow_prepare(struct priv *priv,
 				  item, msg ? msg : "item not supported");
 exit_action_not_supported:
 	return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
-				  action, "action not supported");
+				  action, msg ? msg : "action not supported");
 }
 
 /**
@@ -850,6 +974,8 @@ mlx4_flow_toggle(struct priv *priv,
 		flow->ibv_flow = NULL;
 		if (flow->drop)
 			mlx4_drop_put(priv->drop);
+		else if (flow->rss)
+			mlx4_rss_detach(flow->rss);
 		return 0;
 	}
 	assert(flow->ibv_attr);
@@ -861,6 +987,8 @@ mlx4_flow_toggle(struct priv *priv,
 			flow->ibv_flow = NULL;
 			if (flow->drop)
 				mlx4_drop_put(priv->drop);
+			else if (flow->rss)
+				mlx4_rss_detach(flow->rss);
 		}
 		err = EACCES;
 		msg = ("priority level "
@@ -868,24 +996,42 @@ mlx4_flow_toggle(struct priv *priv,
 		       " is reserved when not in isolated mode");
 		goto error;
 	}
-	if (flow->queue) {
-		struct rxq *rxq = NULL;
+	if (flow->rss) {
+		struct mlx4_rss *rss = flow->rss;
+		int missing = 0;
+		unsigned int i;
 
-		if (flow->queue_id < priv->dev->data->nb_rx_queues)
-			rxq = priv->dev->data->rx_queues[flow->queue_id];
+		/* Stop at the first nonexistent target queue. */
+		for (i = 0; i != rss->queues; ++i)
+			if (rss->queue_id[i] >=
+			    priv->dev->data->nb_rx_queues ||
+			    !priv->dev->data->rx_queues[rss->queue_id[i]]) {
+				missing = 1;
+				break;
+			}
 		if (flow->ibv_flow) {
-			if (!rxq ^ !flow->drop)
+			if (missing ^ !flow->drop)
 				return 0;
 			/* Verbs flow needs updating. */
 			claim_zero(ibv_destroy_flow(flow->ibv_flow));
 			flow->ibv_flow = NULL;
 			if (flow->drop)
 				mlx4_drop_put(priv->drop);
+			else
+				mlx4_rss_detach(rss);
+		}
+		if (!missing) {
+			err = mlx4_rss_attach(rss);
+			if (err) {
+				err = -err;
+				msg = "cannot create indirection table or hash"
+					" QP to associate flow rule with";
+				goto error;
+			}
+			qp = rss->qp;
 		}
-		if (rxq)
-			qp = rxq->qp;
 		/* A missing target queue drops traffic implicitly. */
-		flow->drop = !rxq;
+		flow->drop = missing;
 	}
 	if (flow->drop) {
 		mlx4_drop_get(priv);
@@ -904,6 +1050,8 @@ mlx4_flow_toggle(struct priv *priv,
 		return 0;
 	if (flow->drop)
 		mlx4_drop_put(priv->drop);
+	else if (flow->rss)
+		mlx4_rss_detach(flow->rss);
 	err = errno;
 	msg = "flow rule rejected by device";
 error:
@@ -946,6 +1094,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		}
 		return flow;
 	}
+	if (flow->rss)
+		mlx4_rss_put(flow->rss);
 	rte_flow_error_set(error, -err, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			   error->message);
 	rte_free(flow);
@@ -992,6 +1142,8 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 	if (err)
 		return err;
 	LIST_REMOVE(flow, next);
+	if (flow->rss)
+		mlx4_rss_put(flow->rss);
 	rte_free(flow);
 	return 0;
 }
@@ -1320,6 +1472,7 @@ mlx4_flow_clean(struct priv *priv)
 
 	while ((flow = LIST_FIRST(&priv->flows)))
 		mlx4_flow_destroy(priv->dev, flow, NULL);
+	assert(LIST_EMPTY(&priv->rss));
 }
 
 static const struct rte_flow_ops mlx4_flow_ops = {
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 134e14d..651fd37 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -70,8 +70,7 @@ struct rte_flow {
 	uint32_t promisc:1; /**< This rule matches everything. */
 	uint32_t allmulti:1; /**< This rule matches all multicast traffic. */
 	uint32_t drop:1; /**< This rule drops packets. */
-	uint32_t queue:1; /**< Target is a receive queue. */
-	uint16_t queue_id; /**< Target queue. */
+	struct mlx4_rss *rss; /**< Rx target. */
 };
 
 /* mlx4_flow.c */
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 171fe3f..483fe9b 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -65,6 +65,242 @@
 #include "mlx4_utils.h"
 
 /**
+ * Historical RSS hash key.
+ *
+ * This used to be the default for mlx4 in Linux before v3.19 switched to
+ * generating random hash keys through netdev_rss_key_fill().
+ *
+ * It is used in this PMD for consistency with past DPDK releases but can
+ * now be overridden through user configuration.
+ *
+ * Note: this is not const to work around API quirks.
+ */
+uint8_t
+mlx4_rss_hash_key_default[MLX4_RSS_HASH_KEY_SIZE] = {
+	0x2c, 0xc6, 0x81, 0xd1,
+	0x5b, 0xdb, 0xf4, 0xf7,
+	0xfc, 0xa2, 0x83, 0x19,
+	0xdb, 0x1a, 0x3e, 0x94,
+	0x6b, 0x9e, 0x38, 0xd9,
+	0x2c, 0x9c, 0x03, 0xd1,
+	0xad, 0x99, 0x44, 0xa7,
+	0xd9, 0x56, 0x3d, 0x59,
+	0x06, 0x3c, 0x25, 0xf3,
+	0xfc, 0x1f, 0xdc, 0x2a,
+};
+
+/**
+ * Obtain a RSS context with specified properties.
+ *
+ * Used when creating a flow rule targeting one or several Rx queues.
+ *
+ * If a matching RSS context already exists, it is returned with its
+ * reference count incremented.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param fields
+ *   Fields for RSS processing (Verbs format).
+ * @param[in] key
+ *   Hash key to use (whose size is exactly MLX4_RSS_HASH_KEY_SIZE).
+ * @param queues
+ *   Number of target queues.
+ * @param[in] queue_id
+ *   Target queues.
+ *
+ * @return
+ *   Pointer to RSS context on success, NULL otherwise and rte_errno is set.
+ */
+struct mlx4_rss *
+mlx4_rss_get(struct priv *priv, uint64_t fields,
+	     uint8_t key[MLX4_RSS_HASH_KEY_SIZE],
+	     uint16_t queues, const uint16_t queue_id[])
+{
+	struct mlx4_rss *rss;
+	size_t queue_id_size = sizeof(queue_id[0]) * queues;
+
+	LIST_FOREACH(rss, &priv->rss, next)
+		if (fields == rss->fields &&
+		    queues == rss->queues &&
+		    !memcmp(key, rss->key, MLX4_RSS_HASH_KEY_SIZE) &&
+		    !memcmp(queue_id, rss->queue_id, queue_id_size)) {
+			++rss->refcnt;
+			return rss;
+		}
+	rss = rte_malloc(__func__, offsetof(struct mlx4_rss, queue_id) +
+			 queue_id_size, 0);
+	if (!rss)
+		goto error;
+	*rss = (struct mlx4_rss){
+		.priv = priv,
+		.refcnt = 1,
+		.usecnt = 0,
+		.qp = NULL,
+		.ind = NULL,
+		.fields = fields,
+		.queues = queues,
+	};
+	memcpy(rss->key, key, MLX4_RSS_HASH_KEY_SIZE);
+	memcpy(rss->queue_id, queue_id, queue_id_size);
+	LIST_INSERT_HEAD(&priv->rss, rss, next);
+	return rss;
+error:
+	rte_errno = ENOMEM;
+	return NULL;
+}
+
+/**
+ * Release a RSS context instance.
+ *
+ * Used when destroying a flow rule targeting one or several Rx queues.
+ *
+ * This function decrements the reference count of the context and destroys
+ * it after reaching 0. The context must have no users at this point; all
+ * prior calls to mlx4_rss_attach() must have been followed by matching
+ * calls to mlx4_rss_detach().
+ *
+ * @param rss
+ *   RSS context to release.
+ */
+void mlx4_rss_put(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	if (--rss->refcnt)
+		return;
+	assert(!rss->usecnt);
+	assert(!rss->qp);
+	assert(!rss->ind);
+	LIST_REMOVE(rss, next);
+	rte_free(rss);
+}
+
+/**
+ * Attach a user to a RSS context instance.
+ *
+ * Used when the RSS QP and indirection table objects must be instantiated,
+ * that is, when a flow rule must be enabled.
+ *
+ * This function increments the usage count of the context.
+ *
+ * @param rss
+ *   RSS context to attach to.
+ */
+int mlx4_rss_attach(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	if (rss->usecnt++) {
+		assert(rss->qp);
+		assert(rss->ind);
+		return 0;
+	}
+
+	struct ibv_wq *ind_tbl[rss->queues];
+	struct priv *priv = rss->priv;
+	const char *msg;
+	unsigned int i;
+	int ret;
+
+	if (!rte_is_power_of_2(RTE_DIM(ind_tbl))) {
+		msg = "number of RSS queues must be a power of two";
+		goto error;
+	}
+	for (i = 0; i != RTE_DIM(ind_tbl); ++i) {
+		uint16_t id = rss->queue_id[i];
+		struct rxq *rxq = NULL;
+
+		if (id < priv->dev->data->nb_rx_queues)
+			rxq = priv->dev->data->rx_queues[id];
+		if (!rxq) {
+			msg = "RSS target queue is not configured";
+			goto error;
+		}
+		ind_tbl[i] = rxq->wq;
+	}
+	rss->ind = ibv_create_rwq_ind_table
+		(priv->ctx,
+		 &(struct ibv_rwq_ind_table_init_attr){
+			.log_ind_tbl_size = rte_log2_u32(RTE_DIM(ind_tbl)),
+			.ind_tbl = ind_tbl,
+			.comp_mask = 0,
+		 });
+	if (!rss->ind) {
+		msg = "RSS indirection table creation failure";
+		goto error;
+	}
+	rss->qp = ibv_create_qp_ex
+		(priv->ctx,
+		 &(struct ibv_qp_init_attr_ex){
+			.comp_mask = (IBV_QP_INIT_ATTR_PD |
+				      IBV_QP_INIT_ATTR_RX_HASH |
+				      IBV_QP_INIT_ATTR_IND_TABLE),
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.pd = priv->pd,
+			.rwq_ind_tbl = rss->ind,
+			.rx_hash_conf = {
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
+				.rx_hash_key = rss->key,
+				.rx_hash_fields_mask = rss->fields,
+			},
+		 });
+	if (!rss->qp) {
+		msg = "RSS hash QP creation failure";
+		goto error;
+	}
+	ret = ibv_modify_qp
+		(rss->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_INIT,
+			.port_num = priv->port,
+		 },
+		 IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		msg = "failed to switch RSS hash QP to INIT state";
+		goto error;
+	}
+	ret = ibv_modify_qp
+		(rss->qp,
+		 &(struct ibv_qp_attr){
+			.qp_state = IBV_QPS_RTR,
+		 },
+		 IBV_QP_STATE);
+	if (ret) {
+		msg = "failed to switch RSS hash QP to RTR state";
+		goto error;
+	}
+	return 0;
+error:
+	ERROR("mlx4: %s", msg);
+	--rss->usecnt;
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Detach a user from a RSS context instance.
+ *
+ * Used when disabling (not destroying) a flow rule.
+ *
+ * This function decrements the usage count of the context and destroys
+ * usage resources after reaching 0.
+ *
+ * @param rss
+ *   RSS context to detach from.
+ */
+void mlx4_rss_detach(struct mlx4_rss *rss)
+{
+	assert(rss->refcnt);
+	assert(rss->qp);
+	assert(rss->ind);
+	if (--rss->usecnt)
+		return;
+	claim_zero(ibv_destroy_qp(rss->qp));
+	rss->qp = NULL;
+	claim_zero(ibv_destroy_rwq_ind_table(rss->ind));
+	rss->ind = NULL;
+}
+
+/**
  * Allocate Rx queue elements.
  *
  * @param rxq
@@ -295,57 +531,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
-	rxq->ind = ibv_create_rwq_ind_table
-		(priv->ctx,
-		 &(struct ibv_rwq_ind_table_init_attr){
-			.log_ind_tbl_size = 0,
-			.ind_tbl = (struct ibv_wq *[]){
-				rxq->wq,
-			},
-			.comp_mask = 0,
-		 });
-	if (!rxq->ind) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: indirection table creation failure: %s",
-		      (void *)dev, strerror(errno));
-		goto error;
-	}
-	rxq->qp = ibv_create_qp_ex
-		(priv->ctx,
-		 &(struct ibv_qp_init_attr_ex){
-			.comp_mask = (IBV_QP_INIT_ATTR_PD |
-				      IBV_QP_INIT_ATTR_RX_HASH |
-				      IBV_QP_INIT_ATTR_IND_TABLE),
-			.qp_type = IBV_QPT_RAW_PACKET,
-			.pd = priv->pd,
-			.rwq_ind_tbl = rxq->ind,
-			.rx_hash_conf = {
-				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
-				.rx_hash_key_len = MLX4_RSS_HASH_KEY_SIZE,
-				.rx_hash_key =
-					(uint8_t [MLX4_RSS_HASH_KEY_SIZE]){ 0 },
-				.rx_hash_fields_mask = 0,
-			},
-		 });
-	if (!rxq->qp) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = ibv_modify_qp
-		(rxq->qp,
-		 &(struct ibv_qp_attr){
-			.qp_state = IBV_QPS_INIT,
-			.port_num = priv->port,
-		 },
-		 IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	ret = mlx4_rxq_alloc_elts(rxq);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
@@ -361,18 +546,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      strerror(rte_errno));
 		goto error;
 	}
-	ret = ibv_modify_qp
-		(rxq->qp,
-		 &(struct ibv_qp_attr){
-			.qp_state = IBV_QPS_RTR,
-		 },
-		 IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
 	DEBUG("%p: adding Rx queue %p to list", (void *)dev, (void *)rxq);
 	dev->data->rx_queues[idx] = rxq;
 	/* Enable associated flows. */
@@ -417,10 +590,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		}
 	mlx4_flow_sync(priv, NULL);
 	mlx4_rxq_free_elts(rxq);
-	if (rxq->qp)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->ind)
-		claim_zero(ibv_destroy_rwq_ind_table(rxq->ind));
 	if (rxq->wq)
 		claim_zero(ibv_destroy_wq(rxq->wq));
 	if (rxq->cq)
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 897fd2a..eca966f 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -35,6 +35,7 @@
 #define MLX4_RXTX_H_
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 /* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
@@ -74,8 +75,6 @@ struct rxq {
 	struct ibv_mr *mr; /**< Memory region (for mp). */
 	struct ibv_cq *cq; /**< Completion queue. */
 	struct ibv_wq *wq; /**< Work queue. */
-	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
-	struct ibv_qp *qp; /**< Queue pair. */
 	struct ibv_comp_channel *channel; /**< Rx completion channel. */
 	unsigned int port_id; /**< Port ID for incoming packets. */
 	unsigned int elts_n; /**< (*elts)[] length. */
@@ -86,6 +85,20 @@ struct rxq {
 	uint8_t data[]; /**< Remaining queue resources. */
 };
 
+/** Shared flow target for Rx queues. */
+struct mlx4_rss {
+	LIST_ENTRY(mlx4_rss) next; /**< Next entry in list. */
+	struct priv *priv; /**< Back pointer to private data. */
+	uint32_t refcnt; /**< Reference count for this object. */
+	uint32_t usecnt; /**< Number of users relying on @p qp and @p ind. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	struct ibv_rwq_ind_table *ind; /**< Indirection table. */
+	uint64_t fields; /**< Fields for RSS processing (Verbs format). */
+	uint8_t key[MLX4_RSS_HASH_KEY_SIZE]; /**< Hash key to use. */
+	uint16_t queues; /**< Number of target queues. */
+	uint16_t queue_id[]; /**< Target queues. */
+};
+
 /** Tx element. */
 struct txq_elt {
 	struct ibv_send_wr wr; /**< Work request. */
@@ -126,6 +139,13 @@ struct txq {
 
 /* mlx4_rxq.c */
 
+uint8_t mlx4_rss_hash_key_default[MLX4_RSS_HASH_KEY_SIZE];
+struct mlx4_rss *mlx4_rss_get(struct priv *priv, uint64_t fields,
+			      uint8_t key[MLX4_RSS_HASH_KEY_SIZE],
+			      uint16_t queues, const uint16_t queue_id[]);
+void mlx4_rss_put(struct mlx4_rss *rss);
+int mlx4_rss_attach(struct mlx4_rss *rss);
+void mlx4_rss_detach(struct mlx4_rss *rss);
 int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
 			uint16_t desc, unsigned int socket,
 			const struct rte_eth_rxconf *conf,
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0b8f612..c0e0e86 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -135,7 +135,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI)        += -lrte_pmd_kni
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD)        += -lrte_pmd_lio
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs -lmlx4
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD)       += -lrte_pmd_mlx5 -libverbs -lmlx5
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MRVL_PMD)       += -lrte_pmd_mrvl -L$(LIBMUSDK_PATH)/lib -lmusdk
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD)        += -lrte_pmd_nfp
-- 
2.1.4


* [PATCH v2 28/29] net/mlx4: disable UDP support in RSS flow rules
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (26 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 27/29] net/mlx4: add RSS flow rule action support Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 12:19   ` [PATCH v2 29/29] net/mlx4: add RSS support outside flow API Adrien Mazarguil
  2017-10-12 19:12   ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Ferruh Yigit
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

When UDP is part of the RSS hash calculation, UDP packets are discarded
(not received on any queue), likely due to an issue with the kernel
implementation.

Temporarily disable UDP RSS support until this issue is resolved.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 2b60d76..4c498f0 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -133,9 +133,11 @@ mlx4_conv_rss_hf(uint64_t rss_hf)
 		[TCP] = (ETH_RSS_NONFRAG_IPV4_TCP |
 			 ETH_RSS_NONFRAG_IPV6_TCP |
 			 ETH_RSS_IPV6_TCP_EX),
-		[UDP] = (ETH_RSS_NONFRAG_IPV4_UDP |
-			 ETH_RSS_NONFRAG_IPV6_UDP |
-			 ETH_RSS_IPV6_UDP_EX),
+		/*
+		 * UDP support is temporarily disabled due to an
+		 * implementation issue in the kernel.
+		 */
+		[UDP] = 0,
 	};
 	const uint64_t out[RTE_DIM(in)] = {
 		[IPV4] = IBV_RX_HASH_SRC_IPV4 | IBV_RX_HASH_DST_IPV4,
@@ -753,10 +755,8 @@ mlx4_flow_prepare(struct priv *priv,
 					.rss_key = mlx4_rss_hash_key_default,
 					.rss_key_len = MLX4_RSS_HASH_KEY_SIZE,
 					.rss_hf = (ETH_RSS_IPV4 |
-						   ETH_RSS_NONFRAG_IPV4_UDP |
 						   ETH_RSS_NONFRAG_IPV4_TCP |
 						   ETH_RSS_IPV6 |
-						   ETH_RSS_NONFRAG_IPV6_UDP |
 						   ETH_RSS_NONFRAG_IPV6_TCP),
 				};
 			/* Sanity checks. */
-- 
2.1.4


* [PATCH v2 29/29] net/mlx4: add RSS support outside flow API
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (27 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 28/29] net/mlx4: disable UDP support in RSS flow rules Adrien Mazarguil
@ 2017-10-12 12:19   ` Adrien Mazarguil
  2017-10-12 19:12   ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Ferruh Yigit
  29 siblings, 0 replies; 64+ messages in thread
From: Adrien Mazarguil @ 2017-10-12 12:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Nelio Laranjeiro, dev

Bring back support for automatic RSS with the default flow rules when not
in isolated mode. Balancing is done according to unspecified default
settings, as was the case before this entire rework.

Since the number of queues in an RSS context is limited to powers of two,
the number of configured queues is rounded down to the previous power of
two; extra queues are silently left without default RSS. This does not prevent
dedicated flow rules from targeting them.
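
A worked example of the rounding performed in the hunk below
("nb_rx_queues" stands for the configured Rx queue count):

	/*
	 * With nb_rx_queues == 6:
	 * rte_align32pow2(6 + 1) == 8 and 8 >> 1 == 4, so queues 0-3
	 * are covered by default RSS while queues 4 and 5 are left out.
	 */
	uint32_t queues = rte_align32pow2(nb_rx_queues + 1) >> 1;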

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 4c498f0..5c4bf8e 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -1256,12 +1256,21 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 			.type = RTE_FLOW_ITEM_TYPE_END,
 		},
 	};
+	/*
+	 * Round number of queues down to their previous power of 2 to
+	 * comply with RSS context limitations. Extra queues silently do not
+	 * get RSS by default.
+	 */
+	uint32_t queues =
+		rte_align32pow2(priv->dev->data->nb_rx_queues + 1) >> 1;
+	alignas(struct rte_flow_action_rss) uint8_t rss_conf_data
+		[offsetof(struct rte_flow_action_rss, queue) +
+		 sizeof(((struct rte_flow_action_rss *)0)->queue[0]) * queues];
+	struct rte_flow_action_rss *rss_conf = (void *)rss_conf_data;
 	struct rte_flow_action actions[] = {
 		{
-			.type = RTE_FLOW_ACTION_TYPE_QUEUE,
-			.conf = &(struct rte_flow_action_queue){
-				.index = 0,
-			},
+			.type = RTE_FLOW_ACTION_TYPE_RSS,
+			.conf = rss_conf,
 		},
 		{
 			.type = RTE_FLOW_ACTION_TYPE_END,
@@ -1281,6 +1290,13 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	unsigned int i;
 	int err = 0;
 
+	/* Prepare default RSS configuration. */
+	*rss_conf = (struct rte_flow_action_rss){
+		.rss_conf = NULL, /* Rely on default fallback settings. */
+		.num = queues,
+	};
+	for (i = 0; i != queues; ++i)
+		rss_conf->queue[i] = i;
 	/*
 	 * Set up VLAN item if filtering is enabled and at least one VLAN
 	 * filter is configured.
-- 
2.1.4


* Re: [PATCH v2 00/29] net/mlx4: restore PMD functionality
  2017-10-12 12:19 ` [PATCH v2 00/29] net/mlx4: restore PMD functionality Adrien Mazarguil
                     ` (28 preceding siblings ...)
  2017-10-12 12:19   ` [PATCH v2 29/29] net/mlx4: add RSS support outside flow API Adrien Mazarguil
@ 2017-10-12 19:12   ` Ferruh Yigit
  29 siblings, 0 replies; 64+ messages in thread
From: Ferruh Yigit @ 2017-10-12 19:12 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Nelio Laranjeiro, dev

On 10/12/2017 1:19 PM, Adrien Mazarguil wrote:
> This series restores all the control path functionality removed in prior
> series "net/mlx4: trim and refactor entire PMD", including:
> 
> - Promiscuous mode.
> - All multicast mode.
> - MAC address configuration.
> - Support for multiple simultaneous MAC addresses.
> - Reception of broadcast and user-defined multicast traffic.
> - VLAN filters.
> - RSS.
> 
> This rework also results in the following enhancements:
> 
> - Support for multiple flow rule priorities (up to 4096).
> - Much more comprehensive error messages when failing to create or apply
>   flow rules.
> - Flow rules with the RSS action targeting disparate queues can now overlap
>   (as long as they take HW limitations into account).
> - RSS contexts can be created/destroyed on demand (they were previously
>   fixed once and for all after applying the first flow rule).
> - RSS hash key can be configured per context.
> - Rx objects have a smaller memory footprint.
> 
> Note that it should be applied directly before the following series:
> 
>  "new mlx4 datapath bypassing ibverbs"
> 
> For which a new version based on top of this one will be submitted soon.
> 
> v2 changes:
> 
> - Moved new memory allocation wrappers from EAL into the mlx4 PMD.
> - Re-based on latest dpdk-next-net.
> 
> Adrien Mazarguil (29):
>   ethdev: expose flow API error helper
>   net/mlx4: replace bit-field type
>   net/mlx4: remove Rx QP initializer function
>   net/mlx4: enhance header files comments
>   net/mlx4: expose support for flow rule priorities
>   net/mlx4: clarify flow objects naming scheme
>   net/mlx4: tidy up flow rule handling code
>   net/mlx4: compact flow rule error reporting
>   net/mlx4: add iovec-like allocation wrappers
>   net/mlx4: merge flow creation and validation code
>   net/mlx4: allocate drop flow resources on demand
>   net/mlx4: relax check on missing flow rule target
>   net/mlx4: refactor internal flow rules
>   net/mlx4: generalize flow rule priority support
>   net/mlx4: simplify trigger code for flow rules
>   net/mlx4: refactor flow item validation code
>   net/mlx4: add MAC addresses configuration support
>   net/mlx4: add VLAN filter configuration support
>   net/mlx4: add flow support for multicast traffic
>   net/mlx4: restore promisc and allmulti support
>   net/mlx4: update Rx/Tx callbacks consistently
>   net/mlx4: fix invalid errno value sign
>   net/mlx4: drop live queue reconfiguration support
>   net/mlx4: allocate queues and mbuf rings together
>   net/mlx4: convert Rx path to work queues
>   net/mlx4: remove unnecessary check
>   net/mlx4: add RSS flow rule action support
>   net/mlx4: disable UDP support in RSS flow rules
>   net/mlx4: add RSS support outside flow API

Series applied to dpdk-next-net/master, thanks.
