* [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD
@ 2017-08-01 16:53 Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 01/48] net/mlx4: add consistency to copyright notices Adrien Mazarguil
                   ` (49 more replies)
  0 siblings, 50 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

The main purpose of this large series is to relieve the mlx4 PMD from its
dependency on Mellanox OFED to instead rely on the standard rdma-core
package provided by Linux distributions.

While compatibility with Mellanox OFED is preserved, all nonstandard
functionality has to be stripped from the PMD in order to re-implement it
through an approach compatible with rdma-core.

Due to the amount of changes necessary to achieve this goal, this rework
starts off by removing extraneous code to simplify the PMD as much as
possible before either replacing or dismantling functionality that relies on
nonstandard Verbs.

What remains after applying this series is single-segment Tx/Rx support,
without offloads or RSS, on the default MAC address (which cannot be
configured). Support for multiple queues and the flow API (minus the RSS
action) is also preserved.

Missing functionality that needs substantial work will be restored later by
subsequent series.

Also, because the mlx4 PMD is mostly contained in a single very large source
file of 6400+ lines (mlx4.c) that has become extremely difficult to
maintain, this rework is used as an opportunity to finally group functions
into separate files, as in mlx5.

This rework targets DPDK 17.11.

Adrien Mazarguil (48):
  net/mlx4: add consistency to copyright notices
  net/mlx4: remove limitation on number of instances
  net/mlx4: check max number of ports dynamically
  net/mlx4: remove useless compilation checks
  net/mlx4: remove secondary process support
  net/mlx4: remove useless code
  net/mlx4: remove soft counters compilation option
  net/mlx4: remove scatter mode compilation option
  net/mlx4: remove Tx inline compilation option
  net/mlx4: remove allmulti and promisc support
  net/mlx4: remove VLAN filter support
  net/mlx4: remove MAC address configuration support
  net/mlx4: drop MAC flows affecting all Rx queues
  net/mlx4: revert flow API RSS support
  net/mlx4: revert RSS parent queue refactoring
  net/mlx4: drop RSS support
  net/mlx4: drop checksum offloads support
  net/mlx4: drop packet type recognition support
  net/mlx4: drop scatter/gather support
  net/mlx4: drop inline receive support
  net/mlx4: use standard QP attributes
  net/mlx4: revert resource domain support
  net/mlx4: revert multicast echo prevention
  net/mlx4: revert fast Verbs interface for Tx
  net/mlx4: revert fast Verbs interface for Rx
  net/mlx4: simplify link update function
  net/mlx4: standardize on negative errno values
  net/mlx4: clean up coding style inconsistencies
  net/mlx4: remove control path locks
  net/mlx4: remove unnecessary wrapper functions
  net/mlx4: remove mbuf macro definitions
  net/mlx4: use standard macro to get array size
  net/mlx4: separate debugging macros
  net/mlx4: use a single interrupt handle
  net/mlx4: rename alarm field
  net/mlx4: refactor interrupt FD settings
  net/mlx4: clean up interrupt functions prototypes
  net/mlx4: compact interrupt functions
  net/mlx4: separate interrupt handling
  net/mlx4: separate Rx/Tx definitions
  net/mlx4: separate Rx/Tx functions
  net/mlx4: separate device control functions
  net/mlx4: separate Tx configuration functions
  net/mlx4: separate Rx configuration functions
  net/mlx4: group flow API handlers in common file
  net/mlx4: rename private functions in flow API
  net/mlx4: separate memory management functions
  net/mlx4: clean up includes and comments

 config/common_base                |    3 -
 doc/guides/nics/features/mlx4.ini |   13 -
 doc/guides/nics/mlx4.rst          |   37 +-
 drivers/net/mlx4/Makefile         |   41 +-
 drivers/net/mlx4/mlx4.c           | 6370 ++------------------------------
 drivers/net/mlx4/mlx4.h           |  322 +-
 drivers/net/mlx4/mlx4_ethdev.c    |  792 ++++
 drivers/net/mlx4/mlx4_flow.c      |  457 +--
 drivers/net/mlx4/mlx4_flow.h      |   51 +-
 drivers/net/mlx4/mlx4_intr.c      |  377 ++
 drivers/net/mlx4/mlx4_mr.c        |  183 +
 drivers/net/mlx4/mlx4_rxq.c       |  632 ++++
 drivers/net/mlx4/mlx4_rxtx.c      |  533 +++
 drivers/net/mlx4/mlx4_rxtx.h      |  164 +
 drivers/net/mlx4/mlx4_txq.c       |  472 +++
 drivers/net/mlx4/mlx4_utils.c     |   66 +
 drivers/net/mlx4/mlx4_utils.h     |  105 +
 17 files changed, 3824 insertions(+), 6794 deletions(-)
 create mode 100644 drivers/net/mlx4/mlx4_ethdev.c
 create mode 100644 drivers/net/mlx4/mlx4_intr.c
 create mode 100644 drivers/net/mlx4/mlx4_mr.c
 create mode 100644 drivers/net/mlx4/mlx4_rxq.c
 create mode 100644 drivers/net/mlx4/mlx4_rxtx.c
 create mode 100644 drivers/net/mlx4/mlx4_rxtx.h
 create mode 100644 drivers/net/mlx4/mlx4_txq.c
 create mode 100644 drivers/net/mlx4/mlx4_utils.c
 create mode 100644 drivers/net/mlx4/mlx4_utils.h

-- 
2.1.4


* [PATCH v1 01/48] net/mlx4: add consistency to copyright notices
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 02/48] net/mlx4: remove limitation on number of instances Adrien Mazarguil
                   ` (48 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

Copyright lasts long enough not to require notices to be updated yearly.

The current approach of updating them occasionally while working on
unrelated tasks should be deprecated in favor of dedicated commits updating
all files at once when necessary.

Standardize on a single year per copyright owner.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/mlx4.rst     | 2 +-
 drivers/net/mlx4/Makefile    | 4 ++--
 drivers/net/mlx4/mlx4.c      | 4 ++--
 drivers/net/mlx4/mlx4.h      | 4 ++--
 drivers/net/mlx4/mlx4_flow.c | 2 +-
 drivers/net/mlx4/mlx4_flow.h | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index d5bf2b3..4c8c299 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-    Copyright 2012-2015 6WIND S.A.
+    Copyright 2012 6WIND S.A.
     Copyright 2015 Mellanox
 
     Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 755c8a4..ce4b244 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -1,7 +1,7 @@
 #   BSD LICENSE
 #
-#   Copyright 2012-2015 6WIND S.A.
-#   Copyright 2012 Mellanox.
+#   Copyright 2012 6WIND S.A.
+#   Copyright 2012 Mellanox
 #
 #   Redistribution and use in source and binary forms, with or without
 #   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 055de49..b5a7607 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2012-2017 6WIND S.A.
- *   Copyright 2012-2017 Mellanox.
+ *   Copyright 2012 6WIND S.A.
+ *   Copyright 2012 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index a2e0ae7..6421c91 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2012-2017 6WIND S.A.
- *   Copyright 2012-2017 Mellanox.
+ *   Copyright 2012 6WIND S.A.
+ *   Copyright 2012 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 925c89c..ab37e7d 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -2,7 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright 2017 6WIND S.A.
- *   Copyright 2017 Mellanox.
+ *   Copyright 2017 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index beabcf2..4654dc2 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -2,7 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright 2017 6WIND S.A.
- *   Copyright 2017 Mellanox.
+ *   Copyright 2017 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
-- 
2.1.4


* [PATCH v1 02/48] net/mlx4: remove limitation on number of instances
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 01/48] net/mlx4: add consistency to copyright notices Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 03/48] net/mlx4: check max number of ports dynamically Adrien Mazarguil
                   ` (47 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

The seemingly artificial limitation on the maximum number of instances for
this PMD is a historical leftover that predates its first public release.

It was used as a workaround to support multiple physical ports on a PCI
device exposing a single bus address when mlx4 was implemented directly as
an Ethernet device driver instead of a PCI driver spawning Ethernet
devices.

Getting rid of it simplifies device initialization.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 57 +++-----------------------------------------
 1 file changed, 3 insertions(+), 54 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b5a7607..0ae78e0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -5444,40 +5444,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-/* Support up to 32 adapters. */
-static struct {
-	struct rte_pci_addr pci_addr; /* associated PCI address */
-	uint32_t ports; /* physical ports bitfield. */
-} mlx4_dev[32];
-
-/**
- * Get device index in mlx4_dev[] from PCI bus address.
- *
- * @param[in] pci_addr
- *   PCI bus address to look for.
- *
- * @return
- *   mlx4_dev[] index on success, -1 on failure.
- */
-static int
-mlx4_dev_idx(struct rte_pci_addr *pci_addr)
-{
-	unsigned int i;
-	int ret = -1;
-
-	assert(pci_addr != NULL);
-	for (i = 0; (i != elemof(mlx4_dev)); ++i) {
-		if ((mlx4_dev[i].pci_addr.domain == pci_addr->domain) &&
-		    (mlx4_dev[i].pci_addr.bus == pci_addr->bus) &&
-		    (mlx4_dev[i].pci_addr.devid == pci_addr->devid) &&
-		    (mlx4_dev[i].pci_addr.function == pci_addr->function))
-			return i;
-		if ((mlx4_dev[i].ports == 0) && (ret == -1))
-			ret = i;
-	}
-	return ret;
-}
-
 /**
  * Retrieve integer value from environment variable.
  *
@@ -6060,21 +6026,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		.active_ports = 0,
 	};
 	unsigned int vf;
-	int idx;
 	int i;
 
 	(void)pci_drv;
 	assert(pci_drv == &mlx4_driver);
-	/* Get mlx4_dev[] index. */
-	idx = mlx4_dev_idx(&pci_dev->addr);
-	if (idx == -1) {
-		ERROR("this driver cannot support any more adapters");
-		return -ENOMEM;
-	}
-	DEBUG("using driver device index %d", idx);
 
-	/* Save PCI address. */
-	mlx4_dev[idx].pci_addr = pci_dev->addr;
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
 		assert(errno);
@@ -6141,7 +6097,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	for (i = 0; i < device_attr.phys_port_cnt; i++) {
 		uint32_t port = i + 1; /* ports are indexed from one */
-		uint32_t test = (1 << i);
 		struct ibv_context *ctx = NULL;
 		struct ibv_port_attr port_attr;
 		struct ibv_pd *pd = NULL;
@@ -6162,7 +6117,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* RSS_SUPPORT */
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
-		DEBUG("using port %u (%08" PRIx32 ")", port, test);
+		DEBUG("using port %u", port);
 
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
@@ -6198,8 +6153,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		mlx4_dev[idx].ports |= test;
-
 		/* from rte_ethdev.c */
 		priv = rte_zmalloc("ethdev private structure",
 				   sizeof(*priv),
@@ -6405,6 +6358,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			rte_eth_dev_release_port(eth_dev);
 		break;
 	}
+	if (i == device_attr.phys_port_cnt)
+		return 0;
 
 	/*
 	 * XXX if something went wrong in the loop above, there is a resource
@@ -6413,12 +6368,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	 * way to enumerate the registered ethdevs to free the previous ones.
 	 */
 
-	/* no port found, complain */
-	if (!mlx4_dev[idx].ports) {
-		err = ENODEV;
-		goto error;
-	}
-
 error:
 	if (attr_ctx)
 		claim_zero(ibv_close_device(attr_ctx));
-- 
2.1.4


* [PATCH v1 03/48] net/mlx4: check max number of ports dynamically
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 01/48] net/mlx4: add consistency to copyright notices Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 02/48] net/mlx4: remove limitation on number of instances Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 17:35   ` Legacy, Allain
  2017-08-01 16:53 ` [PATCH v1 04/48] net/mlx4: remove useless compilation checks Adrien Mazarguil
                   ` (46 subsequent siblings)
  49 siblings, 1 reply; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev; +Cc: Gaëtan Rivet, Allain Legacy

Use the maximum number of ports reported by hardware capabilities as a
replacement for the static check on MLX4_PMD_MAX_PHYS_PORTS.

Cc: Gaëtan Rivet <gaetan.rivet@6wind.com>
Cc: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
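The check described above can be sketched in isolation as follows. This is
only a minimal illustration under stated assumptions, not the code from this
patch: struct port_conf, ports_present_mask() and port_arg_parse() are
hypothetical names standing in for struct mlx4_conf and mlx4_arg_parse(), and
no DPDK or Verbs headers are needed. Two details differ from the diff below
on purpose. The handler keeps the generic (key, value, opaque) prototype, so
it could be handed to a key/value parser without a function pointer cast, and
the index is range-checked before shifting, since shifting a 32-bit value by
32 or more is undefined in C.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mlx4_conf. */
struct port_conf {
	uint32_t present; /* Bit-field for existing ports. */
	uint32_t enabled; /* Bit-field for user-enabled ports. */
};

/* Build the "present" bit-field from the port count reported by hardware. */
static uint32_t
ports_present_mask(unsigned int phys_port_cnt)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < phys_port_cnt && i < 32; ++i)
		mask |= UINT32_C(1) << i;
	return mask;
}

/*
 * Validate a user-provided port index against the "present" bit-field and
 * record it in "enabled". Returns 0 on success, -EINVAL otherwise.
 */
static int
port_arg_parse(const char *key, const char *val, void *opaque)
{
	struct port_conf *conf = opaque;
	unsigned long tmp;
	char *end;

	(void)key;
	assert(conf != NULL);
	if (val == NULL)
		return -EINVAL;
	errno = 0;
	tmp = strtoul(val, &end, 0);
	/* Reject conversion errors, empty input and trailing garbage. */
	if (errno || end == val || *end != '\0')
		return -EINVAL;
	/* Bound the index before shifting, then check the port exists. */
	if (tmp >= 32 || !(conf->present & (UINT32_C(1) << tmp)))
		return -EINVAL;
	conf->enabled |= UINT32_C(1) << tmp;
	return 0;
}
```

With a two-port device, ports_present_mask(2) yields 0x3, so index 1 is
accepted while indices 2 and 40 are rejected with -EINVAL. As in the patch,
a caller would fall back to enabled = present when no port was requested.
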
 drivers/net/mlx4/mlx4.c | 43 +++++++++++++++++++++++++------------------
 drivers/net/mlx4/mlx4.h |  3 ---
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0ae78e0..e28928c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -118,8 +118,12 @@ struct mlx4_secondary_data {
 	rte_spinlock_t lock; /* Port configuration lock. */
 } mlx4_secondary_data[RTE_MAX_ETHPORTS];
 
+/** Configuration structure for device arguments. */
 struct mlx4_conf {
-	uint8_t active_ports;
+	struct {
+		uint32_t present; /**< Bit-field for existing ports. */
+		uint32_t enabled; /**< Bit-field for user-enabled ports. */
+	} ports;
 };
 
 /* Available parameters list. */
@@ -5927,16 +5931,15 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
  *   Key argument to verify.
  * @param[in] val
  *   Value associated with key.
- * @param out
- *   User data.
+ * @param[in, out] conf
+ *   Shared configuration data.
  *
  * @return
  *   0 on success, negative errno value on failure.
  */
 static int
-mlx4_arg_parse(const char *key, const char *val, void *out)
+mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 {
-	struct mlx4_conf *conf = out;
 	unsigned long tmp;
 
 	errno = 0;
@@ -5946,12 +5949,11 @@ mlx4_arg_parse(const char *key, const char *val, void *out)
 		return -errno;
 	}
 	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
-		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
-			ERROR("invalid port index %lu (max: %u)",
-				tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
+		if (!(conf->ports.present & (1 << tmp))) {
+			ERROR("invalid port index %lu", tmp);
 			return -EINVAL;
 		}
-		conf->active_ports |= 1 << tmp;
+		conf->ports.enabled |= 1 << tmp;
 	} else {
 		WARN("%s: unknown parameter", key);
 		return -EINVAL;
@@ -5987,8 +5989,13 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
 		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
 		while (arg_count-- > 0) {
-			ret = rte_kvargs_process(kvlist, MLX4_PMD_PORT_KVARG,
-					mlx4_arg_parse, conf);
+			ret = rte_kvargs_process(kvlist,
+						 MLX4_PMD_PORT_KVARG,
+						 (int (*)(const char *,
+							  const char *,
+							  void *))
+						 mlx4_arg_parse,
+						 conf);
 			if (ret != 0)
 				goto free_kvlist;
 		}
@@ -6023,7 +6030,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_context *attr_ctx = NULL;
 	struct ibv_device_attr device_attr;
 	struct mlx4_conf conf = {
-		.active_ports = 0,
+		.ports.present = 0,
 	};
 	unsigned int vf;
 	int i;
@@ -6085,16 +6092,16 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
 
+	for (i = 0; i < device_attr.phys_port_cnt; ++i)
+		conf.ports.present |= 1 << i;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
 		ERROR("failed to process device arguments");
 		err = EINVAL;
 		goto error;
 	}
 	/* Use all ports when none are defined */
-	if (conf.active_ports == 0) {
-		for (i = 0; i < MLX4_PMD_MAX_PHYS_PORTS; i++)
-			conf.active_ports |= 1 << i;
-	}
+	if (!conf.ports.enabled)
+		conf.ports.enabled = conf.ports.present;
 	for (i = 0; i < device_attr.phys_port_cnt; i++) {
 		uint32_t port = i + 1; /* ports are indexed from one */
 		struct ibv_context *ctx = NULL;
@@ -6107,8 +6114,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 		struct ether_addr mac;
 
-		/* If port is not active, skip. */
-		if (!(conf.active_ports & (1 << i)))
+		/* If port is not enabled, skip. */
+		if (!(conf.ports.enabled & (1 << i)))
 			continue;
 #ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6421c91..25a7212 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -81,9 +81,6 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum number of physical ports. */
-#define MLX4_PMD_MAX_PHYS_PORTS 2
-
 /* Maximum number of Scatter/Gather Elements per Work Request. */
 #ifndef MLX4_PMD_SGE_WR_N
 #define MLX4_PMD_SGE_WR_N 4
-- 
2.1.4


* [PATCH v1 04/48] net/mlx4: remove useless compilation checks
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (2 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 03/48] net/mlx4: check max number of ports dynamically Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-18 13:39   ` Ferruh Yigit
  2017-08-01 16:53 ` [PATCH v1 05/48] net/mlx4: remove secondary process support Adrien Mazarguil
                   ` (45 subsequent siblings)
  49 siblings, 1 reply; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

Verbs support for RSS, inline receive and extended device query calls has
not been optional for a while. Their absence is untested and is therefore
unsupported.

Remove the related compilation checks and assume Mellanox OFED is up to
date, as described in the documentation.

Use this opportunity to remove a few useless data path debugging messages
guarded by compilation checks on macros that are never defined.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    | 12 ------------
 drivers/net/mlx4/mlx4.c      | 35 -----------------------------------
 drivers/net/mlx4/mlx4.h      |  2 --
 drivers/net/mlx4/mlx4_flow.c |  3 ---
 4 files changed, 52 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index ce4b244..ab2a867 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -100,18 +100,6 @@ mlx4_autoconf.h.new: FORCE
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
 	$Q sh -- '$<' '$@' \
-		RSS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		INLINE_RECV \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_EXP_QUERY_DEVICE \
-		infiniband/verbs.h \
-		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
 		HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
 		infiniband/verbs.h \
 		enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e28928c..e6fc204 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1103,10 +1103,6 @@ txq_complete(struct txq *txq)
 
 	if (unlikely(elts_comp == 0))
 		return 0;
-#ifdef DEBUG_SEND
-	DEBUG("%p: processing %u work requests completions",
-	      (void *)txq, elts_comp);
-#endif
 	wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
 	if (unlikely(wcs_n == 0))
 		return 0;
@@ -3155,9 +3151,6 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return 0;
 	*next = NULL;
 	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %d WRs", (void *)rxq, i);
-#endif
 	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
@@ -3318,9 +3311,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	if (unlikely(i == 0))
 		return 0;
 	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %u WRs", (void *)rxq, i);
-#endif
 	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
@@ -3418,15 +3408,11 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-#ifdef INLINE_RECV
 	attr.max_inl_recv = priv->inl_recv_size;
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-#endif
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
-#ifdef RSS_SUPPORT
-
 /**
  * Allocate a RSS Queue Pair.
  * Optionally setup inline receive if supported.
@@ -3474,10 +3460,8 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-#ifdef INLINE_RECV
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-#endif
 	if (children_n > 0) {
 		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
 		/* TSS isn't necessary. */
@@ -3493,8 +3477,6 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
-#endif /* RSS_SUPPORT */
-
 /**
  * Reconfigure a RX queue with new parameters.
  *
@@ -3728,13 +3710,11 @@ rxq_create_qp(struct rxq *rxq,
 	int parent = (children_n > 0);
 	struct priv *priv = rxq->priv;
 
-#ifdef RSS_SUPPORT
 	if (priv->rss && !inactive && (rxq_parent || parent))
 		rxq->qp = rxq_setup_qp_rss(priv, rxq->cq, desc,
 					   children_n, rxq->rd,
 					   rxq_parent);
 	else
-#endif /* RSS_SUPPORT */
 		rxq->qp = rxq_setup_qp(priv, rxq->cq, desc, rxq->rd);
 	if (rxq->qp == NULL) {
 		ret = (errno ? errno : EINVAL);
@@ -3750,9 +3730,7 @@ rxq_create_qp(struct rxq *rxq,
 	};
 	ret = ibv_exp_modify_qp(rxq->qp, &mod,
 				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
 				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
 				 IBV_EXP_QP_PORT));
 	if (ret) {
 		ERROR("QP state to IBV_QPS_INIT failed: %s",
@@ -6109,20 +6087,14 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ibv_pd *pd = NULL;
 		struct priv *priv = NULL;
 		struct rte_eth_dev *eth_dev = NULL;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		struct ibv_exp_device_attr exp_device_attr;
-#endif /* HAVE_EXP_QUERY_DEVICE */
 		struct ether_addr mac;
 
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-#ifdef RSS_SUPPORT
 		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
-#endif /* RSS_SUPPORT */
-#endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u", port);
 
@@ -6175,13 +6147,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		if (ibv_exp_query_device(ctx, &exp_device_attr)) {
 			ERROR("ibv_exp_query_device() failed");
 			err = ENODEV;
 			goto port_error;
 		}
-#ifdef RSS_SUPPORT
 		if ((exp_device_attr.exp_device_cap_flags &
 		     IBV_EXP_DEVICE_QPG) &&
 		    (exp_device_attr.exp_device_cap_flags &
@@ -6206,7 +6176,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (priv->hw_rss)
 			DEBUG("maximum RSS indirection table size: %u",
 			      exp_device_attr.max_rss_tbl_sz);
-#endif /* RSS_SUPPORT */
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
@@ -6221,7 +6190,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		DEBUG("L2 tunnel checksum offloads are %ssupported",
 		      (priv->hw_csum_l2tun ? "" : "not "));
 
-#ifdef INLINE_RECV
 		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
 
 		if (priv->inl_recv_size) {
@@ -6245,10 +6213,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			INFO("Set inline receive size to %u",
 			     priv->inl_recv_size);
 		}
-#endif /* INLINE_RECV */
-#endif /* HAVE_EXP_QUERY_DEVICE */
 
-		(void)mlx4_getenv_int;
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 25a7212..557b94d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -338,9 +338,7 @@ struct priv {
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-#ifdef INLINE_RECV
 	unsigned int inl_recv_size; /* Inline recv size */
-#endif
 	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index ab37e7d..f5c015e 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -37,9 +37,6 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 
-/* Generated configuration header. */
-#include "mlx4_autoconf.h"
-
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
-- 
2.1.4


* [PATCH v1 05/48] net/mlx4: remove secondary process support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (3 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 04/48] net/mlx4: remove useless compilation checks Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 06/48] net/mlx4: remove useless code Adrien Mazarguil
                   ` (44 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

The current implementation is partial (Tx only), inconvenient to use and
not of primary concern.

Remove this feature before refactoring the PMD.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   2 -
 drivers/net/mlx4/mlx4.c           | 349 +--------------------------------
 3 files changed, 8 insertions(+), 344 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 1d5f266..f6efd21 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -27,7 +27,6 @@ Inner L4 checksum    = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
-Multiprocess aware   = Y
 Other kdrv           = Y
 Power8               = Y
 x86-32               = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 4c8c299..04788f2 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -88,7 +88,6 @@ Features
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
-- Secondary process TX is supported.
 - RX interrupts.
 
 Limitations
@@ -99,7 +98,6 @@ Limitations
 - RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
   dissociated.
 - Hardware counters are not implemented (they are software counters).
-- Secondary process RX is not supported.
 
 Configuration
 -------------
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e6fc204..e938371 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -110,14 +110,6 @@ typedef union {
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Local storage for secondary process data. */
-struct mlx4_secondary_data {
-	struct rte_eth_dev_data data; /* Local device data. */
-	struct priv *primary_priv; /* Private structure from primary. */
-	struct rte_eth_dev_data *shared_dev_data; /* Shared device data. */
-	rte_spinlock_t lock; /* Port configuration lock. */
-} mlx4_secondary_data[RTE_MAX_ETHPORTS];
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -145,38 +137,6 @@ static void
 priv_rx_intr_vec_disable(struct priv *priv);
 
 /**
- * Check if running as a secondary process.
- *
- * @return
- *   Nonzero if running as a secondary process.
- */
-static inline int
-mlx4_is_secondary(void)
-{
-	return rte_eal_process_type() != RTE_PROC_PRIMARY;
-}
-
-/**
- * Return private structure associated with an Ethernet device.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   Pointer to private structure.
- */
-static struct priv *
-mlx4_get_priv(struct rte_eth_dev *dev)
-{
-	struct mlx4_secondary_data *sd;
-
-	if (!mlx4_is_secondary())
-		return dev->data->dev_private;
-	sd = &mlx4_secondary_data[dev->data->port_id];
-	return sd->data.dev_private;
-}
-
-/**
  * Lock private structure to protect it from concurrent access in the
  * control path.
  *
@@ -734,8 +694,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	ret = dev_configure(dev);
 	assert(ret >= 0);
@@ -746,157 +704,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
 static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
 
-/**
- * Configure secondary process queues from a private data pointer (primary
- * or secondary) and update burst callbacks. Can take place only once.
- *
- * All queues must have been previously created by the primary process to
- * avoid undefined behavior.
- *
- * @param priv
- *   Private data pointer from either primary or secondary process.
- *
- * @return
- *   Private data pointer from secondary process, NULL in case of error.
- */
-static struct priv *
-mlx4_secondary_data_setup(struct priv *priv)
-{
-	unsigned int port_id = 0;
-	struct mlx4_secondary_data *sd;
-	void **tx_queues;
-	void **rx_queues;
-	unsigned int nb_tx_queues;
-	unsigned int nb_rx_queues;
-	unsigned int i;
-
-	/* priv must be valid at this point. */
-	assert(priv != NULL);
-	/* priv->dev must also be valid but may point to local memory from
-	 * another process, possibly with the same address and must not
-	 * be dereferenced yet. */
-	assert(priv->dev != NULL);
-	/* Determine port ID by finding out where priv comes from. */
-	while (1) {
-		sd = &mlx4_secondary_data[port_id];
-		rte_spinlock_lock(&sd->lock);
-		/* Primary process? */
-		if (sd->primary_priv == priv)
-			break;
-		/* Secondary process? */
-		if (sd->data.dev_private == priv)
-			break;
-		rte_spinlock_unlock(&sd->lock);
-		if (++port_id == RTE_DIM(mlx4_secondary_data))
-			port_id = 0;
-	}
-	/* Switch to secondary private structure. If private data has already
-	 * been updated by another thread, there is nothing else to do. */
-	priv = sd->data.dev_private;
-	if (priv->dev->data == &sd->data)
-		goto end;
-	/* Sanity checks. Secondary private structure is supposed to point
-	 * to local eth_dev, itself still pointing to the shared device data
-	 * structure allocated by the primary process. */
-	assert(sd->shared_dev_data != &sd->data);
-	assert(sd->data.nb_tx_queues == 0);
-	assert(sd->data.tx_queues == NULL);
-	assert(sd->data.nb_rx_queues == 0);
-	assert(sd->data.rx_queues == NULL);
-	assert(priv != sd->primary_priv);
-	assert(priv->dev->data == sd->shared_dev_data);
-	assert(priv->txqs_n == 0);
-	assert(priv->txqs == NULL);
-	assert(priv->rxqs_n == 0);
-	assert(priv->rxqs == NULL);
-	nb_tx_queues = sd->shared_dev_data->nb_tx_queues;
-	nb_rx_queues = sd->shared_dev_data->nb_rx_queues;
-	/* Allocate local storage for queues. */
-	tx_queues = rte_zmalloc("secondary ethdev->tx_queues",
-				sizeof(sd->data.tx_queues[0]) * nb_tx_queues,
-				RTE_CACHE_LINE_SIZE);
-	rx_queues = rte_zmalloc("secondary ethdev->rx_queues",
-				sizeof(sd->data.rx_queues[0]) * nb_rx_queues,
-				RTE_CACHE_LINE_SIZE);
-	if (tx_queues == NULL || rx_queues == NULL)
-		goto error;
-	/* Lock to prevent control operations during setup. */
-	priv_lock(priv);
-	/* TX queues. */
-	for (i = 0; i != nb_tx_queues; ++i) {
-		struct txq *primary_txq = (*sd->primary_priv->txqs)[i];
-		struct txq *txq;
-
-		if (primary_txq == NULL)
-			continue;
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0,
-					primary_txq->socket);
-		if (txq != NULL) {
-			if (txq_setup(priv->dev,
-				      txq,
-				      primary_txq->elts_n * MLX4_PMD_SGE_WR_N,
-				      primary_txq->socket,
-				      NULL) == 0) {
-				txq->stats.idx = primary_txq->stats.idx;
-				tx_queues[i] = txq;
-				continue;
-			}
-			rte_free(txq);
-		}
-		while (i) {
-			txq = tx_queues[--i];
-			txq_cleanup(txq);
-			rte_free(txq);
-		}
-		goto error;
-	}
-	/* RX queues. */
-	for (i = 0; i != nb_rx_queues; ++i) {
-		struct rxq *primary_rxq = (*sd->primary_priv->rxqs)[i];
-
-		if (primary_rxq == NULL)
-			continue;
-		/* Not supported yet. */
-		rx_queues[i] = NULL;
-	}
-	/* Update everything. */
-	priv->txqs = (void *)tx_queues;
-	priv->txqs_n = nb_tx_queues;
-	priv->rxqs = (void *)rx_queues;
-	priv->rxqs_n = nb_rx_queues;
-	sd->data.rx_queues = rx_queues;
-	sd->data.tx_queues = tx_queues;
-	sd->data.nb_rx_queues = nb_rx_queues;
-	sd->data.nb_tx_queues = nb_tx_queues;
-	sd->data.dev_link = sd->shared_dev_data->dev_link;
-	sd->data.mtu = sd->shared_dev_data->mtu;
-	memcpy(sd->data.rx_queue_state, sd->shared_dev_data->rx_queue_state,
-	       sizeof(sd->data.rx_queue_state));
-	memcpy(sd->data.tx_queue_state, sd->shared_dev_data->tx_queue_state,
-	       sizeof(sd->data.tx_queue_state));
-	sd->data.dev_flags = sd->shared_dev_data->dev_flags;
-	/* Use local data from now on. */
-	rte_mb();
-	priv->dev->data = &sd->data;
-	rte_mb();
-	priv->dev->tx_pkt_burst = mlx4_tx_burst;
-	priv->dev->rx_pkt_burst = removed_rx_burst;
-	priv_unlock(priv);
-end:
-	/* More sanity checks. */
-	assert(priv->dev->tx_pkt_burst == mlx4_tx_burst);
-	assert(priv->dev->rx_pkt_burst == removed_rx_burst);
-	assert(priv->dev->data == &sd->data);
-	rte_spinlock_unlock(&sd->lock);
-	return priv;
-error:
-	priv_unlock(priv);
-	rte_free(tx_queues);
-	rte_free(rx_queues);
-	rte_spinlock_unlock(&sd->lock);
-	return NULL;
-}
-
 /* TX queues handling. */
 
 /**
@@ -1704,46 +1511,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 }
 
 /**
- * DPDK callback for TX in secondary processes.
- *
- * This function configures all queues from primary process information
- * if necessary before reverting to the normal TX burst callback.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-mlx4_tx_burst_secondary_setup(void *dpdk_txq, struct rte_mbuf **pkts,
-			      uint16_t pkts_n)
-{
-	struct txq *txq = dpdk_txq;
-	struct priv *priv = mlx4_secondary_data_setup(txq->priv);
-	struct priv *primary_priv;
-	unsigned int index;
-
-	if (priv == NULL)
-		return 0;
-	primary_priv =
-		mlx4_secondary_data[priv->dev->data->port_id].primary_priv;
-	/* Look for queue index in both private structures. */
-	for (index = 0; index != priv->txqs_n; ++index)
-		if (((*primary_priv->txqs)[index] == txq) ||
-		    ((*priv->txqs)[index] == txq))
-			break;
-	if (index == priv->txqs_n)
-		return 0;
-	txq = (*priv->txqs)[index];
-	return priv->dev->tx_pkt_burst(txq, pkts, pkts_n);
-}
-
-/**
  * Configure a TX queue.
  *
  * @param dev
@@ -1764,7 +1531,7 @@ static int
 txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	  unsigned int socket, const struct rte_eth_txconf *conf)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	struct txq tmpl = {
 		.priv = priv,
 		.socket = socket
@@ -1960,8 +1727,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct txq *txq = (*priv->txqs)[idx];
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
@@ -2017,8 +1782,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 	struct priv *priv;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	if (txq == NULL)
 		return;
 	priv = txq->priv;
@@ -3328,46 +3091,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 }
 
 /**
- * DPDK callback for RX in secondary processes.
- *
- * This function configures all queues from primary process information
- * if necessary before reverting to the normal RX burst callback.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst_secondary_setup(void *dpdk_rxq, struct rte_mbuf **pkts,
-			      uint16_t pkts_n)
-{
-	struct rxq *rxq = dpdk_rxq;
-	struct priv *priv = mlx4_secondary_data_setup(rxq->priv);
-	struct priv *primary_priv;
-	unsigned int index;
-
-	if (priv == NULL)
-		return 0;
-	primary_priv =
-		mlx4_secondary_data[priv->dev->data->port_id].primary_priv;
-	/* Look for queue index in both private structures. */
-	for (index = 0; index != priv->rxqs_n; ++index)
-		if (((*primary_priv->rxqs)[index] == rxq) ||
-		    ((*priv->rxqs)[index] == rxq))
-			break;
-	if (index == priv->rxqs_n)
-		return 0;
-	rxq = (*priv->rxqs)[index];
-	return priv->dev->rx_pkt_burst(rxq, pkts, pkts_n);
-}
-
-/**
  * Allocate a Queue Pair.
  * Optionally setup inline receive if supported.
  *
@@ -3998,8 +3721,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	int inactive = 0;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
@@ -4067,8 +3788,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	struct priv *priv;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	if (rxq == NULL)
 		return;
 	priv = rxq->priv;
@@ -4114,8 +3833,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	struct rxq *rxq;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	if (priv->started) {
 		priv_unlock(priv);
@@ -4206,8 +3923,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	unsigned int r;
 	struct rxq *rxq;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->started) {
 		priv_unlock(priv);
@@ -4309,7 +4024,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
 static void
 mlx4_dev_close(struct rte_eth_dev *dev)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	void *tmp;
 	unsigned int i;
 
@@ -4462,7 +4177,7 @@ mlx4_set_link_up(struct rte_eth_dev *dev)
 static void
 mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	unsigned int max;
 	char ifname[IF_NAMESIZE];
 
@@ -4539,7 +4254,7 @@ mlx4_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 static void
 mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	struct rte_eth_stats tmp = {0};
 	unsigned int i;
 	unsigned int idx;
@@ -4604,7 +4319,7 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 static void
 mlx4_stats_reset(struct rte_eth_dev *dev)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 	unsigned int idx;
 
@@ -4644,8 +4359,6 @@ mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
 {
 	struct priv *priv = dev->data->dev_private;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated)
 		goto end;
@@ -4678,8 +4391,6 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 	struct priv *priv = dev->data->dev_private;
 	int re;
 
-	if (mlx4_is_secondary())
-		return -ENOTSUP;
 	(void)vmdq;
 	priv_lock(priv);
 	if (priv->isolated) {
@@ -4732,8 +4443,6 @@ mlx4_promiscuous_enable(struct rte_eth_dev *dev)
 	unsigned int i;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot enable promiscuous, "
@@ -4786,8 +4495,6 @@ mlx4_promiscuous_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->promisc || priv->isolated) {
 		priv_unlock(priv);
@@ -4818,8 +4525,6 @@ mlx4_allmulticast_enable(struct rte_eth_dev *dev)
 	unsigned int i;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot enable allmulticast, "
@@ -4872,8 +4577,6 @@ mlx4_allmulticast_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->allmulti || priv->isolated) {
 		priv_unlock(priv);
@@ -4902,7 +4605,7 @@ mlx4_allmulticast_disable(struct rte_eth_dev *dev)
 static int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 {
-	const struct priv *priv = mlx4_get_priv(dev);
+	const struct priv *priv = dev->data->dev_private;
 	struct ethtool_cmd edata = {
 		.cmd = ETHTOOL_GSET
 	};
@@ -4976,8 +4679,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
 		mlx4_rx_burst;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
@@ -5059,8 +4760,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	};
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	ifr.ifr_data = (void *)&ethpause;
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
@@ -5109,8 +4808,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	};
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	ifr.ifr_data = (void *)&ethpause;
 	ethpause.autoneg = fc_conf->autoneg;
 	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
@@ -5250,8 +4947,6 @@ mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot set vlan filter, "
@@ -6263,36 +5958,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		/* Secondary processes have to use local storage for their
-		 * private data as well as a copy of eth_dev->data, but this
-		 * pointer must not be modified before burst functions are
-		 * actually called. */
-		if (mlx4_is_secondary()) {
-			struct mlx4_secondary_data *sd =
-				&mlx4_secondary_data[eth_dev->data->port_id];
-
-			sd->primary_priv = eth_dev->data->dev_private;
-			if (sd->primary_priv == NULL) {
-				ERROR("no private data for port %u",
-				      eth_dev->data->port_id);
-				err = EINVAL;
-				goto port_error;
-			}
-			sd->shared_dev_data = eth_dev->data;
-			rte_spinlock_init(&sd->lock);
-			memcpy(sd->data.name, sd->shared_dev_data->name,
-			       sizeof(sd->data.name));
-			sd->data.dev_private = priv;
-			sd->data.rx_mbuf_alloc_failed = 0;
-			sd->data.mtu = ETHER_MTU;
-			sd->data.port_id = sd->shared_dev_data->port_id;
-			sd->data.mac_addrs = priv->mac;
-			eth_dev->tx_pkt_burst = mlx4_tx_burst_secondary_setup;
-			eth_dev->rx_pkt_burst = mlx4_rx_burst_secondary_setup;
-		} else {
-			eth_dev->data->dev_private = priv;
-			eth_dev->data->mac_addrs = priv->mac;
-		}
+		eth_dev->data->dev_private = priv;
+		eth_dev->data->mac_addrs = priv->mac;
 		eth_dev->device = &pci_dev->device;
 
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
-- 
2.1.4
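
For readers unfamiliar with the mechanism removed by this patch: the
secondary-process burst callbacks configured queues on their first call and
then handed over to the normal burst function. Below is a minimal sketch of
that "set up on first call, then swap the callback" pattern. All names in it
(struct port, burst_setup, burst_real) are hypothetical stand-ins and not
part of the PMD; the real code also handled locking and queue lookup, which
is left out here.

```c
#include <stddef.h>
#include <stdint.h>

struct port;
typedef uint16_t (*burst_fn)(struct port *, uint16_t);

/* Hypothetical stand-in for rte_eth_dev: one burst callback plus state. */
struct port {
	burst_fn tx_burst; /* Current burst callback. */
	int configured;    /* Nonzero once lazy setup has run. */
};

/* Normal data path: pretend every packet is sent. */
static uint16_t
burst_real(struct port *p, uint16_t n)
{
	(void)p;
	return n;
}

/* First-call trampoline: configure, swap the callback, then forward. */
static uint16_t
burst_setup(struct port *p, uint16_t n)
{
	p->configured = 1;
	p->tx_burst = burst_real;
	return p->tx_burst(p, n);
}
```

A port would start with tx_burst pointing at burst_setup; after the first
burst, every later call goes straight to burst_real.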

* [PATCH v1 06/48] net/mlx4: remove useless code
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (4 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 05/48] net/mlx4: remove secondary process support Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 07/48] net/mlx4: remove soft counters compilation option Adrien Mazarguil
                   ` (43 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

Less code makes refactoring easier. No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 17 +----------------
 drivers/net/mlx4/mlx4.h | 12 ------------
 2 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e938371..2b5527b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -69,13 +69,13 @@
 #include <rte_malloc.h>
 #include <rte_spinlock.h>
 #include <rte_atomic.h>
-#include <rte_version.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
+#include <rte_branch_prediction.h>
 
 /* Generated configuration header. */
 #include "mlx4_autoconf.h"
@@ -4649,10 +4649,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	return -1;
 }
 
-static int
-mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
-			    struct rte_pci_addr *pci_addr);
-
 /**
  * DPDK callback to change the MTU.
  *
@@ -4998,10 +4994,6 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 			return -EINVAL;
 		*(const void **)arg = &mlx4_flow_ops;
 		return 0;
-	case RTE_ETH_FILTER_FDIR:
-		DEBUG("%p: filter type FDIR is not supported by this PMD",
-		      (void *)dev);
-		break;
 	default:
 		ERROR("%p: filter type (%d) not supported",
 		      (void *)dev, filter_type);
@@ -5024,22 +5016,15 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
-	.queue_stats_mapping_set = NULL,
 	.dev_infos_get = mlx4_dev_infos_get,
 	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx4_vlan_filter_set,
-	.vlan_tpid_set = NULL,
-	.vlan_strip_queue_set = NULL,
-	.vlan_offload_set = NULL,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
 	.tx_queue_release = mlx4_tx_queue_release,
-	.dev_led_on = NULL,
-	.dev_led_off = NULL,
 	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
 	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.priority_flow_ctrl_set = NULL,
 	.mac_addr_remove = mlx4_mac_addr_remove,
 	.mac_addr_add = mlx4_mac_addr_add,
 	.mac_addr_set = mlx4_mac_addr_set,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 557b94d..4b42626 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -34,7 +34,6 @@
 #ifndef RTE_PMD_MLX4_H_
 #define RTE_PMD_MLX4_H_
 
-#include <stddef.h>
 #include <stdint.h>
 #include <limits.h>
 
@@ -150,17 +149,6 @@ enum {
 /* Number of elements in array. */
 #define elemof(a) (sizeof(a) / sizeof((a)[0]))
 
-/* Cast pointer p to structure member m to its parent structure of type t. */
-#define containerof(p, t, m) ((t *)((uint8_t *)(p) - offsetof(t, m)))
-
-/* Branch prediction helpers. */
-#ifndef likely
-#define likely(c) __builtin_expect(!!(c), 1)
-#endif
-#ifndef unlikely
-#define unlikely(c) __builtin_expect(!!(c), 0)
-#endif
-
 /* Debugging */
 #ifndef NDEBUG
 #include <stdio.h>
-- 
2.1.4
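
As a note on why dropping the explicit NULL entries from mlx4_dev_ops has no
functional impact: in C, members omitted from an initializer list are
initialized as if they had static storage duration, so omitted pointers are
null. A small sketch with a hypothetical callback table (struct ops and
start_cb are illustrative names, not the ethdev API):

```c
#include <stddef.h>

/* Hypothetical callback table, standing in for struct eth_dev_ops. */
struct ops {
	int (*start)(void);
	int (*led_on)(void);
	int (*led_off)(void);
};

static int
start_cb(void)
{
	return 0;
}

/* Explicit NULL entries, as before the patch. */
static const struct ops ops_explicit = {
	.start = start_cb,
	.led_on = NULL,
	.led_off = NULL,
};

/* Omitted entries, as after the patch: implicitly initialized to NULL. */
static const struct ops ops_implicit = {
	.start = start_cb,
};
```

Both tables compare equal member by member, which is the property the patch
relies on.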

* [PATCH v1 07/48] net/mlx4: remove soft counters compilation option
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (5 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 06/48] net/mlx4: remove useless code Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 08/48] net/mlx4: remove scatter mode " Adrien Mazarguil
                   ` (42 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

Software counters are mandatory since hardware counters are not
implemented.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 config/common_base        |  1 -
 doc/guides/nics/mlx4.rst  |  6 ------
 drivers/net/mlx4/Makefile |  4 ----
 drivers/net/mlx4/mlx4.c   | 37 -------------------------------------
 drivers/net/mlx4/mlx4.h   | 12 ------------
 5 files changed, 60 deletions(-)

diff --git a/config/common_base b/config/common_base
index 7805605..d768804 100644
--- a/config/common_base
+++ b/config/common_base
@@ -216,7 +216,6 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
-CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 04788f2..729e6c1 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -97,7 +97,6 @@ Limitations
 - RSS RETA cannot be configured
 - RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
   dissociated.
-- Hardware counters are not implemented (they are software counters).
 
 Configuration
 -------------
@@ -137,11 +136,6 @@ These options can be modified in the ``.config`` file.
 
   This value is always 1 for RX queues since they use a single MP.
 
-- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**)
-
-  Toggle software counters. No counters are available if this option is
-  disabled since hardware counters are not supported.
-
 Environment variables
 ~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index ab2a867..2db9b10 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -80,10 +80,6 @@ ifdef CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE
 CFLAGS += -DMLX4_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE)
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS
-CFLAGS += -DMLX4_PMD_SOFT_COUNTERS=$(CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS)
-endif
-
 include $(RTE_SDK)/mk/rte.lib.mk
 
 # Generate and clean-up mlx4_autoconf.h.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 2b5527b..34ef80f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -34,7 +34,6 @@
 /*
  * Known limitations:
  * - RSS hash key and options cannot be modified.
- * - Hardware counters aren't implemented.
  */
 
 /* System headers. */
@@ -1372,9 +1371,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
 		unsigned int segs = NB_SEGS(buf);
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		unsigned int sent_size = 0;
-#endif
 		uint32_t send_flags = 0;
 
 		/* Clean up old buffer. */
@@ -1452,9 +1449,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					 send_flags);
 			if (unlikely(err))
 				goto stop;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			sent_size += length;
-#endif
 		} else {
 #if MLX4_PMD_SGE_WR_N > 1
 			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
@@ -1473,9 +1468,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				 send_flags);
 			if (unlikely(err))
 				goto stop;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			sent_size += ret.length;
-#endif
 #else /* MLX4_PMD_SGE_WR_N > 1 */
 			DEBUG("%p: TX scattered buffers support not"
 			      " compiled in", (void *)txq);
@@ -1483,19 +1476,15 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif /* MLX4_PMD_SGE_WR_N > 1 */
 		}
 		elts_head = elts_head_next;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
-#endif
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
 	if (unlikely(i == 0))
 		return 0;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
-#endif
 	/* Ring QP doorbell. */
 	err = txq->if_qp->send_flush(txq->qp);
 	if (unlikely(err)) {
@@ -2786,10 +2775,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				      " completion status (%d): %s",
 				      (void *)rxq, wc.wr_id, wc.status,
 				      ibv_wc_status_str(wc.status));
-#ifdef MLX4_PMD_SOFT_COUNTERS
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
-#endif
 				/* Link completed WRs together for repost. */
 				*next = wr;
 				next = &wr->next;
@@ -2901,10 +2888,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Return packet. */
 		*(pkts++) = pkt_buf;
 		++pkts_ret;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increase bytes counter. */
 		rxq->stats.ibytes += pkt_buf_len;
-#endif
 repost:
 		if (++elts_head >= elts_n)
 			elts_head = 0;
@@ -2924,10 +2909,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		abort();
 	}
 	rxq->elts_head = elts_head;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increase packets counter. */
 	rxq->stats.ipackets += pkts_ret;
-#endif
 	return pkts_ret;
 }
 
@@ -3008,10 +2991,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				      " completion status (%d): %s",
 				      (void *)rxq, wc.wr_id, wc.status,
 				      ibv_wc_status_str(wc.status));
-#ifdef MLX4_PMD_SOFT_COUNTERS
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
-#endif
 				/* Add SGE to array for repost. */
 				sges[i] = elt->sge;
 				goto repost;
@@ -3062,10 +3043,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Return packet. */
 		*(pkts++) = seg;
 		++pkts_ret;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increase bytes counter. */
 		rxq->stats.ibytes += len;
-#endif
 repost:
 		if (++elts_head >= elts_n)
 			elts_head = 0;
@@ -3083,10 +3062,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		abort();
 	}
 	rxq->elts_head = elts_head;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increase packets counter. */
 	rxq->stats.ipackets += pkts_ret;
-#endif
 	return pkts_ret;
 }
 
@@ -4270,17 +4247,13 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 			continue;
 		idx = rxq->stats.idx;
 		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			tmp.q_ipackets[idx] += rxq->stats.ipackets;
 			tmp.q_ibytes[idx] += rxq->stats.ibytes;
-#endif
 			tmp.q_errors[idx] += (rxq->stats.idropped +
 					      rxq->stats.rx_nombuf);
 		}
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		tmp.ipackets += rxq->stats.ipackets;
 		tmp.ibytes += rxq->stats.ibytes;
-#endif
 		tmp.ierrors += rxq->stats.idropped;
 		tmp.rx_nombuf += rxq->stats.rx_nombuf;
 	}
@@ -4291,21 +4264,14 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 			continue;
 		idx = txq->stats.idx;
 		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			tmp.q_opackets[idx] += txq->stats.opackets;
 			tmp.q_obytes[idx] += txq->stats.obytes;
-#endif
 			tmp.q_errors[idx] += txq->stats.odropped;
 		}
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		tmp.opackets += txq->stats.opackets;
 		tmp.obytes += txq->stats.obytes;
-#endif
 		tmp.oerrors += txq->stats.odropped;
 	}
-#ifndef MLX4_PMD_SOFT_COUNTERS
-	/* FIXME: retrieve and add hardware counters. */
-#endif
 	*stats = tmp;
 	priv_unlock(priv);
 }
@@ -4340,9 +4306,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 		(*priv->txqs)[i]->stats =
 			(struct mlx4_txq_stats){ .idx = idx };
 	}
-#ifndef MLX4_PMD_SOFT_COUNTERS
-	/* FIXME: reset hardware counters. */
-#endif
 	priv_unlock(priv);
 }
 
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 4b42626..b88d8b0 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -101,14 +101,6 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/*
- * If defined, only use software counters. The PMD will never ask the hardware
- * for these, and many of them won't be available.
- */
-#ifndef MLX4_PMD_SOFT_COUNTERS
-#define MLX4_PMD_SOFT_COUNTERS 1
-#endif
-
 /* Alarm timeout. */
 #define MLX4_ALARM_TIMEOUT_US 100000
 
@@ -180,10 +172,8 @@ enum {
 
 struct mlx4_rxq_stats {
 	unsigned int idx; /**< Mapping index. */
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	uint64_t ipackets; /**< Total of successfully received packets. */
 	uint64_t ibytes; /**< Total of successfully received bytes. */
-#endif
 	uint64_t idropped; /**< Total of packets dropped when RX ring full. */
 	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
 };
@@ -246,10 +236,8 @@ struct txq_elt {
 
 struct mlx4_txq_stats {
 	unsigned int idx; /**< Mapping index. */
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	uint64_t opackets; /**< Total of successfully sent packets. */
 	uint64_t obytes;   /**< Total of successfully sent bytes. */
-#endif
 	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
 };
 
-- 
2.1.4
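
To illustrate how the now-unconditional software counters are aggregated,
here is a reduced sketch modeled on the Rx loop of mlx4_stats_get().
The types and the QUEUE_STAT_CNTRS constant are simplified, hypothetical
stand-ins (the driver uses struct rte_eth_stats and
RTE_ETHDEV_QUEUE_STAT_CNTRS), and locking is omitted.

```c
#include <stdint.h>

#define QUEUE_STAT_CNTRS 2 /* Stand-in for RTE_ETHDEV_QUEUE_STAT_CNTRS. */

struct rxq_stats {
	unsigned int idx;  /* Mapping index. */
	uint64_t ipackets; /* Packets counted by the Rx burst function. */
	uint64_t ibytes;   /* Bytes counted by the Rx burst function. */
};

struct dev_stats {
	uint64_t ipackets;
	uint64_t ibytes;
	uint64_t q_ipackets[QUEUE_STAT_CNTRS];
	uint64_t q_ibytes[QUEUE_STAT_CNTRS];
};

static struct dev_stats
stats_sum(const struct rxq_stats *q, unsigned int n)
{
	struct dev_stats tmp = { 0 };
	unsigned int i;

	for (i = 0; i != n; ++i) {
		unsigned int idx = q[i].idx;

		/* Per-queue slots exist only for the first few indices. */
		if (idx < QUEUE_STAT_CNTRS) {
			tmp.q_ipackets[idx] += q[i].ipackets;
			tmp.q_ibytes[idx] += q[i].ibytes;
		}
		/* Device totals include every queue. */
		tmp.ipackets += q[i].ipackets;
		tmp.ibytes += q[i].ibytes;
	}
	return tmp;
}
```

Queues whose mapping index falls outside the per-queue array still
contribute to the device totals, as in the driver.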

* [PATCH v1 08/48] net/mlx4: remove scatter mode compilation option
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (6 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 07/48] net/mlx4: remove soft counters compilation option Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 09/48] net/mlx4: remove Tx inline " Adrien Mazarguil
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

This option determines both the maximum number of segments for Rx/Tx
packets and whether scattered mode is supported at all. This commit removes
the latter role as well as the option's exposure in the configuration file,
since the most appropriate value should be decided at run-time.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 config/common_base        |  1 -
 doc/guides/nics/mlx4.rst  |  7 -------
 drivers/net/mlx4/Makefile |  4 ----
 drivers/net/mlx4/mlx4.c   | 10 ----------
 drivers/net/mlx4/mlx4.h   |  2 --
 5 files changed, 24 deletions(-)

diff --git a/config/common_base b/config/common_base
index d768804..2520bd1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -213,7 +213,6 @@ CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
 #
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 729e6c1..f84d56c 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -116,13 +116,6 @@ These options can be modified in the ``.config`` file.
   adds additional run-time checks and debugging messages at the cost of
   lower performance.
 
-- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**)
-
-  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
-  this number improves performance but also limits the ability to receive
-  scattered packets (packets that do not fit a single mbuf). The default
-  value is a safe tradeoff.
-
 - ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**)
 
   Amount of data to be inlined during TX operations. Improves latency but
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 2db9b10..a9c44ca 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -68,10 +68,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N
-CFLAGS += -DMLX4_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE
 CFLAGS += -DMLX4_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE)
 endif
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 34ef80f..6dd0863 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1176,8 +1176,6 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 	txq_mp2mr(txq, mp);
 }
 
-#if MLX4_PMD_SGE_WR_N > 1
-
 /**
  * Copy scattered mbuf contents to a single linear buffer.
  *
@@ -1324,8 +1322,6 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
 	};
 }
 
-#endif /* MLX4_PMD_SGE_WR_N > 1 */
-
 /**
  * DPDK callback for TX.
  *
@@ -1451,7 +1447,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				goto stop;
 			sent_size += length;
 		} else {
-#if MLX4_PMD_SGE_WR_N > 1
 			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
 			struct tx_burst_sg_ret ret;
 
@@ -1469,11 +1464,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			if (unlikely(err))
 				goto stop;
 			sent_size += ret.length;
-#else /* MLX4_PMD_SGE_WR_N > 1 */
-			DEBUG("%p: TX scattered buffers support not"
-			      " compiled in", (void *)txq);
-			goto stop;
-#endif /* MLX4_PMD_SGE_WR_N > 1 */
 		}
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b88d8b0..785b2ac 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -81,9 +81,7 @@
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
 /* Maximum number of Scatter/Gather Elements per Work Request. */
-#ifndef MLX4_PMD_SGE_WR_N
 #define MLX4_PMD_SGE_WR_N 4
-#endif
 
 /* Maximum size for inline data. */
 #ifndef MLX4_PMD_MAX_INLINE
-- 
2.1.4

* [PATCH v1 09/48] net/mlx4: remove Tx inline compilation option
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (7 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 08/48] net/mlx4: remove scatter mode " Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 10/48] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
                   ` (40 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

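The maximum Tx inline size should be a run-time parameter, not a
compile-time option. Remove CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE and leave
MLX4_PMD_MAX_INLINE fixed at 0 in the meantime.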

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
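Note for readers: with the #if guards gone, the choice between the inline
and regular send paths in mlx4_tx_burst() is always made at run time by
comparing the packet length against txq->max_inline. The sketch below
isolates that decision. struct txq_sketch, enum send_path and
pick_send_path() are hypothetical names used for illustration only; they
are not part of the PMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct txq. */
struct txq_sketch {
	uint32_t max_inline; /* Max inline send size for this queue. */
};

/* Which send callback a packet would be handed to. */
enum send_path {
	SEND_PENDING,        /* Regular send. */
	SEND_PENDING_INLINE, /* Inline send. */
};

/* Mirror of the "length <= txq->max_inline" test in mlx4_tx_burst(). */
static enum send_path
pick_send_path(const struct txq_sketch *txq, uint32_t length)
{
	assert(txq != NULL);
	if (length <= txq->max_inline)
		return SEND_PENDING_INLINE;
	return SEND_PENDING;
}
```

If max_inline ends up as 0, every packet with a nonzero length takes the
regular path, so the branch costs one comparison per packet.
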
 config/common_base        | 1 -
 drivers/net/mlx4/Makefile | 4 ----
 drivers/net/mlx4/mlx4.c   | 6 ------
 drivers/net/mlx4/mlx4.h   | 4 ----
 4 files changed, 15 deletions(-)

diff --git a/config/common_base b/config/common_base
index 2520bd1..b6e322c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -213,7 +213,6 @@ CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
 #
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
-CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 
 #
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index a9c44ca..8406ba2 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -68,10 +68,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE
-CFLAGS += -DMLX4_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE
 CFLAGS += -DMLX4_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE)
 endif
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 6dd0863..d00ddc6 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1428,7 +1428,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					      (uintptr_t)addr);
 			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
 			/* Put packet into send queue. */
-#if MLX4_PMD_MAX_INLINE > 0
 			if (length <= txq->max_inline)
 				err = txq->if_qp->send_pending_inline
 					(txq->qp,
@@ -1436,7 +1435,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					 length,
 					 send_flags);
 			else
-#endif
 				err = txq->if_qp->send_pending
 					(txq->qp,
 					 addr,
@@ -1578,9 +1576,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 					  MLX4_PMD_SGE_WR_N) ?
 					 priv->device_attr.max_sge :
 					 MLX4_PMD_SGE_WR_N),
-#if MLX4_PMD_MAX_INLINE > 0
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
-#endif
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
 		/* Do *NOT* enable this, completions events are managed per
@@ -1598,10 +1594,8 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-#if MLX4_PMD_MAX_INLINE > 0
 	/* ibv_create_qp() updates this value. */
 	tmpl.max_inline = attr.init.cap.max_inline_data;
-#endif
 	attr.mod = (struct ibv_exp_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 785b2ac..469ab4b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -84,9 +84,7 @@
 #define MLX4_PMD_SGE_WR_N 4
 
 /* Maximum size for inline data. */
-#ifndef MLX4_PMD_MAX_INLINE
 #define MLX4_PMD_MAX_INLINE 0
-#endif
 
 /*
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
@@ -261,9 +259,7 @@ struct txq {
 	struct ibv_qp *qp; /* Queue Pair. */
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
-#if MLX4_PMD_MAX_INLINE > 0
 	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
-#endif
 	unsigned int elts_n; /* (*elts)[] length. */
 	struct txq_elt (*elts)[]; /* TX elements. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
-- 
2.1.4

* [PATCH v1 10/48] net/mlx4: remove allmulti and promisc support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (8 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 09/48] net/mlx4: remove Tx inline " Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 11/48] net/mlx4: remove VLAN filter support Adrien Mazarguil
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

This is done in preparation for a major refactoring.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   2 -
 doc/guides/nics/mlx4.rst          |   2 -
 drivers/net/mlx4/mlx4.c           | 311 ---------------------------------
 drivers/net/mlx4/mlx4.h           |   4 -
 4 files changed, 319 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index f6efd21..344731f 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,8 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-Promiscuous mode     = Y
-Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 RSS hash             = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index f84d56c..9559261 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -82,8 +82,6 @@ Features
   configured RX queues must be a power of two.
 - VLAN filtering is supported.
 - Link state information is provided.
-- Promiscuous mode is supported.
-- All multicast mode is supported.
 - Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d00ddc6..fe1da04 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2141,9 +2141,6 @@ rxq_mac_addrs_del(struct rxq *rxq)
 		rxq_mac_addr_del(rxq, i);
 }
 
-static int rxq_promiscuous_enable(struct rxq *);
-static void rxq_promiscuous_disable(struct rxq *);
-
 /**
  * Add single flow steering rule.
  *
@@ -2422,122 +2419,6 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 }
 
 /**
- * Enable allmulti mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_allmulticast_enable(struct rxq *rxq)
-{
-	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_MC_DEFAULT,
-		.num_of_specs = 0,
-		.port = rxq->priv->port,
-		.flags = 0
-	};
-
-	DEBUG("%p: enabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow != NULL)
-		return EBUSY;
-	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
-	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
-	}
-	rxq->allmulti_flow = flow;
-	DEBUG("%p: allmulticast mode enabled", (void *)rxq);
-	return 0;
-}
-
-/**
- * Disable allmulti mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_allmulticast_disable(struct rxq *rxq)
-{
-	DEBUG("%p: disabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow == NULL)
-		return;
-	claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
-	rxq->allmulti_flow = NULL;
-	DEBUG("%p: allmulticast mode disabled", (void *)rxq);
-}
-
-/**
- * Enable promiscuous mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_promiscuous_enable(struct rxq *rxq)
-{
-	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
-		.num_of_specs = 0,
-		.port = rxq->priv->port,
-		.flags = 0
-	};
-
-	if (rxq->priv->vf)
-		return 0;
-	DEBUG("%p: enabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow != NULL)
-		return EBUSY;
-	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
-	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
-	}
-	rxq->promisc_flow = flow;
-	DEBUG("%p: promiscuous mode enabled", (void *)rxq);
-	return 0;
-}
-
-/**
- * Disable promiscuous mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_promiscuous_disable(struct rxq *rxq)
-{
-	if (rxq->priv->vf)
-		return;
-	DEBUG("%p: disabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow == NULL)
-		return;
-	claim_zero(ibv_destroy_flow(rxq->promisc_flow));
-	rxq->promisc_flow = NULL;
-	DEBUG("%p: promiscuous mode disabled", (void *)rxq);
-}
-
-/**
  * Clean up a RX queue.
  *
  * Destroy objects, free allocated memory and reset the structure for reuse.
@@ -2578,8 +2459,6 @@ rxq_cleanup(struct rxq *rxq)
 						&params));
 	}
 	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_promiscuous_disable(rxq);
-		rxq_allmulticast_disable(rxq);
 		rxq_mac_addrs_del(rxq);
 	}
 	if (rxq->qp != NULL)
@@ -3222,12 +3101,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Remove attached flows if RSS is disabled (no parent queue). */
 	if (!priv->rss && !priv->isolated) {
-		rxq_allmulticast_disable(&tmpl);
-		rxq_promiscuous_disable(&tmpl);
 		rxq_mac_addrs_del(&tmpl);
 		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
 		memcpy(rxq->mac_configured, tmpl.mac_configured,
 		       sizeof(rxq->mac_configured));
 		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
@@ -3268,13 +3143,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	/* Reconfigure flows. Do not care for errors. */
 	if (!priv->rss && !priv->isolated) {
 		rxq_mac_addrs_add(&tmpl);
-		if (priv->promisc)
-			rxq_promiscuous_enable(&tmpl);
-		if (priv->allmulti)
-			rxq_allmulticast_enable(&tmpl);
 		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
 		memcpy(rxq->mac_configured, tmpl.mac_configured,
 		       sizeof(rxq->mac_configured));
 		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
@@ -3817,10 +3686,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		if (rxq == NULL)
 			continue;
 		ret = rxq_mac_addrs_add(rxq);
-		if (!ret && priv->promisc)
-			ret = rxq_promiscuous_enable(rxq);
-		if (!ret && priv->allmulti)
-			ret = rxq_allmulticast_enable(rxq);
 		if (!ret)
 			continue;
 		WARN("%p: QP flow attachment failed: %s",
@@ -3858,8 +3723,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	while (i != 0) {
 		rxq = (*priv->rxqs)[i--];
 		if (rxq != NULL) {
-			rxq_allmulticast_disable(rxq);
-			rxq_promiscuous_disable(rxq);
 			rxq_mac_addrs_del(rxq);
 		}
 	}
@@ -3907,8 +3770,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		rxq_allmulticast_disable(rxq);
-		rxq_promiscuous_disable(rxq);
 		rxq_mac_addrs_del(rxq);
 	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
@@ -4378,170 +4239,6 @@ mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 }
 
 /**
- * DPDK callback to enable promiscuous mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_promiscuous_enable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot enable promiscuous, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->promisc) {
-		priv_unlock(priv);
-		return;
-	}
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_promiscuous_enable(LIST_FIRST(&priv->parents));
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_promiscuous_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
-	priv->promisc = 1;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to disable promiscuous mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_promiscuous_disable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-
-	priv_lock(priv);
-	if (!priv->promisc || priv->isolated) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_promiscuous_disable(LIST_FIRST(&priv->parents));
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_promiscuous_disable((*priv->rxqs)[i]);
-end:
-	priv->promisc = 0;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to enable allmulti mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_allmulticast_enable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot enable allmulticast, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->allmulti) {
-		priv_unlock(priv);
-		return;
-	}
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_allmulticast_enable(LIST_FIRST(&priv->parents));
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_allmulticast_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
-	priv->allmulti = 1;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to disable allmulti mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_allmulticast_disable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-
-	priv_lock(priv);
-	if (!priv->allmulti || priv->isolated) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_allmulticast_disable(LIST_FIRST(&priv->parents));
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_allmulticast_disable((*priv->rxqs)[i]);
-end:
-	priv->allmulti = 0;
-	priv_unlock(priv);
-}
-
-/**
  * DPDK callback to retrieve physical link information.
  *
  * @param dev
@@ -4664,10 +4361,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		 * for errors at this stage. */
 		if (!priv->rss && !priv->isolated) {
 			rxq_mac_addrs_add(rxq);
-			if (priv->promisc)
-				rxq_promiscuous_enable(rxq);
-			if (priv->allmulti)
-				rxq_allmulticast_enable(rxq);
 		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
@@ -4956,10 +4649,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_down = mlx4_set_link_down,
 	.dev_set_link_up = mlx4_set_link_up,
 	.dev_close = mlx4_dev_close,
-	.promiscuous_enable = mlx4_promiscuous_enable,
-	.promiscuous_disable = mlx4_promiscuous_disable,
-	.allmulticast_enable = mlx4_allmulticast_enable,
-	.allmulticast_disable = mlx4_allmulticast_disable,
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 469ab4b..35c9549 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -204,8 +204,6 @@ struct rxq {
 	 */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
 	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -297,8 +295,6 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int promisc:1; /* Device in promiscuous mode. */
-	unsigned int allmulti:1; /* Device receives all multicast packets. */
 	unsigned int hw_qpg:1; /* QP groups are supported. */
 	unsigned int hw_tss:1; /* TSS is supported. */
 	unsigned int hw_rss:1; /* RSS is supported. */
-- 
2.1.4

* [PATCH v1 11/48] net/mlx4: remove VLAN filter support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (9 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 10/48] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:53 ` [PATCH v1 12/48] net/mlx4: remove MAC address configuration support Adrien Mazarguil
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

This is done in preparation for a major refactoring.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
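Note for readers: besides dropping the callback, this patch turns the
per-queue mac_flow table from a (MAC, VLAN) matrix into a plain array
indexed by MAC address. The sketch below compares the two layouts. The
constants copy the values from mlx4.h; struct flow_stub, struct
table_before, struct table_after and the helper functions are hypothetical
names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MAC_ADDRESSES 128 /* MLX4_MAX_MAC_ADDRESSES in mlx4.h. */
#define MAX_VLAN_IDS 127      /* MLX4_MAX_VLAN_IDS, removed by this patch. */

struct flow_stub; /* Opaque stand-in for struct ibv_flow. */

/* Layout before: one flow pointer per (MAC, VLAN) pair. */
struct table_before {
	struct flow_stub *mac_flow[MAX_MAC_ADDRESSES][MAX_VLAN_IDS];
};

/* Layout after: one flow pointer per MAC address. */
struct table_after {
	struct flow_stub *mac_flow[MAX_MAC_ADDRESSES];
};

/* Return the slot holding the flow for a MAC index in the new layout. */
static struct flow_stub **
flow_slot(struct table_after *t, unsigned int mac_index)
{
	assert(mac_index < MAX_MAC_ADDRESSES);
	return &t->mac_flow[mac_index];
}

/* Number of flow pointers stored per Rx queue in the old layout. */
static size_t
slots_before(void)
{
	return sizeof(struct table_before) / sizeof(struct flow_stub *);
}

/* Number of flow pointers stored per Rx queue in the new layout. */
static size_t
slots_after(void)
{
	return sizeof(struct table_after) / sizeof(struct flow_stub *);
}
```

The old layout reserves 128 * 127 = 16256 pointers per Rx queue, the new
one 128.
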
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 206 +++------------------------------
 drivers/net/mlx4/mlx4.h           |  13 +--
 4 files changed, 17 insertions(+), 204 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 344731f..bfa6948 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -17,7 +17,6 @@ Unicast MAC filter   = Y
 Multicast MAC filter = Y
 RSS hash             = Y
 SR-IOV               = Y
-VLAN filter          = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
 Inner L3 checksum    = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 9559261..9ab8da9 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -80,7 +80,6 @@ Features
 - Multi arch support: x86_64 and POWER8.
 - RSS, also known as RCA, is supported. In this mode the number of
   configured RX queues must be a power of two.
-- VLAN filtering is supported.
 - Link state information is provided.
 - Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fe1da04..288fd9b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -47,7 +47,6 @@
 #include <unistd.h>
 #include <limits.h>
 #include <assert.h>
-#include <arpa/inet.h>
 #include <net/if.h>
 #include <dirent.h>
 #include <sys/ioctl.h>
@@ -2073,11 +2072,9 @@ rxq_free_elts(struct rxq *rxq)
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index.
- * @param vlan_index
- *   VLAN index.
  */
 static void
-rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index)
 {
 #ifndef NDEBUG
 	struct priv *priv = rxq->priv;
@@ -2085,14 +2082,13 @@ rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 		(const uint8_t (*)[ETHER_ADDR_LEN])
 		priv->mac[mac_index].addr_bytes;
 #endif
-	assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
-	      " (VLAN ID %" PRIu16 ")",
+	assert(rxq->mac_flow[mac_index] != NULL);
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
 	      (void *)rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index, priv->vlan_filter[vlan_index].id);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
-	rxq->mac_flow[mac_index][vlan_index] = NULL;
+	      mac_index);
+	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
+	rxq->mac_flow[mac_index] = NULL;
 }
 
 /**
@@ -2106,22 +2102,10 @@ rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 static void
 rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	unsigned int vlans = 0;
-
-	assert(mac_index < elemof(priv->mac));
+	assert(mac_index < elemof(rxq->priv->mac));
 	if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
 		return;
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled)
-			continue;
-		rxq_del_flow(rxq, mac_index, i);
-		vlans++;
-	}
-	if (!vlans) {
-		rxq_del_flow(rxq, mac_index, 0);
-	}
+	rxq_del_flow(rxq, mac_index);
 	BITFIELD_RESET(rxq->mac_configured, mac_index);
 }
 
@@ -2148,14 +2132,12 @@ rxq_mac_addrs_del(struct rxq *rxq)
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index to register.
- * @param vlan_index
- *   VLAN index. Use -1 for a flow without VLAN.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 {
 	struct ibv_flow *flow;
 	struct priv *priv = rxq->priv;
@@ -2172,7 +2154,6 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
 	assert(mac_index < elemof(priv->mac));
-	assert((vlan_index < elemof(priv->vlan_filter)) || (vlan_index == -1u));
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2193,22 +2174,15 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 				(*mac)[0], (*mac)[1], (*mac)[2],
 				(*mac)[3], (*mac)[4], (*mac)[5]
 			},
-			.vlan_tag = ((vlan_index != -1u) ?
-				     htons(priv->vlan_filter[vlan_index].id) :
-				     0),
 		},
 		.mask = {
 			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-			.vlan_tag = ((vlan_index != -1u) ? htons(0xfff) : 0),
 		}
 	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
-	      " (VLAN %s %" PRIu16 ")",
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
 	      (void *)rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index,
-	      ((vlan_index != -1u) ? "ID" : "index"),
-	      ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
+	      mac_index);
 	/* Create related flow. */
 	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
@@ -2221,10 +2195,8 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 			return errno;
 		return EINVAL;
 	}
-	if (vlan_index == -1u)
-		vlan_index = 0;
-	assert(rxq->mac_flow[mac_index][vlan_index] == NULL);
-	rxq->mac_flow[mac_index][vlan_index] = flow;
+	assert(rxq->mac_flow[mac_index] == NULL);
+	rxq->mac_flow[mac_index] = flow;
 	return 0;
 }
 
@@ -2242,37 +2214,14 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 static int
 rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	unsigned int vlans = 0;
 	int ret;
 
-	assert(mac_index < elemof(priv->mac));
+	assert(mac_index < elemof(rxq->priv->mac));
 	if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
 		rxq_mac_addr_del(rxq, mac_index);
-	/* Fill VLAN specifications. */
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled)
-			continue;
-		/* Create related flow. */
-		ret = rxq_add_flow(rxq, mac_index, i);
-		if (!ret) {
-			vlans++;
-			continue;
-		}
-		/* Failure, rollback. */
-		while (i != 0)
-			if (priv->vlan_filter[--i].enabled)
-				rxq_del_flow(rxq, mac_index, i);
-		assert(ret > 0);
+	ret = rxq_add_flow(rxq, mac_index);
+	if (ret)
 		return ret;
-	}
-	/* In case there is no VLAN filter. */
-	if (!vlans) {
-		ret = rxq_add_flow(rxq, mac_index, -1);
-		if (ret)
-			return ret;
-	}
 	BITFIELD_SET(rxq->mac_configured, mac_index);
 	return 0;
 }
@@ -4474,128 +4423,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	return -ret;
 }
 
-/**
- * Configure a VLAN filter.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param vlan_id
- *   VLAN ID to filter.
- * @param on
- *   Toggle filter.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	unsigned int j = -1;
-
-	DEBUG("%p: %s VLAN filter ID %" PRIu16,
-	      (void *)dev, (on ? "enable" : "disable"), vlan_id);
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled) {
-			/* Unused index, remember it. */
-			j = i;
-			continue;
-		}
-		if (priv->vlan_filter[i].id != vlan_id)
-			continue;
-		/* This VLAN ID is already known, use its index. */
-		j = i;
-		break;
-	}
-	/* Check if there's room for another VLAN filter. */
-	if (j == (unsigned int)-1)
-		return ENOMEM;
-	/*
-	 * VLAN filters apply to all configured MAC addresses, flow
-	 * specifications must be reconfigured accordingly.
-	 */
-	priv->vlan_filter[j].id = vlan_id;
-	if ((on) && (!priv->vlan_filter[j].enabled)) {
-		/*
-		 * Filter is disabled, enable it.
-		 * Rehashing flows in all RX queues is necessary.
-		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(LIST_FIRST(&priv->parents));
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
-		priv->vlan_filter[j].enabled = 1;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(LIST_FIRST(&priv->parents));
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
-	} else if ((!on) && (priv->vlan_filter[j].enabled)) {
-		/*
-		 * Filter is enabled, disable it.
-		 * Rehashing flows in all RX queues is necessary.
-		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(LIST_FIRST(&priv->parents));
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
-		priv->vlan_filter[j].enabled = 0;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(LIST_FIRST(&priv->parents));
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
-	}
-	return 0;
-}
-
-/**
- * DPDK callback to configure a VLAN filter.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param vlan_id
- *   VLAN ID to filter.
- * @param on
- *   Toggle filter.
- *
- * @return
- *   0 on success, negative errno value on failure.
- */
-static int
-mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct priv *priv = dev->data->dev_private;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot set vlan filter, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return -EINVAL;
-	}
-	ret = vlan_filter_set(dev, vlan_id, on);
-	priv_unlock(priv);
-	assert(ret >= 0);
-	return -ret;
-}
-
 const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
@@ -4654,7 +4481,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
 	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
-	.vlan_filter_set = mlx4_vlan_filter_set,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 35c9549..6b2c83b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -74,9 +74,6 @@
  */
 #define MLX4_MAX_MAC_ADDRESSES 128
 
-/* Maximum number of simultaneous VLAN filters supported. See above. */
-#define MLX4_MAX_VLAN_IDS 127
-
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -199,11 +196,8 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	/*
-	 * Each VLAN ID requires a separate flow steering rule.
-	 */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
+	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -286,11 +280,6 @@ struct priv {
 	 */
 	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	/* VLAN filters. */
-	struct {
-		unsigned int enabled:1; /* If enabled. */
-		unsigned int id:12; /* VLAN ID (0-4095). */
-	} vlan_filter[MLX4_MAX_VLAN_IDS]; /* VLAN filters table. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4

* [PATCH v1 12/48] net/mlx4: remove MAC address configuration support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (10 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 11/48] net/mlx4: remove VLAN filter support Adrien Mazarguil
@ 2017-08-01 16:53 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 13/48] net/mlx4: drop MAC flows affecting all Rx queues Adrien Mazarguil
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:53 UTC (permalink / raw)
  To: dev

Only the default port MAC address remains and is not configurable.
This is done in preparation for a major refactoring.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
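Note for readers: with a single MAC address left, the diff below tests
rxq->mac_flow against NULL directly, which suggests it becomes a single
flow pointer rather than an array guarded by a bit-field. The sketch below
shows that pattern under this assumption. struct flow_stub, struct
rxq_sketch, destroy_flow_stub() and rxq_mac_addr_del_sketch() are
hypothetical names; the stub only stands in for ibv_destroy_flow() and this
is not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_flow. */
struct flow_stub {
	int destroyed; /* Set once the stub destructor has run. */
};

/* Hypothetical stand-in for the relevant part of struct rxq. */
struct rxq_sketch {
	struct flow_stub *mac_flow; /* Flow for the default MAC, or NULL. */
};

/* Stand-in for ibv_destroy_flow(); returns 0 on success. */
static int
destroy_flow_stub(struct flow_stub *flow)
{
	flow->destroyed = 1;
	return 0;
}

/* Remove the MAC flow if present; calling it again is a no-op. */
static void
rxq_mac_addr_del_sketch(struct rxq_sketch *rxq)
{
	int ret;

	if (!rxq->mac_flow)
		return;
	ret = destroy_flow_stub(rxq->mac_flow);
	assert(ret == 0);
	(void)ret; /* Avoid an unused warning when NDEBUG is defined. */
	rxq->mac_flow = NULL;
}
```

The pointer itself records whether a flow is configured, so no separate
per-address state is needed in this sketch.
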
 doc/guides/nics/features/mlx4.ini |   2 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 322 ++++-----------------------------
 drivers/net/mlx4/mlx4.h           |  41 +----
 4 files changed, 39 insertions(+), 327 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index bfa6948..3acf8d3 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,8 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-Unicast MAC filter   = Y
-Multicast MAC filter = Y
 RSS hash             = Y
 SR-IOV               = Y
 L3 checksum offload  = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 9ab8da9..235912a 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -81,7 +81,6 @@ Features
 - RSS, also known as RCA, is supported. In this mode the number of
   configured RX queues must be a power of two.
 - Link state information is provided.
-- Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 288fd9b..dd42c96 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -45,7 +45,6 @@
 #include <string.h>
 #include <errno.h>
 #include <unistd.h>
-#include <limits.h>
 #include <assert.h>
 #include <net/if.h>
 #include <dirent.h>
@@ -2066,84 +2065,46 @@ rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Delete flow steering rule.
+ * Unregister a MAC address from a RX queue.
  *
  * @param rxq
  *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index.
  */
 static void
-rxq_del_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_mac_addr_del(struct rxq *rxq)
 {
 #ifndef NDEBUG
 	struct priv *priv = rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 		(const uint8_t (*)[ETHER_ADDR_LEN])
-		priv->mac[mac_index].addr_bytes;
+		priv->mac.addr_bytes;
 #endif
-	assert(rxq->mac_flow[mac_index] != NULL);
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
-	      (void *)rxq,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
-	rxq->mac_flow[mac_index] = NULL;
-}
-
-/**
- * Unregister a MAC address from a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index.
- */
-static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
-{
-	assert(mac_index < elemof(rxq->priv->mac));
-	if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
+	if (!rxq->mac_flow)
 		return;
-	rxq_del_flow(rxq, mac_index);
-	BITFIELD_RESET(rxq->mac_configured, mac_index);
-}
-
-/**
- * Unregister all MAC addresses from a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_mac_addrs_del(struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-
-	for (i = 0; (i != elemof(priv->mac)); ++i)
-		rxq_mac_addr_del(rxq, i);
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)rxq,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	claim_zero(ibv_destroy_flow(rxq->mac_flow));
+	rxq->mac_flow = NULL;
 }
 
 /**
- * Add single flow steering rule.
+ * Register a MAC address in a RX queue.
  *
  * @param rxq
  *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index to register.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_mac_addr_add(struct rxq *rxq)
 {
 	struct ibv_flow *flow;
 	struct priv *priv = rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
-			priv->mac[mac_index].addr_bytes;
+			priv->mac.addr_bytes;
 
 	/* Allocate flow specification on the stack. */
 	struct __attribute__((packed)) {
@@ -2153,7 +2114,8 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 	struct ibv_flow_attr *attr = &data.attr;
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
-	assert(mac_index < elemof(priv->mac));
+	if (rxq->mac_flow)
+		rxq_mac_addr_del(rxq);
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2179,10 +2141,9 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 			.dst_mac = "\xff\xff\xff\xff\xff\xff",
 		}
 	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
 	      (void *)rxq,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index);
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
 	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
@@ -2195,99 +2156,12 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 			return errno;
 		return EINVAL;
 	}
-	assert(rxq->mac_flow[mac_index] == NULL);
-	rxq->mac_flow[mac_index] = flow;
+	assert(rxq->mac_flow == NULL);
+	rxq->mac_flow = flow;
 	return 0;
 }
 
 /**
- * Register a MAC address in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index to register.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
-{
-	int ret;
-
-	assert(mac_index < elemof(rxq->priv->mac));
-	if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
-		rxq_mac_addr_del(rxq, mac_index);
-	ret = rxq_add_flow(rxq, mac_index);
-	if (ret)
-		return ret;
-	BITFIELD_SET(rxq->mac_configured, mac_index);
-	return 0;
-}
-
-/**
- * Register all MAC addresses in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_mac_addrs_add(struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	int ret;
-
-	for (i = 0; (i != elemof(priv->mac)); ++i) {
-		if (!BITFIELD_ISSET(priv->mac_configured, i))
-			continue;
-		ret = rxq_mac_addr_add(rxq, i);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			rxq_mac_addr_del(rxq, --i);
-		assert(ret > 0);
-		return ret;
-	}
-	return 0;
-}
-
-/**
- * Unregister a MAC address.
- *
- * In RSS mode, the MAC address is unregistered from the parent queue,
- * otherwise it is unregistered from each queue directly.
- *
- * @param priv
- *   Pointer to private structure.
- * @param mac_index
- *   MAC address index.
- */
-static void
-priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
-{
-	unsigned int i;
-
-	assert(!priv->isolated);
-	assert(mac_index < elemof(priv->mac));
-	if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
-		return;
-	if (priv->rss) {
-		rxq_mac_addr_del(LIST_FIRST(&priv->parents), mac_index);
-		goto end;
-	}
-	for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
-		rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
-end:
-	BITFIELD_RESET(priv->mac_configured, mac_index);
-}
-
-/**
  * Register a MAC address.
  *
  * In RSS mode, the MAC address is registered in the parent queue,
@@ -2295,8 +2169,6 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
  *
  * @param priv
  *   Pointer to private structure.
- * @param mac_index
- *   MAC address index to use.
  * @param mac
  *   MAC address to register.
  *
@@ -2304,28 +2176,12 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
  *   0 on success, errno value on failure.
  */
 static int
-priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
-		  const uint8_t (*mac)[ETHER_ADDR_LEN])
+priv_mac_addr_add(struct priv *priv, const uint8_t (*mac)[ETHER_ADDR_LEN])
 {
 	unsigned int i;
 	int ret;
 
-	assert(mac_index < elemof(priv->mac));
-	/* First, make sure this address isn't already configured. */
-	for (i = 0; (i != elemof(priv->mac)); ++i) {
-		/* Skip this index, it's going to be reconfigured. */
-		if (i == mac_index)
-			continue;
-		if (!BITFIELD_ISSET(priv->mac_configured, i))
-			continue;
-		if (memcmp(priv->mac[i].addr_bytes, *mac, sizeof(*mac)))
-			continue;
-		/* Address already configured elsewhere, return with error. */
-		return EADDRINUSE;
-	}
-	if (BITFIELD_ISSET(priv->mac_configured, mac_index))
-		priv_mac_addr_del(priv, mac_index);
-	priv->mac[mac_index] = (struct ether_addr){
+	priv->mac = (struct ether_addr){
 		{
 			(*mac)[0], (*mac)[1], (*mac)[2],
 			(*mac)[3], (*mac)[4], (*mac)[5]
@@ -2333,19 +2189,10 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 	};
 	/* If device isn't started, this is all we need to do. */
 	if (!priv->started) {
-#ifndef NDEBUG
-		/* Verify that all queues have this index disabled. */
-		for (i = 0; (i != priv->rxqs_n); ++i) {
-			if ((*priv->rxqs)[i] == NULL)
-				continue;
-			assert(!BITFIELD_ISSET
-			       ((*priv->rxqs)[i]->mac_configured, mac_index));
-		}
-#endif
 		goto end;
 	}
 	if (priv->rss) {
-		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents), mac_index);
+		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents));
 		if (ret)
 			return ret;
 		goto end;
@@ -2353,17 +2200,16 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		if ((*priv->rxqs)[i] == NULL)
 			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+		ret = rxq_mac_addr_add((*priv->rxqs)[i]);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
 			if ((*priv->rxqs)[(--i)] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+				rxq_mac_addr_del((*priv->rxqs)[i]);
 		return ret;
 	}
 end:
-	BITFIELD_SET(priv->mac_configured, mac_index);
 	return 0;
 }
 
@@ -2408,7 +2254,7 @@ rxq_cleanup(struct rxq *rxq)
 						&params));
 	}
 	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_mac_addrs_del(rxq);
+		rxq_mac_addr_del(rxq);
 	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
@@ -3050,11 +2896,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Remove attached flows if RSS is disabled (no parent queue). */
 	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addrs_del(&tmpl);
+		rxq_mac_addr_del(&tmpl);
 		/* Update original queue in case of failure. */
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+		rxq->mac_flow = NULL;
 	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
@@ -3091,11 +2935,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Reconfigure flows. Do not care for errors. */
 	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addrs_add(&tmpl);
+		rxq_mac_addr_add(&tmpl);
 		/* Update original queue in case of failure. */
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+		rxq->mac_flow = NULL;
 	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
@@ -3241,7 +3083,7 @@ rxq_create_qp(struct rxq *rxq,
 	}
 	if (!priv->isolated && (parent || !priv->rss)) {
 		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addrs_add(rxq);
+		ret = rxq_mac_addr_add(rxq);
 		if (ret) {
 			ERROR("QP flow attachment failed: %s",
 			      strerror(ret));
@@ -3634,7 +3476,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		ret = rxq_mac_addrs_add(rxq);
+		ret = rxq_mac_addr_add(rxq);
 		if (!ret)
 			continue;
 		WARN("%p: QP flow attachment failed: %s",
@@ -3672,7 +3514,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	while (i != 0) {
 		rxq = (*priv->rxqs)[i--];
 		if (rxq != NULL) {
-			rxq_mac_addrs_del(rxq);
+			rxq_mac_addr_del(rxq);
 		}
 	}
 	priv->started = 0;
@@ -3719,7 +3561,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		rxq_mac_addrs_del(rxq);
+		rxq_mac_addr_del(rxq);
 	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
 }
@@ -3972,7 +3814,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->max_rx_queues = max;
 	info->max_tx_queues = max;
 	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = (elemof(priv->mac) - 1);
+	info->max_mac_addrs = 1;
 	info->rx_offload_capa =
 		(priv->hw_csum ?
 		 (DEV_RX_OFFLOAD_IPV4_CKSUM |
@@ -4104,90 +3946,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 }
 
 /**
- * DPDK callback to remove a MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param index
- *   MAC address index.
- */
-static void
-mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	priv_lock(priv);
-	if (priv->isolated)
-		goto end;
-	DEBUG("%p: removing MAC address from index %" PRIu32,
-	      (void *)dev, index);
-	/* Last array entry is reserved for broadcast. */
-	if (index >= (elemof(priv->mac) - 1))
-		goto end;
-	priv_mac_addr_del(priv, index);
-end:
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to add a MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param mac_addr
- *   MAC address to register.
- * @param index
- *   MAC address index.
- * @param vmdq
- *   VMDq pool index to associate address with (ignored).
- */
-static int
-mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
-		  uint32_t index, uint32_t vmdq)
-{
-	struct priv *priv = dev->data->dev_private;
-	int re;
-
-	(void)vmdq;
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot add MAC address, "
-		      "device is in isolated mode", (void *)dev);
-		re = EPERM;
-		goto end;
-	}
-	DEBUG("%p: adding MAC address at index %" PRIu32,
-	      (void *)dev, index);
-	/* Last array entry is reserved for broadcast. */
-	if (index >= (elemof(priv->mac) - 1)) {
-		re = EINVAL;
-		goto end;
-	}
-	re = priv_mac_addr_add(priv, index,
-			       (const uint8_t (*)[ETHER_ADDR_LEN])
-			       mac_addr->addr_bytes);
-end:
-	priv_unlock(priv);
-	return -re;
-}
-
-/**
- * DPDK callback to set the primary MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param mac_addr
- *   MAC address to register.
- */
-static void
-mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
-{
-	DEBUG("%p: setting primary MAC address", (void *)dev);
-	mlx4_mac_addr_remove(dev, 0);
-	mlx4_mac_addr_add(dev, mac_addr, 0, 0);
-}
-
-/**
  * DPDK callback to retrieve physical link information.
  *
  * @param dev
@@ -4309,7 +4067,7 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		/* Reenable non-RSS queue attributes. No need to check
 		 * for errors at this stage. */
 		if (!priv->rss && !priv->isolated) {
-			rxq_mac_addrs_add(rxq);
+			rxq_mac_addr_add(rxq);
 		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
@@ -4487,9 +4245,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.tx_queue_release = mlx4_tx_queue_release,
 	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
 	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.mac_addr_remove = mlx4_mac_addr_remove,
-	.mac_addr_add = mlx4_mac_addr_add,
-	.mac_addr_set = mlx4_mac_addr_set,
 	.mtu_set = mlx4_dev_set_mtu,
 	.filter_ctrl = mlx4_dev_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
@@ -5369,13 +5124,10 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[0], mac.addr_bytes[1],
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
-		/* Register MAC and broadcast addresses. */
-		claim_zero(priv_mac_addr_add(priv, 0,
+		/* Register MAC address. */
+		claim_zero(priv_mac_addr_add(priv,
 					     (const uint8_t (*)[ETHER_ADDR_LEN])
 					     mac.addr_bytes));
-		claim_zero(priv_mac_addr_add(priv, (elemof(priv->mac) - 1),
-					     &(const uint8_t [ETHER_ADDR_LEN])
-					     { "\xff\xff\xff\xff\xff\xff" }));
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
@@ -5406,7 +5158,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 
 		eth_dev->data->dev_private = priv;
-		eth_dev->data->mac_addrs = priv->mac;
+		eth_dev->data->mac_addrs = &priv->mac;
 		eth_dev->device = &pci_dev->device;
 
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6b2c83b..addc2d5 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -35,7 +35,6 @@
 #define RTE_PMD_MLX4_H_
 
 #include <stdint.h>
-#include <limits.h>
 
 /*
  * Runtime logging through RTE_LOG() is enabled when not in debugging mode.
@@ -64,16 +63,6 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
-/*
- * Maximum number of simultaneous MAC addresses supported.
- *
- * According to ConnectX's Programmer Reference Manual:
- *   The L2 Address Match is implemented by comparing a MAC/VLAN combination
- *   of 128 MAC addresses and 127 VLAN values, comprising 128x127 possible
- *   L2 addresses.
- */
-#define MLX4_MAX_MAC_ADDRESSES 128
-
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -112,25 +101,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Bit-field manipulation. */
-#define BITFIELD_DECLARE(bf, type, size)				\
-	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) +		\
-		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
-#define BITFIELD_DEFINE(bf, type, size)					\
-	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
-#define BITFIELD_SET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |=		\
-		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
-#define BITFIELD_RESET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &=		\
-		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
-#define BITFIELD_ISSET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &		\
-	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
-
 /* Number of elements in array. */
 #define elemof(a) (sizeof(a) / sizeof((a)[0]))
 
@@ -196,8 +166,7 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -273,13 +242,7 @@ struct priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	/*
-	 * MAC addresses array and configuration bit-field.
-	 * An extra entry that cannot be modified by the DPDK is reserved
-	 * for broadcast frames (destination MAC address ff:ff:ff:ff:ff:ff).
-	 */
-	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
-	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
+	struct ether_addr mac; /* MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4


* [PATCH v1 13/48] net/mlx4: drop MAC flows affecting all Rx queues
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (11 preceding siblings ...)
  2017-08-01 16:53 ` [PATCH v1 12/48] net/mlx4: remove MAC address configuration support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 14/48] net/mlx4: revert flow API RSS support Adrien Mazarguil
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Configuring several Rx queues enables RSS, which causes an additional
special parent queue to be created to manage them.

MAC flows are associated with the queue that is supposed to receive
packets: either the parent queue in case of RSS, or the single orphan
queue otherwise.

For historical reasons, the current implementation supports another
scenario with multiple orphans, in which case MAC flows are configured on
all of them. This is harmless but useless, since that scenario cannot
occur.

Removing this feature allows dissociating the remaining MAC flow from Rx
queues and storing it inside the private structure where it belongs.
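
The queue selection described above can be sketched as follows. This is a
simplified model under stated assumptions, not the PMD code: sketch_rxq,
sketch_priv and sketch_mac_flow_target() are hypothetical names, and the
parent list is reduced to a single pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified Rx queue. */
struct sketch_rxq {
	int id;
};

/* Hypothetical, simplified private structure. */
struct sketch_priv {
	int started; /* Device is started. */
	int isolated; /* Isolated mode, no default MAC flow. */
	int rss; /* RSS enabled, a parent queue exists. */
	struct sketch_rxq *parent; /* RSS parent queue. */
	struct sketch_rxq **rxqs; /* Rx queue array, may be NULL. */
};

/*
 * Return the queue that should own the MAC flow, or NULL when no flow
 * must be created (device stopped, isolated mode, or no queue 0).
 */
struct sketch_rxq *
sketch_mac_flow_target(const struct sketch_priv *priv)
{
	assert(priv != NULL);
	if (!priv->started || priv->isolated)
		return NULL;
	if (priv->rss)
		return priv->parent;
	if (priv->rxqs && priv->rxqs[0])
		return priv->rxqs[0];
	return NULL;
}
```

Since at most one queue is ever selected, a single flow handle kept in the
private structure is enough to track it.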

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 215 +++++++++++--------------------------------
 drivers/net/mlx4/mlx4.h |   2 +-
 2 files changed, 57 insertions(+), 160 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index dd42c96..c11e789 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -515,6 +515,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 static void
 rxq_cleanup(struct rxq *rxq);
 
+static void
+priv_mac_addr_del(struct priv *priv);
+
 /**
  * Create RSS parent queue.
  *
@@ -641,6 +644,7 @@ dev_configure(struct rte_eth_dev *dev)
 		for (i = 0; (i != priv->rxqs_n); ++i)
 			if ((*priv->rxqs)[i] != NULL)
 				return EINVAL;
+		priv_mac_addr_del(priv);
 		priv_parent_list_cleanup(priv);
 		priv->rss = 0;
 		priv->rxqs_n = 0;
@@ -2065,46 +2069,57 @@ rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Unregister a MAC address from a RX queue.
+ * Unregister a MAC address.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq)
+priv_mac_addr_del(struct priv *priv)
 {
 #ifndef NDEBUG
-	struct priv *priv = rxq->priv;
-	const uint8_t (*mac)[ETHER_ADDR_LEN] =
-		(const uint8_t (*)[ETHER_ADDR_LEN])
-		priv->mac.addr_bytes;
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
 #endif
-	if (!rxq->mac_flow)
+
+	if (!priv->mac_flow)
 		return;
 	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)rxq,
+	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow));
-	rxq->mac_flow = NULL;
+	claim_zero(ibv_destroy_flow(priv->mac_flow));
+	priv->mac_flow = NULL;
 }
 
 /**
- * Register a MAC address in a RX queue.
+ * Register a MAC address.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * In RSS mode, the MAC address is registered in the parent queue,
+ * otherwise it is registered in queue 0.
+ *
+ * @param priv
+ *   Pointer to private structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq)
+priv_mac_addr_add(struct priv *priv)
 {
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+	struct rxq *rxq;
 	struct ibv_flow *flow;
-	struct priv *priv = rxq->priv;
-	const uint8_t (*mac)[ETHER_ADDR_LEN] =
-			(const uint8_t (*)[ETHER_ADDR_LEN])
-			priv->mac.addr_bytes;
+
+	/* If device isn't started, this is all we need to do. */
+	if (!priv->started)
+		return 0;
+	if (priv->isolated)
+		return 0;
+	if (priv->rss)
+		rxq = LIST_FIRST(&priv->parents);
+	else if (*priv->rxqs && (*priv->rxqs)[0])
+		rxq = (*priv->rxqs)[0];
+	else
+		return 0;
 
 	/* Allocate flow specification on the stack. */
 	struct __attribute__((packed)) {
@@ -2114,8 +2129,8 @@ rxq_mac_addr_add(struct rxq *rxq)
 	struct ibv_flow_attr *attr = &data.attr;
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
-	if (rxq->mac_flow)
-		rxq_mac_addr_del(rxq);
+	if (priv->mac_flow)
+		priv_mac_addr_del(priv);
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2142,7 +2157,7 @@ rxq_mac_addr_add(struct rxq *rxq)
 		}
 	};
 	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)rxq,
+	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
 	errno = 0;
@@ -2156,60 +2171,8 @@ rxq_mac_addr_add(struct rxq *rxq)
 			return errno;
 		return EINVAL;
 	}
-	assert(rxq->mac_flow == NULL);
-	rxq->mac_flow = flow;
-	return 0;
-}
-
-/**
- * Register a MAC address.
- *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in each queue directly.
- *
- * @param priv
- *   Pointer to private structure.
- * @param mac
- *   MAC address to register.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-priv_mac_addr_add(struct priv *priv, const uint8_t (*mac)[ETHER_ADDR_LEN])
-{
-	unsigned int i;
-	int ret;
-
-	priv->mac = (struct ether_addr){
-		{
-			(*mac)[0], (*mac)[1], (*mac)[2],
-			(*mac)[3], (*mac)[4], (*mac)[5]
-		}
-	};
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started) {
-		goto end;
-	}
-	if (priv->rss) {
-		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents));
-		if (ret)
-			return ret;
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[(--i)] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i]);
-		return ret;
-	}
-end:
+	assert(priv->mac_flow == NULL);
+	priv->mac_flow = flow;
 	return 0;
 }
 
@@ -2253,9 +2216,6 @@ rxq_cleanup(struct rxq *rxq)
 						rxq->if_cq,
 						&params));
 	}
-	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_mac_addr_del(rxq);
-	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
 	if (rxq->cq != NULL)
@@ -2894,12 +2854,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		DEBUG("%p: nothing to do", (void *)dev);
 		return 0;
 	}
-	/* Remove attached flows if RSS is disabled (no parent queue). */
-	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addr_del(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->mac_flow = NULL;
-	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
 	if (!tmpl.qp)
@@ -2933,12 +2887,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	}
-	/* Reconfigure flows. Do not care for errors. */
-	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addr_add(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->mac_flow = NULL;
-	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
 	if (pool == NULL) {
@@ -3081,15 +3029,6 @@ rxq_create_qp(struct rxq *rxq,
 		      strerror(ret));
 		return ret;
 	}
-	if (!priv->isolated && (parent || !priv->rss)) {
-		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addr_add(rxq);
-		if (ret) {
-			ERROR("QP flow attachment failed: %s",
-			      strerror(ret));
-			return ret;
-		}
-	}
 	if (!parent) {
 		ret = ibv_post_recv(rxq->qp,
 				    (rxq->sp ?
@@ -3359,6 +3298,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -EEXIST;
 		}
 		(*priv->rxqs)[idx] = NULL;
+		if (idx == 0)
+			priv_mac_addr_del(priv);
 		rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -3418,6 +3359,8 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			DEBUG("%p: removing RX queue %p from list",
 			      (void *)priv->dev, (void *)rxq);
 			(*priv->rxqs)[i] = NULL;
+			if (i == 0)
+				priv_mac_addr_del(priv);
 			break;
 		}
 	rxq_cleanup(rxq);
@@ -3449,9 +3392,6 @@ static int
 mlx4_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 	int ret;
 
 	priv_lock(priv);
@@ -3461,28 +3401,9 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	}
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	if (priv->isolated) {
-		rxq = NULL;
-		r = 1;
-	} else if (priv->rss) {
-		rxq = LIST_FIRST(&priv->parents);
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		ret = rxq_mac_addr_add(rxq);
-		if (!ret)
-			continue;
-		WARN("%p: QP flow attachment failed: %s",
-		     (void *)dev, strerror(ret));
+	ret = priv_mac_addr_add(priv);
+	if (ret)
 		goto err;
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	ret = priv_dev_link_interrupt_handler_install(priv, dev);
 	if (ret) {
 		ERROR("%p: LSC handler install failed",
@@ -3511,12 +3432,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	while (i != 0) {
-		rxq = (*priv->rxqs)[i--];
-		if (rxq != NULL) {
-			rxq_mac_addr_del(rxq);
-		}
-	}
+	priv_mac_addr_del(priv);
 	priv->started = 0;
 	priv_unlock(priv);
 	return -ret;
@@ -3534,9 +3450,6 @@ static void
 mlx4_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 
 	priv_lock(priv);
 	if (!priv->started) {
@@ -3545,24 +3458,8 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	}
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	if (priv->isolated) {
-		rxq = NULL;
-		r = 1;
-	} else if (priv->rss) {
-		rxq = LIST_FIRST(&priv->parents);
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
 	mlx4_priv_flow_stop(priv);
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		rxq_mac_addr_del(rxq);
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+	priv_mac_addr_del(priv);
 	priv_unlock(priv);
 }
 
@@ -3647,6 +3544,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+	priv_mac_addr_del(priv);
 	/* Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
 	 * never release them before closing the device. */
@@ -4036,6 +3934,8 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
+	/* Remove MAC flow. */
+	priv_mac_addr_del(priv);
 	/* Temporarily replace RX handler with a fake one, assuming it has not
 	 * been copied elsewhere. */
 	dev->rx_pkt_burst = removed_rx_burst;
@@ -4064,11 +3964,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 				rx_func = mlx4_rx_burst_sp;
 			break;
 		}
-		/* Reenable non-RSS queue attributes. No need to check
-		 * for errors at this stage. */
-		if (!priv->rss && !priv->isolated) {
-			rxq_mac_addr_add(rxq);
-		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
 			rx_func = mlx4_rx_burst_sp;
@@ -4076,6 +3971,8 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	/* Burst functions can now be called again. */
 	rte_wmb();
 	dev->rx_pkt_burst = rx_func;
+	/* Restore MAC flow. */
+	ret = priv_mac_addr_add(priv);
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -5125,9 +5022,9 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
-		claim_zero(priv_mac_addr_add(priv,
-					     (const uint8_t (*)[ETHER_ADDR_LEN])
-					     mac.addr_bytes));
+		priv->mac = mac;
+		if (priv_mac_addr_add(priv))
+			goto port_error;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index addc2d5..23ffc87 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -166,7 +166,6 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -243,6 +242,7 @@ struct priv {
 	struct ibv_device_attr device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac; /* MAC address. */
+	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4

* [PATCH v1 14/48] net/mlx4: revert flow API RSS support
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

This reverts commit d7769c7c08cc08a9d1bc4e40b95524d9697707d9.

Existing RSS features rely on experimental Verbs provided by Mellanox OFED.

In order to replace this dependency with standard distribution packages,
RSS support must be temporarily removed to be re-implemented using a
different API.

Removing support for the RSS flow rule action is the first step toward this
goal.
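
As a rough illustration of what the flow API accepts once this revert is
applied, the sketch below parses an action list into the reduced
drop/queue/queue_id layout. Everything prefixed with sketch_ is hypothetical
and stands in for the rte_flow and driver types; the requirement that at
least one fate action be present is an assumption of this sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rte_flow action types involved. */
enum sketch_action_type {
	SKETCH_ACTION_END,
	SKETCH_ACTION_DROP,
	SKETCH_ACTION_QUEUE,
	SKETCH_ACTION_RSS,
};

struct sketch_action {
	enum sketch_action_type type;
	uint16_t queue_index; /* Only meaningful for SKETCH_ACTION_QUEUE. */
};

/* Same fields as struct mlx4_flow_action after this patch. */
struct sketch_flow_action {
	uint32_t drop:1; /* Target is a drop queue. */
	uint32_t queue:1; /* Target is a receive queue. */
	uint32_t queue_id; /* Identifier of the queue. */
};

/* Returns 0 on success, a positive errno value otherwise. */
static int
sketch_parse_actions(const struct sketch_action *actions,
		     unsigned int rxqs_n,
		     struct sketch_flow_action *out)
{
	*out = (struct sketch_flow_action){ 0 };
	for (; actions->type != SKETCH_ACTION_END; ++actions) {
		switch (actions->type) {
		case SKETCH_ACTION_DROP:
			out->drop = 1;
			break;
		case SKETCH_ACTION_QUEUE:
			/* Queue index must refer to a configured Rx queue. */
			if (actions->queue_index >= rxqs_n)
				return ENOTSUP;
			out->queue = 1;
			out->queue_id = actions->queue_index;
			break;
		default:
			/* RSS is no longer a valid action. */
			return ENOTSUP;
		}
	}
	/* Assumption of this sketch: a rule needs a drop or queue target. */
	if (!out->drop && !out->queue)
		return ENOTSUP;
	return 0;
}
```

A rule targeting a single queue keeps working, while a rule carrying the RSS
action is refused with ENOTSUP.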

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      |   6 +-
 drivers/net/mlx4/mlx4.h      |   5 -
 drivers/net/mlx4/mlx4_flow.c | 206 +++-----------------------------------
 drivers/net/mlx4/mlx4_flow.h |   3 +-
 4 files changed, 20 insertions(+), 200 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index c11e789..4aef6a3 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -533,7 +533,7 @@ priv_mac_addr_del(struct priv *priv);
  * @return
  *   Pointer to a parent rxq structure, NULL on failure.
  */
-struct rxq *
+static struct rxq *
 priv_parent_create(struct priv *priv,
 		   uint16_t queues[],
 		   uint16_t children_n)
@@ -670,8 +670,10 @@ dev_configure(struct rte_eth_dev *dev)
 	priv->rss = 1;
 	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	if (priv->isolated)
+	if (priv->isolated) {
+		priv->rss = 0;
 		return 0;
+	}
 	if (priv_parent_create(priv, NULL, priv->rxqs_n))
 		return 0;
 	/* Failure, rollback. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 23ffc87..301b193 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -285,9 +285,4 @@ rxq_create_qp(struct rxq *rxq,
 void
 rxq_parent_cleanup(struct rxq *parent);
 
-struct rxq *
-priv_parent_create(struct priv *priv,
-		   uint16_t queues[],
-		   uint16_t children_n);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index f5c015e..827115e 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -109,7 +109,6 @@ struct rte_flow_drop {
 static const enum rte_flow_action_type valid_actions[] = {
 	RTE_FLOW_ACTION_TYPE_DROP,
 	RTE_FLOW_ACTION_TYPE_QUEUE,
-	RTE_FLOW_ACTION_TYPE_RSS,
 	RTE_FLOW_ACTION_TYPE_END,
 };
 
@@ -670,76 +669,6 @@ priv_flow_validate(struct priv *priv,
 			if (!queue || (queue->index > (priv->rxqs_n - 1)))
 				goto exit_action_not_supported;
 			action.queue = 1;
-			action.queues_n = 1;
-			action.queues[0] = queue->index;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_RSS) {
-			int i;
-			int ierr;
-			const struct rte_flow_action_rss *rss =
-				(const struct rte_flow_action_rss *)
-				actions->conf;
-
-			if (!priv->hw_rss) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "RSS cannot be used with "
-					   "the current configuration");
-				return -rte_errno;
-			}
-			if (!priv->isolated) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "RSS cannot be used without "
-					   "isolated mode");
-				return -rte_errno;
-			}
-			if (!rte_is_power_of_2(rss->num)) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "the number of queues "
-					   "should be power of two");
-				return -rte_errno;
-			}
-			if (priv->max_rss_tbl_sz < rss->num) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "the number of queues "
-					   "is too large");
-				return -rte_errno;
-			}
-			/* checking indexes array */
-			ierr = 0;
-			for (i = 0; i < rss->num; ++i) {
-				int j;
-				if (rss->queue[i] >= priv->rxqs_n)
-					ierr = 1;
-				/*
-				 * Prevent the user from specifying
-				 * the same queue twice in the RSS array.
-				 */
-				for (j = i + 1; j < rss->num && !ierr; ++j)
-					if (rss->queue[j] == rss->queue[i])
-						ierr = 1;
-				if (ierr) {
-					rte_flow_error_set(
-						error,
-						ENOTSUP,
-						RTE_FLOW_ERROR_TYPE_HANDLE,
-						NULL,
-						"RSS action only supports "
-						"unique queue indices "
-						"in a list");
-					return -rte_errno;
-				}
-			}
-			action.queue = 1;
-			action.queues_n = rss->num;
-			for (i = 0; i < rss->num; ++i)
-				action.queues[i] = rss->queue[i];
 		} else {
 			goto exit_action_not_supported;
 		}
@@ -865,82 +794,6 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 }
 
 /**
- * Get RSS parent rxq structure for given queues.
- *
- * Creates a new or returns an existed one.
- *
- * @param priv
- *   Pointer to private structure.
- * @param queues
- *   queues indices array, NULL in default RSS case.
- * @param children_n
- *   the size of queues array.
- *
- * @return
- *   Pointer to a parent rxq structure, NULL on failure.
- */
-static struct rxq *
-priv_parent_get(struct priv *priv,
-		uint16_t queues[],
-		uint16_t children_n,
-		struct rte_flow_error *error)
-{
-	unsigned int i;
-	struct rxq *parent;
-
-	for (parent = LIST_FIRST(&priv->parents);
-	     parent;
-	     parent = LIST_NEXT(parent, next)) {
-		unsigned int same = 0;
-		unsigned int overlap = 0;
-
-		/*
-		 * Find out whether an appropriate parent queue already exists
-		 * and can be reused, otherwise make sure there are no overlaps.
-		 */
-		for (i = 0; i < children_n; ++i) {
-			unsigned int j;
-
-			for (j = 0; j < parent->rss.queues_n; ++j) {
-				if (parent->rss.queues[j] != queues[i])
-					continue;
-				++overlap;
-				if (i == j)
-					++same;
-			}
-		}
-		if (same == children_n &&
-			children_n == parent->rss.queues_n)
-			return parent;
-		else if (overlap)
-			goto error;
-	}
-	/* Exclude the cases when some QPs were created without RSS */
-	for (i = 0; i < children_n; ++i) {
-		struct rxq *rxq = (*priv->rxqs)[queues[i]];
-		if (rxq->qp)
-			goto error;
-	}
-	parent = priv_parent_create(priv, queues, children_n);
-	if (!parent) {
-		rte_flow_error_set(error,
-				   ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "flow rule creation failure");
-		return NULL;
-	}
-	return parent;
-
-error:
-	rte_flow_error_set(error,
-			   EEXIST,
-			   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			   NULL,
-			   "sharing a queue between several"
-			   " RSS groups is not supported");
-	return NULL;
-}
-
-/**
  * Complete flow rule creation.
  *
  * @param priv
@@ -963,7 +816,6 @@ priv_flow_create_action_queue(struct priv *priv,
 {
 	struct ibv_qp *qp;
 	struct rte_flow *rte_flow;
-	struct rxq *rxq_parent = NULL;
 
 	assert(priv->pd);
 	assert(priv->ctx);
@@ -977,38 +829,23 @@ priv_flow_create_action_queue(struct priv *priv,
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
 		int ret;
-		unsigned int i;
-		struct rxq *rxq = NULL;
-
-		if (action->queues_n > 1) {
-			rxq_parent = priv_parent_get(priv, action->queues,
-						     action->queues_n, error);
-			if (!rxq_parent)
+		struct rxq *rxq = (*priv->rxqs)[action->queue_id];
+
+		if (!rxq->qp) {
+			assert(priv->isolated);
+			ret = rxq_create_qp(rxq, rxq->elts_n,
+					    0, 0, NULL);
+			if (ret) {
+				rte_flow_error_set(
+					error,
+					ENOMEM,
+					RTE_FLOW_ERROR_TYPE_HANDLE,
+					NULL,
+					"flow rule creation failure");
 				goto error;
-		}
-		for (i = 0; i < action->queues_n; ++i) {
-			rxq = (*priv->rxqs)[action->queues[i]];
-			/*
-			 * In case of isolated mode we postpone
-			 * ibv receive queue creation till the first
-			 * rte_flow rule will be applied on that queue.
-			 */
-			if (!rxq->qp) {
-				assert(priv->isolated);
-				ret = rxq_create_qp(rxq, rxq->elts_n,
-						    0, 0, rxq_parent);
-				if (ret) {
-					rte_flow_error_set(
-						error,
-						ENOMEM,
-						RTE_FLOW_ERROR_TYPE_HANDLE,
-						NULL,
-						"flow rule creation failure");
-					goto error;
-				}
 			}
 		}
-		qp = action->queues_n > 1 ? rxq_parent->qp : rxq->qp;
+		qp = rxq->qp;
 		rte_flow->qp = qp;
 	}
 	rte_flow->ibv_attr = ibv_attr;
@@ -1023,8 +860,6 @@ priv_flow_create_action_queue(struct priv *priv,
 	return rte_flow;
 
 error:
-	if (rxq_parent)
-		rxq_parent_cleanup(rxq_parent);
 	rte_free(rte_flow);
 	return NULL;
 }
@@ -1088,22 +923,11 @@ priv_flow_create(struct priv *priv,
 			continue;
 		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
 			action.queue = 1;
-			action.queues_n = 1;
-			action.queues[0] =
+			action.queue_id =
 				((const struct rte_flow_action_queue *)
 				 actions->conf)->index;
 		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
 			action.drop = 1;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_RSS) {
-			unsigned int i;
-			const struct rte_flow_action_rss *rss =
-				(const struct rte_flow_action_rss *)
-				 actions->conf;
-
-			action.queue = 1;
-			action.queues_n = rss->num;
-			for (i = 0; i < rss->num; ++i)
-				action.queues[i] = rss->queue[i];
 		} else {
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 4654dc2..17e5f6e 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -98,8 +98,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queue indices to use. */
-	uint16_t queues_n; /**< Number of entries in queue[] */
+	uint32_t queue_id; /**< Identifier of the queue. */
 };
 
 int mlx4_priv_flow_start(struct priv *priv);
-- 
2.1.4

* [PATCH v1 15/48] net/mlx4: revert RSS parent queue refactoring
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

This reverts commit ff00a0dc5600dbb0a29e4aa7fa4b078f98c7a360.

Support for several RSS parent queues was necessary to implement the RSS
flow rule action, dropped in a prior commit.
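
With a single parent embedded in the private structure, telling the parent
apart from a regular Rx queue reduces to an address comparison. The sketch
below models that idea with simplified, hypothetical types (all sketch_
names are assumptions made for illustration and carry none of the Verbs
state the real structures hold).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct rxq. */
struct sketch_rxq {
	int id;
};

/* Simplified stand-in for struct priv. */
struct sketch_priv {
	unsigned int rss:1; /* RSS enabled, parent queue in use. */
	unsigned int isolated:1; /* Isolated mode, no MAC flow. */
	struct sketch_rxq rxq_parent; /* Single embedded parent queue. */
	struct sketch_rxq *rxqs[4]; /* Regular Rx queues. */
};

/* A queue is the parent if and only if it is the embedded one. */
static int
sketch_is_parent(const struct sketch_priv *priv,
		 const struct sketch_rxq *rxq)
{
	return rxq == &priv->rxq_parent;
}

/* Queue the MAC flow would be attached to, NULL if none applies. */
static struct sketch_rxq *
sketch_mac_flow_target(struct sketch_priv *priv)
{
	if (priv->isolated)
		return NULL;
	if (priv->rss)
		return &priv->rxq_parent;
	return priv->rxqs[0];
}

/* Rehashing is refused for the parent, as it holds no descriptors. */
static int
sketch_rehash(const struct sketch_priv *priv, const struct sketch_rxq *rxq)
{
	if (sketch_is_parent(priv, rxq))
		return EINVAL;
	return 0;
}
```

No list traversal or separate allocation is needed in this model, which is
the simplification the revert brings back.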

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 332 +++++++++++---------------------------
 drivers/net/mlx4/mlx4.h      |  17 +-
 drivers/net/mlx4/mlx4_flow.c |  15 --
 3 files changed, 97 insertions(+), 267 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 4aef6a3..42438a2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -507,10 +507,8 @@ txq_cleanup(struct txq *txq);
 
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive,
-	  const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp, int children_n,
-	  struct rxq *rxq_parent);
+	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  struct rte_mempool *mp);
 
 static void
 rxq_cleanup(struct rxq *rxq);
@@ -519,84 +517,6 @@ static void
 priv_mac_addr_del(struct priv *priv);
 
 /**
- * Create RSS parent queue.
- *
- * The new parent is inserted in front of the list in the private structure.
- *
- * @param priv
- *   Pointer to private structure.
- * @param queues
- *   Queues indices array, if NULL use all Rx queues.
- * @param children_n
- *   The number of entries in queues[].
- *
- * @return
- *   Pointer to a parent rxq structure, NULL on failure.
- */
-static struct rxq *
-priv_parent_create(struct priv *priv,
-		   uint16_t queues[],
-		   uint16_t children_n)
-{
-	int ret;
-	uint16_t i;
-	struct rxq *parent;
-
-	parent = rte_zmalloc("parent queue",
-			     sizeof(*parent),
-			     RTE_CACHE_LINE_SIZE);
-	if (!parent) {
-		ERROR("cannot allocate memory for RSS parent queue");
-		return NULL;
-	}
-	ret = rxq_setup(priv->dev, parent, 0, 0, 0,
-			NULL, NULL, children_n, NULL);
-	if (ret) {
-		rte_free(parent);
-		return NULL;
-	}
-	parent->rss.queues_n = children_n;
-	if (queues) {
-		for (i = 0; i < children_n; ++i)
-			parent->rss.queues[i] = queues[i];
-	} else {
-		/* the default RSS ring case */
-		assert(priv->rxqs_n == children_n);
-		for (i = 0; i < priv->rxqs_n; ++i)
-			parent->rss.queues[i] = i;
-	}
-	LIST_INSERT_HEAD(&priv->parents, parent, next);
-	return parent;
-}
-
-/**
- * Clean up RX queue parent structure.
- *
- * @param parent
- *   RX queue parent structure.
- */
-void
-rxq_parent_cleanup(struct rxq *parent)
-{
-	LIST_REMOVE(parent, next);
-	rxq_cleanup(parent);
-	rte_free(parent);
-}
-
-/**
- * Clean up parent structures from the parent list.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_parent_list_cleanup(struct priv *priv)
-{
-	while (!LIST_EMPTY(&priv->parents))
-		rxq_parent_cleanup(LIST_FIRST(&priv->parents));
-}
-
-/**
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
@@ -615,6 +535,7 @@ dev_configure(struct rte_eth_dev *dev)
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
 	unsigned int tmp;
+	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -645,7 +566,7 @@ dev_configure(struct rte_eth_dev *dev)
 			if ((*priv->rxqs)[i] != NULL)
 				return EINVAL;
 		priv_mac_addr_del(priv);
-		priv_parent_list_cleanup(priv);
+		rxq_cleanup(&priv->rxq_parent);
 		priv->rss = 0;
 		priv->rxqs_n = 0;
 	}
@@ -670,16 +591,14 @@ dev_configure(struct rte_eth_dev *dev)
 	priv->rss = 1;
 	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	if (priv->isolated) {
-		priv->rss = 0;
-		return 0;
-	}
-	if (priv_parent_create(priv, NULL, priv->rxqs_n))
+	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, 0, NULL, NULL);
+	if (!ret)
 		return 0;
 	/* Failure, rollback. */
 	priv->rss = 0;
 	priv->rxqs_n = tmp;
-	return ENOMEM;
+	assert(ret > 0);
+	return ret;
 }
 
 /**
@@ -2117,7 +2036,7 @@ priv_mac_addr_add(struct priv *priv)
 	if (priv->isolated)
 		return 0;
 	if (priv->rss)
-		rxq = LIST_FIRST(&priv->parents);
+		rxq = &priv->rxq_parent;
 	else if (*priv->rxqs && (*priv->rxqs)[0])
 		rxq = (*priv->rxqs)[0];
 	else
@@ -2743,18 +2662,15 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
  *   Completion queue to associate with QP.
  * @param desc
  *   Number of descriptors in QP (hint only).
- * @param children_n
- *   If nonzero, a number of children for parent QP and zero for a child.
- * @param rxq_parent
- *   Pointer for a parent in a child case, NULL otherwise.
+ * @param parent
+ *   If nonzero, create a parent QP, otherwise a child.
  *
  * @return
  *   QP pointer or NULL in case of error.
  */
 static struct ibv_qp *
 rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int children_n, struct ibv_exp_res_domain *rd,
-		 struct rxq *rxq_parent)
+		 int parent, struct ibv_exp_res_domain *rd)
 {
 	struct ibv_exp_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
@@ -2782,16 +2698,16 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	if (children_n > 0) {
+	if (parent) {
 		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
 		/* TSS isn't necessary. */
 		attr.qpg.parent_attrib.tss_child_count = 0;
 		attr.qpg.parent_attrib.rss_child_count =
-			rte_align32pow2(children_n + 1) >> 1;
+			rte_align32pow2(priv->rxqs_n + 1) >> 1;
 		DEBUG("initializing parent RSS queue");
 	} else {
 		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = rxq_parent->qp;
+		attr.qpg.qpg_parent = priv->rxq_parent.qp;
 		DEBUG("initializing child RSS queue");
 	}
 	return ibv_exp_create_qp(priv->ctx, &attr);
@@ -2825,7 +2741,13 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int err;
+	int parent = (rxq == &priv->rxq_parent);
 
+	if (parent) {
+		ERROR("%p: cannot rehash parent queue %p",
+		      (void *)dev, (void *)rxq);
+		return EINVAL;
+	}
 	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
@@ -2858,8 +2780,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
-	if (!tmpl.qp)
-		goto skip_init;
 	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
 	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
 	if (err) {
@@ -2867,6 +2787,12 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	}
+	err = ibv_resize_cq(tmpl.cq, desc_n);
+	if (err) {
+		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
+		assert(err > 0);
+		return err;
+	}
 	mod = (struct ibv_exp_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
@@ -2875,6 +2801,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	};
 	err = ibv_exp_modify_qp(tmpl.qp, &mod,
 				(IBV_EXP_QP_STATE |
+				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
 				 IBV_EXP_QP_PORT));
 	if (err) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
@@ -2882,13 +2809,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	};
-skip_init:
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
 	if (pool == NULL) {
@@ -2942,8 +2862,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
-	if (!tmpl.qp)
-		goto skip_rtr;
 	/* Post WRs. */
 	err = ibv_post_recv(tmpl.qp,
 			    (tmpl.sp ?
@@ -2971,103 +2889,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 }
 
 /**
- * Create verbs QP resources associated with a rxq.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param inactive
- *   If true, the queue is disabled because its index is higher or
- *   equal to the real number of queues, which must be a power of 2.
- * @param children_n
- *   The number of children in a parent case, zero for a child.
- * @param rxq_parent
- *   The pointer to a parent RX structure for a child in RSS case,
- *   NULL for parent.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-int
-rxq_create_qp(struct rxq *rxq,
-	      uint16_t desc,
-	      int inactive,
-	      int children_n,
-	      struct rxq *rxq_parent)
-{
-	int ret;
-	struct ibv_exp_qp_attr mod;
-	struct ibv_exp_query_intf_params params;
-	enum ibv_exp_query_intf_status status;
-	struct ibv_recv_wr *bad_wr;
-	int parent = (children_n > 0);
-	struct priv *priv = rxq->priv;
-
-	if (priv->rss && !inactive && (rxq_parent || parent))
-		rxq->qp = rxq_setup_qp_rss(priv, rxq->cq, desc,
-					   children_n, rxq->rd,
-					   rxq_parent);
-	else
-		rxq->qp = rxq_setup_qp(priv, rxq->cq, desc, rxq->rd);
-	if (rxq->qp == NULL) {
-		ret = (errno ? errno : EINVAL);
-		ERROR("QP creation failure: %s",
-		      strerror(ret));
-		return ret;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_exp_modify_qp(rxq->qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
-	if (ret) {
-		ERROR("QP state to IBV_QPS_INIT failed: %s",
-		      strerror(ret));
-		return ret;
-	}
-	if (!parent) {
-		ret = ibv_post_recv(rxq->qp,
-				    (rxq->sp ?
-				     &(*rxq->elts.sp)[0].wr :
-				     &(*rxq->elts.no_sp)[0].wr),
-				    &bad_wr);
-		if (ret) {
-			ERROR("ibv_post_recv() failed for WR %p: %s",
-			      (void *)bad_wr,
-			      strerror(ret));
-			return ret;
-		}
-	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(rxq->qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("QP state to IBV_QPS_RTR failed: %s",
-		      strerror(ret));
-		return ret;
-	}
-	params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = rxq->qp,
-	};
-	rxq->if_qp = ibv_exp_query_intf(priv->ctx, &params, &status);
-	if (rxq->if_qp == NULL) {
-		ERROR("QP interface family query failed with status %d",
-		      status);
-		return errno;
-	}
-	return 0;
-}
-
-/**
  * Configure a RX queue.
  *
  * @param dev
@@ -3085,21 +2906,14 @@ rxq_create_qp(struct rxq *rxq,
  *   Thresholds parameters.
  * @param mp
  *   Memory pool for buffer allocations.
- * @param children_n
- *   The number of children in a parent case, zero for a child.
- * @param rxq_parent
- *   The pointer to a parent RX structure (or NULL) in a child case,
- *   NULL for parent.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive,
-	  const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp, int children_n,
-	  struct rxq *rxq_parent)
+	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rxq tmpl = {
@@ -3107,15 +2921,17 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
+	struct ibv_exp_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
 		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
+	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
-	int parent = (children_n > 0);
+	int parent = (rxq == &priv->rxq_parent);
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	/*
@@ -3206,6 +3022,32 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
+	if (priv->rss && !inactive)
+		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
+					   tmpl.rd);
+	else
+		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	if (tmpl.qp == NULL) {
+		ret = (errno ? errno : EINVAL);
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
+	mod = (struct ibv_exp_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
+				(IBV_EXP_QP_STATE |
+				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+				 IBV_EXP_QP_PORT));
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Allocate descriptors for RX queues, except for the RSS parent. */
 	if (parent)
 		goto skip_alloc;
@@ -3216,14 +3058,29 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(ret));
-		return ret;
+		goto error;
+	}
+	ret = ibv_post_recv(tmpl.qp,
+			    (tmpl.sp ?
+			     &(*tmpl.elts.sp)[0].wr :
+			     &(*tmpl.elts.no_sp)[0].wr),
+			    &bad_wr);
+	if (ret) {
+		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		      (void *)dev,
+		      (void *)bad_wr,
+		      strerror(ret));
+		goto error;
 	}
 skip_alloc:
-	if (parent || rxq_parent || !priv->rss) {
-		ret = rxq_create_qp(&tmpl, desc, inactive,
-				    children_n, rxq_parent);
-		if (ret)
-			goto error;
+	mod = (struct ibv_exp_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
 	}
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
@@ -3235,11 +3092,21 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_cq == NULL) {
-		ret = EINVAL;
 		ERROR("%p: CQ interface family query failed with status %d",
 		      (void *)dev, status);
 		goto error;
 	}
+	attr.params = (struct ibv_exp_query_intf_params){
+		.intf_scope = IBV_EXP_INTF_GLOBAL,
+		.intf = IBV_EXP_INTF_QP_BURST,
+		.obj = tmpl.qp,
+	};
+	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+	if (tmpl.if_qp == NULL) {
+		ERROR("%p: QP interface family query failed with status %d",
+		      (void *)dev, status);
+		goto error;
+	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
@@ -3277,7 +3144,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    unsigned int socket, const struct rte_eth_rxconf *conf,
 		    struct rte_mempool *mp)
 {
-	struct rxq *parent;
 	struct priv *priv = dev->data->dev_private;
 	struct rxq *rxq = (*priv->rxqs)[idx];
 	int inactive = 0;
@@ -3312,16 +3178,9 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -ENOMEM;
 		}
 	}
-	if (priv->rss && !priv->isolated) {
-		/* The list consists of the single default one. */
-		parent = LIST_FIRST(&priv->parents);
-		if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
-			inactive = 1;
-	} else {
-		parent = NULL;
-	}
-	ret = rxq_setup(dev, rxq, desc, socket,
-			inactive, conf, mp, 0, parent);
+	if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
+		inactive = 1;
+	ret = rxq_setup(dev, rxq, desc, socket, inactive, conf, mp);
 	if (ret)
 		rte_free(rxq);
 	else {
@@ -3356,6 +3215,7 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
+	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -3581,7 +3441,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		priv->txqs = NULL;
 	}
 	if (priv->rss)
-		priv_parent_list_cleanup(priv);
+		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 301b193..726ca2a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -157,7 +157,6 @@ struct rxq_elt {
 
 /* RX queue descriptor. */
 struct rxq {
-	LIST_ENTRY(rxq) next; /* Used by parent queue only */
 	struct priv *priv; /* Back pointer to private data. */
 	struct rte_mempool *mp; /* Memory Pool for allocations. */
 	struct ibv_mr *mr; /* Memory Region (for mp). */
@@ -179,10 +178,6 @@ struct rxq {
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
-	struct {
-		uint16_t queues_n;
-		uint16_t queues[RTE_MAX_QUEUES_PER_PORT];
-	} rss;
 };
 
 /* TX element. */
@@ -259,6 +254,7 @@ struct priv {
 	unsigned int inl_recv_size; /* Inline recv size */
 	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
+	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
@@ -268,21 +264,10 @@ struct priv {
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-	LIST_HEAD(mlx4_parents, rxq) parents;
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
 void priv_lock(struct priv *priv);
 void priv_unlock(struct priv *priv);
 
-int
-rxq_create_qp(struct rxq *rxq,
-	      uint16_t desc,
-	      int inactive,
-	      int children_n,
-	      struct rxq *rxq_parent);
-
-void
-rxq_parent_cleanup(struct rxq *parent);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 827115e..2c5dc3c 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -828,23 +828,8 @@ priv_flow_create_action_queue(struct priv *priv,
 	if (action->drop) {
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
-		int ret;
 		struct rxq *rxq = (*priv->rxqs)[action->queue_id];
 
-		if (!rxq->qp) {
-			assert(priv->isolated);
-			ret = rxq_create_qp(rxq, rxq->elts_n,
-					    0, 0, NULL);
-			if (ret) {
-				rte_flow_error_set(
-					error,
-					ENOMEM,
-					RTE_FLOW_ERROR_TYPE_HANDLE,
-					NULL,
-					"flow rule creation failure");
-				goto error;
-			}
-		}
 		qp = rxq->qp;
 		rte_flow->qp = qp;
 	}
-- 
2.1.4

* [PATCH v1 16/48] net/mlx4: drop RSS support
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The Verbs RSS API used in this PMD is now obsolete. It is superseded by an
enhanced API with fewer constraints already used in the mlx5 PMD.

Drop RSS support in preparation for a major refactoring. The ability to
configure several Rx queues is retained; these can be targeted directly by
creating specific flow rules.

There is no need for "ignored" Rx queues anymore since their number is no
longer limited to powers of two.
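
For reference, the power-of-two limit being lifted came from the expression
rte_align32pow2(priv->rxqs_n + 1) >> 1 used by the code this patch removes.
The sketch below reproduces that arithmetic with a local helper
(sketch_align32pow2 is a stand-in written for this example, not the DPDK
function itself) to show which queue indices used to be treated as
inactive.

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the next power of two (stand-in for rte_align32pow2()). */
static uint32_t
sketch_align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Largest power of two that does not exceed rxqs_n. */
static uint32_t
sketch_active_rxqs(uint32_t rxqs_n)
{
	return sketch_align32pow2(rxqs_n + 1) >> 1;
}

/* Nonzero if queue idx was left out of the RSS group. */
static int
sketch_rxq_is_inactive(uint32_t idx, uint32_t rxqs_n)
{
	return idx >= sketch_active_rxqs(rxqs_n);
}
```

With six configured queues, for instance, only indices 0 to 3 joined the RSS
group while 4 and 5 were set up as inactive. Once RSS is gone, every
configured queue can be targeted by a flow rule and the distinction
disappears.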

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |  13 --
 drivers/net/mlx4/mlx4.c           | 212 +++------------------------------
 drivers/net/mlx4/mlx4.h           |   6 -
 4 files changed, 14 insertions(+), 218 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 3acf8d3..aa1ad21 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,7 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-RSS hash             = Y
 SR-IOV               = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 235912a..e906b8d 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -78,22 +78,12 @@ Features
 --------
 
 - Multi arch support: x86_64 and POWER8.
-- RSS, also known as RCA, is supported. In this mode the number of
-  configured RX queues must be a power of two.
 - Link state information is provided.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
 - RX interrupts.
 
-Limitations
------------
-
-- RSS hash key cannot be modified.
-- RSS RETA cannot be configured
-- RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
-  dissociated.
-
 Configuration
 -------------
 
@@ -137,9 +127,6 @@ Environment variables
 Run-time configuration
 ~~~~~~~~~~~~~~~~~~~~~~
 
-- The only constraint when RSS mode is requested is to make sure the number
-  of RX queues is a power of two. This is a hardware requirement.
-
 - librte_pmd_mlx4 brings kernel network interfaces up during initialization
   because it is affected by their state. Forcing them down prevents packets
   reception.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 42438a2..a1ff62a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -31,11 +31,6 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/*
- * Known limitations:
- * - RSS hash key and options cannot be modified.
- */
-
 /* System headers. */
 #include <stddef.h>
 #include <stdio.h>
@@ -507,7 +502,7 @@ txq_cleanup(struct txq *txq);
 
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp);
 
 static void
@@ -520,7 +515,6 @@ priv_mac_addr_del(struct priv *priv);
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
- * Allocate parent RSS queue when several RX queues are requested.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -534,8 +528,6 @@ dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int tmp;
-	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -544,61 +536,12 @@ dev_configure(struct rte_eth_dev *dev)
 		     (void *)dev, priv->txqs_n, txqs_n);
 		priv->txqs_n = txqs_n;
 	}
-	if (rxqs_n == priv->rxqs_n)
-		return 0;
-	if (!rte_is_power_of_2(rxqs_n) && !priv->isolated) {
-		unsigned n_active;
-
-		n_active = rte_align32pow2(rxqs_n + 1) >> 1;
-		WARN("%p: number of RX queues must be a power"
-			" of 2: %u queues among %u will be active",
-			(void *)dev, n_active, rxqs_n);
-	}
-
-	INFO("%p: RX queues number update: %u -> %u",
-	     (void *)dev, priv->rxqs_n, rxqs_n);
-	/* If RSS is enabled, disable it first. */
-	if (priv->rss) {
-		unsigned int i;
-
-		/* Only if there are no remaining child RX queues. */
-		for (i = 0; (i != priv->rxqs_n); ++i)
-			if ((*priv->rxqs)[i] != NULL)
-				return EINVAL;
-		priv_mac_addr_del(priv);
-		rxq_cleanup(&priv->rxq_parent);
-		priv->rss = 0;
-		priv->rxqs_n = 0;
-	}
-	if (rxqs_n <= 1) {
-		/* Nothing else to do. */
+	if (rxqs_n != priv->rxqs_n) {
+		INFO("%p: RX queues number update: %u -> %u",
+		     (void *)dev, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		return 0;
-	}
-	/* Allocate a new RSS parent queue if supported by hardware. */
-	if (!priv->hw_rss) {
-		ERROR("%p: only a single RX queue can be configured when"
-		      " hardware doesn't support RSS",
-		      (void *)dev);
-		return EINVAL;
 	}
-	/* Fail if hardware doesn't support that many RSS queues. */
-	if (rxqs_n >= priv->max_rss_tbl_sz) {
-		ERROR("%p: only %u RX queues can be configured for RSS",
-		      (void *)dev, priv->max_rss_tbl_sz);
-		return EINVAL;
-	}
-	priv->rss = 1;
-	tmp = priv->rxqs_n;
-	priv->rxqs_n = rxqs_n;
-	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, 0, NULL, NULL);
-	if (!ret)
-		return 0;
-	/* Failure, rollback. */
-	priv->rss = 0;
-	priv->rxqs_n = tmp;
-	assert(ret > 0);
-	return ret;
+	return 0;
 }
 
 /**
@@ -2014,8 +1957,7 @@ priv_mac_addr_del(struct priv *priv)
 /**
  * Register a MAC address.
  *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in queue 0.
+ * The MAC address is registered in queue 0.
  *
  * @param priv
  *   Pointer to private structure.
@@ -2035,9 +1977,7 @@ priv_mac_addr_add(struct priv *priv)
 		return 0;
 	if (priv->isolated)
 		return 0;
-	if (priv->rss)
-		rxq = &priv->rxq_parent;
-	else if (*priv->rxqs && (*priv->rxqs)[0])
+	if (*priv->rxqs && (*priv->rxqs)[0])
 		rxq = (*priv->rxqs)[0];
 	else
 		return 0;
@@ -2647,69 +2587,8 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-	attr.max_inl_recv = priv->inl_recv_size;
-	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-/**
- * Allocate a RSS Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- * @param parent
- *   If nonzero, create a parent QP, otherwise a child.
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int parent, struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
-			      IBV_EXP_QP_INIT_ATTR_QPG),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	if (parent) {
-		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
-		/* TSS isn't necessary. */
-		attr.qpg.parent_attrib.tss_child_count = 0;
-		attr.qpg.parent_attrib.rss_child_count =
-			rte_align32pow2(priv->rxqs_n + 1) >> 1;
-		DEBUG("initializing parent RSS queue");
-	} else {
-		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = priv->rxq_parent.qp;
-		DEBUG("initializing child RSS queue");
-	}
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
@@ -2741,13 +2620,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int err;
-	int parent = (rxq == &priv->rxq_parent);
 
-	if (parent) {
-		ERROR("%p: cannot rehash parent queue %p",
-		      (void *)dev, (void *)rxq);
-		return EINVAL;
-	}
 	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
@@ -2800,9 +2673,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		.port_num = priv->port
 	};
 	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
+				IBV_EXP_QP_STATE |
+				IBV_EXP_QP_PORT);
 	if (err) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(err));
@@ -2899,9 +2771,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
  *   Number of descriptors to configure in queue.
  * @param socket
  *   NUMA socket on which memory must be allocated.
- * @param inactive
- *   If true, the queue is disabled because its index is higher or
- *   equal to the real number of queues, which must be a power of 2.
  * @param[in] conf
  *   Thresholds parameters.
  * @param mp
@@ -2912,7 +2781,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
@@ -2931,20 +2800,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
-	int parent = (rxq == &priv->rxq_parent);
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	/*
-	 * If this is a parent queue, hardware must support RSS and
-	 * RSS must be enabled.
-	 */
-	assert((!parent) || ((priv->hw_rss) && (priv->rss)));
-	if (parent) {
-		/* Even if unused, ibv_create_cq() requires at least one
-		 * descriptor. */
-		desc = 1;
-		goto skip_mr;
-	}
 	mb_len = rte_pktmbuf_data_room_size(mp);
 	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
 		ERROR("%p: invalid number of RX descriptors (must be a"
@@ -2982,7 +2839,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_mr:
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
 			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
@@ -3022,11 +2878,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	if (priv->rss && !inactive)
-		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
-					   tmpl.rd);
-	else
-		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
@@ -3040,17 +2892,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.port_num = priv->port
 	};
 	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
+				IBV_EXP_QP_STATE |
+				IBV_EXP_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	/* Allocate descriptors for RX queues, except for the RSS parent. */
-	if (parent)
-		goto skip_alloc;
 	if (tmpl.sp)
 		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
 	else
@@ -3072,7 +2920,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      strerror(ret));
 		goto error;
 	}
-skip_alloc:
 	mod = (struct ibv_exp_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
@@ -3146,7 +2993,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rxq *rxq = (*priv->rxqs)[idx];
-	int inactive = 0;
 	int ret;
 
 	priv_lock(priv);
@@ -3178,9 +3024,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -ENOMEM;
 		}
 	}
-	if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
-		inactive = 1;
-	ret = rxq_setup(dev, rxq, desc, socket, inactive, conf, mp);
+	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
 	if (ret)
 		rte_free(rxq);
 	else {
@@ -3215,7 +3059,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
-	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -3440,8 +3283,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		priv->txqs_n = 0;
 		priv->txqs = NULL;
 	}
-	if (priv->rss)
-		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
@@ -4750,7 +4591,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
 
 		DEBUG("using port %u", port);
 
@@ -4808,30 +4648,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			err = ENODEV;
 			goto port_error;
 		}
-		if ((exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_QPG) &&
-		    (exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_UD_RSS) &&
-		    (exp_device_attr.comp_mask &
-		     IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
-		    (exp_device_attr.max_rss_tbl_sz > 0)) {
-			priv->hw_qpg = 1;
-			priv->hw_rss = 1;
-			priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
-		} else {
-			priv->hw_qpg = 0;
-			priv->hw_rss = 0;
-			priv->max_rss_tbl_sz = 0;
-		}
-		priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
-				  IBV_EXP_DEVICE_UD_TSS);
-		DEBUG("device flags: %s%s%s",
-		      (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
-		      (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
-		      (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
-		if (priv->hw_rss)
-			DEBUG("maximum RSS indirection table size: %u",
-			      exp_device_attr.max_rss_tbl_sz);
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 726ca2a..fa703a2 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -242,19 +242,13 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int hw_qpg:1; /* QP groups are supported. */
-	unsigned int hw_tss:1; /* TSS is supported. */
-	unsigned int hw_rss:1; /* RSS is supported. */
 	unsigned int hw_csum:1; /* Checksum offload is supported. */
 	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
-	unsigned int rss:1; /* RSS is enabled. */
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
 	unsigned int inl_recv_size; /* Inline recv size */
-	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
-	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
-- 
2.1.4


* [PATCH v1 17/48] net/mlx4: drop checksum offloads support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (15 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 16/48] net/mlx4: drop RSS support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 18/48] net/mlx4: drop packet type recognition support Adrien Mazarguil
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The Verbs API used to implement Tx and Rx checksum offloads is deprecated.
Support for these will be added back after refactoring the PMD.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
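Note: with this patch rx_offload_capa and tx_offload_capa report 0 and
received mbufs carry ol_flags = 0, so an application that relied on these
offloads presumably has to compute or validate checksums in software until
support returns. As a rough sketch of such a fallback (not part of this
patch), the function below computes the standard Internet checksum over an
IPv4 header. ipv4_hdr_cksum() is a hypothetical name chosen for this
illustration; a real application would more likely use the checksum
helpers DPDK already ships.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over an IPv4 header of hdr_len bytes.
 * The checksum field must be zeroed by the caller beforehand.
 * IPv4 header lengths are multiples of 4, so no odd trailing byte is
 * handled here. The result is returned in host byte order.
 */
static uint16_t
ipv4_hdr_cksum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;
	size_t i;

	/* Sum the header as big-endian 16-bit words. */
	for (i = 0; i + 1 < hdr_len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```
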
 doc/guides/nics/features/mlx4.ini |  4 --
 doc/guides/nics/mlx4.rst          |  2 -
 drivers/net/mlx4/mlx4.c           | 91 ++--------------------------------
 drivers/net/mlx4/mlx4.h           |  4 --
 4 files changed, 4 insertions(+), 97 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index aa1ad21..08a2e17 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,10 +14,6 @@ MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
 SR-IOV               = Y
-L3 checksum offload  = Y
-L4 checksum offload  = Y
-Inner L3 checksum    = Y
-Inner L4 checksum    = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index e906b8d..3f54343 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -80,8 +80,6 @@ Features
 - Multi arch support: x86_64 and POWER8.
 - Link state information is provided.
 - Scattered packets are supported for TX and RX.
-- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
-- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
 - RX interrupts.
 
 Configuration
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a1ff62a..36a616b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1258,17 +1258,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			++elts_comp;
 			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
 		}
-		/* Should we enable HW CKSUM offload */
-		if (buf->ol_flags &
-		    (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
-			send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
-			/* HW does not support checksum offloads at arbitrary
-			 * offsets but automatically recognizes the packet
-			 * type. For inner L3/L4 checksums, only VXLAN (UDP)
-			 * tunnels are currently supported. */
-			if (RTE_ETH_IS_TUNNEL_PKT(buf->packet_type))
-				send_flags |= IBV_EXP_QP_BURST_TUNNEL;
-		}
 		if (likely(segs == 1)) {
 			uintptr_t addr;
 			uint32_t length;
@@ -2140,41 +2129,6 @@ rxq_cq_to_pkt_type(uint32_t flags)
 	return pkt_type;
 }
 
-/**
- * Translate RX completion flags to offload flags.
- *
- * @param[in] rxq
- *   Pointer to RX queue structure.
- * @param flags
- *   RX completion flags returned by poll_length_flags().
- *
- * @return
- *   Offload flags (ol_flags) for struct rte_mbuf.
- */
-static inline uint32_t
-rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
-{
-	uint32_t ol_flags = 0;
-
-	if (rxq->csum)
-		ol_flags |=
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IP_CSUM_OK,
-				  PKT_RX_IP_CKSUM_GOOD) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
-				  PKT_RX_L4_CKSUM_GOOD);
-	if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
-		ol_flags |=
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
-				  PKT_RX_IP_CKSUM_GOOD) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
-				  PKT_RX_L4_CKSUM_GOOD);
-	return ol_flags;
-}
-
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
 
@@ -2362,7 +2316,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		PORT(pkt_buf) = rxq->port_id;
 		PKT_LEN(pkt_buf) = pkt_buf_len;
 		pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
-		pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+		pkt_buf->ol_flags = 0;
 
 		/* Return packet. */
 		*(pkts++) = pkt_buf;
@@ -2517,7 +2471,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		PKT_LEN(seg) = len;
 		DATA_LEN(seg) = len;
 		seg->packet_type = rxq_cq_to_pkt_type(flags);
-		seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+		seg->ol_flags = 0;
 
 		/* Return packet. */
 		*(pkts++) = seg;
@@ -2626,15 +2580,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	/* Number of descriptors and mbufs currently allocated. */
 	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
 	mbuf_n = desc_n;
-	/* Toggle RX checksum offload if hardware supports it. */
-	if (priv->hw_csum) {
-		tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-		rxq->csum = tmpl.csum;
-	}
-	if (priv->hw_csum_l2tun) {
-		tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-		rxq->csum_l2tun = tmpl.csum_l2tun;
-	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.enable_scatter &&
@@ -2808,11 +2753,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
 		return EINVAL;
 	}
-	/* Toggle RX checksum offload if hardware supports it. */
-	if (priv->hw_csum)
-		tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-	if (priv->hw_csum_l2tun)
-		tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
@@ -3416,18 +3356,8 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->max_tx_queues = max;
 	/* Last array entry is reserved for broadcast. */
 	info->max_mac_addrs = 1;
-	info->rx_offload_capa =
-		(priv->hw_csum ?
-		 (DEV_RX_OFFLOAD_IPV4_CKSUM |
-		  DEV_RX_OFFLOAD_UDP_CKSUM |
-		  DEV_RX_OFFLOAD_TCP_CKSUM) :
-		 0);
-	info->tx_offload_capa =
-		(priv->hw_csum ?
-		 (DEV_TX_OFFLOAD_IPV4_CKSUM |
-		  DEV_TX_OFFLOAD_UDP_CKSUM |
-		  DEV_TX_OFFLOAD_TCP_CKSUM) :
-		 0);
+	info->rx_offload_capa = 0;
+	info->tx_offload_capa = 0;
 	if (priv_get_ifname(priv, &ifname) == 0)
 		info->if_index = if_nametoindex(ifname);
 	info->speed_capa =
@@ -4649,19 +4579,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		priv->hw_csum =
-			((exp_device_attr.exp_device_cap_flags &
-			  IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) &&
-			 (exp_device_attr.exp_device_cap_flags &
-			  IBV_EXP_DEVICE_RX_CSUM_IP_PKT));
-		DEBUG("checksum offloading is %ssupported",
-		      (priv->hw_csum ? "" : "not "));
-
-		priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
-					 IBV_EXP_DEVICE_VXLAN_SUPPORT);
-		DEBUG("L2 tunnel checksum offloads are %ssupported",
-		      (priv->hw_csum_l2tun ? "" : "not "));
-
 		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
 
 		if (priv->inl_recv_size) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index fa703a2..5a0a7a1 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -173,8 +173,6 @@ struct rxq {
 		struct rxq_elt (*no_sp)[]; /* RX elements. */
 	} elts;
 	unsigned int sp:1; /* Use scattered RX elements. */
-	unsigned int csum:1; /* Enable checksum offloading. */
-	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -242,8 +240,6 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int hw_csum:1; /* Checksum offload is supported. */
-	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-- 
2.1.4


* [PATCH v1 18/48] net/mlx4: drop packet type recognition support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (16 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 17/48] net/mlx4: drop checksum offloads support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 19/48] net/mlx4: drop scatter/gather support Adrien Mazarguil
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The Verbs API used to implement packet type recognition is deprecated.
Support will be added back after refactoring the PMD.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
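Note: the TRANSPOSE() macro removed by this patch converts a flag from one
bit position to another without branching at run time: it masks the source
bit, then divides or multiplies by the ratio between the two flag values.
The sketch below copies the macro as it appears in mlx4.c and applies it to
made-up single-bit flags. SRC_FLAG_LOW, SRC_FLAG_HIGH, DST_FLAG and the two
wrapper functions are hypothetical and unrelated to the real
IBV_EXP_CQ_RX_* or RTE_PTYPE_* constants.

```c
#include <stdint.h>

/* Macro as removed from mlx4.c. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical flag values, for illustration only. */
#define SRC_FLAG_LOW  UINT32_C(0x02)
#define SRC_FLAG_HIGH UINT32_C(0x40)
#define DST_FLAG      UINT32_C(0x10)

/* Move bit 0x02 up to 0x10 (multiplies the masked bit by 8). */
static uint32_t
transpose_up(uint32_t flags)
{
	return TRANSPOSE(flags, SRC_FLAG_LOW, DST_FLAG);
}

/* Move bit 0x40 down to 0x10 (divides the masked bit by 4). */
static uint32_t
transpose_down(uint32_t flags)
{
	return TRANSPOSE(flags, SRC_FLAG_HIGH, DST_FLAG);
}
```

Other bits in the input are ignored: only the masked source bit
contributes to the result, which is either DST_FLAG or 0.
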
 doc/guides/nics/features/mlx4.ini |  1 -
 drivers/net/mlx4/mlx4.c           | 70 +---------------------------------
 2 files changed, 2 insertions(+), 69 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 08a2e17..27c7ae3 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,7 +14,6 @@ MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
 SR-IOV               = Y
-Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
 Other kdrv           = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 36a616b..e0e5d1f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -96,12 +96,6 @@ typedef union {
 
 #define WR_ID(o) (((wr_id_t *)&(o))->data)
 
-/* Transpose flags. Useful to convert IBV to DPDK flags. */
-#define TRANSPOSE(val, from, to) \
-	(((from) >= (to)) ? \
-	 (((val) & (from)) / ((from) / (to))) : \
-	 (((val) & (from)) * ((to) / (from))))
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -2088,47 +2082,6 @@ rxq_cleanup(struct rxq *rxq)
 	memset(rxq, 0, sizeof(*rxq));
 }
 
-/**
- * Translate RX completion flags to packet type.
- *
- * @param flags
- *   RX completion flags returned by poll_length_flags().
- *
- * @note: fix mlx4_dev_supported_ptypes_get() if any change here.
- *
- * @return
- *   Packet type for struct rte_mbuf.
- */
-static inline uint32_t
-rxq_cq_to_pkt_type(uint32_t flags)
-{
-	uint32_t pkt_type;
-
-	if (flags & IBV_EXP_CQ_RX_TUNNEL_PACKET)
-		pkt_type =
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
-				  RTE_PTYPE_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
-				  RTE_PTYPE_L3_IPV6_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV4_PACKET,
-				  RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV6_PACKET,
-				  RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN);
-	else
-		pkt_type =
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV4_PACKET,
-				  RTE_PTYPE_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV6_PACKET,
-				  RTE_PTYPE_L3_IPV6_EXT_UNKNOWN);
-	return pkt_type;
-}
-
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
 
@@ -2315,7 +2268,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		NB_SEGS(pkt_buf) = j;
 		PORT(pkt_buf) = rxq->port_id;
 		PKT_LEN(pkt_buf) = pkt_buf_len;
-		pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
+		pkt_buf->packet_type = 0;
 		pkt_buf->ol_flags = 0;
 
 		/* Return packet. */
@@ -2470,7 +2423,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		NEXT(seg) = NULL;
 		PKT_LEN(seg) = len;
 		DATA_LEN(seg) = len;
-		seg->packet_type = rxq_cq_to_pkt_type(flags);
+		seg->packet_type = 0;
 		seg->ol_flags = 0;
 
 		/* Return packet. */
@@ -3369,24 +3322,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	priv_unlock(priv);
 }
 
-static const uint32_t *
-mlx4_dev_supported_ptypes_get(struct rte_eth_dev *dev)
-{
-	static const uint32_t ptypes[] = {
-		/* refers to rxq_cq_to_pkt_type() */
-		RTE_PTYPE_L3_IPV4,
-		RTE_PTYPE_L3_IPV6,
-		RTE_PTYPE_INNER_L3_IPV4,
-		RTE_PTYPE_INNER_L3_IPV6,
-		RTE_PTYPE_UNKNOWN
-	};
-
-	if (dev->rx_pkt_burst == mlx4_rx_burst ||
-	    dev->rx_pkt_burst == mlx4_rx_burst_sp)
-		return ptypes;
-	return NULL;
-}
-
 /**
  * DPDK callback to get device statistics.
  *
@@ -3768,7 +3703,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
-	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
-- 
2.1.4


* [PATCH v1 19/48] net/mlx4: drop scatter/gather support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (17 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 18/48] net/mlx4: drop packet type recognition support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 20/48] net/mlx4: drop inline receive support Adrien Mazarguil
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The Verbs API used to implement Tx and Rx burst functions is deprecated.
Drop scatter/gather support to ease refactoring while maintaining basic
single-segment Rx/Tx functionality in the meantime.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
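Note: among the code removed here is linearize_mbuf(), which flattened a
chain of mbuf segments into one contiguous buffer when a packet had more
segments than available SGEs. The sketch below shows the same idea outside
DPDK, as an illustration only: struct seg is a hypothetical stand-in for a
chained struct rte_mbuf, and linearize() is not the PMD function. Like the
removed code, it returns 0 when the output buffer is too small.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a chained struct rte_mbuf. */
struct seg {
	const uint8_t *data; /* Segment payload. */
	size_t len;          /* Payload length in bytes. */
	struct seg *next;    /* Next segment or NULL. */
};

/*
 * Copy all segments into out.
 * Returns the number of bytes copied, or 0 if out is not large enough
 * (segments preceding the overflow may already have been copied).
 */
static size_t
linearize(uint8_t *out, size_t out_size, const struct seg *s)
{
	size_t size = 0;

	for (; s != NULL; s = s->next) {
		if (s->len > out_size - size)
			return 0;
		memcpy(out + size, s->data, s->len);
		size += s->len;
	}
	return size;
}
```
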
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 845 +--------------------------------
 drivers/net/mlx4/mlx4.h           |  28 +-
 4 files changed, 24 insertions(+), 851 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 27c7ae3..0812a30 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,7 +12,6 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
-Scattered Rx         = Y
 SR-IOV               = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 3f54343..8503804 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -79,7 +79,6 @@ Features
 
 - Multi arch support: x86_64 and POWER8.
 - Link state information is provided.
-- Scattered packets are supported for TX and RX.
 - RX interrupts.
 
 Configuration
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e0e5d1f..5546c0a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -60,7 +60,6 @@
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
 #include <rte_spinlock.h>
-#include <rte_atomic.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
@@ -582,26 +581,13 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	unsigned int i;
 	struct txq_elt (*elts)[elts_n] =
 		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	linear_t (*elts_linear)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts_linear), 0,
-				  txq->socket);
-	struct ibv_mr *mr_linear = NULL;
 	int ret = 0;
 
-	if ((elts == NULL) || (elts_linear == NULL)) {
+	if (elts == NULL) {
 		ERROR("%p: can't allocate packets array", (void *)txq);
 		ret = ENOMEM;
 		goto error;
 	}
-	mr_linear =
-		ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
-			   IBV_ACCESS_LOCAL_WRITE);
-	if (mr_linear == NULL) {
-		ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
-		      (void *)txq);
-		ret = EINVAL;
-		goto error;
-	}
 	for (i = 0; (i != elts_n); ++i) {
 		struct txq_elt *elt = &(*elts)[i];
 
@@ -619,15 +605,9 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
 		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
 	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	txq->elts_linear = elts_linear;
-	txq->mr_linear = mr_linear;
 	assert(ret == 0);
 	return 0;
 error:
-	if (mr_linear != NULL)
-		claim_zero(ibv_dereg_mr(mr_linear));
-
-	rte_free(elts_linear);
 	rte_free(elts);
 
 	DEBUG("%p: failed, freed everything", (void *)txq);
@@ -648,8 +628,6 @@ txq_free_elts(struct txq *txq)
 	unsigned int elts_head = txq->elts_head;
 	unsigned int elts_tail = txq->elts_tail;
 	struct txq_elt (*elts)[elts_n] = txq->elts;
-	linear_t (*elts_linear)[elts_n] = txq->elts_linear;
-	struct ibv_mr *mr_linear = txq->mr_linear;
 
 	DEBUG("%p: freeing WRs", (void *)txq);
 	txq->elts_n = 0;
@@ -659,12 +637,6 @@ txq_free_elts(struct txq *txq)
 	txq->elts_comp_cd = 0;
 	txq->elts_comp_cd_init = 0;
 	txq->elts = NULL;
-	txq->elts_linear = NULL;
-	txq->mr_linear = NULL;
-	if (mr_linear != NULL)
-		claim_zero(ibv_dereg_mr(mr_linear));
-
-	rte_free(elts_linear);
 	if (elts == NULL)
 		return;
 	while (elts_tail != elts_head) {
@@ -1037,152 +1009,6 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 }
 
 /**
- * Copy scattered mbuf contents to a single linear buffer.
- *
- * @param[out] linear
- *   Linear output buffer.
- * @param[in] buf
- *   Scattered input buffer.
- *
- * @return
- *   Number of bytes copied to the output buffer or 0 if not large enough.
- */
-static unsigned int
-linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
-{
-	unsigned int size = 0;
-	unsigned int offset;
-
-	do {
-		unsigned int len = DATA_LEN(buf);
-
-		offset = size;
-		size += len;
-		if (unlikely(size > sizeof(*linear)))
-			return 0;
-		memcpy(&(*linear)[offset],
-		       rte_pktmbuf_mtod(buf, uint8_t *),
-		       len);
-		buf = NEXT(buf);
-	} while (buf != NULL);
-	return size;
-}
-
-/**
- * Handle scattered buffers for mlx4_tx_burst().
- *
- * @param txq
- *   TX queue structure.
- * @param segs
- *   Number of segments in buf.
- * @param elt
- *   TX queue element to fill.
- * @param[in] buf
- *   Buffer to process.
- * @param elts_head
- *   Index of the linear buffer to use if necessary (normally txq->elts_head).
- * @param[out] sges
- *   Array filled with SGEs on success.
- *
- * @return
- *   A structure containing the processed packet size in bytes and the
- *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
- *   failure.
- */
-static struct tx_burst_sg_ret {
-	unsigned int length;
-	unsigned int num;
-}
-tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-	    struct rte_mbuf *buf, unsigned int elts_head,
-	    struct ibv_sge (*sges)[MLX4_PMD_SGE_WR_N])
-{
-	unsigned int sent_size = 0;
-	unsigned int j;
-	int linearize = 0;
-
-	/* When there are too many segments, extra segments are
-	 * linearized in the last SGE. */
-	if (unlikely(segs > elemof(*sges))) {
-		segs = (elemof(*sges) - 1);
-		linearize = 1;
-	}
-	/* Update element. */
-	elt->buf = buf;
-	/* Register segments as SGEs. */
-	for (j = 0; (j != segs); ++j) {
-		struct ibv_sge *sge = &(*sges)[j];
-		uint32_t lkey;
-
-		/* Retrieve Memory Region key for this memory pool. */
-		lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-		if (unlikely(lkey == (uint32_t)-1)) {
-			/* MR does not exist. */
-			DEBUG("%p: unable to get MP <-> MR association",
-			      (void *)txq);
-			/* Clean up TX element. */
-			elt->buf = NULL;
-			goto stop;
-		}
-		/* Update SGE. */
-		sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-		if (txq->priv->vf)
-			rte_prefetch0((volatile void *)
-				      (uintptr_t)sge->addr);
-		sge->length = DATA_LEN(buf);
-		sge->lkey = lkey;
-		sent_size += sge->length;
-		buf = NEXT(buf);
-	}
-	/* If buf is not NULL here and is not going to be linearized,
-	 * nb_segs is not valid. */
-	assert(j == segs);
-	assert((buf == NULL) || (linearize));
-	/* Linearize extra segments. */
-	if (linearize) {
-		struct ibv_sge *sge = &(*sges)[segs];
-		linear_t *linear = &(*txq->elts_linear)[elts_head];
-		unsigned int size = linearize_mbuf(linear, buf);
-
-		assert(segs == (elemof(*sges) - 1));
-		if (size == 0) {
-			/* Invalid packet. */
-			DEBUG("%p: packet too large to be linearized.",
-			      (void *)txq);
-			/* Clean up TX element. */
-			elt->buf = NULL;
-			goto stop;
-		}
-		/* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately. */
-		if (elemof(*sges) == 1) {
-			do {
-				struct rte_mbuf *next = NEXT(buf);
-
-				rte_pktmbuf_free_seg(buf);
-				buf = next;
-			} while (buf != NULL);
-			elt->buf = NULL;
-		}
-		/* Update SGE. */
-		sge->addr = (uintptr_t)&(*linear)[0];
-		sge->length = size;
-		sge->lkey = txq->mr_linear->lkey;
-		sent_size += size;
-		/* Include last segment. */
-		segs++;
-	}
-	return (struct tx_burst_sg_ret){
-		.length = sent_size,
-		.num = segs,
-	};
-stop:
-	return (struct tx_burst_sg_ret){
-		.length = -1,
-		.num = -1,
-	};
-}
-
-/**
  * DPDK callback for TX.
  *
  * @param dpdk_txq
@@ -1294,23 +1120,8 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				goto stop;
 			sent_size += length;
 		} else {
-			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
-			struct tx_burst_sg_ret ret;
-
-			ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
-					  &sges);
-			if (ret.length == (unsigned int)-1)
-				goto stop;
-			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			/* Put SG list into send queue. */
-			err = txq->if_qp->send_pending_sg_list
-				(txq->qp,
-				 sges,
-				 ret.num,
-				 send_flags);
-			if (unlikely(err))
-				goto stop;
-			sent_size += ret.length;
+			err = -1;
+			goto stop;
 		}
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
@@ -1375,12 +1186,10 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	(void)conf; /* Thresholds configuration (ignored). */
 	if (priv == NULL)
 		return EINVAL;
-	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
-		ERROR("%p: invalid number of TX descriptors (must be a"
-		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
+	if (desc == 0) {
+		ERROR("%p: invalid number of TX descriptors", (void *)dev);
 		return EINVAL;
 	}
-	desc /= MLX4_PMD_SGE_WR_N;
 	/* MRs will be registered in mp2mr[] later. */
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
@@ -1421,10 +1230,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 					priv->device_attr.max_qp_wr :
 					desc),
 			/* Max number of scatter/gather elements in a WR. */
-			.max_send_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
+			.max_send_sge = 1,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
@@ -1623,153 +1429,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 /* RX queues handling. */
 
 /**
- * Allocate RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
-		  struct rte_mbuf **pool)
-{
-	unsigned int i;
-	struct rxq_elt_sp (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		ret = ENOMEM;
-		goto error;
-	}
-	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
-		unsigned int j;
-		struct rxq_elt_sp *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
-		struct ibv_sge (*sges)[(elemof(elt->sges))] = &elt->sges;
-
-		/* These two arrays must have the same size. */
-		assert(elemof(elt->sges) == elemof(elt->bufs));
-		/* Configure WR. */
-		wr->wr_id = i;
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = &(*sges)[0];
-		wr->num_sge = elemof(*sges);
-		/* For each SGE (segment). */
-		for (j = 0; (j != elemof(elt->bufs)); ++j) {
-			struct ibv_sge *sge = &(*sges)[j];
-			struct rte_mbuf *buf;
-
-			if (pool != NULL) {
-				buf = *(pool++);
-				assert(buf != NULL);
-				rte_pktmbuf_reset(buf);
-			} else
-				buf = rte_pktmbuf_alloc(rxq->mp);
-			if (buf == NULL) {
-				assert(pool == NULL);
-				ERROR("%p: empty mbuf pool", (void *)rxq);
-				ret = ENOMEM;
-				goto error;
-			}
-			elt->bufs[j] = buf;
-			/* Headroom is reserved by rte_pktmbuf_alloc(). */
-			assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
-			/* Buffer is supposed to be empty. */
-			assert(rte_pktmbuf_data_len(buf) == 0);
-			assert(rte_pktmbuf_pkt_len(buf) == 0);
-			/* sge->addr must be able to store a pointer. */
-			assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-			if (j == 0) {
-				/* The first SGE keeps its headroom. */
-				sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-				sge->length = (buf->buf_len -
-					       RTE_PKTMBUF_HEADROOM);
-			} else {
-				/* Subsequent SGEs lose theirs. */
-				assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
-				SET_DATA_OFF(buf, 0);
-				sge->addr = (uintptr_t)buf->buf_addr;
-				sge->length = buf->buf_len;
-			}
-			sge->lkey = rxq->mr->lkey;
-			/* Redundant check for tailroom. */
-			assert(sge->length == rte_pktmbuf_tailroom(buf));
-		}
-	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u WRs (%zu segments)",
-	      (void *)rxq, elts_n, (elts_n * elemof((*elts)[0].sges)));
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts.sp = elts;
-	assert(ret == 0);
-	return 0;
-error:
-	if (elts != NULL) {
-		assert(pool == NULL);
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			unsigned int j;
-			struct rxq_elt_sp *elt = &(*elts)[i];
-
-			for (j = 0; (j != elemof(elt->bufs)); ++j) {
-				struct rte_mbuf *buf = elt->bufs[j];
-
-				if (buf != NULL)
-					rte_pktmbuf_free_seg(buf);
-			}
-		}
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(ret > 0);
-	return ret;
-}
-
-/**
- * Free RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_free_elts_sp(struct rxq *rxq)
-{
-	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt_sp (*elts)[elts_n] = rxq->elts.sp;
-
-	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts.sp = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != elemof(*elts)); ++i) {
-		unsigned int j;
-		struct rxq_elt_sp *elt = &(*elts)[i];
-
-		for (j = 0; (j != elemof(elt->bufs)); ++j) {
-			struct rte_mbuf *buf = elt->bufs[j];
-
-			if (buf != NULL)
-				rte_pktmbuf_free_seg(buf);
-		}
-	}
-	rte_free(elts);
-}
-
-/**
  * Allocate RX queue elements.
  *
  * @param rxq
@@ -1859,7 +1518,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	      (void *)rxq, elts_n);
 	rxq->elts_n = elts_n;
 	rxq->elts_head = 0;
-	rxq->elts.no_sp = elts;
+	rxq->elts = elts;
 	assert(ret == 0);
 	return 0;
 error:
@@ -1894,11 +1553,11 @@ rxq_free_elts(struct rxq *rxq)
 {
 	unsigned int i;
 	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts.no_sp;
+	struct rxq_elt (*elts)[elts_n] = rxq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)rxq);
 	rxq->elts_n = 0;
-	rxq->elts.no_sp = NULL;
+	rxq->elts = NULL;
 	if (elts == NULL)
 		return;
 	for (i = 0; (i != elemof(*elts)); ++i) {
@@ -2034,10 +1693,7 @@ rxq_cleanup(struct rxq *rxq)
 	struct ibv_exp_release_intf_params params;
 
 	DEBUG("cleaning up %p", (void *)rxq);
-	if (rxq->sp)
-		rxq_free_elts_sp(rxq);
-	else
-		rxq_free_elts(rxq);
+	rxq_free_elts(rxq);
 	if (rxq->if_qp != NULL) {
 		assert(rxq->priv != NULL);
 		assert(rxq->priv->ctx != NULL);
@@ -2082,230 +1738,10 @@ rxq_cleanup(struct rxq *rxq)
 	memset(rxq, 0, sizeof(*rxq));
 }
 
-static uint16_t
-mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
-
-/**
- * DPDK callback for RX with scattered packets support.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
-	const unsigned int elts_n = rxq->elts_n;
-	unsigned int elts_head = rxq->elts_head;
-	struct ibv_recv_wr head;
-	struct ibv_recv_wr **next = &head.next;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int i;
-	unsigned int pkts_ret = 0;
-	int ret;
-
-	if (unlikely(!rxq->sp))
-		return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
-	if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
-		return 0;
-	for (i = 0; (i != pkts_n); ++i) {
-		struct rxq_elt_sp *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
-		unsigned int len;
-		unsigned int pkt_buf_len;
-		struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
-		struct rte_mbuf **pkt_buf_next = &pkt_buf;
-		unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
-		unsigned int j = 0;
-		uint32_t flags;
-
-		/* Sanity checks. */
-#ifdef NDEBUG
-		(void)wr_id;
-#endif
-		assert(wr_id < rxq->elts_n);
-		assert(wr->sg_list == elt->sges);
-		assert(wr->num_sge == elemof(elt->sges));
-		assert(elts_head < rxq->elts_n);
-		assert(rxq->elts_head < rxq->elts_n);
-		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
-						    &flags);
-		if (unlikely(ret < 0)) {
-			struct ibv_wc wc;
-			int wcs_n;
-
-			DEBUG("rxq=%p, poll_length() failed (ret=%d)",
-			      (void *)rxq, ret);
-			/* ibv_poll_cq() must be used in case of failure. */
-			wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
-			if (unlikely(wcs_n == 0))
-				break;
-			if (unlikely(wcs_n < 0)) {
-				DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
-				      (void *)rxq, wcs_n);
-				break;
-			}
-			assert(wcs_n == 1);
-			if (unlikely(wc.status != IBV_WC_SUCCESS)) {
-				/* Whatever, just repost the offending WR. */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
-				      " completion status (%d): %s",
-				      (void *)rxq, wc.wr_id, wc.status,
-				      ibv_wc_status_str(wc.status));
-				/* Increment dropped packets counter. */
-				++rxq->stats.idropped;
-				/* Link completed WRs together for repost. */
-				*next = wr;
-				next = &wr->next;
-				goto repost;
-			}
-			ret = wc.byte_len;
-		}
-		if (ret == 0)
-			break;
-		len = ret;
-		pkt_buf_len = len;
-		/* Link completed WRs together for repost. */
-		*next = wr;
-		next = &wr->next;
-		/*
-		 * Replace spent segments with new ones, concatenate and
-		 * return them as pkt_buf.
-		 */
-		while (1) {
-			struct ibv_sge *sge = &elt->sges[j];
-			struct rte_mbuf *seg = elt->bufs[j];
-			struct rte_mbuf *rep;
-			unsigned int seg_tailroom;
-
-			/*
-			 * Fetch initial bytes of packet descriptor into a
-			 * cacheline while allocating rep.
-			 */
-			rte_prefetch0(seg);
-			rep = rte_mbuf_raw_alloc(rxq->mp);
-			if (unlikely(rep == NULL)) {
-				/*
-				 * Unable to allocate a replacement mbuf,
-				 * repost WR.
-				 */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
-				      " can't allocate a new mbuf",
-				      (void *)rxq, wr_id);
-				if (pkt_buf != NULL) {
-					*pkt_buf_next = NULL;
-					rte_pktmbuf_free(pkt_buf);
-				}
-				/* Increase out of memory counters. */
-				++rxq->stats.rx_nombuf;
-				++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-				goto repost;
-			}
-#ifndef NDEBUG
-			/* Poison user-modifiable fields in rep. */
-			NEXT(rep) = (void *)((uintptr_t)-1);
-			SET_DATA_OFF(rep, 0xdead);
-			DATA_LEN(rep) = 0xd00d;
-			PKT_LEN(rep) = 0xdeadd00d;
-			NB_SEGS(rep) = 0x2a;
-			PORT(rep) = 0x2a;
-			rep->ol_flags = -1;
-			/*
-			 * Clear special flags in mbuf to avoid
-			 * crashing while freeing.
-			 */
-			rep->ol_flags &=
-				~(uint64_t)(IND_ATTACHED_MBUF |
-					    CTRL_MBUF_FLAG);
-#endif
-			assert(rep->buf_len == seg->buf_len);
-			/* Reconfigure sge to use rep instead of seg. */
-			assert(sge->lkey == rxq->mr->lkey);
-			sge->addr = ((uintptr_t)rep->buf_addr + seg_headroom);
-			elt->bufs[j] = rep;
-			++j;
-			/* Update pkt_buf if it's the first segment, or link
-			 * seg to the previous one and update pkt_buf_next. */
-			*pkt_buf_next = seg;
-			pkt_buf_next = &NEXT(seg);
-			/* Update seg information. */
-			seg_tailroom = (seg->buf_len - seg_headroom);
-			assert(sge->length == seg_tailroom);
-			SET_DATA_OFF(seg, seg_headroom);
-			if (likely(len <= seg_tailroom)) {
-				/* Last segment. */
-				DATA_LEN(seg) = len;
-				PKT_LEN(seg) = len;
-				/* Sanity check. */
-				assert(rte_pktmbuf_headroom(seg) ==
-				       seg_headroom);
-				assert(rte_pktmbuf_tailroom(seg) ==
-				       (seg_tailroom - len));
-				break;
-			}
-			DATA_LEN(seg) = seg_tailroom;
-			PKT_LEN(seg) = seg_tailroom;
-			/* Sanity check. */
-			assert(rte_pktmbuf_headroom(seg) == seg_headroom);
-			assert(rte_pktmbuf_tailroom(seg) == 0);
-			/* Fix len and clear headroom for next segments. */
-			len -= seg_tailroom;
-			seg_headroom = 0;
-		}
-		/* Update head and tail segments. */
-		*pkt_buf_next = NULL;
-		assert(pkt_buf != NULL);
-		assert(j != 0);
-		NB_SEGS(pkt_buf) = j;
-		PORT(pkt_buf) = rxq->port_id;
-		PKT_LEN(pkt_buf) = pkt_buf_len;
-		pkt_buf->packet_type = 0;
-		pkt_buf->ol_flags = 0;
-
-		/* Return packet. */
-		*(pkts++) = pkt_buf;
-		++pkts_ret;
-		/* Increase bytes counter. */
-		rxq->stats.ibytes += pkt_buf_len;
-repost:
-		if (++elts_head >= elts_n)
-			elts_head = 0;
-		continue;
-	}
-	if (unlikely(i == 0))
-		return 0;
-	*next = NULL;
-	/* Repost WRs. */
-	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
-		      (void *)rxq->priv,
-		      (void *)bad_wr,
-		      strerror(ret));
-		abort();
-	}
-	rxq->elts_head = elts_head;
-	/* Increase packets counter. */
-	rxq->stats.ipackets += pkts_ret;
-	return pkts_ret;
-}
-
 /**
  * DPDK callback for RX.
  *
- * The following function is the same as mlx4_rx_burst_sp(), except it doesn't
- * manage scattered packets. Improves performance when MRU is lower than the
- * size of the first segment.
+ * The following function doesn't manage scattered packets.
  *
  * @param dpdk_rxq
  *   Generic pointer to RX queue structure.
@@ -2321,7 +1757,7 @@ static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
 	struct ibv_sge sges[pkts_n];
@@ -2329,8 +1765,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	unsigned int pkts_ret = 0;
 	int ret;
 
-	if (unlikely(rxq->sp))
-		return mlx4_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[elts_head];
 		struct ibv_recv_wr *wr = &elt->wr;
@@ -2482,10 +1916,7 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 					priv->device_attr.max_qp_wr :
 					desc),
 			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
+			.max_recv_sge = 1,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
 		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
@@ -2500,165 +1931,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 }
 
 /**
- * Reconfigure a RX queue with new parameters.
- *
- * rxq_rehash() does not allocate mbufs, which, if not done from the right
- * thread (such as a control thread), may corrupt the pool.
- * In case of failure, the queue is left untouched.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param rxq
- *   RX queue pointer.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	struct rxq tmpl = *rxq;
-	unsigned int mbuf_n;
-	unsigned int desc_n;
-	struct rte_mbuf **pool;
-	unsigned int i, k;
-	struct ibv_exp_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
-	int err;
-
-	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
-	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
-	/* Number of descriptors and mbufs currently allocated. */
-	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
-	mbuf_n = desc_n;
-	/* Enable scattered packets support for this queue if necessary. */
-	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
-	if (dev->data->dev_conf.rxmode.enable_scatter &&
-	    (dev->data->dev_conf.rxmode.max_rx_pkt_len >
-	     (mb_len - RTE_PKTMBUF_HEADROOM))) {
-		tmpl.sp = 1;
-		desc_n /= MLX4_PMD_SGE_WR_N;
-	} else
-		tmpl.sp = 0;
-	DEBUG("%p: %s scattered packets support (%u WRs)",
-	      (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc_n);
-	/* If scatter mode is the same as before, nothing to do. */
-	if (tmpl.sp == rxq->sp) {
-		DEBUG("%p: nothing to do", (void *)dev);
-		return 0;
-	}
-	/* From now on, any failure will render the queue unusable.
-	 * Reinitialize QP. */
-	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err) {
-		ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				IBV_EXP_QP_STATE |
-				IBV_EXP_QP_PORT);
-	if (err) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	};
-	/* Allocate pool. */
-	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
-	if (pool == NULL) {
-		ERROR("%p: cannot allocate memory", (void *)dev);
-		return ENOBUFS;
-	}
-	/* Snatch mbufs from original queue. */
-	k = 0;
-	if (rxq->sp) {
-		struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
-
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			struct rxq_elt_sp *elt = &(*elts)[i];
-			unsigned int j;
-
-			for (j = 0; (j != elemof(elt->bufs)); ++j) {
-				assert(elt->bufs[j] != NULL);
-				pool[k++] = elt->bufs[j];
-			}
-		}
-	} else {
-		struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
-
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf = (void *)
-				((uintptr_t)elt->sge.addr -
-				 WR_ID(elt->wr.wr_id).offset);
-
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			pool[k++] = buf;
-		}
-	}
-	assert(k == mbuf_n);
-	tmpl.elts_n = 0;
-	tmpl.elts.sp = NULL;
-	assert((void *)&tmpl.elts.sp == (void *)&tmpl.elts.no_sp);
-	err = ((tmpl.sp) ?
-	       rxq_alloc_elts_sp(&tmpl, desc_n, pool) :
-	       rxq_alloc_elts(&tmpl, desc_n, pool));
-	if (err) {
-		ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
-		rte_free(pool);
-		assert(err > 0);
-		return err;
-	}
-	assert(tmpl.elts_n == desc_n);
-	assert(tmpl.elts.sp != NULL);
-	rte_free(pool);
-	/* Clean up original data. */
-	rxq->elts_n = 0;
-	rte_free(rxq->elts.sp);
-	rxq->elts.sp = NULL;
-	/* Post WRs. */
-	err = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
-	if (err) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(err));
-		goto skip_rtr;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err)
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(err));
-skip_rtr:
-	*rxq = tmpl;
-	assert(err >= 0);
-	return err;
-}
-
-/**
  * Configure a RX queue.
  *
  * @param dev
@@ -2701,19 +1973,19 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	mb_len = rte_pktmbuf_data_room_size(mp);
-	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
-		ERROR("%p: invalid number of RX descriptors (must be a"
-		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
+	if (desc == 0) {
+		ERROR("%p: invalid number of RX descriptors", (void *)dev);
 		return EINVAL;
 	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
 	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
-		tmpl.sp = 0;
+		;
 	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
-		tmpl.sp = 1;
-		desc /= MLX4_PMD_SGE_WR_N;
+		WARN("%p: scattered mode has been requested but is"
+		     " not supported, this may lead to packet loss",
+		     (void *)dev);
 	} else {
 		WARN("%p: the requested maximum Rx packet size (%u) is"
 		     " larger than a single mbuf (%u) and scattered"
@@ -2722,8 +1994,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
 		     mb_len - RTE_PKTMBUF_HEADROOM);
 	}
-	DEBUG("%p: %s scattered packets support (%u WRs)",
-	      (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
 	/* Use the entire RX mempool as the memory region. */
 	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
 	if (tmpl.mr == NULL) {
@@ -2792,20 +2062,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	if (tmpl.sp)
-		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
-	else
-		ret = rxq_alloc_elts(&tmpl, desc, NULL);
+	ret = rxq_alloc_elts(&tmpl, desc, NULL);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
+	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
 	if (ret) {
 		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
 		      (void *)dev,
@@ -2926,10 +2189,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, (void *)rxq);
 		(*priv->rxqs)[idx] = rxq;
 		/* Update receive callback. */
-		if (rxq->sp)
-			dev->rx_pkt_burst = mlx4_rx_burst_sp;
-		else
-			dev->rx_pkt_burst = mlx4_rx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	priv_unlock(priv);
 	return -ret;
@@ -3205,23 +2465,12 @@ priv_set_link(struct priv *priv, int up)
 {
 	struct rte_eth_dev *dev = priv->dev;
 	int err;
-	unsigned int i;
 
 	if (up) {
 		err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
 		if (err)
 			return err;
-		for (i = 0; i < priv->rxqs_n; i++)
-			if ((*priv->rxqs)[i]->sp)
-				break;
-		/* Check if an sp queue exists.
-		 * Note: Some old frames might be received.
-		 */
-		if (i == priv->rxqs_n)
-			dev->rx_pkt_burst = mlx4_rx_burst;
-		else
-			dev->rx_pkt_burst = mlx4_rx_burst_sp;
-		dev->tx_pkt_burst = mlx4_tx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst;
 	} else {
 		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
@@ -3469,12 +2718,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 /**
  * DPDK callback to change the MTU.
  *
- * Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
- * received). Use this as a hint to enable/disable scattered packets support
- * and improve performance when not needed.
- * Since failure is not an option, reconfiguring queues on the fly is not
- * recommended.
- *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param in_mtu
@@ -3488,9 +2731,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 {
 	struct priv *priv = dev->data->dev_private;
 	int ret = 0;
-	unsigned int i;
-	uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
-		mlx4_rx_burst;
 
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
@@ -3502,45 +2742,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
-	/* Remove MAC flow. */
-	priv_mac_addr_del(priv);
-	/* Temporarily replace RX handler with a fake one, assuming it has not
-	 * been copied elsewhere. */
-	dev->rx_pkt_burst = removed_rx_burst;
-	/* Make sure everyone has left mlx4_rx_burst() and uses
-	 * removed_rx_burst() instead. */
-	rte_wmb();
-	usleep(1000);
-	/* Reconfigure each RX queue. */
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-		unsigned int max_frame_len;
-
-		if (rxq == NULL)
-			continue;
-		/* Calculate new maximum frame length according to MTU. */
-		max_frame_len = (priv->mtu + ETHER_HDR_LEN +
-				 (ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
-		/* Provide new values to rxq_setup(). */
-		dev->data->dev_conf.rxmode.jumbo_frame =
-			(max_frame_len > ETHER_MAX_LEN);
-		dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
-		ret = rxq_rehash(dev, rxq);
-		if (ret) {
-			/* Force SP RX if that queue requires it and abort. */
-			if (rxq->sp)
-				rx_func = mlx4_rx_burst_sp;
-			break;
-		}
-		/* Scattered burst function takes priority. */
-		if (rxq->sp)
-			rx_func = mlx4_rx_burst_sp;
-	}
-	/* Burst functions can now be called again. */
-	rte_wmb();
-	dev->rx_pkt_burst = rx_func;
-	/* Restore MAC flow. */
-	ret = priv_mac_addr_add(priv);
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5a0a7a1..38c93f1 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -66,9 +66,6 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum number of Scatter/Gather Elements per Work Request. */
-#define MLX4_PMD_SGE_WR_N 4
-
 /* Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
@@ -141,13 +138,6 @@ struct mlx4_rxq_stats {
 	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
 };
 
-/* RX element (scattered packets). */
-struct rxq_elt_sp {
-	struct ibv_recv_wr wr; /* Work Request. */
-	struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
-	struct rte_mbuf *bufs[MLX4_PMD_SGE_WR_N]; /* SGEs buffers. */
-};
-
 /* RX element. */
 struct rxq_elt {
 	struct ibv_recv_wr wr; /* Work Request. */
@@ -168,11 +158,7 @@ struct rxq {
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
-	union {
-		struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
-		struct rxq_elt (*no_sp)[]; /* RX elements. */
-	} elts;
-	unsigned int sp:1; /* Use scattered RX elements. */
+	struct rxq_elt (*elts)[]; /* RX elements. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -190,16 +176,6 @@ struct mlx4_txq_stats {
 	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
 };
 
-/*
- * Linear buffer type. It is used when transmitting buffers with too many
- * segments that do not fit the hardware queue (see max_send_sge).
- * Extra segments are copied (linearized) in such buffers, replacing the
- * last SGE during TX.
- * The size is arbitrary but large enough to hold a jumbo frame with
- * 8 segments considering mbuf.buf_len is about 2048 bytes.
- */
-typedef uint8_t linear_t[16384];
-
 /* TX queue descriptor. */
 struct txq {
 	struct priv *priv; /* Back pointer to private data. */
@@ -221,8 +197,6 @@ struct txq {
 	unsigned int elts_comp_cd; /* Countdown for next completion request. */
 	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
 	struct mlx4_txq_stats stats; /* TX queue counters. */
-	linear_t (*elts_linear)[]; /* Linearized buffers. */
-	struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
-- 
2.1.4
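
Side note for readers of the archive, not part of the patch: once
scatter/gather is dropped, rxq_setup() only warns when the configured
max_rx_pkt_len exceeds what a single mbuf can hold (mb_len minus the
headroom). The sketch below restates that comparison as a standalone
check an application could run before configuring a queue. The helper
name and the 128-byte headroom are assumptions made for illustration;
in DPDK the headroom comes from the build-time RTE_PKTMBUF_HEADROOM
setting and mb_len from rte_pktmbuf_data_room_size().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; the real RTE_PKTMBUF_HEADROOM is a DPDK build option. */
#define EXAMPLE_PKTMBUF_HEADROOM 128u

/*
 * Hypothetical helper mirroring the condition tested in rxq_setup():
 * a frame fits in one mbuf when
 * max_rx_pkt_len <= (mb_len - headroom).
 *
 * Returns false when the data room is smaller than the headroom, a case
 * the PMD itself only guards with an assert().
 */
static bool
example_fits_single_mbuf(uint32_t max_rx_pkt_len, uint16_t mb_len)
{
	if (mb_len < EXAMPLE_PKTMBUF_HEADROOM)
		return false;
	return max_rx_pkt_len <= (uint32_t)(mb_len - EXAMPLE_PKTMBUF_HEADROOM);
}
```

With a 2176-byte data room (2048 bytes of payload plus the assumed
headroom), a 1518-byte frame passes this check while a 9018-byte jumbo
frame does not, which corresponds to the case where the PMD now warns
about possible packet loss.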


* [PATCH v1 20/48] net/mlx4: drop inline receive support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (18 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 19/48] net/mlx4: drop scatter/gather support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 21/48] net/mlx4: use standard QP attributes Adrien Mazarguil
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

The Verbs API used to implement inline receive is deprecated.
Support will be added back after refactoring the PMD.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
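For reference, the code removed below read MLX4_INLINE_RECV_SIZE with
atoi() and capped the result at the size reported by the device. If the
option comes back after the refactoring, a stricter parser could be
considered. The following is only a sketch under that assumption: the
sketch_* names are hypothetical and not part of the PMD, and the device
limit is passed in as a plain integer instead of being queried through
Verbs.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal integer.
 * Returns 0 for NULL, empty, malformed, negative or out-of-range input. */
static unsigned int
sketch_parse_uint(const char *val)
{
	char *end;
	long n;

	if (val == NULL || *val == '\0')
		return 0;
	errno = 0;
	n = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' || n < 0 || n > INT_MAX)
		return 0;
	return (unsigned int)n;
}

/* Hypothetical helper: same contract as the removed mlx4_getenv_int(),
 * i.e. 0 when the variable is not set, but rejecting trailing garbage. */
static unsigned int
sketch_getenv_uint(const char *name)
{
	assert(name != NULL);
	return sketch_parse_uint(getenv(name));
}

/* Cap the requested inline receive size at the device limit.
 * A negative limit is treated as "not supported". */
static unsigned int
sketch_clamp_inline_recv(unsigned int requested, int device_max)
{
	if (device_max < 0)
		return 0;
	if (requested > (unsigned int)device_max)
		return (unsigned int)device_max;
	return requested;
}
```

Unlike atoi(), this variant treats "12x" or "-5" as unset (0) rather
than silently accepting a prefix or a negative value.
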
 drivers/net/mlx4/mlx4.c | 52 --------------------------------------------
 drivers/net/mlx4/mlx4.h |  1 -
 2 files changed, 53 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5546c0a..227c02c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1925,8 +1925,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-	attr.max_inl_recv = priv->inl_recv_size,
-	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
@@ -2988,25 +2986,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-/**
- * Retrieve integer value from environment variable.
- *
- * @param[in] name
- *   Environment variable name.
- *
- * @return
- *   Integer value, 0 if the variable is not set.
- */
-static int
-mlx4_getenv_int(const char *name)
-{
-	const char *val = getenv(name);
-
-	if (val == NULL)
-		return 0;
-	return atoi(val);
-}
-
 static void
 mlx4_dev_link_status_handler(void *);
 static void
@@ -3649,13 +3628,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ibv_pd *pd = NULL;
 		struct priv *priv = NULL;
 		struct rte_eth_dev *eth_dev = NULL;
-		struct ibv_exp_device_attr exp_device_attr;
 		struct ether_addr mac;
 
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
 
 		DEBUG("using port %u", port);
 
@@ -3708,35 +3685,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-		if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-			ERROR("ibv_exp_query_device() failed");
-			err = ENODEV;
-			goto port_error;
-		}
-
-		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
-
-		if (priv->inl_recv_size) {
-			exp_device_attr.comp_mask =
-				IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ;
-			if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-				INFO("Couldn't query device for inline-receive"
-				     " capabilities.");
-				priv->inl_recv_size = 0;
-			} else {
-				if ((unsigned)exp_device_attr.inline_recv_sz <
-				    priv->inl_recv_size) {
-					INFO("Max inline-receive (%d) <"
-					     " requested inline-receive (%u)",
-					     exp_device_attr.inline_recv_sz,
-					     priv->inl_recv_size);
-					priv->inl_recv_size =
-						exp_device_attr.inline_recv_sz;
-				}
-			}
-			INFO("Set inline receive size to %u",
-			     priv->inl_recv_size);
-		}
 
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 38c93f1..66efb98 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -217,7 +217,6 @@ struct priv {
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-	unsigned int inl_recv_size; /* Inline recv size */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
-- 
2.1.4


* [PATCH v1 21/48] net/mlx4: use standard QP attributes
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (19 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 20/48] net/mlx4: drop inline receive support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 22/48] net/mlx4: revert resource domain support Adrien Mazarguil
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

The Verbs API used to set QP attributes is deprecated. Revert to the
standard API, which supports all the attributes still in use.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
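As a reading aid, the bring-up sequence touched by this patch can be
modelled without Verbs. The sketch below is an illustration only: the
sketch_* names and constants are hypothetical stand-ins, not the real
IBV_QPS_* / IBV_QP_* values. It encodes what the diff shows: the move to
INIT passes the state and port attributes, and the later moves pass the
state only. It covers the forward path from RESET and nothing else.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Verbs QP states, in bring-up order. */
enum sketch_qp_state {
	SKETCH_QPS_RESET,
	SKETCH_QPS_INIT,
	SKETCH_QPS_RTR,
	SKETCH_QPS_RTS
};

/* Hypothetical stand-ins for the attribute mask bits. */
enum {
	SKETCH_QP_STATE = 1 << 0,
	SKETCH_QP_PORT = 1 << 1
};

/* Attribute mask used for a given transition, or -1 when the
 * transition is not part of the bring-up path modelled here. */
static int
sketch_modify_mask(enum sketch_qp_state from, enum sketch_qp_state to)
{
	if (from == SKETCH_QPS_RESET && to == SKETCH_QPS_INIT)
		return SKETCH_QP_STATE | SKETCH_QP_PORT;
	if (from == SKETCH_QPS_INIT && to == SKETCH_QPS_RTR)
		return SKETCH_QP_STATE;
	if (from == SKETCH_QPS_RTR && to == SKETCH_QPS_RTS)
		return SKETCH_QP_STATE;
	return -1;
}

/* Walk from RESET to target one state at a time and return the
 * number of modify calls that were needed. */
static int
sketch_bring_up(enum sketch_qp_state target)
{
	enum sketch_qp_state cur = SKETCH_QPS_RESET;
	int calls = 0;

	while (cur != target) {
		enum sketch_qp_state nxt = (enum sketch_qp_state)(cur + 1);

		assert(sketch_modify_mask(cur, nxt) != -1);
		cur = nxt;
		++calls;
	}
	return calls;
}
```

In this model a Tx queue (up to RTS) needs three calls and an Rx queue
(up to RTR) needs two, which matches the number of ibv_modify_qp() call
sites changed in txq_setup() and rxq_setup() respectively.
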
 drivers/net/mlx4/mlx4.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 227c02c..773ba62 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1178,7 +1178,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		struct ibv_exp_qp_init_attr init;
 		struct ibv_exp_res_domain_init_attr rd;
 		struct ibv_exp_cq_init_attr cq;
-		struct ibv_exp_qp_attr mod;
+		struct ibv_qp_attr mod;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	int ret = 0;
@@ -1251,14 +1251,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	}
 	/* ibv_create_qp() updates this value. */
 	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
 		/* Primary port number. */
 		.port_num = priv->port
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod,
-				(IBV_EXP_QP_STATE | IBV_EXP_QP_PORT));
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
@@ -1270,17 +1269,17 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
 	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
 		      (void *)dev, strerror(ret));
@@ -1958,7 +1957,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
-	struct ibv_exp_qp_attr mod;
+	struct ibv_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
@@ -2046,15 +2045,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
+	mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
 		/* Primary port number. */
 		.port_num = priv->port
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				IBV_EXP_QP_STATE |
-				IBV_EXP_QP_PORT);
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
@@ -2074,10 +2071,10 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      strerror(ret));
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
+	mod = (struct ibv_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(ret));
-- 
2.1.4


* [PATCH v1 22/48] net/mlx4: revert resource domain support
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (20 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 21/48] net/mlx4: use standard QP attributes Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 23/48] net/mlx4: revert multicast echo prevention Adrien Mazarguil
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 3e49c148b715c3c0a12c1200295bb9b312f7028e.

Resource domains are not part of the standard Verbs interface. The
performance improvement they bring will be restored later through a
different data path implementation.

This commit removes the PMD's dependency on the non-standard QP allocation
interface.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 88 ++++-----------------------------------
 drivers/net/mlx4/mlx4.h      |  2 -
 drivers/net/mlx4/mlx4_flow.c | 30 +++++--------
 3 files changed, 20 insertions(+), 100 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 773ba62..144cfb0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -697,17 +697,6 @@ txq_cleanup(struct txq *txq)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
 		claim_zero(ibv_destroy_cq(txq->cq));
-	if (txq->rd != NULL) {
-		struct ibv_exp_destroy_res_domain_attr attr = {
-			.comp_mask = 0,
-		};
-
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
-						      txq->rd,
-						      &attr));
-	}
 	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
 		if (txq->mp2mr[i].mp == NULL)
 			break;
@@ -1175,9 +1164,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	union {
 		struct ibv_exp_query_intf_params params;
-		struct ibv_exp_qp_init_attr init;
-		struct ibv_exp_res_domain_init_attr rd;
-		struct ibv_exp_cq_init_attr cq;
+		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
 	enum ibv_exp_query_intf_status status;
@@ -1191,24 +1178,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		return EINVAL;
 	}
 	/* MRs will be registered in mp2mr[] later. */
-	attr.rd = (struct ibv_exp_res_domain_init_attr){
-		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
-		.thread_model = IBV_EXP_THREAD_SINGLE,
-		.msg_model = IBV_EXP_MSG_HIGH_BW,
-	};
-	tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
-	if (tmpl.rd == NULL) {
-		ret = ENOMEM;
-		ERROR("%p: RD creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
-	attr.cq = (struct ibv_exp_cq_init_attr){
-		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
-		.res_domain = tmpl.rd,
-	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -1219,7 +1189,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_exp_qp_init_attr){
+	attr.init = (struct ibv_qp_init_attr){
 		/* CQ to be associated with the send queue. */
 		.send_cq = tmpl.cq,
 		/* CQ to be associated with the receive queue. */
@@ -1237,12 +1207,8 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		/* Do *NOT* enable this, completions events are managed per
 		 * TX burst. */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
-		.res_domain = tmpl.rd,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
 	};
-	tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
+	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
@@ -1721,17 +1687,6 @@ rxq_cleanup(struct rxq *rxq)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->channel != NULL)
 		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->rd != NULL) {
-		struct ibv_exp_destroy_res_domain_attr attr = {
-			.comp_mask = 0,
-		};
-
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		claim_zero(ibv_exp_destroy_res_domain(rxq->priv->ctx,
-						      rxq->rd,
-						      &attr));
-	}
 	if (rxq->mr != NULL)
 		claim_zero(ibv_dereg_mr(rxq->mr));
 	memset(rxq, 0, sizeof(*rxq));
@@ -1901,10 +1856,9 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   QP pointer or NULL in case of error.
  */
 static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-	     struct ibv_exp_res_domain *rd)
+rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 {
-	struct ibv_exp_qp_init_attr attr = {
+	struct ibv_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
 		.send_cq = cq,
 		/* CQ to be associated with the receive queue. */
@@ -1918,13 +1872,9 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 			.max_recv_sge = 1,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
-		.pd = priv->pd,
-		.res_domain = rd,
 	};
 
-	return ibv_exp_create_qp(priv->ctx, &attr);
+	return ibv_create_qp(priv->pd, &attr);
 }
 
 /**
@@ -1960,8 +1910,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
-		struct ibv_exp_cq_init_attr cq;
-		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	struct ibv_recv_wr *bad_wr;
@@ -1999,19 +1947,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.rd = (struct ibv_exp_res_domain_init_attr){
-		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
-		.thread_model = IBV_EXP_THREAD_SINGLE,
-		.msg_model = IBV_EXP_MSG_HIGH_BW,
-	};
-	tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
-	if (tmpl.rd == NULL) {
-		ret = ENOMEM;
-		ERROR("%p: RD creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
 		tmpl.channel = ibv_create_comp_channel(priv->ctx);
 		if (tmpl.channel == NULL) {
@@ -2022,12 +1957,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 			goto error;
 		}
 	}
-	attr.cq = (struct ibv_exp_cq_init_attr){
-		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
-		.res_domain = tmpl.rd,
-	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0,
-				    &attr.cq);
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -2038,7 +1968,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 66efb98..6a5df5c 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -161,7 +161,6 @@ struct rxq {
 	struct rxq_elt (*elts)[]; /* RX elements. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
-	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
 /* TX element. */
@@ -198,7 +197,6 @@ struct txq {
 	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
 	struct mlx4_txq_stats stats; /* TX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
-	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
 struct rte_flow;
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 2c5dc3c..58d4698 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -752,29 +752,21 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 		ERROR("Cannot allocate memory for drop struct");
 		goto err;
 	}
-	cq = ibv_exp_create_cq(priv->ctx, 1, NULL, NULL, 0,
-			      &(struct ibv_exp_cq_init_attr){
-					.comp_mask = 0,
-			      });
+	cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		ERROR("Cannot create drop CQ");
 		goto err_create_cq;
 	}
-	qp = ibv_exp_create_qp(priv->ctx,
-			      &(struct ibv_exp_qp_init_attr){
-					.send_cq = cq,
-					.recv_cq = cq,
-					.cap = {
-						.max_recv_wr = 1,
-						.max_recv_sge = 1,
-					},
-					.qp_type = IBV_QPT_RAW_PACKET,
-					.comp_mask =
-						IBV_EXP_QP_INIT_ATTR_PD |
-						IBV_EXP_QP_INIT_ATTR_PORT,
-					.pd = priv->pd,
-					.port_num = priv->port,
-			      });
+	qp = ibv_create_qp(priv->pd,
+			   &(struct ibv_qp_init_attr){
+				.send_cq = cq,
+				.recv_cq = cq,
+				.cap = {
+					.max_recv_wr = 1,
+					.max_recv_sge = 1,
+				},
+				.qp_type = IBV_QPT_RAW_PACKET,
+			   });
 	if (!qp) {
 		ERROR("Cannot create drop QP");
 		goto err_create_qp;
-- 
2.1.4


* [PATCH v1 23/48] net/mlx4: revert multicast echo prevention
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (21 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 22/48] net/mlx4: revert resource domain support Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 24/48] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 8b3ffe95e75d6d305992505005cbb95969874a15.

Multicast loopback prevention is not part of the standard Verbs interface.
Remove it temporarily.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile | 6 +-----
 drivers/net/mlx4/mlx4.c   | 7 -------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 8406ba2..78ea350 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -87,11 +87,7 @@ mlx4_autoconf.h.new: FORCE
 
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
-		infiniband/verbs.h \
-		enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
-		$(AUTOCONF_OUTPUT)
+	$Q : > '$@'
 
 # Create mlx4_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 144cfb0..ffad5a4 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1266,13 +1266,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
 		.intf = IBV_EXP_INTF_QP_BURST,
 		.obj = tmpl.qp,
-#ifdef HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK
-		/* MC loopback must be disabled when not using a VF. */
-		.family_flags =
-			(!priv->vf ?
-			 IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK :
-			 0),
-#endif
 	};
 	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_qp == NULL) {
-- 
2.1.4


* [PATCH v1 24/48] net/mlx4: revert fast Verbs interface for Tx
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (22 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 23/48] net/mlx4: revert multicast echo prevention Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 25/48] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
                   ` (25 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 9980f81dc2623291b89cf1c281a6a9f116fd2394.

"Fast Verbs" is a nonstandard experimental interface that must be reverted
for compatibility reasons. Its replacement is slower but temporary;
performance will be restored by a subsequent commit through an enhanced
data path implementation. This commit focuses on maintaining basic
functionality in the meantime.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
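The new Tx path chains one work request per packet through a tail
pointer, posts the whole chain at once and, on failure, walks the
rejected part of the chain to undo its accounting. The sketch below
models only that bookkeeping. All sketch_* types and functions are
hypothetical stand-ins (struct sketch_wr is not struct ibv_send_wr, and
sketch_post() merely simulates a post that rejects part of the chain),
so it should be read as an illustration of the idea rather than as the
driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a send work request. */
struct sketch_wr {
	struct sketch_wr *next;
	uint32_t length; /* Bytes carried by this request. */
	int signaled;    /* Nonzero if a completion was requested. */
};

/* Hypothetical stand-in for the Tx queue counters. */
struct sketch_txq {
	unsigned int elts_n;    /* Ring size. */
	unsigned int elts_head; /* Next free ring slot. */
	unsigned int elts_comp; /* Completions requested so far. */
	uint64_t opackets;      /* Packets accounted as sent. */
	uint64_t obytes;        /* Bytes accounted as sent. */
};

/* Link wrs[0..n-1] in order using a tail pointer; return the head. */
static struct sketch_wr *
sketch_chain(struct sketch_wr *wrs, unsigned int n)
{
	struct sketch_wr *head = NULL;
	struct sketch_wr **next = &head;
	unsigned int i;

	for (i = 0; i != n; ++i) {
		*next = &wrs[i];
		next = &wrs[i].next;
	}
	*next = NULL;
	return head;
}

/* Simulated post: accept the first "accept" requests, then fail and
 * report the first rejected request through *bad. */
static int
sketch_post(struct sketch_wr *head, unsigned int accept,
	    struct sketch_wr **bad)
{
	while (head != NULL && accept != 0) {
		head = head->next;
		--accept;
	}
	*bad = head;
	return (head != NULL) ? -1 : 0;
}

/* Undo the accounting of every request from "bad" to the end of the
 * chain. Returns the number of requests rewound. */
static unsigned int
sketch_rewind(struct sketch_txq *txq, const struct sketch_wr *bad)
{
	unsigned int rejected = 0;

	for (; bad != NULL; bad = bad->next) {
		if (bad->signaled)
			--txq->elts_comp;
		--txq->opackets;
		txq->obytes -= bad->length;
		/* Step the ring index back by one, wrapping at zero. */
		txq->elts_head =
			(txq->elts_head ? txq->elts_head : txq->elts_n) - 1;
		++rejected;
	}
	return rejected;
}
```

With a ring of 8 slots, a burst of 4 packets starting at slot 6 and a
post that accepts only the first packet, the rewind leaves the head at
slot 7 and one packet accounted. The patch additionally resets the
completion countdown when a signaled request is lost; that part is left
out of this model.
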
 drivers/net/mlx4/mlx4.c | 120 ++++++++++++++++++-------------------------
 drivers/net/mlx4/mlx4.h |   4 +-
 2 files changed, 52 insertions(+), 72 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ffad5a4..812f29c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -666,33 +666,10 @@ txq_free_elts(struct txq *txq)
 static void
 txq_cleanup(struct txq *txq)
 {
-	struct ibv_exp_release_intf_params params;
 	size_t i;
 
 	DEBUG("cleaning up %p", (void *)txq);
 	txq_free_elts(txq);
-	if (txq->if_qp != NULL) {
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		assert(txq->qp != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(txq->priv->ctx,
-						txq->if_qp,
-						&params));
-	}
-	if (txq->if_cq != NULL) {
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		assert(txq->cq != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(txq->priv->ctx,
-						txq->if_cq,
-						&params));
-	}
 	if (txq->qp != NULL)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
@@ -726,11 +703,12 @@ txq_complete(struct txq *txq)
 	unsigned int elts_comp = txq->elts_comp;
 	unsigned int elts_tail = txq->elts_tail;
 	const unsigned int elts_n = txq->elts_n;
+	struct ibv_wc wcs[elts_comp];
 	int wcs_n;
 
 	if (unlikely(elts_comp == 0))
 		return 0;
-	wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
+	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
 	if (unlikely(wcs_n == 0))
 		return 0;
 	if (unlikely(wcs_n < 0)) {
@@ -1014,6 +992,9 @@ static uint16_t
 mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct txq *txq = (struct txq *)dpdk_txq;
+	struct ibv_send_wr *wr_head = NULL;
+	struct ibv_send_wr **wr_next = &wr_head;
+	struct ibv_send_wr *wr_bad = NULL;
 	unsigned int elts_head = txq->elts_head;
 	const unsigned int elts_n = txq->elts_n;
 	unsigned int elts_comp_cd = txq->elts_comp_cd;
@@ -1041,6 +1022,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
+		struct ibv_send_wr *wr = &elt->wr;
 		unsigned int segs = NB_SEGS(buf);
 		unsigned int sent_size = 0;
 		uint32_t send_flags = 0;
@@ -1065,9 +1047,10 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		if (unlikely(--elts_comp_cd == 0)) {
 			elts_comp_cd = txq->elts_comp_cd_init;
 			++elts_comp;
-			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
+			send_flags |= IBV_SEND_SIGNALED;
 		}
 		if (likely(segs == 1)) {
+			struct ibv_sge *sge = &elt->sge;
 			uintptr_t addr;
 			uint32_t length;
 			uint32_t lkey;
@@ -1091,30 +1074,26 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				rte_prefetch0((volatile void *)
 					      (uintptr_t)addr);
 			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			/* Put packet into send queue. */
-			if (length <= txq->max_inline)
-				err = txq->if_qp->send_pending_inline
-					(txq->qp,
-					 (void *)addr,
-					 length,
-					 send_flags);
-			else
-				err = txq->if_qp->send_pending
-					(txq->qp,
-					 addr,
-					 length,
-					 lkey,
-					 send_flags);
-			if (unlikely(err))
-				goto stop;
+			sge->addr = addr;
+			sge->length = length;
+			sge->lkey = lkey;
 			sent_size += length;
 		} else {
 			err = -1;
 			goto stop;
 		}
+		if (sent_size <= txq->max_inline)
+			send_flags |= IBV_SEND_INLINE;
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
+		/* Set up WR. */
+		wr->sg_list = &elt->sge;
+		wr->num_sge = segs;
+		wr->opcode = IBV_WR_SEND;
+		wr->send_flags = send_flags;
+		*wr_next = wr;
+		wr_next = &wr->next;
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
@@ -1123,12 +1102,37 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
 	/* Ring QP doorbell. */
-	err = txq->if_qp->send_flush(txq->qp);
+	*wr_next = NULL;
+	assert(wr_head);
+	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
 	if (unlikely(err)) {
-		/* A nonzero value is not supposed to be returned.
-		 * Nothing can be done about it. */
-		DEBUG("%p: send_flush() failed with error %d",
-		      (void *)txq, err);
+		uint64_t obytes = 0;
+		uint64_t opackets = 0;
+
+		/* Rewind bad WRs. */
+		while (wr_bad != NULL) {
+			int j;
+
+			/* Force completion request if one was lost. */
+			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
+				elts_comp_cd = 1;
+				--elts_comp;
+			}
+			++opackets;
+			for (j = 0; j < wr_bad->num_sge; ++j)
+				obytes += wr_bad->sg_list[j].length;
+			elts_head = (elts_head ? elts_head : elts_n) - 1;
+			wr_bad = wr_bad->next;
+		}
+		txq->stats.opackets -= opackets;
+		txq->stats.obytes -= obytes;
+		i -= opackets;
+		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
+		      " (%" PRIu64 " bytes) rejected: %s",
+		      (void *)txq,
+		      opackets,
+		      obytes,
+		      (err <= -1) ? "Internal error" : strerror(err));
 	}
 	txq->elts_head = elts_head;
 	txq->elts_comp += elts_comp;
@@ -1163,11 +1167,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.socket = socket
 	};
 	union {
-		struct ibv_exp_query_intf_params params;
 		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
-	enum ibv_exp_query_intf_status status;
 	int ret = 0;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -1251,28 +1253,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_CQ,
-		.obj = tmpl.cq,
-	};
-	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_cq == NULL) {
-		ERROR("%p: CQ interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
-	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
 	/* Clean up txq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
 	txq_cleanup(txq);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6a5df5c..11c8885 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -165,6 +165,8 @@ struct rxq {
 
 /* TX element. */
 struct txq_elt {
+	struct ibv_send_wr wr; /* Work request. */
+	struct ibv_sge sge; /* Scatter/gather element. */
 	struct rte_mbuf *buf;
 };
 
@@ -185,8 +187,6 @@ struct txq {
 	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
 	struct ibv_cq *cq; /* Completion Queue. */
 	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
-	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	struct txq_elt (*elts)[]; /* TX elements. */
-- 
2.1.4


* [PATCH v1 25/48] net/mlx4: revert fast Verbs interface for Rx
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (23 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 24/48] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 26/48] net/mlx4: simplify link update function Adrien Mazarguil
                   ` (24 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit acac55f164128fc76da8d93cae1e8c1e560e99f6.

"Fast Verbs" is a nonstandard experimental interface that must be reverted
for compatibility reasons. Its replacement is slower but temporary;
performance will be restored by a subsequent commit through an enhanced
data path implementation. This commit focuses on maintaining basic
functionality in the meantime.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
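The reworked Rx burst polls a batch of completions first, then walks
them in ring order: every completed work request is linked into a
repost chain, and completions with a bad status are counted as dropped
instead of being delivered. The sketch below models that loop only. The
sketch_* types are hypothetical stand-ins for struct ibv_wc and struct
ibv_recv_wr, mbuf replacement is left out, and advancing the ring head
once per completion is an assumption about the part of the function not
shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_status { SKETCH_WC_SUCCESS = 0, SKETCH_WC_ERROR = 1 };

/* Hypothetical stand-in for a work completion. */
struct sketch_wc {
	uint64_t wr_id;
	enum sketch_status status;
	uint32_t byte_len;
};

/* Hypothetical stand-in for a receive work request. */
struct sketch_recv_wr {
	struct sketch_recv_wr *next;
	uint64_t wr_id;
};

struct sketch_rx_result {
	struct sketch_recv_wr *repost; /* Chain to hand back to the QP. */
	unsigned int delivered;        /* Packets returned to the caller. */
	unsigned int dropped;          /* Completions with a bad status. */
	uint64_t bytes;                /* Bytes in delivered packets. */
};

/* Process wcs_n completions against a ring of ring_n requests,
 * starting at *head and updating it. */
static struct sketch_rx_result
sketch_rx_process(const struct sketch_wc *wcs, unsigned int wcs_n,
		  struct sketch_recv_wr *ring, unsigned int ring_n,
		  unsigned int *head)
{
	struct sketch_rx_result res = { NULL, 0, 0, 0 };
	struct sketch_recv_wr **next = &res.repost;
	unsigned int i;

	for (i = 0; i != wcs_n; ++i) {
		struct sketch_recv_wr *wr = &ring[*head];

		/* Completions are expected in ring order. */
		assert(wr->wr_id == wcs[i].wr_id);
		/* Link the completed request for repost in all cases. */
		*next = wr;
		next = &wr->next;
		if (wcs[i].status != SKETCH_WC_SUCCESS) {
			++res.dropped;
		} else {
			++res.delivered;
			res.bytes += wcs[i].byte_len;
		}
		if (++*head == ring_n)
			*head = 0;
	}
	*next = NULL;
	return res;
}
```

A failed completion still ends up in the repost chain, so its ring slot
is handed back to the hardware rather than leaked.
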
 drivers/net/mlx4/mlx4.c | 127 +++++++++++--------------------------------
 drivers/net/mlx4/mlx4.h |   2 -
 2 files changed, 33 insertions(+), 96 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 812f29c..5b7238e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1628,32 +1628,8 @@ priv_mac_addr_add(struct priv *priv)
 static void
 rxq_cleanup(struct rxq *rxq)
 {
-	struct ibv_exp_release_intf_params params;
-
 	DEBUG("cleaning up %p", (void *)rxq);
 	rxq_free_elts(rxq);
-	if (rxq->if_qp != NULL) {
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		assert(rxq->qp != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_qp,
-						&params));
-	}
-	if (rxq->if_cq != NULL) {
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		assert(rxq->cq != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_cq,
-						&params));
-	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
 	if (rxq->cq != NULL)
@@ -1687,23 +1663,37 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
-	struct ibv_sge sges[pkts_n];
+	struct ibv_wc wcs[pkts_n];
+	struct ibv_recv_wr *wr_head = NULL;
+	struct ibv_recv_wr **wr_next = &wr_head;
+	struct ibv_recv_wr *wr_bad = NULL;
 	unsigned int i;
 	unsigned int pkts_ret = 0;
 	int ret;
 
-	for (i = 0; (i != pkts_n); ++i) {
+	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+	if (unlikely(ret == 0))
+		return 0;
+	if (unlikely(ret < 0)) {
+		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+		      (void *)rxq, ret);
+		return 0;
+	}
+	assert(ret <= (int)pkts_n);
+	/* For each work completion. */
+	for (i = 0; i != (unsigned int)ret; ++i) {
+		struct ibv_wc *wc = &wcs[i];
 		struct rxq_elt *elt = &(*elts)[elts_head];
 		struct ibv_recv_wr *wr = &elt->wr;
 		uint64_t wr_id = wr->wr_id;
-		unsigned int len;
+		uint32_t len = wc->byte_len;
 		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
 			WR_ID(wr_id).offset);
 		struct rte_mbuf *rep;
-		uint32_t flags;
 
 		/* Sanity checks. */
 		assert(WR_ID(wr_id).id < rxq->elts_n);
+		assert(wr_id == wc->wr_id);
 		assert(wr->sg_list == &elt->sge);
 		assert(wr->num_sge == 1);
 		assert(elts_head < rxq->elts_n);
@@ -1714,41 +1704,19 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		 */
 		rte_mbuf_prefetch_part1(seg);
 		rte_mbuf_prefetch_part2(seg);
-		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
-						    &flags);
-		if (unlikely(ret < 0)) {
-			struct ibv_wc wc;
-			int wcs_n;
-
-			DEBUG("rxq=%p, poll_length() failed (ret=%d)",
-			      (void *)rxq, ret);
-			/* ibv_poll_cq() must be used in case of failure. */
-			wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
-			if (unlikely(wcs_n == 0))
-				break;
-			if (unlikely(wcs_n < 0)) {
-				DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
-				      (void *)rxq, wcs_n);
-				break;
-			}
-			assert(wcs_n == 1);
-			if (unlikely(wc.status != IBV_WC_SUCCESS)) {
-				/* Whatever, just repost the offending WR. */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
-				      " completion status (%d): %s",
-				      (void *)rxq, wc.wr_id, wc.status,
-				      ibv_wc_status_str(wc.status));
-				/* Increment dropped packets counter. */
-				++rxq->stats.idropped;
-				/* Add SGE to array for repost. */
-				sges[i] = elt->sge;
-				goto repost;
-			}
-			ret = wc.byte_len;
+		/* Link completed WRs together for repost. */
+		*wr_next = wr;
+		wr_next = &wr->next;
+		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
+			/* Whatever, just repost the offending WR. */
+			DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
+			      " status (%d): %s",
+			      (void *)rxq, wr_id, wc->status,
+			      ibv_wc_status_str(wc->status));
+			/* Increment dropped packets counter. */
+			++rxq->stats.idropped;
+			goto repost;
 		}
-		if (ret == 0)
-			break;
-		len = ret;
 		rep = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(rep == NULL)) {
 			/*
@@ -1761,8 +1729,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			/* Increase out of memory counters. */
 			++rxq->stats.rx_nombuf;
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-			/* Add SGE to array for repost. */
-			sges[i] = elt->sge;
 			goto repost;
 		}
 
@@ -1774,9 +1740,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 (uintptr_t)rep);
 		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
 
-		/* Add SGE to array for repost. */
-		sges[i] = elt->sge;
-
 		/* Update seg information. */
 		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
 		NB_SEGS(seg) = 1;
@@ -1800,7 +1763,9 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	if (unlikely(i == 0))
 		return 0;
 	/* Repost WRs. */
-	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+	*wr_next = NULL;
+	assert(wr_head);
+	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
@@ -1881,10 +1846,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.socket = socket
 	};
 	struct ibv_qp_attr mod;
-	union {
-		struct ibv_exp_query_intf_params params;
-	} attr;
-	enum ibv_exp_query_intf_status status;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
@@ -1986,28 +1947,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_CQ,
-		.obj = tmpl.cq,
-	};
-	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_cq == NULL) {
-		ERROR("%p: CQ interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
-	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 11c8885..635036e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -152,8 +152,6 @@ struct rxq {
 	struct ibv_mr *mr; /* Memory Region (for mp). */
 	struct ibv_cq *cq; /* Completion Queue. */
 	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
-	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
-- 
2.1.4


* [PATCH v1 26/48] net/mlx4: simplify link update function
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (24 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 25/48] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 27/48] net/mlx4: standardize on negative errno values Adrien Mazarguil
                   ` (23 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Returning a different value when the current link status differs from the
previous one was probably useful at some point in the past but is now
meaningless; this value is ignored both internally (mlx4 PMD) and
externally (ethdev wrapper).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5b7238e..7312482 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2543,13 +2543,8 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
 	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
 			ETH_LINK_SPEED_FIXED);
-	if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
-		/* Link status changed. */
-		dev->data->dev_link = dev_link;
-		return 0;
-	}
-	/* Link status is still the same. */
-	return -1;
+	dev->data->dev_link = dev_link;
+	return 0;
 }
 
 /**
-- 
2.1.4


* [PATCH v1 27/48] net/mlx4: standardize on negative errno values
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (25 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 26/48] net/mlx4: simplify link update function Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 28/48] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
                   ` (22 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Due to its reliance on system calls, the mlx4 PMD uses positive errno
values internally and negative ones at the ethdev API border. Although most
internal functions are documented, this mixed design is unusual and prone
to mistakes (e.g. the flow API implementation uses negative values
exclusively).

Standardize on negative errno values and rely on rte_errno instead of
errno in all functions.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
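For readers skimming the diff, the convention described above can be
sketched outside the PMD as follows. This is only an illustration under
stated assumptions: sketch_errno is a hypothetical stand-in for rte_errno
(so the snippet builds without DPDK), and sketch_read() and
sketch_read_ulong() are made-up helpers loosely modeled on the sysfs
accessors touched by this patch, not code taken from it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for rte_errno so the sketch builds without DPDK. */
static int sketch_errno;

/* Read up to size bytes; return the byte count or a negative errno value. */
static int
sketch_read(const char *path, char *buf, size_t size)
{
	FILE *file = fopen(path, "rb");
	size_t n;
	int ret;

	if (file == NULL) {
		/* Capture errno at once, before another call can clobber it. */
		sketch_errno = errno ? errno : EINVAL;
		return -sketch_errno;
	}
	n = fread(buf, 1, size, file);
	if (n < size && ferror(file)) {
		sketch_errno = EIO;
		ret = -sketch_errno;
	} else {
		ret = (int)n;
	}
	fclose(file);
	return ret;
}

/* Parse an unsigned long from a file; 0 on success, negative errno value
 * otherwise, with sketch_errno set. */
static int
sketch_read_ulong(const char *path, unsigned long *value)
{
	char str[32];
	char *end;
	int ret = sketch_read(path, str, sizeof(str) - 1);

	if (ret < 0) {
		/* Propagate as is: no sign flip, no second errno lookup. */
		assert(sketch_errno == -ret);
		return ret;
	}
	str[ret] = '\0';
	errno = 0;
	*value = strtoul(str, &end, 0);
	if (errno || end == str) {
		sketch_errno = errno ? errno : EINVAL;
		return -sketch_errno;
	}
	return 0;
}
```

With this layout a caller only needs "if (ret < 0) return ret;" at each
level, and the positive code remains available in sketch_errno for
strerror(), which is the property the patch aims for with rte_errno.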
 drivers/net/mlx4/mlx4.c | 485 ++++++++++++++++++++++++-------------------
 1 file changed, 274 insertions(+), 211 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 7312482..8cfeab2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -159,7 +159,7 @@ void priv_unlock(struct priv *priv)
  *   Interface name output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
@@ -174,8 +174,10 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
 
 		dir = opendir(path);
-		if (dir == NULL)
-			return -1;
+		if (dir == NULL) {
+			rte_errno = errno;
+			return -rte_errno;
+		}
 	}
 	while ((dent = readdir(dir)) != NULL) {
 		char *name = dent->d_name;
@@ -225,8 +227,10 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 			snprintf(match, sizeof(match), "%s", name);
 	}
 	closedir(dir);
-	if (match[0] == '\0')
-		return -1;
+	if (match[0] == '\0') {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
 	strncpy(*ifname, match, sizeof(*ifname));
 	return 0;
 }
@@ -244,7 +248,8 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
  *   Buffer size.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   Number of bytes read on success, negative errno value otherwise and
+ *   rte_errno is set.
  */
 static int
 priv_sysfs_read(const struct priv *priv, const char *entry,
@@ -253,25 +258,27 @@ priv_sysfs_read(const struct priv *priv, const char *entry,
 	char ifname[IF_NAMESIZE];
 	FILE *file;
 	int ret;
-	int err;
 
-	if (priv_get_ifname(priv, &ifname))
-		return -1;
+	ret = priv_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
 
 	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
 	      ifname, entry);
 
 	file = fopen(path, "rb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	ret = fread(buf, 1, size, file);
-	err = errno;
-	if (((size_t)ret < size) && (ferror(file)))
-		ret = -1;
-	else
+	if ((size_t)ret < size && ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
 		ret = size;
+	}
 	fclose(file);
-	errno = err;
 	return ret;
 }
 
@@ -288,7 +295,8 @@ priv_sysfs_read(const struct priv *priv, const char *entry,
  *   Buffer size.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   Number of bytes written on success, negative errno value otherwise and
+ *   rte_errno is set.
  */
 static int
 priv_sysfs_write(const struct priv *priv, const char *entry,
@@ -297,25 +305,27 @@ priv_sysfs_write(const struct priv *priv, const char *entry,
 	char ifname[IF_NAMESIZE];
 	FILE *file;
 	int ret;
-	int err;
 
-	if (priv_get_ifname(priv, &ifname))
-		return -1;
+	ret = priv_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
 
 	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
 	      ifname, entry);
 
 	file = fopen(path, "wb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	ret = fwrite(buf, 1, size, file);
-	err = errno;
-	if (((size_t)ret < size) || (ferror(file)))
-		ret = -1;
-	else
+	if ((size_t)ret < size || ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
 		ret = size;
+	}
 	fclose(file);
-	errno = err;
 	return ret;
 }
 
@@ -330,7 +340,7 @@ priv_sysfs_write(const struct priv *priv, const char *entry,
  *   Value output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
@@ -340,18 +350,19 @@ priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
 	char value_str[32];
 
 	ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret == -1) {
+	if (ret < 0) {
 		DEBUG("cannot read %s value from sysfs: %s",
-		      name, strerror(errno));
-		return -1;
+		      name, strerror(rte_errno));
+		return ret;
 	}
 	value_str[ret] = '\0';
 	errno = 0;
 	value_ret = strtoul(value_str, NULL, 0);
 	if (errno) {
+		rte_errno = errno;
 		DEBUG("invalid %s value `%s': %s", name, value_str,
-		      strerror(errno));
-		return -1;
+		      strerror(rte_errno));
+		return -rte_errno;
 	}
 	*value = value_ret;
 	return 0;
@@ -368,7 +379,7 @@ priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
  *   Value to set.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
@@ -377,10 +388,10 @@ priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
 	MKSTR(value_str, "%lu", value);
 
 	ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret == -1) {
+	if (ret < 0) {
 		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
-		      name, value_str, value, strerror(errno));
-		return -1;
+		      name, value_str, value, strerror(rte_errno));
+		return ret;
 	}
 	return 0;
 }
@@ -396,18 +407,23 @@ priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
  *   Interface request structure output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
 {
 	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-	int ret = -1;
+	int ret;
 
-	if (sock == -1)
-		return ret;
-	if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
-		ret = ioctl(sock, req, ifr);
+	if (sock == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = priv_get_ifname(priv, &ifr->ifr_name);
+	if (!ret && ioctl(sock, req, ifr) == -1) {
+		rte_errno = errno;
+		ret = -rte_errno;
+	}
 	close(sock);
 	return ret;
 }
@@ -421,15 +437,16 @@ priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
  *   MTU value output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_mtu(struct priv *priv, uint16_t *mtu)
 {
-	unsigned long ulong_mtu;
+	unsigned long ulong_mtu = 0;
+	int ret = priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
 
-	if (priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu) == -1)
-		return -1;
+	if (ret)
+		return ret;
 	*mtu = ulong_mtu;
 	return 0;
 }
@@ -443,20 +460,23 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
  *   MTU value to set.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_mtu(struct priv *priv, uint16_t mtu)
 {
 	uint16_t new_mtu;
+	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
 
-	if (priv_set_sysfs_ulong(priv, "mtu", mtu) ||
-	    priv_get_mtu(priv, &new_mtu))
-		return -1;
+	if (ret)
+		return ret;
+	ret = priv_get_mtu(priv, &new_mtu);
+	if (ret)
+		return ret;
 	if (new_mtu == mtu)
 		return 0;
-	errno = EINVAL;
-	return -1;
+	rte_errno = EINVAL;
+	return -rte_errno;
 }
 
 /**
@@ -470,15 +490,16 @@ priv_set_mtu(struct priv *priv, uint16_t mtu)
  *   Bitmask for flags to modify.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
 {
-	unsigned long tmp;
+	unsigned long tmp = 0;
+	int ret = priv_get_sysfs_ulong(priv, "flags", &tmp);
 
-	if (priv_get_sysfs_ulong(priv, "flags", &tmp) == -1)
-		return -1;
+	if (ret)
+		return ret;
 	tmp &= keep;
 	tmp |= (flags & (~keep));
 	return priv_set_sysfs_ulong(priv, "flags", tmp);
@@ -513,7 +534,7 @@ priv_mac_addr_del(struct priv *priv);
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 dev_configure(struct rte_eth_dev *dev)
@@ -544,7 +565,7 @@ dev_configure(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
@@ -554,9 +575,8 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 
 	priv_lock(priv);
 	ret = dev_configure(dev);
-	assert(ret >= 0);
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
@@ -573,7 +593,7 @@ static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
  *   Number of elements to allocate.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 txq_alloc_elts(struct txq *txq, unsigned int elts_n)
@@ -612,7 +632,8 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 
 	DEBUG("%p: failed, freed everything", (void *)txq);
 	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	return -rte_errno;
 }
 
 /**
@@ -806,7 +827,7 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
  *   Pointer to memory pool.
  *
  * @return
- *   Memory region pointer, NULL in case of error.
+ *   Memory region pointer, NULL in case of error and rte_errno is set.
  */
 static struct ibv_mr *
 mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
@@ -815,8 +836,10 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	uintptr_t start;
 	uintptr_t end;
 	unsigned int i;
+	struct ibv_mr *mr;
 
 	if (mlx4_check_mempool(mp, &start, &end) != 0) {
+		rte_errno = EINVAL;
 		ERROR("mempool %p: not virtually contiguous",
 			(void *)mp);
 		return NULL;
@@ -839,10 +862,13 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
 	      (void *)mp, (void *)start, (void *)end,
 	      (size_t)(end - start));
-	return ibv_reg_mr(pd,
-			  (void *)start,
-			  end - start,
-			  IBV_ACCESS_LOCAL_WRITE);
+	mr = ibv_reg_mr(pd,
+			(void *)start,
+			end - start,
+			IBV_ACCESS_LOCAL_WRITE);
+	if (!mr)
+		rte_errno = errno ? errno : EINVAL;
+	return mr;
 }
 
 /**
@@ -1155,7 +1181,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   Thresholds parameters.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
@@ -1170,21 +1196,24 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
-	int ret = 0;
+	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL)
-		return EINVAL;
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (desc == 0) {
+		rte_errno = EINVAL;
 		ERROR("%p: invalid number of TX descriptors", (void *)dev);
-		return EINVAL;
+		goto error;
 	}
 	/* MRs will be registered in mp2mr[] later. */
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
-		ret = ENOMEM;
+		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	DEBUG("priv->device_attr.max_qp_wr is %d",
@@ -1212,9 +1241,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
 	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* ibv_create_qp() updates this value. */
@@ -1227,14 +1256,16 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = txq_alloc_elts(&tmpl, desc);
 	if (ret) {
+		rte_errno = -ret;
 		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	attr.mod = (struct ibv_qp_attr){
@@ -1242,15 +1273,17 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	attr.mod.qp_state = IBV_QPS_RTS;
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* Clean up txq in case we're reinitializing it. */
@@ -1260,12 +1293,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
 	/* Pre-register known mempools. */
 	rte_mempool_walk(txq_mp2mr_iter, txq);
-	assert(ret == 0);
 	return 0;
 error:
+	ret = rte_errno;
 	txq_cleanup(&tmpl);
-	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1283,7 +1317,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
  *   Thresholds parameters.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
@@ -1297,27 +1331,30 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->txqs_n) {
+		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->txqs_n);
 		priv_unlock(priv);
-		return -EOVERFLOW;
+		return -rte_errno;
 	}
 	if (txq != NULL) {
 		DEBUG("%p: reusing already allocated queue index %u (%p)",
 		      (void *)dev, idx, (void *)txq);
 		if (priv->started) {
+			rte_errno = EEXIST;
 			priv_unlock(priv);
-			return -EEXIST;
+			return -rte_errno;
 		}
 		(*priv->txqs)[idx] = NULL;
 		txq_cleanup(txq);
 	} else {
 		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
 		if (txq == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
 			priv_unlock(priv);
-			return -ENOMEM;
+			return -rte_errno;
 		}
 	}
 	ret = txq_setup(dev, txq, desc, socket, conf);
@@ -1332,7 +1369,7 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -1378,7 +1415,7 @@ mlx4_tx_queue_release(void *dpdk_txq)
  *   with rte_pktmbuf_alloc().
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
@@ -1387,11 +1424,10 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	struct rxq_elt (*elts)[elts_n] =
 		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
 				  rxq->socket);
-	int ret = 0;
 
 	if (elts == NULL) {
+		rte_errno = ENOMEM;
 		ERROR("%p: can't allocate packets array", (void *)rxq);
-		ret = ENOMEM;
 		goto error;
 	}
 	/* For each WR (packet). */
@@ -1408,9 +1444,9 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		} else
 			buf = rte_pktmbuf_alloc(rxq->mp);
 		if (buf == NULL) {
+			rte_errno = ENOMEM;
 			assert(pool == NULL);
 			ERROR("%p: empty mbuf pool", (void *)rxq);
-			ret = ENOMEM;
 			goto error;
 		}
 		/* Configure WR. Work request ID contains its own index in
@@ -1442,11 +1478,11 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		if ((WR_ID(wr->wr_id).id != i) ||
 		    ((void *)((uintptr_t)sge->addr -
 			WR_ID(wr->wr_id).offset) != buf)) {
+			rte_errno = EOVERFLOW;
 			ERROR("%p: cannot store index and offset in WR ID",
 			      (void *)rxq);
 			sge->addr = 0;
 			rte_pktmbuf_free(buf);
-			ret = EOVERFLOW;
 			goto error;
 		}
 	}
@@ -1457,7 +1493,6 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	rxq->elts_n = elts_n;
 	rxq->elts_head = 0;
 	rxq->elts = elts;
-	assert(ret == 0);
 	return 0;
 error:
 	if (elts != NULL) {
@@ -1476,8 +1511,8 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		rte_free(elts);
 	}
 	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(ret > 0);
-	return ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1543,7 +1578,7 @@ priv_mac_addr_del(struct priv *priv)
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_mac_addr_add(struct priv *priv)
@@ -1601,16 +1636,12 @@ priv_mac_addr_add(struct priv *priv)
 	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
-	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
 	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
+		      (void *)rxq, rte_errno, strerror(rte_errno));
+		return -rte_errno;
 	}
 	assert(priv->mac_flow == NULL);
 	priv->mac_flow = flow;
@@ -1791,11 +1822,12 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   Number of descriptors in QP (hint only).
  *
  * @return
- *   QP pointer or NULL in case of error.
+ *   QP pointer or NULL in case of error and rte_errno is set.
  */
 static struct ibv_qp *
 rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 {
+	struct ibv_qp *qp;
 	struct ibv_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
 		.send_cq = cq,
@@ -1812,7 +1844,10 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 		.qp_type = IBV_QPT_RAW_PACKET,
 	};
 
-	return ibv_create_qp(priv->pd, &attr);
+	qp = ibv_create_qp(priv->pd, &attr);
+	if (!qp)
+		rte_errno = errno ? errno : EINVAL;
+	return qp;
 }
 
 /**
@@ -1832,7 +1867,7 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
  *   Memory pool for buffer allocations.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
@@ -1848,13 +1883,14 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_qp_attr mod;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
-	int ret = 0;
+	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	mb_len = rte_pktmbuf_data_room_size(mp);
 	if (desc == 0) {
+		rte_errno = EINVAL;
 		ERROR("%p: invalid number of RX descriptors", (void *)dev);
-		return EINVAL;
+		goto error;
 	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
@@ -1876,26 +1912,26 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	/* Use the entire RX mempool as the memory region. */
 	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
 	if (tmpl.mr == NULL) {
-		ret = EINVAL;
+		rte_errno = EINVAL;
 		ERROR("%p: MR creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
 		tmpl.channel = ibv_create_comp_channel(priv->ctx);
 		if (tmpl.channel == NULL) {
-			ret = ENOMEM;
+			rte_errno = ENOMEM;
 			ERROR("%p: Rx interrupt completion channel creation"
 			      " failure: %s",
-			      (void *)dev, strerror(ret));
+			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
 	}
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
-		ret = ENOMEM;
+		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	DEBUG("priv->device_attr.max_qp_wr is %d",
@@ -1904,9 +1940,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_sge);
 	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
 	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	mod = (struct ibv_qp_attr){
@@ -1917,22 +1952,24 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = rxq_alloc_elts(&tmpl, desc, NULL);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
 		      (void *)dev,
 		      (void *)bad_wr,
-		      strerror(ret));
+		      strerror(rte_errno));
 		goto error;
 	}
 	mod = (struct ibv_qp_attr){
@@ -1940,8 +1977,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* Save port ID. */
@@ -1952,12 +1990,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	rxq_cleanup(rxq);
 	*rxq = tmpl;
 	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	assert(ret == 0);
 	return 0;
 error:
+	ret = rte_errno;
 	rxq_cleanup(&tmpl);
-	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1977,7 +2016,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
  *   Memory pool for buffer allocations.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
@@ -1992,17 +2031,19 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->rxqs_n) {
+		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->rxqs_n);
 		priv_unlock(priv);
-		return -EOVERFLOW;
+		return -rte_errno;
 	}
 	if (rxq != NULL) {
 		DEBUG("%p: reusing already allocated queue index %u (%p)",
 		      (void *)dev, idx, (void *)rxq);
 		if (priv->started) {
+			rte_errno = EEXIST;
 			priv_unlock(priv);
-			return -EEXIST;
+			return -rte_errno;
 		}
 		(*priv->rxqs)[idx] = NULL;
 		if (idx == 0)
@@ -2011,10 +2052,11 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
 		if (rxq == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
 			priv_unlock(priv);
-			return -ENOMEM;
+			return -rte_errno;
 		}
 	}
 	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
@@ -2029,7 +2071,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -2081,7 +2123,7 @@ priv_dev_link_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_start(struct rte_eth_dev *dev)
@@ -2130,7 +2172,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	priv_mac_addr_del(priv);
 	priv->started = 0;
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -2295,7 +2337,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
  *   Nonzero for link up, otherwise link down.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_link(struct priv *priv, int up)
@@ -2325,7 +2367,7 @@ priv_set_link(struct priv *priv, int up)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_set_link_down(struct rte_eth_dev *dev)
@@ -2346,7 +2388,7 @@ mlx4_set_link_down(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_set_link_up(struct rte_eth_dev *dev)
@@ -2504,6 +2546,9 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  * @param wait_to_complete
  *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
@@ -2518,12 +2563,14 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 
 	/* priv_lock() is not taken to allow concurrent calls. */
 
-	if (priv == NULL)
-		return -EINVAL;
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
 	(void)wait_to_complete;
 	if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
-		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
-		return -1;
+		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
+		return -rte_errno;
 	}
 	memset(&dev_link, 0, sizeof(dev_link));
 	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
@@ -2531,8 +2578,8 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	ifr.ifr_data = (void *)&edata;
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
-		     strerror(errno));
-		return -1;
+		     strerror(rte_errno));
+		return -rte_errno;
 	}
 	link_speed = ethtool_cmd_speed(&edata);
 	if (link_speed == -1)
@@ -2556,7 +2603,7 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
  *   New MTU.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
@@ -2567,9 +2614,9 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
@@ -2589,7 +2636,7 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
  *   Flow control output buffer.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
@@ -2604,10 +2651,10 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	ifr.ifr_data = (void *)&ethpause;
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
 		     " failed: %s",
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	}
 
@@ -2637,7 +2684,7 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
  *   Flow control parameters.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
@@ -2665,10 +2712,10 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
 		     " failed: %s",
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	}
 	ret = 0;
@@ -2701,7 +2748,7 @@ const struct rte_flow_ops mlx4_flow_ops = {
  *   Pointer to operation-specific structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
@@ -2709,12 +2756,10 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_op filter_op,
 		     void *arg)
 {
-	int ret = EINVAL;
-
 	switch (filter_type) {
 	case RTE_ETH_FILTER_GENERIC:
 		if (filter_op != RTE_ETH_FILTER_GET)
-			return -EINVAL;
+			break;
 		*(const void **)arg = &mlx4_flow_ops;
 		return 0;
 	default:
@@ -2722,7 +2767,8 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 		      (void *)dev, filter_type);
 		break;
 	}
-	return -ret;
+	rte_errno = ENOTSUP;
+	return -rte_errno;
 }
 
 static const struct eth_dev_ops mlx4_dev_ops = {
@@ -2757,7 +2803,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
  *   PCI bus address output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
@@ -2768,8 +2814,10 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
 	MKSTR(path, "%s/device/uevent", device->ibdev_path);
 
 	file = fopen(path, "rb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	while (fgets(line, sizeof(line), file) == line) {
 		size_t len = strlen(line);
 		int ret;
@@ -2807,15 +2855,16 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
  *   MAC address output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 {
 	struct ifreq request;
+	int ret = priv_ifreq(priv, SIOCGIFHWADDR, &request);
 
-	if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
-		return -1;
+	if (ret)
+		return ret;
 	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
 	return 0;
 }
@@ -2953,7 +3002,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
@@ -2967,11 +3016,9 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 					   mlx4_dev_interrupt_handler,
 					   dev);
 	if (ret < 0) {
-		ERROR("rte_intr_callback_unregister failed with %d"
-		      "%s%s%s", ret,
-		      (errno ? " (errno: " : ""),
-		      (errno ? strerror(errno) : ""),
-		      (errno ? ")" : ""));
+		rte_errno = ret;
+		ERROR("rte_intr_callback_unregister failed with %d %s",
+		      ret, strerror(rte_errno));
 	}
 	priv->intr_handle.fd = 0;
 	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -2986,7 +3033,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_interrupt_handler_install(struct priv *priv,
@@ -3006,10 +3053,11 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	flags = fcntl(priv->ctx->async_fd, F_GETFL);
 	rc = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (rc < 0) {
+		rte_errno = errno ? errno : EINVAL;
 		INFO("failed to change file descriptor async event queue");
 		dev->data->dev_conf.intr_conf.lsc = 0;
 		dev->data->dev_conf.intr_conf.rmv = 0;
-		return -errno;
+		return -rte_errno;
 	} else {
 		priv->intr_handle.fd = priv->ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
@@ -3017,9 +3065,10 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 						 mlx4_dev_interrupt_handler,
 						 dev);
 		if (rc) {
+			rte_errno = -rc;
 			ERROR("rte_intr_callback_register failed "
-			      " (errno: %s)", strerror(errno));
-			return rc;
+			      " (rte_errno: %s)", strerror(rte_errno));
+			return -rte_errno;
 		}
 	}
 	return 0;
@@ -3033,7 +3082,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
@@ -3054,7 +3103,7 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error,
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
@@ -3072,7 +3121,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
 		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
 					 dev)) {
 			ERROR("rte_eal_alarm_cancel failed "
-			      " (errno: %s)", strerror(rte_errno));
+			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
 		}
 	priv->pending_alarm = 0;
@@ -3087,7 +3136,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_link_interrupt_handler_install(struct priv *priv,
@@ -3112,7 +3161,7 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_removal_interrupt_handler_install(struct priv *priv,
@@ -3136,7 +3185,7 @@ priv_dev_removal_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_rx_intr_vec_enable(struct priv *priv)
@@ -3152,9 +3201,10 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	priv_rx_intr_vec_disable(priv);
 	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
 	if (intr_handle->intr_vec == NULL) {
+		rte_errno = ENOMEM;
 		ERROR("failed to allocate memory for interrupt vector,"
 		      " Rx interrupts will not be supported");
-		return -ENOMEM;
+		return -rte_errno;
 	}
 	intr_handle->type = RTE_INTR_HANDLE_EXT;
 	for (i = 0; i != n; ++i) {
@@ -3172,20 +3222,22 @@ priv_rx_intr_vec_enable(struct priv *priv)
 			continue;
 		}
 		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
+			rte_errno = E2BIG;
 			ERROR("too many Rx queues for interrupt vector size"
 			      " (%d), Rx interrupts cannot be enabled",
 			      RTE_MAX_RXTX_INTR_VEC_ID);
 			priv_rx_intr_vec_disable(priv);
-			return -1;
+			return -rte_errno;
 		}
 		fd = rxq->channel->fd;
 		flags = fcntl(fd, F_GETFL);
 		rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
 		if (rc < 0) {
+			rte_errno = errno;
 			ERROR("failed to make Rx interrupt file descriptor"
 			      " %d non-blocking for queue index %d", fd, i);
 			priv_rx_intr_vec_disable(priv);
-			return rc;
+			return -rte_errno;
 		}
 		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
 		intr_handle->efds[count] = fd;
@@ -3224,7 +3276,7 @@ priv_rx_intr_vec_disable(struct priv *priv)
  *   Rx queue index.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
@@ -3237,8 +3289,10 @@ mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
 		ret = EINVAL;
 	else
 		ret = ibv_req_notify_cq(rxq->cq, 0);
-	if (ret)
+	if (ret) {
+		rte_errno = ret;
 		WARN("unable to arm interrupt on rx queue %d", idx);
+	}
 	return -ret;
 }
 
@@ -3251,7 +3305,7 @@ mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
  *   Rx queue index.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
@@ -3269,11 +3323,13 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
 		if (ret || ev_cq != rxq->cq)
 			ret = EINVAL;
 	}
-	if (ret)
+	if (ret) {
+		rte_errno = ret;
 		WARN("unable to disable interrupt on rx queue %d",
 		     idx);
-	else
+	} else {
 		ibv_ack_cq_events(rxq->cq, 1);
+	}
 	return -ret;
 }
 
@@ -3288,7 +3344,7 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
  *   Shared configuration data.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
@@ -3298,18 +3354,21 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 	errno = 0;
 	tmp = strtoul(val, NULL, 0);
 	if (errno) {
+		rte_errno = errno;
 		WARN("%s: \"%s\" is not a valid integer", key, val);
-		return -errno;
+		return -rte_errno;
 	}
 	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
 		if (!(conf->ports.present & (1 << tmp))) {
+			rte_errno = EINVAL;
 			ERROR("invalid port index %lu", tmp);
-			return -EINVAL;
+			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
 	} else {
+		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
-		return -EINVAL;
+		return -rte_errno;
 	}
 	return 0;
 }
@@ -3321,7 +3380,7 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
  *   Device arguments structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
@@ -3335,8 +3394,9 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 		return 0;
 	kvlist = rte_kvargs_parse(devargs->args, pmd_mlx4_init_params);
 	if (kvlist == NULL) {
+		rte_errno = EINVAL;
 		ERROR("failed to parse kvargs");
-		return -EINVAL;
+		return -rte_errno;
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
@@ -3372,7 +3432,7 @@ static struct rte_pci_driver mlx4_driver;
  *   PCI device information.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
@@ -3393,10 +3453,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
-		assert(errno);
-		if (errno == ENOSYS)
+		rte_errno = errno;
+		assert(rte_errno);
+		if (rte_errno == ENOSYS)
 			ERROR("cannot list devices, is ib_uverbs loaded?");
-		return -errno;
+		return -rte_errno;
 	}
 	assert(i >= 0);
 	/*
@@ -3427,20 +3488,23 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		ibv_free_device_list(list);
 		switch (err) {
 		case 0:
+			rte_errno = ENODEV;
 			ERROR("cannot access device, is mlx4_ib loaded?");
-			return -ENODEV;
+			return -rte_errno;
 		case EINVAL:
+			rte_errno = EINVAL;
 			ERROR("cannot use device, are drivers up to date?");
-			return -EINVAL;
+			return -rte_errno;
 		}
 		assert(err > 0);
-		return -err;
+		rte_errno = err;
+		return -rte_errno;
 	}
 	ibv_dev = list[i];
 
 	DEBUG("device opened");
 	if (ibv_query_device(attr_ctx, &device_attr)) {
-		err = ENODEV;
+		rte_errno = ENODEV;
 		goto error;
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
@@ -3449,7 +3513,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		conf.ports.present |= 1 << i;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
 		ERROR("failed to process device arguments");
-		err = EINVAL;
+		rte_errno = EINVAL;
 		goto error;
 	}
 	/* Use all ports when none are defined */
@@ -3472,22 +3536,22 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
-			err = ENODEV;
+			rte_errno = ENODEV;
 			goto port_error;
 		}
 
 		/* Check port status. */
 		err = ibv_query_port(ctx, port, &port_attr);
 		if (err) {
-			ERROR("port query failed: %s", strerror(err));
-			err = ENODEV;
+			rte_errno = err;
+			ERROR("port query failed: %s", strerror(rte_errno));
 			goto port_error;
 		}
 
 		if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
+			rte_errno = ENOTSUP;
 			ERROR("port %d is not configured in Ethernet mode",
 			      port);
-			err = EINVAL;
 			goto port_error;
 		}
 
@@ -3499,8 +3563,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("PD allocation failure");
-			err = ENOMEM;
 			goto port_error;
 		}
 
@@ -3509,8 +3573,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 				   sizeof(*priv),
 				   RTE_CACHE_LINE_SIZE);
 		if (priv == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("priv allocation failure");
-			err = ENOMEM;
 			goto port_error;
 		}
 
@@ -3524,8 +3588,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx4_en loaded?"
-			      " (errno: %s)", strerror(errno));
-			err = ENODEV;
+			      " (rte_errno: %s)", strerror(rte_errno));
 			goto port_error;
 		}
 		INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
@@ -3562,7 +3625,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 		if (eth_dev == NULL) {
 			ERROR("can not allocate rte ethdev");
-			err = ENOMEM;
+			rte_errno = ENOMEM;
 			goto port_error;
 		}
 
@@ -3620,8 +3683,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		claim_zero(ibv_close_device(attr_ctx));
 	if (list)
 		ibv_free_device_list(list);
-	assert(err >= 0);
-	return -err;
+	assert(rte_errno >= 0);
+	return -rte_errno;
 }
 
 static const struct rte_pci_id mlx4_pci_id_map[] = {
-- 
2.1.4
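
For readers skimming the patch above, the convention it standardizes on
(return a negative errno value and leave the positive value in rte_errno)
can be sketched as follows. This is only an illustration under stated
assumptions: sketch_errno is a hypothetical stand-in for rte_errno so the
snippet builds without DPDK, and sketch_queue_check(), sketch_cleanup() and
sketch_setup() are hypothetical helpers, not functions from the PMD.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for DPDK's rte_errno (assumption, not the real one). */
static int sketch_errno;

/* Hypothetical check mirroring the queue index test in the patch. */
static int
sketch_queue_check(unsigned int idx, unsigned int queues_n)
{
	if (idx >= queues_n) {
		/* Record the positive errno value first... */
		sketch_errno = EOVERFLOW;
		/* ...then return its negation to the caller. */
		return -sketch_errno;
	}
	return 0;
}

/* Hypothetical cleanup routine that may clobber the error variable. */
static void
sketch_cleanup(void)
{
	sketch_errno = 0;
}

/* Hypothetical setup routine; a nonzero fail_with simulates a failure. */
static int
sketch_setup(int fail_with)
{
	int ret;

	if (fail_with) {
		sketch_errno = fail_with;
		goto error;
	}
	return 0;
error:
	/* Save the error code across cleanup, then restore it. */
	ret = sketch_errno;
	sketch_cleanup();
	sketch_errno = ret;
	assert(sketch_errno > 0);
	return -sketch_errno;
}
```

Saving and restoring the value around the cleanup call follows the error
path of rxq_setup() in the patch, where cleanup could otherwise overwrite
the original error code.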


* [PATCH v1 28/48] net/mlx4: clean up coding style inconsistencies
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (26 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 27/48] net/mlx4: standardize on negative errno values Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 29/48] net/mlx4: remove control path locks Adrien Mazarguil
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Fix badly formatted comments and remove needless empty lines ahead of
refactoring functions into separate files.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
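
As an illustration of the multi-line comment layout the hunks in this patch
converge on (opening and closing markers on their own lines), here is a
minimal sketch derived from the txq_alloc_elts() hunk. The helper
sketch_comp_cd_init() is hypothetical, and the value 64 given to
MLX4_PMD_TX_PER_COMP_REQ is an assumption made only so the snippet builds
on its own.

```c
#include <assert.h>

/* Assumed value for illustration; the real definition is in the PMD headers. */
#define MLX4_PMD_TX_PER_COMP_REQ 64

/* Hypothetical helper isolating the computation done in txq_alloc_elts(). */
static unsigned int
sketch_comp_cd_init(unsigned int elts_n)
{
	/*
	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
	 * at least 4 times per ring.
	 */
	return ((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
		MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
}
```

With the assumed value, a ring of 512 elements requests a completion every
64 packets, while a ring of 128 elements falls back to every 32 packets.
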
 drivers/net/mlx4/mlx4.c      | 89 ++++++++++++++++-----------------------
 drivers/net/mlx4/mlx4_flow.c |  1 -
 2 files changed, 36 insertions(+), 54 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8cfeab2..8f3377c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -619,8 +619,10 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	txq->elts_head = 0;
 	txq->elts_tail = 0;
 	txq->elts_comp = 0;
-	/* Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring. */
+	/*
+	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+	 * at least 4 times per ring.
+	 */
 	txq->elts_comp_cd_init =
 		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
 		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
@@ -629,7 +631,6 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	return 0;
 error:
 	rte_free(elts);
-
 	DEBUG("%p: failed, freed everything", (void *)txq);
 	assert(ret > 0);
 	rte_errno = ret;
@@ -675,7 +676,6 @@ txq_free_elts(struct txq *txq)
 	rte_free(elts);
 }
 
-
 /**
  * Clean up a TX queue.
  *
@@ -766,7 +766,6 @@ static void mlx4_check_mempool_cb(struct rte_mempool *mp,
 
 	(void)mp;
 	(void)mem_idx;
-
 	/* It already failed, skip the next chunks. */
 	if (data->ret != 0)
 		return;
@@ -810,7 +809,6 @@ static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
 	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
 	*start = (uintptr_t)data.start;
 	*end = (uintptr_t)data.end;
-
 	return data.ret;
 }
 
@@ -844,7 +842,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 			(void *)mp);
 		return NULL;
 	}
-
 	DEBUG("mempool %p area start=%p end=%p size=%zu",
 	      (void *)mp, (void *)start, (void *)end,
 	      (size_t)(end - start));
@@ -971,8 +968,10 @@ txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
 	struct txq_mp2mr_mbuf_check_data *data = arg;
 	struct rte_mbuf *buf = obj;
 
-	/* Check whether mbuf structure fits element size and whether mempool
-	 * pointer is valid. */
+	/*
+	 * Check whether mbuf structure fits element size and whether mempool
+	 * pointer is valid.
+	 */
 	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
 		data->ret = -1;
 }
@@ -1235,8 +1234,10 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		/* Do *NOT* enable this, completions events are managed per
-		 * TX burst. */
+		/*
+		 * Do *NOT* enable this, completions events are managed per
+		 * TX burst.
+		 */
 		.sq_sig_all = 0,
 	};
 	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
@@ -1449,9 +1450,11 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 			ERROR("%p: empty mbuf pool", (void *)rxq);
 			goto error;
 		}
-		/* Configure WR. Work request ID contains its own index in
+		/*
+		 * Configure WR. Work request ID contains its own index in
 		 * the elts array and the offset between SGE buffer header and
-		 * its data. */
+		 * its data.
+		 */
 		WR_ID(wr->wr_id).id = i;
 		WR_ID(wr->wr_id).offset =
 			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
@@ -1473,8 +1476,10 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		sge->lkey = rxq->mr->lkey;
 		/* Redundant check for tailroom. */
 		assert(sge->length == rte_pktmbuf_tailroom(buf));
-		/* Make sure elts index and SGE mbuf pointer can be deduced
-		 * from WR ID. */
+		/*
+		 * Make sure elts index and SGE mbuf pointer can be deduced
+		 * from WR ID.
+		 */
 		if ((WR_ID(wr->wr_id).id != i) ||
 		    ((void *)((uintptr_t)sge->addr -
 			WR_ID(wr->wr_id).offset) != buf)) {
@@ -1762,7 +1767,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
 			goto repost;
 		}
-
 		/* Reconfigure sge to use rep instead of seg. */
 		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
 		assert(elt->sge.lkey == rxq->mr->lkey);
@@ -1770,7 +1774,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
 			 (uintptr_t)rep);
 		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
-
 		/* Update seg information. */
 		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
 		NB_SEGS(seg) = 1;
@@ -1780,7 +1783,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		DATA_LEN(seg) = len;
 		seg->packet_type = 0;
 		seg->ol_flags = 0;
-
 		/* Return packet. */
 		*(pkts++) = seg;
 		++pkts_ret;
@@ -2282,9 +2284,11 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
 	priv_mac_addr_del(priv);
-	/* Prevent crashes when queues are still in use. This is unfortunately
+	/*
+	 * Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
-	 * never release them before closing the device. */
+	 * never release them before closing the device.
+	 */
 	dev->rx_pkt_burst = removed_rx_burst;
 	dev->tx_pkt_burst = removed_tx_burst;
 	if (priv->rxqs != NULL) {
@@ -2401,6 +2405,7 @@ mlx4_set_link_up(struct rte_eth_dev *dev)
 	priv_unlock(priv);
 	return err;
 }
+
 /**
  * DPDK callback to get information about the device.
  *
@@ -2417,7 +2422,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	char ifname[IF_NAMESIZE];
 
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-
 	if (priv == NULL)
 		return;
 	priv_lock(priv);
@@ -2562,7 +2566,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	int link_speed = 0;
 
 	/* priv_lock() is not taken to allow concurrent calls. */
-
 	if (priv == NULL) {
 		rte_errno = EINVAL;
 		return -rte_errno;
@@ -2657,7 +2660,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		     strerror(rte_errno));
 		goto out;
 	}
-
 	fc_conf->autoneg = ethpause.autoneg;
 	if (ethpause.rx_pause && ethpause.tx_pause)
 		fc_conf->mode = RTE_FC_FULL;
@@ -2668,7 +2670,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	else
 		fc_conf->mode = RTE_FC_NONE;
 	ret = 0;
-
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -2703,13 +2704,11 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		ethpause.rx_pause = 1;
 	else
 		ethpause.rx_pause = 0;
-
 	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
 	    (fc_conf->mode & RTE_FC_TX_PAUSE))
 		ethpause.tx_pause = 1;
 	else
 		ethpause.tx_pause = 0;
-
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
@@ -2719,7 +2718,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		goto out;
 	}
 	ret = 0;
-
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -2953,8 +2951,8 @@ mlx4_dev_link_status_handler(void *arg)
 	ret = priv_dev_status_handler(priv, dev, &events);
 	priv_unlock(priv);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL,
-					      NULL);
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
+					      NULL, NULL);
 }
 
 /**
@@ -3001,6 +2999,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3032,6 +3031,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3042,8 +3042,9 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	int flags;
 	int rc;
 
-	/* Check whether the interrupt handler has already been installed
-	 * for either type of interrupt
+	/*
+	 * Check whether the interrupt handler has already been installed
+	 * for either type of interrupt.
 	 */
 	if (priv->intr_conf.lsc &&
 	    priv->intr_conf.rmv &&
@@ -3081,6 +3082,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3102,6 +3104,7 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3135,6 +3138,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3160,6 +3164,7 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3450,7 +3455,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	(void)pci_drv;
 	assert(pci_drv == &mlx4_driver);
-
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
 		rte_errno = errno;
@@ -3501,14 +3505,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return -rte_errno;
 	}
 	ibv_dev = list[i];
-
 	DEBUG("device opened");
 	if (ibv_query_device(attr_ctx, &device_attr)) {
 		rte_errno = ENODEV;
 		goto error;
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
-
 	for (i = 0; i < device_attr.phys_port_cnt; ++i)
 		conf.ports.present |= 1 << i;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
@@ -3531,15 +3533,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-
 		DEBUG("using port %u", port);
-
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
 			rte_errno = ENODEV;
 			goto port_error;
 		}
-
 		/* Check port status. */
 		err = ibv_query_port(ctx, port, &port_attr);
 		if (err) {
@@ -3547,19 +3546,16 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("port query failed: %s", strerror(rte_errno));
 			goto port_error;
 		}
-
 		if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
 			rte_errno = ENOTSUP;
 			ERROR("port %d is not configured in Ethernet mode",
 			      port);
 			goto port_error;
 		}
-
 		if (port_attr.state != IBV_PORT_ACTIVE)
 			DEBUG("port %d is not active: \"%s\" (%d)",
 			      port, ibv_port_state_str(port_attr.state),
 			      port_attr.state);
-
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
@@ -3567,7 +3563,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("PD allocation failure");
 			goto port_error;
 		}
-
 		/* from rte_ethdev.c */
 		priv = rte_zmalloc("ethdev private structure",
 				   sizeof(*priv),
@@ -3577,13 +3572,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("priv allocation failure");
 			goto port_error;
 		}
-
 		priv->ctx = ctx;
 		priv->device_attr = device_attr;
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
@@ -3614,7 +3607,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Get actual MTU if possible. */
 		priv_get_mtu(priv, &priv->mtu);
 		DEBUG("port %u MTU is %u", priv->port, priv->mtu);
-
 		/* from rte_ethdev.c */
 		{
 			char name[RTE_ETH_NAME_MAX_LEN];
@@ -3628,15 +3620,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			rte_errno = ENOMEM;
 			goto port_error;
 		}
-
 		eth_dev->data->dev_private = priv;
 		eth_dev->data->mac_addrs = &priv->mac;
 		eth_dev->device = &pci_dev->device;
-
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
-
 		eth_dev->device->driver = &mlx4_driver.driver;
-
 		/*
 		 * Copy and override interrupt handle to prevent it from
 		 * being shared between all ethdev instances of a given PCI
@@ -3645,11 +3633,9 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		 */
 		priv->intr_handle_dev = *eth_dev->intr_handle;
 		eth_dev->intr_handle = &priv->intr_handle_dev;
-
 		priv->dev = eth_dev;
 		eth_dev->dev_ops = &mlx4_dev_ops;
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
-
 		/* Bring Ethernet device up. */
 		DEBUG("forcing Ethernet interface up");
 		priv_set_flags(priv, ~IFF_UP, IFF_UP);
@@ -3657,7 +3643,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 			mlx4_link_update(eth_dev, 0);
 		continue;
-
 port_error:
 		rte_free(priv);
 		if (pd)
@@ -3670,14 +3655,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	if (i == device_attr.phys_port_cnt)
 		return 0;
-
 	/*
 	 * XXX if something went wrong in the loop above, there is a resource
 	 * leak (ctx, pd, priv, dpdk ethdev) but we can do nothing about it as
 	 * long as the dpdk does not provide a way to deallocate a ethdev and a
 	 * way to enumerate the registered ethdevs to free the previous ones.
 	 */
-
 error:
 	if (attr_ctx)
 		claim_zero(ibv_close_device(attr_ctx));
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 58d4698..7dcb059 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -835,7 +835,6 @@ priv_flow_create_action_queue(struct priv *priv,
 		goto error;
 	}
 	return rte_flow;
-
 error:
 	rte_free(rte_flow);
 	return NULL;
-- 
2.1.4


* [PATCH v1 29/48] net/mlx4: remove control path locks
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (27 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 28/48] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 30/48] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
                   ` (20 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Concurrent use of various control path functions (e.g. configuring a queue
and destroying it simultaneously) may lead to undefined behavior.

PMDs are not supposed to protect themselves from misbehaving applications,
and mlx4 is one of the few with internal locks on most control path
operations. This adds unnecessary complexity.

Leave this role to wrapper functions in ethdev.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
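Note for readers: with the internal lock gone, callers that issue control
operations from several threads have to serialize them on their own. The
sketch below shows one possible way to do so with a per-port mutex owned by
the caller. It is only an illustration: struct port_ctl, ctl_op() and
app_ctl_op() are hypothetical names that exist neither in this PMD nor in
ethdev.

```c
#include <pthread.h>

/* Hypothetical per-port state kept by the caller, not by the PMD. */
struct port_ctl {
	pthread_mutex_t lock;    /* serializes control path calls */
	unsigned int configured; /* number of control operations applied */
};

static struct port_ctl port0 = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.configured = 0,
};

/* Unprotected operation, standing in for a PMD callback without locks. */
static int
ctl_op(struct port_ctl *port)
{
	return ++port->configured;
}

/* Caller-side wrapper: only one control operation runs at a time. */
static int
app_ctl_op(struct port_ctl *port)
{
	int ret;

	pthread_mutex_lock(&port->lock);
	ret = ctl_op(port);
	pthread_mutex_unlock(&port->lock);
	return ret;
}
```

Every control path entry point used by the caller would go through the same
lock; data path functions are not concerned.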
 drivers/net/mlx4/mlx4.c      | 90 +++------------------------------------
 drivers/net/mlx4/mlx4.h      |  4 --
 drivers/net/mlx4/mlx4_flow.c | 15 +------
 3 files changed, 6 insertions(+), 103 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8f3377c..71ee016 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -59,7 +59,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_spinlock.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
@@ -121,29 +120,6 @@ priv_rx_intr_vec_enable(struct priv *priv);
 static void
 priv_rx_intr_vec_disable(struct priv *priv);
 
-/**
- * Lock private structure to protect it from concurrent access in the
- * control path.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void priv_lock(struct priv *priv)
-{
-	rte_spinlock_lock(&priv->lock);
-}
-
-/**
- * Unlock private structure.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void priv_unlock(struct priv *priv)
-{
-	rte_spinlock_unlock(&priv->lock);
-}
-
 /* Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
@@ -570,13 +546,7 @@ dev_configure(struct rte_eth_dev *dev)
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
-	struct priv *priv = dev->data->dev_private;
-	int ret;
-
-	priv_lock(priv);
-	ret = dev_configure(dev);
-	priv_unlock(priv);
-	return ret;
+	return dev_configure(dev);
 }
 
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
@@ -1328,14 +1298,12 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct txq *txq = (*priv->txqs)[idx];
 	int ret;
 
-	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->txqs_n) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->txqs_n);
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	if (txq != NULL) {
@@ -1343,7 +1311,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, idx, (void *)txq);
 		if (priv->started) {
 			rte_errno = EEXIST;
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 		(*priv->txqs)[idx] = NULL;
@@ -1354,7 +1321,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 	}
@@ -1369,7 +1335,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		/* Update send callback. */
 		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -1389,7 +1354,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 	if (txq == NULL)
 		return;
 	priv = txq->priv;
-	priv_lock(priv);
 	for (i = 0; (i != priv->txqs_n); ++i)
 		if ((*priv->txqs)[i] == txq) {
 			DEBUG("%p: removing TX queue %p from list",
@@ -1399,7 +1363,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 		}
 	txq_cleanup(txq);
 	rte_free(txq);
-	priv_unlock(priv);
 }
 
 /* RX queues handling. */
@@ -2029,14 +1992,12 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct rxq *rxq = (*priv->rxqs)[idx];
 	int ret;
 
-	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->rxqs_n) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->rxqs_n);
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	if (rxq != NULL) {
@@ -2044,7 +2005,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, idx, (void *)rxq);
 		if (priv->started) {
 			rte_errno = EEXIST;
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 		(*priv->rxqs)[idx] = NULL;
@@ -2057,7 +2017,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 	}
@@ -2072,7 +2031,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		/* Update receive callback. */
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -2092,7 +2050,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	if (rxq == NULL)
 		return;
 	priv = rxq->priv;
-	priv_lock(priv);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -2104,7 +2061,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		}
 	rxq_cleanup(rxq);
 	rte_free(rxq);
-	priv_unlock(priv);
 }
 
 static int
@@ -2133,11 +2089,8 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	priv_lock(priv);
-	if (priv->started) {
-		priv_unlock(priv);
+	if (priv->started)
 		return 0;
-	}
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
 	ret = priv_mac_addr_add(priv);
@@ -2167,13 +2120,11 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		      (void *)dev, strerror(ret));
 		goto err;
 	}
-	priv_unlock(priv);
 	return 0;
 err:
 	/* Rollback. */
 	priv_mac_addr_del(priv);
 	priv->started = 0;
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -2190,16 +2141,12 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 
-	priv_lock(priv);
-	if (!priv->started) {
-		priv_unlock(priv);
+	if (!priv->started)
 		return;
-	}
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
 	priv_mac_addr_del(priv);
-	priv_unlock(priv);
 }
 
 /**
@@ -2279,7 +2226,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
@@ -2328,7 +2274,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	priv_dev_removal_interrupt_handler_uninstall(priv, dev);
 	priv_dev_link_interrupt_handler_uninstall(priv, dev);
 	priv_rx_intr_vec_disable(priv);
-	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2377,12 +2322,8 @@ static int
 mlx4_set_link_down(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	int err;
 
-	priv_lock(priv);
-	err = priv_set_link(priv, 0);
-	priv_unlock(priv);
-	return err;
+	return priv_set_link(priv, 0);
 }
 
 /**
@@ -2398,12 +2339,8 @@ static int
 mlx4_set_link_up(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	int err;
 
-	priv_lock(priv);
-	err = priv_set_link(priv, 1);
-	priv_unlock(priv);
-	return err;
+	return priv_set_link(priv, 1);
 }
 
 /**
@@ -2424,7 +2361,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
@@ -2451,7 +2387,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 			ETH_LINK_SPEED_20G |
 			ETH_LINK_SPEED_40G |
 			ETH_LINK_SPEED_56G;
-	priv_unlock(priv);
 }
 
 /**
@@ -2472,7 +2407,6 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	/* Add software counters. */
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
@@ -2507,7 +2441,6 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 		tmp.oerrors += txq->stats.odropped;
 	}
 	*stats = tmp;
-	priv_unlock(priv);
 }
 
 /**
@@ -2525,7 +2458,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		if ((*priv->rxqs)[i] == NULL)
 			continue;
@@ -2540,7 +2472,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 		(*priv->txqs)[i]->stats =
 			(struct mlx4_txq_stats){ .idx = idx };
 	}
-	priv_unlock(priv);
 }
 
 /**
@@ -2565,7 +2496,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	struct rte_eth_link dev_link;
 	int link_speed = 0;
 
-	/* priv_lock() is not taken to allow concurrent calls. */
 	if (priv == NULL) {
 		rte_errno = EINVAL;
 		return -rte_errno;
@@ -2614,7 +2544,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	struct priv *priv = dev->data->dev_private;
 	int ret = 0;
 
-	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
 		ret = rte_errno;
@@ -2625,7 +2554,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2652,7 +2580,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	int ret;
 
 	ifr.ifr_data = (void *)&ethpause;
-	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
@@ -2671,7 +2598,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		fc_conf->mode = RTE_FC_NONE;
 	ret = 0;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2709,7 +2635,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		ethpause.tx_pause = 1;
 	else
 		ethpause.tx_pause = 0;
-	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
@@ -2719,7 +2644,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	}
 	ret = 0;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2945,11 +2869,9 @@ mlx4_dev_link_status_handler(void *arg)
 	uint32_t events;
 	int ret;
 
-	priv_lock(priv);
 	assert(priv->pending_alarm == 1);
 	priv->pending_alarm = 0;
 	ret = priv_dev_status_handler(priv, dev, &events);
-	priv_unlock(priv);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
 					      NULL, NULL);
@@ -2972,9 +2894,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
 	uint32_t ev;
 	int i;
 
-	priv_lock(priv);
 	ret = priv_dev_status_handler(priv, dev, &ev);
-	priv_unlock(priv);
 	if (ret > 0) {
 		for (i = RTE_ETH_EVENT_UNKNOWN;
 		     i < RTE_ETH_EVENT_MAX;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 635036e..3580e05 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -223,10 +223,6 @@ struct priv {
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
-void priv_lock(struct priv *priv);
-void priv_unlock(struct priv *priv);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 7dcb059..07305f1 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -703,13 +703,9 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
-	int ret;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	priv_lock(priv);
-	ret = priv_flow_validate(priv, attr, items, actions, error, &flow);
-	priv_unlock(priv);
-	return ret;
+	return priv_flow_validate(priv, attr, items, actions, error, &flow);
 }
 
 /**
@@ -936,13 +932,11 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *flow;
 
-	priv_lock(priv);
 	flow = priv_flow_create(priv, attr, items, actions, error);
 	if (flow) {
 		LIST_INSERT_HEAD(&priv->flows, flow, next);
 		DEBUG("Flow created %p", (void *)flow);
 	}
-	priv_unlock(priv);
 	return flow;
 }
 
@@ -969,17 +963,14 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 {
 	struct priv *priv = dev->data->dev_private;
 
-	priv_lock(priv);
 	if (priv->rxqs) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "isolated mode must be set"
 				   " before configuring the device");
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	priv->isolated = !!enable;
-	priv_unlock(priv);
 	return 0;
 }
 
@@ -1017,9 +1008,7 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 
 	(void)error;
-	priv_lock(priv);
 	priv_flow_destroy(priv, flow);
-	priv_unlock(priv);
 	return 0;
 }
 
@@ -1053,9 +1042,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 
 	(void)error;
-	priv_lock(priv);
 	priv_flow_flush(priv);
-	priv_unlock(priv);
 	return 0;
 }
 
-- 
2.1.4


* [PATCH v1 30/48] net/mlx4: remove unnecessary wrapper functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (28 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 29/48] net/mlx4: remove control path locks Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 31/48] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Wrapper functions whose main purpose was to take a lock on the private
structure are no longer needed since this lock does not exist anymore.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
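Note for readers: once the wrappers are folded in, flushing amounts to
destroying the first list entry until the list is empty, with the destroy
function itself unlinking the entry. The sketch below reproduces that
pattern with <sys/queue.h> only. struct flow, flow_add(), flow_destroy() and
flow_flush() are hypothetical stand-ins, not the functions from
mlx4_flow.c.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_flow; only the linkage matters. */
struct flow {
	LIST_ENTRY(flow) next;
};

LIST_HEAD(flow_list, flow);

/* Unlink and free one flow; it must currently be on a list. */
static int
flow_destroy(struct flow *flow)
{
	LIST_REMOVE(flow, next);
	free(flow);
	return 0;
}

/* Destroy flows from the head until the list is empty; return the count. */
static unsigned int
flow_flush(struct flow_list *flows)
{
	unsigned int n = 0;

	while (!LIST_EMPTY(flows)) {
		flow_destroy(LIST_FIRST(flows));
		++n;
	}
	return n;
}

/* Helper for the example: insert a new flow at the head; 0 on success. */
static int
flow_add(struct flow_list *flows)
{
	struct flow *flow = calloc(1, sizeof(*flow));

	if (flow == NULL)
		return -1;
	LIST_INSERT_HEAD(flows, flow, next);
	return 0;
}
```

The loop terminates because each call removes the entry it was given, so
LIST_FIRST() returns a different element on every iteration.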
 drivers/net/mlx4/mlx4.c      |  61 ++++------------------
 drivers/net/mlx4/mlx4_flow.c | 106 +++++++++-----------------------------
 2 files changed, 32 insertions(+), 135 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 71ee016..d831729 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -428,10 +428,10 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
 }
 
 /**
- * Set device MTU.
+ * DPDK callback to change the MTU.
  *
  * @param priv
- *   Pointer to private structure.
+ *   Pointer to Ethernet device structure.
  * @param mtu
  *   MTU value to set.
  *
@@ -439,8 +439,9 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_set_mtu(struct priv *priv, uint16_t mtu)
+mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 {
+	struct priv *priv = dev->data->dev_private;
 	uint16_t new_mtu;
 	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
 
@@ -449,8 +450,10 @@ priv_set_mtu(struct priv *priv, uint16_t mtu)
 	ret = priv_get_mtu(priv, &new_mtu);
 	if (ret)
 		return ret;
-	if (new_mtu == mtu)
+	if (new_mtu == mtu) {
+		priv->mtu = mtu;
 		return 0;
+	}
 	rte_errno = EINVAL;
 	return -rte_errno;
 }
@@ -502,7 +505,7 @@ static void
 priv_mac_addr_del(struct priv *priv);
 
 /**
- * Ethernet device configuration.
+ * DPDK callback for Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
  *
@@ -513,7 +516,7 @@ priv_mac_addr_del(struct priv *priv);
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-dev_configure(struct rte_eth_dev *dev)
+mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
@@ -534,21 +537,6 @@ dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-/**
- * DPDK callback for Ethernet device configuration.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_configure(struct rte_eth_dev *dev)
-{
-	return dev_configure(dev);
-}
-
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
 static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
 
@@ -2528,37 +2516,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 }
 
 /**
- * DPDK callback to change the MTU.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param in_mtu
- *   New MTU.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
-{
-	struct priv *priv = dev->data->dev_private;
-	int ret = 0;
-
-	/* Set kernel interface MTU first. */
-	if (priv_set_mtu(priv, mtu)) {
-		ret = rte_errno;
-		WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
-		     strerror(rte_errno));
-		goto out;
-	} else
-		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
-	priv->mtu = mtu;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
-/**
  * DPDK callback to get flow control status.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 07305f1..3463713 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -837,29 +837,19 @@ priv_flow_create_action_queue(struct priv *priv,
 }
 
 /**
- * Convert a flow.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] attr
- *   Flow rule attributes.
- * @param[in] items
- *   Pattern specification (list terminated by the END pattern item).
- * @param[in] actions
- *   Associated actions (list terminated by the END action).
- * @param[out] error
- *   Perform verbose error reporting if not NULL.
+ * Create a flow.
  *
- * @return
- *   A flow on success, NULL otherwise.
+ * @see rte_flow_create()
+ * @see rte_flow_ops
  */
-static struct rte_flow *
-priv_flow_create(struct priv *priv,
+struct rte_flow *
+mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item items[],
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *rte_flow;
 	struct mlx4_flow_action action;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
@@ -909,38 +899,17 @@ priv_flow_create(struct priv *priv,
 	}
 	rte_flow = priv_flow_create_action_queue(priv, flow.ibv_attr,
 						 &action, error);
-	if (rte_flow)
+	if (rte_flow) {
+		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
+		DEBUG("Flow created %p", (void *)rte_flow);
 		return rte_flow;
+	}
 exit:
 	rte_free(flow.ibv_attr);
 	return NULL;
 }
 
 /**
- * Create a flow.
- *
- * @see rte_flow_create()
- * @see rte_flow_ops
- */
-struct rte_flow *
-mlx4_flow_create(struct rte_eth_dev *dev,
-		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
-		 const struct rte_flow_action actions[],
-		 struct rte_flow_error *error)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rte_flow *flow;
-
-	flow = priv_flow_create(priv, attr, items, actions, error);
-	if (flow) {
-		LIST_INSERT_HEAD(&priv->flows, flow, next);
-		DEBUG("Flow created %p", (void *)flow);
-	}
-	return flow;
-}
-
-/**
  * @see rte_flow_isolate()
  *
  * Must be done before calling dev_configure().
@@ -977,26 +946,6 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 /**
  * Destroy a flow.
  *
- * @param priv
- *   Pointer to private structure.
- * @param[in] flow
- *   Flow to destroy.
- */
-static void
-priv_flow_destroy(struct priv *priv, struct rte_flow *flow)
-{
-	(void)priv;
-	LIST_REMOVE(flow, next);
-	if (flow->ibv_flow)
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-	rte_free(flow->ibv_attr);
-	DEBUG("Flow destroyed %p", (void *)flow);
-	rte_free(flow);
-}
-
-/**
- * Destroy a flow.
- *
  * @see rte_flow_destroy()
  * @see rte_flow_ops
  */
@@ -1005,33 +954,20 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
 {
-	struct priv *priv = dev->data->dev_private;
-
+	(void)dev;
 	(void)error;
-	priv_flow_destroy(priv, flow);
+	LIST_REMOVE(flow, next);
+	if (flow->ibv_flow)
+		claim_zero(ibv_destroy_flow(flow->ibv_flow));
+	rte_free(flow->ibv_attr);
+	DEBUG("Flow destroyed %p", (void *)flow);
+	rte_free(flow);
 	return 0;
 }
 
 /**
  * Destroy all flows.
  *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_flow_flush(struct priv *priv)
-{
-	while (!LIST_EMPTY(&priv->flows)) {
-		struct rte_flow *flow;
-
-		flow = LIST_FIRST(&priv->flows);
-		priv_flow_destroy(priv, flow);
-	}
-}
-
-/**
- * Destroy all flows.
- *
  * @see rte_flow_flush()
  * @see rte_flow_ops
  */
@@ -1041,8 +977,12 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 {
 	struct priv *priv = dev->data->dev_private;
 
-	(void)error;
-	priv_flow_flush(priv);
+	while (!LIST_EMPTY(&priv->flows)) {
+		struct rte_flow *flow;
+
+		flow = LIST_FIRST(&priv->flows);
+		mlx4_flow_destroy(dev, flow, error);
+	}
 	return 0;
 }
 
-- 
2.1.4


* [PATCH v1 31/48] net/mlx4: remove mbuf macro definitions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (29 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 30/48] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 32/48] net/mlx4: use standard macro to get array size Adrien Mazarguil
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

These were originally used for compatibility between DPDK releases when
this PMD was built out of tree.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d831729..0f1169c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -74,15 +74,6 @@
 #include "mlx4.h"
 #include "mlx4_flow.h"
 
-/* Convenience macros for accessing mbuf fields. */
-#define NEXT(m) ((m)->next)
-#define DATA_LEN(m) ((m)->data_len)
-#define PKT_LEN(m) ((m)->pkt_len)
-#define DATA_OFF(m) ((m)->data_off)
-#define SET_DATA_OFF(m, o) ((m)->data_off = (o))
-#define NB_SEGS(m) ((m)->nb_segs)
-#define PORT(m) ((m)->port)
-
 /* Work Request ID data type (64 bit). */
 typedef union {
 	struct {
@@ -1006,7 +997,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
 		struct ibv_send_wr *wr = &elt->wr;
-		unsigned int segs = NB_SEGS(buf);
+		unsigned int segs = buf->nb_segs;
 		unsigned int sent_size = 0;
 		uint32_t send_flags = 0;
 
@@ -1020,7 +1011,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif
 			/* Faster than rte_pktmbuf_free(). */
 			do {
-				struct rte_mbuf *next = NEXT(tmp);
+				struct rte_mbuf *next = tmp->next;
 
 				rte_pktmbuf_free_seg(tmp);
 				tmp = next;
@@ -1040,7 +1031,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 			/* Retrieve buffer information. */
 			addr = rte_pktmbuf_mtod(buf, uintptr_t);
-			length = DATA_LEN(buf);
+			length = buf->data_len;
 			/* Retrieve Memory Region key for this memory pool. */
 			lkey = txq_mp2mr(txq, txq_mb2mp(buf));
 			if (unlikely(lkey == (uint32_t)-1)) {
@@ -1414,7 +1405,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		wr->sg_list = sge;
 		wr->num_sge = 1;
 		/* Headroom is reserved by rte_pktmbuf_alloc(). */
-		assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
 		/* Buffer is supposed to be empty. */
 		assert(rte_pktmbuf_data_len(buf) == 0);
 		assert(rte_pktmbuf_pkt_len(buf) == 0);
@@ -1726,12 +1717,12 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 (uintptr_t)rep);
 		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
 		/* Update seg information. */
-		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
-		NB_SEGS(seg) = 1;
-		PORT(seg) = rxq->port_id;
-		NEXT(seg) = NULL;
-		PKT_LEN(seg) = len;
-		DATA_LEN(seg) = len;
+		seg->data_off = RTE_PKTMBUF_HEADROOM;
+		seg->nb_segs = 1;
+		seg->port = rxq->port_id;
+		seg->next = NULL;
+		seg->pkt_len = len;
+		seg->data_len = len;
 		seg->packet_type = 0;
 		seg->ol_flags = 0;
 		/* Return packet. */
-- 
2.1.4


* [PATCH v1 32/48] net/mlx4: use standard macro to get array size
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (30 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 31/48] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 33/48] net/mlx4: separate debugging macros Adrien Mazarguil
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
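Note for readers: the sketch below illustrates how an array-size macro of
this kind behaves, both on an array and on a dereferenced pointer to an
array (the form used with *elts in this patch). ARRAY_DIM() is a local
stand-in defined here so that the example builds without DPDK headers; it is
assumed to mirror what RTE_DIM() from rte_common.h computes. struct elt and
elts_count() are hypothetical.

```c
#include <stddef.h>

/* Local stand-in for RTE_DIM(); works on arrays, not on plain pointers. */
#define ARRAY_DIM(a) (sizeof(a) / sizeof((a)[0]))

struct elt {
	int id;
};

static struct elt table[8];

/* Count elements through a pointer to an array, as with RTE_DIM(*elts). */
static size_t
elts_count(struct elt (*elts)[8])
{
	return ARRAY_DIM(*elts);
}
```

Dereferencing the pointer yields the array type again, which is why the
element count is still available inside the function.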
 drivers/net/mlx4/mlx4.c | 11 ++++++-----
 drivers/net/mlx4/mlx4.h |  3 ---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0f1169c..f4dc67f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -66,6 +66,7 @@
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_branch_prediction.h>
+#include <rte_common.h>
 
 /* Generated configuration header. */
 #include "mlx4_autoconf.h"
@@ -644,7 +645,7 @@ txq_cleanup(struct txq *txq)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
 		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
 		if (txq->mp2mr[i].mp == NULL)
 			break;
 		assert(txq->mp2mr[i].mr != NULL);
@@ -854,7 +855,7 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
 	unsigned int i;
 	struct ibv_mr *mr;
 
-	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
 		if (unlikely(txq->mp2mr[i].mp == NULL)) {
 			/* Unknown MP, add a new MR for it. */
 			break;
@@ -874,7 +875,7 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
 		      (void *)txq);
 		return (uint32_t)-1;
 	}
-	if (unlikely(i == elemof(txq->mp2mr))) {
+	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
 		/* Table is full, remove oldest entry. */
 		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
 		      (void *)txq);
@@ -1444,7 +1445,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 error:
 	if (elts != NULL) {
 		assert(pool == NULL);
-		for (i = 0; (i != elemof(*elts)); ++i) {
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
 			struct rte_mbuf *buf;
 
@@ -1480,7 +1481,7 @@ rxq_free_elts(struct rxq *rxq)
 	rxq->elts = NULL;
 	if (elts == NULL)
 		return;
-	for (i = 0; (i != elemof(*elts)); ++i) {
+	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
 		struct rxq_elt *elt = &(*elts)[i];
 		struct rte_mbuf *buf;
 
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 3580e05..7ec6317 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -98,9 +98,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Number of elements in array. */
-#define elemof(a) (sizeof(a) / sizeof((a)[0]))
-
 /* Debugging */
 #ifndef NDEBUG
 #include <stdio.h>
-- 
2.1.4


* [PATCH v1 33/48] net/mlx4: separate debugging macros
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (31 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 32/48] net/mlx4: use standard macro to get array size Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 34/48] net/mlx4: use a single interrupt handle Adrien Mazarguil
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The new definitions also rely on the existing DPDK logging subsystem
instead of using fprintf() directly.
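
As an illustration only, the debug-mode prefix (file name, line and
function) can be reproduced outside DPDK with the sketch below. The names
sketch_basename(), sketch_log() and SKETCH_DEBUG() are hypothetical, and
stderr stands in for RTE_LOG(), which this sketch does not use.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Return the file name part of a path, as pmd_drv_log_basename() does. */
static const char *
sketch_basename(const char *s)
{
	const char *n = s;

	assert(s != NULL);
	while (*n)
		if (*(n++) == '/')
			s = n;
	return s;
}

/* Hypothetical stand-in for RTE_LOG(): write one prefixed line to stderr. */
static void
sketch_log(const char *file, unsigned int line, const char *func,
	   const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "%s:%u: %s(): ", sketch_basename(file), line, func);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
}

/* Callers only provide the format string and its arguments. */
#define SKETCH_DEBUG(...) \
	sketch_log(__FILE__, (unsigned int)__LINE__, __func__, __VA_ARGS__)
```

Calling SKETCH_DEBUG("queue %d ready", 3) from a function then emits a
single line prefixed with the caller's file name, line number and function
name, followed by the formatted message.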

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c       |  2 +-
 drivers/net/mlx4/mlx4.h       | 46 -------------------
 drivers/net/mlx4/mlx4_flow.c  |  1 +
 drivers/net/mlx4/mlx4_utils.h | 92 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 47 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f4dc67f..07a47ea 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -59,7 +59,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
@@ -74,6 +73,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_utils.h"
 
 /* Work Request ID data type (64 bit). */
 typedef union {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 7ec6317..da3e16b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -36,23 +36,6 @@
 
 #include <stdint.h>
 
-/*
- * Runtime logging through RTE_LOG() is enabled when not in debugging mode.
- * Intermediate LOG_*() macros add the required end-of-line characters.
- */
-#ifndef NDEBUG
-#define INFO(...) DEBUG(__VA_ARGS__)
-#define WARN(...) DEBUG(__VA_ARGS__)
-#define ERROR(...) DEBUG(__VA_ARGS__)
-#else
-#define LOG__(level, m, ...) \
-	RTE_LOG(level, PMD, MLX4_DRIVER_NAME ": " m "%c", __VA_ARGS__)
-#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
-#define INFO(...) LOG_(INFO, __VA_ARGS__)
-#define WARN(...) LOG_(WARNING, __VA_ARGS__)
-#define ERROR(...) LOG_(ERR, __VA_ARGS__)
-#endif
-
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
 #ifdef PEDANTIC
@@ -98,35 +81,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Debugging */
-#ifndef NDEBUG
-#include <stdio.h>
-#define DEBUG__(m, ...)						\
-	(fprintf(stderr, "%s:%d: %s(): " m "%c",		\
-		 __FILE__, __LINE__, __func__, __VA_ARGS__),	\
-	 fflush(stderr),					\
-	 (void)0)
-/*
- * Save/restore errno around DEBUG__().
- * XXX somewhat undefined behavior, but works.
- */
-#define DEBUG_(...)				\
-	(errno = ((int []){			\
-		*(volatile int *)&errno,	\
-		(DEBUG__(__VA_ARGS__), 0)	\
-	})[0])
-#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-#define claim_positive(...) assert((__VA_ARGS__) >= 0)
-#else /* NDEBUG */
-/* No-ops. */
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-#define claim_positive(...) (__VA_ARGS__)
-#endif /* NDEBUG */
-
 struct mlx4_rxq_stats {
 	unsigned int idx; /**< Mapping index. */
 	uint64_t ipackets; /**< Total of successfully received packets. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 3463713..6f6f455 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -40,6 +40,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_utils.h"
 
 /** Static initializer for items. */
 #define ITEMS(...) \
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
new file mode 100644
index 0000000..c404de2
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -0,0 +1,92 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef MLX4_UTILS_H_
+#define MLX4_UTILS_H_
+
+#include <rte_common.h>
+#include <rte_log.h>
+
+#include "mlx4.h"
+
+#ifndef NDEBUG
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX4_DRIVER_NAME) in log messages.
+ */
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+			pmd_drv_log_basename(__FILE__), \
+			__LINE__, \
+			__func__, \
+			RTE_FMT_TAIL(__VA_ARGS__,)))
+#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+
+#else /* NDEBUG */
+
+/*
+ * Like assert(), DEBUG() becomes a no-op and claim_zero() does not perform
+ * any check when debugging is disabled.
+ */
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT(MLX4_DRIVER_NAME ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+
+#endif /* MLX4_UTILS_H_ */
-- 
2.1.4


* [PATCH v1 34/48] net/mlx4: use a single interrupt handle
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (32 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 33/48] net/mlx4: separate debugging macros Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 35/48] net/mlx4: rename alarm field Adrien Mazarguil
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

One interrupt handle is currently used for RMV/LSC events and another one
for Rx traffic because these come from distinct file descriptors.

This can be simplified, however, as Rx interrupt file descriptors are
stored elsewhere and are registered separately.

Modifying the interrupt handle type to RTE_INTR_HANDLE_UNKNOWN has never
been necessary as disabling interrupts is actually done by unregistering
the associated callback (RMV/LSC) or emptying the EFD array (Rx). Instead,
make clear that the base handle file descriptor is invalid by setting it to
-1 when disabled.
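
The choice of -1 over 0 as the "disabled" marker can be sketched as
follows. This is a minimal illustration, not PMD code: struct
sketch_intr_handle and its helpers are hypothetical, reduced stand-ins for
struct rte_intr_handle and the install/uninstall functions.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for struct rte_intr_handle. */
struct sketch_intr_handle {
	int fd; /* -1 when no file descriptor is attached. */
};

/* Start with an explicitly invalid file descriptor. */
static void
sketch_handle_init(struct sketch_intr_handle *h)
{
	h->fd = -1;
}

/* A handle is enabled when it holds a valid (non-negative) descriptor. */
static int
sketch_handle_is_enabled(const struct sketch_intr_handle *h)
{
	return h->fd >= 0;
}

/* Attach a descriptor; 0 is a legitimate value (standard input). */
static void
sketch_handle_enable(struct sketch_intr_handle *h, int fd)
{
	assert(fd >= 0);
	h->fd = fd;
}

/* Detach the descriptor by restoring the invalid marker. */
static void
sketch_handle_disable(struct sketch_intr_handle *h)
{
	h->fd = -1;
}
```

Since 0 is a valid descriptor number, only a negative value distinguishes
a disabled handle from one attached to descriptor 0 without ambiguity.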

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 32 ++++++++++++++++++++------------
 drivers/net/mlx4/mlx4.h |  3 +--
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 07a47ea..7fc9b4c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2888,8 +2888,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 		ERROR("rte_intr_callback_unregister failed with %d %s",
 		      ret, strerror(rte_errno));
 	}
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	priv->intr_handle.fd = -1;
 	return ret;
 }
 
@@ -2930,7 +2929,6 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 		return -rte_errno;
 	} else {
 		priv->intr_handle.fd = priv->ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rc = rte_intr_callback_register(&priv->intr_handle,
 						 mlx4_dev_interrupt_handler,
 						 dev);
@@ -2938,6 +2936,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 			rte_errno = -rc;
 			ERROR("rte_intr_callback_register failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
+			priv->intr_handle.fd = -1;
 			return -rte_errno;
 		}
 	}
@@ -3068,7 +3067,7 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	unsigned int rxqs_n = priv->rxqs_n;
 	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
 	unsigned int count = 0;
-	struct rte_intr_handle *intr_handle = priv->dev->intr_handle;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
 	if (!priv->dev->data->dev_conf.intr_conf.rxq)
 		return 0;
@@ -3080,7 +3079,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 		      " Rx interrupts will not be supported");
 		return -rte_errno;
 	}
-	intr_handle->type = RTE_INTR_HANDLE_EXT;
 	for (i = 0; i != n; ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
 		int fd;
@@ -3133,7 +3131,7 @@ priv_rx_intr_vec_enable(struct priv *priv)
 static void
 priv_rx_intr_vec_disable(struct priv *priv)
 {
-	struct rte_intr_handle *intr_handle = priv->dev->intr_handle;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
 	rte_intr_free_epoll_fd(intr_handle);
 	free(intr_handle->intr_vec);
@@ -3494,14 +3492,24 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		eth_dev->device = &pci_dev->device;
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
 		eth_dev->device->driver = &mlx4_driver.driver;
+		/* Initialize local interrupt handle for current port. */
+		priv->intr_handle = (struct rte_intr_handle){
+			.fd = -1,
+			.type = RTE_INTR_HANDLE_EXT,
+		};
 		/*
-		 * Copy and override interrupt handle to prevent it from
-		 * being shared between all ethdev instances of a given PCI
-		 * device. This is required to properly handle Rx interrupts
-		 * on all ports.
+		 * Override ethdev interrupt handle pointer with private
+		 * handle instead of that of the parent PCI device used by
+		 * default. This prevents it from being shared between all
+		 * ports of the same PCI device since each of them is
+		 * associated with its own Verbs context.
+		 *
+		 * Rx interrupts in particular require this as the PMD has
+		 * no control over the registration of queue interrupts
+		 * besides setting up eth_dev->intr_handle, the rest is
+		 * handled by rte_intr_rx_ctl().
 		 */
-		priv->intr_handle_dev = *eth_dev->intr_handle;
-		eth_dev->intr_handle = &priv->intr_handle_dev;
+		eth_dev->intr_handle = &priv->intr_handle;
 		priv->dev = eth_dev;
 		eth_dev->dev_ops = &mlx4_dev_ops;
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index da3e16b..087c831 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -169,8 +169,7 @@ struct priv {
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
-	struct rte_intr_handle intr_handle_dev; /* Device interrupt handler. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
+	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-- 
2.1.4


* [PATCH v1 35/48] net/mlx4: rename alarm field
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (33 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 34/48] net/mlx4: use a single interrupt handle Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 36/48] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Make clear this field is related to interrupt handling.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 14 +++++++-------
 drivers/net/mlx4/mlx4.h |  6 +++---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 7fc9b4c..9f1eb4e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2791,10 +2791,10 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 	mlx4_link_update(dev, 0);
 	if (((link->link_speed == 0) && link->link_status) ||
 	    ((link->link_speed != 0) && !link->link_status)) {
-		if (!priv->pending_alarm) {
+		if (!priv->intr_alarm) {
 			/* Inconsistent status, check again later. */
-			priv->pending_alarm = 1;
-			rte_eal_alarm_set(MLX4_ALARM_TIMEOUT_US,
+			priv->intr_alarm = 1;
+			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
 					  mlx4_dev_link_status_handler,
 					  dev);
 		}
@@ -2818,8 +2818,8 @@ mlx4_dev_link_status_handler(void *arg)
 	uint32_t events;
 	int ret;
 
-	assert(priv->pending_alarm == 1);
-	priv->pending_alarm = 0;
+	assert(priv->intr_alarm == 1);
+	priv->intr_alarm = 0;
 	ret = priv_dev_status_handler(priv, dev, &events);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
@@ -2988,14 +2988,14 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
 		if (ret)
 			return ret;
 	}
-	if (priv->pending_alarm)
+	if (priv->intr_alarm)
 		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
 					 dev)) {
 			ERROR("rte_eal_alarm_cancel failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
 		}
-	priv->pending_alarm = 0;
+	priv->intr_alarm = 0;
 	return 0;
 }
 
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 087c831..ed0e6cd 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -63,8 +63,8 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/* Alarm timeout. */
-#define MLX4_ALARM_TIMEOUT_US 100000
+/* Interrupt alarm timeout value in microseconds. */
+#define MLX4_INTR_ALARM_TIMEOUT 100000
 
 /* Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
@@ -162,7 +162,7 @@ struct priv {
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
 	unsigned int vf:1; /* This is a VF device. */
-	unsigned int pending_alarm:1; /* An alarm is pending. */
+	unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
-- 
2.1.4


* [PATCH v1 36/48] net/mlx4: refactor interrupt FD settings
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (34 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 35/48] net/mlx4: rename alarm field Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 37/48] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

File descriptors used for interrupt processing must be made non-blocking.

Doing so as soon as they are opened instead of waiting until they are
needed is more efficient as it avoids performing redundant system calls and
running through their associated error-handling code later on.
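
To show what non-blocking mode changes for a reader of such a descriptor,
here is a small standalone sketch. It uses a pipe as a stand-in for a Verbs
event channel, and the sketch_*() names are hypothetical; only fcntl(),
pipe(), read() and close() are real calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd; return 0 on success, negative errno otherwise. */
static int
sketch_fd_set_non_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1)
		return 0;
	assert(errno);
	return -errno;
}

/*
 * Reading from an empty non-blocking pipe fails immediately with EAGAIN
 * instead of putting the caller to sleep.
 * Return 0 when the observed behavior matches that expectation.
 */
static int
sketch_demo_empty_read(void)
{
	int fds[2];
	char c;
	int ret = -1;

	if (pipe(fds))
		return -errno;
	if (!sketch_fd_set_non_blocking(fds[0]) &&
	    read(fds[0], &c, 1) == -1 &&
	    (errno == EAGAIN || errno == EWOULDBLOCK))
		ret = 0;
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

An interrupt handler draining events in a loop relies on this behavior to
stop once the queue is empty rather than stall the interrupt thread.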

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile     |  1 +
 drivers/net/mlx4/mlx4.c       | 63 ++++++++++++++----------------------
 drivers/net/mlx4/mlx4.h       |  4 +++
 drivers/net/mlx4/mlx4_utils.c | 66 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_utils.h |  4 +++
 5 files changed, 99 insertions(+), 39 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 78ea350..77aaad2 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -37,6 +37,7 @@ LIB = librte_pmd_mlx4.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 9f1eb4e..d6d4be7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -48,7 +48,6 @@
 #include <netinet/in.h>
 #include <linux/ethtool.h>
 #include <linux/sockios.h>
-#include <fcntl.h>
 
 #include <rte_ether.h>
 #include <rte_ethdev.h>
@@ -1871,6 +1870,12 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
+		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+			ERROR("%p: unable to make Rx interrupt completion"
+			      " channel non-blocking: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
 	}
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
@@ -2907,7 +2912,6 @@ static int
 priv_dev_interrupt_handler_install(struct priv *priv,
 				   struct rte_eth_dev *dev)
 {
-	int flags;
 	int rc;
 
 	/*
@@ -2918,29 +2922,17 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	    priv->intr_conf.rmv &&
 	    priv->intr_handle.fd)
 		return 0;
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	rc = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (rc < 0) {
-		rte_errno = errno ? errno : EINVAL;
-		INFO("failed to change file descriptor async event queue");
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-		return -rte_errno;
-	} else {
-		priv->intr_handle.fd = priv->ctx->async_fd;
-		rc = rte_intr_callback_register(&priv->intr_handle,
-						 mlx4_dev_interrupt_handler,
-						 dev);
-		if (rc) {
-			rte_errno = -rc;
-			ERROR("rte_intr_callback_register failed "
-			      " (rte_errno: %s)", strerror(rte_errno));
-			priv->intr_handle.fd = -1;
-			return -rte_errno;
-		}
-	}
-	return 0;
+	priv->intr_handle.fd = priv->ctx->async_fd;
+	rc = rte_intr_callback_register(&priv->intr_handle,
+					mlx4_dev_interrupt_handler,
+					dev);
+	if (!rc)
+		return 0;
+	rte_errno = -rc;
+	ERROR("rte_intr_callback_register failed (rte_errno: %s)",
+	      strerror(rte_errno));
+	priv->intr_handle.fd = -1;
+	return -rte_errno;
 }
 
 /**
@@ -3081,9 +3073,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	}
 	for (i = 0; i != n; ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
-		int fd;
-		int flags;
-		int rc;
 
 		/* Skip queues that cannot request interrupts. */
 		if (!rxq || !rxq->channel) {
@@ -3101,18 +3090,8 @@ priv_rx_intr_vec_enable(struct priv *priv)
 			priv_rx_intr_vec_disable(priv);
 			return -rte_errno;
 		}
-		fd = rxq->channel->fd;
-		flags = fcntl(fd, F_GETFL);
-		rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
-		if (rc < 0) {
-			rte_errno = errno;
-			ERROR("failed to make Rx interrupt file descriptor"
-			      " %d non-blocking for queue index %d", fd, i);
-			priv_rx_intr_vec_disable(priv);
-			return -rte_errno;
-		}
 		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
-		intr_handle->efds[count] = fd;
+		intr_handle->efds[count] = rxq->channel->fd;
 		count++;
 	}
 	if (!count)
@@ -3423,6 +3402,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			DEBUG("port %d is not active: \"%s\" (%d)",
 			      port, ibv_port_state_str(port_attr.state),
 			      port_attr.state);
+		/* Make asynchronous FD non-blocking to handle interrupts. */
+		if (mlx4_fd_set_non_blocking(ctx->async_fd) < 0) {
+			ERROR("cannot make asynchronous FD non-blocking: %s",
+			      strerror(rte_errno));
+			goto port_error;
+		}
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index ed0e6cd..6104842 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -46,6 +46,10 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
diff --git a/drivers/net/mlx4/mlx4_utils.c b/drivers/net/mlx4/mlx4_utils.c
new file mode 100644
index 0000000..fcf76c9
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_utils.c
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Utility functions used by the mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+
+#include <rte_errno.h>
+
+#include "mlx4_utils.h"
+
+/**
+ * Make a file descriptor non-blocking.
+ *
+ * @param fd
+ *   File descriptor to alter.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_fd_set_non_blocking(int fd)
+{
+	int ret = fcntl(fd, F_GETFL);
+
+	if (ret != -1 && !fcntl(fd, F_SETFL, ret | O_NONBLOCK))
+		return 0;
+	assert(errno);
+	rte_errno = errno;
+	return -rte_errno;
+}
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index c404de2..0b9a96a 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -89,4 +89,8 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
+/* mlx4_utils.c */
+
+int mlx4_fd_set_non_blocking(int fd);
+
 #endif /* MLX4_UTILS_H_ */
-- 
2.1.4


* [PATCH v1 37/48] net/mlx4: clean up interrupt functions prototypes
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (35 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 36/48] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 38/48] net/mlx4: compact interrupt functions Adrien Mazarguil
                   ` (12 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

The naming scheme for these functions is overly verbose and not accurate
enough, with too many "handler" functions that are difficult to
differentiate (e.g. mlx4_dev_link_status_handler(),
mlx4_dev_interrupt_handler() and priv_dev_status_handler()).

This commit renames them and removes the unnecessary dev argument which can
be retrieved through the private structure where needed. Documentation is
updated accordingly.
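
The idea of passing only the private structure and reaching the device
through it can be sketched as below. All names are hypothetical stand-ins
(struct sketch_dev for struct rte_eth_dev, struct sketch_priv for struct
priv). As an assumption of this sketch, the callback keeps a generic
void * parameter and converts it internally, which differs from the
function pointer casts used in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_eth_dev. */
struct sketch_dev {
	unsigned int events; /* Number of events delivered to this device. */
};

/* Hypothetical stand-in for struct priv, with a device back-pointer. */
struct sketch_priv {
	struct sketch_dev *dev;
};

/* Callback with a generic signature; the device comes from priv->dev. */
static void
sketch_interrupt_handler(void *arg)
{
	struct sketch_priv *priv = arg;

	assert(priv != NULL && priv->dev != NULL);
	++priv->dev->events;
}

/* Hypothetical registrar: simply invokes the callback once. */
static void
sketch_fire(void (*cb)(void *), void *cb_arg)
{
	cb(cb_arg);
}
```

With a back-pointer in the private structure, a single argument is enough
for both the callback and any helper it calls, which is what allows the
dev parameter to be dropped from the prototypes.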

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 145 ++++++++++++++++---------------------------
 1 file changed, 55 insertions(+), 90 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d6d4be7..50e0687 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2048,14 +2048,9 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int
-priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_removal_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_link_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
+static int priv_interrupt_handler_install(struct priv *priv);
+static int priv_removal_interrupt_handler_install(struct priv *priv);
+static int priv_link_interrupt_handler_install(struct priv *priv);
 
 /**
  * DPDK callback to start the device.
@@ -2081,13 +2076,13 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_dev_link_interrupt_handler_install(priv, dev);
+	ret = priv_link_interrupt_handler_install(priv);
 	if (ret) {
 		ERROR("%p: LSC handler install failed",
 		     (void *)dev);
 		goto err;
 	}
-	ret = priv_dev_removal_interrupt_handler_install(priv, dev);
+	ret = priv_removal_interrupt_handler_install(priv);
 	if (ret) {
 		ERROR("%p: RMV handler install failed",
 		     (void *)dev);
@@ -2184,15 +2179,9 @@ removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	return 0;
 }
 
-static int
-priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_removal_interrupt_handler_uninstall(struct priv *,
-					     struct rte_eth_dev *);
-
-static int
-priv_dev_link_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
+static int priv_interrupt_handler_uninstall(struct priv *priv);
+static int priv_removal_interrupt_handler_uninstall(struct priv *priv);
+static int priv_link_interrupt_handler_uninstall(struct priv *priv);
 
 /**
  * DPDK callback to close the device.
@@ -2256,8 +2245,8 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_dev_removal_interrupt_handler_uninstall(priv, dev);
-	priv_dev_link_interrupt_handler_uninstall(priv, dev);
+	priv_removal_interrupt_handler_uninstall(priv);
+	priv_link_interrupt_handler_uninstall(priv);
 	priv_rx_intr_vec_disable(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -2745,31 +2734,25 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-static void
-mlx4_dev_link_status_handler(void *);
-static void
-mlx4_dev_interrupt_handler(void *);
+static void mlx4_link_status_alarm(struct priv *priv);
 
 /**
- * Link/device status handler.
+ * Collect interrupt events.
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  * @param events
  *   Pointer to event flags holder.
  *
  * @return
- *   Number of events
+ *   Number of events.
  */
 static int
-priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
-			uint32_t *events)
+priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 {
 	struct ibv_async_event event;
 	int port_change = 0;
-	struct rte_eth_link *link = &dev->data->dev_link;
+	struct rte_eth_link *link = &priv->dev->data->dev_link;
 	int ret = 0;
 
 	*events = 0;
@@ -2793,15 +2776,16 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 	}
 	if (!port_change)
 		return ret;
-	mlx4_link_update(dev, 0);
+	mlx4_link_update(priv->dev, 0);
 	if (((link->link_speed == 0) && link->link_status) ||
 	    ((link->link_speed != 0) && !link->link_status)) {
 		if (!priv->intr_alarm) {
 			/* Inconsistent status, check again later. */
 			priv->intr_alarm = 1;
 			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
-					  mlx4_dev_link_status_handler,
-					  dev);
+					  (void (*)(void *))
+					  mlx4_link_status_alarm,
+					  priv);
 		}
 	} else {
 		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
@@ -2810,53 +2794,48 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 }
 
 /**
- * Handle delayed link status event.
+ * Process scheduled link status check.
  *
- * @param arg
- *   Registered argument.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-mlx4_dev_link_status_handler(void *arg)
+mlx4_link_status_alarm(struct priv *priv)
 {
-	struct rte_eth_dev *dev = arg;
-	struct priv *priv = dev->data->dev_private;
 	uint32_t events;
 	int ret;
 
 	assert(priv->intr_alarm == 1);
 	priv->intr_alarm = 0;
-	ret = priv_dev_status_handler(priv, dev, &events);
+	ret = priv_collect_interrupt_events(priv, &events);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
+		_rte_eth_dev_callback_process(priv->dev,
+					      RTE_ETH_EVENT_INTR_LSC,
 					      NULL, NULL);
 }
 
 /**
  * Handle interrupts from the NIC.
  *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-mlx4_dev_interrupt_handler(void *cb_arg)
+mlx4_interrupt_handler(struct priv *priv)
 {
-	struct rte_eth_dev *dev = cb_arg;
-	struct priv *priv = dev->data->dev_private;
 	int ret;
 	uint32_t ev;
 	int i;
 
-	ret = priv_dev_status_handler(priv, dev, &ev);
+	ret = priv_collect_interrupt_events(priv, &ev);
 	if (ret > 0) {
 		for (i = RTE_ETH_EVENT_UNKNOWN;
 		     i < RTE_ETH_EVENT_MAX;
 		     i++) {
 			if (ev & (1 << i)) {
 				ev &= ~(1 << i);
-				_rte_eth_dev_callback_process(dev, i, NULL,
-							      NULL);
+				_rte_eth_dev_callback_process(priv->dev, i,
+							      NULL, NULL);
 				ret--;
 			}
 		}
@@ -2871,14 +2850,12 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
+priv_interrupt_handler_uninstall(struct priv *priv)
 {
 	int ret;
 
@@ -2886,8 +2863,9 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 	    priv->intr_conf.rmv)
 		return 0;
 	ret = rte_intr_callback_unregister(&priv->intr_handle,
-					   mlx4_dev_interrupt_handler,
-					   dev);
+					   (void (*)(void *))
+					   mlx4_interrupt_handler,
+					   priv);
 	if (ret < 0) {
 		rte_errno = ret;
 		ERROR("rte_intr_callback_unregister failed with %d %s",
@@ -2902,15 +2880,12 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_interrupt_handler_install(struct priv *priv,
-				   struct rte_eth_dev *dev)
+priv_interrupt_handler_install(struct priv *priv)
 {
 	int rc;
 
@@ -2924,8 +2899,9 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 		return 0;
 	priv->intr_handle.fd = priv->ctx->async_fd;
 	rc = rte_intr_callback_register(&priv->intr_handle,
-					mlx4_dev_interrupt_handler,
-					dev);
+					(void (*)(void *))
+					mlx4_interrupt_handler,
+					priv);
 	if (!rc)
 		return 0;
 	rte_errno = -rc;
@@ -2940,19 +2916,16 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
-					    struct rte_eth_dev *dev)
+priv_removal_interrupt_handler_uninstall(struct priv *priv)
 {
-	if (dev->data->dev_conf.intr_conf.rmv) {
+	if (priv->dev->data->dev_conf.intr_conf.rmv) {
 		priv->intr_conf.rmv = 0;
-		return priv_dev_interrupt_handler_uninstall(priv, dev);
+		return priv_interrupt_handler_uninstall(priv);
 	}
 	return 0;
 }
@@ -2962,27 +2935,25 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
-					  struct rte_eth_dev *dev)
+priv_link_interrupt_handler_uninstall(struct priv *priv)
 {
 	int ret = 0;
 
-	if (dev->data->dev_conf.intr_conf.lsc) {
+	if (priv->dev->data->dev_conf.intr_conf.lsc) {
 		priv->intr_conf.lsc = 0;
-		ret = priv_dev_interrupt_handler_uninstall(priv, dev);
+		ret = priv_interrupt_handler_uninstall(priv);
 		if (ret)
 			return ret;
 	}
 	if (priv->intr_alarm)
-		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
-					 dev)) {
+		if (rte_eal_alarm_cancel((void (*)(void *))
+					 mlx4_link_status_alarm,
+					 priv)) {
 			ERROR("rte_eal_alarm_cancel failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
@@ -2996,20 +2967,17 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_link_interrupt_handler_install(struct priv *priv,
-					struct rte_eth_dev *dev)
+priv_link_interrupt_handler_install(struct priv *priv)
 {
 	int ret;
 
-	if (dev->data->dev_conf.intr_conf.lsc) {
-		ret = priv_dev_interrupt_handler_install(priv, dev);
+	if (priv->dev->data->dev_conf.intr_conf.lsc) {
+		ret = priv_interrupt_handler_install(priv);
 		if (ret)
 			return ret;
 		priv->intr_conf.lsc = 1;
@@ -3022,20 +2990,17 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_removal_interrupt_handler_install(struct priv *priv,
-					   struct rte_eth_dev *dev)
+priv_removal_interrupt_handler_install(struct priv *priv)
 {
 	int ret;
 
-	if (dev->data->dev_conf.intr_conf.rmv) {
-		ret = priv_dev_interrupt_handler_install(priv, dev);
+	if (priv->dev->data->dev_conf.intr_conf.rmv) {
+		ret = priv_interrupt_handler_install(priv);
 		if (ret)
 			return ret;
 		priv->intr_conf.rmv = 1;
-- 
2.1.4


* [PATCH v1 38/48] net/mlx4: compact interrupt functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (36 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 37/48] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 39/48] net/mlx4: separate interrupt handling Adrien Mazarguil
                   ` (11 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Link status (LSC) and removal (RMV) interrupts share a common handler and
are toggled simultaneously from common install/uninstall functions.

Four additional wrapper functions (two for each interrupt type) are
currently necessary because the PMD maintains an internal configuration
state for interrupts (priv->intr_conf).

This complexity can be avoided entirely since the PMD no longer disables
interrupt configuration parameters in case of error. The configuration
provided by the application (dev_conf.intr_conf) can therefore be used
directly.

With this commit, only two functions are necessary to toggle interrupts
(including Rx) during start/stop cycles.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
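The install/uninstall pairing described above can be sketched outside of
DPDK as follows. This is only an illustration under stated assumptions:
struct demo_priv, struct demo_intr_conf, the fail_register knob and both
demo_* functions are hypothetical stand-ins, not part of this patch nor of
any DPDK API. It shows uninstall being safe to call in any state while
preserving the error code, and install starting from a clean state and
rolling back on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the application-provided interrupt settings. */
struct demo_intr_conf {
	bool lsc; /* Link status change interrupts requested. */
	bool rmv; /* Device removal interrupts requested. */
	bool rxq; /* Rx queue interrupts requested. */
};

/* Hypothetical stand-in for the interrupt-related part of struct priv. */
struct demo_priv {
	struct demo_intr_conf conf; /* Configuration set by the application. */
	int async_fd; /* Descriptor delivering asynchronous events. */
	int handler_fd; /* Registered descriptor, -1 when none. */
	bool rx_vec; /* Whether the Rx interrupt vector is set up. */
	bool fail_register; /* Illustration knob: make registration fail. */
};

/* Tear everything down; safe in any state and leaves errno untouched. */
static int
demo_intr_uninstall(struct demo_priv *priv)
{
	int err = errno; /* Preserve the caller's error code. */

	assert(priv != NULL);
	priv->handler_fd = -1; /* A real driver would unregister here. */
	priv->rx_vec = false;
	errno = err;
	return 0;
}

/* Enable only what the configuration asks for, rolling back on error. */
static int
demo_intr_install(struct demo_priv *priv)
{
	assert(priv != NULL);
	demo_intr_uninstall(priv); /* Always start from a clean state. */
	if (priv->conf.rxq)
		priv->rx_vec = true;
	if (priv->conf.lsc || priv->conf.rmv) {
		if (priv->fail_register) {
			errno = EIO;
			goto error;
		}
		/* A single shared handler serves both LSC and RMV events. */
		priv->handler_fd = priv->async_fd;
	}
	return 0;
error:
	demo_intr_uninstall(priv); /* Does not clobber errno. */
	return -errno;
}
```

Since install always begins with uninstall, callers do not need to track
which interrupt types are currently active, which is the state the four
former wrapper functions existed to maintain.
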
 drivers/net/mlx4/mlx4.c | 199 +++++++++----------------------------------
 drivers/net/mlx4/mlx4.h |   1 -
 2 files changed, 41 insertions(+), 159 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 50e0687..c99f040 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2048,9 +2048,8 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int priv_interrupt_handler_install(struct priv *priv);
-static int priv_removal_interrupt_handler_install(struct priv *priv);
-static int priv_link_interrupt_handler_install(struct priv *priv);
+static int priv_intr_uninstall(struct priv *priv);
+static int priv_intr_install(struct priv *priv);
 
 /**
  * DPDK callback to start the device.
@@ -2076,24 +2075,12 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_link_interrupt_handler_install(priv);
+	ret = priv_intr_install(priv);
 	if (ret) {
-		ERROR("%p: LSC handler install failed",
+		ERROR("%p: interrupt handler installation failed",
 		     (void *)dev);
 		goto err;
 	}
-	ret = priv_removal_interrupt_handler_install(priv);
-	if (ret) {
-		ERROR("%p: RMV handler install failed",
-		     (void *)dev);
-		goto err;
-	}
-	ret = priv_rx_intr_vec_enable(priv);
-	if (ret) {
-		ERROR("%p: Rx interrupt vector creation failed",
-		      (void *)dev);
-		goto err;
-	}
 	ret = mlx4_priv_flow_start(priv);
 	if (ret) {
 		ERROR("%p: flow start failed: %s",
@@ -2126,6 +2113,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
+	priv_intr_uninstall(priv);
 	priv_mac_addr_del(priv);
 }
 
@@ -2179,10 +2167,6 @@ removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	return 0;
 }
 
-static int priv_interrupt_handler_uninstall(struct priv *priv);
-static int priv_removal_interrupt_handler_uninstall(struct priv *priv);
-static int priv_link_interrupt_handler_uninstall(struct priv *priv);
-
 /**
  * DPDK callback to close the device.
  *
@@ -2245,9 +2229,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_removal_interrupt_handler_uninstall(priv);
-	priv_link_interrupt_handler_uninstall(priv);
-	priv_rx_intr_vec_disable(priv);
+	priv_intr_uninstall(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2753,6 +2735,8 @@ priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 	struct ibv_async_event event;
 	int port_change = 0;
 	struct rte_eth_link *link = &priv->dev->data->dev_link;
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
 	int ret = 0;
 
 	*events = 0;
@@ -2762,11 +2746,11 @@ priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 		     event.event_type == IBV_EVENT_PORT_ERR) &&
-		    (priv->intr_conf.lsc == 1)) {
+		    intr_conf->lsc) {
 			port_change = 1;
 			ret++;
 		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			   priv->intr_conf.rmv == 1) {
+			   intr_conf->rmv) {
 			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
 			ret++;
 		} else
@@ -2855,24 +2839,22 @@ mlx4_interrupt_handler(struct priv *priv)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_interrupt_handler_uninstall(struct priv *priv)
+priv_intr_uninstall(struct priv *priv)
 {
-	int ret;
+	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
 
-	if (priv->intr_conf.lsc ||
-	    priv->intr_conf.rmv)
-		return 0;
-	ret = rte_intr_callback_unregister(&priv->intr_handle,
-					   (void (*)(void *))
-					   mlx4_interrupt_handler,
-					   priv);
-	if (ret < 0) {
-		rte_errno = ret;
-		ERROR("rte_intr_callback_unregister failed with %d %s",
-		      ret, strerror(rte_errno));
+	if (priv->intr_handle.fd != -1) {
+		rte_intr_callback_unregister(&priv->intr_handle,
+					     (void (*)(void *))
+					     mlx4_interrupt_handler,
+					     priv);
+		priv->intr_handle.fd = -1;
 	}
-	priv->intr_handle.fd = -1;
-	return ret;
+	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
+	priv->intr_alarm = 0;
+	priv_rx_intr_vec_disable(priv);
+	rte_errno = err;
+	return 0;
 }
 
 /**
@@ -2885,127 +2867,30 @@ priv_interrupt_handler_uninstall(struct priv *priv)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_interrupt_handler_install(struct priv *priv)
+priv_intr_install(struct priv *priv)
 {
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
 	int rc;
 
-	/*
-	 * Check whether the interrupt handler has already been installed
-	 * for either type of interrupt.
-	 */
-	if (priv->intr_conf.lsc &&
-	    priv->intr_conf.rmv &&
-	    priv->intr_handle.fd)
-		return 0;
-	priv->intr_handle.fd = priv->ctx->async_fd;
-	rc = rte_intr_callback_register(&priv->intr_handle,
-					(void (*)(void *))
-					mlx4_interrupt_handler,
-					priv);
-	if (!rc)
-		return 0;
-	rte_errno = -rc;
-	ERROR("rte_intr_callback_register failed (rte_errno: %s)",
-	      strerror(rte_errno));
-	priv->intr_handle.fd = -1;
-	return -rte_errno;
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_removal_interrupt_handler_uninstall(struct priv *priv)
-{
-	if (priv->dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_conf.rmv = 0;
-		return priv_interrupt_handler_uninstall(priv);
-	}
-	return 0;
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_link_interrupt_handler_uninstall(struct priv *priv)
-{
-	int ret = 0;
-
-	if (priv->dev->data->dev_conf.intr_conf.lsc) {
-		priv->intr_conf.lsc = 0;
-		ret = priv_interrupt_handler_uninstall(priv);
-		if (ret)
-			return ret;
-	}
-	if (priv->intr_alarm)
-		if (rte_eal_alarm_cancel((void (*)(void *))
-					 mlx4_link_status_alarm,
-					 priv)) {
-			ERROR("rte_eal_alarm_cancel failed "
-			      " (rte_errno: %s)", strerror(rte_errno));
-			return -rte_errno;
+	priv_intr_uninstall(priv);
+	if (intr_conf->rxq && priv_rx_intr_vec_enable(priv) < 0)
+		goto error;
+	if (intr_conf->lsc | intr_conf->rmv) {
+		priv->intr_handle.fd = priv->ctx->async_fd;
+		rc = rte_intr_callback_register(&priv->intr_handle,
+						(void (*)(void *))
+						mlx4_interrupt_handler,
+						priv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			goto error;
 		}
-	priv->intr_alarm = 0;
-	return 0;
-}
-
-/**
- * Install link interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_link_interrupt_handler_install(struct priv *priv)
-{
-	int ret;
-
-	if (priv->dev->data->dev_conf.intr_conf.lsc) {
-		ret = priv_interrupt_handler_install(priv);
-		if (ret)
-			return ret;
-		priv->intr_conf.lsc = 1;
-	}
-	return 0;
-}
-
-/**
- * Install removal interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_removal_interrupt_handler_install(struct priv *priv)
-{
-	int ret;
-
-	if (priv->dev->data->dev_conf.intr_conf.rmv) {
-		ret = priv_interrupt_handler_install(priv);
-		if (ret)
-			return ret;
-		priv->intr_conf.rmv = 1;
 	}
 	return 0;
+error:
+	priv_intr_uninstall(priv);
+	return -rte_errno;
 }
 
 /**
@@ -3026,8 +2911,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	unsigned int count = 0;
 	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
-	if (!priv->dev->data->dev_conf.intr_conf.rxq)
-		return 0;
 	priv_rx_intr_vec_disable(priv);
 	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
 	if (intr_handle->intr_vec == NULL) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6104842..528607c 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -176,7 +176,6 @@ struct priv {
 	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
-	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
 };
 
 #endif /* RTE_PMD_MLX4_H_ */
-- 
2.1.4


* [PATCH v1 39/48] net/mlx4: separate interrupt handling
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (37 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 38/48] net/mlx4: compact interrupt functions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 40/48] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
                   ` (10 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Interrupt handling code is moved to a dedicated file (mlx4_intr.c).

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
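As a reminder of why the prefix matters once code is split across files,
here is a minimal sketch. Both function names are hypothetical and do not
appear in this patch. A static helper has internal linkage and can keep a
short name, whereas a function shared between source files has external
linkage and lands in the same symbol namespace as every other driver
linked into the binary.

```c
/*
 * File-local helper: internal linkage, invisible to the linker, so a
 * same-named helper in another driver cannot clash with it.
 */
static int
intr_count_bits(unsigned int events)
{
	int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; events; events &= events - 1)
		++n;
	return n;
}

/*
 * Hypothetical exported entry point: external linkage, hence the
 * driver-specific prefix to keep the symbol unique at link time.
 */
int
mlx4_demo_intr_pending(unsigned int events)
{
	return intr_count_bits(events);
}
```
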
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 340 +---------------------------------
 drivers/net/mlx4/mlx4.h      |  11 ++
 drivers/net/mlx4/mlx4_intr.c | 376 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 392 insertions(+), 336 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 77aaad2..37dcdf7 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -37,6 +37,7 @@ LIB = librte_pmd_mlx4.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index c99f040..284575c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -58,7 +58,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
@@ -99,18 +98,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-static int
-mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
-
-static int
-mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
-
-static int
-priv_rx_intr_vec_enable(struct priv *priv);
-
-static void
-priv_rx_intr_vec_disable(struct priv *priv);
-
 /* Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
@@ -2048,9 +2035,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int priv_intr_uninstall(struct priv *priv);
-static int priv_intr_install(struct priv *priv);
-
 /**
  * DPDK callback to start the device.
  *
@@ -2075,7 +2059,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_intr_install(priv);
+	ret = mlx4_intr_install(priv);
 	if (ret) {
 		ERROR("%p: interrupt handler installation failed",
 		     (void *)dev);
@@ -2113,7 +2097,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
-	priv_intr_uninstall(priv);
+	mlx4_intr_uninstall(priv);
 	priv_mac_addr_del(priv);
 }
 
@@ -2229,7 +2213,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_intr_uninstall(priv);
+	mlx4_intr_uninstall(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2441,7 +2425,7 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 {
 	const struct priv *priv = dev->data->dev_private;
@@ -2716,322 +2700,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-static void mlx4_link_status_alarm(struct priv *priv);
-
-/**
- * Collect interrupt events.
- *
- * @param priv
- *   Pointer to private structure.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Number of events.
- */
-static int
-priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
-{
-	struct ibv_async_event event;
-	int port_change = 0;
-	struct rte_eth_link *link = &priv->dev->data->dev_link;
-	const struct rte_intr_conf *const intr_conf =
-		&priv->dev->data->dev_conf.intr_conf;
-	int ret = 0;
-
-	*events = 0;
-	/* Read all message and acknowledge them. */
-	for (;;) {
-		if (ibv_get_async_event(priv->ctx, &event))
-			break;
-		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-		     event.event_type == IBV_EVENT_PORT_ERR) &&
-		    intr_conf->lsc) {
-			port_change = 1;
-			ret++;
-		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			   intr_conf->rmv) {
-			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
-			ret++;
-		} else
-			DEBUG("event type %d on port %d not handled",
-			      event.event_type, event.element.port_num);
-		ibv_ack_async_event(&event);
-	}
-	if (!port_change)
-		return ret;
-	mlx4_link_update(priv->dev, 0);
-	if (((link->link_speed == 0) && link->link_status) ||
-	    ((link->link_speed != 0) && !link->link_status)) {
-		if (!priv->intr_alarm) {
-			/* Inconsistent status, check again later. */
-			priv->intr_alarm = 1;
-			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
-					  (void (*)(void *))
-					  mlx4_link_status_alarm,
-					  priv);
-		}
-	} else {
-		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
-	}
-	return ret;
-}
-
-/**
- * Process scheduled link status check.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-mlx4_link_status_alarm(struct priv *priv)
-{
-	uint32_t events;
-	int ret;
-
-	assert(priv->intr_alarm == 1);
-	priv->intr_alarm = 0;
-	ret = priv_collect_interrupt_events(priv, &events);
-	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(priv->dev,
-					      RTE_ETH_EVENT_INTR_LSC,
-					      NULL, NULL);
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-mlx4_interrupt_handler(struct priv *priv)
-{
-	int ret;
-	uint32_t ev;
-	int i;
-
-	ret = priv_collect_interrupt_events(priv, &ev);
-	if (ret > 0) {
-		for (i = RTE_ETH_EVENT_UNKNOWN;
-		     i < RTE_ETH_EVENT_MAX;
-		     i++) {
-			if (ev & (1 << i)) {
-				ev &= ~(1 << i);
-				_rte_eth_dev_callback_process(priv->dev, i,
-							      NULL, NULL);
-				ret--;
-			}
-		}
-		if (ret)
-			WARN("%d event%s not processed", ret,
-			     (ret > 1 ? "s were" : " was"));
-	}
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_intr_uninstall(struct priv *priv)
-{
-	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
-
-	if (priv->intr_handle.fd != -1) {
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     (void (*)(void *))
-					     mlx4_interrupt_handler,
-					     priv);
-		priv->intr_handle.fd = -1;
-	}
-	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
-	priv->intr_alarm = 0;
-	priv_rx_intr_vec_disable(priv);
-	rte_errno = err;
-	return 0;
-}
-
-/**
- * Install interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_intr_install(struct priv *priv)
-{
-	const struct rte_intr_conf *const intr_conf =
-		&priv->dev->data->dev_conf.intr_conf;
-	int rc;
-
-	priv_intr_uninstall(priv);
-	if (intr_conf->rxq && priv_rx_intr_vec_enable(priv) < 0)
-		goto error;
-	if (intr_conf->lsc | intr_conf->rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
-		rc = rte_intr_callback_register(&priv->intr_handle,
-						(void (*)(void *))
-						mlx4_interrupt_handler,
-						priv);
-		if (rc < 0) {
-			rte_errno = -rc;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	priv_intr_uninstall(priv);
-	return -rte_errno;
-}
-
-/**
- * Allocate queue vector and fill epoll fd list for Rx interrupts.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_rx_intr_vec_enable(struct priv *priv)
-{
-	unsigned int i;
-	unsigned int rxqs_n = priv->rxqs_n;
-	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
-	unsigned int count = 0;
-	struct rte_intr_handle *intr_handle = &priv->intr_handle;
-
-	priv_rx_intr_vec_disable(priv);
-	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
-	if (intr_handle->intr_vec == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("failed to allocate memory for interrupt vector,"
-		      " Rx interrupts will not be supported");
-		return -rte_errno;
-	}
-	for (i = 0; i != n; ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-
-		/* Skip queues that cannot request interrupts. */
-		if (!rxq || !rxq->channel) {
-			/* Use invalid intr_vec[] index to disable entry. */
-			intr_handle->intr_vec[i] =
-				RTE_INTR_VEC_RXTX_OFFSET +
-				RTE_MAX_RXTX_INTR_VEC_ID;
-			continue;
-		}
-		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
-			rte_errno = E2BIG;
-			ERROR("too many Rx queues for interrupt vector size"
-			      " (%d), Rx interrupts cannot be enabled",
-			      RTE_MAX_RXTX_INTR_VEC_ID);
-			priv_rx_intr_vec_disable(priv);
-			return -rte_errno;
-		}
-		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
-		intr_handle->efds[count] = rxq->channel->fd;
-		count++;
-	}
-	if (!count)
-		priv_rx_intr_vec_disable(priv);
-	else
-		intr_handle->nb_efd = count;
-	return 0;
-}
-
-/**
- * Clean up Rx interrupts handler.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_rx_intr_vec_disable(struct priv *priv)
-{
-	struct rte_intr_handle *intr_handle = &priv->intr_handle;
-
-	rte_intr_free_epoll_fd(intr_handle);
-	free(intr_handle->intr_vec);
-	intr_handle->nb_efd = 0;
-	intr_handle->intr_vec = NULL;
-}
-
-/**
- * DPDK callback for Rx queue interrupt enable.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	int ret;
-
-	if (!rxq || !rxq->channel)
-		ret = EINVAL;
-	else
-		ret = ibv_req_notify_cq(rxq->cq, 0);
-	if (ret) {
-		rte_errno = ret;
-		WARN("unable to arm interrupt on rx queue %d", idx);
-	}
-	return -ret;
-}
-
-/**
- * DPDK callback for Rx queue interrupt disable.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	struct ibv_cq *ev_cq;
-	void *ev_ctx;
-	int ret;
-
-	if (!rxq || !rxq->channel) {
-		ret = EINVAL;
-	} else {
-		ret = ibv_get_cq_event(rxq->cq->channel, &ev_cq, &ev_ctx);
-		if (ret || ev_cq != rxq->cq)
-			ret = EINVAL;
-	}
-	if (ret) {
-		rte_errno = ret;
-		WARN("unable to disable interrupt on rx queue %d",
-		     idx);
-	} else {
-		ibv_ack_cq_events(rxq->cq, 1);
-	}
-	return -ret;
-}
-
 /**
  * Verify and store value for device argument.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 528607c..f815dd8 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -178,4 +178,15 @@ struct priv {
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 };
 
+/* mlx4.c */
+
+int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+
+/* mlx4_intr.c */
+
+int mlx4_intr_uninstall(struct priv *priv);
+int mlx4_intr_install(struct priv *priv);
+int mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
+int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
+
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
new file mode 100644
index 0000000..bcf4d59
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -0,0 +1,376 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Interrupts handling for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_alarm.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_interrupts.h>
+
+#include "mlx4.h"
+#include "mlx4_utils.h"
+
+static void mlx4_link_status_alarm(struct priv *priv);
+
+/**
+ * Clean up Rx interrupts handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_rx_intr_vec_disable(struct priv *priv)
+{
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
+
+	rte_intr_free_epoll_fd(intr_handle);
+	free(intr_handle->intr_vec);
+	intr_handle->nb_efd = 0;
+	intr_handle->intr_vec = NULL;
+}
+
+/**
+ * Allocate queue vector and fill epoll fd list for Rx interrupts.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rx_intr_vec_enable(struct priv *priv)
+{
+	unsigned int i;
+	unsigned int rxqs_n = priv->rxqs_n;
+	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
+	unsigned int count = 0;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
+
+	mlx4_rx_intr_vec_disable(priv);
+	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
+	if (intr_handle->intr_vec == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("failed to allocate memory for interrupt vector,"
+		      " Rx interrupts will not be supported");
+		return -rte_errno;
+	}
+	for (i = 0; i != n; ++i) {
+		struct rxq *rxq = (*priv->rxqs)[i];
+
+		/* Skip queues that cannot request interrupts. */
+		if (!rxq || !rxq->channel) {
+			/* Use invalid intr_vec[] index to disable entry. */
+			intr_handle->intr_vec[i] =
+				RTE_INTR_VEC_RXTX_OFFSET +
+				RTE_MAX_RXTX_INTR_VEC_ID;
+			continue;
+		}
+		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
+			rte_errno = E2BIG;
+			ERROR("too many Rx queues for interrupt vector size"
+			      " (%d), Rx interrupts cannot be enabled",
+			      RTE_MAX_RXTX_INTR_VEC_ID);
+			mlx4_rx_intr_vec_disable(priv);
+			return -rte_errno;
+		}
+		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
+		intr_handle->efds[count] = rxq->channel->fd;
+		count++;
+	}
+	if (!count)
+		mlx4_rx_intr_vec_disable(priv);
+	else
+		intr_handle->nb_efd = count;
+	return 0;
+}
+
+/**
+ * Collect interrupt events.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param events
+ *   Pointer to event flags holder.
+ *
+ * @return
+ *   Number of events.
+ */
+static int
+mlx4_collect_interrupt_events(struct priv *priv, uint32_t *events)
+{
+	struct ibv_async_event event;
+	int port_change = 0;
+	struct rte_eth_link *link = &priv->dev->data->dev_link;
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
+	int ret = 0;
+
+	*events = 0;
+	/* Read all messages and acknowledge them. */
+	for (;;) {
+		if (ibv_get_async_event(priv->ctx, &event))
+			break;
+		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+		    intr_conf->lsc) {
+			port_change = 1;
+			ret++;
+		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+			   intr_conf->rmv) {
+			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
+			ret++;
+		} else {
+			DEBUG("event type %d on port %d not handled",
+			      event.event_type, event.element.port_num);
+		}
+		ibv_ack_async_event(&event);
+	}
+	if (!port_change)
+		return ret;
+	mlx4_link_update(priv->dev, 0);
+	if (((link->link_speed == 0) && link->link_status) ||
+	    ((link->link_speed != 0) && !link->link_status)) {
+		if (!priv->intr_alarm) {
+			/* Inconsistent status, check again later. */
+			priv->intr_alarm = 1;
+			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
+					  (void (*)(void *))
+					  mlx4_link_status_alarm,
+					  priv);
+		}
+	} else {
+		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
+	}
+	return ret;
+}
+
+/**
+ * Process scheduled link status check.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_link_status_alarm(struct priv *priv)
+{
+	uint32_t events;
+	int ret;
+
+	assert(priv->intr_alarm == 1);
+	priv->intr_alarm = 0;
+	ret = mlx4_collect_interrupt_events(priv, &events);
+	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
+		_rte_eth_dev_callback_process(priv->dev,
+					      RTE_ETH_EVENT_INTR_LSC,
+					      NULL, NULL);
+}
+
+/**
+ * Handle interrupts from the NIC.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_interrupt_handler(struct priv *priv)
+{
+	int ret;
+	uint32_t ev;
+	int i;
+
+	ret = mlx4_collect_interrupt_events(priv, &ev);
+	if (ret > 0) {
+		for (i = RTE_ETH_EVENT_UNKNOWN;
+		     i < RTE_ETH_EVENT_MAX;
+		     i++) {
+			if (ev & (1 << i)) {
+				ev &= ~(1 << i);
+				_rte_eth_dev_callback_process(priv->dev, i,
+							      NULL, NULL);
+				ret--;
+			}
+		}
+		if (ret)
+			WARN("%d event%s not processed", ret,
+			     (ret > 1 ? "s were" : " was"));
+	}
+}
+
+/**
+ * Uninstall interrupt handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_intr_uninstall(struct priv *priv)
+{
+	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
+
+	if (priv->intr_handle.fd != -1) {
+		rte_intr_callback_unregister(&priv->intr_handle,
+					     (void (*)(void *))
+					     mlx4_interrupt_handler,
+					     priv);
+		priv->intr_handle.fd = -1;
+	}
+	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
+	priv->intr_alarm = 0;
+	mlx4_rx_intr_vec_disable(priv);
+	rte_errno = err;
+	return 0;
+}
+
+/**
+ * Install interrupt handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_intr_install(struct priv *priv)
+{
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
+	int rc;
+
+	mlx4_intr_uninstall(priv);
+	if (intr_conf->rxq && mlx4_rx_intr_vec_enable(priv) < 0)
+		goto error;
+	if (intr_conf->lsc | intr_conf->rmv) {
+		priv->intr_handle.fd = priv->ctx->async_fd;
+		rc = rte_intr_callback_register(&priv->intr_handle,
+						(void (*)(void *))
+						mlx4_interrupt_handler,
+						priv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	mlx4_intr_uninstall(priv);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback for Rx queue interrupt disable.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	struct ibv_cq *ev_cq;
+	void *ev_ctx;
+	int ret;
+
+	if (!rxq || !rxq->channel) {
+		ret = EINVAL;
+	} else {
+		ret = ibv_get_cq_event(rxq->cq->channel, &ev_cq, &ev_ctx);
+		if (ret || ev_cq != rxq->cq)
+			ret = EINVAL;
+	}
+	if (ret) {
+		rte_errno = ret;
+		WARN("unable to disable interrupt on rx queue %d",
+		     idx);
+	} else {
+		ibv_ack_cq_events(rxq->cq, 1);
+	}
+	return -ret;
+}
+
+/**
+ * DPDK callback for Rx queue interrupt enable.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	int ret;
+
+	if (!rxq || !rxq->channel)
+		ret = EINVAL;
+	else
+		ret = ibv_req_notify_cq(rxq->cq, 0);
+	if (ret) {
+		rte_errno = ret;
+		WARN("unable to arm interrupt on rx queue %d", idx);
+	}
+	return -ret;
+}
-- 
2.1.4


* [PATCH v1 40/48] net/mlx4: separate Rx/Tx definitions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (38 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 39/48] net/mlx4: separate interrupt handling Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 41/48] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Except for a minor documentation update on internal structure definitions
to make them more Doxygen-friendly, there is no impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
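Note for readers: the WR_ID() macro moved by this patch views a 64-bit work
request ID as an (id, offset) pair through an anonymous union. The sketch
below illustrates the same layout with a named union. The names wr_id_view,
wr_id_pack(), wr_id_get_id() and wr_id_get_offset() are hypothetical and not
part of the PMD, and the static assertion only stands in for the
RTE_BUILD_BUG_ON() check that the patch drops from rte_mlx4_pmd_init().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical named equivalent of the anonymous union used by WR_ID(). */
union wr_id_view {
	struct {
		uint32_t id;     /* Element index. */
		uint16_t offset; /* Distance from mbuf start to its data. */
	} data;
	uint64_t raw;            /* Value stored in wr.wr_id. */
};

/* Compile-time size check (C11), assumed to hold on common ABIs. */
_Static_assert(sizeof(union wr_id_view) == sizeof(uint64_t),
	       "work request ID view must be exactly 64 bits");

/* Build a raw 64-bit work request ID from its two fields. */
uint64_t
wr_id_pack(uint32_t id, uint16_t offset)
{
	union wr_id_view v = { .raw = 0 };

	v.data.id = id;
	v.data.offset = offset;
	return v.raw;
}

/* Extract the element index from a raw work request ID. */
uint32_t
wr_id_get_id(uint64_t wr_id)
{
	union wr_id_view v = { .raw = wr_id };

	return v.data.id;
}

/* Extract the data offset from a raw work request ID. */
uint16_t
wr_id_get_offset(uint64_t wr_id)
{
	union wr_id_view v = { .raw = wr_id };

	assert(v.data.id == wr_id_get_id(wr_id));
	return v.data.offset;
}
```

Going through a union object rather than a pointer cast is a choice made for
this sketch only; the macro in the patch keeps the cast so that it can also be
used as an lvalue.
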
 drivers/net/mlx4/mlx4.c      |  13 +---
 drivers/net/mlx4/mlx4.h      |  69 +-------------------
 drivers/net/mlx4/mlx4_flow.c |   1 +
 drivers/net/mlx4/mlx4_intr.c |   1 +
 drivers/net/mlx4/mlx4_rxtx.h | 132 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 137 insertions(+), 79 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 284575c..21f0664 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -71,19 +71,9 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
-/* Work Request ID data type (64 bit). */
-typedef union {
-	struct {
-		uint32_t id;
-		uint16_t offset;
-	} data;
-	uint64_t raw;
-} wr_id_t;
-
-#define WR_ID(o) (((wr_id_t *)&(o))->data)
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -3083,7 +3073,6 @@ RTE_INIT(rte_mlx4_pmd_init);
 static void
 rte_mlx4_pmd_init(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(wr_id_t) != sizeof(uint64_t));
 	/*
 	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
 	 * huge pages. Calling ibv_fork_init() during init allows
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index f815dd8..edbece6 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -85,73 +85,8 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-struct mlx4_rxq_stats {
-	unsigned int idx; /**< Mapping index. */
-	uint64_t ipackets; /**< Total of successfully received packets. */
-	uint64_t ibytes; /**< Total of successfully received bytes. */
-	uint64_t idropped; /**< Total of packets dropped when RX ring full. */
-	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
-};
-
-/* RX element. */
-struct rxq_elt {
-	struct ibv_recv_wr wr; /* Work Request. */
-	struct ibv_sge sge; /* Scatter/Gather Element. */
-	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
-};
-
-/* RX queue descriptor. */
-struct rxq {
-	struct priv *priv; /* Back pointer to private data. */
-	struct rte_mempool *mp; /* Memory Pool for allocations. */
-	struct ibv_mr *mr; /* Memory Region (for mp). */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_comp_channel *channel;
-	unsigned int port_id; /* Port ID for incoming packets. */
-	unsigned int elts_n; /* (*elts)[] length. */
-	unsigned int elts_head; /* Current index in (*elts)[]. */
-	struct rxq_elt (*elts)[]; /* RX elements. */
-	struct mlx4_rxq_stats stats; /* RX queue counters. */
-	unsigned int socket; /* CPU socket ID for allocations. */
-};
-
-/* TX element. */
-struct txq_elt {
-	struct ibv_send_wr wr; /* Work request. */
-	struct ibv_sge sge; /* Scatter/gather element. */
-	struct rte_mbuf *buf;
-};
-
-struct mlx4_txq_stats {
-	unsigned int idx; /**< Mapping index. */
-	uint64_t opackets; /**< Total of successfully sent packets. */
-	uint64_t obytes;   /**< Total of successfully sent bytes. */
-	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
-};
-
-/* TX queue descriptor. */
-struct txq {
-	struct priv *priv; /* Back pointer to private data. */
-	struct {
-		const struct rte_mempool *mp; /* Cached Memory Pool. */
-		struct ibv_mr *mr; /* Memory Region (for mp). */
-		uint32_t lkey; /* mr->lkey */
-	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
-	unsigned int elts_n; /* (*elts)[] length. */
-	struct txq_elt (*elts)[]; /* TX elements. */
-	unsigned int elts_head; /* Current index in (*elts)[]. */
-	unsigned int elts_tail; /* First element awaiting completion. */
-	unsigned int elts_comp; /* Number of completion requests. */
-	unsigned int elts_comp_cd; /* Countdown for next completion request. */
-	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
-	struct mlx4_txq_stats stats; /* TX queue counters. */
-	unsigned int socket; /* CPU socket ID for allocations. */
-};
-
+struct rxq;
+struct txq;
 struct rte_flow;
 
 struct priv {
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 6f6f455..61455ce 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -40,6 +40,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
 /** Static initializer for items. */
diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index bcf4d59..76d2e01 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -56,6 +56,7 @@
 #include <rte_interrupts.h>
 
 #include "mlx4.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
 static void mlx4_link_status_alarm(struct priv *priv);
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
new file mode 100644
index 0000000..1d46e1e
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -0,0 +1,132 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef MLX4_RXTX_H_
+#define MLX4_RXTX_H_
+
+#include <stdint.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+
+/** Convert work request ID to data. */
+#define WR_ID(o) \
+	(((union { \
+		struct { \
+			uint32_t id; \
+			uint16_t offset; \
+		} data; \
+		uint64_t raw; \
+	} *)&(o))->data)
+
+/** Rx queue counters. */
+struct mlx4_rxq_stats {
+	unsigned int idx; /**< Mapping index. */
+	uint64_t ipackets; /**< Total of successfully received packets. */
+	uint64_t ibytes; /**< Total of successfully received bytes. */
+	uint64_t idropped; /**< Total of packets dropped when Rx ring full. */
+	uint64_t rx_nombuf; /**< Total of Rx mbuf allocation failures. */
+};
+
+/** Rx element. */
+struct rxq_elt {
+	struct ibv_recv_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
+	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+};
+
+/** Rx queue descriptor. */
+struct rxq {
+	struct priv *priv; /**< Back pointer to private data. */
+	struct rte_mempool *mp; /**< Memory pool for allocations. */
+	struct ibv_mr *mr; /**< Memory region (for mp). */
+	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	struct ibv_comp_channel *channel; /**< Rx completion channel. */
+	unsigned int port_id; /**< Port ID for incoming packets. */
+	unsigned int elts_n; /**< (*elts)[] length. */
+	unsigned int elts_head; /**< Current index in (*elts)[]. */
+	struct rxq_elt (*elts)[]; /**< Rx elements. */
+	struct mlx4_rxq_stats stats; /**< Rx queue counters. */
+	unsigned int socket; /**< CPU socket ID for allocations. */
+};
+
+/** Tx element. */
+struct txq_elt {
+	struct ibv_send_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
+	struct rte_mbuf *buf; /**< Buffer. */
+};
+
+/** Tx queue counters. */
+struct mlx4_txq_stats {
+	unsigned int idx; /**< Mapping index. */
+	uint64_t opackets; /**< Total of successfully sent packets. */
+	uint64_t obytes; /**< Total of successfully sent bytes. */
+	uint64_t odropped; /**< Total of packets not sent when Tx ring full. */
+};
+
+/** Tx queue descriptor. */
+struct txq {
+	struct priv *priv; /**< Back pointer to private data. */
+	struct {
+		const struct rte_mempool *mp; /**< Cached memory pool. */
+		struct ibv_mr *mr; /**< Memory region (for mp). */
+		uint32_t lkey; /**< mr->lkey copy. */
+	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /**< MP to MR translation table. */
+	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	uint32_t max_inline; /**< Max inline send size. */
+	unsigned int elts_n; /**< (*elts)[] length. */
+	struct txq_elt (*elts)[]; /**< Tx elements. */
+	unsigned int elts_head; /**< Current index in (*elts)[]. */
+	unsigned int elts_tail; /**< First element awaiting completion. */
+	unsigned int elts_comp; /**< Number of completion requests. */
+	unsigned int elts_comp_cd; /**< Countdown for next completion. */
+	unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
+	struct mlx4_txq_stats stats; /**< Tx queue counters. */
+	unsigned int socket; /**< CPU socket ID for allocations. */
+};
+
+#endif /* MLX4_RXTX_H_ */
-- 
2.1.4


* [PATCH v1 41/48] net/mlx4: separate Rx/Tx functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (39 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 40/48] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 42/48] net/mlx4: separate device control functions Adrien Mazarguil
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

This commit groups all data plane functions (Rx/Tx) into a separate file
and adjusts header files accordingly.

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
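Note for readers: mlx4_tx_burst(), moved by this patch, computes the number
of free Tx ring entries with unsigned arithmetic and always keeps one entry
unused. A minimal standalone sketch of that computation follows. The function
name demo_tx_ring_room() is hypothetical, and both indices are assumed to be
smaller than elts_n, as they are in the PMD.

```c
#include <assert.h>

/*
 * Return how many elements can be posted to a ring of elts_n entries,
 * given the producer index (elts_head) and the first entry still awaiting
 * completion (elts_tail). One entry is always left free.
 */
unsigned int
demo_tx_ring_room(unsigned int elts_n, unsigned int elts_head,
		  unsigned int elts_tail)
{
	/* May wrap around when elts_head < elts_tail. */
	unsigned int max = elts_n - (elts_head - elts_tail);

	/* Undo the wrap-around so that max lands in [1, elts_n]. */
	if (max > elts_n)
		max -= elts_n;
	assert(max >= 1);
	assert(max <= elts_n);
	/* Always leave one free entry in the ring. */
	return max - 1;
}
```

For example, with elts_n = 8, head = 2 and tail = 5, five entries are in
flight, three are free and two may be used.
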
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 493 +----------------------------------
 drivers/net/mlx4/mlx4.h      |   2 +
 drivers/net/mlx4/mlx4_rxtx.c | 533 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_rxtx.h |  12 +
 5 files changed, 554 insertions(+), 487 deletions(-)
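
Note for readers: the renamed mlx4_rx_burst_removed() and
mlx4_tx_burst_removed() callbacks are dummies that always report zero packets,
installed in place of the real burst functions on close or link down. The
sketch below shows the pattern in isolation. struct demo_dev, struct demo_pkt
and every demo_*() function are hypothetical stand-ins, not DPDK or PMD
definitions.

```c
#include <stdint.h>

struct demo_pkt; /* Opaque stand-in for struct rte_mbuf. */

/* Burst callback signature, modeled on the Rx/Tx callbacks in the diff. */
typedef uint16_t (*demo_burst_t)(void *queue, struct demo_pkt **pkts,
				 uint16_t pkts_n);

/* Hypothetical device holding the two data plane entry points. */
struct demo_dev {
	demo_burst_t rx_pkt_burst;
	demo_burst_t tx_pkt_burst;
};

/* Dummy callback: never receives nor transmits anything. */
uint16_t
demo_burst_removed(void *queue, struct demo_pkt **pkts, uint16_t pkts_n)
{
	(void)queue;
	(void)pkts;
	(void)pkts_n;
	return 0;
}

/* Pretend Tx callback that accepts every packet it is given. */
uint16_t
demo_tx_burst(void *queue, struct demo_pkt **pkts, uint16_t pkts_n)
{
	(void)queue;
	(void)pkts;
	return pkts_n;
}

/* Redirect the data plane to the dummies before tearing down queues. */
void
demo_dev_close(struct demo_dev *dev)
{
	dev->rx_pkt_burst = demo_burst_removed;
	dev->tx_pkt_burst = demo_burst_removed;
}
```

Swapping function pointers does not stop a burst call that is already
running, which is why the diff keeps the "XXX race condition" comment and a
short sleep in mlx4_dev_close().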

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 37dcdf7..1a7e847 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -38,6 +38,7 @@ LIB = librte_pmd_mlx4.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 21f0664..5530257 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -56,13 +56,11 @@
 #include <rte_mbuf.h>
 #include <rte_errno.h>
 #include <rte_mempool.h>
-#include <rte_prefetch.h>
 #include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
-#include <rte_branch_prediction.h>
 #include <rte_common.h>
 
 /* Generated configuration header. */
@@ -505,9 +503,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
-static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
-
 /* TX queues handling. */
 
 /**
@@ -630,53 +625,6 @@ txq_cleanup(struct txq *txq)
 	memset(txq, 0, sizeof(*txq));
 }
 
-/**
- * Manage TX completions.
- *
- * When sending a burst, mlx4_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required once every
- * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
- * for other WRs, but this information would not be used anyway.
- *
- * @param txq
- *   Pointer to TX queue structure.
- *
- * @return
- *   0 on success, -1 on failure.
- */
-static int
-txq_complete(struct txq *txq)
-{
-	unsigned int elts_comp = txq->elts_comp;
-	unsigned int elts_tail = txq->elts_tail;
-	const unsigned int elts_n = txq->elts_n;
-	struct ibv_wc wcs[elts_comp];
-	int wcs_n;
-
-	if (unlikely(elts_comp == 0))
-		return 0;
-	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
-	if (unlikely(wcs_n == 0))
-		return 0;
-	if (unlikely(wcs_n < 0)) {
-		DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
-		      (void *)txq, wcs_n);
-		return -1;
-	}
-	elts_comp -= wcs_n;
-	assert(elts_comp <= txq->elts_comp);
-	/*
-	 * Assume WC status is successful as nothing can be done about it
-	 * anyway.
-	 */
-	elts_tail += wcs_n * txq->elts_comp_cd_init;
-	if (elts_tail >= elts_n)
-		elts_tail -= elts_n;
-	txq->elts_tail = elts_tail;
-	txq->elts_comp = elts_comp;
-	return 0;
-}
-
 struct mlx4_check_mempool_data {
 	int ret;
 	char *start;
@@ -738,10 +686,6 @@ static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
 	return data.ret;
 }
 
-/* For best performance, this function should not be inlined. */
-static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
-	__rte_noinline;
-
 /**
  * Register mempool as a memory region.
  *
@@ -753,7 +697,7 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
  * @return
  *   Memory region pointer, NULL in case of error and rte_errno is set.
  */
-static struct ibv_mr *
+struct ibv_mr *
 mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 {
 	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
@@ -794,81 +738,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-/**
- * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which
- * the cloned mbuf is allocated is returned instead.
- *
- * @param buf
- *   Pointer to mbuf.
- *
- * @return
- *   Memory pool where data is located for given mbuf.
- */
-static struct rte_mempool *
-txq_mb2mp(struct rte_mbuf *buf)
-{
-	if (unlikely(RTE_MBUF_INDIRECT(buf)))
-		return rte_mbuf_from_indirect(buf)->pool;
-	return buf->pool;
-}
-
-/**
- * Get Memory Region (MR) <-> Memory Pool (MP) association from txq->mp2mr[].
- * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
- * remove an entry first.
- *
- * @param txq
- *   Pointer to TX queue structure.
- * @param[in] mp
- *   Memory Pool for which a Memory Region lkey must be returned.
- *
- * @return
- *   mr->lkey on success, (uint32_t)-1 on failure.
- */
-static uint32_t
-txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
-{
-	unsigned int i;
-	struct ibv_mr *mr;
-
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (unlikely(txq->mp2mr[i].mp == NULL)) {
-			/* Unknown MP, add a new MR for it. */
-			break;
-		}
-		if (txq->mp2mr[i].mp == mp) {
-			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
-			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
-			return txq->mp2mr[i].lkey;
-		}
-	}
-	/* Add a new entry, register MR first. */
-	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
-	      (void *)txq, mp->name, (void *)mp);
-	mr = mlx4_mp2mr(txq->priv->pd, mp);
-	if (unlikely(mr == NULL)) {
-		DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
-		      (void *)txq);
-		return (uint32_t)-1;
-	}
-	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
-		/* Table is full, remove oldest entry. */
-		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
-		      (void *)txq);
-		--i;
-		claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
-		memmove(&txq->mp2mr[0], &txq->mp2mr[1],
-			(sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
-	}
-	/* Store the new entry. */
-	txq->mp2mr[i].mp = mp;
-	txq->mp2mr[i].mr = mr;
-	txq->mp2mr[i].lkey = mr->lkey;
-	DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIu32,
-	      (void *)txq, mp->name, (void *)mp, txq->mp2mr[i].lkey);
-	return txq->mp2mr[i].lkey;
-}
-
 struct txq_mp2mr_mbuf_check_data {
 	int ret;
 };
@@ -923,172 +792,7 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 	if (rte_mempool_obj_iter(mp, txq_mp2mr_mbuf_check, &data) == 0 ||
 			data.ret == -1)
 		return;
-	txq_mp2mr(txq, mp);
-}
-
-/**
- * DPDK callback for TX.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct txq *txq = (struct txq *)dpdk_txq;
-	struct ibv_send_wr *wr_head = NULL;
-	struct ibv_send_wr **wr_next = &wr_head;
-	struct ibv_send_wr *wr_bad = NULL;
-	unsigned int elts_head = txq->elts_head;
-	const unsigned int elts_n = txq->elts_n;
-	unsigned int elts_comp_cd = txq->elts_comp_cd;
-	unsigned int elts_comp = 0;
-	unsigned int i;
-	unsigned int max;
-	int err;
-
-	assert(elts_comp_cd != 0);
-	txq_complete(txq);
-	max = (elts_n - (elts_head - txq->elts_tail));
-	if (max > elts_n)
-		max -= elts_n;
-	assert(max >= 1);
-	assert(max <= elts_n);
-	/* Always leave one free entry in the ring. */
-	--max;
-	if (max == 0)
-		return 0;
-	if (max > pkts_n)
-		max = pkts_n;
-	for (i = 0; (i != max); ++i) {
-		struct rte_mbuf *buf = pkts[i];
-		unsigned int elts_head_next =
-			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
-		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
-		struct txq_elt *elt = &(*txq->elts)[elts_head];
-		struct ibv_send_wr *wr = &elt->wr;
-		unsigned int segs = buf->nb_segs;
-		unsigned int sent_size = 0;
-		uint32_t send_flags = 0;
-
-		/* Clean up old buffer. */
-		if (likely(elt->buf != NULL)) {
-			struct rte_mbuf *tmp = elt->buf;
-
-#ifndef NDEBUG
-			/* Poisoning. */
-			memset(elt, 0x66, sizeof(*elt));
-#endif
-			/* Faster than rte_pktmbuf_free(). */
-			do {
-				struct rte_mbuf *next = tmp->next;
-
-				rte_pktmbuf_free_seg(tmp);
-				tmp = next;
-			} while (tmp != NULL);
-		}
-		/* Request TX completion. */
-		if (unlikely(--elts_comp_cd == 0)) {
-			elts_comp_cd = txq->elts_comp_cd_init;
-			++elts_comp;
-			send_flags |= IBV_SEND_SIGNALED;
-		}
-		if (likely(segs == 1)) {
-			struct ibv_sge *sge = &elt->sge;
-			uintptr_t addr;
-			uint32_t length;
-			uint32_t lkey;
-
-			/* Retrieve buffer information. */
-			addr = rte_pktmbuf_mtod(buf, uintptr_t);
-			length = buf->data_len;
-			/* Retrieve Memory Region key for this memory pool. */
-			lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-			if (unlikely(lkey == (uint32_t)-1)) {
-				/* MR does not exist. */
-				DEBUG("%p: unable to get MP <-> MR"
-				      " association", (void *)txq);
-				/* Clean up TX element. */
-				elt->buf = NULL;
-				goto stop;
-			}
-			/* Update element. */
-			elt->buf = buf;
-			if (txq->priv->vf)
-				rte_prefetch0((volatile void *)
-					      (uintptr_t)addr);
-			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			sge->addr = addr;
-			sge->length = length;
-			sge->lkey = lkey;
-			sent_size += length;
-		} else {
-			err = -1;
-			goto stop;
-		}
-		if (sent_size <= txq->max_inline)
-			send_flags |= IBV_SEND_INLINE;
-		elts_head = elts_head_next;
-		/* Increment sent bytes counter. */
-		txq->stats.obytes += sent_size;
-		/* Set up WR. */
-		wr->sg_list = &elt->sge;
-		wr->num_sge = segs;
-		wr->opcode = IBV_WR_SEND;
-		wr->send_flags = send_flags;
-		*wr_next = wr;
-		wr_next = &wr->next;
-	}
-stop:
-	/* Take a shortcut if nothing must be sent. */
-	if (unlikely(i == 0))
-		return 0;
-	/* Increment sent packets counter. */
-	txq->stats.opackets += i;
-	/* Ring QP doorbell. */
-	*wr_next = NULL;
-	assert(wr_head);
-	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
-	if (unlikely(err)) {
-		uint64_t obytes = 0;
-		uint64_t opackets = 0;
-
-		/* Rewind bad WRs. */
-		while (wr_bad != NULL) {
-			int j;
-
-			/* Force completion request if one was lost. */
-			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
-				elts_comp_cd = 1;
-				--elts_comp;
-			}
-			++opackets;
-			for (j = 0; j < wr_bad->num_sge; ++j)
-				obytes += wr_bad->sg_list[j].length;
-			elts_head = (elts_head ? elts_head : elts_n) - 1;
-			wr_bad = wr_bad->next;
-		}
-		txq->stats.opackets -= opackets;
-		txq->stats.obytes -= obytes;
-		i -= opackets;
-		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
-		      " (%" PRIu64 " bytes) rejected: %s",
-		      (void *)txq,
-		      opackets,
-		      obytes,
-		      (err <= -1) ? "Internal error" : strerror(err));
-	}
-	txq->elts_head = elts_head;
-	txq->elts_comp += elts_comp;
-	txq->elts_comp_cd = elts_comp_cd;
-	return i;
+	mlx4_txq_mp2mr(txq, mp);
 }
 
 /**
@@ -1597,141 +1301,6 @@ rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * DPDK callback for RX.
- *
- * The following function doesn't manage scattered packets.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
-	const unsigned int elts_n = rxq->elts_n;
-	unsigned int elts_head = rxq->elts_head;
-	struct ibv_wc wcs[pkts_n];
-	struct ibv_recv_wr *wr_head = NULL;
-	struct ibv_recv_wr **wr_next = &wr_head;
-	struct ibv_recv_wr *wr_bad = NULL;
-	unsigned int i;
-	unsigned int pkts_ret = 0;
-	int ret;
-
-	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
-	if (unlikely(ret == 0))
-		return 0;
-	if (unlikely(ret < 0)) {
-		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
-		      (void *)rxq, ret);
-		return 0;
-	}
-	assert(ret <= (int)pkts_n);
-	/* For each work completion. */
-	for (i = 0; i != (unsigned int)ret; ++i) {
-		struct ibv_wc *wc = &wcs[i];
-		struct rxq_elt *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
-		uint32_t len = wc->byte_len;
-		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(wr_id).offset);
-		struct rte_mbuf *rep;
-
-		/* Sanity checks. */
-		assert(WR_ID(wr_id).id < rxq->elts_n);
-		assert(wr_id == wc->wr_id);
-		assert(wr->sg_list == &elt->sge);
-		assert(wr->num_sge == 1);
-		assert(elts_head < rxq->elts_n);
-		assert(rxq->elts_head < rxq->elts_n);
-		/*
-		 * Fetch initial bytes of packet descriptor into a
-		 * cacheline while allocating rep.
-		 */
-		rte_mbuf_prefetch_part1(seg);
-		rte_mbuf_prefetch_part2(seg);
-		/* Link completed WRs together for repost. */
-		*wr_next = wr;
-		wr_next = &wr->next;
-		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
-			/* Whatever, just repost the offending WR. */
-			DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
-			      " status (%d): %s",
-			      (void *)rxq, wr_id, wc->status,
-			      ibv_wc_status_str(wc->status));
-			/* Increment dropped packets counter. */
-			++rxq->stats.idropped;
-			goto repost;
-		}
-		rep = rte_mbuf_raw_alloc(rxq->mp);
-		if (unlikely(rep == NULL)) {
-			/*
-			 * Unable to allocate a replacement mbuf,
-			 * repost WR.
-			 */
-			DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
-			      " can't allocate a new mbuf",
-			      (void *)rxq, WR_ID(wr_id).id);
-			/* Increase out of memory counters. */
-			++rxq->stats.rx_nombuf;
-			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-			goto repost;
-		}
-		/* Reconfigure sge to use rep instead of seg. */
-		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
-		assert(elt->sge.lkey == rxq->mr->lkey);
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)rep);
-		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
-		/* Update seg information. */
-		seg->data_off = RTE_PKTMBUF_HEADROOM;
-		seg->nb_segs = 1;
-		seg->port = rxq->port_id;
-		seg->next = NULL;
-		seg->pkt_len = len;
-		seg->data_len = len;
-		seg->packet_type = 0;
-		seg->ol_flags = 0;
-		/* Return packet. */
-		*(pkts++) = seg;
-		++pkts_ret;
-		/* Increase bytes counter. */
-		rxq->stats.ibytes += len;
-repost:
-		if (++elts_head >= elts_n)
-			elts_head = 0;
-		continue;
-	}
-	if (unlikely(i == 0))
-		return 0;
-	/* Repost WRs. */
-	*wr_next = NULL;
-	assert(wr_head);
-	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: recv_burst(): failed (ret=%d)",
-		      (void *)rxq->priv,
-		      ret);
-		abort();
-	}
-	rxq->elts_head = elts_head;
-	/* Increase packets counter. */
-	rxq->stats.ipackets += pkts_ret;
-	return pkts_ret;
-}
-
-/**
  * Allocate a Queue Pair.
  * Optionally setup inline receive if supported.
  *
@@ -2092,56 +1661,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 }
 
 /**
- * Dummy DPDK callback for TX.
- *
- * This function is used to temporarily replace the real callback during
- * unsafe control operations on the queue, or in case of error.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	(void)dpdk_txq;
-	(void)pkts;
-	(void)pkts_n;
-	return 0;
-}
-
-/**
- * Dummy DPDK callback for RX.
- *
- * This function is used to temporarily replace the real callback during
- * unsafe control operations on the queue, or in case of error.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	(void)dpdk_rxq;
-	(void)pkts;
-	(void)pkts_n;
-	return 0;
-}
-
-/**
  * DPDK callback to close the device.
  *
  * Destroy all queues and objects, free memory.
@@ -2167,8 +1686,8 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
 	 * never release them before closing the device.
 	 */
-	dev->rx_pkt_burst = removed_rx_burst;
-	dev->tx_pkt_burst = removed_tx_burst;
+	dev->rx_pkt_burst = mlx4_rx_burst_removed;
+	dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	if (priv->rxqs != NULL) {
 		/* XXX race condition if mlx4_rx_burst() is still running. */
 		usleep(1000);
@@ -2233,8 +1752,8 @@ priv_set_link(struct priv *priv, int up)
 		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = removed_rx_burst;
-		dev->tx_pkt_burst = removed_tx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst_removed;
+		dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index edbece6..efccf1a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -49,6 +49,7 @@
 #include <rte_ethdev.h>
 #include <rte_ether.h>
 #include <rte_interrupts.h>
+#include <rte_mempool.h>
 
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
@@ -115,6 +116,7 @@ struct priv {
 
 /* mlx4.c */
 
+struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
 int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
 
 /* mlx4_intr.c */
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
new file mode 100644
index 0000000..944cf48
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -0,0 +1,533 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Data plane functions for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_prefetch.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Manage Tx completions.
+ *
+ * When sending a burst, mlx4_tx_burst() posts several WRs.
+ * To improve performance, a completion event is only required once every
+ * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+static int
+mlx4_txq_complete(struct txq *txq)
+{
+	unsigned int elts_comp = txq->elts_comp;
+	unsigned int elts_tail = txq->elts_tail;
+	const unsigned int elts_n = txq->elts_n;
+	struct ibv_wc wcs[elts_comp];
+	int wcs_n;
+
+	if (unlikely(elts_comp == 0))
+		return 0;
+	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+	if (unlikely(wcs_n == 0))
+		return 0;
+	if (unlikely(wcs_n < 0)) {
+		DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
+		      (void *)txq, wcs_n);
+		return -1;
+	}
+	elts_comp -= wcs_n;
+	assert(elts_comp <= txq->elts_comp);
+	/*
+	 * Assume WC status is successful as nothing can be done about it
+	 * anyway.
+	 */
+	elts_tail += wcs_n * txq->elts_comp_cd_init;
+	if (elts_tail >= elts_n)
+		elts_tail -= elts_n;
+	txq->elts_tail = elts_tail;
+	txq->elts_comp = elts_comp;
+	return 0;
+}
+
+/**
+ * Get memory pool (MP) from mbuf. If mbuf is indirect, the pool from which
+ * the cloned mbuf is allocated is returned instead.
+ *
+ * @param buf
+ *   Pointer to mbuf.
+ *
+ * @return
+ *   Memory pool where data is located for given mbuf.
+ */
+static struct rte_mempool *
+mlx4_txq_mb2mp(struct rte_mbuf *buf)
+{
+	if (unlikely(RTE_MBUF_INDIRECT(buf)))
+		return rte_mbuf_from_indirect(buf)->pool;
+	return buf->pool;
+}
+
+/**
+ * Get memory region (MR) <-> memory pool (MP) association from txq->mp2mr[].
+ * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
+ * remove an entry first.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ * @param[in] mp
+ *   Memory pool for which a memory region lkey must be returned.
+ *
+ * @return
+ *   mr->lkey on success, (uint32_t)-1 on failure.
+ */
+uint32_t
+mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
+{
+	unsigned int i;
+	struct ibv_mr *mr;
+
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+		if (unlikely(txq->mp2mr[i].mp == NULL)) {
+			/* Unknown MP, add a new MR for it. */
+			break;
+		}
+		if (txq->mp2mr[i].mp == mp) {
+			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
+			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
+			return txq->mp2mr[i].lkey;
+		}
+	}
+	/* Add a new entry, register MR first. */
+	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
+	      (void *)txq, mp->name, (void *)mp);
+	mr = mlx4_mp2mr(txq->priv->pd, mp);
+	if (unlikely(mr == NULL)) {
+		DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
+		      (void *)txq);
+		return (uint32_t)-1;
+	}
+	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
+		/* Table is full, remove oldest entry. */
+		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
+		      (void *)txq);
+		--i;
+		claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
+		memmove(&txq->mp2mr[0], &txq->mp2mr[1],
+			(sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
+	}
+	/* Store the new entry. */
+	txq->mp2mr[i].mp = mp;
+	txq->mp2mr[i].mr = mr;
+	txq->mp2mr[i].lkey = mr->lkey;
+	DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIx32,
+	      (void *)txq, mp->name, (void *)mp, txq->mp2mr[i].lkey);
+	return txq->mp2mr[i].lkey;
+}
+
+/**
+ * DPDK callback for Tx.
+ *
+ * @param dpdk_txq
+ *   Generic pointer to Tx queue structure.
+ * @param[in] pkts
+ *   Packets to transmit.
+ * @param pkts_n
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct txq *txq = (struct txq *)dpdk_txq;
+	struct ibv_send_wr *wr_head = NULL;
+	struct ibv_send_wr **wr_next = &wr_head;
+	struct ibv_send_wr *wr_bad = NULL;
+	unsigned int elts_head = txq->elts_head;
+	const unsigned int elts_n = txq->elts_n;
+	unsigned int elts_comp_cd = txq->elts_comp_cd;
+	unsigned int elts_comp = 0;
+	unsigned int i;
+	unsigned int max;
+	int err;
+
+	assert(elts_comp_cd != 0);
+	mlx4_txq_complete(txq);
+	max = (elts_n - (elts_head - txq->elts_tail));
+	if (max > elts_n)
+		max -= elts_n;
+	assert(max >= 1);
+	assert(max <= elts_n);
+	/* Always leave one free entry in the ring. */
+	--max;
+	if (max == 0)
+		return 0;
+	if (max > pkts_n)
+		max = pkts_n;
+	for (i = 0; (i != max); ++i) {
+		struct rte_mbuf *buf = pkts[i];
+		unsigned int elts_head_next =
+			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
+		struct txq_elt *elt = &(*txq->elts)[elts_head];
+		struct ibv_send_wr *wr = &elt->wr;
+		unsigned int segs = buf->nb_segs;
+		unsigned int sent_size = 0;
+		uint32_t send_flags = 0;
+
+		/* Clean up old buffer. */
+		if (likely(elt->buf != NULL)) {
+			struct rte_mbuf *tmp = elt->buf;
+
+#ifndef NDEBUG
+			/* Poisoning. */
+			memset(elt, 0x66, sizeof(*elt));
+#endif
+			/* Faster than rte_pktmbuf_free(). */
+			do {
+				struct rte_mbuf *next = tmp->next;
+
+				rte_pktmbuf_free_seg(tmp);
+				tmp = next;
+			} while (tmp != NULL);
+		}
+		/* Request Tx completion. */
+		if (unlikely(--elts_comp_cd == 0)) {
+			elts_comp_cd = txq->elts_comp_cd_init;
+			++elts_comp;
+			send_flags |= IBV_SEND_SIGNALED;
+		}
+		if (likely(segs == 1)) {
+			struct ibv_sge *sge = &elt->sge;
+			uintptr_t addr;
+			uint32_t length;
+			uint32_t lkey;
+
+			/* Retrieve buffer information. */
+			addr = rte_pktmbuf_mtod(buf, uintptr_t);
+			length = buf->data_len;
+			/* Retrieve memory region key for this memory pool. */
+			lkey = mlx4_txq_mp2mr(txq, mlx4_txq_mb2mp(buf));
+			if (unlikely(lkey == (uint32_t)-1)) {
+				/* MR does not exist. */
+				DEBUG("%p: unable to get MP <-> MR"
+				      " association", (void *)txq);
+				/* Clean up Tx element. */
+				elt->buf = NULL;
+				goto stop;
+			}
+			/* Update element. */
+			elt->buf = buf;
+			if (txq->priv->vf)
+				rte_prefetch0((volatile void *)
+					      (uintptr_t)addr);
+			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+			sge->addr = addr;
+			sge->length = length;
+			sge->lkey = lkey;
+			sent_size += length;
+		} else {
+			err = -1;
+			goto stop;
+		}
+		if (sent_size <= txq->max_inline)
+			send_flags |= IBV_SEND_INLINE;
+		elts_head = elts_head_next;
+		/* Increment sent bytes counter. */
+		txq->stats.obytes += sent_size;
+		/* Set up WR. */
+		wr->sg_list = &elt->sge;
+		wr->num_sge = segs;
+		wr->opcode = IBV_WR_SEND;
+		wr->send_flags = send_flags;
+		*wr_next = wr;
+		wr_next = &wr->next;
+	}
+stop:
+	/* Take a shortcut if nothing must be sent. */
+	if (unlikely(i == 0))
+		return 0;
+	/* Increment sent packets counter. */
+	txq->stats.opackets += i;
+	/* Ring QP doorbell. */
+	*wr_next = NULL;
+	assert(wr_head);
+	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
+	if (unlikely(err)) {
+		uint64_t obytes = 0;
+		uint64_t opackets = 0;
+
+		/* Rewind bad WRs. */
+		while (wr_bad != NULL) {
+			int j;
+
+			/* Force completion request if one was lost. */
+			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
+				elts_comp_cd = 1;
+				--elts_comp;
+			}
+			++opackets;
+			for (j = 0; j < wr_bad->num_sge; ++j)
+				obytes += wr_bad->sg_list[j].length;
+			elts_head = (elts_head ? elts_head : elts_n) - 1;
+			wr_bad = wr_bad->next;
+		}
+		txq->stats.opackets -= opackets;
+		txq->stats.obytes -= obytes;
+		i -= opackets;
+		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
+		      " (%" PRIu64 " bytes) rejected: %s",
+		      (void *)txq,
+		      opackets,
+		      obytes,
+		      (err <= -1) ? "Internal error" : strerror(err));
+	}
+	txq->elts_head = elts_head;
+	txq->elts_comp += elts_comp;
+	txq->elts_comp_cd = elts_comp_cd;
+	return i;
+}
+
+/**
+ * DPDK callback for Rx.
+ *
+ * The following function doesn't manage scattered packets.
+ *
+ * @param dpdk_rxq
+ *   Generic pointer to Rx queue structure.
+ * @param[out] pkts
+ *   Array to store received packets.
+ * @param pkts_n
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct rxq *rxq = (struct rxq *)dpdk_rxq;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
+	const unsigned int elts_n = rxq->elts_n;
+	unsigned int elts_head = rxq->elts_head;
+	struct ibv_wc wcs[pkts_n];
+	struct ibv_recv_wr *wr_head = NULL;
+	struct ibv_recv_wr **wr_next = &wr_head;
+	struct ibv_recv_wr *wr_bad = NULL;
+	unsigned int i;
+	unsigned int pkts_ret = 0;
+	int ret;
+
+	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+	if (unlikely(ret == 0))
+		return 0;
+	if (unlikely(ret < 0)) {
+		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+		      (void *)rxq, ret);
+		return 0;
+	}
+	assert(ret <= (int)pkts_n);
+	/* For each work completion. */
+	for (i = 0; i != (unsigned int)ret; ++i) {
+		struct ibv_wc *wc = &wcs[i];
+		struct rxq_elt *elt = &(*elts)[elts_head];
+		struct ibv_recv_wr *wr = &elt->wr;
+		uint64_t wr_id = wr->wr_id;
+		uint32_t len = wc->byte_len;
+		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
+			WR_ID(wr_id).offset);
+		struct rte_mbuf *rep;
+
+		/* Sanity checks. */
+		assert(WR_ID(wr_id).id < rxq->elts_n);
+		assert(wr_id == wc->wr_id);
+		assert(wr->sg_list == &elt->sge);
+		assert(wr->num_sge == 1);
+		assert(elts_head < rxq->elts_n);
+		assert(rxq->elts_head < rxq->elts_n);
+		/*
+		 * Fetch initial bytes of packet descriptor into a
+		 * cacheline while allocating rep.
+		 */
+		rte_mbuf_prefetch_part1(seg);
+		rte_mbuf_prefetch_part2(seg);
+		/* Link completed WRs together for repost. */
+		*wr_next = wr;
+		wr_next = &wr->next;
+		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
+			/* Whatever, just repost the offending WR. */
+			DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
+			      " status (%d): %s",
+			      (void *)rxq, wr_id, wc->status,
+			      ibv_wc_status_str(wc->status));
+			/* Increment dropped packets counter. */
+			++rxq->stats.idropped;
+			goto repost;
+		}
+		rep = rte_mbuf_raw_alloc(rxq->mp);
+		if (unlikely(rep == NULL)) {
+			/*
+			 * Unable to allocate a replacement mbuf,
+			 * repost WR.
+			 */
+			DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
+			      " can't allocate a new mbuf",
+			      (void *)rxq, WR_ID(wr_id).id);
+			/* Increase out of memory counters. */
+			++rxq->stats.rx_nombuf;
+			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+			goto repost;
+		}
+		/* Reconfigure sge to use rep instead of seg. */
+		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
+		assert(elt->sge.lkey == rxq->mr->lkey);
+		WR_ID(wr->wr_id).offset =
+			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
+			 (uintptr_t)rep);
+		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+		/* Update seg information. */
+		seg->data_off = RTE_PKTMBUF_HEADROOM;
+		seg->nb_segs = 1;
+		seg->port = rxq->port_id;
+		seg->next = NULL;
+		seg->pkt_len = len;
+		seg->data_len = len;
+		seg->packet_type = 0;
+		seg->ol_flags = 0;
+		/* Return packet. */
+		*(pkts++) = seg;
+		++pkts_ret;
+		/* Increase bytes counter. */
+		rxq->stats.ibytes += len;
+repost:
+		if (++elts_head >= elts_n)
+			elts_head = 0;
+		continue;
+	}
+	if (unlikely(i == 0))
+		return 0;
+	/* Repost WRs. */
+	*wr_next = NULL;
+	assert(wr_head);
+	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
+	if (unlikely(ret)) {
+		/* Inability to repost WRs is fatal. */
+		DEBUG("%p: recv_burst(): failed (ret=%d)",
+		      (void *)rxq->priv,
+		      ret);
+		abort();
+	}
+	rxq->elts_head = elts_head;
+	/* Increase packets counter. */
+	rxq->stats.ipackets += pkts_ret;
+	return pkts_ret;
+}
+
+/**
+ * Dummy DPDK callback for Tx.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_txq
+ *   Generic pointer to Tx queue structure.
+ * @param[in] pkts
+ *   Packets to transmit.
+ * @param pkts_n
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	(void)dpdk_txq;
+	(void)pkts;
+	(void)pkts_n;
+	return 0;
+}
+
+/**
+ * Dummy DPDK callback for Rx.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_rxq
+ *   Generic pointer to Rx queue structure.
+ * @param[out] pkts
+ *   Array to store received packets.
+ * @param pkts_n
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	(void)dpdk_rxq;
+	(void)pkts;
+	(void)pkts_n;
+	return 0;
+}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 1d46e1e..ab44af5 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -129,4 +129,16 @@ struct txq {
 	unsigned int socket; /**< CPU socket ID for allocations. */
 };
 
+/* mlx4_rxtx.c */
+
+uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
+uint16_t mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts,
+		       uint16_t pkts_n);
+uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
+		       uint16_t pkts_n);
+uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
+			       uint16_t pkts_n);
+uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
+			       uint16_t pkts_n);
+
 #endif /* MLX4_RXTX_H_ */
-- 
2.1.4

* [PATCH v1 42/48] net/mlx4: separate device control functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (40 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 41/48] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 43/48] net/mlx4: separate Tx configuration functions Adrien Mazarguil
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

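Move device control functions (interface name lookup, sysfs and ioctl
helpers, MTU, link and statistics callbacks) from mlx4.c to a new file,
mlx4_ethdev.c.

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.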

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile      |   1 +
 drivers/net/mlx4/mlx4.c        | 752 +---------------------------------
 drivers/net/mlx4/mlx4.h        |  18 +
 drivers/net/mlx4/mlx4_ethdev.c | 792 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_utils.h  |   9 +
 5 files changed, 829 insertions(+), 743 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 1a7e847..6498eef 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -36,6 +36,7 @@ LIB = librte_pmd_mlx4.a
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5530257..b3213c0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -41,13 +41,6 @@
 #include <errno.h>
 #include <unistd.h>
 #include <assert.h>
-#include <net/if.h>
-#include <dirent.h>
-#include <sys/ioctl.h>
-#include <sys/socket.h>
-#include <netinet/in.h>
-#include <linux/ethtool.h>
-#include <linux/sockios.h>
 
 #include <rte_ether.h>
 #include <rte_ethdev.h>
@@ -86,370 +79,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
-	\
-	snprintf(name, sizeof(name), __VA_ARGS__)
-
-/**
- * Get interface name from private structure.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[out] ifname
- *   Interface name output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
-{
-	DIR *dir;
-	struct dirent *dent;
-	unsigned int dev_type = 0;
-	unsigned int dev_port_prev = ~0u;
-	char match[IF_NAMESIZE] = "";
-
-	{
-		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
-
-		dir = opendir(path);
-		if (dir == NULL) {
-			rte_errno = errno;
-			return -rte_errno;
-		}
-	}
-	while ((dent = readdir(dir)) != NULL) {
-		char *name = dent->d_name;
-		FILE *file;
-		unsigned int dev_port;
-		int r;
-
-		if ((name[0] == '.') &&
-		    ((name[1] == '\0') ||
-		     ((name[1] == '.') && (name[2] == '\0'))))
-			continue;
-
-		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ctx->device->ibdev_path, name,
-		      (dev_type ? "dev_id" : "dev_port"));
-
-		file = fopen(path, "rb");
-		if (file == NULL) {
-			if (errno != ENOENT)
-				continue;
-			/*
-			 * Switch to dev_id when dev_port does not exist as
-			 * is the case with Linux kernel versions < 3.15.
-			 */
-try_dev_id:
-			match[0] = '\0';
-			if (dev_type)
-				break;
-			dev_type = 1;
-			dev_port_prev = ~0u;
-			rewinddir(dir);
-			continue;
-		}
-		r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
-		fclose(file);
-		if (r != 1)
-			continue;
-		/*
-		 * Switch to dev_id when dev_port returns the same value for
-		 * all ports. May happen when using a MOFED release older than
-		 * 3.0 with a Linux kernel >= 3.15.
-		 */
-		if (dev_port == dev_port_prev)
-			goto try_dev_id;
-		dev_port_prev = dev_port;
-		if (dev_port == (priv->port - 1u))
-			snprintf(match, sizeof(match), "%s", name);
-	}
-	closedir(dir);
-	if (match[0] == '\0') {
-		rte_errno = ENODEV;
-		return -rte_errno;
-	}
-	strncpy(*ifname, match, sizeof(*ifname));
-	return 0;
-}
-
-/**
- * Read from sysfs entry.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[in] entry
- *   Entry name relative to sysfs path.
- * @param[out] buf
- *   Data output buffer.
- * @param size
- *   Buffer size.
- *
- * @return
- *   Number of bytes read on success, negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-priv_sysfs_read(const struct priv *priv, const char *entry,
-		char *buf, size_t size)
-{
-	char ifname[IF_NAMESIZE];
-	FILE *file;
-	int ret;
-
-	ret = priv_get_ifname(priv, &ifname);
-	if (ret)
-		return ret;
-
-	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
-	      ifname, entry);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = fread(buf, 1, size, file);
-	if ((size_t)ret < size && ferror(file)) {
-		rte_errno = EIO;
-		ret = -rte_errno;
-	} else {
-		ret = size;
-	}
-	fclose(file);
-	return ret;
-}
-
-/**
- * Write to sysfs entry.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[in] entry
- *   Entry name relative to sysfs path.
- * @param[in] buf
- *   Data buffer.
- * @param size
- *   Buffer size.
- *
- * @return
- *   Number of bytes written on success, negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-priv_sysfs_write(const struct priv *priv, const char *entry,
-		 char *buf, size_t size)
-{
-	char ifname[IF_NAMESIZE];
-	FILE *file;
-	int ret;
-
-	ret = priv_get_ifname(priv, &ifname);
-	if (ret)
-		return ret;
-
-	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
-	      ifname, entry);
-
-	file = fopen(path, "wb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = fwrite(buf, 1, size, file);
-	if ((size_t)ret < size || ferror(file)) {
-		rte_errno = EIO;
-		ret = -rte_errno;
-	} else {
-		ret = size;
-	}
-	fclose(file);
-	return ret;
-}
-
-/**
- * Get unsigned long sysfs property.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] name
- *   Entry name relative to sysfs path.
- * @param[out] value
- *   Value output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
-{
-	int ret;
-	unsigned long value_ret;
-	char value_str[32];
-
-	ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret < 0) {
-		DEBUG("cannot read %s value from sysfs: %s",
-		      name, strerror(rte_errno));
-		return ret;
-	}
-	value_str[ret] = '\0';
-	errno = 0;
-	value_ret = strtoul(value_str, NULL, 0);
-	if (errno) {
-		rte_errno = errno;
-		DEBUG("invalid %s value `%s': %s", name, value_str,
-		      strerror(rte_errno));
-		return -rte_errno;
-	}
-	*value = value_ret;
-	return 0;
-}
-
-/**
- * Set unsigned long sysfs property.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] name
- *   Entry name relative to sysfs path.
- * @param value
- *   Value to set.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
-{
-	int ret;
-	MKSTR(value_str, "%lu", value);
-
-	ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret < 0) {
-		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
-		      name, value_str, value, strerror(rte_errno));
-		return ret;
-	}
-	return 0;
-}
-
-/**
- * Perform ifreq ioctl() on associated Ethernet device.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param req
- *   Request number to pass to ioctl().
- * @param[out] ifr
- *   Interface request structure output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
-{
-	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-	int ret;
-
-	if (sock == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = priv_get_ifname(priv, &ifr->ifr_name);
-	if (!ret && ioctl(sock, req, ifr) == -1) {
-		rte_errno = errno;
-		ret = -rte_errno;
-	}
-	close(sock);
-	return ret;
-}
-
-/**
- * Get device MTU.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[out] mtu
- *   MTU value output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_mtu(struct priv *priv, uint16_t *mtu)
-{
-	unsigned long ulong_mtu = 0;
-	int ret = priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
-
-	if (ret)
-		return ret;
-	*mtu = ulong_mtu;
-	return 0;
-}
-
-/**
- * DPDK callback to change the MTU.
- *
- * @param priv
- *   Pointer to Ethernet device structure.
- * @param mtu
- *   MTU value to set.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
-{
-	struct priv *priv = dev->data->dev_private;
-	uint16_t new_mtu;
-	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
-
-	if (ret)
-		return ret;
-	ret = priv_get_mtu(priv, &new_mtu);
-	if (ret)
-		return ret;
-	if (new_mtu == mtu) {
-		priv->mtu = mtu;
-		return 0;
-	}
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Set device flags.
- *
- * @param priv
- *   Pointer to private structure.
- * @param keep
- *   Bitmask for flags that must remain untouched.
- * @param flags
- *   Bitmask for flags to modify.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
-{
-	unsigned long tmp = 0;
-	int ret = priv_get_sysfs_ulong(priv, "flags", &tmp);
-
-	if (ret)
-		return ret;
-	tmp &= keep;
-	tmp |= (flags & (~keep));
-	return priv_set_sysfs_ulong(priv, "flags", tmp);
-}
-
 /* Device configuration. */
 
 static int
@@ -1726,346 +1355,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	memset(priv, 0, sizeof(*priv));
 }
 
-/**
- * Change the link state (UP / DOWN).
- *
- * @param priv
- *   Pointer to Ethernet device private data.
- * @param up
- *   Nonzero for link up, otherwise link down.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_link(struct priv *priv, int up)
-{
-	struct rte_eth_dev *dev = priv->dev;
-	int err;
-
-	if (up) {
-		err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
-		if (err)
-			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst;
-	} else {
-		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
-		if (err)
-			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst_removed;
-		dev->tx_pkt_burst = mlx4_tx_burst_removed;
-	}
-	return 0;
-}
-
-/**
- * DPDK callback to bring the link DOWN.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_set_link_down(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	return priv_set_link(priv, 0);
-}
-
-/**
- * DPDK callback to bring the link UP.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_set_link_up(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	return priv_set_link(priv, 1);
-}
-
-/**
- * DPDK callback to get information about the device.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] info
- *   Info structure output buffer.
- */
-static void
-mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int max;
-	char ifname[IF_NAMESIZE];
-
-	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-	if (priv == NULL)
-		return;
-	/* FIXME: we should ask the device for these values. */
-	info->min_rx_bufsize = 32;
-	info->max_rx_pktlen = 65536;
-	/*
-	 * Since we need one CQ per QP, the limit is the minimum number
-	 * between the two values.
-	 */
-	max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
-	       priv->device_attr.max_qp : priv->device_attr.max_cq);
-	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
-	if (max >= 65535)
-		max = 65535;
-	info->max_rx_queues = max;
-	info->max_tx_queues = max;
-	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = 1;
-	info->rx_offload_capa = 0;
-	info->tx_offload_capa = 0;
-	if (priv_get_ifname(priv, &ifname) == 0)
-		info->if_index = if_nametoindex(ifname);
-	info->speed_capa =
-			ETH_LINK_SPEED_1G |
-			ETH_LINK_SPEED_10G |
-			ETH_LINK_SPEED_20G |
-			ETH_LINK_SPEED_40G |
-			ETH_LINK_SPEED_56G;
-}
-
-/**
- * DPDK callback to get device statistics.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] stats
- *   Stats structure output buffer.
- */
-static void
-mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rte_eth_stats tmp = {0};
-	unsigned int i;
-	unsigned int idx;
-
-	if (priv == NULL)
-		return;
-	/* Add software counters. */
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-
-		if (rxq == NULL)
-			continue;
-		idx = rxq->stats.idx;
-		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			tmp.q_ipackets[idx] += rxq->stats.ipackets;
-			tmp.q_ibytes[idx] += rxq->stats.ibytes;
-			tmp.q_errors[idx] += (rxq->stats.idropped +
-					      rxq->stats.rx_nombuf);
-		}
-		tmp.ipackets += rxq->stats.ipackets;
-		tmp.ibytes += rxq->stats.ibytes;
-		tmp.ierrors += rxq->stats.idropped;
-		tmp.rx_nombuf += rxq->stats.rx_nombuf;
-	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		struct txq *txq = (*priv->txqs)[i];
-
-		if (txq == NULL)
-			continue;
-		idx = txq->stats.idx;
-		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			tmp.q_opackets[idx] += txq->stats.opackets;
-			tmp.q_obytes[idx] += txq->stats.obytes;
-			tmp.q_errors[idx] += txq->stats.odropped;
-		}
-		tmp.opackets += txq->stats.opackets;
-		tmp.obytes += txq->stats.obytes;
-		tmp.oerrors += txq->stats.odropped;
-	}
-	*stats = tmp;
-}
-
-/**
- * DPDK callback to clear device statistics.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_stats_reset(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	unsigned int idx;
-
-	if (priv == NULL)
-		return;
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		idx = (*priv->rxqs)[i]->stats.idx;
-		(*priv->rxqs)[i]->stats =
-			(struct mlx4_rxq_stats){ .idx = idx };
-	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		if ((*priv->txqs)[i] == NULL)
-			continue;
-		idx = (*priv->txqs)[i]->stats.idx;
-		(*priv->txqs)[i]->stats =
-			(struct mlx4_txq_stats){ .idx = idx };
-	}
-}
-
-/**
- * DPDK callback to retrieve physical link information.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param wait_to_complete
- *   Wait for request completion (ignored).
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
-{
-	const struct priv *priv = dev->data->dev_private;
-	struct ethtool_cmd edata = {
-		.cmd = ETHTOOL_GSET
-	};
-	struct ifreq ifr;
-	struct rte_eth_link dev_link;
-	int link_speed = 0;
-
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		return -rte_errno;
-	}
-	(void)wait_to_complete;
-	if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
-		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
-		return -rte_errno;
-	}
-	memset(&dev_link, 0, sizeof(dev_link));
-	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
-				(ifr.ifr_flags & IFF_RUNNING));
-	ifr.ifr_data = (void *)&edata;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
-		     strerror(rte_errno));
-		return -rte_errno;
-	}
-	link_speed = ethtool_cmd_speed(&edata);
-	if (link_speed == -1)
-		dev_link.link_speed = 0;
-	else
-		dev_link.link_speed = link_speed;
-	dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
-				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
-	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
-			ETH_LINK_SPEED_FIXED);
-	dev->data->dev_link = dev_link;
-	return 0;
-}
-
-/**
- * DPDK callback to get flow control status.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] fc_conf
- *   Flow control output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct ifreq ifr;
-	struct ethtool_pauseparam ethpause = {
-		.cmd = ETHTOOL_GPAUSEPARAM
-	};
-	int ret;
-
-	ifr.ifr_data = (void *)&ethpause;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = rte_errno;
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
-		     " failed: %s",
-		     strerror(rte_errno));
-		goto out;
-	}
-	fc_conf->autoneg = ethpause.autoneg;
-	if (ethpause.rx_pause && ethpause.tx_pause)
-		fc_conf->mode = RTE_FC_FULL;
-	else if (ethpause.rx_pause)
-		fc_conf->mode = RTE_FC_RX_PAUSE;
-	else if (ethpause.tx_pause)
-		fc_conf->mode = RTE_FC_TX_PAUSE;
-	else
-		fc_conf->mode = RTE_FC_NONE;
-	ret = 0;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
-/**
- * DPDK callback to modify flow control parameters.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[in] fc_conf
- *   Flow control parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct ifreq ifr;
-	struct ethtool_pauseparam ethpause = {
-		.cmd = ETHTOOL_SPAUSEPARAM
-	};
-	int ret;
-
-	ifr.ifr_data = (void *)&ethpause;
-	ethpause.autoneg = fc_conf->autoneg;
-	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
-	    (fc_conf->mode & RTE_FC_RX_PAUSE))
-		ethpause.rx_pause = 1;
-	else
-		ethpause.rx_pause = 0;
-	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
-	    (fc_conf->mode & RTE_FC_TX_PAUSE))
-		ethpause.tx_pause = 1;
-	else
-		ethpause.tx_pause = 0;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = rte_errno;
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
-		     " failed: %s",
-		     strerror(rte_errno));
-		goto out;
-	}
-	ret = 0;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
 const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
@@ -2115,8 +1404,8 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_configure = mlx4_dev_configure,
 	.dev_start = mlx4_dev_start,
 	.dev_stop = mlx4_dev_stop,
-	.dev_set_link_down = mlx4_set_link_down,
-	.dev_set_link_up = mlx4_set_link_up,
+	.dev_set_link_down = mlx4_dev_set_link_down,
+	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
@@ -2126,9 +1415,9 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
 	.tx_queue_release = mlx4_tx_queue_release,
-	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
-	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.mtu_set = mlx4_dev_set_mtu,
+	.flow_ctrl_get = mlx4_flow_ctrl_get,
+	.flow_ctrl_set = mlx4_flow_ctrl_set,
+	.mtu_set = mlx4_mtu_set,
 	.filter_ctrl = mlx4_dev_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
 	.rx_queue_intr_disable = mlx4_rx_intr_disable,
@@ -2187,29 +1476,6 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
 }
 
 /**
- * Get MAC address by querying netdevice.
- *
- * @param[in] priv
- *   struct priv for the requested device.
- * @param[out] mac
- *   MAC address output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
-{
-	struct ifreq request;
-	int ret = priv_ifreq(priv, SIOCGIFHWADDR, &request);
-
-	if (ret)
-		return ret;
-	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
-	return 0;
-}
-
-/**
  * Verify and store value for device argument.
  *
  * @param[in] key
@@ -2456,7 +1722,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->mtu = ETHER_MTU;
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
-		if (priv_get_mac(priv, &mac.addr_bytes)) {
+		if (mlx4_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx4_en loaded?"
 			      " (rte_errno: %s)", strerror(rte_errno));
 			goto port_error;
@@ -2474,7 +1740,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		{
 			char ifname[IF_NAMESIZE];
 
-			if (priv_get_ifname(priv, &ifname) == 0)
+			if (mlx4_get_ifname(priv, &ifname) == 0)
 				DEBUG("port %u ifname is \"%s\"",
 				      priv->port, ifname);
 			else
@@ -2482,7 +1748,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 #endif
 		/* Get actual MTU if possible. */
-		priv_get_mtu(priv, &priv->mtu);
+		mlx4_mtu_get(priv, &priv->mtu);
 		DEBUG("port %u MTU is %u", priv->port, priv->mtu);
 		/* from rte_ethdev.c */
 		{
@@ -2525,7 +1791,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
 		/* Bring Ethernet device up. */
 		DEBUG("forcing Ethernet interface up");
-		priv_set_flags(priv, ~IFF_UP, IFF_UP);
+		mlx4_dev_set_link_up(priv->dev);
 		/* Update link status once if waiting for LSC. */
 		if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 			mlx4_link_update(eth_dev, 0);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index efccf1a..b5f2953 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -34,6 +34,7 @@
 #ifndef RTE_PMD_MLX4_H_
 #define RTE_PMD_MLX4_H_
 
+#include <net/if.h>
 #include <stdint.h>
 
 /* Verbs header. */
@@ -117,7 +118,24 @@ struct priv {
 /* mlx4.c */
 
 struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
+
+/* mlx4_ethdev.c */
+
+int mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE]);
+int mlx4_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN]);
+int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
+int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
+int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
+int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
+void mlx4_stats_reset(struct rte_eth_dev *dev);
+void mlx4_dev_infos_get(struct rte_eth_dev *dev,
+			struct rte_eth_dev_info *info);
 int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+int mlx4_flow_ctrl_get(struct rte_eth_dev *dev,
+		       struct rte_eth_fc_conf *fc_conf);
+int mlx4_flow_ctrl_set(struct rte_eth_dev *dev,
+		       struct rte_eth_fc_conf *fc_conf);
 
 /* mlx4_intr.c */
 
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
new file mode 100644
index 0000000..8c6b1fd
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -0,0 +1,792 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Miscellaneous control operations for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <dirent.h>
+#include <errno.h>
+#include <linux/ethtool.h>
+#include <linux/sockios.h>
+#include <net/if.h>
+#include <netinet/ip.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_pci.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Get interface name from private structure.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[out] ifname
+ *   Interface name output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
+{
+	DIR *dir;
+	struct dirent *dent;
+	unsigned int dev_type = 0;
+	unsigned int dev_port_prev = ~0u;
+	char match[IF_NAMESIZE] = "";
+
+	{
+		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
+
+		dir = opendir(path);
+		if (dir == NULL) {
+			rte_errno = errno;
+			return -rte_errno;
+		}
+	}
+	while ((dent = readdir(dir)) != NULL) {
+		char *name = dent->d_name;
+		FILE *file;
+		unsigned int dev_port;
+		int r;
+
+		if ((name[0] == '.') &&
+		    ((name[1] == '\0') ||
+		     ((name[1] == '.') && (name[2] == '\0'))))
+			continue;
+
+		MKSTR(path, "%s/device/net/%s/%s",
+		      priv->ctx->device->ibdev_path, name,
+		      (dev_type ? "dev_id" : "dev_port"));
+
+		file = fopen(path, "rb");
+		if (file == NULL) {
+			if (errno != ENOENT)
+				continue;
+			/*
+			 * Switch to dev_id when dev_port does not exist as
+			 * is the case with Linux kernel versions < 3.15.
+			 */
+try_dev_id:
+			match[0] = '\0';
+			if (dev_type)
+				break;
+			dev_type = 1;
+			dev_port_prev = ~0u;
+			rewinddir(dir);
+			continue;
+		}
+		r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+		fclose(file);
+		if (r != 1)
+			continue;
+		/*
+		 * Switch to dev_id when dev_port returns the same value for
+		 * all ports. May happen when using a MOFED release older than
+		 * 3.0 with a Linux kernel >= 3.15.
+		 */
+		if (dev_port == dev_port_prev)
+			goto try_dev_id;
+		dev_port_prev = dev_port;
+		if (dev_port == (priv->port - 1u))
+			snprintf(match, sizeof(match), "%s", name);
+	}
+	closedir(dir);
+	if (match[0] == '\0') {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	strncpy(*ifname, match, sizeof(*ifname));
+	return 0;
+}
+
+/**
+ * Read from sysfs entry.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[in] entry
+ *   Entry name relative to sysfs path.
+ * @param[out] buf
+ *   Data output buffer.
+ * @param size
+ *   Buffer size.
+ *
+ * @return
+ *   Number of bytes read on success, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx4_sysfs_read(const struct priv *priv, const char *entry,
+		char *buf, size_t size)
+{
+	char ifname[IF_NAMESIZE];
+	FILE *file;
+	int ret;
+
+	ret = mlx4_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
+
+	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+	      ifname, entry);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = fread(buf, 1, size, file);
+	if ((size_t)ret < size && ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
+		ret = size;
+	}
+	fclose(file);
+	return ret;
+}
+
+/**
+ * Write to sysfs entry.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[in] entry
+ *   Entry name relative to sysfs path.
+ * @param[in] buf
+ *   Data buffer.
+ * @param size
+ *   Buffer size.
+ *
+ * @return
+ *   Number of bytes written on success, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx4_sysfs_write(const struct priv *priv, const char *entry,
+		 char *buf, size_t size)
+{
+	char ifname[IF_NAMESIZE];
+	FILE *file;
+	int ret;
+
+	ret = mlx4_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
+
+	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+	      ifname, entry);
+
+	file = fopen(path, "wb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = fwrite(buf, 1, size, file);
+	if ((size_t)ret < size || ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
+		ret = size;
+	}
+	fclose(file);
+	return ret;
+}
+
+/**
+ * Get unsigned long sysfs property.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[in] name
+ *   Entry name relative to sysfs path.
+ * @param[out] value
+ *   Value output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
+{
+	int ret;
+	unsigned long value_ret;
+	char value_str[32];
+
+	ret = mlx4_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
+	if (ret < 0) {
+		DEBUG("cannot read %s value from sysfs: %s",
+		      name, strerror(rte_errno));
+		return ret;
+	}
+	value_str[ret] = '\0';
+	errno = 0;
+	value_ret = strtoul(value_str, NULL, 0);
+	if (errno) {
+		rte_errno = errno;
+		DEBUG("invalid %s value `%s': %s", name, value_str,
+		      strerror(rte_errno));
+		return -rte_errno;
+	}
+	*value = value_ret;
+	return 0;
+}
+
+/**
+ * Set unsigned long sysfs property.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[in] name
+ *   Entry name relative to sysfs path.
+ * @param value
+ *   Value to set.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
+{
+	int ret;
+	MKSTR(value_str, "%lu", value);
+
+	ret = mlx4_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
+	if (ret < 0) {
+		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
+		      name, value_str, value, strerror(rte_errno));
+		return ret;
+	}
+	return 0;
+}
+
+/**
+ * Perform ifreq ioctl() on associated Ethernet device.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param req
+ *   Request number to pass to ioctl().
+ * @param[out] ifr
+ *   Interface request structure output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
+{
+	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+	int ret;
+
+	if (sock == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = mlx4_get_ifname(priv, &ifr->ifr_name);
+	if (!ret && ioctl(sock, req, ifr) == -1) {
+		rte_errno = errno;
+		ret = -rte_errno;
+	}
+	close(sock);
+	return ret;
+}
+
+/**
+ * Get MAC address by querying netdevice.
+ *
+ * @param[in] priv
+ *   struct priv for the requested device.
+ * @param[out] mac
+ *   MAC address output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+	struct ifreq request;
+	int ret = mlx4_ifreq(priv, SIOCGIFHWADDR, &request);
+
+	if (ret)
+		return ret;
+	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+	return 0;
+}
+
+/**
+ * Get device MTU.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] mtu
+ *   MTU value output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mtu_get(struct priv *priv, uint16_t *mtu)
+{
+	unsigned long ulong_mtu = 0;
+	int ret = mlx4_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
+
+	if (ret)
+		return ret;
+	*mtu = ulong_mtu;
+	return 0;
+}
+
+/**
+ * DPDK callback to change the MTU.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mtu
+ *   MTU value to set.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct priv *priv = dev->data->dev_private;
+	uint16_t new_mtu;
+	int ret = mlx4_set_sysfs_ulong(priv, "mtu", mtu);
+
+	if (ret)
+		return ret;
+	ret = mlx4_mtu_get(priv, &new_mtu);
+	if (ret)
+		return ret;
+	if (new_mtu == mtu) {
+		priv->mtu = mtu;
+		return 0;
+	}
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Set device flags.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param keep
+ *   Bitmask for flags that must remain untouched.
+ * @param flags
+ *   Bitmask for flags to modify.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
+{
+	unsigned long tmp = 0;
+	int ret = mlx4_get_sysfs_ulong(priv, "flags", &tmp);
+
+	if (ret)
+		return ret;
+	tmp &= keep;
+	tmp |= (flags & (~keep));
+	return mlx4_set_sysfs_ulong(priv, "flags", tmp);
+}
+
+/**
+ * Change the link state (UP / DOWN).
+ *
+ * @param priv
+ *   Pointer to Ethernet device private data.
+ * @param up
+ *   Nonzero for link up, otherwise link down.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_dev_set_link(struct priv *priv, int up)
+{
+	struct rte_eth_dev *dev = priv->dev;
+	int err;
+
+	if (up) {
+		err = mlx4_set_flags(priv, ~IFF_UP, IFF_UP);
+		if (err)
+			return err;
+		dev->rx_pkt_burst = mlx4_rx_burst;
+	} else {
+		err = mlx4_set_flags(priv, ~IFF_UP, ~IFF_UP);
+		if (err)
+			return err;
+		dev->rx_pkt_burst = mlx4_rx_burst_removed;
+		dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	}
+	return 0;
+}
+
+/**
+ * DPDK callback to bring the link DOWN.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	return mlx4_dev_set_link(priv, 0);
+}
+
+/**
+ * DPDK callback to bring the link UP.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	return mlx4_dev_set_link(priv, 1);
+}
+
+/**
+ * DPDK callback to get information about the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] info
+ *   Info structure output buffer.
+ */
+void
+mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int max;
+	char ifname[IF_NAMESIZE];
+
+	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	if (priv == NULL)
+		return;
+	/* FIXME: we should ask the device for these values. */
+	info->min_rx_bufsize = 32;
+	info->max_rx_pktlen = 65536;
+	/*
+	 * Since we need one CQ per QP, the limit is the minimum number
+	 * between the two values.
+	 */
+	max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
+	       priv->device_attr.max_qp : priv->device_attr.max_cq);
+	/* Clamp max to 65535 since max_rx_queues is uint16_t. */
+	if (max >= 65535)
+		max = 65535;
+	info->max_rx_queues = max;
+	info->max_tx_queues = max;
+	/* Last array entry is reserved for broadcast. */
+	info->max_mac_addrs = 1;
+	info->rx_offload_capa = 0;
+	info->tx_offload_capa = 0;
+	if (mlx4_get_ifname(priv, &ifname) == 0)
+		info->if_index = if_nametoindex(ifname);
+	info->speed_capa =
+			ETH_LINK_SPEED_1G |
+			ETH_LINK_SPEED_10G |
+			ETH_LINK_SPEED_20G |
+			ETH_LINK_SPEED_40G |
+			ETH_LINK_SPEED_56G;
+}
+
+/**
+ * DPDK callback to get device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] stats
+ *   Stats structure output buffer.
+ */
+void
+mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_eth_stats tmp = {0};
+	unsigned int i;
+	unsigned int idx;
+
+	if (priv == NULL)
+		return;
+	/* Add software counters. */
+	for (i = 0; (i != priv->rxqs_n); ++i) {
+		struct rxq *rxq = (*priv->rxqs)[i];
+
+		if (rxq == NULL)
+			continue;
+		idx = rxq->stats.idx;
+		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			tmp.q_ipackets[idx] += rxq->stats.ipackets;
+			tmp.q_ibytes[idx] += rxq->stats.ibytes;
+			tmp.q_errors[idx] += (rxq->stats.idropped +
+					      rxq->stats.rx_nombuf);
+		}
+		tmp.ipackets += rxq->stats.ipackets;
+		tmp.ibytes += rxq->stats.ibytes;
+		tmp.ierrors += rxq->stats.idropped;
+		tmp.rx_nombuf += rxq->stats.rx_nombuf;
+	}
+	for (i = 0; (i != priv->txqs_n); ++i) {
+		struct txq *txq = (*priv->txqs)[i];
+
+		if (txq == NULL)
+			continue;
+		idx = txq->stats.idx;
+		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			tmp.q_opackets[idx] += txq->stats.opackets;
+			tmp.q_obytes[idx] += txq->stats.obytes;
+			tmp.q_errors[idx] += txq->stats.odropped;
+		}
+		tmp.opackets += txq->stats.opackets;
+		tmp.obytes += txq->stats.obytes;
+		tmp.oerrors += txq->stats.odropped;
+	}
+	*stats = tmp;
+}
+
+/**
+ * DPDK callback to clear device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_stats_reset(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int idx;
+
+	if (priv == NULL)
+		return;
+	for (i = 0; (i != priv->rxqs_n); ++i) {
+		if ((*priv->rxqs)[i] == NULL)
+			continue;
+		idx = (*priv->rxqs)[i]->stats.idx;
+		(*priv->rxqs)[i]->stats =
+			(struct mlx4_rxq_stats){ .idx = idx };
+	}
+	for (i = 0; (i != priv->txqs_n); ++i) {
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		idx = (*priv->txqs)[i]->stats.idx;
+		(*priv->txqs)[i]->stats =
+			(struct mlx4_txq_stats){ .idx = idx };
+	}
+}
+
+/**
+ * DPDK callback to retrieve physical link information.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+	const struct priv *priv = dev->data->dev_private;
+	struct ethtool_cmd edata = {
+		.cmd = ETHTOOL_GSET,
+	};
+	struct ifreq ifr;
+	struct rte_eth_link dev_link;
+	int link_speed = 0;
+
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	(void)wait_to_complete;
+	if (mlx4_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
+		return -rte_errno;
+	}
+	memset(&dev_link, 0, sizeof(dev_link));
+	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+				(ifr.ifr_flags & IFF_RUNNING));
+	ifr.ifr_data = (void *)&edata;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+		     strerror(rte_errno));
+		return -rte_errno;
+	}
+	link_speed = ethtool_cmd_speed(&edata);
+	if (link_speed == -1)
+		dev_link.link_speed = 0;
+	else
+		dev_link.link_speed = link_speed;
+	dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
+				  ETH_LINK_SPEED_FIXED);
+	dev->data->dev_link = dev_link;
+	return 0;
+}
+
+/**
+ * DPDK callback to get flow control status.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] fc_conf
+ *   Flow control output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_ctrl_get(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct ifreq ifr;
+	struct ethtool_pauseparam ethpause = {
+		.cmd = ETHTOOL_GPAUSEPARAM,
+	};
+	int ret;
+
+	ifr.ifr_data = (void *)&ethpause;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		ret = rte_errno;
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
+		     " failed: %s",
+		     strerror(rte_errno));
+		goto out;
+	}
+	fc_conf->autoneg = ethpause.autoneg;
+	if (ethpause.rx_pause && ethpause.tx_pause)
+		fc_conf->mode = RTE_FC_FULL;
+	else if (ethpause.rx_pause)
+		fc_conf->mode = RTE_FC_RX_PAUSE;
+	else if (ethpause.tx_pause)
+		fc_conf->mode = RTE_FC_TX_PAUSE;
+	else
+		fc_conf->mode = RTE_FC_NONE;
+	ret = 0;
+out:
+	assert(ret >= 0);
+	return -ret;
+}
+
+/**
+ * DPDK callback to modify flow control parameters.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] fc_conf
+ *   Flow control parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct ifreq ifr;
+	struct ethtool_pauseparam ethpause = {
+		.cmd = ETHTOOL_SPAUSEPARAM,
+	};
+	int ret;
+
+	ifr.ifr_data = (void *)&ethpause;
+	ethpause.autoneg = fc_conf->autoneg;
+	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+	    (fc_conf->mode & RTE_FC_RX_PAUSE))
+		ethpause.rx_pause = 1;
+	else
+		ethpause.rx_pause = 0;
+	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+	    (fc_conf->mode & RTE_FC_TX_PAUSE))
+		ethpause.tx_pause = 1;
+	else
+		ethpause.tx_pause = 0;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		ret = rte_errno;
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
+		     " failed: %s",
+		     strerror(rte_errno));
+		goto out;
+	}
+	ret = 0;
+out:
+	assert(ret >= 0);
+	return -ret;
+}
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 0b9a96a..e74b61b 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -34,6 +34,9 @@
 #ifndef MLX4_UTILS_H_
 #define MLX4_UTILS_H_
 
+#include <stddef.h>
+#include <stdio.h>
+
 #include <rte_common.h>
 #include <rte_log.h>
 
@@ -89,6 +92,12 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
+	\
+	snprintf(name, sizeof(name), __VA_ARGS__)
+
 /* mlx4_utils.c */
 
 int mlx4_fd_set_non_blocking(int fd);
-- 
2.1.4

* [PATCH v1 43/48] net/mlx4: separate Tx configuration functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (41 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 42/48] net/mlx4: separate device control functions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 44/48] net/mlx4: separate Rx " Adrien Mazarguil
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 421 +---------------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   9 +
 drivers/net/mlx4/mlx4_txq.c  | 472 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 483 insertions(+), 420 deletions(-)
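
As an aside on the code being moved: txq_alloc_elts() initializes the send
completion countdown to the smaller of MLX4_PMD_TX_PER_COMP_REQ and a quarter
of the ring size. A minimal sketch of that computation follows. The value 64
used for the constant is an assumption for illustration only (its definition
is not part of this patch), and comp_cd_init() is a hypothetical helper name.

```c
/* Assumed value for illustration; the real definition is not shown here. */
#define TX_PER_COMP_REQ 64u

/*
 * Hypothetical helper: number of packets between two send completion
 * requests, capped so that a completion is requested at least four times
 * per ring of elts_n elements.
 */
static unsigned int
comp_cd_init(unsigned int elts_n)
{
	unsigned int quarter = elts_n / 4;

	return (TX_PER_COMP_REQ < quarter) ? TX_PER_COMP_REQ : quarter;
}
```

Under these assumptions a 512-entry ring requests a completion every 64
packets and a 128-entry ring every 32. Rings with fewer than four entries
yield 0 with this formula.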

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 6498eef..22820ab 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -40,6 +40,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b3213c0..817b36e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -56,9 +56,6 @@
 #include <rte_interrupts.h>
 #include <rte_common.h>
 
-/* Generated configuration header. */
-#include "mlx4_autoconf.h"
-
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
@@ -82,13 +79,6 @@ const char *pmd_mlx4_init_params[] = {
 /* Device configuration. */
 
 static int
-txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_txconf *conf);
-
-static void
-txq_cleanup(struct txq *txq);
-
-static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp);
@@ -132,128 +122,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-/* TX queues handling. */
-
-/**
- * Allocate TX queue elements.
- *
- * @param txq
- *   Pointer to TX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-txq_alloc_elts(struct txq *txq, unsigned int elts_n)
-{
-	unsigned int i;
-	struct txq_elt (*elts)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)txq);
-		ret = ENOMEM;
-		goto error;
-	}
-	for (i = 0; (i != elts_n); ++i) {
-		struct txq_elt *elt = &(*elts)[i];
-
-		elt->buf = NULL;
-	}
-	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
-	txq->elts_n = elts_n;
-	txq->elts = elts;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	/*
-	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring.
-	 */
-	txq->elts_comp_cd_init =
-		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
-		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
-	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	assert(ret == 0);
-	return 0;
-error:
-	rte_free(elts);
-	DEBUG("%p: failed, freed everything", (void *)txq);
-	assert(ret > 0);
-	rte_errno = ret;
-	return -rte_errno;
-}
-
-/**
- * Free TX queue elements.
- *
- * @param txq
- *   Pointer to TX queue structure.
- */
-static void
-txq_free_elts(struct txq *txq)
-{
-	unsigned int elts_n = txq->elts_n;
-	unsigned int elts_head = txq->elts_head;
-	unsigned int elts_tail = txq->elts_tail;
-	struct txq_elt (*elts)[elts_n] = txq->elts;
-
-	DEBUG("%p: freeing WRs", (void *)txq);
-	txq->elts_n = 0;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	txq->elts_comp_cd = 0;
-	txq->elts_comp_cd_init = 0;
-	txq->elts = NULL;
-	if (elts == NULL)
-		return;
-	while (elts_tail != elts_head) {
-		struct txq_elt *elt = &(*elts)[elts_tail];
-
-		assert(elt->buf != NULL);
-		rte_pktmbuf_free(elt->buf);
-#ifndef NDEBUG
-		/* Poisoning. */
-		memset(elt, 0x77, sizeof(*elt));
-#endif
-		if (++elts_tail == elts_n)
-			elts_tail = 0;
-	}
-	rte_free(elts);
-}
-
-/**
- * Clean up a TX queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param txq
- *   Pointer to TX queue structure.
- */
-static void
-txq_cleanup(struct txq *txq)
-{
-	size_t i;
-
-	DEBUG("cleaning up %p", (void *)txq);
-	txq_free_elts(txq);
-	if (txq->qp != NULL)
-		claim_zero(ibv_destroy_qp(txq->qp));
-	if (txq->cq != NULL)
-		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (txq->mp2mr[i].mp == NULL)
-			break;
-		assert(txq->mp2mr[i].mr != NULL);
-		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
-	}
-	memset(txq, 0, sizeof(*txq));
-}
-
 struct mlx4_check_mempool_data {
 	int ret;
 	char *start;
@@ -367,293 +235,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-struct txq_mp2mr_mbuf_check_data {
-	int ret;
-};
-
-/**
- * Callback function for rte_mempool_obj_iter() to check whether a given
- * mempool object looks like a mbuf.
- *
- * @param[in] mp
- *   The mempool pointer
- * @param[in] arg
- *   Context data (struct txq_mp2mr_mbuf_check_data). Contains the
- *   return value.
- * @param[in] obj
- *   Object address.
- * @param index
- *   Object index, unused.
- */
-static void
-txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
-	uint32_t index __rte_unused)
-{
-	struct txq_mp2mr_mbuf_check_data *data = arg;
-	struct rte_mbuf *buf = obj;
-
-	/*
-	 * Check whether mbuf structure fits element size and whether mempool
-	 * pointer is valid.
-	 */
-	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
-		data->ret = -1;
-}
-
-/**
- * Iterator function for rte_mempool_walk() to register existing mempools and
- * fill the MP to MR cache of a TX queue.
- *
- * @param[in] mp
- *   Memory Pool to register.
- * @param *arg
- *   Pointer to TX queue structure.
- */
-static void
-txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
-{
-	struct txq *txq = arg;
-	struct txq_mp2mr_mbuf_check_data data = {
-		.ret = 0,
-	};
-
-	/* Register mempool only if the first element looks like a mbuf. */
-	if (rte_mempool_obj_iter(mp, txq_mp2mr_mbuf_check, &data) == 0 ||
-			data.ret == -1)
-		return;
-	mlx4_txq_mp2mr(txq, mp);
-}
-
-/**
- * Configure a TX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param txq
- *   Pointer to TX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq tmpl = {
-		.priv = priv,
-		.socket = socket
-	};
-	union {
-		struct ibv_qp_init_attr init;
-		struct ibv_qp_attr mod;
-	} attr;
-	int ret;
-
-	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		goto error;
-	}
-	if (desc == 0) {
-		rte_errno = EINVAL;
-		ERROR("%p: invalid number of TX descriptors", (void *)dev);
-		goto error;
-	}
-	/* MRs will be registered in mp2mr[] later. */
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
-	if (tmpl.cq == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_send_sge = 1,
-			.max_inline_data = MLX4_PMD_MAX_INLINE,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		/*
-		 * Do *NOT* enable this, completions events are managed per
-		 * TX burst.
-		 */
-		.sq_sig_all = 0,
-	};
-	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
-	if (tmpl.qp == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* ibv_create_qp() updates this value. */
-	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = txq_alloc_elts(&tmpl, desc);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	attr.mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* Clean up txq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
-	txq_cleanup(txq);
-	*txq = tmpl;
-	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
-	/* Pre-register known mempools. */
-	rte_mempool_walk(txq_mp2mr_iter, txq);
-	return 0;
-error:
-	ret = rte_errno;
-	txq_cleanup(&tmpl);
-	rte_errno = ret;
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * DPDK callback to configure a TX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   TX queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq *txq = (*priv->txqs)[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= priv->txqs_n) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->txqs_n);
-		return -rte_errno;
-	}
-	if (txq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)txq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		(*priv->txqs)[idx] = NULL;
-		txq_cleanup(txq);
-	} else {
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
-		if (txq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = txq_setup(dev, txq, desc, socket, conf);
-	if (ret)
-		rte_free(txq);
-	else {
-		txq->stats.idx = idx;
-		DEBUG("%p: adding TX queue %p to list",
-		      (void *)dev, (void *)txq);
-		(*priv->txqs)[idx] = txq;
-		/* Update send callback. */
-		dev->tx_pkt_burst = mlx4_tx_burst;
-	}
-	return ret;
-}
-
-/**
- * DPDK callback to release a TX queue.
- *
- * @param dpdk_txq
- *   Generic TX queue pointer.
- */
-static void
-mlx4_tx_queue_release(void *dpdk_txq)
-{
-	struct txq *txq = (struct txq *)dpdk_txq;
-	struct priv *priv;
-	unsigned int i;
-
-	if (txq == NULL)
-		return;
-	priv = txq->priv;
-	for (i = 0; (i != priv->txqs_n); ++i)
-		if ((*priv->txqs)[i] == txq) {
-			DEBUG("%p: removing TX queue %p from list",
-			      (void *)priv->dev, (void *)txq);
-			(*priv->txqs)[i] = NULL;
-			break;
-		}
-	txq_cleanup(txq);
-	rte_free(txq);
-}
-
 /* RX queues handling. */
 
 /**
@@ -1339,7 +920,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 			if (tmp == NULL)
 				continue;
 			(*priv->txqs)[i] = NULL;
-			txq_cleanup(tmp);
+			mlx4_txq_cleanup(tmp);
 			rte_free(tmp);
 		}
 		priv->txqs_n = 0;
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index ab44af5..b02af8e 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -45,6 +45,7 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 
@@ -141,4 +142,12 @@ uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
 uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
 			       uint16_t pkts_n);
 
+/* mlx4_txq.c */
+
+void mlx4_txq_cleanup(struct txq *txq);
+int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			uint16_t desc, unsigned int socket,
+			const struct rte_eth_txconf *conf);
+void mlx4_tx_queue_release(void *dpdk_txq);
+
 #endif /* MLX4_RXTX_H_ */
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
new file mode 100644
index 0000000..6095322
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -0,0 +1,472 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Tx queues configuration for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+#include "mlx4_autoconf.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Allocate TX queue elements.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param elts_n
+ *   Number of elements to allocate.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
+{
+	unsigned int i;
+	struct txq_elt (*elts)[elts_n] =
+		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
+	int ret = 0;
+
+	if (elts == NULL) {
+		ERROR("%p: can't allocate packets array", (void *)txq);
+		ret = ENOMEM;
+		goto error;
+	}
+	for (i = 0; (i != elts_n); ++i) {
+		struct txq_elt *elt = &(*elts)[i];
+
+		elt->buf = NULL;
+	}
+	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
+	txq->elts_n = elts_n;
+	txq->elts = elts;
+	txq->elts_head = 0;
+	txq->elts_tail = 0;
+	txq->elts_comp = 0;
+	/*
+	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+	 * at least 4 times per ring.
+	 */
+	txq->elts_comp_cd_init =
+		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+	txq->elts_comp_cd = txq->elts_comp_cd_init;
+	assert(ret == 0);
+	return 0;
+error:
+	rte_free(elts);
+	DEBUG("%p: failed, freed everything", (void *)txq);
+	assert(ret > 0);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+/**
+ * Free TX queue elements.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ */
+static void
+mlx4_txq_free_elts(struct txq *txq)
+{
+	unsigned int elts_n = txq->elts_n;
+	unsigned int elts_head = txq->elts_head;
+	unsigned int elts_tail = txq->elts_tail;
+	struct txq_elt (*elts)[elts_n] = txq->elts;
+
+	DEBUG("%p: freeing WRs", (void *)txq);
+	txq->elts_n = 0;
+	txq->elts_head = 0;
+	txq->elts_tail = 0;
+	txq->elts_comp = 0;
+	txq->elts_comp_cd = 0;
+	txq->elts_comp_cd_init = 0;
+	txq->elts = NULL;
+	if (elts == NULL)
+		return;
+	while (elts_tail != elts_head) {
+		struct txq_elt *elt = &(*elts)[elts_tail];
+
+		assert(elt->buf != NULL);
+		rte_pktmbuf_free(elt->buf);
+#ifndef NDEBUG
+		/* Poisoning. */
+		memset(elt, 0x77, sizeof(*elt));
+#endif
+		if (++elts_tail == elts_n)
+			elts_tail = 0;
+	}
+	rte_free(elts);
+}
+
+/**
+ * Clean up a TX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ */
+void
+mlx4_txq_cleanup(struct txq *txq)
+{
+	size_t i;
+
+	DEBUG("cleaning up %p", (void *)txq);
+	mlx4_txq_free_elts(txq);
+	if (txq->qp != NULL)
+		claim_zero(ibv_destroy_qp(txq->qp));
+	if (txq->cq != NULL)
+		claim_zero(ibv_destroy_cq(txq->cq));
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+		if (txq->mp2mr[i].mp == NULL)
+			break;
+		assert(txq->mp2mr[i].mr != NULL);
+		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+	}
+	memset(txq, 0, sizeof(*txq));
+}
+
+struct txq_mp2mr_mbuf_check_data {
+	int ret;
+};
+
+/**
+ * Callback function for rte_mempool_obj_iter() to check whether a given
+ * mempool object looks like a mbuf.
+ *
+ * @param[in] mp
+ *   The mempool pointer
+ * @param[in] arg
+ *   Context data (struct txq_mp2mr_mbuf_check_data). Contains the
+ *   return value.
+ * @param[in] obj
+ *   Object address.
+ * @param index
+ *   Object index, unused.
+ */
+static void
+mlx4_txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
+			  uint32_t index)
+{
+	struct txq_mp2mr_mbuf_check_data *data = arg;
+	struct rte_mbuf *buf = obj;
+
+	(void)index;
+	/*
+	 * Check whether mbuf structure fits element size and whether mempool
+	 * pointer is valid.
+	 */
+	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
+		data->ret = -1;
+}
+
+/**
+ * Iterator function for rte_mempool_walk() to register existing mempools and
+ * fill the MP to MR cache of a TX queue.
+ *
+ * @param[in] mp
+ *   Memory Pool to register.
+ * @param arg
+ *   Pointer to TX queue structure.
+ */
+static void
+mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
+{
+	struct txq *txq = arg;
+	struct txq_mp2mr_mbuf_check_data data = {
+		.ret = 0,
+	};
+
+	/* Register mempool only if every element looks like a mbuf. */
+	if (rte_mempool_obj_iter(mp, mlx4_txq_mp2mr_mbuf_check, &data) == 0 ||
+			data.ret == -1)
+		return;
+	mlx4_txq_mp2mr(txq, mp);
+}
+
+/**
+ * Configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
+	       unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct txq tmpl = {
+		.priv = priv,
+		.socket = socket
+	};
+	union {
+		struct ibv_qp_init_attr init;
+		struct ibv_qp_attr mod;
+	} attr;
+	int ret;
+
+	(void)conf; /* Thresholds configuration (ignored). */
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		goto error;
+	}
+	if (desc == 0) {
+		rte_errno = EINVAL;
+		ERROR("%p: invalid number of TX descriptors", (void *)dev);
+		goto error;
+	}
+	/* MRs will be registered in mp2mr[] later. */
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+	if (tmpl.cq == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: CQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	DEBUG("priv->device_attr.max_qp_wr is %d",
+	      priv->device_attr.max_qp_wr);
+	DEBUG("priv->device_attr.max_sge is %d",
+	      priv->device_attr.max_sge);
+	attr.init = (struct ibv_qp_init_attr){
+		/* CQ to be associated with the send queue. */
+		.send_cq = tmpl.cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = tmpl.cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_send_sge = 1,
+			.max_inline_data = MLX4_PMD_MAX_INLINE,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+		/*
+		 * Do *NOT* enable this, completion events are managed per
+		 * TX burst.
+		 */
+		.sq_sig_all = 0,
+	};
+	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
+	if (tmpl.qp == NULL) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* ibv_create_qp() updates this value. */
+	tmpl.max_inline = attr.init.cap.max_inline_data;
+	attr.mod = (struct ibv_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = mlx4_txq_alloc_elts(&tmpl, desc);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: TXQ allocation failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	attr.mod = (struct ibv_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	attr.mod.qp_state = IBV_QPS_RTS;
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* Clean up txq in case we're reinitializing it. */
+	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
+	mlx4_txq_cleanup(txq);
+	*txq = tmpl;
+	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
+	/* Pre-register known mempools. */
+	rte_mempool_walk(mlx4_txq_mp2mr_iter, txq);
+	return 0;
+error:
+	ret = rte_errno;
+	mlx4_txq_cleanup(&tmpl);
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   TX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct txq *txq = (*priv->txqs)[idx];
+	int ret;
+
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= priv->txqs_n) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, priv->txqs_n);
+		return -rte_errno;
+	}
+	if (txq != NULL) {
+		DEBUG("%p: reusing already allocated queue index %u (%p)",
+		      (void *)dev, idx, (void *)txq);
+		if (priv->started) {
+			rte_errno = EEXIST;
+			return -rte_errno;
+		}
+		(*priv->txqs)[idx] = NULL;
+		mlx4_txq_cleanup(txq);
+	} else {
+		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+		if (txq == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: unable to allocate queue index %u",
+			      (void *)dev, idx);
+			return -rte_errno;
+		}
+	}
+	ret = mlx4_txq_setup(dev, txq, desc, socket, conf);
+	if (ret) {
+		rte_free(txq);
+	} else {
+		txq->stats.idx = idx;
+		DEBUG("%p: adding TX queue %p to list",
+		      (void *)dev, (void *)txq);
+		(*priv->txqs)[idx] = txq;
+		/* Update send callback. */
+		dev->tx_pkt_burst = mlx4_tx_burst;
+	}
+	return ret;
+}
+
+/**
+ * DPDK callback to release a TX queue.
+ *
+ * @param dpdk_txq
+ *   Generic TX queue pointer.
+ */
+void
+mlx4_tx_queue_release(void *dpdk_txq)
+{
+	struct txq *txq = (struct txq *)dpdk_txq;
+	struct priv *priv;
+	unsigned int i;
+
+	if (txq == NULL)
+		return;
+	priv = txq->priv;
+	for (i = 0; (i != priv->txqs_n); ++i)
+		if ((*priv->txqs)[i] == txq) {
+			DEBUG("%p: removing TX queue %p from list",
+			      (void *)priv->dev, (void *)txq);
+			(*priv->txqs)[i] = NULL;
+			break;
+		}
+	mlx4_txq_cleanup(txq);
+	rte_free(txq);
+}
-- 
2.1.4

* [PATCH v1 44/48] net/mlx4: separate Rx configuration functions
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Rx queue configuration functions move from mlx4.c to a new file,
mlx4_rxq.c. Private functions are now prefixed with "mlx4_" to prevent
them from conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 592 +----------------------------------
 drivers/net/mlx4/mlx4_rxq.c  | 632 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_rxtx.h |  11 +
 4 files changed, 650 insertions(+), 586 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 22820ab..00ccba0 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 817b36e..b54a569 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -78,17 +78,6 @@ const char *pmd_mlx4_init_params[] = {
 
 /* Device configuration. */
 
-static int
-rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp);
-
-static void
-rxq_cleanup(struct rxq *rxq);
-
-static void
-priv_mac_addr_del(struct priv *priv);
-
 /**
  * DPDK callback for Ethernet device configuration.
  *
@@ -235,575 +224,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-/* RX queues handling. */
-
-/**
- * Allocate RX queue elements.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
-{
-	unsigned int i;
-	struct rxq_elt (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
-
-	if (elts == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		goto error;
-	}
-	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
-		struct rxq_elt *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
-		struct ibv_sge *sge = &(*elts)[i].sge;
-		struct rte_mbuf *buf;
-
-		if (pool != NULL) {
-			buf = *(pool++);
-			assert(buf != NULL);
-			rte_pktmbuf_reset(buf);
-		} else
-			buf = rte_pktmbuf_alloc(rxq->mp);
-		if (buf == NULL) {
-			rte_errno = ENOMEM;
-			assert(pool == NULL);
-			ERROR("%p: empty mbuf pool", (void *)rxq);
-			goto error;
-		}
-		/*
-		 * Configure WR. Work request ID contains its own index in
-		 * the elts array and the offset between SGE buffer header and
-		 * its data.
-		 */
-		WR_ID(wr->wr_id).id = i;
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)buf);
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = sge;
-		wr->num_sge = 1;
-		/* Headroom is reserved by rte_pktmbuf_alloc(). */
-		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
-		/* Buffer is supposed to be empty. */
-		assert(rte_pktmbuf_data_len(buf) == 0);
-		assert(rte_pktmbuf_pkt_len(buf) == 0);
-		/* sge->addr must be able to store a pointer. */
-		assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-		/* SGE keeps its headroom. */
-		sge->addr = (uintptr_t)
-			((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
-		sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-		sge->lkey = rxq->mr->lkey;
-		/* Redundant check for tailroom. */
-		assert(sge->length == rte_pktmbuf_tailroom(buf));
-		/*
-		 * Make sure elts index and SGE mbuf pointer can be deduced
-		 * from WR ID.
-		 */
-		if ((WR_ID(wr->wr_id).id != i) ||
-		    ((void *)((uintptr_t)sge->addr -
-			WR_ID(wr->wr_id).offset) != buf)) {
-			rte_errno = EOVERFLOW;
-			ERROR("%p: cannot store index and offset in WR ID",
-			      (void *)rxq);
-			sge->addr = 0;
-			rte_pktmbuf_free(buf);
-			goto error;
-		}
-	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u single-segment WRs",
-	      (void *)rxq, elts_n);
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts = elts;
-	return 0;
-error:
-	if (elts != NULL) {
-		assert(pool == NULL);
-		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf;
-
-			if (elt->sge.addr == 0)
-				continue;
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			buf = (void *)((uintptr_t)elt->sge.addr -
-				WR_ID(elt->wr.wr_id).offset);
-			rte_pktmbuf_free_seg(buf);
-		}
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * Free RX queue elements.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_free_elts(struct rxq *rxq)
-{
-	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts;
-
-	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
-		struct rxq_elt *elt = &(*elts)[i];
-		struct rte_mbuf *buf;
-
-		if (elt->sge.addr == 0)
-			continue;
-		assert(WR_ID(elt->wr.wr_id).id == i);
-		buf = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(elt->wr.wr_id).offset);
-		rte_pktmbuf_free_seg(buf);
-	}
-	rte_free(elts);
-}
-
-/**
- * Unregister a MAC address.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_mac_addr_del(struct priv *priv)
-{
-#ifndef NDEBUG
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-#endif
-
-	if (!priv->mac_flow)
-		return;
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(priv->mac_flow));
-	priv->mac_flow = NULL;
-}
-
-/**
- * Register a MAC address.
- *
- * The MAC address is registered in queue 0.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_mac_addr_add(struct priv *priv)
-{
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-	struct rxq *rxq;
-	struct ibv_flow *flow;
-
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		return 0;
-	if (priv->isolated)
-		return 0;
-	if (*priv->rxqs && (*priv->rxqs)[0])
-		rxq = (*priv->rxqs)[0];
-	else
-		return 0;
-
-	/* Allocate flow specification on the stack. */
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
-
-	if (priv->mac_flow)
-		priv_mac_addr_del(priv);
-	/*
-	 * No padding must be inserted by the compiler between attr and spec.
-	 * This layout is expected by libibverbs.
-	 */
-	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.priority = 3,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
-		.size = sizeof(*spec),
-		.val = {
-			.dst_mac = {
-				(*mac)[0], (*mac)[1], (*mac)[2],
-				(*mac)[3], (*mac)[4], (*mac)[5]
-			},
-		},
-		.mask = {
-			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-		}
-	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	/* Create related flow. */
-	flow = ibv_create_flow(rxq->qp, attr);
-	if (flow == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, rte_errno, strerror(errno));
-		return -rte_errno;
-	}
-	assert(priv->mac_flow == NULL);
-	priv->mac_flow = flow;
-	return 0;
-}
-
-/**
- * Clean up a RX queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_cleanup(struct rxq *rxq)
-{
-	DEBUG("cleaning up %p", (void *)rxq);
-	rxq_free_elts(rxq);
-	if (rxq->qp != NULL)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->cq != NULL)
-		claim_zero(ibv_destroy_cq(rxq->cq));
-	if (rxq->channel != NULL)
-		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->mr != NULL)
-		claim_zero(ibv_dereg_mr(rxq->mr));
-	memset(rxq, 0, sizeof(*rxq));
-}
-
-/**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error and rte_errno is set.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-	struct ibv_qp *qp;
-	struct ibv_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-
-	qp = ibv_create_qp(priv->pd, &attr);
-	if (!qp)
-		rte_errno = errno ? errno : EINVAL;
-	return qp;
-}
-
-/**
- * Configure a RX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param rxq
- *   Pointer to RX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq tmpl = {
-		.priv = priv,
-		.mp = mp,
-		.socket = socket
-	};
-	struct ibv_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
-	int ret;
-
-	(void)conf; /* Thresholds configuration (ignored). */
-	mb_len = rte_pktmbuf_data_room_size(mp);
-	if (desc == 0) {
-		rte_errno = EINVAL;
-		ERROR("%p: invalid number of RX descriptors", (void *)dev);
-		goto error;
-	}
-	/* Enable scattered packets support for this queue if necessary. */
-	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
-	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
-	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
-		;
-	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
-		WARN("%p: scattered mode has been requested but is"
-		     " not supported, this may lead to packet loss",
-		     (void *)dev);
-	} else {
-		WARN("%p: the requested maximum Rx packet size (%u) is"
-		     " larger than a single mbuf (%u) and scattered"
-		     " mode has not been requested",
-		     (void *)dev,
-		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
-		     mb_len - RTE_PKTMBUF_HEADROOM);
-	}
-	/* Use the entire RX mempool as the memory region. */
-	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
-	if (tmpl.mr == NULL) {
-		rte_errno = EINVAL;
-		ERROR("%p: MR creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	if (dev->data->dev_conf.intr_conf.rxq) {
-		tmpl.channel = ibv_create_comp_channel(priv->ctx);
-		if (tmpl.channel == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: Rx interrupt completion channel creation"
-			      " failure: %s",
-			      (void *)dev, strerror(rte_errno));
-			goto error;
-		}
-		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
-			ERROR("%p: unable to make Rx interrupt completion"
-			      " channel non-blocking: %s",
-			      (void *)dev, strerror(rte_errno));
-			goto error;
-		}
-	}
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
-	if (tmpl.cq == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
-	if (tmpl.qp == NULL) {
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = rxq_alloc_elts(&tmpl, desc, NULL);
-	if (ret) {
-		ERROR("%p: RXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(rte_errno));
-		goto error;
-	}
-	mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* Save port ID. */
-	tmpl.port_id = dev->data->port_id;
-	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	/* Clean up rxq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
-	rxq_cleanup(rxq);
-	*rxq = tmpl;
-	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	return 0;
-error:
-	ret = rte_errno;
-	rxq_cleanup(&tmpl);
-	rte_errno = ret;
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * DPDK callback to configure a RX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   RX queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= priv->rxqs_n) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->rxqs_n);
-		return -rte_errno;
-	}
-	if (rxq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)rxq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		(*priv->rxqs)[idx] = NULL;
-		if (idx == 0)
-			priv_mac_addr_del(priv);
-		rxq_cleanup(rxq);
-	} else {
-		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
-		if (rxq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
-	if (ret)
-		rte_free(rxq);
-	else {
-		rxq->stats.idx = idx;
-		DEBUG("%p: adding RX queue %p to list",
-		      (void *)dev, (void *)rxq);
-		(*priv->rxqs)[idx] = rxq;
-		/* Update receive callback. */
-		dev->rx_pkt_burst = mlx4_rx_burst;
-	}
-	return ret;
-}
-
-/**
- * DPDK callback to release a RX queue.
- *
- * @param dpdk_rxq
- *   Generic RX queue pointer.
- */
-static void
-mlx4_rx_queue_release(void *dpdk_rxq)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct priv *priv;
-	unsigned int i;
-
-	if (rxq == NULL)
-		return;
-	priv = rxq->priv;
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] == rxq) {
-			DEBUG("%p: removing RX queue %p from list",
-			      (void *)priv->dev, (void *)rxq);
-			(*priv->rxqs)[i] = NULL;
-			if (i == 0)
-				priv_mac_addr_del(priv);
-			break;
-		}
-	rxq_cleanup(rxq);
-	rte_free(rxq);
-}
-
 /**
  * DPDK callback to start the device.
  *
@@ -825,7 +245,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		return 0;
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	ret = priv_mac_addr_add(priv);
+	ret = mlx4_mac_addr_add(priv);
 	if (ret)
 		goto err;
 	ret = mlx4_intr_install(priv);
@@ -843,7 +263,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 	priv->started = 0;
 	return ret;
 }
@@ -867,7 +287,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 }
 
 /**
@@ -890,7 +310,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 	/*
 	 * Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
@@ -906,7 +326,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 			if (tmp == NULL)
 				continue;
 			(*priv->rxqs)[i] = NULL;
-			rxq_cleanup(tmp);
+			mlx4_rxq_cleanup(tmp);
 			rte_free(tmp);
 		}
 		priv->rxqs_n = 0;
@@ -1315,7 +735,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
 		priv->mac = mac;
-		if (priv_mac_addr_add(priv))
+		if (mlx4_mac_addr_add(priv))
 			goto port_error;
 #ifndef NDEBUG
 		{
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
new file mode 100644
index 0000000..1456b5f
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -0,0 +1,632 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Rx queues configuration for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Allocate RX queue elements.
+ *
+ * @param rxq
+ *   Pointer to RX queue structure.
+ * @param elts_n
+ *   Number of elements to allocate.
+ * @param[in] pool
+ *   If not NULL, fetch buffers from this array instead of allocating them
+ *   with rte_pktmbuf_alloc().
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n,
+		    struct rte_mbuf **pool)
+{
+	unsigned int i;
+	struct rxq_elt (*elts)[elts_n] =
+		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+				  rxq->socket);
+
+	if (elts == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: can't allocate packets array", (void *)rxq);
+		goto error;
+	}
+	/* For each WR (packet). */
+	for (i = 0; (i != elts_n); ++i) {
+		struct rxq_elt *elt = &(*elts)[i];
+		struct ibv_recv_wr *wr = &elt->wr;
+		struct ibv_sge *sge = &(*elts)[i].sge;
+		struct rte_mbuf *buf;
+
+		if (pool != NULL) {
+			buf = *(pool++);
+			assert(buf != NULL);
+			rte_pktmbuf_reset(buf);
+		} else {
+			buf = rte_pktmbuf_alloc(rxq->mp);
+		}
+		if (buf == NULL) {
+			rte_errno = ENOMEM;
+			assert(pool == NULL);
+			ERROR("%p: empty mbuf pool", (void *)rxq);
+			goto error;
+		}
+		/*
+		 * Configure WR. Work request ID contains its own index in
+		 * the elts array and the offset between SGE buffer header and
+		 * its data.
+		 */
+		WR_ID(wr->wr_id).id = i;
+		WR_ID(wr->wr_id).offset =
+			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
+			 (uintptr_t)buf);
+		wr->next = &(*elts)[(i + 1)].wr;
+		wr->sg_list = sge;
+		wr->num_sge = 1;
+		/* Headroom is reserved by rte_pktmbuf_alloc(). */
+		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
+		/* Buffer is supposed to be empty. */
+		assert(rte_pktmbuf_data_len(buf) == 0);
+		assert(rte_pktmbuf_pkt_len(buf) == 0);
+		/* sge->addr must be able to store a pointer. */
+		assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+		/* SGE keeps its headroom. */
+		sge->addr = (uintptr_t)
+			((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
+		sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
+		sge->lkey = rxq->mr->lkey;
+		/* Redundant check for tailroom. */
+		assert(sge->length == rte_pktmbuf_tailroom(buf));
+		/*
+		 * Make sure elts index and SGE mbuf pointer can be deduced
+		 * from WR ID.
+		 */
+		if ((WR_ID(wr->wr_id).id != i) ||
+		    ((void *)((uintptr_t)sge->addr -
+			WR_ID(wr->wr_id).offset) != buf)) {
+			rte_errno = EOVERFLOW;
+			ERROR("%p: cannot store index and offset in WR ID",
+			      (void *)rxq);
+			sge->addr = 0;
+			rte_pktmbuf_free(buf);
+			goto error;
+		}
+	}
+	/* The last WR pointer must be NULL. */
+	(*elts)[(i - 1)].wr.next = NULL;
+	DEBUG("%p: allocated and configured %u single-segment WRs",
+	      (void *)rxq, elts_n);
+	rxq->elts_n = elts_n;
+	rxq->elts_head = 0;
+	rxq->elts = elts;
+	return 0;
+error:
+	if (elts != NULL) {
+		assert(pool == NULL);
+		for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+			struct rxq_elt *elt = &(*elts)[i];
+			struct rte_mbuf *buf;
+
+			if (elt->sge.addr == 0)
+				continue;
+			assert(WR_ID(elt->wr.wr_id).id == i);
+			buf = (void *)((uintptr_t)elt->sge.addr -
+				WR_ID(elt->wr.wr_id).offset);
+			rte_pktmbuf_free_seg(buf);
+		}
+		rte_free(elts);
+	}
+	DEBUG("%p: failed, freed everything", (void *)rxq);
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * Free RX queue elements.
+ *
+ * @param rxq
+ *   Pointer to RX queue structure.
+ */
+static void
+mlx4_rxq_free_elts(struct rxq *rxq)
+{
+	unsigned int i;
+	unsigned int elts_n = rxq->elts_n;
+	struct rxq_elt (*elts)[elts_n] = rxq->elts;
+
+	DEBUG("%p: freeing WRs", (void *)rxq);
+	rxq->elts_n = 0;
+	rxq->elts = NULL;
+	if (elts == NULL)
+		return;
+	for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+		struct rxq_elt *elt = &(*elts)[i];
+		struct rte_mbuf *buf;
+
+		if (elt->sge.addr == 0)
+			continue;
+		assert(WR_ID(elt->wr.wr_id).id == i);
+		buf = (void *)((uintptr_t)elt->sge.addr -
+			WR_ID(elt->wr.wr_id).offset);
+		rte_pktmbuf_free_seg(buf);
+	}
+	rte_free(elts);
+}
+
+/**
+ * Clean up a RX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param rxq
+ *   Pointer to RX queue structure.
+ */
+void
+mlx4_rxq_cleanup(struct rxq *rxq)
+{
+	DEBUG("cleaning up %p", (void *)rxq);
+	mlx4_rxq_free_elts(rxq);
+	if (rxq->qp != NULL)
+		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->cq != NULL)
+		claim_zero(ibv_destroy_cq(rxq->cq));
+	if (rxq->channel != NULL)
+		claim_zero(ibv_destroy_comp_channel(rxq->channel));
+	if (rxq->mr != NULL)
+		claim_zero(ibv_dereg_mr(rxq->mr));
+	memset(rxq, 0, sizeof(*rxq));
+}
+
+/**
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param cq
+ *   Completion queue to associate with QP.
+ * @param desc
+ *   Number of descriptors in QP (hint only).
+ *
+ * @return
+ *   QP pointer or NULL in case of error and rte_errno is set.
+ */
+static struct ibv_qp *
+mlx4_rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
+{
+	struct ibv_qp *qp;
+	struct ibv_qp_init_attr attr = {
+		/* CQ to be associated with the send queue. */
+		.send_cq = cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_recv_sge = 1,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+	};
+
+	qp = ibv_create_qp(priv->pd, &attr);
+	if (!qp)
+		rte_errno = errno ? errno : EINVAL;
+	return qp;
+}
+
+/**
+ * Configure a RX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param rxq
+ *   Pointer to RX queue structure.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
+	       unsigned int socket, const struct rte_eth_rxconf *conf,
+	       struct rte_mempool *mp)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq tmpl = {
+		.priv = priv,
+		.mp = mp,
+		.socket = socket
+	};
+	struct ibv_qp_attr mod;
+	struct ibv_recv_wr *bad_wr;
+	unsigned int mb_len;
+	int ret;
+
+	(void)conf; /* Thresholds configuration (ignored). */
+	mb_len = rte_pktmbuf_data_room_size(mp);
+	if (desc == 0) {
+		rte_errno = EINVAL;
+		ERROR("%p: invalid number of RX descriptors", (void *)dev);
+		goto error;
+	}
+	/* Enable scattered packets support for this queue if necessary. */
+	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
+	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
+		;
+	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
+		WARN("%p: scattered mode has been requested but is"
+		     " not supported, this may lead to packet loss",
+		     (void *)dev);
+	} else {
+		WARN("%p: the requested maximum Rx packet size (%u) is"
+		     " larger than a single mbuf (%u) and scattered"
+		     " mode has not been requested",
+		     (void *)dev,
+		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
+		     mb_len - RTE_PKTMBUF_HEADROOM);
+	}
+	/* Use the entire RX mempool as the memory region. */
+	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
+	if (tmpl.mr == NULL) {
+		rte_errno = EINVAL;
+		ERROR("%p: MR creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	if (dev->data->dev_conf.intr_conf.rxq) {
+		tmpl.channel = ibv_create_comp_channel(priv->ctx);
+		if (tmpl.channel == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: Rx interrupt completion channel creation"
+			      " failure: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
+		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+			ERROR("%p: unable to make Rx interrupt completion"
+			      " channel non-blocking: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
+	}
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
+	if (tmpl.cq == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: CQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	DEBUG("priv->device_attr.max_qp_wr is %d",
+	      priv->device_attr.max_qp_wr);
+	DEBUG("priv->device_attr.max_sge is %d",
+	      priv->device_attr.max_sge);
+	tmpl.qp = mlx4_rxq_setup_qp(priv, tmpl.cq, desc);
+	if (tmpl.qp == NULL) {
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	mod = (struct ibv_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = mlx4_rxq_alloc_elts(&tmpl, desc, NULL);
+	if (ret) {
+		ERROR("%p: RXQ allocation failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		      (void *)dev,
+		      (void *)bad_wr,
+		      strerror(rte_errno));
+		goto error;
+	}
+	mod = (struct ibv_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* Save port ID. */
+	tmpl.port_id = dev->data->port_id;
+	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+	/* Clean up rxq in case we're reinitializing it. */
+	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
+	mlx4_rxq_cleanup(rxq);
+	*rxq = tmpl;
+	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
+	return 0;
+error:
+	ret = rte_errno;
+	mlx4_rxq_cleanup(&tmpl);
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to configure a RX queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   RX queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	int ret;
+
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= priv->rxqs_n) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, priv->rxqs_n);
+		return -rte_errno;
+	}
+	if (rxq != NULL) {
+		DEBUG("%p: reusing already allocated queue index %u (%p)",
+		      (void *)dev, idx, (void *)rxq);
+		if (priv->started) {
+			rte_errno = EEXIST;
+			return -rte_errno;
+		}
+		(*priv->rxqs)[idx] = NULL;
+		if (idx == 0)
+			mlx4_mac_addr_del(priv);
+		mlx4_rxq_cleanup(rxq);
+	} else {
+		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+		if (rxq == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: unable to allocate queue index %u",
+			      (void *)dev, idx);
+			return -rte_errno;
+		}
+	}
+	ret = mlx4_rxq_setup(dev, rxq, desc, socket, conf, mp);
+	if (ret) {
+		rte_free(rxq);
+	} else {
+		rxq->stats.idx = idx;
+		DEBUG("%p: adding RX queue %p to list",
+		      (void *)dev, (void *)rxq);
+		(*priv->rxqs)[idx] = rxq;
+		/* Update receive callback. */
+		dev->rx_pkt_burst = mlx4_rx_burst;
+	}
+	return ret;
+}
+
+/**
+ * DPDK callback to release a RX queue.
+ *
+ * @param dpdk_rxq
+ *   Generic RX queue pointer.
+ */
+void
+mlx4_rx_queue_release(void *dpdk_rxq)
+{
+	struct rxq *rxq = (struct rxq *)dpdk_rxq;
+	struct priv *priv;
+	unsigned int i;
+
+	if (rxq == NULL)
+		return;
+	priv = rxq->priv;
+	for (i = 0; (i != priv->rxqs_n); ++i)
+		if ((*priv->rxqs)[i] == rxq) {
+			DEBUG("%p: removing RX queue %p from list",
+			      (void *)priv->dev, (void *)rxq);
+			(*priv->rxqs)[i] = NULL;
+			if (i == 0)
+				mlx4_mac_addr_del(priv);
+			break;
+		}
+	mlx4_rxq_cleanup(rxq);
+	rte_free(rxq);
+}
+
+/**
+ * Unregister a MAC address.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+mlx4_mac_addr_del(struct priv *priv)
+{
+#ifndef NDEBUG
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+#endif
+
+	if (!priv->mac_flow)
+		return;
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)priv,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	claim_zero(ibv_destroy_flow(priv->mac_flow));
+	priv->mac_flow = NULL;
+}
+
+/**
+ * Register a MAC address.
+ *
+ * The MAC address is registered in queue 0.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mac_addr_add(struct priv *priv)
+{
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+	struct rxq *rxq;
+	struct ibv_flow *flow;
+
+	/* If device isn't started, this is all we need to do. */
+	if (!priv->started)
+		return 0;
+	if (priv->isolated)
+		return 0;
+	if (*priv->rxqs && (*priv->rxqs)[0])
+		rxq = (*priv->rxqs)[0];
+	else
+		return 0;
+
+	/* Allocate flow specification on the stack. */
+	struct __attribute__((packed)) {
+		struct ibv_flow_attr attr;
+		struct ibv_flow_spec_eth spec;
+	} data;
+	struct ibv_flow_attr *attr = &data.attr;
+	struct ibv_flow_spec_eth *spec = &data.spec;
+
+	if (priv->mac_flow)
+		mlx4_mac_addr_del(priv);
+	/*
+	 * No padding must be inserted by the compiler between attr and spec.
+	 * This layout is expected by libibverbs.
+	 */
+	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
+	*attr = (struct ibv_flow_attr){
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.priority = 3,
+		.num_of_specs = 1,
+		.port = priv->port,
+		.flags = 0
+	};
+	*spec = (struct ibv_flow_spec_eth){
+		.type = IBV_FLOW_SPEC_ETH,
+		.size = sizeof(*spec),
+		.val = {
+			.dst_mac = {
+				(*mac)[0], (*mac)[1], (*mac)[2],
+				(*mac)[3], (*mac)[4], (*mac)[5]
+			},
+		},
+		.mask = {
+			.dst_mac = "\xff\xff\xff\xff\xff\xff",
+		}
+	};
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)priv,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	/* Create related flow. */
+	flow = ibv_create_flow(rxq->qp, attr);
+	if (flow == NULL) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: flow configuration failed, errno=%d: %s",
+		      (void *)rxq, rte_errno, strerror(errno));
+		return -rte_errno;
+	}
+	assert(priv->mac_flow == NULL);
+	priv->mac_flow = flow;
+	return 0;
+}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index b02af8e..a3d972b 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -130,6 +130,17 @@ struct txq {
 	unsigned int socket; /**< CPU socket ID for allocations. */
 };
 
+/* mlx4_rxq.c */
+
+void mlx4_rxq_cleanup(struct rxq *rxq);
+int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			uint16_t desc, unsigned int socket,
+			const struct rte_eth_rxconf *conf,
+			struct rte_mempool *mp);
+void mlx4_rx_queue_release(void *dpdk_rxq);
+void mlx4_mac_addr_del(struct priv *priv);
+int mlx4_mac_addr_add(struct priv *priv);
+
 /* mlx4_rxtx.c */
 
 uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
-- 
2.1.4


* [PATCH v1 45/48] net/mlx4: group flow API handlers in common file
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (43 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 44/48] net/mlx4: separate Rx " Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 46/48] net/mlx4: rename private functions in flow API Adrien Mazarguil
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Only the common filter control operation callback needs to be exposed.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
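
Reader's note (not part of the commit): the arrangement described in the
commit message can be sketched in isolation as below. This is a minimal
illustration under stated assumptions, not the driver's code. Every
example_* name and both simplified enums are hypothetical stand-ins for
the rte_flow and ethdev types. The operations table gets internal
linkage, and the filter control callback is the only external symbol; it
hands the table out on request.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum rte_filter_type / enum rte_filter_op. */
enum example_filter_type {
	EXAMPLE_FILTER_NONE,
	EXAMPLE_FILTER_GENERIC,
};

enum example_filter_op {
	EXAMPLE_FILTER_NOP,
	EXAMPLE_FILTER_GET,
};

/* Hypothetical stand-in for struct rte_flow_ops (two handlers only). */
struct example_flow_ops {
	int (*validate)(const void *rule);
	int (*flush)(void);
};

/* Handlers stay private to this file. */
static int
example_flow_validate(const void *rule)
{
	return rule ? 0 : -EINVAL;
}

static int
example_flow_flush(void)
{
	return 0;
}

/* Internal linkage: other files cannot reference this table directly. */
static const struct example_flow_ops example_ops = {
	.validate = example_flow_validate,
	.flush = example_flow_flush,
};

/*
 * The only symbol this file exposes. Callers obtain the operations
 * table through it; any other request is rejected with -ENOTSUP.
 */
int
example_filter_ctrl(enum example_filter_type type,
		    enum example_filter_op op,
		    void *arg)
{
	if (type == EXAMPLE_FILTER_GENERIC && op == EXAMPLE_FILTER_GET &&
	    arg != NULL) {
		*(const void **)arg = &example_ops;
		return 0;
	}
	return -ENOTSUP;
}
```

A caller would pass the address of a "const void *" variable as arg and,
on a zero return, use the pointer stored there as the operations table.
Keeping the handlers static is what lets the header shrink to a single
prototype.
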
 drivers/net/mlx4/mlx4.c      | 48 +-------------------------
 drivers/net/mlx4/mlx4_flow.c | 72 ++++++++++++++++++++++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h | 39 +++++----------------
 3 files changed, 76 insertions(+), 83 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b54a569..6424d8b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -51,7 +51,6 @@
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_memory.h>
-#include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_common.h>
@@ -356,51 +355,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	memset(priv, 0, sizeof(*priv));
 }
 
-const struct rte_flow_ops mlx4_flow_ops = {
-	.validate = mlx4_flow_validate,
-	.create = mlx4_flow_create,
-	.destroy = mlx4_flow_destroy,
-	.flush = mlx4_flow_flush,
-	.query = NULL,
-	.isolate = mlx4_flow_isolate,
-};
-
-/**
- * Manage filter operations.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param filter_type
- *   Filter type.
- * @param filter_op
- *   Operation to perform.
- * @param arg
- *   Pointer to operation-specific structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
-		     enum rte_filter_type filter_type,
-		     enum rte_filter_op filter_op,
-		     void *arg)
-{
-	switch (filter_type) {
-	case RTE_ETH_FILTER_GENERIC:
-		if (filter_op != RTE_ETH_FILTER_GET)
-			break;
-		*(const void **)arg = &mlx4_flow_ops;
-		return 0;
-	default:
-		ERROR("%p: filter type (%d) not supported",
-		      (void *)dev, filter_type);
-		break;
-	}
-	rte_errno = ENOTSUP;
-	return -rte_errno;
-}
-
 static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_configure = mlx4_dev_configure,
 	.dev_start = mlx4_dev_start,
@@ -419,7 +373,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.flow_ctrl_get = mlx4_flow_ctrl_get,
 	.flow_ctrl_set = mlx4_flow_ctrl_set,
 	.mtu_set = mlx4_mtu_set,
-	.filter_ctrl = mlx4_dev_filter_ctrl,
+	.filter_ctrl = mlx4_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
 	.rx_queue_intr_disable = mlx4_rx_intr_disable,
 };
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 61455ce..6401a83 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -31,8 +31,26 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <arpa/inet.h>
 #include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
 
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_errno.h>
+#include <rte_eth_ctrl.h>
+#include <rte_ethdev.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
@@ -697,7 +715,7 @@ priv_flow_validate(struct priv *priv,
  * @see rte_flow_validate()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_validate(struct rte_eth_dev *dev,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item items[],
@@ -844,7 +862,7 @@ priv_flow_create_action_queue(struct priv *priv,
  * @see rte_flow_create()
  * @see rte_flow_ops
  */
-struct rte_flow *
+static struct rte_flow *
 mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item items[],
@@ -927,7 +945,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
  * @return
  *   0 on success, a negative value on error.
  */
-int
+static int
 mlx4_flow_isolate(struct rte_eth_dev *dev,
 		  int enable,
 		  struct rte_flow_error *error)
@@ -951,7 +969,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
  * @see rte_flow_destroy()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
@@ -973,7 +991,7 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
  * @see rte_flow_flush()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_flush(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
@@ -1044,3 +1062,47 @@ mlx4_priv_flow_start(struct priv *priv)
 	}
 	return 0;
 }
+
+static const struct rte_flow_ops mlx4_flow_ops = {
+	.validate = mlx4_flow_validate,
+	.create = mlx4_flow_create,
+	.destroy = mlx4_flow_destroy,
+	.flush = mlx4_flow_flush,
+	.isolate = mlx4_flow_isolate,
+};
+
+/**
+ * Manage filter operations.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param filter_type
+ *   Filter type.
+ * @param filter_op
+ *   Operation to perform.
+ * @param arg
+ *   Pointer to operation-specific structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_filter_ctrl(struct rte_eth_dev *dev,
+		 enum rte_filter_type filter_type,
+		 enum rte_filter_op filter_op,
+		 void *arg)
+{
+	switch (filter_type) {
+	case RTE_ETH_FILTER_GENERIC:
+		if (filter_op != RTE_ETH_FILTER_GET)
+			break;
+		*(const void **)arg = &mlx4_flow_ops;
+		return 0;
+	default:
+		ERROR("%p: filter type (%d) not supported",
+		      (void *)dev, filter_type);
+		break;
+	}
+	rte_errno = ENOTSUP;
+	return -rte_errno;
+}
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 17e5f6e..8bd659c 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -34,7 +34,6 @@
 #ifndef RTE_PMD_MLX4_FLOW_H_
 #define RTE_PMD_MLX4_FLOW_H_
 
-#include <stddef.h>
 #include <stdint.h>
 #include <sys/queue.h>
 
@@ -48,12 +47,12 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_eth_ctrl.h>
+#include <rte_ethdev.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
-#include "mlx4.h"
-
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
@@ -61,47 +60,25 @@ struct rte_flow {
 	struct ibv_qp *qp; /**< Verbs queue pair. */
 };
 
-int
-mlx4_flow_validate(struct rte_eth_dev *dev,
-		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
-		   const struct rte_flow_action actions[],
-		   struct rte_flow_error *error);
-
-struct rte_flow *
-mlx4_flow_create(struct rte_eth_dev *dev,
-		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
-		 const struct rte_flow_action actions[],
-		 struct rte_flow_error *error);
-
-int
-mlx4_flow_destroy(struct rte_eth_dev *dev,
-		  struct rte_flow *flow,
-		  struct rte_flow_error *error);
-
-int
-mlx4_flow_flush(struct rte_eth_dev *dev,
-		struct rte_flow_error *error);
-
 /** Structure to pass to the conversion function. */
 struct mlx4_flow {
 	struct ibv_flow_attr *ibv_attr; /**< Verbs attribute. */
 	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
 };
 
-int
-mlx4_flow_isolate(struct rte_eth_dev *dev,
-		  int enable,
-		  struct rte_flow_error *error);
-
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint32_t queue_id; /**< Identifier of the queue. */
 };
 
+/* mlx4_flow.c */
+
 int mlx4_priv_flow_start(struct priv *priv);
 void mlx4_priv_flow_stop(struct priv *priv);
+int mlx4_filter_ctrl(struct rte_eth_dev *dev,
+		     enum rte_filter_type filter_type,
+		     enum rte_filter_op filter_op,
+		     void *arg);
 
 #endif /* RTE_PMD_MLX4_FLOW_H_ */
-- 
2.1.4

* [PATCH v1 46/48] net/mlx4: rename private functions in flow API
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (44 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 45/48] net/mlx4: group flow API handlers in common file Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 47/48] net/mlx4: separate memory management functions Adrien Mazarguil
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

While internal static functions do not cause link-time conflicts, renaming
them differentiates them from their mlx5 PMD counterparts while debugging.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      |  4 ++--
 drivers/net/mlx4/mlx4_flow.c | 30 +++++++++++++++---------------
 drivers/net/mlx4/mlx4_flow.h |  4 ++--
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 6424d8b..1f09b47 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -253,7 +253,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		     (void *)dev);
 		goto err;
 	}
-	ret = mlx4_priv_flow_start(priv);
+	ret = mlx4_flow_start(priv);
 	if (ret) {
 		ERROR("%p: flow start failed: %s",
 		      (void *)dev, strerror(ret));
@@ -284,7 +284,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	mlx4_priv_flow_stop(priv);
+	mlx4_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
 	mlx4_mac_addr_del(priv);
 }
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 6401a83..5616b83 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -561,7 +561,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 };
 
 /**
- * Validate a flow supported by the NIC.
+ * Make sure a flow rule is supported and initialize associated structure.
  *
  * @param priv
  *   Pointer to private structure.
@@ -580,12 +580,12 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_flow_validate(struct priv *priv,
-		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
-		   const struct rte_flow_action actions[],
-		   struct rte_flow_error *error,
-		   struct mlx4_flow *flow)
+mlx4_flow_prepare(struct priv *priv,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item items[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error,
+		  struct mlx4_flow *flow)
 {
 	const struct mlx4_flow_items *cur_item = mlx4_flow_items;
 	struct mlx4_flow_action action = {
@@ -725,7 +725,7 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return priv_flow_validate(priv, attr, items, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
 }
 
 /**
@@ -817,7 +817,7 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   A flow if the rule could be created.
  */
 static struct rte_flow *
-priv_flow_create_action_queue(struct priv *priv,
+mlx4_flow_create_action_queue(struct priv *priv,
 			      struct ibv_flow_attr *ibv_attr,
 			      struct mlx4_flow_action *action,
 			      struct rte_flow_error *error)
@@ -875,7 +875,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
 	int err;
 
-	err = priv_flow_validate(priv, attr, items, actions, error, &flow);
+	err = mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
 	if (err)
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
@@ -894,8 +894,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.port = priv->port,
 		.flags = 0,
 	};
-	claim_zero(priv_flow_validate(priv, attr, items, actions,
-				      error, &flow));
+	claim_zero(mlx4_flow_prepare(priv, attr, items, actions,
+				     error, &flow));
 	action = (struct mlx4_flow_action){
 		.queue = 0,
 		.drop = 0,
@@ -917,7 +917,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 			goto exit;
 		}
 	}
-	rte_flow = priv_flow_create_action_queue(priv, flow.ibv_attr,
+	rte_flow = mlx4_flow_create_action_queue(priv, flow.ibv_attr,
 						 &action, error);
 	if (rte_flow) {
 		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
@@ -1015,7 +1015,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
  *   Pointer to private structure.
  */
 void
-mlx4_priv_flow_stop(struct priv *priv)
+mlx4_flow_stop(struct priv *priv)
 {
 	struct rte_flow *flow;
 
@@ -1039,7 +1039,7 @@ mlx4_priv_flow_stop(struct priv *priv)
  *   0 on success, a errno value otherwise and rte_errno is set.
  */
 int
-mlx4_priv_flow_start(struct priv *priv)
+mlx4_flow_start(struct priv *priv)
 {
 	int ret;
 	struct ibv_qp *qp;
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 8bd659c..a24ae31 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -74,8 +74,8 @@ struct mlx4_flow_action {
 
 /* mlx4_flow.c */
 
-int mlx4_priv_flow_start(struct priv *priv);
-void mlx4_priv_flow_stop(struct priv *priv);
+int mlx4_flow_start(struct priv *priv);
+void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_type filter_type,
 		     enum rte_filter_op filter_op,
-- 
2.1.4

* [PATCH v1 47/48] net/mlx4: separate memory management functions
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (45 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 46/48] net/mlx4: rename private functions in flow API Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-01 16:54 ` [PATCH v1 48/48] net/mlx4: clean up includes and comments Adrien Mazarguil
                   ` (2 subsequent siblings)
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile  |   1 +
 drivers/net/mlx4/mlx4.c    | 115 -------------------------
 drivers/net/mlx4/mlx4.h    |   8 +-
 drivers/net/mlx4/mlx4_mr.c | 183 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 188 insertions(+), 119 deletions(-)
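
Note for readers: the contiguity check and the alignment step moved by this
patch can be sketched standalone as follows. This is a simplified
illustration under stated assumptions, not the driver code: demo_chunk is a
hypothetical stand-in for struct rte_mempool_memhdr, chunks come from an
array instead of rte_mempool_mem_iter(), and the alignment helpers assume a
power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk descriptor (address and length in bytes). */
struct demo_chunk {
	uintptr_t addr;
	size_t len;
};

/*
 * Accumulate the [start, end) range covered by the chunks. Each chunk
 * must directly extend the range gathered so far at either end.
 * Returns 0 if the chunks are virtually contiguous, -1 otherwise.
 */
int
demo_check_contiguous(const struct demo_chunk *chunks, size_t n,
		      uintptr_t *start, uintptr_t *end)
{
	size_t i;

	assert(start != NULL && end != NULL);
	*start = 0;
	*end = 0;
	for (i = 0; i < n; ++i) {
		const struct demo_chunk *c = &chunks[i];

		if (i == 0) {
			/* First chunk defines the initial range. */
			*start = c->addr;
			*end = c->addr + c->len;
		} else if (c->addr == *end) {
			/* Chunk follows the range. */
			*end += c->len;
		} else if (c->addr + c->len == *start) {
			/* Chunk precedes the range. */
			*start -= c->len;
		} else {
			/* Gap found. */
			return -1;
		}
	}
	return 0;
}

/* Round down to a power-of-two alignment. */
uintptr_t
demo_align_floor(uintptr_t v, uintptr_t align)
{
	return v & ~(align - 1);
}

/* Round up to a power-of-two alignment. */
uintptr_t
demo_align_ceil(uintptr_t v, uintptr_t align)
{
	return demo_align_floor(v + align - 1, align);
}
```

With a 2 MiB page size, for instance, an address of 0x201234 rounds down to
0x200000 and up to 0x400000, which is how the registered range is widened
to page boundaries before the ibv_reg_mr() call.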

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 00ccba0..41a61d7 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_mr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 1f09b47..c16803e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -48,9 +48,7 @@
 #include <rte_dev.h>
 #include <rte_mbuf.h>
 #include <rte_errno.h>
-#include <rte_mempool.h>
 #include <rte_malloc.h>
-#include <rte_memory.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_common.h>
@@ -110,119 +108,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-struct mlx4_check_mempool_data {
-	int ret;
-	char *start;
-	char *end;
-};
-
-/* Called by mlx4_check_mempool() when iterating the memory chunks. */
-static void mlx4_check_mempool_cb(struct rte_mempool *mp,
-	void *opaque, struct rte_mempool_memhdr *memhdr,
-	unsigned mem_idx)
-{
-	struct mlx4_check_mempool_data *data = opaque;
-
-	(void)mp;
-	(void)mem_idx;
-	/* It already failed, skip the next chunks. */
-	if (data->ret != 0)
-		return;
-	/* It is the first chunk. */
-	if (data->start == NULL && data->end == NULL) {
-		data->start = memhdr->addr;
-		data->end = data->start + memhdr->len;
-		return;
-	}
-	if (data->end == memhdr->addr) {
-		data->end += memhdr->len;
-		return;
-	}
-	if (data->start == (char *)memhdr->addr + memhdr->len) {
-		data->start -= memhdr->len;
-		return;
-	}
-	/* Error, mempool is not virtually contigous. */
-	data->ret = -1;
-}
-
-/**
- * Check if a mempool can be used: it must be virtually contiguous.
- *
- * @param[in] mp
- *   Pointer to memory pool.
- * @param[out] start
- *   Pointer to the start address of the mempool virtual memory area
- * @param[out] end
- *   Pointer to the end address of the mempool virtual memory area
- *
- * @return
- *   0 on success (mempool is virtually contiguous), -1 on error.
- */
-static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
-	uintptr_t *end)
-{
-	struct mlx4_check_mempool_data data;
-
-	memset(&data, 0, sizeof(data));
-	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
-	*start = (uintptr_t)data.start;
-	*end = (uintptr_t)data.end;
-	return data.ret;
-}
-
-/**
- * Register mempool as a memory region.
- *
- * @param pd
- *   Pointer to protection domain.
- * @param mp
- *   Pointer to memory pool.
- *
- * @return
- *   Memory region pointer, NULL in case of error and rte_errno is set.
- */
-struct ibv_mr *
-mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
-{
-	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	uintptr_t start;
-	uintptr_t end;
-	unsigned int i;
-	struct ibv_mr *mr;
-
-	if (mlx4_check_mempool(mp, &start, &end) != 0) {
-		rte_errno = EINVAL;
-		ERROR("mempool %p: not virtually contiguous",
-			(void *)mp);
-		return NULL;
-	}
-	DEBUG("mempool %p area start=%p end=%p size=%zu",
-	      (void *)mp, (void *)start, (void *)end,
-	      (size_t)(end - start));
-	/* Round start and end to page boundary if found in memory segments. */
-	for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
-		uintptr_t addr = (uintptr_t)ms[i].addr;
-		size_t len = ms[i].len;
-		unsigned int align = ms[i].hugepage_sz;
-
-		if ((start > addr) && (start < addr + len))
-			start = RTE_ALIGN_FLOOR(start, align);
-		if ((end > addr) && (end < addr + len))
-			end = RTE_ALIGN_CEIL(end, align);
-	}
-	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
-	      (void *)mp, (void *)start, (void *)end,
-	      (size_t)(end - start));
-	mr = ibv_reg_mr(pd,
-			(void *)start,
-			end - start,
-			IBV_ACCESS_LOCAL_WRITE);
-	if (!mr)
-		rte_errno = errno ? errno : EINVAL;
-	return mr;
-}
-
 /**
  * DPDK callback to start the device.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b5f2953..94b5f1e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -115,10 +115,6 @@ struct priv {
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 };
 
-/* mlx4.c */
-
-struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
-
 /* mlx4_ethdev.c */
 
 int mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE]);
@@ -144,4 +140,8 @@ int mlx4_intr_install(struct priv *priv);
 int mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 
+/* mlx4_mr.c */
+
+struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
+
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
new file mode 100644
index 0000000..1a8d2fc
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -0,0 +1,183 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Memory management functions for mlx4 driver.
+ */
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#include "mlx4_utils.h"
+
+struct mlx4_check_mempool_data {
+	int ret;
+	char *start;
+	char *end;
+};
+
+/**
+ * Called by mlx4_check_mempool() when iterating the memory chunks.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool (unused).
+ * @param[in, out] opaque
+ *   Pointer to shared buffer with mlx4_check_mempool().
+ * @param[in] memhdr
+ *   Pointer to mempool chunk header.
+ * @param mem_idx
+ *   Mempool chunk index (unused).
+ */
+static void
+mlx4_check_mempool_cb(struct rte_mempool *mp, void *opaque,
+		      struct rte_mempool_memhdr *memhdr,
+		      unsigned int mem_idx)
+{
+	struct mlx4_check_mempool_data *data = opaque;
+
+	(void)mp;
+	(void)mem_idx;
+	/* It already failed, skip the next chunks. */
+	if (data->ret != 0)
+		return;
+	/* It is the first chunk. */
+	if (data->start == NULL && data->end == NULL) {
+		data->start = memhdr->addr;
+		data->end = data->start + memhdr->len;
+		return;
+	}
+	if (data->end == memhdr->addr) {
+		data->end += memhdr->len;
+		return;
+	}
+	if (data->start == (char *)memhdr->addr + memhdr->len) {
+		data->start -= memhdr->len;
+		return;
+	}
+	/* Error, mempool is not virtually contiguous. */
+	data->ret = -1;
+}
+
+/**
+ * Check if a mempool can be used: it must be virtually contiguous.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool.
+ * @param[out] start
+ *   Pointer to the start address of the mempool virtual memory area.
+ * @param[out] end
+ *   Pointer to the end address of the mempool virtual memory area.
+ *
+ * @return
+ *   0 on success (mempool is virtually contiguous), -1 on error.
+ */
+static int
+mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start, uintptr_t *end)
+{
+	struct mlx4_check_mempool_data data;
+
+	memset(&data, 0, sizeof(data));
+	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
+	*start = (uintptr_t)data.start;
+	*end = (uintptr_t)data.end;
+	return data.ret;
+}
+
+/**
+ * Register mempool as a memory region.
+ *
+ * @param pd
+ *   Pointer to protection domain.
+ * @param mp
+ *   Pointer to memory pool.
+ *
+ * @return
+ *   Memory region pointer, NULL in case of error and rte_errno is set.
+ */
+struct ibv_mr *
+mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	uintptr_t start;
+	uintptr_t end;
+	unsigned int i;
+	struct ibv_mr *mr;
+
+	if (mlx4_check_mempool(mp, &start, &end) != 0) {
+		rte_errno = EINVAL;
+		ERROR("mempool %p: not virtually contiguous",
+			(void *)mp);
+		return NULL;
+	}
+	DEBUG("mempool %p area start=%p end=%p size=%zu",
+	      (void *)mp, (void *)start, (void *)end,
+	      (size_t)(end - start));
+	/* Round start and end to page boundary if found in memory segments. */
+	for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
+		uintptr_t addr = (uintptr_t)ms[i].addr;
+		size_t len = ms[i].len;
+		unsigned int align = ms[i].hugepage_sz;
+
+		if ((start > addr) && (start < addr + len))
+			start = RTE_ALIGN_FLOOR(start, align);
+		if ((end > addr) && (end < addr + len))
+			end = RTE_ALIGN_CEIL(end, align);
+	}
+	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
+	      (void *)mp, (void *)start, (void *)end,
+	      (size_t)(end - start));
+	mr = ibv_reg_mr(pd,
+			(void *)start,
+			end - start,
+			IBV_ACCESS_LOCAL_WRITE);
+	if (!mr)
+		rte_errno = errno ? errno : EINVAL;
+	return mr;
+}
-- 
2.1.4

* [PATCH v1 48/48] net/mlx4: clean up includes and comments
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (46 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 47/48] net/mlx4: separate memory management functions Adrien Mazarguil
@ 2017-08-01 16:54 ` Adrien Mazarguil
  2017-08-18 13:28 ` [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Ferruh Yigit
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
  49 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-01 16:54 UTC (permalink / raw)
  To: dev

Add missing includes and sort them, then update/remove comments around them
for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 40 ++++++++++++++++++++++++---------------
 drivers/net/mlx4/mlx4.h      |  3 +--
 drivers/net/mlx4/mlx4_flow.c |  5 +++++
 drivers/net/mlx4/mlx4_flow.h |  3 +--
 4 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index c16803e..8573e14 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -31,29 +31,41 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/* System headers. */
+/**
+ * @file
+ * mlx4 driver initialization.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <inttypes.h>
 #include <stddef.h>
+#include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <stdint.h>
-#include <inttypes.h>
 #include <string.h>
-#include <errno.h>
 #include <unistd.h>
-#include <assert.h>
 
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_ethdev_pci.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
 #include <rte_dev.h>
-#include <rte_mbuf.h>
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_kvargs.h>
+#include <rte_ethdev.h>
+#include <rte_ethdev_pci.h>
+#include <rte_ether.h>
 #include <rte_interrupts.h>
-#include <rte_common.h>
+#include <rte_kvargs.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
 
-/* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
@@ -73,8 +85,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-/* Device configuration. */
-
 /**
  * DPDK callback for Ethernet device configuration.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 94b5f1e..1cd4db3 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -37,8 +37,7 @@
 #include <net/if.h>
 #include <stdint.h>
 
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+/* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 5616b83..e2798f6 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -31,6 +31,11 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+/**
+ * @file
+ * Flow API operations for mlx4 driver.
+ */
+
 #include <arpa/inet.h>
 #include <assert.h>
 #include <errno.h>
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index a24ae31..fbb775d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -37,8 +37,7 @@
 #include <stdint.h>
 #include <sys/queue.h>
 
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+/* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
-- 
2.1.4

* Re: [PATCH v1 03/48] net/mlx4: check max number of ports dynamically
  2017-08-01 16:53 ` [PATCH v1 03/48] net/mlx4: check max number of ports dynamically Adrien Mazarguil
@ 2017-08-01 17:35   ` Legacy, Allain
  2017-08-02  7:52     ` Adrien Mazarguil
  0 siblings, 1 reply; 110+ messages in thread
From: Legacy, Allain @ 2017-08-01 17:35 UTC (permalink / raw)
  To: Adrien Mazarguil, dev; +Cc: Gaëtan Rivet

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, August 01, 2017 12:54 PM
<...>
> @@ -5946,12 +5949,11 @@ mlx4_arg_parse(const char *key, const char *val,
> void *out)
>  		return -errno;
>  	}
>  	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
> -		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
> -			ERROR("invalid port index %lu (max: %u)",
> -				tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
> +		if (!(conf->ports.present & (1 << tmp))) {
> +			ERROR("invalid port index %lu", tmp);

The original error included the max value.  Wouldn't it be useful to report this to the 
user to help them understand their mistake?


> @@ -6085,16 +6092,16 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv,
> struct rte_pci_device *pci_dev)
>  	}
>  	INFO("%u port(s) detected", device_attr.phys_port_cnt);
> 
> +	for (i = 0; i < device_attr.phys_port_cnt; ++i)
> +		conf.ports.present |= 1 << i;

The loop could be avoided with:

	conf.ports.present = (1 << device_attr.phys_port_cnt) - 1;


Regards,
Allain

* Re: [PATCH v1 03/48] net/mlx4: check max number of ports dynamically
  2017-08-01 17:35   ` Legacy, Allain
@ 2017-08-02  7:52     ` Adrien Mazarguil
  0 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-08-02  7:52 UTC (permalink / raw)
  To: Legacy, Allain; +Cc: dev, Gaëtan Rivet

On Tue, Aug 01, 2017 at 05:35:30PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Tuesday, August 01, 2017 12:54 PM
> <...>
> > @@ -5946,12 +5949,11 @@ mlx4_arg_parse(const char *key, const char *val,
> > void *out)
> >  		return -errno;
> >  	}
> >  	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
> > -		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
> > -			ERROR("invalid port index %lu (max: %u)",
> > -				tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
> > +		if (!(conf->ports.present & (1 << tmp))) {
> > +			ERROR("invalid port index %lu", tmp);
> 
> The original error included the max value.  Wouldn't it be useful to report this to the 
> user to help them understand their mistake?

Makes sense, I'll add it back.

> > @@ -6085,16 +6092,16 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv,
> > struct rte_pci_device *pci_dev)
> >  	}
> >  	INFO("%u port(s) detected", device_attr.phys_port_cnt);
> > 
> > +	for (i = 0; i < device_attr.phys_port_cnt; ++i)
> > +		conf.ports.present |= 1 << i;
> 
> The loop could be avoided with:
> 
> 	conf.ports.present = (1 << device_attr.phys_port_cnt) - 1;

I will also make that change in the next iteration, thanks.
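
For reference, the equivalence between the loop and the suggested closed
form can be sketched as follows. This is only an illustration under stated
assumptions: the demo_* names are hypothetical, the bitmap is assumed to be
32 bits wide, and the closed form is only defined for fewer than 32 ports.
The last helper also bounds the index before shifting, since shifting by a
user-supplied value that is not smaller than the width of the type is
undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: set one bit per detected port. */
uint32_t
demo_ports_loop(unsigned int count)
{
	uint32_t present = 0;
	unsigned int i;

	for (i = 0; i < count; ++i)
		present |= UINT32_C(1) << i;
	return present;
}

/* Closed form: same result, valid only for count < 32. */
uint32_t
demo_ports_mask(unsigned int count)
{
	assert(count < 32);
	return (UINT32_C(1) << count) - 1;
}

/* Check a user-supplied port index against the bitmap. */
int
demo_port_valid(uint32_t present, unsigned long idx)
{
	return idx < 32 && (present & (UINT32_C(1) << idx)) != 0;
}
```

With two ports detected, both forms yield 0x3, so indices 0 and 1 are
accepted and anything else is rejected.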

-- 
Adrien Mazarguil
6WIND

* Re: [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (47 preceding siblings ...)
  2017-08-01 16:54 ` [PATCH v1 48/48] net/mlx4: clean up includes and comments Adrien Mazarguil
@ 2017-08-18 13:28 ` Ferruh Yigit
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
  49 siblings, 0 replies; 110+ messages in thread
From: Ferruh Yigit @ 2017-08-18 13:28 UTC (permalink / raw)
  To: Adrien Mazarguil, dev

On 8/1/2017 5:53 PM, Adrien Mazarguil wrote:
> The main purpose of this large series is to relieve the mlx4 PMD from its
> dependency on Mellanox OFED to instead rely on the standard rdma-core
> package provided by Linux distributions.
> 
> While compatibility with Mellanox OFED is preserved, all nonstandard
> functionality has to be stripped from the PMD in order to re-implement it
> through an approach compatible with rdma-core.
> 
> Due to the amount of changes necessary to achieve this goal, this rework
> starts off by removing extraneous code to simplify the PMD as much as
> possible before either replacing or dismantling functionality that relies on
> nonstandard Verbs.
> 
> What remains after applying this series is single-segment Tx/Rx support,
> without offloads nor RSS, on the default MAC address (which cannot be
> configured). Support for multiple queues and the flow API (minus the RSS
> action) are also preserved.
> 
> Missing functionality that needs substantial work will be restored later by
> subsequent series.
> 
> Also because the mlx4 PMD is mostly contained in a single very large source
> file of 6400+ lines (mlx4.c) which has become extremely difficult to
> maintain, this rework is used as an opportunity to finally group functions
> into separate files, as in mlx5.
> 

Hi Adrien,

Thanks for this big clean-up.

The patchset does not apply cleanly to the latest tree because of the
driver fixes sent after it. Can you please rebase the patchset on top of
the latest tree?

Thanks,
ferruh


> This rework targets DPDK 17.11.
> 
<...>


* Re: [PATCH v1 04/48] net/mlx4: remove useless compilation checks
  2017-08-01 16:53 ` [PATCH v1 04/48] net/mlx4: remove useless compilation checks Adrien Mazarguil
@ 2017-08-18 13:39   ` Ferruh Yigit
  2017-09-01 10:19     ` Adrien Mazarguil
  0 siblings, 1 reply; 110+ messages in thread
From: Ferruh Yigit @ 2017-08-18 13:39 UTC (permalink / raw)
  To: Adrien Mazarguil, dev

On 8/1/2017 5:53 PM, Adrien Mazarguil wrote:
> Verbs support for RSS, inline receive and extended device query calls has
> not been optional for a while. Their absence is untested and is therefore
> unsupported.
> 
> Remove the related compilation checks and assume Mellanox OFED is up to
> date, as described in the documentation.

So this requires Mellanox OFED 4.1 to be present. Is there a check for
the OFED version, or do you think one is required?

> 
> Use this opportunity to remove a few useless data path debugging messages
> behind compilation checks on never defined macros.
> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

<...>


* [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD
  2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
                   ` (48 preceding siblings ...)
  2017-08-18 13:28 ` [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Ferruh Yigit
@ 2017-09-01  8:06 ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 01/51] net/mlx4: add consistency to copyright notices Adrien Mazarguil
                     ` (52 more replies)
  49 siblings, 53 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The main purpose of this large series is to relieve the mlx4 PMD from its
dependency on Mellanox OFED to instead rely on the standard rdma-core
package provided by Linux distributions.

While compatibility with Mellanox OFED is preserved, all nonstandard
functionality has to be stripped from the PMD in order to re-implement it
through an approach compatible with rdma-core.

Due to the amount of changes necessary to achieve this goal, this rework
starts off by removing extraneous code to simplify the PMD as much as
possible before either replacing or dismantling functionality that relies on
nonstandard Verbs.

What remains after applying this series is single-segment Tx/Rx support,
without offloads nor RSS, on the default MAC address (which cannot be
configured). Support for multiple queues and the flow API (minus the RSS
action) are also preserved.

Missing functionality that needs substantial work will be restored later by
subsequent series.

Also because the mlx4 PMD is mostly contained in a single very large source
file of 6400+ lines (mlx4.c) which has become extremely difficult to
maintain, this rework is used as an opportunity to finally group functions
into separate files, as in mlx5.

This rework targets DPDK 17.11.

Changes since v1:

- Rebased series on top of the latest upstream fixes.

- Cleaned up remaining typos and coding style issues.

- "net/mlx4: check max number of ports dynamically":
  Removed extra loop and added error message on maximum number of ports
  according to Allain's suggestion.

- "net/mlx4: drop scatter/gather support":
  Additionally removed unnecessary mbuf pool from rxq_alloc_elts().

- "net/mlx4: simplify Rx buffer handling":
  New patch removing unnecessary code from the simplified Rx path.

- "net/mlx4: remove isolated mode constraint":
  New patch removing needless constraint for isolated mode, which can now
  be toggled anytime.

- "net/mlx4: rely on ethdev for Tx/Rx queue arrays":
  New patch refactoring duplicated information from ethdev.

Adrien Mazarguil (51):
  net/mlx4: add consistency to copyright notices
  net/mlx4: remove limitation on number of instances
  net/mlx4: check max number of ports dynamically
  net/mlx4: remove useless compilation checks
  net/mlx4: remove secondary process support
  net/mlx4: remove useless code
  net/mlx4: remove soft counters compilation option
  net/mlx4: remove scatter mode compilation option
  net/mlx4: remove Tx inline compilation option
  net/mlx4: remove allmulti and promisc support
  net/mlx4: remove VLAN filter support
  net/mlx4: remove MAC address configuration support
  net/mlx4: drop MAC flows affecting all Rx queues
  net/mlx4: revert flow API RSS support
  net/mlx4: revert RSS parent queue refactoring
  net/mlx4: drop RSS support
  net/mlx4: drop checksum offloads support
  net/mlx4: drop packet type recognition support
  net/mlx4: drop scatter/gather support
  net/mlx4: drop inline receive support
  net/mlx4: use standard QP attributes
  net/mlx4: revert resource domain support
  net/mlx4: revert multicast echo prevention
  net/mlx4: revert fast Verbs interface for Tx
  net/mlx4: revert fast Verbs interface for Rx
  net/mlx4: simplify Rx buffer handling
  net/mlx4: simplify link update function
  net/mlx4: standardize on negative errno values
  net/mlx4: clean up coding style inconsistencies
  net/mlx4: remove control path locks
  net/mlx4: remove unnecessary wrapper functions
  net/mlx4: remove mbuf macro definitions
  net/mlx4: use standard macro to get array size
  net/mlx4: separate debugging macros
  net/mlx4: use a single interrupt handle
  net/mlx4: rename alarm field
  net/mlx4: refactor interrupt FD settings
  net/mlx4: clean up interrupt functions prototypes
  net/mlx4: compact interrupt functions
  net/mlx4: separate interrupt handling
  net/mlx4: separate Rx/Tx definitions
  net/mlx4: separate Rx/Tx functions
  net/mlx4: separate device control functions
  net/mlx4: separate Tx configuration functions
  net/mlx4: separate Rx configuration functions
  net/mlx4: group flow API handlers in common file
  net/mlx4: rename private functions in flow API
  net/mlx4: separate memory management functions
  net/mlx4: clean up includes and comments
  net/mlx4: remove isolated mode constraint
  net/mlx4: rely on ethdev for Tx/Rx queue arrays

 config/common_base                |    3 -
 doc/guides/nics/features/mlx4.ini |   13 -
 doc/guides/nics/mlx4.rst          |   37 +-
 drivers/net/mlx4/Makefile         |   41 +-
 drivers/net/mlx4/mlx4.c           | 6354 ++------------------------------
 drivers/net/mlx4/mlx4.h           |  333 +-
 drivers/net/mlx4/mlx4_ethdev.c    |  788 ++++
 drivers/net/mlx4/mlx4_flow.c      |  491 +--
 drivers/net/mlx4/mlx4_flow.h      |   51 +-
 drivers/net/mlx4/mlx4_intr.c      |  375 ++
 drivers/net/mlx4/mlx4_mr.c        |  183 +
 drivers/net/mlx4/mlx4_rxq.c       |  579 +++
 drivers/net/mlx4/mlx4_rxtx.c      |  524 +++
 drivers/net/mlx4/mlx4_rxtx.h      |  154 +
 drivers/net/mlx4/mlx4_txq.c       |  472 +++
 drivers/net/mlx4/mlx4_utils.c     |   66 +
 drivers/net/mlx4/mlx4_utils.h     |  111 +
 17 files changed, 3737 insertions(+), 6838 deletions(-)
 create mode 100644 drivers/net/mlx4/mlx4_ethdev.c
 create mode 100644 drivers/net/mlx4/mlx4_intr.c
 create mode 100644 drivers/net/mlx4/mlx4_mr.c
 create mode 100644 drivers/net/mlx4/mlx4_rxq.c
 create mode 100644 drivers/net/mlx4/mlx4_rxtx.c
 create mode 100644 drivers/net/mlx4/mlx4_rxtx.h
 create mode 100644 drivers/net/mlx4/mlx4_txq.c
 create mode 100644 drivers/net/mlx4/mlx4_utils.c
 create mode 100644 drivers/net/mlx4/mlx4_utils.h

-- 
2.1.4


* [PATCH v2 01/51] net/mlx4: add consistency to copyright notices
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 02/51] net/mlx4: remove limitation on number of instances Adrien Mazarguil
                     ` (51 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Copyright lasts long enough not to require notices to be updated yearly.

The current approach of updating them occasionally while working on
unrelated tasks should be deprecated in favor of dedicated commits updating
all files at once when necessary.

Standardize on a single year per copyright owner.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/mlx4.rst     | 2 +-
 drivers/net/mlx4/Makefile    | 4 ++--
 drivers/net/mlx4/mlx4.c      | 4 ++--
 drivers/net/mlx4/mlx4.h      | 4 ++--
 drivers/net/mlx4/mlx4_flow.c | 2 +-
 drivers/net/mlx4/mlx4_flow.h | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index f8885b2..388aaf3 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-    Copyright 2012-2015 6WIND S.A.
+    Copyright 2012 6WIND S.A.
     Copyright 2015 Mellanox
 
     Redistribution and use in source and binary forms, with or without
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index c045bd7..b2ef128 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -1,7 +1,7 @@
 #   BSD LICENSE
 #
-#   Copyright 2012-2015 6WIND S.A.
-#   Copyright 2012 Mellanox.
+#   Copyright 2012 6WIND S.A.
+#   Copyright 2012 Mellanox
 #
 #   Redistribution and use in source and binary forms, with or without
 #   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 055de49..b5a7607 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2012-2017 6WIND S.A.
- *   Copyright 2012-2017 Mellanox.
+ *   Copyright 2012 6WIND S.A.
+ *   Copyright 2012 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index c0ade4f..5fd1454 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -1,8 +1,8 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright 2012-2017 6WIND S.A.
- *   Copyright 2012-2017 Mellanox.
+ *   Copyright 2012 6WIND S.A.
+ *   Copyright 2012 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 925c89c..ab37e7d 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -2,7 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright 2017 6WIND S.A.
- *   Copyright 2017 Mellanox.
+ *   Copyright 2017 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index beabcf2..4654dc2 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -2,7 +2,7 @@
  *   BSD LICENSE
  *
  *   Copyright 2017 6WIND S.A.
- *   Copyright 2017 Mellanox.
+ *   Copyright 2017 Mellanox
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
-- 
2.1.4


* [PATCH v2 02/51] net/mlx4: remove limitation on number of instances
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 01/51] net/mlx4: add consistency to copyright notices Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 03/51] net/mlx4: check max number of ports dynamically Adrien Mazarguil
                     ` (50 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The seemingly artificial limitation on the maximum number of instances for
this PMD is an historical leftover that predates its first public release.

It was used as a workaround to support multiple physical ports on a PCI
device exposing a single bus address when mlx4 was implemented directly as
an Ethernet device driver instead of a PCI driver spawning Ethernet
devices.

Getting rid of it simplifies device initialization.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 57 +++-----------------------------------------
 1 file changed, 3 insertions(+), 54 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index b5a7607..0ae78e0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -5444,40 +5444,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-/* Support up to 32 adapters. */
-static struct {
-	struct rte_pci_addr pci_addr; /* associated PCI address */
-	uint32_t ports; /* physical ports bitfield. */
-} mlx4_dev[32];
-
-/**
- * Get device index in mlx4_dev[] from PCI bus address.
- *
- * @param[in] pci_addr
- *   PCI bus address to look for.
- *
- * @return
- *   mlx4_dev[] index on success, -1 on failure.
- */
-static int
-mlx4_dev_idx(struct rte_pci_addr *pci_addr)
-{
-	unsigned int i;
-	int ret = -1;
-
-	assert(pci_addr != NULL);
-	for (i = 0; (i != elemof(mlx4_dev)); ++i) {
-		if ((mlx4_dev[i].pci_addr.domain == pci_addr->domain) &&
-		    (mlx4_dev[i].pci_addr.bus == pci_addr->bus) &&
-		    (mlx4_dev[i].pci_addr.devid == pci_addr->devid) &&
-		    (mlx4_dev[i].pci_addr.function == pci_addr->function))
-			return i;
-		if ((mlx4_dev[i].ports == 0) && (ret == -1))
-			ret = i;
-	}
-	return ret;
-}
-
 /**
  * Retrieve integer value from environment variable.
  *
@@ -6060,21 +6026,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		.active_ports = 0,
 	};
 	unsigned int vf;
-	int idx;
 	int i;
 
 	(void)pci_drv;
 	assert(pci_drv == &mlx4_driver);
-	/* Get mlx4_dev[] index. */
-	idx = mlx4_dev_idx(&pci_dev->addr);
-	if (idx == -1) {
-		ERROR("this driver cannot support any more adapters");
-		return -ENOMEM;
-	}
-	DEBUG("using driver device index %d", idx);
 
-	/* Save PCI address. */
-	mlx4_dev[idx].pci_addr = pci_dev->addr;
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
 		assert(errno);
@@ -6141,7 +6097,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	for (i = 0; i < device_attr.phys_port_cnt; i++) {
 		uint32_t port = i + 1; /* ports are indexed from one */
-		uint32_t test = (1 << i);
 		struct ibv_context *ctx = NULL;
 		struct ibv_port_attr port_attr;
 		struct ibv_pd *pd = NULL;
@@ -6162,7 +6117,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* RSS_SUPPORT */
 #endif /* HAVE_EXP_QUERY_DEVICE */
 
-		DEBUG("using port %u (%08" PRIx32 ")", port, test);
+		DEBUG("using port %u", port);
 
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
@@ -6198,8 +6153,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		mlx4_dev[idx].ports |= test;
-
 		/* from rte_ethdev.c */
 		priv = rte_zmalloc("ethdev private structure",
 				   sizeof(*priv),
@@ -6405,6 +6358,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			rte_eth_dev_release_port(eth_dev);
 		break;
 	}
+	if (i == device_attr.phys_port_cnt)
+		return 0;
 
 	/*
 	 * XXX if something went wrong in the loop above, there is a resource
@@ -6413,12 +6368,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	 * way to enumerate the registered ethdevs to free the previous ones.
 	 */
 
-	/* no port found, complain */
-	if (!mlx4_dev[idx].ports) {
-		err = ENODEV;
-		goto error;
-	}
-
 error:
 	if (attr_ctx)
 		claim_zero(ibv_close_device(attr_ctx));
-- 
2.1.4


* [PATCH v2 03/51] net/mlx4: check max number of ports dynamically
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 01/51] net/mlx4: add consistency to copyright notices Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 02/51] net/mlx4: remove limitation on number of instances Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01 10:57     ` Legacy, Allain
  2017-09-01  8:06   ` [PATCH v2 04/51] net/mlx4: remove useless compilation checks Adrien Mazarguil
                     ` (49 subsequent siblings)
  52 siblings, 1 reply; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Gaëtan Rivet, Allain Legacy

Use maximum number reported by hardware capabilities as replacement for the
static check on MLX4_PMD_MAX_PHYS_PORTS.

Cc: Gaëtan Rivet <gaetan.rivet@6wind.com>
Cc: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 49 ++++++++++++++++++++++++++++----------------
 drivers/net/mlx4/mlx4.h |  3 ---
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0ae78e0..7a93462 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -118,8 +118,12 @@ struct mlx4_secondary_data {
 	rte_spinlock_t lock; /* Port configuration lock. */
 } mlx4_secondary_data[RTE_MAX_ETHPORTS];
 
+/** Configuration structure for device arguments. */
 struct mlx4_conf {
-	uint8_t active_ports;
+	struct {
+		uint32_t present; /**< Bit-field for existing ports. */
+		uint32_t enabled; /**< Bit-field for user-enabled ports. */
+	} ports;
 };
 
 /* Available parameters list. */
@@ -5927,16 +5931,15 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
  *   Key argument to verify.
  * @param[in] val
  *   Value associated with key.
- * @param out
- *   User data.
+ * @param[in, out] conf
+ *   Shared configuration data.
  *
  * @return
  *   0 on success, negative errno value on failure.
  */
 static int
-mlx4_arg_parse(const char *key, const char *val, void *out)
+mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 {
-	struct mlx4_conf *conf = out;
 	unsigned long tmp;
 
 	errno = 0;
@@ -5946,12 +5949,18 @@ mlx4_arg_parse(const char *key, const char *val, void *out)
 		return -errno;
 	}
 	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
-		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
-			ERROR("invalid port index %lu (max: %u)",
-				tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
+		uint32_t ports = rte_log2_u32(conf->ports.present);
+
+		if (tmp >= ports) {
+			ERROR("port index %lu outside range [0,%" PRIu32 ")",
+			      tmp, ports);
 			return -EINVAL;
 		}
-		conf->active_ports |= 1 << tmp;
+		if (!(conf->ports.present & (1 << tmp))) {
+			ERROR("invalid port index %lu", tmp);
+			return -EINVAL;
+		}
+		conf->ports.enabled |= 1 << tmp;
 	} else {
 		WARN("%s: unknown parameter", key);
 		return -EINVAL;
@@ -5987,8 +5996,13 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
 		arg_count = rte_kvargs_count(kvlist, MLX4_PMD_PORT_KVARG);
 		while (arg_count-- > 0) {
-			ret = rte_kvargs_process(kvlist, MLX4_PMD_PORT_KVARG,
-					mlx4_arg_parse, conf);
+			ret = rte_kvargs_process(kvlist,
+						 MLX4_PMD_PORT_KVARG,
+						 (int (*)(const char *,
+							  const char *,
+							  void *))
+						 mlx4_arg_parse,
+						 conf);
 			if (ret != 0)
 				goto free_kvlist;
 		}
@@ -6023,7 +6037,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	struct ibv_context *attr_ctx = NULL;
 	struct ibv_device_attr device_attr;
 	struct mlx4_conf conf = {
-		.active_ports = 0,
+		.ports.present = 0,
 	};
 	unsigned int vf;
 	int i;
@@ -6085,16 +6099,15 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
 
+	conf.ports.present |= (UINT64_C(1) << device_attr.phys_port_cnt) - 1;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
 		ERROR("failed to process device arguments");
 		err = EINVAL;
 		goto error;
 	}
 	/* Use all ports when none are defined */
-	if (conf.active_ports == 0) {
-		for (i = 0; i < MLX4_PMD_MAX_PHYS_PORTS; i++)
-			conf.active_ports |= 1 << i;
-	}
+	if (!conf.ports.enabled)
+		conf.ports.enabled = conf.ports.present;
 	for (i = 0; i < device_attr.phys_port_cnt; i++) {
 		uint32_t port = i + 1; /* ports are indexed from one */
 		struct ibv_context *ctx = NULL;
@@ -6107,8 +6120,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 #endif /* HAVE_EXP_QUERY_DEVICE */
 		struct ether_addr mac;
 
-		/* If port is not active, skip. */
-		if (!(conf.active_ports & (1 << i)))
+		/* If port is not enabled, skip. */
+		if (!(conf.ports.enabled & (1 << i)))
 			continue;
 #ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5fd1454..109cd1b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -81,9 +81,6 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum number of physical ports. */
-#define MLX4_PMD_MAX_PHYS_PORTS 2
-
 /* Maximum number of Scatter/Gather Elements per Work Request. */
 #ifndef MLX4_PMD_SGE_WR_N
 #define MLX4_PMD_SGE_WR_N 4
-- 
2.1.4


* [PATCH v2 04/51] net/mlx4: remove useless compilation checks
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (2 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 03/51] net/mlx4: check max number of ports dynamically Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 05/51] net/mlx4: remove secondary process support Adrien Mazarguil
                     ` (48 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Verbs support for RSS, inline receive and extended device query calls has
not been optional for a while. Their absence is untested and is therefore
unsupported.

Remove the related compilation checks and assume Mellanox OFED is up to
date, as described in the documentation.

Use this opportunity to remove a few useless data path debugging messages
behind compilation checks on never defined macros.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    | 12 ------------
 drivers/net/mlx4/mlx4.c      | 35 -----------------------------------
 drivers/net/mlx4/mlx4.h      |  2 --
 drivers/net/mlx4/mlx4_flow.c |  3 ---
 4 files changed, 52 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index b2ef128..07a66c4 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -104,18 +104,6 @@ mlx4_autoconf.h.new: FORCE
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
 	$Q sh -- '$<' '$@' \
-		RSS_SUPPORT \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		INLINE_RECV \
-		infiniband/verbs.h \
-		enum IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
-		HAVE_EXP_QUERY_DEVICE \
-		infiniband/verbs.h \
-		type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
-	$Q sh -- '$<' '$@' \
 		HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
 		infiniband/verbs.h \
 		enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 7a93462..cef024a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1103,10 +1103,6 @@ txq_complete(struct txq *txq)
 
 	if (unlikely(elts_comp == 0))
 		return 0;
-#ifdef DEBUG_SEND
-	DEBUG("%p: processing %u work requests completions",
-	      (void *)txq, elts_comp);
-#endif
 	wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
 	if (unlikely(wcs_n == 0))
 		return 0;
@@ -3155,9 +3151,6 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		return 0;
 	*next = NULL;
 	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %d WRs", (void *)rxq, i);
-#endif
 	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
@@ -3318,9 +3311,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	if (unlikely(i == 0))
 		return 0;
 	/* Repost WRs. */
-#ifdef DEBUG_RECV
-	DEBUG("%p: reposting %u WRs", (void *)rxq, i);
-#endif
 	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
@@ -3418,15 +3408,11 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-#ifdef INLINE_RECV
 	attr.max_inl_recv = priv->inl_recv_size;
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-#endif
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
-#ifdef RSS_SUPPORT
-
 /**
  * Allocate a RSS Queue Pair.
  * Optionally setup inline receive if supported.
@@ -3474,10 +3460,8 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-#ifdef INLINE_RECV
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-#endif
 	if (children_n > 0) {
 		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
 		/* TSS isn't necessary. */
@@ -3493,8 +3477,6 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
-#endif /* RSS_SUPPORT */
-
 /**
  * Reconfigure a RX queue with new parameters.
  *
@@ -3728,13 +3710,11 @@ rxq_create_qp(struct rxq *rxq,
 	int parent = (children_n > 0);
 	struct priv *priv = rxq->priv;
 
-#ifdef RSS_SUPPORT
 	if (priv->rss && !inactive && (rxq_parent || parent))
 		rxq->qp = rxq_setup_qp_rss(priv, rxq->cq, desc,
 					   children_n, rxq->rd,
 					   rxq_parent);
 	else
-#endif /* RSS_SUPPORT */
 		rxq->qp = rxq_setup_qp(priv, rxq->cq, desc, rxq->rd);
 	if (rxq->qp == NULL) {
 		ret = (errno ? errno : EINVAL);
@@ -3750,9 +3730,7 @@ rxq_create_qp(struct rxq *rxq,
 	};
 	ret = ibv_exp_modify_qp(rxq->qp, &mod,
 				(IBV_EXP_QP_STATE |
-#ifdef RSS_SUPPORT
 				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-#endif /* RSS_SUPPORT */
 				 IBV_EXP_QP_PORT));
 	if (ret) {
 		ERROR("QP state to IBV_QPS_INIT failed: %s",
@@ -6115,20 +6093,14 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ibv_pd *pd = NULL;
 		struct priv *priv = NULL;
 		struct rte_eth_dev *eth_dev = NULL;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		struct ibv_exp_device_attr exp_device_attr;
-#endif /* HAVE_EXP_QUERY_DEVICE */
 		struct ether_addr mac;
 
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-#ifdef RSS_SUPPORT
 		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
-#endif /* RSS_SUPPORT */
-#endif /* HAVE_EXP_QUERY_DEVICE */
 
 		DEBUG("using port %u", port);
 
@@ -6181,13 +6153,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-#ifdef HAVE_EXP_QUERY_DEVICE
 		if (ibv_exp_query_device(ctx, &exp_device_attr)) {
 			ERROR("ibv_exp_query_device() failed");
 			err = ENODEV;
 			goto port_error;
 		}
-#ifdef RSS_SUPPORT
 		if ((exp_device_attr.exp_device_cap_flags &
 		     IBV_EXP_DEVICE_QPG) &&
 		    (exp_device_attr.exp_device_cap_flags &
@@ -6212,7 +6182,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (priv->hw_rss)
 			DEBUG("maximum RSS indirection table size: %u",
 			      exp_device_attr.max_rss_tbl_sz);
-#endif /* RSS_SUPPORT */
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
@@ -6227,7 +6196,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		DEBUG("L2 tunnel checksum offloads are %ssupported",
 		      (priv->hw_csum_l2tun ? "" : "not "));
 
-#ifdef INLINE_RECV
 		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
 
 		if (priv->inl_recv_size) {
@@ -6251,10 +6219,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			INFO("Set inline receive size to %u",
 			     priv->inl_recv_size);
 		}
-#endif /* INLINE_RECV */
-#endif /* HAVE_EXP_QUERY_DEVICE */
 
-		(void)mlx4_getenv_int;
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 109cd1b..a12d1fa 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -344,9 +344,7 @@ struct priv {
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-#ifdef INLINE_RECV
 	unsigned int inl_recv_size; /* Inline recv size */
-#endif
 	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index ab37e7d..f5c015e 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -37,9 +37,6 @@
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
 
-/* Generated configuration header. */
-#include "mlx4_autoconf.h"
-
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
-- 
2.1.4


* [PATCH v2 05/51] net/mlx4: remove secondary process support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (3 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 04/51] net/mlx4: remove useless compilation checks Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 06/51] net/mlx4: remove useless code Adrien Mazarguil
                     ` (47 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The current implementation is partial (Tx only), inconvenient to use and
not of primary concern.

Remove this feature before refactoring the PMD.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   2 -
 drivers/net/mlx4/mlx4.c           | 349 +--------------------------------
 3 files changed, 8 insertions(+), 344 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 1d5f266..f6efd21 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -27,7 +27,6 @@ Inner L4 checksum    = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
-Multiprocess aware   = Y
 Other kdrv           = Y
 Power8               = Y
 x86-32               = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 388aaf3..bba1034 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -88,7 +88,6 @@ Features
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
-- Secondary process TX is supported.
 - RX interrupts.
 
 Limitations
@@ -99,7 +98,6 @@ Limitations
 - RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
   dissociated.
 - Hardware counters are not implemented (they are software counters).
-- Secondary process RX is not supported.
 
 Configuration
 -------------
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cef024a..6f8d328 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -110,14 +110,6 @@ typedef union {
 	 (((val) & (from)) / ((from) / (to))) : \
 	 (((val) & (from)) * ((to) / (from))))
 
-/* Local storage for secondary process data. */
-struct mlx4_secondary_data {
-	struct rte_eth_dev_data data; /* Local device data. */
-	struct priv *primary_priv; /* Private structure from primary. */
-	struct rte_eth_dev_data *shared_dev_data; /* Shared device data. */
-	rte_spinlock_t lock; /* Port configuration lock. */
-} mlx4_secondary_data[RTE_MAX_ETHPORTS];
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -145,38 +137,6 @@ static void
 priv_rx_intr_vec_disable(struct priv *priv);
 
 /**
- * Check if running as a secondary process.
- *
- * @return
- *   Nonzero if running as a secondary process.
- */
-static inline int
-mlx4_is_secondary(void)
-{
-	return rte_eal_process_type() != RTE_PROC_PRIMARY;
-}
-
-/**
- * Return private structure associated with an Ethernet device.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   Pointer to private structure.
- */
-static struct priv *
-mlx4_get_priv(struct rte_eth_dev *dev)
-{
-	struct mlx4_secondary_data *sd;
-
-	if (!mlx4_is_secondary())
-		return dev->data->dev_private;
-	sd = &mlx4_secondary_data[dev->data->port_id];
-	return sd->data.dev_private;
-}
-
-/**
  * Lock private structure to protect it from concurrent access in the
  * control path.
  *
@@ -734,8 +694,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	ret = dev_configure(dev);
 	assert(ret >= 0);
@@ -746,157 +704,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
 static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
 
-/**
- * Configure secondary process queues from a private data pointer (primary
- * or secondary) and update burst callbacks. Can take place only once.
- *
- * All queues must have been previously created by the primary process to
- * avoid undefined behavior.
- *
- * @param priv
- *   Private data pointer from either primary or secondary process.
- *
- * @return
- *   Private data pointer from secondary process, NULL in case of error.
- */
-static struct priv *
-mlx4_secondary_data_setup(struct priv *priv)
-{
-	unsigned int port_id = 0;
-	struct mlx4_secondary_data *sd;
-	void **tx_queues;
-	void **rx_queues;
-	unsigned int nb_tx_queues;
-	unsigned int nb_rx_queues;
-	unsigned int i;
-
-	/* priv must be valid at this point. */
-	assert(priv != NULL);
-	/* priv->dev must also be valid but may point to local memory from
-	 * another process, possibly with the same address and must not
-	 * be dereferenced yet. */
-	assert(priv->dev != NULL);
-	/* Determine port ID by finding out where priv comes from. */
-	while (1) {
-		sd = &mlx4_secondary_data[port_id];
-		rte_spinlock_lock(&sd->lock);
-		/* Primary process? */
-		if (sd->primary_priv == priv)
-			break;
-		/* Secondary process? */
-		if (sd->data.dev_private == priv)
-			break;
-		rte_spinlock_unlock(&sd->lock);
-		if (++port_id == RTE_DIM(mlx4_secondary_data))
-			port_id = 0;
-	}
-	/* Switch to secondary private structure. If private data has already
-	 * been updated by another thread, there is nothing else to do. */
-	priv = sd->data.dev_private;
-	if (priv->dev->data == &sd->data)
-		goto end;
-	/* Sanity checks. Secondary private structure is supposed to point
-	 * to local eth_dev, itself still pointing to the shared device data
-	 * structure allocated by the primary process. */
-	assert(sd->shared_dev_data != &sd->data);
-	assert(sd->data.nb_tx_queues == 0);
-	assert(sd->data.tx_queues == NULL);
-	assert(sd->data.nb_rx_queues == 0);
-	assert(sd->data.rx_queues == NULL);
-	assert(priv != sd->primary_priv);
-	assert(priv->dev->data == sd->shared_dev_data);
-	assert(priv->txqs_n == 0);
-	assert(priv->txqs == NULL);
-	assert(priv->rxqs_n == 0);
-	assert(priv->rxqs == NULL);
-	nb_tx_queues = sd->shared_dev_data->nb_tx_queues;
-	nb_rx_queues = sd->shared_dev_data->nb_rx_queues;
-	/* Allocate local storage for queues. */
-	tx_queues = rte_zmalloc("secondary ethdev->tx_queues",
-				sizeof(sd->data.tx_queues[0]) * nb_tx_queues,
-				RTE_CACHE_LINE_SIZE);
-	rx_queues = rte_zmalloc("secondary ethdev->rx_queues",
-				sizeof(sd->data.rx_queues[0]) * nb_rx_queues,
-				RTE_CACHE_LINE_SIZE);
-	if (tx_queues == NULL || rx_queues == NULL)
-		goto error;
-	/* Lock to prevent control operations during setup. */
-	priv_lock(priv);
-	/* TX queues. */
-	for (i = 0; i != nb_tx_queues; ++i) {
-		struct txq *primary_txq = (*sd->primary_priv->txqs)[i];
-		struct txq *txq;
-
-		if (primary_txq == NULL)
-			continue;
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0,
-					primary_txq->socket);
-		if (txq != NULL) {
-			if (txq_setup(priv->dev,
-				      txq,
-				      primary_txq->elts_n * MLX4_PMD_SGE_WR_N,
-				      primary_txq->socket,
-				      NULL) == 0) {
-				txq->stats.idx = primary_txq->stats.idx;
-				tx_queues[i] = txq;
-				continue;
-			}
-			rte_free(txq);
-		}
-		while (i) {
-			txq = tx_queues[--i];
-			txq_cleanup(txq);
-			rte_free(txq);
-		}
-		goto error;
-	}
-	/* RX queues. */
-	for (i = 0; i != nb_rx_queues; ++i) {
-		struct rxq *primary_rxq = (*sd->primary_priv->rxqs)[i];
-
-		if (primary_rxq == NULL)
-			continue;
-		/* Not supported yet. */
-		rx_queues[i] = NULL;
-	}
-	/* Update everything. */
-	priv->txqs = (void *)tx_queues;
-	priv->txqs_n = nb_tx_queues;
-	priv->rxqs = (void *)rx_queues;
-	priv->rxqs_n = nb_rx_queues;
-	sd->data.rx_queues = rx_queues;
-	sd->data.tx_queues = tx_queues;
-	sd->data.nb_rx_queues = nb_rx_queues;
-	sd->data.nb_tx_queues = nb_tx_queues;
-	sd->data.dev_link = sd->shared_dev_data->dev_link;
-	sd->data.mtu = sd->shared_dev_data->mtu;
-	memcpy(sd->data.rx_queue_state, sd->shared_dev_data->rx_queue_state,
-	       sizeof(sd->data.rx_queue_state));
-	memcpy(sd->data.tx_queue_state, sd->shared_dev_data->tx_queue_state,
-	       sizeof(sd->data.tx_queue_state));
-	sd->data.dev_flags = sd->shared_dev_data->dev_flags;
-	/* Use local data from now on. */
-	rte_mb();
-	priv->dev->data = &sd->data;
-	rte_mb();
-	priv->dev->tx_pkt_burst = mlx4_tx_burst;
-	priv->dev->rx_pkt_burst = removed_rx_burst;
-	priv_unlock(priv);
-end:
-	/* More sanity checks. */
-	assert(priv->dev->tx_pkt_burst == mlx4_tx_burst);
-	assert(priv->dev->rx_pkt_burst == removed_rx_burst);
-	assert(priv->dev->data == &sd->data);
-	rte_spinlock_unlock(&sd->lock);
-	return priv;
-error:
-	priv_unlock(priv);
-	rte_free(tx_queues);
-	rte_free(rx_queues);
-	rte_spinlock_unlock(&sd->lock);
-	return NULL;
-}
-
 /* TX queues handling. */
 
 /**
@@ -1704,46 +1511,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 }
 
 /**
- * DPDK callback for TX in secondary processes.
- *
- * This function configures all queues from primary process information
- * if necessary before reverting to the normal TX burst callback.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-mlx4_tx_burst_secondary_setup(void *dpdk_txq, struct rte_mbuf **pkts,
-			      uint16_t pkts_n)
-{
-	struct txq *txq = dpdk_txq;
-	struct priv *priv = mlx4_secondary_data_setup(txq->priv);
-	struct priv *primary_priv;
-	unsigned int index;
-
-	if (priv == NULL)
-		return 0;
-	primary_priv =
-		mlx4_secondary_data[priv->dev->data->port_id].primary_priv;
-	/* Look for queue index in both private structures. */
-	for (index = 0; index != priv->txqs_n; ++index)
-		if (((*primary_priv->txqs)[index] == txq) ||
-		    ((*priv->txqs)[index] == txq))
-			break;
-	if (index == priv->txqs_n)
-		return 0;
-	txq = (*priv->txqs)[index];
-	return priv->dev->tx_pkt_burst(txq, pkts, pkts_n);
-}
-
-/**
  * Configure a TX queue.
  *
  * @param dev
@@ -1764,7 +1531,7 @@ static int
 txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	  unsigned int socket, const struct rte_eth_txconf *conf)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	struct txq tmpl = {
 		.priv = priv,
 		.socket = socket
@@ -1960,8 +1727,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct txq *txq = (*priv->txqs)[idx];
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
@@ -2017,8 +1782,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 	struct priv *priv;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	if (txq == NULL)
 		return;
 	priv = txq->priv;
@@ -3328,46 +3091,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 }
 
 /**
- * DPDK callback for RX in secondary processes.
- *
- * This function configures all queues from primary process information
- * if necessary before reverting to the normal RX burst callback.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst_secondary_setup(void *dpdk_rxq, struct rte_mbuf **pkts,
-			      uint16_t pkts_n)
-{
-	struct rxq *rxq = dpdk_rxq;
-	struct priv *priv = mlx4_secondary_data_setup(rxq->priv);
-	struct priv *primary_priv;
-	unsigned int index;
-
-	if (priv == NULL)
-		return 0;
-	primary_priv =
-		mlx4_secondary_data[priv->dev->data->port_id].primary_priv;
-	/* Look for queue index in both private structures. */
-	for (index = 0; index != priv->rxqs_n; ++index)
-		if (((*primary_priv->rxqs)[index] == rxq) ||
-		    ((*priv->rxqs)[index] == rxq))
-			break;
-	if (index == priv->rxqs_n)
-		return 0;
-	rxq = (*priv->rxqs)[index];
-	return priv->dev->rx_pkt_burst(rxq, pkts, pkts_n);
-}
-
-/**
  * Allocate a Queue Pair.
  * Optionally setup inline receive if supported.
  *
@@ -3998,8 +3721,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	int inactive = 0;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
@@ -4067,8 +3788,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	struct priv *priv;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	if (rxq == NULL)
 		return;
 	priv = rxq->priv;
@@ -4114,8 +3833,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	struct rxq *rxq;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	if (priv->started) {
 		priv_unlock(priv);
@@ -4206,8 +3923,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	unsigned int r;
 	struct rxq *rxq;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->started) {
 		priv_unlock(priv);
@@ -4309,7 +4024,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
 static void
 mlx4_dev_close(struct rte_eth_dev *dev)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	void *tmp;
 	unsigned int i;
 
@@ -4462,7 +4177,7 @@ mlx4_set_link_up(struct rte_eth_dev *dev)
 static void
 mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	unsigned int max;
 	char ifname[IF_NAMESIZE];
 
@@ -4539,7 +4254,7 @@ mlx4_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 static void
 mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	struct rte_eth_stats tmp = {0};
 	unsigned int i;
 	unsigned int idx;
@@ -4604,7 +4319,7 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 static void
 mlx4_stats_reset(struct rte_eth_dev *dev)
 {
-	struct priv *priv = mlx4_get_priv(dev);
+	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 	unsigned int idx;
 
@@ -4644,8 +4359,6 @@ mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
 {
 	struct priv *priv = dev->data->dev_private;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated)
 		goto end;
@@ -4678,8 +4391,6 @@ mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
 	struct priv *priv = dev->data->dev_private;
 	int re;
 
-	if (mlx4_is_secondary())
-		return -ENOTSUP;
 	(void)vmdq;
 	priv_lock(priv);
 	if (priv->isolated) {
@@ -4732,8 +4443,6 @@ mlx4_promiscuous_enable(struct rte_eth_dev *dev)
 	unsigned int i;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot enable promiscuous, "
@@ -4786,8 +4495,6 @@ mlx4_promiscuous_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->promisc || priv->isolated) {
 		priv_unlock(priv);
@@ -4818,8 +4525,6 @@ mlx4_allmulticast_enable(struct rte_eth_dev *dev)
 	unsigned int i;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot enable allmulticast, "
@@ -4872,8 +4577,6 @@ mlx4_allmulticast_disable(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
 
-	if (mlx4_is_secondary())
-		return;
 	priv_lock(priv);
 	if (!priv->allmulti || priv->isolated) {
 		priv_unlock(priv);
@@ -4902,7 +4605,7 @@ mlx4_allmulticast_disable(struct rte_eth_dev *dev)
 static int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 {
-	const struct priv *priv = mlx4_get_priv(dev);
+	const struct priv *priv = dev->data->dev_private;
 	struct ethtool_cmd edata = {
 		.cmd = ETHTOOL_GSET
 	};
@@ -4976,8 +4679,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
 		mlx4_rx_burst;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
@@ -5059,8 +4760,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	};
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	ifr.ifr_data = (void *)&ethpause;
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
@@ -5109,8 +4808,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	};
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	ifr.ifr_data = (void *)&ethpause;
 	ethpause.autoneg = fc_conf->autoneg;
 	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
@@ -5250,8 +4947,6 @@ mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	if (mlx4_is_secondary())
-		return -E_RTE_SECONDARY;
 	priv_lock(priv);
 	if (priv->isolated) {
 		DEBUG("%p: cannot set vlan filter, "
@@ -6269,36 +5964,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		/* Secondary processes have to use local storage for their
-		 * private data as well as a copy of eth_dev->data, but this
-		 * pointer must not be modified before burst functions are
-		 * actually called. */
-		if (mlx4_is_secondary()) {
-			struct mlx4_secondary_data *sd =
-				&mlx4_secondary_data[eth_dev->data->port_id];
-
-			sd->primary_priv = eth_dev->data->dev_private;
-			if (sd->primary_priv == NULL) {
-				ERROR("no private data for port %u",
-				      eth_dev->data->port_id);
-				err = EINVAL;
-				goto port_error;
-			}
-			sd->shared_dev_data = eth_dev->data;
-			rte_spinlock_init(&sd->lock);
-			memcpy(sd->data.name, sd->shared_dev_data->name,
-			       sizeof(sd->data.name));
-			sd->data.dev_private = priv;
-			sd->data.rx_mbuf_alloc_failed = 0;
-			sd->data.mtu = ETHER_MTU;
-			sd->data.port_id = sd->shared_dev_data->port_id;
-			sd->data.mac_addrs = priv->mac;
-			eth_dev->tx_pkt_burst = mlx4_tx_burst_secondary_setup;
-			eth_dev->rx_pkt_burst = mlx4_rx_burst_secondary_setup;
-		} else {
-			eth_dev->data->dev_private = priv;
-			eth_dev->data->mac_addrs = priv->mac;
-		}
+		eth_dev->data->dev_private = priv;
+		eth_dev->data->mac_addrs = priv->mac;
 		eth_dev->device = &pci_dev->device;
 
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
-- 
2.1.4

* [PATCH v2 06/51] net/mlx4: remove useless code
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (4 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 05/51] net/mlx4: remove secondary process support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 07/51] net/mlx4: remove soft counters compilation option Adrien Mazarguil
                     ` (46 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Less code makes refactoring easier. No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
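Note on the ops table cleanup: dropping the explicit NULL entries from
mlx4_dev_ops should not change behavior, since C zero-initializes any
member omitted from a designated initializer (a null pointer for pointer
members). A minimal sketch of that rule follows; struct demo_dev_ops and
all demo_* names are hypothetical stand-ins, not the real struct
eth_dev_ops.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct eth_dev_ops. */
struct demo_dev_ops {
	int (*link_update)(int wait_to_complete);
	int (*dev_led_on)(void);
	int (*dev_led_off)(void);
};

static int
demo_link_update(int wait_to_complete)
{
	(void)wait_to_complete;
	return 0;
}

/* Explicit form, in the style used before this patch. */
static const struct demo_dev_ops demo_ops_explicit = {
	.link_update = demo_link_update,
	.dev_led_on = NULL,
	.dev_led_off = NULL,
};

/* Trimmed form: omitted members are implicitly initialized to NULL. */
static const struct demo_dev_ops demo_ops_trimmed = {
	.link_update = demo_link_update,
};

/* Return nonzero when both tables expose the same callbacks. */
static int
demo_ops_equal(const struct demo_dev_ops *a, const struct demo_dev_ops *b)
{
	return a->link_update == b->link_update &&
	       a->dev_led_on == b->dev_led_on &&
	       a->dev_led_off == b->dev_led_off;
}
```

demo_ops_equal(&demo_ops_explicit, &demo_ops_trimmed) returns nonzero, so
code that tests a callback against NULL behaves the same with either
table.
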
 drivers/net/mlx4/mlx4.c | 17 +----------------
 drivers/net/mlx4/mlx4.h | 12 ------------
 2 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 6f8d328..cb07a53 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -69,13 +69,13 @@
 #include <rte_malloc.h>
 #include <rte_spinlock.h>
 #include <rte_atomic.h>
-#include <rte_version.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
+#include <rte_branch_prediction.h>
 
 /* Generated configuration header. */
 #include "mlx4_autoconf.h"
@@ -4649,10 +4649,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	return -1;
 }
 
-static int
-mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
-			    struct rte_pci_addr *pci_addr);
-
 /**
  * DPDK callback to change the MTU.
  *
@@ -4998,10 +4994,6 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 			return -EINVAL;
 		*(const void **)arg = &mlx4_flow_ops;
 		return 0;
-	case RTE_ETH_FILTER_FDIR:
-		DEBUG("%p: filter type FDIR is not supported by this PMD",
-		      (void *)dev);
-		break;
 	default:
 		ERROR("%p: filter type (%d) not supported",
 		      (void *)dev, filter_type);
@@ -5024,22 +5016,15 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
-	.queue_stats_mapping_set = NULL,
 	.dev_infos_get = mlx4_dev_infos_get,
 	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx4_vlan_filter_set,
-	.vlan_tpid_set = NULL,
-	.vlan_strip_queue_set = NULL,
-	.vlan_offload_set = NULL,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
 	.tx_queue_release = mlx4_tx_queue_release,
-	.dev_led_on = NULL,
-	.dev_led_off = NULL,
 	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
 	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.priority_flow_ctrl_set = NULL,
 	.mac_addr_remove = mlx4_mac_addr_remove,
 	.mac_addr_add = mlx4_mac_addr_add,
 	.mac_addr_set = mlx4_mac_addr_set,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index a12d1fa..702f9c9 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -34,7 +34,6 @@
 #ifndef RTE_PMD_MLX4_H_
 #define RTE_PMD_MLX4_H_
 
-#include <stddef.h>
 #include <stdint.h>
 #include <limits.h>
 
@@ -150,17 +149,6 @@ enum {
 /* Number of elements in array. */
 #define elemof(a) (sizeof(a) / sizeof((a)[0]))
 
-/* Cast pointer p to structure member m to its parent structure of type t. */
-#define containerof(p, t, m) ((t *)((uint8_t *)(p) - offsetof(t, m)))
-
-/* Branch prediction helpers. */
-#ifndef likely
-#define likely(c) __builtin_expect(!!(c), 1)
-#endif
-#ifndef unlikely
-#define unlikely(c) __builtin_expect(!!(c), 0)
-#endif
-
 /* Debugging */
 #ifndef NDEBUG
 #include <stdio.h>
-- 
2.1.4

* [PATCH v2 07/51] net/mlx4: remove soft counters compilation option
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (5 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 06/51] net/mlx4: remove useless code Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 08/51] net/mlx4: remove scatter mode " Adrien Mazarguil
                     ` (45 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Software counters are mandatory since hardware counters are not
implemented.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
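A rough sketch of the software counter scheme this patch makes
unconditional: the data path bumps per-queue counters, and the stats
callback sums them, filling a per-queue slot only when the queue's
mapping index fits. All demo_* names and DEMO_QUEUE_STAT_CNTRS are
hypothetical simplifications of mlx4_rxq_stats, rte_eth_stats and
RTE_ETHDEV_QUEUE_STAT_CNTRS; locking and Tx counters are left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for RTE_ETHDEV_QUEUE_STAT_CNTRS. */
#define DEMO_QUEUE_STAT_CNTRS 4

/* Simplified per-queue Rx counters, modeled on struct mlx4_rxq_stats. */
struct demo_rxq_stats {
	unsigned int idx;  /* Mapping index. */
	uint64_t ipackets; /* Successfully received packets. */
	uint64_t ibytes;   /* Successfully received bytes. */
	uint64_t idropped; /* Dropped packets. */
};

/* Simplified device-level statistics, modeled on struct rte_eth_stats. */
struct demo_stats {
	uint64_t ipackets;
	uint64_t ibytes;
	uint64_t ierrors;
	uint64_t q_ipackets[DEMO_QUEUE_STAT_CNTRS];
	uint64_t q_ibytes[DEMO_QUEUE_STAT_CNTRS];
};

/* Data path: account for one received packet of len bytes. */
static void
demo_rx_account(struct demo_rxq_stats *stats, uint32_t len)
{
	++stats->ipackets;
	stats->ibytes += len;
}

/* Control path: aggregate the counters of n queues into device stats. */
static struct demo_stats
demo_stats_get(const struct demo_rxq_stats *rxqs, unsigned int n)
{
	struct demo_stats tmp = {0};
	unsigned int i;

	for (i = 0; i != n; ++i) {
		unsigned int idx = rxqs[i].idx;

		/* Per-queue slots exist only for small mapping indices. */
		if (idx < DEMO_QUEUE_STAT_CNTRS) {
			tmp.q_ipackets[idx] += rxqs[i].ipackets;
			tmp.q_ibytes[idx] += rxqs[i].ibytes;
		}
		/* Totals always include every queue. */
		tmp.ipackets += rxqs[i].ipackets;
		tmp.ibytes += rxqs[i].ibytes;
		tmp.ierrors += rxqs[i].idropped;
	}
	return tmp;
}
```

Since the hardware provides none of these values, compiling the counters
out would leave the totals at zero, which is presumably why the option
is dropped rather than kept as a tunable.
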
 config/common_base        |  1 -
 doc/guides/nics/mlx4.rst  |  6 ------
 drivers/net/mlx4/Makefile |  4 ----
 drivers/net/mlx4/mlx4.c   | 37 -------------------------------------
 drivers/net/mlx4/mlx4.h   | 12 ------------
 5 files changed, 60 deletions(-)

diff --git a/config/common_base b/config/common_base
index 5e97a08..c7ffb6b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -217,7 +217,6 @@ CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS=n
 CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
-CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
 
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index bba1034..eba81ba 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -97,7 +97,6 @@ Limitations
 - RSS RETA cannot be configured
 - RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
   dissociated.
-- Hardware counters are not implemented (they are software counters).
 
 Configuration
 -------------
@@ -145,11 +144,6 @@ These options can be modified in the ``.config`` file.
 
   This value is always 1 for RX queues since they use a single MP.
 
-- ``CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS`` (default **1**)
-
-  Toggle software counters. No counters are available if this option is
-  disabled since hardware counters are not supported.
-
 Environment variables
 ~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 07a66c4..147e541 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -80,10 +80,6 @@ ifdef CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE
 CFLAGS += -DMLX4_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE)
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS
-CFLAGS += -DMLX4_PMD_SOFT_COUNTERS=$(CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS)
-endif
-
 ifeq ($(CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS),y)
 CFLAGS += -DMLX4_PMD_DEBUG_BROKEN_VERBS
 endif
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cb07a53..2e8de92 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -34,7 +34,6 @@
 /*
  * Known limitations:
  * - RSS hash key and options cannot be modified.
- * - Hardware counters aren't implemented.
  */
 
 /* System headers. */
@@ -1372,9 +1371,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
 		unsigned int segs = NB_SEGS(buf);
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		unsigned int sent_size = 0;
-#endif
 		uint32_t send_flags = 0;
 
 		/* Clean up old buffer. */
@@ -1452,9 +1449,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					 send_flags);
 			if (unlikely(err))
 				goto stop;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			sent_size += length;
-#endif
 		} else {
 #if MLX4_PMD_SGE_WR_N > 1
 			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
@@ -1473,9 +1468,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				 send_flags);
 			if (unlikely(err))
 				goto stop;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			sent_size += ret.length;
-#endif
 #else /* MLX4_PMD_SGE_WR_N > 1 */
 			DEBUG("%p: TX scattered buffers support not"
 			      " compiled in", (void *)txq);
@@ -1483,19 +1476,15 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif /* MLX4_PMD_SGE_WR_N > 1 */
 		}
 		elts_head = elts_head_next;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
-#endif
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
 	if (unlikely(i == 0))
 		return 0;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
-#endif
 	/* Ring QP doorbell. */
 	err = txq->if_qp->send_flush(txq->qp);
 	if (unlikely(err)) {
@@ -2786,10 +2775,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				      " completion status (%d): %s",
 				      (void *)rxq, wc.wr_id, wc.status,
 				      ibv_wc_status_str(wc.status));
-#ifdef MLX4_PMD_SOFT_COUNTERS
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
-#endif
 				/* Link completed WRs together for repost. */
 				*next = wr;
 				next = &wr->next;
@@ -2901,10 +2888,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Return packet. */
 		*(pkts++) = pkt_buf;
 		++pkts_ret;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increase bytes counter. */
 		rxq->stats.ibytes += pkt_buf_len;
-#endif
 repost:
 		if (++elts_head >= elts_n)
 			elts_head = 0;
@@ -2924,10 +2909,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		abort();
 	}
 	rxq->elts_head = elts_head;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increase packets counter. */
 	rxq->stats.ipackets += pkts_ret;
-#endif
 	return pkts_ret;
 }
 
@@ -3008,10 +2991,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				      " completion status (%d): %s",
 				      (void *)rxq, wc.wr_id, wc.status,
 				      ibv_wc_status_str(wc.status));
-#ifdef MLX4_PMD_SOFT_COUNTERS
 				/* Increment dropped packets counter. */
 				++rxq->stats.idropped;
-#endif
 				/* Add SGE to array for repost. */
 				sges[i] = elt->sge;
 				goto repost;
@@ -3062,10 +3043,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Return packet. */
 		*(pkts++) = seg;
 		++pkts_ret;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		/* Increase bytes counter. */
 		rxq->stats.ibytes += len;
-#endif
 repost:
 		if (++elts_head >= elts_n)
 			elts_head = 0;
@@ -3083,10 +3062,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		abort();
 	}
 	rxq->elts_head = elts_head;
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	/* Increase packets counter. */
 	rxq->stats.ipackets += pkts_ret;
-#endif
 	return pkts_ret;
 }
 
@@ -4270,17 +4247,13 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 			continue;
 		idx = rxq->stats.idx;
 		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			tmp.q_ipackets[idx] += rxq->stats.ipackets;
 			tmp.q_ibytes[idx] += rxq->stats.ibytes;
-#endif
 			tmp.q_errors[idx] += (rxq->stats.idropped +
 					      rxq->stats.rx_nombuf);
 		}
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		tmp.ipackets += rxq->stats.ipackets;
 		tmp.ibytes += rxq->stats.ibytes;
-#endif
 		tmp.ierrors += rxq->stats.idropped;
 		tmp.rx_nombuf += rxq->stats.rx_nombuf;
 	}
@@ -4291,21 +4264,14 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 			continue;
 		idx = txq->stats.idx;
 		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-#ifdef MLX4_PMD_SOFT_COUNTERS
 			tmp.q_opackets[idx] += txq->stats.opackets;
 			tmp.q_obytes[idx] += txq->stats.obytes;
-#endif
 			tmp.q_errors[idx] += txq->stats.odropped;
 		}
-#ifdef MLX4_PMD_SOFT_COUNTERS
 		tmp.opackets += txq->stats.opackets;
 		tmp.obytes += txq->stats.obytes;
-#endif
 		tmp.oerrors += txq->stats.odropped;
 	}
-#ifndef MLX4_PMD_SOFT_COUNTERS
-	/* FIXME: retrieve and add hardware counters. */
-#endif
 	*stats = tmp;
 	priv_unlock(priv);
 }
@@ -4340,9 +4306,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 		(*priv->txqs)[i]->stats =
 			(struct mlx4_txq_stats){ .idx = idx };
 	}
-#ifndef MLX4_PMD_SOFT_COUNTERS
-	/* FIXME: reset hardware counters. */
-#endif
 	priv_unlock(priv);
 }
 
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 702f9c9..97e042e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -101,14 +101,6 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/*
- * If defined, only use software counters. The PMD will never ask the hardware
- * for these, and many of them won't be available.
- */
-#ifndef MLX4_PMD_SOFT_COUNTERS
-#define MLX4_PMD_SOFT_COUNTERS 1
-#endif
-
 /* Alarm timeout. */
 #define MLX4_ALARM_TIMEOUT_US 100000
 
@@ -186,10 +178,8 @@ enum {
 
 struct mlx4_rxq_stats {
 	unsigned int idx; /**< Mapping index. */
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	uint64_t ipackets; /**< Total of successfully received packets. */
 	uint64_t ibytes; /**< Total of successfully received bytes. */
-#endif
 	uint64_t idropped; /**< Total of packets dropped when RX ring full. */
 	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
 };
@@ -252,10 +242,8 @@ struct txq_elt {
 
 struct mlx4_txq_stats {
 	unsigned int idx; /**< Mapping index. */
-#ifdef MLX4_PMD_SOFT_COUNTERS
 	uint64_t opackets; /**< Total of successfully sent packets. */
 	uint64_t obytes;   /**< Total of successfully sent bytes. */
-#endif
 	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
 };
 
-- 
2.1.4

* [PATCH v2 08/51] net/mlx4: remove scatter mode compilation option
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (6 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 07/51] net/mlx4: remove soft counters compilation option Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 09/51] net/mlx4: remove Tx inline " Adrien Mazarguil
                     ` (44 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This option sets both the maximum number of segments for Rx/Tx packets and
whether scattered mode is supported at all. This commit removes the latter,
along with its exposure in the configuration file, since the most
appropriate value should be decided at run-time.
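
As an illustration only (not part of this patch), a run-time choice could
clamp the compile-time maximum to what the device reports, much like the
max_sge comparison already done in txq_setup(). The names SGE_WR_N_MAX,
sge_wr_n_select() and scatter_supported() are hypothetical and exist only
for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time upper bound, mirroring MLX4_PMD_SGE_WR_N in mlx4.h. */
#define SGE_WR_N_MAX 4u

/*
 * Hypothetical helper: pick the number of SGEs per work request at
 * run-time, bounded by both the device limit and the PMD maximum.
 */
static inline uint32_t
sge_wr_n_select(uint32_t device_max_sge)
{
	uint32_t n = (device_max_sge < SGE_WR_N_MAX) ?
		     device_max_sge : SGE_WR_N_MAX;

	assert(n <= SGE_WR_N_MAX);
	return n;
}

/* Hypothetical helper: scattered mode needs more than one SGE. */
static inline int
scatter_supported(uint32_t sge_wr_n)
{
	return sge_wr_n > 1;
}
```

With this approach a device reporting fewer SGEs than the PMD maximum
simply gets a smaller value, and scattered mode becomes a per-device
property instead of a build-time one.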

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 config/common_base        |  1 -
 doc/guides/nics/mlx4.rst  |  7 -------
 drivers/net/mlx4/Makefile |  4 ----
 drivers/net/mlx4/mlx4.c   | 10 ----------
 drivers/net/mlx4/mlx4.h   |  2 --
 5 files changed, 24 deletions(-)

diff --git a/config/common_base b/config/common_base
index c7ffb6b..f966fd1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -214,7 +214,6 @@ CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS=n
-CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index eba81ba..8c656d3 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -124,13 +124,6 @@ These options can be modified in the ``.config`` file.
   to abort with harmless debugging messages as a workaround.
   Relevant only when CONFIG_RTE_LIBRTE_MLX4_DEBUG is enabled.
 
-- ``CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N`` (default **4**)
-
-  Number of scatter/gather elements (SGEs) per work request (WR). Lowering
-  this number improves performance but also limits the ability to receive
-  scattered packets (packets that do not fit a single mbuf). The default
-  value is a safe tradeoff.
-
 - ``CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE`` (default **0**)
 
   Amount of data to be inlined during TX operations. Improves latency but
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 147e541..f9257fc 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -68,10 +68,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N
-CFLAGS += -DMLX4_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE
 CFLAGS += -DMLX4_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE)
 endif
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 2e8de92..0bbcb7b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1176,8 +1176,6 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 	txq_mp2mr(txq, mp);
 }
 
-#if MLX4_PMD_SGE_WR_N > 1
-
 /**
  * Copy scattered mbuf contents to a single linear buffer.
  *
@@ -1324,8 +1322,6 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
 	};
 }
 
-#endif /* MLX4_PMD_SGE_WR_N > 1 */
-
 /**
  * DPDK callback for TX.
  *
@@ -1451,7 +1447,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				goto stop;
 			sent_size += length;
 		} else {
-#if MLX4_PMD_SGE_WR_N > 1
 			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
 			struct tx_burst_sg_ret ret;
 
@@ -1469,11 +1464,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			if (unlikely(err))
 				goto stop;
 			sent_size += ret.length;
-#else /* MLX4_PMD_SGE_WR_N > 1 */
-			DEBUG("%p: TX scattered buffers support not"
-			      " compiled in", (void *)txq);
-			goto stop;
-#endif /* MLX4_PMD_SGE_WR_N > 1 */
 		}
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 97e042e..5c2005d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -81,9 +81,7 @@
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
 /* Maximum number of Scatter/Gather Elements per Work Request. */
-#ifndef MLX4_PMD_SGE_WR_N
 #define MLX4_PMD_SGE_WR_N 4
-#endif
 
 /* Maximum size for inline data. */
 #ifndef MLX4_PMD_MAX_INLINE
-- 
2.1.4


* [PATCH v2 09/51] net/mlx4: remove Tx inline compilation option
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (7 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 08/51] net/mlx4: remove scatter mode " Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 10/51] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
                     ` (43 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The maximum Tx inline size should be a run-time parameter rather than a
compilation option.
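
As a rough sketch of what the run-time check amounts to once the
preprocessor guards are gone: the send path is selected per packet by
comparing its length with the limit stored in the queue. The reduced
struct txq_sketch, the send_path enum and send_path_select() are
hypothetical names used for illustration only, not the PMD's actual
definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced Tx queue state; the real struct txq has more fields. */
struct txq_sketch {
	uint32_t max_inline; /* Max inline send size, set at queue creation. */
};

enum send_path { SEND_POINTER, SEND_INLINE };

/* Choose the send path per packet from the queue's run-time limit. */
static inline enum send_path
send_path_select(const struct txq_sketch *txq, uint32_t length)
{
	assert(txq != 0);
	return (length <= txq->max_inline) ? SEND_INLINE : SEND_POINTER;
}
```

With max_inline left at 0, as in the default kept by this patch, any
non-empty packet takes the regular (pointer-based) path.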

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 config/common_base        | 1 -
 drivers/net/mlx4/Makefile | 4 ----
 drivers/net/mlx4/mlx4.c   | 6 ------
 drivers/net/mlx4/mlx4.h   | 4 ----
 4 files changed, 15 deletions(-)

diff --git a/config/common_base b/config/common_base
index f966fd1..edc563a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -214,7 +214,6 @@ CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS=n
-CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 
 #
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index f9257fc..bd713e2 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -68,10 +68,6 @@ else
 CFLAGS += -DNDEBUG -UPEDANTIC
 endif
 
-ifdef CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE
-CFLAGS += -DMLX4_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE)
-endif
-
 ifdef CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE
 CFLAGS += -DMLX4_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE)
 endif
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0bbcb7b..394b87c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1428,7 +1428,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					      (uintptr_t)addr);
 			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
 			/* Put packet into send queue. */
-#if MLX4_PMD_MAX_INLINE > 0
 			if (length <= txq->max_inline)
 				err = txq->if_qp->send_pending_inline
 					(txq->qp,
@@ -1436,7 +1435,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 					 length,
 					 send_flags);
 			else
-#endif
 				err = txq->if_qp->send_pending
 					(txq->qp,
 					 addr,
@@ -1578,9 +1576,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 					  MLX4_PMD_SGE_WR_N) ?
 					 priv->device_attr.max_sge :
 					 MLX4_PMD_SGE_WR_N),
-#if MLX4_PMD_MAX_INLINE > 0
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
-#endif
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
 		/* Do *NOT* enable this, completions events are managed per
@@ -1598,10 +1594,8 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-#if MLX4_PMD_MAX_INLINE > 0
 	/* ibv_create_qp() updates this value. */
 	tmpl.max_inline = attr.init.cap.max_inline_data;
-#endif
 	attr.mod = (struct ibv_exp_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5c2005d..256e644 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -84,9 +84,7 @@
 #define MLX4_PMD_SGE_WR_N 4
 
 /* Maximum size for inline data. */
-#ifndef MLX4_PMD_MAX_INLINE
 #define MLX4_PMD_MAX_INLINE 0
-#endif
 
 /*
  * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
@@ -267,9 +265,7 @@ struct txq {
 	struct ibv_qp *qp; /* Queue Pair. */
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
-#if MLX4_PMD_MAX_INLINE > 0
 	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
-#endif
 	unsigned int elts_n; /* (*elts)[] length. */
 	struct txq_elt (*elts)[]; /* TX elements. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
-- 
2.1.4


* [PATCH v2 10/51] net/mlx4: remove allmulti and promisc support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (8 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 09/51] net/mlx4: remove Tx inline " Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 11/51] net/mlx4: remove VLAN filter support Adrien Mazarguil
                     ` (42 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This is done in preparation for a major refactoring.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   2 -
 doc/guides/nics/mlx4.rst          |   2 -
 drivers/net/mlx4/mlx4.c           | 311 ---------------------------------
 drivers/net/mlx4/mlx4.h           |   4 -
 4 files changed, 319 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index f6efd21..344731f 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,8 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-Promiscuous mode     = Y
-Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 RSS hash             = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 8c656d3..b6aaee2 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -82,8 +82,6 @@ Features
   configured RX queues must be a power of two.
 - VLAN filtering is supported.
 - Link state information is provided.
-- Promiscuous mode is supported.
-- All multicast mode is supported.
 - Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 394b87c..59ffec0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2141,9 +2141,6 @@ rxq_mac_addrs_del(struct rxq *rxq)
 		rxq_mac_addr_del(rxq, i);
 }
 
-static int rxq_promiscuous_enable(struct rxq *);
-static void rxq_promiscuous_disable(struct rxq *);
-
 /**
  * Add single flow steering rule.
  *
@@ -2422,122 +2419,6 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 }
 
 /**
- * Enable allmulti mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_allmulticast_enable(struct rxq *rxq)
-{
-	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_MC_DEFAULT,
-		.num_of_specs = 0,
-		.port = rxq->priv->port,
-		.flags = 0
-	};
-
-	DEBUG("%p: enabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow != NULL)
-		return EBUSY;
-	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
-	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
-	}
-	rxq->allmulti_flow = flow;
-	DEBUG("%p: allmulticast mode enabled", (void *)rxq);
-	return 0;
-}
-
-/**
- * Disable allmulti mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_allmulticast_disable(struct rxq *rxq)
-{
-	DEBUG("%p: disabling allmulticast mode", (void *)rxq);
-	if (rxq->allmulti_flow == NULL)
-		return;
-	claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
-	rxq->allmulti_flow = NULL;
-	DEBUG("%p: allmulticast mode disabled", (void *)rxq);
-}
-
-/**
- * Enable promiscuous mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_promiscuous_enable(struct rxq *rxq)
-{
-	struct ibv_flow *flow;
-	struct ibv_flow_attr attr = {
-		.type = IBV_FLOW_ATTR_ALL_DEFAULT,
-		.num_of_specs = 0,
-		.port = rxq->priv->port,
-		.flags = 0
-	};
-
-	if (rxq->priv->vf)
-		return 0;
-	DEBUG("%p: enabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow != NULL)
-		return EBUSY;
-	errno = 0;
-	flow = ibv_create_flow(rxq->qp, &attr);
-	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
-	}
-	rxq->promisc_flow = flow;
-	DEBUG("%p: promiscuous mode enabled", (void *)rxq);
-	return 0;
-}
-
-/**
- * Disable promiscuous mode in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_promiscuous_disable(struct rxq *rxq)
-{
-	if (rxq->priv->vf)
-		return;
-	DEBUG("%p: disabling promiscuous mode", (void *)rxq);
-	if (rxq->promisc_flow == NULL)
-		return;
-	claim_zero(ibv_destroy_flow(rxq->promisc_flow));
-	rxq->promisc_flow = NULL;
-	DEBUG("%p: promiscuous mode disabled", (void *)rxq);
-}
-
-/**
  * Clean up a RX queue.
  *
  * Destroy objects, free allocated memory and reset the structure for reuse.
@@ -2578,8 +2459,6 @@ rxq_cleanup(struct rxq *rxq)
 						&params));
 	}
 	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_promiscuous_disable(rxq);
-		rxq_allmulticast_disable(rxq);
 		rxq_mac_addrs_del(rxq);
 	}
 	if (rxq->qp != NULL)
@@ -3222,12 +3101,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Remove attached flows if RSS is disabled (no parent queue). */
 	if (!priv->rss && !priv->isolated) {
-		rxq_allmulticast_disable(&tmpl);
-		rxq_promiscuous_disable(&tmpl);
 		rxq_mac_addrs_del(&tmpl);
 		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
 		memcpy(rxq->mac_configured, tmpl.mac_configured,
 		       sizeof(rxq->mac_configured));
 		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
@@ -3268,13 +3143,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	/* Reconfigure flows. Do not care for errors. */
 	if (!priv->rss && !priv->isolated) {
 		rxq_mac_addrs_add(&tmpl);
-		if (priv->promisc)
-			rxq_promiscuous_enable(&tmpl);
-		if (priv->allmulti)
-			rxq_allmulticast_enable(&tmpl);
 		/* Update original queue in case of failure. */
-		rxq->allmulti_flow = tmpl.allmulti_flow;
-		rxq->promisc_flow = tmpl.promisc_flow;
 		memcpy(rxq->mac_configured, tmpl.mac_configured,
 		       sizeof(rxq->mac_configured));
 		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
@@ -3817,10 +3686,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		if (rxq == NULL)
 			continue;
 		ret = rxq_mac_addrs_add(rxq);
-		if (!ret && priv->promisc)
-			ret = rxq_promiscuous_enable(rxq);
-		if (!ret && priv->allmulti)
-			ret = rxq_allmulticast_enable(rxq);
 		if (!ret)
 			continue;
 		WARN("%p: QP flow attachment failed: %s",
@@ -3858,8 +3723,6 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	while (i != 0) {
 		rxq = (*priv->rxqs)[i--];
 		if (rxq != NULL) {
-			rxq_allmulticast_disable(rxq);
-			rxq_promiscuous_disable(rxq);
 			rxq_mac_addrs_del(rxq);
 		}
 	}
@@ -3907,8 +3770,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		rxq_allmulticast_disable(rxq);
-		rxq_promiscuous_disable(rxq);
 		rxq_mac_addrs_del(rxq);
 	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
@@ -4378,170 +4239,6 @@ mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 }
 
 /**
- * DPDK callback to enable promiscuous mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_promiscuous_enable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot enable promiscuous, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->promisc) {
-		priv_unlock(priv);
-		return;
-	}
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_promiscuous_enable(LIST_FIRST(&priv->parents));
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_promiscuous_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
-	priv->promisc = 1;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to disable promiscuous mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_promiscuous_disable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-
-	priv_lock(priv);
-	if (!priv->promisc || priv->isolated) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_promiscuous_disable(LIST_FIRST(&priv->parents));
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_promiscuous_disable((*priv->rxqs)[i]);
-end:
-	priv->promisc = 0;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to enable allmulti mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_allmulticast_enable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot enable allmulticast, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->allmulti) {
-		priv_unlock(priv);
-		return;
-	}
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		goto end;
-	if (priv->rss) {
-		ret = rxq_allmulticast_enable(LIST_FIRST(&priv->parents));
-		if (ret) {
-			priv_unlock(priv);
-			return;
-		}
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[--i] != NULL)
-				rxq_allmulticast_disable((*priv->rxqs)[i]);
-		priv_unlock(priv);
-		return;
-	}
-end:
-	priv->allmulti = 1;
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to disable allmulti mode.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_allmulticast_disable(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-
-	priv_lock(priv);
-	if (!priv->allmulti || priv->isolated) {
-		priv_unlock(priv);
-		return;
-	}
-	if (priv->rss) {
-		rxq_allmulticast_disable(LIST_FIRST(&priv->parents));
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] != NULL)
-			rxq_allmulticast_disable((*priv->rxqs)[i]);
-end:
-	priv->allmulti = 0;
-	priv_unlock(priv);
-}
-
-/**
  * DPDK callback to retrieve physical link information.
  *
  * @param dev
@@ -4664,10 +4361,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		 * for errors at this stage. */
 		if (!priv->rss && !priv->isolated) {
 			rxq_mac_addrs_add(rxq);
-			if (priv->promisc)
-				rxq_promiscuous_enable(rxq);
-			if (priv->allmulti)
-				rxq_allmulticast_enable(rxq);
 		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
@@ -4956,10 +4649,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_set_link_down = mlx4_set_link_down,
 	.dev_set_link_up = mlx4_set_link_up,
 	.dev_close = mlx4_dev_close,
-	.promiscuous_enable = mlx4_promiscuous_enable,
-	.promiscuous_disable = mlx4_promiscuous_disable,
-	.allmulticast_enable = mlx4_allmulticast_enable,
-	.allmulticast_disable = mlx4_allmulticast_disable,
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 256e644..d4b3a5f 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -210,8 +210,6 @@ struct rxq {
 	 */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
 	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
-	struct ibv_flow *promisc_flow; /* Promiscuous flow. */
-	struct ibv_flow *allmulti_flow; /* Multicast flow. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -303,8 +301,6 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int promisc:1; /* Device in promiscuous mode. */
-	unsigned int allmulti:1; /* Device receives all multicast packets. */
 	unsigned int hw_qpg:1; /* QP groups are supported. */
 	unsigned int hw_tss:1; /* TSS is supported. */
 	unsigned int hw_rss:1; /* RSS is supported. */
-- 
2.1.4


* [PATCH v2 11/51] net/mlx4: remove VLAN filter support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (9 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 10/51] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 12/51] net/mlx4: remove MAC address configuration support Adrien Mazarguil
                     ` (41 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This is done in preparation for a major refactoring.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 206 +++------------------------------
 drivers/net/mlx4/mlx4.h           |  13 +--
 4 files changed, 17 insertions(+), 204 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 344731f..bfa6948 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -17,7 +17,6 @@ Unicast MAC filter   = Y
 Multicast MAC filter = Y
 RSS hash             = Y
 SR-IOV               = Y
-VLAN filter          = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
 Inner L3 checksum    = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index b6aaee2..0f340c5 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -80,7 +80,6 @@ Features
 - Multi arch support: x86_64 and POWER8.
 - RSS, also known as RCA, is supported. In this mode the number of
   configured RX queues must be a power of two.
-- VLAN filtering is supported.
 - Link state information is provided.
 - Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 59ffec0..fe05a79 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -47,7 +47,6 @@
 #include <unistd.h>
 #include <limits.h>
 #include <assert.h>
-#include <arpa/inet.h>
 #include <net/if.h>
 #include <dirent.h>
 #include <sys/ioctl.h>
@@ -2073,11 +2072,9 @@ rxq_free_elts(struct rxq *rxq)
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index.
- * @param vlan_index
- *   VLAN index.
  */
 static void
-rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index)
 {
 #ifndef NDEBUG
 	struct priv *priv = rxq->priv;
@@ -2085,14 +2082,13 @@ rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 		(const uint8_t (*)[ETHER_ADDR_LEN])
 		priv->mac[mac_index].addr_bytes;
 #endif
-	assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
-	      " (VLAN ID %" PRIu16 ")",
+	assert(rxq->mac_flow[mac_index] != NULL);
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
 	      (void *)rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index, priv->vlan_filter[vlan_index].id);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
-	rxq->mac_flow[mac_index][vlan_index] = NULL;
+	      mac_index);
+	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
+	rxq->mac_flow[mac_index] = NULL;
 }
 
 /**
@@ -2106,22 +2102,10 @@ rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 static void
 rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	unsigned int vlans = 0;
-
-	assert(mac_index < elemof(priv->mac));
+	assert(mac_index < elemof(rxq->priv->mac));
 	if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
 		return;
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled)
-			continue;
-		rxq_del_flow(rxq, mac_index, i);
-		vlans++;
-	}
-	if (!vlans) {
-		rxq_del_flow(rxq, mac_index, 0);
-	}
+	rxq_del_flow(rxq, mac_index);
 	BITFIELD_RESET(rxq->mac_configured, mac_index);
 }
 
@@ -2148,14 +2132,12 @@ rxq_mac_addrs_del(struct rxq *rxq)
  *   Pointer to RX queue structure.
  * @param mac_index
  *   MAC address index to register.
- * @param vlan_index
- *   VLAN index. Use -1 for a flow without VLAN.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 {
 	struct ibv_flow *flow;
 	struct priv *priv = rxq->priv;
@@ -2172,7 +2154,6 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
 	assert(mac_index < elemof(priv->mac));
-	assert((vlan_index < elemof(priv->vlan_filter)) || (vlan_index == -1u));
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2193,22 +2174,15 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 				(*mac)[0], (*mac)[1], (*mac)[2],
 				(*mac)[3], (*mac)[4], (*mac)[5]
 			},
-			.vlan_tag = ((vlan_index != -1u) ?
-				     htons(priv->vlan_filter[vlan_index].id) :
-				     0),
 		},
 		.mask = {
 			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-			.vlan_tag = ((vlan_index != -1u) ? htons(0xfff) : 0),
 		}
 	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
-	      " (VLAN %s %" PRIu16 ")",
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
 	      (void *)rxq,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index,
-	      ((vlan_index != -1u) ? "ID" : "index"),
-	      ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
+	      mac_index);
 	/* Create related flow. */
 	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
@@ -2221,10 +2195,8 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 			return errno;
 		return EINVAL;
 	}
-	if (vlan_index == -1u)
-		vlan_index = 0;
-	assert(rxq->mac_flow[mac_index][vlan_index] == NULL);
-	rxq->mac_flow[mac_index][vlan_index] = flow;
+	assert(rxq->mac_flow[mac_index] == NULL);
+	rxq->mac_flow[mac_index] = flow;
 	return 0;
 }
 
@@ -2242,37 +2214,14 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
 static int
 rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
 {
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	unsigned int vlans = 0;
 	int ret;
 
-	assert(mac_index < elemof(priv->mac));
+	assert(mac_index < elemof(rxq->priv->mac));
 	if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
 		rxq_mac_addr_del(rxq, mac_index);
-	/* Fill VLAN specifications. */
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled)
-			continue;
-		/* Create related flow. */
-		ret = rxq_add_flow(rxq, mac_index, i);
-		if (!ret) {
-			vlans++;
-			continue;
-		}
-		/* Failure, rollback. */
-		while (i != 0)
-			if (priv->vlan_filter[--i].enabled)
-				rxq_del_flow(rxq, mac_index, i);
-		assert(ret > 0);
+	ret = rxq_add_flow(rxq, mac_index);
+	if (ret)
 		return ret;
-	}
-	/* In case there is no VLAN filter. */
-	if (!vlans) {
-		ret = rxq_add_flow(rxq, mac_index, -1);
-		if (ret)
-			return ret;
-	}
 	BITFIELD_SET(rxq->mac_configured, mac_index);
 	return 0;
 }
@@ -4474,128 +4423,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	return -ret;
 }
 
-/**
- * Configure a VLAN filter.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param vlan_id
- *   VLAN ID to filter.
- * @param on
- *   Toggle filter.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	unsigned int j = -1;
-
-	DEBUG("%p: %s VLAN filter ID %" PRIu16,
-	      (void *)dev, (on ? "enable" : "disable"), vlan_id);
-	for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
-		if (!priv->vlan_filter[i].enabled) {
-			/* Unused index, remember it. */
-			j = i;
-			continue;
-		}
-		if (priv->vlan_filter[i].id != vlan_id)
-			continue;
-		/* This VLAN ID is already known, use its index. */
-		j = i;
-		break;
-	}
-	/* Check if there's room for another VLAN filter. */
-	if (j == (unsigned int)-1)
-		return ENOMEM;
-	/*
-	 * VLAN filters apply to all configured MAC addresses, flow
-	 * specifications must be reconfigured accordingly.
-	 */
-	priv->vlan_filter[j].id = vlan_id;
-	if ((on) && (!priv->vlan_filter[j].enabled)) {
-		/*
-		 * Filter is disabled, enable it.
-		 * Rehashing flows in all RX queues is necessary.
-		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(LIST_FIRST(&priv->parents));
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
-		priv->vlan_filter[j].enabled = 1;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(LIST_FIRST(&priv->parents));
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
-	} else if ((!on) && (priv->vlan_filter[j].enabled)) {
-		/*
-		 * Filter is enabled, disable it.
-		 * Rehashing flows in all RX queues is necessary.
-		 */
-		if (priv->rss)
-			rxq_mac_addrs_del(LIST_FIRST(&priv->parents));
-		else
-			for (i = 0; (i != priv->rxqs_n); ++i)
-				if ((*priv->rxqs)[i] != NULL)
-					rxq_mac_addrs_del((*priv->rxqs)[i]);
-		priv->vlan_filter[j].enabled = 0;
-		if (priv->started) {
-			if (priv->rss)
-				rxq_mac_addrs_add(LIST_FIRST(&priv->parents));
-			else
-				for (i = 0; (i != priv->rxqs_n); ++i) {
-					if ((*priv->rxqs)[i] == NULL)
-						continue;
-					rxq_mac_addrs_add((*priv->rxqs)[i]);
-				}
-		}
-	}
-	return 0;
-}
-
-/**
- * DPDK callback to configure a VLAN filter.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param vlan_id
- *   VLAN ID to filter.
- * @param on
- *   Toggle filter.
- *
- * @return
- *   0 on success, negative errno value on failure.
- */
-static int
-mlx4_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
-{
-	struct priv *priv = dev->data->dev_private;
-	int ret;
-
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot set vlan filter, "
-		      "device is in isolated mode", (void *)dev);
-		priv_unlock(priv);
-		return -EINVAL;
-	}
-	ret = vlan_filter_set(dev, vlan_id, on);
-	priv_unlock(priv);
-	assert(ret >= 0);
-	return -ret;
-}
-
 const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
@@ -4654,7 +4481,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
 	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
-	.vlan_filter_set = mlx4_vlan_filter_set,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index d4b3a5f..d26d826 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -74,9 +74,6 @@
  */
 #define MLX4_MAX_MAC_ADDRESSES 128
 
-/* Maximum number of simultaneous VLAN filters supported. See above. */
-#define MLX4_MAX_VLAN_IDS 127
-
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -205,11 +202,8 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	/*
-	 * Each VLAN ID requires a separate flow steering rule.
-	 */
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
+	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -292,11 +286,6 @@ struct priv {
 	 */
 	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
 	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	/* VLAN filters. */
-	struct {
-		unsigned int enabled:1; /* If enabled. */
-		unsigned int id:12; /* VLAN ID (0-4095). */
-	} vlan_filter[MLX4_MAX_VLAN_IDS]; /* VLAN filters table. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4


* [PATCH v2 12/51] net/mlx4: remove MAC address configuration support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (10 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 11/51] net/mlx4: remove VLAN filter support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 13/51] net/mlx4: drop MAC flows affecting all Rx queues Adrien Mazarguil
                     ` (40 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Only the default port MAC address remains and is not configurable.
This is done in preparation for a major refactoring.
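
For illustration only, the per-queue state this leaves behind can be
sketched as a single flow handle that is either attached or NULL. This is
a simplified model, not the driver code: every sketch_* name is
hypothetical, and struct sketch_flow merely stands in for the Verbs flow
handle so that the example builds without libibverbs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Verbs flow handle (not struct ibv_flow). */
struct sketch_flow {
	int active; /* Nonzero while the flow is attached. */
};

/* Hypothetical Rx queue reduced to the one MAC flow that remains. */
struct sketch_rxq {
	struct sketch_flow *mac_flow; /* NULL when no MAC flow is attached. */
	struct sketch_flow storage; /* Backing storage for the fake flow. */
};

/* Detach the MAC flow if there is one; calling it again is a no-op. */
static void
sketch_rxq_mac_addr_del(struct sketch_rxq *rxq)
{
	if (!rxq->mac_flow)
		return;
	rxq->mac_flow->active = 0;
	rxq->mac_flow = NULL;
}

/* Attach the MAC flow, replacing any previous one. Returns 0 on success. */
static int
sketch_rxq_mac_addr_add(struct sketch_rxq *rxq)
{
	if (rxq->mac_flow)
		sketch_rxq_mac_addr_del(rxq);
	rxq->storage.active = 1;
	rxq->mac_flow = &rxq->storage;
	return 0;
}
```

With a single address there is no index to track, so the pointer itself
plays the role the mac_configured bit-field used to have.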

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   2 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 322 ++++-----------------------------
 drivers/net/mlx4/mlx4.h           |  41 +----
 4 files changed, 39 insertions(+), 327 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index bfa6948..3acf8d3 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,8 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-Unicast MAC filter   = Y
-Multicast MAC filter = Y
 RSS hash             = Y
 SR-IOV               = Y
 L3 checksum offload  = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 0f340c5..d3d51f7 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -81,7 +81,6 @@ Features
 - RSS, also known as RCA, is supported. In this mode the number of
   configured RX queues must be a power of two.
 - Link state information is provided.
-- Multiple MAC addresses (unicast, multicast) can be configured.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fe05a79..8ca5698 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -45,7 +45,6 @@
 #include <string.h>
 #include <errno.h>
 #include <unistd.h>
-#include <limits.h>
 #include <assert.h>
 #include <net/if.h>
 #include <dirent.h>
@@ -2066,84 +2065,46 @@ rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Delete flow steering rule.
+ * Unregister a MAC address from a Rx queue.
  *
  * @param rxq
  *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index.
  */
 static void
-rxq_del_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_mac_addr_del(struct rxq *rxq)
 {
 #ifndef NDEBUG
 	struct priv *priv = rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 		(const uint8_t (*)[ETHER_ADDR_LEN])
-		priv->mac[mac_index].addr_bytes;
+		priv->mac.addr_bytes;
 #endif
-	assert(rxq->mac_flow[mac_index] != NULL);
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
-	      (void *)rxq,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
-	rxq->mac_flow[mac_index] = NULL;
-}
-
-/**
- * Unregister a MAC address from a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index.
- */
-static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
-{
-	assert(mac_index < elemof(rxq->priv->mac));
-	if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
+	if (!rxq->mac_flow)
 		return;
-	rxq_del_flow(rxq, mac_index);
-	BITFIELD_RESET(rxq->mac_configured, mac_index);
-}
-
-/**
- * Unregister all MAC addresses from a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_mac_addrs_del(struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-
-	for (i = 0; (i != elemof(priv->mac)); ++i)
-		rxq_mac_addr_del(rxq, i);
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)rxq,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	claim_zero(ibv_destroy_flow(rxq->mac_flow));
+	rxq->mac_flow = NULL;
 }
 
 /**
- * Add single flow steering rule.
+ * Register a MAC address in a Rx queue.
  *
  * @param rxq
  *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index to register.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_mac_addr_add(struct rxq *rxq)
 {
 	struct ibv_flow *flow;
 	struct priv *priv = rxq->priv;
 	const uint8_t (*mac)[ETHER_ADDR_LEN] =
 			(const uint8_t (*)[ETHER_ADDR_LEN])
-			priv->mac[mac_index].addr_bytes;
+			priv->mac.addr_bytes;
 
 	/* Allocate flow specification on the stack. */
 	struct __attribute__((packed)) {
@@ -2153,7 +2114,8 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 	struct ibv_flow_attr *attr = &data.attr;
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
-	assert(mac_index < elemof(priv->mac));
+	if (rxq->mac_flow)
+		rxq_mac_addr_del(rxq);
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2179,10 +2141,9 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 			.dst_mac = "\xff\xff\xff\xff\xff\xff",
 		}
 	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
 	      (void *)rxq,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
-	      mac_index);
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
 	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
@@ -2195,99 +2156,12 @@ rxq_add_flow(struct rxq *rxq, unsigned int mac_index)
 			return errno;
 		return EINVAL;
 	}
-	assert(rxq->mac_flow[mac_index] == NULL);
-	rxq->mac_flow[mac_index] = flow;
+	assert(rxq->mac_flow == NULL);
+	rxq->mac_flow = flow;
 	return 0;
 }
 
 /**
- * Register a MAC address in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param mac_index
- *   MAC address index to register.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
-{
-	int ret;
-
-	assert(mac_index < elemof(rxq->priv->mac));
-	if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
-		rxq_mac_addr_del(rxq, mac_index);
-	ret = rxq_add_flow(rxq, mac_index);
-	if (ret)
-		return ret;
-	BITFIELD_SET(rxq->mac_configured, mac_index);
-	return 0;
-}
-
-/**
- * Register all MAC addresses in a RX queue.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_mac_addrs_add(struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	unsigned int i;
-	int ret;
-
-	for (i = 0; (i != elemof(priv->mac)); ++i) {
-		if (!BITFIELD_ISSET(priv->mac_configured, i))
-			continue;
-		ret = rxq_mac_addr_add(rxq, i);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			rxq_mac_addr_del(rxq, --i);
-		assert(ret > 0);
-		return ret;
-	}
-	return 0;
-}
-
-/**
- * Unregister a MAC address.
- *
- * In RSS mode, the MAC address is unregistered from the parent queue,
- * otherwise it is unregistered from each queue directly.
- *
- * @param priv
- *   Pointer to private structure.
- * @param mac_index
- *   MAC address index.
- */
-static void
-priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
-{
-	unsigned int i;
-
-	assert(!priv->isolated);
-	assert(mac_index < elemof(priv->mac));
-	if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
-		return;
-	if (priv->rss) {
-		rxq_mac_addr_del(LIST_FIRST(&priv->parents), mac_index);
-		goto end;
-	}
-	for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
-		rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
-end:
-	BITFIELD_RESET(priv->mac_configured, mac_index);
-}
-
-/**
  * Register a MAC address.
  *
  * In RSS mode, the MAC address is registered in the parent queue,
@@ -2295,8 +2169,6 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
  *
  * @param priv
  *   Pointer to private structure.
- * @param mac_index
- *   MAC address index to use.
  * @param mac
  *   MAC address to register.
  *
@@ -2304,28 +2176,12 @@ priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
  *   0 on success, errno value on failure.
  */
 static int
-priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
-		  const uint8_t (*mac)[ETHER_ADDR_LEN])
+priv_mac_addr_add(struct priv *priv, const uint8_t (*mac)[ETHER_ADDR_LEN])
 {
 	unsigned int i;
 	int ret;
 
-	assert(mac_index < elemof(priv->mac));
-	/* First, make sure this address isn't already configured. */
-	for (i = 0; (i != elemof(priv->mac)); ++i) {
-		/* Skip this index, it's going to be reconfigured. */
-		if (i == mac_index)
-			continue;
-		if (!BITFIELD_ISSET(priv->mac_configured, i))
-			continue;
-		if (memcmp(priv->mac[i].addr_bytes, *mac, sizeof(*mac)))
-			continue;
-		/* Address already configured elsewhere, return with error. */
-		return EADDRINUSE;
-	}
-	if (BITFIELD_ISSET(priv->mac_configured, mac_index))
-		priv_mac_addr_del(priv, mac_index);
-	priv->mac[mac_index] = (struct ether_addr){
+	priv->mac = (struct ether_addr){
 		{
 			(*mac)[0], (*mac)[1], (*mac)[2],
 			(*mac)[3], (*mac)[4], (*mac)[5]
@@ -2333,19 +2189,10 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 	};
 	/* If device isn't started, this is all we need to do. */
 	if (!priv->started) {
-#ifndef NDEBUG
-		/* Verify that all queues have this index disabled. */
-		for (i = 0; (i != priv->rxqs_n); ++i) {
-			if ((*priv->rxqs)[i] == NULL)
-				continue;
-			assert(!BITFIELD_ISSET
-			       ((*priv->rxqs)[i]->mac_configured, mac_index));
-		}
-#endif
 		goto end;
 	}
 	if (priv->rss) {
-		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents), mac_index);
+		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents));
 		if (ret)
 			return ret;
 		goto end;
@@ -2353,17 +2200,16 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		if ((*priv->rxqs)[i] == NULL)
 			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+		ret = rxq_mac_addr_add((*priv->rxqs)[i]);
 		if (!ret)
 			continue;
 		/* Failure, rollback. */
 		while (i != 0)
 			if ((*priv->rxqs)[(--i)] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+				rxq_mac_addr_del((*priv->rxqs)[i]);
 		return ret;
 	}
 end:
-	BITFIELD_SET(priv->mac_configured, mac_index);
 	return 0;
 }
 
@@ -2408,7 +2254,7 @@ rxq_cleanup(struct rxq *rxq)
 						&params));
 	}
 	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_mac_addrs_del(rxq);
+		rxq_mac_addr_del(rxq);
 	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
@@ -3050,11 +2896,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Remove attached flows if RSS is disabled (no parent queue). */
 	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addrs_del(&tmpl);
+		rxq_mac_addr_del(&tmpl);
 		/* Update original queue in case of failure. */
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+		rxq->mac_flow = NULL;
 	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
@@ -3091,11 +2935,9 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* Reconfigure flows. Do not care for errors. */
 	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addrs_add(&tmpl);
+		rxq_mac_addr_add(&tmpl);
 		/* Update original queue in case of failure. */
-		memcpy(rxq->mac_configured, tmpl.mac_configured,
-		       sizeof(rxq->mac_configured));
-		memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+		rxq->mac_flow = NULL;
 	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
@@ -3241,7 +3083,7 @@ rxq_create_qp(struct rxq *rxq,
 	}
 	if (!priv->isolated && (parent || !priv->rss)) {
 		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addrs_add(rxq);
+		ret = rxq_mac_addr_add(rxq);
 		if (ret) {
 			ERROR("QP flow attachment failed: %s",
 			      strerror(ret));
@@ -3634,7 +3476,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		ret = rxq_mac_addrs_add(rxq);
+		ret = rxq_mac_addr_add(rxq);
 		if (!ret)
 			continue;
 		WARN("%p: QP flow attachment failed: %s",
@@ -3672,7 +3514,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	while (i != 0) {
 		rxq = (*priv->rxqs)[i--];
 		if (rxq != NULL) {
-			rxq_mac_addrs_del(rxq);
+			rxq_mac_addr_del(rxq);
 		}
 	}
 	priv->started = 0;
@@ -3719,7 +3561,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		/* Ignore nonexistent RX queues. */
 		if (rxq == NULL)
 			continue;
-		rxq_mac_addrs_del(rxq);
+		rxq_mac_addr_del(rxq);
 	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	priv_unlock(priv);
 }
@@ -3972,7 +3814,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->max_rx_queues = max;
 	info->max_tx_queues = max;
 	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = (elemof(priv->mac) - 1);
+	info->max_mac_addrs = 1;
 	info->rx_offload_capa =
 		(priv->hw_csum ?
 		 (DEV_RX_OFFLOAD_IPV4_CKSUM |
@@ -4104,90 +3946,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 }
 
 /**
- * DPDK callback to remove a MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param index
- *   MAC address index.
- */
-static void
-mlx4_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	priv_lock(priv);
-	if (priv->isolated)
-		goto end;
-	DEBUG("%p: removing MAC address from index %" PRIu32,
-	      (void *)dev, index);
-	/* Last array entry is reserved for broadcast. */
-	if (index >= (elemof(priv->mac) - 1))
-		goto end;
-	priv_mac_addr_del(priv, index);
-end:
-	priv_unlock(priv);
-}
-
-/**
- * DPDK callback to add a MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param mac_addr
- *   MAC address to register.
- * @param index
- *   MAC address index.
- * @param vmdq
- *   VMDq pool index to associate address with (ignored).
- */
-static int
-mlx4_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
-		  uint32_t index, uint32_t vmdq)
-{
-	struct priv *priv = dev->data->dev_private;
-	int re;
-
-	(void)vmdq;
-	priv_lock(priv);
-	if (priv->isolated) {
-		DEBUG("%p: cannot add MAC address, "
-		      "device is in isolated mode", (void *)dev);
-		re = EPERM;
-		goto end;
-	}
-	DEBUG("%p: adding MAC address at index %" PRIu32,
-	      (void *)dev, index);
-	/* Last array entry is reserved for broadcast. */
-	if (index >= (elemof(priv->mac) - 1)) {
-		re = EINVAL;
-		goto end;
-	}
-	re = priv_mac_addr_add(priv, index,
-			       (const uint8_t (*)[ETHER_ADDR_LEN])
-			       mac_addr->addr_bytes);
-end:
-	priv_unlock(priv);
-	return -re;
-}
-
-/**
- * DPDK callback to set the primary MAC address.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param mac_addr
- *   MAC address to register.
- */
-static void
-mlx4_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
-{
-	DEBUG("%p: setting primary MAC address", (void *)dev);
-	mlx4_mac_addr_remove(dev, 0);
-	mlx4_mac_addr_add(dev, mac_addr, 0, 0);
-}
-
-/**
  * DPDK callback to retrieve physical link information.
  *
  * @param dev
@@ -4309,7 +4067,7 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		/* Reenable non-RSS queue attributes. No need to check
 		 * for errors at this stage. */
 		if (!priv->rss && !priv->isolated) {
-			rxq_mac_addrs_add(rxq);
+			rxq_mac_addr_add(rxq);
 		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
@@ -4487,9 +4245,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.tx_queue_release = mlx4_tx_queue_release,
 	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
 	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.mac_addr_remove = mlx4_mac_addr_remove,
-	.mac_addr_add = mlx4_mac_addr_add,
-	.mac_addr_set = mlx4_mac_addr_set,
 	.mtu_set = mlx4_dev_set_mtu,
 	.filter_ctrl = mlx4_dev_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
@@ -5375,13 +5130,10 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[0], mac.addr_bytes[1],
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
-		/* Register MAC and broadcast addresses. */
-		claim_zero(priv_mac_addr_add(priv, 0,
+		/* Register MAC address. */
+		claim_zero(priv_mac_addr_add(priv,
 					     (const uint8_t (*)[ETHER_ADDR_LEN])
 					     mac.addr_bytes));
-		claim_zero(priv_mac_addr_add(priv, (elemof(priv->mac) - 1),
-					     &(const uint8_t [ETHER_ADDR_LEN])
-					     { "\xff\xff\xff\xff\xff\xff" }));
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
@@ -5412,7 +5164,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 
 		eth_dev->data->dev_private = priv;
-		eth_dev->data->mac_addrs = priv->mac;
+		eth_dev->data->mac_addrs = &priv->mac;
 		eth_dev->device = &pci_dev->device;
 
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index d26d826..d00e77f 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -35,7 +35,6 @@
 #define RTE_PMD_MLX4_H_
 
 #include <stdint.h>
-#include <limits.h>
 
 /*
  * Runtime logging through RTE_LOG() is enabled when not in debugging mode.
@@ -64,16 +63,6 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
-/*
- * Maximum number of simultaneous MAC addresses supported.
- *
- * According to ConnectX's Programmer Reference Manual:
- *   The L2 Address Match is implemented by comparing a MAC/VLAN combination
- *   of 128 MAC addresses and 127 VLAN values, comprising 128x127 possible
- *   L2 addresses.
- */
-#define MLX4_MAX_MAC_ADDRESSES 128
-
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
@@ -112,25 +101,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Bit-field manipulation. */
-#define BITFIELD_DECLARE(bf, type, size)				\
-	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) +		\
-		 !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
-#define BITFIELD_DEFINE(bf, type, size)					\
-	BITFIELD_DECLARE((bf), type, (size)) = { 0 }
-#define BITFIELD_SET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |=		\
-		((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
-#define BITFIELD_RESET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &=		\
-		~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
-#define BITFIELD_ISSET(bf, b)						\
-	(assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)),			\
-	 !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &		\
-	     ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
-
 /* Number of elements in array. */
 #define elemof(a) (sizeof(a) / sizeof((a)[0]))
 
@@ -202,8 +172,7 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
-	struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -279,13 +248,7 @@ struct priv {
 	struct ibv_context *ctx; /* Verbs context. */
 	struct ibv_device_attr device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
-	/*
-	 * MAC addresses array and configuration bit-field.
-	 * An extra entry that cannot be modified by the DPDK is reserved
-	 * for broadcast frames (destination MAC address ff:ff:ff:ff:ff:ff).
-	 */
-	struct ether_addr mac[MLX4_MAX_MAC_ADDRESSES];
-	BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
+	struct ether_addr mac; /* MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4


* [PATCH v2 13/51] net/mlx4: drop MAC flows affecting all Rx queues
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (11 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 12/51] net/mlx4: remove MAC address configuration support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 14/51] net/mlx4: revert flow API RSS support Adrien Mazarguil
                     ` (39 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Configuring several Rx queues enables RSS, which causes an additional
special parent queue to be created to manage them.

MAC flows are associated with the queue that is supposed to receive
packets: either the parent queue when RSS is enabled or the single orphan
queue otherwise.

For historical reasons the current implementation supports another scenario
with multiple orphans, in which case MAC flows are configured on all of
them. This is harmless but useless since it cannot happen.

Removing this feature allows dissociating the remaining MAC flow from Rx
queues and storing it inside the private structure where it belongs.
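
As a rough sketch of the resulting selection logic, assuming a simplified
private structure: the MAC flow targets the RSS parent queue when RSS is
enabled, queue 0 otherwise, and nothing at all while the device is stopped
or in isolated mode. All sketch_* names are hypothetical and the queue
array is modeled as a plain pointer array rather than the driver's own
type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Rx queue; only its identity matters here. */
struct sketch_rxq {
	int id;
};

/* Hypothetical subset of the private structure used for the decision. */
struct sketch_priv {
	int started; /* Device is started. */
	int isolated; /* Isolated mode: no default MAC flow. */
	int rss; /* RSS enabled (a parent queue exists). */
	struct sketch_rxq *parent; /* RSS parent queue, if any. */
	struct sketch_rxq **rxqs; /* Rx queues array, may be NULL. */
};

/* Return the queue the MAC flow should target, or NULL if none applies. */
static struct sketch_rxq *
sketch_mac_flow_target(const struct sketch_priv *priv)
{
	if (!priv->started || priv->isolated)
		return NULL;
	if (priv->rss)
		return priv->parent;
	if (priv->rxqs && priv->rxqs[0])
		return priv->rxqs[0];
	return NULL;
}
```

Since at most one queue is ever selected, a single flow handle in the
private structure is enough to track it.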

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 215 +++++++++++--------------------------------
 drivers/net/mlx4/mlx4.h |   2 +-
 2 files changed, 57 insertions(+), 160 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8ca5698..05923e9 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -515,6 +515,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 static void
 rxq_cleanup(struct rxq *rxq);
 
+static void
+priv_mac_addr_del(struct priv *priv);
+
 /**
  * Create RSS parent queue.
  *
@@ -641,6 +644,7 @@ dev_configure(struct rte_eth_dev *dev)
 		for (i = 0; (i != priv->rxqs_n); ++i)
 			if ((*priv->rxqs)[i] != NULL)
 				return EINVAL;
+		priv_mac_addr_del(priv);
 		priv_parent_list_cleanup(priv);
 		priv->rss = 0;
 		priv->rxqs_n = 0;
@@ -2065,46 +2069,57 @@ rxq_free_elts(struct rxq *rxq)
 }
 
 /**
- * Unregister a MAC address from a Rx queue.
+ * Unregister a MAC address.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-rxq_mac_addr_del(struct rxq *rxq)
+priv_mac_addr_del(struct priv *priv)
 {
 #ifndef NDEBUG
-	struct priv *priv = rxq->priv;
-	const uint8_t (*mac)[ETHER_ADDR_LEN] =
-		(const uint8_t (*)[ETHER_ADDR_LEN])
-		priv->mac.addr_bytes;
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
 #endif
-	if (!rxq->mac_flow)
+
+	if (!priv->mac_flow)
 		return;
 	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)rxq,
+	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(rxq->mac_flow));
-	rxq->mac_flow = NULL;
+	claim_zero(ibv_destroy_flow(priv->mac_flow));
+	priv->mac_flow = NULL;
 }
 
 /**
- * Register a MAC address in a Rx queue.
+ * Register a MAC address.
  *
- * @param rxq
- *   Pointer to RX queue structure.
+ * In RSS mode, the MAC address is registered in the parent queue,
+ * otherwise it is registered in queue 0.
+ *
+ * @param priv
+ *   Pointer to private structure.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_mac_addr_add(struct rxq *rxq)
+priv_mac_addr_add(struct priv *priv)
 {
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+	struct rxq *rxq;
 	struct ibv_flow *flow;
-	struct priv *priv = rxq->priv;
-	const uint8_t (*mac)[ETHER_ADDR_LEN] =
-			(const uint8_t (*)[ETHER_ADDR_LEN])
-			priv->mac.addr_bytes;
+
+	/* If device isn't started, this is all we need to do. */
+	if (!priv->started)
+		return 0;
+	if (priv->isolated)
+		return 0;
+	if (priv->rss)
+		rxq = LIST_FIRST(&priv->parents);
+	else if (*priv->rxqs && (*priv->rxqs)[0])
+		rxq = (*priv->rxqs)[0];
+	else
+		return 0;
 
 	/* Allocate flow specification on the stack. */
 	struct __attribute__((packed)) {
@@ -2114,8 +2129,8 @@ rxq_mac_addr_add(struct rxq *rxq)
 	struct ibv_flow_attr *attr = &data.attr;
 	struct ibv_flow_spec_eth *spec = &data.spec;
 
-	if (rxq->mac_flow)
-		rxq_mac_addr_del(rxq);
+	if (priv->mac_flow)
+		priv_mac_addr_del(priv);
 	/*
 	 * No padding must be inserted by the compiler between attr and spec.
 	 * This layout is expected by libibverbs.
@@ -2142,7 +2157,7 @@ rxq_mac_addr_add(struct rxq *rxq)
 		}
 	};
 	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)rxq,
+	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
 	errno = 0;
@@ -2156,60 +2171,8 @@ rxq_mac_addr_add(struct rxq *rxq)
 			return errno;
 		return EINVAL;
 	}
-	assert(rxq->mac_flow == NULL);
-	rxq->mac_flow = flow;
-	return 0;
-}
-
-/**
- * Register a MAC address.
- *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in each queue directly.
- *
- * @param priv
- *   Pointer to private structure.
- * @param mac
- *   MAC address to register.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-priv_mac_addr_add(struct priv *priv, const uint8_t (*mac)[ETHER_ADDR_LEN])
-{
-	unsigned int i;
-	int ret;
-
-	priv->mac = (struct ether_addr){
-		{
-			(*mac)[0], (*mac)[1], (*mac)[2],
-			(*mac)[3], (*mac)[4], (*mac)[5]
-		}
-	};
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started) {
-		goto end;
-	}
-	if (priv->rss) {
-		ret = rxq_mac_addr_add(LIST_FIRST(&priv->parents));
-		if (ret)
-			return ret;
-		goto end;
-	}
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		ret = rxq_mac_addr_add((*priv->rxqs)[i]);
-		if (!ret)
-			continue;
-		/* Failure, rollback. */
-		while (i != 0)
-			if ((*priv->rxqs)[(--i)] != NULL)
-				rxq_mac_addr_del((*priv->rxqs)[i]);
-		return ret;
-	}
-end:
+	assert(priv->mac_flow == NULL);
+	priv->mac_flow = flow;
 	return 0;
 }
 
@@ -2253,9 +2216,6 @@ rxq_cleanup(struct rxq *rxq)
 						rxq->if_cq,
 						&params));
 	}
-	if (rxq->qp != NULL && !rxq->priv->isolated) {
-		rxq_mac_addr_del(rxq);
-	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
 	if (rxq->cq != NULL)
@@ -2894,12 +2854,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		DEBUG("%p: nothing to do", (void *)dev);
 		return 0;
 	}
-	/* Remove attached flows if RSS is disabled (no parent queue). */
-	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addr_del(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->mac_flow = NULL;
-	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
 	if (!tmpl.qp)
@@ -2933,12 +2887,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	}
-	/* Reconfigure flows. Do not care for errors. */
-	if (!priv->rss && !priv->isolated) {
-		rxq_mac_addr_add(&tmpl);
-		/* Update original queue in case of failure. */
-		rxq->mac_flow = NULL;
-	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
 	if (pool == NULL) {
@@ -3081,15 +3029,6 @@ rxq_create_qp(struct rxq *rxq,
 		      strerror(ret));
 		return ret;
 	}
-	if (!priv->isolated && (parent || !priv->rss)) {
-		/* Configure MAC and broadcast addresses. */
-		ret = rxq_mac_addr_add(rxq);
-		if (ret) {
-			ERROR("QP flow attachment failed: %s",
-			      strerror(ret));
-			return ret;
-		}
-	}
 	if (!parent) {
 		ret = ibv_post_recv(rxq->qp,
 				    (rxq->sp ?
@@ -3359,6 +3298,8 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -EEXIST;
 		}
 		(*priv->rxqs)[idx] = NULL;
+		if (idx == 0)
+			priv_mac_addr_del(priv);
 		rxq_cleanup(rxq);
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
@@ -3418,6 +3359,8 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 			DEBUG("%p: removing RX queue %p from list",
 			      (void *)priv->dev, (void *)rxq);
 			(*priv->rxqs)[i] = NULL;
+			if (i == 0)
+				priv_mac_addr_del(priv);
 			break;
 		}
 	rxq_cleanup(rxq);
@@ -3449,9 +3392,6 @@ static int
 mlx4_dev_start(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 	int ret;
 
 	priv_lock(priv);
@@ -3461,28 +3401,9 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	}
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	if (priv->isolated) {
-		rxq = NULL;
-		r = 1;
-	} else if (priv->rss) {
-		rxq = LIST_FIRST(&priv->parents);
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		ret = rxq_mac_addr_add(rxq);
-		if (!ret)
-			continue;
-		WARN("%p: QP flow attachment failed: %s",
-		     (void *)dev, strerror(ret));
+	ret = priv_mac_addr_add(priv);
+	if (ret)
 		goto err;
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
 	ret = priv_dev_link_interrupt_handler_install(priv, dev);
 	if (ret) {
 		ERROR("%p: LSC handler install failed",
@@ -3511,12 +3432,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	while (i != 0) {
-		rxq = (*priv->rxqs)[i--];
-		if (rxq != NULL) {
-			rxq_mac_addr_del(rxq);
-		}
-	}
+	priv_mac_addr_del(priv);
 	priv->started = 0;
 	priv_unlock(priv);
 	return -ret;
@@ -3534,9 +3450,6 @@ static void
 mlx4_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	unsigned int i = 0;
-	unsigned int r;
-	struct rxq *rxq;
 
 	priv_lock(priv);
 	if (!priv->started) {
@@ -3545,24 +3458,8 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	}
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	if (priv->isolated) {
-		rxq = NULL;
-		r = 1;
-	} else if (priv->rss) {
-		rxq = LIST_FIRST(&priv->parents);
-		r = 1;
-	} else {
-		rxq = (*priv->rxqs)[0];
-		r = priv->rxqs_n;
-	}
 	mlx4_priv_flow_stop(priv);
-	/* Iterate only once when RSS is enabled. */
-	do {
-		/* Ignore nonexistent RX queues. */
-		if (rxq == NULL)
-			continue;
-		rxq_mac_addr_del(rxq);
-	} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+	priv_mac_addr_del(priv);
 	priv_unlock(priv);
 }
 
@@ -3647,6 +3544,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+	priv_mac_addr_del(priv);
 	/* Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
 	 * never release them before closing the device. */
@@ -4036,6 +3934,8 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
+	/* Remove MAC flow. */
+	priv_mac_addr_del(priv);
 	/* Temporarily replace RX handler with a fake one, assuming it has not
 	 * been copied elsewhere. */
 	dev->rx_pkt_burst = removed_rx_burst;
@@ -4064,11 +3964,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 				rx_func = mlx4_rx_burst_sp;
 			break;
 		}
-		/* Reenable non-RSS queue attributes. No need to check
-		 * for errors at this stage. */
-		if (!priv->rss && !priv->isolated) {
-			rxq_mac_addr_add(rxq);
-		}
 		/* Scattered burst function takes priority. */
 		if (rxq->sp)
 			rx_func = mlx4_rx_burst_sp;
@@ -4076,6 +3971,8 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	/* Burst functions can now be called again. */
 	rte_wmb();
 	dev->rx_pkt_burst = rx_func;
+	/* Restore MAC flow. */
+	ret = priv_mac_addr_add(priv);
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -5131,9 +5028,9 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[2], mac.addr_bytes[3],
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
-		claim_zero(priv_mac_addr_add(priv,
-					     (const uint8_t (*)[ETHER_ADDR_LEN])
-					     mac.addr_bytes));
+		priv->mac = mac;
+		if (priv_mac_addr_add(priv))
+			goto port_error;
 #ifndef NDEBUG
 		{
 			char ifname[IF_NAMESIZE];
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index d00e77f..af70076 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -172,7 +172,6 @@ struct rxq {
 	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
 	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
-	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -249,6 +248,7 @@ struct priv {
 	struct ibv_device_attr device_attr; /* Device properties. */
 	struct ibv_pd *pd; /* Protection Domain. */
 	struct ether_addr mac; /* MAC address. */
+	struct ibv_flow *mac_flow; /* Flow associated with MAC address. */
 	/* Device properties. */
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
-- 
2.1.4

* [PATCH v2 14/51] net/mlx4: revert flow API RSS support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (12 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 13/51] net/mlx4: drop MAC flows affecting all Rx queues Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 15/51] net/mlx4: revert RSS parent queue refactoring Adrien Mazarguil
                     ` (38 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This reverts commit d7769c7c08cc08a9d1bc4e40b95524d9697707d9.

Existing RSS features rely on experimental Verbs provided by Mellanox OFED.

In order to replace this dependency with standard distribution packages,
RSS support must be temporarily removed to be re-implemented using a
different API.

Removing support for the RSS flow rule action is the first step toward this
goal.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
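Editorial note, not part of the commit: the net effect of this revert on
action parsing can be sketched as below. With the RSS action gone, a flow
rule targets either the drop queue or exactly one Rx queue, so the action
descriptor only needs a single queue identifier. The struct mirrors
struct mlx4_flow_action as it stands after this patch; the enum, the
helper name and its return convention are assumptions made for
illustration and do not exist in the PMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct mlx4_flow_action after this patch (illustrative copy). */
struct flow_action_sketch {
	uint32_t drop:1;   /* Target is a drop queue. */
	uint32_t queue:1;  /* Target is a receive queue. */
	uint32_t queue_id; /* Identifier of the queue. */
};

/* Hypothetical stand-ins for the RTE_FLOW_ACTION_TYPE_* values. */
enum action_type_sketch {
	ACTION_DROP_SKETCH,
	ACTION_QUEUE_SKETCH,
	ACTION_RSS_SKETCH,
};

/*
 * Hypothetical helper: fill *out for a single action.
 * Returns 0 on success, -1 when the action is not supported.
 */
static int
action_parse_sketch(enum action_type_sketch type, uint32_t index,
		    unsigned int rxqs_n, struct flow_action_sketch *out)
{
	assert(out != NULL);
	*out = (struct flow_action_sketch){ .queue_id = 0 };
	switch (type) {
	case ACTION_DROP_SKETCH:
		out->drop = 1;
		return 0;
	case ACTION_QUEUE_SKETCH:
		/*
		 * Same outcome as the "index > rxqs_n - 1" test in the
		 * diff whenever rxqs_n is nonzero.
		 */
		if (index >= rxqs_n)
			return -1;
		out->queue = 1;
		out->queue_id = index;
		return 0;
	default:
		/* RSS now falls through to "not supported". */
		return -1;
	}
}
```

A caller would invoke action_parse_sketch() once per action in the rule
and stop at the first nonzero return, which corresponds to the
exit_action_not_supported path in priv_flow_validate().
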
 drivers/net/mlx4/mlx4.c      |   6 +-
 drivers/net/mlx4/mlx4.h      |   5 -
 drivers/net/mlx4/mlx4_flow.c | 206 +++-----------------------------------
 drivers/net/mlx4/mlx4_flow.h |   3 +-
 4 files changed, 20 insertions(+), 200 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 05923e9..02605c9 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -533,7 +533,7 @@ priv_mac_addr_del(struct priv *priv);
  * @return
  *   Pointer to a parent rxq structure, NULL on failure.
  */
-struct rxq *
+static struct rxq *
 priv_parent_create(struct priv *priv,
 		   uint16_t queues[],
 		   uint16_t children_n)
@@ -670,8 +670,10 @@ dev_configure(struct rte_eth_dev *dev)
 	priv->rss = 1;
 	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	if (priv->isolated)
+	if (priv->isolated) {
+		priv->rss = 0;
 		return 0;
+	}
 	if (priv_parent_create(priv, NULL, priv->rxqs_n))
 		return 0;
 	/* Failure, rollback. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index af70076..b76bf48 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -291,9 +291,4 @@ rxq_create_qp(struct rxq *rxq,
 void
 rxq_parent_cleanup(struct rxq *parent);
 
-struct rxq *
-priv_parent_create(struct priv *priv,
-		   uint16_t queues[],
-		   uint16_t children_n);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index f5c015e..827115e 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -109,7 +109,6 @@ struct rte_flow_drop {
 static const enum rte_flow_action_type valid_actions[] = {
 	RTE_FLOW_ACTION_TYPE_DROP,
 	RTE_FLOW_ACTION_TYPE_QUEUE,
-	RTE_FLOW_ACTION_TYPE_RSS,
 	RTE_FLOW_ACTION_TYPE_END,
 };
 
@@ -670,76 +669,6 @@ priv_flow_validate(struct priv *priv,
 			if (!queue || (queue->index > (priv->rxqs_n - 1)))
 				goto exit_action_not_supported;
 			action.queue = 1;
-			action.queues_n = 1;
-			action.queues[0] = queue->index;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_RSS) {
-			int i;
-			int ierr;
-			const struct rte_flow_action_rss *rss =
-				(const struct rte_flow_action_rss *)
-				actions->conf;
-
-			if (!priv->hw_rss) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "RSS cannot be used with "
-					   "the current configuration");
-				return -rte_errno;
-			}
-			if (!priv->isolated) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "RSS cannot be used without "
-					   "isolated mode");
-				return -rte_errno;
-			}
-			if (!rte_is_power_of_2(rss->num)) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "the number of queues "
-					   "should be power of two");
-				return -rte_errno;
-			}
-			if (priv->max_rss_tbl_sz < rss->num) {
-				rte_flow_error_set(error, ENOTSUP,
-					   RTE_FLOW_ERROR_TYPE_ACTION,
-					   actions,
-					   "the number of queues "
-					   "is too large");
-				return -rte_errno;
-			}
-			/* checking indexes array */
-			ierr = 0;
-			for (i = 0; i < rss->num; ++i) {
-				int j;
-				if (rss->queue[i] >= priv->rxqs_n)
-					ierr = 1;
-				/*
-				 * Prevent the user from specifying
-				 * the same queue twice in the RSS array.
-				 */
-				for (j = i + 1; j < rss->num && !ierr; ++j)
-					if (rss->queue[j] == rss->queue[i])
-						ierr = 1;
-				if (ierr) {
-					rte_flow_error_set(
-						error,
-						ENOTSUP,
-						RTE_FLOW_ERROR_TYPE_HANDLE,
-						NULL,
-						"RSS action only supports "
-						"unique queue indices "
-						"in a list");
-					return -rte_errno;
-				}
-			}
-			action.queue = 1;
-			action.queues_n = rss->num;
-			for (i = 0; i < rss->num; ++i)
-				action.queues[i] = rss->queue[i];
 		} else {
 			goto exit_action_not_supported;
 		}
@@ -865,82 +794,6 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 }
 
 /**
- * Get RSS parent rxq structure for given queues.
- *
- * Creates a new or returns an existed one.
- *
- * @param priv
- *   Pointer to private structure.
- * @param queues
- *   queues indices array, NULL in default RSS case.
- * @param children_n
- *   the size of queues array.
- *
- * @return
- *   Pointer to a parent rxq structure, NULL on failure.
- */
-static struct rxq *
-priv_parent_get(struct priv *priv,
-		uint16_t queues[],
-		uint16_t children_n,
-		struct rte_flow_error *error)
-{
-	unsigned int i;
-	struct rxq *parent;
-
-	for (parent = LIST_FIRST(&priv->parents);
-	     parent;
-	     parent = LIST_NEXT(parent, next)) {
-		unsigned int same = 0;
-		unsigned int overlap = 0;
-
-		/*
-		 * Find out whether an appropriate parent queue already exists
-		 * and can be reused, otherwise make sure there are no overlaps.
-		 */
-		for (i = 0; i < children_n; ++i) {
-			unsigned int j;
-
-			for (j = 0; j < parent->rss.queues_n; ++j) {
-				if (parent->rss.queues[j] != queues[i])
-					continue;
-				++overlap;
-				if (i == j)
-					++same;
-			}
-		}
-		if (same == children_n &&
-			children_n == parent->rss.queues_n)
-			return parent;
-		else if (overlap)
-			goto error;
-	}
-	/* Exclude the cases when some QPs were created without RSS */
-	for (i = 0; i < children_n; ++i) {
-		struct rxq *rxq = (*priv->rxqs)[queues[i]];
-		if (rxq->qp)
-			goto error;
-	}
-	parent = priv_parent_create(priv, queues, children_n);
-	if (!parent) {
-		rte_flow_error_set(error,
-				   ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "flow rule creation failure");
-		return NULL;
-	}
-	return parent;
-
-error:
-	rte_flow_error_set(error,
-			   EEXIST,
-			   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-			   NULL,
-			   "sharing a queue between several"
-			   " RSS groups is not supported");
-	return NULL;
-}
-
-/**
  * Complete flow rule creation.
  *
  * @param priv
@@ -963,7 +816,6 @@ priv_flow_create_action_queue(struct priv *priv,
 {
 	struct ibv_qp *qp;
 	struct rte_flow *rte_flow;
-	struct rxq *rxq_parent = NULL;
 
 	assert(priv->pd);
 	assert(priv->ctx);
@@ -977,38 +829,23 @@ priv_flow_create_action_queue(struct priv *priv,
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
 		int ret;
-		unsigned int i;
-		struct rxq *rxq = NULL;
-
-		if (action->queues_n > 1) {
-			rxq_parent = priv_parent_get(priv, action->queues,
-						     action->queues_n, error);
-			if (!rxq_parent)
+		struct rxq *rxq = (*priv->rxqs)[action->queue_id];
+
+		if (!rxq->qp) {
+			assert(priv->isolated);
+			ret = rxq_create_qp(rxq, rxq->elts_n,
+					    0, 0, NULL);
+			if (ret) {
+				rte_flow_error_set(
+					error,
+					ENOMEM,
+					RTE_FLOW_ERROR_TYPE_HANDLE,
+					NULL,
+					"flow rule creation failure");
 				goto error;
-		}
-		for (i = 0; i < action->queues_n; ++i) {
-			rxq = (*priv->rxqs)[action->queues[i]];
-			/*
-			 * In case of isolated mode we postpone
-			 * ibv receive queue creation till the first
-			 * rte_flow rule will be applied on that queue.
-			 */
-			if (!rxq->qp) {
-				assert(priv->isolated);
-				ret = rxq_create_qp(rxq, rxq->elts_n,
-						    0, 0, rxq_parent);
-				if (ret) {
-					rte_flow_error_set(
-						error,
-						ENOMEM,
-						RTE_FLOW_ERROR_TYPE_HANDLE,
-						NULL,
-						"flow rule creation failure");
-					goto error;
-				}
 			}
 		}
-		qp = action->queues_n > 1 ? rxq_parent->qp : rxq->qp;
+		qp = rxq->qp;
 		rte_flow->qp = qp;
 	}
 	rte_flow->ibv_attr = ibv_attr;
@@ -1023,8 +860,6 @@ priv_flow_create_action_queue(struct priv *priv,
 	return rte_flow;
 
 error:
-	if (rxq_parent)
-		rxq_parent_cleanup(rxq_parent);
 	rte_free(rte_flow);
 	return NULL;
 }
@@ -1088,22 +923,11 @@ priv_flow_create(struct priv *priv,
 			continue;
 		} else if (actions->type == RTE_FLOW_ACTION_TYPE_QUEUE) {
 			action.queue = 1;
-			action.queues_n = 1;
-			action.queues[0] =
+			action.queue_id =
 				((const struct rte_flow_action_queue *)
 				 actions->conf)->index;
 		} else if (actions->type == RTE_FLOW_ACTION_TYPE_DROP) {
 			action.drop = 1;
-		} else if (actions->type == RTE_FLOW_ACTION_TYPE_RSS) {
-			unsigned int i;
-			const struct rte_flow_action_rss *rss =
-				(const struct rte_flow_action_rss *)
-				 actions->conf;
-
-			action.queue = 1;
-			action.queues_n = rss->num;
-			for (i = 0; i < rss->num; ++i)
-				action.queues[i] = rss->queue[i];
 		} else {
 			rte_flow_error_set(error, ENOTSUP,
 					   RTE_FLOW_ERROR_TYPE_ACTION,
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 4654dc2..17e5f6e 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -98,8 +98,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
-	uint16_t queues[RTE_MAX_QUEUES_PER_PORT]; /**< Queue indices to use. */
-	uint16_t queues_n; /**< Number of entries in queue[] */
+	uint32_t queue_id; /**< Identifier of the queue. */
 };
 
 int mlx4_priv_flow_start(struct priv *priv);
-- 
2.1.4

* [PATCH v2 15/51] net/mlx4: revert RSS parent queue refactoring
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (13 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 14/51] net/mlx4: revert flow API RSS support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 16/51] net/mlx4: drop RSS support Adrien Mazarguil
                     ` (37 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This reverts commit ff00a0dc5600dbb0a29e4aa7fa4b078f98c7a360.

Support for several RSS parent queues was necessary to implement the RSS
flow rule action, dropped in a prior commit.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
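Editorial note, not part of the commit: with a single parent queue
restored, the number of Rx queues taking part in RSS is again derived
from priv->rxqs_n through the expression
rte_align32pow2(priv->rxqs_n + 1) >> 1, and queues whose index reaches
that value are set up as inactive. The sketch below works through that
arithmetic. The helper names are hypothetical, and the rounding function
is assumed to behave like DPDK's rte_align32pow2() (next power of two
greater than or equal to its argument).

```c
#include <stdint.h>

/* Assumed equivalent of rte_align32pow2(): next power of two >= x. */
static uint32_t
align32pow2_sketch(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Largest power of two that does not exceed rxqs_n (0 if rxqs_n is 0). */
static uint32_t
rss_active_queues_sketch(uint32_t rxqs_n)
{
	return align32pow2_sketch(rxqs_n + 1) >> 1;
}

/* Nonzero when queue idx falls outside the power-of-two RSS group. */
static int
rxq_is_inactive_sketch(uint32_t idx, uint32_t rxqs_n)
{
	return idx >= rss_active_queues_sketch(rxqs_n);
}
```

For example, with 6 configured Rx queues the expression yields 4, so
queues 0 to 3 join the RSS group while queues 4 and 5 are flagged
inactive; with 8 queues all of them are active.
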
 drivers/net/mlx4/mlx4.c      | 332 +++++++++++---------------------------
 drivers/net/mlx4/mlx4.h      |  17 +-
 drivers/net/mlx4/mlx4_flow.c |  15 --
 3 files changed, 97 insertions(+), 267 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 02605c9..44a2093 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -507,10 +507,8 @@ txq_cleanup(struct txq *txq);
 
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive,
-	  const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp, int children_n,
-	  struct rxq *rxq_parent);
+	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  struct rte_mempool *mp);
 
 static void
 rxq_cleanup(struct rxq *rxq);
@@ -519,84 +517,6 @@ static void
 priv_mac_addr_del(struct priv *priv);
 
 /**
- * Create RSS parent queue.
- *
- * The new parent is inserted in front of the list in the private structure.
- *
- * @param priv
- *   Pointer to private structure.
- * @param queues
- *   Queues indices array, if NULL use all Rx queues.
- * @param children_n
- *   The number of entries in queues[].
- *
- * @return
- *   Pointer to a parent rxq structure, NULL on failure.
- */
-static struct rxq *
-priv_parent_create(struct priv *priv,
-		   uint16_t queues[],
-		   uint16_t children_n)
-{
-	int ret;
-	uint16_t i;
-	struct rxq *parent;
-
-	parent = rte_zmalloc("parent queue",
-			     sizeof(*parent),
-			     RTE_CACHE_LINE_SIZE);
-	if (!parent) {
-		ERROR("cannot allocate memory for RSS parent queue");
-		return NULL;
-	}
-	ret = rxq_setup(priv->dev, parent, 0, 0, 0,
-			NULL, NULL, children_n, NULL);
-	if (ret) {
-		rte_free(parent);
-		return NULL;
-	}
-	parent->rss.queues_n = children_n;
-	if (queues) {
-		for (i = 0; i < children_n; ++i)
-			parent->rss.queues[i] = queues[i];
-	} else {
-		/* the default RSS ring case */
-		assert(priv->rxqs_n == children_n);
-		for (i = 0; i < priv->rxqs_n; ++i)
-			parent->rss.queues[i] = i;
-	}
-	LIST_INSERT_HEAD(&priv->parents, parent, next);
-	return parent;
-}
-
-/**
- * Clean up RX queue parent structure.
- *
- * @param parent
- *   RX queue parent structure.
- */
-void
-rxq_parent_cleanup(struct rxq *parent)
-{
-	LIST_REMOVE(parent, next);
-	rxq_cleanup(parent);
-	rte_free(parent);
-}
-
-/**
- * Clean up parent structures from the parent list.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_parent_list_cleanup(struct priv *priv)
-{
-	while (!LIST_EMPTY(&priv->parents))
-		rxq_parent_cleanup(LIST_FIRST(&priv->parents));
-}
-
-/**
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
@@ -615,6 +535,7 @@ dev_configure(struct rte_eth_dev *dev)
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
 	unsigned int tmp;
+	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -645,7 +566,7 @@ dev_configure(struct rte_eth_dev *dev)
 			if ((*priv->rxqs)[i] != NULL)
 				return EINVAL;
 		priv_mac_addr_del(priv);
-		priv_parent_list_cleanup(priv);
+		rxq_cleanup(&priv->rxq_parent);
 		priv->rss = 0;
 		priv->rxqs_n = 0;
 	}
@@ -670,16 +591,14 @@ dev_configure(struct rte_eth_dev *dev)
 	priv->rss = 1;
 	tmp = priv->rxqs_n;
 	priv->rxqs_n = rxqs_n;
-	if (priv->isolated) {
-		priv->rss = 0;
-		return 0;
-	}
-	if (priv_parent_create(priv, NULL, priv->rxqs_n))
+	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, 0, NULL, NULL);
+	if (!ret)
 		return 0;
 	/* Failure, rollback. */
 	priv->rss = 0;
 	priv->rxqs_n = tmp;
-	return ENOMEM;
+	assert(ret > 0);
+	return ret;
 }
 
 /**
@@ -2117,7 +2036,7 @@ priv_mac_addr_add(struct priv *priv)
 	if (priv->isolated)
 		return 0;
 	if (priv->rss)
-		rxq = LIST_FIRST(&priv->parents);
+		rxq = &priv->rxq_parent;
 	else if (*priv->rxqs && (*priv->rxqs)[0])
 		rxq = (*priv->rxqs)[0];
 	else
@@ -2743,18 +2662,15 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
  *   Completion queue to associate with QP.
  * @param desc
  *   Number of descriptors in QP (hint only).
- * @param children_n
- *   If nonzero, a number of children for parent QP and zero for a child.
- * @param rxq_parent
- *   Pointer for a parent in a child case, NULL otherwise.
+ * @param parent
+ *   If nonzero, create a parent QP, otherwise a child.
  *
  * @return
  *   QP pointer or NULL in case of error.
  */
 static struct ibv_qp *
 rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int children_n, struct ibv_exp_res_domain *rd,
-		 struct rxq *rxq_parent)
+		 int parent, struct ibv_exp_res_domain *rd)
 {
 	struct ibv_exp_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
@@ -2782,16 +2698,16 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	if (children_n > 0) {
+	if (parent) {
 		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
 		/* TSS isn't necessary. */
 		attr.qpg.parent_attrib.tss_child_count = 0;
 		attr.qpg.parent_attrib.rss_child_count =
-			rte_align32pow2(children_n + 1) >> 1;
+			rte_align32pow2(priv->rxqs_n + 1) >> 1;
 		DEBUG("initializing parent RSS queue");
 	} else {
 		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = rxq_parent->qp;
+		attr.qpg.qpg_parent = priv->rxq_parent.qp;
 		DEBUG("initializing child RSS queue");
 	}
 	return ibv_exp_create_qp(priv->ctx, &attr);
@@ -2825,7 +2741,13 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int err;
+	int parent = (rxq == &priv->rxq_parent);
 
+	if (parent) {
+		ERROR("%p: cannot rehash parent queue %p",
+		      (void *)dev, (void *)rxq);
+		return EINVAL;
+	}
 	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
@@ -2858,8 +2780,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	}
 	/* From now on, any failure will render the queue unusable.
 	 * Reinitialize QP. */
-	if (!tmpl.qp)
-		goto skip_init;
 	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
 	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
 	if (err) {
@@ -2867,6 +2787,12 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	}
+	err = ibv_resize_cq(tmpl.cq, desc_n);
+	if (err) {
+		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
+		assert(err > 0);
+		return err;
+	}
 	mod = (struct ibv_exp_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
@@ -2875,6 +2801,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	};
 	err = ibv_exp_modify_qp(tmpl.qp, &mod,
 				(IBV_EXP_QP_STATE |
+				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
 				 IBV_EXP_QP_PORT));
 	if (err) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
@@ -2882,13 +2809,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		assert(err > 0);
 		return err;
 	};
-skip_init:
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
 	/* Allocate pool. */
 	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
 	if (pool == NULL) {
@@ -2942,8 +2862,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	rxq->elts_n = 0;
 	rte_free(rxq->elts.sp);
 	rxq->elts.sp = NULL;
-	if (!tmpl.qp)
-		goto skip_rtr;
 	/* Post WRs. */
 	err = ibv_post_recv(tmpl.qp,
 			    (tmpl.sp ?
@@ -2971,103 +2889,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 }
 
 /**
- * Create verbs QP resources associated with a rxq.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param inactive
- *   If true, the queue is disabled because its index is higher or
- *   equal to the real number of queues, which must be a power of 2.
- * @param children_n
- *   The number of children in a parent case, zero for a child.
- * @param rxq_parent
- *   The pointer to a parent RX structure for a child in RSS case,
- *   NULL for parent.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-int
-rxq_create_qp(struct rxq *rxq,
-	      uint16_t desc,
-	      int inactive,
-	      int children_n,
-	      struct rxq *rxq_parent)
-{
-	int ret;
-	struct ibv_exp_qp_attr mod;
-	struct ibv_exp_query_intf_params params;
-	enum ibv_exp_query_intf_status status;
-	struct ibv_recv_wr *bad_wr;
-	int parent = (children_n > 0);
-	struct priv *priv = rxq->priv;
-
-	if (priv->rss && !inactive && (rxq_parent || parent))
-		rxq->qp = rxq_setup_qp_rss(priv, rxq->cq, desc,
-					   children_n, rxq->rd,
-					   rxq_parent);
-	else
-		rxq->qp = rxq_setup_qp(priv, rxq->cq, desc, rxq->rd);
-	if (rxq->qp == NULL) {
-		ret = (errno ? errno : EINVAL);
-		ERROR("QP creation failure: %s",
-		      strerror(ret));
-		return ret;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_exp_modify_qp(rxq->qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
-	if (ret) {
-		ERROR("QP state to IBV_QPS_INIT failed: %s",
-		      strerror(ret));
-		return ret;
-	}
-	if (!parent) {
-		ret = ibv_post_recv(rxq->qp,
-				    (rxq->sp ?
-				     &(*rxq->elts.sp)[0].wr :
-				     &(*rxq->elts.no_sp)[0].wr),
-				    &bad_wr);
-		if (ret) {
-			ERROR("ibv_post_recv() failed for WR %p: %s",
-			      (void *)bad_wr,
-			      strerror(ret));
-			return ret;
-		}
-	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_exp_modify_qp(rxq->qp, &mod, IBV_EXP_QP_STATE);
-	if (ret) {
-		ERROR("QP state to IBV_QPS_RTR failed: %s",
-		      strerror(ret));
-		return ret;
-	}
-	params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = rxq->qp,
-	};
-	rxq->if_qp = ibv_exp_query_intf(priv->ctx, &params, &status);
-	if (rxq->if_qp == NULL) {
-		ERROR("QP interface family query failed with status %d",
-		      status);
-		return errno;
-	}
-	return 0;
-}
-
-/**
  * Configure a RX queue.
  *
  * @param dev
@@ -3085,21 +2906,14 @@ rxq_create_qp(struct rxq *rxq,
  *   Thresholds parameters.
  * @param mp
  *   Memory pool for buffer allocations.
- * @param children_n
- *   The number of children in a parent case, zero for a child.
- * @param rxq_parent
- *   The pointer to a parent RX structure (or NULL) in a child case,
- *   NULL for parent.
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive,
-	  const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp, int children_n,
-	  struct rxq *rxq_parent)
+	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rxq tmpl = {
@@ -3107,15 +2921,17 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
+	struct ibv_exp_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
 		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
+	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
-	int parent = (children_n > 0);
+	int parent = (rxq == &priv->rxq_parent);
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	/*
@@ -3206,6 +3022,32 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
+	if (priv->rss && !inactive)
+		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
+					   tmpl.rd);
+	else
+		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	if (tmpl.qp == NULL) {
+		ret = (errno ? errno : EINVAL);
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
+	mod = (struct ibv_exp_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
+				(IBV_EXP_QP_STATE |
+				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+				 IBV_EXP_QP_PORT));
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
+	}
 	/* Allocate descriptors for RX queues, except for the RSS parent. */
 	if (parent)
 		goto skip_alloc;
@@ -3216,14 +3058,29 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(ret));
-		return ret;
+		goto error;
+	}
+	ret = ibv_post_recv(tmpl.qp,
+			    (tmpl.sp ?
+			     &(*tmpl.elts.sp)[0].wr :
+			     &(*tmpl.elts.no_sp)[0].wr),
+			    &bad_wr);
+	if (ret) {
+		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		      (void *)dev,
+		      (void *)bad_wr,
+		      strerror(ret));
+		goto error;
 	}
 skip_alloc:
-	if (parent || rxq_parent || !priv->rss) {
-		ret = rxq_create_qp(&tmpl, desc, inactive,
-				    children_n, rxq_parent);
-		if (ret)
-			goto error;
+	mod = (struct ibv_exp_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	if (ret) {
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(ret));
+		goto error;
 	}
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
@@ -3235,11 +3092,21 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_cq == NULL) {
-		ret = EINVAL;
 		ERROR("%p: CQ interface family query failed with status %d",
 		      (void *)dev, status);
 		goto error;
 	}
+	attr.params = (struct ibv_exp_query_intf_params){
+		.intf_scope = IBV_EXP_INTF_GLOBAL,
+		.intf = IBV_EXP_INTF_QP_BURST,
+		.obj = tmpl.qp,
+	};
+	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+	if (tmpl.if_qp == NULL) {
+		ERROR("%p: QP interface family query failed with status %d",
+		      (void *)dev, status);
+		goto error;
+	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
@@ -3277,7 +3144,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    unsigned int socket, const struct rte_eth_rxconf *conf,
 		    struct rte_mempool *mp)
 {
-	struct rxq *parent;
 	struct priv *priv = dev->data->dev_private;
 	struct rxq *rxq = (*priv->rxqs)[idx];
 	int inactive = 0;
@@ -3312,16 +3178,9 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -ENOMEM;
 		}
 	}
-	if (priv->rss && !priv->isolated) {
-		/* The list consists of the single default one. */
-		parent = LIST_FIRST(&priv->parents);
-		if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
-			inactive = 1;
-	} else {
-		parent = NULL;
-	}
-	ret = rxq_setup(dev, rxq, desc, socket,
-			inactive, conf, mp, 0, parent);
+	if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
+		inactive = 1;
+	ret = rxq_setup(dev, rxq, desc, socket, inactive, conf, mp);
 	if (ret)
 		rte_free(rxq);
 	else {
@@ -3356,6 +3215,7 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
+	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -3581,7 +3441,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		priv->txqs = NULL;
 	}
 	if (priv->rss)
-		priv_parent_list_cleanup(priv);
+		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b76bf48..b7177d4 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -163,7 +163,6 @@ struct rxq_elt {
 
 /* RX queue descriptor. */
 struct rxq {
-	LIST_ENTRY(rxq) next; /* Used by parent queue only */
 	struct priv *priv; /* Back pointer to private data. */
 	struct rte_mempool *mp; /* Memory Pool for allocations. */
 	struct ibv_mr *mr; /* Memory Region (for mp). */
@@ -185,10 +184,6 @@ struct rxq {
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
-	struct {
-		uint16_t queues_n;
-		uint16_t queues[RTE_MAX_QUEUES_PER_PORT];
-	} rss;
 };
 
 /* TX element. */
@@ -265,6 +260,7 @@ struct priv {
 	unsigned int inl_recv_size; /* Inline recv size */
 	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
+	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
@@ -274,21 +270,10 @@ struct priv {
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-	LIST_HEAD(mlx4_parents, rxq) parents;
 	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
 void priv_lock(struct priv *priv);
 void priv_unlock(struct priv *priv);
 
-int
-rxq_create_qp(struct rxq *rxq,
-	      uint16_t desc,
-	      int inactive,
-	      int children_n,
-	      struct rxq *rxq_parent);
-
-void
-rxq_parent_cleanup(struct rxq *parent);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 827115e..2c5dc3c 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -828,23 +828,8 @@ priv_flow_create_action_queue(struct priv *priv,
 	if (action->drop) {
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
-		int ret;
 		struct rxq *rxq = (*priv->rxqs)[action->queue_id];
 
-		if (!rxq->qp) {
-			assert(priv->isolated);
-			ret = rxq_create_qp(rxq, rxq->elts_n,
-					    0, 0, NULL);
-			if (ret) {
-				rte_flow_error_set(
-					error,
-					ENOMEM,
-					RTE_FLOW_ERROR_TYPE_HANDLE,
-					NULL,
-					"flow rule creation failure");
-				goto error;
-			}
-		}
 		qp = rxq->qp;
 		rte_flow->qp = qp;
 	}
-- 
2.1.4


* [PATCH v2 16/51] net/mlx4: drop RSS support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (14 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 15/51] net/mlx4: revert RSS parent queue refactoring Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 17/51] net/mlx4: drop checksum offloads support Adrien Mazarguil
                     ` (36 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The Verbs RSS API used in this PMD is now obsolete. It is superseded by an
enhanced API with fewer constraints already used in the mlx5 PMD.

Drop RSS support in preparation for a major refactoring. The ability to
configure several Rx queues is retained; these can be targeted directly by
creating specific flow rules.

There is no need for "ignored" Rx queues anymore since their number is no
longer limited to powers of two.
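
For illustration, the constraint being lifted can be sketched as follows. The
removed code computed the number of active queues as
rte_align32pow2(rxqs_n + 1) >> 1, that is, the largest power of two not
exceeding the configured count, and set up any queue whose index was at or
above that value as inactive. The helper names below (align32pow2,
active_rxqs, rxq_is_inactive) are hypothetical and not part of the PMD;
align32pow2 is a local stand-in assumed to behave like DPDK's
rte_align32pow2().

```c
#include <stdint.h>

/*
 * Round up to the next power of two.
 * Assumption: mirrors the behavior of DPDK's rte_align32pow2().
 */
static inline uint32_t
align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Largest power of two <= rxqs_n, as computed by the removed code. */
static inline uint32_t
active_rxqs(uint32_t rxqs_n)
{
	return align32pow2(rxqs_n + 1) >> 1;
}

/* Nonzero when queue idx would have been set up as inactive. */
static inline int
rxq_is_inactive(uint32_t idx, uint32_t rxqs_n)
{
	return idx >= active_rxqs(rxqs_n);
}
```

Under these assumptions, six configured queues yield four active ones, so
indices 4 and 5 were previously inactive. After this patch the check is gone
and every configured queue is set up the same way.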

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |  13 --
 drivers/net/mlx4/mlx4.c           | 212 +++------------------------------
 drivers/net/mlx4/mlx4.h           |   6 -
 4 files changed, 14 insertions(+), 218 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 3acf8d3..aa1ad21 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,7 +13,6 @@ Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
-RSS hash             = Y
 SR-IOV               = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index d3d51f7..9ab9a05 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -78,22 +78,12 @@ Features
 --------
 
 - Multi arch support: x86_64 and POWER8.
-- RSS, also known as RCA, is supported. In this mode the number of
-  configured RX queues must be a power of two.
 - Link state information is provided.
 - Scattered packets are supported for TX and RX.
 - Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
 - Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
 - RX interrupts.
 
-Limitations
------------
-
-- RSS hash key cannot be modified.
-- RSS RETA cannot be configured
-- RSS always includes L3 (IPv4/IPv6) and L4 (UDP/TCP). They cannot be
-  dissociated.
-
 Configuration
 -------------
 
@@ -145,9 +135,6 @@ Environment variables
 Run-time configuration
 ~~~~~~~~~~~~~~~~~~~~~~
 
-- The only constraint when RSS mode is requested is to make sure the number
-  of RX queues is a power of two. This is a hardware requirement.
-
 - librte_pmd_mlx4 brings kernel network interfaces up during initialization
   because it is affected by their state. Forcing them down prevents packets
   reception.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 44a2093..ea0b144 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -31,11 +31,6 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/*
- * Known limitations:
- * - RSS hash key and options cannot be modified.
- */
-
 /* System headers. */
 #include <stddef.h>
 #include <stdio.h>
@@ -507,7 +502,7 @@ txq_cleanup(struct txq *txq);
 
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp);
 
 static void
@@ -520,7 +515,6 @@ priv_mac_addr_del(struct priv *priv);
  * Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
- * Allocate parent RSS queue when several RX queues are requested.
  *
  * @param dev
  *   Pointer to Ethernet device structure.
@@ -534,8 +528,6 @@ dev_configure(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
 	unsigned int txqs_n = dev->data->nb_tx_queues;
-	unsigned int tmp;
-	int ret;
 
 	priv->rxqs = (void *)dev->data->rx_queues;
 	priv->txqs = (void *)dev->data->tx_queues;
@@ -544,61 +536,12 @@ dev_configure(struct rte_eth_dev *dev)
 		     (void *)dev, priv->txqs_n, txqs_n);
 		priv->txqs_n = txqs_n;
 	}
-	if (rxqs_n == priv->rxqs_n)
-		return 0;
-	if (!rte_is_power_of_2(rxqs_n) && !priv->isolated) {
-		unsigned n_active;
-
-		n_active = rte_align32pow2(rxqs_n + 1) >> 1;
-		WARN("%p: number of RX queues must be a power"
-			" of 2: %u queues among %u will be active",
-			(void *)dev, n_active, rxqs_n);
-	}
-
-	INFO("%p: RX queues number update: %u -> %u",
-	     (void *)dev, priv->rxqs_n, rxqs_n);
-	/* If RSS is enabled, disable it first. */
-	if (priv->rss) {
-		unsigned int i;
-
-		/* Only if there are no remaining child RX queues. */
-		for (i = 0; (i != priv->rxqs_n); ++i)
-			if ((*priv->rxqs)[i] != NULL)
-				return EINVAL;
-		priv_mac_addr_del(priv);
-		rxq_cleanup(&priv->rxq_parent);
-		priv->rss = 0;
-		priv->rxqs_n = 0;
-	}
-	if (rxqs_n <= 1) {
-		/* Nothing else to do. */
+	if (rxqs_n != priv->rxqs_n) {
+		INFO("%p: Rx queues number update: %u -> %u",
+		     (void *)dev, priv->rxqs_n, rxqs_n);
 		priv->rxqs_n = rxqs_n;
-		return 0;
-	}
-	/* Allocate a new RSS parent queue if supported by hardware. */
-	if (!priv->hw_rss) {
-		ERROR("%p: only a single RX queue can be configured when"
-		      " hardware doesn't support RSS",
-		      (void *)dev);
-		return EINVAL;
 	}
-	/* Fail if hardware doesn't support that many RSS queues. */
-	if (rxqs_n >= priv->max_rss_tbl_sz) {
-		ERROR("%p: only %u RX queues can be configured for RSS",
-		      (void *)dev, priv->max_rss_tbl_sz);
-		return EINVAL;
-	}
-	priv->rss = 1;
-	tmp = priv->rxqs_n;
-	priv->rxqs_n = rxqs_n;
-	ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, 0, NULL, NULL);
-	if (!ret)
-		return 0;
-	/* Failure, rollback. */
-	priv->rss = 0;
-	priv->rxqs_n = tmp;
-	assert(ret > 0);
-	return ret;
+	return 0;
 }
 
 /**
@@ -2014,8 +1957,7 @@ priv_mac_addr_del(struct priv *priv)
 /**
  * Register a MAC address.
  *
- * In RSS mode, the MAC address is registered in the parent queue,
- * otherwise it is registered in queue 0.
+ * The MAC address is registered in queue 0.
  *
  * @param priv
  *   Pointer to private structure.
@@ -2035,9 +1977,7 @@ priv_mac_addr_add(struct priv *priv)
 		return 0;
 	if (priv->isolated)
 		return 0;
-	if (priv->rss)
-		rxq = &priv->rxq_parent;
-	else if (*priv->rxqs && (*priv->rxqs)[0])
+	if (*priv->rxqs && (*priv->rxqs)[0])
 		rxq = (*priv->rxqs)[0];
 	else
 		return 0;
@@ -2647,69 +2587,8 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-	attr.max_inl_recv = priv->inl_recv_size;
-	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	return ibv_exp_create_qp(priv->ctx, &attr);
-}
-
-/**
- * Allocate a RSS Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- * @param parent
- *   If nonzero, create a parent QP, otherwise a child.
- *
- * @return
- *   QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-		 int parent, struct ibv_exp_res_domain *rd)
-{
-	struct ibv_exp_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
-			      IBV_EXP_QP_INIT_ATTR_QPG),
-		.pd = priv->pd,
-		.res_domain = rd,
-	};
-
 	attr.max_inl_recv = priv->inl_recv_size,
 	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-	if (parent) {
-		attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
-		/* TSS isn't necessary. */
-		attr.qpg.parent_attrib.tss_child_count = 0;
-		attr.qpg.parent_attrib.rss_child_count =
-			rte_align32pow2(priv->rxqs_n + 1) >> 1;
-		DEBUG("initializing parent RSS queue");
-	} else {
-		attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
-		attr.qpg.qpg_parent = priv->rxq_parent.qp;
-		DEBUG("initializing child RSS queue");
-	}
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
@@ -2741,13 +2620,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int err;
-	int parent = (rxq == &priv->rxq_parent);
 
-	if (parent) {
-		ERROR("%p: cannot rehash parent queue %p",
-		      (void *)dev, (void *)rxq);
-		return EINVAL;
-	}
 	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
 	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
 	/* Number of descriptors and mbufs currently allocated. */
@@ -2800,9 +2673,8 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 		.port_num = priv->port
 	};
 	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
+				IBV_EXP_QP_STATE |
+				IBV_EXP_QP_PORT);
 	if (err) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(err));
@@ -2899,9 +2771,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
  *   Number of descriptors to configure in queue.
  * @param socket
  *   NUMA socket on which memory must be allocated.
- * @param inactive
- *   If true, the queue is disabled because its index is higher or
- *   equal to the real number of queues, which must be a power of 2.
  * @param[in] conf
  *   Thresholds parameters.
  * @param mp
@@ -2912,7 +2781,7 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, int inactive, const struct rte_eth_rxconf *conf,
+	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
@@ -2931,20 +2800,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
-	int parent = (rxq == &priv->rxq_parent);
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	/*
-	 * If this is a parent queue, hardware must support RSS and
-	 * RSS must be enabled.
-	 */
-	assert((!parent) || ((priv->hw_rss) && (priv->rss)));
-	if (parent) {
-		/* Even if unused, ibv_create_cq() requires at least one
-		 * descriptor. */
-		desc = 1;
-		goto skip_mr;
-	}
 	mb_len = rte_pktmbuf_data_room_size(mp);
 	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
 		ERROR("%p: invalid number of RX descriptors (must be a"
@@ -2982,7 +2839,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-skip_mr:
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
 			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
@@ -3022,11 +2878,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	if (priv->rss && !inactive)
-		tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
-					   tmpl.rd);
-	else
-		tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
@@ -3040,17 +2892,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.port_num = priv->port
 	};
 	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				(IBV_EXP_QP_STATE |
-				 (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
-				 IBV_EXP_QP_PORT));
+				IBV_EXP_QP_STATE |
+				IBV_EXP_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	/* Allocate descriptors for RX queues, except for the RSS parent. */
-	if (parent)
-		goto skip_alloc;
 	if (tmpl.sp)
 		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
 	else
@@ -3072,7 +2920,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      strerror(ret));
 		goto error;
 	}
-skip_alloc:
 	mod = (struct ibv_exp_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
@@ -3146,7 +2993,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 {
 	struct priv *priv = dev->data->dev_private;
 	struct rxq *rxq = (*priv->rxqs)[idx];
-	int inactive = 0;
 	int ret;
 
 	priv_lock(priv);
@@ -3178,9 +3024,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			return -ENOMEM;
 		}
 	}
-	if (idx >= rte_align32pow2(priv->rxqs_n + 1) >> 1)
-		inactive = 1;
-	ret = rxq_setup(dev, rxq, desc, socket, inactive, conf, mp);
+	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
 	if (ret)
 		rte_free(rxq);
 	else {
@@ -3215,7 +3059,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		return;
 	priv = rxq->priv;
 	priv_lock(priv);
-	assert(rxq != &priv->rxq_parent);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -3440,8 +3283,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		priv->txqs_n = 0;
 		priv->txqs = NULL;
 	}
-	if (priv->rss)
-		rxq_cleanup(&priv->rxq_parent);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
@@ -4756,7 +4597,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
 		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
-		exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
 
 		DEBUG("using port %u", port);
 
@@ -4814,30 +4654,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			err = ENODEV;
 			goto port_error;
 		}
-		if ((exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_QPG) &&
-		    (exp_device_attr.exp_device_cap_flags &
-		     IBV_EXP_DEVICE_UD_RSS) &&
-		    (exp_device_attr.comp_mask &
-		     IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
-		    (exp_device_attr.max_rss_tbl_sz > 0)) {
-			priv->hw_qpg = 1;
-			priv->hw_rss = 1;
-			priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
-		} else {
-			priv->hw_qpg = 0;
-			priv->hw_rss = 0;
-			priv->max_rss_tbl_sz = 0;
-		}
-		priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
-				  IBV_EXP_DEVICE_UD_TSS);
-		DEBUG("device flags: %s%s%s",
-		      (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
-		      (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
-		      (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
-		if (priv->hw_rss)
-			DEBUG("maximum RSS indirection table size: %u",
-			      exp_device_attr.max_rss_tbl_sz);
 
 		priv->hw_csum =
 			((exp_device_attr.exp_device_cap_flags &
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b7177d4..8a16b1e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -248,19 +248,13 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int hw_qpg:1; /* QP groups are supported. */
-	unsigned int hw_tss:1; /* TSS is supported. */
-	unsigned int hw_rss:1; /* RSS is supported. */
 	unsigned int hw_csum:1; /* Checksum offload is supported. */
 	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
-	unsigned int rss:1; /* RSS is enabled. */
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
 	unsigned int inl_recv_size; /* Inline recv size */
-	unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
 	/* RX/TX queues. */
-	struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
-- 
2.1.4


* [PATCH v2 17/51] net/mlx4: drop checksum offloads support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (15 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 16/51] net/mlx4: drop RSS support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 18/51] net/mlx4: drop packet type recognition support Adrien Mazarguil
                     ` (35 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The Verbs API used to implement Tx and Rx checksum offloads is deprecated.
Support for these will be added back after refactoring the PMD.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  4 --
 doc/guides/nics/mlx4.rst          |  2 -
 drivers/net/mlx4/mlx4.c           | 91 ++--------------------------------
 drivers/net/mlx4/mlx4.h           |  4 --
 4 files changed, 4 insertions(+), 97 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index aa1ad21..08a2e17 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,10 +14,6 @@ MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
 SR-IOV               = Y
-L3 checksum offload  = Y
-L4 checksum offload  = Y
-Inner L3 checksum    = Y
-Inner L4 checksum    = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 9ab9a05..754b2d0 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -80,8 +80,6 @@ Features
 - Multi arch support: x86_64 and POWER8.
 - Link state information is provided.
 - Scattered packets are supported for TX and RX.
-- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
-- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
 - RX interrupts.
 
 Configuration
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ea0b144..06fe22d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1258,17 +1258,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			++elts_comp;
 			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
 		}
-		/* Should we enable HW CKSUM offload */
-		if (buf->ol_flags &
-		    (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
-			send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
-			/* HW does not support checksum offloads at arbitrary
-			 * offsets but automatically recognizes the packet
-			 * type. For inner L3/L4 checksums, only VXLAN (UDP)
-			 * tunnels are currently supported. */
-			if (RTE_ETH_IS_TUNNEL_PKT(buf->packet_type))
-				send_flags |= IBV_EXP_QP_BURST_TUNNEL;
-		}
 		if (likely(segs == 1)) {
 			uintptr_t addr;
 			uint32_t length;
@@ -2140,41 +2129,6 @@ rxq_cq_to_pkt_type(uint32_t flags)
 	return pkt_type;
 }
 
-/**
- * Translate RX completion flags to offload flags.
- *
- * @param[in] rxq
- *   Pointer to RX queue structure.
- * @param flags
- *   RX completion flags returned by poll_length_flags().
- *
- * @return
- *   Offload flags (ol_flags) for struct rte_mbuf.
- */
-static inline uint32_t
-rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
-{
-	uint32_t ol_flags = 0;
-
-	if (rxq->csum)
-		ol_flags |=
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IP_CSUM_OK,
-				  PKT_RX_IP_CKSUM_GOOD) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
-				  PKT_RX_L4_CKSUM_GOOD);
-	if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
-		ol_flags |=
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
-				  PKT_RX_IP_CKSUM_GOOD) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
-				  PKT_RX_L4_CKSUM_GOOD);
-	return ol_flags;
-}
-
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
 
@@ -2362,7 +2316,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		PORT(pkt_buf) = rxq->port_id;
 		PKT_LEN(pkt_buf) = pkt_buf_len;
 		pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
-		pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+		pkt_buf->ol_flags = 0;
 
 		/* Return packet. */
 		*(pkts++) = pkt_buf;
@@ -2517,7 +2471,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		PKT_LEN(seg) = len;
 		DATA_LEN(seg) = len;
 		seg->packet_type = rxq_cq_to_pkt_type(flags);
-		seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
+		seg->ol_flags = 0;
 
 		/* Return packet. */
 		*(pkts++) = seg;
@@ -2626,15 +2580,6 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
 	/* Number of descriptors and mbufs currently allocated. */
 	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
 	mbuf_n = desc_n;
-	/* Toggle RX checksum offload if hardware supports it. */
-	if (priv->hw_csum) {
-		tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-		rxq->csum = tmpl.csum;
-	}
-	if (priv->hw_csum_l2tun) {
-		tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-		rxq->csum_l2tun = tmpl.csum_l2tun;
-	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.enable_scatter &&
@@ -2808,11 +2753,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
 		return EINVAL;
 	}
-	/* Toggle RX checksum offload if hardware supports it. */
-	if (priv->hw_csum)
-		tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
-	if (priv->hw_csum_l2tun)
-		tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
@@ -3416,18 +3356,8 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->max_tx_queues = max;
 	/* Last array entry is reserved for broadcast. */
 	info->max_mac_addrs = 1;
-	info->rx_offload_capa =
-		(priv->hw_csum ?
-		 (DEV_RX_OFFLOAD_IPV4_CKSUM |
-		  DEV_RX_OFFLOAD_UDP_CKSUM |
-		  DEV_RX_OFFLOAD_TCP_CKSUM) :
-		 0);
-	info->tx_offload_capa =
-		(priv->hw_csum ?
-		 (DEV_TX_OFFLOAD_IPV4_CKSUM |
-		  DEV_TX_OFFLOAD_UDP_CKSUM |
-		  DEV_TX_OFFLOAD_TCP_CKSUM) :
-		 0);
+	info->rx_offload_capa = 0;
+	info->tx_offload_capa = 0;
 	if (priv_get_ifname(priv, &ifname) == 0)
 		info->if_index = if_nametoindex(ifname);
 	info->speed_capa =
@@ -4655,19 +4585,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			goto port_error;
 		}
 
-		priv->hw_csum =
-			((exp_device_attr.exp_device_cap_flags &
-			  IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) &&
-			 (exp_device_attr.exp_device_cap_flags &
-			  IBV_EXP_DEVICE_RX_CSUM_IP_PKT));
-		DEBUG("checksum offloading is %ssupported",
-		      (priv->hw_csum ? "" : "not "));
-
-		priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
-					 IBV_EXP_DEVICE_VXLAN_SUPPORT);
-		DEBUG("L2 tunnel checksum offloads are %ssupported",
-		      (priv->hw_csum_l2tun ? "" : "not "));
-
 		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
 
 		if (priv->inl_recv_size) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 8a16b1e..d1104c3 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -179,8 +179,6 @@ struct rxq {
 		struct rxq_elt (*no_sp)[]; /* RX elements. */
 	} elts;
 	unsigned int sp:1; /* Use scattered RX elements. */
-	unsigned int csum:1; /* Enable checksum offloading. */
-	unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -248,8 +246,6 @@ struct priv {
 	uint16_t mtu; /* Configured MTU. */
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
-	unsigned int hw_csum:1; /* Checksum offload is supported. */
-	unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-- 
2.1.4


* [PATCH v2 18/51] net/mlx4: drop packet type recognition support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (16 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 17/51] net/mlx4: drop checksum offloads support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 19/51] net/mlx4: drop scatter/gather support Adrien Mazarguil
                     ` (34 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The Verbs API used to implement packet type recognition is deprecated.
Support will be added back after refactoring the PMD.
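
As a reminder of what goes away, the TRANSPOSE() macro deleted by this patch
moves a flag bit from one position to another using only a constant division
or multiplication. The standalone sketch below reuses the macro definition
from mlx4.c; the flag values (SRC_FLAG_A, DST_FLAG_A, SRC_FLAG_B, DST_FLAG_B)
and the translate_flags() helper are made up for the example and do not
correspond to real Verbs or DPDK constants. It assumes every flag is a single
bit.

```c
#include <stdint.h>

/* Same definition as the macro removed from mlx4.c. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical single-bit flags, for illustration only. */
#define SRC_FLAG_A UINT32_C(0x40) /* Source bit 6. */
#define DST_FLAG_A UINT32_C(0x04) /* Destination bit 2 (shift down). */
#define SRC_FLAG_B UINT32_C(0x01) /* Source bit 0. */
#define DST_FLAG_B UINT32_C(0x10) /* Destination bit 4 (shift up). */

/* Convert source flags to destination flags, one bit at a time. */
static inline uint32_t
translate_flags(uint32_t flags)
{
	return TRANSPOSE(flags, SRC_FLAG_A, DST_FLAG_A) |
	       TRANSPOSE(flags, SRC_FLAG_B, DST_FLAG_B);
}
```

Since both flag arguments are compile-time constants, the compiler can
typically reduce each TRANSPOSE() to a mask and a shift. With this patch the
Rx burst functions simply set packet_type to 0 instead.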

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |  1 -
 drivers/net/mlx4/mlx4.c           | 70 +---------------------------------
 2 files changed, 2 insertions(+), 69 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 08a2e17..27c7ae3 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -14,7 +14,6 @@ MTU update           = Y
 Jumbo frame          = Y
 Scattered Rx         = Y
 SR-IOV               = Y
-Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
 Other kdrv           = Y
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 06fe22d..f026bcd 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -96,12 +96,6 @@ typedef union {
 
 #define WR_ID(o) (((wr_id_t *)&(o))->data)
 
-/* Transpose flags. Useful to convert IBV to DPDK flags. */
-#define TRANSPOSE(val, from, to) \
-	(((from) >= (to)) ? \
-	 (((val) & (from)) / ((from) / (to))) : \
-	 (((val) & (from)) * ((to) / (from))))
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -2088,47 +2082,6 @@ rxq_cleanup(struct rxq *rxq)
 	memset(rxq, 0, sizeof(*rxq));
 }
 
-/**
- * Translate RX completion flags to packet type.
- *
- * @param flags
- *   RX completion flags returned by poll_length_flags().
- *
- * @note: fix mlx4_dev_supported_ptypes_get() if any change here.
- *
- * @return
- *   Packet type for struct rte_mbuf.
- */
-static inline uint32_t
-rxq_cq_to_pkt_type(uint32_t flags)
-{
-	uint32_t pkt_type;
-
-	if (flags & IBV_EXP_CQ_RX_TUNNEL_PACKET)
-		pkt_type =
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
-				  RTE_PTYPE_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
-				  RTE_PTYPE_L3_IPV6_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV4_PACKET,
-				  RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV6_PACKET,
-				  RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN);
-	else
-		pkt_type =
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV4_PACKET,
-				  RTE_PTYPE_L3_IPV4_EXT_UNKNOWN) |
-			TRANSPOSE(flags,
-				  IBV_EXP_CQ_RX_IPV6_PACKET,
-				  RTE_PTYPE_L3_IPV6_EXT_UNKNOWN);
-	return pkt_type;
-}
-
 static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
 
@@ -2315,7 +2268,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		NB_SEGS(pkt_buf) = j;
 		PORT(pkt_buf) = rxq->port_id;
 		PKT_LEN(pkt_buf) = pkt_buf_len;
-		pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
+		pkt_buf->packet_type = 0;
 		pkt_buf->ol_flags = 0;
 
 		/* Return packet. */
@@ -2470,7 +2423,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		NEXT(seg) = NULL;
 		PKT_LEN(seg) = len;
 		DATA_LEN(seg) = len;
-		seg->packet_type = rxq_cq_to_pkt_type(flags);
+		seg->packet_type = 0;
 		seg->ol_flags = 0;
 
 		/* Return packet. */
@@ -3369,24 +3322,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	priv_unlock(priv);
 }
 
-static const uint32_t *
-mlx4_dev_supported_ptypes_get(struct rte_eth_dev *dev)
-{
-	static const uint32_t ptypes[] = {
-		/* refers to rxq_cq_to_pkt_type() */
-		RTE_PTYPE_L3_IPV4,
-		RTE_PTYPE_L3_IPV6,
-		RTE_PTYPE_INNER_L3_IPV4,
-		RTE_PTYPE_INNER_L3_IPV6,
-		RTE_PTYPE_UNKNOWN
-	};
-
-	if (dev->rx_pkt_burst == mlx4_rx_burst ||
-	    dev->rx_pkt_burst == mlx4_rx_burst_sp)
-		return ptypes;
-	return NULL;
-}
-
 /**
  * DPDK callback to get device statistics.
  *
@@ -3768,7 +3703,6 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.stats_get = mlx4_stats_get,
 	.stats_reset = mlx4_stats_reset,
 	.dev_infos_get = mlx4_dev_infos_get,
-	.dev_supported_ptypes_get = mlx4_dev_supported_ptypes_get,
 	.rx_queue_setup = mlx4_rx_queue_setup,
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
-- 
2.1.4


* [PATCH v2 19/51] net/mlx4: drop scatter/gather support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (17 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 18/51] net/mlx4: drop packet type recognition support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 20/51] net/mlx4: drop inline receive support Adrien Mazarguil
                     ` (33 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The Verbs API used to implement Tx and Rx burst functions is deprecated.
Drop scatter/gather support to ease refactoring while maintaining basic
single-segment Rx/Tx functionality in the meantime.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/features/mlx4.ini |   1 -
 doc/guides/nics/mlx4.rst          |   1 -
 drivers/net/mlx4/mlx4.c           | 860 +--------------------------------
 drivers/net/mlx4/mlx4.h           |  28 +-
 4 files changed, 26 insertions(+), 864 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index 27c7ae3..0812a30 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -12,7 +12,6 @@ Rx interrupt         = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
-Scattered Rx         = Y
 SR-IOV               = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 754b2d0..5c3fb76 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -79,7 +79,6 @@ Features
 
 - Multi arch support: x86_64 and POWER8.
 - Link state information is provided.
-- Scattered packets are supported for TX and RX.
 - RX interrupts.
 
 Configuration
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f026bcd..0a71e2c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -60,7 +60,6 @@
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
 #include <rte_spinlock.h>
-#include <rte_atomic.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
@@ -582,26 +581,13 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	unsigned int i;
 	struct txq_elt (*elts)[elts_n] =
 		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	linear_t (*elts_linear)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts_linear), 0,
-				  txq->socket);
-	struct ibv_mr *mr_linear = NULL;
 	int ret = 0;
 
-	if ((elts == NULL) || (elts_linear == NULL)) {
+	if (elts == NULL) {
 		ERROR("%p: can't allocate packets array", (void *)txq);
 		ret = ENOMEM;
 		goto error;
 	}
-	mr_linear =
-		ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
-			   IBV_ACCESS_LOCAL_WRITE);
-	if (mr_linear == NULL) {
-		ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
-		      (void *)txq);
-		ret = EINVAL;
-		goto error;
-	}
 	for (i = 0; (i != elts_n); ++i) {
 		struct txq_elt *elt = &(*elts)[i];
 
@@ -619,15 +605,9 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
 		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
 	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	txq->elts_linear = elts_linear;
-	txq->mr_linear = mr_linear;
 	assert(ret == 0);
 	return 0;
 error:
-	if (mr_linear != NULL)
-		claim_zero(ibv_dereg_mr(mr_linear));
-
-	rte_free(elts_linear);
 	rte_free(elts);
 
 	DEBUG("%p: failed, freed everything", (void *)txq);
@@ -648,8 +628,6 @@ txq_free_elts(struct txq *txq)
 	unsigned int elts_head = txq->elts_head;
 	unsigned int elts_tail = txq->elts_tail;
 	struct txq_elt (*elts)[elts_n] = txq->elts;
-	linear_t (*elts_linear)[elts_n] = txq->elts_linear;
-	struct ibv_mr *mr_linear = txq->mr_linear;
 
 	DEBUG("%p: freeing WRs", (void *)txq);
 	txq->elts_n = 0;
@@ -659,12 +637,6 @@ txq_free_elts(struct txq *txq)
 	txq->elts_comp_cd = 0;
 	txq->elts_comp_cd_init = 0;
 	txq->elts = NULL;
-	txq->elts_linear = NULL;
-	txq->mr_linear = NULL;
-	if (mr_linear != NULL)
-		claim_zero(ibv_dereg_mr(mr_linear));
-
-	rte_free(elts_linear);
 	if (elts == NULL)
 		return;
 	while (elts_tail != elts_head) {
@@ -1037,152 +1009,6 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 }
 
 /**
- * Copy scattered mbuf contents to a single linear buffer.
- *
- * @param[out] linear
- *   Linear output buffer.
- * @param[in] buf
- *   Scattered input buffer.
- *
- * @return
- *   Number of bytes copied to the output buffer or 0 if not large enough.
- */
-static unsigned int
-linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
-{
-	unsigned int size = 0;
-	unsigned int offset;
-
-	do {
-		unsigned int len = DATA_LEN(buf);
-
-		offset = size;
-		size += len;
-		if (unlikely(size > sizeof(*linear)))
-			return 0;
-		memcpy(&(*linear)[offset],
-		       rte_pktmbuf_mtod(buf, uint8_t *),
-		       len);
-		buf = NEXT(buf);
-	} while (buf != NULL);
-	return size;
-}
-
-/**
- * Handle scattered buffers for mlx4_tx_burst().
- *
- * @param txq
- *   TX queue structure.
- * @param segs
- *   Number of segments in buf.
- * @param elt
- *   TX queue element to fill.
- * @param[in] buf
- *   Buffer to process.
- * @param elts_head
- *   Index of the linear buffer to use if necessary (normally txq->elts_head).
- * @param[out] sges
- *   Array filled with SGEs on success.
- *
- * @return
- *   A structure containing the processed packet size in bytes and the
- *   number of SGEs. Both fields are set to (unsigned int)-1 in case of
- *   failure.
- */
-static struct tx_burst_sg_ret {
-	unsigned int length;
-	unsigned int num;
-}
-tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
-	    struct rte_mbuf *buf, unsigned int elts_head,
-	    struct ibv_sge (*sges)[MLX4_PMD_SGE_WR_N])
-{
-	unsigned int sent_size = 0;
-	unsigned int j;
-	int linearize = 0;
-
-	/* When there are too many segments, extra segments are
-	 * linearized in the last SGE. */
-	if (unlikely(segs > elemof(*sges))) {
-		segs = (elemof(*sges) - 1);
-		linearize = 1;
-	}
-	/* Update element. */
-	elt->buf = buf;
-	/* Register segments as SGEs. */
-	for (j = 0; (j != segs); ++j) {
-		struct ibv_sge *sge = &(*sges)[j];
-		uint32_t lkey;
-
-		/* Retrieve Memory Region key for this memory pool. */
-		lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-		if (unlikely(lkey == (uint32_t)-1)) {
-			/* MR does not exist. */
-			DEBUG("%p: unable to get MP <-> MR association",
-			      (void *)txq);
-			/* Clean up TX element. */
-			elt->buf = NULL;
-			goto stop;
-		}
-		/* Update SGE. */
-		sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-		if (txq->priv->vf)
-			rte_prefetch0((volatile void *)
-				      (uintptr_t)sge->addr);
-		sge->length = DATA_LEN(buf);
-		sge->lkey = lkey;
-		sent_size += sge->length;
-		buf = NEXT(buf);
-	}
-	/* If buf is not NULL here and is not going to be linearized,
-	 * nb_segs is not valid. */
-	assert(j == segs);
-	assert((buf == NULL) || (linearize));
-	/* Linearize extra segments. */
-	if (linearize) {
-		struct ibv_sge *sge = &(*sges)[segs];
-		linear_t *linear = &(*txq->elts_linear)[elts_head];
-		unsigned int size = linearize_mbuf(linear, buf);
-
-		assert(segs == (elemof(*sges) - 1));
-		if (size == 0) {
-			/* Invalid packet. */
-			DEBUG("%p: packet too large to be linearized.",
-			      (void *)txq);
-			/* Clean up TX element. */
-			elt->buf = NULL;
-			goto stop;
-		}
-		/* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately. */
-		if (elemof(*sges) == 1) {
-			do {
-				struct rte_mbuf *next = NEXT(buf);
-
-				rte_pktmbuf_free_seg(buf);
-				buf = next;
-			} while (buf != NULL);
-			elt->buf = NULL;
-		}
-		/* Update SGE. */
-		sge->addr = (uintptr_t)&(*linear)[0];
-		sge->length = size;
-		sge->lkey = txq->mr_linear->lkey;
-		sent_size += size;
-		/* Include last segment. */
-		segs++;
-	}
-	return (struct tx_burst_sg_ret){
-		.length = sent_size,
-		.num = segs,
-	};
-stop:
-	return (struct tx_burst_sg_ret){
-		.length = -1,
-		.num = -1,
-	};
-}
-
-/**
  * DPDK callback for TX.
  *
  * @param dpdk_txq
@@ -1294,23 +1120,8 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				goto stop;
 			sent_size += length;
 		} else {
-			struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
-			struct tx_burst_sg_ret ret;
-
-			ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
-					  &sges);
-			if (ret.length == (unsigned int)-1)
-				goto stop;
-			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			/* Put SG list into send queue. */
-			err = txq->if_qp->send_pending_sg_list
-				(txq->qp,
-				 sges,
-				 ret.num,
-				 send_flags);
-			if (unlikely(err))
-				goto stop;
-			sent_size += ret.length;
+			err = -1;
+			goto stop;
 		}
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
@@ -1375,12 +1186,10 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	(void)conf; /* Thresholds configuration (ignored). */
 	if (priv == NULL)
 		return EINVAL;
-	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
-		ERROR("%p: invalid number of TX descriptors (must be a"
-		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
+	if (desc == 0) {
+		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
 		return EINVAL;
 	}
-	desc /= MLX4_PMD_SGE_WR_N;
 	/* MRs will be registered in mp2mr[] later. */
 	attr.rd = (struct ibv_exp_res_domain_init_attr){
 		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
@@ -1421,10 +1230,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 					priv->device_attr.max_qp_wr :
 					desc),
 			/* Max number of scatter/gather elements in a WR. */
-			.max_send_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
+			.max_send_sge = 1,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
@@ -1623,168 +1429,18 @@ mlx4_tx_queue_release(void *dpdk_txq)
 /* RX queues handling. */
 
 /**
- * Allocate RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
-		  struct rte_mbuf **pool)
-{
-	unsigned int i;
-	struct rxq_elt_sp (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		ret = ENOMEM;
-		goto error;
-	}
-	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
-		unsigned int j;
-		struct rxq_elt_sp *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
-		struct ibv_sge (*sges)[(elemof(elt->sges))] = &elt->sges;
-
-		/* These two arrays must have the same size. */
-		assert(elemof(elt->sges) == elemof(elt->bufs));
-		/* Configure WR. */
-		wr->wr_id = i;
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = &(*sges)[0];
-		wr->num_sge = elemof(*sges);
-		/* For each SGE (segment). */
-		for (j = 0; (j != elemof(elt->bufs)); ++j) {
-			struct ibv_sge *sge = &(*sges)[j];
-			struct rte_mbuf *buf;
-
-			if (pool != NULL) {
-				buf = *(pool++);
-				assert(buf != NULL);
-				rte_pktmbuf_reset(buf);
-			} else
-				buf = rte_pktmbuf_alloc(rxq->mp);
-			if (buf == NULL) {
-				assert(pool == NULL);
-				ERROR("%p: empty mbuf pool", (void *)rxq);
-				ret = ENOMEM;
-				goto error;
-			}
-			elt->bufs[j] = buf;
-			/* Headroom is reserved by rte_pktmbuf_alloc(). */
-			assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
-			/* Buffer is supposed to be empty. */
-			assert(rte_pktmbuf_data_len(buf) == 0);
-			assert(rte_pktmbuf_pkt_len(buf) == 0);
-			/* sge->addr must be able to store a pointer. */
-			assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-			if (j == 0) {
-				/* The first SGE keeps its headroom. */
-				sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
-				sge->length = (buf->buf_len -
-					       RTE_PKTMBUF_HEADROOM);
-			} else {
-				/* Subsequent SGEs lose theirs. */
-				assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
-				SET_DATA_OFF(buf, 0);
-				sge->addr = (uintptr_t)buf->buf_addr;
-				sge->length = buf->buf_len;
-			}
-			sge->lkey = rxq->mr->lkey;
-			/* Redundant check for tailroom. */
-			assert(sge->length == rte_pktmbuf_tailroom(buf));
-		}
-	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u WRs (%zu segments)",
-	      (void *)rxq, elts_n, (elts_n * elemof((*elts)[0].sges)));
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts.sp = elts;
-	assert(ret == 0);
-	return 0;
-error:
-	if (elts != NULL) {
-		assert(pool == NULL);
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			unsigned int j;
-			struct rxq_elt_sp *elt = &(*elts)[i];
-
-			for (j = 0; (j != elemof(elt->bufs)); ++j) {
-				struct rte_mbuf *buf = elt->bufs[j];
-
-				if (buf != NULL)
-					rte_pktmbuf_free_seg(buf);
-			}
-		}
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(ret > 0);
-	return ret;
-}
-
-/**
- * Free RX queue elements with scattered packets support.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_free_elts_sp(struct rxq *rxq)
-{
-	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt_sp (*elts)[elts_n] = rxq->elts.sp;
-
-	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts.sp = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != elemof(*elts)); ++i) {
-		unsigned int j;
-		struct rxq_elt_sp *elt = &(*elts)[i];
-
-		for (j = 0; (j != elemof(elt->bufs)); ++j) {
-			struct rte_mbuf *buf = elt->bufs[j];
-
-			if (buf != NULL)
-				rte_pktmbuf_free_seg(buf);
-		}
-	}
-	rte_free(elts);
-}
-
-/**
  * Allocate RX queue elements.
  *
  * @param rxq
  *   Pointer to RX queue structure.
  * @param elts_n
  *   Number of elements to allocate.
- * @param[in] pool
- *   If not NULL, fetch buffers from this array instead of allocating them
- *   with rte_pktmbuf_alloc().
  *
  * @return
  *   0 on success, errno value on failure.
  */
 static int
-rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
+rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 {
 	unsigned int i;
 	struct rxq_elt (*elts)[elts_n] =
@@ -1802,16 +1458,9 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 		struct rxq_elt *elt = &(*elts)[i];
 		struct ibv_recv_wr *wr = &elt->wr;
 		struct ibv_sge *sge = &(*elts)[i].sge;
-		struct rte_mbuf *buf;
+		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
 
-		if (pool != NULL) {
-			buf = *(pool++);
-			assert(buf != NULL);
-			rte_pktmbuf_reset(buf);
-		} else
-			buf = rte_pktmbuf_alloc(rxq->mp);
 		if (buf == NULL) {
-			assert(pool == NULL);
 			ERROR("%p: empty mbuf pool", (void *)rxq);
 			ret = ENOMEM;
 			goto error;
@@ -1859,12 +1508,11 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
 	      (void *)rxq, elts_n);
 	rxq->elts_n = elts_n;
 	rxq->elts_head = 0;
-	rxq->elts.no_sp = elts;
+	rxq->elts = elts;
 	assert(ret == 0);
 	return 0;
 error:
 	if (elts != NULL) {
-		assert(pool == NULL);
 		for (i = 0; (i != elemof(*elts)); ++i) {
 			struct rxq_elt *elt = &(*elts)[i];
 			struct rte_mbuf *buf;
@@ -1894,11 +1542,11 @@ rxq_free_elts(struct rxq *rxq)
 {
 	unsigned int i;
 	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts.no_sp;
+	struct rxq_elt (*elts)[elts_n] = rxq->elts;
 
 	DEBUG("%p: freeing WRs", (void *)rxq);
 	rxq->elts_n = 0;
-	rxq->elts.no_sp = NULL;
+	rxq->elts = NULL;
 	if (elts == NULL)
 		return;
 	for (i = 0; (i != elemof(*elts)); ++i) {
@@ -2034,10 +1682,7 @@ rxq_cleanup(struct rxq *rxq)
 	struct ibv_exp_release_intf_params params;
 
 	DEBUG("cleaning up %p", (void *)rxq);
-	if (rxq->sp)
-		rxq_free_elts_sp(rxq);
-	else
-		rxq_free_elts(rxq);
+	rxq_free_elts(rxq);
 	if (rxq->if_qp != NULL) {
 		assert(rxq->priv != NULL);
 		assert(rxq->priv->ctx != NULL);
@@ -2082,230 +1727,10 @@ rxq_cleanup(struct rxq *rxq)
 	memset(rxq, 0, sizeof(*rxq));
 }
 
-static uint16_t
-mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
-
-/**
- * DPDK callback for RX with scattered packets support.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
-	const unsigned int elts_n = rxq->elts_n;
-	unsigned int elts_head = rxq->elts_head;
-	struct ibv_recv_wr head;
-	struct ibv_recv_wr **next = &head.next;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int i;
-	unsigned int pkts_ret = 0;
-	int ret;
-
-	if (unlikely(!rxq->sp))
-		return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
-	if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
-		return 0;
-	for (i = 0; (i != pkts_n); ++i) {
-		struct rxq_elt_sp *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
-		unsigned int len;
-		unsigned int pkt_buf_len;
-		struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
-		struct rte_mbuf **pkt_buf_next = &pkt_buf;
-		unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
-		unsigned int j = 0;
-		uint32_t flags;
-
-		/* Sanity checks. */
-#ifdef NDEBUG
-		(void)wr_id;
-#endif
-		assert(wr_id < rxq->elts_n);
-		assert(wr->sg_list == elt->sges);
-		assert(wr->num_sge == elemof(elt->sges));
-		assert(elts_head < rxq->elts_n);
-		assert(rxq->elts_head < rxq->elts_n);
-		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
-						    &flags);
-		if (unlikely(ret < 0)) {
-			struct ibv_wc wc;
-			int wcs_n;
-
-			DEBUG("rxq=%p, poll_length() failed (ret=%d)",
-			      (void *)rxq, ret);
-			/* ibv_poll_cq() must be used in case of failure. */
-			wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
-			if (unlikely(wcs_n == 0))
-				break;
-			if (unlikely(wcs_n < 0)) {
-				DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
-				      (void *)rxq, wcs_n);
-				break;
-			}
-			assert(wcs_n == 1);
-			if (unlikely(wc.status != IBV_WC_SUCCESS)) {
-				/* Whatever, just repost the offending WR. */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
-				      " completion status (%d): %s",
-				      (void *)rxq, wc.wr_id, wc.status,
-				      ibv_wc_status_str(wc.status));
-				/* Increment dropped packets counter. */
-				++rxq->stats.idropped;
-				/* Link completed WRs together for repost. */
-				*next = wr;
-				next = &wr->next;
-				goto repost;
-			}
-			ret = wc.byte_len;
-		}
-		if (ret == 0)
-			break;
-		len = ret;
-		pkt_buf_len = len;
-		/* Link completed WRs together for repost. */
-		*next = wr;
-		next = &wr->next;
-		/*
-		 * Replace spent segments with new ones, concatenate and
-		 * return them as pkt_buf.
-		 */
-		while (1) {
-			struct ibv_sge *sge = &elt->sges[j];
-			struct rte_mbuf *seg = elt->bufs[j];
-			struct rte_mbuf *rep;
-			unsigned int seg_tailroom;
-
-			/*
-			 * Fetch initial bytes of packet descriptor into a
-			 * cacheline while allocating rep.
-			 */
-			rte_prefetch0(seg);
-			rep = rte_mbuf_raw_alloc(rxq->mp);
-			if (unlikely(rep == NULL)) {
-				/*
-				 * Unable to allocate a replacement mbuf,
-				 * repost WR.
-				 */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
-				      " can't allocate a new mbuf",
-				      (void *)rxq, wr_id);
-				if (pkt_buf != NULL) {
-					*pkt_buf_next = NULL;
-					rte_pktmbuf_free(pkt_buf);
-				}
-				/* Increase out of memory counters. */
-				++rxq->stats.rx_nombuf;
-				++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-				goto repost;
-			}
-#ifndef NDEBUG
-			/* Poison user-modifiable fields in rep. */
-			NEXT(rep) = (void *)((uintptr_t)-1);
-			SET_DATA_OFF(rep, 0xdead);
-			DATA_LEN(rep) = 0xd00d;
-			PKT_LEN(rep) = 0xdeadd00d;
-			NB_SEGS(rep) = 0x2a;
-			PORT(rep) = 0x2a;
-			rep->ol_flags = -1;
-			/*
-			 * Clear special flags in mbuf to avoid
-			 * crashing while freeing.
-			 */
-			rep->ol_flags &=
-				~(uint64_t)(IND_ATTACHED_MBUF |
-					    CTRL_MBUF_FLAG);
-#endif
-			assert(rep->buf_len == seg->buf_len);
-			/* Reconfigure sge to use rep instead of seg. */
-			assert(sge->lkey == rxq->mr->lkey);
-			sge->addr = ((uintptr_t)rep->buf_addr + seg_headroom);
-			elt->bufs[j] = rep;
-			++j;
-			/* Update pkt_buf if it's the first segment, or link
-			 * seg to the previous one and update pkt_buf_next. */
-			*pkt_buf_next = seg;
-			pkt_buf_next = &NEXT(seg);
-			/* Update seg information. */
-			seg_tailroom = (seg->buf_len - seg_headroom);
-			assert(sge->length == seg_tailroom);
-			SET_DATA_OFF(seg, seg_headroom);
-			if (likely(len <= seg_tailroom)) {
-				/* Last segment. */
-				DATA_LEN(seg) = len;
-				PKT_LEN(seg) = len;
-				/* Sanity check. */
-				assert(rte_pktmbuf_headroom(seg) ==
-				       seg_headroom);
-				assert(rte_pktmbuf_tailroom(seg) ==
-				       (seg_tailroom - len));
-				break;
-			}
-			DATA_LEN(seg) = seg_tailroom;
-			PKT_LEN(seg) = seg_tailroom;
-			/* Sanity check. */
-			assert(rte_pktmbuf_headroom(seg) == seg_headroom);
-			assert(rte_pktmbuf_tailroom(seg) == 0);
-			/* Fix len and clear headroom for next segments. */
-			len -= seg_tailroom;
-			seg_headroom = 0;
-		}
-		/* Update head and tail segments. */
-		*pkt_buf_next = NULL;
-		assert(pkt_buf != NULL);
-		assert(j != 0);
-		NB_SEGS(pkt_buf) = j;
-		PORT(pkt_buf) = rxq->port_id;
-		PKT_LEN(pkt_buf) = pkt_buf_len;
-		pkt_buf->packet_type = 0;
-		pkt_buf->ol_flags = 0;
-
-		/* Return packet. */
-		*(pkts++) = pkt_buf;
-		++pkts_ret;
-		/* Increase bytes counter. */
-		rxq->stats.ibytes += pkt_buf_len;
-repost:
-		if (++elts_head >= elts_n)
-			elts_head = 0;
-		continue;
-	}
-	if (unlikely(i == 0))
-		return 0;
-	*next = NULL;
-	/* Repost WRs. */
-	ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
-		      (void *)rxq->priv,
-		      (void *)bad_wr,
-		      strerror(ret));
-		abort();
-	}
-	rxq->elts_head = elts_head;
-	/* Increase packets counter. */
-	rxq->stats.ipackets += pkts_ret;
-	return pkts_ret;
-}
-
 /**
  * DPDK callback for RX.
  *
- * The following function is the same as mlx4_rx_burst_sp(), except it doesn't
- * manage scattered packets. Improves performance when MRU is lower than the
- * size of the first segment.
+ * The following function doesn't manage scattered packets.
  *
  * @param dpdk_rxq
  *   Generic pointer to RX queue structure.
@@ -2321,7 +1746,7 @@ static uint16_t
 mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
 	struct ibv_sge sges[pkts_n];
@@ -2329,8 +1754,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	unsigned int pkts_ret = 0;
 	int ret;
 
-	if (unlikely(rxq->sp))
-		return mlx4_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
 	for (i = 0; (i != pkts_n); ++i) {
 		struct rxq_elt *elt = &(*elts)[elts_head];
 		struct ibv_recv_wr *wr = &elt->wr;
@@ -2482,10 +1905,7 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 					priv->device_attr.max_qp_wr :
 					desc),
 			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = ((priv->device_attr.max_sge <
-					  MLX4_PMD_SGE_WR_N) ?
-					 priv->device_attr.max_sge :
-					 MLX4_PMD_SGE_WR_N),
+			.max_recv_sge = 1,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
 		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
@@ -2500,165 +1920,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 }
 
 /**
- * Reconfigure a RX queue with new parameters.
- *
- * rxq_rehash() does not allocate mbufs, which, if not done from the right
- * thread (such as a control thread), may corrupt the pool.
- * In case of failure, the queue is left untouched.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param rxq
- *   RX queue pointer.
- *
- * @return
- *   0 on success, errno value on failure.
- */
-static int
-rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
-{
-	struct priv *priv = rxq->priv;
-	struct rxq tmpl = *rxq;
-	unsigned int mbuf_n;
-	unsigned int desc_n;
-	struct rte_mbuf **pool;
-	unsigned int i, k;
-	struct ibv_exp_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
-	int err;
-
-	mb_len = rte_pktmbuf_data_room_size(rxq->mp);
-	DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
-	/* Number of descriptors and mbufs currently allocated. */
-	desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
-	mbuf_n = desc_n;
-	/* Enable scattered packets support for this queue if necessary. */
-	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
-	if (dev->data->dev_conf.rxmode.enable_scatter &&
-	    (dev->data->dev_conf.rxmode.max_rx_pkt_len >
-	     (mb_len - RTE_PKTMBUF_HEADROOM))) {
-		tmpl.sp = 1;
-		desc_n /= MLX4_PMD_SGE_WR_N;
-	} else
-		tmpl.sp = 0;
-	DEBUG("%p: %s scattered packets support (%u WRs)",
-	      (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc_n);
-	/* If scatter mode is the same as before, nothing to do. */
-	if (tmpl.sp == rxq->sp) {
-		DEBUG("%p: nothing to do", (void *)dev);
-		return 0;
-	}
-	/* From now on, any failure will render the queue unusable.
-	 * Reinitialize QP. */
-	mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err) {
-		ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	err = ibv_resize_cq(tmpl.cq, desc_n);
-	if (err) {
-		ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod,
-				IBV_EXP_QP_STATE |
-				IBV_EXP_QP_PORT);
-	if (err) {
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(err));
-		assert(err > 0);
-		return err;
-	};
-	/* Allocate pool. */
-	pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
-	if (pool == NULL) {
-		ERROR("%p: cannot allocate memory", (void *)dev);
-		return ENOBUFS;
-	}
-	/* Snatch mbufs from original queue. */
-	k = 0;
-	if (rxq->sp) {
-		struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
-
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			struct rxq_elt_sp *elt = &(*elts)[i];
-			unsigned int j;
-
-			for (j = 0; (j != elemof(elt->bufs)); ++j) {
-				assert(elt->bufs[j] != NULL);
-				pool[k++] = elt->bufs[j];
-			}
-		}
-	} else {
-		struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
-
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf = (void *)
-				((uintptr_t)elt->sge.addr -
-				 WR_ID(elt->wr.wr_id).offset);
-
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			pool[k++] = buf;
-		}
-	}
-	assert(k == mbuf_n);
-	tmpl.elts_n = 0;
-	tmpl.elts.sp = NULL;
-	assert((void *)&tmpl.elts.sp == (void *)&tmpl.elts.no_sp);
-	err = ((tmpl.sp) ?
-	       rxq_alloc_elts_sp(&tmpl, desc_n, pool) :
-	       rxq_alloc_elts(&tmpl, desc_n, pool));
-	if (err) {
-		ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
-		rte_free(pool);
-		assert(err > 0);
-		return err;
-	}
-	assert(tmpl.elts_n == desc_n);
-	assert(tmpl.elts.sp != NULL);
-	rte_free(pool);
-	/* Clean up original data. */
-	rxq->elts_n = 0;
-	rte_free(rxq->elts.sp);
-	rxq->elts.sp = NULL;
-	/* Post WRs. */
-	err = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
-	if (err) {
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(err));
-		goto skip_rtr;
-	}
-	mod = (struct ibv_exp_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
-	if (err)
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(err));
-skip_rtr:
-	*rxq = tmpl;
-	assert(err >= 0);
-	return err;
-}
-
-/**
  * Configure a RX queue.
  *
  * @param dev
@@ -2701,19 +1962,19 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	mb_len = rte_pktmbuf_data_room_size(mp);
-	if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
-		ERROR("%p: invalid number of RX descriptors (must be a"
-		      " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
+	if (desc == 0) {
+		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
 		return EINVAL;
 	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
 	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
-		tmpl.sp = 0;
+		;
 	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
-		tmpl.sp = 1;
-		desc /= MLX4_PMD_SGE_WR_N;
+		WARN("%p: scattered mode has been requested but is"
+		     " not supported, this may lead to packet loss",
+		     (void *)dev);
 	} else {
 		WARN("%p: the requested maximum Rx packet size (%u) is"
 		     " larger than a single mbuf (%u) and scattered"
@@ -2722,8 +1983,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
 		     mb_len - RTE_PKTMBUF_HEADROOM);
 	}
-	DEBUG("%p: %s scattered packets support (%u WRs)",
-	      (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
 	/* Use the entire RX mempool as the memory region. */
 	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
 	if (tmpl.mr == NULL) {
@@ -2792,20 +2051,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	if (tmpl.sp)
-		ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
-	else
-		ret = rxq_alloc_elts(&tmpl, desc, NULL);
+	ret = rxq_alloc_elts(&tmpl, desc);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	ret = ibv_post_recv(tmpl.qp,
-			    (tmpl.sp ?
-			     &(*tmpl.elts.sp)[0].wr :
-			     &(*tmpl.elts.no_sp)[0].wr),
-			    &bad_wr);
+	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
 	if (ret) {
 		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
 		      (void *)dev,
@@ -2926,10 +2178,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, (void *)rxq);
 		(*priv->rxqs)[idx] = rxq;
 		/* Update receive callback. */
-		if (rxq->sp)
-			dev->rx_pkt_burst = mlx4_rx_burst_sp;
-		else
-			dev->rx_pkt_burst = mlx4_rx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	priv_unlock(priv);
 	return -ret;
@@ -3205,23 +2454,12 @@ priv_set_link(struct priv *priv, int up)
 {
 	struct rte_eth_dev *dev = priv->dev;
 	int err;
-	unsigned int i;
 
 	if (up) {
 		err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
 		if (err)
 			return err;
-		for (i = 0; i < priv->rxqs_n; i++)
-			if ((*priv->rxqs)[i]->sp)
-				break;
-		/* Check if an sp queue exists.
-		 * Note: Some old frames might be received.
-		 */
-		if (i == priv->rxqs_n)
-			dev->rx_pkt_burst = mlx4_rx_burst;
-		else
-			dev->rx_pkt_burst = mlx4_rx_burst_sp;
-		dev->tx_pkt_burst = mlx4_tx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst;
 	} else {
 		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
@@ -3469,12 +2707,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 /**
  * DPDK callback to change the MTU.
  *
- * Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
- * received). Use this as a hint to enable/disable scattered packets support
- * and improve performance when not needed.
- * Since failure is not an option, reconfiguring queues on the fly is not
- * recommended.
- *
  * @param dev
  *   Pointer to Ethernet device structure.
  * @param in_mtu
@@ -3488,9 +2720,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 {
 	struct priv *priv = dev->data->dev_private;
 	int ret = 0;
-	unsigned int i;
-	uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
-		mlx4_rx_burst;
 
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
@@ -3502,45 +2731,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
-	/* Remove MAC flow. */
-	priv_mac_addr_del(priv);
-	/* Temporarily replace RX handler with a fake one, assuming it has not
-	 * been copied elsewhere. */
-	dev->rx_pkt_burst = removed_rx_burst;
-	/* Make sure everyone has left mlx4_rx_burst() and uses
-	 * removed_rx_burst() instead. */
-	rte_wmb();
-	usleep(1000);
-	/* Reconfigure each RX queue. */
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-		unsigned int max_frame_len;
-
-		if (rxq == NULL)
-			continue;
-		/* Calculate new maximum frame length according to MTU. */
-		max_frame_len = (priv->mtu + ETHER_HDR_LEN +
-				 (ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
-		/* Provide new values to rxq_setup(). */
-		dev->data->dev_conf.rxmode.jumbo_frame =
-			(max_frame_len > ETHER_MAX_LEN);
-		dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
-		ret = rxq_rehash(dev, rxq);
-		if (ret) {
-			/* Force SP RX if that queue requires it and abort. */
-			if (rxq->sp)
-				rx_func = mlx4_rx_burst_sp;
-			break;
-		}
-		/* Scattered burst function takes priority. */
-		if (rxq->sp)
-			rx_func = mlx4_rx_burst_sp;
-	}
-	/* Burst functions can now be called again. */
-	rte_wmb();
-	dev->rx_pkt_burst = rx_func;
-	/* Restore MAC flow. */
-	ret = priv_mac_addr_add(priv);
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index d1104c3..cc36d29 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -66,9 +66,6 @@
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
-/* Maximum number of Scatter/Gather Elements per Work Request. */
-#define MLX4_PMD_SGE_WR_N 4
-
 /* Maximum size for inline data. */
 #define MLX4_PMD_MAX_INLINE 0
 
@@ -147,13 +144,6 @@ struct mlx4_rxq_stats {
 	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
 };
 
-/* RX element (scattered packets). */
-struct rxq_elt_sp {
-	struct ibv_recv_wr wr; /* Work Request. */
-	struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
-	struct rte_mbuf *bufs[MLX4_PMD_SGE_WR_N]; /* SGEs buffers. */
-};
-
 /* RX element. */
 struct rxq_elt {
 	struct ibv_recv_wr wr; /* Work Request. */
@@ -174,11 +164,7 @@ struct rxq {
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	unsigned int elts_head; /* Current index in (*elts)[]. */
-	union {
-		struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
-		struct rxq_elt (*no_sp)[]; /* RX elements. */
-	} elts;
-	unsigned int sp:1; /* Use scattered RX elements. */
+	struct rxq_elt (*elts)[]; /* Rx elements. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -196,16 +182,6 @@ struct mlx4_txq_stats {
 	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
 };
 
-/*
- * Linear buffer type. It is used when transmitting buffers with too many
- * segments that do not fit the hardware queue (see max_send_sge).
- * Extra segments are copied (linearized) in such buffers, replacing the
- * last SGE during TX.
- * The size is arbitrary but large enough to hold a jumbo frame with
- * 8 segments considering mbuf.buf_len is about 2048 bytes.
- */
-typedef uint8_t linear_t[16384];
-
 /* TX queue descriptor. */
 struct txq {
 	struct priv *priv; /* Back pointer to private data. */
@@ -227,8 +203,6 @@ struct txq {
 	unsigned int elts_comp_cd; /* Countdown for next completion request. */
 	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
 	struct mlx4_txq_stats stats; /* TX queue counters. */
-	linear_t (*elts_linear)[]; /* Linearized buffers. */
-	struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
 	unsigned int socket; /* CPU socket ID for allocations. */
 	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
-- 
2.1.4


* [PATCH v2 20/51] net/mlx4: drop inline receive support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (18 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 19/51] net/mlx4: drop scatter/gather support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 21/51] net/mlx4: use standard QP attributes Adrien Mazarguil
                     ` (32 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

The Verbs API used to implement inline receive is deprecated.
Support will be added back after refactoring the PMD.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 52 --------------------------------------------
 drivers/net/mlx4/mlx4.h |  1 -
 2 files changed, 53 deletions(-)
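
For illustration only, here is a minimal standalone sketch of what the
probe-time code removed below did: read the requested size from the
MLX4_INLINE_RECV_SIZE environment variable, then clamp it to the maximum
reported by the device. The names getenv_int() and
clamp_inline_recv_size() are hypothetical, and the device maximum is
passed in as a plain integer here instead of being queried through
ibv_exp_query_device(). The original code also reset the size to 0 when
that query failed, which this sketch does not model.

```c
#include <assert.h>
#include <stdlib.h>

/* Same behavior as the removed mlx4_getenv_int(): 0 if the variable is unset. */
static int
getenv_int(const char *name)
{
	const char *val = getenv(name);

	if (val == NULL)
		return 0;
	return atoi(val);
}

/*
 * Hypothetical helper: return the inline receive size to use, given the
 * requested size and the maximum reported by the device. A request of 0
 * means inline receive is disabled.
 */
static unsigned int
clamp_inline_recv_size(unsigned int requested, int device_max)
{
	assert(device_max >= 0);
	if (requested == 0)
		return 0;
	if ((unsigned int)device_max < requested)
		return (unsigned int)device_max;
	return requested;
}
```

A caller would combine both, for example
clamp_inline_recv_size(getenv_int("MLX4_INLINE_RECV_SIZE"), max), where
max stands for whatever limit the device reports.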

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0a71e2c..30c70ee 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1914,8 +1914,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 		.res_domain = rd,
 	};
 
-	attr.max_inl_recv = priv->inl_recv_size,
-	attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
 	return ibv_exp_create_qp(priv->ctx, &attr);
 }
 
@@ -2977,25 +2975,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-/**
- * Retrieve integer value from environment variable.
- *
- * @param[in] name
- *   Environment variable name.
- *
- * @return
- *   Integer value, 0 if the variable is not set.
- */
-static int
-mlx4_getenv_int(const char *name)
-{
-	const char *val = getenv(name);
-
-	if (val == NULL)
-		return 0;
-	return atoi(val);
-}
-
 static void
 mlx4_dev_link_status_handler(void *);
 static void
@@ -3644,13 +3623,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		struct ibv_pd *pd = NULL;
 		struct priv *priv = NULL;
 		struct rte_eth_dev *eth_dev = NULL;
-		struct ibv_exp_device_attr exp_device_attr;
 		struct ether_addr mac;
 
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-		exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
 
 		DEBUG("using port %u", port);
 
@@ -3703,35 +3680,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-		if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-			ERROR("ibv_exp_query_device() failed");
-			err = ENODEV;
-			goto port_error;
-		}
-
-		priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
-
-		if (priv->inl_recv_size) {
-			exp_device_attr.comp_mask =
-				IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ;
-			if (ibv_exp_query_device(ctx, &exp_device_attr)) {
-				INFO("Couldn't query device for inline-receive"
-				     " capabilities.");
-				priv->inl_recv_size = 0;
-			} else {
-				if ((unsigned)exp_device_attr.inline_recv_sz <
-				    priv->inl_recv_size) {
-					INFO("Max inline-receive (%d) <"
-					     " requested inline-receive (%u)",
-					     exp_device_attr.inline_recv_sz,
-					     priv->inl_recv_size);
-					priv->inl_recv_size =
-						exp_device_attr.inline_recv_sz;
-				}
-			}
-			INFO("Set inline receive size to %u",
-			     priv->inl_recv_size);
-		}
 
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index cc36d29..9cbde1d 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -223,7 +223,6 @@ struct priv {
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int pending_alarm:1; /* An alarm is pending. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-	unsigned int inl_recv_size; /* Inline recv size */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
 	unsigned int txqs_n; /* TX queues array size. */
-- 
2.1.4


* [PATCH v2 21/51] net/mlx4: use standard QP attributes
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (19 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 20/51] net/mlx4: drop inline receive support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 22/51] net/mlx4: revert resource domain support Adrien Mazarguil
                     ` (31 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

The Verbs API used to set QP attributes is deprecated. Revert to the
standard API since it actually supports the remaining attributes.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)
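
For illustration only, the sketch below models the order of calls that
txq_setup() makes in the diff that follows: one state change to INIT
that also sets the port, then RTR, then RTS, each checked for a nonzero
return value. Everything prefixed with sketch_ or SKETCH_ is a
hypothetical stand-in so that the example compiles without libibverbs
or hardware. In particular, sketch_modify_qp() is a deliberately
simplified model of ibv_modify_qp() that only accepts the next state in
sequence; the real function allows other transitions, and the enum
values here are arbitrary.

```c
#include <assert.h>
#include <errno.h>

/* Stand-ins for enum ibv_qp_state and enum ibv_qp_attr_mask (arbitrary values). */
enum sketch_qp_state {
	SKETCH_QPS_RESET,
	SKETCH_QPS_INIT,
	SKETCH_QPS_RTR,
	SKETCH_QPS_RTS,
};

enum sketch_qp_attr_mask {
	SKETCH_QP_STATE = 1 << 0,
	SKETCH_QP_PORT = 1 << 1,
};

struct sketch_qp {
	enum sketch_qp_state state;
	unsigned int port;
};

struct sketch_qp_attr {
	enum sketch_qp_state qp_state;
	unsigned int port_num;
};

/*
 * Simplified stand-in for ibv_modify_qp(): returns 0 on success or an
 * errno value. Only the next state in sequence is accepted, and moving to
 * INIT requires the port to be provided.
 */
static int
sketch_modify_qp(struct sketch_qp *qp, const struct sketch_qp_attr *attr,
		 int mask)
{
	if (!(mask & SKETCH_QP_STATE))
		return EINVAL;
	if ((int)attr->qp_state != (int)qp->state + 1)
		return EINVAL;
	if (attr->qp_state == SKETCH_QPS_INIT) {
		if (!(mask & SKETCH_QP_PORT))
			return EINVAL;
		qp->port = attr->port_num;
	}
	qp->state = attr->qp_state;
	return 0;
}

/* Same call sequence as the Tx queue setup: INIT (with port), RTR, RTS. */
static int
sketch_txq_bring_up(struct sketch_qp *qp, unsigned int port)
{
	struct sketch_qp_attr mod = {
		.qp_state = SKETCH_QPS_INIT,
		.port_num = port,
	};
	int ret;

	ret = sketch_modify_qp(qp, &mod, SKETCH_QP_STATE | SKETCH_QP_PORT);
	if (ret)
		return ret;
	mod = (struct sketch_qp_attr){
		.qp_state = SKETCH_QPS_RTR,
	};
	ret = sketch_modify_qp(qp, &mod, SKETCH_QP_STATE);
	if (ret)
		return ret;
	mod.qp_state = SKETCH_QPS_RTS;
	ret = sketch_modify_qp(qp, &mod, SKETCH_QP_STATE);
	assert(ret || qp->state == SKETCH_QPS_RTS);
	return ret;
}
```

The Rx path in the same diff follows the same pattern but stops at RTR.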

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 30c70ee..682307f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1178,7 +1178,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		struct ibv_exp_qp_init_attr init;
 		struct ibv_exp_res_domain_init_attr rd;
 		struct ibv_exp_cq_init_attr cq;
-		struct ibv_exp_qp_attr mod;
+		struct ibv_qp_attr mod;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	int ret = 0;
@@ -1251,14 +1251,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	}
 	/* ibv_create_qp() updates this value. */
 	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
 		/* Primary port number. */
 		.port_num = priv->port
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod,
-				(IBV_EXP_QP_STATE | IBV_EXP_QP_PORT));
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
@@ -1270,17 +1269,17 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.mod = (struct ibv_exp_qp_attr){
+	attr.mod = (struct ibv_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
 	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
 		      (void *)dev, strerror(ret));
@@ -1947,7 +1946,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.mp = mp,
 		.socket = socket
 	};
-	struct ibv_exp_qp_attr mod;
+	struct ibv_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
 		struct ibv_exp_cq_init_attr cq;
@@ -2035,15 +2034,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
+	mod = (struct ibv_qp_attr){
 		/* Move the QP to this state. */
 		.qp_state = IBV_QPS_INIT,
 		/* Primary port number. */
 		.port_num = priv->port
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod,
-				IBV_EXP_QP_STATE |
-				IBV_EXP_QP_PORT);
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
 		      (void *)dev, strerror(ret));
@@ -2063,10 +2060,10 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      strerror(ret));
 		goto error;
 	}
-	mod = (struct ibv_exp_qp_attr){
+	mod = (struct ibv_qp_attr){
 		.qp_state = IBV_QPS_RTR
 	};
-	ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
 	if (ret) {
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
 		      (void *)dev, strerror(ret));
-- 
2.1.4


* [PATCH v2 22/51] net/mlx4: revert resource domain support
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (20 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 21/51] net/mlx4: use standard QP attributes Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 23/51] net/mlx4: revert multicast echo prevention Adrien Mazarguil
                     ` (30 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 3e49c148b715c3c0a12c1200295bb9b312f7028e.

Resource domains are not part of the standard Verbs interface. The
performance improvement they bring will be restored later through a
different data path implementation.

This commit removes the PMD's dependency on the nonstandard QP allocation
interface.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 88 ++++-----------------------------------
 drivers/net/mlx4/mlx4.h      |  2 -
 drivers/net/mlx4/mlx4_flow.c | 30 +++++--------
 3 files changed, 20 insertions(+), 100 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 682307f..7dbed93 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -697,17 +697,6 @@ txq_cleanup(struct txq *txq)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
 		claim_zero(ibv_destroy_cq(txq->cq));
-	if (txq->rd != NULL) {
-		struct ibv_exp_destroy_res_domain_attr attr = {
-			.comp_mask = 0,
-		};
-
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
-						      txq->rd,
-						      &attr));
-	}
 	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
 		if (txq->mp2mr[i].mp == NULL)
 			break;
@@ -1175,9 +1164,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	union {
 		struct ibv_exp_query_intf_params params;
-		struct ibv_exp_qp_init_attr init;
-		struct ibv_exp_res_domain_init_attr rd;
-		struct ibv_exp_cq_init_attr cq;
+		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
 	enum ibv_exp_query_intf_status status;
@@ -1191,24 +1178,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		return EINVAL;
 	}
 	/* MRs will be registered in mp2mr[] later. */
-	attr.rd = (struct ibv_exp_res_domain_init_attr){
-		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
-		.thread_model = IBV_EXP_THREAD_SINGLE,
-		.msg_model = IBV_EXP_MSG_HIGH_BW,
-	};
-	tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
-	if (tmpl.rd == NULL) {
-		ret = ENOMEM;
-		ERROR("%p: RD creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
-	attr.cq = (struct ibv_exp_cq_init_attr){
-		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
-		.res_domain = tmpl.rd,
-	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -1219,7 +1189,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_exp_qp_init_attr){
+	attr.init = (struct ibv_qp_init_attr){
 		/* CQ to be associated with the send queue. */
 		.send_cq = tmpl.cq,
 		/* CQ to be associated with the receive queue. */
@@ -1237,12 +1207,8 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		/* Do *NOT* enable this, completions events are managed per
 		 * TX burst. */
 		.sq_sig_all = 0,
-		.pd = priv->pd,
-		.res_domain = tmpl.rd,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
 	};
-	tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
+	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
@@ -1710,17 +1676,6 @@ rxq_cleanup(struct rxq *rxq)
 		claim_zero(ibv_destroy_cq(rxq->cq));
 	if (rxq->channel != NULL)
 		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->rd != NULL) {
-		struct ibv_exp_destroy_res_domain_attr attr = {
-			.comp_mask = 0,
-		};
-
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		claim_zero(ibv_exp_destroy_res_domain(rxq->priv->ctx,
-						      rxq->rd,
-						      &attr));
-	}
 	if (rxq->mr != NULL)
 		claim_zero(ibv_dereg_mr(rxq->mr));
 	memset(rxq, 0, sizeof(*rxq));
@@ -1890,10 +1845,9 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   QP pointer or NULL in case of error.
  */
 static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
-	     struct ibv_exp_res_domain *rd)
+rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 {
-	struct ibv_exp_qp_init_attr attr = {
+	struct ibv_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
 		.send_cq = cq,
 		/* CQ to be associated with the receive queue. */
@@ -1907,13 +1861,9 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
 			.max_recv_sge = 1,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
-			      IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
-		.pd = priv->pd,
-		.res_domain = rd,
 	};
 
-	return ibv_exp_create_qp(priv->ctx, &attr);
+	return ibv_create_qp(priv->pd, &attr);
 }
 
 /**
@@ -1949,8 +1899,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_qp_attr mod;
 	union {
 		struct ibv_exp_query_intf_params params;
-		struct ibv_exp_cq_init_attr cq;
-		struct ibv_exp_res_domain_init_attr rd;
 	} attr;
 	enum ibv_exp_query_intf_status status;
 	struct ibv_recv_wr *bad_wr;
@@ -1988,19 +1936,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.rd = (struct ibv_exp_res_domain_init_attr){
-		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
-			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
-		.thread_model = IBV_EXP_THREAD_SINGLE,
-		.msg_model = IBV_EXP_MSG_HIGH_BW,
-	};
-	tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
-	if (tmpl.rd == NULL) {
-		ret = ENOMEM;
-		ERROR("%p: RD creation failure: %s",
-		      (void *)dev, strerror(ret));
-		goto error;
-	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
 		tmpl.channel = ibv_create_comp_channel(priv->ctx);
 		if (tmpl.channel == NULL) {
@@ -2011,12 +1946,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 			goto error;
 		}
 	}
-	attr.cq = (struct ibv_exp_cq_init_attr){
-		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
-		.res_domain = tmpl.rd,
-	};
-	tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0,
-				    &attr.cq);
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
 		ret = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
@@ -2027,7 +1957,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_qp_wr);
 	DEBUG("priv->device_attr.max_sge is %d",
 	      priv->device_attr.max_sge);
-	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
 	if (tmpl.qp == NULL) {
 		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 9cbde1d..edec40c 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -167,7 +167,6 @@ struct rxq {
 	struct rxq_elt (*elts)[]; /* Rx elements. */
 	struct mlx4_rxq_stats stats; /* RX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
-	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
 /* TX element. */
@@ -204,7 +203,6 @@ struct txq {
 	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
 	struct mlx4_txq_stats stats; /* TX queue counters. */
 	unsigned int socket; /* CPU socket ID for allocations. */
-	struct ibv_exp_res_domain *rd; /* Resource Domain. */
 };
 
 struct rte_flow;
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 2c5dc3c..58d4698 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -752,29 +752,21 @@ mlx4_flow_create_drop_queue(struct priv *priv)
 		ERROR("Cannot allocate memory for drop struct");
 		goto err;
 	}
-	cq = ibv_exp_create_cq(priv->ctx, 1, NULL, NULL, 0,
-			      &(struct ibv_exp_cq_init_attr){
-					.comp_mask = 0,
-			      });
+	cq = ibv_create_cq(priv->ctx, 1, NULL, NULL, 0);
 	if (!cq) {
 		ERROR("Cannot create drop CQ");
 		goto err_create_cq;
 	}
-	qp = ibv_exp_create_qp(priv->ctx,
-			      &(struct ibv_exp_qp_init_attr){
-					.send_cq = cq,
-					.recv_cq = cq,
-					.cap = {
-						.max_recv_wr = 1,
-						.max_recv_sge = 1,
-					},
-					.qp_type = IBV_QPT_RAW_PACKET,
-					.comp_mask =
-						IBV_EXP_QP_INIT_ATTR_PD |
-						IBV_EXP_QP_INIT_ATTR_PORT,
-					.pd = priv->pd,
-					.port_num = priv->port,
-			      });
+	qp = ibv_create_qp(priv->pd,
+			   &(struct ibv_qp_init_attr){
+				.send_cq = cq,
+				.recv_cq = cq,
+				.cap = {
+					.max_recv_wr = 1,
+					.max_recv_sge = 1,
+				},
+				.qp_type = IBV_QPT_RAW_PACKET,
+			   });
 	if (!qp) {
 		ERROR("Cannot create drop QP");
 		goto err_create_qp;
-- 
2.1.4


* [PATCH v2 23/51] net/mlx4: revert multicast echo prevention
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (21 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 22/51] net/mlx4: revert resource domain support Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 24/51] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
                     ` (29 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 8b3ffe95e75d6d305992505005cbb95969874a15.

Multicast loopback prevention is not part of the standard Verbs interface.
Remove it temporarily.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile | 6 +-----
 drivers/net/mlx4/mlx4.c   | 7 -------
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index bd713e2..20692f0 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -91,11 +91,7 @@ mlx4_autoconf.h.new: FORCE
 
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 	$Q $(RM) -f -- '$@'
-	$Q sh -- '$<' '$@' \
-		HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
-		infiniband/verbs.h \
-		enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
-		$(AUTOCONF_OUTPUT)
+	$Q : > '$@'
 
 # Create mlx4_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 7dbed93..5abfb37 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1266,13 +1266,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.intf_scope = IBV_EXP_INTF_GLOBAL,
 		.intf = IBV_EXP_INTF_QP_BURST,
 		.obj = tmpl.qp,
-#ifdef HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK
-		/* MC loopback must be disabled when not using a VF. */
-		.family_flags =
-			(!priv->vf ?
-			 IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK :
-			 0),
-#endif
 	};
 	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
 	if (tmpl.if_qp == NULL) {
-- 
2.1.4


* [PATCH v2 24/51] net/mlx4: revert fast Verbs interface for Tx
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (22 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 23/51] net/mlx4: revert multicast echo prevention Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 25/51] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
                     ` (28 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit 9980f81dc2623291b89cf1c281a6a9f116fd2394.

"Fast Verbs" is a nonstandard experimental interface that must be reverted
for compatibility reasons. Its replacement is slower but temporary;
performance will be restored by a subsequent commit through an enhanced
data path implementation. This commit focuses on maintaining basic
functionality in the meantime.

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 120 ++++++++++++++++++-------------------------
 drivers/net/mlx4/mlx4.h |   4 +-
 2 files changed, 52 insertions(+), 72 deletions(-)
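
For illustration only, the sketch below isolates two techniques used by
mlx4_tx_burst() in the diff that follows: work requests are linked into
a chain through a pointer to the last "next" field (wr_next in the
patch), and when posting fails part-way, the ring index is moved
backwards with wrap-around once per rejected work request (the "Rewind
bad WRs" loop). All sketch_ names are hypothetical stand-ins so that
the example compiles without libibverbs; sketch_post_send() merely
imitates the way ibv_post_send() reports the first rejected work
request through its last argument.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ibv_send_wr, reduced to what the sketch needs. */
struct sketch_wr {
	struct sketch_wr *next;
	unsigned int length; /* Bytes described by this WR. */
};

/*
 * Append wr to a chain through a pointer to the last "next" field.
 * Returns the new tail pointer; the caller terminates the chain by
 * storing NULL through it once all WRs are queued.
 */
static struct sketch_wr **
sketch_chain_append(struct sketch_wr **wr_next, struct sketch_wr *wr)
{
	*wr_next = wr;
	return &wr->next;
}

/*
 * Stand-in for ibv_post_send(): accept the first "accept" WRs of the
 * chain, then fail with a nonzero value and report the first rejected WR
 * through bad_wr. Returns 0 when the whole chain is accepted.
 */
static int
sketch_post_send(struct sketch_wr *head, unsigned int accept,
		 struct sketch_wr **bad_wr)
{
	while (head != NULL && accept != 0) {
		head = head->next;
		--accept;
	}
	*bad_wr = head;
	return head != NULL ? -1 : 0;
}

/*
 * Walk the rejected part of the chain, moving the ring index backwards
 * with wrap-around once per rejected WR. Returns the number of rejected
 * WRs and adds their sizes to *bytes.
 */
static unsigned int
sketch_rewind(const struct sketch_wr *wr_bad, unsigned int *elts_head,
	      unsigned int elts_n, unsigned int *bytes)
{
	unsigned int rejected = 0;

	assert(elts_n != 0);
	while (wr_bad != NULL) {
		++rejected;
		*bytes += wr_bad->length;
		*elts_head = (*elts_head ? *elts_head : elts_n) - 1;
		wr_bad = wr_bad->next;
	}
	return rejected;
}
```

After the rewind, the ring index designates the slot of the first
rejected work request again, so the packet and byte counters can be
corrected by the returned totals, as the patch does for
txq->stats.opackets and txq->stats.obytes.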

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5abfb37..4432952 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -666,33 +666,10 @@ txq_free_elts(struct txq *txq)
 static void
 txq_cleanup(struct txq *txq)
 {
-	struct ibv_exp_release_intf_params params;
 	size_t i;
 
 	DEBUG("cleaning up %p", (void *)txq);
 	txq_free_elts(txq);
-	if (txq->if_qp != NULL) {
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		assert(txq->qp != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(txq->priv->ctx,
-						txq->if_qp,
-						&params));
-	}
-	if (txq->if_cq != NULL) {
-		assert(txq->priv != NULL);
-		assert(txq->priv->ctx != NULL);
-		assert(txq->cq != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(txq->priv->ctx,
-						txq->if_cq,
-						&params));
-	}
 	if (txq->qp != NULL)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
@@ -726,11 +703,12 @@ txq_complete(struct txq *txq)
 	unsigned int elts_comp = txq->elts_comp;
 	unsigned int elts_tail = txq->elts_tail;
 	const unsigned int elts_n = txq->elts_n;
+	struct ibv_wc wcs[elts_comp];
 	int wcs_n;
 
 	if (unlikely(elts_comp == 0))
 		return 0;
-	wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
+	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
 	if (unlikely(wcs_n == 0))
 		return 0;
 	if (unlikely(wcs_n < 0)) {
@@ -1014,6 +992,9 @@ static uint16_t
 mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct txq *txq = (struct txq *)dpdk_txq;
+	struct ibv_send_wr *wr_head = NULL;
+	struct ibv_send_wr **wr_next = &wr_head;
+	struct ibv_send_wr *wr_bad = NULL;
 	unsigned int elts_head = txq->elts_head;
 	const unsigned int elts_n = txq->elts_n;
 	unsigned int elts_comp_cd = txq->elts_comp_cd;
@@ -1041,6 +1022,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
+		struct ibv_send_wr *wr = &elt->wr;
 		unsigned int segs = NB_SEGS(buf);
 		unsigned int sent_size = 0;
 		uint32_t send_flags = 0;
@@ -1065,9 +1047,10 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		if (unlikely(--elts_comp_cd == 0)) {
 			elts_comp_cd = txq->elts_comp_cd_init;
 			++elts_comp;
-			send_flags |= IBV_EXP_QP_BURST_SIGNALED;
+			send_flags |= IBV_SEND_SIGNALED;
 		}
 		if (likely(segs == 1)) {
+			struct ibv_sge *sge = &elt->sge;
 			uintptr_t addr;
 			uint32_t length;
 			uint32_t lkey;
@@ -1091,30 +1074,26 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 				rte_prefetch0((volatile void *)
 					      (uintptr_t)addr);
 			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			/* Put packet into send queue. */
-			if (length <= txq->max_inline)
-				err = txq->if_qp->send_pending_inline
-					(txq->qp,
-					 (void *)addr,
-					 length,
-					 send_flags);
-			else
-				err = txq->if_qp->send_pending
-					(txq->qp,
-					 addr,
-					 length,
-					 lkey,
-					 send_flags);
-			if (unlikely(err))
-				goto stop;
+			sge->addr = addr;
+			sge->length = length;
+			sge->lkey = lkey;
 			sent_size += length;
 		} else {
 			err = -1;
 			goto stop;
 		}
+		if (sent_size <= txq->max_inline)
+			send_flags |= IBV_SEND_INLINE;
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
+		/* Set up WR. */
+		wr->sg_list = &elt->sge;
+		wr->num_sge = segs;
+		wr->opcode = IBV_WR_SEND;
+		wr->send_flags = send_flags;
+		*wr_next = wr;
+		wr_next = &wr->next;
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
@@ -1123,12 +1102,37 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
 	/* Ring QP doorbell. */
-	err = txq->if_qp->send_flush(txq->qp);
+	*wr_next = NULL;
+	assert(wr_head);
+	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
 	if (unlikely(err)) {
-		/* A nonzero value is not supposed to be returned.
-		 * Nothing can be done about it. */
-		DEBUG("%p: send_flush() failed with error %d",
-		      (void *)txq, err);
+		uint64_t obytes = 0;
+		uint64_t opackets = 0;
+
+		/* Rewind bad WRs. */
+		while (wr_bad != NULL) {
+			int j;
+
+			/* Force completion request if one was lost. */
+			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
+				elts_comp_cd = 1;
+				--elts_comp;
+			}
+			++opackets;
+			for (j = 0; j < wr_bad->num_sge; ++j)
+				obytes += wr_bad->sg_list[j].length;
+			elts_head = (elts_head ? elts_head : elts_n) - 1;
+			wr_bad = wr_bad->next;
+		}
+		txq->stats.opackets -= opackets;
+		txq->stats.obytes -= obytes;
+		i -= opackets;
+		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
+		      " (%" PRIu64 " bytes) rejected: %s",
+		      (void *)txq,
+		      opackets,
+		      obytes,
+		      (err <= -1) ? "Internal error" : strerror(err));
 	}
 	txq->elts_head = elts_head;
 	txq->elts_comp += elts_comp;
@@ -1163,11 +1167,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		.socket = socket
 	};
 	union {
-		struct ibv_exp_query_intf_params params;
 		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
-	enum ibv_exp_query_intf_status status;
 	int ret = 0;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -1251,28 +1253,6 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		      (void *)dev, strerror(ret));
 		goto error;
 	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_CQ,
-		.obj = tmpl.cq,
-	};
-	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_cq == NULL) {
-		ERROR("%p: CQ interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
-	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
 	/* Clean up txq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
 	txq_cleanup(txq);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index edec40c..8a9a678 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -171,6 +171,8 @@ struct rxq {
 
 /* TX element. */
 struct txq_elt {
+	struct ibv_send_wr wr; /* Work request. */
+	struct ibv_sge sge; /* Scatter/gather element. */
 	struct rte_mbuf *buf;
 };
 
@@ -191,8 +193,6 @@ struct txq {
 	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
 	struct ibv_cq *cq; /* Completion Queue. */
 	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
-	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
 	unsigned int elts_n; /* (*elts)[] length. */
 	struct txq_elt (*elts)[]; /* TX elements. */
-- 
2.1.4


* [PATCH v2 25/51] net/mlx4: revert fast Verbs interface for Rx
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (23 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 24/51] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 26/51] net/mlx4: simplify Rx buffer handling Adrien Mazarguil
                     ` (27 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev; +Cc: Moti Haimovsky

This reverts commit acac55f164128fc76da8d93cae1e8c1e560e99f6.

"Fast Verbs" is a nonstandard experimental interface that must be reverted
for compatibility reasons. Its replacement is slower but temporary;
performance will be restored by a subsequent commit through an enhanced
data path implementation. This one focuses on maintaining basic
functionality in the meantime.
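
For readers unfamiliar with the standard Verbs posting model, the repost
logic in this patch links completed work requests through their "next"
field and submits the whole chain with a single ibv_post_recv() call. The
linking idiom alone can be sketched as follows. This is only an
illustration, not the PMD code: struct fake_recv_wr, chain_wrs() and
chain_len() are hypothetical names, with fake_recv_wr standing in for
struct ibv_recv_wr so the sketch builds without libibverbs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_recv_wr; only the link matters. */
struct fake_recv_wr {
	struct fake_recv_wr *next;
	int id;
};

/*
 * Chain the first n entries of wrs[] in order and return the list head.
 * "next" always points at the link to fill in, like wr_next in the patch.
 */
static struct fake_recv_wr *
chain_wrs(struct fake_recv_wr *wrs, unsigned int n)
{
	struct fake_recv_wr *head = NULL;
	struct fake_recv_wr **next = &head;
	unsigned int i;

	assert(wrs != NULL || n == 0);
	for (i = 0; i != n; ++i) {
		*next = &wrs[i];
		next = &wrs[i].next;
	}
	/* Terminate the chain before handing it to a post function. */
	*next = NULL;
	return head;
}

/* Walk a chain the way a post function would and count its entries. */
static unsigned int
chain_len(const struct fake_recv_wr *wr)
{
	unsigned int n = 0;

	for (; wr != NULL; wr = wr->next)
		++n;
	return n;
}
```

Keeping a pointer to the last "next" field avoids a special case for the
first element and makes terminating the chain a single store.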

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 127 +++++++++++--------------------------------
 drivers/net/mlx4/mlx4.h |   2 -
 2 files changed, 33 insertions(+), 96 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 4432952..79fb666 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1617,32 +1617,8 @@ priv_mac_addr_add(struct priv *priv)
 static void
 rxq_cleanup(struct rxq *rxq)
 {
-	struct ibv_exp_release_intf_params params;
-
 	DEBUG("cleaning up %p", (void *)rxq);
 	rxq_free_elts(rxq);
-	if (rxq->if_qp != NULL) {
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		assert(rxq->qp != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_qp,
-						&params));
-	}
-	if (rxq->if_cq != NULL) {
-		assert(rxq->priv != NULL);
-		assert(rxq->priv->ctx != NULL);
-		assert(rxq->cq != NULL);
-		params = (struct ibv_exp_release_intf_params){
-			.comp_mask = 0,
-		};
-		claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
-						rxq->if_cq,
-						&params));
-	}
 	if (rxq->qp != NULL)
 		claim_zero(ibv_destroy_qp(rxq->qp));
 	if (rxq->cq != NULL)
@@ -1676,23 +1652,37 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
 	const unsigned int elts_n = rxq->elts_n;
 	unsigned int elts_head = rxq->elts_head;
-	struct ibv_sge sges[pkts_n];
+	struct ibv_wc wcs[pkts_n];
+	struct ibv_recv_wr *wr_head = NULL;
+	struct ibv_recv_wr **wr_next = &wr_head;
+	struct ibv_recv_wr *wr_bad = NULL;
 	unsigned int i;
 	unsigned int pkts_ret = 0;
 	int ret;
 
-	for (i = 0; (i != pkts_n); ++i) {
+	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+	if (unlikely(ret == 0))
+		return 0;
+	if (unlikely(ret < 0)) {
+		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+		      (void *)rxq, ret);
+		return 0;
+	}
+	assert(ret <= (int)pkts_n);
+	/* For each work completion. */
+	for (i = 0; i != (unsigned int)ret; ++i) {
+		struct ibv_wc *wc = &wcs[i];
 		struct rxq_elt *elt = &(*elts)[elts_head];
 		struct ibv_recv_wr *wr = &elt->wr;
 		uint64_t wr_id = wr->wr_id;
-		unsigned int len;
+		uint32_t len = wc->byte_len;
 		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
 			WR_ID(wr_id).offset);
 		struct rte_mbuf *rep;
-		uint32_t flags;
 
 		/* Sanity checks. */
 		assert(WR_ID(wr_id).id < rxq->elts_n);
+		assert(wr_id == wc->wr_id);
 		assert(wr->sg_list == &elt->sge);
 		assert(wr->num_sge == 1);
 		assert(elts_head < rxq->elts_n);
@@ -1703,41 +1693,19 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		 */
 		rte_mbuf_prefetch_part1(seg);
 		rte_mbuf_prefetch_part2(seg);
-		ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
-						    &flags);
-		if (unlikely(ret < 0)) {
-			struct ibv_wc wc;
-			int wcs_n;
-
-			DEBUG("rxq=%p, poll_length() failed (ret=%d)",
-			      (void *)rxq, ret);
-			/* ibv_poll_cq() must be used in case of failure. */
-			wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
-			if (unlikely(wcs_n == 0))
-				break;
-			if (unlikely(wcs_n < 0)) {
-				DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
-				      (void *)rxq, wcs_n);
-				break;
-			}
-			assert(wcs_n == 1);
-			if (unlikely(wc.status != IBV_WC_SUCCESS)) {
-				/* Whatever, just repost the offending WR. */
-				DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
-				      " completion status (%d): %s",
-				      (void *)rxq, wc.wr_id, wc.status,
-				      ibv_wc_status_str(wc.status));
-				/* Increment dropped packets counter. */
-				++rxq->stats.idropped;
-				/* Add SGE to array for repost. */
-				sges[i] = elt->sge;
-				goto repost;
-			}
-			ret = wc.byte_len;
+		/* Link completed WRs together for repost. */
+		*wr_next = wr;
+		wr_next = &wr->next;
+		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
+			/* Whatever, just repost the offending WR. */
+			DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
+			      " status (%d): %s",
+			      (void *)rxq, wr_id, wc->status,
+			      ibv_wc_status_str(wc->status));
+			/* Increment dropped packets counter. */
+			++rxq->stats.idropped;
+			goto repost;
 		}
-		if (ret == 0)
-			break;
-		len = ret;
 		rep = rte_mbuf_raw_alloc(rxq->mp);
 		if (unlikely(rep == NULL)) {
 			/*
@@ -1750,8 +1718,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			/* Increase out of memory counters. */
 			++rxq->stats.rx_nombuf;
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-			/* Add SGE to array for repost. */
-			sges[i] = elt->sge;
 			goto repost;
 		}
 
@@ -1763,9 +1729,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 (uintptr_t)rep);
 		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
 
-		/* Add SGE to array for repost. */
-		sges[i] = elt->sge;
-
 		/* Update seg information. */
 		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
 		NB_SEGS(seg) = 1;
@@ -1789,7 +1752,9 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	if (unlikely(i == 0))
 		return 0;
 	/* Repost WRs. */
-	ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+	*wr_next = NULL;
+	assert(wr_head);
+	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
 	if (unlikely(ret)) {
 		/* Inability to repost WRs is fatal. */
 		DEBUG("%p: recv_burst(): failed (ret=%d)",
@@ -1870,10 +1835,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 		.socket = socket
 	};
 	struct ibv_qp_attr mod;
-	union {
-		struct ibv_exp_query_intf_params params;
-	} attr;
-	enum ibv_exp_query_intf_status status;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
 	int ret = 0;
@@ -1975,28 +1936,6 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	/* Save port ID. */
 	tmpl.port_id = dev->data->port_id;
 	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_CQ,
-		.obj = tmpl.cq,
-	};
-	tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_cq == NULL) {
-		ERROR("%p: CQ interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
-	attr.params = (struct ibv_exp_query_intf_params){
-		.intf_scope = IBV_EXP_INTF_GLOBAL,
-		.intf = IBV_EXP_INTF_QP_BURST,
-		.obj = tmpl.qp,
-	};
-	tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
-	if (tmpl.if_qp == NULL) {
-		ERROR("%p: QP interface family query failed with status %d",
-		      (void *)dev, status);
-		goto error;
-	}
 	/* Clean up rxq in case we're reinitializing it. */
 	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
 	rxq_cleanup(rxq);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 8a9a678..772784f 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -158,8 +158,6 @@ struct rxq {
 	struct ibv_mr *mr; /* Memory Region (for mp). */
 	struct ibv_cq *cq; /* Completion Queue. */
 	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
-	struct ibv_exp_cq_family *if_cq; /* CQ interface. */
 	struct ibv_comp_channel *channel;
 	unsigned int port_id; /* Port ID for incoming packets. */
 	unsigned int elts_n; /* (*elts)[] length. */
-- 
2.1.4


* [PATCH v2 26/51] net/mlx4: simplify Rx buffer handling
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (24 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 25/51] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 27/51] net/mlx4: simplify link update function Adrien Mazarguil
                     ` (26 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Since the PMD temporarily uses a slower interface for Rx,
removing the WR ID hack to instead store mbuf pointers directly makes the
code simpler at no extra cost.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 79 ++++++--------------------------------------
 drivers/net/mlx4/mlx4.h |  2 +-
 2 files changed, 12 insertions(+), 69 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 79fb666..1208e7a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -84,17 +84,6 @@
 #define NB_SEGS(m) ((m)->nb_segs)
 #define PORT(m) ((m)->port)
 
-/* Work Request ID data type (64 bit). */
-typedef union {
-	struct {
-		uint32_t id;
-		uint16_t offset;
-	} data;
-	uint64_t raw;
-} wr_id_t;
-
-#define WR_ID(o) (((wr_id_t *)&(o))->data)
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -1403,13 +1392,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 			ret = ENOMEM;
 			goto error;
 		}
-		/* Configure WR. Work request ID contains its own index in
-		 * the elts array and the offset between SGE buffer header and
-		 * its data. */
-		WR_ID(wr->wr_id).id = i;
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)buf);
+		elt->buf = buf;
 		wr->next = &(*elts)[(i + 1)].wr;
 		wr->sg_list = sge;
 		wr->num_sge = 1;
@@ -1427,18 +1410,6 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 		sge->lkey = rxq->mr->lkey;
 		/* Redundant check for tailroom. */
 		assert(sge->length == rte_pktmbuf_tailroom(buf));
-		/* Make sure elts index and SGE mbuf pointer can be deduced
-		 * from WR ID. */
-		if ((WR_ID(wr->wr_id).id != i) ||
-		    ((void *)((uintptr_t)sge->addr -
-			WR_ID(wr->wr_id).offset) != buf)) {
-			ERROR("%p: cannot store index and offset in WR ID",
-			      (void *)rxq);
-			sge->addr = 0;
-			rte_pktmbuf_free(buf);
-			ret = EOVERFLOW;
-			goto error;
-		}
 	}
 	/* The last WR pointer must be NULL. */
 	(*elts)[(i - 1)].wr.next = NULL;
@@ -1451,17 +1422,8 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	return 0;
 error:
 	if (elts != NULL) {
-		for (i = 0; (i != elemof(*elts)); ++i) {
-			struct rxq_elt *elt = &(*elts)[i];
-			struct rte_mbuf *buf;
-
-			if (elt->sge.addr == 0)
-				continue;
-			assert(WR_ID(elt->wr.wr_id).id == i);
-			buf = (void *)((uintptr_t)elt->sge.addr -
-				WR_ID(elt->wr.wr_id).offset);
-			rte_pktmbuf_free_seg(buf);
-		}
+		for (i = 0; (i != elemof(*elts)); ++i)
+			rte_pktmbuf_free_seg((*elts)[i].buf);
 		rte_free(elts);
 	}
 	DEBUG("%p: failed, freed everything", (void *)rxq);
@@ -1487,17 +1449,8 @@ rxq_free_elts(struct rxq *rxq)
 	rxq->elts = NULL;
 	if (elts == NULL)
 		return;
-	for (i = 0; (i != elemof(*elts)); ++i) {
-		struct rxq_elt *elt = &(*elts)[i];
-		struct rte_mbuf *buf;
-
-		if (elt->sge.addr == 0)
-			continue;
-		assert(WR_ID(elt->wr.wr_id).id == i);
-		buf = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(elt->wr.wr_id).offset);
-		rte_pktmbuf_free_seg(buf);
-	}
+	for (i = 0; (i != elemof(*elts)); ++i)
+		rte_pktmbuf_free_seg((*elts)[i].buf);
 	rte_free(elts);
 }
 
@@ -1674,15 +1627,11 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		struct ibv_wc *wc = &wcs[i];
 		struct rxq_elt *elt = &(*elts)[elts_head];
 		struct ibv_recv_wr *wr = &elt->wr;
-		uint64_t wr_id = wr->wr_id;
 		uint32_t len = wc->byte_len;
-		struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
-			WR_ID(wr_id).offset);
+		struct rte_mbuf *seg = elt->buf;
 		struct rte_mbuf *rep;
 
 		/* Sanity checks. */
-		assert(WR_ID(wr_id).id < rxq->elts_n);
-		assert(wr_id == wc->wr_id);
 		assert(wr->sg_list == &elt->sge);
 		assert(wr->num_sge == 1);
 		assert(elts_head < rxq->elts_n);
@@ -1698,9 +1647,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		wr_next = &wr->next;
 		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
 			/* Whatever, just repost the offending WR. */
-			DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
-			      " status (%d): %s",
-			      (void *)rxq, wr_id, wc->status,
+			DEBUG("rxq=%p: bad work completion status (%d): %s",
+			      (void *)rxq, wc->status,
 			      ibv_wc_status_str(wc->status));
 			/* Increment dropped packets counter. */
 			++rxq->stats.idropped;
@@ -1712,9 +1660,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			 * Unable to allocate a replacement mbuf,
 			 * repost WR.
 			 */
-			DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
-			      " can't allocate a new mbuf",
-			      (void *)rxq, WR_ID(wr_id).id);
+			DEBUG("rxq=%p: can't allocate a new mbuf",
+			      (void *)rxq);
 			/* Increase out of memory counters. */
 			++rxq->stats.rx_nombuf;
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
@@ -1724,10 +1671,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		/* Reconfigure sge to use rep instead of seg. */
 		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
 		assert(elt->sge.lkey == rxq->mr->lkey);
-		WR_ID(wr->wr_id).offset =
-			(((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
-			 (uintptr_t)rep);
-		assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+		elt->buf = rep;
 
 		/* Update seg information. */
 		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
@@ -3659,7 +3603,6 @@ RTE_INIT(rte_mlx4_pmd_init);
 static void
 rte_mlx4_pmd_init(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(wr_id_t) != sizeof(uint64_t));
 	/*
 	 * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
 	 * huge pages. Calling ibv_fork_init() during init allows
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 772784f..c619c87 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -148,7 +148,7 @@ struct mlx4_rxq_stats {
 struct rxq_elt {
 	struct ibv_recv_wr wr; /* Work Request. */
 	struct ibv_sge sge; /* Scatter/Gather Element. */
-	/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+	struct rte_mbuf *buf; /**< Buffer. */
 };
 
 /* RX queue descriptor. */
-- 
2.1.4


* [PATCH v2 27/51] net/mlx4: simplify link update function
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (25 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 26/51] net/mlx4: simplify Rx buffer handling Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 28/51] net/mlx4: standardize on negative errno values Adrien Mazarguil
                     ` (25 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Returning a different value when the current link status differs from the
previous one was probably useful at some point in the past but is now
meaningless; this value is ignored both internally (mlx4 PMD) and
externally (ethdev wrapper).

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 1208e7a..9694170 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2476,13 +2476,8 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
 	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
 			ETH_LINK_SPEED_FIXED);
-	if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
-		/* Link status changed. */
-		dev->data->dev_link = dev_link;
-		return 0;
-	}
-	/* Link status is still the same. */
-	return -1;
+	dev->data->dev_link = dev_link;
+	return 0;
 }
 
 /**
-- 
2.1.4


* [PATCH v2 28/51] net/mlx4: standardize on negative errno values
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (26 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 27/51] net/mlx4: simplify link update function Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 29/51] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
                     ` (24 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Due to its reliance on system calls, the mlx4 PMD uses positive errno
values internally and negative ones at the ethdev API border. Although most
internal functions are documented, this mixed design is unusual and prone
to mistakes (e.g. the flow API implementation uses negative values
exclusively).

Standardize on negative errno values and rely on rte_errno instead of
errno in all functions.
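
The resulting convention can be sketched as follows. This is a minimal
illustration, not code from the PMD: fake_rte_errno is a hypothetical
stand-in for rte_errno so the sketch builds without DPDK, read_file() is
an invented helper, and unlike priv_sysfs_read() it returns the number of
bytes actually read.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for DPDK's rte_errno. */
static int fake_rte_errno;

/*
 * Read up to size bytes from path into buf.
 * Returns the number of bytes read on success, a negative errno value
 * otherwise, in which case fake_rte_errno holds the positive value.
 */
static int
read_file(const char *path, char *buf, size_t size)
{
	FILE *file;
	size_t n;
	int ret;

	assert(buf != NULL || size == 0);
	file = fopen(path, "rb");
	if (file == NULL) {
		/* Capture errno right away, before another call changes it. */
		fake_rte_errno = errno;
		return -fake_rte_errno;
	}
	n = fread(buf, 1, size, file);
	if (n < size && ferror(file)) {
		fake_rte_errno = EIO;
		ret = -fake_rte_errno;
	} else {
		ret = (int)n;
	}
	fclose(file);
	return ret;
}
```

Callers can then test "ret < 0" and propagate ret unchanged, without
saving and restoring errno around cleanup calls.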

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 483 ++++++++++++++++++++++++-------------------
 1 file changed, 273 insertions(+), 210 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 9694170..fbb3d73 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -148,7 +148,7 @@ void priv_unlock(struct priv *priv)
  *   Interface name output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
@@ -163,8 +163,10 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
 
 		dir = opendir(path);
-		if (dir == NULL)
-			return -1;
+		if (dir == NULL) {
+			rte_errno = errno;
+			return -rte_errno;
+		}
 	}
 	while ((dent = readdir(dir)) != NULL) {
 		char *name = dent->d_name;
@@ -214,8 +216,10 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 			snprintf(match, sizeof(match), "%s", name);
 	}
 	closedir(dir);
-	if (match[0] == '\0')
-		return -1;
+	if (match[0] == '\0') {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
 	strncpy(*ifname, match, sizeof(*ifname));
 	return 0;
 }
@@ -233,7 +237,8 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
  *   Buffer size.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   Number of bytes read on success, negative errno value otherwise and
+ *   rte_errno is set.
  */
 static int
 priv_sysfs_read(const struct priv *priv, const char *entry,
@@ -242,25 +247,27 @@ priv_sysfs_read(const struct priv *priv, const char *entry,
 	char ifname[IF_NAMESIZE];
 	FILE *file;
 	int ret;
-	int err;
 
-	if (priv_get_ifname(priv, &ifname))
-		return -1;
+	ret = priv_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
 
 	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
 	      ifname, entry);
 
 	file = fopen(path, "rb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	ret = fread(buf, 1, size, file);
-	err = errno;
-	if (((size_t)ret < size) && (ferror(file)))
-		ret = -1;
-	else
+	if ((size_t)ret < size && ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
 		ret = size;
+	}
 	fclose(file);
-	errno = err;
 	return ret;
 }
 
@@ -277,7 +284,8 @@ priv_sysfs_read(const struct priv *priv, const char *entry,
  *   Buffer size.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   Number of bytes written on success, negative errno value otherwise and
+ *   rte_errno is set.
  */
 static int
 priv_sysfs_write(const struct priv *priv, const char *entry,
@@ -286,25 +294,27 @@ priv_sysfs_write(const struct priv *priv, const char *entry,
 	char ifname[IF_NAMESIZE];
 	FILE *file;
 	int ret;
-	int err;
 
-	if (priv_get_ifname(priv, &ifname))
-		return -1;
+	ret = priv_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
 
 	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
 	      ifname, entry);
 
 	file = fopen(path, "wb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	ret = fwrite(buf, 1, size, file);
-	err = errno;
-	if (((size_t)ret < size) || (ferror(file)))
-		ret = -1;
-	else
+	if ((size_t)ret < size || ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
 		ret = size;
+	}
 	fclose(file);
-	errno = err;
 	return ret;
 }
 
@@ -319,7 +329,7 @@ priv_sysfs_write(const struct priv *priv, const char *entry,
  *   Value output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
@@ -329,18 +339,19 @@ priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
 	char value_str[32];
 
 	ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret == -1) {
+	if (ret < 0) {
 		DEBUG("cannot read %s value from sysfs: %s",
-		      name, strerror(errno));
-		return -1;
+		      name, strerror(rte_errno));
+		return ret;
 	}
 	value_str[ret] = '\0';
 	errno = 0;
 	value_ret = strtoul(value_str, NULL, 0);
 	if (errno) {
+		rte_errno = errno;
 		DEBUG("invalid %s value `%s': %s", name, value_str,
-		      strerror(errno));
-		return -1;
+		      strerror(rte_errno));
+		return -rte_errno;
 	}
 	*value = value_ret;
 	return 0;
@@ -357,7 +368,7 @@ priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
  *   Value to set.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
@@ -366,10 +377,10 @@ priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
 	MKSTR(value_str, "%lu", value);
 
 	ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret == -1) {
+	if (ret < 0) {
 		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
-		      name, value_str, value, strerror(errno));
-		return -1;
+		      name, value_str, value, strerror(rte_errno));
+		return ret;
 	}
 	return 0;
 }
@@ -385,18 +396,23 @@ priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
  *   Interface request structure output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
 {
 	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-	int ret = -1;
+	int ret;
 
-	if (sock == -1)
-		return ret;
-	if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
-		ret = ioctl(sock, req, ifr);
+	if (sock == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = priv_get_ifname(priv, &ifr->ifr_name);
+	if (!ret && ioctl(sock, req, ifr) == -1) {
+		rte_errno = errno;
+		ret = -rte_errno;
+	}
 	close(sock);
 	return ret;
 }
@@ -410,15 +426,16 @@ priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
  *   MTU value output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_mtu(struct priv *priv, uint16_t *mtu)
 {
-	unsigned long ulong_mtu;
+	unsigned long ulong_mtu = 0;
+	int ret = priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
 
-	if (priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu) == -1)
-		return -1;
+	if (ret)
+		return ret;
 	*mtu = ulong_mtu;
 	return 0;
 }
@@ -432,20 +449,23 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
  *   MTU value to set.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_mtu(struct priv *priv, uint16_t mtu)
 {
 	uint16_t new_mtu;
+	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
 
-	if (priv_set_sysfs_ulong(priv, "mtu", mtu) ||
-	    priv_get_mtu(priv, &new_mtu))
-		return -1;
+	if (ret)
+		return ret;
+	ret = priv_get_mtu(priv, &new_mtu);
+	if (ret)
+		return ret;
 	if (new_mtu == mtu)
 		return 0;
-	errno = EINVAL;
-	return -1;
+	rte_errno = EINVAL;
+	return -rte_errno;
 }
 
 /**
@@ -459,15 +479,16 @@ priv_set_mtu(struct priv *priv, uint16_t mtu)
  *   Bitmask for flags to modify.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
 {
-	unsigned long tmp;
+	unsigned long tmp = 0;
+	int ret = priv_get_sysfs_ulong(priv, "flags", &tmp);
 
-	if (priv_get_sysfs_ulong(priv, "flags", &tmp) == -1)
-		return -1;
+	if (ret)
+		return ret;
 	tmp &= keep;
 	tmp |= (flags & (~keep));
 	return priv_set_sysfs_ulong(priv, "flags", tmp);
@@ -502,7 +523,7 @@ priv_mac_addr_del(struct priv *priv);
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 dev_configure(struct rte_eth_dev *dev)
@@ -533,7 +554,7 @@ dev_configure(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
@@ -543,9 +564,8 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 
 	priv_lock(priv);
 	ret = dev_configure(dev);
-	assert(ret >= 0);
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
@@ -562,7 +582,7 @@ static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
  *   Number of elements to allocate.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 txq_alloc_elts(struct txq *txq, unsigned int elts_n)
@@ -601,7 +621,8 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 
 	DEBUG("%p: failed, freed everything", (void *)txq);
 	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	return -rte_errno;
 }
 
 /**
@@ -795,7 +816,7 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
  *   Pointer to memory pool.
  *
  * @return
- *   Memory region pointer, NULL in case of error.
+ *   Memory region pointer, NULL in case of error and rte_errno is set.
  */
 static struct ibv_mr *
 mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
@@ -804,8 +825,10 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	uintptr_t start;
 	uintptr_t end;
 	unsigned int i;
+	struct ibv_mr *mr;
 
 	if (mlx4_check_mempool(mp, &start, &end) != 0) {
+		rte_errno = EINVAL;
 		ERROR("mempool %p: not virtually contiguous",
 			(void *)mp);
 		return NULL;
@@ -828,10 +851,13 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
 	      (void *)mp, (void *)start, (void *)end,
 	      (size_t)(end - start));
-	return ibv_reg_mr(pd,
-			  (void *)start,
-			  end - start,
-			  IBV_ACCESS_LOCAL_WRITE);
+	mr = ibv_reg_mr(pd,
+			(void *)start,
+			end - start,
+			IBV_ACCESS_LOCAL_WRITE);
+	if (!mr)
+		rte_errno = errno ? errno : EINVAL;
+	return mr;
 }
 
 /**
@@ -1144,7 +1170,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   Thresholds parameters.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
@@ -1159,21 +1185,24 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 		struct ibv_qp_init_attr init;
 		struct ibv_qp_attr mod;
 	} attr;
-	int ret = 0;
+	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL)
-		return EINVAL;
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		goto error;
+	}
 	if (desc == 0) {
+		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
-		return EINVAL;
+		goto error;
 	}
 	/* MRs will be registered in mp2mr[] later. */
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
 	if (tmpl.cq == NULL) {
-		ret = ENOMEM;
+		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	DEBUG("priv->device_attr.max_qp_wr is %d",
@@ -1201,9 +1230,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
 	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* ibv_create_qp() updates this value. */
@@ -1216,14 +1245,16 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = txq_alloc_elts(&tmpl, desc);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	attr.mod = (struct ibv_qp_attr){
@@ -1231,15 +1262,17 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	attr.mod.qp_state = IBV_QPS_RTS;
 	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* Clean up txq in case we're reinitializing it. */
@@ -1249,12 +1282,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
 	/* Pre-register known mempools. */
 	rte_mempool_walk(txq_mp2mr_iter, txq);
-	assert(ret == 0);
 	return 0;
 error:
+	ret = rte_errno;
 	txq_cleanup(&tmpl);
-	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1272,7 +1306,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
  *   Thresholds parameters.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
@@ -1286,27 +1320,30 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->txqs_n) {
+		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->txqs_n);
 		priv_unlock(priv);
-		return -EOVERFLOW;
+		return -rte_errno;
 	}
 	if (txq != NULL) {
 		DEBUG("%p: reusing already allocated queue index %u (%p)",
 		      (void *)dev, idx, (void *)txq);
 		if (priv->started) {
+			rte_errno = EEXIST;
 			priv_unlock(priv);
-			return -EEXIST;
+			return -rte_errno;
 		}
 		(*priv->txqs)[idx] = NULL;
 		txq_cleanup(txq);
 	} else {
 		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
 		if (txq == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
 			priv_unlock(priv);
-			return -ENOMEM;
+			return -rte_errno;
 		}
 	}
 	ret = txq_setup(dev, txq, desc, socket, conf);
@@ -1321,7 +1358,7 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -1364,7 +1401,7 @@ mlx4_tx_queue_release(void *dpdk_txq)
  *   Number of elements to allocate.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
@@ -1373,11 +1410,10 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	struct rxq_elt (*elts)[elts_n] =
 		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
 				  rxq->socket);
-	int ret = 0;
 
 	if (elts == NULL) {
+		rte_errno = ENOMEM;
 		ERROR("%p: can't allocate packets array", (void *)rxq);
-		ret = ENOMEM;
 		goto error;
 	}
 	/* For each WR (packet). */
@@ -1388,8 +1424,8 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
 
 		if (buf == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("%p: empty mbuf pool", (void *)rxq);
-			ret = ENOMEM;
 			goto error;
 		}
 		elt->buf = buf;
@@ -1418,7 +1454,6 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	rxq->elts_n = elts_n;
 	rxq->elts_head = 0;
 	rxq->elts = elts;
-	assert(ret == 0);
 	return 0;
 error:
 	if (elts != NULL) {
@@ -1427,8 +1462,8 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 		rte_free(elts);
 	}
 	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(ret > 0);
-	return ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1485,7 +1520,7 @@ priv_mac_addr_del(struct priv *priv)
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_mac_addr_add(struct priv *priv)
@@ -1543,16 +1578,12 @@ priv_mac_addr_add(struct priv *priv)
 	      (void *)priv,
 	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
 	/* Create related flow. */
-	errno = 0;
 	flow = ibv_create_flow(rxq->qp, attr);
 	if (flow == NULL) {
-		/* It's not clear whether errno is always set in this case. */
+		rte_errno = errno ? errno : EINVAL;
 		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, errno,
-		      (errno ? strerror(errno) : "Unknown error"));
-		if (errno)
-			return errno;
-		return EINVAL;
+		      (void *)rxq, rte_errno, strerror(rte_errno));
+		return -rte_errno;
 	}
 	assert(priv->mac_flow == NULL);
 	priv->mac_flow = flow;
@@ -1724,11 +1755,12 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
  *   Number of descriptors in QP (hint only).
  *
  * @return
- *   QP pointer or NULL in case of error.
+ *   QP pointer or NULL in case of error and rte_errno is set.
  */
 static struct ibv_qp *
 rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 {
+	struct ibv_qp *qp;
 	struct ibv_qp_init_attr attr = {
 		/* CQ to be associated with the send queue. */
 		.send_cq = cq,
@@ -1745,7 +1777,10 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
 		.qp_type = IBV_QPT_RAW_PACKET,
 	};
 
-	return ibv_create_qp(priv->pd, &attr);
+	qp = ibv_create_qp(priv->pd, &attr);
+	if (!qp)
+		rte_errno = errno ? errno : EINVAL;
+	return qp;
 }
 
 /**
@@ -1765,7 +1800,7 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
  *   Memory pool for buffer allocations.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
@@ -1781,13 +1816,14 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	struct ibv_qp_attr mod;
 	struct ibv_recv_wr *bad_wr;
 	unsigned int mb_len;
-	int ret = 0;
+	int ret;
 
 	(void)conf; /* Thresholds configuration (ignored). */
 	mb_len = rte_pktmbuf_data_room_size(mp);
 	if (desc == 0) {
+		rte_errno = EINVAL;
 		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
-		return EINVAL;
+		goto error;
 	}
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
@@ -1809,26 +1845,26 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	/* Use the entire RX mempool as the memory region. */
 	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
 	if (tmpl.mr == NULL) {
-		ret = EINVAL;
+		rte_errno = EINVAL;
 		ERROR("%p: MR creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	if (dev->data->dev_conf.intr_conf.rxq) {
 		tmpl.channel = ibv_create_comp_channel(priv->ctx);
 		if (tmpl.channel == NULL) {
-			ret = ENOMEM;
+			rte_errno = ENOMEM;
 			ERROR("%p: Rx interrupt completion channel creation"
 			      " failure: %s",
-			      (void *)dev, strerror(ret));
+			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
 	}
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
-		ret = ENOMEM;
+		rte_errno = ENOMEM;
 		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	DEBUG("priv->device_attr.max_qp_wr is %d",
@@ -1837,9 +1873,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	      priv->device_attr.max_sge);
 	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
 	if (tmpl.qp == NULL) {
-		ret = (errno ? errno : EINVAL);
 		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	mod = (struct ibv_qp_attr){
@@ -1850,22 +1885,24 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = rxq_alloc_elts(&tmpl, desc);
 	if (ret) {
 		ERROR("%p: RXQ allocation failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
 		      (void *)dev,
 		      (void *)bad_wr,
-		      strerror(ret));
+		      strerror(rte_errno));
 		goto error;
 	}
 	mod = (struct ibv_qp_attr){
@@ -1873,8 +1910,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	};
 	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
 	if (ret) {
+		rte_errno = ret;
 		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(ret));
+		      (void *)dev, strerror(rte_errno));
 		goto error;
 	}
 	/* Save port ID. */
@@ -1885,12 +1923,13 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	rxq_cleanup(rxq);
 	*rxq = tmpl;
 	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	assert(ret == 0);
 	return 0;
 error:
+	ret = rte_errno;
 	rxq_cleanup(&tmpl);
-	assert(ret > 0);
-	return ret;
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
 }
 
 /**
@@ -1910,7 +1949,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
  *   Memory pool for buffer allocations.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
@@ -1925,17 +1964,19 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->rxqs_n) {
+		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->rxqs_n);
 		priv_unlock(priv);
-		return -EOVERFLOW;
+		return -rte_errno;
 	}
 	if (rxq != NULL) {
 		DEBUG("%p: reusing already allocated queue index %u (%p)",
 		      (void *)dev, idx, (void *)rxq);
 		if (priv->started) {
+			rte_errno = EEXIST;
 			priv_unlock(priv);
-			return -EEXIST;
+			return -rte_errno;
 		}
 		(*priv->rxqs)[idx] = NULL;
 		if (idx == 0)
@@ -1944,10 +1985,11 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	} else {
 		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
 		if (rxq == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
 			priv_unlock(priv);
-			return -ENOMEM;
+			return -rte_errno;
 		}
 	}
 	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
@@ -1962,7 +2004,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -2014,7 +2056,7 @@ priv_dev_link_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_start(struct rte_eth_dev *dev)
@@ -2063,7 +2105,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	priv_mac_addr_del(priv);
 	priv->started = 0;
 	priv_unlock(priv);
-	return -ret;
+	return ret;
 }
 
 /**
@@ -2228,7 +2270,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
  *   Nonzero for link up, otherwise link down.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_set_link(struct priv *priv, int up)
@@ -2258,7 +2300,7 @@ priv_set_link(struct priv *priv, int up)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_set_link_down(struct rte_eth_dev *dev)
@@ -2279,7 +2321,7 @@ mlx4_set_link_down(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  *
  * @return
- *   0 on success, errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_set_link_up(struct rte_eth_dev *dev)
@@ -2437,6 +2479,9 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
  *   Pointer to Ethernet device structure.
  * @param wait_to_complete
  *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
@@ -2451,12 +2496,14 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 
 	/* priv_lock() is not taken to allow concurrent calls. */
 
-	if (priv == NULL)
-		return -EINVAL;
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
 	(void)wait_to_complete;
 	if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
-		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
-		return -1;
+		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
+		return -rte_errno;
 	}
 	memset(&dev_link, 0, sizeof(dev_link));
 	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
@@ -2464,8 +2511,8 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	ifr.ifr_data = (void *)&edata;
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
-		     strerror(errno));
-		return -1;
+		     strerror(rte_errno));
+		return -rte_errno;
 	}
 	link_speed = ethtool_cmd_speed(&edata);
 	if (link_speed == -1)
@@ -2489,7 +2536,7 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
  *   New MTU.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
@@ -2500,9 +2547,9 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	} else
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
@@ -2522,7 +2569,7 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
  *   Flow control output buffer.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
@@ -2537,10 +2584,10 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	ifr.ifr_data = (void *)&ethpause;
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
 		     " failed: %s",
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	}
 
@@ -2570,7 +2617,7 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
  *   Flow control parameters.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
@@ -2598,10 +2645,10 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = errno;
+		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
 		     " failed: %s",
-		     strerror(ret));
+		     strerror(rte_errno));
 		goto out;
 	}
 	ret = 0;
@@ -2634,7 +2681,7 @@ const struct rte_flow_ops mlx4_flow_ops = {
  *   Pointer to operation-specific structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
@@ -2642,12 +2689,10 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_op filter_op,
 		     void *arg)
 {
-	int ret = EINVAL;
-
 	switch (filter_type) {
 	case RTE_ETH_FILTER_GENERIC:
 		if (filter_op != RTE_ETH_FILTER_GET)
-			return -EINVAL;
+			break;
 		*(const void **)arg = &mlx4_flow_ops;
 		return 0;
 	default:
@@ -2655,7 +2700,8 @@ mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
 		      (void *)dev, filter_type);
 		break;
 	}
-	return -ret;
+	rte_errno = ENOTSUP;
+	return -rte_errno;
 }
 
 static const struct eth_dev_ops mlx4_dev_ops = {
@@ -2690,7 +2736,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
  *   PCI bus address output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
@@ -2701,8 +2747,10 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
 	MKSTR(path, "%s/device/uevent", device->ibdev_path);
 
 	file = fopen(path, "rb");
-	if (file == NULL)
-		return -1;
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
 	while (fgets(line, sizeof(line), file) == line) {
 		size_t len = strlen(line);
 		int ret;
@@ -2740,15 +2788,16 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
  *   MAC address output buffer.
  *
  * @return
- *   0 on success, -1 on failure and errno is set.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 {
 	struct ifreq request;
+	int ret = priv_ifreq(priv, SIOCGIFHWADDR, &request);
 
-	if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
-		return -1;
+	if (ret)
+		return ret;
 	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
 	return 0;
 }
@@ -2886,7 +2935,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
@@ -2900,11 +2949,9 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 					   mlx4_dev_interrupt_handler,
 					   dev);
 	if (ret < 0) {
-		ERROR("rte_intr_callback_unregister failed with %d"
-		      "%s%s%s", ret,
-		      (errno ? " (errno: " : ""),
-		      (errno ? strerror(errno) : ""),
-		      (errno ? ")" : ""));
+		rte_errno = -ret;
+		ERROR("rte_intr_callback_unregister failed with %d %s",
+		      ret, strerror(rte_errno));
 	}
 	priv->intr_handle.fd = 0;
 	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -2919,7 +2966,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_interrupt_handler_install(struct priv *priv,
@@ -2939,10 +2986,11 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	flags = fcntl(priv->ctx->async_fd, F_GETFL);
 	rc = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
 	if (rc < 0) {
+		rte_errno = errno ? errno : EINVAL;
 		INFO("failed to change file descriptor async event queue");
 		dev->data->dev_conf.intr_conf.lsc = 0;
 		dev->data->dev_conf.intr_conf.rmv = 0;
-		return -errno;
+		return -rte_errno;
 	} else {
 		priv->intr_handle.fd = priv->ctx->async_fd;
 		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
@@ -2950,9 +2998,10 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 						 mlx4_dev_interrupt_handler,
 						 dev);
 		if (rc) {
+			rte_errno = -rc;
 			ERROR("rte_intr_callback_register failed "
-			      " (errno: %s)", strerror(errno));
-			return rc;
+			      " (rte_errno: %s)", strerror(rte_errno));
+			return -rte_errno;
 		}
 	}
 	return 0;
@@ -2966,7 +3015,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
@@ -2987,7 +3036,7 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error,
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
@@ -3005,7 +3054,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
 		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
 					 dev)) {
 			ERROR("rte_eal_alarm_cancel failed "
-			      " (errno: %s)", strerror(rte_errno));
+			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
 		}
 	priv->pending_alarm = 0;
@@ -3020,7 +3069,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_link_interrupt_handler_install(struct priv *priv,
@@ -3045,7 +3094,7 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  * @param dev
  *   Pointer to the rte_eth_dev structure.
  * @return
- *   0 on success, negative value on error.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_dev_removal_interrupt_handler_install(struct priv *priv,
@@ -3069,7 +3118,7 @@ priv_dev_removal_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 priv_rx_intr_vec_enable(struct priv *priv)
@@ -3085,9 +3134,10 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	priv_rx_intr_vec_disable(priv);
 	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
 	if (intr_handle->intr_vec == NULL) {
+		rte_errno = ENOMEM;
 		ERROR("failed to allocate memory for interrupt vector,"
 		      " Rx interrupts will not be supported");
-		return -ENOMEM;
+		return -rte_errno;
 	}
 	intr_handle->type = RTE_INTR_HANDLE_EXT;
 	for (i = 0; i != n; ++i) {
@@ -3105,20 +3155,22 @@ priv_rx_intr_vec_enable(struct priv *priv)
 			continue;
 		}
 		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
+			rte_errno = E2BIG;
 			ERROR("too many Rx queues for interrupt vector size"
 			      " (%d), Rx interrupts cannot be enabled",
 			      RTE_MAX_RXTX_INTR_VEC_ID);
 			priv_rx_intr_vec_disable(priv);
-			return -1;
+			return -rte_errno;
 		}
 		fd = rxq->channel->fd;
 		flags = fcntl(fd, F_GETFL);
 		rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
 		if (rc < 0) {
+			rte_errno = errno;
 			ERROR("failed to make Rx interrupt file descriptor"
 			      " %d non-blocking for queue index %d", fd, i);
 			priv_rx_intr_vec_disable(priv);
-			return rc;
+			return -rte_errno;
 		}
 		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
 		intr_handle->efds[count] = fd;
@@ -3157,7 +3209,7 @@ priv_rx_intr_vec_disable(struct priv *priv)
  *   Rx queue index.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
@@ -3170,8 +3222,10 @@ mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
 		ret = EINVAL;
 	else
 		ret = ibv_req_notify_cq(rxq->cq, 0);
-	if (ret)
+	if (ret) {
+		rte_errno = ret;
 		WARN("unable to arm interrupt on rx queue %d", idx);
+	}
 	return -ret;
 }
 
@@ -3184,7 +3238,7 @@ mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
  *   Rx queue index.
  *
  * @return
- *   0 on success, negative on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
@@ -3202,11 +3256,13 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
 		if (ret || ev_cq != rxq->cq)
 			ret = EINVAL;
 	}
-	if (ret)
+	if (ret) {
+		rte_errno = ret;
 		WARN("unable to disable interrupt on rx queue %d",
 		     idx);
-	else
+	} else {
 		ibv_ack_cq_events(rxq->cq, 1);
+	}
 	return -ret;
 }
 
@@ -3221,7 +3277,7 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
  *   Shared configuration data.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
@@ -3231,8 +3287,9 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 	errno = 0;
 	tmp = strtoul(val, NULL, 0);
 	if (errno) {
+		rte_errno = errno;
 		WARN("%s: \"%s\" is not a valid integer", key, val);
-		return -errno;
+		return -rte_errno;
 	}
 	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
 		uint32_t ports = rte_log2_u32(conf->ports.present);
@@ -3243,13 +3300,15 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
 			return -EINVAL;
 		}
 		if (!(conf->ports.present & (1 << tmp))) {
+			rte_errno = EINVAL;
 			ERROR("invalid port index %lu", tmp);
-			return -EINVAL;
+			return -rte_errno;
 		}
 		conf->ports.enabled |= 1 << tmp;
 	} else {
+		rte_errno = EINVAL;
 		WARN("%s: unknown parameter", key);
-		return -EINVAL;
+		return -rte_errno;
 	}
 	return 0;
 }
@@ -3261,7 +3320,7 @@ mlx4_arg_parse(const char *key, const char *val, struct mlx4_conf *conf)
  *   Device arguments structure.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
@@ -3275,8 +3334,9 @@ mlx4_args(struct rte_devargs *devargs, struct mlx4_conf *conf)
 		return 0;
 	kvlist = rte_kvargs_parse(devargs->args, pmd_mlx4_init_params);
 	if (kvlist == NULL) {
+		rte_errno = EINVAL;
 		ERROR("failed to parse kvargs");
-		return -EINVAL;
+		return -rte_errno;
 	}
 	/* Process parameters. */
 	for (i = 0; pmd_mlx4_init_params[i]; ++i) {
@@ -3312,7 +3372,7 @@ static struct rte_pci_driver mlx4_driver;
  *   PCI device information.
  *
  * @return
- *   0 on success, negative errno value on failure.
+ *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
 mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
@@ -3333,10 +3393,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
-		assert(errno);
-		if (errno == ENOSYS)
+		rte_errno = errno;
+		assert(rte_errno);
+		if (rte_errno == ENOSYS)
 			ERROR("cannot list devices, is ib_uverbs loaded?");
-		return -errno;
+		return -rte_errno;
 	}
 	assert(i >= 0);
 	/*
@@ -3367,20 +3428,23 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		ibv_free_device_list(list);
 		switch (err) {
 		case 0:
+			rte_errno = ENODEV;
 			ERROR("cannot access device, is mlx4_ib loaded?");
-			return -ENODEV;
+			return -rte_errno;
 		case EINVAL:
+			rte_errno = EINVAL;
 			ERROR("cannot use device, are drivers up to date?");
-			return -EINVAL;
+			return -rte_errno;
 		}
 		assert(err > 0);
-		return -err;
+		rte_errno = err;
+		return -rte_errno;
 	}
 	ibv_dev = list[i];
 
 	DEBUG("device opened");
 	if (ibv_query_device(attr_ctx, &device_attr)) {
-		err = ENODEV;
+		rte_errno = ENODEV;
 		goto error;
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
@@ -3388,7 +3452,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	conf.ports.present |= (UINT64_C(1) << device_attr.phys_port_cnt) - 1;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
 		ERROR("failed to process device arguments");
-		err = EINVAL;
+		rte_errno = EINVAL;
 		goto error;
 	}
 	/* Use all ports when none are defined */
@@ -3411,22 +3475,22 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
-			err = ENODEV;
+			rte_errno = ENODEV;
 			goto port_error;
 		}
 
 		/* Check port status. */
 		err = ibv_query_port(ctx, port, &port_attr);
 		if (err) {
-			ERROR("port query failed: %s", strerror(err));
-			err = ENODEV;
+			rte_errno = err;
+			ERROR("port query failed: %s", strerror(rte_errno));
 			goto port_error;
 		}
 
 		if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
+			rte_errno = ENOTSUP;
 			ERROR("port %d is not configured in Ethernet mode",
 			      port);
-			err = EINVAL;
 			goto port_error;
 		}
 
@@ -3438,8 +3502,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("PD allocation failure");
-			err = ENOMEM;
 			goto port_error;
 		}
 
@@ -3448,8 +3512,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 				   sizeof(*priv),
 				   RTE_CACHE_LINE_SIZE);
 		if (priv == NULL) {
+			rte_errno = ENOMEM;
 			ERROR("priv allocation failure");
-			err = ENOMEM;
 			goto port_error;
 		}
 
@@ -3463,8 +3527,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx4_en loaded?"
-			      " (errno: %s)", strerror(errno));
-			err = ENODEV;
+			      " (rte_errno: %s)", strerror(rte_errno));
 			goto port_error;
 		}
 		INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
@@ -3501,7 +3564,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 		if (eth_dev == NULL) {
 			ERROR("can not allocate rte ethdev");
-			err = ENOMEM;
+			rte_errno = ENOMEM;
 			goto port_error;
 		}
 
@@ -3559,8 +3622,8 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		claim_zero(ibv_close_device(attr_ctx));
 	if (list)
 		ibv_free_device_list(list);
-	assert(err >= 0);
-	return -err;
+	assert(rte_errno >= 0);
+	return -rte_errno;
 }
 
 static const struct rte_pci_id mlx4_pci_id_map[] = {
-- 
2.1.4
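
The error-handling convention this patch standardizes on (return 0 on success, otherwise set the error variable to a positive errno value and return its negation, preserving it across cleanup) can be sketched outside of DPDK as follows. This is only an illustration under stated assumptions: sketch_errno, sketch_cleanup() and sketch_setup() are hypothetical names standing in for rte_errno, txq_cleanup()/rxq_cleanup() and txq_setup()/rxq_setup(), and a plain static variable replaces the per-lcore rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for rte_errno (which is per-lcore in DPDK). */
static int sketch_errno;

/* Hypothetical cleanup helper; like any callee, it may clobber the error. */
static void
sketch_cleanup(void **res)
{
	free(*res);
	*res = NULL;
	sketch_errno = 0; /* Simulate a callee overwriting the error code. */
}

/*
 * Hypothetical setup function.
 * Returns 0 on success, negative errno value otherwise and sketch_errno
 * is set.
 */
static int
sketch_setup(unsigned int desc, void **out)
{
	void *res = NULL;
	int ret;

	if (desc == 0) {
		sketch_errno = EINVAL;
		goto error;
	}
	res = calloc(desc, sizeof(int));
	if (res == NULL) {
		sketch_errno = ENOMEM;
		goto error;
	}
	*out = res;
	return 0;
error:
	ret = sketch_errno; /* Save the error before cleanup can alter it. */
	sketch_cleanup(&res);
	sketch_errno = ret; /* Restore it for the caller. */
	assert(sketch_errno > 0);
	return -sketch_errno;
}
```

A caller can then test the return value alone (ret < 0) or read the error variable afterwards; both carry the same code, which is why wrappers such as mlx4_tx_queue_setup() can return ret unchanged instead of negating it.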


* [PATCH v2 29/51] net/mlx4: clean up coding style inconsistencies
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (27 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 28/51] net/mlx4: standardize on negative errno values Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 30/51] net/mlx4: remove control path locks Adrien Mazarguil
                     ` (23 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This addresses badly formatted comments and needless empty lines before
refactoring functions into different files.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 77 ++++++++++++++-------------------------
 drivers/net/mlx4/mlx4_flow.c |  1 -
 2 files changed, 28 insertions(+), 50 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fbb3d73..80b0e3b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -608,8 +608,10 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	txq->elts_head = 0;
 	txq->elts_tail = 0;
 	txq->elts_comp = 0;
-	/* Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring. */
+	/*
+	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+	 * at least 4 times per ring.
+	 */
 	txq->elts_comp_cd_init =
 		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
 		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
@@ -618,7 +620,6 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
 	return 0;
 error:
 	rte_free(elts);
-
 	DEBUG("%p: failed, freed everything", (void *)txq);
 	assert(ret > 0);
 	rte_errno = ret;
@@ -664,7 +665,6 @@ txq_free_elts(struct txq *txq)
 	rte_free(elts);
 }
 
-
 /**
  * Clean up a TX queue.
  *
@@ -755,7 +755,6 @@ static void mlx4_check_mempool_cb(struct rte_mempool *mp,
 
 	(void)mp;
 	(void)mem_idx;
-
 	/* It already failed, skip the next chunks. */
 	if (data->ret != 0)
 		return;
@@ -799,7 +798,6 @@ static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
 	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
 	*start = (uintptr_t)data.start;
 	*end = (uintptr_t)data.end;
-
 	return data.ret;
 }
 
@@ -833,7 +831,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 			(void *)mp);
 		return NULL;
 	}
-
 	DEBUG("mempool %p area start=%p end=%p size=%zu",
 	      (void *)mp, (void *)start, (void *)end,
 	      (size_t)(end - start));
@@ -960,8 +957,10 @@ txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
 	struct txq_mp2mr_mbuf_check_data *data = arg;
 	struct rte_mbuf *buf = obj;
 
-	/* Check whether mbuf structure fits element size and whether mempool
-	 * pointer is valid. */
+	/*
+	 * Check whether mbuf structure fits element size and whether mempool
+	 * pointer is valid.
+	 */
 	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
 		data->ret = -1;
 }
@@ -1224,8 +1223,10 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
-		/* Do *NOT* enable this, completions events are managed per
-		 * TX burst. */
+		/*
+		 * Do *NOT* enable this, completions events are managed per
+		 * Tx burst.
+		 */
 		.sq_sig_all = 0,
 	};
 	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
@@ -1698,12 +1699,10 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
 			goto repost;
 		}
-
 		/* Reconfigure sge to use rep instead of seg. */
 		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
 		assert(elt->sge.lkey == rxq->mr->lkey);
 		elt->buf = rep;
-
 		/* Update seg information. */
 		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
 		NB_SEGS(seg) = 1;
@@ -1713,7 +1712,6 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		DATA_LEN(seg) = len;
 		seg->packet_type = 0;
 		seg->ol_flags = 0;
-
 		/* Return packet. */
 		*(pkts++) = seg;
 		++pkts_ret;
@@ -2215,9 +2213,11 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
 	priv_mac_addr_del(priv);
-	/* Prevent crashes when queues are still in use. This is unfortunately
+	/*
+	 * Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
-	 * never release them before closing the device. */
+	 * never release them before closing the device.
+	 */
 	dev->rx_pkt_burst = removed_rx_burst;
 	dev->tx_pkt_burst = removed_tx_burst;
 	if (priv->rxqs != NULL) {
@@ -2334,6 +2334,7 @@ mlx4_set_link_up(struct rte_eth_dev *dev)
 	priv_unlock(priv);
 	return err;
 }
+
 /**
  * DPDK callback to get information about the device.
  *
@@ -2350,7 +2351,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	char ifname[IF_NAMESIZE];
 
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-
 	if (priv == NULL)
 		return;
 	priv_lock(priv);
@@ -2495,7 +2495,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	int link_speed = 0;
 
 	/* priv_lock() is not taken to allow concurrent calls. */
-
 	if (priv == NULL) {
 		rte_errno = EINVAL;
 		return -rte_errno;
@@ -2590,7 +2589,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		     strerror(rte_errno));
 		goto out;
 	}
-
 	fc_conf->autoneg = ethpause.autoneg;
 	if (ethpause.rx_pause && ethpause.tx_pause)
 		fc_conf->mode = RTE_FC_FULL;
@@ -2601,7 +2599,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	else
 		fc_conf->mode = RTE_FC_NONE;
 	ret = 0;
-
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -2636,13 +2633,11 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		ethpause.rx_pause = 1;
 	else
 		ethpause.rx_pause = 0;
-
 	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
 	    (fc_conf->mode & RTE_FC_TX_PAUSE))
 		ethpause.tx_pause = 1;
 	else
 		ethpause.tx_pause = 0;
-
 	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
@@ -2652,7 +2647,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		goto out;
 	}
 	ret = 0;
-
 out:
 	priv_unlock(priv);
 	assert(ret >= 0);
@@ -2886,8 +2880,8 @@ mlx4_dev_link_status_handler(void *arg)
 	ret = priv_dev_status_handler(priv, dev, &events);
 	priv_unlock(priv);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL,
-					      NULL);
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
+					      NULL, NULL);
 }
 
 /**
@@ -2934,6 +2928,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -2965,6 +2960,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -2975,8 +2971,9 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	int flags;
 	int rc;
 
-	/* Check whether the interrupt handler has already been installed
-	 * for either type of interrupt
+	/*
+	 * Check whether the interrupt handler has already been installed
+	 * for either type of interrupt.
 	 */
 	if (priv->intr_conf.lsc &&
 	    priv->intr_conf.rmv &&
@@ -3014,6 +3011,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3035,6 +3033,7 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3068,6 +3067,7 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3093,6 +3093,7 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  *   Pointer to private structure.
  * @param dev
  *   Pointer to the rte_eth_dev structure.
+ *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
@@ -3390,7 +3391,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	(void)pci_drv;
 	assert(pci_drv == &mlx4_driver);
-
 	list = ibv_get_device_list(&i);
 	if (list == NULL) {
 		rte_errno = errno;
@@ -3441,14 +3441,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return -rte_errno;
 	}
 	ibv_dev = list[i];
-
 	DEBUG("device opened");
 	if (ibv_query_device(attr_ctx, &device_attr)) {
 		rte_errno = ENODEV;
 		goto error;
 	}
 	INFO("%u port(s) detected", device_attr.phys_port_cnt);
-
 	conf.ports.present |= (UINT64_C(1) << device_attr.phys_port_cnt) - 1;
 	if (mlx4_args(pci_dev->device.devargs, &conf)) {
 		ERROR("failed to process device arguments");
@@ -3470,15 +3468,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* If port is not enabled, skip. */
 		if (!(conf.ports.enabled & (1 << i)))
 			continue;
-
 		DEBUG("using port %u", port);
-
 		ctx = ibv_open_device(ibv_dev);
 		if (ctx == NULL) {
 			rte_errno = ENODEV;
 			goto port_error;
 		}
-
 		/* Check port status. */
 		err = ibv_query_port(ctx, port, &port_attr);
 		if (err) {
@@ -3486,19 +3481,16 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("port query failed: %s", strerror(rte_errno));
 			goto port_error;
 		}
-
 		if (port_attr.link_layer != IBV_LINK_LAYER_ETHERNET) {
 			rte_errno = ENOTSUP;
 			ERROR("port %d is not configured in Ethernet mode",
 			      port);
 			goto port_error;
 		}
-
 		if (port_attr.state != IBV_PORT_ACTIVE)
 			DEBUG("port %d is not active: \"%s\" (%d)",
 			      port, ibv_port_state_str(port_attr.state),
 			      port_attr.state);
-
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
@@ -3506,7 +3498,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("PD allocation failure");
 			goto port_error;
 		}
-
 		/* from rte_ethdev.c */
 		priv = rte_zmalloc("ethdev private structure",
 				   sizeof(*priv),
@@ -3516,13 +3507,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			ERROR("priv allocation failure");
 			goto port_error;
 		}
-
 		priv->ctx = ctx;
 		priv->device_attr = device_attr;
 		priv->port = port;
 		priv->pd = pd;
 		priv->mtu = ETHER_MTU;
-
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
 		if (priv_get_mac(priv, &mac.addr_bytes)) {
@@ -3553,7 +3542,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		/* Get actual MTU if possible. */
 		priv_get_mtu(priv, &priv->mtu);
 		DEBUG("port %u MTU is %u", priv->port, priv->mtu);
-
 		/* from rte_ethdev.c */
 		{
 			char name[RTE_ETH_NAME_MAX_LEN];
@@ -3567,15 +3555,11 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			rte_errno = ENOMEM;
 			goto port_error;
 		}
-
 		eth_dev->data->dev_private = priv;
 		eth_dev->data->mac_addrs = &priv->mac;
 		eth_dev->device = &pci_dev->device;
-
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
-
 		eth_dev->device->driver = &mlx4_driver.driver;
-
 		/*
 		 * Copy and override interrupt handle to prevent it from
 		 * being shared between all ethdev instances of a given PCI
@@ -3584,11 +3568,9 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		 */
 		priv->intr_handle_dev = *eth_dev->intr_handle;
 		eth_dev->intr_handle = &priv->intr_handle_dev;
-
 		priv->dev = eth_dev;
 		eth_dev->dev_ops = &mlx4_dev_ops;
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
-
 		/* Bring Ethernet device up. */
 		DEBUG("forcing Ethernet interface up");
 		priv_set_flags(priv, ~IFF_UP, IFF_UP);
@@ -3596,7 +3578,6 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 			mlx4_link_update(eth_dev, 0);
 		continue;
-
 port_error:
 		rte_free(priv);
 		if (pd)
@@ -3609,14 +3590,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	}
 	if (i == device_attr.phys_port_cnt)
 		return 0;
-
 	/*
 	 * XXX if something went wrong in the loop above, there is a resource
 	 * leak (ctx, pd, priv, dpdk ethdev) but we can do nothing about it as
 	 * long as the dpdk does not provide a way to deallocate a ethdev and a
 	 * way to enumerate the registered ethdevs to free the previous ones.
 	 */
-
 error:
 	if (attr_ctx)
 		claim_zero(ibv_close_device(attr_ctx));
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 58d4698..7dcb059 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -835,7 +835,6 @@ priv_flow_create_action_queue(struct priv *priv,
 		goto error;
 	}
 	return rte_flow;
-
 error:
 	rte_free(rte_flow);
 	return NULL;
-- 
2.1.4

* [PATCH v2 30/51] net/mlx4: remove control path locks
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (28 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 29/51] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 31/51] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
                     ` (22 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Concurrent use of various control path functions (e.g. configuring a queue
and destroying it simultaneously) may lead to undefined behavior.

PMDs are not supposed to protect themselves from misbehaving applications,
and mlx4 is one of the few with internal locks on most control path
operations. This adds unnecessary complexity.

Leave this role to wrapper functions in ethdev.
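
For illustration only, here is a minimal sketch of what serializing control
operations outside the driver could look like. Everything in it is a
hypothetical stand-in (struct fake_port, fake_dev_start(), fake_dev_stop(),
app_port_start(), app_port_stop(), ctrl_lock); none of these names come from
this patch or from the DPDK API, and a plain pthread mutex is only assumed
for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a port's private data; not the mlx4 struct priv. */
struct fake_port {
	int started;
};

/* Lock owned by the caller-side layer, not by the driver. */
static pthread_mutex_t ctrl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Driver-style callback: takes no lock, assumes calls are serialized. */
static int
fake_dev_start(struct fake_port *port)
{
	assert(port != NULL);
	if (port->started)
		return 0;
	port->started = 1;
	return 0;
}

/* Driver-style callback: takes no lock, assumes calls are serialized. */
static void
fake_dev_stop(struct fake_port *port)
{
	assert(port != NULL);
	port->started = 0;
}

/* Caller-side wrapper: the only place where the lock is taken. */
static int
app_port_start(struct fake_port *port)
{
	int ret;

	pthread_mutex_lock(&ctrl_lock);
	ret = fake_dev_start(port);
	pthread_mutex_unlock(&ctrl_lock);
	return ret;
}

/* Caller-side wrapper: the only place where the lock is taken. */
static void
app_port_stop(struct fake_port *port)
{
	pthread_mutex_lock(&ctrl_lock);
	fake_dev_stop(port);
	pthread_mutex_unlock(&ctrl_lock);
}
```

With this split, the driver-style callbacks stay free of lock/unlock pairs on
every return path, which is the simplification this patch is after.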

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      | 90 +++------------------------------------
 drivers/net/mlx4/mlx4.h      |  4 --
 drivers/net/mlx4/mlx4_flow.c | 15 +------
 3 files changed, 6 insertions(+), 103 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 80b0e3b..dc8a96f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -59,7 +59,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_spinlock.h>
 #include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
@@ -110,29 +109,6 @@ priv_rx_intr_vec_enable(struct priv *priv);
 static void
 priv_rx_intr_vec_disable(struct priv *priv);
 
-/**
- * Lock private structure to protect it from concurrent access in the
- * control path.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void priv_lock(struct priv *priv)
-{
-	rte_spinlock_lock(&priv->lock);
-}
-
-/**
- * Unlock private structure.
- *
- * @param priv
- *   Pointer to private structure.
- */
-void priv_unlock(struct priv *priv)
-{
-	rte_spinlock_unlock(&priv->lock);
-}
-
 /* Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
@@ -559,13 +535,7 @@ dev_configure(struct rte_eth_dev *dev)
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
-	struct priv *priv = dev->data->dev_private;
-	int ret;
-
-	priv_lock(priv);
-	ret = dev_configure(dev);
-	priv_unlock(priv);
-	return ret;
+	return dev_configure(dev);
 }
 
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
@@ -1317,14 +1287,12 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct txq *txq = (*priv->txqs)[idx];
 	int ret;
 
-	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->txqs_n) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->txqs_n);
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	if (txq != NULL) {
@@ -1332,7 +1300,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, idx, (void *)txq);
 		if (priv->started) {
 			rte_errno = EEXIST;
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 		(*priv->txqs)[idx] = NULL;
@@ -1343,7 +1310,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 	}
@@ -1358,7 +1324,6 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		/* Update send callback. */
 		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -1378,7 +1343,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 	if (txq == NULL)
 		return;
 	priv = txq->priv;
-	priv_lock(priv);
 	for (i = 0; (i != priv->txqs_n); ++i)
 		if ((*priv->txqs)[i] == txq) {
 			DEBUG("%p: removing TX queue %p from list",
@@ -1388,7 +1352,6 @@ mlx4_tx_queue_release(void *dpdk_txq)
 		}
 	txq_cleanup(txq);
 	rte_free(txq);
-	priv_unlock(priv);
 }
 
 /* RX queues handling. */
@@ -1958,14 +1921,12 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 	struct rxq *rxq = (*priv->rxqs)[idx];
 	int ret;
 
-	priv_lock(priv);
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
 	if (idx >= priv->rxqs_n) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
 		      (void *)dev, idx, priv->rxqs_n);
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	if (rxq != NULL) {
@@ -1973,7 +1934,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		      (void *)dev, idx, (void *)rxq);
 		if (priv->started) {
 			rte_errno = EEXIST;
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 		(*priv->rxqs)[idx] = NULL;
@@ -1986,7 +1946,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = ENOMEM;
 			ERROR("%p: unable to allocate queue index %u",
 			      (void *)dev, idx);
-			priv_unlock(priv);
 			return -rte_errno;
 		}
 	}
@@ -2001,7 +1960,6 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		/* Update receive callback. */
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -2021,7 +1979,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	if (rxq == NULL)
 		return;
 	priv = rxq->priv;
-	priv_lock(priv);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",
@@ -2033,7 +1990,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 		}
 	rxq_cleanup(rxq);
 	rte_free(rxq);
-	priv_unlock(priv);
 }
 
 static int
@@ -2062,11 +2018,8 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	priv_lock(priv);
-	if (priv->started) {
-		priv_unlock(priv);
+	if (priv->started)
 		return 0;
-	}
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
 	ret = priv_mac_addr_add(priv);
@@ -2096,13 +2049,11 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		      (void *)dev, strerror(ret));
 		goto err;
 	}
-	priv_unlock(priv);
 	return 0;
 err:
 	/* Rollback. */
 	priv_mac_addr_del(priv);
 	priv->started = 0;
-	priv_unlock(priv);
 	return ret;
 }
 
@@ -2119,16 +2070,12 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 
-	priv_lock(priv);
-	if (!priv->started) {
-		priv_unlock(priv);
+	if (!priv->started)
 		return;
-	}
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
 	priv_mac_addr_del(priv);
-	priv_unlock(priv);
 }
 
 /**
@@ -2208,7 +2155,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
@@ -2257,7 +2203,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	priv_dev_removal_interrupt_handler_uninstall(priv, dev);
 	priv_dev_link_interrupt_handler_uninstall(priv, dev);
 	priv_rx_intr_vec_disable(priv);
-	priv_unlock(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2306,12 +2251,8 @@ static int
 mlx4_set_link_down(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	int err;
 
-	priv_lock(priv);
-	err = priv_set_link(priv, 0);
-	priv_unlock(priv);
-	return err;
+	return priv_set_link(priv, 0);
 }
 
 /**
@@ -2327,12 +2268,8 @@ static int
 mlx4_set_link_up(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	int err;
 
-	priv_lock(priv);
-	err = priv_set_link(priv, 1);
-	priv_unlock(priv);
-	return err;
+	return priv_set_link(priv, 1);
 }
 
 /**
@@ -2353,7 +2290,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
@@ -2380,7 +2316,6 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 			ETH_LINK_SPEED_20G |
 			ETH_LINK_SPEED_40G |
 			ETH_LINK_SPEED_56G;
-	priv_unlock(priv);
 }
 
 /**
@@ -2401,7 +2336,6 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	/* Add software counters. */
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
@@ -2436,7 +2370,6 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 		tmp.oerrors += txq->stats.odropped;
 	}
 	*stats = tmp;
-	priv_unlock(priv);
 }
 
 /**
@@ -2454,7 +2387,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 
 	if (priv == NULL)
 		return;
-	priv_lock(priv);
 	for (i = 0; (i != priv->rxqs_n); ++i) {
 		if ((*priv->rxqs)[i] == NULL)
 			continue;
@@ -2469,7 +2401,6 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
 		(*priv->txqs)[i]->stats =
 			(struct mlx4_txq_stats){ .idx = idx };
 	}
-	priv_unlock(priv);
 }
 
 /**
@@ -2494,7 +2425,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 	struct rte_eth_link dev_link;
 	int link_speed = 0;
 
-	/* priv_lock() is not taken to allow concurrent calls. */
 	if (priv == NULL) {
 		rte_errno = EINVAL;
 		return -rte_errno;
@@ -2543,7 +2473,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 	struct priv *priv = dev->data->dev_private;
 	int ret = 0;
 
-	priv_lock(priv);
 	/* Set kernel interface MTU first. */
 	if (priv_set_mtu(priv, mtu)) {
 		ret = rte_errno;
@@ -2554,7 +2483,6 @@ mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
 	priv->mtu = mtu;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2581,7 +2509,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	int ret;
 
 	ifr.ifr_data = (void *)&ethpause;
-	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
@@ -2600,7 +2527,6 @@ mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		fc_conf->mode = RTE_FC_NONE;
 	ret = 0;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2638,7 +2564,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 		ethpause.tx_pause = 1;
 	else
 		ethpause.tx_pause = 0;
-	priv_lock(priv);
 	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
 		ret = rte_errno;
 		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
@@ -2648,7 +2573,6 @@ mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 	}
 	ret = 0;
 out:
-	priv_unlock(priv);
 	assert(ret >= 0);
 	return -ret;
 }
@@ -2874,11 +2798,9 @@ mlx4_dev_link_status_handler(void *arg)
 	uint32_t events;
 	int ret;
 
-	priv_lock(priv);
 	assert(priv->pending_alarm == 1);
 	priv->pending_alarm = 0;
 	ret = priv_dev_status_handler(priv, dev, &events);
-	priv_unlock(priv);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
 					      NULL, NULL);
@@ -2901,9 +2823,7 @@ mlx4_dev_interrupt_handler(void *cb_arg)
 	uint32_t ev;
 	int i;
 
-	priv_lock(priv);
 	ret = priv_dev_status_handler(priv, dev, &ev);
-	priv_unlock(priv);
 	if (ret > 0) {
 		for (i = RTE_ETH_EVENT_UNKNOWN;
 		     i < RTE_ETH_EVENT_MAX;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index c619c87..c840e27 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -229,10 +229,6 @@ struct priv {
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-	rte_spinlock_t lock; /* Lock for control functions. */
 };
 
-void priv_lock(struct priv *priv);
-void priv_unlock(struct priv *priv);
-
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 7dcb059..07305f1 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -703,13 +703,9 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 		   struct rte_flow_error *error)
 {
 	struct priv *priv = dev->data->dev_private;
-	int ret;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	priv_lock(priv);
-	ret = priv_flow_validate(priv, attr, items, actions, error, &flow);
-	priv_unlock(priv);
-	return ret;
+	return priv_flow_validate(priv, attr, items, actions, error, &flow);
 }
 
 /**
@@ -936,13 +932,11 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *flow;
 
-	priv_lock(priv);
 	flow = priv_flow_create(priv, attr, items, actions, error);
 	if (flow) {
 		LIST_INSERT_HEAD(&priv->flows, flow, next);
 		DEBUG("Flow created %p", (void *)flow);
 	}
-	priv_unlock(priv);
 	return flow;
 }
 
@@ -969,17 +963,14 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 {
 	struct priv *priv = dev->data->dev_private;
 
-	priv_lock(priv);
 	if (priv->rxqs) {
 		rte_flow_error_set(error, ENOTSUP,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, "isolated mode must be set"
 				   " before configuring the device");
-		priv_unlock(priv);
 		return -rte_errno;
 	}
 	priv->isolated = !!enable;
-	priv_unlock(priv);
 	return 0;
 }
 
@@ -1017,9 +1008,7 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 
 	(void)error;
-	priv_lock(priv);
 	priv_flow_destroy(priv, flow);
-	priv_unlock(priv);
 	return 0;
 }
 
@@ -1053,9 +1042,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 
 	(void)error;
-	priv_lock(priv);
 	priv_flow_flush(priv);
-	priv_unlock(priv);
 	return 0;
 }
 
-- 
2.1.4

* [PATCH v2 31/51] net/mlx4: remove unnecessary wrapper functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (29 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 30/51] net/mlx4: remove control path locks Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 32/51] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
                     ` (21 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Wrapper functions whose main purpose was to take a lock on the private
structure are no longer needed since this lock does not exist anymore.
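
As a rough sketch of the resulting shape, assuming hypothetical stand-in
types (struct fake_dev, struct fake_priv) and a made-up fake_hw_set_mtu()
helper rather than the real mlx4 or ethdev definitions, a callback with its
former wrapper folded in might look like the following: it derives the
private structure itself and updates the cached value on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real struct priv / struct rte_eth_dev. */
struct fake_priv {
	uint16_t mtu;    /* Cached MTU. */
	uint16_t hw_mtu; /* Value the pretend kernel interface reports. */
	uint16_t hw_max; /* Largest MTU the pretend kernel interface accepts. */
};

struct fake_dev {
	struct fake_priv *dev_private;
};

/* Pretend to push the MTU to the kernel; values above hw_max are ignored. */
static void
fake_hw_set_mtu(struct fake_priv *priv, uint16_t mtu)
{
	if (mtu <= priv->hw_max)
		priv->hw_mtu = mtu;
}

/*
 * Single callback: no separate priv-level helper, no wrapper.
 * Returns 0 on success, a negative errno value otherwise.
 */
static int
fake_dev_set_mtu(struct fake_dev *dev, uint16_t mtu)
{
	struct fake_priv *priv;

	assert(dev != NULL && dev->dev_private != NULL);
	priv = dev->dev_private;
	fake_hw_set_mtu(priv, mtu);
	if (priv->hw_mtu == mtu) {
		/* Update the cached value only once the change is confirmed. */
		priv->mtu = mtu;
		return 0;
	}
	return -EINVAL;
}
```

The cached value is left untouched when the requested MTU is not confirmed,
which mirrors the read-back check done by the merged function in this patch.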

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c      |  61 ++++------------------
 drivers/net/mlx4/mlx4_flow.c | 106 +++++++++-----------------------------
 2 files changed, 32 insertions(+), 135 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index dc8a96f..e1efb8c 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -417,10 +417,10 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
 }
 
 /**
- * Set device MTU.
+ * DPDK callback to change the MTU.
  *
  * @param priv
- *   Pointer to private structure.
+ *   Pointer to Ethernet device structure.
  * @param mtu
  *   MTU value to set.
  *
@@ -428,8 +428,9 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_set_mtu(struct priv *priv, uint16_t mtu)
+mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
 {
+	struct priv *priv = dev->data->dev_private;
 	uint16_t new_mtu;
 	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
 
@@ -438,8 +439,10 @@ priv_set_mtu(struct priv *priv, uint16_t mtu)
 	ret = priv_get_mtu(priv, &new_mtu);
 	if (ret)
 		return ret;
-	if (new_mtu == mtu)
+	if (new_mtu == mtu) {
+		priv->mtu = mtu;
 		return 0;
+	}
 	rte_errno = EINVAL;
 	return -rte_errno;
 }
@@ -491,7 +494,7 @@ static void
 priv_mac_addr_del(struct priv *priv);
 
 /**
- * Ethernet device configuration.
+ * DPDK callback for Ethernet device configuration.
  *
  * Prepare the driver for a given number of TX and RX queues.
  *
@@ -502,7 +505,7 @@ priv_mac_addr_del(struct priv *priv);
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-dev_configure(struct rte_eth_dev *dev)
+mlx4_dev_configure(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
 	unsigned int rxqs_n = dev->data->nb_rx_queues;
@@ -523,21 +526,6 @@ dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-/**
- * DPDK callback for Ethernet device configuration.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_configure(struct rte_eth_dev *dev)
-{
-	return dev_configure(dev);
-}
-
 static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
 static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
 
@@ -2457,37 +2445,6 @@ mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 }
 
 /**
- * DPDK callback to change the MTU.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param in_mtu
- *   New MTU.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
-{
-	struct priv *priv = dev->data->dev_private;
-	int ret = 0;
-
-	/* Set kernel interface MTU first. */
-	if (priv_set_mtu(priv, mtu)) {
-		ret = rte_errno;
-		WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
-		     strerror(rte_errno));
-		goto out;
-	} else
-		DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
-	priv->mtu = mtu;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
-/**
  * DPDK callback to get flow control status.
  *
  * @param dev
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 07305f1..3463713 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -837,29 +837,19 @@ priv_flow_create_action_queue(struct priv *priv,
 }
 
 /**
- * Convert a flow.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] attr
- *   Flow rule attributes.
- * @param[in] items
- *   Pattern specification (list terminated by the END pattern item).
- * @param[in] actions
- *   Associated actions (list terminated by the END action).
- * @param[out] error
- *   Perform verbose error reporting if not NULL.
+ * Create a flow.
  *
- * @return
- *   A flow on success, NULL otherwise.
+ * @see rte_flow_create()
+ * @see rte_flow_ops
  */
-static struct rte_flow *
-priv_flow_create(struct priv *priv,
+struct rte_flow *
+mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item items[],
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error)
 {
+	struct priv *priv = dev->data->dev_private;
 	struct rte_flow *rte_flow;
 	struct mlx4_flow_action action;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
@@ -909,38 +899,17 @@ priv_flow_create(struct priv *priv,
 	}
 	rte_flow = priv_flow_create_action_queue(priv, flow.ibv_attr,
 						 &action, error);
-	if (rte_flow)
+	if (rte_flow) {
+		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
+		DEBUG("Flow created %p", (void *)rte_flow);
 		return rte_flow;
+	}
 exit:
 	rte_free(flow.ibv_attr);
 	return NULL;
 }
 
 /**
- * Create a flow.
- *
- * @see rte_flow_create()
- * @see rte_flow_ops
- */
-struct rte_flow *
-mlx4_flow_create(struct rte_eth_dev *dev,
-		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
-		 const struct rte_flow_action actions[],
-		 struct rte_flow_error *error)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rte_flow *flow;
-
-	flow = priv_flow_create(priv, attr, items, actions, error);
-	if (flow) {
-		LIST_INSERT_HEAD(&priv->flows, flow, next);
-		DEBUG("Flow created %p", (void *)flow);
-	}
-	return flow;
-}
-
-/**
  * @see rte_flow_isolate()
  *
  * Must be done before calling dev_configure().
@@ -977,26 +946,6 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 /**
  * Destroy a flow.
  *
- * @param priv
- *   Pointer to private structure.
- * @param[in] flow
- *   Flow to destroy.
- */
-static void
-priv_flow_destroy(struct priv *priv, struct rte_flow *flow)
-{
-	(void)priv;
-	LIST_REMOVE(flow, next);
-	if (flow->ibv_flow)
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-	rte_free(flow->ibv_attr);
-	DEBUG("Flow destroyed %p", (void *)flow);
-	rte_free(flow);
-}
-
-/**
- * Destroy a flow.
- *
  * @see rte_flow_destroy()
  * @see rte_flow_ops
  */
@@ -1005,33 +954,20 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
 {
-	struct priv *priv = dev->data->dev_private;
-
+	(void)dev;
 	(void)error;
-	priv_flow_destroy(priv, flow);
+	LIST_REMOVE(flow, next);
+	if (flow->ibv_flow)
+		claim_zero(ibv_destroy_flow(flow->ibv_flow));
+	rte_free(flow->ibv_attr);
+	DEBUG("Flow destroyed %p", (void *)flow);
+	rte_free(flow);
 	return 0;
 }
 
 /**
  * Destroy all flows.
  *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_flow_flush(struct priv *priv)
-{
-	while (!LIST_EMPTY(&priv->flows)) {
-		struct rte_flow *flow;
-
-		flow = LIST_FIRST(&priv->flows);
-		priv_flow_destroy(priv, flow);
-	}
-}
-
-/**
- * Destroy all flows.
- *
  * @see rte_flow_flush()
  * @see rte_flow_ops
  */
@@ -1041,8 +977,12 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
 {
 	struct priv *priv = dev->data->dev_private;
 
-	(void)error;
-	priv_flow_flush(priv);
+	while (!LIST_EMPTY(&priv->flows)) {
+		struct rte_flow *flow;
+
+		flow = LIST_FIRST(&priv->flows);
+		mlx4_flow_destroy(dev, flow, error);
+	}
 	return 0;
 }
 
-- 
2.1.4

* [PATCH v2 32/51] net/mlx4: remove mbuf macro definitions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (30 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 31/51] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 33/51] net/mlx4: use standard macro to get array size Adrien Mazarguil
                     ` (20 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

These were originally used for compatibility between DPDK releases when
this PMD was built out of tree.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e1efb8c..a94f27e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -74,15 +74,6 @@
 #include "mlx4.h"
 #include "mlx4_flow.h"
 
-/* Convenience macros for accessing mbuf fields. */
-#define NEXT(m) ((m)->next)
-#define DATA_LEN(m) ((m)->data_len)
-#define PKT_LEN(m) ((m)->pkt_len)
-#define DATA_OFF(m) ((m)->data_off)
-#define SET_DATA_OFF(m, o) ((m)->data_off = (o))
-#define NB_SEGS(m) ((m)->nb_segs)
-#define PORT(m) ((m)->port)
-
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
 	struct {
@@ -995,7 +986,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
 		struct txq_elt *elt = &(*txq->elts)[elts_head];
 		struct ibv_send_wr *wr = &elt->wr;
-		unsigned int segs = NB_SEGS(buf);
+		unsigned int segs = buf->nb_segs;
 		unsigned int sent_size = 0;
 		uint32_t send_flags = 0;
 
@@ -1009,7 +1000,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 #endif
 			/* Faster than rte_pktmbuf_free(). */
 			do {
-				struct rte_mbuf *next = NEXT(tmp);
+				struct rte_mbuf *next = tmp->next;
 
 				rte_pktmbuf_free_seg(tmp);
 				tmp = next;
@@ -1029,7 +1020,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 
 			/* Retrieve buffer information. */
 			addr = rte_pktmbuf_mtod(buf, uintptr_t);
-			length = DATA_LEN(buf);
+			length = buf->data_len;
 			/* Retrieve Memory Region key for this memory pool. */
 			lkey = txq_mp2mr(txq, txq_mb2mp(buf));
 			if (unlikely(lkey == (uint32_t)-1)) {
@@ -1385,7 +1376,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 		wr->sg_list = sge;
 		wr->num_sge = 1;
 		/* Headroom is reserved by rte_pktmbuf_alloc(). */
-		assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
 		/* Buffer is supposed to be empty. */
 		assert(rte_pktmbuf_data_len(buf) == 0);
 		assert(rte_pktmbuf_pkt_len(buf) == 0);
@@ -1655,12 +1646,12 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 		assert(elt->sge.lkey == rxq->mr->lkey);
 		elt->buf = rep;
 		/* Update seg information. */
-		SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
-		NB_SEGS(seg) = 1;
-		PORT(seg) = rxq->port_id;
-		NEXT(seg) = NULL;
-		PKT_LEN(seg) = len;
-		DATA_LEN(seg) = len;
+		seg->data_off = RTE_PKTMBUF_HEADROOM;
+		seg->nb_segs = 1;
+		seg->port = rxq->port_id;
+		seg->next = NULL;
+		seg->pkt_len = len;
+		seg->data_len = len;
 		seg->packet_type = 0;
 		seg->ol_flags = 0;
 		/* Return packet. */
-- 
2.1.4

* [PATCH v2 33/51] net/mlx4: use standard macro to get array size
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (31 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 32/51] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 34/51] net/mlx4: separate debugging macros Adrien Mazarguil
                     ` (19 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 11 ++++++-----
 drivers/net/mlx4/mlx4.h |  3 ---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a94f27e..51259d2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -66,6 +66,7 @@
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_branch_prediction.h>
+#include <rte_common.h>
 
 /* Generated configuration header. */
 #include "mlx4_autoconf.h"
@@ -633,7 +634,7 @@ txq_cleanup(struct txq *txq)
 		claim_zero(ibv_destroy_qp(txq->qp));
 	if (txq->cq != NULL)
 		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
 		if (txq->mp2mr[i].mp == NULL)
 			break;
 		assert(txq->mp2mr[i].mr != NULL);
@@ -843,7 +844,7 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
 	unsigned int i;
 	struct ibv_mr *mr;
 
-	for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
 		if (unlikely(txq->mp2mr[i].mp == NULL)) {
 			/* Unknown MP, add a new MR for it. */
 			break;
@@ -863,7 +864,7 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
 		      (void *)txq);
 		return (uint32_t)-1;
 	}
-	if (unlikely(i == elemof(txq->mp2mr))) {
+	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
 		/* Table is full, remove oldest entry. */
 		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
 		      (void *)txq);
@@ -1400,7 +1401,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
 	return 0;
 error:
 	if (elts != NULL) {
-		for (i = 0; (i != elemof(*elts)); ++i)
+		for (i = 0; (i != RTE_DIM(*elts)); ++i)
 			rte_pktmbuf_free_seg((*elts)[i].buf);
 		rte_free(elts);
 	}
@@ -1427,7 +1428,7 @@ rxq_free_elts(struct rxq *rxq)
 	rxq->elts = NULL;
 	if (elts == NULL)
 		return;
-	for (i = 0; (i != elemof(*elts)); ++i)
+	for (i = 0; (i != RTE_DIM(*elts)); ++i)
 		rte_pktmbuf_free_seg((*elts)[i].buf);
 	rte_free(elts);
 }
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index c840e27..e2990fe 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -98,9 +98,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Number of elements in array. */
-#define elemof(a) (sizeof(a) / sizeof((a)[0]))
-
 /* Debugging */
 #ifndef NDEBUG
 #include <stdio.h>
-- 
2.1.4

* [PATCH v2 34/51] net/mlx4: separate debugging macros
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (32 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 33/51] net/mlx4: use standard macro to get array size Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 35/51] net/mlx4: use a single interrupt handle Adrien Mazarguil
                     ` (18 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The new definitions also rely on the existing DPDK logging subsystem
instead of using fprintf() directly.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c       |  2 +-
 drivers/net/mlx4/mlx4.h       | 52 --------------------
 drivers/net/mlx4/mlx4_flow.c  |  1 +
 drivers/net/mlx4/mlx4_utils.h | 98 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+), 53 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 51259d2..7e71d90 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -59,7 +59,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_log.h>
 #include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
@@ -74,6 +73,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_utils.h"
 
 /** Configuration structure for device arguments. */
 struct mlx4_conf {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index e2990fe..5ecccfa 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -36,23 +36,6 @@
 
 #include <stdint.h>
 
-/*
- * Runtime logging through RTE_LOG() is enabled when not in debugging mode.
- * Intermediate LOG_*() macros add the required end-of-line characters.
- */
-#ifndef NDEBUG
-#define INFO(...) DEBUG(__VA_ARGS__)
-#define WARN(...) DEBUG(__VA_ARGS__)
-#define ERROR(...) DEBUG(__VA_ARGS__)
-#else
-#define LOG__(level, m, ...) \
-	RTE_LOG(level, PMD, MLX4_DRIVER_NAME ": " m "%c", __VA_ARGS__)
-#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
-#define INFO(...) LOG_(INFO, __VA_ARGS__)
-#define WARN(...) LOG_(WARNING, __VA_ARGS__)
-#define ERROR(...) LOG_(ERR, __VA_ARGS__)
-#endif
-
 /* Verbs header. */
 /* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
 #ifdef PEDANTIC
@@ -98,41 +81,6 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-/* Debugging */
-#ifndef NDEBUG
-#include <stdio.h>
-#define DEBUG__(m, ...)						\
-	(fprintf(stderr, "%s:%d: %s(): " m "%c",		\
-		 __FILE__, __LINE__, __func__, __VA_ARGS__),	\
-	 fflush(stderr),					\
-	 (void)0)
-/*
- * Save/restore errno around DEBUG__().
- * XXX somewhat undefined behavior, but works.
- */
-#define DEBUG_(...)				\
-	(errno = ((int []){			\
-		*(volatile int *)&errno,	\
-		(DEBUG__(__VA_ARGS__), 0)	\
-	})[0])
-#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
-#ifndef MLX4_PMD_DEBUG_BROKEN_VERBS
-#define claim_zero(...) assert((__VA_ARGS__) == 0)
-#else /* MLX4_PMD_DEBUG_BROKEN_VERBS */
-#define claim_zero(...) \
-	(void)(((__VA_ARGS__) == 0) || \
-		DEBUG("Assertion `(" # __VA_ARGS__ ") == 0' failed (IGNORED)."))
-#endif /* MLX4_PMD_DEBUG_BROKEN_VERBS */
-#define claim_nonzero(...) assert((__VA_ARGS__) != 0)
-#define claim_positive(...) assert((__VA_ARGS__) >= 0)
-#else /* NDEBUG */
-/* No-ops. */
-#define DEBUG(...) (void)0
-#define claim_zero(...) (__VA_ARGS__)
-#define claim_nonzero(...) (__VA_ARGS__)
-#define claim_positive(...) (__VA_ARGS__)
-#endif /* NDEBUG */
-
 struct mlx4_rxq_stats {
 	unsigned int idx; /**< Mapping index. */
 	uint64_t ipackets; /**< Total of successfully received packets. */
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 3463713..6f6f455 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -40,6 +40,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_utils.h"
 
 /** Static initializer for items. */
 #define ITEMS(...) \
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
new file mode 100644
index 0000000..30f96c2
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -0,0 +1,98 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef MLX4_UTILS_H_
+#define MLX4_UTILS_H_
+
+#include <rte_common.h>
+#include <rte_log.h>
+
+#include "mlx4.h"
+
+#ifndef NDEBUG
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX4_DRIVER_NAME) in log messages.
+ */
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+	const char *n = s;
+
+	while (*n)
+		if (*(n++) == '/')
+			s = n;
+	return s;
+}
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+			pmd_drv_log_basename(__FILE__), \
+			__LINE__, \
+			__func__, \
+			RTE_FMT_TAIL(__VA_ARGS__,)))
+#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
+#ifndef MLX4_PMD_DEBUG_BROKEN_VERBS
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+#else /* MLX4_PMD_DEBUG_BROKEN_VERBS */
+#define claim_zero(...) \
+	(void)(((__VA_ARGS__) == 0) || \
+		DEBUG("Assertion `(" # __VA_ARGS__ ") == 0' failed (IGNORED)."))
+#endif /* MLX4_PMD_DEBUG_BROKEN_VERBS */
+
+#else /* NDEBUG */
+
+/*
+ * Like assert(), DEBUG() becomes a no-op and claim_zero() does not perform
+ * any check when debugging is disabled.
+ */
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT(MLX4_DRIVER_NAME ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+
+#endif /* MLX4_UTILS_H_ */
-- 
2.1.4

* [PATCH v2 35/51] net/mlx4: use a single interrupt handle
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (33 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 34/51] net/mlx4: separate debugging macros Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 36/51] net/mlx4: rename alarm field Adrien Mazarguil
                     ` (17 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

One interrupt handle is currently used for RMV/LSC events and another one
for Rx traffic because these come from distinct file descriptors.

This can be simplified, however, as Rx interrupt file descriptors are
stored elsewhere and are registered separately.

Modifying the interrupt handle type to RTE_INTR_HANDLE_UNKNOWN has never
been necessary as disabling interrupts is actually done by unregistering
the associated callback (RMV/LSC) or emptying the EFD array (Rx). Instead,
make clear that the base handle file descriptor is invalid by setting it to
-1 when disabled.
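
As a rough illustration of the "-1 when disabled" convention described
above, here is a minimal standalone sketch. It does not use the DPDK
struct rte_intr_handle; the port_handle type and its helpers are
hypothetical names chosen for this example only. The point it shows is
that 0 is a valid file descriptor, so only a negative value can
unambiguously mean "no descriptor attached", and that the handle type can
be set once at initialization instead of being toggled.

```c
#include <assert.h>

/* Hypothetical stand-in for an interrupt handle (not the DPDK type). */
enum handle_type { HANDLE_UNKNOWN = 0, HANDLE_EXT };

struct port_handle {
	int fd;                /* -1 while no descriptor is attached. */
	enum handle_type type; /* Set once at init, never toggled. */
};

/* Return a handle in its disabled state. */
static inline struct port_handle
port_handle_disabled(void)
{
	return (struct port_handle){ .fd = -1, .type = HANDLE_EXT };
}

/* A handle is enabled when it carries a valid (non-negative) descriptor. */
static inline int
port_handle_enabled(const struct port_handle *h)
{
	return h->fd >= 0;
}

/* Attach a descriptor; 0 is legal here, which is why 0 cannot mean "off". */
static inline void
port_handle_attach(struct port_handle *h, int fd)
{
	assert(fd >= 0);
	h->fd = fd;
}

/* Disable the handle by invalidating its descriptor only. */
static inline void
port_handle_detach(struct port_handle *h)
{
	h->fd = -1;
}
```

With this layout, a check such as port_handle_enabled() stays correct even
if the attached descriptor happens to be 0.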

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 32 ++++++++++++++++++++------------
 drivers/net/mlx4/mlx4.h |  3 +--
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 7e71d90..21762cc 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2817,8 +2817,7 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 		ERROR("rte_intr_callback_unregister failed with %d %s",
 		      ret, strerror(rte_errno));
 	}
-	priv->intr_handle.fd = 0;
-	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+	priv->intr_handle.fd = -1;
 	return ret;
 }
 
@@ -2859,7 +2858,6 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 		return -rte_errno;
 	} else {
 		priv->intr_handle.fd = priv->ctx->async_fd;
-		priv->intr_handle.type = RTE_INTR_HANDLE_EXT;
 		rc = rte_intr_callback_register(&priv->intr_handle,
 						 mlx4_dev_interrupt_handler,
 						 dev);
@@ -2867,6 +2865,7 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 			rte_errno = -rc;
 			ERROR("rte_intr_callback_register failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
+			priv->intr_handle.fd = -1;
 			return -rte_errno;
 		}
 	}
@@ -2997,7 +2996,7 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	unsigned int rxqs_n = priv->rxqs_n;
 	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
 	unsigned int count = 0;
-	struct rte_intr_handle *intr_handle = priv->dev->intr_handle;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
 	if (!priv->dev->data->dev_conf.intr_conf.rxq)
 		return 0;
@@ -3009,7 +3008,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 		      " Rx interrupts will not be supported");
 		return -rte_errno;
 	}
-	intr_handle->type = RTE_INTR_HANDLE_EXT;
 	for (i = 0; i != n; ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
 		int fd;
@@ -3062,7 +3060,7 @@ priv_rx_intr_vec_enable(struct priv *priv)
 static void
 priv_rx_intr_vec_disable(struct priv *priv)
 {
-	struct rte_intr_handle *intr_handle = priv->dev->intr_handle;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
 	rte_intr_free_epoll_fd(intr_handle);
 	free(intr_handle->intr_vec);
@@ -3429,14 +3427,24 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		eth_dev->device = &pci_dev->device;
 		rte_eth_copy_pci_info(eth_dev, pci_dev);
 		eth_dev->device->driver = &mlx4_driver.driver;
+		/* Initialize local interrupt handle for current port. */
+		priv->intr_handle = (struct rte_intr_handle){
+			.fd = -1,
+			.type = RTE_INTR_HANDLE_EXT,
+		};
 		/*
-		 * Copy and override interrupt handle to prevent it from
-		 * being shared between all ethdev instances of a given PCI
-		 * device. This is required to properly handle Rx interrupts
-		 * on all ports.
+		 * Override ethdev interrupt handle pointer with private
+		 * handle instead of that of the parent PCI device used by
+		 * default. This prevents it from being shared between all
+		 * ports of the same PCI device since each of them is
+		 * associated its own Verbs context.
+		 *
+		 * Rx interrupts in particular require this as the PMD has
+		 * no control over the registration of queue interrupts
+		 * besides setting up eth_dev->intr_handle, the rest is
+		 * handled by rte_intr_rx_ctl().
 		 */
-		priv->intr_handle_dev = *eth_dev->intr_handle;
-		eth_dev->intr_handle = &priv->intr_handle_dev;
+		eth_dev->intr_handle = &priv->intr_handle;
 		priv->dev = eth_dev;
 		eth_dev->dev_ops = &mlx4_dev_ops;
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 5ecccfa..ce827aa 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -169,8 +169,7 @@ struct priv {
 	unsigned int txqs_n; /* TX queues array size. */
 	struct rxq *(*rxqs)[]; /* RX queues. */
 	struct txq *(*txqs)[]; /* TX queues. */
-	struct rte_intr_handle intr_handle_dev; /* Device interrupt handler. */
-	struct rte_intr_handle intr_handle; /* Interrupt handler. */
+	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
-- 
2.1.4

* [PATCH v2 36/51] net/mlx4: rename alarm field
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (34 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 35/51] net/mlx4: use a single interrupt handle Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 37/51] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
                     ` (16 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Make clear this field is related to interrupt handling.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c | 14 +++++++-------
 drivers/net/mlx4/mlx4.h |  6 +++---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 21762cc..42606ed 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2720,10 +2720,10 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 	mlx4_link_update(dev, 0);
 	if (((link->link_speed == 0) && link->link_status) ||
 	    ((link->link_speed != 0) && !link->link_status)) {
-		if (!priv->pending_alarm) {
+		if (!priv->intr_alarm) {
 			/* Inconsistent status, check again later. */
-			priv->pending_alarm = 1;
-			rte_eal_alarm_set(MLX4_ALARM_TIMEOUT_US,
+			priv->intr_alarm = 1;
+			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
 					  mlx4_dev_link_status_handler,
 					  dev);
 		}
@@ -2747,8 +2747,8 @@ mlx4_dev_link_status_handler(void *arg)
 	uint32_t events;
 	int ret;
 
-	assert(priv->pending_alarm == 1);
-	priv->pending_alarm = 0;
+	assert(priv->intr_alarm == 1);
+	priv->intr_alarm = 0;
 	ret = priv_dev_status_handler(priv, dev, &events);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
@@ -2917,14 +2917,14 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
 		if (ret)
 			return ret;
 	}
-	if (priv->pending_alarm)
+	if (priv->intr_alarm)
 		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
 					 dev)) {
 			ERROR("rte_eal_alarm_cancel failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
 		}
-	priv->pending_alarm = 0;
+	priv->intr_alarm = 0;
 	return 0;
 }
 
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index ce827aa..47ea409 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -63,8 +63,8 @@
 #define MLX4_PMD_TX_MP_CACHE 8
 #endif
 
-/* Alarm timeout. */
-#define MLX4_ALARM_TIMEOUT_US 100000
+/* Interrupt alarm timeout value in microseconds. */
+#define MLX4_INTR_ALARM_TIMEOUT 100000
 
 /* Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
@@ -162,7 +162,7 @@ struct priv {
 	uint8_t port; /* Physical port number. */
 	unsigned int started:1; /* Device started, flows enabled. */
 	unsigned int vf:1; /* This is a VF device. */
-	unsigned int pending_alarm:1; /* An alarm is pending. */
+	unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
 	/* RX/TX queues. */
 	unsigned int rxqs_n; /* RX queues array size. */
-- 
2.1.4

* [PATCH v2 37/51] net/mlx4: refactor interrupt FD settings
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (35 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 36/51] net/mlx4: rename alarm field Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 38/51] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
                     ` (15 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

File descriptors used for interrupt processing must be made non-blocking.

Doing so as soon as they are opened instead of waiting until they are
needed is more efficient, as it avoids performing redundant system calls
and running through their associated error-handling code later on.
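
For readers unfamiliar with the underlying mechanism, the following is a
minimal sketch of how a descriptor is typically switched to non-blocking
mode with fcntl(). It is not the body of the helper added by this patch;
fd_set_non_blocking() and demo_empty_pipe_read() are hypothetical names,
and returning a negative errno value (rather than setting rte_errno) is an
assumption made to keep the example free of DPDK dependencies.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set O_NONBLOCK on fd while preserving its other
 * status flags. Returns 0 on success, a negative errno value otherwise.
 */
static int
fd_set_non_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -errno;
	if (flags & O_NONBLOCK)
		return 0; /* Already non-blocking, skip the second call. */
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -errno;
	return 0;
}

/*
 * Demonstration: reading from an empty non-blocking pipe fails with
 * EAGAIN instead of blocking. Returns 1 when that is observed.
 */
static int
demo_empty_pipe_read(void)
{
	int fds[2];
	char c;
	int ok = 0;

	if (pipe(fds) == -1)
		return 0;
	if (fd_set_non_blocking(fds[0]) == 0) {
		/* The flag must now be visible on the read end. */
		assert(fcntl(fds[0], F_GETFL) & O_NONBLOCK);
		ok = (read(fds[0], &c, 1) == -1 && errno == EAGAIN);
	}
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```

Calling such a helper once, right after the descriptor is obtained, means
later code paths can use the descriptor directly without repeating the
F_GETFL/F_SETFL pair or its error handling.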

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile     |  1 +
 drivers/net/mlx4/mlx4.c       | 63 ++++++++++++++----------------------
 drivers/net/mlx4/mlx4.h       |  4 +++
 drivers/net/mlx4/mlx4_utils.c | 66 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_utils.h |  4 +++
 5 files changed, 99 insertions(+), 39 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 20692f0..8a03154 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -37,6 +37,7 @@ LIB = librte_pmd_mlx4.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
 CFLAGS += -O3
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 42606ed..2db2b0e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -48,7 +48,6 @@
 #include <netinet/in.h>
 #include <linux/ethtool.h>
 #include <linux/sockios.h>
-#include <fcntl.h>
 
 #include <rte_ether.h>
 #include <rte_ethdev.h>
@@ -1800,6 +1799,12 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 			      (void *)dev, strerror(rte_errno));
 			goto error;
 		}
+		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+			ERROR("%p: unable to make Rx interrupt completion"
+			      " channel non-blocking: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
 	}
 	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
 	if (tmpl.cq == NULL) {
@@ -2836,7 +2841,6 @@ static int
 priv_dev_interrupt_handler_install(struct priv *priv,
 				   struct rte_eth_dev *dev)
 {
-	int flags;
 	int rc;
 
 	/*
@@ -2847,29 +2851,17 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 	    priv->intr_conf.rmv &&
 	    priv->intr_handle.fd)
 		return 0;
-	assert(priv->ctx->async_fd > 0);
-	flags = fcntl(priv->ctx->async_fd, F_GETFL);
-	rc = fcntl(priv->ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
-	if (rc < 0) {
-		rte_errno = errno ? errno : EINVAL;
-		INFO("failed to change file descriptor async event queue");
-		dev->data->dev_conf.intr_conf.lsc = 0;
-		dev->data->dev_conf.intr_conf.rmv = 0;
-		return -rte_errno;
-	} else {
-		priv->intr_handle.fd = priv->ctx->async_fd;
-		rc = rte_intr_callback_register(&priv->intr_handle,
-						 mlx4_dev_interrupt_handler,
-						 dev);
-		if (rc) {
-			rte_errno = -rc;
-			ERROR("rte_intr_callback_register failed "
-			      " (rte_errno: %s)", strerror(rte_errno));
-			priv->intr_handle.fd = -1;
-			return -rte_errno;
-		}
-	}
-	return 0;
+	priv->intr_handle.fd = priv->ctx->async_fd;
+	rc = rte_intr_callback_register(&priv->intr_handle,
+					mlx4_dev_interrupt_handler,
+					dev);
+	if (!rc)
+		return 0;
+	rte_errno = -rc;
+	ERROR("rte_intr_callback_register failed (rte_errno: %s)",
+	      strerror(rte_errno));
+	priv->intr_handle.fd = -1;
+	return -rte_errno;
 }
 
 /**
@@ -3010,9 +3002,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	}
 	for (i = 0; i != n; ++i) {
 		struct rxq *rxq = (*priv->rxqs)[i];
-		int fd;
-		int flags;
-		int rc;
 
 		/* Skip queues that cannot request interrupts. */
 		if (!rxq || !rxq->channel) {
@@ -3030,18 +3019,8 @@ priv_rx_intr_vec_enable(struct priv *priv)
 			priv_rx_intr_vec_disable(priv);
 			return -rte_errno;
 		}
-		fd = rxq->channel->fd;
-		flags = fcntl(fd, F_GETFL);
-		rc = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
-		if (rc < 0) {
-			rte_errno = errno;
-			ERROR("failed to make Rx interrupt file descriptor"
-			      " %d non-blocking for queue index %d", fd, i);
-			priv_rx_intr_vec_disable(priv);
-			return -rte_errno;
-		}
 		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
-		intr_handle->efds[count] = fd;
+		intr_handle->efds[count] = rxq->channel->fd;
 		count++;
 	}
 	if (!count)
@@ -3358,6 +3337,12 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 			DEBUG("port %d is not active: \"%s\" (%d)",
 			      port, ibv_port_state_str(port_attr.state),
 			      port_attr.state);
+		/* Make asynchronous FD non-blocking to handle interrupts. */
+		if (mlx4_fd_set_non_blocking(ctx->async_fd) < 0) {
+			ERROR("cannot make asynchronous FD non-blocking: %s",
+			      strerror(rte_errno));
+			goto port_error;
+		}
 		/* Allocate protection domain. */
 		pd = ibv_alloc_pd(ctx);
 		if (pd == NULL) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 47ea409..b61f5f2 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -46,6 +46,10 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
 
diff --git a/drivers/net/mlx4/mlx4_utils.c b/drivers/net/mlx4/mlx4_utils.c
new file mode 100644
index 0000000..fcf76c9
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_utils.c
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Utility functions used by the mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+
+#include <rte_errno.h>
+
+#include "mlx4_utils.h"
+
+/**
+ * Make a file descriptor non-blocking.
+ *
+ * @param fd
+ *   File descriptor to alter.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_fd_set_non_blocking(int fd)
+{
+	int ret = fcntl(fd, F_GETFL);
+
+	if (ret != -1 && !fcntl(fd, F_SETFL, ret | O_NONBLOCK))
+		return 0;
+	assert(errno);
+	rte_errno = errno;
+	return -rte_errno;
+}
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 30f96c2..9b178f5 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -95,4 +95,8 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
+/* mlx4_utils.c */
+
+int mlx4_fd_set_non_blocking(int fd);
+
 #endif /* MLX4_UTILS_H_ */
-- 
2.1.4
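
For readers who want to try the fcntl() pattern from
mlx4_fd_set_non_blocking() outside DPDK, a minimal standalone sketch
follows. The name fd_set_non_blocking() is hypothetical, and plain errno
stands in for rte_errno; it is an illustration, not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical stand-in for mlx4_fd_set_non_blocking().
 * Returns 0 on success, negative errno value otherwise.
 */
static int
fd_set_non_blocking(int fd)
{
	/* Fetch current file status flags, -1 means failure. */
	int flags = fcntl(fd, F_GETFL);

	/* Only add O_NONBLOCK, keep all other flags untouched. */
	if (flags != -1 && !fcntl(fd, F_SETFL, flags | O_NONBLOCK))
		return 0;
	/* A failed fcntl() is expected to have set errno. */
	assert(errno);
	return -errno;
}
```

Once this succeeds on a descriptor, a later fcntl(fd, F_GETFL) should
report O_NONBLOCK, and an invalid descriptor such as -1 should yield
-EBADF.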

* [PATCH v2 38/51] net/mlx4: clean up interrupt functions prototypes
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (36 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 37/51] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 39/51] net/mlx4: compact interrupt functions Adrien Mazarguil
                     ` (14 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

The naming scheme for these functions is overly verbose and not accurate
enough, with too many "handler" functions that are difficult to
differentiate (e.g. mlx4_dev_link_status_handler(),
mlx4_dev_interrupt_handler() and priv_dev_status_handler()).

This commit renames them and removes the unnecessary dev argument, which
can be retrieved through the private structure where needed. Documentation
is updated accordingly.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
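
A minimal sketch of the resulting calling convention, where handlers take
only the private structure and reach the device through its priv->dev
back-pointer. All types and names below (struct eth_dev, dispatch(),
interrupt_handler_cb()) are simplified, hypothetical stand-ins rather than
DPDK definitions. The void * adapter is an assumption of this sketch, not
what the patch does: the patch casts the typed handler to
void (*)(void *) at registration time, which relies on both pointer types
sharing a calling convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rte_eth_dev. */
struct eth_dev {
	int lsc_events; /* Number of link status events delivered. */
};

/* Hypothetical, simplified stand-in for the PMD private structure. */
struct priv {
	struct eth_dev *dev; /* Back-pointer to the owning device. */
};

/* Generic callback type taking an opaque argument. */
typedef void (*intr_cb_t)(void *arg);

/* Handler with a typed parameter, in the style of the renamed functions. */
static void
interrupt_handler(struct priv *priv)
{
	/* The device is reached through priv, no dev parameter needed. */
	priv->dev->lsc_events++;
}

/* Thin adapter so that no function pointer cast is required. */
static void
interrupt_handler_cb(void *arg)
{
	interrupt_handler(arg);
}

/* Toy dispatcher standing in for the thread that runs callbacks. */
static void
dispatch(intr_cb_t cb, void *arg)
{
	assert(cb != NULL);
	cb(arg);
}
```

With a struct priv whose dev field points at a device,
dispatch(interrupt_handler_cb, &priv) increments that device's counter
once per call.
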
 drivers/net/mlx4/mlx4.c | 145 ++++++++++++++++---------------------------
 1 file changed, 55 insertions(+), 90 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 2db2b0e..0fcd4f0 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1977,14 +1977,9 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int
-priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_removal_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_link_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
+static int priv_interrupt_handler_install(struct priv *priv);
+static int priv_removal_interrupt_handler_install(struct priv *priv);
+static int priv_link_interrupt_handler_install(struct priv *priv);
 
 /**
  * DPDK callback to start the device.
@@ -2010,13 +2005,13 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_dev_link_interrupt_handler_install(priv, dev);
+	ret = priv_link_interrupt_handler_install(priv);
 	if (ret) {
 		ERROR("%p: LSC handler install failed",
 		     (void *)dev);
 		goto err;
 	}
-	ret = priv_dev_removal_interrupt_handler_install(priv, dev);
+	ret = priv_removal_interrupt_handler_install(priv);
 	if (ret) {
 		ERROR("%p: RMV handler install failed",
 		     (void *)dev);
@@ -2113,15 +2108,9 @@ removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	return 0;
 }
 
-static int
-priv_dev_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
-
-static int
-priv_dev_removal_interrupt_handler_uninstall(struct priv *,
-					     struct rte_eth_dev *);
-
-static int
-priv_dev_link_interrupt_handler_uninstall(struct priv *, struct rte_eth_dev *);
+static int priv_interrupt_handler_uninstall(struct priv *priv);
+static int priv_removal_interrupt_handler_uninstall(struct priv *priv);
+static int priv_link_interrupt_handler_uninstall(struct priv *priv);
 
 /**
  * DPDK callback to close the device.
@@ -2185,8 +2174,8 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_dev_removal_interrupt_handler_uninstall(priv, dev);
-	priv_dev_link_interrupt_handler_uninstall(priv, dev);
+	priv_removal_interrupt_handler_uninstall(priv);
+	priv_link_interrupt_handler_uninstall(priv);
 	priv_rx_intr_vec_disable(priv);
 	memset(priv, 0, sizeof(*priv));
 }
@@ -2674,31 +2663,25 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-static void
-mlx4_dev_link_status_handler(void *);
-static void
-mlx4_dev_interrupt_handler(void *);
+static void mlx4_link_status_alarm(struct priv *priv);
 
 /**
- * Link/device status handler.
+ * Collect interrupt events.
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  * @param events
  *   Pointer to event flags holder.
  *
  * @return
- *   Number of events
+ *   Number of events.
  */
 static int
-priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
-			uint32_t *events)
+priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 {
 	struct ibv_async_event event;
 	int port_change = 0;
-	struct rte_eth_link *link = &dev->data->dev_link;
+	struct rte_eth_link *link = &priv->dev->data->dev_link;
 	int ret = 0;
 
 	*events = 0;
@@ -2722,15 +2705,16 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 	}
 	if (!port_change)
 		return ret;
-	mlx4_link_update(dev, 0);
+	mlx4_link_update(priv->dev, 0);
 	if (((link->link_speed == 0) && link->link_status) ||
 	    ((link->link_speed != 0) && !link->link_status)) {
 		if (!priv->intr_alarm) {
 			/* Inconsistent status, check again later. */
 			priv->intr_alarm = 1;
 			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
-					  mlx4_dev_link_status_handler,
-					  dev);
+					  (void (*)(void *))
+					  mlx4_link_status_alarm,
+					  priv);
 		}
 	} else {
 		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
@@ -2739,53 +2723,48 @@ priv_dev_status_handler(struct priv *priv, struct rte_eth_dev *dev,
 }
 
 /**
- * Handle delayed link status event.
+ * Process scheduled link status check.
  *
- * @param arg
- *   Registered argument.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-mlx4_dev_link_status_handler(void *arg)
+mlx4_link_status_alarm(struct priv *priv)
 {
-	struct rte_eth_dev *dev = arg;
-	struct priv *priv = dev->data->dev_private;
 	uint32_t events;
 	int ret;
 
 	assert(priv->intr_alarm == 1);
 	priv->intr_alarm = 0;
-	ret = priv_dev_status_handler(priv, dev, &events);
+	ret = priv_collect_interrupt_events(priv, &events);
 	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC,
+		_rte_eth_dev_callback_process(priv->dev,
+					      RTE_ETH_EVENT_INTR_LSC,
 					      NULL, NULL);
 }
 
 /**
  * Handle interrupts from the NIC.
  *
- * @param[in] intr_handle
- *   Interrupt handler.
- * @param cb_arg
- *   Callback argument.
+ * @param priv
+ *   Pointer to private structure.
  */
 static void
-mlx4_dev_interrupt_handler(void *cb_arg)
+mlx4_interrupt_handler(struct priv *priv)
 {
-	struct rte_eth_dev *dev = cb_arg;
-	struct priv *priv = dev->data->dev_private;
 	int ret;
 	uint32_t ev;
 	int i;
 
-	ret = priv_dev_status_handler(priv, dev, &ev);
+	ret = priv_collect_interrupt_events(priv, &ev);
 	if (ret > 0) {
 		for (i = RTE_ETH_EVENT_UNKNOWN;
 		     i < RTE_ETH_EVENT_MAX;
 		     i++) {
 			if (ev & (1 << i)) {
 				ev &= ~(1 << i);
-				_rte_eth_dev_callback_process(dev, i, NULL,
-							      NULL);
+				_rte_eth_dev_callback_process(priv->dev, i,
+							      NULL, NULL);
 				ret--;
 			}
 		}
@@ -2800,14 +2779,12 @@ mlx4_dev_interrupt_handler(void *cb_arg)
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
+priv_interrupt_handler_uninstall(struct priv *priv)
 {
 	int ret;
 
@@ -2815,8 +2792,9 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 	    priv->intr_conf.rmv)
 		return 0;
 	ret = rte_intr_callback_unregister(&priv->intr_handle,
-					   mlx4_dev_interrupt_handler,
-					   dev);
+					   (void (*)(void *))
+					   mlx4_interrupt_handler,
+					   priv);
 	if (ret < 0) {
 		rte_errno = ret;
 		ERROR("rte_intr_callback_unregister failed with %d %s",
@@ -2831,15 +2809,12 @@ priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_interrupt_handler_install(struct priv *priv,
-				   struct rte_eth_dev *dev)
+priv_interrupt_handler_install(struct priv *priv)
 {
 	int rc;
 
@@ -2853,8 +2828,9 @@ priv_dev_interrupt_handler_install(struct priv *priv,
 		return 0;
 	priv->intr_handle.fd = priv->ctx->async_fd;
 	rc = rte_intr_callback_register(&priv->intr_handle,
-					mlx4_dev_interrupt_handler,
-					dev);
+					(void (*)(void *))
+					mlx4_interrupt_handler,
+					priv);
 	if (!rc)
 		return 0;
 	rte_errno = -rc;
@@ -2869,19 +2845,16 @@ priv_dev_interrupt_handler_install(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
-					    struct rte_eth_dev *dev)
+priv_removal_interrupt_handler_uninstall(struct priv *priv)
 {
-	if (dev->data->dev_conf.intr_conf.rmv) {
+	if (priv->dev->data->dev_conf.intr_conf.rmv) {
 		priv->intr_conf.rmv = 0;
-		return priv_dev_interrupt_handler_uninstall(priv, dev);
+		return priv_interrupt_handler_uninstall(priv);
 	}
 	return 0;
 }
@@ -2891,27 +2864,25 @@ priv_dev_removal_interrupt_handler_uninstall(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
-					  struct rte_eth_dev *dev)
+priv_link_interrupt_handler_uninstall(struct priv *priv)
 {
 	int ret = 0;
 
-	if (dev->data->dev_conf.intr_conf.lsc) {
+	if (priv->dev->data->dev_conf.intr_conf.lsc) {
 		priv->intr_conf.lsc = 0;
-		ret = priv_dev_interrupt_handler_uninstall(priv, dev);
+		ret = priv_interrupt_handler_uninstall(priv);
 		if (ret)
 			return ret;
 	}
 	if (priv->intr_alarm)
-		if (rte_eal_alarm_cancel(mlx4_dev_link_status_handler,
-					 dev)) {
+		if (rte_eal_alarm_cancel((void (*)(void *))
+					 mlx4_link_status_alarm,
+					 priv)) {
 			ERROR("rte_eal_alarm_cancel failed "
 			      " (rte_errno: %s)", strerror(rte_errno));
 			return -rte_errno;
@@ -2925,20 +2896,17 @@ priv_dev_link_interrupt_handler_uninstall(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_link_interrupt_handler_install(struct priv *priv,
-					struct rte_eth_dev *dev)
+priv_link_interrupt_handler_install(struct priv *priv)
 {
 	int ret;
 
-	if (dev->data->dev_conf.intr_conf.lsc) {
-		ret = priv_dev_interrupt_handler_install(priv, dev);
+	if (priv->dev->data->dev_conf.intr_conf.lsc) {
+		ret = priv_interrupt_handler_install(priv);
 		if (ret)
 			return ret;
 		priv->intr_conf.lsc = 1;
@@ -2951,20 +2919,17 @@ priv_dev_link_interrupt_handler_install(struct priv *priv,
  *
  * @param priv
  *   Pointer to private structure.
- * @param dev
- *   Pointer to the rte_eth_dev structure.
  *
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_dev_removal_interrupt_handler_install(struct priv *priv,
-					   struct rte_eth_dev *dev)
+priv_removal_interrupt_handler_install(struct priv *priv)
 {
 	int ret;
 
-	if (dev->data->dev_conf.intr_conf.rmv) {
-		ret = priv_dev_interrupt_handler_install(priv, dev);
+	if (priv->dev->data->dev_conf.intr_conf.rmv) {
+		ret = priv_interrupt_handler_install(priv);
 		if (ret)
 			return ret;
 		priv->intr_conf.rmv = 1;
-- 
2.1.4

* [PATCH v2 39/51] net/mlx4: compact interrupt functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (37 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 38/51] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 40/51] net/mlx4: separate interrupt handling Adrien Mazarguil
                     ` (13 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Link status (LSC) and removal (RMV) interrupts share a common handler and
are toggled simultaneously from common install/uninstall functions.

Four additional wrapper functions (two for each interrupt type) are
currently necessary because the PMD maintains an internal configuration
state for interrupts (priv->intr_conf).

This complexity can be avoided entirely since the PMD no longer disables
interrupt configuration parameters in case of error.

With this commit, only two functions are necessary to toggle interrupts
(including Rx) during start/stop cycles.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
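
The install/uninstall pairing described above can be modeled with a small
standalone sketch. Everything here is hypothetical and simplified: struct
priv only keeps a few integers, the want_* fields stand in for the
intr_conf settings, a negative async_fd simulates a registration failure,
and errno stands in for rte_errno. It only illustrates the control flow
(uninstall is safe in any state and preserves the error code, install
starts from a clean state and rolls back on failure), not the driver code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model of the state touched by both functions. */
struct priv {
	int fd;       /* Registered interrupt FD, -1 when none. */
	int async_fd; /* FD to register, negative to simulate a failure. */
	int rx_vec;   /* Nonzero once the Rx interrupt vector exists. */
	int want_rxq; /* Stand-in for intr_conf.rxq. */
	int want_lsc; /* Stand-in for intr_conf.lsc. */
	int want_rmv; /* Stand-in for intr_conf.rmv. */
};

/* Tear everything down, callable in any state, errno is preserved. */
static int
intr_uninstall(struct priv *priv)
{
	int err = errno; /* Make sure errno remains unchanged. */

	priv->fd = -1;
	priv->rx_vec = 0;
	errno = err;
	return 0;
}

/* Start from a clean state, then enable only what is requested. */
static int
intr_install(struct priv *priv)
{
	intr_uninstall(priv);
	if (priv->want_rxq)
		priv->rx_vec = 1;
	if (priv->want_lsc | priv->want_rmv) {
		if (priv->async_fd < 0) {
			errno = EBADF; /* Simulated registration failure. */
			goto error;
		}
		priv->fd = priv->async_fd;
	}
	return 0;
error:
	/* Roll back; errno set above survives the call. */
	intr_uninstall(priv);
	assert(errno);
	return -errno;
}
```

Calling intr_install() twice in a row is harmless in this model since each
call begins with intr_uninstall(), which mirrors how a start/stop cycle
only needs these two entry points.
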
 drivers/net/mlx4/mlx4.c | 199 +++++++++----------------------------------
 drivers/net/mlx4/mlx4.h |   1 -
 2 files changed, 41 insertions(+), 159 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 0fcd4f0..a997a63 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1977,9 +1977,8 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int priv_interrupt_handler_install(struct priv *priv);
-static int priv_removal_interrupt_handler_install(struct priv *priv);
-static int priv_link_interrupt_handler_install(struct priv *priv);
+static int priv_intr_uninstall(struct priv *priv);
+static int priv_intr_install(struct priv *priv);
 
 /**
  * DPDK callback to start the device.
@@ -2005,24 +2004,12 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_link_interrupt_handler_install(priv);
+	ret = priv_intr_install(priv);
 	if (ret) {
-		ERROR("%p: LSC handler install failed",
+		ERROR("%p: interrupt handler installation failed",
 		     (void *)dev);
 		goto err;
 	}
-	ret = priv_removal_interrupt_handler_install(priv);
-	if (ret) {
-		ERROR("%p: RMV handler install failed",
-		     (void *)dev);
-		goto err;
-	}
-	ret = priv_rx_intr_vec_enable(priv);
-	if (ret) {
-		ERROR("%p: Rx interrupt vector creation failed",
-		      (void *)dev);
-		goto err;
-	}
 	ret = mlx4_priv_flow_start(priv);
 	if (ret) {
 		ERROR("%p: flow start failed: %s",
@@ -2055,6 +2042,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
+	priv_intr_uninstall(priv);
 	priv_mac_addr_del(priv);
 }
 
@@ -2108,10 +2096,6 @@ removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 	return 0;
 }
 
-static int priv_interrupt_handler_uninstall(struct priv *priv);
-static int priv_removal_interrupt_handler_uninstall(struct priv *priv);
-static int priv_link_interrupt_handler_uninstall(struct priv *priv);
-
 /**
  * DPDK callback to close the device.
  *
@@ -2174,9 +2158,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_removal_interrupt_handler_uninstall(priv);
-	priv_link_interrupt_handler_uninstall(priv);
-	priv_rx_intr_vec_disable(priv);
+	priv_intr_uninstall(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2682,6 +2664,8 @@ priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 	struct ibv_async_event event;
 	int port_change = 0;
 	struct rte_eth_link *link = &priv->dev->data->dev_link;
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
 	int ret = 0;
 
 	*events = 0;
@@ -2691,11 +2675,11 @@ priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
 			break;
 		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
 		     event.event_type == IBV_EVENT_PORT_ERR) &&
-		    (priv->intr_conf.lsc == 1)) {
+		    intr_conf->lsc) {
 			port_change = 1;
 			ret++;
 		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			   priv->intr_conf.rmv == 1) {
+			   intr_conf->rmv) {
 			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
 			ret++;
 		} else
@@ -2784,24 +2768,22 @@ mlx4_interrupt_handler(struct priv *priv)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_interrupt_handler_uninstall(struct priv *priv)
+priv_intr_uninstall(struct priv *priv)
 {
-	int ret;
+	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
 
-	if (priv->intr_conf.lsc ||
-	    priv->intr_conf.rmv)
-		return 0;
-	ret = rte_intr_callback_unregister(&priv->intr_handle,
-					   (void (*)(void *))
-					   mlx4_interrupt_handler,
-					   priv);
-	if (ret < 0) {
-		rte_errno = ret;
-		ERROR("rte_intr_callback_unregister failed with %d %s",
-		      ret, strerror(rte_errno));
+	if (priv->intr_handle.fd != -1) {
+		rte_intr_callback_unregister(&priv->intr_handle,
+					     (void (*)(void *))
+					     mlx4_interrupt_handler,
+					     priv);
+		priv->intr_handle.fd = -1;
 	}
-	priv->intr_handle.fd = -1;
-	return ret;
+	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
+	priv->intr_alarm = 0;
+	priv_rx_intr_vec_disable(priv);
+	rte_errno = err;
+	return 0;
 }
 
 /**
@@ -2814,127 +2796,30 @@ priv_interrupt_handler_uninstall(struct priv *priv)
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_interrupt_handler_install(struct priv *priv)
+priv_intr_install(struct priv *priv)
 {
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
 	int rc;
 
-	/*
-	 * Check whether the interrupt handler has already been installed
-	 * for either type of interrupt.
-	 */
-	if (priv->intr_conf.lsc &&
-	    priv->intr_conf.rmv &&
-	    priv->intr_handle.fd)
-		return 0;
-	priv->intr_handle.fd = priv->ctx->async_fd;
-	rc = rte_intr_callback_register(&priv->intr_handle,
-					(void (*)(void *))
-					mlx4_interrupt_handler,
-					priv);
-	if (!rc)
-		return 0;
-	rte_errno = -rc;
-	ERROR("rte_intr_callback_register failed (rte_errno: %s)",
-	      strerror(rte_errno));
-	priv->intr_handle.fd = -1;
-	return -rte_errno;
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_removal_interrupt_handler_uninstall(struct priv *priv)
-{
-	if (priv->dev->data->dev_conf.intr_conf.rmv) {
-		priv->intr_conf.rmv = 0;
-		return priv_interrupt_handler_uninstall(priv);
-	}
-	return 0;
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_link_interrupt_handler_uninstall(struct priv *priv)
-{
-	int ret = 0;
-
-	if (priv->dev->data->dev_conf.intr_conf.lsc) {
-		priv->intr_conf.lsc = 0;
-		ret = priv_interrupt_handler_uninstall(priv);
-		if (ret)
-			return ret;
-	}
-	if (priv->intr_alarm)
-		if (rte_eal_alarm_cancel((void (*)(void *))
-					 mlx4_link_status_alarm,
-					 priv)) {
-			ERROR("rte_eal_alarm_cancel failed "
-			      " (rte_errno: %s)", strerror(rte_errno));
-			return -rte_errno;
+	priv_intr_uninstall(priv);
+	if (intr_conf->rxq && priv_rx_intr_vec_enable(priv) < 0)
+		goto error;
+	if (intr_conf->lsc | intr_conf->rmv) {
+		priv->intr_handle.fd = priv->ctx->async_fd;
+		rc = rte_intr_callback_register(&priv->intr_handle,
+						(void (*)(void *))
+						mlx4_interrupt_handler,
+						priv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			goto error;
 		}
-	priv->intr_alarm = 0;
-	return 0;
-}
-
-/**
- * Install link interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_link_interrupt_handler_install(struct priv *priv)
-{
-	int ret;
-
-	if (priv->dev->data->dev_conf.intr_conf.lsc) {
-		ret = priv_interrupt_handler_install(priv);
-		if (ret)
-			return ret;
-		priv->intr_conf.lsc = 1;
-	}
-	return 0;
-}
-
-/**
- * Install removal interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_removal_interrupt_handler_install(struct priv *priv)
-{
-	int ret;
-
-	if (priv->dev->data->dev_conf.intr_conf.rmv) {
-		ret = priv_interrupt_handler_install(priv);
-		if (ret)
-			return ret;
-		priv->intr_conf.rmv = 1;
 	}
 	return 0;
+error:
+	priv_intr_uninstall(priv);
+	return -rte_errno;
 }
 
 /**
@@ -2955,8 +2840,6 @@ priv_rx_intr_vec_enable(struct priv *priv)
 	unsigned int count = 0;
 	struct rte_intr_handle *intr_handle = &priv->intr_handle;
 
-	if (!priv->dev->data->dev_conf.intr_conf.rxq)
-		return 0;
 	priv_rx_intr_vec_disable(priv);
 	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
 	if (intr_handle->intr_vec == NULL) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b61f5f2..a35a94e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -176,7 +176,6 @@ struct priv {
 	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
-	struct rte_intr_conf intr_conf; /* Active interrupt configuration. */
 };
 
 #endif /* RTE_PMD_MLX4_H_ */
-- 
2.1.4

* [PATCH v2 40/51] net/mlx4: separate interrupt handling
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (38 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 39/51] net/mlx4: compact interrupt functions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 41/51] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
                     ` (12 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 340 +---------------------------------
 drivers/net/mlx4/mlx4.h      |  11 ++
 drivers/net/mlx4/mlx4_intr.c | 376 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 392 insertions(+), 336 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 8a03154..f6e3001 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -37,6 +37,7 @@ LIB = librte_pmd_mlx4.a
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a997a63..667ba2b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -58,7 +58,6 @@
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
 #include <rte_malloc.h>
-#include <rte_alarm.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
@@ -88,18 +87,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-static int
-mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
-
-static int
-mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
-
-static int
-priv_rx_intr_vec_enable(struct priv *priv);
-
-static void
-priv_rx_intr_vec_disable(struct priv *priv);
-
 /* Allocate a buffer on the stack and fill it with a printf format string. */
 #define MKSTR(name, ...) \
 	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
@@ -1977,9 +1964,6 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	rte_free(rxq);
 }
 
-static int priv_intr_uninstall(struct priv *priv);
-static int priv_intr_install(struct priv *priv);
-
 /**
  * DPDK callback to start the device.
  *
@@ -2004,7 +1988,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	ret = priv_mac_addr_add(priv);
 	if (ret)
 		goto err;
-	ret = priv_intr_install(priv);
+	ret = mlx4_intr_install(priv);
 	if (ret) {
 		ERROR("%p: interrupt handler installation failed",
 		     (void *)dev);
@@ -2042,7 +2026,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
-	priv_intr_uninstall(priv);
+	mlx4_intr_uninstall(priv);
 	priv_mac_addr_del(priv);
 }
 
@@ -2158,7 +2142,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 		claim_zero(ibv_close_device(priv->ctx));
 	} else
 		assert(priv->ctx == NULL);
-	priv_intr_uninstall(priv);
+	mlx4_intr_uninstall(priv);
 	memset(priv, 0, sizeof(*priv));
 }
 
@@ -2370,7 +2354,7 @@ mlx4_stats_reset(struct rte_eth_dev *dev)
  * @return
  *   0 on success, negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
 {
 	const struct priv *priv = dev->data->dev_private;
@@ -2645,322 +2629,6 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
 	return 0;
 }
 
-static void mlx4_link_status_alarm(struct priv *priv);
-
-/**
- * Collect interrupt events.
- *
- * @param priv
- *   Pointer to private structure.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Number of events.
- */
-static int
-priv_collect_interrupt_events(struct priv *priv, uint32_t *events)
-{
-	struct ibv_async_event event;
-	int port_change = 0;
-	struct rte_eth_link *link = &priv->dev->data->dev_link;
-	const struct rte_intr_conf *const intr_conf =
-		&priv->dev->data->dev_conf.intr_conf;
-	int ret = 0;
-
-	*events = 0;
-	/* Read all message and acknowledge them. */
-	for (;;) {
-		if (ibv_get_async_event(priv->ctx, &event))
-			break;
-		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-		     event.event_type == IBV_EVENT_PORT_ERR) &&
-		    intr_conf->lsc) {
-			port_change = 1;
-			ret++;
-		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-			   intr_conf->rmv) {
-			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
-			ret++;
-		} else
-			DEBUG("event type %d on port %d not handled",
-			      event.event_type, event.element.port_num);
-		ibv_ack_async_event(&event);
-	}
-	if (!port_change)
-		return ret;
-	mlx4_link_update(priv->dev, 0);
-	if (((link->link_speed == 0) && link->link_status) ||
-	    ((link->link_speed != 0) && !link->link_status)) {
-		if (!priv->intr_alarm) {
-			/* Inconsistent status, check again later. */
-			priv->intr_alarm = 1;
-			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
-					  (void (*)(void *))
-					  mlx4_link_status_alarm,
-					  priv);
-		}
-	} else {
-		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
-	}
-	return ret;
-}
-
-/**
- * Process scheduled link status check.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-mlx4_link_status_alarm(struct priv *priv)
-{
-	uint32_t events;
-	int ret;
-
-	assert(priv->intr_alarm == 1);
-	priv->intr_alarm = 0;
-	ret = priv_collect_interrupt_events(priv, &events);
-	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
-		_rte_eth_dev_callback_process(priv->dev,
-					      RTE_ETH_EVENT_INTR_LSC,
-					      NULL, NULL);
-}
-
-/**
- * Handle interrupts from the NIC.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-mlx4_interrupt_handler(struct priv *priv)
-{
-	int ret;
-	uint32_t ev;
-	int i;
-
-	ret = priv_collect_interrupt_events(priv, &ev);
-	if (ret > 0) {
-		for (i = RTE_ETH_EVENT_UNKNOWN;
-		     i < RTE_ETH_EVENT_MAX;
-		     i++) {
-			if (ev & (1 << i)) {
-				ev &= ~(1 << i);
-				_rte_eth_dev_callback_process(priv->dev, i,
-							      NULL, NULL);
-				ret--;
-			}
-		}
-		if (ret)
-			WARN("%d event%s not processed", ret,
-			     (ret > 1 ? "s were" : " was"));
-	}
-}
-
-/**
- * Uninstall interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_intr_uninstall(struct priv *priv)
-{
-	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
-
-	if (priv->intr_handle.fd != -1) {
-		rte_intr_callback_unregister(&priv->intr_handle,
-					     (void (*)(void *))
-					     mlx4_interrupt_handler,
-					     priv);
-		priv->intr_handle.fd = -1;
-	}
-	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
-	priv->intr_alarm = 0;
-	priv_rx_intr_vec_disable(priv);
-	rte_errno = err;
-	return 0;
-}
-
-/**
- * Install interrupt handler.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_intr_install(struct priv *priv)
-{
-	const struct rte_intr_conf *const intr_conf =
-		&priv->dev->data->dev_conf.intr_conf;
-	int rc;
-
-	priv_intr_uninstall(priv);
-	if (intr_conf->rxq && priv_rx_intr_vec_enable(priv) < 0)
-		goto error;
-	if (intr_conf->lsc | intr_conf->rmv) {
-		priv->intr_handle.fd = priv->ctx->async_fd;
-		rc = rte_intr_callback_register(&priv->intr_handle,
-						(void (*)(void *))
-						mlx4_interrupt_handler,
-						priv);
-		if (rc < 0) {
-			rte_errno = -rc;
-			goto error;
-		}
-	}
-	return 0;
-error:
-	priv_intr_uninstall(priv);
-	return -rte_errno;
-}
-
-/**
- * Allocate queue vector and fill epoll fd list for Rx interrupts.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_rx_intr_vec_enable(struct priv *priv)
-{
-	unsigned int i;
-	unsigned int rxqs_n = priv->rxqs_n;
-	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
-	unsigned int count = 0;
-	struct rte_intr_handle *intr_handle = &priv->intr_handle;
-
-	priv_rx_intr_vec_disable(priv);
-	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
-	if (intr_handle->intr_vec == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("failed to allocate memory for interrupt vector,"
-		      " Rx interrupts will not be supported");
-		return -rte_errno;
-	}
-	for (i = 0; i != n; ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-
-		/* Skip queues that cannot request interrupts. */
-		if (!rxq || !rxq->channel) {
-			/* Use invalid intr_vec[] index to disable entry. */
-			intr_handle->intr_vec[i] =
-				RTE_INTR_VEC_RXTX_OFFSET +
-				RTE_MAX_RXTX_INTR_VEC_ID;
-			continue;
-		}
-		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
-			rte_errno = E2BIG;
-			ERROR("too many Rx queues for interrupt vector size"
-			      " (%d), Rx interrupts cannot be enabled",
-			      RTE_MAX_RXTX_INTR_VEC_ID);
-			priv_rx_intr_vec_disable(priv);
-			return -rte_errno;
-		}
-		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
-		intr_handle->efds[count] = rxq->channel->fd;
-		count++;
-	}
-	if (!count)
-		priv_rx_intr_vec_disable(priv);
-	else
-		intr_handle->nb_efd = count;
-	return 0;
-}
-
-/**
- * Clean up Rx interrupts handler.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_rx_intr_vec_disable(struct priv *priv)
-{
-	struct rte_intr_handle *intr_handle = &priv->intr_handle;
-
-	rte_intr_free_epoll_fd(intr_handle);
-	free(intr_handle->intr_vec);
-	intr_handle->nb_efd = 0;
-	intr_handle->intr_vec = NULL;
-}
-
-/**
- * DPDK callback for Rx queue interrupt enable.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	int ret;
-
-	if (!rxq || !rxq->channel)
-		ret = EINVAL;
-	else
-		ret = ibv_req_notify_cq(rxq->cq, 0);
-	if (ret) {
-		rte_errno = ret;
-		WARN("unable to arm interrupt on rx queue %d", idx);
-	}
-	return -ret;
-}
-
-/**
- * DPDK callback for Rx queue interrupt disable.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   Rx queue index.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	struct ibv_cq *ev_cq;
-	void *ev_ctx;
-	int ret;
-
-	if (!rxq || !rxq->channel) {
-		ret = EINVAL;
-	} else {
-		ret = ibv_get_cq_event(rxq->cq->channel, &ev_cq, &ev_ctx);
-		if (ret || ev_cq != rxq->cq)
-			ret = EINVAL;
-	}
-	if (ret) {
-		rte_errno = ret;
-		WARN("unable to disable interrupt on rx queue %d",
-		     idx);
-	} else {
-		ibv_ack_cq_events(rxq->cq, 1);
-	}
-	return -ret;
-}
-
 /**
  * Verify and store value for device argument.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index a35a94e..6852c4c 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -178,4 +178,15 @@ struct priv {
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 };
 
+/* mlx4.c */
+
+int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+
+/* mlx4_intr.c */
+
+int mlx4_intr_uninstall(struct priv *priv);
+int mlx4_intr_install(struct priv *priv);
+int mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
+int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
+
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
new file mode 100644
index 0000000..bcf4d59
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -0,0 +1,376 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Interrupts handling for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_alarm.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_interrupts.h>
+
+#include "mlx4.h"
+#include "mlx4_utils.h"
+
+static void mlx4_link_status_alarm(struct priv *priv);
+
+/**
+ * Clean up Rx interrupts handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_rx_intr_vec_disable(struct priv *priv)
+{
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
+
+	rte_intr_free_epoll_fd(intr_handle);
+	free(intr_handle->intr_vec);
+	intr_handle->nb_efd = 0;
+	intr_handle->intr_vec = NULL;
+}
+
+/**
+ * Allocate queue vector and fill epoll fd list for Rx interrupts.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rx_intr_vec_enable(struct priv *priv)
+{
+	unsigned int i;
+	unsigned int rxqs_n = priv->rxqs_n;
+	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
+	unsigned int count = 0;
+	struct rte_intr_handle *intr_handle = &priv->intr_handle;
+
+	mlx4_rx_intr_vec_disable(priv);
+	intr_handle->intr_vec = malloc(sizeof(intr_handle->intr_vec[rxqs_n]));
+	if (intr_handle->intr_vec == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("failed to allocate memory for interrupt vector,"
+		      " Rx interrupts will not be supported");
+		return -rte_errno;
+	}
+	for (i = 0; i != n; ++i) {
+		struct rxq *rxq = (*priv->rxqs)[i];
+
+		/* Skip queues that cannot request interrupts. */
+		if (!rxq || !rxq->channel) {
+			/* Use invalid intr_vec[] index to disable entry. */
+			intr_handle->intr_vec[i] =
+				RTE_INTR_VEC_RXTX_OFFSET +
+				RTE_MAX_RXTX_INTR_VEC_ID;
+			continue;
+		}
+		if (count >= RTE_MAX_RXTX_INTR_VEC_ID) {
+			rte_errno = E2BIG;
+			ERROR("too many Rx queues for interrupt vector size"
+			      " (%d), Rx interrupts cannot be enabled",
+			      RTE_MAX_RXTX_INTR_VEC_ID);
+			mlx4_rx_intr_vec_disable(priv);
+			return -rte_errno;
+		}
+		intr_handle->intr_vec[i] = RTE_INTR_VEC_RXTX_OFFSET + count;
+		intr_handle->efds[count] = rxq->channel->fd;
+		count++;
+	}
+	if (!count)
+		mlx4_rx_intr_vec_disable(priv);
+	else
+		intr_handle->nb_efd = count;
+	return 0;
+}
+
+/**
+ * Collect interrupt events.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param events
+ *   Pointer to event flags holder.
+ *
+ * @return
+ *   Number of events.
+ */
+static int
+mlx4_collect_interrupt_events(struct priv *priv, uint32_t *events)
+{
+	struct ibv_async_event event;
+	int port_change = 0;
+	struct rte_eth_link *link = &priv->dev->data->dev_link;
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
+	int ret = 0;
+
+	*events = 0;
+	/* Read all messages and acknowledge them. */
+	for (;;) {
+		if (ibv_get_async_event(priv->ctx, &event))
+			break;
+		if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
+		     event.event_type == IBV_EVENT_PORT_ERR) &&
+		    intr_conf->lsc) {
+			port_change = 1;
+			ret++;
+		} else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
+			   intr_conf->rmv) {
+			*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
+			ret++;
+		} else {
+			DEBUG("event type %d on port %d not handled",
+			      event.event_type, event.element.port_num);
+		}
+		ibv_ack_async_event(&event);
+	}
+	if (!port_change)
+		return ret;
+	mlx4_link_update(priv->dev, 0);
+	if (((link->link_speed == 0) && link->link_status) ||
+	    ((link->link_speed != 0) && !link->link_status)) {
+		if (!priv->intr_alarm) {
+			/* Inconsistent status, check again later. */
+			priv->intr_alarm = 1;
+			rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
+					  (void (*)(void *))
+					  mlx4_link_status_alarm,
+					  priv);
+		}
+	} else {
+		*events |= (1 << RTE_ETH_EVENT_INTR_LSC);
+	}
+	return ret;
+}
+
+/**
+ * Process scheduled link status check.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_link_status_alarm(struct priv *priv)
+{
+	uint32_t events;
+	int ret;
+
+	assert(priv->intr_alarm == 1);
+	priv->intr_alarm = 0;
+	ret = mlx4_collect_interrupt_events(priv, &events);
+	if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
+		_rte_eth_dev_callback_process(priv->dev,
+					      RTE_ETH_EVENT_INTR_LSC,
+					      NULL, NULL);
+}
+
+/**
+ * Handle interrupts from the NIC.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+static void
+mlx4_interrupt_handler(struct priv *priv)
+{
+	int ret;
+	uint32_t ev;
+	int i;
+
+	ret = mlx4_collect_interrupt_events(priv, &ev);
+	if (ret > 0) {
+		for (i = RTE_ETH_EVENT_UNKNOWN;
+		     i < RTE_ETH_EVENT_MAX;
+		     i++) {
+			if (ev & (1 << i)) {
+				ev &= ~(1 << i);
+				_rte_eth_dev_callback_process(priv->dev, i,
+							      NULL, NULL);
+				ret--;
+			}
+		}
+		if (ret)
+			WARN("%d event%s not processed", ret,
+			     (ret > 1 ? "s were" : " was"));
+	}
+}
+
+/**
+ * Uninstall interrupt handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_intr_uninstall(struct priv *priv)
+{
+	int err = rte_errno; /* Make sure rte_errno remains unchanged. */
+
+	if (priv->intr_handle.fd != -1) {
+		rte_intr_callback_unregister(&priv->intr_handle,
+					     (void (*)(void *))
+					     mlx4_interrupt_handler,
+					     priv);
+		priv->intr_handle.fd = -1;
+	}
+	rte_eal_alarm_cancel((void (*)(void *))mlx4_link_status_alarm, priv);
+	priv->intr_alarm = 0;
+	mlx4_rx_intr_vec_disable(priv);
+	rte_errno = err;
+	return 0;
+}
+
+/**
+ * Install interrupt handler.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_intr_install(struct priv *priv)
+{
+	const struct rte_intr_conf *const intr_conf =
+		&priv->dev->data->dev_conf.intr_conf;
+	int rc;
+
+	mlx4_intr_uninstall(priv);
+	if (intr_conf->rxq && mlx4_rx_intr_vec_enable(priv) < 0)
+		goto error;
+	if (intr_conf->lsc | intr_conf->rmv) {
+		priv->intr_handle.fd = priv->ctx->async_fd;
+		rc = rte_intr_callback_register(&priv->intr_handle,
+						(void (*)(void *))
+						mlx4_interrupt_handler,
+						priv);
+		if (rc < 0) {
+			rte_errno = -rc;
+			goto error;
+		}
+	}
+	return 0;
+error:
+	mlx4_intr_uninstall(priv);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback for Rx queue interrupt disable.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	struct ibv_cq *ev_cq;
+	void *ev_ctx;
+	int ret;
+
+	if (!rxq || !rxq->channel) {
+		ret = EINVAL;
+	} else {
+		ret = ibv_get_cq_event(rxq->cq->channel, &ev_cq, &ev_ctx);
+		if (ret || ev_cq != rxq->cq)
+			ret = EINVAL;
+	}
+	if (ret) {
+		rte_errno = ret;
+		WARN("unable to disable interrupt on rx queue %d",
+		     idx);
+	} else {
+		ibv_ack_cq_events(rxq->cq, 1);
+	}
+	return -ret;
+}
+
+/**
+ * DPDK callback for Rx queue interrupt enable.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Rx queue index.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	int ret;
+
+	if (!rxq || !rxq->channel)
+		ret = EINVAL;
+	else
+		ret = ibv_req_notify_cq(rxq->cq, 0);
+	if (ret) {
+		rte_errno = ret;
+		WARN("unable to arm interrupt on rx queue %d", idx);
+	}
+	return -ret;
+}
-- 
2.1.4

* [PATCH v2 41/51] net/mlx4: separate Rx/Tx definitions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (39 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 40/51] net/mlx4: separate interrupt handling Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 42/51] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
                     ` (11 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Except for a minor documentation update on internal structure definitions
to make them more Doxygen-friendly, there is no impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
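Illustration only, not part of the patch: the documentation update
mentioned above mostly comes down to the form of the trailing member
comments. Doxygen attaches a comment to the preceding member only when it
is written as /**< ... */, while a plain comment is skipped. A minimal
sketch, assuming hypothetical names (example_rxq_stats, example_rxq_seen)
that do not exist in the PMD:

```c
#include <stdint.h>

/** Hypothetical Rx counters; this leading comment documents the type. */
struct example_rxq_stats {
	uint64_t ipackets; /**< Attached by Doxygen to ipackets. */
	uint64_t idropped; /* Plain comment, skipped by Doxygen. */
};

/**
 * Hypothetical helper, present only to give the structure a use.
 *
 * @param[in] stats
 *   Pointer to counters.
 *
 * @return
 *   Number of packets seen by the queue, dropped ones included.
 */
static inline uint64_t
example_rxq_seen(const struct example_rxq_stats *stats)
{
	return stats->ipackets + stats->idropped;
}
```

Both comment forms compile identically; only the generated documentation
differs.
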
 drivers/net/mlx4/mlx4.c      |   1 +
 drivers/net/mlx4/mlx4.h      |  69 +--------------------
 drivers/net/mlx4/mlx4_flow.c |   1 +
 drivers/net/mlx4/mlx4_intr.c |   1 +
 drivers/net/mlx4/mlx4_rxtx.h | 122 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 127 insertions(+), 67 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 667ba2b..ba06075 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -71,6 +71,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
 /** Configuration structure for device arguments. */
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 6852c4c..edbece6 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -85,73 +85,8 @@ enum {
 
 #define MLX4_DRIVER_NAME "net_mlx4"
 
-struct mlx4_rxq_stats {
-	unsigned int idx; /**< Mapping index. */
-	uint64_t ipackets; /**< Total of successfully received packets. */
-	uint64_t ibytes; /**< Total of successfully received bytes. */
-	uint64_t idropped; /**< Total of packets dropped when RX ring full. */
-	uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
-};
-
-/* RX element. */
-struct rxq_elt {
-	struct ibv_recv_wr wr; /* Work Request. */
-	struct ibv_sge sge; /* Scatter/Gather Element. */
-	struct rte_mbuf *buf; /**< Buffer. */
-};
-
-/* RX queue descriptor. */
-struct rxq {
-	struct priv *priv; /* Back pointer to private data. */
-	struct rte_mempool *mp; /* Memory Pool for allocations. */
-	struct ibv_mr *mr; /* Memory Region (for mp). */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	struct ibv_comp_channel *channel;
-	unsigned int port_id; /* Port ID for incoming packets. */
-	unsigned int elts_n; /* (*elts)[] length. */
-	unsigned int elts_head; /* Current index in (*elts)[]. */
-	struct rxq_elt (*elts)[]; /* Rx elements. */
-	struct mlx4_rxq_stats stats; /* RX queue counters. */
-	unsigned int socket; /* CPU socket ID for allocations. */
-};
-
-/* TX element. */
-struct txq_elt {
-	struct ibv_send_wr wr; /* Work request. */
-	struct ibv_sge sge; /* Scatter/gather element. */
-	struct rte_mbuf *buf;
-};
-
-struct mlx4_txq_stats {
-	unsigned int idx; /**< Mapping index. */
-	uint64_t opackets; /**< Total of successfully sent packets. */
-	uint64_t obytes;   /**< Total of successfully sent bytes. */
-	uint64_t odropped; /**< Total of packets not sent when TX ring full. */
-};
-
-/* TX queue descriptor. */
-struct txq {
-	struct priv *priv; /* Back pointer to private data. */
-	struct {
-		const struct rte_mempool *mp; /* Cached Memory Pool. */
-		struct ibv_mr *mr; /* Memory Region (for mp). */
-		uint32_t lkey; /* mr->lkey */
-	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
-	struct ibv_cq *cq; /* Completion Queue. */
-	struct ibv_qp *qp; /* Queue Pair. */
-	uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
-	unsigned int elts_n; /* (*elts)[] length. */
-	struct txq_elt (*elts)[]; /* TX elements. */
-	unsigned int elts_head; /* Current index in (*elts)[]. */
-	unsigned int elts_tail; /* First element awaiting completion. */
-	unsigned int elts_comp; /* Number of completion requests. */
-	unsigned int elts_comp_cd; /* Countdown for next completion request. */
-	unsigned int elts_comp_cd_init; /* Initial value for countdown. */
-	struct mlx4_txq_stats stats; /* TX queue counters. */
-	unsigned int socket; /* CPU socket ID for allocations. */
-};
-
+struct rxq;
+struct txq;
 struct rte_flow;
 
 struct priv {
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 6f6f455..61455ce 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -40,6 +40,7 @@
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
 /** Static initializer for items. */
diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index bcf4d59..76d2e01 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -56,6 +56,7 @@
 #include <rte_interrupts.h>
 
 #include "mlx4.h"
+#include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
 static void mlx4_link_status_alarm(struct priv *priv);
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
new file mode 100644
index 0000000..ea55aed
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -0,0 +1,122 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef MLX4_RXTX_H_
+#define MLX4_RXTX_H_
+
+#include <stdint.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+
+/** Rx queue counters. */
+struct mlx4_rxq_stats {
+	unsigned int idx; /**< Mapping index. */
+	uint64_t ipackets; /**< Total of successfully received packets. */
+	uint64_t ibytes; /**< Total of successfully received bytes. */
+	uint64_t idropped; /**< Total of packets dropped when Rx ring full. */
+	uint64_t rx_nombuf; /**< Total of Rx mbuf allocation failures. */
+};
+
+/** Rx element. */
+struct rxq_elt {
+	struct ibv_recv_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
+	struct rte_mbuf *buf; /**< Buffer. */
+};
+
+/** Rx queue descriptor. */
+struct rxq {
+	struct priv *priv; /**< Back pointer to private data. */
+	struct rte_mempool *mp; /**< Memory pool for allocations. */
+	struct ibv_mr *mr; /**< Memory region (for mp). */
+	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	struct ibv_comp_channel *channel; /**< Rx completion channel. */
+	unsigned int port_id; /**< Port ID for incoming packets. */
+	unsigned int elts_n; /**< (*elts)[] length. */
+	unsigned int elts_head; /**< Current index in (*elts)[]. */
+	struct rxq_elt (*elts)[]; /**< Rx elements. */
+	struct mlx4_rxq_stats stats; /**< Rx queue counters. */
+	unsigned int socket; /**< CPU socket ID for allocations. */
+};
+
+/** Tx element. */
+struct txq_elt {
+	struct ibv_send_wr wr; /**< Work request. */
+	struct ibv_sge sge; /**< Scatter/gather element. */
+	struct rte_mbuf *buf; /**< Buffer. */
+};
+
+/** Tx queue counters. */
+struct mlx4_txq_stats {
+	unsigned int idx; /**< Mapping index. */
+	uint64_t opackets; /**< Total of successfully sent packets. */
+	uint64_t obytes; /**< Total of successfully sent bytes. */
+	uint64_t odropped; /**< Total of packets not sent when Tx ring full. */
+};
+
+/** Tx queue descriptor. */
+struct txq {
+	struct priv *priv; /**< Back pointer to private data. */
+	struct {
+		const struct rte_mempool *mp; /**< Cached memory pool. */
+		struct ibv_mr *mr; /**< Memory region (for mp). */
+		uint32_t lkey; /**< mr->lkey copy. */
+	} mp2mr[MLX4_PMD_TX_MP_CACHE]; /**< MP to MR translation table. */
+	struct ibv_cq *cq; /**< Completion queue. */
+	struct ibv_qp *qp; /**< Queue pair. */
+	uint32_t max_inline; /**< Max inline send size. */
+	unsigned int elts_n; /**< (*elts)[] length. */
+	struct txq_elt (*elts)[]; /**< Tx elements. */
+	unsigned int elts_head; /**< Current index in (*elts)[]. */
+	unsigned int elts_tail; /**< First element awaiting completion. */
+	unsigned int elts_comp; /**< Number of completion requests. */
+	unsigned int elts_comp_cd; /**< Countdown for next completion. */
+	unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
+	struct mlx4_txq_stats stats; /**< Tx queue counters. */
+	unsigned int socket; /**< CPU socket ID for allocations. */
+};
+
+#endif /* MLX4_RXTX_H_ */
-- 
2.1.4

* [PATCH v2 42/51] net/mlx4: separate Rx/Tx functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (40 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 41/51] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 43/51] net/mlx4: separate device control functions Adrien Mazarguil
                     ` (10 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

This commit groups all data plane functions (Rx/Tx) into a separate file
and adjusts header files accordingly.

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
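Illustration only, not part of the patch: for readers following the moved
data plane code, the ring accounting used by mlx4_tx_burst() and
txq_complete() can be sketched in isolation. The helper names below
(example_tx_ring_room, example_tx_ring_tail) are hypothetical and do not
exist in the PMD; the arithmetic follows the quoted code and assumes that
both indices stay below elts_n.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the free-slot computation at the top of
 * mlx4_tx_burst(). Both indices are assumed to be below elts_n.
 */
static unsigned int
example_tx_ring_room(unsigned int elts_n, unsigned int elts_head,
		     unsigned int elts_tail)
{
	/* Unsigned wrap-around gives a value above elts_n if head < tail. */
	unsigned int max = elts_n - (elts_head - elts_tail);

	if (max > elts_n)
		max -= elts_n;
	assert(max >= 1 && max <= elts_n);
	/* Always leave one free entry in the ring. */
	return max - 1;
}

/*
 * Hypothetical helper mirroring the tail update in txq_complete(): each
 * polled completion stands for comp_cd_init transmitted elements.
 */
static unsigned int
example_tx_ring_tail(unsigned int elts_n, unsigned int elts_tail,
		     unsigned int wcs_n, unsigned int comp_cd_init)
{
	elts_tail += wcs_n * comp_cd_init;
	if (elts_tail >= elts_n)
		elts_tail -= elts_n;
	return elts_tail;
}
```

With elts_n = 8, elts_head = 3 and elts_tail = 0, the first helper reports
room for 4 packets: three entries are in flight and one is always kept
free. Requesting a completion only once per comp_cd_init sends is what
lets the second helper advance the tail by several elements at a time.
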
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 484 +----------------------------------
 drivers/net/mlx4/mlx4.h      |   2 +
 drivers/net/mlx4/mlx4_rxtx.c | 524 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_rxtx.h |  12 +
 5 files changed, 545 insertions(+), 478 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index f6e3001..8def32a 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -38,6 +38,7 @@ LIB = librte_pmd_mlx4.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ba06075..a409ec2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -56,13 +56,11 @@
 #include <rte_mbuf.h>
 #include <rte_errno.h>
 #include <rte_mempool.h>
-#include <rte_prefetch.h>
 #include <rte_malloc.h>
 #include <rte_memory.h>
 #include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
-#include <rte_branch_prediction.h>
 #include <rte_common.h>
 
 /* Generated configuration header. */
@@ -505,9 +503,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-static uint16_t mlx4_tx_burst(void *, struct rte_mbuf **, uint16_t);
-static uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
-
 /* TX queues handling. */
 
 /**
@@ -630,53 +625,6 @@ txq_cleanup(struct txq *txq)
 	memset(txq, 0, sizeof(*txq));
 }
 
-/**
- * Manage TX completions.
- *
- * When sending a burst, mlx4_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required once every
- * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
- * for other WRs, but this information would not be used anyway.
- *
- * @param txq
- *   Pointer to TX queue structure.
- *
- * @return
- *   0 on success, -1 on failure.
- */
-static int
-txq_complete(struct txq *txq)
-{
-	unsigned int elts_comp = txq->elts_comp;
-	unsigned int elts_tail = txq->elts_tail;
-	const unsigned int elts_n = txq->elts_n;
-	struct ibv_wc wcs[elts_comp];
-	int wcs_n;
-
-	if (unlikely(elts_comp == 0))
-		return 0;
-	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
-	if (unlikely(wcs_n == 0))
-		return 0;
-	if (unlikely(wcs_n < 0)) {
-		DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
-		      (void *)txq, wcs_n);
-		return -1;
-	}
-	elts_comp -= wcs_n;
-	assert(elts_comp <= txq->elts_comp);
-	/*
-	 * Assume WC status is successful as nothing can be done about it
-	 * anyway.
-	 */
-	elts_tail += wcs_n * txq->elts_comp_cd_init;
-	if (elts_tail >= elts_n)
-		elts_tail -= elts_n;
-	txq->elts_tail = elts_tail;
-	txq->elts_comp = elts_comp;
-	return 0;
-}
-
 struct mlx4_check_mempool_data {
 	int ret;
 	char *start;
@@ -738,10 +686,6 @@ static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
 	return data.ret;
 }
 
-/* For best performance, this function should not be inlined. */
-static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
-	__rte_noinline;
-
 /**
  * Register mempool as a memory region.
  *
@@ -753,7 +697,7 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd *, struct rte_mempool *)
  * @return
  *   Memory region pointer, NULL in case of error and rte_errno is set.
  */
-static struct ibv_mr *
+struct ibv_mr *
 mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 {
 	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
@@ -794,81 +738,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-/**
- * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which
- * the cloned mbuf is allocated is returned instead.
- *
- * @param buf
- *   Pointer to mbuf.
- *
- * @return
- *   Memory pool where data is located for given mbuf.
- */
-static struct rte_mempool *
-txq_mb2mp(struct rte_mbuf *buf)
-{
-	if (unlikely(RTE_MBUF_INDIRECT(buf)))
-		return rte_mbuf_from_indirect(buf)->pool;
-	return buf->pool;
-}
-
-/**
- * Get Memory Region (MR) <-> Memory Pool (MP) association from txq->mp2mr[].
- * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
- * remove an entry first.
- *
- * @param txq
- *   Pointer to TX queue structure.
- * @param[in] mp
- *   Memory Pool for which a Memory Region lkey must be returned.
- *
- * @return
- *   mr->lkey on success, (uint32_t)-1 on failure.
- */
-static uint32_t
-txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
-{
-	unsigned int i;
-	struct ibv_mr *mr;
-
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (unlikely(txq->mp2mr[i].mp == NULL)) {
-			/* Unknown MP, add a new MR for it. */
-			break;
-		}
-		if (txq->mp2mr[i].mp == mp) {
-			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
-			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
-			return txq->mp2mr[i].lkey;
-		}
-	}
-	/* Add a new entry, register MR first. */
-	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
-	      (void *)txq, mp->name, (void *)mp);
-	mr = mlx4_mp2mr(txq->priv->pd, mp);
-	if (unlikely(mr == NULL)) {
-		DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
-		      (void *)txq);
-		return (uint32_t)-1;
-	}
-	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
-		/* Table is full, remove oldest entry. */
-		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
-		      (void *)txq);
-		--i;
-		claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
-		memmove(&txq->mp2mr[0], &txq->mp2mr[1],
-			(sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
-	}
-	/* Store the new entry. */
-	txq->mp2mr[i].mp = mp;
-	txq->mp2mr[i].mr = mr;
-	txq->mp2mr[i].lkey = mr->lkey;
-	DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIu32,
-	      (void *)txq, mp->name, (void *)mp, txq->mp2mr[i].lkey);
-	return txq->mp2mr[i].lkey;
-}
-
 struct txq_mp2mr_mbuf_check_data {
 	int ret;
 };
@@ -923,172 +792,7 @@ txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
 	if (rte_mempool_obj_iter(mp, txq_mp2mr_mbuf_check, &data) == 0 ||
 			data.ret == -1)
 		return;
-	txq_mp2mr(txq, mp);
-}
-
-/**
- * DPDK callback for TX.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct txq *txq = (struct txq *)dpdk_txq;
-	struct ibv_send_wr *wr_head = NULL;
-	struct ibv_send_wr **wr_next = &wr_head;
-	struct ibv_send_wr *wr_bad = NULL;
-	unsigned int elts_head = txq->elts_head;
-	const unsigned int elts_n = txq->elts_n;
-	unsigned int elts_comp_cd = txq->elts_comp_cd;
-	unsigned int elts_comp = 0;
-	unsigned int i;
-	unsigned int max;
-	int err;
-
-	assert(elts_comp_cd != 0);
-	txq_complete(txq);
-	max = (elts_n - (elts_head - txq->elts_tail));
-	if (max > elts_n)
-		max -= elts_n;
-	assert(max >= 1);
-	assert(max <= elts_n);
-	/* Always leave one free entry in the ring. */
-	--max;
-	if (max == 0)
-		return 0;
-	if (max > pkts_n)
-		max = pkts_n;
-	for (i = 0; (i != max); ++i) {
-		struct rte_mbuf *buf = pkts[i];
-		unsigned int elts_head_next =
-			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
-		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
-		struct txq_elt *elt = &(*txq->elts)[elts_head];
-		struct ibv_send_wr *wr = &elt->wr;
-		unsigned int segs = buf->nb_segs;
-		unsigned int sent_size = 0;
-		uint32_t send_flags = 0;
-
-		/* Clean up old buffer. */
-		if (likely(elt->buf != NULL)) {
-			struct rte_mbuf *tmp = elt->buf;
-
-#ifndef NDEBUG
-			/* Poisoning. */
-			memset(elt, 0x66, sizeof(*elt));
-#endif
-			/* Faster than rte_pktmbuf_free(). */
-			do {
-				struct rte_mbuf *next = tmp->next;
-
-				rte_pktmbuf_free_seg(tmp);
-				tmp = next;
-			} while (tmp != NULL);
-		}
-		/* Request TX completion. */
-		if (unlikely(--elts_comp_cd == 0)) {
-			elts_comp_cd = txq->elts_comp_cd_init;
-			++elts_comp;
-			send_flags |= IBV_SEND_SIGNALED;
-		}
-		if (likely(segs == 1)) {
-			struct ibv_sge *sge = &elt->sge;
-			uintptr_t addr;
-			uint32_t length;
-			uint32_t lkey;
-
-			/* Retrieve buffer information. */
-			addr = rte_pktmbuf_mtod(buf, uintptr_t);
-			length = buf->data_len;
-			/* Retrieve Memory Region key for this memory pool. */
-			lkey = txq_mp2mr(txq, txq_mb2mp(buf));
-			if (unlikely(lkey == (uint32_t)-1)) {
-				/* MR does not exist. */
-				DEBUG("%p: unable to get MP <-> MR"
-				      " association", (void *)txq);
-				/* Clean up TX element. */
-				elt->buf = NULL;
-				goto stop;
-			}
-			/* Update element. */
-			elt->buf = buf;
-			if (txq->priv->vf)
-				rte_prefetch0((volatile void *)
-					      (uintptr_t)addr);
-			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
-			sge->addr = addr;
-			sge->length = length;
-			sge->lkey = lkey;
-			sent_size += length;
-		} else {
-			err = -1;
-			goto stop;
-		}
-		if (sent_size <= txq->max_inline)
-			send_flags |= IBV_SEND_INLINE;
-		elts_head = elts_head_next;
-		/* Increment sent bytes counter. */
-		txq->stats.obytes += sent_size;
-		/* Set up WR. */
-		wr->sg_list = &elt->sge;
-		wr->num_sge = segs;
-		wr->opcode = IBV_WR_SEND;
-		wr->send_flags = send_flags;
-		*wr_next = wr;
-		wr_next = &wr->next;
-	}
-stop:
-	/* Take a shortcut if nothing must be sent. */
-	if (unlikely(i == 0))
-		return 0;
-	/* Increment sent packets counter. */
-	txq->stats.opackets += i;
-	/* Ring QP doorbell. */
-	*wr_next = NULL;
-	assert(wr_head);
-	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
-	if (unlikely(err)) {
-		uint64_t obytes = 0;
-		uint64_t opackets = 0;
-
-		/* Rewind bad WRs. */
-		while (wr_bad != NULL) {
-			int j;
-
-			/* Force completion request if one was lost. */
-			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
-				elts_comp_cd = 1;
-				--elts_comp;
-			}
-			++opackets;
-			for (j = 0; j < wr_bad->num_sge; ++j)
-				obytes += wr_bad->sg_list[j].length;
-			elts_head = (elts_head ? elts_head : elts_n) - 1;
-			wr_bad = wr_bad->next;
-		}
-		txq->stats.opackets -= opackets;
-		txq->stats.obytes -= obytes;
-		i -= opackets;
-		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
-		      " (%" PRIu64 " bytes) rejected: %s",
-		      (void *)txq,
-		      opackets,
-		      obytes,
-		      (err <= -1) ? "Internal error" : strerror(err));
-	}
-	txq->elts_head = elts_head;
-	txq->elts_comp += elts_comp;
-	txq->elts_comp_cd = elts_comp_cd;
-	return i;
+	mlx4_txq_mp2mr(txq, mp);
 }
 
 /**
@@ -1546,132 +1250,6 @@ rxq_cleanup(struct rxq *rxq)
 }
 
 /**
- * DPDK callback for RX.
- *
- * The following function doesn't manage scattered packets.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
-	const unsigned int elts_n = rxq->elts_n;
-	unsigned int elts_head = rxq->elts_head;
-	struct ibv_wc wcs[pkts_n];
-	struct ibv_recv_wr *wr_head = NULL;
-	struct ibv_recv_wr **wr_next = &wr_head;
-	struct ibv_recv_wr *wr_bad = NULL;
-	unsigned int i;
-	unsigned int pkts_ret = 0;
-	int ret;
-
-	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
-	if (unlikely(ret == 0))
-		return 0;
-	if (unlikely(ret < 0)) {
-		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
-		      (void *)rxq, ret);
-		return 0;
-	}
-	assert(ret <= (int)pkts_n);
-	/* For each work completion. */
-	for (i = 0; i != (unsigned int)ret; ++i) {
-		struct ibv_wc *wc = &wcs[i];
-		struct rxq_elt *elt = &(*elts)[elts_head];
-		struct ibv_recv_wr *wr = &elt->wr;
-		uint32_t len = wc->byte_len;
-		struct rte_mbuf *seg = elt->buf;
-		struct rte_mbuf *rep;
-
-		/* Sanity checks. */
-		assert(wr->sg_list == &elt->sge);
-		assert(wr->num_sge == 1);
-		assert(elts_head < rxq->elts_n);
-		assert(rxq->elts_head < rxq->elts_n);
-		/*
-		 * Fetch initial bytes of packet descriptor into a
-		 * cacheline while allocating rep.
-		 */
-		rte_mbuf_prefetch_part1(seg);
-		rte_mbuf_prefetch_part2(seg);
-		/* Link completed WRs together for repost. */
-		*wr_next = wr;
-		wr_next = &wr->next;
-		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
-			/* Whatever, just repost the offending WR. */
-			DEBUG("rxq=%p: bad work completion status (%d): %s",
-			      (void *)rxq, wc->status,
-			      ibv_wc_status_str(wc->status));
-			/* Increment dropped packets counter. */
-			++rxq->stats.idropped;
-			goto repost;
-		}
-		rep = rte_mbuf_raw_alloc(rxq->mp);
-		if (unlikely(rep == NULL)) {
-			/*
-			 * Unable to allocate a replacement mbuf,
-			 * repost WR.
-			 */
-			DEBUG("rxq=%p: can't allocate a new mbuf",
-			      (void *)rxq);
-			/* Increase out of memory counters. */
-			++rxq->stats.rx_nombuf;
-			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
-			goto repost;
-		}
-		/* Reconfigure sge to use rep instead of seg. */
-		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
-		assert(elt->sge.lkey == rxq->mr->lkey);
-		elt->buf = rep;
-		/* Update seg information. */
-		seg->data_off = RTE_PKTMBUF_HEADROOM;
-		seg->nb_segs = 1;
-		seg->port = rxq->port_id;
-		seg->next = NULL;
-		seg->pkt_len = len;
-		seg->data_len = len;
-		seg->packet_type = 0;
-		seg->ol_flags = 0;
-		/* Return packet. */
-		*(pkts++) = seg;
-		++pkts_ret;
-		/* Increase bytes counter. */
-		rxq->stats.ibytes += len;
-repost:
-		if (++elts_head >= elts_n)
-			elts_head = 0;
-		continue;
-	}
-	if (unlikely(i == 0))
-		return 0;
-	/* Repost WRs. */
-	*wr_next = NULL;
-	assert(wr_head);
-	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
-	if (unlikely(ret)) {
-		/* Inability to repost WRs is fatal. */
-		DEBUG("%p: recv_burst(): failed (ret=%d)",
-		      (void *)rxq->priv,
-		      ret);
-		abort();
-	}
-	rxq->elts_head = elts_head;
-	/* Increase packets counter. */
-	rxq->stats.ipackets += pkts_ret;
-	return pkts_ret;
-}
-
-/**
  * Allocate a Queue Pair.
  * Optionally setup inline receive if supported.
  *
@@ -2032,56 +1610,6 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 }
 
 /**
- * Dummy DPDK callback for TX.
- *
- * This function is used to temporarily replace the real callback during
- * unsafe control operations on the queue, or in case of error.
- *
- * @param dpdk_txq
- *   Generic pointer to TX queue structure.
- * @param[in] pkts
- *   Packets to transmit.
- * @param pkts_n
- *   Number of packets in array.
- *
- * @return
- *   Number of packets successfully transmitted (<= pkts_n).
- */
-static uint16_t
-removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	(void)dpdk_txq;
-	(void)pkts;
-	(void)pkts_n;
-	return 0;
-}
-
-/**
- * Dummy DPDK callback for RX.
- *
- * This function is used to temporarily replace the real callback during
- * unsafe control operations on the queue, or in case of error.
- *
- * @param dpdk_rxq
- *   Generic pointer to RX queue structure.
- * @param[out] pkts
- *   Array to store received packets.
- * @param pkts_n
- *   Maximum number of packets in array.
- *
- * @return
- *   Number of packets successfully received (<= pkts_n).
- */
-static uint16_t
-removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
-{
-	(void)dpdk_rxq;
-	(void)pkts;
-	(void)pkts_n;
-	return 0;
-}
-
-/**
  * DPDK callback to close the device.
  *
  * Destroy all queues and objects, free memory.
@@ -2107,8 +1635,8 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
 	 * never release them before closing the device.
 	 */
-	dev->rx_pkt_burst = removed_rx_burst;
-	dev->tx_pkt_burst = removed_tx_burst;
+	dev->rx_pkt_burst = mlx4_rx_burst_removed;
+	dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	if (priv->rxqs != NULL) {
 		/* XXX race condition if mlx4_rx_burst() is still running. */
 		usleep(1000);
@@ -2173,8 +1701,8 @@ priv_set_link(struct priv *priv, int up)
 		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
 		if (err)
 			return err;
-		dev->rx_pkt_burst = removed_rx_burst;
-		dev->tx_pkt_burst = removed_tx_burst;
+		dev->rx_pkt_burst = mlx4_rx_burst_removed;
+		dev->tx_pkt_burst = mlx4_tx_burst_removed;
 	}
 	return 0;
 }
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index edbece6..efccf1a 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -49,6 +49,7 @@
 #include <rte_ethdev.h>
 #include <rte_ether.h>
 #include <rte_interrupts.h>
+#include <rte_mempool.h>
 
 /* Request send completion once in every 64 sends, might be less. */
 #define MLX4_PMD_TX_PER_COMP_REQ 64
@@ -115,6 +116,7 @@ struct priv {
 
 /* mlx4.c */
 
+struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
 int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
 
 /* mlx4_intr.c */
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
new file mode 100644
index 0000000..b5e7777
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -0,0 +1,524 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Data plane functions for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_branch_prediction.h>
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_prefetch.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Manage Tx completions.
+ *
+ * When sending a burst, mlx4_tx_burst() posts several WRs.
+ * To improve performance, a completion event is only required once every
+ * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ *
+ * @return
+ *   0 on success, -1 on failure.
+ */
+static int
+mlx4_txq_complete(struct txq *txq)
+{
+	unsigned int elts_comp = txq->elts_comp;
+	unsigned int elts_tail = txq->elts_tail;
+	const unsigned int elts_n = txq->elts_n;
+	struct ibv_wc wcs[elts_comp];
+	int wcs_n;
+
+	if (unlikely(elts_comp == 0))
+		return 0;
+	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+	if (unlikely(wcs_n == 0))
+		return 0;
+	if (unlikely(wcs_n < 0)) {
+		DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
+		      (void *)txq, wcs_n);
+		return -1;
+	}
+	elts_comp -= wcs_n;
+	assert(elts_comp <= txq->elts_comp);
+	/*
+	 * Assume WC status is successful as nothing can be done about it
+	 * anyway.
+	 */
+	elts_tail += wcs_n * txq->elts_comp_cd_init;
+	if (elts_tail >= elts_n)
+		elts_tail -= elts_n;
+	txq->elts_tail = elts_tail;
+	txq->elts_comp = elts_comp;
+	return 0;
+}
+
+/**
+ * Get memory pool (MP) from mbuf. If mbuf is indirect, the pool from which
+ * the cloned mbuf is allocated is returned instead.
+ *
+ * @param buf
+ *   Pointer to mbuf.
+ *
+ * @return
+ *   Memory pool where data is located for given mbuf.
+ */
+static struct rte_mempool *
+mlx4_txq_mb2mp(struct rte_mbuf *buf)
+{
+	if (unlikely(RTE_MBUF_INDIRECT(buf)))
+		return rte_mbuf_from_indirect(buf)->pool;
+	return buf->pool;
+}
+
+/**
+ * Get memory region (MR) <-> memory pool (MP) association from txq->mp2mr[].
+ * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
+ * remove an entry first.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ * @param[in] mp
+ *   Memory pool for which a memory region lkey must be returned.
+ *
+ * @return
+ *   mr->lkey on success, (uint32_t)-1 on failure.
+ */
+uint32_t
+mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
+{
+	unsigned int i;
+	struct ibv_mr *mr;
+
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+		if (unlikely(txq->mp2mr[i].mp == NULL)) {
+			/* Unknown MP, add a new MR for it. */
+			break;
+		}
+		if (txq->mp2mr[i].mp == mp) {
+			assert(txq->mp2mr[i].lkey != (uint32_t)-1);
+			assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
+			return txq->mp2mr[i].lkey;
+		}
+	}
+	/* Add a new entry, register MR first. */
+	DEBUG("%p: discovered new memory pool \"%s\" (%p)",
+	      (void *)txq, mp->name, (void *)mp);
+	mr = mlx4_mp2mr(txq->priv->pd, mp);
+	if (unlikely(mr == NULL)) {
+		DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
+		      (void *)txq);
+		return (uint32_t)-1;
+	}
+	if (unlikely(i == RTE_DIM(txq->mp2mr))) {
+		/* Table is full, remove oldest entry. */
+		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
+		      (void *)txq);
+		--i;
+		claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
+		memmove(&txq->mp2mr[0], &txq->mp2mr[1],
+			(sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
+	}
+	/* Store the new entry. */
+	txq->mp2mr[i].mp = mp;
+	txq->mp2mr[i].mr = mr;
+	txq->mp2mr[i].lkey = mr->lkey;
+	DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIx32,
+	      (void *)txq, mp->name, (void *)mp, txq->mp2mr[i].lkey);
+	return txq->mp2mr[i].lkey;
+}
+
+/**
+ * DPDK callback for Tx.
+ *
+ * @param dpdk_txq
+ *   Generic pointer to Tx queue structure.
+ * @param[in] pkts
+ *   Packets to transmit.
+ * @param pkts_n
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct txq *txq = (struct txq *)dpdk_txq;
+	struct ibv_send_wr *wr_head = NULL;
+	struct ibv_send_wr **wr_next = &wr_head;
+	struct ibv_send_wr *wr_bad = NULL;
+	unsigned int elts_head = txq->elts_head;
+	const unsigned int elts_n = txq->elts_n;
+	unsigned int elts_comp_cd = txq->elts_comp_cd;
+	unsigned int elts_comp = 0;
+	unsigned int i;
+	unsigned int max;
+	int err;
+
+	assert(elts_comp_cd != 0);
+	mlx4_txq_complete(txq);
+	max = (elts_n - (elts_head - txq->elts_tail));
+	if (max > elts_n)
+		max -= elts_n;
+	assert(max >= 1);
+	assert(max <= elts_n);
+	/* Always leave one free entry in the ring. */
+	--max;
+	if (max == 0)
+		return 0;
+	if (max > pkts_n)
+		max = pkts_n;
+	for (i = 0; (i != max); ++i) {
+		struct rte_mbuf *buf = pkts[i];
+		unsigned int elts_head_next =
+			(((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+		struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
+		struct txq_elt *elt = &(*txq->elts)[elts_head];
+		struct ibv_send_wr *wr = &elt->wr;
+		unsigned int segs = buf->nb_segs;
+		unsigned int sent_size = 0;
+		uint32_t send_flags = 0;
+
+		/* Clean up old buffer. */
+		if (likely(elt->buf != NULL)) {
+			struct rte_mbuf *tmp = elt->buf;
+
+#ifndef NDEBUG
+			/* Poisoning. */
+			memset(elt, 0x66, sizeof(*elt));
+#endif
+			/* Faster than rte_pktmbuf_free(). */
+			do {
+				struct rte_mbuf *next = tmp->next;
+
+				rte_pktmbuf_free_seg(tmp);
+				tmp = next;
+			} while (tmp != NULL);
+		}
+		/* Request Tx completion. */
+		if (unlikely(--elts_comp_cd == 0)) {
+			elts_comp_cd = txq->elts_comp_cd_init;
+			++elts_comp;
+			send_flags |= IBV_SEND_SIGNALED;
+		}
+		if (likely(segs == 1)) {
+			struct ibv_sge *sge = &elt->sge;
+			uintptr_t addr;
+			uint32_t length;
+			uint32_t lkey;
+
+			/* Retrieve buffer information. */
+			addr = rte_pktmbuf_mtod(buf, uintptr_t);
+			length = buf->data_len;
+			/* Retrieve memory region key for this memory pool. */
+			lkey = mlx4_txq_mp2mr(txq, mlx4_txq_mb2mp(buf));
+			if (unlikely(lkey == (uint32_t)-1)) {
+				/* MR does not exist. */
+				DEBUG("%p: unable to get MP <-> MR"
+				      " association", (void *)txq);
+				/* Clean up Tx element. */
+				elt->buf = NULL;
+				goto stop;
+			}
+			/* Update element. */
+			elt->buf = buf;
+			if (txq->priv->vf)
+				rte_prefetch0((volatile void *)
+					      (uintptr_t)addr);
+			RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+			sge->addr = addr;
+			sge->length = length;
+			sge->lkey = lkey;
+			sent_size += length;
+		} else {
+			err = -1;
+			goto stop;
+		}
+		if (sent_size <= txq->max_inline)
+			send_flags |= IBV_SEND_INLINE;
+		elts_head = elts_head_next;
+		/* Increment sent bytes counter. */
+		txq->stats.obytes += sent_size;
+		/* Set up WR. */
+		wr->sg_list = &elt->sge;
+		wr->num_sge = segs;
+		wr->opcode = IBV_WR_SEND;
+		wr->send_flags = send_flags;
+		*wr_next = wr;
+		wr_next = &wr->next;
+	}
+stop:
+	/* Take a shortcut if nothing must be sent. */
+	if (unlikely(i == 0))
+		return 0;
+	/* Increment sent packets counter. */
+	txq->stats.opackets += i;
+	/* Ring QP doorbell. */
+	*wr_next = NULL;
+	assert(wr_head);
+	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
+	if (unlikely(err)) {
+		uint64_t obytes = 0;
+		uint64_t opackets = 0;
+
+		/* Rewind bad WRs. */
+		while (wr_bad != NULL) {
+			int j;
+
+			/* Force completion request if one was lost. */
+			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
+				elts_comp_cd = 1;
+				--elts_comp;
+			}
+			++opackets;
+			for (j = 0; j < wr_bad->num_sge; ++j)
+				obytes += wr_bad->sg_list[j].length;
+			elts_head = (elts_head ? elts_head : elts_n) - 1;
+			wr_bad = wr_bad->next;
+		}
+		txq->stats.opackets -= opackets;
+		txq->stats.obytes -= obytes;
+		i -= opackets;
+		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
+		      " (%" PRIu64 " bytes) rejected: %s",
+		      (void *)txq,
+		      opackets,
+		      obytes,
+		      (err <= -1) ? "Internal error" : strerror(err));
+	}
+	txq->elts_head = elts_head;
+	txq->elts_comp += elts_comp;
+	txq->elts_comp_cd = elts_comp_cd;
+	return i;
+}
+
+/**
+ * DPDK callback for Rx.
+ *
+ * The following function doesn't manage scattered packets.
+ *
+ * @param dpdk_rxq
+ *   Generic pointer to Rx queue structure.
+ * @param[out] pkts
+ *   Array to store received packets.
+ * @param pkts_n
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct rxq *rxq = (struct rxq *)dpdk_rxq;
+	struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts;
+	const unsigned int elts_n = rxq->elts_n;
+	unsigned int elts_head = rxq->elts_head;
+	struct ibv_wc wcs[pkts_n];
+	struct ibv_recv_wr *wr_head = NULL;
+	struct ibv_recv_wr **wr_next = &wr_head;
+	struct ibv_recv_wr *wr_bad = NULL;
+	unsigned int i;
+	unsigned int pkts_ret = 0;
+	int ret;
+
+	ret = ibv_poll_cq(rxq->cq, pkts_n, wcs);
+	if (unlikely(ret == 0))
+		return 0;
+	if (unlikely(ret < 0)) {
+		DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
+		      (void *)rxq, ret);
+		return 0;
+	}
+	assert(ret <= (int)pkts_n);
+	/* For each work completion. */
+	for (i = 0; i != (unsigned int)ret; ++i) {
+		struct ibv_wc *wc = &wcs[i];
+		struct rxq_elt *elt = &(*elts)[elts_head];
+		struct ibv_recv_wr *wr = &elt->wr;
+		uint32_t len = wc->byte_len;
+		struct rte_mbuf *seg = elt->buf;
+		struct rte_mbuf *rep;
+
+		/* Sanity checks. */
+		assert(wr->sg_list == &elt->sge);
+		assert(wr->num_sge == 1);
+		assert(elts_head < rxq->elts_n);
+		assert(rxq->elts_head < rxq->elts_n);
+		/*
+		 * Fetch initial bytes of packet descriptor into a
+		 * cacheline while allocating rep.
+		 */
+		rte_mbuf_prefetch_part1(seg);
+		rte_mbuf_prefetch_part2(seg);
+		/* Link completed WRs together for repost. */
+		*wr_next = wr;
+		wr_next = &wr->next;
+		if (unlikely(wc->status != IBV_WC_SUCCESS)) {
+			/* Whatever, just repost the offending WR. */
+			DEBUG("rxq=%p: bad work completion status (%d): %s",
+			      (void *)rxq, wc->status,
+			      ibv_wc_status_str(wc->status));
+			/* Increment dropped packets counter. */
+			++rxq->stats.idropped;
+			goto repost;
+		}
+		rep = rte_mbuf_raw_alloc(rxq->mp);
+		if (unlikely(rep == NULL)) {
+			/*
+			 * Unable to allocate a replacement mbuf,
+			 * repost WR.
+			 */
+			DEBUG("rxq=%p: can't allocate a new mbuf",
+			      (void *)rxq);
+			/* Increase out of memory counters. */
+			++rxq->stats.rx_nombuf;
+			++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+			goto repost;
+		}
+		/* Reconfigure sge to use rep instead of seg. */
+		elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
+		assert(elt->sge.lkey == rxq->mr->lkey);
+		elt->buf = rep;
+		/* Update seg information. */
+		seg->data_off = RTE_PKTMBUF_HEADROOM;
+		seg->nb_segs = 1;
+		seg->port = rxq->port_id;
+		seg->next = NULL;
+		seg->pkt_len = len;
+		seg->data_len = len;
+		seg->packet_type = 0;
+		seg->ol_flags = 0;
+		/* Return packet. */
+		*(pkts++) = seg;
+		++pkts_ret;
+		/* Increase bytes counter. */
+		rxq->stats.ibytes += len;
+repost:
+		if (++elts_head >= elts_n)
+			elts_head = 0;
+		continue;
+	}
+	if (unlikely(i == 0))
+		return 0;
+	/* Repost WRs. */
+	*wr_next = NULL;
+	assert(wr_head);
+	ret = ibv_post_recv(rxq->qp, wr_head, &wr_bad);
+	if (unlikely(ret)) {
+		/* Inability to repost WRs is fatal. */
+		DEBUG("%p: recv_burst(): failed (ret=%d)",
+		      (void *)rxq->priv,
+		      ret);
+		abort();
+	}
+	rxq->elts_head = elts_head;
+	/* Increase packets counter. */
+	rxq->stats.ipackets += pkts_ret;
+	return pkts_ret;
+}
+
+/**
+ * Dummy DPDK callback for Tx.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_txq
+ *   Generic pointer to Tx queue structure.
+ * @param[in] pkts
+ *   Packets to transmit.
+ * @param pkts_n
+ *   Number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	(void)dpdk_txq;
+	(void)pkts;
+	(void)pkts_n;
+	return 0;
+}
+
+/**
+ * Dummy DPDK callback for Rx.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_rxq
+ *   Generic pointer to Rx queue structure.
+ * @param[out] pkts
+ *   Array to store received packets.
+ * @param pkts_n
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	(void)dpdk_rxq;
+	(void)pkts;
+	(void)pkts_n;
+	return 0;
+}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index ea55aed..669c8a4 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -119,4 +119,16 @@ struct txq {
 	unsigned int socket; /**< CPU socket ID for allocations. */
 };
 
+/* mlx4_rxtx.c */
+
+uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
+uint16_t mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts,
+		       uint16_t pkts_n);
+uint16_t mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
+		       uint16_t pkts_n);
+uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
+			       uint16_t pkts_n);
+uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
+			       uint16_t pkts_n);
+
 #endif /* MLX4_RXTX_H_ */
-- 
2.1.4
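
For readers skimming the move, the lookup done by mlx4_txq_mp2mr() can be
sketched outside the driver as below. This is only an illustration under
stated assumptions: struct cache, cache_lookup(), CACHE_N and the new_lkey
argument are hypothetical stand-ins (the real code registers a memory region
through mlx4_mp2mr() and deregisters the dropped one with ibv_dereg_mr()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_N 4 /* Hypothetical table size, not the driver's value. */

struct cache_entry {
	const void *mp; /* Stand-in for struct rte_mempool *. */
	uint32_t lkey;  /* Stand-in for mr->lkey. */
};

struct cache {
	struct cache_entry e[CACHE_N];
	unsigned int evictions; /* Only here to observe the behavior. */
};

/*
 * Return the key cached for mp. On a miss, store new_lkey (assumed to come
 * from a successful registration) and return it.
 */
static uint32_t
cache_lookup(struct cache *c, const void *mp, uint32_t new_lkey)
{
	unsigned int i;

	assert(mp != NULL);
	for (i = 0; i != CACHE_N; ++i) {
		if (c->e[i].mp == NULL)
			break; /* First free slot: mp is unknown. */
		if (c->e[i].mp == mp)
			return c->e[i].lkey; /* Hit. */
	}
	if (i == CACHE_N) {
		/* Table full: drop entry 0 and shift the others down. */
		--i;
		memmove(&c->e[0], &c->e[1], sizeof(c->e) - sizeof(c->e[0]));
		++c->evictions;
	}
	/* Store the new entry in the slot found or freed above. */
	c->e[i].mp = mp;
	c->e[i].lkey = new_lkey;
	return new_lkey;
}
```

Note that a hit does not move the entry, so the table evicts in registration
order rather than least-recently-used order. A zero-initialized struct cache
is a valid empty table.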

* [PATCH v2 43/51] net/mlx4: separate device control functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (41 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 42/51] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:06   ` [PATCH v2 44/51] net/mlx4: separate Tx configuration functions Adrien Mazarguil
                     ` (9 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile      |   1 +
 drivers/net/mlx4/mlx4.c        | 752 +---------------------------------
 drivers/net/mlx4/mlx4.h        |  18 +
 drivers/net/mlx4/mlx4_ethdev.c | 793 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_utils.h  |   9 +
 5 files changed, 830 insertions(+), 743 deletions(-)
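
As a reading aid for the sysfs helpers relocated by this patch, here is a
minimal sketch of how the MKSTR() macro they rely on behaves. The macro body
is copied from the lines removed from mlx4.c below; build_path() and the
path components passed to it are hypothetical and only exist for this
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Allocate a buffer on the stack and fill it with a printf format string. */
#define MKSTR(name, ...) \
	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
	\
	snprintf(name, sizeof(name), __VA_ARGS__)

/*
 * Hypothetical helper: format a sysfs-style path with MKSTR(), copy it to
 * the caller's buffer and return the untruncated length.
 */
static size_t
build_path(char *out, size_t out_size, const char *base,
	   const char *ifname, const char *entry)
{
	/* First snprintf() pass sizes the array, second pass fills it. */
	MKSTR(path, "%s/device/net/%s/%s", base, ifname, entry);

	assert(out_size != 0);
	snprintf(out, out_size, "%s", path);
	return sizeof(path) - 1;
}
```

Since the macro expands to a declaration followed by a statement, it cannot
be the body of an unbraced if/else, and its arguments are evaluated twice,
so they should be free of side effects.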

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 8def32a..9549525 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -36,6 +36,7 @@ LIB = librte_pmd_mlx4.a
 
 # Sources.
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index a409ec2..cff57c2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -41,13 +41,6 @@
 #include <errno.h>
 #include <unistd.h>
 #include <assert.h>
-#include <net/if.h>
-#include <dirent.h>
-#include <sys/ioctl.h>
-#include <sys/socket.h>
-#include <netinet/in.h>
-#include <linux/ethtool.h>
-#include <linux/sockios.h>
 
 #include <rte_ether.h>
 #include <rte_ethdev.h>
@@ -86,370 +79,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-/* Allocate a buffer on the stack and fill it with a printf format string. */
-#define MKSTR(name, ...) \
-	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
-	\
-	snprintf(name, sizeof(name), __VA_ARGS__)
-
-/**
- * Get interface name from private structure.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[out] ifname
- *   Interface name output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
-{
-	DIR *dir;
-	struct dirent *dent;
-	unsigned int dev_type = 0;
-	unsigned int dev_port_prev = ~0u;
-	char match[IF_NAMESIZE] = "";
-
-	{
-		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
-
-		dir = opendir(path);
-		if (dir == NULL) {
-			rte_errno = errno;
-			return -rte_errno;
-		}
-	}
-	while ((dent = readdir(dir)) != NULL) {
-		char *name = dent->d_name;
-		FILE *file;
-		unsigned int dev_port;
-		int r;
-
-		if ((name[0] == '.') &&
-		    ((name[1] == '\0') ||
-		     ((name[1] == '.') && (name[2] == '\0'))))
-			continue;
-
-		MKSTR(path, "%s/device/net/%s/%s",
-		      priv->ctx->device->ibdev_path, name,
-		      (dev_type ? "dev_id" : "dev_port"));
-
-		file = fopen(path, "rb");
-		if (file == NULL) {
-			if (errno != ENOENT)
-				continue;
-			/*
-			 * Switch to dev_id when dev_port does not exist as
-			 * is the case with Linux kernel versions < 3.15.
-			 */
-try_dev_id:
-			match[0] = '\0';
-			if (dev_type)
-				break;
-			dev_type = 1;
-			dev_port_prev = ~0u;
-			rewinddir(dir);
-			continue;
-		}
-		r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
-		fclose(file);
-		if (r != 1)
-			continue;
-		/*
-		 * Switch to dev_id when dev_port returns the same value for
-		 * all ports. May happen when using a MOFED release older than
-		 * 3.0 with a Linux kernel >= 3.15.
-		 */
-		if (dev_port == dev_port_prev)
-			goto try_dev_id;
-		dev_port_prev = dev_port;
-		if (dev_port == (priv->port - 1u))
-			snprintf(match, sizeof(match), "%s", name);
-	}
-	closedir(dir);
-	if (match[0] == '\0') {
-		rte_errno = ENODEV;
-		return -rte_errno;
-	}
-	strncpy(*ifname, match, sizeof(*ifname));
-	return 0;
-}
-
-/**
- * Read from sysfs entry.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[in] entry
- *   Entry name relative to sysfs path.
- * @param[out] buf
- *   Data output buffer.
- * @param size
- *   Buffer size.
- *
- * @return
- *   Number of bytes read on success, negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-priv_sysfs_read(const struct priv *priv, const char *entry,
-		char *buf, size_t size)
-{
-	char ifname[IF_NAMESIZE];
-	FILE *file;
-	int ret;
-
-	ret = priv_get_ifname(priv, &ifname);
-	if (ret)
-		return ret;
-
-	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
-	      ifname, entry);
-
-	file = fopen(path, "rb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = fread(buf, 1, size, file);
-	if ((size_t)ret < size && ferror(file)) {
-		rte_errno = EIO;
-		ret = -rte_errno;
-	} else {
-		ret = size;
-	}
-	fclose(file);
-	return ret;
-}
-
-/**
- * Write to sysfs entry.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param[in] entry
- *   Entry name relative to sysfs path.
- * @param[in] buf
- *   Data buffer.
- * @param size
- *   Buffer size.
- *
- * @return
- *   Number of bytes written on success, negative errno value otherwise and
- *   rte_errno is set.
- */
-static int
-priv_sysfs_write(const struct priv *priv, const char *entry,
-		 char *buf, size_t size)
-{
-	char ifname[IF_NAMESIZE];
-	FILE *file;
-	int ret;
-
-	ret = priv_get_ifname(priv, &ifname);
-	if (ret)
-		return ret;
-
-	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
-	      ifname, entry);
-
-	file = fopen(path, "wb");
-	if (file == NULL) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = fwrite(buf, 1, size, file);
-	if ((size_t)ret < size || ferror(file)) {
-		rte_errno = EIO;
-		ret = -rte_errno;
-	} else {
-		ret = size;
-	}
-	fclose(file);
-	return ret;
-}
-
-/**
- * Get unsigned long sysfs property.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] name
- *   Entry name relative to sysfs path.
- * @param[out] value
- *   Value output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
-{
-	int ret;
-	unsigned long value_ret;
-	char value_str[32];
-
-	ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret < 0) {
-		DEBUG("cannot read %s value from sysfs: %s",
-		      name, strerror(rte_errno));
-		return ret;
-	}
-	value_str[ret] = '\0';
-	errno = 0;
-	value_ret = strtoul(value_str, NULL, 0);
-	if (errno) {
-		rte_errno = errno;
-		DEBUG("invalid %s value `%s': %s", name, value_str,
-		      strerror(rte_errno));
-		return -rte_errno;
-	}
-	*value = value_ret;
-	return 0;
-}
-
-/**
- * Set unsigned long sysfs property.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[in] name
- *   Entry name relative to sysfs path.
- * @param value
- *   Value to set.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
-{
-	int ret;
-	MKSTR(value_str, "%lu", value);
-
-	ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
-	if (ret < 0) {
-		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
-		      name, value_str, value, strerror(rte_errno));
-		return ret;
-	}
-	return 0;
-}
-
-/**
- * Perform ifreq ioctl() on associated Ethernet device.
- *
- * @param[in] priv
- *   Pointer to private structure.
- * @param req
- *   Request number to pass to ioctl().
- * @param[out] ifr
- *   Interface request structure output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
-{
-	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
-	int ret;
-
-	if (sock == -1) {
-		rte_errno = errno;
-		return -rte_errno;
-	}
-	ret = priv_get_ifname(priv, &ifr->ifr_name);
-	if (!ret && ioctl(sock, req, ifr) == -1) {
-		rte_errno = errno;
-		ret = -rte_errno;
-	}
-	close(sock);
-	return ret;
-}
-
-/**
- * Get device MTU.
- *
- * @param priv
- *   Pointer to private structure.
- * @param[out] mtu
- *   MTU value output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_mtu(struct priv *priv, uint16_t *mtu)
-{
-	unsigned long ulong_mtu = 0;
-	int ret = priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
-
-	if (ret)
-		return ret;
-	*mtu = ulong_mtu;
-	return 0;
-}
-
-/**
- * DPDK callback to change the MTU.
- *
- * @param priv
- *   Pointer to Ethernet device structure.
- * @param mtu
- *   MTU value to set.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
-{
-	struct priv *priv = dev->data->dev_private;
-	uint16_t new_mtu;
-	int ret = priv_set_sysfs_ulong(priv, "mtu", mtu);
-
-	if (ret)
-		return ret;
-	ret = priv_get_mtu(priv, &new_mtu);
-	if (ret)
-		return ret;
-	if (new_mtu == mtu) {
-		priv->mtu = mtu;
-		return 0;
-	}
-	rte_errno = EINVAL;
-	return -rte_errno;
-}
-
-/**
- * Set device flags.
- *
- * @param priv
- *   Pointer to private structure.
- * @param keep
- *   Bitmask for flags that must remain untouched.
- * @param flags
- *   Bitmask for flags to modify.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
-{
-	unsigned long tmp = 0;
-	int ret = priv_get_sysfs_ulong(priv, "flags", &tmp);
-
-	if (ret)
-		return ret;
-	tmp &= keep;
-	tmp |= (flags & (~keep));
-	return priv_set_sysfs_ulong(priv, "flags", tmp);
-}
-
 /* Device configuration. */
 
 static int
@@ -1675,346 +1304,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	memset(priv, 0, sizeof(*priv));
 }
 
-/**
- * Change the link state (UP / DOWN).
- *
- * @param priv
- *   Pointer to Ethernet device private data.
- * @param up
- *   Nonzero for link up, otherwise link down.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_set_link(struct priv *priv, int up)
-{
-	struct rte_eth_dev *dev = priv->dev;
-	int err;
-
-	if (up) {
-		err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
-		if (err)
-			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst;
-	} else {
-		err = priv_set_flags(priv, ~IFF_UP, ~IFF_UP);
-		if (err)
-			return err;
-		dev->rx_pkt_burst = mlx4_rx_burst_removed;
-		dev->tx_pkt_burst = mlx4_tx_burst_removed;
-	}
-	return 0;
-}
-
-/**
- * DPDK callback to bring the link DOWN.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_set_link_down(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	return priv_set_link(priv, 0);
-}
-
-/**
- * DPDK callback to bring the link UP.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_set_link_up(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-
-	return priv_set_link(priv, 1);
-}
-
-/**
- * DPDK callback to get information about the device.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] info
- *   Info structure output buffer.
- */
-static void
-mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int max;
-	char ifname[IF_NAMESIZE];
-
-	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
-	if (priv == NULL)
-		return;
-	/* FIXME: we should ask the device for these values. */
-	info->min_rx_bufsize = 32;
-	info->max_rx_pktlen = 65536;
-	/*
-	 * Since we need one CQ per QP, the limit is the minimum number
-	 * between the two values.
-	 */
-	max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
-	       priv->device_attr.max_qp : priv->device_attr.max_cq);
-	/* If max >= 65535 then max = 0, max_rx_queues is uint16_t. */
-	if (max >= 65535)
-		max = 65535;
-	info->max_rx_queues = max;
-	info->max_tx_queues = max;
-	/* Last array entry is reserved for broadcast. */
-	info->max_mac_addrs = 1;
-	info->rx_offload_capa = 0;
-	info->tx_offload_capa = 0;
-	if (priv_get_ifname(priv, &ifname) == 0)
-		info->if_index = if_nametoindex(ifname);
-	info->speed_capa =
-			ETH_LINK_SPEED_1G |
-			ETH_LINK_SPEED_10G |
-			ETH_LINK_SPEED_20G |
-			ETH_LINK_SPEED_40G |
-			ETH_LINK_SPEED_56G;
-}
-
-/**
- * DPDK callback to get device statistics.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] stats
- *   Stats structure output buffer.
- */
-static void
-mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rte_eth_stats tmp = {0};
-	unsigned int i;
-	unsigned int idx;
-
-	if (priv == NULL)
-		return;
-	/* Add software counters. */
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
-
-		if (rxq == NULL)
-			continue;
-		idx = rxq->stats.idx;
-		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			tmp.q_ipackets[idx] += rxq->stats.ipackets;
-			tmp.q_ibytes[idx] += rxq->stats.ibytes;
-			tmp.q_errors[idx] += (rxq->stats.idropped +
-					      rxq->stats.rx_nombuf);
-		}
-		tmp.ipackets += rxq->stats.ipackets;
-		tmp.ibytes += rxq->stats.ibytes;
-		tmp.ierrors += rxq->stats.idropped;
-		tmp.rx_nombuf += rxq->stats.rx_nombuf;
-	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		struct txq *txq = (*priv->txqs)[i];
-
-		if (txq == NULL)
-			continue;
-		idx = txq->stats.idx;
-		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
-			tmp.q_opackets[idx] += txq->stats.opackets;
-			tmp.q_obytes[idx] += txq->stats.obytes;
-			tmp.q_errors[idx] += txq->stats.odropped;
-		}
-		tmp.opackets += txq->stats.opackets;
-		tmp.obytes += txq->stats.obytes;
-		tmp.oerrors += txq->stats.odropped;
-	}
-	*stats = tmp;
-}
-
-/**
- * DPDK callback to clear device statistics.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- */
-static void
-mlx4_stats_reset(struct rte_eth_dev *dev)
-{
-	struct priv *priv = dev->data->dev_private;
-	unsigned int i;
-	unsigned int idx;
-
-	if (priv == NULL)
-		return;
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		idx = (*priv->rxqs)[i]->stats.idx;
-		(*priv->rxqs)[i]->stats =
-			(struct mlx4_rxq_stats){ .idx = idx };
-	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		if ((*priv->txqs)[i] == NULL)
-			continue;
-		idx = (*priv->txqs)[i]->stats.idx;
-		(*priv->txqs)[i]->stats =
-			(struct mlx4_txq_stats){ .idx = idx };
-	}
-}
-
-/**
- * DPDK callback to retrieve physical link information.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param wait_to_complete
- *   Wait for request completion (ignored).
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-int
-mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
-{
-	const struct priv *priv = dev->data->dev_private;
-	struct ethtool_cmd edata = {
-		.cmd = ETHTOOL_GSET
-	};
-	struct ifreq ifr;
-	struct rte_eth_link dev_link;
-	int link_speed = 0;
-
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		return -rte_errno;
-	}
-	(void)wait_to_complete;
-	if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
-		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
-		return -rte_errno;
-	}
-	memset(&dev_link, 0, sizeof(dev_link));
-	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
-				(ifr.ifr_flags & IFF_RUNNING));
-	ifr.ifr_data = (void *)&edata;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
-		     strerror(rte_errno));
-		return -rte_errno;
-	}
-	link_speed = ethtool_cmd_speed(&edata);
-	if (link_speed == -1)
-		dev_link.link_speed = 0;
-	else
-		dev_link.link_speed = link_speed;
-	dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
-				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
-	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
-			ETH_LINK_SPEED_FIXED);
-	dev->data->dev_link = dev_link;
-	return 0;
-}
-
-/**
- * DPDK callback to get flow control status.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[out] fc_conf
- *   Flow control output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct ifreq ifr;
-	struct ethtool_pauseparam ethpause = {
-		.cmd = ETHTOOL_GPAUSEPARAM
-	};
-	int ret;
-
-	ifr.ifr_data = (void *)&ethpause;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = rte_errno;
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
-		     " failed: %s",
-		     strerror(rte_errno));
-		goto out;
-	}
-	fc_conf->autoneg = ethpause.autoneg;
-	if (ethpause.rx_pause && ethpause.tx_pause)
-		fc_conf->mode = RTE_FC_FULL;
-	else if (ethpause.rx_pause)
-		fc_conf->mode = RTE_FC_RX_PAUSE;
-	else if (ethpause.tx_pause)
-		fc_conf->mode = RTE_FC_TX_PAUSE;
-	else
-		fc_conf->mode = RTE_FC_NONE;
-	ret = 0;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
-/**
- * DPDK callback to modify flow control parameters.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param[in] fc_conf
- *   Flow control parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct ifreq ifr;
-	struct ethtool_pauseparam ethpause = {
-		.cmd = ETHTOOL_SPAUSEPARAM
-	};
-	int ret;
-
-	ifr.ifr_data = (void *)&ethpause;
-	ethpause.autoneg = fc_conf->autoneg;
-	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
-	    (fc_conf->mode & RTE_FC_RX_PAUSE))
-		ethpause.rx_pause = 1;
-	else
-		ethpause.rx_pause = 0;
-	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
-	    (fc_conf->mode & RTE_FC_TX_PAUSE))
-		ethpause.tx_pause = 1;
-	else
-		ethpause.tx_pause = 0;
-	if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
-		ret = rte_errno;
-		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
-		     " failed: %s",
-		     strerror(rte_errno));
-		goto out;
-	}
-	ret = 0;
-out:
-	assert(ret >= 0);
-	return -ret;
-}
-
 const struct rte_flow_ops mlx4_flow_ops = {
 	.validate = mlx4_flow_validate,
 	.create = mlx4_flow_create,
@@ -2064,8 +1353,8 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_configure = mlx4_dev_configure,
 	.dev_start = mlx4_dev_start,
 	.dev_stop = mlx4_dev_stop,
-	.dev_set_link_down = mlx4_set_link_down,
-	.dev_set_link_up = mlx4_set_link_up,
+	.dev_set_link_down = mlx4_dev_set_link_down,
+	.dev_set_link_up = mlx4_dev_set_link_up,
 	.dev_close = mlx4_dev_close,
 	.link_update = mlx4_link_update,
 	.stats_get = mlx4_stats_get,
@@ -2075,9 +1364,9 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.tx_queue_setup = mlx4_tx_queue_setup,
 	.rx_queue_release = mlx4_rx_queue_release,
 	.tx_queue_release = mlx4_tx_queue_release,
-	.flow_ctrl_get = mlx4_dev_get_flow_ctrl,
-	.flow_ctrl_set = mlx4_dev_set_flow_ctrl,
-	.mtu_set = mlx4_dev_set_mtu,
+	.flow_ctrl_get = mlx4_flow_ctrl_get,
+	.flow_ctrl_set = mlx4_flow_ctrl_set,
+	.mtu_set = mlx4_mtu_set,
 	.filter_ctrl = mlx4_dev_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
 	.rx_queue_intr_disable = mlx4_rx_intr_disable,
@@ -2136,29 +1425,6 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
 }
 
 /**
- * Get MAC address by querying netdevice.
- *
- * @param[in] priv
- *   struct priv for the requested device.
- * @param[out] mac
- *   MAC address output buffer.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
-{
-	struct ifreq request;
-	int ret = priv_ifreq(priv, SIOCGIFHWADDR, &request);
-
-	if (ret)
-		return ret;
-	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
-	return 0;
-}
-
-/**
  * Verify and store value for device argument.
  *
  * @param[in] key
@@ -2411,7 +1677,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		priv->mtu = ETHER_MTU;
 		priv->vf = vf;
 		/* Configure the first MAC address by default. */
-		if (priv_get_mac(priv, &mac.addr_bytes)) {
+		if (mlx4_get_mac(priv, &mac.addr_bytes)) {
 			ERROR("cannot get MAC address, is mlx4_en loaded?"
 			      " (rte_errno: %s)", strerror(rte_errno));
 			goto port_error;
@@ -2429,7 +1695,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		{
 			char ifname[IF_NAMESIZE];
 
-			if (priv_get_ifname(priv, &ifname) == 0)
+			if (mlx4_get_ifname(priv, &ifname) == 0)
 				DEBUG("port %u ifname is \"%s\"",
 				      priv->port, ifname);
 			else
@@ -2437,7 +1703,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		}
 #endif
 		/* Get actual MTU if possible. */
-		priv_get_mtu(priv, &priv->mtu);
+		mlx4_mtu_get(priv, &priv->mtu);
 		DEBUG("port %u MTU is %u", priv->port, priv->mtu);
 		/* from rte_ethdev.c */
 		{
@@ -2480,7 +1746,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;
 		/* Bring Ethernet device up. */
 		DEBUG("forcing Ethernet interface up");
-		priv_set_flags(priv, ~IFF_UP, IFF_UP);
+		mlx4_dev_set_link_up(priv->dev);
 		/* Update link status once if waiting for LSC. */
 		if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
 			mlx4_link_update(eth_dev, 0);
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index efccf1a..b5f2953 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -34,6 +34,7 @@
 #ifndef RTE_PMD_MLX4_H_
 #define RTE_PMD_MLX4_H_
 
+#include <net/if.h>
 #include <stdint.h>
 
 /* Verbs header. */
@@ -117,7 +118,24 @@ struct priv {
 /* mlx4.c */
 
 struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
+
+/* mlx4_ethdev.c */
+
+int mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE]);
+int mlx4_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN]);
+int mlx4_mtu_get(struct priv *priv, uint16_t *mtu);
+int mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu);
+int mlx4_dev_set_link_down(struct rte_eth_dev *dev);
+int mlx4_dev_set_link_up(struct rte_eth_dev *dev);
+void mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats);
+void mlx4_stats_reset(struct rte_eth_dev *dev);
+void mlx4_dev_infos_get(struct rte_eth_dev *dev,
+			struct rte_eth_dev_info *info);
 int mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete);
+int mlx4_flow_ctrl_get(struct rte_eth_dev *dev,
+		       struct rte_eth_fc_conf *fc_conf);
+int mlx4_flow_ctrl_set(struct rte_eth_dev *dev,
+		       struct rte_eth_fc_conf *fc_conf);
 
 /* mlx4_intr.c */
 
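Note on the mlx4_get_ifname() prototype above: it takes a pointer to a whole array, char (*ifname)[IF_NAMESIZE], so sizeof(*ifname) inside the callee is the buffer size and callers must pass the address of a correctly sized array. A minimal sketch of that parameter style follows. The names copy_ifname() and copy_ifname_example() are hypothetical and not part of the PMD.

```c
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper using the same parameter style as
 * mlx4_get_ifname(): the callee sees the full array type, so
 * sizeof(*ifname) is IF_NAMESIZE rather than the size of a pointer.
 */
static int
copy_ifname(char (*ifname)[IF_NAMESIZE], const char *src)
{
	int n = snprintf(*ifname, sizeof(*ifname), "%s", src);

	/* Report truncation instead of silently keeping a partial name. */
	return (n < 0 || (size_t)n >= sizeof(*ifname)) ? -1 : 0;
}

/* Illustrative usage: the caller passes the address of the array. */
static void
copy_ifname_example(void)
{
	char name[IF_NAMESIZE];

	assert(copy_ifname(&name, "eth0") == 0);
	assert(strcmp(name, "eth0") == 0);
}
```

With this signature, passing the address of an array of any other length is a constraint violation that compilers diagnose, which a plain char * parameter would not catch.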
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
new file mode 100644
index 0000000..5f1dba2
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -0,0 +1,793 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Miscellaneous control operations for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <dirent.h>
+#include <errno.h>
+#include <linux/ethtool.h>
+#include <linux/sockios.h>
+#include <net/if.h>
+#include <netinet/ip.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_pci.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Get interface name from private structure.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[out] ifname
+ *   Interface name output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
+{
+	DIR *dir;
+	struct dirent *dent;
+	unsigned int dev_type = 0;
+	unsigned int dev_port_prev = ~0u;
+	char match[IF_NAMESIZE] = "";
+
+	{
+		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
+
+		dir = opendir(path);
+		if (dir == NULL) {
+			rte_errno = errno;
+			return -rte_errno;
+		}
+	}
+	while ((dent = readdir(dir)) != NULL) {
+		char *name = dent->d_name;
+		FILE *file;
+		unsigned int dev_port;
+		int r;
+
+		if ((name[0] == '.') &&
+		    ((name[1] == '\0') ||
+		     ((name[1] == '.') && (name[2] == '\0'))))
+			continue;
+
+		MKSTR(path, "%s/device/net/%s/%s",
+		      priv->ctx->device->ibdev_path, name,
+		      (dev_type ? "dev_id" : "dev_port"));
+
+		file = fopen(path, "rb");
+		if (file == NULL) {
+			if (errno != ENOENT)
+				continue;
+			/*
+			 * Switch to dev_id when dev_port does not exist as
+			 * is the case with Linux kernel versions < 3.15.
+			 */
+try_dev_id:
+			match[0] = '\0';
+			if (dev_type)
+				break;
+			dev_type = 1;
+			dev_port_prev = ~0u;
+			rewinddir(dir);
+			continue;
+		}
+		r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+		fclose(file);
+		if (r != 1)
+			continue;
+		/*
+		 * Switch to dev_id when dev_port returns the same value for
+		 * all ports. May happen when using a MOFED release older than
+		 * 3.0 with a Linux kernel >= 3.15.
+		 */
+		if (dev_port == dev_port_prev)
+			goto try_dev_id;
+		dev_port_prev = dev_port;
+		if (dev_port == (priv->port - 1u))
+			snprintf(match, sizeof(match), "%s", name);
+	}
+	closedir(dir);
+	if (match[0] == '\0') {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	strncpy(*ifname, match, sizeof(*ifname));
+	return 0;
+}
+
+/**
+ * Read from sysfs entry.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[in] entry
+ *   Entry name relative to sysfs path.
+ * @param[out] buf
+ *   Data output buffer.
+ * @param size
+ *   Buffer size.
+ *
+ * @return
+ *   Number of bytes read on success, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx4_sysfs_read(const struct priv *priv, const char *entry,
+		char *buf, size_t size)
+{
+	char ifname[IF_NAMESIZE];
+	FILE *file;
+	int ret;
+
+	ret = mlx4_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
+
+	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+	      ifname, entry);
+
+	file = fopen(path, "rb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = fread(buf, 1, size, file);
+	if ((size_t)ret < size && ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
+		ret = size;
+	}
+	fclose(file);
+	return ret;
+}
+
+/**
+ * Write to sysfs entry.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[in] entry
+ *   Entry name relative to sysfs path.
+ * @param[in] buf
+ *   Data buffer.
+ * @param size
+ *   Buffer size.
+ *
+ * @return
+ *   Number of bytes written on success, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+static int
+mlx4_sysfs_write(const struct priv *priv, const char *entry,
+		 char *buf, size_t size)
+{
+	char ifname[IF_NAMESIZE];
+	FILE *file;
+	int ret;
+
+	ret = mlx4_get_ifname(priv, &ifname);
+	if (ret)
+		return ret;
+
+	MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+	      ifname, entry);
+
+	file = fopen(path, "wb");
+	if (file == NULL) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = fwrite(buf, 1, size, file);
+	if ((size_t)ret < size || ferror(file)) {
+		rte_errno = EIO;
+		ret = -rte_errno;
+	} else {
+		ret = size;
+	}
+	fclose(file);
+	return ret;
+}
+
+/**
+ * Get unsigned long sysfs property.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[in] name
+ *   Entry name relative to sysfs path.
+ * @param[out] value
+ *   Value output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
+{
+	int ret;
+	unsigned long value_ret;
+	char value_str[32];
+
+	ret = mlx4_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
+	if (ret < 0) {
+		DEBUG("cannot read %s value from sysfs: %s",
+		      name, strerror(rte_errno));
+		return ret;
+	}
+	value_str[ret] = '\0';
+	errno = 0;
+	value_ret = strtoul(value_str, NULL, 0);
+	if (errno) {
+		rte_errno = errno;
+		DEBUG("invalid %s value `%s': %s", name, value_str,
+		      strerror(rte_errno));
+		return -rte_errno;
+	}
+	*value = value_ret;
+	return 0;
+}
+
+/**
+ * Set unsigned long sysfs property.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[in] name
+ *   Entry name relative to sysfs path.
+ * @param value
+ *   Value to set.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
+{
+	int ret;
+	MKSTR(value_str, "%lu", value);
+
+	ret = mlx4_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
+	if (ret < 0) {
+		DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
+		      name, value_str, value, strerror(rte_errno));
+		return ret;
+	}
+	return 0;
+}
+
+/**
+ * Perform ifreq ioctl() on associated Ethernet device.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param req
+ *   Request number to pass to ioctl().
+ * @param[out] ifr
+ *   Interface request structure output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
+{
+	int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+	int ret;
+
+	if (sock == -1) {
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	ret = mlx4_get_ifname(priv, &ifr->ifr_name);
+	if (!ret && ioctl(sock, req, ifr) == -1) {
+		rte_errno = errno;
+		ret = -rte_errno;
+	}
+	close(sock);
+	return ret;
+}
+
+/**
+ * Get MAC address by querying netdevice.
+ *
+ * @param[in] priv
+ *   Pointer to private structure.
+ * @param[out] mac
+ *   MAC address output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+	struct ifreq request;
+	int ret = mlx4_ifreq(priv, SIOCGIFHWADDR, &request);
+
+	if (ret)
+		return ret;
+	memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+	return 0;
+}
+
+/**
+ * Get device MTU.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param[out] mtu
+ *   MTU value output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mtu_get(struct priv *priv, uint16_t *mtu)
+{
+	unsigned long ulong_mtu = 0;
+	int ret = mlx4_get_sysfs_ulong(priv, "mtu", &ulong_mtu);
+
+	if (ret)
+		return ret;
+	*mtu = ulong_mtu;
+	return 0;
+}
+
+/**
+ * DPDK callback to change the MTU.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param mtu
+ *   MTU value to set.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct priv *priv = dev->data->dev_private;
+	uint16_t new_mtu;
+	int ret = mlx4_set_sysfs_ulong(priv, "mtu", mtu);
+
+	if (ret)
+		return ret;
+	ret = mlx4_mtu_get(priv, &new_mtu);
+	if (ret)
+		return ret;
+	if (new_mtu == mtu) {
+		priv->mtu = mtu;
+		return 0;
+	}
+	rte_errno = EINVAL;
+	return -rte_errno;
+}
+
+/**
+ * Set device flags.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param keep
+ *   Bitmask for flags that must remain untouched.
+ * @param flags
+ *   Bitmask for flags to modify.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
+{
+	unsigned long tmp = 0;
+	int ret = mlx4_get_sysfs_ulong(priv, "flags", &tmp);
+
+	if (ret)
+		return ret;
+	tmp &= keep;
+	tmp |= (flags & (~keep));
+	return mlx4_set_sysfs_ulong(priv, "flags", tmp);
+}
+
+/**
+ * Change the link state (UP / DOWN).
+ *
+ * @param priv
+ *   Pointer to Ethernet device private data.
+ * @param up
+ *   Nonzero for link up, otherwise link down.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_dev_set_link(struct priv *priv, int up)
+{
+	struct rte_eth_dev *dev = priv->dev;
+	int err;
+
+	if (up) {
+		err = mlx4_set_flags(priv, ~IFF_UP, IFF_UP);
+		if (err)
+			return err;
+		dev->rx_pkt_burst = mlx4_rx_burst;
+	} else {
+		err = mlx4_set_flags(priv, ~IFF_UP, ~IFF_UP);
+		if (err)
+			return err;
+		dev->rx_pkt_burst = mlx4_rx_burst_removed;
+		dev->tx_pkt_burst = mlx4_tx_burst_removed;
+	}
+	return 0;
+}
+
+/**
+ * DPDK callback to bring the link DOWN.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	return mlx4_dev_set_link(priv, 0);
+}
+
+/**
+ * DPDK callback to bring the link UP.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+
+	return mlx4_dev_set_link(priv, 1);
+}
+
+/**
+ * DPDK callback to get information about the device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] info
+ *   Info structure output buffer.
+ */
+void
+mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int max;
+	char ifname[IF_NAMESIZE];
+
+	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	if (priv == NULL)
+		return;
+	/* FIXME: we should ask the device for these values. */
+	info->min_rx_bufsize = 32;
+	info->max_rx_pktlen = 65536;
+	/*
+	 * Since we need one CQ per QP, the limit is the minimum number
+	 * between the two values.
+	 */
+	max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
+	       priv->device_attr.max_qp : priv->device_attr.max_cq);
+	/* Clamp to 65535; max_rx_queues is uint16_t and would wrap. */
+	if (max >= 65535)
+		max = 65535;
+	info->max_rx_queues = max;
+	info->max_tx_queues = max;
+	/* Last array entry is reserved for broadcast. */
+	info->max_mac_addrs = 1;
+	info->rx_offload_capa = 0;
+	info->tx_offload_capa = 0;
+	if (mlx4_get_ifname(priv, &ifname) == 0)
+		info->if_index = if_nametoindex(ifname);
+	info->speed_capa =
+			ETH_LINK_SPEED_1G |
+			ETH_LINK_SPEED_10G |
+			ETH_LINK_SPEED_20G |
+			ETH_LINK_SPEED_40G |
+			ETH_LINK_SPEED_56G;
+}
+
+/**
+ * DPDK callback to get device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] stats
+ *   Stats structure output buffer.
+ */
+void
+mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rte_eth_stats tmp;
+	unsigned int i;
+	unsigned int idx;
+
+	if (priv == NULL)
+		return;
+	memset(&tmp, 0, sizeof(tmp));
+	/* Add software counters. */
+	for (i = 0; (i != priv->rxqs_n); ++i) {
+		struct rxq *rxq = (*priv->rxqs)[i];
+
+		if (rxq == NULL)
+			continue;
+		idx = rxq->stats.idx;
+		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			tmp.q_ipackets[idx] += rxq->stats.ipackets;
+			tmp.q_ibytes[idx] += rxq->stats.ibytes;
+			tmp.q_errors[idx] += (rxq->stats.idropped +
+					      rxq->stats.rx_nombuf);
+		}
+		tmp.ipackets += rxq->stats.ipackets;
+		tmp.ibytes += rxq->stats.ibytes;
+		tmp.ierrors += rxq->stats.idropped;
+		tmp.rx_nombuf += rxq->stats.rx_nombuf;
+	}
+	for (i = 0; (i != priv->txqs_n); ++i) {
+		struct txq *txq = (*priv->txqs)[i];
+
+		if (txq == NULL)
+			continue;
+		idx = txq->stats.idx;
+		if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+			tmp.q_opackets[idx] += txq->stats.opackets;
+			tmp.q_obytes[idx] += txq->stats.obytes;
+			tmp.q_errors[idx] += txq->stats.odropped;
+		}
+		tmp.opackets += txq->stats.opackets;
+		tmp.obytes += txq->stats.obytes;
+		tmp.oerrors += txq->stats.odropped;
+	}
+	*stats = tmp;
+}
+
+/**
+ * DPDK callback to clear device statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ */
+void
+mlx4_stats_reset(struct rte_eth_dev *dev)
+{
+	struct priv *priv = dev->data->dev_private;
+	unsigned int i;
+	unsigned int idx;
+
+	if (priv == NULL)
+		return;
+	for (i = 0; (i != priv->rxqs_n); ++i) {
+		if ((*priv->rxqs)[i] == NULL)
+			continue;
+		idx = (*priv->rxqs)[i]->stats.idx;
+		(*priv->rxqs)[i]->stats =
+			(struct mlx4_rxq_stats){ .idx = idx };
+	}
+	for (i = 0; (i != priv->txqs_n); ++i) {
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		idx = (*priv->txqs)[i]->stats.idx;
+		(*priv->txqs)[i]->stats =
+			(struct mlx4_txq_stats){ .idx = idx };
+	}
+}
+
+/**
+ * DPDK callback to retrieve physical link information.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ *   Wait for request completion (ignored).
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+	const struct priv *priv = dev->data->dev_private;
+	struct ethtool_cmd edata = {
+		.cmd = ETHTOOL_GSET,
+	};
+	struct ifreq ifr;
+	struct rte_eth_link dev_link;
+	int link_speed = 0;
+
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	(void)wait_to_complete;
+	if (mlx4_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+		WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(rte_errno));
+		return -rte_errno;
+	}
+	memset(&dev_link, 0, sizeof(dev_link));
+	dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+				(ifr.ifr_flags & IFF_RUNNING));
+	ifr.ifr_data = (void *)&edata;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+		     strerror(rte_errno));
+		return -rte_errno;
+	}
+	link_speed = ethtool_cmd_speed(&edata);
+	if (link_speed == -1)
+		dev_link.link_speed = 0;
+	else
+		dev_link.link_speed = link_speed;
+	dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+				ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+	dev_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
+				  ETH_LINK_SPEED_FIXED);
+	dev->data->dev_link = dev_link;
+	return 0;
+}
+
+/**
+ * DPDK callback to get flow control status.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] fc_conf
+ *   Flow control output buffer.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_ctrl_get(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct ifreq ifr;
+	struct ethtool_pauseparam ethpause = {
+		.cmd = ETHTOOL_GPAUSEPARAM,
+	};
+	int ret;
+
+	ifr.ifr_data = (void *)&ethpause;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		ret = rte_errno;
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
+		     " failed: %s",
+		     strerror(rte_errno));
+		goto out;
+	}
+	fc_conf->autoneg = ethpause.autoneg;
+	if (ethpause.rx_pause && ethpause.tx_pause)
+		fc_conf->mode = RTE_FC_FULL;
+	else if (ethpause.rx_pause)
+		fc_conf->mode = RTE_FC_RX_PAUSE;
+	else if (ethpause.tx_pause)
+		fc_conf->mode = RTE_FC_TX_PAUSE;
+	else
+		fc_conf->mode = RTE_FC_NONE;
+	ret = 0;
+out:
+	assert(ret >= 0);
+	return -ret;
+}
+
+/**
+ * DPDK callback to modify flow control parameters.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[in] fc_conf
+ *   Flow control parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct ifreq ifr;
+	struct ethtool_pauseparam ethpause = {
+		.cmd = ETHTOOL_SPAUSEPARAM,
+	};
+	int ret;
+
+	ifr.ifr_data = (void *)&ethpause;
+	ethpause.autoneg = fc_conf->autoneg;
+	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+	    (fc_conf->mode & RTE_FC_RX_PAUSE))
+		ethpause.rx_pause = 1;
+	else
+		ethpause.rx_pause = 0;
+	if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+	    (fc_conf->mode & RTE_FC_TX_PAUSE))
+		ethpause.tx_pause = 1;
+	else
+		ethpause.tx_pause = 0;
+	if (mlx4_ifreq(priv, SIOCETHTOOL, &ifr)) {
+		ret = rte_errno;
+		WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
+		     " failed: %s",
+		     strerror(rte_errno));
+		goto out;
+	}
+	ret = 0;
+out:
+	assert(ret >= 0);
+	return -ret;
+}
diff --git a/drivers/net/mlx4/mlx4_utils.h b/drivers/net/mlx4/mlx4_utils.h
index 9b178f5..0fbdc71 100644
--- a/drivers/net/mlx4/mlx4_utils.h
+++ b/drivers/net/mlx4/mlx4_utils.h
@@ -34,6 +34,9 @@
 #ifndef MLX4_UTILS_H_
 #define MLX4_UTILS_H_
 
+#include <stddef.h>
+#include <stdio.h>
+
 #include <rte_common.h>
 #include <rte_log.h>
 
@@ -95,6 +98,12 @@ pmd_drv_log_basename(const char *s)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
 
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
+	\
+	snprintf(name, sizeof(name), __VA_ARGS__)
+
 /* mlx4_utils.c */
 
 int mlx4_fd_set_non_blocking(int fd);
-- 
2.1.4
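
Side note on the MKSTR() helper added to mlx4_utils.h above: it expands to a declaration (a C99 variable-length array sized by a first snprintf() pass) followed by a second snprintf() that fills it, so it has to sit where a declaration is allowed. The following is only a minimal standalone sketch of how it might be used; sketch_build_path() and the sysfs-style format string are hypothetical and not taken from the PMD.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same definition as in the patch above. */
#define MKSTR(name, ...) \
	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
	\
	snprintf(name, sizeof(name), __VA_ARGS__)

/*
 * Hypothetical helper (not part of the PMD): format a path on the stack
 * with MKSTR(), copy it to the caller's buffer and return its length.
 */
static size_t
sketch_build_path(char *out, size_t size, const char *ifname,
		  const char *entry)
{
	assert(out != NULL && size != 0);
	/* Declares "path" with exactly the room the formatted text needs. */
	MKSTR(path, "/sys/class/net/%s/%s", ifname, entry);

	snprintf(out, size, "%s", path);
	/* sizeof() on a VLA is evaluated at run time; drop the NUL byte. */
	return sizeof(path) - 1;
}
```

Since the expansion is two statements, wrapping it in an unbraced if/else body would not behave as a single statement; the sketch keeps it at block scope for that reason.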


* [PATCH v2 44/51] net/mlx4: separate Tx configuration functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (42 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 43/51] net/mlx4: separate device control functions Adrien Mazarguil
@ 2017-09-01  8:06   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 45/51] net/mlx4: separate Rx " Adrien Mazarguil
                     ` (8 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:06 UTC (permalink / raw)
  To: dev

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 421 +---------------------------------
 drivers/net/mlx4/mlx4_rxtx.h |   9 +
 drivers/net/mlx4/mlx4_txq.c  | 472 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 483 insertions(+), 420 deletions(-)
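
For readers following the move: mlx4_txq_alloc_elts() in this patch sets elts_comp_cd_init to the smaller of MLX4_PMD_TX_PER_COMP_REQ and a quarter of the ring size, so that a completion is requested at least four times per ring. A minimal standalone sketch of that arithmetic follows; sketch_comp_cd_init() and its per_comp_req parameter are hypothetical names standing in for the PMD constant, whose value is not shown in this patch.

```c
/*
 * Hypothetical stand-in for the elts_comp_cd_init computation:
 * per_comp_req plays the role of MLX4_PMD_TX_PER_COMP_REQ and
 * elts_n is the number of Tx ring elements.
 */
static unsigned int
sketch_comp_cd_init(unsigned int per_comp_req, unsigned int elts_n)
{
	unsigned int quarter = elts_n / 4;

	/* Same comparison as in the patch: pick the smaller value. */
	return (per_comp_req < quarter) ? per_comp_req : quarter;
}
```

With an assumed per_comp_req of 64, a 512-entry ring keeps 64 while a 128-entry ring drops to 32. Note that the integer division yields 0 for rings of fewer than four elements; how the burst function treats that value is outside this excerpt.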

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 9549525..0b74fed 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -40,6 +40,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
 
 # Basic CFLAGS.
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cff57c2..ee3fc24 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -56,9 +56,6 @@
 #include <rte_interrupts.h>
 #include <rte_common.h>
 
-/* Generated configuration header. */
-#include "mlx4_autoconf.h"
-
 /* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
@@ -82,13 +79,6 @@ const char *pmd_mlx4_init_params[] = {
 /* Device configuration. */
 
 static int
-txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_txconf *conf);
-
-static void
-txq_cleanup(struct txq *txq);
-
-static int
 rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
 	  unsigned int socket, const struct rte_eth_rxconf *conf,
 	  struct rte_mempool *mp);
@@ -132,128 +122,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-/* TX queues handling. */
-
-/**
- * Allocate TX queue elements.
- *
- * @param txq
- *   Pointer to TX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-txq_alloc_elts(struct txq *txq, unsigned int elts_n)
-{
-	unsigned int i;
-	struct txq_elt (*elts)[elts_n] =
-		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
-	int ret = 0;
-
-	if (elts == NULL) {
-		ERROR("%p: can't allocate packets array", (void *)txq);
-		ret = ENOMEM;
-		goto error;
-	}
-	for (i = 0; (i != elts_n); ++i) {
-		struct txq_elt *elt = &(*elts)[i];
-
-		elt->buf = NULL;
-	}
-	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
-	txq->elts_n = elts_n;
-	txq->elts = elts;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	/*
-	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
-	 * at least 4 times per ring.
-	 */
-	txq->elts_comp_cd_init =
-		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
-		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
-	txq->elts_comp_cd = txq->elts_comp_cd_init;
-	assert(ret == 0);
-	return 0;
-error:
-	rte_free(elts);
-	DEBUG("%p: failed, freed everything", (void *)txq);
-	assert(ret > 0);
-	rte_errno = ret;
-	return -rte_errno;
-}
-
-/**
- * Free TX queue elements.
- *
- * @param txq
- *   Pointer to TX queue structure.
- */
-static void
-txq_free_elts(struct txq *txq)
-{
-	unsigned int elts_n = txq->elts_n;
-	unsigned int elts_head = txq->elts_head;
-	unsigned int elts_tail = txq->elts_tail;
-	struct txq_elt (*elts)[elts_n] = txq->elts;
-
-	DEBUG("%p: freeing WRs", (void *)txq);
-	txq->elts_n = 0;
-	txq->elts_head = 0;
-	txq->elts_tail = 0;
-	txq->elts_comp = 0;
-	txq->elts_comp_cd = 0;
-	txq->elts_comp_cd_init = 0;
-	txq->elts = NULL;
-	if (elts == NULL)
-		return;
-	while (elts_tail != elts_head) {
-		struct txq_elt *elt = &(*elts)[elts_tail];
-
-		assert(elt->buf != NULL);
-		rte_pktmbuf_free(elt->buf);
-#ifndef NDEBUG
-		/* Poisoning. */
-		memset(elt, 0x77, sizeof(*elt));
-#endif
-		if (++elts_tail == elts_n)
-			elts_tail = 0;
-	}
-	rte_free(elts);
-}
-
-/**
- * Clean up a TX queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param txq
- *   Pointer to TX queue structure.
- */
-static void
-txq_cleanup(struct txq *txq)
-{
-	size_t i;
-
-	DEBUG("cleaning up %p", (void *)txq);
-	txq_free_elts(txq);
-	if (txq->qp != NULL)
-		claim_zero(ibv_destroy_qp(txq->qp));
-	if (txq->cq != NULL)
-		claim_zero(ibv_destroy_cq(txq->cq));
-	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
-		if (txq->mp2mr[i].mp == NULL)
-			break;
-		assert(txq->mp2mr[i].mr != NULL);
-		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
-	}
-	memset(txq, 0, sizeof(*txq));
-}
-
 struct mlx4_check_mempool_data {
 	int ret;
 	char *start;
@@ -367,293 +235,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-struct txq_mp2mr_mbuf_check_data {
-	int ret;
-};
-
-/**
- * Callback function for rte_mempool_obj_iter() to check whether a given
- * mempool object looks like a mbuf.
- *
- * @param[in] mp
- *   The mempool pointer
- * @param[in] arg
- *   Context data (struct txq_mp2mr_mbuf_check_data). Contains the
- *   return value.
- * @param[in] obj
- *   Object address.
- * @param index
- *   Object index, unused.
- */
-static void
-txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
-	uint32_t index __rte_unused)
-{
-	struct txq_mp2mr_mbuf_check_data *data = arg;
-	struct rte_mbuf *buf = obj;
-
-	/*
-	 * Check whether mbuf structure fits element size and whether mempool
-	 * pointer is valid.
-	 */
-	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
-		data->ret = -1;
-}
-
-/**
- * Iterator function for rte_mempool_walk() to register existing mempools and
- * fill the MP to MR cache of a TX queue.
- *
- * @param[in] mp
- *   Memory Pool to register.
- * @param *arg
- *   Pointer to TX queue structure.
- */
-static void
-txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
-{
-	struct txq *txq = arg;
-	struct txq_mp2mr_mbuf_check_data data = {
-		.ret = 0,
-	};
-
-	/* Register mempool only if the first element looks like a mbuf. */
-	if (rte_mempool_obj_iter(mp, txq_mp2mr_mbuf_check, &data) == 0 ||
-			data.ret == -1)
-		return;
-	mlx4_txq_mp2mr(txq, mp);
-}
-
-/**
- * Configure a TX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param txq
- *   Pointer to TX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq tmpl = {
-		.priv = priv,
-		.socket = socket
-	};
-	union {
-		struct ibv_qp_init_attr init;
-		struct ibv_qp_attr mod;
-	} attr;
-	int ret;
-
-	(void)conf; /* Thresholds configuration (ignored). */
-	if (priv == NULL) {
-		rte_errno = EINVAL;
-		goto error;
-	}
-	if (desc == 0) {
-		rte_errno = EINVAL;
-		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
-		goto error;
-	}
-	/* MRs will be registered in mp2mr[] later. */
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
-	if (tmpl.cq == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	attr.init = (struct ibv_qp_init_attr){
-		/* CQ to be associated with the send queue. */
-		.send_cq = tmpl.cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = tmpl.cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_send_sge = 1,
-			.max_inline_data = MLX4_PMD_MAX_INLINE,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-		/*
-		 * Do *NOT* enable this, completions events are managed per
-		 * Tx burst.
-		 */
-		.sq_sig_all = 0,
-	};
-	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
-	if (tmpl.qp == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* ibv_create_qp() updates this value. */
-	tmpl.max_inline = attr.init.cap.max_inline_data;
-	attr.mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = txq_alloc_elts(&tmpl, desc);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: TXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	attr.mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	attr.mod.qp_state = IBV_QPS_RTS;
-	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* Clean up txq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
-	txq_cleanup(txq);
-	*txq = tmpl;
-	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
-	/* Pre-register known mempools. */
-	rte_mempool_walk(txq_mp2mr_iter, txq);
-	return 0;
-error:
-	ret = rte_errno;
-	txq_cleanup(&tmpl);
-	rte_errno = ret;
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * DPDK callback to configure a TX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   TX queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_txconf *conf)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct txq *txq = (*priv->txqs)[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= priv->txqs_n) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->txqs_n);
-		return -rte_errno;
-	}
-	if (txq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)txq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		(*priv->txqs)[idx] = NULL;
-		txq_cleanup(txq);
-	} else {
-		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
-		if (txq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = txq_setup(dev, txq, desc, socket, conf);
-	if (ret)
-		rte_free(txq);
-	else {
-		txq->stats.idx = idx;
-		DEBUG("%p: adding TX queue %p to list",
-		      (void *)dev, (void *)txq);
-		(*priv->txqs)[idx] = txq;
-		/* Update send callback. */
-		dev->tx_pkt_burst = mlx4_tx_burst;
-	}
-	return ret;
-}
-
-/**
- * DPDK callback to release a TX queue.
- *
- * @param dpdk_txq
- *   Generic TX queue pointer.
- */
-static void
-mlx4_tx_queue_release(void *dpdk_txq)
-{
-	struct txq *txq = (struct txq *)dpdk_txq;
-	struct priv *priv;
-	unsigned int i;
-
-	if (txq == NULL)
-		return;
-	priv = txq->priv;
-	for (i = 0; (i != priv->txqs_n); ++i)
-		if ((*priv->txqs)[i] == txq) {
-			DEBUG("%p: removing TX queue %p from list",
-			      (void *)priv->dev, (void *)txq);
-			(*priv->txqs)[i] = NULL;
-			break;
-		}
-	txq_cleanup(txq);
-	rte_free(txq);
-}
-
 /* RX queues handling. */
 
 /**
@@ -1288,7 +869,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 			if (tmp == NULL)
 				continue;
 			(*priv->txqs)[i] = NULL;
-			txq_cleanup(tmp);
+			mlx4_txq_cleanup(tmp);
 			rte_free(tmp);
 		}
 		priv->txqs_n = 0;
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 669c8a4..1b457a5 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -45,6 +45,7 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 
@@ -131,4 +132,12 @@ uint16_t mlx4_tx_burst_removed(void *dpdk_txq, struct rte_mbuf **pkts,
 uint16_t mlx4_rx_burst_removed(void *dpdk_rxq, struct rte_mbuf **pkts,
 			       uint16_t pkts_n);
 
+/* mlx4_txq.c */
+
+void mlx4_txq_cleanup(struct txq *txq);
+int mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			uint16_t desc, unsigned int socket,
+			const struct rte_eth_txconf *conf);
+void mlx4_tx_queue_release(void *dpdk_txq);
+
 #endif /* MLX4_RXTX_H_ */
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
new file mode 100644
index 0000000..945833b
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -0,0 +1,472 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Tx queues configuration for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+#include "mlx4_autoconf.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Allocate Tx queue elements.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ * @param elts_n
+ *   Number of elements to allocate.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_txq_alloc_elts(struct txq *txq, unsigned int elts_n)
+{
+	unsigned int i;
+	struct txq_elt (*elts)[elts_n] =
+		rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
+	int ret = 0;
+
+	if (elts == NULL) {
+		ERROR("%p: can't allocate packets array", (void *)txq);
+		ret = ENOMEM;
+		goto error;
+	}
+	for (i = 0; (i != elts_n); ++i) {
+		struct txq_elt *elt = &(*elts)[i];
+
+		elt->buf = NULL;
+	}
+	DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
+	txq->elts_n = elts_n;
+	txq->elts = elts;
+	txq->elts_head = 0;
+	txq->elts_tail = 0;
+	txq->elts_comp = 0;
+	/*
+	 * Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+	 * at least 4 times per ring.
+	 */
+	txq->elts_comp_cd_init =
+		((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+		 MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+	txq->elts_comp_cd = txq->elts_comp_cd_init;
+	assert(ret == 0);
+	return 0;
+error:
+	rte_free(elts);
+	DEBUG("%p: failed, freed everything", (void *)txq);
+	assert(ret > 0);
+	rte_errno = ret;
+	return -rte_errno;
+}
+
+/**
+ * Free Tx queue elements.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ */
+static void
+mlx4_txq_free_elts(struct txq *txq)
+{
+	unsigned int elts_n = txq->elts_n;
+	unsigned int elts_head = txq->elts_head;
+	unsigned int elts_tail = txq->elts_tail;
+	struct txq_elt (*elts)[elts_n] = txq->elts;
+
+	DEBUG("%p: freeing WRs", (void *)txq);
+	txq->elts_n = 0;
+	txq->elts_head = 0;
+	txq->elts_tail = 0;
+	txq->elts_comp = 0;
+	txq->elts_comp_cd = 0;
+	txq->elts_comp_cd_init = 0;
+	txq->elts = NULL;
+	if (elts == NULL)
+		return;
+	while (elts_tail != elts_head) {
+		struct txq_elt *elt = &(*elts)[elts_tail];
+
+		assert(elt->buf != NULL);
+		rte_pktmbuf_free(elt->buf);
+#ifndef NDEBUG
+		/* Poisoning. */
+		memset(elt, 0x77, sizeof(*elt));
+#endif
+		if (++elts_tail == elts_n)
+			elts_tail = 0;
+	}
+	rte_free(elts);
+}
+
+/**
+ * Clean up a Tx queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param txq
+ *   Pointer to Tx queue structure.
+ */
+void
+mlx4_txq_cleanup(struct txq *txq)
+{
+	size_t i;
+
+	DEBUG("cleaning up %p", (void *)txq);
+	mlx4_txq_free_elts(txq);
+	if (txq->qp != NULL)
+		claim_zero(ibv_destroy_qp(txq->qp));
+	if (txq->cq != NULL)
+		claim_zero(ibv_destroy_cq(txq->cq));
+	for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+		if (txq->mp2mr[i].mp == NULL)
+			break;
+		assert(txq->mp2mr[i].mr != NULL);
+		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+	}
+	memset(txq, 0, sizeof(*txq));
+}
+
+struct txq_mp2mr_mbuf_check_data {
+	int ret;
+};
+
+/**
+ * Callback function for rte_mempool_obj_iter() to check whether a given
+ * mempool object looks like a mbuf.
+ *
+ * @param[in] mp
+ *   The mempool pointer
+ * @param[in] arg
+ *   Context data (struct txq_mp2mr_mbuf_check_data). Contains the
+ *   return value.
+ * @param[in] obj
+ *   Object address.
+ * @param index
+ *   Object index, unused.
+ */
+static void
+mlx4_txq_mp2mr_mbuf_check(struct rte_mempool *mp, void *arg, void *obj,
+			  uint32_t index)
+{
+	struct txq_mp2mr_mbuf_check_data *data = arg;
+	struct rte_mbuf *buf = obj;
+
+	(void)index;
+	/*
+	 * Check whether mbuf structure fits element size and whether mempool
+	 * pointer is valid.
+	 */
+	if (sizeof(*buf) > mp->elt_size || buf->pool != mp)
+		data->ret = -1;
+}
+
+/**
+ * Iterator function for rte_mempool_walk() to register existing mempools and
+ * fill the MP to MR cache of a Tx queue.
+ *
+ * @param[in] mp
+ *   Memory Pool to register.
+ * @param *arg
+ *   Pointer to Tx queue structure.
+ */
+static void
+mlx4_txq_mp2mr_iter(struct rte_mempool *mp, void *arg)
+{
+	struct txq *txq = arg;
+	struct txq_mp2mr_mbuf_check_data data = {
+		.ret = 0,
+	};
+
+	/* Register mempool only if all its objects look like mbufs. */
+	if (rte_mempool_obj_iter(mp, mlx4_txq_mp2mr_mbuf_check, &data) == 0 ||
+			data.ret == -1)
+		return;
+	mlx4_txq_mp2mr(txq, mp);
+}
+
+/**
+ * Configure a Tx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param txq
+ *   Pointer to Tx queue structure.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
+	       unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct txq tmpl = {
+		.priv = priv,
+		.socket = socket
+	};
+	union {
+		struct ibv_qp_init_attr init;
+		struct ibv_qp_attr mod;
+	} attr;
+	int ret;
+
+	(void)conf; /* Thresholds configuration (ignored). */
+	if (priv == NULL) {
+		rte_errno = EINVAL;
+		goto error;
+	}
+	if (desc == 0) {
+		rte_errno = EINVAL;
+		ERROR("%p: invalid number of Tx descriptors", (void *)dev);
+		goto error;
+	}
+	/* MRs will be registered in mp2mr[] later. */
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+	if (tmpl.cq == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: CQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	DEBUG("priv->device_attr.max_qp_wr is %d",
+	      priv->device_attr.max_qp_wr);
+	DEBUG("priv->device_attr.max_sge is %d",
+	      priv->device_attr.max_sge);
+	attr.init = (struct ibv_qp_init_attr){
+		/* CQ to be associated with the send queue. */
+		.send_cq = tmpl.cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = tmpl.cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_send_sge = 1,
+			.max_inline_data = MLX4_PMD_MAX_INLINE,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+		/*
+		 * Do *NOT* enable this, completion events are managed per
+		 * Tx burst.
+		 */
+		.sq_sig_all = 0,
+	};
+	tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
+	if (tmpl.qp == NULL) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* ibv_create_qp() updates this value. */
+	tmpl.max_inline = attr.init.cap.max_inline_data;
+	attr.mod = (struct ibv_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = mlx4_txq_alloc_elts(&tmpl, desc);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: TXQ allocation failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	attr.mod = (struct ibv_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	attr.mod.qp_state = IBV_QPS_RTS;
+	ret = ibv_modify_qp(tmpl.qp, &attr.mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* Clean up txq in case we're reinitializing it. */
+	DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
+	mlx4_txq_cleanup(txq);
+	*txq = tmpl;
+	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
+	/* Pre-register known mempools. */
+	rte_mempool_walk(mlx4_txq_mp2mr_iter, txq);
+	return 0;
+error:
+	ret = rte_errno;
+	mlx4_txq_cleanup(&tmpl);
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to configure a Tx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Tx queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_txconf *conf)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct txq *txq = (*priv->txqs)[idx];
+	int ret;
+
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= priv->txqs_n) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, priv->txqs_n);
+		return -rte_errno;
+	}
+	if (txq != NULL) {
+		DEBUG("%p: reusing already allocated queue index %u (%p)",
+		      (void *)dev, idx, (void *)txq);
+		if (priv->started) {
+			rte_errno = EEXIST;
+			return -rte_errno;
+		}
+		(*priv->txqs)[idx] = NULL;
+		mlx4_txq_cleanup(txq);
+	} else {
+		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+		if (txq == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: unable to allocate queue index %u",
+			      (void *)dev, idx);
+			return -rte_errno;
+		}
+	}
+	ret = mlx4_txq_setup(dev, txq, desc, socket, conf);
+	if (ret) {
+		rte_free(txq);
+	} else {
+		txq->stats.idx = idx;
+		DEBUG("%p: adding Tx queue %p to list",
+		      (void *)dev, (void *)txq);
+		(*priv->txqs)[idx] = txq;
+		/* Update send callback. */
+		dev->tx_pkt_burst = mlx4_tx_burst;
+	}
+	return ret;
+}
+
+/**
+ * DPDK callback to release a Tx queue.
+ *
+ * @param dpdk_txq
+ *   Generic Tx queue pointer.
+ */
+void
+mlx4_tx_queue_release(void *dpdk_txq)
+{
+	struct txq *txq = (struct txq *)dpdk_txq;
+	struct priv *priv;
+	unsigned int i;
+
+	if (txq == NULL)
+		return;
+	priv = txq->priv;
+	for (i = 0; (i != priv->txqs_n); ++i)
+		if ((*priv->txqs)[i] == txq) {
+			DEBUG("%p: removing Tx queue %p from list",
+			      (void *)priv->dev, (void *)txq);
+			(*priv->txqs)[i] = NULL;
+			break;
+		}
+	mlx4_txq_cleanup(txq);
+	rte_free(txq);
+}
-- 
2.1.4
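
Note on mlx4_txq_setup() above: the queue is built in a local template
(tmpl) and only copied over *txq once every step has succeeded, so a failed
reconfiguration leaves the caller's queue untouched. The error path also
saves and restores rte_errno around the cleanup call. Below is a minimal
sketch of that pattern, not the driver code: struct queue, queue_cleanup(),
queue_setup() and fake_errno are hypothetical stand-ins for struct txq,
mlx4_txq_cleanup(), mlx4_txq_setup() and rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct txq. */
struct queue {
	int qp;   /* nonzero once the (pretend) QP exists */
	int elts; /* number of descriptors */
};

/* Hypothetical stand-in for rte_errno. */
static int fake_errno;

/* Release resources and reset the structure for reuse. */
static void
queue_cleanup(struct queue *q)
{
	memset(q, 0, sizeof(*q));
}

/* Configure into a template first, commit to *q only on success. */
static int
queue_setup(struct queue *q, int desc)
{
	struct queue tmpl = { 0 };
	int ret;

	if (desc == 0) {
		fake_errno = EINVAL;
		goto error;
	}
	tmpl.qp = 1;
	tmpl.elts = desc;
	/* Success: drop whatever the old queue held, then commit. */
	queue_cleanup(q);
	*q = tmpl;
	return 0;
error:
	/* Cleanup must not clobber the error being reported. */
	ret = fake_errno;
	queue_cleanup(&tmpl);
	fake_errno = ret;
	assert(fake_errno > 0);
	return -fake_errno;
}
```

With this shape, calling queue_setup(&q, 0) on an already configured q
returns -EINVAL and leaves q as it was, while a later queue_setup(&q, 16)
replaces its contents.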


* [PATCH v2 45/51] net/mlx4: separate Rx configuration functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (43 preceding siblings ...)
  2017-09-01  8:06   ` [PATCH v2 44/51] net/mlx4: separate Tx configuration functions Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 46/51] net/mlx4: group flow API handlers in common file Adrien Mazarguil
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

Private functions are now prefixed with "mlx4_" to prevent them from
conflicting with their mlx5 PMD counterparts at link time.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
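Note for readers of the moved code: mlx4_rxq_alloc_elts() links the work
request of each element to the next one and terminates the chain with NULL,
which is what lets mlx4_rxq_setup() post the whole ring with a single
ibv_post_recv() call on the first WR. A minimal sketch of that chaining
follows. struct wr, struct elt, chain_wrs() and chain_len() are hypothetical
simplifications, not the Verbs or driver types, and the sketch picks NULL
for the last element inside the loop instead of patching it afterwards as
the driver does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_recv_wr (only the link field). */
struct wr {
	struct wr *next;
	unsigned int id;
};

/* Hypothetical stand-in for struct rxq_elt. */
struct elt {
	struct wr wr;
};

/* Link n elements into a single NULL-terminated chain. */
static void
chain_wrs(struct elt *elts, unsigned int n)
{
	unsigned int i;

	assert(n != 0); /* the driver rejects desc == 0 before this point */
	for (i = 0; i != n; ++i) {
		elts[i].wr.id = i;
		/* The last WR pointer must be NULL. */
		elts[i].wr.next = (i + 1 != n) ? &elts[i + 1].wr : NULL;
	}
}

/* Walk the chain the way a consumer of the list would. */
static unsigned int
chain_len(const struct wr *wr)
{
	unsigned int n = 0;

	for (; wr != NULL; wr = wr->next)
		++n;
	return n;
}
```

After chain_wrs(elts, 4), walking from &elts[0].wr visits all four work
requests in order and stops at the NULL terminator.
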
 drivers/net/mlx4/Makefile    |   1 +
 drivers/net/mlx4/mlx4.c      | 541 +----------------------------------
 drivers/net/mlx4/mlx4_rxq.c  | 579 ++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_rxtx.h |  11 +
 4 files changed, 597 insertions(+), 535 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 0b74fed..16e5c5a 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_utils.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ee3fc24..031b1e6 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -78,17 +78,6 @@ const char *pmd_mlx4_init_params[] = {
 
 /* Device configuration. */
 
-static int
-rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp);
-
-static void
-rxq_cleanup(struct rxq *rxq);
-
-static void
-priv_mac_addr_del(struct priv *priv);
-
 /**
  * DPDK callback for Ethernet device configuration.
  *
@@ -235,524 +224,6 @@ mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
 	return mr;
 }
 
-/* RX queues handling. */
-
-/**
- * Allocate RX queue elements.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- * @param elts_n
- *   Number of elements to allocate.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
-{
-	unsigned int i;
-	struct rxq_elt (*elts)[elts_n] =
-		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
-				  rxq->socket);
-
-	if (elts == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: can't allocate packets array", (void *)rxq);
-		goto error;
-	}
-	/* For each WR (packet). */
-	for (i = 0; (i != elts_n); ++i) {
-		struct rxq_elt *elt = &(*elts)[i];
-		struct ibv_recv_wr *wr = &elt->wr;
-		struct ibv_sge *sge = &(*elts)[i].sge;
-		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
-
-		if (buf == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: empty mbuf pool", (void *)rxq);
-			goto error;
-		}
-		elt->buf = buf;
-		wr->next = &(*elts)[(i + 1)].wr;
-		wr->sg_list = sge;
-		wr->num_sge = 1;
-		/* Headroom is reserved by rte_pktmbuf_alloc(). */
-		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
-		/* Buffer is supposed to be empty. */
-		assert(rte_pktmbuf_data_len(buf) == 0);
-		assert(rte_pktmbuf_pkt_len(buf) == 0);
-		/* sge->addr must be able to store a pointer. */
-		assert(sizeof(sge->addr) >= sizeof(uintptr_t));
-		/* SGE keeps its headroom. */
-		sge->addr = (uintptr_t)
-			((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
-		sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
-		sge->lkey = rxq->mr->lkey;
-		/* Redundant check for tailroom. */
-		assert(sge->length == rte_pktmbuf_tailroom(buf));
-	}
-	/* The last WR pointer must be NULL. */
-	(*elts)[(i - 1)].wr.next = NULL;
-	DEBUG("%p: allocated and configured %u single-segment WRs",
-	      (void *)rxq, elts_n);
-	rxq->elts_n = elts_n;
-	rxq->elts_head = 0;
-	rxq->elts = elts;
-	return 0;
-error:
-	if (elts != NULL) {
-		for (i = 0; (i != RTE_DIM(*elts)); ++i)
-			rte_pktmbuf_free_seg((*elts)[i].buf);
-		rte_free(elts);
-	}
-	DEBUG("%p: failed, freed everything", (void *)rxq);
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * Free RX queue elements.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_free_elts(struct rxq *rxq)
-{
-	unsigned int i;
-	unsigned int elts_n = rxq->elts_n;
-	struct rxq_elt (*elts)[elts_n] = rxq->elts;
-
-	DEBUG("%p: freeing WRs", (void *)rxq);
-	rxq->elts_n = 0;
-	rxq->elts = NULL;
-	if (elts == NULL)
-		return;
-	for (i = 0; (i != RTE_DIM(*elts)); ++i)
-		rte_pktmbuf_free_seg((*elts)[i].buf);
-	rte_free(elts);
-}
-
-/**
- * Unregister a MAC address.
- *
- * @param priv
- *   Pointer to private structure.
- */
-static void
-priv_mac_addr_del(struct priv *priv)
-{
-#ifndef NDEBUG
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-#endif
-
-	if (!priv->mac_flow)
-		return;
-	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	claim_zero(ibv_destroy_flow(priv->mac_flow));
-	priv->mac_flow = NULL;
-}
-
-/**
- * Register a MAC address.
- *
- * The MAC address is registered in queue 0.
- *
- * @param priv
- *   Pointer to private structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-priv_mac_addr_add(struct priv *priv)
-{
-	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
-	struct rxq *rxq;
-	struct ibv_flow *flow;
-
-	/* If device isn't started, this is all we need to do. */
-	if (!priv->started)
-		return 0;
-	if (priv->isolated)
-		return 0;
-	if (*priv->rxqs && (*priv->rxqs)[0])
-		rxq = (*priv->rxqs)[0];
-	else
-		return 0;
-
-	/* Allocate flow specification on the stack. */
-	struct __attribute__((packed)) {
-		struct ibv_flow_attr attr;
-		struct ibv_flow_spec_eth spec;
-	} data;
-	struct ibv_flow_attr *attr = &data.attr;
-	struct ibv_flow_spec_eth *spec = &data.spec;
-
-	if (priv->mac_flow)
-		priv_mac_addr_del(priv);
-	/*
-	 * No padding must be inserted by the compiler between attr and spec.
-	 * This layout is expected by libibverbs.
-	 */
-	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
-	*attr = (struct ibv_flow_attr){
-		.type = IBV_FLOW_ATTR_NORMAL,
-		.priority = 3,
-		.num_of_specs = 1,
-		.port = priv->port,
-		.flags = 0
-	};
-	*spec = (struct ibv_flow_spec_eth){
-		.type = IBV_FLOW_SPEC_ETH,
-		.size = sizeof(*spec),
-		.val = {
-			.dst_mac = {
-				(*mac)[0], (*mac)[1], (*mac)[2],
-				(*mac)[3], (*mac)[4], (*mac)[5]
-			},
-		},
-		.mask = {
-			.dst_mac = "\xff\xff\xff\xff\xff\xff",
-		}
-	};
-	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
-	      (void *)priv,
-	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
-	/* Create related flow. */
-	flow = ibv_create_flow(rxq->qp, attr);
-	if (flow == NULL) {
-		rte_errno = errno ? errno : EINVAL;
-		ERROR("%p: flow configuration failed, errno=%d: %s",
-		      (void *)rxq, rte_errno, strerror(errno));
-		return -rte_errno;
-	}
-	assert(priv->mac_flow == NULL);
-	priv->mac_flow = flow;
-	return 0;
-}
-
-/**
- * Clean up a RX queue.
- *
- * Destroy objects, free allocated memory and reset the structure for reuse.
- *
- * @param rxq
- *   Pointer to RX queue structure.
- */
-static void
-rxq_cleanup(struct rxq *rxq)
-{
-	DEBUG("cleaning up %p", (void *)rxq);
-	rxq_free_elts(rxq);
-	if (rxq->qp != NULL)
-		claim_zero(ibv_destroy_qp(rxq->qp));
-	if (rxq->cq != NULL)
-		claim_zero(ibv_destroy_cq(rxq->cq));
-	if (rxq->channel != NULL)
-		claim_zero(ibv_destroy_comp_channel(rxq->channel));
-	if (rxq->mr != NULL)
-		claim_zero(ibv_dereg_mr(rxq->mr));
-	memset(rxq, 0, sizeof(*rxq));
-}
-
-/**
- * Allocate a Queue Pair.
- * Optionally setup inline receive if supported.
- *
- * @param priv
- *   Pointer to private structure.
- * @param cq
- *   Completion queue to associate with QP.
- * @param desc
- *   Number of descriptors in QP (hint only).
- *
- * @return
- *   QP pointer or NULL in case of error and rte_errno is set.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
-	struct ibv_qp *qp;
-	struct ibv_qp_init_attr attr = {
-		/* CQ to be associated with the send queue. */
-		.send_cq = cq,
-		/* CQ to be associated with the receive queue. */
-		.recv_cq = cq,
-		.cap = {
-			/* Max number of outstanding WRs. */
-			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
-					priv->device_attr.max_qp_wr :
-					desc),
-			/* Max number of scatter/gather elements in a WR. */
-			.max_recv_sge = 1,
-		},
-		.qp_type = IBV_QPT_RAW_PACKET,
-	};
-
-	qp = ibv_create_qp(priv->pd, &attr);
-	if (!qp)
-		rte_errno = errno ? errno : EINVAL;
-	return qp;
-}
-
-/**
- * Configure a RX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param rxq
- *   Pointer to RX queue structure.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
-	  unsigned int socket, const struct rte_eth_rxconf *conf,
-	  struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq tmpl = {
-		.priv = priv,
-		.mp = mp,
-		.socket = socket
-	};
-	struct ibv_qp_attr mod;
-	struct ibv_recv_wr *bad_wr;
-	unsigned int mb_len;
-	int ret;
-
-	(void)conf; /* Thresholds configuration (ignored). */
-	mb_len = rte_pktmbuf_data_room_size(mp);
-	if (desc == 0) {
-		rte_errno = EINVAL;
-		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
-		goto error;
-	}
-	/* Enable scattered packets support for this queue if necessary. */
-	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
-	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
-	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
-		;
-	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
-		WARN("%p: scattered mode has been requested but is"
-		     " not supported, this may lead to packet loss",
-		     (void *)dev);
-	} else {
-		WARN("%p: the requested maximum Rx packet size (%u) is"
-		     " larger than a single mbuf (%u) and scattered"
-		     " mode has not been requested",
-		     (void *)dev,
-		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
-		     mb_len - RTE_PKTMBUF_HEADROOM);
-	}
-	/* Use the entire RX mempool as the memory region. */
-	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
-	if (tmpl.mr == NULL) {
-		rte_errno = EINVAL;
-		ERROR("%p: MR creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	if (dev->data->dev_conf.intr_conf.rxq) {
-		tmpl.channel = ibv_create_comp_channel(priv->ctx);
-		if (tmpl.channel == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: Rx interrupt completion channel creation"
-			      " failure: %s",
-			      (void *)dev, strerror(rte_errno));
-			goto error;
-		}
-		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
-			ERROR("%p: unable to make Rx interrupt completion"
-			      " channel non-blocking: %s",
-			      (void *)dev, strerror(rte_errno));
-			goto error;
-		}
-	}
-	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
-	if (tmpl.cq == NULL) {
-		rte_errno = ENOMEM;
-		ERROR("%p: CQ creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	DEBUG("priv->device_attr.max_qp_wr is %d",
-	      priv->device_attr.max_qp_wr);
-	DEBUG("priv->device_attr.max_sge is %d",
-	      priv->device_attr.max_sge);
-	tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
-	if (tmpl.qp == NULL) {
-		ERROR("%p: QP creation failure: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	mod = (struct ibv_qp_attr){
-		/* Move the QP to this state. */
-		.qp_state = IBV_QPS_INIT,
-		/* Primary port number. */
-		.port_num = priv->port
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = rxq_alloc_elts(&tmpl, desc);
-	if (ret) {
-		ERROR("%p: RXQ allocation failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
-		      (void *)dev,
-		      (void *)bad_wr,
-		      strerror(rte_errno));
-		goto error;
-	}
-	mod = (struct ibv_qp_attr){
-		.qp_state = IBV_QPS_RTR
-	};
-	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
-	if (ret) {
-		rte_errno = ret;
-		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
-		      (void *)dev, strerror(rte_errno));
-		goto error;
-	}
-	/* Save port ID. */
-	tmpl.port_id = dev->data->port_id;
-	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
-	/* Clean up rxq in case we're reinitializing it. */
-	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
-	rxq_cleanup(rxq);
-	*rxq = tmpl;
-	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
-	return 0;
-error:
-	ret = rte_errno;
-	rxq_cleanup(&tmpl);
-	rte_errno = ret;
-	assert(rte_errno > 0);
-	return -rte_errno;
-}
-
-/**
- * DPDK callback to configure a RX queue.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param idx
- *   RX queue index.
- * @param desc
- *   Number of descriptors to configure in queue.
- * @param socket
- *   NUMA socket on which memory must be allocated.
- * @param[in] conf
- *   Thresholds parameters.
- * @param mp
- *   Memory pool for buffer allocations.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
-		    unsigned int socket, const struct rte_eth_rxconf *conf,
-		    struct rte_mempool *mp)
-{
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
-	int ret;
-
-	DEBUG("%p: configuring queue %u for %u descriptors",
-	      (void *)dev, idx, desc);
-	if (idx >= priv->rxqs_n) {
-		rte_errno = EOVERFLOW;
-		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->rxqs_n);
-		return -rte_errno;
-	}
-	if (rxq != NULL) {
-		DEBUG("%p: reusing already allocated queue index %u (%p)",
-		      (void *)dev, idx, (void *)rxq);
-		if (priv->started) {
-			rte_errno = EEXIST;
-			return -rte_errno;
-		}
-		(*priv->rxqs)[idx] = NULL;
-		if (idx == 0)
-			priv_mac_addr_del(priv);
-		rxq_cleanup(rxq);
-	} else {
-		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
-		if (rxq == NULL) {
-			rte_errno = ENOMEM;
-			ERROR("%p: unable to allocate queue index %u",
-			      (void *)dev, idx);
-			return -rte_errno;
-		}
-	}
-	ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
-	if (ret)
-		rte_free(rxq);
-	else {
-		rxq->stats.idx = idx;
-		DEBUG("%p: adding RX queue %p to list",
-		      (void *)dev, (void *)rxq);
-		(*priv->rxqs)[idx] = rxq;
-		/* Update receive callback. */
-		dev->rx_pkt_burst = mlx4_rx_burst;
-	}
-	return ret;
-}
-
-/**
- * DPDK callback to release a RX queue.
- *
- * @param dpdk_rxq
- *   Generic RX queue pointer.
- */
-static void
-mlx4_rx_queue_release(void *dpdk_rxq)
-{
-	struct rxq *rxq = (struct rxq *)dpdk_rxq;
-	struct priv *priv;
-	unsigned int i;
-
-	if (rxq == NULL)
-		return;
-	priv = rxq->priv;
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] == rxq) {
-			DEBUG("%p: removing RX queue %p from list",
-			      (void *)priv->dev, (void *)rxq);
-			(*priv->rxqs)[i] = NULL;
-			if (i == 0)
-				priv_mac_addr_del(priv);
-			break;
-		}
-	rxq_cleanup(rxq);
-	rte_free(rxq);
-}
-
 /**
  * DPDK callback to start the device.
  *
@@ -774,7 +245,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		return 0;
 	DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
 	priv->started = 1;
-	ret = priv_mac_addr_add(priv);
+	ret = mlx4_mac_addr_add(priv);
 	if (ret)
 		goto err;
 	ret = mlx4_intr_install(priv);
@@ -792,7 +263,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 	return 0;
 err:
 	/* Rollback. */
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 	priv->started = 0;
 	return ret;
 }
@@ -816,7 +287,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 	priv->started = 0;
 	mlx4_priv_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 }
 
 /**
@@ -839,7 +310,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	DEBUG("%p: closing device \"%s\"",
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
-	priv_mac_addr_del(priv);
+	mlx4_mac_addr_del(priv);
 	/*
 	 * Prevent crashes when queues are still in use. This is unfortunately
 	 * still required for DPDK 1.3 because some programs (such as testpmd)
@@ -855,7 +326,7 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 			if (tmp == NULL)
 				continue;
 			(*priv->rxqs)[i] = NULL;
-			rxq_cleanup(tmp);
+			mlx4_rxq_cleanup(tmp);
 			rte_free(tmp);
 		}
 		priv->rxqs_n = 0;
@@ -1270,7 +741,7 @@ mlx4_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		     mac.addr_bytes[4], mac.addr_bytes[5]);
 		/* Register MAC address. */
 		priv->mac = mac;
-		if (priv_mac_addr_add(priv))
+		if (mlx4_mac_addr_add(priv))
 			goto port_error;
 #ifndef NDEBUG
 		{
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
new file mode 100644
index 0000000..7f675a4
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -0,0 +1,579 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Rx queues configuration for mlx4 driver.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#include "mlx4.h"
+#include "mlx4_rxtx.h"
+#include "mlx4_utils.h"
+
+/**
+ * Allocate Rx queue elements.
+ *
+ * @param rxq
+ *   Pointer to Rx queue structure.
+ * @param elts_n
+ *   Number of elements to allocate.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n)
+{
+	unsigned int i;
+	struct rxq_elt (*elts)[elts_n] =
+		rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+				  rxq->socket);
+
+	if (elts == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: can't allocate packets array", (void *)rxq);
+		goto error;
+	}
+	/* For each WR (packet). */
+	for (i = 0; (i != elts_n); ++i) {
+		struct rxq_elt *elt = &(*elts)[i];
+		struct ibv_recv_wr *wr = &elt->wr;
+		struct ibv_sge *sge = &(*elts)[i].sge;
+		struct rte_mbuf *buf = rte_pktmbuf_alloc(rxq->mp);
+
+		if (buf == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: empty mbuf pool", (void *)rxq);
+			goto error;
+		}
+		elt->buf = buf;
+		wr->next = &(*elts)[(i + 1)].wr;
+		wr->sg_list = sge;
+		wr->num_sge = 1;
+		/* Headroom is reserved by rte_pktmbuf_alloc(). */
+		assert(buf->data_off == RTE_PKTMBUF_HEADROOM);
+		/* Buffer is supposed to be empty. */
+		assert(rte_pktmbuf_data_len(buf) == 0);
+		assert(rte_pktmbuf_pkt_len(buf) == 0);
+		/* sge->addr must be able to store a pointer. */
+		assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+		/* SGE keeps its headroom. */
+		sge->addr = (uintptr_t)
+			((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
+		sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
+		sge->lkey = rxq->mr->lkey;
+		/* Redundant check for tailroom. */
+		assert(sge->length == rte_pktmbuf_tailroom(buf));
+	}
+	/* The last WR pointer must be NULL. */
+	(*elts)[(i - 1)].wr.next = NULL;
+	DEBUG("%p: allocated and configured %u single-segment WRs",
+	      (void *)rxq, elts_n);
+	rxq->elts_n = elts_n;
+	rxq->elts_head = 0;
+	rxq->elts = elts;
+	return 0;
+error:
+	if (elts != NULL) {
+		for (i = 0; (i != RTE_DIM(*elts)); ++i)
+			rte_pktmbuf_free_seg((*elts)[i].buf);
+		rte_free(elts);
+	}
+	DEBUG("%p: failed, freed everything", (void *)rxq);
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * Free Rx queue elements.
+ *
+ * @param rxq
+ *   Pointer to Rx queue structure.
+ */
+static void
+mlx4_rxq_free_elts(struct rxq *rxq)
+{
+	unsigned int i;
+	unsigned int elts_n = rxq->elts_n;
+	struct rxq_elt (*elts)[elts_n] = rxq->elts;
+
+	DEBUG("%p: freeing WRs", (void *)rxq);
+	rxq->elts_n = 0;
+	rxq->elts = NULL;
+	if (elts == NULL)
+		return;
+	for (i = 0; (i != RTE_DIM(*elts)); ++i)
+		rte_pktmbuf_free_seg((*elts)[i].buf);
+	rte_free(elts);
+}
+
+/**
+ * Clean up a Rx queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param rxq
+ *   Pointer to Rx queue structure.
+ */
+void
+mlx4_rxq_cleanup(struct rxq *rxq)
+{
+	DEBUG("cleaning up %p", (void *)rxq);
+	mlx4_rxq_free_elts(rxq);
+	if (rxq->qp != NULL)
+		claim_zero(ibv_destroy_qp(rxq->qp));
+	if (rxq->cq != NULL)
+		claim_zero(ibv_destroy_cq(rxq->cq));
+	if (rxq->channel != NULL)
+		claim_zero(ibv_destroy_comp_channel(rxq->channel));
+	if (rxq->mr != NULL)
+		claim_zero(ibv_dereg_mr(rxq->mr));
+	memset(rxq, 0, sizeof(*rxq));
+}
+
+/**
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param cq
+ *   Completion queue to associate with QP.
+ * @param desc
+ *   Number of descriptors in QP (hint only).
+ *
+ * @return
+ *   QP pointer or NULL in case of error and rte_errno is set.
+ */
+static struct ibv_qp *
+mlx4_rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
+{
+	struct ibv_qp *qp;
+	struct ibv_qp_init_attr attr = {
+		/* CQ to be associated with the send queue. */
+		.send_cq = cq,
+		/* CQ to be associated with the receive queue. */
+		.recv_cq = cq,
+		.cap = {
+			/* Max number of outstanding WRs. */
+			.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+					priv->device_attr.max_qp_wr :
+					desc),
+			/* Max number of scatter/gather elements in a WR. */
+			.max_recv_sge = 1,
+		},
+		.qp_type = IBV_QPT_RAW_PACKET,
+	};
+
+	qp = ibv_create_qp(priv->pd, &attr);
+	if (!qp)
+		rte_errno = errno ? errno : EINVAL;
+	return qp;
+}
+
+/**
+ * Configure a Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param rxq
+ *   Pointer to Rx queue structure.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+static int
+mlx4_rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
+	       unsigned int socket, const struct rte_eth_rxconf *conf,
+	       struct rte_mempool *mp)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq tmpl = {
+		.priv = priv,
+		.mp = mp,
+		.socket = socket
+	};
+	struct ibv_qp_attr mod;
+	struct ibv_recv_wr *bad_wr;
+	unsigned int mb_len;
+	int ret;
+
+	(void)conf; /* Thresholds configuration (ignored). */
+	mb_len = rte_pktmbuf_data_room_size(mp);
+	if (desc == 0) {
+		rte_errno = EINVAL;
+		ERROR("%p: invalid number of Rx descriptors", (void *)dev);
+		goto error;
+	}
+	/* Enable scattered packets support for this queue if necessary. */
+	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
+	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
+	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
+		;
+	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
+		WARN("%p: scattered mode has been requested but is"
+		     " not supported, this may lead to packet loss",
+		     (void *)dev);
+	} else {
+		WARN("%p: the requested maximum Rx packet size (%u) is"
+		     " larger than a single mbuf (%u) and scattered"
+		     " mode has not been requested",
+		     (void *)dev,
+		     dev->data->dev_conf.rxmode.max_rx_pkt_len,
+		     mb_len - RTE_PKTMBUF_HEADROOM);
+	}
+	/* Use the entire Rx mempool as the memory region. */
+	tmpl.mr = mlx4_mp2mr(priv->pd, mp);
+	if (tmpl.mr == NULL) {
+		rte_errno = EINVAL;
+		ERROR("%p: MR creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	if (dev->data->dev_conf.intr_conf.rxq) {
+		tmpl.channel = ibv_create_comp_channel(priv->ctx);
+		if (tmpl.channel == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: Rx interrupt completion channel creation"
+			      " failure: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
+		if (mlx4_fd_set_non_blocking(tmpl.channel->fd) < 0) {
+			ERROR("%p: unable to make Rx interrupt completion"
+			      " channel non-blocking: %s",
+			      (void *)dev, strerror(rte_errno));
+			goto error;
+		}
+	}
+	tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0);
+	if (tmpl.cq == NULL) {
+		rte_errno = ENOMEM;
+		ERROR("%p: CQ creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	DEBUG("priv->device_attr.max_qp_wr is %d",
+	      priv->device_attr.max_qp_wr);
+	DEBUG("priv->device_attr.max_sge is %d",
+	      priv->device_attr.max_sge);
+	tmpl.qp = mlx4_rxq_setup_qp(priv, tmpl.cq, desc);
+	if (tmpl.qp == NULL) {
+		ERROR("%p: QP creation failure: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	mod = (struct ibv_qp_attr){
+		/* Move the QP to this state. */
+		.qp_state = IBV_QPS_INIT,
+		/* Primary port number. */
+		.port_num = priv->port
+	};
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE | IBV_QP_PORT);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = mlx4_rxq_alloc_elts(&tmpl, desc);
+	if (ret) {
+		ERROR("%p: RXQ allocation failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	ret = ibv_post_recv(tmpl.qp, &(*tmpl.elts)[0].wr, &bad_wr);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+		      (void *)dev,
+		      (void *)bad_wr,
+		      strerror(rte_errno));
+		goto error;
+	}
+	mod = (struct ibv_qp_attr){
+		.qp_state = IBV_QPS_RTR
+	};
+	ret = ibv_modify_qp(tmpl.qp, &mod, IBV_QP_STATE);
+	if (ret) {
+		rte_errno = ret;
+		ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+		      (void *)dev, strerror(rte_errno));
+		goto error;
+	}
+	/* Save port ID. */
+	tmpl.port_id = dev->data->port_id;
+	DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+	/* Clean up rxq in case we're reinitializing it. */
+	DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
+	mlx4_rxq_cleanup(rxq);
+	*rxq = tmpl;
+	DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
+	return 0;
+error:
+	ret = rte_errno;
+	mlx4_rxq_cleanup(&tmpl);
+	rte_errno = ret;
+	assert(rte_errno > 0);
+	return -rte_errno;
+}
+
+/**
+ * DPDK callback to configure a Rx queue.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param idx
+ *   Rx queue index.
+ * @param desc
+ *   Number of descriptors to configure in queue.
+ * @param socket
+ *   NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ *   Thresholds parameters.
+ * @param mp
+ *   Memory pool for buffer allocations.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+		    unsigned int socket, const struct rte_eth_rxconf *conf,
+		    struct rte_mempool *mp)
+{
+	struct priv *priv = dev->data->dev_private;
+	struct rxq *rxq = (*priv->rxqs)[idx];
+	int ret;
+
+	DEBUG("%p: configuring queue %u for %u descriptors",
+	      (void *)dev, idx, desc);
+	if (idx >= priv->rxqs_n) {
+		rte_errno = EOVERFLOW;
+		ERROR("%p: queue index out of range (%u >= %u)",
+		      (void *)dev, idx, priv->rxqs_n);
+		return -rte_errno;
+	}
+	if (rxq != NULL) {
+		DEBUG("%p: reusing already allocated queue index %u (%p)",
+		      (void *)dev, idx, (void *)rxq);
+		if (priv->started) {
+			rte_errno = EEXIST;
+			return -rte_errno;
+		}
+		(*priv->rxqs)[idx] = NULL;
+		if (idx == 0)
+			mlx4_mac_addr_del(priv);
+		mlx4_rxq_cleanup(rxq);
+	} else {
+		rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+		if (rxq == NULL) {
+			rte_errno = ENOMEM;
+			ERROR("%p: unable to allocate queue index %u",
+			      (void *)dev, idx);
+			return -rte_errno;
+		}
+	}
+	ret = mlx4_rxq_setup(dev, rxq, desc, socket, conf, mp);
+	if (ret) {
+		rte_free(rxq);
+	} else {
+		rxq->stats.idx = idx;
+		DEBUG("%p: adding Rx queue %p to list",
+		      (void *)dev, (void *)rxq);
+		(*priv->rxqs)[idx] = rxq;
+		/* Update receive callback. */
+		dev->rx_pkt_burst = mlx4_rx_burst;
+	}
+	return ret;
+}
+
+/**
+ * DPDK callback to release a Rx queue.
+ *
+ * @param dpdk_rxq
+ *   Generic Rx queue pointer.
+ */
+void
+mlx4_rx_queue_release(void *dpdk_rxq)
+{
+	struct rxq *rxq = (struct rxq *)dpdk_rxq;
+	struct priv *priv;
+	unsigned int i;
+
+	if (rxq == NULL)
+		return;
+	priv = rxq->priv;
+	for (i = 0; (i != priv->rxqs_n); ++i)
+		if ((*priv->rxqs)[i] == rxq) {
+			DEBUG("%p: removing Rx queue %p from list",
+			      (void *)priv->dev, (void *)rxq);
+			(*priv->rxqs)[i] = NULL;
+			if (i == 0)
+				mlx4_mac_addr_del(priv);
+			break;
+		}
+	mlx4_rxq_cleanup(rxq);
+	rte_free(rxq);
+}
+
+/**
+ * Unregister a MAC address.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ */
+void
+mlx4_mac_addr_del(struct priv *priv)
+{
+#ifndef NDEBUG
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+#endif
+
+	if (!priv->mac_flow)
+		return;
+	DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)priv,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	claim_zero(ibv_destroy_flow(priv->mac_flow));
+	priv->mac_flow = NULL;
+}
+
+/**
+ * Register a MAC address.
+ *
+ * The MAC address is registered in queue 0.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_mac_addr_add(struct priv *priv)
+{
+	uint8_t (*mac)[ETHER_ADDR_LEN] = &priv->mac.addr_bytes;
+	struct rxq *rxq;
+	struct ibv_flow *flow;
+
+	/* If device isn't started, this is all we need to do. */
+	if (!priv->started)
+		return 0;
+	if (priv->isolated)
+		return 0;
+	if (*priv->rxqs && (*priv->rxqs)[0])
+		rxq = (*priv->rxqs)[0];
+	else
+		return 0;
+
+	/* Allocate flow specification on the stack. */
+	struct __attribute__((packed)) {
+		struct ibv_flow_attr attr;
+		struct ibv_flow_spec_eth spec;
+	} data;
+	struct ibv_flow_attr *attr = &data.attr;
+	struct ibv_flow_spec_eth *spec = &data.spec;
+
+	if (priv->mac_flow)
+		mlx4_mac_addr_del(priv);
+	/*
+	 * No padding must be inserted by the compiler between attr and spec.
+	 * This layout is expected by libibverbs.
+	 */
+	assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
+	*attr = (struct ibv_flow_attr){
+		.type = IBV_FLOW_ATTR_NORMAL,
+		.priority = 3,
+		.num_of_specs = 1,
+		.port = priv->port,
+		.flags = 0
+	};
+	*spec = (struct ibv_flow_spec_eth){
+		.type = IBV_FLOW_SPEC_ETH,
+		.size = sizeof(*spec),
+		.val = {
+			.dst_mac = {
+				(*mac)[0], (*mac)[1], (*mac)[2],
+				(*mac)[3], (*mac)[4], (*mac)[5]
+			},
+		},
+		.mask = {
+			.dst_mac = "\xff\xff\xff\xff\xff\xff",
+		}
+	};
+	DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x",
+	      (void *)priv,
+	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5]);
+	/* Create related flow. */
+	flow = ibv_create_flow(rxq->qp, attr);
+	if (flow == NULL) {
+		rte_errno = errno ? errno : EINVAL;
+		ERROR("%p: flow configuration failed, errno=%d: %s",
+		      (void *)rxq, rte_errno, strerror(rte_errno));
+		return -rte_errno;
+	}
+	assert(priv->mac_flow == NULL);
+	priv->mac_flow = flow;
+	return 0;
+}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index 1b457a5..fec998a 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -120,6 +120,17 @@ struct txq {
 	unsigned int socket; /**< CPU socket ID for allocations. */
 };
 
+/* mlx4_rxq.c */
+
+void mlx4_rxq_cleanup(struct rxq *rxq);
+int mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx,
+			uint16_t desc, unsigned int socket,
+			const struct rte_eth_rxconf *conf,
+			struct rte_mempool *mp);
+void mlx4_rx_queue_release(void *dpdk_rxq);
+void mlx4_mac_addr_del(struct priv *priv);
+int mlx4_mac_addr_add(struct priv *priv);
+
 /* mlx4_rxtx.c */
 
 uint32_t mlx4_txq_mp2mr(struct txq *txq, struct rte_mempool *mp);
-- 
2.1.4
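
Note on the flow specification layout used by mlx4_mac_addr_add() above:
the patch packs the attribute header and its single Ethernet spec into one
structure and asserts that no padding separates them, since libibverbs
expects specs to follow the header directly. A minimal stand-alone sketch
of the same check, assuming hypothetical stand-in types (demo_attr,
demo_spec and demo_rule are illustrative only, not Verbs structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the attribute header (not a Verbs type). */
struct demo_attr {
	uint32_t type;
	uint16_t size;
	uint8_t num_of_specs;
	uint8_t port;
};

/* Hypothetical stand-in for a single Ethernet spec (not a Verbs type). */
struct demo_spec {
	uint32_t type;
	uint16_t size;
	uint8_t dst_mac[6];
};

/* Header immediately followed by its spec; packed to forbid padding. */
struct __attribute__((packed)) demo_rule {
	struct demo_attr attr;
	struct demo_spec spec;
};

/* Return nonzero when spec starts exactly where attr ends. */
static int
demo_rule_is_contiguous(void)
{
	return offsetof(struct demo_rule, spec) ==
	       offsetof(struct demo_rule, attr) + sizeof(struct demo_attr);
}
```

Using offsetof() instead of pointer arithmetic on a live object lets the
same condition be checked without taking the address of a packed member.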

* [PATCH v2 46/51] net/mlx4: group flow API handlers in common file
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (44 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 45/51] net/mlx4: separate Rx " Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 47/51] net/mlx4: rename private functions in flow API Adrien Mazarguil
                     ` (6 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

Only the common filter control operation callback needs to be exposed.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
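Illustration only, not part of the patch: the pattern applied here keeps
the flow handlers and their ops table private to one file and exposes a
single lookup callback. A minimal sketch of that shape, assuming
hypothetical stand-in types and names (everything prefixed demo_ is made
up and does not exist in DPDK):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the filter type/operation enums. */
enum demo_filter_type { DEMO_FILTER_NONE, DEMO_FILTER_GENERIC };
enum demo_filter_op { DEMO_FILTER_NOP, DEMO_FILTER_GET };

/* Hypothetical, much reduced ops table. */
struct demo_flow_ops {
	int (*flush)(void);
};

/* Handlers stay static: nothing outside this file calls them by name. */
static int
demo_flow_flush(void)
{
	return 0;
}

/* The table itself can be static as well. */
static const struct demo_flow_ops demo_flow_ops = {
	.flush = demo_flow_flush,
};

/* Only this entry point needs external linkage. */
int
demo_filter_ctrl(enum demo_filter_type type, enum demo_filter_op op,
		 void *arg)
{
	if (type == DEMO_FILTER_GENERIC && op == DEMO_FILTER_GET) {
		/* Hand the private table back through the opaque pointer. */
		*(const void **)arg = &demo_flow_ops;
		return 0;
	}
	return -ENOTSUP;
}
```

A caller passes the address of a const void pointer as arg and, on
success, casts the result back to the ops type before invoking handlers.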
 drivers/net/mlx4/mlx4.c      | 48 +-------------------------
 drivers/net/mlx4/mlx4_flow.c | 72 ++++++++++++++++++++++++++++++++++++---
 drivers/net/mlx4/mlx4_flow.h | 39 +++++----------------
 3 files changed, 76 insertions(+), 83 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 031b1e6..e095a9f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -51,7 +51,6 @@
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_memory.h>
-#include <rte_flow.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_common.h>
@@ -356,51 +355,6 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	memset(priv, 0, sizeof(*priv));
 }
 
-const struct rte_flow_ops mlx4_flow_ops = {
-	.validate = mlx4_flow_validate,
-	.create = mlx4_flow_create,
-	.destroy = mlx4_flow_destroy,
-	.flush = mlx4_flow_flush,
-	.query = NULL,
-	.isolate = mlx4_flow_isolate,
-};
-
-/**
- * Manage filter operations.
- *
- * @param dev
- *   Pointer to Ethernet device structure.
- * @param filter_type
- *   Filter type.
- * @param filter_op
- *   Operation to perform.
- * @param arg
- *   Pointer to operation-specific structure.
- *
- * @return
- *   0 on success, negative errno value otherwise and rte_errno is set.
- */
-static int
-mlx4_dev_filter_ctrl(struct rte_eth_dev *dev,
-		     enum rte_filter_type filter_type,
-		     enum rte_filter_op filter_op,
-		     void *arg)
-{
-	switch (filter_type) {
-	case RTE_ETH_FILTER_GENERIC:
-		if (filter_op != RTE_ETH_FILTER_GET)
-			break;
-		*(const void **)arg = &mlx4_flow_ops;
-		return 0;
-	default:
-		ERROR("%p: filter type (%d) not supported",
-		      (void *)dev, filter_type);
-		break;
-	}
-	rte_errno = ENOTSUP;
-	return -rte_errno;
-}
-
 static const struct eth_dev_ops mlx4_dev_ops = {
 	.dev_configure = mlx4_dev_configure,
 	.dev_start = mlx4_dev_start,
@@ -419,7 +373,7 @@ static const struct eth_dev_ops mlx4_dev_ops = {
 	.flow_ctrl_get = mlx4_flow_ctrl_get,
 	.flow_ctrl_set = mlx4_flow_ctrl_set,
 	.mtu_set = mlx4_mtu_set,
-	.filter_ctrl = mlx4_dev_filter_ctrl,
+	.filter_ctrl = mlx4_filter_ctrl,
 	.rx_queue_intr_enable = mlx4_rx_intr_enable,
 	.rx_queue_intr_disable = mlx4_rx_intr_disable,
 };
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 61455ce..6401a83 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -31,8 +31,26 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include <arpa/inet.h>
 #include <assert.h>
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/queue.h>
 
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_errno.h>
+#include <rte_eth_ctrl.h>
+#include <rte_ethdev.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_malloc.h>
@@ -697,7 +715,7 @@ priv_flow_validate(struct priv *priv,
  * @see rte_flow_validate()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_validate(struct rte_eth_dev *dev,
 		   const struct rte_flow_attr *attr,
 		   const struct rte_flow_item items[],
@@ -844,7 +862,7 @@ priv_flow_create_action_queue(struct priv *priv,
  * @see rte_flow_create()
  * @see rte_flow_ops
  */
-struct rte_flow *
+static struct rte_flow *
 mlx4_flow_create(struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr,
 		 const struct rte_flow_item items[],
@@ -927,7 +945,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
  * @return
  *   0 on success, a negative value on error.
  */
-int
+static int
 mlx4_flow_isolate(struct rte_eth_dev *dev,
 		  int enable,
 		  struct rte_flow_error *error)
@@ -951,7 +969,7 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
  * @see rte_flow_destroy()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_destroy(struct rte_eth_dev *dev,
 		  struct rte_flow *flow,
 		  struct rte_flow_error *error)
@@ -973,7 +991,7 @@ mlx4_flow_destroy(struct rte_eth_dev *dev,
  * @see rte_flow_flush()
  * @see rte_flow_ops
  */
-int
+static int
 mlx4_flow_flush(struct rte_eth_dev *dev,
 		struct rte_flow_error *error)
 {
@@ -1044,3 +1062,47 @@ mlx4_priv_flow_start(struct priv *priv)
 	}
 	return 0;
 }
+
+static const struct rte_flow_ops mlx4_flow_ops = {
+	.validate = mlx4_flow_validate,
+	.create = mlx4_flow_create,
+	.destroy = mlx4_flow_destroy,
+	.flush = mlx4_flow_flush,
+	.isolate = mlx4_flow_isolate,
+};
+
+/**
+ * Manage filter operations.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param filter_type
+ *   Filter type.
+ * @param filter_op
+ *   Operation to perform.
+ * @param arg
+ *   Pointer to operation-specific structure.
+ *
+ * @return
+ *   0 on success, negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx4_filter_ctrl(struct rte_eth_dev *dev,
+		 enum rte_filter_type filter_type,
+		 enum rte_filter_op filter_op,
+		 void *arg)
+{
+	switch (filter_type) {
+	case RTE_ETH_FILTER_GENERIC:
+		if (filter_op != RTE_ETH_FILTER_GET)
+			break;
+		*(const void **)arg = &mlx4_flow_ops;
+		return 0;
+	default:
+		ERROR("%p: filter type (%d) not supported",
+		      (void *)dev, filter_type);
+		break;
+	}
+	rte_errno = ENOTSUP;
+	return -rte_errno;
+}
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 17e5f6e..8bd659c 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -34,7 +34,6 @@
 #ifndef RTE_PMD_MLX4_FLOW_H_
 #define RTE_PMD_MLX4_FLOW_H_
 
-#include <stddef.h>
 #include <stdint.h>
 #include <sys/queue.h>
 
@@ -48,12 +47,12 @@
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
 
+#include <rte_eth_ctrl.h>
+#include <rte_ethdev.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 #include <rte_byteorder.h>
 
-#include "mlx4.h"
-
 struct rte_flow {
 	LIST_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	struct ibv_flow *ibv_flow; /**< Verbs flow. */
@@ -61,47 +60,25 @@ struct rte_flow {
 	struct ibv_qp *qp; /**< Verbs queue pair. */
 };
 
-int
-mlx4_flow_validate(struct rte_eth_dev *dev,
-		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
-		   const struct rte_flow_action actions[],
-		   struct rte_flow_error *error);
-
-struct rte_flow *
-mlx4_flow_create(struct rte_eth_dev *dev,
-		 const struct rte_flow_attr *attr,
-		 const struct rte_flow_item items[],
-		 const struct rte_flow_action actions[],
-		 struct rte_flow_error *error);
-
-int
-mlx4_flow_destroy(struct rte_eth_dev *dev,
-		  struct rte_flow *flow,
-		  struct rte_flow_error *error);
-
-int
-mlx4_flow_flush(struct rte_eth_dev *dev,
-		struct rte_flow_error *error);
-
 /** Structure to pass to the conversion function. */
 struct mlx4_flow {
 	struct ibv_flow_attr *ibv_attr; /**< Verbs attribute. */
 	unsigned int offset; /**< Offset in bytes in the ibv_attr buffer. */
 };
 
-int
-mlx4_flow_isolate(struct rte_eth_dev *dev,
-		  int enable,
-		  struct rte_flow_error *error);
-
 struct mlx4_flow_action {
 	uint32_t drop:1; /**< Target is a drop queue. */
 	uint32_t queue:1; /**< Target is a receive queue. */
 	uint32_t queue_id; /**< Identifier of the queue. */
 };
 
+/* mlx4_flow.c */
+
 int mlx4_priv_flow_start(struct priv *priv);
 void mlx4_priv_flow_stop(struct priv *priv);
+int mlx4_filter_ctrl(struct rte_eth_dev *dev,
+		     enum rte_filter_type filter_type,
+		     enum rte_filter_op filter_op,
+		     void *arg);
 
 #endif /* RTE_PMD_MLX4_FLOW_H_ */
-- 
2.1.4

* [PATCH v2 47/51] net/mlx4: rename private functions in flow API
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (45 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 46/51] net/mlx4: group flow API handlers in common file Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 48/51] net/mlx4: separate memory management functions Adrien Mazarguil
                     ` (5 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

While internal static functions do not cause link time conflicts, this
differentiates them from their mlx5 PMD counterparts while debugging.

No impact on functionality.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
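Illustration only, not part of the patch: the new name mlx4_flow_prepare()
reflects that mlx4_flow_create() appears to call it twice, once to validate
the rule and compute the buffer size through the offset field, then again
to fill the allocated buffer. A minimal sketch of this size-then-fill
approach, assuming hypothetical names (demo_flow and the demo_flow_*()
helpers are made up; items are plain strings for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical conversion state, loosely modelled on struct mlx4_flow. */
struct demo_flow {
	unsigned char *buf; /* NULL during the sizing pass. */
	size_t offset; /* Bytes consumed so far. */
};

/* Append one item; only advances the offset when no buffer is attached. */
static void
demo_flow_append(struct demo_flow *flow, const void *item, size_t len)
{
	if (flow->buf != NULL)
		memcpy(flow->buf + flow->offset, item, len);
	flow->offset += len;
}

/* Walk all items; usable for sizing (buf == NULL) or for filling. */
static int
demo_flow_prepare(struct demo_flow *flow, const char *const items[],
		  size_t n)
{
	size_t i;

	for (i = 0; i != n; ++i) {
		if (items[i] == NULL)
			return -1; /* Unsupported item. */
		demo_flow_append(flow, items[i], strlen(items[i]));
	}
	return 0;
}

/* Size, allocate, then fill; returns NULL on error. */
static unsigned char *
demo_flow_build(const char *const items[], size_t n, size_t *len)
{
	struct demo_flow flow = { .buf = NULL, .offset = 0 };

	if (demo_flow_prepare(&flow, items, n))
		return NULL;
	*len = flow.offset;
	flow.buf = malloc(flow.offset ? flow.offset : 1);
	if (flow.buf == NULL)
		return NULL;
	flow.offset = 0;
	/* Second pass cannot fail if the first one succeeded. */
	if (demo_flow_prepare(&flow, items, n)) {
		free(flow.buf);
		return NULL;
	}
	return flow.buf;
}
```

Running the same walker for both passes keeps the size computation and the
fill logic from drifting apart, at the cost of parsing the rule twice.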
 drivers/net/mlx4/mlx4.c      |  4 ++--
 drivers/net/mlx4/mlx4_flow.c | 30 +++++++++++++++---------------
 drivers/net/mlx4/mlx4_flow.h |  4 ++--
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e095a9f..ed1081b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -253,7 +253,7 @@ mlx4_dev_start(struct rte_eth_dev *dev)
 		     (void *)dev);
 		goto err;
 	}
-	ret = mlx4_priv_flow_start(priv);
+	ret = mlx4_flow_start(priv);
 	if (ret) {
 		ERROR("%p: flow start failed: %s",
 		      (void *)dev, strerror(ret));
@@ -284,7 +284,7 @@ mlx4_dev_stop(struct rte_eth_dev *dev)
 		return;
 	DEBUG("%p: detaching flows from all RX queues", (void *)dev);
 	priv->started = 0;
-	mlx4_priv_flow_stop(priv);
+	mlx4_flow_stop(priv);
 	mlx4_intr_uninstall(priv);
 	mlx4_mac_addr_del(priv);
 }
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 6401a83..5616b83 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -561,7 +561,7 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
 };
 
 /**
- * Validate a flow supported by the NIC.
+ * Make sure a flow rule is supported and initialize associated structure.
  *
  * @param priv
  *   Pointer to private structure.
@@ -580,12 +580,12 @@ static const struct mlx4_flow_items mlx4_flow_items[] = {
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 static int
-priv_flow_validate(struct priv *priv,
-		   const struct rte_flow_attr *attr,
-		   const struct rte_flow_item items[],
-		   const struct rte_flow_action actions[],
-		   struct rte_flow_error *error,
-		   struct mlx4_flow *flow)
+mlx4_flow_prepare(struct priv *priv,
+		  const struct rte_flow_attr *attr,
+		  const struct rte_flow_item items[],
+		  const struct rte_flow_action actions[],
+		  struct rte_flow_error *error,
+		  struct mlx4_flow *flow)
 {
 	const struct mlx4_flow_items *cur_item = mlx4_flow_items;
 	struct mlx4_flow_action action = {
@@ -725,7 +725,7 @@ mlx4_flow_validate(struct rte_eth_dev *dev,
 	struct priv *priv = dev->data->dev_private;
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr) };
 
-	return priv_flow_validate(priv, attr, items, actions, error, &flow);
+	return mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
 }
 
 /**
@@ -817,7 +817,7 @@ mlx4_flow_create_drop_queue(struct priv *priv)
  *   A flow if the rule could be created.
  */
 static struct rte_flow *
-priv_flow_create_action_queue(struct priv *priv,
+mlx4_flow_create_action_queue(struct priv *priv,
 			      struct ibv_flow_attr *ibv_attr,
 			      struct mlx4_flow_action *action,
 			      struct rte_flow_error *error)
@@ -875,7 +875,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 	struct mlx4_flow flow = { .offset = sizeof(struct ibv_flow_attr), };
 	int err;
 
-	err = priv_flow_validate(priv, attr, items, actions, error, &flow);
+	err = mlx4_flow_prepare(priv, attr, items, actions, error, &flow);
 	if (err)
 		return NULL;
 	flow.ibv_attr = rte_malloc(__func__, flow.offset, 0);
@@ -894,8 +894,8 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 		.port = priv->port,
 		.flags = 0,
 	};
-	claim_zero(priv_flow_validate(priv, attr, items, actions,
-				      error, &flow));
+	claim_zero(mlx4_flow_prepare(priv, attr, items, actions,
+				     error, &flow));
 	action = (struct mlx4_flow_action){
 		.queue = 0,
 		.drop = 0,
@@ -917,7 +917,7 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 			goto exit;
 		}
 	}
-	rte_flow = priv_flow_create_action_queue(priv, flow.ibv_attr,
+	rte_flow = mlx4_flow_create_action_queue(priv, flow.ibv_attr,
 						 &action, error);
 	if (rte_flow) {
 		LIST_INSERT_HEAD(&priv->flows, rte_flow, next);
@@ -1015,7 +1015,7 @@ mlx4_flow_flush(struct rte_eth_dev *dev,
  *   Pointer to private structure.
  */
 void
-mlx4_priv_flow_stop(struct priv *priv)
+mlx4_flow_stop(struct priv *priv)
 {
 	struct rte_flow *flow;
 
@@ -1039,7 +1039,7 @@ mlx4_priv_flow_stop(struct priv *priv)
  *   0 on success, a errno value otherwise and rte_errno is set.
  */
 int
-mlx4_priv_flow_start(struct priv *priv)
+mlx4_flow_start(struct priv *priv)
 {
 	int ret;
 	struct ibv_qp *qp;
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index 8bd659c..a24ae31 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -74,8 +74,8 @@ struct mlx4_flow_action {
 
 /* mlx4_flow.c */
 
-int mlx4_priv_flow_start(struct priv *priv);
-void mlx4_priv_flow_stop(struct priv *priv);
+int mlx4_flow_start(struct priv *priv);
+void mlx4_flow_stop(struct priv *priv);
 int mlx4_filter_ctrl(struct rte_eth_dev *dev,
 		     enum rte_filter_type filter_type,
 		     enum rte_filter_op filter_op,
-- 
2.1.4

* [PATCH v2 48/51] net/mlx4: separate memory management functions
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (46 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 47/51] net/mlx4: rename private functions in flow API Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 49/51] net/mlx4: clean up includes and comments Adrien Mazarguil
                     ` (4 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

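Move mempool checking and memory region registration (mlx4_mp2mr() and its
helpers) from mlx4.c to a new file, mlx4_mr.c.

No impact on functionality.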

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
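Illustration only, not part of the patch: the helpers moved here accept a
mempool only when its memory chunks form a single virtually contiguous
range, growing that range from either end as chunks are visited. A minimal
stand-alone sketch of the same check over a plain array, assuming
hypothetical names (demo_chunk stands in for struct rte_mempool_memhdr and
demo_check_contiguous() replaces the iterator callback):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk descriptor (address and length only). */
struct demo_chunk {
	uintptr_t addr;
	size_t len;
};

/*
 * Merge chunks into a single [start, end) range. Fail as soon as a chunk
 * touches neither end of the range built so far.
 */
static int
demo_check_contiguous(const struct demo_chunk *chunks, size_t n,
		      uintptr_t *start, uintptr_t *end)
{
	size_t i;

	*start = 0;
	*end = 0;
	for (i = 0; i != n; ++i) {
		uintptr_t addr = chunks[i].addr;
		size_t len = chunks[i].len;

		if (i == 0) {
			/* First chunk defines the initial range. */
			*start = addr;
			*end = addr + len;
		} else if (*end == addr) {
			/* Chunk extends the range upward. */
			*end += len;
		} else if (*start == addr + len) {
			/* Chunk extends the range downward. */
			*start -= len;
		} else {
			/* Gap or overlap: not virtually contiguous. */
			return -1;
		}
	}
	return 0;
}
```

As in the driver code, chunks may arrive in any order as long as each one
is adjacent to the range accumulated so far.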
 drivers/net/mlx4/Makefile  |   1 +
 drivers/net/mlx4/mlx4.c    | 115 -------------------------
 drivers/net/mlx4/mlx4.h    |   8 +-
 drivers/net/mlx4/mlx4_mr.c | 183 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 188 insertions(+), 119 deletions(-)

diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 16e5c5a..0515cd7 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -39,6 +39,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_intr.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_mr.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4_txq.c
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ed1081b..e8f7048 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -48,9 +48,7 @@
 #include <rte_dev.h>
 #include <rte_mbuf.h>
 #include <rte_errno.h>
-#include <rte_mempool.h>
 #include <rte_malloc.h>
-#include <rte_memory.h>
 #include <rte_kvargs.h>
 #include <rte_interrupts.h>
 #include <rte_common.h>
@@ -110,119 +108,6 @@ mlx4_dev_configure(struct rte_eth_dev *dev)
 	return 0;
 }
 
-struct mlx4_check_mempool_data {
-	int ret;
-	char *start;
-	char *end;
-};
-
-/* Called by mlx4_check_mempool() when iterating the memory chunks. */
-static void mlx4_check_mempool_cb(struct rte_mempool *mp,
-	void *opaque, struct rte_mempool_memhdr *memhdr,
-	unsigned mem_idx)
-{
-	struct mlx4_check_mempool_data *data = opaque;
-
-	(void)mp;
-	(void)mem_idx;
-	/* It already failed, skip the next chunks. */
-	if (data->ret != 0)
-		return;
-	/* It is the first chunk. */
-	if (data->start == NULL && data->end == NULL) {
-		data->start = memhdr->addr;
-		data->end = data->start + memhdr->len;
-		return;
-	}
-	if (data->end == memhdr->addr) {
-		data->end += memhdr->len;
-		return;
-	}
-	if (data->start == (char *)memhdr->addr + memhdr->len) {
-		data->start -= memhdr->len;
-		return;
-	}
-	/* Error, mempool is not virtually contigous. */
-	data->ret = -1;
-}
-
-/**
- * Check if a mempool can be used: it must be virtually contiguous.
- *
- * @param[in] mp
- *   Pointer to memory pool.
- * @param[out] start
- *   Pointer to the start address of the mempool virtual memory area
- * @param[out] end
- *   Pointer to the end address of the mempool virtual memory area
- *
- * @return
- *   0 on success (mempool is virtually contiguous), -1 on error.
- */
-static int mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start,
-	uintptr_t *end)
-{
-	struct mlx4_check_mempool_data data;
-
-	memset(&data, 0, sizeof(data));
-	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
-	*start = (uintptr_t)data.start;
-	*end = (uintptr_t)data.end;
-	return data.ret;
-}
-
-/**
- * Register mempool as a memory region.
- *
- * @param pd
- *   Pointer to protection domain.
- * @param mp
- *   Pointer to memory pool.
- *
- * @return
- *   Memory region pointer, NULL in case of error and rte_errno is set.
- */
-struct ibv_mr *
-mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
-{
-	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	uintptr_t start;
-	uintptr_t end;
-	unsigned int i;
-	struct ibv_mr *mr;
-
-	if (mlx4_check_mempool(mp, &start, &end) != 0) {
-		rte_errno = EINVAL;
-		ERROR("mempool %p: not virtually contiguous",
-			(void *)mp);
-		return NULL;
-	}
-	DEBUG("mempool %p area start=%p end=%p size=%zu",
-	      (void *)mp, (void *)start, (void *)end,
-	      (size_t)(end - start));
-	/* Round start and end to page boundary if found in memory segments. */
-	for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
-		uintptr_t addr = (uintptr_t)ms[i].addr;
-		size_t len = ms[i].len;
-		unsigned int align = ms[i].hugepage_sz;
-
-		if ((start > addr) && (start < addr + len))
-			start = RTE_ALIGN_FLOOR(start, align);
-		if ((end > addr) && (end < addr + len))
-			end = RTE_ALIGN_CEIL(end, align);
-	}
-	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
-	      (void *)mp, (void *)start, (void *)end,
-	      (size_t)(end - start));
-	mr = ibv_reg_mr(pd,
-			(void *)start,
-			end - start,
-			IBV_ACCESS_LOCAL_WRITE);
-	if (!mr)
-		rte_errno = errno ? errno : EINVAL;
-	return mr;
-}
-
 /**
  * DPDK callback to start the device.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index b5f2953..94b5f1e 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -115,10 +115,6 @@ struct priv {
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
 };
 
-/* mlx4.c */
-
-struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
-
 /* mlx4_ethdev.c */
 
 int mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE]);
@@ -144,4 +140,8 @@ int mlx4_intr_install(struct priv *priv);
 int mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
 
+/* mlx4_mr.c */
+
+struct ibv_mr *mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp);
+
 #endif /* RTE_PMD_MLX4_H_ */
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
new file mode 100644
index 0000000..9700884
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -0,0 +1,183 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ * Memory management functions for mlx4 driver.
+ */
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#include "mlx4_utils.h"
+
+struct mlx4_check_mempool_data {
+	int ret;
+	char *start;
+	char *end;
+};
+
+/**
+ * Called by mlx4_check_mempool() when iterating the memory chunks.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool (unused).
+ * @param[in, out] opaque
+ *   Pointer to buffer shared with mlx4_check_mempool().
+ * @param[in] memhdr
+ *   Pointer to mempool chunk header.
+ * @param mem_idx
+ *   Mempool chunk index (unused).
+ */
+static void
+mlx4_check_mempool_cb(struct rte_mempool *mp, void *opaque,
+		      struct rte_mempool_memhdr *memhdr,
+		      unsigned int mem_idx)
+{
+	struct mlx4_check_mempool_data *data = opaque;
+
+	(void)mp;
+	(void)mem_idx;
+	/* It already failed, skip the next chunks. */
+	if (data->ret != 0)
+		return;
+	/* It is the first chunk. */
+	if (data->start == NULL && data->end == NULL) {
+		data->start = memhdr->addr;
+		data->end = data->start + memhdr->len;
+		return;
+	}
+	if (data->end == memhdr->addr) {
+		data->end += memhdr->len;
+		return;
+	}
+	if (data->start == (char *)memhdr->addr + memhdr->len) {
+		data->start -= memhdr->len;
+		return;
+	}
+	/* Error, mempool is not virtually contiguous. */
+	data->ret = -1;
+}
+
+/**
+ * Check if a mempool can be used: it must be virtually contiguous.
+ *
+ * @param[in] mp
+ *   Pointer to memory pool.
+ * @param[out] start
+ *   Pointer to the start address of the mempool virtual memory area.
+ * @param[out] end
+ *   Pointer to the end address of the mempool virtual memory area.
+ *
+ * @return
+ *   0 on success (mempool is virtually contiguous), -1 on error.
+ */
+static int
+mlx4_check_mempool(struct rte_mempool *mp, uintptr_t *start, uintptr_t *end)
+{
+	struct mlx4_check_mempool_data data;
+
+	memset(&data, 0, sizeof(data));
+	rte_mempool_mem_iter(mp, mlx4_check_mempool_cb, &data);
+	*start = (uintptr_t)data.start;
+	*end = (uintptr_t)data.end;
+	return data.ret;
+}
+
+/**
+ * Register mempool as a memory region.
+ *
+ * @param pd
+ *   Pointer to protection domain.
+ * @param mp
+ *   Pointer to memory pool.
+ *
+ * @return
+ *   Memory region pointer, NULL in case of error and rte_errno is set.
+ */
+struct ibv_mr *
+mlx4_mp2mr(struct ibv_pd *pd, struct rte_mempool *mp)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	uintptr_t start;
+	uintptr_t end;
+	unsigned int i;
+	struct ibv_mr *mr;
+
+	if (mlx4_check_mempool(mp, &start, &end) != 0) {
+		rte_errno = EINVAL;
+		ERROR("mempool %p: not virtually contiguous",
+			(void *)mp);
+		return NULL;
+	}
+	DEBUG("mempool %p area start=%p end=%p size=%zu",
+	      (void *)mp, (void *)start, (void *)end,
+	      (size_t)(end - start));
+	/* Round start and end to page boundary if found in memory segments. */
+	for (i = 0; (i < RTE_MAX_MEMSEG) && (ms[i].addr != NULL); ++i) {
+		uintptr_t addr = (uintptr_t)ms[i].addr;
+		size_t len = ms[i].len;
+		unsigned int align = ms[i].hugepage_sz;
+
+		if ((start > addr) && (start < addr + len))
+			start = RTE_ALIGN_FLOOR(start, align);
+		if ((end > addr) && (end < addr + len))
+			end = RTE_ALIGN_CEIL(end, align);
+	}
+	DEBUG("mempool %p using start=%p end=%p size=%zu for MR",
+	      (void *)mp, (void *)start, (void *)end,
+	      (size_t)(end - start));
+	mr = ibv_reg_mr(pd,
+			(void *)start,
+			end - start,
+			IBV_ACCESS_LOCAL_WRITE);
+	if (!mr)
+		rte_errno = errno ? errno : EINVAL;
+	return mr;
+}
-- 
2.1.4
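
Note on the rounding step in mlx4_mp2mr() above: before registering the
memory region, start and end are widened to page boundaries whenever they
fall strictly inside a known memory segment. A minimal stand-alone sketch
of that step, assuming power-of-two page sizes and hypothetical names
(demo_seg stands in for struct rte_memseg; the mask arithmetic is what
RTE_ALIGN_FLOOR() and RTE_ALIGN_CEIL() are assumed to reduce to for
power-of-two alignments):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory segment descriptor. */
struct demo_seg {
	uintptr_t addr; /* Segment start address. */
	size_t len; /* Segment length in bytes. */
	size_t page_sz; /* Page size, assumed to be a power of two. */
};

/* Widen [start, end) to page boundaries of the segments containing them. */
static void
demo_round_to_pages(const struct demo_seg *segs, size_t n,
		    uintptr_t *start, uintptr_t *end)
{
	size_t i;

	for (i = 0; i != n; ++i) {
		uintptr_t addr = segs[i].addr;
		uintptr_t lim = addr + segs[i].len;
		uintptr_t mask = (uintptr_t)segs[i].page_sz - 1;

		/* Round start down when strictly inside this segment. */
		if (*start > addr && *start < lim)
			*start &= ~mask;
		/* Round end up when strictly inside this segment. */
		if (*end > addr && *end < lim)
			*end = (*end + mask) & ~mask;
	}
}
```

Addresses that lie outside every segment, or exactly on a segment's first
byte, are left untouched, which mirrors the strict comparisons in the
driver loop.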

* [PATCH v2 49/51] net/mlx4: clean up includes and comments
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (47 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 48/51] net/mlx4: separate memory management functions Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 50/51] net/mlx4: remove isolated mode constraint Adrien Mazarguil
                     ` (3 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

Add missing includes and sort them, then update/remove comments around them
for consistency.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
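Illustration only, not part of the patch: the comment unified here guards
the Verbs include because those headers use constructs that -pedantic
rejects, unnamed structs/unions according to the comment being replaced.
A minimal sketch of the same guard around a hypothetical declaration
(demo_wc is made up and merely imitates such a construct; PEDANTIC is
assumed to be defined by the build system when -pedantic is in use):

```c
#include <assert.h>

/* Silence -Wpedantic for the declarations that cannot comply with it. */
#ifdef PEDANTIC
#pragma GCC diagnostic ignored "-Wpedantic"
#endif
/* Hypothetical structure relying on an unnamed union. */
struct demo_wc {
	int opcode;
	union {
		unsigned int imm_data;
		unsigned int invalidated_rkey;
	}; /* Unnamed member: diagnosed by -pedantic in C99 mode. */
};
/* Restore strict checking for the rest of the file. */
#ifdef PEDANTIC
#pragma GCC diagnostic error "-Wpedantic"
#endif

/* Members of the unnamed union are accessed as if they were direct. */
static unsigned int
demo_wc_imm(const struct demo_wc *wc)
{
	return wc->imm_data;
}
```

Restoring the diagnostic as an error right after the include keeps the
relaxation scoped to third-party declarations only.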
 drivers/net/mlx4/mlx4.c      | 40 ++++++++++++++++++++++++---------------
 drivers/net/mlx4/mlx4.h      |  3 +--
 drivers/net/mlx4/mlx4_flow.c |  5 +++++
 drivers/net/mlx4/mlx4_flow.h |  3 +--
 4 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index e8f7048..317d0e6 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -31,29 +31,41 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/* System headers. */
+/**
+ * @file
+ * mlx4 driver initialization.
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <inttypes.h>
 #include <stddef.h>
+#include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
-#include <stdint.h>
-#include <inttypes.h>
 #include <string.h>
-#include <errno.h>
 #include <unistd.h>
-#include <assert.h>
 
-#include <rte_ether.h>
-#include <rte_ethdev.h>
-#include <rte_ethdev_pci.h>
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+#include <rte_common.h>
 #include <rte_dev.h>
-#include <rte_mbuf.h>
 #include <rte_errno.h>
-#include <rte_malloc.h>
-#include <rte_kvargs.h>
+#include <rte_ethdev.h>
+#include <rte_ethdev_pci.h>
+#include <rte_ether.h>
 #include <rte_interrupts.h>
-#include <rte_common.h>
+#include <rte_kvargs.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
 
-/* PMD headers. */
 #include "mlx4.h"
 #include "mlx4_flow.h"
 #include "mlx4_rxtx.h"
@@ -73,8 +85,6 @@ const char *pmd_mlx4_init_params[] = {
 	NULL,
 };
 
-/* Device configuration. */
-
 /**
  * DPDK callback for Ethernet device configuration.
  *
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 94b5f1e..1cd4db3 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -37,8 +37,7 @@
 #include <net/if.h>
 #include <stdint.h>
 
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+/* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index 5616b83..e2798f6 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -31,6 +31,11 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+/**
+ * @file
+ * Flow API operations for mlx4 driver.
+ */
+
 #include <arpa/inet.h>
 #include <assert.h>
 #include <errno.h>
diff --git a/drivers/net/mlx4/mlx4_flow.h b/drivers/net/mlx4/mlx4_flow.h
index a24ae31..fbb775d 100644
--- a/drivers/net/mlx4/mlx4_flow.h
+++ b/drivers/net/mlx4/mlx4_flow.h
@@ -37,8 +37,7 @@
 #include <stdint.h>
 #include <sys/queue.h>
 
-/* Verbs header. */
-/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+/* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
-- 
2.1.4


* [PATCH v2 50/51] net/mlx4: remove isolated mode constraint
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (48 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 49/51] net/mlx4: clean up includes and comments Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01  8:07   ` [PATCH v2 51/51] net/mlx4: rely on ethdev for Tx/Rx queue arrays Adrien Mazarguil
                     ` (2 subsequent siblings)
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

Considering the remaining functionality, the only difference between
isolated and non-isolated mode is that a default MAC flow rule is present
with the latter.

The restriction on enabling isolated mode before creating any queues can
therefore be lifted.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4_flow.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)
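
The toggle logic introduced by the diff below can be modeled outside of DPDK
as a rough sketch. All fake_* names are hypothetical stand-ins: the real
helpers are mlx4_mac_addr_add() and mlx4_mac_addr_del(), error reporting
through rte_flow_error_set() is left out, and fail_mac_add only exists to
simulate a failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the PMD private structure. */
struct fake_priv {
	unsigned int isolated:1; /* Isolated mode is enabled. */
	unsigned int mac_rule:1; /* Default MAC flow rule is installed. */
	int fail_mac_add; /* Knob simulating a rule creation failure. */
};

/* Stand-in for mlx4_mac_addr_del(): remove the default MAC flow rule. */
static void
fake_mac_addr_del(struct fake_priv *priv)
{
	priv->mac_rule = 0;
}

/* Stand-in for mlx4_mac_addr_add(): 0 on success, negative errno otherwise. */
static int
fake_mac_addr_add(struct fake_priv *priv)
{
	if (priv->isolated)
		return 0;
	if (priv->fail_mac_add)
		return -ENOMEM;
	priv->mac_rule = 1;
	return 0;
}

/* Toggle isolated mode at any time, rolling back if leaving it fails. */
static int
fake_flow_isolate(struct fake_priv *priv, int enable)
{
	int ret;

	if (!!enable == !!priv->isolated)
		return 0; /* Nothing to do. */
	priv->isolated = !!enable;
	if (enable) {
		fake_mac_addr_del(priv);
		return 0;
	}
	ret = fake_mac_addr_add(priv);
	if (ret < 0)
		priv->isolated = 1; /* Remain in isolated mode. */
	return ret;
}
```

In this model, entering isolated mode cannot fail since it only removes a
rule, while leaving it may fail and then keeps the previous state.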

diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index e2798f6..e177545 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -935,20 +935,10 @@ mlx4_flow_create(struct rte_eth_dev *dev,
 }
 
 /**
- * @see rte_flow_isolate()
- *
- * Must be done before calling dev_configure().
+ * Configure isolated mode.
  *
- * @param dev
- *   Pointer to the ethernet device structure.
- * @param enable
- *   Nonzero to enter isolated mode, attempt to leave it otherwise.
- * @param[out] error
- *   Perform verbose error reporting if not NULL. PMDs initialize this
- *   structure in case of error only.
- *
- * @return
- *   0 on success, a negative value on error.
+ * @see rte_flow_isolate()
+ * @see rte_flow_ops
  */
 static int
 mlx4_flow_isolate(struct rte_eth_dev *dev,
@@ -957,14 +947,17 @@ mlx4_flow_isolate(struct rte_eth_dev *dev,
 {
 	struct priv *priv = dev->data->dev_private;
 
-	if (priv->rxqs) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				   NULL, "isolated mode must be set"
-				   " before configuring the device");
-		return -rte_errno;
-	}
+	if (!!enable == !!priv->isolated)
+		return 0;
 	priv->isolated = !!enable;
+	if (enable) {
+		mlx4_mac_addr_del(priv);
+	} else if (mlx4_mac_addr_add(priv) < 0) {
+		priv->isolated = 1;
+		return -rte_flow_error_set(error, rte_errno,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL, "cannot leave isolated mode");
+	}
 	return 0;
 }
 
-- 
2.1.4


* [PATCH v2 51/51] net/mlx4: rely on ethdev for Tx/Rx queue arrays
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (49 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 50/51] net/mlx4: remove isolated mode constraint Adrien Mazarguil
@ 2017-09-01  8:07   ` Adrien Mazarguil
  2017-09-01 11:24   ` [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD Ferruh Yigit
  2017-09-05  9:59   ` Ferruh Yigit
  52 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01  8:07 UTC (permalink / raw)
  To: dev

Allocation and management of Tx/Rx queue arrays are done by wrappers at the
ethdev level. The resulting information is copied to the private structure
while configuring the device, where it is managed separately by the PMD.

This is redundant and consumes space in the private structure.

Relying more on ethdev also means there is no longer any need to protect the
PMD against burst function calls while closing the device.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx4/mlx4.c        | 58 ++++---------------------------------
 drivers/net/mlx4/mlx4.h        |  5 ----
 drivers/net/mlx4/mlx4_ethdev.c | 41 ++++++++++++--------------
 drivers/net/mlx4/mlx4_flow.c   |  5 ++--
 drivers/net/mlx4/mlx4_intr.c   | 10 +++----
 drivers/net/mlx4/mlx4_rxq.c    | 20 ++++++-------
 drivers/net/mlx4/mlx4_txq.c    | 16 +++++-----
 7 files changed, 48 insertions(+), 107 deletions(-)
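
As a rough model of the approach taken below, queue pointers can be read
straight from the array owned by the ethdev layer instead of a copy kept in
the private structure. The fake_* types are hypothetical and reduced to the
fields needed here; in the PMD the array is dev->data->rx_queues and its size
is dev->data->nb_rx_queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue counters; idx identifies the queue. */
struct fake_rxq_stats {
	unsigned int idx;
	uint64_t ipackets;
};

/* Hypothetical Rx queue reduced to its counters. */
struct fake_rxq {
	struct fake_rxq_stats stats;
};

/* Hypothetical subset of the device data owned by the ethdev layer. */
struct fake_dev_data {
	void **rx_queues; /* Entries are NULL until a queue is set up. */
	uint16_t nb_rx_queues; /* Number of entries in rx_queues. */
};

/* Sum packet counters over all configured queues. */
static uint64_t
fake_rx_packets(const struct fake_dev_data *data)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i != data->nb_rx_queues; ++i) {
		const struct fake_rxq *rxq = data->rx_queues[i];

		if (rxq != NULL)
			total += rxq->stats.ipackets;
	}
	return total;
}

/* Clear all counters while preserving each queue index. */
static void
fake_stats_reset(struct fake_dev_data *data)
{
	unsigned int i;

	for (i = 0; i != data->nb_rx_queues; ++i) {
		struct fake_rxq *rxq = data->rx_queues[i];

		if (rxq != NULL)
			rxq->stats = (struct fake_rxq_stats){
				.idx = rxq->stats.idx,
			};
	}
}
```

Since there is a single array and a single size, they cannot get out of sync
the way a private copy could.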

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 317d0e6..b084903 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -44,7 +44,6 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#include <unistd.h>
 
 /* Verbs headers do not support -pedantic. */
 #ifdef PEDANTIC
@@ -88,8 +87,6 @@ const char *pmd_mlx4_init_params[] = {
 /**
  * DPDK callback for Ethernet device configuration.
  *
- * Prepare the driver for a given number of TX and RX queues.
- *
  * @param dev
  *   Pointer to Ethernet device structure.
  *
@@ -99,22 +96,7 @@ const char *pmd_mlx4_init_params[] = {
 static int
 mlx4_dev_configure(struct rte_eth_dev *dev)
 {
-	struct priv *priv = dev->data->dev_private;
-	unsigned int rxqs_n = dev->data->nb_rx_queues;
-	unsigned int txqs_n = dev->data->nb_tx_queues;
-
-	priv->rxqs = (void *)dev->data->rx_queues;
-	priv->txqs = (void *)dev->data->tx_queues;
-	if (txqs_n != priv->txqs_n) {
-		INFO("%p: TX queues number update: %u -> %u",
-		     (void *)dev, priv->txqs_n, txqs_n);
-		priv->txqs_n = txqs_n;
-	}
-	if (rxqs_n != priv->rxqs_n) {
-		INFO("%p: Rx queues number update: %u -> %u",
-		     (void *)dev, priv->rxqs_n, rxqs_n);
-		priv->rxqs_n = rxqs_n;
-	}
+	(void)dev;
 	return 0;
 }
 
@@ -196,7 +178,6 @@ static void
 mlx4_dev_close(struct rte_eth_dev *dev)
 {
 	struct priv *priv = dev->data->dev_private;
-	void *tmp;
 	unsigned int i;
 
 	if (priv == NULL)
@@ -205,41 +186,12 @@ mlx4_dev_close(struct rte_eth_dev *dev)
 	      (void *)dev,
 	      ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
 	mlx4_mac_addr_del(priv);
-	/*
-	 * Prevent crashes when queues are still in use. This is unfortunately
-	 * still required for DPDK 1.3 because some programs (such as testpmd)
-	 * never release them before closing the device.
-	 */
 	dev->rx_pkt_burst = mlx4_rx_burst_removed;
 	dev->tx_pkt_burst = mlx4_tx_burst_removed;
-	if (priv->rxqs != NULL) {
-		/* XXX race condition if mlx4_rx_burst() is still running. */
-		usleep(1000);
-		for (i = 0; (i != priv->rxqs_n); ++i) {
-			tmp = (*priv->rxqs)[i];
-			if (tmp == NULL)
-				continue;
-			(*priv->rxqs)[i] = NULL;
-			mlx4_rxq_cleanup(tmp);
-			rte_free(tmp);
-		}
-		priv->rxqs_n = 0;
-		priv->rxqs = NULL;
-	}
-	if (priv->txqs != NULL) {
-		/* XXX race condition if mlx4_tx_burst() is still running. */
-		usleep(1000);
-		for (i = 0; (i != priv->txqs_n); ++i) {
-			tmp = (*priv->txqs)[i];
-			if (tmp == NULL)
-				continue;
-			(*priv->txqs)[i] = NULL;
-			mlx4_txq_cleanup(tmp);
-			rte_free(tmp);
-		}
-		priv->txqs_n = 0;
-		priv->txqs = NULL;
-	}
+	for (i = 0; i != dev->data->nb_rx_queues; ++i)
+		mlx4_rx_queue_release(dev->data->rx_queues[i]);
+	for (i = 0; i != dev->data->nb_tx_queues; ++i)
+		mlx4_tx_queue_release(dev->data->tx_queues[i]);
 	if (priv->pd != NULL) {
 		assert(priv->ctx != NULL);
 		claim_zero(ibv_dealloc_pd(priv->pd));
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 1cd4db3..93e5502 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -104,11 +104,6 @@ struct priv {
 	unsigned int vf:1; /* This is a VF device. */
 	unsigned int intr_alarm:1; /* An interrupt alarm is scheduled. */
 	unsigned int isolated:1; /* Toggle isolated mode. */
-	/* RX/TX queues. */
-	unsigned int rxqs_n; /* RX queues array size. */
-	unsigned int txqs_n; /* TX queues array size. */
-	struct rxq *(*rxqs)[]; /* RX queues. */
-	struct txq *(*txqs)[]; /* TX queues. */
 	struct rte_intr_handle intr_handle; /* Port interrupt handle. */
 	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
 	LIST_HEAD(mlx4_flows, rte_flow) flows;
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 5f1dba2..a9e8059 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -574,17 +574,14 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 void
 mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
-	struct priv *priv = dev->data->dev_private;
 	struct rte_eth_stats tmp;
 	unsigned int i;
 	unsigned int idx;
 
-	if (priv == NULL)
-		return;
 	memset(&tmp, 0, sizeof(tmp));
 	/* Add software counters. */
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
+	for (i = 0; i != dev->data->nb_rx_queues; ++i) {
+		struct rxq *rxq = dev->data->rx_queues[i];
 
 		if (rxq == NULL)
 			continue;
@@ -600,8 +597,8 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 		tmp.ierrors += rxq->stats.idropped;
 		tmp.rx_nombuf += rxq->stats.rx_nombuf;
 	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		struct txq *txq = (*priv->txqs)[i];
+	for (i = 0; i != dev->data->nb_tx_queues; ++i) {
+		struct txq *txq = dev->data->tx_queues[i];
 
 		if (txq == NULL)
 			continue;
@@ -627,25 +624,23 @@ mlx4_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 void
 mlx4_stats_reset(struct rte_eth_dev *dev)
 {
-	struct priv *priv = dev->data->dev_private;
 	unsigned int i;
-	unsigned int idx;
 
-	if (priv == NULL)
-		return;
-	for (i = 0; (i != priv->rxqs_n); ++i) {
-		if ((*priv->rxqs)[i] == NULL)
-			continue;
-		idx = (*priv->rxqs)[i]->stats.idx;
-		(*priv->rxqs)[i]->stats =
-			(struct mlx4_rxq_stats){ .idx = idx };
+	for (i = 0; i != dev->data->nb_rx_queues; ++i) {
+		struct rxq *rxq = dev->data->rx_queues[i];
+
+		if (rxq)
+			rxq->stats = (struct mlx4_rxq_stats){
+				.idx = rxq->stats.idx,
+			};
 	}
-	for (i = 0; (i != priv->txqs_n); ++i) {
-		if ((*priv->txqs)[i] == NULL)
-			continue;
-		idx = (*priv->txqs)[i]->stats.idx;
-		(*priv->txqs)[i]->stats =
-			(struct mlx4_txq_stats){ .idx = idx };
+	for (i = 0; i != dev->data->nb_tx_queues; ++i) {
+		struct txq *txq = dev->data->tx_queues[i];
+
+		if (txq)
+			txq->stats = (struct mlx4_txq_stats){
+				.idx = txq->stats.idx,
+			};
 	}
 }
 
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index e177545..0885a91 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -691,7 +691,8 @@ mlx4_flow_prepare(struct priv *priv,
 				(const struct rte_flow_action_queue *)
 				actions->conf;
 
-			if (!queue || (queue->index > (priv->rxqs_n - 1)))
+			if (!queue || (queue->index >
+				       (priv->dev->data->nb_rx_queues - 1)))
 				goto exit_action_not_supported;
 			action.queue = 1;
 		} else {
@@ -841,7 +842,7 @@ mlx4_flow_create_action_queue(struct priv *priv,
 	if (action->drop) {
 		qp = priv->flow_drop_queue ? priv->flow_drop_queue->qp : NULL;
 	} else {
-		struct rxq *rxq = (*priv->rxqs)[action->queue_id];
+		struct rxq *rxq = priv->dev->data->rx_queues[action->queue_id];
 
 		qp = rxq->qp;
 		rte_flow->qp = qp;
diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index 76d2e01..e3449ee 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -91,7 +91,7 @@ static int
 mlx4_rx_intr_vec_enable(struct priv *priv)
 {
 	unsigned int i;
-	unsigned int rxqs_n = priv->rxqs_n;
+	unsigned int rxqs_n = priv->dev->data->nb_rx_queues;
 	unsigned int n = RTE_MIN(rxqs_n, (uint32_t)RTE_MAX_RXTX_INTR_VEC_ID);
 	unsigned int count = 0;
 	struct rte_intr_handle *intr_handle = &priv->intr_handle;
@@ -105,7 +105,7 @@ mlx4_rx_intr_vec_enable(struct priv *priv)
 		return -rte_errno;
 	}
 	for (i = 0; i != n; ++i) {
-		struct rxq *rxq = (*priv->rxqs)[i];
+		struct rxq *rxq = priv->dev->data->rx_queues[i];
 
 		/* Skip queues that cannot request interrupts. */
 		if (!rxq || !rxq->channel) {
@@ -324,8 +324,7 @@ mlx4_intr_install(struct priv *priv)
 int
 mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
 {
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
+	struct rxq *rxq = dev->data->rx_queues[idx];
 	struct ibv_cq *ev_cq;
 	void *ev_ctx;
 	int ret;
@@ -361,8 +360,7 @@ mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx)
 int
 mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx)
 {
-	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
+	struct rxq *rxq = dev->data->rx_queues[idx];
 	int ret;
 
 	if (!rxq || !rxq->channel)
diff --git a/drivers/net/mlx4/mlx4_rxq.c b/drivers/net/mlx4/mlx4_rxq.c
index 7f675a4..409983f 100644
--- a/drivers/net/mlx4/mlx4_rxq.c
+++ b/drivers/net/mlx4/mlx4_rxq.c
@@ -403,15 +403,15 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    struct rte_mempool *mp)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct rxq *rxq = (*priv->rxqs)[idx];
+	struct rxq *rxq = dev->data->rx_queues[idx];
 	int ret;
 
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
-	if (idx >= priv->rxqs_n) {
+	if (idx >= dev->data->nb_rx_queues) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->rxqs_n);
+		      (void *)dev, idx, dev->data->nb_rx_queues);
 		return -rte_errno;
 	}
 	if (rxq != NULL) {
@@ -421,7 +421,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = EEXIST;
 			return -rte_errno;
 		}
-		(*priv->rxqs)[idx] = NULL;
+		dev->data->rx_queues[idx] = NULL;
 		if (idx == 0)
 			mlx4_mac_addr_del(priv);
 		mlx4_rxq_cleanup(rxq);
@@ -441,7 +441,7 @@ mlx4_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		rxq->stats.idx = idx;
 		DEBUG("%p: adding Rx queue %p to list",
 		      (void *)dev, (void *)rxq);
-		(*priv->rxqs)[idx] = rxq;
+		dev->data->rx_queues[idx] = rxq;
 		/* Update receive callback. */
 		dev->rx_pkt_burst = mlx4_rx_burst;
 	}
@@ -464,11 +464,11 @@ mlx4_rx_queue_release(void *dpdk_rxq)
 	if (rxq == NULL)
 		return;
 	priv = rxq->priv;
-	for (i = 0; (i != priv->rxqs_n); ++i)
-		if ((*priv->rxqs)[i] == rxq) {
+	for (i = 0; i != priv->dev->data->nb_rx_queues; ++i)
+		if (priv->dev->data->rx_queues[i] == rxq) {
 			DEBUG("%p: removing Rx queue %p from list",
 			      (void *)priv->dev, (void *)rxq);
-			(*priv->rxqs)[i] = NULL;
+			priv->dev->data->rx_queues[i] = NULL;
 			if (i == 0)
 				mlx4_mac_addr_del(priv);
 			break;
@@ -522,8 +522,8 @@ mlx4_mac_addr_add(struct priv *priv)
 		return 0;
 	if (priv->isolated)
 		return 0;
-	if (*priv->rxqs && (*priv->rxqs)[0])
-		rxq = (*priv->rxqs)[0];
+	if (priv->dev->data->rx_queues && priv->dev->data->rx_queues[0])
+		rxq = priv->dev->data->rx_queues[0];
 	else
 		return 0;
 
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index 945833b..e0245b0 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -401,15 +401,15 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct txq *txq = (*priv->txqs)[idx];
+	struct txq *txq = dev->data->tx_queues[idx];
 	int ret;
 
 	DEBUG("%p: configuring queue %u for %u descriptors",
 	      (void *)dev, idx, desc);
-	if (idx >= priv->txqs_n) {
+	if (idx >= dev->data->nb_tx_queues) {
 		rte_errno = EOVERFLOW;
 		ERROR("%p: queue index out of range (%u >= %u)",
-		      (void *)dev, idx, priv->txqs_n);
+		      (void *)dev, idx, dev->data->nb_tx_queues);
 		return -rte_errno;
 	}
 	if (txq != NULL) {
@@ -419,7 +419,7 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 			rte_errno = EEXIST;
 			return -rte_errno;
 		}
-		(*priv->txqs)[idx] = NULL;
+		dev->data->tx_queues[idx] = NULL;
 		mlx4_txq_cleanup(txq);
 	} else {
 		txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
@@ -437,7 +437,7 @@ mlx4_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		txq->stats.idx = idx;
 		DEBUG("%p: adding Tx queue %p to list",
 		      (void *)dev, (void *)txq);
-		(*priv->txqs)[idx] = txq;
+		dev->data->tx_queues[idx] = txq;
 		/* Update send callback. */
 		dev->tx_pkt_burst = mlx4_tx_burst;
 	}
@@ -460,11 +460,11 @@ mlx4_tx_queue_release(void *dpdk_txq)
 	if (txq == NULL)
 		return;
 	priv = txq->priv;
-	for (i = 0; (i != priv->txqs_n); ++i)
-		if ((*priv->txqs)[i] == txq) {
+	for (i = 0; i != priv->dev->data->nb_tx_queues; ++i)
+		if (priv->dev->data->tx_queues[i] == txq) {
 			DEBUG("%p: removing Tx queue %p from list",
 			      (void *)priv->dev, (void *)txq);
-			(*priv->txqs)[i] = NULL;
+			priv->dev->data->tx_queues[i] = NULL;
 			break;
 		}
 	mlx4_txq_cleanup(txq);
-- 
2.1.4


* Re: [PATCH v1 04/48] net/mlx4: remove useless compilation checks
  2017-08-18 13:39   ` Ferruh Yigit
@ 2017-09-01 10:19     ` Adrien Mazarguil
  0 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01 10:19 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Hi Ferruh,

On Fri, Aug 18, 2017 at 02:39:18PM +0100, Ferruh Yigit wrote:
> On 8/1/2017 5:53 PM, Adrien Mazarguil wrote:
> > Verbs support for RSS, inline receive and extended device query calls has
> > not been optional for a while. Their absence is untested and is therefore
> > unsupported.
> > 
> > Remove the related compilation checks and assume Mellanox OFED is up to
> > date, as described in the documentation.
> 
> So this requires Mellanox OFED 4.1 to be present,

Well, the PMD most likely works with versions older than that, but they are
not officially supported anymore. These changes assert that no effort is
made to maintain compatibility.

> is there a check for the OFED version, or do you think one is required?

I think it's not necessary. The existing checks haven't been validated for a
very long time, and mlx4 probably does not even compile anymore without these
features. Even if it does, it might be unable to perform TX/RX at all.
Getting a compilation failure at least makes things clear.
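
To illustrate the reasoning, here is a minimal sketch of a hard requirement
as opposed to an optional code path. FAKE_HAVE_EXT_QUERY and the fake_*
functions are hypothetical names made up for this example; they do not exist
in the PMD nor in Verbs.

```c
#include <assert.h>

/*
 * Hypothetical feature macro. In a real build it would be provided by
 * library headers or by the build system, not defined locally.
 */
#define FAKE_HAVE_EXT_QUERY 1

/* Hard requirement: a missing feature stops the build immediately. */
#ifndef FAKE_HAVE_EXT_QUERY
#error "extended device query support is required"
#endif

/* Stand-in for a call that only exists when the feature is present. */
static int
fake_ext_query(void)
{
	return 42;
}

/* Single code path, no untested #ifdef fallback to maintain. */
static int
fake_probe(void)
{
	return fake_ext_query();
}
```

Without the feature, the build stops on the #error line (or on the missing
symbol if there is no such check) instead of producing a binary with an
unvalidated fallback path.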

> 
> > 
> > Use this opportunity to remove a few useless data path debugging messages
> > behind compilation checks on never defined macros.
> > 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> <...>
> 

-- 
Adrien Mazarguil
6WIND


* Re: [PATCH v2 03/51] net/mlx4: check max number of ports dynamically
  2017-09-01  8:06   ` [PATCH v2 03/51] net/mlx4: check max number of ports dynamically Adrien Mazarguil
@ 2017-09-01 10:57     ` Legacy, Allain
  0 siblings, 0 replies; 110+ messages in thread
From: Legacy, Allain @ 2017-09-01 10:57 UTC (permalink / raw)
  To: Adrien Mazarguil, dev; +Cc: Gaëtan Rivet


> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Friday, September 01, 2017 4:06 AM
> To: dev@dpdk.org
> Cc: Gaëtan Rivet; Legacy, Allain
> Subject: [PATCH v2 03/51] net/mlx4: check max number of ports dynamically
> 
> Use maximum number reported by hardware capabilities as replacement for
> the static check on MLX4_PMD_MAX_PHYS_PORTS.
> 
> Cc: Gaëtan Rivet <gaetan.rivet@6wind.com>
> Cc: Allain Legacy <allain.legacy@windriver.com>
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Acked-by: Allain Legacy <allain.legacy@windriver.com>


* Re: [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (50 preceding siblings ...)
  2017-09-01  8:07   ` [PATCH v2 51/51] net/mlx4: rely on ethdev for Tx/Rx queue arrays Adrien Mazarguil
@ 2017-09-01 11:24   ` Ferruh Yigit
  2017-09-01 11:56     ` Adrien Mazarguil
  2017-09-05  9:59   ` Ferruh Yigit
  52 siblings, 1 reply; 110+ messages in thread
From: Ferruh Yigit @ 2017-09-01 11:24 UTC (permalink / raw)
  To: Adrien Mazarguil, dev

Hi Adrien,

On 9/1/2017 9:06 AM, Adrien Mazarguil wrote:
> The main purpose of this large series is to relieve the mlx4 PMD from its
> dependency on Mellanox OFED to instead rely on the standard rdma-core
> package provided by Linux distributions.
> 
> While compatibility with Mellanox OFED is preserved, all nonstandard
> functionality has to be stripped from the PMD in order to re-implement it
> through an approach compatible with rdma-core.
> 
> Due to the amount of changes necessary to achieve this goal, this rework
> starts off by removing extraneous code to simplify the PMD as much as
> possible before either replacing or dismantling functionality that relies on
> nonstandard Verbs.
> 
> What remains after applying this series is single-segment Tx/Rx support,
> without offloads nor RSS, on the default MAC address (which cannot be
> configured). Support for multiple queues and the flow API (minus the RSS
> action) are also preserved.
> 
> Missing functionality that needs substantial work will be restored later by
> subsequent series.

Thanks for the comprehensive rework. Out of curiosity, is restoring the
removed functionality planned for this release?

> 
> Also because the mlx4 PMD is mostly contained in a single very large source
> file of 6400+ lines (mlx4.c) which has become extremely difficult to
> maintain, this rework is used as an opportunity to finally group functions
> into separate files, as in mlx5.
> 
> This rework targets DPDK 17.11.

<...>


* Re: [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD
  2017-09-01 11:24   ` [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD Ferruh Yigit
@ 2017-09-01 11:56     ` Adrien Mazarguil
  0 siblings, 0 replies; 110+ messages in thread
From: Adrien Mazarguil @ 2017-09-01 11:56 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Fri, Sep 01, 2017 at 12:24:40PM +0100, Ferruh Yigit wrote:
> Hi Adrien,
> 
> On 9/1/2017 9:06 AM, Adrien Mazarguil wrote:
> > The main purpose of this large series is to relieve the mlx4 PMD from its
> > dependency on Mellanox OFED to instead rely on the standard rdma-core
> > package provided by Linux distributions.
> > 
> > While compatibility with Mellanox OFED is preserved, all nonstandard
> > functionality has to be stripped from the PMD in order to re-implement it
> > through an approach compatible with rdma-core.
> > 
> > Due to the amount of changes necessary to achieve this goal, this rework
> > starts off by removing extraneous code to simplify the PMD as much as
> > possible before either replacing or dismantling functionality that relies on
> > nonstandard Verbs.
> > 
> > What remains after applying this series is single-segment Tx/Rx support,
> > without offloads nor RSS, on the default MAC address (which cannot be
> > configured). Support for multiple queues and the flow API (minus the RSS
> > action) are also preserved.
> > 
> > Missing functionality that needs substantial work will be restored later by
> > subsequent series.
> 
> Thanks for the comprehensive rework. Out of curiosity, is restoring the
> removed functionality planned for this release?

Yes, for the most part. This includes TX/RX with enhanced performance (several
patches are already on the mailing list but I still need to review them),
scatter/gather, checksum offloads, packet type recognition, RSS,
MAC/VLAN/promisc/allmulti filtering and related configuration.

Actually, only secondary process support might be missing from the next
release. Since we're separately working on a new approach for mlx5, we'll
see how well it performs before considering a similar change for mlx4.

> 
> > 
> > Also because the mlx4 PMD is mostly contained in a single very large source
> > file of 6400+ lines (mlx4.c) which has become extremely difficult to
> > maintain, this rework is used as an opportunity to finally group functions
> > into separate files, as in mlx5.
> > 
> > This rework targets DPDK 17.11.
> 
> <...>
> 

-- 
Adrien Mazarguil
6WIND


* Re: [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD
  2017-09-01  8:06 ` [PATCH v2 00/51] " Adrien Mazarguil
                     ` (51 preceding siblings ...)
  2017-09-01 11:24   ` [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD Ferruh Yigit
@ 2017-09-05  9:59   ` Ferruh Yigit
  52 siblings, 0 replies; 110+ messages in thread
From: Ferruh Yigit @ 2017-09-05  9:59 UTC (permalink / raw)
  To: Adrien Mazarguil, dev

On 9/1/2017 9:06 AM, Adrien Mazarguil wrote:
> The main purpose of this large series is to relieve the mlx4 PMD from its
> dependency on Mellanox OFED to instead rely on the standard rdma-core
> package provided by Linux distributions.
> 
> While compatibility with Mellanox OFED is preserved, all nonstandard
> functionality has to be stripped from the PMD in order to re-implement it
> through an approach compatible with rdma-core.
> 
> Due to the amount of changes necessary to achieve this goal, this rework
> starts off by removing extraneous code to simplify the PMD as much as
> possible before either replacing or dismantling functionality that relies on
> nonstandard Verbs.
> 
> What remains after applying this series is single-segment Tx/Rx support,
> without offloads nor RSS, on the default MAC address (which cannot be
> configured). Support for multiple queues and the flow API (minus the RSS
> action) are also preserved.
> 
> Missing functionality that needs substantial work will be restored later by
> subsequent series.
> 
> Also because the mlx4 PMD is mostly contained in a single very large source
> file of 6400+ lines (mlx4.c) which has become extremely difficult to
> maintain, this rework is used as an opportunity to finally group functions
> into separate files, as in mlx5.
> 
> This rework targets DPDK 17.11.
> 
> Changes since v1:
> 
> - Rebased series on top of the latest upstream fixes.
> 
> - Cleaned up remaining typos and coding style issues.
> 
> - "net/mlx4: check max number of ports dynamically":
>   Removed extra loop and added error message on maximum number of ports
>   according to Allain's suggestion.
> 
> - "net/mlx4: drop scatter/gather support":
>   Additionally removed unnecessary mbuf pool from rxq_alloc_elts().
> 
> - "net/mlx4: simplify Rx buffer handling":
>   New patch removing unnecessary code from the simplified Rx path.
> 
> - "net/mlx4: remove isolated mode constraint":
>   New patch removing needless constraint for isolated mode, which can now
>   be toggled anytime.
> 
> - "net/mlx4: rely on ethdev for Tx/Rx queue arrays":
>   New patch refactoring duplicated information from ethdev.
> 
> Adrien Mazarguil (51):
>   net/mlx4: add consistency to copyright notices
>   net/mlx4: remove limitation on number of instances
>   net/mlx4: check max number of ports dynamically
>   net/mlx4: remove useless compilation checks
>   net/mlx4: remove secondary process support
>   net/mlx4: remove useless code
>   net/mlx4: remove soft counters compilation option
>   net/mlx4: remove scatter mode compilation option
>   net/mlx4: remove Tx inline compilation option
>   net/mlx4: remove allmulti and promisc support
>   net/mlx4: remove VLAN filter support
>   net/mlx4: remove MAC address configuration support
>   net/mlx4: drop MAC flows affecting all Rx queues
>   net/mlx4: revert flow API RSS support
>   net/mlx4: revert RSS parent queue refactoring
>   net/mlx4: drop RSS support
>   net/mlx4: drop checksum offloads support
>   net/mlx4: drop packet type recognition support
>   net/mlx4: drop scatter/gather support
>   net/mlx4: drop inline receive support
>   net/mlx4: use standard QP attributes
>   net/mlx4: revert resource domain support
>   net/mlx4: revert multicast echo prevention
>   net/mlx4: revert fast Verbs interface for Tx
>   net/mlx4: revert fast Verbs interface for Rx
>   net/mlx4: simplify Rx buffer handling
>   net/mlx4: simplify link update function
>   net/mlx4: standardize on negative errno values
>   net/mlx4: clean up coding style inconsistencies
>   net/mlx4: remove control path locks
>   net/mlx4: remove unnecessary wrapper functions
>   net/mlx4: remove mbuf macro definitions
>   net/mlx4: use standard macro to get array size
>   net/mlx4: separate debugging macros
>   net/mlx4: use a single interrupt handle
>   net/mlx4: rename alarm field
>   net/mlx4: refactor interrupt FD settings
>   net/mlx4: clean up interrupt functions prototypes
>   net/mlx4: compact interrupt functions
>   net/mlx4: separate interrupt handling
>   net/mlx4: separate Rx/Tx definitions
>   net/mlx4: separate Rx/Tx functions
>   net/mlx4: separate device control functions
>   net/mlx4: separate Tx configuration functions
>   net/mlx4: separate Rx configuration functions
>   net/mlx4: group flow API handlers in common file
>   net/mlx4: rename private functions in flow API
>   net/mlx4: separate memory management functions
>   net/mlx4: clean up includes and comments
>   net/mlx4: remove isolated mode constraint
>   net/mlx4: rely on ethdev for Tx/Rx queue arrays

Series applied to dpdk-next-net/master, thanks.


Thread overview: 110+ messages
2017-08-01 16:53 [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 01/48] net/mlx4: add consistency to copyright notices Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 02/48] net/mlx4: remove limitation on number of instances Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 03/48] net/mlx4: check max number of ports dynamically Adrien Mazarguil
2017-08-01 17:35   ` Legacy, Allain
2017-08-02  7:52     ` Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 04/48] net/mlx4: remove useless compilation checks Adrien Mazarguil
2017-08-18 13:39   ` Ferruh Yigit
2017-09-01 10:19     ` Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 05/48] net/mlx4: remove secondary process support Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 06/48] net/mlx4: remove useless code Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 07/48] net/mlx4: remove soft counters compilation option Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 08/48] net/mlx4: remove scatter mode compilation option Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 09/48] net/mlx4: remove Tx inline compilation option Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 10/48] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 11/48] net/mlx4: remove VLAN filter support Adrien Mazarguil
2017-08-01 16:53 ` [PATCH v1 12/48] net/mlx4: remove MAC address configuration support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 13/48] net/mlx4: drop MAC flows affecting all Rx queues Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 14/48] net/mlx4: revert flow API RSS support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 15/48] net/mlx4: revert RSS parent queue refactoring Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 16/48] net/mlx4: drop RSS support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 17/48] net/mlx4: drop checksum offloads support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 18/48] net/mlx4: drop packet type recognition support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 19/48] net/mlx4: drop scatter/gather support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 20/48] net/mlx4: drop inline receive support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 21/48] net/mlx4: use standard QP attributes Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 22/48] net/mlx4: revert resource domain support Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 23/48] net/mlx4: revert multicast echo prevention Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 24/48] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 25/48] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 26/48] net/mlx4: simplify link update function Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 27/48] net/mlx4: standardize on negative errno values Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 28/48] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 29/48] net/mlx4: remove control path locks Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 30/48] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 31/48] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 32/48] net/mlx4: use standard macro to get array size Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 33/48] net/mlx4: separate debugging macros Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 34/48] net/mlx4: use a single interrupt handle Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 35/48] net/mlx4: rename alarm field Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 36/48] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 37/48] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 38/48] net/mlx4: compact interrupt functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 39/48] net/mlx4: separate interrupt handling Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 40/48] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 41/48] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 42/48] net/mlx4: separate device control functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 43/48] net/mlx4: separate Tx configuration functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 44/48] net/mlx4: separate Rx configuration functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 45/48] net/mlx4: group flow API handlers in common file Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 46/48] net/mlx4: rename private functions in flow API Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 47/48] net/mlx4: separate memory management functions Adrien Mazarguil
2017-08-01 16:54 ` [PATCH v1 48/48] net/mlx4: clean up includes and comments Adrien Mazarguil
2017-08-18 13:28 ` [PATCH v1 00/48] net/mlx4: trim and refactor entire PMD Ferruh Yigit
2017-09-01  8:06 ` [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 01/51] net/mlx4: add consistency to copyright notices Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 02/51] net/mlx4: remove limitation on number of instances Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 03/51] net/mlx4: check max number of ports dynamically Adrien Mazarguil
2017-09-01 10:57     ` Legacy, Allain
2017-09-01  8:06   ` [PATCH v2 04/51] net/mlx4: remove useless compilation checks Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 05/51] net/mlx4: remove secondary process support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 06/51] net/mlx4: remove useless code Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 07/51] net/mlx4: remove soft counters compilation option Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 08/51] net/mlx4: remove scatter mode compilation option Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 09/51] net/mlx4: remove Tx inline compilation option Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 10/51] net/mlx4: remove allmulti and promisc support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 11/51] net/mlx4: remove VLAN filter support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 12/51] net/mlx4: remove MAC address configuration support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 13/51] net/mlx4: drop MAC flows affecting all Rx queues Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 14/51] net/mlx4: revert flow API RSS support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 15/51] net/mlx4: revert RSS parent queue refactoring Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 16/51] net/mlx4: drop RSS support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 17/51] net/mlx4: drop checksum offloads support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 18/51] net/mlx4: drop packet type recognition support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 19/51] net/mlx4: drop scatter/gather support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 20/51] net/mlx4: drop inline receive support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 21/51] net/mlx4: use standard QP attributes Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 22/51] net/mlx4: revert resource domain support Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 23/51] net/mlx4: revert multicast echo prevention Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 24/51] net/mlx4: revert fast Verbs interface for Tx Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 25/51] net/mlx4: revert fast Verbs interface for Rx Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 26/51] net/mlx4: simplify Rx buffer handling Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 27/51] net/mlx4: simplify link update function Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 28/51] net/mlx4: standardize on negative errno values Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 29/51] net/mlx4: clean up coding style inconsistencies Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 30/51] net/mlx4: remove control path locks Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 31/51] net/mlx4: remove unnecessary wrapper functions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 32/51] net/mlx4: remove mbuf macro definitions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 33/51] net/mlx4: use standard macro to get array size Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 34/51] net/mlx4: separate debugging macros Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 35/51] net/mlx4: use a single interrupt handle Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 36/51] net/mlx4: rename alarm field Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 37/51] net/mlx4: refactor interrupt FD settings Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 38/51] net/mlx4: clean up interrupt functions prototypes Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 39/51] net/mlx4: compact interrupt functions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 40/51] net/mlx4: separate interrupt handling Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 41/51] net/mlx4: separate Rx/Tx definitions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 42/51] net/mlx4: separate Rx/Tx functions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 43/51] net/mlx4: separate device control functions Adrien Mazarguil
2017-09-01  8:06   ` [PATCH v2 44/51] net/mlx4: separate Tx configuration functions Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 45/51] net/mlx4: separate Rx configuration functions Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 46/51] net/mlx4: group flow API handlers in common file Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 47/51] net/mlx4: rename private functions in flow API Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 48/51] net/mlx4: separate memory management functions Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 49/51] net/mlx4: clean up includes and comments Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 50/51] net/mlx4: remove isolated mode constraint Adrien Mazarguil
2017-09-01  8:07   ` [PATCH v2 51/51] net/mlx4: rely on ethdev for Tx/Rx queue arrays Adrien Mazarguil
2017-09-01 11:24   ` [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD Ferruh Yigit
2017-09-01 11:56     ` Adrien Mazarguil
2017-09-05  9:59   ` Ferruh Yigit