From: Gaetan Rivet <gaetan.rivet@6wind.com>
To: dev@dpdk.org
Subject: [PATCH v2 06/13] net/failsafe: add fail-safe PMD
Date: Wed,  8 Mar 2017 16:15:39 +0100
Message-ID: <1d50a6f160bd12c3f24fc9b6a260edb6b03ac394.1488985489.git.gaetan.rivet@6wind.com>
In-Reply-To: <cover.1488550982.git.gaetan.rivet@6wind.com>
In-Reply-To: <cover.1488985489.git.gaetan.rivet@6wind.com>

Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.

This PMD allows applications to benefit from true hot-plugging
support without having to implement it themselves.

It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency to the removal of a device, by
designating a fail-over device that will take on transmitting operations
if the preferred device is removed.

Applications only see a fail-safe instance and need not care about the
underlying activity that ensures their continued operation.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
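
Note for reviewers: the dev(<iface>) syntax documented in fail_safe.rst
relies on balanced parentheses, so that commas inside a sub-device
definition are not mistaken for separators between fail-safe parameters.
The stand-alone sketch below illustrates that idea only. It is not the code
from failsafe_args.c; the names matching_paren() and count_groups() are
hypothetical, and unlike the driver it silently skips empty items.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the ')' matching text[0] == '(',
 * or 0 when text does not start with '(' or is unbalanced.
 */
static size_t
matching_paren(const char *text)
{
	int depth = 0;
	size_t i;

	assert(text != NULL);
	if (text[0] != '(')
		return 0;
	for (i = 0; text[i] != '\0'; i++) {
		if (text[i] == '(')
			depth++;
		else if (text[i] == ')' && --depth == 0)
			return i;
	}
	return 0;
}

/*
 * Count the top-level "name(...)" groups of a parameter string,
 * skipping plain "key=value" items.
 * Return -1 on a dangling parenthesis.
 */
static int
count_groups(const char *params)
{
	size_t a = 0;
	int n = 0;

	assert(params != NULL);
	while (params[a] != '\0') {
		size_t b = a;

		/* Advance to the end of the current item name. */
		while (params[b] != '(' && params[b] != ',' &&
		       params[b] != '\0')
			b++;
		if (params[b] == '(') {
			size_t len = matching_paren(&params[b]);

			if (len == 0)
				return -1;
			n++;
			/* Step past the closing parenthesis. */
			b += len + 1;
		}
		if (params[b] == '\0')
			break;
		a = b + 1;
	}
	return n;
}
```

With the testpmd parameter string from the usage example, count_groups()
finds two groups, and a nested fail-safe definition such as
"dev(net_failsafe1,dev(84:00.0))" counts as a single top-level group.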
 MAINTAINERS                             |   5 +
 config/common_base                      |   5 +
 doc/guides/nics/fail_safe.rst           | 133 +++++++
 doc/guides/nics/features/failsafe.ini   |  24 ++
 doc/guides/nics/index.rst               |   1 +
 drivers/net/Makefile                    |   1 +
 drivers/net/failsafe/Makefile           |  72 ++++
 drivers/net/failsafe/failsafe.c         | 229 +++++++++++
 drivers/net/failsafe/failsafe_args.c    | 347 ++++++++++++++++
 drivers/net/failsafe/failsafe_eal.c     | 318 +++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     | 677 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h | 232 +++++++++++
 drivers/net/failsafe/failsafe_rxtx.c    | 107 +++++
 mk/rte.app.mk                           |   1 +
 14 files changed, 2152 insertions(+)
 create mode 100644 doc/guides/nics/fail_safe.rst
 create mode 100644 doc/guides/nics/features/failsafe.ini
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c
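
Note on the default MAC address: the documentation in this patch describes a
three-step fallback (the mac= argument, then the first probed sub-device, then
a random address). A minimal sketch of that order follows. The types and
names (struct sub, enum sub_state, pick_default_mac()) are simplified,
hypothetical stand-ins rather than the driver's own, and rand() is used only
to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified stand-ins for the driver's private types. */
enum sub_state { SUB_UNDEFINED, SUB_PARSED, SUB_SCANNED, SUB_PROBED };

struct sub {
	enum sub_state state;
	uint8_t mac[MAC_LEN];
};

/* Where the default MAC address came from. */
enum mac_source { MAC_FROM_ARG, MAC_FROM_SUB, MAC_RANDOM };

static enum mac_source
pick_default_mac(uint8_t out[MAC_LEN], const uint8_t *arg_mac,
		const struct sub *subs, size_t nb_subs)
{
	size_t i;

	assert(out != NULL);
	/* 1. An address given on the command line always wins. */
	if (arg_mac != NULL) {
		memcpy(out, arg_mac, MAC_LEN);
		return MAC_FROM_ARG;
	}
	/* 2. Otherwise use the first successfully probed sub-device. */
	for (i = 0; i < nb_subs; i++) {
		if (subs[i].state >= SUB_PROBED) {
			memcpy(out, subs[i].mac, MAC_LEN);
			return MAC_FROM_SUB;
		}
	}
	/* 3. Nothing probed: random, unicast, locally administered. */
	for (i = 0; i < MAC_LEN; i++)
		out[i] = (uint8_t)(rand() & 0xff);
	out[0] &= 0xfe;	/* clear the multicast bit */
	out[0] |= 0x02;	/* set the locally administered bit */
	return MAC_RANDOM;
}
```

Returning the source of the address lets a caller decide whether the address
still has to be pushed to sub-devices that are probed later.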

diff --git a/MAINTAINERS b/MAINTAINERS
index cc3bf98..ab9ed0c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -315,6 +315,11 @@ M: Matej Vido <vido@cesnet.cz>
 F: drivers/net/szedata2/
 F: doc/guides/nics/szedata2.rst
 
+Fail-safe PMD
+M: Gaetan Rivet <gaetan.rivet@6wind.com>
+F: drivers/net/failsafe/
+F: doc/guides/nics/fail_safe.rst
+
 Intel e1000
 M: Wenzhuo Lu <wenzhuo.lu@intel.com>
 F: drivers/net/e1000/
diff --git a/config/common_base b/config/common_base
index 71a4fcb..ae64a5b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -364,6 +364,11 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_NULL=y
 
 #
+# Compile fail-safe PMD
+#
+CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
new file mode 100644
index 0000000..056f85f
--- /dev/null
+++ b/doc/guides/nics/fail_safe.rst
@@ -0,0 +1,133 @@
+..  BSD LICENSE
+    Copyright 2017 6WIND S.A.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Fail-safe poll mode driver library
+==================================
+
+The Fail-safe poll mode driver library (**librte_pmd_failsafe**) is a virtual
+device that allows using any device supporting hotplug (sudden device removal
+and plugging on its bus), without modifying other components relying on such
+device (application, other PMDs).
+
+In addition to the seamless hotplug feature, the Fail-safe PMD offers the
+ability to redirect operations to secondary devices when the primary has been
+removed from the system.
+
+.. note::
+
+   The library is enabled by default. You can enable it or disable it manually
+   by setting the ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` configuration option.
+
+Features
+--------
+
+The Fail-safe PMD only supports a limited set of features. To use a specific
+feature of a device underneath the Fail-safe PMD, that feature must also be
+supported by the Fail-safe PMD itself, otherwise errors will be raised.
+
+Check the feature matrix for the complete set of supported features.
+
+Compilation options
+-------------------
+
+These options can be modified in the ``$RTE_TARGET/build/.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` (default **y**)
+
+  Toggle compiling librte_pmd_failsafe itself.
+
+- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG`` (default **n**)
+
+  Toggle debugging code.
+
+Using the Fail-safe PMD from the EAL command line
+-------------------------------------------------
+
+The Fail-safe PMD can be used like most other DPDK virtual devices, by passing a
+``--vdev`` parameter to the EAL when starting the application. The device name
+must start with the *net_failsafe* prefix, followed by numbers or letters. This
+name must be unique for each device. Each fail-safe instance must have at least one
+sub-device, up to ``RTE_MAX_ETHPORTS-1``.
+
+A sub-device can be any legal DPDK device, including possibly another fail-safe
+instance.
+
+Fail-safe command line parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **dev(<iface>)** parameter
+
+  This parameter allows the user to define a sub-device. The ``<iface>`` part of
+  this parameter must be a valid device definition. It could be the argument
+  provided to a ``-w`` PCI device specification or the argument that would be
+  given to a ``--vdev`` parameter (including a fail-safe).
+  Enclosing the device definition within parentheses here allows using
+  additional sub-device parameters if need be. They will be passed on to the
+  sub-device.
+
+- **mac** parameter [MAC address]
+
+  This parameter allows the user to set a default MAC address to the fail-safe
+  and all of its sub-devices.
+  If no default MAC address is provided, the fail-safe PMD will read the MAC
+  address of the first of its sub-devices to be successfully probed and use it
+  as its default MAC address, trying to set it on all of its other sub-devices.
+  If no sub-device was successfully probed at initialization, then a random MAC
+  address is generated, which will subsequently be applied to all sub-devices
+  once they are probed.
+
+Usage example
+~~~~~~~~~~~~~
+
+This section shows an example of using **testpmd** with a fail-safe PMD.
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd:
+
+   .. code-block:: console
+
+      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \
+         --vdev='net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0,nodeaction=r1:0:CREATE)' -- \
+         -i
+
+Using the Fail-safe PMD from an application
+-------------------------------------------
+
+This driver strives to be as seamless as possible to existing applications, in
+order to offer the hotplug functionality in the easiest way possible.
+
+Care must be taken, however, to respect the **ether** API concerning device
+access, and in particular, to use the ``RTE_ETH_FOREACH_DEV`` macro to iterate
+over Ethernet devices, instead of accessing them directly or writing one's
+own device iterator.
diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
new file mode 100644
index 0000000..3c52823
--- /dev/null
+++ b/doc/guides/nics/features/failsafe.ini
@@ -0,0 +1,24 @@
+;
+; Supported features of the 'fail-safe' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+Queue start/stop     = Y
+MTU update           = Y
+Jumbo frame          = Y
+Promiscuous mode     = Y
+Allmulticast mode    = Y
+Unicast MAC filter   = Y
+Multicast MAC filter = Y
+VLAN filter          = Y
+Packet type parsing  = Y
+Basic stats          = Y
+Stats per queue      = Y
+ARMv7                = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..0ba52e1 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -58,6 +58,7 @@ Network Interface Controller Drivers
     vhost
     vmxnet3
     pcap_ring
+    fail_safe
 
 **Figures**
 
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 40fc333..3c658b4 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -38,6 +38,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += e1000
 DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
new file mode 100644
index 0000000..06199ad
--- /dev/null
+++ b/drivers/net/failsafe/Makefile
@@ -0,0 +1,72 @@
+#   BSD LICENSE
+#
+#   Copyright 2017 6WIND S.A.
+#   Copyright 2017 Mellanox.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of 6WIND S.A. nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name
+LIB = librte_pmd_failsafe.a
+
+# Sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
+
+# No exported include files
+
+# This lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf
+
+ifneq ($(DEBUG),)
+CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG := y
+endif
+
+# Basic CFLAGS:
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=700
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+CFLAGS += -pedantic -DPEDANTIC
+
+ifeq ($(CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG),y)
+CFLAGS += -g -UNDEBUG
+else
+CFLAGS += -O3
+CFLAGS += -DNDEBUG
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
new file mode 100644
index 0000000..cd60193
--- /dev/null
+++ b/drivers/net/failsafe/failsafe.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "failsafe_private.h"
+
+const char pmd_failsafe_driver_name[] = FAILSAFE_DRIVER_NAME;
+static const struct rte_eth_link eth_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_UP,
+	.link_autoneg = ETH_LINK_SPEED_AUTONEG,
+};
+
+static int
+sub_device_create(struct rte_eth_dev *dev,
+		const char *params)
+{
+	uint8_t nb_subs;
+	int ret;
+
+	ret = failsafe_args_count_subdevice(dev, params);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->subs_tail > FAILSAFE_MAX_ETHPORTS) {
+		ERROR("Cannot allocate more than %d ports",
+			FAILSAFE_MAX_ETHPORTS);
+		return -ENOSPC;
+	}
+	nb_subs = PRIV(dev)->subs_tail;
+	PRIV(dev)->subs = rte_zmalloc(NULL,
+			sizeof(struct sub_device) * nb_subs,
+			RTE_CACHE_LINE_SIZE);
+	if (PRIV(dev)->subs == NULL) {
+		ERROR("Could not allocate sub_devices");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void
+sub_device_free(struct rte_eth_dev *dev)
+{
+	rte_free(PRIV(dev)->subs);
+}
+
+static int
+eth_dev_create(const char *name,
+		const unsigned socket_id,
+		const char *params)
+{
+	struct rte_eth_dev *dev;
+	struct ether_addr *mac;
+	struct fs_priv *priv;
+	int ret;
+
+	dev = NULL;
+	priv = NULL;
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Creating fail-safe device on NUMA socket %u",
+			socket_id);
+	dev = rte_eth_dev_allocate(name);
+	if (dev == NULL) {
+		ERROR("Unable to allocate rte_eth_dev");
+		return -1;
+	}
+	priv = rte_zmalloc_socket(name, sizeof(*priv), 0, socket_id);
+	if (priv == NULL) {
+		ERROR("Unable to allocate private data");
+		goto free_dev;
+	}
+	dev->data->dev_private = priv;
+	PRIV(dev)->dev = dev;
+	dev->dev_ops = &failsafe_ops;
+	TAILQ_INIT(&dev->link_intr_cbs);
+	dev->data->dev_flags = 0x0;
+	dev->driver = NULL;
+	dev->data->kdrv = RTE_KDRV_NONE;
+	dev->data->drv_name = pmd_failsafe_driver_name;
+	dev->data->numa_node = socket_id;
+	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
+	dev->data->dev_link = eth_link;
+	PRIV(dev)->nb_mac_addr = 1;
+	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
+	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
+	if (params == NULL) {
+		ERROR("This PMD requires sub-devices, none provided");
+		goto free_priv;
+	}
+	ret = sub_device_create(dev, params);
+	if (ret) {
+		ERROR("Could not allocate sub_devices");
+		goto free_priv;
+	}
+	ret = failsafe_args_parse(dev, params);
+	if (ret)
+		goto free_subs;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		goto free_args;
+	mac = &dev->data->mac_addrs[0];
+	if (!mac_from_arg) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/*
+		 * Use the ether_addr from first probed
+		 * device, either preferred or fallback.
+		 */
+		FOREACH_SUBDEV(sdev, i, dev)
+			if (sdev->state >= DEV_PROBED) {
+				ether_addr_copy(&ETH(sdev)->data->mac_addrs[0],
+						mac);
+				break;
+			}
+		/*
+		 * If no device has been probed and no ether_addr
+		 * has been provided on the command line, use a random
+		 * valid one.
+		 */
+		if (i == priv->subs_tail) {
+			eth_random_addr(
+				&mac->addr_bytes[0]);
+			ret = rte_eth_dev_default_mac_addr_set(
+						dev->data->port_id, mac);
+			if (ret) {
+				ERROR("Failed to set default MAC address");
+				goto free_args;
+			}
+		}
+	}
+	INFO("MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5]);
+	return 0;
+free_args:
+	failsafe_args_free(dev);
+free_subs:
+	sub_device_free(dev);
+free_priv:
+	rte_free(priv);
+free_dev:
+	rte_eth_dev_release_port(dev);
+	return -1;
+}
+
+static int
+rte_eth_free(const char *name)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	dev = rte_eth_dev_allocated(name);
+	if (dev == NULL)
+		return -ENODEV;
+	ret = failsafe_eal_uninit(dev);
+	if (ret)
+		ERROR("Error while uninitializing sub-EAL");
+	failsafe_args_free(dev);
+	sub_device_free(dev);
+	rte_free(PRIV(dev));
+	rte_eth_dev_release_port(dev);
+	return ret;
+}
+
+static int
+rte_pmd_failsafe_probe(const char *name, const char *params)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
+			name);
+	return eth_dev_create(name, rte_socket_id(), params);
+}
+
+static int
+rte_pmd_failsafe_remove(const char *name)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Uninitializing " FAILSAFE_DRIVER_NAME " for %s", name);
+	return rte_eth_free(name);
+}
+
+static struct rte_vdev_driver failsafe_drv = {
+	.probe	= rte_pmd_failsafe_probe,
+	.remove	= rte_pmd_failsafe_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
+RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
+RTE_PMD_REGISTER_PARAM_STRING(net_failsafe, PMD_FAILSAFE_PARAM_STRING);
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
new file mode 100644
index 0000000..faa48f4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -0,0 +1,347 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <string.h>
+#include <errno.h>
+
+#include <rte_devargs.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+
+#include "failsafe_private.h"
+
+#define DEVARGS_MAXLEN 4096
+
+/* Callback used when a new device is found in devargs */
+typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
+		uint8_t head);
+
+int mac_from_arg;
+
+const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_MAC_KVARG,
+	NULL,
+};
+
+/*
+ * input: text.
+ * output: 0: if text[0] != '(',
+ *         0: if there is no corresponding ')'
+ *         n: distance to corresponding ')' otherwise
+ */
+static size_t
+closing_paren(const char *text)
+{
+	int nb_open = 0;
+	size_t i = 0;
+
+	while (text[i] != '\0') {
+		if (text[i] == '(')
+			nb_open++;
+		if (text[i] == ')')
+			nb_open--;
+		if (nb_open == 0)
+			return i;
+		i++;
+	}
+	return 0;
+}
+
+static int
+parse_device(struct sub_device *sdev, char *args)
+{
+	struct rte_devargs *d;
+	size_t b;
+
+	b = 0;
+	d = &sdev->devargs;
+	DEBUG("%s", args);
+	while (args[b] != ',' &&
+	       args[b] != '\0')
+		b++;
+	if (args[b] != '\0') {
+		d->args = strdup(&args[b + 1]);
+		if (d->args == NULL) {
+			ERROR("Not enough memory to store sub-device args");
+			return -ENOMEM;
+		}
+		args[b] = '\0';
+	} else {
+		d->args = strdup("");
+	}
+	if (eal_parse_pci_BDF(args, &d->pci.addr) == 0 ||
+	    eal_parse_pci_DomBDF(args, &d->pci.addr) == 0)
+		d->type = RTE_DEVTYPE_WHITELISTED_PCI;
+	else {
+		d->type = RTE_DEVTYPE_VIRTUAL;
+		snprintf(d->virt.drv_name,
+				sizeof(d->virt.drv_name), "%s", args);
+	}
+	sdev->state = DEV_PARSED;
+	return 0;
+}
+
+static int
+parse_device_param(struct rte_eth_dev *dev, const char *param,
+		uint8_t head)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	char *args = NULL;
+	size_t a, b;
+	int ret;
+
+	priv = PRIV(dev);
+	a = 0;
+	b = 0;
+	ret = 0;
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	a = b;
+	b += closing_paren(&param[b]);
+	if (a == b) {
+		ERROR("Dangling parenthesis");
+		return -EINVAL;
+	}
+	a += 1;
+	args = strndup(&param[a], b - a);
+	if (args == NULL) {
+		ERROR("Not enough memory for parameter parsing");
+		return -ENOMEM;
+	}
+	sdev = &priv->subs[head];
+	if (strncmp(param, "dev", 3) == 0) {
+		ret = parse_device(sdev, args);
+		if (ret)
+			goto free_args;
+	} else {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		ret = -EINVAL;
+	}
+free_args:
+	free(args);
+	return ret;
+}
+
+static int
+parse_sub_devices(parse_cb *cb,
+		struct rte_eth_dev *dev, const char *params)
+{
+	size_t a, b;
+	uint8_t head;
+	int ret;
+
+	a = 0;
+	head = 0;
+	ret = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',') {
+			a = b + 1;
+			continue;
+		}
+		if (params[b] == '(') {
+			size_t start = b;
+
+			b += closing_paren(&params[b]);
+			if (b == start) {
+				ERROR("Dangling parenthesis");
+				return -EINVAL;
+			}
+			ret = (*cb)(dev, &params[a], head);
+			if (ret)
+				return ret;
+			head += 1;
+			b += 1;
+			if (params[b] == '\0')
+				return 0;
+		}
+		a = b + 1;
+	}
+	return 0;
+}
+
+static int
+remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
+{
+	char buffer[DEVARGS_MAXLEN] = {0};
+	size_t a, b;
+	int i;
+
+	a = 0;
+	i = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',' || params[b] == '\0')
+			i += snprintf(&buffer[i], b - a + 1, "%s", &params[a]);
+		if (params[b] == '(') {
+			size_t start = b;
+			b += closing_paren(&params[b]);
+			if (b == start)
+				return -EINVAL;
+			b += 1;
+			if (params[b] == '\0')
+				goto out;
+		}
+		a = b + 1;
+	}
+out:
+	snprintf(params, DEVARGS_MAXLEN, "%s", buffer);
+	return 0;
+}
+
+static int
+get_mac_addr_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	struct ether_addr *ea = out;
+	int ret;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+		&ea->addr_bytes[0], &ea->addr_bytes[1],
+		&ea->addr_bytes[2], &ea->addr_bytes[3],
+		&ea->addr_bytes[4], &ea->addr_bytes[5]);
+	return ret != ETHER_ADDR_LEN;
+}
+
+int
+failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
+{
+	struct fs_priv *priv;
+	char mut_params[DEVARGS_MAXLEN] = "";
+	struct rte_kvargs *kvlist = NULL;
+	unsigned arg_count;
+	size_t n;
+	int ret;
+
+	if (dev == NULL || params == NULL)
+		return -EINVAL;
+	priv = PRIV(dev);
+	ret = 0;
+	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
+	/* default parameters */
+	mac_from_arg = 0;
+	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
+	if (n >= sizeof(mut_params)) {
+		ERROR("Parameter string too long (>=%zu)",
+				sizeof(mut_params));
+		return -ENOMEM;
+	}
+	ret = parse_sub_devices(parse_device_param,
+				dev, params);
+	if (ret < 0)
+		return ret;
+	ret = remove_sub_devices_definition(mut_params);
+	if (ret < 0)
+		return ret;
+	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
+		kvlist = rte_kvargs_parse(mut_params,
+				pmd_failsafe_init_parameters);
+		if (kvlist == NULL) {
+			ERROR("Error parsing parameters, usage:\n"
+				PMD_FAILSAFE_PARAM_STRING);
+			return -1;
+		}
+		/* MAC addr */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_MAC_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_MAC_KVARG,
+					&get_mac_addr_arg,
+					&dev->data->mac_addrs[0]);
+			if (ret < 0)
+				goto free_kvlist;
+			mac_from_arg = 1;
+		}
+	}
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+void
+failsafe_args_free(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		free(sdev->devargs.args);
+		sdev->devargs.args = NULL;
+	}
+}
+
+static int
+count_device(struct rte_eth_dev *dev, const char *param,
+		uint8_t head __rte_unused)
+{
+	size_t b = 0;
+
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	if (strncmp(param, "dev", b) &&
+	    strncmp(param, "exec", b)) {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		return -EINVAL;
+	}
+	PRIV(dev)->subs_tail += 1;
+	return 0;
+}
+
+int
+failsafe_args_count_subdevice(struct rte_eth_dev *dev,
+			const char *params)
+{
+	return parse_sub_devices(count_device,
+				dev, params);
+}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
new file mode 100644
index 0000000..fcee500
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -0,0 +1,318 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev *
+pci_addr_to_eth_dev(struct rte_pci_addr *addr)
+{
+	uint8_t pid;
+
+	if (addr == NULL)
+		return NULL;
+	for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+		struct rte_pci_addr *addr2;
+		struct rte_eth_dev *edev;
+
+		edev = &rte_eth_devices[pid];
+		if (edev->device == NULL ||
+		    edev->device->devargs == NULL)
+			continue;
+		addr2 = &edev->device->devargs->pci.addr;
+		if (rte_eal_compare_pci_addr(addr, addr2) == 0)
+			return edev;
+	}
+	return NULL;
+}
+
+static int
+pci_scan_one(struct sub_device *sdev)
+{
+	struct rte_devargs *da;
+	char dirname[PATH_MAX];
+
+	da = &sdev->devargs;
+	snprintf(dirname, sizeof(dirname),
+		"%s/" PCI_PRI_FMT,
+		pci_get_sysfs_path(),
+		da->pci.addr.domain,
+		da->pci.addr.bus,
+		da->pci.addr.devid,
+		da->pci.addr.function);
+	errno = 0;
+	if (rte_eal_pci_parse_sysfs_entry(&sdev->pci_device,
+		dirname, &da->pci.addr) < 0) {
+		if (errno == ENOENT) {
+			DEBUG("Could not scan requested device " PCI_PRI_FMT,
+				da->pci.addr.domain,
+				da->pci.addr.bus,
+				da->pci.addr.devid,
+				da->pci.addr.function);
+		} else {
+			ERROR("Error while scanning sysfs entry %s",
+					dirname);
+			return -1;
+		}
+	} else {
+		sdev->state = DEV_SCANNED;
+	}
+	return 0;
+}
+
+static int
+pci_scan(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			sdev->device.devargs = da;
+			ret = pci_scan_one(sdev);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+dev_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			PRIV(dev)->current_probed = i;
+			ret = rte_eal_vdev_init(da->virt.drv_name,
+						da->args);
+			if (ret)
+				return ret;
+			sdev->eth_dev =
+				rte_eth_dev_allocated(da->virt.drv_name);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_probe(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_SCANNED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			pdev->device.devargs = da;
+			ret = rte_eal_pci_probe_all_drivers(pdev);
+			/* probing failed */
+			if (ret < 0) {
+				ERROR("Failed to probe requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->eth_dev =
+				pci_addr_to_eth_dev(&pdev->addr);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	ret = pci_scan(dev);
+	if (ret)
+		return ret;
+	ret = pci_probe(dev);
+	if (ret)
+		return ret;
+	ret = dev_init(dev);
+	if (ret)
+		return ret;
+	/*
+	 * We only update TX_SUBDEV if we are not started.
+	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
+	 * preferred port only upon starting it, so that the switch is smoother.
+	 */
+	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			/* Using first probed device */
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+				DEBUG("Switching tx_dev to sub_device %d",
+				      i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static int
+dev_uninit(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			/*
+			 * We are lucky here that no virtual device
+			 * currently creates multiple eth_dev.
+			 */
+			ret = rte_eal_vdev_uninit(da->virt.drv_name);
+			if (ret)
+				return ret;
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_remove(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			ret = rte_eal_pci_detach_all_drivers(pdev);
+			/* detach failed */
+			if (ret < 0) {
+				ERROR("Failed to remove requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_uninit(struct rte_eth_dev *dev)
+{
+	int ret;
+
+	ret = pci_remove(dev);
+	if (ret)
+		return ret;
+	ret = dev_uninit(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
new file mode 100644
index 0000000..470fea4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -0,0 +1,677 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev_info default_infos = {
+	.driver_name = pmd_failsafe_driver_name,
+	/* Max possible number of elements */
+	.max_rx_pktlen = UINT32_MAX,
+	.max_rx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_tx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_mac_addrs = FAILSAFE_MAX_ETHADDR,
+	.max_hash_mac_addrs = UINT32_MAX,
+	.max_vfs = UINT16_MAX,
+	.max_vmdq_pools = UINT16_MAX,
+	.rx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	.tx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	/* Set of understood capabilities */
+	.rx_offload_capa = 0x0,
+	.tx_offload_capa = 0x0,
+	.flow_type_rss_offloads = 0x0,
+};
+
+static int
+fs_dev_configure(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
+		DEBUG("Configuring sub_device %d", i);
+		ret = rte_eth_dev_configure(PORT_ID(sdev),
+					dev->data->nb_rx_queues,
+					dev->data->nb_tx_queues,
+					&dev->data->dev_conf);
+		if (ret) {
+			ERROR("Could not configure sub_device %d", i);
+			return ret;
+		}
+		sdev->state = DEV_ACTIVE;
+	}
+	return 0;
+}
+
+static int
+fs_dev_start(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_ACTIVE)
+			continue;
+		DEBUG("Starting sub_device %d", i);
+		ret = rte_eth_dev_start(PORT_ID(sdev));
+		if (ret)
+			return ret;
+		sdev->state = DEV_STARTED;
+	}
+	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+				DEBUG("Switching tx_dev to sub_device %d", i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static void
+fs_dev_stop(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+		rte_eth_dev_stop(PORT_ID(sdev));
+		sdev->state = DEV_STARTED - 1;
+	}
+}
+
+static int
+fs_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
+		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
+		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void dev_free_queues(struct rte_eth_dev *dev);
+static void
+fs_dev_close(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Closing sub_device %d", i);
+		rte_eth_dev_close(PORT_ID(sdev));
+		sdev->state = DEV_ACTIVE - 1;
+	}
+	dev_free_queues(dev);
+}
+
+static void
+fs_rx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct rxq *rxq;
+
+	if (queue == NULL)
+		return;
+	rxq = queue;
+	dev = rxq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, rx_queue_release)
+			(ETH(sdev)->data->rx_queues[rxq->qid]);
+	rte_free(rxq);
+}
+
+static int
+fs_rx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t rx_queue_id,
+		uint16_t nb_rx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool)
+{
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	uint8_t i;
+	int ret;
+
+	rxq = dev->data->rx_queues[rx_queue_id];
+	if (rxq != NULL) {
+		fs_rx_queue_release(rxq);
+		dev->data->rx_queues[rx_queue_id] = NULL;
+	}
+	rxq = rte_zmalloc(NULL, sizeof(*rxq),
+			  RTE_CACHE_LINE_SIZE);
+	if (rxq == NULL)
+		return -ENOMEM;
+	rxq->qid = rx_queue_id;
+	rxq->socket_id = socket_id;
+	rxq->info.mp = mb_pool;
+	rxq->info.conf = *rx_conf;
+	rxq->info.nb_desc = nb_rx_desc;
+	rxq->priv = PRIV(dev);
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
+				rx_queue_id,
+				nb_rx_desc, socket_id,
+				rx_conf, mb_pool);
+		if (ret) {
+			ERROR("RX queue setup failed for sub_device %d", i);
+			goto free_rxq;
+		}
+	}
+	return 0;
+free_rxq:
+	fs_rx_queue_release(rxq);
+	return ret;
+}
+
+static void
+fs_tx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct txq *txq;
+
+	if (queue == NULL)
+		return;
+	txq = queue;
+	dev = txq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, tx_queue_release)
+			(ETH(sdev)->data->tx_queues[txq->qid]);
+	rte_free(txq);
+}
+
+static int
+fs_tx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t tx_queue_id,
+		uint16_t nb_tx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	uint8_t i;
+	int ret;
+
+	txq = dev->data->tx_queues[tx_queue_id];
+	if (txq != NULL) {
+		fs_tx_queue_release(txq);
+		dev->data->tx_queues[tx_queue_id] = NULL;
+	}
+	txq = rte_zmalloc("ethdev TX queue", sizeof(*txq),
+			  RTE_CACHE_LINE_SIZE);
+	if (txq == NULL)
+		return -ENOMEM;
+	txq->qid = tx_queue_id;
+	txq->socket_id = socket_id;
+	txq->info.conf = *tx_conf;
+	txq->info.nb_desc = nb_tx_desc;
+	txq->priv = PRIV(dev);
+	dev->data->tx_queues[tx_queue_id] = txq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
+				tx_queue_id,
+				nb_tx_desc, socket_id,
+				tx_conf);
+		if (ret) {
+			ERROR("TX queue setup failed for sub_device %d", i);
+			goto free_txq;
+		}
+	}
+	return 0;
+free_txq:
+	fs_tx_queue_release(txq);
+	return ret;
+}
+
+static void
+dev_free_queues(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		fs_rx_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = NULL;
+	}
+	dev->data->nb_rx_queues = 0;
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		fs_tx_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = NULL;
+	}
+	dev->data->nb_tx_queues = 0;
+}
+
+static void
+fs_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_enable(PORT_ID(sdev));
+}
+
+static void
+fs_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_disable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_enable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_disable(PORT_ID(sdev));
+}
+
+static int
+fs_link_update(struct rte_eth_dev *dev,
+		int wait_to_complete)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling link_update on sub_device %d", i);
+		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
+		if (ret && ret != -1) {
+			ERROR("Link update failed for sub_device %d with error %d",
+			      i, ret);
+			return ret;
+		}
+	}
+	if (TX_SUBDEV(dev)) {
+		struct rte_eth_link *l1;
+		struct rte_eth_link *l2;
+
+		l1 = &dev->data->dev_link;
+		l2 = &ETH(TX_SUBDEV(dev))->data->dev_link;
+		if (memcmp(l1, l2, sizeof(*l1))) {
+			*l1 = *l2;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static void
+fs_stats_get(struct rte_eth_dev *dev,
+		struct rte_eth_stats *stats)
+{
+	struct sub_device *sdev;
+	struct rte_eth_stats loc_stats;
+	uint8_t j;
+	size_t i;
+
+	memset(stats, 0, sizeof(*stats));
+	memset(&loc_stats, 0, sizeof(loc_stats));
+	FOREACH_SUBDEV_ST(sdev, j, dev, DEV_PROBED) {
+		rte_eth_stats_get(PORT_ID(sdev), &loc_stats);
+		stats->ipackets += loc_stats.ipackets;
+		stats->opackets += loc_stats.opackets;
+		stats->ibytes += loc_stats.ibytes;
+		stats->obytes += loc_stats.obytes;
+		stats->imissed += loc_stats.imissed;
+		stats->ierrors += loc_stats.ierrors;
+		stats->oerrors += loc_stats.oerrors;
+		stats->rx_nombuf += loc_stats.rx_nombuf;
+		for (i = 0; i < RTE_DIM(stats->q_ipackets); i++)
+			stats->q_ipackets[i] += loc_stats.q_ipackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_opackets); i++)
+			stats->q_opackets[i] += loc_stats.q_opackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_ibytes); i++)
+			stats->q_ibytes[i] += loc_stats.q_ibytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_obytes); i++)
+			stats->q_obytes[i] += loc_stats.q_obytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_errors); i++)
+			stats->q_errors[i] += loc_stats.q_errors[i];
+	}
+}
+
+static void
+fs_stats_reset(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_stats_reset(PORT_ID(sdev));
+}
+
+/**
+ * Fail-safe dev_infos_get rules:
+ *
+ * No sub_device:
+ *   Numerables:
+ *      Use the maximum possible values for any field, so as not
+ *      to impede any further configuration effort.
+ *   Capabilities:
+ *      Limits capabilities to those that are understood by the
+ *      fail-safe PMD. This understanding stems from the fail-safe
+ *      being capable of verifying that the related capability is
+ *      expressed within the device configuration (struct rte_eth_conf).
+ *
+ * At least one probed sub_device:
+ *   Numerables:
+ *      Uses values from the active probed sub_device.
+ *      The rationale here is that if any sub_device is less capable
+ *      (for example concerning the number of queues) than the active
+ *      sub_device, then its subsequent configuration will fail.
+ *      It is impossible to foresee this failure when the failing sub_device
+ *      is supposed to be plugged-in later on, so the configuration process
+ *      is the single point of failure and error reporting.
+ *   Capabilities:
+ *      Uses a bitwise AND of RX capabilities among
+ *      all sub_devices and the default capabilities.
+ *      Uses a bitwise AND of TX capabilities among
+ *      the active probed sub_device and the default capabilities.
+ *
+ */
+static void
+fs_dev_infos_get(struct rte_eth_dev *dev,
+		  struct rte_eth_dev_info *infos)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL) {
+		DEBUG("No probed device, using default infos");
+		rte_memcpy(&PRIV(dev)->infos, &default_infos,
+			   sizeof(default_infos));
+	} else {
+		uint32_t rx_offload_capa;
+
+		rx_offload_capa = default_infos.rx_offload_capa;
+		FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+			rte_eth_dev_info_get(PORT_ID(sdev),
+					&PRIV(dev)->infos);
+			rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
+		}
+		sdev = TX_SUBDEV(dev);
+		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
+		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
+		PRIV(dev)->infos.tx_offload_capa &=
+					default_infos.tx_offload_capa;
+		PRIV(dev)->infos.flow_type_rss_offloads &=
+					default_infos.flow_type_rss_offloads;
+	}
+	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
+}
+
+static const uint32_t *
+fs_dev_supported_ptypes_get(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_eth_dev *edev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return NULL;
+	edev = ETH(sdev);
+	/* ENOTSUP: counts as no supported ptypes */
+	if (SUBOPS(sdev, dev_supported_ptypes_get) == NULL)
+		return NULL;
+	/*
+	 * The API does not permit a clean AND of all ptypes.
+	 * It is also incomplete by design and we do not really care
+	 * to have a best possible value in this context.
+	 * We just return the ptypes of the device of highest
+	 * priority, usually the PREFERRED device.
+	 */
+	return SUBOPS(sdev, dev_supported_ptypes_get)(edev);
+}
+
+static int
+fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
+		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_flow_ctrl_get(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return 0;
+	if (SUBOPS(sdev, flow_ctrl_get) == NULL)
+		return -ENOTSUP;
+	return SUBOPS(sdev, flow_ctrl_get)(ETH(sdev), fc_conf);
+}
+
+static int
+fs_flow_ctrl_set(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
+		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void
+fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	/* No check: already done within the rte_eth_dev_mac_addr_remove
+	 * call for the fail-safe device.
+	 */
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
+				&dev->data->mac_addrs[index]);
+	PRIV(dev)->mac_addr_pool[index] = 0;
+}
+
+static void
+fs_mac_addr_add(struct rte_eth_dev *dev,
+		struct ether_addr *mac_addr,
+		uint32_t index,
+		uint32_t vmdq)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	assert(index < FAILSAFE_MAX_ETHADDR);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
+	if (index >= PRIV(dev)->nb_mac_addr) {
+		DEBUG("Growing mac_addrs array");
+		PRIV(dev)->nb_mac_addr = index + 1;
+	}
+	PRIV(dev)->mac_addr_pool[index] = vmdq;
+}
+
+static void
+fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
+}
+
+const struct eth_dev_ops failsafe_ops = {
+	.dev_configure = fs_dev_configure,
+	.dev_start = fs_dev_start,
+	.dev_stop = fs_dev_stop,
+	.dev_set_link_down = fs_dev_set_link_down,
+	.dev_set_link_up = fs_dev_set_link_up,
+	.dev_close = fs_dev_close,
+	.promiscuous_enable = fs_promiscuous_enable,
+	.promiscuous_disable = fs_promiscuous_disable,
+	.allmulticast_enable = fs_allmulticast_enable,
+	.allmulticast_disable = fs_allmulticast_disable,
+	.link_update = fs_link_update,
+	.stats_get = fs_stats_get,
+	.stats_reset = fs_stats_reset,
+	.dev_infos_get = fs_dev_infos_get,
+	.dev_supported_ptypes_get = fs_dev_supported_ptypes_get,
+	.mtu_set = fs_mtu_set,
+	.vlan_filter_set = fs_vlan_filter_set,
+	.rx_queue_setup = fs_rx_queue_setup,
+	.tx_queue_setup = fs_tx_queue_setup,
+	.rx_queue_release = fs_rx_queue_release,
+	.tx_queue_release = fs_tx_queue_release,
+	.flow_ctrl_get = fs_flow_ctrl_get,
+	.flow_ctrl_set = fs_flow_ctrl_set,
+	.mac_addr_remove = fs_mac_addr_remove,
+	.mac_addr_add = fs_mac_addr_add,
+	.mac_addr_set = fs_mac_addr_set,
+};
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
new file mode 100644
index 0000000..84a4d3a
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -0,0 +1,232 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
+#define _RTE_ETH_FAILSAFE_PRIVATE_H_
+
+#include <rte_dev.h>
+#include <rte_pci.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+
+#define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
+
+#define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PARAM_STRING	\
+	"dev(<ifc>),"			\
+	"mac=mac_addr"			\
+	""
+
+#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+
+#define FAILSAFE_MAX_ETHPORTS (RTE_MAX_ETHPORTS - 1)
+#define FAILSAFE_MAX_ETHADDR 128
+
+/* TYPES */
+
+struct rxq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	/* id of last sub_device polled */
+	uint8_t last_polled;
+	unsigned int socket_id;
+	struct rte_eth_rxq_info info;
+};
+
+struct txq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	unsigned int socket_id;
+	struct rte_eth_txq_info info;
+};
+
+enum dev_state {
+	DEV_UNDEFINED = 0,
+	DEV_PARSED,
+	DEV_SCANNED,
+	DEV_PROBED,
+	DEV_ACTIVE,
+	DEV_STARTED,
+};
+
+struct sub_device {
+	/* Exhaustive DPDK device description */
+	struct rte_devargs devargs;
+	RTE_STD_C11
+	union {
+		struct rte_device device;
+		struct rte_pci_device pci_device;
+	};
+	struct rte_eth_dev *eth_dev;
+	/* Device state machine */
+	enum dev_state state;
+};
+
+struct fs_priv {
+	struct rte_eth_dev *dev;
+	/*
+	 * Set of sub_devices.
+	 * subs[0] is the preferred device
+	 * any other is just another slave
+	 */
+	uint8_t subs_head; /* if head == tail, no subs */
+	uint8_t subs_tail; /* first invalid */
+	uint8_t subs_tx;
+	struct sub_device *subs;
+	uint8_t current_probed;
+	/* current number of mac_addr slots allocated. */
+	uint32_t nb_mac_addr;
+	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
+	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
+	/* current capabilities */
+	struct rte_eth_dev_info infos;
+};
+
+/* RX / TX */
+
+uint16_t failsafe_rx_burst(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
+/* ARGS */
+
+int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
+void failsafe_args_free(struct rte_eth_dev *dev);
+int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+
+/* EAL */
+
+int failsafe_eal_init(struct rte_eth_dev *dev);
+int failsafe_eal_uninit(struct rte_eth_dev *dev);
+
+/* GLOBALS */
+
+extern const char pmd_failsafe_driver_name[];
+extern const struct eth_dev_ops failsafe_ops;
+extern int mac_from_arg;
+
+/* HELPERS */
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PRIV(dev) \
+	((struct fs_priv *)(dev)->data->dev_private)
+
+/* sdev: (struct sub_device *) */
+#define ETH(sdev) \
+	((sdev)->eth_dev)
+
+/* sdev: (struct sub_device *) */
+#define PORT_ID(sdev) \
+	(ETH(sdev)->data->port_id)
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices:
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), index of the current sub_device
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ * state: (enum dev_state), minimum acceptable device state
+ */
+#define FOREACH_SUBDEV_ST(s, i, dev, state)				\
+	for (i = fs_find_next((dev), 0, state);				\
+	     i < PRIV(dev)->subs_tail && (s = &PRIV(dev)->subs[i]);	\
+	     i = fs_find_next((dev), i + 1, state))
+
+/**
+ * Iterator construct over fail-safe sub-devices:
+ * s:   (struct sub_device *), iterator
+ * i:   (uint8_t), index of the current sub_device
+ * dev: (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV(s, i, dev)			\
+	FOREACH_SUBDEV_ST(s, i, dev, DEV_UNDEFINED)
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PREFERRED_SUBDEV(dev) \
+	(&PRIV(dev)->subs[0])
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define TX_SUBDEV(dev)							  \
+	(PRIV(dev)->subs_tx >= PRIV(dev)->subs_tail		   ? NULL \
+	 : (PRIV(dev)->subs[PRIV(dev)->subs_tx].state < DEV_PROBED ? NULL \
+	 : &PRIV(dev)->subs[PRIV(dev)->subs_tx]))
+
+/**
+ * s:   (struct sub_device *)
+ * ops: (struct eth_dev_ops) member
+ */
+#define SUBOPS(s, ops) \
+	(ETH(s)->dev_ops->ops)
+
+#ifndef NDEBUG
+#include <stdio.h>
+#define DEBUG__(m, ...)						\
+	(fprintf(stderr, "%s:%d: %s(): " m "%c",		\
+		 __FILE__, __LINE__, __func__, __VA_ARGS__),	\
+	 (void)0)
+#define DEBUG_(...)				\
+	(errno = ((int []){			\
+		*(volatile int *)&errno,	\
+		(DEBUG__(__VA_ARGS__), 0)	\
+	})[0])
+#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
+#define INFO(...) DEBUG(__VA_ARGS__)
+#define WARN(...) DEBUG(__VA_ARGS__)
+#define ERROR(...) DEBUG(__VA_ARGS__)
+#else
+#define DEBUG(...) ((void)0)
+#define LOG__(level, m, ...) \
+	RTE_LOG(level, PMD, "net_failsafe: " m "%c", __VA_ARGS__)
+#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
+#define INFO(...) LOG_(INFO, __VA_ARGS__)
+#define WARN(...) LOG_(WARNING, "WARNING: " __VA_ARGS__)
+#define ERROR(...) LOG_(ERR, "ERROR: " __VA_ARGS__)
+#endif
+
+/* inlined functions */
+
+static inline uint8_t
+fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
+		enum dev_state min_state)
+{
+	while (sid < PRIV(dev)->subs_tail) {
+		if (PRIV(dev)->subs[sid].state >= min_state)
+			break;
+		sid++;
+	}
+	if (sid >= PRIV(dev)->subs_tail)
+		return PRIV(dev)->subs_tail;
+	return sid;
+}
+
+#endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
new file mode 100644
index 0000000..a45b4e5
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -0,0 +1,107 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include "failsafe_private.h"
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_rx_burst(void *queue,
+		  struct rte_mbuf **rx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		if (unlikely(ETH(sdev) == NULL))
+			continue;
+		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
+			continue;
+		if (unlikely(sdev->state != DEV_STARTED))
+			continue;
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_tx_burst(void *queue,
+		  struct rte_mbuf **tx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	if (unlikely(sdev == NULL))
+		return 0;
+	if (unlikely(ETH(sdev) == NULL))
+		return 0;
+	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+		return 0;
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 92f3635..2032625 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -112,6 +112,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)      += -lrte_pmd_e1000
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)        += -lrte_pmd_ena
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)       += -lrte_pmd_enic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)      += -lrte_pmd_fm10k
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE)   += -lrte_pmd_failsafe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)       += -lrte_pmd_i40e
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)      += -lrte_pmd_ixgbe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
-- 
2.1.4
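
For readers following the Rx path in failsafe_rxtx.c above, the
round-robin polling done by failsafe_rx_burst() can be modelled outside
DPDK roughly as follows. This is only a sketch under stated assumptions:
every model_* name is a hypothetical stand-in invented for illustration,
not one of the driver's types, and the "pending" counter replaces the
real sub-device rx_pkt_burst() call. What it keeps from the patch is the
iteration order: polling starts at the sub-device after last_polled,
wraps from subs_tail back to subs_head, skips sub-devices that are not
started, and stops at the first one that returns packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PMD structures; not the DPDK types. */
enum model_state { MODEL_STOPPED, MODEL_STARTED };

struct model_sub {
	enum model_state state;
	uint16_t pending; /* packets waiting on this sub-device */
};

struct model_rxq {
	struct model_sub *subs;
	uint8_t subs_head;   /* first valid sub-device index */
	uint8_t subs_tail;   /* one past the last valid index */
	uint8_t last_polled; /* index that last returned packets */
};

/* Take up to nb_pkts from one sub-device; returns the count taken. */
static uint16_t
model_sub_rx(struct model_sub *sub, uint16_t nb_pkts)
{
	uint16_t n = sub->pending < nb_pkts ? sub->pending : nb_pkts;

	sub->pending = (uint16_t)(sub->pending - n);
	return n;
}

/* Same loop shape as failsafe_rx_burst(), minus the mbuf handling. */
static uint16_t
model_rx_burst(struct model_rxq *rxq, uint16_t nb_pkts)
{
	uint8_t nb_subs, nb_polled, i;
	uint16_t nb_rx;

	/* Assumed preconditions of this model, not checked by the patch. */
	assert(rxq->subs_head < rxq->subs_tail);
	assert(rxq->last_polled >= rxq->subs_head);
	assert(rxq->last_polled < rxq->subs_tail);

	nb_subs = (uint8_t)(rxq->subs_tail - rxq->subs_head);
	i = rxq->last_polled;
	for (nb_polled = 0; nb_polled < nb_subs; nb_polled++) {
		i++;
		if (i == rxq->subs_tail)
			i = rxq->subs_head; /* wrap around */
		if (rxq->subs[i].state != MODEL_STARTED)
			continue; /* skip sub-devices that cannot receive */
		nb_rx = model_sub_rx(&rxq->subs[i], nb_pkts);
		if (nb_rx) {
			rxq->last_polled = i;
			return nb_rx;
		}
	}
	return 0;
}
```

One consequence of this ordering is that the sub-device that most
recently returned packets is polled last on the next call, so a busy
sub-device does not starve the others.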

Thread overview: 49+ messages
2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
2017-03-03 17:33   ` Stephen Hemminger
2017-03-03 15:40 ` [PATCH 02/12] ethdev: add flow API rule copy function Gaetan Rivet
2017-03-03 15:40 ` [PATCH 03/12] ethdev: add deferred intermediate device state Gaetan Rivet
2017-03-03 17:34   ` Stephen Hemminger
2017-03-03 15:40 ` [PATCH 04/12] pci: expose device detach routine Gaetan Rivet
2017-03-03 15:40 ` [PATCH 05/12] pci: expose parse and probe routines Gaetan Rivet
2017-03-03 15:40 ` [PATCH 06/12] net/failsafe: add fail-safe PMD Gaetan Rivet
2017-03-03 17:38   ` Stephen Hemminger
2017-03-06 14:19     ` Gaëtan Rivet
2017-03-03 15:40 ` [PATCH 07/12] net/failsafe: add plug-in support Gaetan Rivet
2017-03-03 15:40 ` [PATCH 08/12] net/failsafe: add flexible device definition Gaetan Rivet
2017-03-03 15:40 ` [PATCH 09/12] net/failsafe: support flow API Gaetan Rivet
2017-03-03 15:40 ` [PATCH 10/12] net/failsafe: support offload capabilities Gaetan Rivet
2017-03-03 15:40 ` [PATCH 11/12] net/failsafe: add fast burst functions Gaetan Rivet
2017-03-03 15:40 ` [PATCH 12/12] net/failsafe: support device removal Gaetan Rivet
2017-03-03 16:14 ` [PATCH 00/12] introduce fail-safe PMD Bruce Richardson
2017-03-06 13:53   ` Gaëtan Rivet
2017-03-03 17:27 ` Stephen Hemminger
2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 01/13] ethdev: save VLAN filter setting Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 02/13] ethdev: add flow API rule copy function Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 03/13] ethdev: add deferred intermediate device state Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 04/13] pci: expose device detach routine Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 05/13] pci: expose parse and probe routines Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 06/13] net/failsafe: add fail-safe PMD Gaetan Rivet [this message]
2017-03-08 15:15   ` [PATCH v2 07/13] net/failsafe: add plug-in support Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 08/13] net/failsafe: add flexible device definition Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 09/13] net/failsafe: support flow API Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 10/13] net/failsafe: support offload capabilities Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 11/13] net/failsafe: add fast burst functions Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 12/13] net/failsafe: support device removal Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 13/13] net/failsafe: support link status change event Gaetan Rivet
2017-03-08 16:54   ` [PATCH v2 00/13] introduce fail-safe PMD Neil Horman
2017-03-09  9:15     ` Bruce Richardson
2017-03-10  9:13       ` Gaëtan Rivet
2017-03-10 22:43         ` Neil Horman
2017-03-14 14:49           ` Gaëtan Rivet
2017-03-15  3:28             ` Bruce Richardson
2017-03-15 11:15               ` Thomas Monjalon
2017-03-15 14:25                 ` Gaëtan Rivet
2017-03-16 20:50                   ` Neil Horman
2017-03-17 10:56                     ` Gaëtan Rivet
2017-03-18 19:51                       ` Neil Horman
2017-03-20 15:00   ` Thomas Monjalon
2017-05-17 12:50     ` Ferruh Yigit
2017-05-17 16:59       ` Gaëtan Rivet
2017-03-23 13:01   ` Ferruh Yigit
