* [PATCH 00/12] introduce fail-safe PMD
@ 2017-03-03 15:40 Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
                   ` (14 more replies)
  0 siblings, 15 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, Adrien Mazarguil, Jingjing Wu

This PMD intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes those slaves transparently when they come back,
so that existing applications do not need to be modified to benefit from true
hot-plugging support.

The stacked PMD approach shares many similarities with the bonding PMD but
with a different purpose. While bonding provides the ability to group
several links into a single logical device for enhanced throughput and
supports fail-over at link level, this one manages the sudden disappearance
of the underlying device; it guarantees applications face a valid device in
working order at all times.

Each fail-safe instance is configured to run atop one or several
devices, with one defined as the preferred device. Hot-plug events are
handled on all of them, and Tx is always directed to the preferred device
if present or to the next available failover device (Rx is always performed
on all devices for simplicity).

Moreover, the configured slaves (preferred or failover) do not need to be
present at initialization time and may appear later.

The slaves' configuration is continuously synchronized with that of the virtual
device, which exposes their common set of capabilities to the application.
Failure to apply the current configuration state to a slave for any reason
simply reschedules its initialization.
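
As a usage illustration (not taken from this cover letter): a minimal sketch
of instantiating the fail-safe PMD through EAL devargs. The dev(...)
sub-device syntax follows the parser added in failsafe_args.c; the vdev name
net_failsafe0 and the sub-devices 84:00.0 and net_ring0 are placeholders.

#include <rte_eal.h>

int
main(void)
{
	char *eal_argv[] = {
		"app", "--vdev",
		"net_failsafe0,dev(84:00.0),dev(net_ring0)",
	};

	/* The resulting fail-safe port is used like any other ethdev port. */
	if (rte_eal_init(3, eal_argv) < 0)
		return -1;
	return 0;
}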

This series depends on the series
[PATCH 0/4] clarify eth_dev state management
[PATCH 0/5] add device removal event

Gaetan Rivet (12):
  ethdev: save VLAN filter setting
  ethdev: add flow API rule copy function
  ethdev: add deferred intermediate device state
  pci: expose device detach routine
  pci: expose parse and probe routines
  net/failsafe: add fail-safe PMD
  net/failsafe: add plug-in support
  net/failsafe: add flexible device definition
  net/failsafe: support flow API
  net/failsafe: support offload capabilities
  net/failsafe: add fast burst functions
  net/failsafe: support device removal

 MAINTAINERS                                     |   4 +
 app/test-pmd/config.c                           | 263 ++------
 app/test-pmd/testpmd.h                          |   5 +-
 config/common_linuxapp                          |   1 +
 drivers/net/Makefile                            |   1 +
 drivers/net/failsafe/Makefile                   |  73 +++
 drivers/net/failsafe/failsafe.c                 | 279 ++++++++
 drivers/net/failsafe/failsafe_args.c            | 503 ++++++++++++++
 drivers/net/failsafe/failsafe_eal.c             | 300 +++++++++
 drivers/net/failsafe/failsafe_ether.c           | 368 +++++++++++
 drivers/net/failsafe/failsafe_flow.c            | 230 +++++++
 drivers/net/failsafe/failsafe_ops.c             | 838 ++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h         | 326 +++++++++
 drivers/net/failsafe/failsafe_rxtx.c            | 184 ++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
 lib/librte_eal/common/eal_common_pci.c          |  16 +-
 lib/librte_eal/common/include/rte_pci.h         |  40 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  39 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   7 +
 lib/librte_ether/rte_ethdev.c                   |  22 +-
 lib/librte_ether/rte_ethdev.h                   |  11 +
 lib/librte_ether/rte_flow.c                     | 283 ++++++++
 lib/librte_ether/rte_flow.h                     |  59 ++
 mk/rte.app.mk                                   |   1 +
 24 files changed, 3611 insertions(+), 248 deletions(-)
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ether.c
 create mode 100644 drivers/net/failsafe/failsafe_flow.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c

-- 
2.1.4

* [PATCH 01/12] ethdev: save VLAN filter setting
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 17:33   ` Stephen Hemminger
  2017-03-03 15:40 ` [PATCH 02/12] ethdev: add flow API rule copy function Gaetan Rivet
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Other configuration items (e.g. MAC addresses) are stored within
rte_eth_dev_data, but this one is not.
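
For reference, a small sketch (not part of this patch) of how the stored
bitmap can be queried, given the layout introduced below (64 * 64 bits, one
bit per VLAN ID):

#include <stdint.h>
#include <rte_ethdev.h>

/* Return 1 if vlan_id was enabled through rte_eth_dev_vlan_filter(). */
static inline int
vlan_filter_is_on(const struct rte_vlan_filter_conf *vfc, uint16_t vlan_id)
{
	int vidx = vlan_id / 64;
	int vbit = vlan_id % 64;

	return !!(vfc->ids[vidx] & (UINT64_C(1) << vbit));
}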

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_ether/rte_ethdev.c | 19 ++++++++++++++++++-
 lib/librte_ether/rte_ethdev.h | 10 ++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6c4b796..61a63b7 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1764,6 +1764,7 @@ int
 rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
 {
 	struct rte_eth_dev *dev;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
 	dev = &rte_eth_devices[port_id];
@@ -1779,7 +1780,23 @@ rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
 	}
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vlan_filter_set, -ENOTSUP);
 
-	return (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
+	ret = (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
+	if (ret == 0) {
+		struct rte_vlan_filter_conf *vfc;
+		int vidx;
+		int vbit;
+
+		vfc = &dev->data->vlan_filter_conf;
+		vidx = vlan_id / 64;
+		vbit = vlan_id % 64;
+
+		if (on)
+			vfc->ids[vidx] |= UINT64_C(1) << vbit;
+		else
+			vfc->ids[vidx] &= ~(UINT64_C(1) << vbit);
+	}
+
+	return ret;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ed99660..f01d140 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -375,6 +375,14 @@ enum rte_vlan_type {
 };
 
 /**
+ * A structure used to describe a VLAN filter.
+ * If the bit corresponding to a VID is set, that VID is enabled.
+ */
+struct rte_vlan_filter_conf {
+	uint64_t ids[64];
+};
+
+/**
  * A structure used to configure the Receive Side Scaling (RSS) feature
  * of an Ethernet port.
  * If not NULL, the *rss_key* pointer of the *rss_conf* structure points
@@ -1716,6 +1724,8 @@ struct rte_eth_dev_data {
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough */
 	int numa_node;  /**< NUMA node connection */
 	const char *drv_name;   /**< Driver name */
+	struct rte_vlan_filter_conf vlan_filter_conf;
+	/**< VLAN filter configuration. */
 };
 
 /** Device supports hotplug detach */
-- 
2.1.4

* [PATCH 02/12] ethdev: add flow API rule copy function
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 03/12] ethdev: add deferred intermediate device state Gaetan Rivet
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Move this helper out of the testpmd application to offer it alongside the
flow API.

This allows PMDs and applications to save flow rules in their generic
format for later processing. This is useful when rules cannot be applied
immediately, such as when the device is not properly initialized.
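
A minimal sketch of the intended use, relying only on the rte_flow_copy()
prototype added in this patch (the helper name save_rule is illustrative):

#include <stdlib.h>
#include <rte_flow.h>

static struct rte_flow_desc *
save_rule(const struct rte_flow_attr *attr,
	  const struct rte_flow_item *pattern,
	  const struct rte_flow_action *actions)
{
	/* A NULL allocator defaults to calloc(); release the copy with free(). */
	return rte_flow_copy(attr, pattern, actions, NULL);
}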

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 app/test-pmd/config.c       | 263 +++++++---------------------------------
 app/test-pmd/testpmd.h      |   5 +-
 lib/librte_ether/rte_flow.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow.h |  59 +++++++++
 4 files changed, 387 insertions(+), 223 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 54e38c4..f20e258 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -932,208 +932,6 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 	printf("Set MTU failed. diag=%d\n", diag);
 }
 
-/* Generic flow management functions. */
-
-/** Generate flow_item[] entry. */
-#define MK_FLOW_ITEM(t, s) \
-	[RTE_FLOW_ITEM_TYPE_ ## t] = { \
-		.name = # t, \
-		.size = s, \
-	}
-
-/** Information about known flow pattern items. */
-static const struct {
-	const char *name;
-	size_t size;
-} flow_item[] = {
-	MK_FLOW_ITEM(END, 0),
-	MK_FLOW_ITEM(VOID, 0),
-	MK_FLOW_ITEM(INVERT, 0),
-	MK_FLOW_ITEM(ANY, sizeof(struct rte_flow_item_any)),
-	MK_FLOW_ITEM(PF, 0),
-	MK_FLOW_ITEM(VF, sizeof(struct rte_flow_item_vf)),
-	MK_FLOW_ITEM(PORT, sizeof(struct rte_flow_item_port)),
-	MK_FLOW_ITEM(RAW, sizeof(struct rte_flow_item_raw)), /* +pattern[] */
-	MK_FLOW_ITEM(ETH, sizeof(struct rte_flow_item_eth)),
-	MK_FLOW_ITEM(VLAN, sizeof(struct rte_flow_item_vlan)),
-	MK_FLOW_ITEM(IPV4, sizeof(struct rte_flow_item_ipv4)),
-	MK_FLOW_ITEM(IPV6, sizeof(struct rte_flow_item_ipv6)),
-	MK_FLOW_ITEM(ICMP, sizeof(struct rte_flow_item_icmp)),
-	MK_FLOW_ITEM(UDP, sizeof(struct rte_flow_item_udp)),
-	MK_FLOW_ITEM(TCP, sizeof(struct rte_flow_item_tcp)),
-	MK_FLOW_ITEM(SCTP, sizeof(struct rte_flow_item_sctp)),
-	MK_FLOW_ITEM(VXLAN, sizeof(struct rte_flow_item_vxlan)),
-};
-
-/** Compute storage space needed by item specification. */
-static void
-flow_item_spec_size(const struct rte_flow_item *item,
-		    size_t *size, size_t *pad)
-{
-	if (!item->spec)
-		goto empty;
-	switch (item->type) {
-		union {
-			const struct rte_flow_item_raw *raw;
-		} spec;
-
-	case RTE_FLOW_ITEM_TYPE_RAW:
-		spec.raw = item->spec;
-		*size = offsetof(struct rte_flow_item_raw, pattern) +
-			spec.raw->length * sizeof(*spec.raw->pattern);
-		break;
-	default:
-empty:
-		*size = 0;
-		break;
-	}
-	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
-}
-
-/** Generate flow_action[] entry. */
-#define MK_FLOW_ACTION(t, s) \
-	[RTE_FLOW_ACTION_TYPE_ ## t] = { \
-		.name = # t, \
-		.size = s, \
-	}
-
-/** Information about known flow actions. */
-static const struct {
-	const char *name;
-	size_t size;
-} flow_action[] = {
-	MK_FLOW_ACTION(END, 0),
-	MK_FLOW_ACTION(VOID, 0),
-	MK_FLOW_ACTION(PASSTHRU, 0),
-	MK_FLOW_ACTION(MARK, sizeof(struct rte_flow_action_mark)),
-	MK_FLOW_ACTION(FLAG, 0),
-	MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
-	MK_FLOW_ACTION(DROP, 0),
-	MK_FLOW_ACTION(COUNT, 0),
-	MK_FLOW_ACTION(DUP, sizeof(struct rte_flow_action_dup)),
-	MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)), /* +queue[] */
-	MK_FLOW_ACTION(PF, 0),
-	MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
-};
-
-/** Compute storage space needed by action configuration. */
-static void
-flow_action_conf_size(const struct rte_flow_action *action,
-		      size_t *size, size_t *pad)
-{
-	if (!action->conf)
-		goto empty;
-	switch (action->type) {
-		union {
-			const struct rte_flow_action_rss *rss;
-		} conf;
-
-	case RTE_FLOW_ACTION_TYPE_RSS:
-		conf.rss = action->conf;
-		*size = offsetof(struct rte_flow_action_rss, queue) +
-			conf.rss->num * sizeof(*conf.rss->queue);
-		break;
-	default:
-empty:
-		*size = 0;
-		break;
-	}
-	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
-}
-
-/** Generate a port_flow entry from attributes/pattern/actions. */
-static struct port_flow *
-port_flow_new(const struct rte_flow_attr *attr,
-	      const struct rte_flow_item *pattern,
-	      const struct rte_flow_action *actions)
-{
-	const struct rte_flow_item *item;
-	const struct rte_flow_action *action;
-	struct port_flow *pf = NULL;
-	size_t tmp;
-	size_t pad;
-	size_t off1 = 0;
-	size_t off2 = 0;
-	int err = ENOTSUP;
-
-store:
-	item = pattern;
-	if (pf)
-		pf->pattern = (void *)&pf->data[off1];
-	do {
-		struct rte_flow_item *dst = NULL;
-
-		if ((unsigned int)item->type >= RTE_DIM(flow_item) ||
-		    !flow_item[item->type].name)
-			goto notsup;
-		if (pf)
-			dst = memcpy(pf->data + off1, item, sizeof(*item));
-		off1 += sizeof(*item);
-		flow_item_spec_size(item, &tmp, &pad);
-		if (item->spec) {
-			if (pf)
-				dst->spec = memcpy(pf->data + off2,
-						   item->spec, tmp);
-			off2 += tmp + pad;
-		}
-		if (item->last) {
-			if (pf)
-				dst->last = memcpy(pf->data + off2,
-						   item->last, tmp);
-			off2 += tmp + pad;
-		}
-		if (item->mask) {
-			if (pf)
-				dst->mask = memcpy(pf->data + off2,
-						   item->mask, tmp);
-			off2 += tmp + pad;
-		}
-		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
-	} while ((item++)->type != RTE_FLOW_ITEM_TYPE_END);
-	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
-	action = actions;
-	if (pf)
-		pf->actions = (void *)&pf->data[off1];
-	do {
-		struct rte_flow_action *dst = NULL;
-
-		if ((unsigned int)action->type >= RTE_DIM(flow_action) ||
-		    !flow_action[action->type].name)
-			goto notsup;
-		if (pf)
-			dst = memcpy(pf->data + off1, action, sizeof(*action));
-		off1 += sizeof(*action);
-		flow_action_conf_size(action, &tmp, &pad);
-		if (action->conf) {
-			if (pf)
-				dst->conf = memcpy(pf->data + off2,
-						   action->conf, tmp);
-			off2 += tmp + pad;
-		}
-		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
-	} while ((action++)->type != RTE_FLOW_ACTION_TYPE_END);
-	if (pf != NULL)
-		return pf;
-	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
-	tmp = RTE_ALIGN_CEIL(offsetof(struct port_flow, data), sizeof(double));
-	pf = calloc(1, tmp + off1 + off2);
-	if (pf == NULL)
-		err = errno;
-	else {
-		*pf = (const struct port_flow){
-			.size = tmp + off1 + off2,
-			.attr = *attr,
-		};
-		tmp -= offsetof(struct port_flow, data);
-		off2 = tmp + off1;
-		off1 = tmp;
-		goto store;
-	}
-notsup:
-	rte_errno = err;
-	return NULL;
-}
-
 /** Print a message out of a flow error. */
 static int
 port_flow_complain(struct rte_flow_error *error)
@@ -1186,6 +984,28 @@ port_flow_validate(portid_t port_id,
 	return 0;
 }
 
+/** Generate a port_flow entry from attributes/pattern/actions. */
+static struct port_flow *
+port_flow_new(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *pattern,
+	      const struct rte_flow_action *actions)
+{
+	struct port_flow *pf;
+
+	pf = calloc(1, sizeof(struct port_flow));
+	if (!pf)
+		return NULL;
+
+	pf->fd = rte_flow_copy(attr, pattern, actions, NULL);
+	if (pf->fd == NULL) {
+		free(pf);
+		return NULL;
+	}
+	pf->size += pf->fd->size;
+
+	return pf;
+}
+
 /** Create flow rule. */
 int
 port_flow_create(portid_t port_id,
@@ -1265,6 +1085,7 @@ port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule)
 			}
 			printf("Flow rule #%u destroyed\n", pf->id);
 			*tmp = pf->next;
+			free(pf->fd);
 			free(pf);
 			break;
 		}
@@ -1325,11 +1146,11 @@ port_flow_query(portid_t port_id, uint32_t rule,
 		printf("Flow rule #%u not found\n", rule);
 		return -ENOENT;
 	}
-	if ((unsigned int)action >= RTE_DIM(flow_action) ||
-	    !flow_action[action].name)
+	if ((unsigned int)action >= RTE_FLOW_ACTION_TYPE_MAX ||
+	    !rte_flow_desc_data_action[action].name)
 		name = "unknown";
 	else
-		name = flow_action[action].name;
+		name = rte_flow_desc_data_action[action].name;
 	switch (action) {
 	case RTE_FLOW_ACTION_TYPE_COUNT:
 		break;
@@ -1385,18 +1206,18 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		if (n) {
 			/* Filter out unwanted groups. */
 			for (i = 0; i != n; ++i)
-				if (pf->attr.group == group[i])
+				if (pf->fd->attr.group == group[i])
 					break;
 			if (i == n)
 				continue;
 		}
 		tmp = &list;
 		while (*tmp &&
-		       (pf->attr.group > (*tmp)->attr.group ||
-			(pf->attr.group == (*tmp)->attr.group &&
-			 pf->attr.priority > (*tmp)->attr.priority) ||
-			(pf->attr.group == (*tmp)->attr.group &&
-			 pf->attr.priority == (*tmp)->attr.priority &&
+		       (pf->fd->attr.group > (*tmp)->fd->attr.group ||
+			(pf->fd->attr.group == (*tmp)->fd->attr.group &&
+			 pf->fd->attr.priority > (*tmp)->fd->attr.priority) ||
+			(pf->fd->attr.group == (*tmp)->fd->attr.group &&
+			 pf->fd->attr.priority == (*tmp)->fd->attr.priority &&
 			 pf->id > (*tmp)->id)))
 			tmp = &(*tmp)->tmp;
 		pf->tmp = *tmp;
@@ -1404,24 +1225,28 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 	}
 	printf("ID\tGroup\tPrio\tAttr\tRule\n");
 	for (pf = list; pf != NULL; pf = pf->tmp) {
-		const struct rte_flow_item *item = pf->pattern;
-		const struct rte_flow_action *action = pf->actions;
+		const struct rte_flow_item *item = pf->fd->items;
+		const struct rte_flow_action *action = pf->fd->actions;
 
 		printf("%" PRIu32 "\t%" PRIu32 "\t%" PRIu32 "\t%c%c\t",
 		       pf->id,
-		       pf->attr.group,
-		       pf->attr.priority,
-		       pf->attr.ingress ? 'i' : '-',
-		       pf->attr.egress ? 'e' : '-');
+		       pf->fd->attr.group,
+		       pf->fd->attr.priority,
+		       pf->fd->attr.ingress ? 'i' : '-',
+		       pf->fd->attr.egress ? 'e' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
 			if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
-				printf("%s ", flow_item[item->type].name);
+				printf("%s ",
+				       rte_flow_desc_data_item[item->type].
+									name);
 			++item;
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
 			if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
-				printf(" %s", flow_action[action->type].name);
+				printf(" %s",
+				       rte_flow_desc_data_action[action->type].
+									name);
 			++action;
 		}
 		printf("\n");
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 85d3e82..6002388 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -153,10 +153,7 @@ struct port_flow {
 	struct port_flow *tmp; /**< Temporary linking. */
 	uint32_t id; /**< Flow rule ID. */
 	struct rte_flow *flow; /**< Opaque flow object returned by PMD. */
-	struct rte_flow_attr attr; /**< Attributes. */
-	struct rte_flow_item *pattern; /**< Pattern. */
-	struct rte_flow_action *actions; /**< Actions. */
-	uint8_t data[]; /**< Storage for pattern/actions. */
+	struct rte_flow_desc *fd; /**< Generic flow description */
 };
 
 /**
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index aaa70d6..c8e2e93 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -33,6 +33,7 @@
 
 #include <stdint.h>
 
+#include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include "rte_ethdev.h"
@@ -157,3 +158,285 @@ rte_flow_query(uint8_t port_id,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, rte_strerror(ENOSYS));
 }
+
+/** Information about known flow pattern items. */
+const struct rte_flow_desc_data rte_flow_desc_data_item[] = {
+	[RTE_FLOW_ITEM_TYPE_END] = {
+		.name = "END",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_VOID] = {
+		.name = "VOID",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_INVERT] = {
+		.name = "INVERT",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_ANY] = {
+		.name = "ANY",
+		.size = sizeof(struct rte_flow_item_any),
+	},
+	[RTE_FLOW_ITEM_TYPE_PF] = {
+		.name = "PF",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_VF] = {
+		.name = "VF",
+		.size = sizeof(struct rte_flow_item_vf),
+	},
+	[RTE_FLOW_ITEM_TYPE_PORT] = {
+		.name = "PORT",
+		.size = sizeof(struct rte_flow_item_port),
+	},
+	[RTE_FLOW_ITEM_TYPE_RAW] = {
+		.name = "RAW",
+		.size = sizeof(struct rte_flow_item_raw), /* +pattern[] */
+	},
+	[RTE_FLOW_ITEM_TYPE_ETH] = {
+		.name = "ETH",
+		.size = sizeof(struct rte_flow_item_eth),
+	},
+	[RTE_FLOW_ITEM_TYPE_VLAN] = {
+		.name = "VLAN",
+		.size = sizeof(struct rte_flow_item_vlan),
+	},
+	[RTE_FLOW_ITEM_TYPE_IPV4] = {
+		.name = "IPV4",
+		.size = sizeof(struct rte_flow_item_ipv4),
+	},
+	[RTE_FLOW_ITEM_TYPE_IPV6] = {
+		.name = "IPV6",
+		.size = sizeof(struct rte_flow_item_ipv6),
+	},
+	[RTE_FLOW_ITEM_TYPE_ICMP] = {
+		.name = "ICMP",
+		.size = sizeof(struct rte_flow_item_icmp),
+	},
+	[RTE_FLOW_ITEM_TYPE_UDP] = {
+		.name = "UDP",
+		.size = sizeof(struct rte_flow_item_udp),
+	},
+	[RTE_FLOW_ITEM_TYPE_TCP] = {
+		.name = "TCP",
+		.size = sizeof(struct rte_flow_item_tcp),
+	},
+	[RTE_FLOW_ITEM_TYPE_SCTP] = {
+		.name = "SCTP",
+		.size = sizeof(struct rte_flow_item_sctp),
+	},
+	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
+		.name = "VXLAN",
+		.size = sizeof(struct rte_flow_item_vxlan),
+	},
+	[RTE_FLOW_ITEM_TYPE_MAX] = {
+		.name = "MAX",
+		.size = 0,
+	},
+};
+
+/** Compute storage space needed by item specification. */
+static void
+flow_item_spec_size(const struct rte_flow_item *item,
+		    size_t *size, size_t *pad)
+{
+	union {
+		const struct rte_flow_item_raw *raw;
+	} spec;
+
+	if (!item->spec)
+		goto empty;
+	switch (item->type) {
+	case RTE_FLOW_ITEM_TYPE_RAW:
+		spec.raw = item->spec;
+		*size = offsetof(struct rte_flow_item_raw, pattern) +
+			spec.raw->length * sizeof(*spec.raw->pattern);
+		break;
+	default:
+empty:
+		*size = 0;
+		break;
+	}
+	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
+}
+
+/** Information about known flow actions. */
+const struct rte_flow_desc_data rte_flow_desc_data_action[] = {
+	[RTE_FLOW_ACTION_TYPE_END] = {
+		.name = "END",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_VOID] = {
+		.name = "VOID",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_PASSTHRU] = {
+		.name = "PASSTHRU",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_MARK] = {
+		.name = "MARK",
+		.size = sizeof(struct rte_flow_action_mark),
+	},
+	[RTE_FLOW_ACTION_TYPE_FLAG] = {
+		.name = "FLAG",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = {
+		.name = "QUEUE",
+		.size = sizeof(struct rte_flow_action_queue),
+	},
+	[RTE_FLOW_ACTION_TYPE_DROP] = {
+		.name = "DROP",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_COUNT] = {
+		.name = "COUNT",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_DUP] = {
+		.name = "DUP",
+		.size = sizeof(struct rte_flow_action_dup),
+	},
+	[RTE_FLOW_ACTION_TYPE_RSS] = {
+		.name = "RSS",
+		.size = sizeof(struct rte_flow_action_rss), /* +queue[] */
+	},
+	[RTE_FLOW_ACTION_TYPE_PF] = {
+		.name = "PF",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_VF] = {
+		.name = "VF",
+		.size = sizeof(struct rte_flow_action_vf),
+	},
+	[RTE_FLOW_ACTION_TYPE_MAX] = {
+		.name = "MAX",
+		.size = 0,
+	},
+};
+
+/** Compute storage space needed by action configuration. */
+static void
+flow_action_conf_size(const struct rte_flow_action *action,
+		      size_t *size, size_t *pad)
+{
+	union {
+		const struct rte_flow_action_rss *rss;
+	} conf;
+
+	if (!action->conf)
+		goto empty;
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		conf.rss = action->conf;
+		*size = offsetof(struct rte_flow_action_rss, queue) +
+			conf.rss->num * sizeof(*conf.rss->queue);
+		break;
+	default:
+empty:
+		*size = 0;
+		break;
+	}
+	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
+}
+
+/** Copy a comprehensive description of an rte_flow. */
+struct rte_flow_desc *
+rte_flow_copy(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *items,
+	      const struct rte_flow_action *actions,
+	      void *(zalloc)(size_t, size_t))
+{
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	struct rte_flow_desc *fd = NULL;
+	size_t tmp;
+	size_t pad;
+	size_t off1 = 0;
+	size_t off2 = 0;
+	int err = ENOTSUP;
+
+	if (zalloc == NULL)
+		zalloc = &calloc;
+
+store:
+	item = items;
+	if (fd)
+		fd->items = (void *)&fd->data[off1];
+	do {
+		struct rte_flow_item *dst = NULL;
+
+		if ((unsigned int)item->type >=
+			RTE_DIM(rte_flow_desc_data_item) ||
+		    !rte_flow_desc_data_item[item->type].name)
+			goto notsup;
+		if (fd)
+			dst = memcpy(fd->data + off1, item, sizeof(*item));
+		off1 += sizeof(*item);
+		flow_item_spec_size(item, &tmp, &pad);
+		if (item->spec) {
+			if (fd)
+				dst->spec = memcpy(fd->data + off2,
+						   item->spec, tmp);
+			off2 += tmp + pad;
+		}
+		if (item->last) {
+			if (fd)
+				dst->last = memcpy(fd->data + off2,
+						   item->last, tmp);
+			off2 += tmp + pad;
+		}
+		if (item->mask) {
+			if (fd)
+				dst->mask = memcpy(fd->data + off2,
+						   item->mask, tmp);
+			off2 += tmp + pad;
+		}
+		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
+	} while ((item++)->type != RTE_FLOW_ITEM_TYPE_END);
+	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
+	action = actions;
+	if (fd)
+		fd->actions = (void *)&fd->data[off1];
+	do {
+		struct rte_flow_action *dst = NULL;
+
+		if ((unsigned int)action->type >=
+			RTE_DIM(rte_flow_desc_data_action) ||
+		    !rte_flow_desc_data_action[action->type].name)
+			goto notsup;
+		if (fd)
+			dst = memcpy(fd->data + off1, action, sizeof(*action));
+		off1 += sizeof(*action);
+		flow_action_conf_size(action, &tmp, &pad);
+		if (action->conf) {
+			if (fd)
+				dst->conf = memcpy(fd->data + off2,
+						   action->conf, tmp);
+			off2 += tmp + pad;
+		}
+		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
+	} while ((action++)->type != RTE_FLOW_ACTION_TYPE_END);
+	if (fd != NULL)
+		return fd;
+	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
+	tmp = RTE_ALIGN_CEIL(offsetof(struct rte_flow_desc, data),
+			     sizeof(double));
+	fd = zalloc(1, tmp + off1 + off2);
+	if (fd == NULL)
+		err = errno;
+	else {
+		*fd = (const struct rte_flow_desc){
+			.size = tmp + off1 + off2,
+			.attr = *attr,
+		};
+		tmp -= offsetof(struct rte_flow_desc, data);
+		off2 = tmp + off1;
+		off1 = tmp;
+		goto store;
+	}
+notsup:
+	rte_errno = err;
+	return NULL;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index 171a569..f0b33d2 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -282,6 +282,11 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_nvgre.
 	 */
 	RTE_FLOW_ITEM_TYPE_NVGRE,
+
+	/**
+	 * Sentinel
+	 */
+	RTE_FLOW_ITEM_TYPE_MAX,
 };
 
 /**
@@ -779,6 +784,11 @@ enum rte_flow_action_type {
 	 * See struct rte_flow_action_vf.
 	 */
 	RTE_FLOW_ACTION_TYPE_VF,
+
+	/**
+	 * Sentinel
+	 */
+	RTE_FLOW_ACTION_TYPE_MAX,
 };
 
 /**
@@ -1083,6 +1093,55 @@ rte_flow_query(uint8_t port_id,
 	       void *data,
 	       struct rte_flow_error *error);
 
+/**
+ * Generic flow generator description.
+ *
+ * This description must be sufficient to describe an rte_flow independently
+ * from any PMD implementation and allow for replayability and identification.
+ */
+struct rte_flow_desc {
+	size_t size; /**< Allocated space including data[]. */
+	struct rte_flow_attr attr; /**< Attributes. */
+	struct rte_flow_item *items; /**< Items. */
+	struct rte_flow_action *actions; /**< Actions. */
+	uint8_t data[]; /**< Storage for items/actions. */
+};
+
+/**
+ * Copy an rte_flow rule description
+ *
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param zalloc
+ *   A zeroing allocator function. If NULL, the standard calloc() function is
+ *   used.
+ *
+ * @return
+ *   The flow description structure sufficient to re-create or identify a
+ *   specific flow if successful; NULL otherwise, in which case rte_errno
+ *   is set.
+ */
+struct rte_flow_desc *
+rte_flow_copy(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *items,
+	      const struct rte_flow_action *actions,
+	      void *(zalloc)(size_t, size_t));
+
+/**
+ * Flow elements description tables.
+ */
+struct rte_flow_desc_data {
+	const char *name;
+	size_t size;
+};
+
+extern const struct rte_flow_desc_data rte_flow_desc_data_item[];
+extern const struct rte_flow_desc_data rte_flow_desc_data_action[];
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.4

* [PATCH 03/12] ethdev: add deferred intermediate device state
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 02/12] ethdev: add flow API rule copy function Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 17:34   ` Stephen Hemminger
  2017-03-03 15:40 ` [PATCH 04/12] pci: expose device detach routine Gaetan Rivet
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

This device state means that the device is managed externally, by
whichever party has set this state (PMD or application).

Note: this new device state is purely informational. The related device
structure and operations are still valid and can be used normally.

It is however made private by device management helpers within ethdev,
making the device invisible to applications.
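
A sketch of the expected usage, given as an assumption rather than part of
this patch: the managing party (here a hypothetical helper in a controlling
PMD) marks the port it owns as deferred.

#include <rte_ethdev.h>

static void
fs_defer_port(uint8_t port_id)
{
	/* The port stays valid but is now managed by its owner only. */
	rte_eth_devices[port_id].state = RTE_ETH_DEV_DEFERRED;
}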
---
 lib/librte_ether/rte_ethdev.c | 3 ++-
 lib/librte_ether/rte_ethdev.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 61a63b7..7824f87 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -382,7 +382,8 @@ int
 rte_eth_dev_is_valid_port(uint8_t port_id)
 {
 	if (port_id >= RTE_MAX_ETHPORTS ||
-	    rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
+	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
+	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
 		return 0;
 	else
 		return 1;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f01d140..ae1e9e6 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1629,6 +1629,7 @@ struct rte_eth_rxtx_callback {
 enum rte_eth_dev_state {
 	RTE_ETH_DEV_UNUSED = 0,
 	RTE_ETH_DEV_ATTACHED,
+	RTE_ETH_DEV_DEFERRED,
 };
 
 /**
-- 
2.1.4

* [PATCH 04/12] pci: expose device detach routine
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (2 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 03/12] ethdev: add deferred intermediate device state Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 05/12] pci: expose parse and probe routines Gaetan Rivet
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Make pci_detach_all_drivers public as rte_eal_pci_detach_all_drivers. No
further changes.
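
A short sketch (illustrative, not part of the patch) of how a hot-plug
manager holding a struct rte_pci_device pointer could use the exported
routine; fs_pci_unplug is a hypothetical name:

#include <rte_pci.h>

static int
fs_pci_unplug(struct rte_pci_device *pdev)
{
	int ret = rte_eal_pci_detach_all_drivers(pdev);

	if (ret < 0)	/* a driver's remove() callback failed */
		return ret;
	/* ret > 0 means no driver was bound; nothing left to do. */
	return 0;
}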

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  5 +++++
 lib/librte_eal/common/eal_common_pci.c          |  6 +++---
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  5 +++++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2cf1ac8..3a87756 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -185,3 +185,8 @@ DPDK_17.02 {
 	rte_bus_unregister;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+	rte_eal_pci_detach_all_drivers;
+
+} DPDK_17.02;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 72547bd..d38335a 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -313,8 +313,8 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
  * registered driver for the given device. Return -1 if initialization
  * failed, return 1 if no driver is found for this device.
  */
-static int
-pci_detach_all_drivers(struct rte_pci_device *dev)
+int
+rte_eal_pci_detach_all_drivers(struct rte_pci_device *dev)
 {
 	struct rte_pci_driver *dr = NULL;
 	int rc = 0;
@@ -388,7 +388,7 @@ rte_eal_pci_detach(const struct rte_pci_addr *addr)
 		if (rte_eal_compare_pci_addr(&dev->addr, addr))
 			continue;
 
-		ret = pci_detach_all_drivers(dev);
+		ret = rte_eal_pci_detach_all_drivers(dev);
 		if (ret < 0)
 			goto err_return;
 
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4a67883..598a1ef 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -459,6 +459,21 @@ void pci_unmap_resource(void *requested_addr, size_t size);
 int rte_eal_pci_probe_one(const struct rte_pci_addr *addr);
 
 /**
+ * Remove any PCI drivers tied to the device.
+ *
+ * If the vendor/device ID matches, call the remove() function of all
+ * registered drivers for the given device.
+ *
+ * @param dev
+ *	The PCI device to remove
+ * @return
+ *   - 0 on success.
+ *   - Negative on error.
+ *   - Positive if no driver exists for that device.
+ */
+int rte_eal_pci_detach_all_drivers(struct rte_pci_device *dev);
+
+/**
  * Close the single PCI device.
  *
  * Scan the content of the PCI bus, and find the pci device specified by pci
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 3c68ff5..192c0d5 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -189,3 +189,8 @@ DPDK_17.02 {
 	rte_bus_unregister;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+	rte_eal_pci_detach_all_drivers;
+
+} DPDK_17.02;
-- 
2.1.4

* [PATCH 05/12] pci: expose parse and probe routines
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (3 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 04/12] pci: expose device detach routine Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 06/12] net/failsafe: add fail-safe PMD Gaetan Rivet
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Make pci_probe_all_drivers public as rte_eal_pci_probe_all_drivers; no
further changes to it.
Introduce a public function derived from pci_scan_one. This function scans
one device, but does not allocate that device or insert it within the device
list.
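
A sketch of how the two exported routines can be combined to re-scan and
re-probe a single device without touching the global device list; the sysfs
path format and the helper name fs_pci_rescan_one are assumptions of this
example:

#include <limits.h>
#include <stdio.h>
#include <rte_pci.h>

static int
fs_pci_rescan_one(struct rte_pci_device *pdev, const struct rte_pci_addr *addr)
{
	char dirname[PATH_MAX];

	snprintf(dirname, sizeof(dirname),
		 "/sys/bus/pci/devices/%04x:%02x:%02x.%x",
		 addr->domain, addr->bus, addr->devid, addr->function);
	if (rte_eal_pci_parse_sysfs_entry(pdev, dirname, addr) < 0)
		return -1;
	return rte_eal_pci_probe_all_drivers(pdev);
}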
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_pci.c          | 10 +++----
 lib/librte_eal/common/include/rte_pci.h         | 25 ++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 39 +++++++++++++++----------
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 5 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3a87756..ba8f8e4 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -188,5 +188,6 @@ DPDK_17.02 {
 
 DPDK_17.05 {
 	rte_eal_pci_detach_all_drivers;
+	rte_eal_pci_probe_all_drivers;
 
 } DPDK_17.02;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index d38335a..15a0c48 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -282,8 +282,8 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,
  * registered driver for the given device. Return -1 if initialization
  * failed, return 1 if no driver is found for this device.
  */
-static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+int
+rte_eal_pci_probe_all_drivers(struct rte_pci_device *dev)
 {
 	struct rte_pci_driver *dr = NULL;
 	int rc = 0;
@@ -358,7 +358,7 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
 		if (rte_eal_compare_pci_addr(&dev->addr, addr))
 			continue;
 
-		ret = pci_probe_all_drivers(dev);
+		ret = rte_eal_pci_probe_all_drivers(dev);
 		if (ret)
 			goto err_return;
 		return 0;
@@ -430,10 +430,10 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			ret = pci_probe_all_drivers(dev);
+			ret = rte_eal_pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
 			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-			ret = pci_probe_all_drivers(dev);
+			ret = rte_eal_pci_probe_all_drivers(dev);
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 598a1ef..4291da8 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -373,6 +373,16 @@ rte_eal_compare_pci_addr(const struct rte_pci_addr *addr,
 int rte_eal_pci_scan(void);
 
 /**
+ * Parse the content of one PCI entry.
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int rte_eal_pci_parse_sysfs_entry(struct rte_pci_device *dev,
+				 const char *dirname,
+				 const struct rte_pci_addr *addr);
+
+/**
  * Probe the PCI bus for registered drivers.
  *
  * Scan the content of the PCI bus, and call the probe() function for
@@ -459,6 +469,21 @@ void pci_unmap_resource(void *requested_addr, size_t size);
 int rte_eal_pci_probe_one(const struct rte_pci_addr *addr);
 
 /**
+ * Probe all pci drivers against the device.
+ *
+ * If the vendor/device ID matches, call the probe() function of all
+ * registered drivers for the given device.
+ *
+ * @param dev
+ *	The PCI device to probe
+ * @return
+ *   - 0 on success.
+ *   - Negative on error.
+ *   - Positive if no driver exists for that device.
+ */
+int rte_eal_pci_probe_all_drivers(struct rte_pci_device *dev);
+
+/**
  * Remove any PCI drivers tied to the device.
  *
  * If vendor/device ID match, call the remove() function of all
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index e2fc219..51c6b84 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -226,27 +226,22 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
 	return -1;
 }
 
-/* Scan one pci sysfs entry, and fill the devices list from it. */
-static int
-pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
+/* Parse a pci sysfs entry */
+int
+rte_eal_pci_parse_sysfs_entry(struct rte_pci_device *dev, const char *dirname,
+			const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
-	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
-		return -1;
-
 	memset(dev, 0, sizeof(*dev));
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -254,7 +249,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -263,7 +257,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -272,7 +265,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -281,7 +273,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -320,7 +311,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		free(dev);
 		return -1;
 	}
 
@@ -329,7 +319,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		free(dev);
 		return -1;
 	}
 
@@ -345,6 +334,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	} else
 		dev->kdrv = RTE_KDRV_NONE;
 
+	return 0;
+}
+
+/* Scan one pci sysfs entry, and fill the devices list from it. */
+static int
+pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
+{
+	struct rte_pci_device *dev;
+	int ret;
+
+	dev = malloc(sizeof(*dev));
+	if (dev == NULL)
+		return -1;
+
+	ret = rte_eal_pci_parse_sysfs_entry(dev, dirname, addr);
+	if (ret) {
+		free(dev);
+		return ret;
+	}
+
 	/* device is valid, add in list (sorted) */
 	if (TAILQ_EMPTY(&pci_device_list)) {
 		rte_eal_device_insert(&dev->device);
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 192c0d5..27b7939 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -192,5 +192,7 @@ DPDK_17.02 {
 
 DPDK_17.05 {
 	rte_eal_pci_detach_all_drivers;
+	rte_eal_pci_probe_all_drivers;
+	rte_eal_pci_parse_sysfs_entry;
 
 } DPDK_17.02;
-- 
2.1.4

* [PATCH 06/12] net/failsafe: add fail-safe PMD
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (4 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 05/12] pci: expose parse and probe routines Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 17:38   ` Stephen Hemminger
  2017-03-03 15:40 ` [PATCH 07/12] net/failsafe: add plug-in support Gaetan Rivet
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.

This PMD allows applications to benefit from true hot-plugging
support without having to implement it themselves.

It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency for the removal of a device, by
designating a fail-over device that takes over transmit operations if the
preferred device is removed.

Applications only see a fail-safe instance and do not need to care about the
underlying operations ensuring its continued availability.
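
The fail-over policy can be pictured with the following self-contained
sketch; the names (fs_sub, present, fs_tx_subdev) are illustrative and do
not claim to match the driver internals:

struct fs_sub {
	int present; /* hypothetical flag: sub-device is probed and started */
};

static struct fs_sub *
fs_tx_subdev(struct fs_sub *preferred, struct fs_sub *failover, unsigned int n)
{
	unsigned int i;

	if (preferred != NULL && preferred->present)
		return preferred;
	for (i = 0; i < n; i++)
		if (failover[i].present)
			return &failover[i];
	/* No usable sub-device: Tx bursts report 0 until one comes back. */
	return NULL;
}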

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                             |   4 +
 config/common_linuxapp                  |   1 +
 drivers/net/Makefile                    |   1 +
 drivers/net/failsafe/Makefile           |  71 ++++
 drivers/net/failsafe/failsafe.c         | 229 +++++++++++
 drivers/net/failsafe/failsafe_args.c    | 347 ++++++++++++++++
 drivers/net/failsafe/failsafe_eal.c     | 318 +++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     | 677 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h | 234 +++++++++++
 drivers/net/failsafe/failsafe_rxtx.c    | 107 +++++
 mk/rte.app.mk                           |   1 +
 11 files changed, 1990 insertions(+)
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cc3bf98..58a77d0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -315,6 +315,10 @@ M: Matej Vido <vido@cesnet.cz>
 F: drivers/net/szedata2/
 F: doc/guides/nics/szedata2.rst
 
+Failsafe PMD
+M: Gaetan Rivet <gaetan.rivet@6wind.com>
+F: drivers/net/failsafe/
+
 Intel e1000
 M: Wenzhuo Lu <wenzhuo.lu@intel.com>
 F: drivers/net/e1000/
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 00ebaac..5e18267 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -44,5 +44,6 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_NFP_PMD=y
+CONFIG_RTE_LIBRTE_FAILSAFE_PMD=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 40fc333..b8dcf96 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -38,6 +38,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += e1000
 DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
+DIRS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
new file mode 100644
index 0000000..3b7a644
--- /dev/null
+++ b/drivers/net/failsafe/Makefile
@@ -0,0 +1,71 @@
+#   BSD LICENSE
+#
+#   Copyright 2017 6WIND S.A.
+#   Copyright 2017 Mellanox.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of 6WIND S.A. nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name
+LIB = librte_pmd_failsafe.a
+
+# Sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_args.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_rxtx.c
+
+# No exported include files
+
+# This lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += lib/librte_ether
+
+ifneq ($(DEBUG),)
+CONFIG_RTE_LIBRTE_FAILSAFE_DEBUG := y
+endif
+
+# Basic CFLAGS:
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=700
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+CFLAGS += -pedantic -DPEDANTIC
+
+ifeq ($(CONFIG_RTE_LIBRTE_FAILSAFE_DEBUG),y)
+CFLAGS += -g -UNDEBUG
+else
+CFLAGS += -O3
+CFLAGS += -DNDEBUG
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
new file mode 100644
index 0000000..cd60193
--- /dev/null
+++ b/drivers/net/failsafe/failsafe.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "failsafe_private.h"
+
+const char pmd_failsafe_driver_name[] = FAILSAFE_DRIVER_NAME;
+static const struct rte_eth_link eth_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_UP,
+	.link_autoneg = ETH_LINK_SPEED_AUTONEG,
+};
+
+static int
+sub_device_create(struct rte_eth_dev *dev,
+		const char *params)
+{
+	uint8_t nb_subs;
+	int ret;
+
+	ret = failsafe_args_count_subdevice(dev, params);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->subs_tail > FAILSAFE_MAX_ETHPORTS) {
+		ERROR("Cannot allocate more than %d ports",
+			FAILSAFE_MAX_ETHPORTS);
+		return -ENOSPC;
+	}
+	nb_subs = PRIV(dev)->subs_tail;
+	PRIV(dev)->subs = rte_zmalloc(NULL,
+			sizeof(struct sub_device) * nb_subs,
+			RTE_CACHE_LINE_SIZE);
+	if (PRIV(dev)->subs == NULL) {
+		ERROR("Could not allocate sub_devices");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void
+sub_device_free(struct rte_eth_dev *dev)
+{
+	rte_free(PRIV(dev)->subs);
+}
+
+static int
+eth_dev_create(const char *name,
+		const unsigned socket_id,
+		const char *params)
+{
+	struct rte_eth_dev *dev;
+	struct ether_addr *mac;
+	struct fs_priv *priv;
+	int ret;
+
+	dev = NULL;
+	priv = NULL;
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Creating fail-safe device on NUMA socket %u",
+			socket_id);
+	dev = rte_eth_dev_allocate(name);
+	if (dev == NULL) {
+		ERROR("Unable to allocate rte_eth_dev");
+		return -1;
+	}
+	priv = rte_zmalloc_socket(name, sizeof(*priv), 0, socket_id);
+	if (priv == NULL) {
+		ERROR("Unable to allocate private data");
+		goto free_dev;
+	}
+	dev->data->dev_private = priv;
+	PRIV(dev)->dev = dev;
+	dev->dev_ops = &failsafe_ops;
+	TAILQ_INIT(&dev->link_intr_cbs);
+	dev->data->dev_flags = 0x0;
+	dev->driver = NULL;
+	dev->data->kdrv = RTE_KDRV_NONE;
+	dev->data->drv_name = pmd_failsafe_driver_name;
+	dev->data->numa_node = socket_id;
+	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
+	dev->data->dev_link = eth_link;
+	PRIV(dev)->nb_mac_addr = 1;
+	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
+	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
+	if (params == NULL) {
+		ERROR("This PMD requires sub-devices, none provided");
+		goto free_priv;
+	}
+	ret = sub_device_create(dev, params);
+	if (ret) {
+		ERROR("Could not allocate sub_devices");
+		goto free_priv;
+	}
+	ret = failsafe_args_parse(dev, params);
+	if (ret)
+		goto free_subs;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		goto free_args;
+	mac = &dev->data->mac_addrs[0];
+	if (!mac_from_arg) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/*
+		 * Use the ether_addr from first probed
+		 * device, either preferred or fallback.
+		 */
+		FOREACH_SUBDEV(sdev, i, dev)
+			if (sdev->state >= DEV_PROBED) {
+				ether_addr_copy(&ETH(sdev)->data->mac_addrs[0],
+						mac);
+				break;
+			}
+		/*
+		 * If no device has been probed and no ether_addr
+		 * has been provided on the command line, use a random
+		 * valid one.
+		 */
+		if (i == priv->subs_tail) {
+			eth_random_addr(
+				&mac->addr_bytes[0]);
+			ret = rte_eth_dev_default_mac_addr_set(
+						dev->data->port_id, mac);
+			if (ret) {
+				ERROR("Failed to set default MAC address");
+				goto free_subs;
+			}
+		}
+	}
+	INFO("MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5]);
+	return 0;
+free_args:
+	failsafe_args_free(dev);
+free_subs:
+	sub_device_free(dev);
+free_priv:
+	rte_free(priv);
+free_dev:
+	rte_eth_dev_release_port(dev);
+	return -1;
+}
+
+static int
+rte_eth_free(const char *name)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	dev = rte_eth_dev_allocated(name);
+	if (dev == NULL)
+		return -ENODEV;
+	ret = failsafe_eal_uninit(dev);
+	if (ret)
+		ERROR("Error while uninitializing sub-EAL");
+	failsafe_args_free(dev);
+	sub_device_free(dev);
+	rte_free(PRIV(dev));
+	rte_eth_dev_release_port(dev);
+	return ret;
+}
+
+static int
+rte_pmd_failsafe_probe(const char *name, const char *params)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
+			name);
+	return eth_dev_create(name, rte_socket_id(), params);
+}
+
+static int
+rte_pmd_failsafe_remove(const char *name)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Uninitializing " FAILSAFE_DRIVER_NAME " for %s", name);
+	return rte_eth_free(name);
+}
+
+static struct rte_vdev_driver failsafe_drv = {
+	.probe	= rte_pmd_failsafe_probe,
+	.remove	= rte_pmd_failsafe_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
+RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
+RTE_PMD_REGISTER_PARAM_STRING(net_failsafe, PMD_FAILSAFE_PARAM_STRING);
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
new file mode 100644
index 0000000..faa48f4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -0,0 +1,347 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <string.h>
+#include <errno.h>
+
+#include <rte_devargs.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+
+#include "failsafe_private.h"
+
+#define DEVARGS_MAXLEN 4096
+
+/* Callback used when a new device is found in devargs */
+typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
+		uint8_t head);
+
+int mac_from_arg;
+
+const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_MAC_KVARG,
+	NULL,
+};
+
+/*
+ * input: text.
+ * output: 0 if text[0] != '(',
+ *         0 if there is no matching ')',
+ *         n, the distance to the matching ')', otherwise.
+ */
+static size_t
+closing_paren(const char *text)
+{
+	int nb_open = 0;
+	size_t i = 0;
+
+	while (text[i] != '\0') {
+		if (text[i] == '(')
+			nb_open++;
+		if (text[i] == ')')
+			nb_open--;
+		if (nb_open == 0)
+			return i;
+		i++;
+	}
+	return 0;
+}
+
+static int
+parse_device(struct sub_device *sdev, char *args)
+{
+	struct rte_devargs *d;
+	size_t b;
+
+	b = 0;
+	d = &sdev->devargs;
+	DEBUG("%s", args);
+	while (args[b] != ',' &&
+	       args[b] != '\0')
+		b++;
+	if (args[b] != '\0') {
+		d->args = strdup(&args[b + 1]);
+		if (d->args == NULL) {
+			ERROR("Not enough memory to store sub-device args");
+			return -ENOMEM;
+		}
+		args[b] = '\0';
+	} else {
+		d->args = strdup("");
+	}
+	if (eal_parse_pci_BDF(args, &d->pci.addr) == 0 ||
+	    eal_parse_pci_DomBDF(args, &d->pci.addr) == 0)
+		d->type = RTE_DEVTYPE_WHITELISTED_PCI;
+	else {
+		d->type = RTE_DEVTYPE_VIRTUAL;
+		snprintf(d->virt.drv_name,
+				sizeof(d->virt.drv_name), "%s", args);
+	}
+	sdev->state = DEV_PARSED;
+	return 0;
+}
+
+static int
+parse_device_param(struct rte_eth_dev *dev, const char *param,
+		uint8_t head)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	char *args = NULL;
+	size_t a, b;
+	int ret;
+
+	priv = PRIV(dev);
+	a = 0;
+	b = 0;
+	ret = 0;
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	a = b;
+	b += closing_paren(&param[b]);
+	if (a == b) {
+		ERROR("Dangling parenthesis");
+		return -EINVAL;
+	}
+	a += 1;
+	args = strndup(&param[a], b - a);
+	if (args == NULL) {
+		ERROR("Not enough memory for parameter parsing");
+		return -ENOMEM;
+	}
+	sdev = &priv->subs[head];
+	if (strncmp(param, "dev", 3) == 0) {
+		ret = parse_device(sdev, args);
+		if (ret)
+			goto free_args;
+	} else {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		ret = -EINVAL;
+	}
+free_args:
+	free(args);
+	return ret;
+}
+
+static int
+parse_sub_devices(parse_cb *cb,
+		struct rte_eth_dev *dev, const char *params)
+{
+	size_t a, b;
+	uint8_t head;
+	int ret;
+
+	a = 0;
+	head = 0;
+	ret = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',') {
+			a = b + 1;
+			continue;
+		}
+		if (params[b] == '(') {
+			size_t start = b;
+
+			b += closing_paren(&params[b]);
+			if (b == start) {
+				ERROR("Dangling parenthesis");
+				return -EINVAL;
+			}
+			ret = (*cb)(dev, &params[a], head);
+			if (ret)
+				return ret;
+			head += 1;
+			b += 1;
+			if (params[b] == '\0')
+				return 0;
+		}
+		if (params[b] == '\0')
+			break;
+		a = b + 1;
+	}
+	return 0;
+}
+
+static int
+remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
+{
+	char buffer[DEVARGS_MAXLEN] = {0};
+	size_t a, b;
+	int i;
+
+	a = 0;
+	i = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',' || params[b] == '\0') {
+			if (i > 0)
+				buffer[i++] = ',';
+			snprintf(&buffer[i], b - a + 1, "%s", &params[a]);
+			i += b - a;
+		}
+		if (params[b] == '(') {
+			size_t start = b;
+			b += closing_paren(&params[b]);
+			if (b == start)
+				return -EINVAL;
+			b += 1;
+			if (params[b] == '\0')
+				goto out;
+		}
+		a = b + 1;
+	}
+out:
+	snprintf(params, DEVARGS_MAXLEN, "%s", buffer);
+	return 0;
+}
+
+static int
+get_mac_addr_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	struct ether_addr *ea = out;
+	int ret;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+		&ea->addr_bytes[0], &ea->addr_bytes[1],
+		&ea->addr_bytes[2], &ea->addr_bytes[3],
+		&ea->addr_bytes[4], &ea->addr_bytes[5]);
+	return ret != ETHER_ADDR_LEN;
+}
+
+int
+failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
+{
+	struct fs_priv *priv;
+	char mut_params[DEVARGS_MAXLEN] = "";
+	struct rte_kvargs *kvlist = NULL;
+	unsigned arg_count;
+	size_t n;
+	int ret;
+
+	if (dev == NULL || params == NULL)
+		return -EINVAL;
+	priv = PRIV(dev);
+	ret = 0;
+	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
+	/* default parameters */
+	mac_from_arg = 0;
+	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
+	if (n >= sizeof(mut_params)) {
+		ERROR("Parameter string too long (>=%zu)",
+				sizeof(mut_params));
+		return -ENOMEM;
+	}
+	ret = parse_sub_devices(parse_device_param,
+				dev, params);
+	if (ret < 0)
+		return ret;
+	ret = remove_sub_devices_definition(mut_params);
+	if (ret < 0)
+		return ret;
+	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
+		kvlist = rte_kvargs_parse(mut_params,
+				pmd_failsafe_init_parameters);
+		if (kvlist == NULL) {
+			ERROR("Error parsing parameters, usage:\n"
+				PMD_FAILSAFE_PARAM_STRING);
+			return -1;
+		}
+		/* MAC addr */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_MAC_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_MAC_KVARG,
+					&get_mac_addr_arg,
+					&dev->data->mac_addrs[0]);
+			if (ret < 0)
+				goto free_kvlist;
+			mac_from_arg = 1;
+		}
+	}
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+void
+failsafe_args_free(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		free(sdev->devargs.args);
+		sdev->devargs.args = NULL;
+	}
+}
+
+static int
+count_device(struct rte_eth_dev *dev, const char *param,
+		uint8_t head __rte_unused)
+{
+	size_t b = 0;
+
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	if (strncmp(param, "dev", b) &&
+	    strncmp(param, "exec", b)) {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		return -EINVAL;
+	}
+	PRIV(dev)->subs_tail += 1;
+	return 0;
+}
+
+int
+failsafe_args_count_subdevice(struct rte_eth_dev *dev,
+			const char *params)
+{
+	return parse_sub_devices(count_device,
+				dev, params);
+}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
new file mode 100644
index 0000000..fcee500
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -0,0 +1,318 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev *
+pci_addr_to_eth_dev(struct rte_pci_addr *addr)
+{
+	uint8_t pid;
+
+	if (addr == NULL)
+		return NULL;
+	for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+		struct rte_pci_addr *addr2;
+		struct rte_eth_dev *edev;
+
+		edev = &rte_eth_devices[pid];
+		if (edev->device == NULL ||
+		    edev->device->devargs == NULL)
+			continue;
+		addr2 = &edev->device->devargs->pci.addr;
+		if (rte_eal_compare_pci_addr(addr, addr2) == 0)
+			return edev;
+	}
+	return NULL;
+}
+
+static int
+pci_scan_one(struct sub_device *sdev)
+{
+	struct rte_devargs *da;
+	char dirname[PATH_MAX];
+
+	da = &sdev->devargs;
+	snprintf(dirname, sizeof(dirname),
+		"%s/" PCI_PRI_FMT,
+		pci_get_sysfs_path(),
+		da->pci.addr.domain,
+		da->pci.addr.bus,
+		da->pci.addr.devid,
+		da->pci.addr.function);
+	errno = 0;
+	if (rte_eal_pci_parse_sysfs_entry(&sdev->pci_device,
+		dirname, &da->pci.addr) < 0) {
+		if (errno == ENOENT) {
+			DEBUG("Could not scan requested device " PCI_PRI_FMT,
+				da->pci.addr.domain,
+				da->pci.addr.bus,
+				da->pci.addr.devid,
+				da->pci.addr.function);
+		} else {
+			ERROR("Error while scanning sysfs entry %s",
+					dirname);
+			return -1;
+		}
+	} else {
+		sdev->state = DEV_SCANNED;
+	}
+	return 0;
+}
+
+static int
+pci_scan(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			sdev->device.devargs = da;
+			ret = pci_scan_one(sdev);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+dev_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			PRIV(dev)->current_probed = i;
+			ret = rte_eal_vdev_init(da->virt.drv_name,
+						da->args);
+			if (ret)
+				return ret;
+			sdev->eth_dev =
+				rte_eth_dev_allocated(da->virt.drv_name);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_probe(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_SCANNED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			pdev->device.devargs = da;
+			ret = rte_eal_pci_probe_all_drivers(pdev);
+			/* probing failed */
+			if (ret < 0) {
+				ERROR("Failed to probe requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->eth_dev =
+				pci_addr_to_eth_dev(&pdev->addr);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	ret = pci_scan(dev);
+	if (ret)
+		return ret;
+	ret = pci_probe(dev);
+	if (ret)
+		return ret;
+	ret = dev_init(dev);
+	if (ret)
+		return ret;
+	/*
+	 * We only update TX_SUBDEV if we are not started.
+	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
+	 * preferred port only upon starting it, so that the switch is smoother.
+	 */
+	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			/* Using first probed device */
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+				DEBUG("Switching tx_dev to sub_device %d",
+				      i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static int
+dev_uninit(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			/*
+			 * We are lucky here that no virtual device
+			 * currently creates multiple eth_dev.
+			 */
+			ret = rte_eal_vdev_uninit(da->virt.drv_name);
+			if (ret)
+				return ret;
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_remove(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			ret = rte_eal_pci_detach_all_drivers(pdev);
+			/* detach failed */
+			if (ret < 0) {
+				ERROR("Failed to remove requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_uninit(struct rte_eth_dev *dev)
+{
+	int ret;
+
+	ret = pci_remove(dev);
+	if (ret)
+		return ret;
+	ret = dev_uninit(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
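
For reference, here is the sub_device state ladder that the scan, probe and init
helpers above walk forward; the uninit paths step it back with the
"state = DEV_PROBED - 1" style decrements seen in dev_uninit() and pci_remove().
This is only an annotated copy of the enum declared in failsafe_private.h further
down, with each forward transition mapped to the function that performs it in this
patch:

enum dev_state {          /* forward transition performed by:         */
	DEV_UNDEFINED = 0,
	DEV_PARSED,       /* parse_device(), from the devargs string   */
	DEV_SCANNED,      /* pci_scan_one(), PCI sub-devices only      */
	DEV_PROBED,       /* pci_probe(), or dev_init() for vdevs      */
	DEV_ACTIVE,       /* fs_dev_configure(), later in this patch   */
	DEV_STARTED,      /* fs_dev_start(), later in this patch       */
};
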
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
new file mode 100644
index 0000000..470fea4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -0,0 +1,677 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev_info default_infos = {
+	.driver_name = pmd_failsafe_driver_name,
+	/* Max possible number of elements */
+	.max_rx_pktlen = UINT32_MAX,
+	.max_rx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_tx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_mac_addrs = FAILSAFE_MAX_ETHADDR,
+	.max_hash_mac_addrs = UINT32_MAX,
+	.max_vfs = UINT16_MAX,
+	.max_vmdq_pools = UINT16_MAX,
+	.rx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	.tx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	/* Set of understood capabilities */
+	.rx_offload_capa = 0x0,
+	.tx_offload_capa = 0x0,
+	.flow_type_rss_offloads = 0x0,
+};
+
+static int
+fs_dev_configure(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
+		DEBUG("Configuring sub-device %d", i);
+		ret = rte_eth_dev_configure(PORT_ID(sdev),
+					dev->data->nb_rx_queues,
+					dev->data->nb_tx_queues,
+					&dev->data->dev_conf);
+		if (ret) {
+			ERROR("Could not configure sub_device %d", i);
+			return ret;
+		}
+		sdev->state = DEV_ACTIVE;
+	}
+	return 0;
+}
+
+static int
+fs_dev_start(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_ACTIVE)
+			continue;
+		DEBUG("Starting sub_device %d", i);
+		ret = rte_eth_dev_start(PORT_ID(sdev));
+		if (ret)
+			return ret;
+		sdev->state = DEV_STARTED;
+	}
+	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+				DEBUG("Switching tx_dev to sub_device %d", i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static void
+fs_dev_stop(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+		rte_eth_dev_stop(PORT_ID(sdev));
+		sdev->state = DEV_STARTED - 1;
+	}
+}
+
+static int
+fs_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
+		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
+		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void dev_free_queues(struct rte_eth_dev *dev);
+static void
+fs_dev_close(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Closing sub_device %d", i);
+		rte_eth_dev_close(PORT_ID(sdev));
+		sdev->state = DEV_ACTIVE - 1;
+	}
+	dev_free_queues(dev);
+}
+
+static void
+fs_rx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct rxq *rxq;
+
+	if (queue == NULL)
+		return;
+	rxq = queue;
+	dev = rxq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, rx_queue_release)
+			(&ETH(sdev)->data->rx_queues[rxq->qid]);
+	rte_free(rxq);
+}
+
+static int
+fs_rx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t rx_queue_id,
+		uint16_t nb_rx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool)
+{
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	uint8_t i;
+	int ret;
+
+	rxq = dev->data->rx_queues[rx_queue_id];
+	if (rxq != NULL) {
+		fs_rx_queue_release(rxq);
+		dev->data->rx_queues[rx_queue_id] = NULL;
+	}
+	rxq = rte_zmalloc(NULL, sizeof(*rxq),
+			  RTE_CACHE_LINE_SIZE);
+	if (rxq == NULL)
+		return -ENOMEM;
+	rxq->qid = rx_queue_id;
+	rxq->socket_id = socket_id;
+	rxq->info.mp = mb_pool;
+	rxq->info.conf = *rx_conf;
+	rxq->info.nb_desc = nb_rx_desc;
+	rxq->priv = PRIV(dev);
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
+				rx_queue_id,
+				nb_rx_desc, socket_id,
+				rx_conf, mb_pool);
+		if (ret) {
+			ERROR("RX queue setup failed for sub_device %d", i);
+			goto free_rxq;
+		}
+	}
+	return 0;
+free_rxq:
+	fs_rx_queue_release(rxq);
+	return ret;
+}
+
+static void
+fs_tx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct txq *txq;
+
+	if (queue == NULL)
+		return;
+	txq = queue;
+	dev = txq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, tx_queue_release)
+			(&ETH(sdev)->data->tx_queues[txq->qid]);
+	rte_free(txq);
+}
+
+static int
+fs_tx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t tx_queue_id,
+		uint16_t nb_tx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	uint8_t i;
+	int ret;
+
+	txq = dev->data->tx_queues[tx_queue_id];
+	if (txq != NULL) {
+		fs_tx_queue_release(txq);
+		dev->data->tx_queues[tx_queue_id] = NULL;
+	}
+	txq = rte_zmalloc("ethdev TX queue", sizeof(*txq),
+			  RTE_CACHE_LINE_SIZE);
+	if (txq == NULL)
+		return -ENOMEM;
+	txq->qid = tx_queue_id;
+	txq->socket_id = socket_id;
+	txq->info.conf = *tx_conf;
+	txq->info.nb_desc = nb_tx_desc;
+	txq->priv = PRIV(dev);
+	dev->data->tx_queues[tx_queue_id] = txq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
+				tx_queue_id,
+				nb_tx_desc, socket_id,
+				tx_conf);
+		if (ret) {
+			ERROR("TX queue setup failed for sub_device %d", i);
+			goto free_txq;
+		}
+	}
+	return 0;
+free_txq:
+	fs_tx_queue_release(txq);
+	return ret;
+}
+
+static void
+dev_free_queues(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		fs_rx_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = NULL;
+	}
+	dev->data->nb_rx_queues = 0;
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		fs_tx_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = NULL;
+	}
+	dev->data->nb_tx_queues = 0;
+}
+
+static void
+fs_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_enable(PORT_ID(sdev));
+}
+
+static void
+fs_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_disable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_enable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_disable(PORT_ID(sdev));
+}
+
+static int
+fs_link_update(struct rte_eth_dev *dev,
+		int wait_to_complete)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling link_update on sub_device %d", i);
+		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
+		if (ret && ret != -1) {
+			ERROR("Link update failed for sub_device %d with error %d",
+			      i, ret);
+			return ret;
+		}
+	}
+	if (TX_SUBDEV(dev)) {
+		struct rte_eth_link *l1;
+		struct rte_eth_link *l2;
+
+		l1 = &dev->data->dev_link;
+		l2 = &ETH(TX_SUBDEV(dev))->data->dev_link;
+		if (memcmp(l1, l2, sizeof(*l1))) {
+			*l1 = *l2;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static void
+fs_stats_get(struct rte_eth_dev *dev,
+		struct rte_eth_stats *stats)
+{
+	struct sub_device *sdev;
+	struct rte_eth_stats loc_stats;
+	uint8_t j;
+	size_t i;
+
+	memset(stats, 0, sizeof(*stats));
+	memset(&loc_stats, 0, sizeof(loc_stats));
+	FOREACH_SUBDEV_ST(sdev, j, dev, DEV_PROBED) {
+		rte_eth_stats_get(PORT_ID(sdev), &loc_stats);
+		stats->ipackets += loc_stats.ipackets;
+		stats->opackets += loc_stats.opackets;
+		stats->ibytes += loc_stats.ibytes;
+		stats->obytes += loc_stats.obytes;
+		stats->imissed += loc_stats.imissed;
+		stats->ierrors += loc_stats.ierrors;
+		stats->oerrors += loc_stats.oerrors;
+		stats->rx_nombuf += loc_stats.rx_nombuf;
+		for (i = 0; i < RTE_DIM(stats->q_ipackets); i++)
+			stats->q_ipackets[i] += loc_stats.q_ipackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_opackets); i++)
+			stats->q_opackets[i] += loc_stats.q_opackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_ibytes); i++)
+			stats->q_ibytes[i] += loc_stats.q_ibytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_obytes); i++)
+			stats->q_obytes[i] += loc_stats.q_obytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_errors); i++)
+			stats->q_errors[i] += loc_stats.q_errors[i];
+	}
+}
+
+static void
+fs_stats_reset(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_stats_reset(PORT_ID(sdev));
+}
+
+/**
+ * Fail-safe dev_infos_get rules:
+ *
+ * No sub_device:
+ *   Numerical limits:
+ *      Use the maximum possible values for any field, so as not
+ *      to impede any further configuration effort.
+ *   Capabilities:
+ *      Limit capabilities to those that are understood by the
+ *      fail-safe PMD. This understanding stems from the fail-safe
+ *      being capable of verifying that the related capability is
+ *      expressed within the device configuration (struct rte_eth_conf).
+ *
+ * At least one probed sub_device:
+ *   Numerical limits:
+ *      Use the values from the active probed sub_device.
+ *      The rationale here is that if any sub_device is less capable
+ *      (for example, concerning the number of queues) than the active
+ *      sub_device, then its subsequent configuration will fail.
+ *      It is impossible to foresee this failure when the failing
+ *      sub_device is supposed to be plugged in later on, so the
+ *      configuration process is the single point of failure and
+ *      error reporting.
+ *   Capabilities:
+ *      Use a logical AND of RX capabilities among all sub_devices
+ *      and the default capabilities.
+ *      Use a logical AND of TX capabilities among the active probed
+ *      sub_device and the default capabilities.
+ */
+static void
+fs_dev_infos_get(struct rte_eth_dev *dev,
+		  struct rte_eth_dev_info *infos)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL) {
+		DEBUG("No probed device, using default infos");
+		rte_memcpy(&PRIV(dev)->infos, &default_infos,
+			   sizeof(default_infos));
+	} else {
+		uint32_t rx_offload_capa;
+
+		rx_offload_capa = default_infos.rx_offload_capa;
+		FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+			rte_eth_dev_info_get(PORT_ID(sdev),
+					&PRIV(dev)->infos);
+			rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
+		}
+		sdev = TX_SUBDEV(dev);
+		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
+		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
+		PRIV(dev)->infos.tx_offload_capa &=
+					default_infos.tx_offload_capa;
+		PRIV(dev)->infos.flow_type_rss_offloads &=
+					default_infos.flow_type_rss_offloads;
+	}
+	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
+}
+
+static const uint32_t *
+fs_dev_supported_ptypes_get(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_eth_dev *edev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return NULL;
+	edev = ETH(sdev);
+	/* ENOTSUP: counts as no supported ptypes */
+	if (SUBOPS(sdev, dev_supported_ptypes_get) == NULL)
+		return NULL;
+	/*
+	 * The API does not permit a clean AND of all ptypes.
+	 * It is also incomplete by design, and we do not really care
+	 * to have the best possible value in this context.
+	 * We just return the ptypes of the device of highest
+	 * priority, usually the PREFERRED device.
+	 */
+	return SUBOPS(sdev, dev_supported_ptypes_get)(edev);
+}
+
+static int
+fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
+		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_flow_ctrl_get(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return 0;
+	if (SUBOPS(sdev, flow_ctrl_get) == NULL)
+		return -ENOTSUP;
+	return SUBOPS(sdev, flow_ctrl_get)(ETH(sdev), fc_conf);
+}
+
+static int
+fs_flow_ctrl_set(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
+		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void
+fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	/* No check: already done within the rte_eth_dev_mac_addr_remove
+	 * call for the fail-safe device.
+	 */
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
+				&dev->data->mac_addrs[index]);
+	PRIV(dev)->mac_addr_pool[index] = 0;
+}
+
+static void
+fs_mac_addr_add(struct rte_eth_dev *dev,
+		struct ether_addr *mac_addr,
+		uint32_t index,
+		uint32_t vmdq)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	assert(index < FAILSAFE_MAX_ETHADDR);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
+	if (index >= PRIV(dev)->nb_mac_addr) {
+		DEBUG("Growing mac_addrs array");
+		PRIV(dev)->nb_mac_addr = index;
+	}
+	PRIV(dev)->mac_addr_pool[index] = vmdq;
+}
+
+static void
+fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
+}
+
+const struct eth_dev_ops failsafe_ops = {
+	.dev_configure = fs_dev_configure,
+	.dev_start = fs_dev_start,
+	.dev_stop = fs_dev_stop,
+	.dev_set_link_down = fs_dev_set_link_down,
+	.dev_set_link_up = fs_dev_set_link_up,
+	.dev_close = fs_dev_close,
+	.promiscuous_enable = fs_promiscuous_enable,
+	.promiscuous_disable = fs_promiscuous_disable,
+	.allmulticast_enable = fs_allmulticast_enable,
+	.allmulticast_disable = fs_allmulticast_disable,
+	.link_update = fs_link_update,
+	.stats_get = fs_stats_get,
+	.stats_reset = fs_stats_reset,
+	.dev_infos_get = fs_dev_infos_get,
+	.dev_supported_ptypes_get = fs_dev_supported_ptypes_get,
+	.mtu_set = fs_mtu_set,
+	.vlan_filter_set = fs_vlan_filter_set,
+	.rx_queue_setup = fs_rx_queue_setup,
+	.tx_queue_setup = fs_tx_queue_setup,
+	.rx_queue_release = fs_rx_queue_release,
+	.tx_queue_release = fs_tx_queue_release,
+	.flow_ctrl_get = fs_flow_ctrl_get,
+	.flow_ctrl_set = fs_flow_ctrl_set,
+	.mac_addr_remove = fs_mac_addr_remove,
+	.mac_addr_add = fs_mac_addr_add,
+	.mac_addr_set = fs_mac_addr_set,
+};
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
new file mode 100644
index 0000000..86b1613
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -0,0 +1,234 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
+#define _RTE_ETH_FAILSAFE_PRIVATE_H_
+
+#include <rte_dev.h>
+#include <rte_pci.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+
+#define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
+
+#define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PARAM_STRING	\
+	"dev(<ifc>),"			\
+	"exec(<shell command>),"	\
+	"mac=mac_addr"			\
+	""
+
+#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+
+#define FAILSAFE_MAX_ETHPORTS (RTE_MAX_ETHPORTS - 1)
+#define FAILSAFE_MAX_ETHADDR 128
+
+/* TYPES */
+
+struct rxq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	/* id of last sub_device polled */
+	uint8_t last_polled;
+	unsigned int socket_id;
+	struct rte_eth_rxq_info info;
+};
+
+struct txq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	unsigned int socket_id;
+	struct rte_eth_txq_info info;
+};
+
+enum dev_state {
+	DEV_UNDEFINED = 0,
+	DEV_PARSED,
+	DEV_SCANNED,
+	DEV_PROBED,
+	DEV_ACTIVE,
+	DEV_STARTED,
+};
+
+struct sub_device {
+	/* Exhaustive DPDK device description */
+	struct rte_devargs devargs;
+	RTE_STD_C11
+	union {
+		struct rte_device device;
+		struct rte_pci_device pci_device;
+	};
+	struct rte_eth_dev *eth_dev;
+	struct rte_eth_dev_data data;
+	/* Device state machine */
+	enum dev_state state;
+};
+
+struct fs_priv {
+	struct rte_eth_dev *dev;
+	/*
+	 * Set of sub_devices.
+	 * subs[0] is the preferred device,
+	 * any other is just another slave.
+	 */
+	uint8_t subs_head; /* if head == tail, no subs */
+	uint8_t subs_tail; /* first invalid */
+	uint8_t subs_tx;
+	struct sub_device *subs;
+	uint8_t current_probed;
+	/* current number of mac_addr slots allocated. */
+	uint32_t nb_mac_addr;
+	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
+	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
+	/* current capabilities */
+	struct rte_eth_dev_info infos;
+};
+
+/* RX / TX */
+
+uint16_t failsafe_rx_burst(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
+/* ARGS */
+
+int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
+void failsafe_args_free(struct rte_eth_dev *dev);
+int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+
+/* EAL */
+
+int failsafe_eal_init(struct rte_eth_dev *dev);
+int failsafe_eal_uninit(struct rte_eth_dev *dev);
+
+/* GLOBALS */
+
+extern const char pmd_failsafe_driver_name[];
+extern const struct eth_dev_ops failsafe_ops;
+extern int mac_from_arg;
+
+/* HELPERS */
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PRIV(dev) \
+	((struct fs_priv *)(dev)->data->dev_private)
+
+/* sdev: (struct sub_device *) */
+#define ETH(sdev) \
+	((sdev)->eth_dev)
+
+/* sdev: (struct sub_device *) */
+#define PORT_ID(sdev) \
+	(ETH(sdev)->data->port_id)
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices:
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), sub_device index
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ * state: (enum dev_state), minimum acceptable device state
+ */
+#define FOREACH_SUBDEV_ST(s, i, dev, state)				\
+	for (i = fs_find_next((dev), 0, state);				\
+	     i < PRIV(dev)->subs_tail && (s = &PRIV(dev)->subs[i]);	\
+	     i = fs_find_next((dev), i + 1, state))
+
+/**
+ * Iterator construct over fail-safe sub-devices:
+ * s:   (struct sub_device *), iterator
+ * i:   (uint8_t), sub_device index
+ * dev: (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV(s, i, dev)			\
+	FOREACH_SUBDEV_ST(s, i, dev, DEV_UNDEFINED)
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PREFERRED_SUBDEV(dev) \
+	(&PRIV(dev)->subs[0])
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define TX_SUBDEV(dev)							  \
+	(PRIV(dev)->subs_tx >= PRIV(dev)->subs_tail		   ? NULL \
+	 : (PRIV(dev)->subs[PRIV(dev)->subs_tx].state < DEV_PROBED ? NULL \
+	 : &PRIV(dev)->subs[PRIV(dev)->subs_tx]))
+
+/**
+ * s:   (struct sub_device *)
+ * ops: (struct eth_dev_ops) member
+ */
+#define SUBOPS(s, ops) \
+	(ETH(s)->dev_ops->ops)
+
+#ifndef NDEBUG
+#include <stdio.h>
+#define DEBUG__(m, ...)						\
+	(fprintf(stderr, "%s:%d: %s(): " m "%c",		\
+		 __FILE__, __LINE__, __func__, __VA_ARGS__),	\
+	 (void)0)
+#define DEBUG_(...)				\
+	(errno = ((int []){			\
+		*(volatile int *)&errno,	\
+		(DEBUG__(__VA_ARGS__), 0)	\
+	})[0])
+#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
+#define INFO(...) DEBUG(__VA_ARGS__)
+#define WARN(...) DEBUG(__VA_ARGS__)
+#define ERROR(...) DEBUG(__VA_ARGS__)
+#else
+#define DEBUG(...) ((void)0)
+#define LOG__(level, m, ...) \
+	RTE_LOG(level, PMD, "net_failsafe: " m "%c", __VA_ARGS__)
+#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
+#define INFO(...) LOG_(INFO, __VA_ARGS__)
+#define WARN(...) LOG_(WARNING, "WARNING: " __VA_ARGS__)
+#define ERROR(...) LOG_(ERR, "ERROR: " __VA_ARGS__)
+#endif
+
+/* inlined functions */
+
+static inline uint8_t
+fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
+		enum dev_state min_state)
+{
+	while (sid < PRIV(dev)->subs_tail) {
+		if (PRIV(dev)->subs[sid].state >= min_state)
+			break;
+		sid++;
+	}
+	if (sid >= PRIV(dev)->subs_tail)
+		return PRIV(dev)->subs_tail;
+	return sid;
+}
+
+#endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
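
A note on the DEBUG_() construct above: the compound literal stores the current
errno value and the result of the DEBUG__() call (whose fprintf() may clobber
errno), then assigns the saved value back to errno, so logging in debug builds is
intended to leave errno unchanged for the caller. A more explicit equivalent is
sketched below; DEBUG_SAFE is a made-up name and the snippet is not part of the
patch.

#include <errno.h>
#include <stdio.h>

/* Log to stderr without clobbering errno, like DEBUG_() above. */
#define DEBUG_SAFE(...)					\
	do {						\
		int saved_errno_ = errno;		\
		fprintf(stderr, __VA_ARGS__);		\
		errno = saved_errno_;			\
	} while (0)

int
main(void)
{
	errno = 42;
	DEBUG_SAFE("%s:%d: something happened\n", __FILE__, __LINE__);
	/* errno is preserved across the log call */
	return errno == 42 ? 0 : 1;
}
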
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
new file mode 100644
index 0000000..a45b4e5
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -0,0 +1,107 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include "failsafe_private.h"
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_rx_burst(void *queue,
+		  struct rte_mbuf **rx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		if (unlikely(ETH(sdev) == NULL))
+			continue;
+		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
+			continue;
+		if (unlikely(sdev->state != DEV_STARTED))
+			continue;
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_tx_burst(void *queue,
+		  struct rte_mbuf **tx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	if (unlikely(sdev == NULL))
+		return 0;
+	if (unlikely(ETH(sdev) == NULL))
+		return 0;
+	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+		return 0;
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 92f3635..bbb64ce 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -112,6 +112,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)      += -lrte_pmd_e1000
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)        += -lrte_pmd_ena
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)       += -lrte_pmd_enic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)      += -lrte_pmd_fm10k
+_LDLIBS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD)   += -lrte_pmd_failsafe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)       += -lrte_pmd_i40e
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)      += -lrte_pmd_ixgbe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
-- 
2.1.4


* [PATCH 07/12] net/failsafe: add plug-in support
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (5 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 06/12] net/failsafe: add fail-safe PMD Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 08/12] net/failsafe: add flexible device definition Gaetan Rivet
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Periodically check for the existence of the configured sub-devices.
If a sub-device exists on the system but has not yet been initialized,
it is probed and configured.

The configuration process strives to synchronize the states between the
plugged-in sub-device and the fail-safe device.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |  49 +++++++
 drivers/net/failsafe/failsafe_args.c    |  32 +++++
 drivers/net/failsafe/failsafe_eal.c     |  30 +----
 drivers/net/failsafe/failsafe_ether.c   | 227 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     |  24 ++--
 drivers/net/failsafe/failsafe_private.h |  53 ++++++++
 7 files changed, 375 insertions(+), 41 deletions(-)
 create mode 100644 drivers/net/failsafe/failsafe_ether.c

diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index 3b7a644..38acf3c 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -38,6 +38,7 @@ LIB = librte_pmd_failsafe.a
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_args.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_ether.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_rxtx.c
 
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index cd60193..36d266f 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -79,6 +79,50 @@ sub_device_free(struct rte_eth_dev *dev)
 	rte_free(PRIV(dev)->subs);
 }
 
+static void failsafe_plugin_alarm(void *arg);
+
+int
+failsafe_plugin_alarm_install(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (PRIV(dev)->pending_alarm)
+		return 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state != PRIV(dev)->state)
+			break;
+	if (i != PRIV(dev)->subs_tail) {
+		ret = rte_eal_alarm_set(plug_in_poll * 1000,
+					failsafe_plugin_alarm,
+					dev);
+		if (ret) {
+			ERROR("Could not set up plug-in event detection");
+			return ret;
+		}
+		PRIV(dev)->pending_alarm = 1;
+	}
+	return 0;
+}
+
+static void
+failsafe_plugin_alarm(void *arg)
+{
+	struct rte_eth_dev *dev = arg;
+	int ret;
+
+	if (!PRIV(dev)->pending_alarm)
+		return;
+	PRIV(dev)->pending_alarm = 0;
+	ret = failsafe_eth_dev_state_sync(dev);
+	if (ret)
+		ERROR("Unable to synchronize sub_device state");
+	ret = failsafe_plugin_alarm_install(dev);
+	if (ret)
+		ERROR("Unable to set up next alarm");
+}
+
 static int
 eth_dev_create(const char *name,
 		const unsigned socket_id,
@@ -134,6 +178,11 @@ eth_dev_create(const char *name,
 	ret = failsafe_eal_init(dev);
 	if (ret)
 		goto free_args;
+	ret = failsafe_plugin_alarm_install(dev);
+	if (ret) {
+		ERROR("Could not set up plug-in event detection");
+		goto free_subs;
+	}
 	mac = &dev->data->mac_addrs[0];
 	if (!mac_from_arg) {
 		struct sub_device *sdev;
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index faa48f4..773b322 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -45,9 +45,11 @@
 typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
 		uint8_t head);
 
+uint64_t plug_in_poll;
 int mac_from_arg;
 
 const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
 	PMD_FAILSAFE_MAC_KVARG,
 	NULL,
 };
@@ -237,6 +239,24 @@ remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
 }
 
 static int
+get_u64_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	uint64_t *u64 = out;
+	char *endptr = NULL;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	errno = 0;
+	*u64 = strtoull(value, &endptr, 0);
+	if (errno != 0)
+		return -errno;
+	if (endptr == value)
+		return -1;
+	return 0;
+}
+
+static int
 get_mac_addr_arg(const char *key __rte_unused,
 		const char *value, void *out)
 {
@@ -268,6 +288,7 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 	ret = 0;
 	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
 	/* default parameters */
+	plug_in_poll = FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS;
 	mac_from_arg = 0;
 	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
 	if (n >= sizeof(mut_params)) {
@@ -290,6 +311,16 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 				PMD_FAILSAFE_PARAM_STRING);
 			return -1;
 		}
+		/* PLUG_IN event poll timer */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_PLUG_IN_POLL_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
+					&get_u64_arg, &plug_in_poll);
+			if (ret < 0)
+				goto free_kvlist;
+		}
 		/* MAC addr */
 		arg_count = rte_kvargs_count(kvlist,
 				PMD_FAILSAFE_MAC_KVARG);
@@ -303,6 +334,7 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 			mac_from_arg = 1;
 		}
 	}
+	PRIV(dev)->state = DEV_PARSED;
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index fcee500..a5e8c3c 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -198,8 +198,6 @@ pci_probe(struct rte_eth_dev *dev)
 int
 failsafe_eal_init(struct rte_eth_dev *dev)
 {
-	struct sub_device *sdev;
-	uint8_t i;
 	int ret;
 
 	ret = pci_scan(dev);
@@ -211,30 +209,9 @@ failsafe_eal_init(struct rte_eth_dev *dev)
 	ret = dev_init(dev);
 	if (ret)
 		return ret;
-	/*
-	 * We only update TX_SUBDEV if we are not started.
-	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
-	 * preferred port only upon starting it, so that the switch is smoother.
-	 */
-	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
-		    (TX_SUBDEV(dev) == NULL ||
-		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			/* Using first probed device */
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
-				DEBUG("Switching tx_dev to sub_device %d",
-				      i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_PROBED)
+		PRIV(dev)->state = DEV_PROBED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -314,5 +291,6 @@ failsafe_eal_uninit(struct rte_eth_dev *dev)
 	ret = dev_uninit(dev);
 	if (ret)
 		return ret;
+	PRIV(dev)->state = DEV_PROBED - 1;
 	return 0;
 }
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
new file mode 100644
index 0000000..9f4a0a0
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -0,0 +1,227 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "failsafe_private.h"
+
+static int
+eth_dev_conf_apply(struct rte_eth_dev *dev,
+		struct sub_device *sdev)
+{
+	struct rte_eth_dev *edev;
+	struct rte_vlan_filter_conf *vfc1;
+	struct rte_vlan_filter_conf *vfc2;
+	uint32_t i;
+	int ret;
+
+	edev = ETH(sdev);
+	/* RX queue setup */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct rxq *rxq;
+
+		rxq = dev->data->rx_queues[i];
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev), i,
+				rxq->info.nb_desc, rxq->socket_id,
+				&rxq->info.conf, rxq->info.mp);
+		if (ret) {
+			ERROR("rx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* TX queue setup */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct txq *txq;
+
+		txq = dev->data->tx_queues[i];
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev), i,
+				txq->info.nb_desc, txq->socket_id,
+				&txq->info.conf);
+		if (ret) {
+			ERROR("tx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* dev_link.link_status */
+	if (dev->data->dev_link.link_status !=
+	    edev->data->dev_link.link_status) {
+		DEBUG("Configuring link_status");
+		if (dev->data->dev_link.link_status)
+			ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		else
+			ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Failed to apply link_status");
+			return ret;
+		}
+	} else {
+		DEBUG("link_status already set");
+	}
+	/* promiscuous */
+	if (dev->data->promiscuous != edev->data->promiscuous) {
+		DEBUG("Configuring promiscuous");
+		if (dev->data->promiscuous)
+			rte_eth_promiscuous_enable(PORT_ID(sdev));
+		else
+			rte_eth_promiscuous_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("promiscuous already set");
+	}
+	/* all_multicast */
+	if (dev->data->all_multicast != edev->data->all_multicast) {
+		DEBUG("Configuring all_multicast");
+		if (dev->data->all_multicast)
+			rte_eth_allmulticast_enable(PORT_ID(sdev));
+		else
+			rte_eth_allmulticast_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("all_multicast already set");
+	}
+	/* MTU */
+	if (dev->data->mtu != edev->data->mtu) {
+		DEBUG("Configuring MTU");
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), dev->data->mtu);
+		if (ret) {
+			ERROR("Failed to apply MTU");
+			return ret;
+		}
+	} else {
+		DEBUG("MTU already set");
+	}
+	/* default MAC */
+	DEBUG("Configuring default MAC address");
+	ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
+			&dev->data->mac_addrs[0]);
+	if (ret) {
+		ERROR("Setting default MAC address failed");
+		return ret;
+	}
+	/* additional MAC */
+	if (PRIV(dev)->nb_mac_addr > 1)
+		DEBUG("Configure additional MAC address%s",
+			(PRIV(dev)->nb_mac_addr > 2 ? "es" : ""));
+	for (i = 1; i < PRIV(dev)->nb_mac_addr; i++) {
+		struct ether_addr *ea;
+
+		ea = &dev->data->mac_addrs[i];
+		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), ea,
+				PRIV(dev)->mac_addr_pool[i]);
+		if (ret) {
+			char ea_fmt[ETHER_ADDR_FMT_SIZE];
+
+			ether_format_addr(ea_fmt, ETHER_ADDR_FMT_SIZE, ea);
+			ERROR("Adding MAC address %s failed", ea_fmt);
+		}
+	}
+	/* VLAN filter */
+	vfc1 = &dev->data->vlan_filter_conf;
+	vfc2 = &edev->data->vlan_filter_conf;
+	if (memcmp(vfc1, vfc2, sizeof(struct rte_vlan_filter_conf))) {
+		uint64_t vbit;
+		uint64_t ids;
+		size_t i;
+		uint16_t vlan_id;
+
+		DEBUG("Configuring VLAN filter");
+		for (i = 0; i < RTE_DIM(vfc1->ids); i++) {
+			if (vfc1->ids[i] == 0)
+				continue;
+			ids = vfc1->ids[i];
+			while (ids) {
+				vlan_id = 64 * i;
+				/* count trailing zeroes */
+				vbit = ~ids & (ids - 1);
+				/* clear least significant bit set */
+				ids ^= (ids ^ (ids - 1)) ^ vbit;
+				for (; vbit; vlan_id++)
+					vbit >>= 1;
+				ret = rte_eth_dev_vlan_filter(
+					PORT_ID(sdev), vlan_id, 1);
+				if (ret) {
+					ERROR("Failed to apply VLAN filter %hu",
+						vlan_id);
+					return ret;
+				}
+			}
+		}
+	} else {
+		DEBUG("VLAN filter already set");
+	}
+	return 0;
+}
+
+int
+failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint32_t inactive;
+	int ret;
+	uint8_t i;
+
+	if (PRIV(dev)->state < DEV_PROBED)
+		return 0;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		return 0;
+	inactive = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state == DEV_PROBED)
+			inactive |= UINT32_C(1) << i;
+	ret = dev->dev_ops->dev_configure(dev);
+	if (ret)
+		return ret;
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (inactive & (UINT32_C(1) << i)) {
+			ret = eth_dev_conf_apply(dev, sdev);
+			if (ret) {
+				ERROR("Could not apply configuration to sub_device %d",
+				      i);
+				/* TODO: disable device */
+				return ret;
+			}
+		}
+	}
+	/* If new devices have been configured, check if
+	 * the link state has changed.
+	 */
+	if (inactive)
+		dev->dev_ops->link_update(dev, 1);
+	if (PRIV(dev)->state < DEV_STARTED)
+		return 0;
+	ret = dev->dev_ops->dev_start(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 470fea4..e96b43c 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -89,6 +89,8 @@ fs_dev_configure(struct rte_eth_dev *dev)
 		}
 		sdev->state = DEV_ACTIVE;
 	}
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		PRIV(dev)->state = DEV_ACTIVE;
 	return 0;
 }
 
@@ -108,21 +110,9 @@ fs_dev_start(struct rte_eth_dev *dev)
 			return ret;
 		sdev->state = DEV_STARTED;
 	}
-	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
-				DEBUG("Switching tx_dev to sub_device %d", i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_STARTED)
+		PRIV(dev)->state = DEV_STARTED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -132,6 +122,7 @@ fs_dev_stop(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	PRIV(dev)->state = DEV_STARTED - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
 		rte_eth_dev_stop(PORT_ID(sdev));
 		sdev->state = DEV_STARTED - 1;
@@ -183,6 +174,9 @@ fs_dev_close(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	if (PRIV(dev)->state == DEV_STARTED)
+		dev->dev_ops->dev_stop(dev);
+	PRIV(dev)->state = DEV_ACTIVE - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Closing sub_device %d", i);
 		rte_eth_dev_close(PORT_ID(sdev));
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index 86b1613..8659c6f 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -42,9 +42,11 @@
 #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
 
 #define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PLUG_IN_POLL_KVARG "plug_in_poll"
 #define PMD_FAILSAFE_PARAM_STRING	\
 	"dev(<ifc>),"			\
 	"exec(<shell command>),"	\
+	"plug_in_poll=u64,"		\
 	"mac=mac_addr"			\
 	""
 
@@ -112,8 +114,20 @@ struct fs_priv {
 	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
 	/* current capabilities */
 	struct rte_eth_dev_info infos;
+	/* Device state machine.
+	 * This level will be tracking state of the EAL and eth
+	 * layer at large as defined by the user application.
+	 * It will then steer the sub_devices toward the same
+	 * synchronized state.
+	 */
+	enum dev_state state;
+	unsigned int pending_alarm:1; /* An alarm is pending */
 };
 
+/* MISC */
+
+int failsafe_plugin_alarm_install(struct rte_eth_dev *dev);
+
 /* RX / TX */
 
 uint16_t failsafe_rx_burst(void *rxq,
@@ -132,10 +146,15 @@ int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
 int failsafe_eal_init(struct rte_eth_dev *dev);
 int failsafe_eal_uninit(struct rte_eth_dev *dev);
 
+/* ETH_DEV */
+
+int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+
 /* GLOBALS */
 
 extern const char pmd_failsafe_driver_name[];
 extern const struct eth_dev_ops failsafe_ops;
+extern uint64_t plug_in_poll;
 extern int mac_from_arg;
 
 /* HELPERS */
@@ -231,4 +250,38 @@ fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
 	return sid;
 }
 
+static inline void
+fs_switch_dev(struct rte_eth_dev *dev)
+{
+	enum dev_state req_state;
+
+	req_state = PRIV(dev)->state;
+	if (PREFERRED_SUBDEV(dev)->state >= req_state) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (req_state == DEV_STARTED) ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < req_state) ||
+		   TX_SUBDEV(dev) == NULL) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/* Using acceptable device */
+		FOREACH_SUBDEV_ST(sdev, i, dev, req_state) {
+			DEBUG("Switching tx_dev to sub_device %d",
+			      i);
+			PRIV(dev)->subs_tx = i;
+			break;
+		}
+	} else if (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < req_state) {
+		DEBUG("No device ready, deactivating tx_dev");
+		PRIV(dev)->subs_tx = PRIV(dev)->subs_tail;
+	} else {
+		return;
+	}
+}
+
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/12] net/failsafe: add flexible device definition
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (6 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 07/12] net/failsafe: add plug-in support Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 09/12] net/failsafe: support flow API Gaetan Rivet
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Add the "exec" device type.
The parameters given to this type of device will be executed in a shell.
The output of this command is then used as a definition for a device.

That command can be re-interpreted if the related device is not
plugged-in. It allows for a device definition to react to system
changes (e.g. changing PCI bus for a given device).
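
As a purely illustrative example (script path and output below are
hypothetical), a sub-device could be declared as:

  exec(/path/to/find-nic.sh)

where the script prints a single device definition on its first output
line, for instance a PCI address:

  0000:02:00.0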

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/failsafe_args.c    | 102 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ether.c   |   7 +++
 drivers/net/failsafe/failsafe_private.h |   4 ++
 3 files changed, 113 insertions(+)

diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index 773b322..839831f 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -112,6 +112,80 @@ parse_device(struct sub_device *sdev, char *args)
 	return 0;
 }
 
+static void
+sanitize_cmdline(char *args)
+{
+	size_t len;
+
+	len = strnlen(args, DEVARGS_MAXLEN);
+	args[len - 1] = '\0';
+}
+
+static int
+execute_cmd(struct sub_device *sdev, char *cmdline)
+{
+	FILE *fp;
+	/* store possible newline as well */
+	char output[DEVARGS_MAXLEN + 1];
+	size_t len;
+	int old_err;
+	int ret;
+
+	DEBUG("'%s'", cmdline);
+	if (cmdline == NULL &&
+	    sdev->cmdline == NULL) {
+		/* debug: should never happen to a user */
+		DEBUG("Invalid command line");
+		return -EINVAL;
+	}
+	if (sdev->cmdline == NULL) {
+		char *new_str;
+
+		len = strlen(cmdline) + 1;
+		new_str = rte_realloc(sdev->cmdline, len,
+				RTE_CACHE_LINE_SIZE);
+		if (new_str == NULL) {
+			ERROR("Command line allocation failed");
+			return -ENOMEM;
+		}
+		sdev->cmdline = new_str;
+		snprintf(sdev->cmdline, len, "%s", cmdline);
+	} else {
+		if (strcmp(sdev->cmdline, cmdline))
+			DEBUG("cmd mismatch: '%s' != '%s'",
+				sdev->cmdline, cmdline);
+		cmdline = sdev->cmdline;
+	}
+	old_err = errno;
+	fp = popen(cmdline, "r");
+	if (fp == NULL) {
+		ret = errno;
+		ERROR("popen: %s", strerror(errno));
+		errno = old_err;
+		return ret;
+	}
+	/* We only read one line */
+	if (fgets(output, sizeof(output) - 1, fp) == NULL) {
+		DEBUG("Could not read command output");
+		return -ENODEV;
+	}
+	sanitize_cmdline(output);
+	ret = parse_device(sdev, output);
+	if (ret) {
+		ERROR("Parsing device '%s' failed", output);
+		goto ret_pclose;
+	}
+ret_pclose:
+	ret = pclose(fp);
+	if (ret) {
+		ret = errno;
+		ERROR("pclose: %s", strerror(errno));
+		errno = old_err;
+		return ret;
+	}
+	return ret;
+}
+
 static int
 parse_device_param(struct rte_eth_dev *dev, const char *param,
 		uint8_t head)
@@ -146,6 +220,14 @@ parse_device_param(struct rte_eth_dev *dev, const char *param,
 		ret = parse_device(sdev, args);
 		if (ret)
 			goto free_args;
+	} else if (strncmp(param, "exec", 4) == 0) {
+		ret = execute_cmd(sdev, args);
+		if (ret == -ENODEV) {
+			DEBUG("Reading device info from command line failed");
+			ret = 0;
+		}
+		if (ret)
+			goto free_args;
 	} else {
 		ERROR("Unrecognized device type: %.*s", (int)b, param);
 		return -EINVAL;
@@ -347,6 +429,8 @@ failsafe_args_free(struct rte_eth_dev *dev)
 	uint8_t i;
 
 	FOREACH_SUBDEV(sdev, i, dev) {
+		rte_free(sdev->cmdline);
+		sdev->cmdline = NULL;
 		free(sdev->devargs.args);
 		sdev->devargs.args = NULL;
 	}
@@ -377,3 +461,21 @@ failsafe_args_count_subdevice(struct rte_eth_dev *dev,
 	return parse_sub_devices(count_device,
 				dev, params);
 }
+
+int
+failsafe_args_parse_subs(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret = 0;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state >= DEV_PARSED)
+			continue;
+		if (sdev->cmdline)
+			ret = execute_cmd(sdev, sdev->cmdline);
+		if (ret == 0)
+			sdev->state = DEV_PARSED;
+	}
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 9f4a0a0..2d90bcb 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -188,6 +188,13 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 	int ret;
 	uint8_t i;
 
+	if (PRIV(dev)->state < DEV_PARSED)
+		return 0;
+
+	ret = failsafe_args_parse_subs(dev);
+	if (ret)
+		return ret;
+
 	if (PRIV(dev)->state < DEV_PROBED)
 		return 0;
 	ret = failsafe_eal_init(dev);
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index 8659c6f..c1aa24d 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -94,6 +94,9 @@ struct sub_device {
 	struct rte_eth_dev_data data;
 	/* Device state machine */
 	enum dev_state state;
+
+	/* Some device are defined as a command line */
+	char *cmdline;
 };
 
 struct fs_priv {
@@ -140,6 +143,7 @@ uint16_t failsafe_tx_burst(void *txq,
 int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
 void failsafe_args_free(struct rte_eth_dev *dev);
 int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+int failsafe_args_parse_subs(struct rte_eth_dev *dev);
 
 /* EAL */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/12] net/failsafe: support flow API
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (7 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 08/12] net/failsafe: add flexible device definition Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 10/12] net/failsafe: support offload capabilities Gaetan Rivet
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |   1 +
 drivers/net/failsafe/failsafe_eal.c     |   2 +
 drivers/net/failsafe/failsafe_ether.c   |  76 +++++++++++
 drivers/net/failsafe/failsafe_flow.c    | 230 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     |  29 ++++
 drivers/net/failsafe/failsafe_private.h |  20 +++
 7 files changed, 359 insertions(+)
 create mode 100644 drivers/net/failsafe/failsafe_flow.c

diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index 38acf3c..92519ac 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_ether.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_FAILSAFE_PMD) += failsafe_flow.c
 
 # No exported include files
 
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index 36d266f..fbe8f55 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -161,6 +161,7 @@ eth_dev_create(const char *name,
 	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
 	dev->data->dev_link = eth_link;
 	PRIV(dev)->nb_mac_addr = 1;
+	TAILQ_INIT(&PRIV(dev)->flow_list);
 	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
 	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
 	if (params == NULL) {
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index a5e8c3c..9817fc9 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -139,6 +139,7 @@ dev_init(struct rte_eth_dev *dev)
 				continue;
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			SUB_ID(sdev) = i;
 			sdev->state = DEV_PROBED;
 		}
 	}
@@ -189,6 +190,7 @@ pci_probe(struct rte_eth_dev *dev)
 				continue;
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			SUB_ID(sdev) = i;
 			sdev->state = DEV_PROBED;
 		}
 	}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 2d90bcb..439e02c 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -33,8 +33,46 @@
 
 #include <unistd.h>
 
+#include <rte_flow.h>
+#include <rte_flow_driver.h>
+
 #include "failsafe_private.h"
 
+/** Print a message out of a flow error. */
+static int
+fs_flow_complain(struct rte_flow_error *error)
+{
+	static const char *const errstrlist[] = {
+		[RTE_FLOW_ERROR_TYPE_NONE] = "no error",
+		[RTE_FLOW_ERROR_TYPE_UNSPECIFIED] = "cause unspecified",
+		[RTE_FLOW_ERROR_TYPE_HANDLE] = "flow rule (handle)",
+		[RTE_FLOW_ERROR_TYPE_ATTR_GROUP] = "group field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY] = "priority field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_INGRESS] = "ingress field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_EGRESS] = "egress field",
+		[RTE_FLOW_ERROR_TYPE_ATTR] = "attributes structure",
+		[RTE_FLOW_ERROR_TYPE_ITEM_NUM] = "pattern length",
+		[RTE_FLOW_ERROR_TYPE_ITEM] = "specific pattern item",
+		[RTE_FLOW_ERROR_TYPE_ACTION_NUM] = "number of actions",
+		[RTE_FLOW_ERROR_TYPE_ACTION] = "specific action",
+	};
+	const char *errstr;
+	char buf[32];
+	int err = rte_errno;
+
+	if ((unsigned int)error->type >= RTE_DIM(errstrlist) ||
+			!errstrlist[error->type])
+		errstr = "unknown type";
+	else
+		errstr = errstrlist[error->type];
+	ERROR("Caught error type %d (%s): %s%s\n",
+		error->type, errstr,
+		error->cause ? (snprintf(buf, sizeof(buf), "cause: %p, ",
+				error->cause), buf) : "",
+		error->message ? error->message : "(no stated reason)");
+	return -err;
+}
+
 static int
 eth_dev_conf_apply(struct rte_eth_dev *dev,
 		struct sub_device *sdev)
@@ -42,6 +80,8 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	struct rte_eth_dev *edev;
 	struct rte_vlan_filter_conf *vfc1;
 	struct rte_vlan_filter_conf *vfc2;
+	struct rte_flow *flow;
+	struct rte_flow_error ferror;
 	uint32_t i;
 	int ret;
 
@@ -177,6 +217,42 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	} else {
 		DEBUG("VLAN filter already set");
 	}
+	/* rte_flow */
+	if (TAILQ_EMPTY(&PRIV(dev)->flow_list)) {
+		DEBUG("rte_flow already set");
+	} else {
+		DEBUG("Resetting rte_flow configuration");
+#ifndef NDEBUG
+		memset(&ferror, 0x11, sizeof(ferror));
+#endif
+		ret = rte_flow_flush(PORT_ID(sdev), &ferror);
+		if (ret) {
+			fs_flow_complain(&ferror);
+			return ret;
+		}
+#ifndef NDEBUG
+		memset(&ferror, 0x11, sizeof(ferror));
+#endif
+		i = 0;
+		rte_errno = 0;
+		DEBUG("Configuring rte_flow");
+		TAILQ_FOREACH(flow, &PRIV(dev)->flow_list, next) {
+			DEBUG("Creating flow #%" PRIu32, i++);
+			flow->flows[SUB_ID(sdev)] =
+				rte_flow_create(PORT_ID(sdev),
+						&flow->fd->attr,
+						flow->fd->items,
+						flow->fd->actions,
+						&ferror);
+			ret = rte_errno;
+			if (ret)
+				break;
+		}
+		if (ret) {
+			fs_flow_complain(&ferror);
+			return ret;
+		}
+	}
 	return 0;
 }
 
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
new file mode 100644
index 0000000..3079786
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -0,0 +1,230 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/queue.h>
+
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+#include <rte_flow.h>
+#include <rte_flow_driver.h>
+
+#include "failsafe_private.h"
+
+static void *
+fs_calloc(size_t n, size_t sz)
+{
+	size_t size;
+
+	size = n * sz;
+	if (n != 0 && size / n != sz) {
+		ERROR("size_t overflow");
+		return NULL;
+	}
+	return rte_zmalloc(NULL, size,
+			RTE_CACHE_LINE_SIZE);
+}
+
+static struct rte_flow *
+fs_flow_allocate(const struct rte_flow_attr *attr,
+		 const struct rte_flow_item *items,
+		 const struct rte_flow_action *actions)
+{
+	struct rte_flow *flow;
+
+	flow = rte_zmalloc(NULL, sizeof(struct rte_flow),
+			RTE_CACHE_LINE_SIZE);
+	if (flow == NULL) {
+		ERROR("Could not allocate new flow");
+		return NULL;
+	}
+	flow->fd = rte_flow_copy(attr, items, actions, fs_calloc);
+	if (flow->fd == NULL) {
+		ERROR("Could not allocate flow description");
+		goto free_flow;
+	}
+	return flow;
+free_flow:
+	rte_free(flow);
+
+	return NULL;
+}
+
+static void
+fs_flow_release(struct rte_flow **flow)
+{
+	rte_free((*flow)->fd);
+	rte_free(*flow);
+	*flow = NULL;
+}
+
+static int
+fs_flow_validate(struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr,
+		 const struct rte_flow_item patterns[],
+		 const struct rte_flow_action actions[],
+		 struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_flow_validate on sub_device %d", i);
+		ret = rte_flow_validate(PORT_ID(sdev),
+				attr, patterns, actions, error);
+		if (ret) {
+			ERROR("Operation rte_flow_validate failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static struct rte_flow *
+fs_flow_create(struct rte_eth_dev *dev,
+	       const struct rte_flow_attr *attr,
+	       const struct rte_flow_item patterns[],
+	       const struct rte_flow_action actions[],
+	       struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	struct rte_flow *flow;
+	uint8_t i;
+
+	flow = fs_flow_allocate(attr, patterns, actions);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		flow->flows[i] = rte_flow_create(PORT_ID(sdev),
+				attr, patterns, actions, error);
+		if (flow->flows[i] == NULL) {
+			ERROR("Failed to create flow on sub_device %d",
+				i);
+			goto err;
+		}
+	}
+	TAILQ_INSERT_TAIL(&PRIV(dev)->flow_list, flow, next);
+	return flow;
+err:
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (flow->flows[i] != NULL)
+			rte_flow_destroy(PORT_ID(sdev),
+				flow->flows[i], error);
+	}
+	fs_flow_release(&flow);
+	return NULL;
+}
+
+static int
+fs_flow_destroy(struct rte_eth_dev *dev,
+		struct rte_flow *flow,
+		struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (flow == NULL) {
+		ERROR("Invalid flow");
+		return -EINVAL;
+	}
+	ret = 0;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		int local_ret;
+
+		if (flow->flows[i] == NULL)
+			continue;
+		local_ret = rte_flow_destroy(PORT_ID(sdev),
+				flow->flows[i], error);
+		if (local_ret) {
+			ERROR("Failed to destroy flow on sub_device %d: %d",
+					i, local_ret);
+			if (ret == 0)
+				ret = local_ret;
+		}
+	}
+	TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
+	rte_free(flow);
+	return ret;
+}
+
+static int
+fs_flow_flush(struct rte_eth_dev *dev,
+	      struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	struct rte_flow *flow;
+	void *tmp;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_flow_flush on sub_device %d", i);
+		ret = rte_flow_flush(PORT_ID(sdev), error);
+		if (ret) {
+			ERROR("Operation rte_flow_flush failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
+		fs_flow_release(&flow);
+	}
+	return 0;
+}
+
+static int
+fs_flow_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow,
+	      enum rte_flow_action_type type,
+	      void *arg,
+	      struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev != NULL) {
+		return rte_flow_query(PORT_ID(sdev),
+				flow->flows[SUB_ID(sdev)], type, arg, error);
+	}
+	WARN("No active sub_device to query about its flow");
+	return -1;
+}
+
+const struct rte_flow_ops fs_flow_ops = {
+	.validate = fs_flow_validate,
+	.create = fs_flow_create,
+	.destroy = fs_flow_destroy,
+	.flush = fs_flow_flush,
+	.query = fs_flow_query,
+};
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index e96b43c..57de33f 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -35,6 +35,7 @@
 #include <stdint.h>
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
+#include <rte_flow.h>
 
 #include "failsafe_private.h"
 
@@ -641,6 +642,33 @@ fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
 }
 
+static int
+fs_filter_ctrl(struct rte_eth_dev *dev,
+		enum rte_filter_type type,
+		enum rte_filter_op op,
+		void *arg)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (type == RTE_ETH_FILTER_GENERIC &&
+	    op == RTE_ETH_FILTER_GET) {
+		*(const void **)arg = &fs_flow_ops;
+		return 0;
+	}
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_filter_ctrl on sub_device %d", i);
+		ret = rte_eth_dev_filter_ctrl(PORT_ID(sdev), type, op, arg);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_filter_ctrl failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
 const struct eth_dev_ops failsafe_ops = {
 	.dev_configure = fs_dev_configure,
 	.dev_start = fs_dev_start,
@@ -668,4 +696,5 @@ const struct eth_dev_ops failsafe_ops = {
 	.mac_addr_remove = fs_mac_addr_remove,
 	.mac_addr_add = fs_mac_addr_add,
 	.mac_addr_set = fs_mac_addr_set,
+	.filter_ctrl = fs_filter_ctrl,
 };
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index c1aa24d..d20f9b6 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -34,6 +34,8 @@
 #ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
 #define _RTE_ETH_FAILSAFE_PRIVATE_H_
 
+#include <sys/queue.h>
+
 #include <rte_dev.h>
 #include <rte_pci.h>
 #include <rte_ethdev.h>
@@ -73,6 +75,16 @@ struct txq {
 	struct rte_eth_txq_info info;
 };
 
+struct rte_flow {
+	TAILQ_ENTRY(rte_flow) next;
+
+	/* sub_flows */
+	struct rte_flow *flows[FAILSAFE_MAX_ETHPORTS];
+
+	/* flow description for synchronization */
+	struct rte_flow_desc *fd;
+};
+
 enum dev_state {
 	DEV_UNDEFINED = 0,
 	DEV_PARSED,
@@ -92,6 +104,7 @@ struct sub_device {
 	};
 	struct rte_eth_dev *eth_dev;
 	struct rte_eth_dev_data data;
+	uint8_t sid;
 	/* Device state machine */
 	enum dev_state state;
 
@@ -111,6 +124,8 @@ struct fs_priv {
 	uint8_t subs_tx;
 	struct sub_device *subs;
 	uint8_t current_probed;
+	/* flow mapping */
+	TAILQ_HEAD(sub_flows, rte_flow) flow_list;
 	/* current number of mac_addr slots allocated. */
 	uint32_t nb_mac_addr;
 	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
@@ -158,6 +173,7 @@ int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
 
 extern const char pmd_failsafe_driver_name[];
 extern const struct eth_dev_ops failsafe_ops;
+extern const struct rte_flow_ops fs_flow_ops;
 extern uint64_t plug_in_poll;
 extern int mac_from_arg;
 
@@ -175,6 +191,10 @@ extern int mac_from_arg;
 #define PORT_ID(sdev) \
 	(ETH(sdev)->data->port_id)
 
+/* sdev: (struct sub_device *) */
+#define SUB_ID(sdev) \
+	(sdev->sid)
+
 /**
  * Stateful iterator construct over fail-safe sub-devices:
  * s:     (struct sub_device *), iterator
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/12] net/failsafe: support offload capabilities
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (8 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 09/12] net/failsafe: support flow API Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 11/12] net/failsafe: add fast burst functions Gaetan Rivet
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/failsafe_ops.c | 125 +++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 2 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 57de33f..0ed49b2 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -63,22 +63,143 @@ static struct rte_eth_dev_info default_infos = {
 		.nb_seg_max = UINT16_MAX,
 		.nb_mtu_seg_max = UINT16_MAX,
 	},
-	/* Set of understood capabilities */
-	.rx_offload_capa = 0x0,
+	/*
+	 * Set of capabilities that can be verified upon
+	 * configuring a sub-device.
+	 */
+	.rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_LRO,
 	.tx_offload_capa = 0x0,
 	.flow_type_rss_offloads = 0x0,
 };
 
+/**
+ * Check whether the offloads requested in the device configuration
+ * are all supported by a sub_device.
+ *
+ * @return
+ *   0: all requested capabilities are supported by the sub_device
+ *   positive value: at least this capability flag is not supported
+ */
+static int
+fs_port_offload_validate(struct rte_eth_dev *dev,
+			 struct sub_device *sdev)
+{
+	struct rte_eth_dev_info infos = {0};
+	struct rte_eth_conf *cf;
+	uint32_t cap;
+
+	cf = &dev->data->dev_conf;
+	SUBOPS(sdev, dev_infos_get)(ETH(sdev), &infos);
+	/* RX capabilities */
+	cap = infos.rx_offload_capa;
+	if (cf->rxmode.hw_vlan_strip &&
+	    ((cap & DEV_RX_OFFLOAD_VLAN_STRIP) == 0)) {
+		WARN("VLAN stripping offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_VLAN_STRIP;
+	}
+	if (cf->rxmode.hw_ip_checksum &&
+	    ((cap & (DEV_RX_OFFLOAD_IPV4_CKSUM |
+		     DEV_RX_OFFLOAD_UDP_CKSUM |
+		     DEV_RX_OFFLOAD_TCP_CKSUM)) !=
+	     (DEV_RX_OFFLOAD_IPV4_CKSUM |
+	      DEV_RX_OFFLOAD_UDP_CKSUM |
+	      DEV_RX_OFFLOAD_TCP_CKSUM))) {
+		WARN("IP checksum offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_IPV4_CKSUM |
+		       DEV_RX_OFFLOAD_UDP_CKSUM |
+		       DEV_RX_OFFLOAD_TCP_CKSUM;
+	}
+	if (cf->rxmode.enable_lro &&
+	    ((cap & DEV_RX_OFFLOAD_TCP_LRO) == 0)) {
+		WARN("TCP LRO offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_TCP_LRO;
+	}
+	if (cf->rxmode.hw_vlan_extend &&
+	    ((cap & DEV_RX_OFFLOAD_QINQ_STRIP) == 0)) {
+		WARN("Stacked VLAN stripping offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_QINQ_STRIP;
+	}
+	/* TX capabilities */
+	/* Nothing to do, no tx capa supported */
+	return 0;
+}
+
+/*
+ * Disable the dev_conf flag related to an offload capability flag
+ * within an ethdev configuration.
+ */
+static int
+fs_port_disable_offload(struct rte_eth_conf *cf,
+			uint32_t ol_cap)
+{
+	switch (ol_cap) {
+	case DEV_RX_OFFLOAD_VLAN_STRIP:
+		INFO("Disabling VLAN stripping offload");
+		cf->rxmode.hw_vlan_strip = 0;
+		break;
+	case DEV_RX_OFFLOAD_IPV4_CKSUM:
+	case DEV_RX_OFFLOAD_UDP_CKSUM:
+	case DEV_RX_OFFLOAD_TCP_CKSUM:
+	case (DEV_RX_OFFLOAD_IPV4_CKSUM |
+	      DEV_RX_OFFLOAD_UDP_CKSUM |
+	      DEV_RX_OFFLOAD_TCP_CKSUM):
+		INFO("Disabling IP checksum offload");
+		cf->rxmode.hw_ip_checksum = 0;
+		break;
+	case DEV_RX_OFFLOAD_TCP_LRO:
+		INFO("Disabling TCP LRO offload");
+		cf->rxmode.enable_lro = 0;
+		break;
+	case DEV_RX_OFFLOAD_QINQ_STRIP:
+		INFO("Disabling stacked VLAN stripping offload");
+		cf->rxmode.hw_vlan_extend = 0;
+		break;
+	default:
+		DEBUG("Unable to disable offload capability: %" PRIx32,
+		      ol_cap);
+		return -1;
+	}
+	return 0;
+}
+
 static int
 fs_dev_configure(struct rte_eth_dev *dev)
 {
 	struct sub_device *sdev;
 	uint8_t i;
+	int capa_flag;
 	int ret;
 
 	FOREACH_SUBDEV(sdev, i, dev) {
 		if (sdev->state != DEV_PROBED)
 			continue;
+		DEBUG("Checking capabilities for sub_device %d", i);
+		while ((capa_flag = fs_port_offload_validate(dev, sdev))) {
+			if (PRIV(dev)->state >= DEV_ACTIVE) {
+				ERROR("device already configured, cannot fix live configuration");
+				return -1;
+			}
+			ret = fs_port_disable_offload(&dev->data->dev_conf,
+						      capa_flag);
+			if (ret) {
+				ERROR("Unable to disable offload capability");
+				return ret;
+			}
+		}
+	}
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
 		DEBUG("Configuring sub-device %d", i);
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/12] net/failsafe: add fast burst functions
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (9 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 10/12] net/failsafe: support offload capabilities Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 15:40 ` [PATCH 12/12] net/failsafe: support device removal Gaetan Rivet
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/failsafe_private.h |   8 +++
 drivers/net/failsafe/failsafe_rxtx.c    | 117 ++++++++++++++++++++++++++------
 2 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d20f9b6..db7ce93 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -148,11 +148,18 @@ int failsafe_plugin_alarm_install(struct rte_eth_dev *dev);
 
 /* RX / TX */
 
+void set_burst_fn(struct rte_eth_dev *dev);
+
 uint16_t failsafe_rx_burst(void *rxq,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 uint16_t failsafe_tx_burst(void *txq,
 		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
+uint16_t failsafe_rx_burst_fast(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst_fast(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 /* ARGS */
 
 int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
@@ -306,6 +313,7 @@ fs_switch_dev(struct rte_eth_dev *dev)
 	} else {
 		return;
 	}
+	set_burst_fn(dev);
 }
 
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index a45b4e5..796a9ad 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -36,11 +36,53 @@
 
 #include "failsafe_private.h"
 
-/*
- * TODO: write fast version,
- * without additional checks, to be activated once
- * everything has been verified to comply.
- */
+static inline int
+sdev_rx_unsafe(struct sub_device *sdev)
+{
+	return (ETH(sdev) == NULL) ||
+		(ETH(sdev)->rx_pkt_burst == NULL) ||
+		(sdev->state != DEV_STARTED);
+}
+
+static inline int
+sdev_tx_unsafe(struct sub_device *sdev)
+{
+	return (sdev == NULL) ||
+		(ETH(sdev) == NULL) ||
+		(ETH(sdev)->tx_pkt_burst == NULL) ||
+		(sdev->state != DEV_STARTED);
+}
+
+void
+set_burst_fn(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int need_safe;
+	int safe_set;
+
+	need_safe = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		need_safe |= sdev_rx_unsafe(sdev);
+	safe_set = (dev->rx_pkt_burst == &failsafe_rx_burst);
+	if (need_safe && !safe_set) {
+		DEBUG("Using safe RX bursts");
+		dev->rx_pkt_burst = &failsafe_rx_burst;
+	} else if (!need_safe && safe_set) {
+		DEBUG("Using fast RX bursts");
+		dev->rx_pkt_burst = &failsafe_rx_burst_fast;
+	}
+	need_safe = sdev_tx_unsafe(TX_SUBDEV(dev));
+	safe_set = (dev->tx_pkt_burst == &failsafe_tx_burst);
+	if (need_safe && !safe_set) {
+		DEBUG("Using safe TX bursts");
+		dev->tx_pkt_burst = &failsafe_tx_burst;
+	} else if (!need_safe && safe_set) {
+		DEBUG("Using fast TX bursts");
+		dev->tx_pkt_burst = &failsafe_tx_burst_fast;
+	}
+}
+
 uint16_t
 failsafe_rx_burst(void *queue,
 		  struct rte_mbuf **rx_pkts,
@@ -63,11 +105,7 @@ failsafe_rx_burst(void *queue,
 		if (i == priv->subs_tail)
 			i = priv->subs_head;
 		sdev = &priv->subs[i];
-		if (unlikely(ETH(sdev) == NULL))
-			continue;
-		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
-			continue;
-		if (unlikely(sdev->state != DEV_STARTED))
+		if (unlikely(sdev_rx_unsafe(sdev)))
 			continue;
 		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
 		nb_rx = ETH(sdev)->
@@ -80,11 +118,39 @@ failsafe_rx_burst(void *queue,
 	return 0;
 }
 
-/*
- * TODO: write fast version,
- * without additional checks, to be activated once
- * everything has been verified to comply.
- */
+uint16_t
+failsafe_rx_burst_fast(void *queue,
+			 struct rte_mbuf **rx_pkts,
+			 uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
 uint16_t
 failsafe_tx_burst(void *queue,
 		  struct rte_mbuf **tx_pkts,
@@ -96,12 +162,23 @@ failsafe_tx_burst(void *queue,
 
 	txq = queue;
 	sdev = TX_SUBDEV(txq->priv->dev);
-	if (unlikely(sdev == NULL))
-		return 0;
-	if (unlikely(ETH(sdev) == NULL))
-		return 0;
-	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+	if (unlikely(sdev_tx_unsafe(sdev)))
 		return 0;
 	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
 	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
 }
+
+uint16_t
+failsafe_tx_burst_fast(void *queue,
+			 struct rte_mbuf **tx_pkts,
+			 uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 12/12] net/failsafe: support device removal
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (10 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 11/12] net/failsafe: add fast burst functions Gaetan Rivet
@ 2017-03-03 15:40 ` Gaetan Rivet
  2017-03-03 16:14 ` [PATCH 00/12] introduce fail-safe PMD Bruce Richardson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-03 15:40 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/failsafe_args.c    | 22 ++++++++++++
 drivers/net/failsafe/failsafe_eal.c     |  2 ++
 drivers/net/failsafe/failsafe_ether.c   | 62 +++++++++++++++++++++++++++++++--
 drivers/net/failsafe/failsafe_ops.c     | 17 +++++++++
 drivers/net/failsafe/failsafe_private.h |  7 ++++
 5 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index 839831f..62033c4 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -462,6 +462,26 @@ failsafe_args_count_subdevice(struct rte_eth_dev *dev,
 				dev, params);
 }
 
+static int
+parse_sub_device(struct sub_device *sdev)
+{
+	struct rte_devargs *da;
+	char params[DEVARGS_MAXLEN] = "";
+
+	da = &sdev->devargs;
+	if (da->type == RTE_DEVTYPE_VIRTUAL)
+		snprintf(params, sizeof(params) - 1,
+			"%s,%s", da->virt.drv_name, da->args);
+	else
+		snprintf(params, sizeof(params) - 1,
+			PCI_PRI_FMT ",%s",
+			da->pci.addr.domain, da->pci.addr.bus,
+			da->pci.addr.devid, da->pci.addr.function,
+			da->args);
+
+	return parse_device(sdev, params);
+}
+
 int
 failsafe_args_parse_subs(struct rte_eth_dev *dev)
 {
@@ -474,6 +494,8 @@ failsafe_args_parse_subs(struct rte_eth_dev *dev)
 			continue;
 		if (sdev->cmdline)
 			ret = execute_cmd(sdev, sdev->cmdline);
+		else
+			ret = parse_sub_device(sdev);
 		if (ret == 0)
 			sdev->state = DEV_PARSED;
 	}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index 9817fc9..8bb8d45 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -140,6 +140,7 @@ dev_init(struct rte_eth_dev *dev)
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
 			SUB_ID(sdev) = i;
+			sdev->fs_dev = dev;
 			sdev->state = DEV_PROBED;
 		}
 	}
@@ -191,6 +192,7 @@ pci_probe(struct rte_eth_dev *dev)
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
 			SUB_ID(sdev) = i;
+			sdev->fs_dev = dev;
 			sdev->state = DEV_PROBED;
 		}
 	}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 439e02c..2f05de5 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -33,6 +33,7 @@
 
 #include <unistd.h>
 
+#include <rte_alarm.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 
@@ -256,6 +257,43 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static void
+fs_dev_remove(void *arg)
+{
+	struct sub_device *sdev = arg;
+	struct rte_devargs *da;
+	struct rte_pci_device *pdev;
+	enum dev_state state;
+
+	state = sdev->state;
+	sdev->state = DEV_UNDEFINED;
+	fs_switch_dev(sdev->fs_dev);
+	switch (state) {
+	case DEV_STARTED:
+		rte_eth_dev_stop(PORT_ID(sdev));
+		/* fallthrough */
+	case DEV_ACTIVE:
+		rte_eth_dev_close(PORT_ID(sdev));
+		/* fallthrough */
+	case DEV_PROBED:
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			rte_eal_pci_detach_all_drivers(pdev);
+		} else if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			rte_eal_vdev_uninit(da->virt.drv_name);
+		}
+		sdev->eth_dev->state = RTE_ETH_DEV_UNUSED;
+		/* fallthrough */
+	case DEV_SCANNED:
+	case DEV_PARSED:
+	case DEV_UNDEFINED:
+		/* the end */
+		break;
+	}
+	failsafe_plugin_alarm_install(sdev->fs_dev);
+}
+
 int
 failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 {
@@ -291,12 +329,13 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 			if (ret) {
 				ERROR("Could not apply configuration to sub_device %d",
 				      i);
-				/* TODO: disable device */
+				fs_dev_remove(sdev);
 				return ret;
 			}
 		}
 	}
-	/* If new devices have been configured, check if
+	/*
+	 * If new devices have been configured, check if
 	 * the link state has changed.
 	 */
 	if (inactive)
@@ -308,3 +347,22 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 		return ret;
 	return 0;
 }
+
+void
+failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
+				enum rte_eth_event_type type,
+				void *arg)
+{
+	if (type != RTE_ETH_EVENT_INTR_RMV) {
+		ERROR("Incorrect event");
+		return;
+	}
+	/*
+	 * Defer the removal: done synchronously, the sub-PMD would try
+	 * to unregister this very callback from its own calling context.
+	 */
+	if (rte_eal_alarm_set(FAILSAFE_PLUGOUT_ASYNC_RESCHED_US,
+			      fs_dev_remove,
+			      arg))
+		ERROR("Could not set up deferred sub_device removal");
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 0ed49b2..c2524c7 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -198,8 +198,19 @@ fs_dev_configure(struct rte_eth_dev *dev)
 		}
 	}
 	FOREACH_SUBDEV(sdev, i, dev) {
+		int rmv_interrupt = 0;
+
 		if (sdev->state != DEV_PROBED)
 			continue;
+
+		rmv_interrupt = ETH(sdev)->data->dev_flags &
+				RTE_ETH_DEV_INTR_RMV;
+		if (rmv_interrupt) {
+			DEBUG("Enabling RMV interrupts for sub_device %d", i);
+			dev->data->dev_conf.intr_conf.rmv = 1;
+		} else {
+			DEBUG("sub_device %d does not support RMV event", i);
+		}
 		DEBUG("Configuring sub-device %d", i);
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
@@ -209,6 +220,12 @@ fs_dev_configure(struct rte_eth_dev *dev)
 			ERROR("Could not configure sub_device %d", i);
 			return ret;
 		}
+		if (rmv_interrupt)
+			rte_eth_dev_callback_register(PORT_ID(sdev),
+					RTE_ETH_EVENT_INTR_RMV,
+					failsafe_eth_rmv_event_callback,
+					sdev);
+		dev->data->dev_conf.intr_conf.rmv = 0;
 		sdev->state = DEV_ACTIVE;
 	}
 	if (PRIV(dev)->state < DEV_ACTIVE)
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index db7ce93..fcaccc3 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -53,6 +53,7 @@
 	""
 
 #define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+#define FAILSAFE_PLUGOUT_ASYNC_RESCHED_US 100000
 
 #define FAILSAFE_MAX_ETHPORTS (RTE_MAX_ETHPORTS - 1)
 #define FAILSAFE_MAX_ETHADDR 128
@@ -110,6 +111,9 @@ struct sub_device {
 
 	/* Some device are defined as a command line */
 	char *cmdline;
+
+	/* fail-safe device backreference */
+	struct rte_eth_dev *fs_dev;
 };
 
 struct fs_priv {
@@ -175,6 +179,9 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev);
 /* ETH_DEV */
 
 int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+void failsafe_eth_rmv_event_callback(uint8_t port_id,
+				     enum rte_eth_event_type type,
+				     void *arg);
 
 /* GLOBALS */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/12] introduce fail-safe PMD
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (11 preceding siblings ...)
  2017-03-03 15:40 ` [PATCH 12/12] net/failsafe: support device removal Gaetan Rivet
@ 2017-03-03 16:14 ` Bruce Richardson
  2017-03-06 13:53   ` Gaëtan Rivet
  2017-03-03 17:27 ` Stephen Hemminger
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
  14 siblings, 1 reply; 49+ messages in thread
From: Bruce Richardson @ 2017-03-03 16:14 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev, Thomas Monjalon, Adrien Mazarguil, Jingjing Wu

On Fri, Mar 03, 2017 at 04:40:22PM +0100, Gaetan Rivet wrote:
> This PMD intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back so that
> existing applications do not need to be modified to benefit from true
> hot-plugging support.
> 
> The stacked PMD approach shares many similarities with the bonding PMD but
> with a different purpose. While bonding provides the ability to group
> several links into a single logical device for enhanced throughput and
> supports fail-over at link level, this one manages the sudden disappearance
> of the underlying device; it guarantees applications face a valid device in
> working order at all times.
> 
> Each fail-safe instance is configured to run atop one or several
> devices, with one defined as the preferred device. Hot-plug events are
> handled on all of them, and Tx is always directed to the preferred device
> if present or to the next available failover device (Rx is always performed
> on all devices for simplicity).
> 
> Moreover, the configured slaves (preferred or failover) do not need to be
> present at initialization time and may appear later.
> 
> Slaves configuration is continuously synchronized with that of the virtual
> device, which exposes their common set of capabilities to the application.
> Failure to apply the current configuration state to a slave for any reason
> simply reschedules its initialization.
> 
> This series depends on the series
> [PATCH 0/4] clarify eth_dev state management
> [PATCH 0/5] add device removal event
> 
Hi,

this looks like an interesting PMD, and I like the wrapper approach. However,
why duplicate the functionality of the bonding device in another device
driver? Is it not possible to have this driver wrap individual devices
and then have the bonding driver use those wrapped devices to present an
omni-present device? It seems strange to have support for grouping
devices together in two different drivers - one should leverage the
other.

Alternatively, should this be merged into the bonding driver or replace
it?

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/12] introduce fail-safe PMD
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (12 preceding siblings ...)
  2017-03-03 16:14 ` [PATCH 00/12] introduce fail-safe PMD Bruce Richardson
@ 2017-03-03 17:27 ` Stephen Hemminger
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
  14 siblings, 0 replies; 49+ messages in thread
From: Stephen Hemminger @ 2017-03-03 17:27 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev, Thomas Monjalon, Adrien Mazarguil, Jingjing Wu

On Fri,  3 Mar 2017 16:40:22 +0100
Gaetan Rivet <gaetan.rivet@6wind.com> wrote:

> This PMD intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back so that
> existing applications do not need to be modified to benefit from true
> hot-plugging support.
> 
> The stacked PMD approach shares many similarities with the bonding PMD but
> with a different purpose. While bonding provides the ability to group
> several links into a single logical device for enhanced throughput and
> supports fail-over at link level, this one manages the sudden disappearance
> of the underlying device; it guarantees applications face a valid device in
> working order at all times.
> 
> Each fail-safe instance is configured to run atop one or several
> devices, with one defined as the preferred device. Hot-plug events are
> handled on all of them, and Tx is always directed to the preferred device
> if present or to the next available failover device (Rx is always performed
> on all devices for simplicity).
> 
> Moreover, the configured slaves (preferred or failover) do not need to be
> present at initialization time and may appear later.
> 
> Slaves configuration is continuously synchronized with that of the virtual
> device, which exposes their common set of capabilities to the application.
> Failure to apply the current configuration state to a slave for any reason
> simply reschedules its initialization.
> 

This looks interesting for SR-IOV VF support on Hyper-V.
It would be good if the driver worked in interrupt mode as well.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/12] ethdev: save VLAN filter setting
  2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
@ 2017-03-03 17:33   ` Stephen Hemminger
  0 siblings, 0 replies; 49+ messages in thread
From: Stephen Hemminger @ 2017-03-03 17:33 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

On Fri,  3 Mar 2017 16:40:23 +0100
Gaetan Rivet <gaetan.rivet@6wind.com> wrote:

> Other configuration items (i.e. MAC addresses) are stored within
> rte_eth_dev_data, but not this one.
> 
> Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>

Maybe it is time to introduce a generic set of bit array operations like
the kernel set/clear/test bit functions. There are architecture-specific
instructions for this (it doesn't matter for this case), and there may
even be GCC builtin support.
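
For illustration only, such helpers over a uint64_t bitmap (the storage
already used by rte_vlan_filter_conf) could look as follows; the names
are hypothetical and not part of this series:

#include <stdint.h>

/* Set bit 'nr' in a uint64_t-based bit array. */
static inline void
fs_bit_set(uint64_t *bm, unsigned int nr)
{
	bm[nr / 64] |= UINT64_C(1) << (nr % 64);
}

/* Clear bit 'nr' in a uint64_t-based bit array. */
static inline void
fs_bit_clear(uint64_t *bm, unsigned int nr)
{
	bm[nr / 64] &= ~(UINT64_C(1) << (nr % 64));
}

/* Return a non-zero value if bit 'nr' is set. */
static inline int
fs_bit_test(const uint64_t *bm, unsigned int nr)
{
	return (bm[nr / 64] >> (nr % 64)) & 1;
}

The VLAN re-configuration loop in eth_dev_conf_apply() could then use
such a test helper instead of open-coded bit tricks.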

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/12] ethdev: add deferred intermediate device state
  2017-03-03 15:40 ` [PATCH 03/12] ethdev: add deferred intermediate device state Gaetan Rivet
@ 2017-03-03 17:34   ` Stephen Hemminger
  0 siblings, 0 replies; 49+ messages in thread
From: Stephen Hemminger @ 2017-03-03 17:34 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

On Fri,  3 Mar 2017 16:40:25 +0100
Gaetan Rivet <gaetan.rivet@6wind.com> wrote:

> This device state means that the device is managed externally, by
> whichever party has set this state (PMD or application).
> 
> Note: this new device state is only an information. The related device
> structure and operators are still valid and can be used normally.
> 
> It is however made private by device management helpers within ethdev,
> making the device invisible to applications.
> ---
>  lib/librte_ether/rte_ethdev.c | 3 ++-
>  lib/librte_ether/rte_ethdev.h | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)

Might be better to introduce RFC2863 operational states.
This is something management tools might want.
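
For reference, RFC 2863 defines the following ifOperStatus values, which
such a state could mirror; the enum below is only a sketch, not an
existing DPDK definition:

enum rte_eth_oper_state {
	RTE_ETH_OPER_UP = 1,           /* ready to pass packets */
	RTE_ETH_OPER_DOWN,             /* not ready */
	RTE_ETH_OPER_TESTING,          /* in some test mode */
	RTE_ETH_OPER_UNKNOWN,          /* state cannot be determined */
	RTE_ETH_OPER_DORMANT,          /* waiting for an external event */
	RTE_ETH_OPER_NOT_PRESENT,      /* some component is missing */
	RTE_ETH_OPER_LOWER_LAYER_DOWN, /* down due to a lower layer */
};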

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/12] net/failsafe: add fail-safe PMD
  2017-03-03 15:40 ` [PATCH 06/12] net/failsafe: add fail-safe PMD Gaetan Rivet
@ 2017-03-03 17:38   ` Stephen Hemminger
  2017-03-06 14:19     ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Stephen Hemminger @ 2017-03-03 17:38 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

On Fri,  3 Mar 2017 16:40:28 +0100
Gaetan Rivet <gaetan.rivet@6wind.com> wrote:

> +
> +static struct rte_eth_dev *
> +pci_addr_to_eth_dev(struct rte_pci_addr *addr)
> +{
> +	uint8_t pid;
> +
> +	if (addr == NULL)
> +		return NULL;
> +	for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> +		struct rte_pci_addr *addr2;
> +		struct rte_eth_dev *edev;
> +
> +		edev = &rte_eth_devices[pid];
> +		if (edev->device == NULL ||
> +		    edev->device->devargs == NULL)
> +			continue;
> +		addr2 = &edev->device->devargs->pci.addr;
> +		if (rte_eal_compare_pci_addr(addr, addr2) == 0)
> +			return edev;
> +	}
> +	return NULL;
> +}
> +
> +static int
> +pci_scan_one(struct sub_device *sdev)
> +{
> +	struct rte_devargs *da;
> +	char dirname[PATH_MAX];
> +
> +	da = &sdev->devargs;
> +	snprintf(dirname, sizeof(dirname),
> +		"%s/" PCI_PRI_FMT,
> +		pci_get_sysfs_path(),
> +		da->pci.addr.domain,
> +		da->pci.addr.bus,
> +		da->pci.addr.devid,
> +		da->pci.addr.function);
> +	errno = 0;
> +	if (rte_eal_pci_parse_sysfs_entry(&sdev->pci_device,
> +		dirname, &da->pci.addr) < 0) {
> +		if (errno == ENOENT) {
> +			DEBUG("Could not scan requested device " PCI_PRI_FMT,
> +				da->pci.addr.domain,
> +				da->pci.addr.bus,
> +				da->pci.addr.devid,
> +				da->pci.addr.function);
> +		} else {
> +			ERROR("Error while scanning sysfs entry %s",
> +					dirname);
> +			return -1;
> +		}
> +	} else {
> +		sdev->state = DEV_SCANNED;
> +	}
> +	return 0;
> +}

This needs to be generic and in EAL.
A bigger problem is that it is PCI-specific and therefore won't work in
environments where devices are attached to different buses (SoC and Hyper-V).

Please rework to play well with the bus model.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/12] introduce fail-safe PMD
  2017-03-03 16:14 ` [PATCH 00/12] introduce fail-safe PMD Bruce Richardson
@ 2017-03-06 13:53   ` Gaëtan Rivet
  0 siblings, 0 replies; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-06 13:53 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Thomas Monjalon, Adrien Mazarguil, Jingjing Wu

On Fri, Mar 03, 2017 at 04:14:20PM +0000, Bruce Richardson wrote:
>On Fri, Mar 03, 2017 at 04:40:22PM +0100, Gaetan Rivet wrote:
>> This PMD intercepts and manages Ethernet device removal events issued by
>> slave PMDs and re-initializes them transparently when brought back so that
>> existing applications do not need to be modified to benefit from true
>> hot-plugging support.
>>
>> The stacked PMD approach shares many similarities with the bonding PMD but
>> with a different purpose. While bonding provides the ability to group
>> several links into a single logical device for enhanced throughput and
>> supports fail-over at link level, this one manages the sudden disappearance
>> of the underlying device; it guarantees applications face a valid device in
>> working order at all times.
>>
>> Each fail-safe instance is configured to run atop one or several
>> devices, with one defined as the preferred device. Hot-plug events are
>> handled on all of them, and Tx is always directed to the preferred device
>> if present or to the next available failover device (Rx is always performed
>> on all devices for simplicity).
>>
>> Moreover, the configured slaves (preferred or failover) do not need to be
>> present at initialization time and may appear later.
>>
>> Slaves configuration is continuously synchronized with that of the virtual
>> device, which exposes their common set of capabilities to the application.
>> Failure to apply the current configuration state to a slave for any reason
>> simply reschedules its initialization.
>>
>> This series depends on the series
>> [PATCH 0/4] clarify eth_dev state management
>> [PATCH 0/5] add device removal event
>>
>Hi,
>

Hi Bruce,

>this looks like an interesting PMD, and I like the wrapper approach. However,
>why duplicate the functionality of the bonding device in another device
>driver? Is it not possible to have this driver wrap individual devices
>and then have the bonding driver use those wrapped devices to present an
>omni-present device? It seems strange to have support for grouping
>devices together in two different drivers - one should leverage the
>other.
>

Yes, there appears to be a certain amount of overlap between these two
PMDs from the above description. We actually considered modifying
bonding at first, but found it to be fundamentally incompatible:

- By design, with bonding, applications are aware of slave PMDs and can 
  use them directly (even if they shouldn't, depending on the 
  aggregation mode). If they were supported, hot-plug events would have 
  to be managed by the application as well.

- With fail-safe, slaves are provided as PMD parameters and are fully owned
  by the parent instance, which takes care of their entire initialization
  and provides transparent support for hot-plug events.

Their purposes also differ:

- Bonding implements various modes of operation for link aggregation (round
  robin, active backup, LACP...) in the same fashion as the Linux bond
  driver.

- Fail-safe handles hot-plug events so applications do not have to.

To answer your second question, both are stackable: the fail-safe design
aims at being as transparent as possible, meaning that it should make no
difference to applications whether they use it directly or within a bond.
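
As a purely illustrative sketch (the exact vdev name and parameters
accepted by the bonding PMD may differ depending on the options in use),
the hot-pluggable port can be wrapped by a fail-safe instance whose port
is then enslaved to a bond:

  --vdev 'net_failsafe0,dev(84:00.0)' \
  --vdev 'net_bonding0,mode=1,slave=net_failsafe0'

The bond only ever sees the fail-safe port; the removal of the
underlying PCI device is handled below it.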

>Alternatively, should this be merged into the bonding driver or replace
>it?
>

Applications (or users) that need their combined functionality benefit 
from greater flexibility this way, so I think having them both in the 
tree makes sense.

>Regards,
>/Bruce

Regards
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/12] net/failsafe: add fail-safe PMD
  2017-03-03 17:38   ` Stephen Hemminger
@ 2017-03-06 14:19     ` Gaëtan Rivet
  0 siblings, 0 replies; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-06 14:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Fri, Mar 03, 2017 at 09:38:11AM -0800, Stephen Hemminger wrote:
>On Fri,  3 Mar 2017 16:40:28 +0100
>Gaetan Rivet <gaetan.rivet@6wind.com> wrote:
>
>> +
>> +static struct rte_eth_dev *
>> +pci_addr_to_eth_dev(struct rte_pci_addr *addr)
>> +{
>> +	uint8_t pid;
>> +
>> +	if (addr == NULL)
>> +		return NULL;
>> +	for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
>> +		struct rte_pci_addr *addr2;
>> +		struct rte_eth_dev *edev;
>> +
>> +		edev = &rte_eth_devices[pid];
>> +		if (edev->device == NULL ||
>> +		    edev->device->devargs == NULL)
>> +			continue;
>> +		addr2 = &edev->device->devargs->pci.addr;
>> +		if (rte_eal_compare_pci_addr(addr, addr2) == 0)
>> +			return edev;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int
>> +pci_scan_one(struct sub_device *sdev)
>> +{
>> +	struct rte_devargs *da;
>> +	char dirname[PATH_MAX];
>> +
>> +	da = &sdev->devargs;
>> +	snprintf(dirname, sizeof(dirname),
>> +		"%s/" PCI_PRI_FMT,
>> +		pci_get_sysfs_path(),
>> +		da->pci.addr.domain,
>> +		da->pci.addr.bus,
>> +		da->pci.addr.devid,
>> +		da->pci.addr.function);
>> +	errno = 0;
>> +	if (rte_eal_pci_parse_sysfs_entry(&sdev->pci_device,
>> +		dirname, &da->pci.addr) < 0) {
>> +		if (errno == ENOENT) {
>> +			DEBUG("Could not scan requested device " PCI_PRI_FMT,
>> +				da->pci.addr.domain,
>> +				da->pci.addr.bus,
>> +				da->pci.addr.devid,
>> +				da->pci.addr.function);
>> +		} else {
>> +			ERROR("Error while scanning sysfs entry %s",
>> +					dirname);
>> +			return -1;
>> +		}
>> +	} else {
>> +		sdev->state = DEV_SCANNED;
>> +	}
>> +	return 0;
>> +}
>
>This needs to be generic and in EAL.
>A bigger problem is that it is PCI specific and therefore won't work in
>environments where devices are attached to different buses (SoC and Hyper-V).
>
>Please rework to play well with bus model.

Yes I agree, I planned to do so in any case for a V2.

I saw a few commits from Jan Blunck about a vdev bus[0].
Do you have other series regarding buses in mind that I should rebase
upon for better support?

[0]: http://dpdk.org/ml/archives/dev/2017-March/059423.html

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
                   ` (13 preceding siblings ...)
  2017-03-03 17:27 ` Stephen Hemminger
@ 2017-03-08 15:15 ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 01/13] ethdev: save VLAN filter setting Gaetan Rivet
                     ` (15 more replies)
  14 siblings, 16 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, Adrien Mazarguil

This PMD intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back so that
existing applications do not need to be modified to benefit from true
hot-plugging support.

The stacked PMD approach shares many similarities with the bonding PMD but
with a different purpose. While bonding provides the ability to group
several links into a single logical device for enhanced throughput and
supports fail-over at link level, this one manages the sudden disappearance
of the underlying device; it guarantees applications face a valid device in
working order at all times.

Each fail-safe instance is configured to run atop one or several
devices, with one defined as the preferred device. Hot-plug events are
handled on all of them, and Tx is always directed to the preferred device
if present or to the next available failover device (Rx is always performed
on all devices for simplicity).

Moreover, the configured slaves (preferred or failover) do not need to be
present at initialization time and may appear later.

Slaves configuration is continuously synchronized with that of the virtual
device, which exposes their common set of capabilities to the application.
Failure to apply the current configuration state to a slave for any reason
simply reschedules its initialization.

This series depends on the series
[PATCH 0/4] clarify eth_dev state management
[PATCH 0/5] add device removal event

v1 --> v2:

  - Wrote documentation
  - Fixed commit logs, signed-off-by
  - Added LSC event support
  - A few minor fixes

Gaetan Rivet (13):
  ethdev: save VLAN filter setting
  ethdev: add flow API rule copy function
  ethdev: add deferred intermediate device state
  pci: expose device detach routine
  pci: expose parse and probe routines
  net/failsafe: add fail-safe PMD
  net/failsafe: add plug-in support
  net/failsafe: add flexible device definition
  net/failsafe: support flow API
  net/failsafe: support offload capabilities
  net/failsafe: add fast burst functions
  net/failsafe: support device removal
  net/failsafe: support link status change event

 MAINTAINERS                                     |   5 +
 app/test-pmd/config.c                           | 263 ++-----
 app/test-pmd/testpmd.h                          |   5 +-
 config/common_base                              |   5 +
 doc/guides/nics/fail_safe.rst                   | 182 +++++
 doc/guides/nics/features/failsafe.ini           |  32 +
 doc/guides/nics/index.rst                       |   1 +
 drivers/net/Makefile                            |   1 +
 drivers/net/failsafe/Makefile                   |  74 ++
 drivers/net/failsafe/failsafe.c                 | 299 ++++++++
 drivers/net/failsafe/failsafe_args.c            | 503 ++++++++++++++
 drivers/net/failsafe/failsafe_eal.c             | 300 ++++++++
 drivers/net/failsafe/failsafe_ether.c           | 394 +++++++++++
 drivers/net/failsafe/failsafe_flow.c            | 230 +++++++
 drivers/net/failsafe/failsafe_ops.c             | 867 ++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h         | 329 +++++++++
 drivers/net/failsafe/failsafe_rxtx.c            | 184 +++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
 lib/librte_eal/common/eal_common_pci.c          |  16 +-
 lib/librte_eal/common/include/rte_pci.h         |  40 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c           |  39 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   7 +
 lib/librte_ether/rte_ethdev.c                   |  22 +-
 lib/librte_ether/rte_ethdev.h                   |  11 +
 lib/librte_ether/rte_flow.c                     | 283 ++++++++
 lib/librte_ether/rte_flow.h                     |  59 ++
 mk/rte.app.mk                                   |   1 +
 27 files changed, 3910 insertions(+), 248 deletions(-)
 create mode 100644 doc/guides/nics/fail_safe.rst
 create mode 100644 doc/guides/nics/features/failsafe.ini
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ether.c
 create mode 100644 drivers/net/failsafe/failsafe_flow.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c

-- 
2.1.4

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v2 01/13] ethdev: save VLAN filter setting
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 02/13] ethdev: add flow API rule copy function Gaetan Rivet
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Other configuration items (e.g. MAC addresses) are stored within
rte_eth_dev_data, but not this one.
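
As an illustration only (not part of this patch), a layer sitting on top
of ethdev, for instance a PMD managing other ports, could later replay
the saved configuration along these lines:

  #include <rte_ethdev.h>

  /* Sketch: re-apply the VLAN filter saved in vlan_filter_conf. */
  static void
  replay_vlan_filter(uint8_t port_id)
  {
          struct rte_eth_dev *dev = &rte_eth_devices[port_id];
          struct rte_vlan_filter_conf *vfc = &dev->data->vlan_filter_conf;
          uint16_t vlan_id;

          for (vlan_id = 0; vlan_id < 4096; vlan_id++) {
                  uint64_t mask = UINT64_C(1) << (vlan_id % 64);

                  if (vfc->ids[vlan_id / 64] & mask)
                          rte_eth_dev_vlan_filter(port_id, vlan_id, 1);
          }
  }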

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_ether/rte_ethdev.c | 19 ++++++++++++++++++-
 lib/librte_ether/rte_ethdev.h | 10 ++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6c4b796..61a63b7 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1764,6 +1764,7 @@ int
 rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
 {
 	struct rte_eth_dev *dev;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
 	dev = &rte_eth_devices[port_id];
@@ -1779,7 +1780,23 @@ rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
 	}
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vlan_filter_set, -ENOTSUP);
 
-	return (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
+	ret = (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on);
+	if (ret == 0) {
+		struct rte_vlan_filter_conf *vfc;
+		int vidx;
+		int vbit;
+
+		vfc = &dev->data->vlan_filter_conf;
+		vidx = vlan_id / 64;
+		vbit = vlan_id % 64;
+
+		if (on)
+			vfc->ids[vidx] |= UINT64_C(1) << vbit;
+		else
+			vfc->ids[vidx] &= ~(UINT64_C(1) << vbit);
+	}
+
+	return ret;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ed99660..f01d140 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -375,6 +375,14 @@ enum rte_vlan_type {
 };
 
 /**
+ * A structure used to describe a vlan filter.
+ * If the bit corresponding to a VID is set, such VID is on.
+ */
+struct rte_vlan_filter_conf {
+	uint64_t ids[64];
+};
+
+/**
  * A structure used to configure the Receive Side Scaling (RSS) feature
  * of an Ethernet port.
  * If not NULL, the *rss_key* pointer of the *rss_conf* structure points
@@ -1716,6 +1724,8 @@ struct rte_eth_dev_data {
 	enum rte_kernel_driver kdrv;    /**< Kernel driver passthrough */
 	int numa_node;  /**< NUMA node connection */
 	const char *drv_name;   /**< Driver name */
+	struct rte_vlan_filter_conf vlan_filter_conf;
+	/**< VLAN filter configuration. */
 };
 
 /** Device supports hotplug detach */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 02/13] ethdev: add flow API rule copy function
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 01/13] ethdev: save VLAN filter setting Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 03/13] ethdev: add deferred intermediate device state Gaetan Rivet
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Move this helper from the testpmd app to offer it alongside the flow
API.

This allows PMDs and applications to save flow rules in their generic
format for later processing. This is useful when rules cannot be applied
immediately, such as when the device is not properly initialized.
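
For instance (sketch only), a rule submitted while the target port is
not ready yet can be stored in generic form and replayed later:

  #include <rte_flow.h>

  /* Sketch: keep a generic copy of a rule for later replay. */
  static struct rte_flow_desc *
  store_rule(const struct rte_flow_attr *attr,
             const struct rte_flow_item *pattern,
             const struct rte_flow_action *actions)
  {
          /* NULL: fall back on the standard calloc(). */
          return rte_flow_copy(attr, pattern, actions, NULL);
  }

  /* Sketch: replay a stored rule once the port is ready. */
  static struct rte_flow *
  replay_rule(uint8_t port_id, const struct rte_flow_desc *fd,
              struct rte_flow_error *error)
  {
          return rte_flow_create(port_id, &fd->attr,
                                 fd->items, fd->actions, error);
  }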

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 app/test-pmd/config.c       | 263 +++++++---------------------------------
 app/test-pmd/testpmd.h      |   5 +-
 lib/librte_ether/rte_flow.c | 283 ++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_flow.h |  59 +++++++++
 4 files changed, 387 insertions(+), 223 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 54e38c4..f20e258 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -932,208 +932,6 @@ port_mtu_set(portid_t port_id, uint16_t mtu)
 	printf("Set MTU failed. diag=%d\n", diag);
 }
 
-/* Generic flow management functions. */
-
-/** Generate flow_item[] entry. */
-#define MK_FLOW_ITEM(t, s) \
-	[RTE_FLOW_ITEM_TYPE_ ## t] = { \
-		.name = # t, \
-		.size = s, \
-	}
-
-/** Information about known flow pattern items. */
-static const struct {
-	const char *name;
-	size_t size;
-} flow_item[] = {
-	MK_FLOW_ITEM(END, 0),
-	MK_FLOW_ITEM(VOID, 0),
-	MK_FLOW_ITEM(INVERT, 0),
-	MK_FLOW_ITEM(ANY, sizeof(struct rte_flow_item_any)),
-	MK_FLOW_ITEM(PF, 0),
-	MK_FLOW_ITEM(VF, sizeof(struct rte_flow_item_vf)),
-	MK_FLOW_ITEM(PORT, sizeof(struct rte_flow_item_port)),
-	MK_FLOW_ITEM(RAW, sizeof(struct rte_flow_item_raw)), /* +pattern[] */
-	MK_FLOW_ITEM(ETH, sizeof(struct rte_flow_item_eth)),
-	MK_FLOW_ITEM(VLAN, sizeof(struct rte_flow_item_vlan)),
-	MK_FLOW_ITEM(IPV4, sizeof(struct rte_flow_item_ipv4)),
-	MK_FLOW_ITEM(IPV6, sizeof(struct rte_flow_item_ipv6)),
-	MK_FLOW_ITEM(ICMP, sizeof(struct rte_flow_item_icmp)),
-	MK_FLOW_ITEM(UDP, sizeof(struct rte_flow_item_udp)),
-	MK_FLOW_ITEM(TCP, sizeof(struct rte_flow_item_tcp)),
-	MK_FLOW_ITEM(SCTP, sizeof(struct rte_flow_item_sctp)),
-	MK_FLOW_ITEM(VXLAN, sizeof(struct rte_flow_item_vxlan)),
-};
-
-/** Compute storage space needed by item specification. */
-static void
-flow_item_spec_size(const struct rte_flow_item *item,
-		    size_t *size, size_t *pad)
-{
-	if (!item->spec)
-		goto empty;
-	switch (item->type) {
-		union {
-			const struct rte_flow_item_raw *raw;
-		} spec;
-
-	case RTE_FLOW_ITEM_TYPE_RAW:
-		spec.raw = item->spec;
-		*size = offsetof(struct rte_flow_item_raw, pattern) +
-			spec.raw->length * sizeof(*spec.raw->pattern);
-		break;
-	default:
-empty:
-		*size = 0;
-		break;
-	}
-	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
-}
-
-/** Generate flow_action[] entry. */
-#define MK_FLOW_ACTION(t, s) \
-	[RTE_FLOW_ACTION_TYPE_ ## t] = { \
-		.name = # t, \
-		.size = s, \
-	}
-
-/** Information about known flow actions. */
-static const struct {
-	const char *name;
-	size_t size;
-} flow_action[] = {
-	MK_FLOW_ACTION(END, 0),
-	MK_FLOW_ACTION(VOID, 0),
-	MK_FLOW_ACTION(PASSTHRU, 0),
-	MK_FLOW_ACTION(MARK, sizeof(struct rte_flow_action_mark)),
-	MK_FLOW_ACTION(FLAG, 0),
-	MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
-	MK_FLOW_ACTION(DROP, 0),
-	MK_FLOW_ACTION(COUNT, 0),
-	MK_FLOW_ACTION(DUP, sizeof(struct rte_flow_action_dup)),
-	MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)), /* +queue[] */
-	MK_FLOW_ACTION(PF, 0),
-	MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
-};
-
-/** Compute storage space needed by action configuration. */
-static void
-flow_action_conf_size(const struct rte_flow_action *action,
-		      size_t *size, size_t *pad)
-{
-	if (!action->conf)
-		goto empty;
-	switch (action->type) {
-		union {
-			const struct rte_flow_action_rss *rss;
-		} conf;
-
-	case RTE_FLOW_ACTION_TYPE_RSS:
-		conf.rss = action->conf;
-		*size = offsetof(struct rte_flow_action_rss, queue) +
-			conf.rss->num * sizeof(*conf.rss->queue);
-		break;
-	default:
-empty:
-		*size = 0;
-		break;
-	}
-	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
-}
-
-/** Generate a port_flow entry from attributes/pattern/actions. */
-static struct port_flow *
-port_flow_new(const struct rte_flow_attr *attr,
-	      const struct rte_flow_item *pattern,
-	      const struct rte_flow_action *actions)
-{
-	const struct rte_flow_item *item;
-	const struct rte_flow_action *action;
-	struct port_flow *pf = NULL;
-	size_t tmp;
-	size_t pad;
-	size_t off1 = 0;
-	size_t off2 = 0;
-	int err = ENOTSUP;
-
-store:
-	item = pattern;
-	if (pf)
-		pf->pattern = (void *)&pf->data[off1];
-	do {
-		struct rte_flow_item *dst = NULL;
-
-		if ((unsigned int)item->type >= RTE_DIM(flow_item) ||
-		    !flow_item[item->type].name)
-			goto notsup;
-		if (pf)
-			dst = memcpy(pf->data + off1, item, sizeof(*item));
-		off1 += sizeof(*item);
-		flow_item_spec_size(item, &tmp, &pad);
-		if (item->spec) {
-			if (pf)
-				dst->spec = memcpy(pf->data + off2,
-						   item->spec, tmp);
-			off2 += tmp + pad;
-		}
-		if (item->last) {
-			if (pf)
-				dst->last = memcpy(pf->data + off2,
-						   item->last, tmp);
-			off2 += tmp + pad;
-		}
-		if (item->mask) {
-			if (pf)
-				dst->mask = memcpy(pf->data + off2,
-						   item->mask, tmp);
-			off2 += tmp + pad;
-		}
-		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
-	} while ((item++)->type != RTE_FLOW_ITEM_TYPE_END);
-	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
-	action = actions;
-	if (pf)
-		pf->actions = (void *)&pf->data[off1];
-	do {
-		struct rte_flow_action *dst = NULL;
-
-		if ((unsigned int)action->type >= RTE_DIM(flow_action) ||
-		    !flow_action[action->type].name)
-			goto notsup;
-		if (pf)
-			dst = memcpy(pf->data + off1, action, sizeof(*action));
-		off1 += sizeof(*action);
-		flow_action_conf_size(action, &tmp, &pad);
-		if (action->conf) {
-			if (pf)
-				dst->conf = memcpy(pf->data + off2,
-						   action->conf, tmp);
-			off2 += tmp + pad;
-		}
-		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
-	} while ((action++)->type != RTE_FLOW_ACTION_TYPE_END);
-	if (pf != NULL)
-		return pf;
-	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
-	tmp = RTE_ALIGN_CEIL(offsetof(struct port_flow, data), sizeof(double));
-	pf = calloc(1, tmp + off1 + off2);
-	if (pf == NULL)
-		err = errno;
-	else {
-		*pf = (const struct port_flow){
-			.size = tmp + off1 + off2,
-			.attr = *attr,
-		};
-		tmp -= offsetof(struct port_flow, data);
-		off2 = tmp + off1;
-		off1 = tmp;
-		goto store;
-	}
-notsup:
-	rte_errno = err;
-	return NULL;
-}
-
 /** Print a message out of a flow error. */
 static int
 port_flow_complain(struct rte_flow_error *error)
@@ -1186,6 +984,28 @@ port_flow_validate(portid_t port_id,
 	return 0;
 }
 
+/** Generate a port_flow entry from attributes/pattern/actions. */
+static struct port_flow *
+port_flow_new(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *pattern,
+	      const struct rte_flow_action *actions)
+{
+	struct port_flow *pf;
+
+	pf = calloc(1, sizeof(struct port_flow));
+	if (!pf)
+		return NULL;
+
+	pf->fd = rte_flow_copy(attr, pattern, actions, NULL);
+	if (pf->fd == NULL) {
+		free(pf);
+		return NULL;
+	}
+	pf->size += pf->fd->size;
+
+	return pf;
+}
+
 /** Create flow rule. */
 int
 port_flow_create(portid_t port_id,
@@ -1265,6 +1085,7 @@ port_flow_destroy(portid_t port_id, uint32_t n, const uint32_t *rule)
 			}
 			printf("Flow rule #%u destroyed\n", pf->id);
 			*tmp = pf->next;
+			free(pf->fd);
 			free(pf);
 			break;
 		}
@@ -1325,11 +1146,11 @@ port_flow_query(portid_t port_id, uint32_t rule,
 		printf("Flow rule #%u not found\n", rule);
 		return -ENOENT;
 	}
-	if ((unsigned int)action >= RTE_DIM(flow_action) ||
-	    !flow_action[action].name)
+	if ((unsigned int)action >= RTE_FLOW_ACTION_TYPE_MAX ||
+	    !rte_flow_desc_data_action[action].name)
 		name = "unknown";
 	else
-		name = flow_action[action].name;
+		name = rte_flow_desc_data_action[action].name;
 	switch (action) {
 	case RTE_FLOW_ACTION_TYPE_COUNT:
 		break;
@@ -1385,18 +1206,18 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 		if (n) {
 			/* Filter out unwanted groups. */
 			for (i = 0; i != n; ++i)
-				if (pf->attr.group == group[i])
+				if (pf->fd->attr.group == group[i])
 					break;
 			if (i == n)
 				continue;
 		}
 		tmp = &list;
 		while (*tmp &&
-		       (pf->attr.group > (*tmp)->attr.group ||
-			(pf->attr.group == (*tmp)->attr.group &&
-			 pf->attr.priority > (*tmp)->attr.priority) ||
-			(pf->attr.group == (*tmp)->attr.group &&
-			 pf->attr.priority == (*tmp)->attr.priority &&
+		       (pf->fd->attr.group > (*tmp)->fd->attr.group ||
+			(pf->fd->attr.group == (*tmp)->fd->attr.group &&
+			 pf->fd->attr.priority > (*tmp)->fd->attr.priority) ||
+			(pf->fd->attr.group == (*tmp)->fd->attr.group &&
+			 pf->fd->attr.priority == (*tmp)->fd->attr.priority &&
 			 pf->id > (*tmp)->id)))
 			tmp = &(*tmp)->tmp;
 		pf->tmp = *tmp;
@@ -1404,24 +1225,28 @@ port_flow_list(portid_t port_id, uint32_t n, const uint32_t group[n])
 	}
 	printf("ID\tGroup\tPrio\tAttr\tRule\n");
 	for (pf = list; pf != NULL; pf = pf->tmp) {
-		const struct rte_flow_item *item = pf->pattern;
-		const struct rte_flow_action *action = pf->actions;
+		const struct rte_flow_item *item = pf->fd->items;
+		const struct rte_flow_action *action = pf->fd->actions;
 
 		printf("%" PRIu32 "\t%" PRIu32 "\t%" PRIu32 "\t%c%c\t",
 		       pf->id,
-		       pf->attr.group,
-		       pf->attr.priority,
-		       pf->attr.ingress ? 'i' : '-',
-		       pf->attr.egress ? 'e' : '-');
+		       pf->fd->attr.group,
+		       pf->fd->attr.priority,
+		       pf->fd->attr.ingress ? 'i' : '-',
+		       pf->fd->attr.egress ? 'e' : '-');
 		while (item->type != RTE_FLOW_ITEM_TYPE_END) {
 			if (item->type != RTE_FLOW_ITEM_TYPE_VOID)
-				printf("%s ", flow_item[item->type].name);
+				printf("%s ",
+				       rte_flow_desc_data_item[item->type].
+									name);
 			++item;
 		}
 		printf("=>");
 		while (action->type != RTE_FLOW_ACTION_TYPE_END) {
 			if (action->type != RTE_FLOW_ACTION_TYPE_VOID)
-				printf(" %s", flow_action[action->type].name);
+				printf(" %s",
+				       rte_flow_desc_data_action[action->type].
+									name);
 			++action;
 		}
 		printf("\n");
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 85d3e82..6002388 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -153,10 +153,7 @@ struct port_flow {
 	struct port_flow *tmp; /**< Temporary linking. */
 	uint32_t id; /**< Flow rule ID. */
 	struct rte_flow *flow; /**< Opaque flow object returned by PMD. */
-	struct rte_flow_attr attr; /**< Attributes. */
-	struct rte_flow_item *pattern; /**< Pattern. */
-	struct rte_flow_action *actions; /**< Actions. */
-	uint8_t data[]; /**< Storage for pattern/actions. */
+	struct rte_flow_desc *fd; /**< Generic flow description */
 };
 
 /**
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index aaa70d6..c8e2e93 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -33,6 +33,7 @@
 
 #include <stdint.h>
 
+#include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include "rte_ethdev.h"
@@ -157,3 +158,285 @@ rte_flow_query(uint8_t port_id,
 				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
 				   NULL, rte_strerror(ENOSYS));
 }
+
+/** Information about known flow pattern items. */
+const struct rte_flow_desc_data rte_flow_desc_data_item[] = {
+	[RTE_FLOW_ITEM_TYPE_END] = {
+		.name = "END",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_VOID] = {
+		.name = "VOID",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_INVERT] = {
+		.name = "INVERT",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_ANY] = {
+		.name = "ANY",
+		.size = sizeof(struct rte_flow_item_any),
+	},
+	[RTE_FLOW_ITEM_TYPE_PF] = {
+		.name = "PF",
+		.size = 0,
+	},
+	[RTE_FLOW_ITEM_TYPE_VF] = {
+		.name = "VF",
+		.size = sizeof(struct rte_flow_item_vf),
+	},
+	[RTE_FLOW_ITEM_TYPE_PORT] = {
+		.name = "PORT",
+		.size = sizeof(struct rte_flow_item_port),
+	},
+	[RTE_FLOW_ITEM_TYPE_RAW] = {
+		.name = "RAW",
+		.size = sizeof(struct rte_flow_item_raw), /* +pattern[] */
+	},
+	[RTE_FLOW_ITEM_TYPE_ETH] = {
+		.name = "ETH",
+		.size = sizeof(struct rte_flow_item_eth),
+	},
+	[RTE_FLOW_ITEM_TYPE_VLAN] = {
+		.name = "VLAN",
+		.size = sizeof(struct rte_flow_item_vlan),
+	},
+	[RTE_FLOW_ITEM_TYPE_IPV4] = {
+		.name = "IPV4",
+		.size = sizeof(struct rte_flow_item_ipv4),
+	},
+	[RTE_FLOW_ITEM_TYPE_IPV6] = {
+		.name = "IPV6",
+		.size = sizeof(struct rte_flow_item_ipv6),
+	},
+	[RTE_FLOW_ITEM_TYPE_ICMP] = {
+		.name = "ICMP",
+		.size = sizeof(struct rte_flow_item_icmp),
+	},
+	[RTE_FLOW_ITEM_TYPE_UDP] = {
+		.name = "UDP",
+		.size = sizeof(struct rte_flow_item_udp),
+	},
+	[RTE_FLOW_ITEM_TYPE_TCP] = {
+		.name = "TCP",
+		.size = sizeof(struct rte_flow_item_tcp),
+	},
+	[RTE_FLOW_ITEM_TYPE_SCTP] = {
+		.name = "SCTP",
+		.size = sizeof(struct rte_flow_item_sctp),
+	},
+	[RTE_FLOW_ITEM_TYPE_VXLAN] = {
+		.name = "VXLAN",
+		.size = sizeof(struct rte_flow_item_vxlan),
+	},
+	[RTE_FLOW_ITEM_TYPE_MAX] = {
+		.name = "MAX",
+		.size = 0,
+	},
+};
+
+/** Compute storage space needed by item specification. */
+static void
+flow_item_spec_size(const struct rte_flow_item *item,
+		    size_t *size, size_t *pad)
+{
+	union {
+		const struct rte_flow_item_raw *raw;
+	} spec;
+
+	if (!item->spec)
+		goto empty;
+	switch (item->type) {
+	case RTE_FLOW_ITEM_TYPE_RAW:
+		spec.raw = item->spec;
+		*size = offsetof(struct rte_flow_item_raw, pattern) +
+			spec.raw->length * sizeof(*spec.raw->pattern);
+		break;
+	default:
+empty:
+		*size = 0;
+		break;
+	}
+	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
+}
+
+/** Information about known flow actions. */
+const struct rte_flow_desc_data rte_flow_desc_data_action[] = {
+	[RTE_FLOW_ACTION_TYPE_END] = {
+		.name = "END",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_VOID] = {
+		.name = "VOID",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_PASSTHRU] = {
+		.name = "PASSTHRU",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_MARK] = {
+		.name = "MARK",
+		.size = sizeof(struct rte_flow_action_mark),
+	},
+	[RTE_FLOW_ACTION_TYPE_FLAG] = {
+		.name = "FLAG",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_QUEUE] = {
+		.name = "QUEUE",
+		.size = sizeof(struct rte_flow_action_queue),
+	},
+	[RTE_FLOW_ACTION_TYPE_DROP] = {
+		.name = "DROP",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_COUNT] = {
+		.name = "COUNT",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_DUP] = {
+		.name = "DUP",
+		.size = sizeof(struct rte_flow_action_dup),
+	},
+	[RTE_FLOW_ACTION_TYPE_RSS] = {
+		.name = "RSS",
+		.size = sizeof(struct rte_flow_action_rss), /* +queue[] */
+	},
+	[RTE_FLOW_ACTION_TYPE_PF] = {
+		.name = "PF",
+		.size = 0,
+	},
+	[RTE_FLOW_ACTION_TYPE_VF] = {
+		.name = "VF",
+		.size = sizeof(struct rte_flow_action_vf),
+	},
+	[RTE_FLOW_ACTION_TYPE_MAX] = {
+		.name = "MAX",
+		.size = 0,
+	},
+};
+
+/** Compute storage space needed by action configuration. */
+static void
+flow_action_conf_size(const struct rte_flow_action *action,
+		      size_t *size, size_t *pad)
+{
+	union {
+		const struct rte_flow_action_rss *rss;
+	} conf;
+
+	if (!action->conf)
+		goto empty;
+	switch (action->type) {
+	case RTE_FLOW_ACTION_TYPE_RSS:
+		conf.rss = action->conf;
+		*size = offsetof(struct rte_flow_action_rss, queue) +
+			conf.rss->num * sizeof(*conf.rss->queue);
+		break;
+	default:
+empty:
+		*size = 0;
+		break;
+	}
+	*pad = RTE_ALIGN_CEIL(*size, sizeof(double)) - *size;
+}
+
+/** Copy a comprehensive description of an rte_flow. */
+struct rte_flow_desc *
+rte_flow_copy(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *items,
+	      const struct rte_flow_action *actions,
+	      void *(zalloc)(size_t, size_t))
+{
+	const struct rte_flow_item *item;
+	const struct rte_flow_action *action;
+	struct rte_flow_desc *fd = NULL;
+	size_t tmp;
+	size_t pad;
+	size_t off1 = 0;
+	size_t off2 = 0;
+	int err = ENOTSUP;
+
+	if (zalloc == NULL)
+		zalloc = &calloc;
+
+store:
+	item = items;
+	if (fd)
+		fd->items = (void *)&fd->data[off1];
+	do {
+		struct rte_flow_item *dst = NULL;
+
+		if ((unsigned int)item->type >=
+			RTE_DIM(rte_flow_desc_data_item) ||
+		    !rte_flow_desc_data_item[item->type].name)
+			goto notsup;
+		if (fd)
+			dst = memcpy(fd->data + off1, item, sizeof(*item));
+		off1 += sizeof(*item);
+		flow_item_spec_size(item, &tmp, &pad);
+		if (item->spec) {
+			if (fd)
+				dst->spec = memcpy(fd->data + off2,
+						   item->spec, tmp);
+			off2 += tmp + pad;
+		}
+		if (item->last) {
+			if (fd)
+				dst->last = memcpy(fd->data + off2,
+						   item->last, tmp);
+			off2 += tmp + pad;
+		}
+		if (item->mask) {
+			if (fd)
+				dst->mask = memcpy(fd->data + off2,
+						   item->mask, tmp);
+			off2 += tmp + pad;
+		}
+		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
+	} while ((item++)->type != RTE_FLOW_ITEM_TYPE_END);
+	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
+	action = actions;
+	if (fd)
+		fd->actions = (void *)&fd->data[off1];
+	do {
+		struct rte_flow_action *dst = NULL;
+
+		if ((unsigned int)action->type >=
+			RTE_DIM(rte_flow_desc_data_action) ||
+		    !rte_flow_desc_data_action[action->type].name)
+			goto notsup;
+		if (fd)
+			dst = memcpy(fd->data + off1, action, sizeof(*action));
+		off1 += sizeof(*action);
+		flow_action_conf_size(action, &tmp, &pad);
+		if (action->conf) {
+			if (fd)
+				dst->conf = memcpy(fd->data + off2,
+						   action->conf, tmp);
+			off2 += tmp + pad;
+		}
+		off2 = RTE_ALIGN_CEIL(off2, sizeof(double));
+	} while ((action++)->type != RTE_FLOW_ACTION_TYPE_END);
+	if (fd != NULL)
+		return fd;
+	off1 = RTE_ALIGN_CEIL(off1, sizeof(double));
+	tmp = RTE_ALIGN_CEIL(offsetof(struct rte_flow_desc, data),
+			     sizeof(double));
+	fd = zalloc(1, tmp + off1 + off2);
+	if (fd == NULL)
+		err = errno;
+	else {
+		*fd = (const struct rte_flow_desc){
+			.size = tmp + off1 + off2,
+			.attr = *attr,
+		};
+		tmp -= offsetof(struct rte_flow_desc, data);
+		off2 = tmp + off1;
+		off1 = tmp;
+		goto store;
+	}
+notsup:
+	rte_errno = err;
+	return NULL;
+}
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index 171a569..f0b33d2 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -282,6 +282,11 @@ enum rte_flow_item_type {
 	 * See struct rte_flow_item_nvgre.
 	 */
 	RTE_FLOW_ITEM_TYPE_NVGRE,
+
+	/**
+	 * Sentinel
+	 */
+	RTE_FLOW_ITEM_TYPE_MAX,
 };
 
 /**
@@ -779,6 +784,11 @@ enum rte_flow_action_type {
 	 * See struct rte_flow_action_vf.
 	 */
 	RTE_FLOW_ACTION_TYPE_VF,
+
+	/**
+	 * Sentinel
+	 */
+	RTE_FLOW_ACTION_TYPE_MAX,
 };
 
 /**
@@ -1083,6 +1093,55 @@ rte_flow_query(uint8_t port_id,
 	       void *data,
 	       struct rte_flow_error *error);
 
+/**
+ * Generic flow generator description.
+ *
+ * This description must be sufficient to describe an rte_flow independently
+ * from any PMD implementation and allow for replayability and identification.
+ */
+struct rte_flow_desc {
+	size_t size; /**< Allocated space including data[]. */
+	struct rte_flow_attr attr; /**< Attributes. */
+	struct rte_flow_item *items; /**< Items. */
+	struct rte_flow_action *actions; /**< Actions. */
+	uint8_t data[]; /**< Storage for items/actions. */
+};
+
+/**
+ * Copy an rte_flow rule description
+ *
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ * @param zalloc
+ *   A zeroing allocator function. If NULL, the standard calloc() function is
+ *   used.
+ *
+ * @return
+ *   The flow description structure sufficient to re-create or identify a
+ *   specific flow if successful; NULL otherwise, in which case rte_errno
+ *   is set.
+ */
+struct rte_flow_desc *
+rte_flow_copy(const struct rte_flow_attr *attr,
+	      const struct rte_flow_item *items,
+	      const struct rte_flow_action *actions,
+	      void *(zalloc)(size_t, size_t));
+
+/**
+ * Flow elements description tables.
+ */
+struct rte_flow_desc_data {
+	const char *name;
+	size_t size;
+};
+
+extern const struct rte_flow_desc_data rte_flow_desc_data_item[];
+extern const struct rte_flow_desc_data rte_flow_desc_data_action[];
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 03/13] ethdev: add deferred intermediate device state
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 01/13] ethdev: save VLAN filter setting Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 02/13] ethdev: add flow API rule copy function Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 04/13] pci: expose device detach routine Gaetan Rivet
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

This device state means that the device is managed externally, by
whichever party has set this state (PMD or application).

Note: this new device state is only informational. The related device
structure and operations are still valid and can be used normally.

It is however made private by device management helpers within ethdev,
making the device invisible to applications.
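
For instance (sketch only; the exact hand-over logic is up to the
managing party), a PMD owning a slave port could flag it as follows:

  #include <rte_ethdev.h>

  /* Sketch: take or release ownership of a slave port. */
  static void
  set_port_deferred(uint8_t slave_port_id, int deferred)
  {
          struct rte_eth_dev *edev = &rte_eth_devices[slave_port_id];

          /* While deferred, the port is hidden from application-level
           * iteration but its structure and ops remain usable. */
          edev->state = deferred ? RTE_ETH_DEV_DEFERRED :
                                   RTE_ETH_DEV_ATTACHED;
  }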

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_ether/rte_ethdev.c | 3 ++-
 lib/librte_ether/rte_ethdev.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 61a63b7..7824f87 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -382,7 +382,8 @@ int
 rte_eth_dev_is_valid_port(uint8_t port_id)
 {
 	if (port_id >= RTE_MAX_ETHPORTS ||
-	    rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
+	    (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
+	     rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
 		return 0;
 	else
 		return 1;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f01d140..ae1e9e6 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1629,6 +1629,7 @@ struct rte_eth_rxtx_callback {
 enum rte_eth_dev_state {
 	RTE_ETH_DEV_UNUSED = 0,
 	RTE_ETH_DEV_ATTACHED,
+	RTE_ETH_DEV_DEFERRED,
 };
 
 /**
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 04/13] pci: expose device detach routine
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (2 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 03/13] ethdev: add deferred intermediate device state Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 05/13] pci: expose parse and probe routines Gaetan Rivet
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Make pci_detach_all_drivers public as rte_eal_pci_detach_all_drivers.
No further changes.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  5 +++++
 lib/librte_eal/common/eal_common_pci.c          |  6 +++---
 lib/librte_eal/common/include/rte_pci.h         | 15 +++++++++++++++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  5 +++++
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2cf1ac8..3a87756 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -185,3 +185,8 @@ DPDK_17.02 {
 	rte_bus_unregister;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+	rte_eal_pci_detach_all_drivers;
+
+} DPDK_17.02;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 72547bd..d38335a 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -313,8 +313,8 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
  * registered driver for the given device. Return -1 if initialization
  * failed, return 1 if no driver is found for this device.
  */
-static int
-pci_detach_all_drivers(struct rte_pci_device *dev)
+int
+rte_eal_pci_detach_all_drivers(struct rte_pci_device *dev)
 {
 	struct rte_pci_driver *dr = NULL;
 	int rc = 0;
@@ -388,7 +388,7 @@ rte_eal_pci_detach(const struct rte_pci_addr *addr)
 		if (rte_eal_compare_pci_addr(&dev->addr, addr))
 			continue;
 
-		ret = pci_detach_all_drivers(dev);
+		ret = rte_eal_pci_detach_all_drivers(dev);
 		if (ret < 0)
 			goto err_return;
 
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 4a67883..598a1ef 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -459,6 +459,21 @@ void pci_unmap_resource(void *requested_addr, size_t size);
 int rte_eal_pci_probe_one(const struct rte_pci_addr *addr);
 
 /**
+ * Remove any PCI drivers tied to the device.
+ *
+ * If the vendor/device IDs match, call the remove() function of all
+ * registered drivers for the given device.
+ *
+ * @param dev
+ *	The PCI device to remove
+ * @return
+ *   - 0 on success.
+ *   - Negative on error.
+ *   - Positive if no driver exists for that device.
+ */
+int rte_eal_pci_detach_all_drivers(struct rte_pci_device *dev);
+
+/**
  * Close the single PCI device.
  *
  * Scan the content of the PCI bus, and find the pci device specified by pci
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 3c68ff5..192c0d5 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -189,3 +189,8 @@ DPDK_17.02 {
 	rte_bus_unregister;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+	rte_eal_pci_detach_all_drivers;
+
+} DPDK_17.02;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 05/13] pci: expose parse and probe routines
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (3 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 04/13] pci: expose device detach routine Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 06/13] net/failsafe: add fail-safe PMD Gaetan Rivet
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Make pci_probe_all_drivers public as rte_eal_pci_probe_all_drivers; no
further changes to it.
Extract the parsing part of pci_scan_one into a public function,
rte_eal_pci_parse_sysfs_entry. It fills in one device, but does not
allocate that device or insert it within the device list.
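
Together with the detach routine, this lets a caller scan and probe a
single device it keeps track of itself, roughly as follows (sketch
only):

  #include <limits.h>
  #include <stdio.h>
  #include <rte_pci.h>

  /* Sketch: fill in and probe one PCI device owned by the caller. */
  static int
  scan_and_probe_one(struct rte_pci_device *pdev,
                     const struct rte_pci_addr *addr)
  {
          char dirname[PATH_MAX];

          snprintf(dirname, sizeof(dirname), "%s/" PCI_PRI_FMT,
                   pci_get_sysfs_path(), addr->domain, addr->bus,
                   addr->devid, addr->function);
          if (rte_eal_pci_parse_sysfs_entry(pdev, dirname, addr) < 0)
                  return -1;
          return rte_eal_pci_probe_all_drivers(pdev);
  }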

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_pci.c          | 10 +++----
 lib/librte_eal/common/include/rte_pci.h         | 25 ++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci.c           | 39 +++++++++++++++----------
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  2 ++
 5 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 3a87756..ba8f8e4 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -188,5 +188,6 @@ DPDK_17.02 {
 
 DPDK_17.05 {
 	rte_eal_pci_detach_all_drivers;
+	rte_eal_pci_probe_all_drivers;
 
 } DPDK_17.02;
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index d38335a..15a0c48 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -282,8 +282,8 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,
  * registered driver for the given device. Return -1 if initialization
  * failed, return 1 if no driver is found for this device.
  */
-static int
-pci_probe_all_drivers(struct rte_pci_device *dev)
+int
+rte_eal_pci_probe_all_drivers(struct rte_pci_device *dev)
 {
 	struct rte_pci_driver *dr = NULL;
 	int rc = 0;
@@ -358,7 +358,7 @@ rte_eal_pci_probe_one(const struct rte_pci_addr *addr)
 		if (rte_eal_compare_pci_addr(&dev->addr, addr))
 			continue;
 
-		ret = pci_probe_all_drivers(dev);
+		ret = rte_eal_pci_probe_all_drivers(dev);
 		if (ret)
 			goto err_return;
 		return 0;
@@ -430,10 +430,10 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			ret = pci_probe_all_drivers(dev);
+			ret = rte_eal_pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
 			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-			ret = pci_probe_all_drivers(dev);
+			ret = rte_eal_pci_probe_all_drivers(dev);
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 598a1ef..4291da8 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -373,6 +373,16 @@ rte_eal_compare_pci_addr(const struct rte_pci_addr *addr,
 int rte_eal_pci_scan(void);
 
 /**
+ * Parse the content of one PCI entry.
+ *
+ * @return
+ *  0 on success, negative on error
+ */
+int rte_eal_pci_parse_sysfs_entry(struct rte_pci_device *dev,
+				 const char *dirname,
+				 const struct rte_pci_addr *addr);
+
+/**
  * Probe the PCI bus for registered drivers.
  *
  * Scan the content of the PCI bus, and call the probe() function for
@@ -459,6 +469,21 @@ void pci_unmap_resource(void *requested_addr, size_t size);
 int rte_eal_pci_probe_one(const struct rte_pci_addr *addr);
 
 /**
+ * Probe all pci drivers against the device.
+ *
+ * If the vendor/device IDs match, call the probe() function of all
+ * registered drivers for the given device.
+ *
+ * @param dev
+ *	The PCI device to probe
+ * @return
+ *   - 0 on success.
+ *   - Negative on error.
+ *   - Positive if no driver exists for that device.
+ */
+int rte_eal_pci_probe_all_drivers(struct rte_pci_device *dev);
+
+/**
  * Remove any PCI drivers tied to the device.
  *
  * If vendor/device ID match, call the remove() function of all
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index e2fc219..51c6b84 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -226,27 +226,22 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
 	return -1;
 }
 
-/* Scan one pci sysfs entry, and fill the devices list from it. */
-static int
-pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
+/* Parse a pci sysfs entry */
+int
+rte_eal_pci_parse_sysfs_entry(struct rte_pci_device *dev, const char *dirname,
+			const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
-	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
-		return -1;
-
 	memset(dev, 0, sizeof(*dev));
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -254,7 +249,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -263,7 +257,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -272,7 +265,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -281,7 +273,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		free(dev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -320,7 +311,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		free(dev);
 		return -1;
 	}
 
@@ -329,7 +319,6 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		free(dev);
 		return -1;
 	}
 
@@ -345,6 +334,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	} else
 		dev->kdrv = RTE_KDRV_NONE;
 
+	return 0;
+}
+
+/* Scan one pci sysfs entry, and fill the devices list from it. */
+static int
+pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
+{
+	struct rte_pci_device *dev;
+	int ret;
+
+	dev = malloc(sizeof(*dev));
+	if (dev == NULL)
+		return -1;
+
+	ret = rte_eal_pci_parse_sysfs_entry(dev, dirname, addr);
+	if (ret) {
+		free(dev);
+		return ret;
+	}
+
 	/* device is valid, add in list (sorted) */
 	if (TAILQ_EMPTY(&pci_device_list)) {
 		rte_eal_device_insert(&dev->device);
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 192c0d5..27b7939 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -192,5 +192,7 @@ DPDK_17.02 {
 
 DPDK_17.05 {
 	rte_eal_pci_detach_all_drivers;
+	rte_eal_pci_probe_all_drivers;
+	rte_eal_pci_parse_sysfs_entry;
 
 } DPDK_17.02;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 06/13] net/failsafe: add fail-safe PMD
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (4 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 05/13] pci: expose parse and probe routines Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 07/13] net/failsafe: add plug-in support Gaetan Rivet
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.

This PMD allows applications to benefit from true hot-plugging
support without having to implement it.

It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency for the removal of a device, by
designating a fail-over device that will take over transmit operations
if the preferred device is removed.

Applications only see the fail-safe instance and do not have to care
about the underlying activity ensuring their continued operation.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                             |   5 +
 config/common_base                      |   5 +
 doc/guides/nics/fail_safe.rst           | 133 +++++++
 doc/guides/nics/features/failsafe.ini   |  24 ++
 doc/guides/nics/index.rst               |   1 +
 drivers/net/Makefile                    |   1 +
 drivers/net/failsafe/Makefile           |  72 ++++
 drivers/net/failsafe/failsafe.c         | 229 +++++++++++
 drivers/net/failsafe/failsafe_args.c    | 347 ++++++++++++++++
 drivers/net/failsafe/failsafe_eal.c     | 318 +++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     | 677 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h | 232 +++++++++++
 drivers/net/failsafe/failsafe_rxtx.c    | 107 +++++
 mk/rte.app.mk                           |   1 +
 14 files changed, 2152 insertions(+)
 create mode 100644 doc/guides/nics/fail_safe.rst
 create mode 100644 doc/guides/nics/features/failsafe.ini
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cc3bf98..ab9ed0c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -315,6 +315,11 @@ M: Matej Vido <vido@cesnet.cz>
 F: drivers/net/szedata2/
 F: doc/guides/nics/szedata2.rst
 
+Fail-safe PMD
+M: Gaetan Rivet <gaetan.rivet@6wind.com>
+F: drivers/net/failsafe/
+F: doc/guides/nics/fail_safe.rst
+
 Intel e1000
 M: Wenzhuo Lu <wenzhuo.lu@intel.com>
 F: drivers/net/e1000/
diff --git a/config/common_base b/config/common_base
index 71a4fcb..ae64a5b 100644
--- a/config/common_base
+++ b/config/common_base
@@ -364,6 +364,11 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_NULL=y
 
 #
+# Compile fail-safe PMD
+#
+CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
new file mode 100644
index 0000000..056f85f
--- /dev/null
+++ b/doc/guides/nics/fail_safe.rst
@@ -0,0 +1,133 @@
+..  BSD LICENSE
+    Copyright 2017 6WIND S.A.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Fail-safe poll mode driver library
+==================================
+
+The Fail-safe poll mode driver library (**librte_pmd_failsafe**) is a virtual
+device that allows using any device supporting hotplug (sudden device removal
+and plugging on its bus), without modifying other components relying on such a
+device (application, other PMDs).
+
+In addition to the Seamless Hotplug feature, the Fail-safe PMD offers the
+ability to redirect operations to secondary devices when the primary has been
+removed from the system.
+
+.. note::
+
+   The library is enabled by default. You can enable it or disable it manually
+   by setting the ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` configuration option.
+
+Features
+--------
+
+The Fail-safe PMD only supports a limited set of features. If you plan to use a
+device underneath the Fail-safe PMD with a specific feature, this feature must
+also be supported by the Fail-safe PMD, or an error will be returned.
+
+Check the feature matrix for the complete set of supported features.
+
+Compilation options
+-------------------
+
+These options can be modified in the ``$RTE_TARGET/build/.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` (default **y**)
+
+  Toggle compiling librte_pmd_failsafe itself.
+
+- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG`` (default **n**)
+
+  Toggle debugging code.
+
+Using the Fail-safe PMD from the EAL command line
+-------------------------------------------------
+
+The Fail-safe PMD can be used like most other DPDK virtual devices, by passing a
+``--vdev`` parameter to the EAL when starting the application. The device name
+must start with the *net_failsafe* prefix, followed by numbers or letters. This
+name must be unique for each device. Each fail-safe instance must have at least one
+sub-device, and at most ``RTE_MAX_ETHPORTS-1``.
+
+A sub-device can be any legal DPDK device, including possibly another fail-safe
+instance.
+
+Fail-safe command line parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **dev(<iface>)** parameter
+
+  This parameter allows the user to define a sub-device. The ``<iface>`` part of
+  this parameter must be a valid device definition. It could be the argument
+  provided to a ``-w`` PCI device specification or the argument that would be
+  given to a ``--vdev`` parameter (including a fail-safe).
+  Enclosing the device definition within parentheses here allows using
+  additional sub-device parameters if need be. They will be passed on to the
+  sub-device.
+
+- **mac** parameter [MAC address]
+
+  This parameter allows the user to set a default MAC address to the fail-safe
+  and all of its sub-devices.
+  If no default MAC address is provided, the fail-safe PMD will read the MAC
+  address of the first of its sub-devices to be successfully probed and use it
+  as its default MAC address, trying to set it on all of its other sub-devices.
+  If no sub-device was successfully probed at initialization, then a random MAC
+  address is generated; it will subsequently be applied to all sub-devices once
+  they are probed.
+
+Usage example
+~~~~~~~~~~~~~
+
+This section shows some examples of using **testpmd** with a fail-safe PMD.
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd
+
+   .. code-block:: console
+
+      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \
+         --vdev='net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0,nodeaction=r1:0:CREATE)' -- \
+         -i
+
+Using the Fail-safe PMD from an application
+-------------------------------------------
+
+This driver strives to be as seamless as possible to existing applications, in
+order to provide the hotplug functionality in the easiest way possible.
+
+Care must be taken, however, to respect the **ether** API concerning device
+access, and in particular, using the ``RTE_ETH_FOREACH_DEV`` macro to iterate
+over Ethernet devices, instead of accessing them directly or writing one's
+own device iterator.
diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
new file mode 100644
index 0000000..3c52823
--- /dev/null
+++ b/doc/guides/nics/features/failsafe.ini
@@ -0,0 +1,24 @@
+;
+; Supported features of the 'fail-safe' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+Queue start/stop     = Y
+MTU update           = Y
+Jumbo frame          = Y
+Promiscuous mode     = Y
+Allmulticast mode    = Y
+Unicast MAC filter   = Y
+Multicast MAC filter = Y
+VLAN filter          = Y
+Packet type parsing  = Y
+Basic stats          = Y
+Stats per queue      = Y
+ARMv7                = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..0ba52e1 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -58,6 +58,7 @@ Network Interface Controller Drivers
     vhost
     vmxnet3
     pcap_ring
+    fail_safe
 
 **Figures**
 
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 40fc333..3c658b4 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -38,6 +38,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
 DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += e1000
 DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
new file mode 100644
index 0000000..06199ad
--- /dev/null
+++ b/drivers/net/failsafe/Makefile
@@ -0,0 +1,72 @@
+#   BSD LICENSE
+#
+#   Copyright 2017 6WIND S.A.
+#   Copyright 2017 Mellanox.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of 6WIND S.A. nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name
+LIB = librte_pmd_failsafe.a
+
+# Sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
+
+# No exported include files
+
+# This lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf
+
+ifneq ($(DEBUG),)
+CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG := y
+endif
+
+# Basic CFLAGS:
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -I.
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=700
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+CFLAGS += -pedantic -DPEDANTIC
+
+ifeq ($(CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG),y)
+CFLAGS += -g -UNDEBUG
+else
+CFLAGS += -O3
+CFLAGS += -DNDEBUG
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
new file mode 100644
index 0000000..cd60193
--- /dev/null
+++ b/drivers/net/failsafe/failsafe.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "failsafe_private.h"
+
+const char pmd_failsafe_driver_name[] = FAILSAFE_DRIVER_NAME;
+static const struct rte_eth_link eth_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_UP,
+	.link_autoneg = ETH_LINK_SPEED_AUTONEG,
+};
+
+static int
+sub_device_create(struct rte_eth_dev *dev,
+		const char *params)
+{
+	uint8_t nb_subs;
+	int ret;
+
+	ret = failsafe_args_count_subdevice(dev, params);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->subs_tail > FAILSAFE_MAX_ETHPORTS) {
+		ERROR("Cannot allocate more than %d ports",
+			FAILSAFE_MAX_ETHPORTS);
+		return -ENOSPC;
+	}
+	nb_subs = PRIV(dev)->subs_tail;
+	PRIV(dev)->subs = rte_zmalloc(NULL,
+			sizeof(struct sub_device) * nb_subs,
+			RTE_CACHE_LINE_SIZE);
+	if (PRIV(dev)->subs == NULL) {
+		ERROR("Could not allocate sub_devices");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void
+sub_device_free(struct rte_eth_dev *dev)
+{
+	rte_free(PRIV(dev)->subs);
+}
+
+static int
+eth_dev_create(const char *name,
+		const unsigned socket_id,
+		const char *params)
+{
+	struct rte_eth_dev *dev;
+	struct ether_addr *mac;
+	struct fs_priv *priv;
+	int ret;
+
+	dev = NULL;
+	priv = NULL;
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Creating fail-safe device on NUMA socket %u",
+			socket_id);
+	dev = rte_eth_dev_allocate(name);
+	if (dev == NULL) {
+		ERROR("Unable to allocate rte_eth_dev");
+		return -1;
+	}
+	priv = rte_zmalloc_socket(name, sizeof(*priv), 0, socket_id);
+	if (priv == NULL) {
+		ERROR("Unable to allocate private data");
+		goto free_dev;
+	}
+	dev->data->dev_private = priv;
+	PRIV(dev)->dev = dev;
+	dev->dev_ops = &failsafe_ops;
+	TAILQ_INIT(&dev->link_intr_cbs);
+	dev->data->dev_flags = 0x0;
+	dev->driver = NULL;
+	dev->data->kdrv = RTE_KDRV_NONE;
+	dev->data->drv_name = pmd_failsafe_driver_name;
+	dev->data->numa_node = socket_id;
+	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
+	dev->data->dev_link = eth_link;
+	PRIV(dev)->nb_mac_addr = 1;
+	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
+	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
+	if (params == NULL) {
+		ERROR("This PMD requires sub-devices, none provided");
+		goto free_priv;
+	}
+	ret = sub_device_create(dev, params);
+	if (ret) {
+		ERROR("Could not allocate sub_devices");
+		goto free_priv;
+	}
+	ret = failsafe_args_parse(dev, params);
+	if (ret)
+		goto free_subs;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		goto free_args;
+	mac = &dev->data->mac_addrs[0];
+	if (!mac_from_arg) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/*
+		 * Use the ether_addr from first probed
+		 * device, either preferred or fallback.
+		 */
+		FOREACH_SUBDEV(sdev, i, dev)
+			if (sdev->state >= DEV_PROBED) {
+				ether_addr_copy(&ETH(sdev)->data->mac_addrs[0],
+						mac);
+				break;
+			}
+		/*
+		 * If no device has been probed and no ether_addr
+		 * has been provided on the command line, use a random
+		 * valid one.
+		 */
+		if (i == priv->subs_tail) {
+			eth_random_addr(
+				&mac->addr_bytes[0]);
+			ret = rte_eth_dev_default_mac_addr_set(
+						dev->data->port_id, mac);
+			if (ret) {
+				ERROR("Failed to set default MAC address");
+				goto free_subs;
+			}
+		}
+	}
+	INFO("MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5]);
+	return 0;
+free_args:
+	failsafe_args_free(dev);
+free_subs:
+	sub_device_free(dev);
+free_priv:
+	rte_free(priv);
+free_dev:
+	rte_eth_dev_release_port(dev);
+	return -1;
+}
+
+static int
+rte_eth_free(const char *name)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	dev = rte_eth_dev_allocated(name);
+	if (dev == NULL)
+		return -ENODEV;
+	ret = failsafe_eal_uninit(dev);
+	if (ret)
+		ERROR("Error while uninitializing sub-EAL");
+	failsafe_args_free(dev);
+	sub_device_free(dev);
+	rte_free(PRIV(dev));
+	rte_eth_dev_release_port(dev);
+	return ret;
+}
+
+static int
+rte_pmd_failsafe_probe(const char *name, const char *params)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
+			name);
+	return eth_dev_create(name, rte_socket_id(), params);
+}
+
+static int
+rte_pmd_failsafe_remove(const char *name)
+{
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Uninitializing " FAILSAFE_DRIVER_NAME " for %s", name);
+	return rte_eth_free(name);
+}
+
+static struct rte_vdev_driver failsafe_drv = {
+	.probe	= rte_pmd_failsafe_probe,
+	.remove	= rte_pmd_failsafe_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
+RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
+RTE_PMD_REGISTER_PARAM_STRING(net_failsafe, PMD_FAILSAFE_PARAM_STRING);
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
new file mode 100644
index 0000000..faa48f4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -0,0 +1,347 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <string.h>
+#include <errno.h>
+
+#include <rte_devargs.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+
+#include "failsafe_private.h"
+
+#define DEVARGS_MAXLEN 4096
+
+/* Callback used when a new device is found in devargs */
+typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
+		uint8_t head);
+
+int mac_from_arg;
+
+const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_MAC_KVARG,
+	NULL,
+};
+
+/*
+ * input: text.
+ * output: 0: if text[0] != '(',
+ *         0: if there is no matching ')',
+ *         n: distance to the matching ')' otherwise.
+ */
+static size_t
+closing_paren(const char *text)
+{
+	int nb_open = 0;
+	size_t i = 0;
+
+	while (text[i] != '\0') {
+		if (text[i] == '(')
+			nb_open++;
+		if (text[i] == ')')
+			nb_open--;
+		if (nb_open == 0)
+			return i;
+		i++;
+	}
+	return 0;
+}
+
+static int
+parse_device(struct sub_device *sdev, char *args)
+{
+	struct rte_devargs *d;
+	size_t b;
+
+	b = 0;
+	d = &sdev->devargs;
+	DEBUG("%s", args);
+	while (args[b] != ',' &&
+	       args[b] != '\0')
+		b++;
+	if (args[b] != '\0') {
+		d->args = strdup(&args[b + 1]);
+		if (d->args == NULL) {
+			ERROR("Not enough memory to store sub-device args");
+			return -ENOMEM;
+		}
+		args[b] = '\0';
+	} else {
+		d->args = strdup("");
+	}
+	if (eal_parse_pci_BDF(args, &d->pci.addr) == 0 ||
+	    eal_parse_pci_DomBDF(args, &d->pci.addr) == 0)
+		d->type = RTE_DEVTYPE_WHITELISTED_PCI;
+	else {
+		d->type = RTE_DEVTYPE_VIRTUAL;
+		snprintf(d->virt.drv_name,
+				sizeof(d->virt.drv_name), "%s", args);
+	}
+	sdev->state = DEV_PARSED;
+	return 0;
+}
+
+static int
+parse_device_param(struct rte_eth_dev *dev, const char *param,
+		uint8_t head)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	char *args = NULL;
+	size_t a, b;
+	int ret;
+
+	priv = PRIV(dev);
+	a = 0;
+	b = 0;
+	ret = 0;
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	a = b;
+	b += closing_paren(&param[b]);
+	if (a == b) {
+		ERROR("Dangling parenthesis");
+		return -EINVAL;
+	}
+	a += 1;
+	args = strndup(&param[a], b - a);
+	if (args == NULL) {
+		ERROR("Not enough memory for parameter parsing");
+		return -ENOMEM;
+	}
+	sdev = &priv->subs[head];
+	if (strncmp(param, "dev", 3) == 0) {
+		ret = parse_device(sdev, args);
+		if (ret)
+			goto free_args;
+	} else {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		return -EINVAL;
+	}
+free_args:
+	free(args);
+	return ret;
+}
+
+static int
+parse_sub_devices(parse_cb *cb,
+		struct rte_eth_dev *dev, const char *params)
+{
+	size_t a, b;
+	uint8_t head;
+	int ret;
+
+	a = 0;
+	head = 0;
+	ret = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',') {
+			a = b + 1;
+			continue;
+		}
+		if (params[b] == '(') {
+			size_t start = b;
+
+			b += closing_paren(&params[b]);
+			if (b == start) {
+				ERROR("Dangling parenthesis");
+				return -EINVAL;
+			}
+			ret = (*cb)(dev, &params[a], head);
+			if (ret)
+				return ret;
+			head += 1;
+			b += 1;
+			if (params[b] == '\0')
+				return 0;
+		}
+		a = b + 1;
+	}
+	return 0;
+}
+
+static int
+remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
+{
+	char buffer[DEVARGS_MAXLEN] = {0};
+	size_t a, b;
+	int i;
+
+	a = 0;
+	i = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',' || params[b] == '\0')
+			i += snprintf(&buffer[i], b - a + 1, "%s", &params[a]);
+		if (params[b] == '(') {
+			size_t start = b;
+			b += closing_paren(&params[b]);
+			if (b == start)
+				return -EINVAL;
+			b += 1;
+			if (params[b] == '\0')
+				goto out;
+		}
+		a = b + 1;
+	}
+out:
+	snprintf(params, DEVARGS_MAXLEN, "%s", buffer);
+	return 0;
+}
+
+static int
+get_mac_addr_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	struct ether_addr *ea = out;
+	int ret;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+		&ea->addr_bytes[0], &ea->addr_bytes[1],
+		&ea->addr_bytes[2], &ea->addr_bytes[3],
+		&ea->addr_bytes[4], &ea->addr_bytes[5]);
+	return ret != ETHER_ADDR_LEN;
+}
+
+int
+failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
+{
+	struct fs_priv *priv;
+	char mut_params[DEVARGS_MAXLEN] = "";
+	struct rte_kvargs *kvlist = NULL;
+	unsigned arg_count;
+	size_t n;
+	int ret;
+
+	if (dev == NULL || params == NULL)
+		return -EINVAL;
+	priv = PRIV(dev);
+	ret = 0;
+	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
+	/* default parameters */
+	mac_from_arg = 0;
+	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
+	if (n >= sizeof(mut_params)) {
+		ERROR("Parameter string too long (>=%zu)",
+				sizeof(mut_params));
+		return -ENOMEM;
+	}
+	ret = parse_sub_devices(parse_device_param,
+				dev, params);
+	if (ret < 0)
+		return ret;
+	ret = remove_sub_devices_definition(mut_params);
+	if (ret < 0)
+		return ret;
+	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
+		kvlist = rte_kvargs_parse(mut_params,
+				pmd_failsafe_init_parameters);
+		if (kvlist == NULL) {
+			ERROR("Error parsing parameters, usage:\n"
+				PMD_FAILSAFE_PARAM_STRING);
+			return -1;
+		}
+		/* MAC addr */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_MAC_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_MAC_KVARG,
+					&get_mac_addr_arg,
+					&dev->data->mac_addrs[0]);
+			if (ret < 0)
+				goto free_kvlist;
+			mac_from_arg = 1;
+		}
+	}
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+void
+failsafe_args_free(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		free(sdev->devargs.args);
+		sdev->devargs.args = NULL;
+	}
+}
+
+static int
+count_device(struct rte_eth_dev *dev, const char *param,
+		uint8_t head __rte_unused)
+{
+	size_t b = 0;
+
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	if (strncmp(param, "dev", b) &&
+	    strncmp(param, "exec", b)) {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		return -EINVAL;
+	}
+	PRIV(dev)->subs_tail += 1;
+	return 0;
+}
+
+int
+failsafe_args_count_subdevice(struct rte_eth_dev *dev,
+			const char *params)
+{
+	return parse_sub_devices(count_device,
+				dev, params);
+}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
new file mode 100644
index 0000000..fcee500
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -0,0 +1,318 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev *
+pci_addr_to_eth_dev(struct rte_pci_addr *addr)
+{
+	uint8_t pid;
+
+	if (addr == NULL)
+		return NULL;
+	for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+		struct rte_pci_addr *addr2;
+		struct rte_eth_dev *edev;
+
+		edev = &rte_eth_devices[pid];
+		if (edev->device == NULL ||
+		    edev->device->devargs == NULL)
+			continue;
+		addr2 = &edev->device->devargs->pci.addr;
+		if (rte_eal_compare_pci_addr(addr, addr2) == 0)
+			return edev;
+	}
+	return NULL;
+}
+
+static int
+pci_scan_one(struct sub_device *sdev)
+{
+	struct rte_devargs *da;
+	char dirname[PATH_MAX];
+
+	da = &sdev->devargs;
+	snprintf(dirname, sizeof(dirname),
+		"%s/" PCI_PRI_FMT,
+		pci_get_sysfs_path(),
+		da->pci.addr.domain,
+		da->pci.addr.bus,
+		da->pci.addr.devid,
+		da->pci.addr.function);
+	errno = 0;
+	if (rte_eal_pci_parse_sysfs_entry(&sdev->pci_device,
+		dirname, &da->pci.addr) < 0) {
+		if (errno == ENOENT) {
+			DEBUG("Could not scan requested device " PCI_PRI_FMT,
+				da->pci.addr.domain,
+				da->pci.addr.bus,
+				da->pci.addr.devid,
+				da->pci.addr.function);
+		} else {
+			ERROR("Error while scanning sysfs entry %s",
+					dirname);
+			return -1;
+		}
+	} else {
+		sdev->state = DEV_SCANNED;
+	}
+	return 0;
+}
+
+static int
+pci_scan(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			sdev->device.devargs = da;
+			ret = pci_scan_one(sdev);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+dev_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			PRIV(dev)->current_probed = i;
+			ret = rte_eal_vdev_init(da->virt.drv_name,
+						da->args);
+			if (ret)
+				return ret;
+			sdev->eth_dev =
+				rte_eth_dev_allocated(da->virt.drv_name);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_probe(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_SCANNED)
+			continue;
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			pdev->device.devargs = da;
+			ret = rte_eal_pci_probe_all_drivers(pdev);
+			/* probing failed */
+			if (ret < 0) {
+				ERROR("Failed to probe requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->eth_dev =
+				pci_addr_to_eth_dev(&pdev->addr);
+			if (ETH(sdev) == NULL) {
+				ERROR("sub_device %d init went wrong", i);
+				continue;
+			}
+			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			sdev->state = DEV_PROBED;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	ret = pci_scan(dev);
+	if (ret)
+		return ret;
+	ret = pci_probe(dev);
+	if (ret)
+		return ret;
+	ret = dev_init(dev);
+	if (ret)
+		return ret;
+	/*
+	 * We only update TX_SUBDEV if we are not started.
+	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
+	 * preferred port only upon starting it, so that the switch is smoother.
+	 */
+	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			/* Using first probed device */
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+				DEBUG("Switching tx_dev to sub_device %d",
+				      i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static int
+dev_uninit(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			sdev->device.devargs = da;
+			/*
+			 * We are lucky here that no virtual device
+			 * currently creates multiple eth_dev.
+			 */
+			ret = rte_eal_vdev_uninit(da->virt.drv_name);
+			if (ret)
+				return ret;
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+static int
+pci_remove(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	struct rte_pci_device *pdev = NULL;
+	struct rte_devargs *da = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			ret = rte_eal_pci_detach_all_drivers(pdev);
+			/* probing failed */
+			if (ret < 0) {
+				ERROR("Failed to remove requested device "
+					PCI_PRI_FMT,
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return -1;
+			}
+			/* no driver found */
+			if (ret > 0) {
+				ERROR("Requested device " PCI_PRI_FMT
+					" has no suitable driver",
+					da->pci.addr.domain,
+					da->pci.addr.bus,
+					da->pci.addr.devid,
+					da->pci.addr.function);
+				return 1;
+			}
+			sdev->state = DEV_PROBED - 1;
+		}
+	}
+	return 0;
+}
+
+int
+failsafe_eal_uninit(struct rte_eth_dev *dev)
+{
+	int ret;
+
+	ret = pci_remove(dev);
+	if (ret)
+		return ret;
+	ret = dev_uninit(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
new file mode 100644
index 0000000..470fea4
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -0,0 +1,677 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev_info default_infos = {
+	.driver_name = pmd_failsafe_driver_name,
+	/* Max possible number of elements */
+	.max_rx_pktlen = UINT32_MAX,
+	.max_rx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_tx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_mac_addrs = FAILSAFE_MAX_ETHADDR,
+	.max_hash_mac_addrs = UINT32_MAX,
+	.max_vfs = UINT16_MAX,
+	.max_vmdq_pools = UINT16_MAX,
+	.rx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	.tx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	/* Set of understood capabilities */
+	.rx_offload_capa = 0x0,
+	.tx_offload_capa = 0x0,
+	.flow_type_rss_offloads = 0x0,
+};
+
+static int
+fs_dev_configure(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
+		DEBUG("Configuring sub-device %d", i);
+		ret = rte_eth_dev_configure(PORT_ID(sdev),
+					dev->data->nb_rx_queues,
+					dev->data->nb_tx_queues,
+					&dev->data->dev_conf);
+		if (ret) {
+			ERROR("Could not configure sub_device %d", i);
+			return ret;
+		}
+		sdev->state = DEV_ACTIVE;
+	}
+	return 0;
+}
+
+static int
+fs_dev_start(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_ACTIVE)
+			continue;
+		DEBUG("Starting sub_device %d", i);
+		ret = rte_eth_dev_start(PORT_ID(sdev));
+		if (ret)
+			return ret;
+		sdev->state = DEV_STARTED;
+	}
+	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+				DEBUG("Switching tx_dev to sub_device %d", i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static void
+fs_dev_stop(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+		rte_eth_dev_stop(PORT_ID(sdev));
+		sdev->state = DEV_STARTED - 1;
+	}
+}
+
+static int
+fs_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
+		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
+		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void dev_free_queues(struct rte_eth_dev *dev);
+static void
+fs_dev_close(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Closing sub_device %d", i);
+		rte_eth_dev_close(PORT_ID(sdev));
+		sdev->state = DEV_ACTIVE - 1;
+	}
+	dev_free_queues(dev);
+}
+
+static void
+fs_rx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct rxq *rxq;
+
+	if (queue == NULL)
+		return;
+	rxq = queue;
+	dev = rxq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, rx_queue_release)
+			(&ETH(sdev)->data->rx_queues[rxq->qid]);
+	rte_free(rxq);
+}
+
+static int
+fs_rx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t rx_queue_id,
+		uint16_t nb_rx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool)
+{
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	uint8_t i;
+	int ret;
+
+	rxq = dev->data->rx_queues[rx_queue_id];
+	if (rxq != NULL) {
+		fs_rx_queue_release(rxq);
+		dev->data->rx_queues[rx_queue_id] = NULL;
+	}
+	rxq = rte_zmalloc(NULL, sizeof(*rxq),
+			  RTE_CACHE_LINE_SIZE);
+	if (rxq == NULL)
+		return -ENOMEM;
+	rxq->qid = rx_queue_id;
+	rxq->socket_id = socket_id;
+	rxq->info.mp = mb_pool;
+	rxq->info.conf = *rx_conf;
+	rxq->info.nb_desc = nb_rx_desc;
+	rxq->priv = PRIV(dev);
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
+				rx_queue_id,
+				nb_rx_desc, socket_id,
+				rx_conf, mb_pool);
+		if (ret) {
+			ERROR("RX queue setup failed for sub_device %d", i);
+			goto free_rxq;
+		}
+	}
+	return 0;
+free_rxq:
+	fs_rx_queue_release(rxq);
+	return ret;
+}
+
+static void
+fs_tx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct txq *txq;
+
+	if (queue == NULL)
+		return;
+	txq = queue;
+	dev = txq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, tx_queue_release)
+			(&ETH(sdev)->data->tx_queues[txq->qid]);
+	rte_free(txq);
+}
+
+static int
+fs_tx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t tx_queue_id,
+		uint16_t nb_tx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	uint8_t i;
+	int ret;
+
+	txq = dev->data->tx_queues[tx_queue_id];
+	if (txq != NULL) {
+		fs_tx_queue_release(txq);
+		dev->data->tx_queues[tx_queue_id] = NULL;
+	}
+	txq = rte_zmalloc("ethdev TX queue", sizeof(*txq),
+			  RTE_CACHE_LINE_SIZE);
+	if (txq == NULL)
+		return -ENOMEM;
+	txq->qid = tx_queue_id;
+	txq->socket_id = socket_id;
+	txq->info.conf = *tx_conf;
+	txq->info.nb_desc = nb_tx_desc;
+	txq->priv = PRIV(dev);
+	dev->data->tx_queues[tx_queue_id] = txq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
+				tx_queue_id,
+				nb_tx_desc, socket_id,
+				tx_conf);
+		if (ret) {
+			ERROR("TX queue setup failed for sub_device %d", i);
+			goto free_txq;
+		}
+	}
+	return 0;
+free_txq:
+	fs_tx_queue_release(txq);
+	return ret;
+}
+
+static void
+dev_free_queues(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		fs_rx_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = NULL;
+	}
+	dev->data->nb_rx_queues = 0;
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		fs_tx_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = NULL;
+	}
+	dev->data->nb_tx_queues = 0;
+}
+
+static void
+fs_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_enable(PORT_ID(sdev));
+}
+
+static void
+fs_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_disable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_enable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_disable(PORT_ID(sdev));
+}
+
+static int
+fs_link_update(struct rte_eth_dev *dev,
+		int wait_to_complete)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling link_update on sub_device %d", i);
+		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
+		if (ret && ret != -1) {
+			ERROR("Link update failed for sub_device %d with error %d",
+			      i, ret);
+			return ret;
+		}
+	}
+	if (TX_SUBDEV(dev)) {
+		struct rte_eth_link *l1;
+		struct rte_eth_link *l2;
+
+		l1 = &dev->data->dev_link;
+		l2 = &ETH(TX_SUBDEV(dev))->data->dev_link;
+		if (memcmp(l1, l2, sizeof(*l1))) {
+			*l1 = *l2;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static void
+fs_stats_get(struct rte_eth_dev *dev,
+		struct rte_eth_stats *stats)
+{
+	struct sub_device *sdev;
+	struct rte_eth_stats loc_stats;
+	uint8_t j;
+	size_t i;
+
+	memset(stats, 0, sizeof(*stats));
+	memset(&loc_stats, 0, sizeof(loc_stats));
+	FOREACH_SUBDEV_ST(sdev, j, dev, DEV_PROBED) {
+		rte_eth_stats_get(PORT_ID(sdev), &loc_stats);
+		stats->ipackets += loc_stats.ipackets;
+		stats->opackets += loc_stats.opackets;
+		stats->ibytes += loc_stats.ibytes;
+		stats->obytes += loc_stats.obytes;
+		stats->imissed += loc_stats.imissed;
+		stats->ierrors += loc_stats.ierrors;
+		stats->oerrors += loc_stats.oerrors;
+		stats->rx_nombuf += loc_stats.rx_nombuf;
+		for (i = 0; i < RTE_DIM(stats->q_ipackets); i++)
+			stats->q_ipackets[i] += loc_stats.q_ipackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_opackets); i++)
+			stats->q_opackets[i] += loc_stats.q_opackets[i];
+		for (i = 0; i < RTE_DIM(stats->q_ibytes); i++)
+			stats->q_ibytes[i] += loc_stats.q_ibytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_obytes); i++)
+			stats->q_obytes[i] += loc_stats.q_obytes[i];
+		for (i = 0; i < RTE_DIM(stats->q_errors); i++)
+			stats->q_errors[i] += loc_stats.q_errors[i];
+	}
+}
+
+static void
+fs_stats_reset(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_stats_reset(PORT_ID(sdev));
+}
+
+/**
+ * Fail-safe dev_infos_get rules:
+ *
+ * No sub_device:
+ *   Numerables:
+ *      Use the maximum possible values for any field, so as not
+ *      to impede any further configuration effort.
+ *   Capabilities:
+ *      Limits capabilities to those that are understood by the
+ *      fail-safe PMD. This understanding stems from the fail-safe
+ *      being capable of verifying that the related capability is
+ *      expressed within the device configuration (struct rte_eth_conf).
+ *
+ * At least one probed sub_device:
+ *   Numerables:
+ *      Uses values from the active probed sub_device
+ *      The rationale here is that if any sub_device is less capable
+ *      (for example concerning the number of queues) than the active
+ *      sub_device, then its subsequent configuration will fail.
+ *      It is impossible to foresee this failure when the failing sub_device
+ *      is supposed to be plugged-in later on, so the configuration process
+ *      is the single point of failure and error reporting.
+ *   Capabilities:
+ *      Uses a logical AND of RX capabilities among
+ *      all sub_devices and the default capabilities.
+ *      Uses a logical AND of TX capabilities among
+ *      the active probed sub_device and the default capabilities.
+ *
+ */
+static void
+fs_dev_infos_get(struct rte_eth_dev *dev,
+		  struct rte_eth_dev_info *infos)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL) {
+		DEBUG("No probed device, using default infos");
+		rte_memcpy(&PRIV(dev)->infos, &default_infos,
+			   sizeof(default_infos));
+	} else {
+		uint32_t rx_offload_capa;
+
+		rx_offload_capa = default_infos.rx_offload_capa;
+		FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+			rte_eth_dev_info_get(PORT_ID(sdev),
+					&PRIV(dev)->infos);
+			rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
+		}
+		sdev = TX_SUBDEV(dev);
+		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
+		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
+		PRIV(dev)->infos.tx_offload_capa &=
+					default_infos.tx_offload_capa;
+		PRIV(dev)->infos.flow_type_rss_offloads &=
+					default_infos.flow_type_rss_offloads;
+	}
+	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
+}
+
+static const uint32_t *
+fs_dev_supported_ptypes_get(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_eth_dev *edev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return NULL;
+	edev = ETH(sdev);
+	/* ENOTSUP: counts as no supported ptypes */
+	if (SUBOPS(sdev, dev_supported_ptypes_get) == NULL)
+		return NULL;
+	/*
+	 * The API does not permit a clean AND of all ptypes.
+	 * It is also incomplete by design, and we do not really care
+	 * about having the best possible value in this context.
+	 * We just return the ptypes of the device of highest
+	 * priority, usually the PREFERRED device.
+	 */
+	return SUBOPS(sdev, dev_supported_ptypes_get)(edev);
+}
+
+static int
+fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
+		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_flow_ctrl_get(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return 0;
+	if (SUBOPS(sdev, flow_ctrl_get) == NULL)
+		return -ENOTSUP;
+	return SUBOPS(sdev, flow_ctrl_get)(ETH(sdev), fc_conf);
+}
+
+static int
+fs_flow_ctrl_set(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
+		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void
+fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	/* No check: already done within the rte_eth_dev_mac_addr_remove
+	 * call for the fail-safe device.
+	 */
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
+				&dev->data->mac_addrs[index]);
+	PRIV(dev)->mac_addr_pool[index] = 0;
+}
+
+static void
+fs_mac_addr_add(struct rte_eth_dev *dev,
+		struct ether_addr *mac_addr,
+		uint32_t index,
+		uint32_t vmdq)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	assert(index < FAILSAFE_MAX_ETHADDR);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
+	if (index >= PRIV(dev)->nb_mac_addr) {
+		DEBUG("Growing mac_addrs array");
+		PRIV(dev)->nb_mac_addr = index;
+	}
+	PRIV(dev)->mac_addr_pool[index] = vmdq;
+}
+
+static void
+fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
+}
+
+const struct eth_dev_ops failsafe_ops = {
+	.dev_configure = fs_dev_configure,
+	.dev_start = fs_dev_start,
+	.dev_stop = fs_dev_stop,
+	.dev_set_link_down = fs_dev_set_link_down,
+	.dev_set_link_up = fs_dev_set_link_up,
+	.dev_close = fs_dev_close,
+	.promiscuous_enable = fs_promiscuous_enable,
+	.promiscuous_disable = fs_promiscuous_disable,
+	.allmulticast_enable = fs_allmulticast_enable,
+	.allmulticast_disable = fs_allmulticast_disable,
+	.link_update = fs_link_update,
+	.stats_get = fs_stats_get,
+	.stats_reset = fs_stats_reset,
+	.dev_infos_get = fs_dev_infos_get,
+	.dev_supported_ptypes_get = fs_dev_supported_ptypes_get,
+	.mtu_set = fs_mtu_set,
+	.vlan_filter_set = fs_vlan_filter_set,
+	.rx_queue_setup = fs_rx_queue_setup,
+	.tx_queue_setup = fs_tx_queue_setup,
+	.rx_queue_release = fs_rx_queue_release,
+	.tx_queue_release = fs_tx_queue_release,
+	.flow_ctrl_get = fs_flow_ctrl_get,
+	.flow_ctrl_set = fs_flow_ctrl_set,
+	.mac_addr_remove = fs_mac_addr_remove,
+	.mac_addr_add = fs_mac_addr_add,
+	.mac_addr_set = fs_mac_addr_set,
+};
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
new file mode 100644
index 0000000..84a4d3a
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -0,0 +1,232 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
+#define _RTE_ETH_FAILSAFE_PRIVATE_H_
+
+#include <rte_dev.h>
+#include <rte_pci.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+
+#define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
+
+#define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PARAM_STRING	\
+	"dev(<ifc>),"			\
+	"mac=mac_addr"			\
+	""
+
+#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+
+#define FAILSAFE_MAX_ETHPORTS (RTE_MAX_ETHPORTS - 1)
+#define FAILSAFE_MAX_ETHADDR 128
+
+/* TYPES */
+
+struct rxq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	/* id of last sub_device polled */
+	uint8_t last_polled;
+	unsigned int socket_id;
+	struct rte_eth_rxq_info info;
+};
+
+struct txq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	unsigned int socket_id;
+	struct rte_eth_txq_info info;
+};
+
+enum dev_state {
+	DEV_UNDEFINED = 0,
+	DEV_PARSED,
+	DEV_SCANNED,
+	DEV_PROBED,
+	DEV_ACTIVE,
+	DEV_STARTED,
+};
+
+struct sub_device {
+	/* Exhaustive DPDK device description */
+	struct rte_devargs devargs;
+	RTE_STD_C11
+	union {
+		struct rte_device device;
+		struct rte_pci_device pci_device;
+	};
+	struct rte_eth_dev *eth_dev;
+	/* Device state machine */
+	enum dev_state state;
+};
+
+struct fs_priv {
+	struct rte_eth_dev *dev;
+	/*
+	 * Set of sub_devices.
+	 * subs[0] is the preferred device
+	 * any other is just another slave
+	 */
+	uint8_t subs_head; /* if head == tail, no subs */
+	uint8_t subs_tail; /* first invalid */
+	uint8_t subs_tx;
+	struct sub_device *subs;
+	uint8_t current_probed;
+	/* current number of mac_addr slots allocated. */
+	uint32_t nb_mac_addr;
+	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
+	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
+	/* current capabilities */
+	struct rte_eth_dev_info infos;
+};
+
+/* RX / TX */
+
+uint16_t failsafe_rx_burst(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
+/* ARGS */
+
+int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
+void failsafe_args_free(struct rte_eth_dev *dev);
+int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+
+/* EAL */
+
+int failsafe_eal_init(struct rte_eth_dev *dev);
+int failsafe_eal_uninit(struct rte_eth_dev *dev);
+
+/* GLOBALS */
+
+extern const char pmd_failsafe_driver_name[];
+extern const struct eth_dev_ops failsafe_ops;
+extern int mac_from_arg;
+
+/* HELPERS */
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PRIV(dev) \
+	((struct fs_priv *)(dev)->data->dev_private)
+
+/* sdev: (struct sub_device *) */
+#define ETH(sdev) \
+	((sdev)->eth_dev)
+
+/* sdev: (struct sub_device *) */
+#define PORT_ID(sdev) \
+	(ETH(sdev)->data->port_id)
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices:
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), increment
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ * state: (enum dev_state), minimum acceptable device state
+ */
+#define FOREACH_SUBDEV_ST(s, i, dev, state)				\
+	for (i = fs_find_next((dev), 0, state);				\
+	     i < PRIV(dev)->subs_tail && (s = &PRIV(dev)->subs[i]);	\
+	     i = fs_find_next((dev), i + 1, state))
+
+/**
+ * Iterator construct over fail-safe sub-devices:
+ * s:   (struct sub_device *), iterator
+ * i:   (uint8_t), increment
+ * dev: (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV(s, i, dev)			\
+	FOREACH_SUBDEV_ST(s, i, dev, DEV_UNDEFINED)
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PREFERRED_SUBDEV(dev) \
+	(&PRIV(dev)->subs[0])
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define TX_SUBDEV(dev)							  \
+	(PRIV(dev)->subs_tx >= PRIV(dev)->subs_tail		   ? NULL \
+	 : (PRIV(dev)->subs[PRIV(dev)->subs_tx].state < DEV_PROBED ? NULL \
+	 : &PRIV(dev)->subs[PRIV(dev)->subs_tx]))
+
+/**
+ * s:   (struct sub_device *)
+ * ops: (struct eth_dev_ops) member
+ */
+#define SUBOPS(s, ops) \
+	(ETH(s)->dev_ops->ops)
+
+#ifndef NDEBUG
+#include <stdio.h>
+#define DEBUG__(m, ...)						\
+	(fprintf(stderr, "%s:%d: %s(): " m "%c",		\
+		 __FILE__, __LINE__, __func__, __VA_ARGS__),	\
+	 (void)0)
+#define DEBUG_(...)				\
+	(errno = ((int []){			\
+		*(volatile int *)&errno,	\
+		(DEBUG__(__VA_ARGS__), 0)	\
+	})[0])
+#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
+#define INFO(...) DEBUG(__VA_ARGS__)
+#define WARN(...) DEBUG(__VA_ARGS__)
+#define ERROR(...) DEBUG(__VA_ARGS__)
+#else
+#define DEBUG(...) ((void)0)
+#define LOG__(level, m, ...) \
+	RTE_LOG(level, PMD, "net_failsafe: " m "%c", __VA_ARGS__)
+#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
+#define INFO(...) LOG_(INFO, __VA_ARGS__)
+#define WARN(...) LOG_(WARNING, "WARNING: " __VA_ARGS__)
+#define ERROR(...) LOG_(ERR, "ERROR: " __VA_ARGS__)
+#endif
+
+/* inlined functions */
+
+static inline uint8_t
+fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
+		enum dev_state min_state)
+{
+	while (sid < PRIV(dev)->subs_tail) {
+		if (PRIV(dev)->subs[sid].state >= min_state)
+			break;
+		sid++;
+	}
+	if (sid >= PRIV(dev)->subs_tail)
+		return PRIV(dev)->subs_tail;
+	return sid;
+}
+
+#endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
new file mode 100644
index 0000000..a45b4e5
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -0,0 +1,107 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include "failsafe_private.h"
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_rx_burst(void *queue,
+		  struct rte_mbuf **rx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		if (unlikely(ETH(sdev) == NULL))
+			continue;
+		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
+			continue;
+		if (unlikely(sdev->state != DEV_STARTED))
+			continue;
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_tx_burst(void *queue,
+		  struct rte_mbuf **tx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	if (unlikely(sdev == NULL))
+		return 0;
+	if (unlikely(ETH(sdev) == NULL))
+		return 0;
+	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+		return 0;
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 92f3635..2032625 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -112,6 +112,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)      += -lrte_pmd_e1000
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)        += -lrte_pmd_ena
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)       += -lrte_pmd_enic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)      += -lrte_pmd_fm10k
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE)   += -lrte_pmd_failsafe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)       += -lrte_pmd_i40e
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)      += -lrte_pmd_ixgbe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD)       += -lrte_pmd_mlx4 -libverbs
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 07/13] net/failsafe: add plug-in support
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (5 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 06/13] net/failsafe: add fail-safe PMD Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 08/13] net/failsafe: add flexible device definition Gaetan Rivet
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Periodically check for the existence of a device.
If a device that has not yet been initialized appears on the system, it
is probed and configured.

The configuration process strives to synchronize the state of the
plugged-in sub-device with that of the fail-safe device.
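
The principle is a self-rearming EAL alarm: every plug_in_poll milliseconds a
callback fires, tries to bring missing sub-devices up to the state of the
fail-safe device, then re-arms itself. The sketch below only illustrates that
idea with the public rte_alarm API; POLL_MS, plugin_poll_cb and
sync_missing_subdevs are illustrative stand-ins, not the symbols added by this
patch.

#include <rte_ethdev.h>
#include <rte_alarm.h>

#define POLL_MS 2000 /* same value as the plug_in_poll default */

/* Stand-in for the actual synchronization entry point of the PMD
 * (failsafe_eth_dev_state_sync() in this patch).
 */
static int
sync_missing_subdevs(struct rte_eth_dev *dev)
{
	(void)dev;
	return 0;
}

/* Probe missing sub-devices, then schedule the next attempt. */
static void
plugin_poll_cb(void *arg)
{
	struct rte_eth_dev *dev = arg;

	sync_missing_subdevs(dev);
	rte_eal_alarm_set(POLL_MS * 1000ULL, plugin_poll_cb, dev);
}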

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/fail_safe.rst           |  19 +++
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |  68 ++++++++++
 drivers/net/failsafe/failsafe_args.c    |  32 +++++
 drivers/net/failsafe/failsafe_eal.c     |  30 +----
 drivers/net/failsafe/failsafe_ether.c   | 228 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     |  25 ++--
 drivers/net/failsafe/failsafe_private.h |  57 +++++++-
 8 files changed, 418 insertions(+), 42 deletions(-)
 create mode 100644 drivers/net/failsafe/failsafe_ether.c

diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index 056f85f..74bd807 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -102,6 +102,11 @@ Fail-safe command line parameters
   address is generated, that will be subsequently applied to all sub-device once
   they are probed.
 
+- **plug_in_poll** parameter [UINT64] (default **2000**)
+
+  This parameter allows the user to configure the delay in milliseconds
+  between two consecutive sub-device probing attempts.
+
 Usage example
 ~~~~~~~~~~~~~
 
@@ -131,3 +136,17 @@ Care must be taken, however, to respect the **ether** API concerning device
 access, and in particular, using the ``RTE_ETH_FOREACH_DEV`` macro to iterate
 over ethernet devices, instead of directly accessing them or by writing one's
 own device iterator.
+
+Plug-in feature
+---------------
+
+A sub-device can be defined even though it does not exist on the system when
+the fail-safe PMD is initialized. Upon probing this device, the fail-safe PMD
+detects its absence and postpones its use. It then registers a periodic check
+for any missing sub-device.
+
+During this time, the fail-safe PMD can be used normally: it can be configured
+and asked to transmit and receive packets. It stores any applied configuration
+and tries to apply it when the missing sub-device is finally probed. After this
+configuration pass, the new sub-device is synchronized with the other
+sub-devices, i.e. started if the user has already started the fail-safe PMD.
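
An application may also hand such a definition to DPDK at run time rather than
on the testpmd command line. The sketch below is only an illustration: it
assumes the rte_eth_dev_attach() helper available at the time of this series,
and the devargs string mirrors the examples of this guide with the new
plug_in_poll parameter added.

#include <rte_ethdev.h>

/* Attach a fail-safe port whose preferred sub-device may appear later;
 * probing of the missing sub-device is retried every second.
 */
static int
attach_failsafe_port(uint8_t *port_id)
{
	return rte_eth_dev_attach(
		"net_failsafe0,dev(84:00.0),plug_in_poll=1000",
		port_id);
}
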
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index 06199ad..4567961 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -40,6 +40,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ether.c
 
 # No exported include files
 
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index cd60193..2063393 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -79,6 +79,69 @@ sub_device_free(struct rte_eth_dev *dev)
 	rte_free(PRIV(dev)->subs);
 }
 
+static void failsafe_plugin_alarm(void *arg);
+
+int
+failsafe_plugin_alarm_install(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (PRIV(dev)->pending_alarm)
+		return 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state != PRIV(dev)->state)
+			break;
+	if (i != PRIV(dev)->subs_tail) {
+		ret = rte_eal_alarm_set(plug_in_poll * 1000,
+					failsafe_plugin_alarm,
+					dev);
+		if (ret) {
+			ERROR("Could not set up plug-in event detection");
+			return ret;
+		}
+		PRIV(dev)->pending_alarm = 1;
+	}
+	return 0;
+}
+
+int
+failsafe_plugin_alarm_cancel(struct rte_eth_dev *dev)
+{
+	int ret = 0;
+
+	if (PRIV(dev)->pending_alarm) {
+		rte_errno = 0;
+		rte_eal_alarm_cancel(failsafe_plugin_alarm, dev);
+		if (rte_errno) {
+			ERROR("rte_eal_alarm_cancel failed (errno: %s)",
+			      strerror(rte_errno));
+			ret = -rte_errno;
+		} else {
+			PRIV(dev)->pending_alarm = 0;
+		}
+	}
+	return ret;
+}
+
+static void
+failsafe_plugin_alarm(void *arg)
+{
+	struct rte_eth_dev *dev = arg;
+	int ret;
+
+	if (!PRIV(dev)->pending_alarm)
+		return;
+	PRIV(dev)->pending_alarm = 0;
+	ret = failsafe_eth_dev_state_sync(dev);
+	if (ret)
+		ERROR("Unable to synchronize sub_device state");
+	ret = failsafe_plugin_alarm_install(dev);
+	if (ret)
+		ERROR("Unable to set up next alarm");
+}
+
 static int
 eth_dev_create(const char *name,
 		const unsigned socket_id,
@@ -134,6 +197,11 @@ eth_dev_create(const char *name,
 	ret = failsafe_eal_init(dev);
 	if (ret)
 		goto free_args;
+	ret = failsafe_plugin_alarm_install(dev);
+	if (ret) {
+		ERROR("Could not set up plug-in event detection");
+		goto free_subs;
+	}
 	mac = &dev->data->mac_addrs[0];
 	if (!mac_from_arg) {
 		struct sub_device *sdev;
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index faa48f4..773b322 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -45,9 +45,11 @@
 typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
 		uint8_t head);
 
+uint64_t plug_in_poll;
 int mac_from_arg;
 
 const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
 	PMD_FAILSAFE_MAC_KVARG,
 	NULL,
 };
@@ -237,6 +239,24 @@ remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
 }
 
 static int
+get_u64_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	uint64_t *u64 = out;
+	char *endptr = NULL;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	errno = 0;
+	*u64 = strtoull(value, &endptr, 0);
+	if (errno != 0)
+		return -errno;
+	if (endptr == value)
+		return -1;
+	return 0;
+}
+
+static int
 get_mac_addr_arg(const char *key __rte_unused,
 		const char *value, void *out)
 {
@@ -268,6 +288,7 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 	ret = 0;
 	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
 	/* default parameters */
+	plug_in_poll = FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS;
 	mac_from_arg = 0;
 	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
 	if (n >= sizeof(mut_params)) {
@@ -290,6 +311,16 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 				PMD_FAILSAFE_PARAM_STRING);
 			return -1;
 		}
+		/* PLUG_IN event poll timer */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_PLUG_IN_POLL_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
+					&get_u64_arg, &plug_in_poll);
+			if (ret < 0)
+				goto free_kvlist;
+		}
 		/* MAC addr */
 		arg_count = rte_kvargs_count(kvlist,
 				PMD_FAILSAFE_MAC_KVARG);
@@ -303,6 +334,7 @@ failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 			mac_from_arg = 1;
 		}
 	}
+	PRIV(dev)->state = DEV_PARSED;
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index fcee500..a5e8c3c 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -198,8 +198,6 @@ pci_probe(struct rte_eth_dev *dev)
 int
 failsafe_eal_init(struct rte_eth_dev *dev)
 {
-	struct sub_device *sdev;
-	uint8_t i;
 	int ret;
 
 	ret = pci_scan(dev);
@@ -211,30 +209,9 @@ failsafe_eal_init(struct rte_eth_dev *dev)
 	ret = dev_init(dev);
 	if (ret)
 		return ret;
-	/*
-	 * We only update TX_SUBDEV if we are not started.
-	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
-	 * preferred port only upon starting it, so that the switch is smoother.
-	 */
-	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
-		    (TX_SUBDEV(dev) == NULL ||
-		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			/* Using first probed device */
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
-				DEBUG("Switching tx_dev to sub_device %d",
-				      i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_PROBED)
+		PRIV(dev)->state = DEV_PROBED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -314,5 +291,6 @@ failsafe_eal_uninit(struct rte_eth_dev *dev)
 	ret = dev_uninit(dev);
 	if (ret)
 		return ret;
+	PRIV(dev)->state = DEV_PROBED - 1;
 	return 0;
 }
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
new file mode 100644
index 0000000..ccdd471
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -0,0 +1,228 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "failsafe_private.h"
+
+static int
+eth_dev_conf_apply(struct rte_eth_dev *dev,
+		struct sub_device *sdev)
+{
+	struct rte_eth_dev *edev;
+	struct rte_vlan_filter_conf *vfc1;
+	struct rte_vlan_filter_conf *vfc2;
+	uint32_t i;
+	int ret;
+
+	edev = ETH(sdev);
+	/* RX queue setup */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct rxq *rxq;
+
+		rxq = dev->data->rx_queues[i];
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev), i,
+				rxq->info.nb_desc, rxq->socket_id,
+				&rxq->info.conf, rxq->info.mp);
+		if (ret) {
+			ERROR("rx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* TX queue setup */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct txq *txq;
+
+		txq = dev->data->tx_queues[i];
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev), i,
+				txq->info.nb_desc, txq->socket_id,
+				&txq->info.conf);
+		if (ret) {
+			ERROR("tx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* dev_link.link_status */
+	if (dev->data->dev_link.link_status !=
+	    edev->data->dev_link.link_status) {
+		DEBUG("Configuring link_status");
+		if (dev->data->dev_link.link_status)
+			ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		else
+			ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Failed to apply link_status");
+			return ret;
+		}
+	} else {
+		DEBUG("link_status already set");
+	}
+	/* promiscuous */
+	if (dev->data->promiscuous != edev->data->promiscuous) {
+		DEBUG("Configuring promiscuous");
+		if (dev->data->promiscuous)
+			rte_eth_promiscuous_enable(PORT_ID(sdev));
+		else
+			rte_eth_promiscuous_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("promiscuous already set");
+	}
+	/* all_multicast */
+	if (dev->data->all_multicast != edev->data->all_multicast) {
+		DEBUG("Configuring all_multicast");
+		if (dev->data->all_multicast)
+			rte_eth_allmulticast_enable(PORT_ID(sdev));
+		else
+			rte_eth_allmulticast_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("all_multicast already set");
+	}
+	/* MTU */
+	if (dev->data->mtu != edev->data->mtu) {
+		DEBUG("Configuring MTU");
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), dev->data->mtu);
+		if (ret) {
+			ERROR("Failed to apply MTU");
+			return ret;
+		}
+	} else {
+		DEBUG("MTU already set");
+	}
+	/* default MAC */
+	DEBUG("Configuring default MAC address");
+	ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
+			&dev->data->mac_addrs[0]);
+	if (ret) {
+		ERROR("Setting default MAC address failed");
+		return ret;
+	}
+	/* additional MAC */
+	if (PRIV(dev)->nb_mac_addr > 1)
+		DEBUG("Configure additional MAC address%s",
+			(PRIV(dev)->nb_mac_addr > 2 ? "es" : ""));
+	for (i = 1; i < PRIV(dev)->nb_mac_addr; i++) {
+		struct ether_addr *ea;
+
+		ea = &dev->data->mac_addrs[i];
+		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), ea,
+				PRIV(dev)->mac_addr_pool[i]);
+		if (ret) {
+			char ea_fmt[ETHER_ADDR_FMT_SIZE];
+
+			ether_format_addr(ea_fmt, ETHER_ADDR_FMT_SIZE, ea);
+			ERROR("Adding MAC address %s failed", ea_fmt);
+		}
+	}
+	/* VLAN filter */
+	vfc1 = &dev->data->vlan_filter_conf;
+	vfc2 = &edev->data->vlan_filter_conf;
+	if (memcmp(vfc1, vfc2, sizeof(struct rte_vlan_filter_conf))) {
+		uint64_t vbit;
+		uint64_t ids;
+		size_t i;
+		uint16_t vlan_id;
+
+		DEBUG("Configuring VLAN filter");
+		for (i = 0; i < RTE_DIM(vfc1->ids); i++) {
+			if (vfc1->ids[i] == 0)
+				continue;
+			ids = vfc1->ids[i];
+			while (ids) {
+				vlan_id = 64 * i;
+				/* count trailing zeroes */
+				vbit = ~ids & (ids - 1);
+				/* clear least significant bit set */
+				ids ^= (ids ^ (ids - 1)) ^ vbit;
+				for (; vbit; vlan_id++)
+					vbit >>= 1;
+				ret = rte_eth_dev_vlan_filter(
+					PORT_ID(sdev), vlan_id, 1);
+				if (ret) {
+					ERROR("Failed to apply VLAN filter %hu",
+						vlan_id);
+					return ret;
+				}
+			}
+		}
+	} else {
+		DEBUG("VLAN filter already set");
+	}
+	return 0;
+}
+
+int
+failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint32_t inactive;
+	int ret;
+	uint8_t i;
+
+	if (PRIV(dev)->state < DEV_PROBED)
+		return 0;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		return 0;
+	inactive = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state == DEV_PROBED)
+			inactive |= UINT32_C(1) << i;
+	ret = dev->dev_ops->dev_configure(dev);
+	if (ret)
+		return ret;
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (inactive & (UINT32_C(1) << i)) {
+			ret = eth_dev_conf_apply(dev, sdev);
+			if (ret) {
+				ERROR("Could not apply configuration to sub_device %d",
+				      i);
+				/* TODO: disable device */
+				return ret;
+			}
+		}
+	}
+	/*
+	 * If new devices have been configured, check if
+	 * the link state has changed.
+	 */
+	if (inactive)
+		dev->dev_ops->link_update(dev, 1);
+	if (PRIV(dev)->state < DEV_STARTED)
+		return 0;
+	ret = dev->dev_ops->dev_start(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
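
The VLAN filter loop in eth_dev_conf_apply() above walks each 64-bit word of
the VLAN bitmap with two classic bit tricks: ~ids & (ids - 1) yields a mask of
the bits below the lowest set bit (its population count is that bit's index),
and the following XOR clears that lowest set bit. A standalone illustration of
the same idiom (print_set_bits is an illustrative name):

#include <stdint.h>
#include <stdio.h>

/* Print the position of every set bit in a 64-bit word. */
static void
print_set_bits(uint64_t ids)
{
	while (ids) {
		uint64_t trail = ~ids & (ids - 1); /* mask of trailing zeros */
		unsigned int pos = 0;

		ids ^= (ids ^ (ids - 1)) ^ trail;  /* clear lowest set bit */
		for (; trail; pos++)               /* popcount of the mask */
			trail >>= 1;
		printf("bit %u is set\n", pos);
	}
}

For word i of vlan_filter_conf, the VLAN ID is then simply 64 * i plus the
position computed this way.
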
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 470fea4..b408877 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -89,6 +89,8 @@ fs_dev_configure(struct rte_eth_dev *dev)
 		}
 		sdev->state = DEV_ACTIVE;
 	}
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		PRIV(dev)->state = DEV_ACTIVE;
 	return 0;
 }
 
@@ -108,21 +110,9 @@ fs_dev_start(struct rte_eth_dev *dev)
 			return ret;
 		sdev->state = DEV_STARTED;
 	}
-	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
-				DEBUG("Switching tx_dev to sub_device %d", i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_STARTED)
+		PRIV(dev)->state = DEV_STARTED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -132,6 +122,7 @@ fs_dev_stop(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	PRIV(dev)->state = DEV_STARTED - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
 		rte_eth_dev_stop(PORT_ID(sdev));
 		sdev->state = DEV_STARTED - 1;
@@ -183,6 +174,10 @@ fs_dev_close(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	failsafe_plugin_alarm_cancel(dev);
+	if (PRIV(dev)->state == DEV_STARTED)
+		dev->dev_ops->dev_stop(dev);
+	PRIV(dev)->state = DEV_ACTIVE - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Closing sub_device %d", i);
 		rte_eth_dev_close(PORT_ID(sdev));
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index 84a4d3a..bb27223 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -42,9 +42,11 @@
 #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
 
 #define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PLUG_IN_POLL_KVARG "plug_in_poll"
 #define PMD_FAILSAFE_PARAM_STRING	\
 	"dev(<ifc>),"			\
-	"mac=mac_addr"			\
+	"mac=mac_addr,"			\
+	"plug_in_poll=u64"		\
 	""
 
 #define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
@@ -110,8 +112,22 @@ struct fs_priv {
 	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
 	/* current capabilities */
 	struct rte_eth_dev_info infos;
+	/*
+	 * Fail-safe state machine.
+	 * This level tracks the state of the EAL and ethdev
+	 * layers at large, as defined by the user application.
+	 * It then steers the sub_devices toward the same
+	 * synchronized state.
+	 */
+	enum dev_state state;
+	unsigned int pending_alarm:1; /* An alarm is pending */
 };
 
+/* MISC */
+
+int failsafe_plugin_alarm_install(struct rte_eth_dev *dev);
+int failsafe_plugin_alarm_cancel(struct rte_eth_dev *dev);
+
 /* RX / TX */
 
 uint16_t failsafe_rx_burst(void *rxq,
@@ -130,10 +146,15 @@ int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
 int failsafe_eal_init(struct rte_eth_dev *dev);
 int failsafe_eal_uninit(struct rte_eth_dev *dev);
 
+/* ETH_DEV */
+
+int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+
 /* GLOBALS */
 
 extern const char pmd_failsafe_driver_name[];
 extern const struct eth_dev_ops failsafe_ops;
+extern uint64_t plug_in_poll;
 extern int mac_from_arg;
 
 /* HELPERS */
@@ -229,4 +250,38 @@ fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
 	return sid;
 }
 
+static inline void
+fs_switch_dev(struct rte_eth_dev *dev)
+{
+	enum dev_state req_state;
+
+	req_state = PRIV(dev)->state;
+	if (PREFERRED_SUBDEV(dev)->state >= req_state) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (req_state == DEV_STARTED) ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < req_state) ||
+		   TX_SUBDEV(dev) == NULL) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/* Using acceptable device */
+		FOREACH_SUBDEV_ST(sdev, i, dev, req_state) {
+			DEBUG("Switching tx_dev to sub_device %d",
+			      i);
+			PRIV(dev)->subs_tx = i;
+			break;
+		}
+	} else if (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < req_state) {
+		DEBUG("No device ready, deactivating tx_dev");
+		PRIV(dev)->subs_tx = PRIV(dev)->subs_tail;
+	} else {
+		return;
+	}
+}
+
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 08/13] net/failsafe: add flexible device definition
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (6 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 07/13] net/failsafe: add plug-in support Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 09/13] net/failsafe: support flow API Gaetan Rivet
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Add the "exec" device type.
The parameters given to this type of device will be executed in a shell.
The output of this command is then used as a definition for a device.

That command can be re-interpreted if the related device is not
plugged-in. It allows for a device definition to react to system
changes (e.g. changing PCI bus for a given device).
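
In other words, the command output is consumed the way "head -n 1" would be:
only the first line matters and it must itself be a valid device definition.
The following standalone sketch shows that read-first-line idiom, independently
of the execute_cmd() helper added below (first_output_line is an illustrative
name):

#include <stdio.h>
#include <string.h>

/* Run a shell command and copy its first output line, stripped of the
 * trailing newline, into out. Later lines are ignored.
 * Returns 0 on success, -1 otherwise.
 */
static int
first_output_line(const char *cmd, char *out, size_t len)
{
	FILE *fp = popen(cmd, "r");
	int ret = -1;

	if (fp == NULL)
		return -1;
	if (fgets(out, (int)len, fp) != NULL) {
		out[strcspn(out, "\n")] = '\0';
		ret = 0;
	}
	if (pclose(fp) != 0)
		ret = -1;
	return ret;
}

An exec(echo 84:00.0) definition, as in the testpmd example added below,
therefore behaves like dev(84:00.0) while remaining re-evaluable.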

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/fail_safe.rst           |  16 +++++
 drivers/net/failsafe/failsafe_args.c    | 102 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ether.c   |   7 +++
 drivers/net/failsafe/failsafe_private.h |   4 ++
 4 files changed, 129 insertions(+)

diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index 74bd807..bb8a221 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -91,6 +91,15 @@ Fail-safe command line parameters
   additional sub-device parameters if need be. They will be passed on to the
   sub-device.
 
+- **exec(<shell command>)** parameter
+
+  This parameter allows the user to provide a command that the fail-safe PMD
+  executes within a regular shell context in order to define a sub-device.
+  The first line of its output is read by the fail-safe PMD and interpreted
+  as if it had been passed through the regular **dev** parameter. Any other
+  line is discarded. If the command fails or outputs an incorrect string, the
+  sub-device is not initialized.
+
 - **mac** parameter [MAC address]
 
   This parameter allows the user to set a default MAC address to the fail-safe
@@ -126,6 +135,13 @@ This section shows some example of using **testpmd** with a fail-safe PMD.
          --vdev='net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0,nodeaction=r1:0:CREATE)' -- \
          -i
 
+#. Start testpmd using a flexible device definition
+
+   .. code-block:: console
+
+      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \
+         --vdev='net_failsafe0,exec(echo 84:00.0)' -- -i
+
 Using the Fail-safe PMD from an application
 -------------------------------------------
 
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index 773b322..839831f 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -112,6 +112,80 @@ parse_device(struct sub_device *sdev, char *args)
 	return 0;
 }
 
+static void
+sanitize_cmdline(char *args)
+{
+	size_t len;
+
+	len = strnlen(args, DEVARGS_MAXLEN);
+	args[len - 1] = '\0';
+}
+
+static int
+execute_cmd(struct sub_device *sdev, char *cmdline)
+{
+	FILE *fp;
+	/* store possible newline as well */
+	char output[DEVARGS_MAXLEN + 1];
+	size_t len;
+	int old_err;
+	int ret;
+
+	DEBUG("'%s'", cmdline);
+	if (cmdline == NULL &&
+	    sdev->cmdline == NULL) {
+		/* debug: should never happen to a user */
+		DEBUG("Invalid command line");
+		return -EINVAL;
+	}
+	if (sdev->cmdline == NULL) {
+		char *new_str;
+
+		len = strlen(cmdline) + 1;
+		new_str = rte_realloc(sdev->cmdline, len,
+				RTE_CACHE_LINE_SIZE);
+		if (new_str == NULL) {
+			ERROR("Command line allocation failed");
+			return -ENOMEM;
+		}
+		sdev->cmdline = new_str;
+		snprintf(sdev->cmdline, len, "%s", cmdline);
+	} else {
+		if (strcmp(sdev->cmdline, cmdline))
+			DEBUG("cmd mismatch: '%s' != '%s'",
+				sdev->cmdline, cmdline);
+		cmdline = sdev->cmdline;
+	}
+	old_err = errno;
+	fp = popen(cmdline, "r");
+	if (fp == NULL) {
+		ret = errno;
+		ERROR("popen: %s", strerror(errno));
+		errno = old_err;
+		return ret;
+	}
+	/* We only read one line */
+	if (fgets(output, sizeof(output) - 1, fp) == NULL) {
+		DEBUG("Could not read command output");
+		return -ENODEV;
+	}
+	sanitize_cmdline(output);
+	ret = parse_device(sdev, output);
+	if (ret) {
+		ERROR("Parsing device '%s' failed", output);
+		goto ret_pclose;
+	}
+ret_pclose:
+	ret = pclose(fp);
+	if (ret) {
+		ret = errno;
+		ERROR("pclose: %s", strerror(errno));
+		errno = old_err;
+		return ret;
+	}
+	return ret;
+}
+
 static int
 parse_device_param(struct rte_eth_dev *dev, const char *param,
 		uint8_t head)
@@ -146,6 +220,14 @@ parse_device_param(struct rte_eth_dev *dev, const char *param,
 		ret = parse_device(sdev, args);
 		if (ret)
 			goto free_args;
+	} else if (strncmp(param, "exec", 4) == 0) {
+		ret = execute_cmd(sdev, args);
+		if (ret == -ENODEV) {
+			DEBUG("Reading device info from command line failed");
+			ret = 0;
+		}
+		if (ret)
+			goto free_args;
 	} else {
 		ERROR("Unrecognized device type: %.*s", (int)b, param);
 		return -EINVAL;
@@ -347,6 +429,8 @@ failsafe_args_free(struct rte_eth_dev *dev)
 	uint8_t i;
 
 	FOREACH_SUBDEV(sdev, i, dev) {
+		rte_free(sdev->cmdline);
+		sdev->cmdline = NULL;
 		free(sdev->devargs.args);
 		sdev->devargs.args = NULL;
 	}
@@ -377,3 +461,21 @@ failsafe_args_count_subdevice(struct rte_eth_dev *dev,
 	return parse_sub_devices(count_device,
 				dev, params);
 }
+
+int
+failsafe_args_parse_subs(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret = 0;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state >= DEV_PARSED)
+			continue;
+		if (sdev->cmdline)
+			ret = execute_cmd(sdev, sdev->cmdline);
+		if (ret == 0)
+			sdev->state = DEV_PARSED;
+	}
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index ccdd471..a6ccf8f 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -188,6 +188,13 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 	int ret;
 	uint8_t i;
 
+	if (PRIV(dev)->state < DEV_PARSED)
+		return 0;
+
+	ret = failsafe_args_parse_subs(dev);
+	if (ret)
+		return ret;
+
 	if (PRIV(dev)->state < DEV_PROBED)
 		return 0;
 	ret = failsafe_eal_init(dev);
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index bb27223..fcd5b75 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -45,6 +45,7 @@
 #define PMD_FAILSAFE_PLUG_IN_POLL_KVARG "plug_in_poll"
 #define PMD_FAILSAFE_PARAM_STRING	\
 	"dev(<ifc>),"			\
+	"exec(<shell command>),"	\
 	"mac=mac_addr,"			\
 	"plug_in_poll=u64"		\
 	""
@@ -92,6 +93,8 @@ struct sub_device {
 	struct rte_eth_dev *eth_dev;
 	/* Device state machine */
 	enum dev_state state;
+	/* Some devices are defined by a command line */
+	char *cmdline;
 };
 
 struct fs_priv {
@@ -140,6 +143,7 @@ uint16_t failsafe_tx_burst(void *txq,
 int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
 void failsafe_args_free(struct rte_eth_dev *dev);
 int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+int failsafe_args_parse_subs(struct rte_eth_dev *dev);
 
 /* EAL */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 09/13] net/failsafe: support flow API
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (7 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 08/13] net/failsafe: add flexible device definition Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 10/13] net/failsafe: support offload capabilities Gaetan Rivet
                     ` (6 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/features/failsafe.ini   |   1 +
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |   1 +
 drivers/net/failsafe/failsafe_eal.c     |   2 +
 drivers/net/failsafe/failsafe_ether.c   |  76 +++++++++++
 drivers/net/failsafe/failsafe_flow.c    | 230 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     |  29 ++++
 drivers/net/failsafe/failsafe_private.h |  20 +++
 8 files changed, 360 insertions(+)
 create mode 100644 drivers/net/failsafe/failsafe_flow.c

diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
index 3c52823..9167b59 100644
--- a/doc/guides/nics/features/failsafe.ini
+++ b/doc/guides/nics/features/failsafe.ini
@@ -13,6 +13,7 @@ Allmulticast mode    = Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
 VLAN filter          = Y
+Flow API             = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index 4567961..a53bb75 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -41,6 +41,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ether.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_flow.c
 
 # No exported include files
 
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index 2063393..6151736 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -180,6 +180,7 @@ eth_dev_create(const char *name,
 	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
 	dev->data->dev_link = eth_link;
 	PRIV(dev)->nb_mac_addr = 1;
+	TAILQ_INIT(&PRIV(dev)->flow_list);
 	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
 	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
 	if (params == NULL) {
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index a5e8c3c..9817fc9 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -139,6 +139,7 @@ dev_init(struct rte_eth_dev *dev)
 				continue;
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			SUB_ID(sdev) = i;
 			sdev->state = DEV_PROBED;
 		}
 	}
@@ -189,6 +190,7 @@ pci_probe(struct rte_eth_dev *dev)
 				continue;
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+			SUB_ID(sdev) = i;
 			sdev->state = DEV_PROBED;
 		}
 	}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index a6ccf8f..8c73b4c 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -33,8 +33,46 @@
 
 #include <unistd.h>
 
+#include <rte_flow.h>
+#include <rte_flow_driver.h>
+
 #include "failsafe_private.h"
 
+/** Print a message out of a flow error. */
+static int
+fs_flow_complain(struct rte_flow_error *error)
+{
+	static const char *const errstrlist[] = {
+		[RTE_FLOW_ERROR_TYPE_NONE] = "no error",
+		[RTE_FLOW_ERROR_TYPE_UNSPECIFIED] = "cause unspecified",
+		[RTE_FLOW_ERROR_TYPE_HANDLE] = "flow rule (handle)",
+		[RTE_FLOW_ERROR_TYPE_ATTR_GROUP] = "group field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_PRIORITY] = "priority field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_INGRESS] = "ingress field",
+		[RTE_FLOW_ERROR_TYPE_ATTR_EGRESS] = "egress field",
+		[RTE_FLOW_ERROR_TYPE_ATTR] = "attributes structure",
+		[RTE_FLOW_ERROR_TYPE_ITEM_NUM] = "pattern length",
+		[RTE_FLOW_ERROR_TYPE_ITEM] = "specific pattern item",
+		[RTE_FLOW_ERROR_TYPE_ACTION_NUM] = "number of actions",
+		[RTE_FLOW_ERROR_TYPE_ACTION] = "specific action",
+	};
+	const char *errstr;
+	char buf[32];
+	int err = rte_errno;
+
+	if ((unsigned int)error->type >= RTE_DIM(errstrlist) ||
+			!errstrlist[error->type])
+		errstr = "unknown type";
+	else
+		errstr = errstrlist[error->type];
+	ERROR("Caught error type %d (%s): %s%s\n",
+		error->type, errstr,
+		error->cause ? (snprintf(buf, sizeof(buf), "cause: %p, ",
+				error->cause), buf) : "",
+		error->message ? error->message : "(no stated reason)");
+	return -err;
+}
+
 static int
 eth_dev_conf_apply(struct rte_eth_dev *dev,
 		struct sub_device *sdev)
@@ -42,6 +80,8 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	struct rte_eth_dev *edev;
 	struct rte_vlan_filter_conf *vfc1;
 	struct rte_vlan_filter_conf *vfc2;
+	struct rte_flow *flow;
+	struct rte_flow_error ferror;
 	uint32_t i;
 	int ret;
 
@@ -177,6 +217,42 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	} else {
 		DEBUG("VLAN filter already set");
 	}
+	/* rte_flow */
+	if (TAILQ_EMPTY(&PRIV(dev)->flow_list)) {
+		DEBUG("rte_flow already set");
+	} else {
+		DEBUG("Resetting rte_flow configuration");
+#ifndef NDEBUG
+		memset(&ferror, 0x11, sizeof(ferror));
+#endif
+		ret = rte_flow_flush(PORT_ID(sdev), &ferror);
+		if (ret) {
+			fs_flow_complain(&ferror);
+			return ret;
+		}
+#ifndef NDEBUG
+		memset(&ferror, 0x11, sizeof(ferror));
+#endif
+		i = 0;
+		rte_errno = 0;
+		DEBUG("Configuring rte_flow");
+		TAILQ_FOREACH(flow, &PRIV(dev)->flow_list, next) {
+			DEBUG("Creating flow #%" PRIu32, i++);
+			flow->flows[SUB_ID(sdev)] =
+				rte_flow_create(PORT_ID(sdev),
+						&flow->fd->attr,
+						flow->fd->items,
+						flow->fd->actions,
+						&ferror);
+			ret = rte_errno;
+			if (ret)
+				break;
+		}
+		if (ret) {
+			fs_flow_complain(&ferror);
+			return ret;
+		}
+	}
 	return 0;
 }
 
diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
new file mode 100644
index 0000000..3079786
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -0,0 +1,230 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/queue.h>
+
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+#include <rte_flow.h>
+#include <rte_flow_driver.h>
+
+#include "failsafe_private.h"
+
+static void *
+fs_calloc(size_t n, size_t sz)
+{
+	size_t size;
+
+	size = n * sz;
+	if (n != 0 && size / n != sz) {
+		ERROR("size_t overflow");
+		return NULL;
+	}
+	return rte_zmalloc(NULL, size,
+			RTE_CACHE_LINE_SIZE);
+}
+
+static struct rte_flow *
+fs_flow_allocate(const struct rte_flow_attr *attr,
+		 const struct rte_flow_item *items,
+		 const struct rte_flow_action *actions)
+{
+	struct rte_flow *flow;
+
+	flow = rte_zmalloc(NULL, sizeof(struct rte_flow),
+			RTE_CACHE_LINE_SIZE);
+	if (flow == NULL) {
+		ERROR("Could not allocate new flow");
+		return NULL;
+	}
+	flow->fd = rte_flow_copy(attr, items, actions, fs_calloc);
+	if (flow->fd == NULL) {
+		ERROR("Could not allocate flow description");
+		goto free_flow;
+	}
+	return flow;
+free_flow:
+	rte_free(flow);
+
+	return NULL;
+}
+
+static void
+fs_flow_release(struct rte_flow **flow)
+{
+	rte_free((*flow)->fd);
+	rte_free(*flow);
+	*flow = NULL;
+}
+
+static int
+fs_flow_validate(struct rte_eth_dev *dev,
+		 const struct rte_flow_attr *attr,
+		 const struct rte_flow_item patterns[],
+		 const struct rte_flow_action actions[],
+		 struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_flow_validate on sub_device %d", i);
+		ret = rte_flow_validate(PORT_ID(sdev),
+				attr, patterns, actions, error);
+		if (ret) {
+			ERROR("Operation rte_flow_validate failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static struct rte_flow *
+fs_flow_create(struct rte_eth_dev *dev,
+	       const struct rte_flow_attr *attr,
+	       const struct rte_flow_item patterns[],
+	       const struct rte_flow_action actions[],
+	       struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	struct rte_flow *flow;
+	uint8_t i;
+
+	flow = fs_flow_allocate(attr, patterns, actions);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		flow->flows[i] = rte_flow_create(PORT_ID(sdev),
+				attr, patterns, actions, error);
+		if (flow->flows[i] == NULL) {
+			ERROR("Failed to create flow on sub_device %d",
+				i);
+			goto err;
+		}
+	}
+	TAILQ_INSERT_TAIL(&PRIV(dev)->flow_list, flow, next);
+	return flow;
+err:
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (flow->flows[i] != NULL)
+			rte_flow_destroy(PORT_ID(sdev),
+				flow->flows[i], error);
+	}
+	fs_flow_release(&flow);
+	return NULL;
+}
+
+static int
+fs_flow_destroy(struct rte_eth_dev *dev,
+		struct rte_flow *flow,
+		struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (flow == NULL) {
+		ERROR("Invalid flow");
+		return -EINVAL;
+	}
+	ret = 0;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		int local_ret;
+
+		if (flow->flows[i] == NULL)
+			continue;
+		local_ret = rte_flow_destroy(PORT_ID(sdev),
+				flow->flows[i], error);
+		if (local_ret) {
+			ERROR("Failed to destroy flow on sub_device %d: %d",
+					i, local_ret);
+			if (ret == 0)
+				ret = local_ret;
+		}
+	}
+	TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
+	rte_free(flow);
+	return ret;
+}
+
+static int
+fs_flow_flush(struct rte_eth_dev *dev,
+	      struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+	struct rte_flow *flow;
+	void *tmp;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_flow_flush on sub_device %d", i);
+		ret = rte_flow_flush(PORT_ID(sdev), error);
+		if (ret) {
+			ERROR("Operation rte_flow_flush failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	TAILQ_FOREACH_SAFE(flow, &PRIV(dev)->flow_list, next, tmp) {
+		TAILQ_REMOVE(&PRIV(dev)->flow_list, flow, next);
+		fs_flow_release(&flow);
+	}
+	return 0;
+}
+
+static int
+fs_flow_query(struct rte_eth_dev *dev,
+	      struct rte_flow *flow,
+	      enum rte_flow_action_type type,
+	      void *arg,
+	      struct rte_flow_error *error)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev != NULL) {
+		return rte_flow_query(PORT_ID(sdev),
+				flow->flows[SUB_ID(sdev)], type, arg, error);
+	}
+	WARN("No active sub_device to query about its flow");
+	return -1;
+}
+
+const struct rte_flow_ops fs_flow_ops = {
+	.validate = fs_flow_validate,
+	.create = fs_flow_create,
+	.destroy = fs_flow_destroy,
+	.flush = fs_flow_flush,
+	.query = fs_flow_query,
+};
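
From the application side nothing specific to the fail-safe PMD is required: a
rule created on the fail-safe port is replicated on every active sub-device and
kept (through rte_flow_copy) for replay on the ones plugged in later. A hedged
example creating a simple "queue all Ethernet traffic to queue 0" rule through
the generic flow API (install_queue_rule and port_id are illustrative):

#include <rte_flow.h>

/* Create one rule on the fail-safe port; fs_flow_create() above mirrors
 * it on each active sub-device and stores its description for later
 * synchronization.
 */
static struct rte_flow *
install_queue_rule(uint8_t port_id, struct rte_flow_error *error)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, error);
}
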
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index b408877..d837280 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -35,6 +35,7 @@
 #include <stdint.h>
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
+#include <rte_flow.h>
 
 #include "failsafe_private.h"
 
@@ -642,6 +643,33 @@ fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
 }
 
+static int
+fs_filter_ctrl(struct rte_eth_dev *dev,
+		enum rte_filter_type type,
+		enum rte_filter_op op,
+		void *arg)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	if (type == RTE_ETH_FILTER_GENERIC &&
+	    op == RTE_ETH_FILTER_GET) {
+		*(const void **)arg = &fs_flow_ops;
+		return 0;
+	}
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_filter_ctrl on sub_device %d", i);
+		ret = rte_eth_dev_filter_ctrl(PORT_ID(sdev), type, op, arg);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_filter_ctrl failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
 const struct eth_dev_ops failsafe_ops = {
 	.dev_configure = fs_dev_configure,
 	.dev_start = fs_dev_start,
@@ -669,4 +697,5 @@ const struct eth_dev_ops failsafe_ops = {
 	.mac_addr_remove = fs_mac_addr_remove,
 	.mac_addr_add = fs_mac_addr_add,
 	.mac_addr_set = fs_mac_addr_set,
+	.filter_ctrl = fs_filter_ctrl,
 };
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index fcd5b75..c636199 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -34,6 +34,8 @@
 #ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
 #define _RTE_ETH_FAILSAFE_PRIVATE_H_
 
+#include <sys/queue.h>
+
 #include <rte_dev.h>
 #include <rte_pci.h>
 #include <rte_ethdev.h>
@@ -73,6 +75,16 @@ struct txq {
 	struct rte_eth_txq_info info;
 };
 
+struct rte_flow {
+	TAILQ_ENTRY(rte_flow) next;
+
+	/* sub_flows */
+	struct rte_flow *flows[FAILSAFE_MAX_ETHPORTS];
+
+	/* flow description for synchronization */
+	struct rte_flow_desc *fd;
+};
+
 enum dev_state {
 	DEV_UNDEFINED = 0,
 	DEV_PARSED,
@@ -91,6 +103,7 @@ struct sub_device {
 		struct rte_pci_device pci_device;
 	};
 	struct rte_eth_dev *eth_dev;
+	uint8_t sid;
 	/* Device state machine */
 	enum dev_state state;
 	/* Some devices are defined by a command line */
@@ -109,6 +122,8 @@ struct fs_priv {
 	uint8_t subs_tx;
 	struct sub_device *subs;
 	uint8_t current_probed;
+	/* flow mapping */
+	TAILQ_HEAD(sub_flows, rte_flow) flow_list;
 	/* current number of mac_addr slots allocated. */
 	uint32_t nb_mac_addr;
 	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
@@ -158,6 +173,7 @@ int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
 
 extern const char pmd_failsafe_driver_name[];
 extern const struct eth_dev_ops failsafe_ops;
+extern const struct rte_flow_ops fs_flow_ops;
 extern uint64_t plug_in_poll;
 extern int mac_from_arg;
 
@@ -175,6 +191,10 @@ extern int mac_from_arg;
 #define PORT_ID(sdev) \
 	(ETH(sdev)->data->port_id)
 
+/* sdev: (struct sub_device *) */
+#define SUB_ID(sdev) \
+	(sdev->sid)
+
 /**
  * Stateful iterator construct over fail-safe sub-devices:
  * s:     (struct sub_device *), iterator
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 10/13] net/failsafe: support offload capabilities
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (8 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 09/13] net/failsafe: support flow API Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 11/13] net/failsafe: add fast burst functions Gaetan Rivet
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/features/failsafe.ini |   6 ++
 drivers/net/failsafe/failsafe_ops.c   | 125 +++++++++++++++++++++++++++++++++-
 2 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
index 9167b59..257f579 100644
--- a/doc/guides/nics/features/failsafe.ini
+++ b/doc/guides/nics/features/failsafe.ini
@@ -14,6 +14,12 @@ Unicast MAC filter   = Y
 Multicast MAC filter = Y
 VLAN filter          = Y
 Flow API             = Y
+VLAN offload         = Y
+QinQ offload         = Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
+Inner L3 checksum    = Y
+Inner L4 checksum    = Y
 Packet type parsing  = Y
 Basic stats          = Y
 Stats per queue      = Y
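
These capabilities are only honoured when every sub-device can back them:
fs_port_offload_validate() below compares the requested rxmode flags against
each sub-device's capabilities and, as long as the port has not been configured
yet, quietly downgrades the configuration instead of failing. The application
requests the offloads as usual; a small sketch, with configure_with_offloads
and the queue counts as assumptions:

#include <rte_ethdev.h>

/* Request RX offloads through the regular ethdev configuration. If a
 * sub-device lacks one of them, the fail-safe PMD clears the matching
 * rxmode flag (see fs_port_disable_offload()) before configuring it.
 */
static int
configure_with_offloads(uint8_t port_id)
{
	struct rte_eth_conf conf = {
		.rxmode = {
			.hw_vlan_strip = 1,
			.hw_ip_checksum = 1,
			.enable_lro = 1,
		},
	};

	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}
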
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index d837280..2a4d102 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -63,22 +63,143 @@ static struct rte_eth_dev_info default_infos = {
 		.nb_seg_max = UINT16_MAX,
 		.nb_mtu_seg_max = UINT16_MAX,
 	},
-	/* Set of understood capabilities */
-	.rx_offload_capa = 0x0,
+	/*
+	 * Set of capabilities that can be verified upon
+	 * configuring a sub-device.
+	 */
+	.rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_QINQ_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_LRO,
 	.tx_offload_capa = 0x0,
 	.flow_type_rss_offloads = 0x0,
 };
 
+/**
+ * Check whether the offload capabilities requested in the ethdev
+ * configuration are supported by a sub_device.
+ *
+ * @return
+ *   0: all requested capabilities are supported by the sub_device
+ *   positive value: at least this flag is not supported by the sub_device
+ */
+static int
+fs_port_offload_validate(struct rte_eth_dev *dev,
+			 struct sub_device *sdev)
+{
+	struct rte_eth_dev_info infos = {0};
+	struct rte_eth_conf *cf;
+	uint32_t cap;
+
+	cf = &dev->data->dev_conf;
+	SUBOPS(sdev, dev_infos_get)(ETH(sdev), &infos);
+	/* RX capabilities */
+	cap = infos.rx_offload_capa;
+	if (cf->rxmode.hw_vlan_strip &&
+	    ((cap & DEV_RX_OFFLOAD_VLAN_STRIP) == 0)) {
+		WARN("VLAN stripping offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_VLAN_STRIP;
+	}
+	if (cf->rxmode.hw_ip_checksum &&
+	    ((cap & (DEV_RX_OFFLOAD_IPV4_CKSUM |
+		     DEV_RX_OFFLOAD_UDP_CKSUM |
+		     DEV_RX_OFFLOAD_TCP_CKSUM)) !=
+	     (DEV_RX_OFFLOAD_IPV4_CKSUM |
+	      DEV_RX_OFFLOAD_UDP_CKSUM |
+	      DEV_RX_OFFLOAD_TCP_CKSUM))) {
+		WARN("IP checksum offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_IPV4_CKSUM |
+		       DEV_RX_OFFLOAD_UDP_CKSUM |
+		       DEV_RX_OFFLOAD_TCP_CKSUM;
+	}
+	if (cf->rxmode.enable_lro &&
+	    ((cap & DEV_RX_OFFLOAD_TCP_LRO) == 0)) {
+		WARN("TCP LRO offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_TCP_LRO;
+	}
+	if (cf->rxmode.hw_vlan_extend &&
+	    ((cap & DEV_RX_OFFLOAD_QINQ_STRIP) == 0)) {
+		WARN("Stacked VLAN stripping offload requested but not supported by sub_device %d",
+		      SUB_ID(sdev));
+		return DEV_RX_OFFLOAD_QINQ_STRIP;
+	}
+	/* TX capabilities */
+	/* Nothing to do, no tx capa supported */
+	return 0;
+}
+
+/*
+ * Disable the dev_conf flag related to an offload capability flag
+ * within an ethdev configuration.
+ */
+static int
+fs_port_disable_offload(struct rte_eth_conf *cf,
+			uint32_t ol_cap)
+{
+	switch (ol_cap) {
+	case DEV_RX_OFFLOAD_VLAN_STRIP:
+		INFO("Disabling VLAN stripping offload");
+		cf->rxmode.hw_vlan_strip = 0;
+		break;
+	case DEV_RX_OFFLOAD_IPV4_CKSUM:
+	case DEV_RX_OFFLOAD_UDP_CKSUM:
+	case DEV_RX_OFFLOAD_TCP_CKSUM:
+	case (DEV_RX_OFFLOAD_IPV4_CKSUM |
+	      DEV_RX_OFFLOAD_UDP_CKSUM |
+	      DEV_RX_OFFLOAD_TCP_CKSUM):
+		INFO("Disabling IP checksum offload");
+		cf->rxmode.hw_ip_checksum = 0;
+		break;
+	case DEV_RX_OFFLOAD_TCP_LRO:
+		INFO("Disabling TCP LRO offload");
+		cf->rxmode.enable_lro = 0;
+		break;
+	case DEV_RX_OFFLOAD_QINQ_STRIP:
+		INFO("Disabling stacked VLAN stripping offload");
+		cf->rxmode.hw_vlan_extend = 0;
+		break;
+	default:
+		DEBUG("Unable to disable offload capability: %" PRIx32,
+		      ol_cap);
+		return -1;
+	}
+	return 0;
+}
+
 static int
 fs_dev_configure(struct rte_eth_dev *dev)
 {
 	struct sub_device *sdev;
 	uint8_t i;
+	int capa_flag;
 	int ret;
 
 	FOREACH_SUBDEV(sdev, i, dev) {
 		if (sdev->state != DEV_PROBED)
 			continue;
+		DEBUG("Checking capabilities for sub_device %d", i);
+		while ((capa_flag = fs_port_offload_validate(dev, sdev))) {
+			if (PRIV(dev)->state >= DEV_ACTIVE) {
+				ERROR("device already configured, cannot fix live configuration");
+				return -1;
+			}
+			ret = fs_port_disable_offload(&dev->data->dev_conf,
+						      capa_flag);
+			if (ret) {
+				ERROR("Unable to disable offload capability");
+				return ret;
+			}
+		}
+	}
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
 		DEBUG("Configuring sub-device %d", i);
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 11/13] net/failsafe: add fast burst functions
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (9 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 10/13] net/failsafe: support offload capabilities Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 12/13] net/failsafe: support device removal Gaetan Rivet
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 drivers/net/failsafe/failsafe_private.h |   8 +++
 drivers/net/failsafe/failsafe_rxtx.c    | 117 ++++++++++++++++++++++++++------
 2 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index c636199..faf0e71 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -148,11 +148,18 @@ int failsafe_plugin_alarm_cancel(struct rte_eth_dev *dev);
 
 /* RX / TX */
 
+void set_burst_fn(struct rte_eth_dev *dev);
+
 uint16_t failsafe_rx_burst(void *rxq,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 uint16_t failsafe_tx_burst(void *txq,
 		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
 
+uint16_t failsafe_rx_burst_fast(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst_fast(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
 /* ARGS */
 
 int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
@@ -306,6 +313,7 @@ fs_switch_dev(struct rte_eth_dev *dev)
 	} else {
 		return;
 	}
+	set_burst_fn(dev);
 }
 
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
index a45b4e5..796a9ad 100644
--- a/drivers/net/failsafe/failsafe_rxtx.c
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -36,11 +36,53 @@
 
 #include "failsafe_private.h"
 
-/*
- * TODO: write fast version,
- * without additional checks, to be activated once
- * everything has been verified to comply.
- */
+static inline int
+sdev_rx_unsafe(struct sub_device *sdev)
+{
+	return (ETH(sdev) == NULL) ||
+		(ETH(sdev)->rx_pkt_burst == NULL) ||
+		(sdev->state != DEV_STARTED);
+}
+
+static inline int
+sdev_tx_unsafe(struct sub_device *sdev)
+{
+	return (sdev == NULL) ||
+		(ETH(sdev) == NULL) ||
+		(ETH(sdev)->tx_pkt_burst == NULL) ||
+		(sdev->state != DEV_STARTED);
+}
+
+void
+set_burst_fn(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int need_safe;
+	int safe_set;
+
+	need_safe = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		need_safe |= sdev_rx_unsafe(sdev);
+	safe_set = (dev->rx_pkt_burst == &failsafe_rx_burst);
+	if (need_safe && !safe_set) {
+		DEBUG("Using safe RX bursts");
+		dev->rx_pkt_burst = &failsafe_rx_burst;
+	} else if (!need_safe && safe_set) {
+		DEBUG("Using fast RX bursts");
+		dev->rx_pkt_burst = &failsafe_rx_burst_fast;
+	}
+	need_safe = sdev_tx_unsafe(TX_SUBDEV(dev));
+	safe_set = (dev->tx_pkt_burst == &failsafe_tx_burst);
+	if (need_safe && !safe_set) {
+		DEBUG("Using safe TX bursts");
+		dev->tx_pkt_burst = &failsafe_tx_burst;
+	} else if (!need_safe && safe_set) {
+		DEBUG("Using fast TX bursts");
+		dev->tx_pkt_burst = &failsafe_tx_burst_fast;
+	}
+}
+
 uint16_t
 failsafe_rx_burst(void *queue,
 		  struct rte_mbuf **rx_pkts,
@@ -63,11 +105,7 @@ failsafe_rx_burst(void *queue,
 		if (i == priv->subs_tail)
 			i = priv->subs_head;
 		sdev = &priv->subs[i];
-		if (unlikely(ETH(sdev) == NULL))
-			continue;
-		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
-			continue;
-		if (unlikely(sdev->state != DEV_STARTED))
+		if (unlikely(sdev_rx_unsafe(sdev)))
 			continue;
 		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
 		nb_rx = ETH(sdev)->
@@ -80,11 +118,39 @@ failsafe_rx_burst(void *queue,
 	return 0;
 }
 
-/*
- * TODO: write fast version,
- * without additional checks, to be activated once
- * everything has been verified to comply.
- */
+uint16_t
+failsafe_rx_burst_fast(void *queue,
+			 struct rte_mbuf **rx_pkts,
+			 uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
 uint16_t
 failsafe_tx_burst(void *queue,
 		  struct rte_mbuf **tx_pkts,
@@ -96,12 +162,23 @@ failsafe_tx_burst(void *queue,
 
 	txq = queue;
 	sdev = TX_SUBDEV(txq->priv->dev);
-	if (unlikely(sdev == NULL))
-		return 0;
-	if (unlikely(ETH(sdev) == NULL))
-		return 0;
-	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+	if (unlikely(sdev_tx_unsafe(sdev)))
 		return 0;
 	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
 	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
 }
+
+uint16_t
+failsafe_tx_burst_fast(void *queue,
+			 struct rte_mbuf **tx_pkts,
+			 uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 12/13] net/failsafe: support device removal
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (10 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 11/13] net/failsafe: add fast burst functions Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 15:15   ` [PATCH v2 13/13] net/failsafe: support link status change event Gaetan Rivet
                     ` (3 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/fail_safe.rst           | 14 +++++++
 drivers/net/failsafe/failsafe_args.c    | 22 +++++++++++
 drivers/net/failsafe/failsafe_eal.c     |  2 +
 drivers/net/failsafe/failsafe_ether.c   | 67 ++++++++++++++++++++++++++++++++-
 drivers/net/failsafe/failsafe_ops.c     | 21 +++++++++++
 drivers/net/failsafe/failsafe_private.h |  7 ++++
 6 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index bb8a221..8811ed3 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -51,6 +51,12 @@ The Fail-safe PMD only supports a limited set of features. If you plan to use a
 device underneath the Fail-safe PMD with a specific feature, this feature must
 be supported by the Fail-safe PMD to avoid throwing any error.
 
+A notable exception is the device removal feature. The fail-safe PMD being a
+virtual device, it cannot currently be removed in the sense of a bus-specific
+hotplug event (PCI hotplug, for example). It will however enable this feature
+for its sub-devices automatically, detecting those that are capable and
+registering the relevant callback for such events.
+
 Check the feature matrix for the complete set of supported features.
 
 Compilation options
@@ -166,3 +172,11 @@ emit and receive packets. It will store any applied configuration, and try to
 apply it upon the probing of its missing sub-device. After this configuration
 pass, the new sub-device will be synchronized with other sub-devices, i.e. be
 started if the fail-safe PMD has been started by the user before.
+
+Plug-out feature
+----------------
+
+A sub-device supporting the device removal event can be removed from its bus at
+any time. The fail-safe PMD will register a callback for such an event and
+react accordingly. It will try to safely stop, close and uninit the sub-device
+that emitted the event, allowing it to free any remaining resources.
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index 839831f..62033c4 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -462,6 +462,26 @@ failsafe_args_count_subdevice(struct rte_eth_dev *dev,
 				dev, params);
 }
 
+static int
+parse_sub_device(struct sub_device *sdev)
+{
+	struct rte_devargs *da;
+	char params[DEVARGS_MAXLEN] = "";
+
+	da = &sdev->devargs;
+	if (da->type == RTE_DEVTYPE_VIRTUAL)
+		snprintf(params, sizeof(params) - 1,
+			"%s,%s", da->virt.drv_name, da->args);
+	else
+		snprintf(params, sizeof(params) - 1,
+			PCI_PRI_FMT ",%s",
+			da->pci.addr.domain, da->pci.addr.bus,
+			da->pci.addr.devid, da->pci.addr.function,
+			da->args);
+
+	return parse_device(sdev, params);
+}
+
 int
 failsafe_args_parse_subs(struct rte_eth_dev *dev)
 {
@@ -474,6 +494,8 @@ failsafe_args_parse_subs(struct rte_eth_dev *dev)
 			continue;
 		if (sdev->cmdline)
 			ret = execute_cmd(sdev, sdev->cmdline);
+		else
+			ret = parse_sub_device(sdev);
 		if (ret == 0)
 			sdev->state = DEV_PARSED;
 	}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index 9817fc9..8bb8d45 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -140,6 +140,7 @@ dev_init(struct rte_eth_dev *dev)
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
 			SUB_ID(sdev) = i;
+			sdev->fs_dev = dev;
 			sdev->state = DEV_PROBED;
 		}
 	}
@@ -191,6 +192,7 @@ pci_probe(struct rte_eth_dev *dev)
 			}
 			ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
 			SUB_ID(sdev) = i;
+			sdev->fs_dev = dev;
 			sdev->state = DEV_PROBED;
 		}
 	}
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 8c73b4c..f12b8d7 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -33,6 +33,7 @@
 
 #include <unistd.h>
 
+#include <rte_alarm.h>
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
 
@@ -256,6 +257,43 @@ eth_dev_conf_apply(struct rte_eth_dev *dev,
 	return 0;
 }
 
+static void
+fs_dev_remove(void *arg)
+{
+	struct sub_device *sdev = arg;
+	struct rte_devargs *da;
+	struct rte_pci_device *pdev;
+
+	switch (sdev->state) {
+	case DEV_STARTED:
+		rte_eth_dev_stop(PORT_ID(sdev));
+		sdev->state = DEV_ACTIVE;
+		/* fallthrough */
+	case DEV_ACTIVE:
+		rte_eth_dev_close(PORT_ID(sdev));
+		sdev->state = DEV_PROBED;
+		/* fallthrough */
+	case DEV_PROBED:
+		da = &sdev->devargs;
+		if (da->type == RTE_DEVTYPE_WHITELISTED_PCI) {
+			pdev = &sdev->pci_device;
+			rte_eal_pci_detach_all_drivers(pdev);
+		} else if (da->type == RTE_DEVTYPE_VIRTUAL) {
+			rte_eal_vdev_uninit(da->virt.drv_name);
+		}
+		sdev->eth_dev->state = RTE_ETH_DEV_UNUSED;
+		sdev->state = DEV_PARSED;
+		/* fallthrough */
+	case DEV_SCANNED:
+	case DEV_PARSED:
+	case DEV_UNDEFINED:
+		sdev->state = DEV_UNDEFINED;
+		/* the end */
+		break;
+	}
+	failsafe_plugin_alarm_install(sdev->fs_dev);
+}
+
 int
 failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 {
@@ -291,7 +329,7 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 			if (ret) {
 				ERROR("Could not apply configuration to sub_device %d",
 				      i);
-				/* TODO: disable device */
+				fs_dev_remove(sdev);
 				return ret;
 			}
 		}
@@ -309,3 +347,30 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
 		return ret;
 	return 0;
 }
+
+void
+failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
+				enum rte_eth_event_type event,
+				void *cb_arg)
+{
+	struct sub_device *sdev = cb_arg;
+	enum dev_state state;
+
+	if (event != RTE_ETH_EVENT_INTR_RMV) {
+		ERROR("Incorrect event");
+		return;
+	}
+	/* Switch tx_dev as soon as possible. */
+	state = sdev->state;
+	sdev->state = DEV_UNDEFINED;
+	fs_switch_dev(sdev->fs_dev);
+	sdev->state = state;
+	/*
+	 * Async removal: the sub-PMD removal will try to unregister this
+	 * callback, which cannot be done from the callback's own thread context.
+	 */
+	if (rte_eal_alarm_set(FAILSAFE_PLUGOUT_ASYNC_RESCHED_US,
+			      fs_dev_remove,
+			      cb_arg))
+		ERROR("Could not set up deferred sub_device removal");
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 2a4d102..8d0e7a2 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -198,8 +198,19 @@ fs_dev_configure(struct rte_eth_dev *dev)
 		}
 	}
 	FOREACH_SUBDEV(sdev, i, dev) {
+		int rmv_interrupt = 0;
+
 		if (sdev->state != DEV_PROBED)
 			continue;
+
+		rmv_interrupt = ETH(sdev)->data->dev_flags &
+				RTE_ETH_DEV_INTR_RMV;
+		if (rmv_interrupt) {
+			DEBUG("Enabling RMV interrupts for sub_device %d", i);
+			dev->data->dev_conf.intr_conf.rmv = 1;
+		} else {
+			DEBUG("sub_device %d does not support RMV event", i);
+		}
 		DEBUG("Configuring sub-device %d", i);
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
@@ -209,6 +220,16 @@ fs_dev_configure(struct rte_eth_dev *dev)
 			ERROR("Could not configure sub_device %d", i);
 			return ret;
 		}
+		if (rmv_interrupt) {
+			ret = rte_eth_dev_callback_register(PORT_ID(sdev),
+					RTE_ETH_EVENT_INTR_RMV,
+					failsafe_eth_rmv_event_callback,
+					sdev);
+			if (ret)
+				WARN("Failed to register RMV callback for sub_device %d",
+				     SUB_ID(sdev));
+		}
+		dev->data->dev_conf.intr_conf.rmv = 0;
 		sdev->state = DEV_ACTIVE;
 	}
 	if (PRIV(dev)->state < DEV_ACTIVE)
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index faf0e71..5efd084 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -53,6 +53,7 @@
 	""
 
 #define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+#define FAILSAFE_PLUGOUT_ASYNC_RESCHED_US 100000
 
 #define FAILSAFE_MAX_ETHPORTS (RTE_MAX_ETHPORTS - 1)
 #define FAILSAFE_MAX_ETHADDR 128
@@ -108,6 +109,9 @@ struct sub_device {
 	enum dev_state state;
 	/* Some device are defined as a command line */
 	char *cmdline;
+
+	/* fail-safe device backreference */
+	struct rte_eth_dev *fs_dev;
 };
 
 struct fs_priv {
@@ -175,6 +179,9 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev);
 /* ETH_DEV */
 
 int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+void failsafe_eth_rmv_event_callback(uint8_t port_id,
+				     enum rte_eth_event_type type,
+				     void *arg);
 
 /* GLOBALS */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v2 13/13] net/failsafe: support link status change event
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (11 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 12/13] net/failsafe: support device removal Gaetan Rivet
@ 2017-03-08 15:15   ` Gaetan Rivet
  2017-03-08 16:54   ` [PATCH v2 00/13] introduce fail-safe PMD Neil Horman
                     ` (2 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Gaetan Rivet @ 2017-03-08 15:15 UTC (permalink / raw)
  To: dev

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
 doc/guides/nics/features/failsafe.ini   |  1 +
 drivers/net/failsafe/failsafe.c         |  1 +
 drivers/net/failsafe/failsafe_ether.c   | 18 ++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     | 24 ++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h |  3 +++
 5 files changed, 47 insertions(+)

diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
index 257f579..251ce55 100644
--- a/doc/guides/nics/features/failsafe.ini
+++ b/doc/guides/nics/features/failsafe.ini
@@ -5,6 +5,7 @@
 ;
 [Features]
 Link status          = Y
+Link status event    = Y
 Queue start/stop     = Y
 MTU update           = Y
 Jumbo frame          = Y
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index 6151736..f885c19 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -238,6 +238,7 @@ eth_dev_create(const char *name,
 		mac->addr_bytes[0], mac->addr_bytes[1],
 		mac->addr_bytes[2], mac->addr_bytes[3],
 		mac->addr_bytes[4], mac->addr_bytes[5]);
+	dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
 	return 0;
 free_args:
 	failsafe_args_free(dev);
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index f12b8d7..5c2e118 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -374,3 +374,21 @@ failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
 			      cb_arg))
 		ERROR("Could not set up deferred sub_device removal");
 }
+
+void
+failsafe_eth_lsc_event_callback(uint8_t port_id __rte_unused,
+				enum rte_eth_event_type event,
+				void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	int ret;
+
+	if (event != RTE_ETH_EVENT_INTR_LSC) {
+		ERROR("Incorrect event");
+		return;
+	}
+	ret = dev->dev_ops->link_update(dev, 0);
+	/* We must pass on the LSC event */
+	if (ret)
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 8d0e7a2..695e7b3 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -199,6 +199,8 @@ fs_dev_configure(struct rte_eth_dev *dev)
 	}
 	FOREACH_SUBDEV(sdev, i, dev) {
 		int rmv_interrupt = 0;
+		int lsc_interrupt = 0;
+		int lsc_enabled;
 
 		if (sdev->state != DEV_PROBED)
 			continue;
@@ -211,6 +213,18 @@ fs_dev_configure(struct rte_eth_dev *dev)
 		} else {
 			DEBUG("sub_device %d does not support RMV event", i);
 		}
+
+		lsc_enabled = dev->data->dev_conf.intr_conf.lsc;
+		lsc_interrupt = lsc_enabled &&
+				(ETH(sdev)->data->dev_flags &
+				 RTE_ETH_DEV_INTR_LSC);
+		if (lsc_interrupt) {
+			DEBUG("Enabling LSC interrupts for sub_device %d", i);
+			dev->data->dev_conf.intr_conf.lsc = 1;
+		} else if (lsc_enabled && !lsc_interrupt) {
+			DEBUG("Disabling LSC interrupts for sub_device %d", i);
+			dev->data->dev_conf.intr_conf.lsc = 0;
+		}
 		DEBUG("Configuring sub-device %d", i);
 		ret = rte_eth_dev_configure(PORT_ID(sdev),
 					dev->data->nb_rx_queues,
@@ -230,6 +244,16 @@ fs_dev_configure(struct rte_eth_dev *dev)
 				     SUB_ID(sdev));
 		}
 		dev->data->dev_conf.intr_conf.rmv = 0;
+		if (lsc_interrupt) {
+			ret = rte_eth_dev_callback_register(PORT_ID(sdev),
+							RTE_ETH_EVENT_INTR_LSC,
+							failsafe_eth_lsc_event_callback,
+							dev);
+			if (ret)
+				WARN("Failed to register LSC callback for sub_device %d",
+				     SUB_ID(sdev));
+		}
+		dev->data->dev_conf.intr_conf.lsc = lsc_enabled;
 		sdev->state = DEV_ACTIVE;
 	}
 	if (PRIV(dev)->state < DEV_ACTIVE)
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index 5efd084..27e2a0c 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -182,6 +182,9 @@ int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
 void failsafe_eth_rmv_event_callback(uint8_t port_id,
 				     enum rte_eth_event_type type,
 				     void *arg);
+void failsafe_eth_lsc_event_callback(uint8_t port_id,
+				     enum rte_eth_event_type event,
+				     void *cb_arg);
 
 /* GLOBALS */
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (12 preceding siblings ...)
  2017-03-08 15:15   ` [PATCH v2 13/13] net/failsafe: support link status change event Gaetan Rivet
@ 2017-03-08 16:54   ` Neil Horman
  2017-03-09  9:15     ` Bruce Richardson
  2017-03-20 15:00   ` Thomas Monjalon
  2017-03-23 13:01   ` Ferruh Yigit
  15 siblings, 1 reply; 49+ messages in thread
From: Neil Horman @ 2017-03-08 16:54 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev, Thomas Monjalon, Adrien Mazarguil

On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
> This PMD intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back so that
> existing applications do not need to be modified to benefit from true
> hot-plugging support.
> 
> The stacked PMD approach shares many similarities with the bonding PMD but
> with a different purpose. While bonding provides the ability to group
> several links into a single logical device for enhanced throughput and
> supports fail-over at link level, this one manages the sudden disappearance
> of the underlying device; it guarantees applications face a valid device in
> working order at all times.
> 
Why not just add this feature to the bonding pmd then?  A bond is perfectly
capable of handling the trivial case of a single underlying device, and adding
an option to make the underlying slave 'persistent' seems much simpler, in terms
of implementation and code size, than adding an entire new pmd along with its
supporting code.

Neil

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-08 16:54   ` [PATCH v2 00/13] introduce fail-safe PMD Neil Horman
@ 2017-03-09  9:15     ` Bruce Richardson
  2017-03-10  9:13       ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Bruce Richardson @ 2017-03-09  9:15 UTC (permalink / raw)
  To: Neil Horman; +Cc: Gaetan Rivet, dev, Thomas Monjalon, Adrien Mazarguil

On Wed, Mar 08, 2017 at 11:54:02AM -0500, Neil Horman wrote:
> On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
> > This PMD intercepts and manages Ethernet device removal events issued by
> > slave PMDs and re-initializes them transparently when brought back so that
> > existing applications do not need to be modified to benefit from true
> > hot-plugging support.
> > 
> > The stacked PMD approach shares many similarities with the bonding PMD but
> > with a different purpose. While bonding provides the ability to group
> > several links into a single logical device for enhanced throughput and
> > supports fail-over at link level, this one manages the sudden disappearance
> > of the underlying device; it guarantees applications face a valid device in
> > working order at all times.
> > 
> Why not just add this feature to the bonding pmd then?  A bond is perfectly
> capable of handling the trivial case of a single underlying device, and adding
> an option to make the underly slave 'persistent' seem both much simpler in terms
> of implementation and code size, than adding an entire new pmd, along with its
> supporting code.
> 
> Neil
>
+1 
I don't like the idea of having multiple PMDs in DPDK to handle
combining multiple other devices into one.

/Bruce

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-09  9:15     ` Bruce Richardson
@ 2017-03-10  9:13       ` Gaëtan Rivet
  2017-03-10 22:43         ` Neil Horman
  0 siblings, 1 reply; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-10  9:13 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Neil Horman, dev, Thomas Monjalon, Adrien Mazarguil

On Thu, Mar 09, 2017 at 09:15:14AM +0000, Bruce Richardson wrote:
>On Wed, Mar 08, 2017 at 11:54:02AM -0500, Neil Horman wrote:
>> On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
>> > This PMD intercepts and manages Ethernet device removal events issued by
>> > slave PMDs and re-initializes them transparently when brought back so that
>> > existing applications do not need to be modified to benefit from true
>> > hot-plugging support.
>> >
>> > The stacked PMD approach shares many similarities with the bonding PMD but
>> > with a different purpose. While bonding provides the ability to group
>> > several links into a single logical device for enhanced throughput and
>> > supports fail-over at link level, this one manages the sudden disappearance
>> > of the underlying device; it guarantees applications face a valid device in
>> > working order at all times.
>> >
>> Why not just add this feature to the bonding pmd then?  A bond is perfectly
>> capable of handling the trivial case of a single underlying device, and adding
>> an option to make the underly slave 'persistent' seem both much simpler in terms
>> of implementation and code size, than adding an entire new pmd, along with its
>> supporting code.
>>
>> Neil
>>

@Neil
I don't know if you saw my answer to Bruce on the matter [1]; it
partially addresses your point.

>+1
>I don't like the idea of having multiple PMDs in DPDK to handle
>combining multiple other devices into one.
>
>/Bruce

I understand the concern. Let's first put aside for the moment the link
grouping, which is only part of the fail-safe PMD function.

At its core, the fail-safe PMD provides an alternative paradigm: a new
proposal for hot-plug functionality in a lightweight form factor from a
user standpoint.

The central question that I would like to tackle is this: why should we
require our users to declare a bonding device in order to get hot-plug support?

I took some time to illustrate a few modes of operation:

Fig. 1

    .-------------.
    | application |
    `------.------'
           |
      .----'-----.---------. <------ init, conf, Rx/Tx
      |          |         |
      |      .---|--.------|--. <--- conf, link check, Rx/Tx
      |      |   |  |      |  |
      v      |   v  v      v  v
 .---------. | .-------. .------.
 | bonding | | | ixgbe | | mlx4 |
 `----.----' | `-------' `------'
      |      |
      `------'

 Typical link fail-over.


Fig. 2

  .-------------.
  | application |
  `------.------'
         | <-------- init, conf, Rx/Tx
         v
   .-----------.
   | fail-safe |
   `-----.-----'
         |
     .---'----. <--- init, conf, dev check, Rx/Tx
     |        |
     v        v
 .-------. .------.
 | ixgbe | | mlx4 |
 `-------' `------'

 Typical automatic hot-plug handling with device fail-over.


Fig. 3

    .-------------.
    | application |
    `------.------'
           |
      .----'-----.-------------. <---------- init, conf, Rx/Tx
      |          |             |
      |      .---|--.----------|--. <------- conf, link check, Rx/Tx
      |      |   |  |          |  |
      v      |   v  v          v  v
 .---------. | .-----------. .-----------.
 | bonding | | | fail-safe | | fail-safe |
 `----.----' | `-----.-----' `-----.-----'
      |      |       |             | <------ init, conf, dev check, Rx/Tx
      `------'       v             v
                 .-------.      .------.
                 | ixgbe |      | mlx4 |
                 `-------'      `------'

 Combination to provide link fail-over with automatic hot-plug handling.


Fig. 4

    .-------------.
    | application |
    `------.------'
           |
      .----'-----.-------------. <---------- init, conf, Rx/Tx
      |          |             |
      |      .---|--.----------|--. <------- conf, link check, Rx/Tx
      |      |   |  |          |  |
      v      |   v  v          v  v
 .---------. | .-----------. .-----------.
 | bonding | | | fail-safe | | fail-safe |
 `----.----' | `-----.-----' `-----.-----'
      |      |       |             | <------- init, conf, dev check, Rx/Tx
      `------'       |             |
              .------'---.     .---'------.
              |          |     |          |
              v          v     v          v
       .--------. .--------. .--------. .--------.
       | mlx4 1 | | mlx4 2 | | mlx4 1 | | mlx4 2 |
       | port 1 | | port 1 | | port 2 | | port 2 |
       `--------' `--------' `--------' `--------'

 Complex use case with link fail-over at port level and automatic hot-plug
 handling with device fail-over.

1. LSC vs. RMV

  A link status change is a valid state for a device. It calls for
  specific responses, e.g. a link switch in a bonding device, without
  losing the general configuration of the port.

  The removal of a device calls for more than pausing operations and
  switching an active device. The party responsible for initializing the
  device should take care of closing it properly. If this party also
  wants to be able to restore the device if it is plugged back in, it
  would need to be able to re-initialize it and reconfigure its previous
  state.

  As we can see in [Fig. 1], this responsibility lies upon the
  application.
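
  To make this concrete, below is a minimal sketch (not code from this
  series) of what [Fig. 1] asks of an application once the
  RTE_ETH_EVENT_INTR_RMV event is available. The helper names are
  hypothetical; the callback prototype mirrors the one used by the
  series, and error checking is omitted:

  #include <stdint.h>
  #include <rte_ethdev.h>
  #include <rte_alarm.h>

  /* Hypothetical application-side teardown, run outside the callback
   * context. Re-attaching and reconfiguring later is also up to the
   * application (e.g. via rte_eth_dev_attach()). */
  static void
  app_handle_removal(void *arg)
  {
      uint8_t port_id = (uint8_t)(uintptr_t)arg;

      rte_eth_dev_stop(port_id);
      rte_eth_dev_close(port_id);
  }

  static void
  app_rmv_callback(uint8_t port_id, enum rte_eth_event_type event,
                   void *cb_arg __rte_unused)
  {
      if (event != RTE_ETH_EVENT_INTR_RMV)
          return;
      /* Defer the teardown out of the callback context. */
      rte_eal_alarm_set(100 * 1000, app_handle_removal,
                        (void *)(uintptr_t)port_id);
  }

  /* Registered once per port by the application:
   * rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RMV,
   *                               app_rmv_callback, NULL);
   */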

2. Bonding and link availability

  The hot-plug functionality is not a core function of the bonding PMD.
  It is only interested in knowing if the link is active or not.

  Adding device persistence to the bonding PMD would mean adding the
  ability to flexibly parse device definitions to cope with plug-in
  events on evolving buses (PCI hot-plug could mean changing bus
  addresses), being able to emulate the EAL and the ether layer, and to
  properly store the device configuration.  This means formally
  describing the life of a device in a DPDK application from start to
  finish.

  All of this hot-plug support, just so that the bonding PMD can be
  aware of the status of some of its links: this seems like scope creep.

3. Fail-safe and hot-plug support

  An attach / detach (pseudo-hotplug) API exists in DPDK. The main
  problem with this API is that it does not allow reacting to a device
  plug-out; it only lets an application trigger the detachment of a
  device. This is a completely different approach from an application
  standpoint (a short sketch follows at the end of this point).

  The fail-safe PMD offers an out-of-the-box implementation of a newly
  proposed hot-plug API [2]. It gives users seamless integration of
  device removal support in their applications, without requiring any
  application changes [Fig. 2]. This flexibility allows it to be used as
  part of a bond [Fig. 3], [Fig. 4], while the current bonding does not
  allow for detaching devices, even if that were to be considered
  hot-plug. There is however no reason to require declaring a bonding
  device to be able to use it; from a user perspective this seems
  backward.

  Both the bonding and fail-safe PMDs deal with enslaving devices and
  acting upon changes in their state. The bonding function operates at
  link level, while the hot-plug function deals with the device itself.
  I do not think this similarity justifies merging both functions.
  Maintenance would be easier with clear, simpler functions implemented
  in separate PMDs.
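
  To illustrate the difference, here is a minimal sketch of that
  existing attach / detach API (not code from this series; the function
  name app_swap_port() is hypothetical). Everything is initiated by the
  application, nothing tells it that the device disappeared:

  #include <rte_ethdev.h>

  /* Hypothetical helper; purely caller-initiated: the application must
   * already know that the device should go away or come back. */
  static int
  app_swap_port(uint8_t port_id, const char *new_devargs)
  {
      char devname[64]; /* sized generously for an ethdev name */
      uint8_t new_port;

      if (rte_eth_dev_detach(port_id, devname) != 0)
          return -1;
      if (rte_eth_dev_attach(new_devargs, &new_port) != 0)
          return -1;
      return new_port;
  }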

[1]: http://dpdk.org/ml/archives/dev/2017-March/059446.html
[2]: http://dpdk.org/ml/archives/dev/2017-March/059217.html

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-10  9:13       ` Gaëtan Rivet
@ 2017-03-10 22:43         ` Neil Horman
  2017-03-14 14:49           ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Horman @ 2017-03-10 22:43 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Bruce Richardson, dev, Thomas Monjalon, Adrien Mazarguil

On Fri, Mar 10, 2017 at 10:13:32AM +0100, Gaëtan Rivet wrote:
> On Thu, Mar 09, 2017 at 09:15:14AM +0000, Bruce Richardson wrote:
> > On Wed, Mar 08, 2017 at 11:54:02AM -0500, Neil Horman wrote:
> > > On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
> > > > This PMD intercepts and manages Ethernet device removal events issued by
> > > > slave PMDs and re-initializes them transparently when brought back so that
> > > > existing applications do not need to be modified to benefit from true
> > > > hot-plugging support.
> > > >
> > > > The stacked PMD approach shares many similarities with the bonding PMD but
> > > > with a different purpose. While bonding provides the ability to group
> > > > several links into a single logical device for enhanced throughput and
> > > > supports fail-over at link level, this one manages the sudden disappearance
> > > > of the underlying device; it guarantees applications face a valid device in
> > > > working order at all times.
> > > >
> > > Why not just add this feature to the bonding pmd then?  A bond is perfectly
> > > capable of handling the trivial case of a single underlying device, and adding
> > > an option to make the underly slave 'persistent' seem both much simpler in terms
> > > of implementation and code size, than adding an entire new pmd, along with its
> > > supporting code.
> > > 
> > > Neil
> > > 
> 
> @Neil
> I don't know if you saw my answer to Bruce on the matter [1], it
> partially adresses your point.
> 
I did, and I think it asserts your points, but doesn't really address mine.  See
below. 

> > +1
> > I don't like the idea of having multiple PMDs in DPDK to handle
> > combining multiple other devices into one.
> > 
> > /Bruce
> 
> I understand the concern. Let's first put aside for the moment the link
> grouping, which is only part of the fail-safe PMD function.
> 
> The fail-safe PMD at its core, provides an alternative paradigm, a new
> proposal for a hot-plug functionality in a lightweight form-factor from a
> user standpoint.
> 
Ok, but let's be clear here: you're duplicating a lot of functionality.  And for
that privilege, there will be an additional 4000 lines of code to maintain.

> The central question that I would like to tackle is this: why should we
> require from our users declaring a bonding device to have hot-plug support?
> 
Well, strictly speaking, I suppose we don't have to require it.  But by that
same token, we don't need to do it in a separate PMD either; there are lots of
other options.

> I took some time to illustrate a few modes of operation:
> 
> Fig. 1
> 
>    .-------------.
>    | application |
>    `------.------'
>           |
>      .----'-----.---------. <------ init, conf, Rx/Tx
>      |          |         |
>      |      .---|--.------|--. <--- conf, link check, Rx/Tx
>      |      |   |  |      |  |
>      v      |   v  v      v  v
> .---------. | .-------. .------.
> | bonding | | | ixgbe | | mlx4 |
> `----.----' | `-------' `------'
>      |      |
>      `------'
> 
> Typical link fail-over.
> 
> 
> Fig. 2
> 
>  .-------------.
>  | application |
>  `------.------'
>         | <-------- init, conf, Rx/Tx
>         v
>   .-----------.
>   | fail-safe |
>   `-----.-----'
>         |
>     .---'----. <--- init, conf, dev check, Rx/Tx
>     |        |
>     v        v
> .-------. .------.
> | ixgbe | | mlx4 |
> `-------' `------'
> 
> Typical automatic hot-plug handling with device fail-over.
> 
> 
> Fig. 3
> 
>    .-------------.
>    | application |
>    `------.------'
>           |
>      .----'-----.-------------. <---------- init, conf, Rx/Tx
>      |          |             |
>      |      .---|--.----------|--. <------- conf, link check, Rx/Tx
>      |      |   |  |          |  |
>      v      |   v  v          v  v
> .---------. | .-----------. .-----------.
> | bonding | | | fail-safe | | fail-safe |
> `----.----' | `-----.-----' `-----.-----'
>      |      |       |             | <------ init, conf, dev check, Rx/Tx
>      `------'       v             v
>                 .-------.      .------.
>                 | ixgbe |      | mlx4 |
>                 `-------'      `------'
> 
> Combination to provide link fail-over with automatic hot-plug handling.
> 
> 
> Fig. 4
> 
>    .-------------.
>    | application |
>    `------.------'
>           |
>      .----'-----.-------------. <---------- init, conf, Rx/Tx
>      |          |             |
>      |      .---|--.----------|--. <------- conf, link check, Rx/Tx
>      |      |   |  |          |  |
>      v      |   v  v          v  v
> .---------. | .-----------. .-----------.
> | bonding | | | fail-safe | | fail-safe |
> `----.----' | `-----.-----' `-----.-----'
>      |      |       |             | <------- init, conf, dev check, Rx/Tx
>      `------'       |             |
>              .------'---.     .---'------.
>              |          |     |          |
>              v          v     v          v
>       .--------. .--------. .--------. .--------.
>       | mlx4 1 | | mlx4 2 | | mlx4 1 | | mlx4 2 |
>       | port 1 | | port 1 | | port 2 | | port 2 |
>       `--------' `--------' `--------' `--------'
> 
> Complex use case with link fail-over at port level and automatic hot-plug
> handling with device fail-over.

Yes, I think we all understand the purpose of your PMD - it's there to provide a
placeholder device that can respond to application requests in a sane manner,
until such time as real hardware is put in its place via a hot plug/failure.  That's all
well and good; I'm saying there are better ways to go about this, many of which
already exist, that can provide the same functionality without having to add an
extra 4k lines of code to the project.

> 1. LSC vs. RMV
> 
>  A link status change is a valid state for a device. It calls for
>  specific responses, e.g. a link switch in a bonding device, without
>  losing the general configuration of the port.
> 
>  The removal of a device calls for more than pausing operations and
>  switching an active device. The party responsible for initializing the
>  device should take care of closing it properly. If this party also
>  wants to be able to restore the device if it was plugged back in, it
>  would need be able to initialize it back and reconfigure its previous
>  state.
> 
>  As we can see that in [Fig. 1], this responsibility lies upon the
>  application.
> 
Again, yes, I think we all see the benefit of centralizing hot plug operations;
no one is disagreeing with that. It's the code/functional duplication that is
concerning.

> 2. Bonding and link availability
> 
>  The hot-plug functionality is not a core function of the bonding PMD.
>  It is only interested in knowing if the link is active or not.
> 
Currently, yes.  The suggestion was that you augment the bonding driver so that
hot plug is a core function of bonding.

>  Adding the device persistence to the bonding PMD would mean adding the
>  ability to flexibly parse device definitions to cope with plug-ins in
>  evolving busses (PCI hot-plug could mean changing bus addresses), being
>  able to emulate the EAL and the ether layer and to properly store the
>  device configuration.  This means formally describing the life of a
>  device in a DPDK application from start to finish.
> 
Which seems to me to be exactly what your PMD does.  I don't see why it's
fundamentally harder to do that in an existing pmd than it is in a new one.

>  All of this hot-plug support, in order for the bonding to be aware of
>  the status of some of its links. This seems like scope-creep.
> 
But one of your primary use cases is bonding.  It's the nature of code to add
features as time goes on.

> 3. Fail-safe and hot-plug support
> 
>  An attach / detach (pseudo-hotplug) API exists in DPDK. The main  problem
> of this API is that it does not offer reacting to a device  plug-out, only
> triggering a device detaching from an application. This  is a completely
> different approach from an application standpoint.
> 
>  The fail-safe PMD offers an out-of-the-box implementation of a newly
>  proposed hot-plug API [2]. It allows a seamless integration to users
>  for device removal support in their applications, without requiring
>  evolutions [Fig. 2]. This flexibility allows it to be used as part of
>  a bond [Fig. 3], [Fig. 4], while current bonding does not allow for
>  detaching devices, even if it were to be considered hot-plug. There is
>  no reason however to require declaring a bonding device to be able to
>  use it, from a user perspective this seems backward.
> 
>  Both bonding and fail-safe PMDs deal with enslaving devices and
>  acting upon their state changing. The bonding function performs at
>  link level, while the hot-plug function deals with the device itself.
>  I do not think this similarity justify merging both functions.
>  Maintenance would be easier with clear, simpler functions implemented
>  in separate PMDs.
> 

My suggestion to you would be to re-architect this in the following way:

1) Augment the null pmd to accept arbitrary parameters.  Parameters which are
not explicitly recognized are stored.  Create an api call via the ethdev api to
retrieve said arguments.

2) Augment the bonding driver so that, upon configuration, null drivers can be
added to the bond and marked as replaceable in some fashion.

3) Port your hotplug implementation into the bonding code (or make it a separate
library, not sure which yet).  On a hot plug event trigger, react by querying
the replaceable interfaces to see if the hot plugged nic is meant to replace an
existing one.  If a match is found, detach the null pmd instance, and add the
hot plugged interface.

The intent here is to provide you with the functionality that you have (which is
a good feature), without having to support an entire new pmd that replicates a
great deal of existing code and functionality.

Regards
Neil

> [1]: http://dpdk.org/ml/archives/dev/2017-March/059446.html
> [2]: http://dpdk.org/ml/archives/dev/2017-March/059217.html
> 
> -- 
> Gaëtan Rivet
> 6WIND
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-10 22:43         ` Neil Horman
@ 2017-03-14 14:49           ` Gaëtan Rivet
  2017-03-15  3:28             ` Bruce Richardson
  0 siblings, 1 reply; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-14 14:49 UTC (permalink / raw)
  To: Neil Horman; +Cc: Bruce Richardson, dev, Thomas Monjalon, Adrien Mazarguil

> > The central question that I would like to tackle is this: why should
> > we require from our users declaring a bonding device to have
> > hot-plug support?
> >
> We'll, strictly speaking, I suppose we don't have to require it.  But by that
> same token, we don't need to do it in a separate PMD either, there are lots of
> other options.
>

I think I used an ambiguous formulation here. To be sure that we
understand each other, what I want to express is that it will certainly
seem very strange for a user to have to declare a bond on a single
device, and to be expected to do so, just to get hot-plug support on
that particular device.

The bonding here would only be used as a place-holder, which does not
really make sense from the user point of view.

I think you understood that, and what I take from your response is that,
while you agree, you would like to do it differently. I just wanted to
be clear.

> > I took some time to illustrate a few modes of operation:
> >
> > Fig. 1
> >
> >    .-------------.
> >    | application |
> >    `------.------'
> >           |
> >      .----'-----.---------. <------ init, conf, Rx/Tx
> >      |          |         |
> >      |      .---|--.------|--. <--- conf, link check, Rx/Tx
> >      |      |   |  |      |  |
> >      v      |   v  v      v  v
> > .---------. | .-------. .------.
> > | bonding | | | ixgbe | | mlx4 |
> > `----.----' | `-------' `------'
> >      |      |
> >      `------'
> >
> > Typical link fail-over.
> >
> >
> > Fig. 2
> >
> >  .-------------.
> >  | application |
> >  `------.------'
> >         | <-------- init, conf, Rx/Tx
> >         v
> >   .-----------.
> >   | fail-safe |
> >   `-----.-----'
> >         |
> >     .---'----. <--- init, conf, dev check, Rx/Tx
> >     |        |
> >     v        v
> > .-------. .------.
> > | ixgbe | | mlx4 |
> > `-------' `------'
> >
> > Typical automatic hot-plug handling with device fail-over.
> >
> > [...]
>
> Yes, I think we all understand the purpose of your PMD - its there to provide a
> placeholder device that can respond to application requests in a sane manner,
> until such time as real hardware is put in its place via a hot plug/failure.  Thats all
> well and good, I'm saying there are better ways to go about this that can
> provide the same functionality without having to add an extra 4k lines of code
> to the project, many of which already exist.
>

Ah, yes, I didn't want to imply that the purpose of this PMD wasn't
understood already by many. These figures are there to illustrate some
use cases that some users could recognize, and serve as a support for
the point made afterward.

The main thing that can be taken from these is the division between the
link-level and device-level checking that is done. This explains most of
my position. The nature of those checks implies different kinds of code,
most of which is therefore not actually duplicated, and would require
pretty much the same amount of code to implement either as libraries or
as part of the bonding PMD. This is the crux of my argument, which I
expand upon below.

> > 1. LSC vs. RMV
> >
> >  A link status change is a valid state for a device. It calls for
> >  specific responses, e.g. a link switch in a bonding device, without
> >  losing the general configuration of the port.
> >
> >  The removal of a device calls for more than pausing operations and
> >  switching an active device. The party responsible for initializing the
> >  device should take care of closing it properly. If this party also
> >  wants to be able to restore the device if it was plugged back in, it
> >  would need be able to initialize it back and reconfigure its previous
> >  state.
> >
> >  As we can see that in [Fig. 1], this responsibility lies upon the
> >  application.
> >
> Again, yes, I think we all see the benefit of centralizing hot plug operations,
> no one is disagreing with that, its the code/functional duplication that is
> concerning.
>

Certainly. I will try to explain why the code is not actually duplicated
and why the functions only superficially overlap.

> > 2. Bonding and link availability
> >
> >  The hot-plug functionality is not a core function of the bonding PMD.
> >  It is only interested in knowing if the link is active or not.
> >
> Currently, yes.  The suggestion was that you augment the bonding driver so that
> hot plug is a core function of bonding.
>
> >  Adding the device persistence to the bonding PMD would mean adding the
> >  ability to flexibly parse device definitions to cope with plug-ins in
> >  evolving busses (PCI hot-plug could mean changing bus addresses), being
> >  able to emulate the EAL and the ether layer and to properly store the
> >  device configuration.  This means formally describing the life of a
> >  device in a DPDK application from start to finish.
> >
> Which seems to me to be exactly what your PMD does.  I don't see why its
> fundamentally harder to do that in an existing pmd, than it is in a new one.
>

Indeed it does. I must emphasize the "formally describe the life of a
device" part. The hot-plug functionality goes beyond the link-level
check. The description of a device from a DPDK standpoint is complete in
the fail-safe PMD: the state machine must be able to describe the entire
life of a device, from the devargs parsing to its start-up.
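
A condensed sketch of that life cycle, for reference. The state names
are those used by the series (see the fs_dev_remove() hunk in the
removal patch); the enum layout and the comments are my own shorthand,
not quoted from the code:

  enum dev_state {
      DEV_UNDEFINED,  /* no usable device definition */
      DEV_PARSED,     /* devargs parsed, nothing probed yet */
      DEV_SCANNED,    /* device definition matched on its bus */
      DEV_PROBED,     /* ethdev probed and allocated */
      DEV_ACTIVE,     /* configured by the fail-safe */
      DEV_STARTED,    /* started, usable for Rx/Tx */
  };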

We cannot reuse the existing bonding PMD architecture for this.  We
would have to rewrite the bonding PMD from the ground up for the
hot-plug function, because it is actually a different approach to
managing the slaves.

This is what I wanted to illustrate in [Fig. 1] and [Fig. 2]:

- In the bonding, the init and configuration steps are still the
  responsibility of the application and no one else. The bonding PMD
  captures the device and re-applies its configuration upon
  dev_configure(), which actually re-applies part of the configuration
  already present within the slave eth_dev (cf. rte_eth_dev_config_restore).

- In the fail-safe, the init and configuration are both the
  responsibility of the fail-safe PMD itself, not the application
  anymore. Taking over these responsibilities from the application is
  the whole point of the "deferred hot-plug" support, of proposing a
  simple implementation to the user (see the sketch below).

This change in responsibilities is the bulk of the fail-safe code, and it
would have to be added as-is to the bonding. Most of that code verifies
the correctness of the synchronization: the acceptable states of a device
following the events seen by the fail-safe PMD during initialization, and
the consistency of the configuration items between the state the
application believes it is in and the state the fail-safe knows it is in.
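
A rough sketch of that second bullet, only to make the idea concrete
(the structure and helper below are hypothetical, not code from the
series): the fail-safe records what the application asked for and
replays it on a sub-device whenever that sub-device (re)appears.

  #include <rte_ethdev.h>

  /* Hypothetical record of the application-requested configuration,
   * filled in by the fail-safe's own dev_ops as the application calls
   * them. */
  struct app_port_conf {
      struct rte_eth_conf conf;
      uint16_t nb_rxq;
      uint16_t nb_txq;
      /* ... queue setup parameters, MAC addresses, flow rules ... */
  };

  /* Hypothetical replay step, run once a sub-device is probed again. */
  static int
  replay_conf(uint8_t sub_port, const struct app_port_conf *c)
  {
      int ret;

      ret = rte_eth_dev_configure(sub_port, c->nb_rxq, c->nb_txq, &c->conf);
      if (ret < 0)
          return ret;
      /* ... then rte_eth_rx_queue_setup() / rte_eth_tx_queue_setup()
       * per queue, and rte_eth_dev_start() if the fail-safe port was
       * started by the application ... */
      return 0;
  }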

This function does not overlap with that of the bonding. The reason I
did not add this whole architecture to the bonding is that, when I tried
to do so, I found that I only had two possibilities:

- The current slave handling path is kept, and we only add a new one
  with additional functionalities: full init and conf handling with
  extended parsing capabilities.

- The current slave handling is scrapped and replaced entirely by the new
  slave management. The old capturing of existing devices is no longer
  done.

The first solution is not acceptable, because we effectively end up with
a maintenance nightmare by having to validate two types of slaves with
differing capabilities, differing initialization paths and differing
configuration code.  This is extremely awkward and architecturally
unsound. It is essentially the same as having the exact code of the
fail-safe as an aside in the bonding, maintaining exactly the same
breadth of code while having muddier interfaces and organization.

The second solution is not acceptable, because we are bending the whole
existing bonding API to our whim. We could just as well simply rename
the fail-safe PMD as bonding, add a few grouping capabilities and call
it a day. This is not acceptable for users.

All of this, just for hotplug support in the bonding PMD, while it
actually makes sense to offer this function on its own. Again, this third
path with a new PMD clarifies the functions for the users and, for the
reasons exposed above, helps the maintenance of the code.

There is no code duplication, or very little, because the slave
management done by the fail-safe is specific to the hot-plug function.
We do not handle slaves in the same way, and this is entirely due to the
difference in nature between a link-level check and a device-level
check.

>
> My suggestion to you would be to re-architect this in the following way:
>
> 1) Augment the null pmd to accept arbitrary parameters.  Paremeters which are
> not explicitly recognized are stored.  Create an api call via the ethdev api to
> retrieve said arguments.
>
> 2) Augment the bonding driver so that, upon configuration, null drivers can be
> added to the bond, and marked as replaceable in some fashion
>
> 3) Port your hotplug implementation into the bonding code (or make it a separate
> library, not sure which yet).  On a hot plug event trigger, react by querying
> the replaceable interfaces to see if the hot plugged nic is meant to replace an
> existing one.  If a match is found, detach the null pmd instance, and add the
> hot plugged interface.
>

Porting this feature either into the bonding PMD or into a library means
having a good part of the fail-safe PMD added somewhere else. It does
not reduce the amount of code by much.

It does however modify the proposed user interface: either as a library
that an application would have to explicitly use, or as part of the
bonding PMD, with the problems previously explained.

The library form has been considered before. We would have to propose
both a library for hot-plug handling (initializing and uninitializing a
device), and a library for handling configuration. Neither can be
seamlessly integrated into existing applications: they will have to be
called correctly, and an API will have to be respected, to add hot-plug
support.

Additionally, their implementation is not so simple in this form. The
configuration store / restore, for example, will need full dev_ops
coverage in order to deal with each configuration item that can be
used. We end up with a complete dev_ops layer that strongly resembles a
PMD. It also requires storage space that will need to be managed,
either by an application or by a PMD calling it. Having this space
somehow managed by the EAL leads to implementing a PMD.

The PMD form allows for a clean abstraction layer that can be interfaced
both with an application on top and a driver underneath. We are able to
respect both APIs and to be inserted without either being aware of it.
It does not overcomplicate the code itself nor the architecture, and the
overhead of a PMD is negligible, which can be justified by the seamless
integration it offers.

> The intent here is to provide you with the functionality that you have (which is
> a good feature), without having to support an entire new pmd that replicates a
> great deal of existing code an functionality.
>

We already considered using the bonding PMD as much as possible, with
generic libraries otherwise for the new facilities that we want. It is
not simpler or easier to maintain, and it impedes adoption of the
feature by users.

Regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-14 14:49           ` Gaëtan Rivet
@ 2017-03-15  3:28             ` Bruce Richardson
  2017-03-15 11:15               ` Thomas Monjalon
  0 siblings, 1 reply; 49+ messages in thread
From: Bruce Richardson @ 2017-03-15  3:28 UTC (permalink / raw)
  To: Gaëtan Rivet; +Cc: Neil Horman, dev, Thomas Monjalon, Adrien Mazarguil

On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > The central question that I would like to tackle is this: why should
> > > we require from our users declaring a bonding device to have
> > > hot-plug support?
> > >
> > We'll, strictly speaking, I suppose we don't have to require it.  But by that
> > same token, we don't need to do it in a separate PMD either, there are lots of
> > other options.
> > 
> 
> I think I used an ambiguous formulation here. To be sure that we
> understand each other, what I want to express is that it will certainly
> seem very strange for a user to declare a bond on a single device,
> first, and to be expected to do so if they wanted hot-plug support on
> that particular device.
> 
> The bonding here would only be used as a place-holder, which does not
> really make sense from the user point of view.
> 
> I think you understood that, and what I take from your response is that
> while agreeing you would like to do so differently. I just wanted to be
> clear.
> 
> > > I took some time to illustrate a few modes of operation:
> > >
> > > Fig. 1
> > >
> > >    .-------------.
> > >    | application |
> > >    `------.------'
> > >           |
> > >      .----'-----.---------. <------ init, conf, Rx/Tx
> > >      |          |         |
> > >      |      .---|--.------|--. <--- conf, link check, Rx/Tx
> > >      |      |   |  |      |  |
> > >      v      |   v  v      v  v
> > > .---------. | .-------. .------.
> > > | bonding | | | ixgbe | | mlx4 |
> > > `----.----' | `-------' `------'
> > >      |      |
> > >      `------'
> > >
> > > Typical link fail-over.
> > >
> > >
> > > Fig. 2
> > >
> > >  .-------------.
> > >  | application |
> > >  `------.------'
> > >         | <-------- init, conf, Rx/Tx
> > >         v
> > >   .-----------.
> > >   | fail-safe |
> > >   `-----.-----'
> > >         |
> > >     .---'----. <--- init, conf, dev check, Rx/Tx
> > >     |        |
> > >     v        v
> > > .-------. .------.
> > > | ixgbe | | mlx4 |
> > > `-------' `------'
> > >
> > > Typical automatic hot-plug handling with device fail-over.
> > >
> > > [...]
> > 
> > Yes, I think we all understand the purpose of your PMD - its there to provide a
> > placeholder device that can respond to application requests in a sane manner,
> > until such time as real hardware is put in its place via a hot plug/failure.  Thats all
> > well and good, I'm saying there are better ways to go about this that can
> > provide the same functionality without having to add an extra 4k lines of code
> > to the project, many of which already exist.
> > 
> 
> Ah, yes, I didn't want to imply that the purpose of this PMD wasn't
> understood already by many. These figures are there to illustrate some
> use cases that some users could recognize, and serve as a support for
> the point made afterward.
> 
> The main thing that can be taken from these is the division along the
> link-level and device-level checking that is done. This explains most of
> my position. The nature of those checks imply different kind of code,
> most of which is thus actually not duplicated / would require pretty
> much the same amount of code to be implemented either as libraries or as
> part of the bonding PMD. This is the crux of my argument, which I expand
> upon below.
> 
> > > 1. LSC vs. RMV
> > >
> > >  A link status change is a valid state for a device. It calls for
> > >  specific responses, e.g. a link switch in a bonding device, without
> > >  losing the general configuration of the port.
> > >
> > >  The removal of a device calls for more than pausing operations and
> > >  switching an active device. The party responsible for initializing the
> > >  device should take care of closing it properly. If this party also
> > >  wants to be able to restore the device if it was plugged back in, it
> > >  would need be able to initialize it back and reconfigure its previous
> > >  state.
> > >
> > >  As we can see that in [Fig. 1], this responsibility lies upon the
> > >  application.
> > >
> > Again, yes, I think we all see the benefit of centralizing hot plug operations,
> > no one is disagreing with that, its the code/functional duplication that is
> > concerning.
> > 
> 
> Certainly, I will try to explain why the code is not actually duplicated
> / why the functions are actually only superficially overlapping.
> 
> > > 2. Bonding and link availability
> > >
> > >  The hot-plug functionality is not a core function of the bonding PMD.
> > >  It is only interested in knowing if the link is active or not.
> > >
> > Currently, yes.  The suggestion was that you augment the bonding driver so that
> > hot plug is a core function of bonding.
> > 
> > >  Adding the device persistence to the bonding PMD would mean adding the
> > >  ability to flexibly parse device definitions to cope with plug-ins in
> > >  evolving busses (PCI hot-plug could mean changing bus addresses), being
> > >  able to emulate the EAL and the ether layer and to properly store the
> > >  device configuration.  This means formally describing the life of a
> > >  device in a DPDK application from start to finish.
> > >
> > Which seems to me to be exactly what your PMD does.  I don't see why its
> > fundamentally harder to do that in an existing pmd, than it is in a new one.
> > 
> 
> Indeed it does. I must emphasize the "formally describe the life of a
> device". The hot-plug functionality goes beyong the link-level check.
> The description of a device from a DPDK standpoint is complete in the
> fail-safe PMD. The state-machine must be able to describe the entire
> life of a device, from the devargs parsing to its start-up.
> 
> We cannot reuse the existing bonding PMD architecture for this.  We
> would have to rewrite the bonding PMD from the ground up for the
> hot-plug function. Because it is actually a different approach to
> managing the slaves.
> 
> This is what I wanted to illustrate in [Fig. 1] and [Fig. 2]:
> 
> - In the bonding, the init and configuration steps are still the
>  responsibility of the application and no one else. The bonding PMD
>  captures the device, re-applies its configuration upon dev_configure()
>  which is actually re-applying part of the configuration already  present
> within the slave eth_dev (cf rte_eth_dev_config_restore).
> 
> - In the fail-safe, the init and configuration are both the
>  responsibilities of the fail-safe PMD itself, not the application
>  anymore. This handling of these responsibilities in lieu of the
>  application is the whole point of the "deferred hot-plug" support, of
>  proposing a simple implementation to the user.
> 
> This change in responsibilities is the bulk of the fail-safe code. It
> would have to be added as-is to the bonding. Verifying the correctness
> of the sync of the initialization phase (acceptable states of a device
> following several events registered by the fail-safe PMD) and the
> configuration items between the state the application believes it is in
> and the fail-safe knows it is in, is the bulk of the fail-safe code.
> 
> This function is not overlapping with that of the bonding. The reason I
> did not add this whole architecture to the bonding is that when I tried
> to do so, I found that I only had two possibilities:
> 
> - The current slave handling path is kept, and we only add a new one
>  with additional functionalities: full init and conf handling with
>  extended parsing capabilities.
> 
> - The current slave handling is scraped and replaced entirely by the new
>  slave management. The old capturing of existing device is not done
>  anymore.
> 
> The first solution is not acceptable, because we effectively end-up with
> a maintenance nightmare by having to validate two types of slaves with
> differing capabilities, differing initialization paths and differing
> configuration code.  This is extremely awkward and architecturally
> unsound. This is essentially the same as having the exact code of the
> fail-safe as an aside in the bonding, maintening exactly the same
> breadth of code while having muddier interfaces and organization.
> 
> The second solution is not acceptable, because we are bending the whole
> existing bonding API to our whim. We could just as well simply rename
> the fail-safe PMD as bonding, add a few grouping capabilities and call
> it a day. This is not acceptable for users.
> 
If the first solution is indeed not an option, why do you think this
second one would be unacceptable for users? If the functionality remains
the same, I don't see how it matters much for users which driver
provides it or where the code originates.

Despite all the discussion, it still just doesn't make sense to me to
have more than one DPDK driver to handle failover - be it link or
device. If nothing else, it's going to be awkward to explain to users
that if they want fail-over for when a link goes down they have to use
driver A, but if they want fail-over when a NIC gets hotplugged they use
driver B, and if they want both kinds of failover - which would surely
be the expected case - they need to use both drivers. The usability is
a problem here.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-15  3:28             ` Bruce Richardson
@ 2017-03-15 11:15               ` Thomas Monjalon
  2017-03-15 14:25                 ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Monjalon @ 2017-03-15 11:15 UTC (permalink / raw)
  To: Bruce Richardson, Gaëtan Rivet
  Cc: Neil Horman, dev, Adrien Mazarguil, techboard

2017-03-15 03:28, Bruce Richardson:
> On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > > 2. Bonding and link availability
> > > >
> > > >  The hot-plug functionality is not a core function of the bonding PMD.
> > > >  It is only interested in knowing if the link is active or not.
> > > >
> > > Currently, yes.  The suggestion was that you augment the bonding driver so that
> > > hot plug is a core function of bonding.
> > > 
> > > >  Adding the device persistence to the bonding PMD would mean adding the
> > > >  ability to flexibly parse device definitions to cope with plug-ins in
> > > >  evolving busses (PCI hot-plug could mean changing bus addresses), being
> > > >  able to emulate the EAL and the ether layer and to properly store the
> > > >  device configuration.  This means formally describing the life of a
> > > >  device in a DPDK application from start to finish.
> > > >
> > > Which seems to me to be exactly what your PMD does.  I don't see why its
> > > fundamentally harder to do that in an existing pmd, than it is in a new one.
> > > 
> > 
> > Indeed it does. I must emphasize the "formally describe the life of a
> > device". The hot-plug functionality goes beyong the link-level check.
> > The description of a device from a DPDK standpoint is complete in the
> > fail-safe PMD. The state-machine must be able to describe the entire
> > life of a device, from the devargs parsing to its start-up.
> > 
> > We cannot reuse the existing bonding PMD architecture for this.  We
> > would have to rewrite the bonding PMD from the ground up for the
> > hot-plug function. Because it is actually a different approach to
> > managing the slaves.
> > 
> > This is what I wanted to illustrate in [Fig. 1] and [Fig. 2]:
> > 
> > - In the bonding, the init and configuration steps are still the
> >  responsibility of the application and no one else. The bonding PMD
> >  captures the device, re-applies its configuration upon dev_configure()
> >  which is actually re-applying part of the configuration already  present
> > within the slave eth_dev (cf rte_eth_dev_config_restore).
> > 
> > - In the fail-safe, the init and configuration are both the
> >  responsibilities of the fail-safe PMD itself, not the application
> >  anymore. This handling of these responsibilities in lieu of the
> >  application is the whole point of the "deferred hot-plug" support, of
> >  proposing a simple implementation to the user.
> > 
> > This change in responsibilities is the bulk of the fail-safe code. It
> > would have to be added as-is to the bonding. Verifying the correctness
> > of the sync of the initialization phase (acceptable states of a device
> > following several events registered by the fail-safe PMD) and the
> > configuration items between the state the application believes it is in
> > and the fail-safe knows it is in, is the bulk of the fail-safe code.
> > 
> > This function is not overlapping with that of the bonding. The reason I
> > did not add this whole architecture to the bonding is that when I tried
> > to do so, I found that I only had two possibilities:
> > 
> > - The current slave handling path is kept, and we only add a new one
> >  with additional functionalities: full init and conf handling with
> >  extended parsing capabilities.
> > 
> > - The current slave handling is scraped and replaced entirely by the new
> >  slave management. The old capturing of existing device is not done
> >  anymore.
> > 
> > The first solution is not acceptable, because we effectively end-up with
> > a maintenance nightmare by having to validate two types of slaves with
> > differing capabilities, differing initialization paths and differing
> > configuration code.  This is extremely awkward and architecturally
> > unsound. This is essentially the same as having the exact code of the
> > fail-safe as an aside in the bonding, maintening exactly the same
> > breadth of code while having muddier interfaces and organization.
> > 
> > The second solution is not acceptable, because we are bending the whole
> > existing bonding API to our whim. We could just as well simply rename
> > the fail-safe PMD as bonding, add a few grouping capabilities and call
> > it a day. This is not acceptable for users.
> > 
> If the first solution is indeed not an option, why do you think this
> second one would be unacceptable for users? If the functionality remains
> the same, I don't see how it matters much for users which driver
> provides it or where the code originates.
> 
> Despite all the discussion, it still just doesn't make sense to me to
> have more than one DPDK driver to handle failover - be it link or
> device. If nothing else, it's going to be awkward to explain to users
> that if they want fail-over for when a link goes down they have to use
> driver A, but if they want fail-over when a NIC gets hotplugged they use
> driver B, and if they want both kinds of failover - which would surely
> be the expected case - they need to use both drivers. The usability is
> a problem here.

It seems everybody agrees on the need for the failsafe code.
We are just discussing the right place to implement it.

Gaetan, moving this code in the bonding PMD means replacing the bonding
API design by the failsafe design, right?
With the failsafe design in the bonding PMD, is it possible to keep other
bonding features?

In case we do not have a consensus in the following days, I suggest adding
this topic to the next techboard meeting agenda.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-15 11:15               ` Thomas Monjalon
@ 2017-03-15 14:25                 ` Gaëtan Rivet
  2017-03-16 20:50                   ` Neil Horman
  0 siblings, 1 reply; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-15 14:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Bruce Richardson, Neil Horman, dev, Adrien Mazarguil, techboard

On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
>2017-03-15 03:28, Bruce Richardson:
>> On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
>> > - In the bonding, the init and configuration steps are still the
>> >  responsibility of the application and no one else. The bonding PMD
>> >  captures the device, re-applies its configuration upon dev_configure()
>> >  which is actually re-applying part of the configuration already  present
>> > within the slave eth_dev (cf rte_eth_dev_config_restore).
>> >
>> > - In the fail-safe, the init and configuration are both the
>> >  responsibilities of the fail-safe PMD itself, not the application
>> >  anymore. This handling of these responsibilities in lieu of the
>> >  application is the whole point of the "deferred hot-plug" support, of
>> >  proposing a simple implementation to the user.
>> >
>> > This change in responsibilities is the bulk of the fail-safe code. It
>> > would have to be added as-is to the bonding. Verifying the correctness
>> > of the sync of the initialization phase (acceptable states of a device
>> > following several events registered by the fail-safe PMD) and the
>> > configuration items between the state the application believes it is in
>> > and the fail-safe knows it is in, is the bulk of the fail-safe code.
>> >
>> > This function is not overlapping with that of the bonding. The reason I
>> > did not add this whole architecture to the bonding is that when I tried
>> > to do so, I found that I only had two possibilities:
>> >
>> > - The current slave handling path is kept, and we only add a new one
>> >  with additional functionalities: full init and conf handling with
>> >  extended parsing capabilities.
>> >
>> > - The current slave handling is scraped and replaced entirely by the new
>> >  slave management. The old capturing of existing device is not done
>> >  anymore.
>> >
>> > The first solution is not acceptable, because we effectively end-up with
>> > a maintenance nightmare by having to validate two types of slaves with
>> > differing capabilities, differing initialization paths and differing
>> > configuration code.  This is extremely awkward and architecturally
>> > unsound. This is essentially the same as having the exact code of the
>> > fail-safe as an aside in the bonding, maintening exactly the same
>> > breadth of code while having muddier interfaces and organization.
>> >
>> > The second solution is not acceptable, because we are bending the whole
>> > existing bonding API to our whim. We could just as well simply rename
>> > the fail-safe PMD as bonding, add a few grouping capabilities and call
>> > it a day. This is not acceptable for users.
>> >
>> If the first solution is indeed not an option, why do you think this
>> second one would be unacceptable for users? If the functionality remains
>> the same, I don't see how it matters much for users which driver
>> provides it or where the code originates.
>>

The problem with the second solution is also that bonding is not only a 
PMD. It exposes its own public API that existing applications rely on, 
see rte_eth_bond_*() definitions in rte_eth_bond.h.

Although bonding instances can be set up through command-line options,
target "users" are mainly applications explicitly written to use it.
This must be preserved, if for no other reason than that it has not been
deprecated.

Also, trying to implement this API for the device failover function
would imply a device capture down to the devargs parsing level. This
means that a PMD could request taking over a device, messing with the
internals of the EAL: the devargs list and the buses' lists of devices. This
seems unacceptable.

The bonding API is thus in conflict with the concept of a device 
failover in the context of the current DPDK arch.

>> Despite all the discussion, it still just doesn't make sense to me to
>> have more than one DPDK driver to handle failover - be it link or
>> device. If nothing else, it's going to be awkward to explain to users
>> that if they want fail-over for when a link goes down they have to use
>> driver A, but if they want fail-over when a NIC gets hotplugged they use
>> driver B, and if they want both kinds of failover - which would surely
>> be the expected case - they need to use both drivers. The usability is
>> a problem here.

Having both kinds of failover in the same PMD will always lead to the
first solution in some form or another.

I am sure we can document all this in a way that does not confuse
users, with the help of community feedback such as yours.

Perhaps "net_failsafe" is a misnomer? We also thought about 
"net_persistent" or "net_hotplug". Any other ideas?

It is also possible for me to remove the failover support from this 
series, only providing deferred hot-plug handling at first. I could then 
send the failover support as separate patches to better assert that it 
is a useful, secondary feature that is essentially free to implement.

>
>It seems everybody agrees on the need for the failsafe code.
>We are just discussing the right place to implement it.
>
>Gaetan, moving this code in the bonding PMD means replacing the bonding
>API design by the failsafe design, right?
>With the failsafe design in the bonding PMD, is it possible to keep other
>bonding features?

As seen previously, the bonding API is incompatible with device 
failover.

Having some features enabled solely for one kind of failover, while
having specific code paths for both, seems unnecessarily complicated to
me; this follows from my previous points about the first solution.

>
>In case we do not have a consensus in the following days, I suggest to add
>this topic in the next techboard meeting agenda.

Regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-15 14:25                 ` Gaëtan Rivet
@ 2017-03-16 20:50                   ` Neil Horman
  2017-03-17 10:56                     ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Neil Horman @ 2017-03-16 20:50 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Thomas Monjalon, Bruce Richardson, dev, Adrien Mazarguil, techboard

On Wed, Mar 15, 2017 at 03:25:37PM +0100, Gaëtan Rivet wrote:
> On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
> > 2017-03-15 03:28, Bruce Richardson:
> > > On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > > - In the bonding, the init and configuration steps are still the
> > > >  responsibility of the application and no one else. The bonding PMD
> > > >  captures the device, re-applies its configuration upon dev_configure()
> > > >  which is actually re-applying part of the configuration already  present
> > > > within the slave eth_dev (cf rte_eth_dev_config_restore).
> > > >
> > > > - In the fail-safe, the init and configuration are both the
> > > >  responsibilities of the fail-safe PMD itself, not the application
> > > >  anymore. This handling of these responsibilities in lieu of the
> > > >  application is the whole point of the "deferred hot-plug" support, of
> > > >  proposing a simple implementation to the user.
> > > >
> > > > This change in responsibilities is the bulk of the fail-safe code. It
> > > > would have to be added as-is to the bonding. Verifying the correctness
> > > > of the sync of the initialization phase (acceptable states of a device
> > > > following several events registered by the fail-safe PMD) and the
> > > > configuration items between the state the application believes it is in
> > > > and the fail-safe knows it is in, is the bulk of the fail-safe code.
> > > >
> > > > This function is not overlapping with that of the bonding. The reason I
> > > > did not add this whole architecture to the bonding is that when I tried
> > > > to do so, I found that I only had two possibilities:
> > > >
> > > > - The current slave handling path is kept, and we only add a new one
> > > >  with additional functionalities: full init and conf handling with
> > > >  extended parsing capabilities.
> > > >
> > > > - The current slave handling is scraped and replaced entirely by the new
> > > >  slave management. The old capturing of existing device is not done
> > > >  anymore.
> > > >
> > > > The first solution is not acceptable, because we effectively end-up with
> > > > a maintenance nightmare by having to validate two types of slaves with
> > > > differing capabilities, differing initialization paths and differing
> > > > configuration code.  This is extremely awkward and architecturally
> > > > unsound. This is essentially the same as having the exact code of the
> > > > fail-safe as an aside in the bonding, maintening exactly the same
> > > > breadth of code while having muddier interfaces and organization.
> > > >
> > > > The second solution is not acceptable, because we are bending the whole
> > > > existing bonding API to our whim. We could just as well simply rename
> > > > the fail-safe PMD as bonding, add a few grouping capabilities and call
> > > > it a day. This is not acceptable for users.
> > > >
> > > If the first solution is indeed not an option, why do you think this
> > > second one would be unacceptable for users? If the functionality remains
> > > the same, I don't see how it matters much for users which driver
> > > provides it or where the code originates.
> > > 
> 
> The problem with the second solution is also that bonding is not only a PMD.
> It exposes its own public API that existing applications rely on, see
> rte_eth_bond_*() definitions in rte_eth_bond.h.
> 
> Although bonding instances can be set up through command-line options,
> target "users" are mainly applications explicitly written to use it.
> This must be preserved for no other reason that it hasn't been deprecated.
> 
I fail to see how either of your points is relevant.  The fact that the bonding
PMD exposes an API to the application has no bearing on its ability to implement
a hot-plug function.

> Also, trying to implement this API for the device failover function would
> implies a device capture down to the devargs parsing level. This means that
> a PMD could request taking over a device, messing with the internals of the
> EAL: devargs list and busses lists of devices. This seems unacceptable.
> 
Why?  You just said yourself above that, while there is a devargs interface to
the bonding driver, there is also an API, which is the more commonly used method
to configure bonding.  I'm not sure I agree with that, but I think it's beside
the point.  Your PMD also requires configuration, and it appears necessary that
you do so from the command line (you need to specifically enumerate the
subdevices that you intend to provide failsafe behavior to).  I see no reason
why such a feature can't be added to bonding, with the null PMD used as a
stand-in device should the enumerated device not yet exist.

To your argument about taking over a device, I don't see how you find
that unacceptable, as it is precisely what the bonding driver does today, in the
sense that it allows an application to assign a master/slave relationship to
devices right now.  I see no reason that we can't convey the right and ability
for bonding to do that dynamically based on configuration.

> The bonding API is thus in conflict with the concept of a device failover in
> the context of the current DPDK arch.
> 
I really don't see how you get to this from your argument above.

> > > Despite all the discussion, it still just doesn't make sense to me to
> > > have more than one DPDK driver to handle failover - be it link or
> > > device. If nothing else, it's going to be awkward to explain to users
> > > that if they want fail-over for when a link goes down they have to use
> > > driver A, but if they want fail-over when a NIC gets hotplugged they use
> > > driver B, and if they want both kinds of failover - which would surely
> > > be the expected case - they need to use both drivers. The usability is
> > > a problem here.
> 
> Having both kind of failovers in the same PMD will always lead to the first
> solution in some form or another.
> 
It really isn't, because you can model hotplug behavior as a trivial form of the
failover that bonding does now (i.e. failover between a null device and a
preferred real device).

> I am sure we can document all this in a way that does no cause users
> confusion, with the help of community feedback such as yours.
> 
> Perhaps "net_failsafe" is a misnomer? We also thought about "net_persistent"
> or "net_hotplug". Any other ideas?
> 
> It is also possible for me to remove the failover support from this series,
> only providing deferred hot-plug handling at first. I could then send the
> failover support as separate patches to better assert that it is a useful,
> secondary feature that is essentially free to implement.
> 
I think that's solving the wrong problem.  I've no issue with the functionality
in this patch; it's really the implementation that we are all arguing against.

> > 
> > It seems everybody agrees on the need for the failsafe code.
> > We are just discussing the right place to implement it.
> > 
> > Gaetan, moving this code in the bonding PMD means replacing the bonding
> > API design by the failsafe design, right?
> > With the failsafe design in the bonding PMD, is it possible to keep other
> > bonding features?
> 
> As seen previously, the bonding API is incompatible with device failover.
> 
It's not been seen previously; you asserted it to be so, and I certainly disagree
with that assertion.  I think others might too.

Additionally, it's not really in line with this discussion, but looking at
your hotplug detection code, I think it is somewhat lacking.  Currently you seem
to implement this with a timer that wakes up and checks for device existence,
which is pretty substandard in my mind.  That's going to waste CPU cycles and
might lead to packet loss.  I'd really prefer to see you augment the EAL library
with event handling code (it can tie into udev on Linux and kqueue on BSD) and
create a generic event hook that we can use to detect device adds/removes
without having to wake up constantly to see if anything has changed.
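
For illustration only, here is a minimal sketch of what such a hook could
look like.  Nothing like this exists in the EAL today; every name below is
hypothetical:

/* Hypothetical EAL device event hook -- a sketch, not an existing API. */
enum rte_dev_event_type {
    RTE_DEV_EVENT_ADD,      /* a device appeared on a bus */
    RTE_DEV_EVENT_REMOVE,   /* a device disappeared */
};

typedef void (*rte_dev_event_cb_t)(const char *device_name,
                                   enum rte_dev_event_type type,
                                   void *cb_arg);

/*
 * Register a callback fired from an EAL event thread that listens on udev
 * (Linux) or kqueue (BSD), instead of polling a timer for device existence.
 */
int rte_dev_event_callback_register(const char *device_name,
                                    rte_dev_event_cb_t cb, void *cb_arg);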


> Having some features enabled solely for one kind of failover, while having
> specific code paths for both, seems unecessarily complicated to me ;
> following suite with my previous points about the first solution.
> 
> > 
> > In case we do not have a consensus in the following days, I suggest to add
> > this topic in the next techboard meeting agenda.
> 
> Regards,
> -- 
> Gaëtan Rivet
> 6WIND
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-16 20:50                   ` Neil Horman
@ 2017-03-17 10:56                     ` Gaëtan Rivet
  2017-03-18 19:51                       ` Neil Horman
  0 siblings, 1 reply; 49+ messages in thread
From: Gaëtan Rivet @ 2017-03-17 10:56 UTC (permalink / raw)
  To: Neil Horman
  Cc: Thomas Monjalon, Bruce Richardson, dev, Adrien Mazarguil, techboard

On Thu, Mar 16, 2017 at 04:50:43PM -0400, Neil Horman wrote:
>On Wed, Mar 15, 2017 at 03:25:37PM +0100, Gaëtan Rivet wrote:
>> On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
>> > 2017-03-15 03:28, Bruce Richardson:
>> > > On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
>> > > > - In the bonding, the init and configuration steps are still the
>> > > >  responsibility of the application and no one else. The bonding PMD
>> > > >  captures the device, re-applies its configuration upon dev_configure()
>> > > >  which is actually re-applying part of the configuration already  present
>> > > > within the slave eth_dev (cf rte_eth_dev_config_restore).
>> > > >
>> > > > - In the fail-safe, the init and configuration are both the
>> > > >  responsibilities of the fail-safe PMD itself, not the application
>> > > >  anymore. This handling of these responsibilities in lieu of the
>> > > >  application is the whole point of the "deferred hot-plug" support, of
>> > > >  proposing a simple implementation to the user.
>> > > >
>> > > > This change in responsibilities is the bulk of the fail-safe code. It
>> > > > would have to be added as-is to the bonding. Verifying the correctness
>> > > > of the sync of the initialization phase (acceptable states of a device
>> > > > following several events registered by the fail-safe PMD) and the
>> > > > configuration items between the state the application believes it is in
>> > > > and the fail-safe knows it is in, is the bulk of the fail-safe code.
>> > > >
>> > > > This function is not overlapping with that of the bonding. The reason I
>> > > > did not add this whole architecture to the bonding is that when I tried
>> > > > to do so, I found that I only had two possibilities:
>> > > >
>> > > > - The current slave handling path is kept, and we only add a new one
>> > > >  with additional functionalities: full init and conf handling with
>> > > >  extended parsing capabilities.
>> > > >
>> > > > - The current slave handling is scraped and replaced entirely by the new
>> > > >  slave management. The old capturing of existing device is not done
>> > > >  anymore.
>> > > >
>> > > > The first solution is not acceptable, because we effectively end-up with
>> > > > a maintenance nightmare by having to validate two types of slaves with
>> > > > differing capabilities, differing initialization paths and differing
>> > > > configuration code.  This is extremely awkward and architecturally
>> > > > unsound. This is essentially the same as having the exact code of the
>> > > > fail-safe as an aside in the bonding, maintening exactly the same
>> > > > breadth of code while having muddier interfaces and organization.
>> > > >
>> > > > The second solution is not acceptable, because we are bending the whole
>> > > > existing bonding API to our whim. We could just as well simply rename
>> > > > the fail-safe PMD as bonding, add a few grouping capabilities and call
>> > > > it a day. This is not acceptable for users.
>> > > >
>> > > If the first solution is indeed not an option, why do you think this
>> > > second one would be unacceptable for users? If the functionality remains
>> > > the same, I don't see how it matters much for users which driver
>> > > provides it or where the code originates.
>> > >
>>
>> The problem with the second solution is also that bonding is not only a PMD.
>> It exposes its own public API that existing applications rely on, see
>> rte_eth_bond_*() definitions in rte_eth_bond.h.
>>
>> Although bonding instances can be set up through command-line options,
>> target "users" are mainly applications explicitly written to use it.
>> This must be preserved for no other reason that it hasn't been deprecated.
>>
>I fail to see how either of your points are relevant.  The fact that the bonding
>pmd exposes an api to the application has no bearing on its ability to implement
>a hot plug function.
>

This depends on the API making sense in the context of the new 
functionality.

This API lets the application add slaves to a bonded device, remove them
and configure them. In the fail-safe architecture, it is not possible to
add and remove slaves from the grouping: doing so would mean adding and
removing devices from internal EAL structures.
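
To make the mismatch concrete, here is roughly how an application drives the
bonding control API today (a sketch based on rte_eth_bond.h as of this era;
exact prototypes may differ slightly):

#include <stdint.h>
#include <rte_eth_bond.h>

/* Slaves are designated by ethdev port id: they must already exist as
 * probed ethdev ports before they can be grouped. */
static int
group_existing_ports(uint8_t slave_a, uint8_t slave_b)
{
    int bond;

    bond = rte_eth_bond_create("net_bonding0",
                               BONDING_MODE_ACTIVE_BACKUP,
                               0 /* socket id */);
    if (bond < 0)
        return bond;
    if (rte_eth_bond_slave_add((uint8_t)bond, slave_a) < 0 ||
        rte_eth_bond_slave_add((uint8_t)bond, slave_b) < 0)
        return -1;
    return bond;
}

A fail-safe slave, by contrast, is only a devargs definition until the device
actually shows up, so there is no port id the application could pass to such
calls.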

It is also invalid to try to configure a fail-safe slave. An application 
only configures a fail-safe device, which will in turn configure its 
slaves. This separation follows from the nature of a device failover.

As seen previously, the fail-safe PMD handles different responsibilities 
from the bonding PMD. It is thus necessary to make different assumptions 
concerning what it can and cannot do with a slave.

>> Also, trying to implement this API for the device failover function would
>> implies a device capture down to the devargs parsing level. This means that
>> a PMD could request taking over a device, messing with the internals of the
>> EAL: devargs list and busses lists of devices. This seems unacceptable.
>>
>Why?  You just said yourself above that, while there is a devargs interface to
>the bonding driver, there is also an api, which is the more used method to
>configure bonding.  I'm not sure I agree with that, but I think its beside the
>point.  Your PMD also requires configuration, and it appears necessecary that
>you do so from the command line (you need to specifically ennumerate the
>subdevices that you intend to provide failsafe behavior to).  I see no reason
>why such a feature cant' be added to bonding, and the null pmd used as a
>standin device, should the ennumerated device not yet exist).
>
>To your argument regarding about taking over a device, I don't see how you find
>that unacceptable, as it is precisely what the bonding driver does today, in the
>sense that it allows an application to assign a master/slave relationship to
>devices right now.  I see no reason that we can't convey the right and ability
>for bonding to do that dynamically based on configuration.
>

No, the bonding PMD does not take over a device. It only cares about the 
ether layer for its link failover. It does not care about parsing 
parameters of a slave, probing devices, detaching drivers. It does not 
remove a device from the pci_device_list in the EAL for example.

Doing so would imply exposing private internal structures from the EAL,
messing with elements reserved during EAL init. It requires controlling
the device initialization order to create the over-arching devices last
(which is a hacky solution). It would wreak havoc with the DPDK
architecture.

The fail-safe PMD does not rely on the EAL for handling its slaves. This 
is what I explained before, when I touched upon the differing 
responsibilities implied by the differences in nature between a link 
failover and a device failover.

>> The bonding API is thus in conflict with the concept of a device failover in
>> the context of the current DPDK arch.
>>
>I really don't see how you get to this from your argument above.
>

The current DPDK architecture does not expose EAL elements to be modified
by PMDs, and with good reason. In this context, it is not possible to
handle slaves correctly for a device failover in the bonding PMD,
because the bonding PMD from the get-go expects the EAL to handle its
slaves at the device level.

>> > > Despite all the discussion, it still just doesn't make sense to me to
>> > > have more than one DPDK driver to handle failover - be it link or
>> > > device. If nothing else, it's going to be awkward to explain to users
>> > > that if they want fail-over for when a link goes down they have to use
>> > > driver A, but if they want fail-over when a NIC gets hotplugged they use
>> > > driver B, and if they want both kinds of failover - which would surely
>> > > be the expected case - they need to use both drivers. The usability is
>> > > a problem here.
>>
>> Having both kind of failovers in the same PMD will always lead to the first
>> solution in some form or another.
>>
>It really isn't because you can model hotplug behavior as a trival form of the
>failover that bonding does now (i.e. failover between a null device and a
>preferred real device).
>

The preferred real device still has to be created / destroyed. It still
relies on EAL entry points for handling. It still puts additional
responsibilities on a PMD. Those responsibilities are expressed in
sub-layers clearly defined in the fail-safe PMD. You would have to create
these sub-layers in some form in the bonding for it to be able to create
a preferred real device at some point. This additional way of handling
slaves has already been discussed as forcing a messy architecture onto
the bonding PMD.
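
As a rough illustration of that sub-layer (hypothetical names only, not the
actual code of this series): the fail-safe periodically tries to bring up
each missing sub-device and to replay the configuration the application gave
to the fail-safe port, rescheduling itself whenever a step fails.

#include <stdint.h>
#include <rte_ethdev.h>

enum sub_state { SUB_DEFINED, SUB_PROBED, SUB_ACTIVE };

struct sub_device {
    char devargs[128];     /* textual definition; device may not exist yet */
    enum sub_state state;
    uint8_t port_id;       /* valid once probed */
};

/* Placeholder for the bus-level parse/probe step; hypothetical. */
static int
fs_bus_probe(struct sub_device *sub)
{
    (void)sub;
    return -1;             /* pretend the device is still absent */
}

/* Called from a periodic alarm: bring a sub-device to ACTIVE, or give up
 * and retry on the next tick if any step fails. */
static int
fs_try_bringup(struct sub_device *sub, const struct rte_eth_conf *app_conf,
               uint16_t nb_rxq, uint16_t nb_txq)
{
    if (sub->state < SUB_PROBED) {
        if (fs_bus_probe(sub) < 0)
            return -1;     /* not plugged in yet */
        sub->state = SUB_PROBED;
    }
    if (rte_eth_dev_configure(sub->port_id, nb_rxq, nb_txq, app_conf) < 0)
        return -1;         /* failed: full re-init on the next try */
    /* ... replay queue setup, MAC/VLAN/flow state, then start ... */
    sub->state = SUB_ACTIVE;
    return 0;
}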

>> I am sure we can document all this in a way that does no cause users
>> confusion, with the help of community feedback such as yours.
>>
>> Perhaps "net_failsafe" is a misnomer? We also thought about "net_persistent"
>> or "net_hotplug". Any other ideas?
>>
>> It is also possible for me to remove the failover support from this series,
>> only providing deferred hot-plug handling at first. I could then send the
>> failover support as separate patches to better assert that it is a useful,
>> secondary feature that is essentially free to implement.
>>
>I think thats solving the wrong problem.  I've no issue with the functionality
>in this patch, its really the implementation that we are all arguing against.
>
>> >
>> > It seems everybody agrees on the need for the failsafe code.
>> > We are just discussing the right place to implement it.
>> >
>> > Gaetan, moving this code in the bonding PMD means replacing the bonding
>> > API design by the failsafe design, right?
>> > With the failsafe design in the bonding PMD, is it possible to keep other
>> > bonding features?
>>
>> As seen previously, the bonding API is incompatible with device failover.
>>
>Its not been seen previously, you asserted it to be so, and I certainly disagree
>with that assertion.  I think others might too.
>

I also explained my assertion at length. I can certainly expand further
if necessary, but you need to point out the elements you disagree with.

>Additionally, its not really in line with this discussion, but in looking at
>your hotplug detection code, I think somewhat lacking.  Currently you seem to
>implement this with a timer that wakes up and checks for device existance, which
>is pretty substandard in my mind.  Thats going to waste cpu cycles that might
>lead to packet loss.  I'd really prefer to see you augment the eal library with
>an event handling code (it can tie into udev in linux and kqueue in bsd), and
>create a generic event hook, that we can use to detect device adds/removes
>without having to wake up constantly to see if anything has changed.
>
>

I think it's fine. We can discuss it further once we agree on the form 
the hot-plug implementation will take in the DPDK.

>> Having some features enabled solely for one kind of failover, while 
>> having
>> specific code paths for both, seems unecessarily complicated to me ;
>> following suite with my previous points about the first solution.
>>
>> >
>> > In case we do not have a consensus in the following days, I suggest to add
>> > this topic in the next techboard meeting agenda.

Best regards,
-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-17 10:56                     ` Gaëtan Rivet
@ 2017-03-18 19:51                       ` Neil Horman
  0 siblings, 0 replies; 49+ messages in thread
From: Neil Horman @ 2017-03-18 19:51 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Thomas Monjalon, Bruce Richardson, dev, Adrien Mazarguil, techboard

On Fri, Mar 17, 2017 at 11:56:21AM +0100, Gaëtan Rivet wrote:
> On Thu, Mar 16, 2017 at 04:50:43PM -0400, Neil Horman wrote:
> > On Wed, Mar 15, 2017 at 03:25:37PM +0100, Gaëtan Rivet wrote:
> > > On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
> > > > 2017-03-15 03:28, Bruce Richardson:
> > > > > On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > > > > - In the bonding, the init and configuration steps are still the
> > > > > >  responsibility of the application and no one else. The bonding PMD
> > > > > >  captures the device, re-applies its configuration upon dev_configure()
> > > > > >  which is actually re-applying part of the configuration already  present
> > > > > > within the slave eth_dev (cf rte_eth_dev_config_restore).
> > > > > >
> > > > > > - In the fail-safe, the init and configuration are both the
> > > > > >  responsibilities of the fail-safe PMD itself, not the application
> > > > > >  anymore. This handling of these responsibilities in lieu of the
> > > > > >  application is the whole point of the "deferred hot-plug" support, of
> > > > > >  proposing a simple implementation to the user.
> > > > > >
> > > > > > This change in responsibilities is the bulk of the fail-safe code. It
> > > > > > would have to be added as-is to the bonding. Verifying the correctness
> > > > > > of the sync of the initialization phase (acceptable states of a device
> > > > > > following several events registered by the fail-safe PMD) and the
> > > > > > configuration items between the state the application believes it is in
> > > > > > and the fail-safe knows it is in, is the bulk of the fail-safe code.
> > > > > >
> > > > > > This function is not overlapping with that of the bonding. The reason I
> > > > > > did not add this whole architecture to the bonding is that when I tried
> > > > > > to do so, I found that I only had two possibilities:
> > > > > >
> > > > > > - The current slave handling path is kept, and we only add a new one
> > > > > >  with additional functionalities: full init and conf handling with
> > > > > >  extended parsing capabilities.
> > > > > >
> > > > > > - The current slave handling is scraped and replaced entirely by the new
> > > > > >  slave management. The old capturing of existing device is not done
> > > > > >  anymore.
> > > > > >
> > > > > > The first solution is not acceptable, because we effectively end-up with
> > > > > > a maintenance nightmare by having to validate two types of slaves with
> > > > > > differing capabilities, differing initialization paths and differing
> > > > > > configuration code.  This is extremely awkward and architecturally
> > > > > > unsound. This is essentially the same as having the exact code of the
> > > > > > fail-safe as an aside in the bonding, maintening exactly the same
> > > > > > breadth of code while having muddier interfaces and organization.
> > > > > >
> > > > > > The second solution is not acceptable, because we are bending the whole
> > > > > > existing bonding API to our whim. We could just as well simply rename
> > > > > > the fail-safe PMD as bonding, add a few grouping capabilities and call
> > > > > > it a day. This is not acceptable for users.
> > > > > >
> > > > > If the first solution is indeed not an option, why do you think this
> > > > > second one would be unacceptable for users? If the functionality remains
> > > > > the same, I don't see how it matters much for users which driver
> > > > > provides it or where the code originates.
> > > > >
> > > 
> > > The problem with the second solution is also that bonding is not only a PMD.
> > > It exposes its own public API that existing applications rely on, see
> > > rte_eth_bond_*() definitions in rte_eth_bond.h.
> > > 
> > > Although bonding instances can be set up through command-line options,
> > > target "users" are mainly applications explicitly written to use it.
> > > This must be preserved for no other reason that it hasn't been deprecated.
> > > 
> > I fail to see how either of your points are relevant.  The fact that the bonding
> > pmd exposes an api to the application has no bearing on its ability to implement
> > a hot plug function.
> > 
> 
> This depends on the API making sense in the context of the new
> functionality.
> 
Well, the API should always make sense in the context of any added
functionality, but it seems to me that's just another way of saying you might
need to make some modifications to the bonding API.  I'm not saying that's
necessarily going to have to be the case, but while I'm a big proponent of ABI
stability, I would support an API change to add valid functionality if it were a
big enough feature.  Though again, I don't think that's really necessary if you
rethink the model a little bit.

> This API offers to add and remove slaves to a bonding and to configure them.
> In the fail-safe arch, it is not possible to add and remove slaves from the
> grouping. Doing so would mean adding and removing devices from internal EAL
> structures.
> 
Ok, so you update the bonding option parser to include a fail-safe mode, which
only supports two slaves, one of which is an instance of the null PMD and the
other is to be added at a later date in response to a hot-plug event.  After
initial option parsing, bonds operating in this mode return an error to the
application when/if it attempts to remove a slave from the bond.  That doesn't
seem so hard to me.

> It is also invalid to try to configure a fail-safe slave. An application
> only configures a fail-safe device, which will in turn configure its slaves.
> This separation follows from the nature of a device failover.
> 
See my previous mail: you augment the null PMD to allow storage of arbitrary
configuration strings or key/value pairs when operating as a bonded slave.  The
bond is then capable of retrieving the configuration from that null instance and
pushing it to the real slave on a hot-plug event.
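
A rough sketch of that idea, purely for illustration; none of these helpers
exist in the null PMD or the bonding PMD today:

#include <stdint.h>
#include <rte_ethdev.h>

/* Configuration parked on the null placeholder while the real device is
 * absent (hypothetical structure). */
struct parked_conf {
    struct rte_eth_conf dev_conf;
    uint16_t nb_rxq;
    uint16_t nb_txq;
};

static struct parked_conf parked; /* a single placeholder, for brevity */

/* The null PMD stand-in records the configuration it receives instead of
 * applying it to hardware. */
static void
null_store_conf(const struct rte_eth_conf *conf, uint16_t rxq, uint16_t txq)
{
    parked.dev_conf = *conf;
    parked.nb_rxq = rxq;
    parked.nb_txq = txq;
}

/* On a hot-plug event, the bond replays the parked configuration on the
 * newly probed port, then swaps it in for the placeholder slave. */
static int
bond_replay_on_hotplug(uint8_t new_port)
{
    if (rte_eth_dev_configure(new_port, parked.nb_rxq, parked.nb_txq,
                              &parked.dev_conf) < 0)
        return -1;
    /* ... queue setup and rte_eth_dev_start(), then replace the null slave
     * with new_port in the bond ... */
    return 0;
}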

> As seen previously, the fail-safe PMD handles different responsibilities
> from the bonding PMD. It is thus necessary to make different assumptions
> concerning what it can and cannot do with a slave.
> 
Only because you seem to refuse to think about the failsafe model in any other
way than the one you have implemented.

> > > Also, trying to implement this API for the device failover function would
> > > implies a device capture down to the devargs parsing level. This means that
> > > a PMD could request taking over a device, messing with the internals of the
> > > EAL: devargs list and busses lists of devices. This seems unacceptable.
> > > 
> > Why?  You just said yourself above that, while there is a devargs interface to
> > the bonding driver, there is also an api, which is the more used method to
> > configure bonding.  I'm not sure I agree with that, but I think its beside the
> > point.  Your PMD also requires configuration, and it appears necessecary that
> > you do so from the command line (you need to specifically ennumerate the
> > subdevices that you intend to provide failsafe behavior to).  I see no reason
> > why such a feature cant' be added to bonding, and the null pmd used as a
> > standin device, should the ennumerated device not yet exist).
> > 
> > To your argument regarding about taking over a device, I don't see how you find
> > that unacceptable, as it is precisely what the bonding driver does today, in the
> > sense that it allows an application to assign a master/slave relationship to
> > devices right now.  I see no reason that we can't convey the right and ability
> > for bonding to do that dynamically based on configuration.
> > 
> 
> No, the bonding PMD does not take over a device. It only cares about the
> ether layer for its link failover. It does not care about parsing parameters
> of a slave, probing devices, detaching drivers. It does not remove a device
> from the pci_device_list in the EAL for example.
> 
Again, please take a moment and think about how else your failsafe model might
be implemented in the context of bonding.  Right now, you are asserting that
your failsafe model can't be implemented in any other way because of the
decisions you have made.  If you change how you think of failsafe, you'll see it
can be done in other ways.

> Doing so would imply exposing private internals structures from the EAL,
> messing with elements reserved while doing the EAL init. This requires
> controlling a priority in the device initialization order to create the
> over-arching ones last (which is a hacky solution). It would wreak havoc
> with the DPDK arch.
> 
> The fail-safe PMD does not rely on the EAL for handling its slaves. This is
> what I explained before, when I touched upon the differing responsibilities
> implied by the differences in nature between a link failover and a device
> failover.
> 
> > > The bonding API is thus in conflict with the concept of a device failover in
> > > the context of the current DPDK arch.
> > > 
> > I really don't see how you get to this from your argument above.
> > 
> 
> The current DPDK arch does not expose EAL elements to be modified by PMDs,
> and with good reasons. In this context, it is not possible to handle slaves
> correctly for a device failover in the bonding PMD,
> because the bonding PMD from the get-go expects the EAL to handle its slaves
> on a device level.
> 
> > > > > Despite all the discussion, it still just doesn't make sense to me to
> > > > > have more than one DPDK driver to handle failover - be it link or
> > > > > device. If nothing else, it's going to be awkward to explain to users
> > > > > that if they want fail-over for when a link goes down they have to use
> > > > > driver A, but if they want fail-over when a NIC gets hotplugged they use
> > > > > driver B, and if they want both kinds of failover - which would surely
> > > > > be the expected case - they need to use both drivers. The usability is
> > > > > a problem here.
> > > 
> > > Having both kind of failovers in the same PMD will always lead to the first
> > > solution in some form or another.
> > > 
> > It really isn't because you can model hotplug behavior as a trival form of the
> > failover that bonding does now (i.e. failover between a null device and a
> > preferred real device).
> > 
> 
> The preferred real device still has to be created / destroyed. It still
> relies on EAL entry points for handling. It still puts additional
> responsibilities on a PMD. Those responsibilities are expressed in sub
> layers clearly defined in the fail-safe PMD. You would have to create these
> sub-layers in some form in the bonding for it to be able to create a
> preferred real device at some point. This additional way of handling slaves
> has already been discussed as inducing a messy architecture to the bonding
> PMD.
> 
> > > I am sure we can document all this in a way that does no cause users
> > > confusion, with the help of community feedback such as yours.
> > > 
> > > Perhaps "net_failsafe" is a misnomer? We also thought about "net_persistent"
> > > or "net_hotplug". Any other ideas?
> > > 
> > > It is also possible for me to remove the failover support from this series,
> > > only providing deferred hot-plug handling at first. I could then send the
> > > failover support as separate patches to better assert that it is a useful,
> > > secondary feature that is essentially free to implement.
> > > 
> > I think thats solving the wrong problem.  I've no issue with the functionality
> > in this patch, its really the implementation that we are all arguing against.
> > 
> > > >
> > > > It seems everybody agrees on the need for the failsafe code.
> > > > We are just discussing the right place to implement it.
> > > >
> > > > Gaetan, moving this code in the bonding PMD means replacing the bonding
> > > > API design by the failsafe design, right?
> > > > With the failsafe design in the bonding PMD, is it possible to keep other
> > > > bonding features?
> > > 
> > > As seen previously, the bonding API is incompatible with device failover.
> > > 
> > Its not been seen previously, you asserted it to be so, and I certainly disagree
> > with that assertion.  I think others might too.
> > 
> 
> I also explained at length my assertion. I can certainly expand further if
> necessary, but you need to point the elements you disagree with.
> 
> > Additionally, its not really in line with this discussion, but in looking at
> > your hotplug detection code, I think somewhat lacking.  Currently you seem to
> > implement this with a timer that wakes up and checks for device existance, which
> > is pretty substandard in my mind.  Thats going to waste cpu cycles that might
> > lead to packet loss.  I'd really prefer to see you augment the eal library with
> > an event handling code (it can tie into udev in linux and kqueue in bsd), and
> > create a generic event hook, that we can use to detect device adds/removes
> > without having to wake up constantly to see if anything has changed.
> > 
> > 
> 
> I think it's fine. We can discuss it further once we agree on the form the
> hot-plug implementation will take in the DPDK.
> 


Well, until then, I feel like we're talking past one another at this point, and
so I'll just say this driver is a NAK for me.

Neil


> > > Having some features enabled solely for one kind of failover, while
> > > having
> > > specific code paths for both, seems unecessarily complicated to me ;
> > > following suite with my previous points about the first solution.
> > > 
> > > >
> > > > In case we do not have a consensus in the following days, I suggest to add
> > > > this topic in the next techboard meeting agenda.
> 
> Best regards,
> -- 
> Gaëtan Rivet
> 6WIND
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (13 preceding siblings ...)
  2017-03-08 16:54   ` [PATCH v2 00/13] introduce fail-safe PMD Neil Horman
@ 2017-03-20 15:00   ` Thomas Monjalon
  2017-05-17 12:50     ` Ferruh Yigit
  2017-03-23 13:01   ` Ferruh Yigit
  15 siblings, 1 reply; 49+ messages in thread
From: Thomas Monjalon @ 2017-03-20 15:00 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev, Adrien Mazarguil

There have been some discussions on this new PMD and it will be
discussed today in the techboard meeting.

I would like to present my view and summarize the solutions I have heard.
First, it is important to recall that everyone agrees on the need for
this feature, i.e. masking hotplug events by maintaining an ethdev
object even without a real underlying device.

1/ 
The proposal from Gaetan is to add a failsafe driver with 2 features:
	* masking the underlying device
	* limited and small failover code to switch from one device
	  to another, with the same centralized configuration
The latter feature is reminiscent of the bonding driver, but it could be
kept limited without any intent of implementing real bonding features.

2/ 
If we really want to merge failsafe and bonding features, we could
create a new bonding driver with centralized configuration.
The legacy bonding driver lets each slave be configured separately.
They are different models and we should not mix them.
If one proves better, the other could be deprecated later.

3/
We could try to implement the failsafe feature in the bonding
driver, as Neil suggests.
However, I am not sure it would work very well or be easy to use.

4/
We can implement only the failsafe feature as a PMD and use it to wrap
the slaves of the bonding driver.
So the link order would be 
	bonding -> failsafe -> real device
In this model, failsafe can have only one slave and does not implement
the fail-over feature.
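
For illustration, a minimal sketch of what option 4 could look like on the
application side, assuming a fail-safe port created with --vdev using the
sub-device syntax proposed in this series, and the ethdev/bonding calls of
this era (names and prototypes may differ):

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_eth_bond.h>

/* e.g. the application is started with:
 *     --vdev 'net_failsafe0,dev(0000:84:00.0)'
 * (illustrative devargs only) */
static int
wrap_failsafe_in_bond(void)
{
    uint8_t fs_port;
    int bond;

    if (rte_eth_dev_get_port_by_name("net_failsafe0", &fs_port) != 0)
        return -1;
    bond = rte_eth_bond_create("net_bonding0", BONDING_MODE_ACTIVE_BACKUP, 0);
    if (bond < 0)
        return bond;
    /* bonding -> failsafe -> real device */
    return rte_eth_bond_slave_add((uint8_t)bond, fs_port);
}

The bonding PMD then only ever sees a port that is always present, while the
fail-safe layer absorbs the hot-plug events underneath it.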

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
                     ` (14 preceding siblings ...)
  2017-03-20 15:00   ` Thomas Monjalon
@ 2017-03-23 13:01   ` Ferruh Yigit
  15 siblings, 0 replies; 49+ messages in thread
From: Ferruh Yigit @ 2017-03-23 13:01 UTC (permalink / raw)
  To: Gaetan Rivet, dev; +Cc: Thomas Monjalon, Adrien Mazarguil

On 3/8/2017 3:15 PM, Gaetan Rivet wrote:
> This PMD intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back so that
> existing applications do not need to be modified to benefit from true
> hot-plugging support.
> 
> The stacked PMD approach shares many similarities with the bonding PMD but
> with a different purpose. While bonding provides the ability to group
> several links into a single logical device for enhanced throughput and
> supports fail-over at link level, this one manages the sudden disappearance
> of the underlying device; it guarantees applications face a valid device in
> working order at all times.
> 
> Each fail-safe instance is configured to run atop one or several
> devices, with one defined as the preferred device. Hot-plug events are
> handled on all of them, and Tx is always directed to the preferred device
> if present or to the next available failover device (Rx is always performed
> on all devices for simplicity).
> 
> Moreover, the configured slaves (preferred or failover) do not need to be
> present at initialization time and may appear later.
> 
> Slaves configuration is continuously synchronized with that of the virtual
> device, which exposes their common set of capabilities to the application.
> Failure to apply the current configuration state to a slave for any reason
> simply reschedules its initialization.
> 
> This series depends on the series
> [PATCH 0/4] clarify eth_dev state management
> [PATCH 0/5] add device removal event
> 
> v1 --> v2:
> 
>   - Wrote documentation
>   - Fixed commit logs, signed-off-by
>   - Added LSC event support
>   - A few minor fixes
> 
> Gaetan Rivet (13):
>   ethdev: save VLAN filter setting
>   ethdev: add flow API rule copy function
>   ethdev: add deferred intermediate device state
>   pci: expose device detach routine
>   pci: expose parse and probe routines
>   net/failsafe: add fail-safe PMD
>   net/failsafe: add plug-in support
>   net/failsafe: add flexible device definition
>   net/failsafe: support flow API
>   net/failsafe: support offload capabilities
>   net/failsafe: add fast burst functions
>   net/failsafe: support device removal
>   net/failsafe: support link status change event

Postponed to next release (17.08) based on tech board decision [1]

[1]
http://dpdk.org/ml/archives/dev/2017-March/061009.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-03-20 15:00   ` Thomas Monjalon
@ 2017-05-17 12:50     ` Ferruh Yigit
  2017-05-17 16:59       ` Gaëtan Rivet
  0 siblings, 1 reply; 49+ messages in thread
From: Ferruh Yigit @ 2017-05-17 12:50 UTC (permalink / raw)
  To: Thomas Monjalon, Gaetan Rivet
  Cc: dev, Adrien Mazarguil, Neil Horman, Bruce Richardson, Stephen Hemminger

On 3/20/2017 3:00 PM, Thomas Monjalon wrote:
> There have been some discussions on this new PMD and it will be
> discussed today in the techboard meeting.
> 
> I would like to expose my view and summarize the solutions I have heard.
> First it is important to remind that everyone agrees on the need for
> this feature, i.e. masking the hotplug events by maintaining an ethdev
> object even without real underlying device.
> 
> 1/ 
> The proposal from Gaetan is to add a failsafe driver with 2 features:
> 	* masking underlying device
> 	* limited and small failover code to switch from a device
> 	  to another one, with the same centralized configuration
> The latter feature makes think to the bonding driver, but it could be
> kept limited without any intent of implementing real bonding features.
> 
> 2/ 
> If we really want to merge failsafe and bonding features, we could
> create a new bonding driver with centralized configuration.
> The legacy bonding driver let each slave to be configured separately.
> It is a different model and we should not mix them.
> If one is better, it could be deprecated later.
> 
> 3/
> It can be tried to implement the failsafe feature into the bonding
> driver, as Neil suggests.
> However, I am not sure it would work very well or would be easy to use.
> 
> 4/
> We can implement only the failsafe feature as a PMD and use it to wrap
> the slaves of the bonding driver.
> So the order of link would be 
> 	bonding -> failsafe -> real device
> In this model, failsafe can have only one slave and do not implement
> the fail-over feature.
> 

Tech board decided [1] to "reconsider" the PMD for this release (17.08).
So, let's start it. :-)

I think it is a good idea to continue on top of the above summary; is there
a plan for how to proceed?

Thanks,
ferruh

[1]
http://dpdk.org/ml/archives/dev/2017-March/061009.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 00/13] introduce fail-safe PMD
  2017-05-17 12:50     ` Ferruh Yigit
@ 2017-05-17 16:59       ` Gaëtan Rivet
  0 siblings, 0 replies; 49+ messages in thread
From: Gaëtan Rivet @ 2017-05-17 16:59 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Thomas Monjalon, dev, Adrien Mazarguil, Neil Horman,
	Bruce Richardson, Stephen Hemminger

On Wed, May 17, 2017 at 01:50:40PM +0100, Ferruh Yigit wrote:
>On 3/20/2017 3:00 PM, Thomas Monjalon wrote:
>> There have been some discussions on this new PMD and it will be
>> discussed today in the techboard meeting.
>>
>> I would like to expose my view and summarize the solutions I have heard.
>> First it is important to remind that everyone agrees on the need for
>> this feature, i.e. masking the hotplug events by maintaining an ethdev
>> object even without real underlying device.
>>
>> 1/
>> The proposal from Gaetan is to add a failsafe driver with 2 features:
>> 	* masking underlying device
>> 	* limited and small failover code to switch from a device
>> 	  to another one, with the same centralized configuration
>> The latter feature makes think to the bonding driver, but it could be
>> kept limited without any intent of implementing real bonding features.
>>
>> 2/
>> If we really want to merge failsafe and bonding features, we could
>> create a new bonding driver with centralized configuration.
>> The legacy bonding driver let each slave to be configured separately.
>> It is a different model and we should not mix them.
>> If one is better, it could be deprecated later.
>>
>> 3/
>> It can be tried to implement the failsafe feature into the bonding
>> driver, as Neil suggests.
>> However, I am not sure it would work very well or would be easy to use.
>>
>> 4/
>> We can implement only the failsafe feature as a PMD and use it to wrap
>> the slaves of the bonding driver.
>> So the order of link would be
>> 	bonding -> failsafe -> real device
>> In this model, failsafe can have only one slave and do not implement
>> the fail-over feature.
>>
>
>Tech board decided [1] to "reconsider" the PMD for this release (17.08).
>So, lets start it :-)
>
>I think it is good idea to continue on top of above summary, is there a
>plan to how to proceed?
>

The fail-safe proposal has not evolved from the techboard point of view.
The salient point is still choosing between those 4 possible integrations.

To give a quick overview of its current state:

I have started working on a v3 to be integrated in v17.08. The work,
however, was exceedingly complicated due to deep-rooted dependencies on
the PCI implementation within the EAL, which evolved in v17.05 and
will evolve further during this release.

The current rte_bus rework from Jan Blunck and myself will greatly
simplify the sub-EAL layer that was used in the fail-safe. I am thus
waiting on Jan Blunck's series on attach / detach, to propose mine in
turn for rte_devargs, move the PCI bus where it belongs and, finally,
rebase the fail-safe upon it.

The form this work is taking, however, is still the same as before,
thus currently aiming at solutions 1 or 2.

>Thanks,
>ferruh
>
>[1]
>http://dpdk.org/ml/archives/dev/2017-March/061009.html

-- 
Gaëtan Rivet
6WIND

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread

Thread overview: 49+ messages
2017-03-03 15:40 [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
2017-03-03 15:40 ` [PATCH 01/12] ethdev: save VLAN filter setting Gaetan Rivet
2017-03-03 17:33   ` Stephen Hemminger
2017-03-03 15:40 ` [PATCH 02/12] ethdev: add flow API rule copy function Gaetan Rivet
2017-03-03 15:40 ` [PATCH 03/12] ethdev: add deferred intermediate device state Gaetan Rivet
2017-03-03 17:34   ` Stephen Hemminger
2017-03-03 15:40 ` [PATCH 04/12] pci: expose device detach routine Gaetan Rivet
2017-03-03 15:40 ` [PATCH 05/12] pci: expose parse and probe routines Gaetan Rivet
2017-03-03 15:40 ` [PATCH 06/12] net/failsafe: add fail-safe PMD Gaetan Rivet
2017-03-03 17:38   ` Stephen Hemminger
2017-03-06 14:19     ` Gaëtan Rivet
2017-03-03 15:40 ` [PATCH 07/12] net/failsafe: add plug-in support Gaetan Rivet
2017-03-03 15:40 ` [PATCH 08/12] net/failsafe: add flexible device definition Gaetan Rivet
2017-03-03 15:40 ` [PATCH 09/12] net/failsafe: support flow API Gaetan Rivet
2017-03-03 15:40 ` [PATCH 10/12] net/failsafe: support offload capabilities Gaetan Rivet
2017-03-03 15:40 ` [PATCH 11/12] net/failsafe: add fast burst functions Gaetan Rivet
2017-03-03 15:40 ` [PATCH 12/12] net/failsafe: support device removal Gaetan Rivet
2017-03-03 16:14 ` [PATCH 00/12] introduce fail-safe PMD Bruce Richardson
2017-03-06 13:53   ` Gaëtan Rivet
2017-03-03 17:27 ` Stephen Hemminger
2017-03-08 15:15 ` [PATCH v2 00/13] " Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 01/13] ethdev: save VLAN filter setting Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 02/13] ethdev: add flow API rule copy function Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 03/13] ethdev: add deferred intermediate device state Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 04/13] pci: expose device detach routine Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 05/13] pci: expose parse and probe routines Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 06/13] net/failsafe: add fail-safe PMD Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 07/13] net/failsafe: add plug-in support Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 08/13] net/failsafe: add flexible device definition Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 09/13] net/failsafe: support flow API Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 10/13] net/failsafe: support offload capabilities Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 11/13] net/failsafe: add fast burst functions Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 12/13] net/failsafe: support device removal Gaetan Rivet
2017-03-08 15:15   ` [PATCH v2 13/13] net/failsafe: support link status change event Gaetan Rivet
2017-03-08 16:54   ` [PATCH v2 00/13] introduce fail-safe PMD Neil Horman
2017-03-09  9:15     ` Bruce Richardson
2017-03-10  9:13       ` Gaëtan Rivet
2017-03-10 22:43         ` Neil Horman
2017-03-14 14:49           ` Gaëtan Rivet
2017-03-15  3:28             ` Bruce Richardson
2017-03-15 11:15               ` Thomas Monjalon
2017-03-15 14:25                 ` Gaëtan Rivet
2017-03-16 20:50                   ` Neil Horman
2017-03-17 10:56                     ` Gaëtan Rivet
2017-03-18 19:51                       ` Neil Horman
2017-03-20 15:00   ` Thomas Monjalon
2017-05-17 12:50     ` Ferruh Yigit
2017-05-17 16:59       ` Gaëtan Rivet
2017-03-23 13:01   ` Ferruh Yigit
