* [PATCH 0/8] support reset of VF link
@ 2016-06-06  5:40 Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 1/8] lib/librte_ether: support device reset Wenzhuo Lu
                   ` (9 more replies)
  0 siblings, 10 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev

If the PF link goes down and comes back up, the VF link does not
recover on its own.
This patch set adds support for VF link reset. When the VF
receives the message of a physical link down/up, the APP can reset
the VF link and let it recover.
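
The hand-off described above (the VF sees a link message, the APP resets
the port) could be sketched roughly as follows. This is an illustration
only, not code from the patch set: every demo_* name is hypothetical, and
demo_reset_port() is a placeholder for whatever reset call the APP uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the event callback, consumed by the management thread. */
static atomic_bool demo_reset_pending;
static int demo_resets_done; /* bookkeeping for the example only */

/* Hypothetical handler run when the VF sees a PF link down/up message. */
static void
demo_on_link_event(void)
{
	atomic_store(&demo_reset_pending, true);
}

/* Placeholder for the actual device reset call (assumption). */
static int
demo_reset_port(void)
{
	demo_resets_done++;
	return 0;
}

/* One iteration of the management loop; returns 1 if a reset ran. */
static int
demo_mgmt_poll(void)
{
	/* The exchange clears the flag and reports whether it was set. */
	if (!atomic_exchange(&demo_reset_pending, false))
		return 0;
	demo_reset_port();
	return 1;
}
```

Keeping the handler down to a flag store means the reset itself runs in
one well-known thread, and several events arriving before the next poll
collapse into a single reset.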

PS: This patch set is split from a previous patch set, *automatic
link recovery on ixgbe/igb VF*, and it is based on the patch set
*support mailbox interruption on ixgbe/igb VF*.

Wenzhuo Lu (8):
  lib/librte_ether: support device reset
  lib/librte_ether: defind RX/TX lock mode
  ixgbe: RX/TX with lock on VF
  ixgbe: implement device reset on VF
  igb: RX/TX with lock on VF
  igb: implement device reset on VF
  i40e:RX/TX with lock on VF
  i40e: implement device reset on VF

 doc/guides/rel_notes/release_16_07.rst |  14 ++++
 drivers/net/e1000/e1000_ethdev.h       | 126 ++++++++++++++++++++++++++++
 drivers/net/e1000/igb_ethdev.c         | 118 +++++++++++++++++++++++++-
 drivers/net/e1000/igb_rxtx.c           | 148 +++++++++------------------------
 drivers/net/i40e/i40e_ethdev.c         |   4 +-
 drivers/net/i40e/i40e_ethdev.h         |   5 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 145 +++++++++++++++++++++++++++++++-
 drivers/net/i40e/i40e_rxtx.c           |  45 ++++++----
 drivers/net/i40e/i40e_rxtx.h           |  34 ++++++++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 120 +++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  32 ++++++-
 drivers/net/ixgbe/ixgbe_rxtx.c         | 116 +++++++++++++++++++++++---
 drivers/net/ixgbe/ixgbe_rxtx.h         |  13 +++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c     |   6 ++
 lib/librte_ether/rte_ethdev.c          |  17 ++++
 lib/librte_ether/rte_ethdev.h          |  76 +++++++++++++++++
 lib/librte_ether/rte_ether_version.map |   7 ++
 17 files changed, 879 insertions(+), 147 deletions(-)

-- 
1.9.3


* [PATCH 1/8] lib/librte_ether: support device reset
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode Wenzhuo Lu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

Add an API to reset the device.
It is intended for a VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and up, the APP should call this API to
reset the VF port. Most likely, the APP should call it from its
management thread and guarantee thread safety itself.
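
For readers unfamiliar with the ops-table pattern this API relies on, here
is a stand-alone sketch with mock types. It does not use the real ethdev
structures; mock_dev, mock_dev_ops and MOCK_MAX_PORTS are hypothetical
names made up for the illustration. It only shows the intended return
codes: -EINVAL for a bad port and -ENOTSUP when the driver has no reset
callback.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_PORTS 4 /* hypothetical port limit for this sketch */

struct mock_dev;

/* Per-driver callback table; only the reset hook is modeled here. */
struct mock_dev_ops {
	int (*dev_reset)(struct mock_dev *dev);
};

struct mock_dev {
	const struct mock_dev_ops *ops; /* NULL means the port is unused */
	int reset_count;                /* bookkeeping for the example */
};

static struct mock_dev mock_devices[MOCK_MAX_PORTS];

/* Generic entry point: validate the port, then defer to the driver. */
int
mock_dev_reset(uint8_t port_id)
{
	struct mock_dev *dev;

	if (port_id >= MOCK_MAX_PORTS || mock_devices[port_id].ops == NULL)
		return -EINVAL;

	dev = &mock_devices[port_id];
	if (dev->ops->dev_reset == NULL)
		return -ENOTSUP;

	return dev->ops->dev_reset(dev);
}

/* A driver-side reset hook, standing in for a VF implementation. */
static int
mock_vf_reset(struct mock_dev *dev)
{
	dev->reset_count++;
	return 0;
}

/* One driver that supports reset and one that does not. */
static const struct mock_dev_ops mock_vf_ops = { .dev_reset = mock_vf_reset };
static const struct mock_dev_ops mock_pf_ops = { .dev_reset = NULL };
```

An APP would check the return value: -ENOTSUP tells it that the port's
driver cannot be reset this way, so it has to fall back to its own
stop/start handling.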

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 lib/librte_ether/rte_ethdev.c          | 17 +++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 14 ++++++++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++++++
 3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..e43dca9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				-ENOTSUP);
 	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
 }
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev;
+	int diag;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+	diag = (*dev->dev_ops->dev_reset)(dev);
+
+	return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..74e895f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
 	 uint8_t en);
 /**< @internal enable/disable the l2 tunnel offload functions */
 
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
 #ifdef RTE_NIC_BYPASS
 
 enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
 	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
 	/** Enable/disable l2 tunnel offload functions */
 	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+	/** Reset device. */
+	eth_dev_reset_t dev_reset;
 };
 
 /**
@@ -4253,6 +4258,15 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				  uint32_t mask,
 				  uint8_t en);
 
+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
 	rte_eth_tx_buffer_set_err_callback;
 
 } DPDK_2.2;
+
+DPDK_16.07 {
+	global:
+
+	rte_eth_dev_reset;
+
+} DPDK_16.04;
-- 
1.9.3


* [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 1/8] lib/librte_ether: support device reset Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-08  2:15   ` Stephen Hemminger
  2016-06-06  5:40 ` [PATCH 3/8] ixgbe: RX/TX with lock on VF Wenzhuo Lu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu, Zhe Tao

Define a lock mode for the RX/TX queues. When resetting
the device, we want the resetting thread to take the lock
of each RX/TX queue to make sure RX/TX is stopped.

Use the next-ABI macro (RTE_NEXT_ABI) for this ABI change because
its impact is large: 7 APIs and 1 global variable are affected.
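
The try-lock wrapper idea can be tried outside DPDK with the sketch below.
It is a simplified model, not the patch code: demo_lock_t is a C11
atomic_flag standing in for rte_spinlock_t, the burst function takes no
mbuf array, and all demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal try-lock built on C11 atomic_flag; stands in for rte_spinlock_t. */
typedef struct { atomic_flag held; } demo_lock_t;

static inline void
demo_lock_init(demo_lock_t *l)
{
	atomic_flag_clear(&l->held);
}

static inline int
demo_trylock(demo_lock_t *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static inline void
demo_unlock(demo_lock_t *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct demo_rx_queue {
	demo_lock_t rx_lock;
	uint16_t    pending; /* packets waiting in the (imaginary) ring */
};

/* Lock-free burst: hand out up to nb_pkts pending packets. */
static uint16_t
demo_recv(void *rx_queue, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t n = rxq->pending < nb_pkts ? rxq->pending : nb_pkts;

	rxq->pending -= n;
	return n;
}

/* Generate func##_lock, which polls only if the queue lock is free. */
#define DEMO_GENERATE_RX_LOCK(func) \
static uint16_t func ## _lock(void *rx_queue, uint16_t nb_pkts) \
{ \
	struct demo_rx_queue *rxq = rx_queue; \
	uint16_t nb_rx = 0; \
	\
	if (demo_trylock(&rxq->rx_lock)) { \
		nb_rx = func(rx_queue, nb_pkts); \
		demo_unlock(&rxq->rx_lock); \
	} \
	return nb_rx; \
}

DEMO_GENERATE_RX_LOCK(demo_recv)
```

Because the wrapper uses a try-lock rather than a blocking lock, a polling
thread never waits on the resetting thread: while the lock is held
elsewhere, the burst simply reports zero packets.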

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 62 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 74e895f..4efb5e9 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -354,7 +354,12 @@ struct rte_eth_rxmode {
 		jumbo_frame      : 1, /**< Jumbo Frame Receipt enable. */
 		hw_strip_crc     : 1, /**< Enable CRC stripping by hardware. */
 		enable_scatter   : 1, /**< Enable scatter packets rx handler */
+#ifndef RTE_NEXT_ABI
 		enable_lro       : 1; /**< Enable LRO */
+#else
+		enable_lro       : 1, /**< Enable LRO */
+		lock_mode        : 1; /**< Using lock path */
+#endif
 };
 
 /**
@@ -634,11 +639,68 @@ struct rte_eth_txmode {
 		/**< If set, reject sending out tagged pkts */
 		hw_vlan_reject_untagged : 1,
 		/**< If set, reject sending out untagged pkts */
+#ifndef RTE_NEXT_ABI
 		hw_vlan_insert_pvid : 1;
 		/**< If set, enable port based VLAN insertion */
+#else
+		hw_vlan_insert_pvid : 1,
+		/**< If set, enable port based VLAN insertion */
+		lock_mode : 1;
+		/**< If set, using lock path */
+#endif
 };
 
 /**
+ * The macros for the RX/TX lock mode functions
+ */
+#ifdef RTE_NEXT_ABI
+#define RX_LOCK_FUNCTION(dev, func) \
+	(dev->data->dev_conf.rxmode.lock_mode ? \
+	func ## _lock : func)
+
+#define TX_LOCK_FUNCTION(dev, func) \
+	(dev->data->dev_conf.txmode.lock_mode ? \
+	func ## _lock : func)
+#else
+#define RX_LOCK_FUNCTION(dev, func) func
+
+#define TX_LOCK_FUNCTION(dev, func) func
+#endif
+
+/* Add the lock RX/TX function for VF reset */
+#define GENERATE_RX_LOCK(func, nic) \
+uint16_t func ## _lock(void *rx_queue, \
+		      struct rte_mbuf **rx_pkts, \
+		      uint16_t nb_pkts) \
+{					\
+	struct nic ## _rx_queue *rxq = rx_queue; \
+	uint16_t nb_rx = 0; \
+						\
+	if (rte_spinlock_trylock(&rxq->rx_lock)) { \
+		nb_rx = func(rx_queue, rx_pkts, nb_pkts); \
+		rte_spinlock_unlock(&rxq->rx_lock); \
+	} \
+	\
+	return nb_rx; \
+}
+
+#define GENERATE_TX_LOCK(func, nic) \
+uint16_t func ## _lock(void *tx_queue, \
+		      struct rte_mbuf **tx_pkts, \
+		      uint16_t nb_pkts) \
+{					\
+	struct nic ## _tx_queue *txq = tx_queue; \
+	uint16_t nb_tx = 0; \
+						\
+	if (rte_spinlock_trylock(&txq->tx_lock)) { \
+		nb_tx = func(tx_queue, tx_pkts, nb_pkts); \
+		rte_spinlock_unlock(&txq->tx_lock); \
+	} \
+	\
+	return nb_tx; \
+}
+
+/**
  * A structure used to configure an RX ring of an Ethernet port.
  */
 struct rte_eth_rxconf {
-- 
1.9.3


* [PATCH 3/8] ixgbe: RX/TX with lock on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 1/8] lib/librte_ether: support device reset Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 4/8] ixgbe: implement device reset " Wenzhuo Lu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

Add RX/TX paths with a lock for the VF. They are used when
VF link reset is needed.
With the lock added, RX/TX can be stopped, which gives us
a chance to reset the VF link.

Please be aware that there is a performance drop if the lock
path is chosen.
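
How the locked or unlocked path gets picked can be illustrated with a
small stand-alone sketch of the token-pasting selection. It is a
simplified model under stated assumptions: demo_port and the demo_*
functions are hypothetical, and lock_mode here is a plain int rather than
the proposed bit-field.

```c
#include <stdint.h>

typedef uint16_t (*demo_burst_t)(void *queue, uint16_t nb_pkts);

struct demo_port {
	int          lock_mode; /* mirrors the proposed rxmode.lock_mode bit */
	demo_burst_t rx_burst;
};

static int demo_lock_path_calls; /* counts bursts taken on the lock path */

/* Plain burst path. */
static uint16_t
demo_rx(void *q, uint16_t n)
{
	(void)q;
	return n;
}

/* Lock path; the real one would wrap demo_rx in a try-lock. */
static uint16_t
demo_rx_lock(void *q, uint16_t n)
{
	demo_lock_path_calls++;
	return demo_rx(q, n);
}

/* Pick func or func##_lock depending on the port's lock_mode flag. */
#define DEMO_RX_LOCK_FUNCTION(port, func) \
	((port)->lock_mode ? func ## _lock : func)

/* Called once at setup time, like the set_rx_function helpers. */
static void
demo_set_rx_function(struct demo_port *port)
{
	port->rx_burst = DEMO_RX_LOCK_FUNCTION(port, demo_rx);
}
```

The choice is made when the burst function is installed, not per packet,
so in this model the flag has to be set before setup runs. The extra cost
of the lock path is the atomic operation paid on every burst.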

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c   | 12 +++++--
 drivers/net/ixgbe/ixgbe_ethdev.h   | 20 +++++++++++
 drivers/net/ixgbe/ixgbe_rxtx.c     | 74 ++++++++++++++++++++++++++++++++------
 drivers/net/ixgbe/ixgbe_rxtx.h     | 13 +++++++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c |  6 ++++
 5 files changed, 112 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..fd2682f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1325,8 +1325,8 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 	PMD_INIT_FUNC_TRACE();
 
 	eth_dev->dev_ops = &ixgbevf_eth_dev_ops;
-	eth_dev->rx_pkt_burst = &ixgbe_recv_pkts;
-	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
+	eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, ixgbe_recv_pkts);
+	eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, ixgbe_xmit_pkts);
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -3012,7 +3012,15 @@ ixgbe_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 	if (dev->rx_pkt_burst == ixgbe_recv_pkts ||
 	    dev->rx_pkt_burst == ixgbe_recv_pkts_lro_single_alloc ||
 	    dev->rx_pkt_burst == ixgbe_recv_pkts_lro_bulk_alloc ||
+#ifndef RTE_NEXT_ABI
 	    dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc)
+#else
+	    dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc ||
+	    dev->rx_pkt_burst == ixgbe_recv_pkts_lock ||
+	    dev->rx_pkt_burst == ixgbe_recv_pkts_lro_single_alloc_lock ||
+	    dev->rx_pkt_burst == ixgbe_recv_pkts_lro_bulk_alloc_lock ||
+	    dev->rx_pkt_burst == ixgbe_recv_pkts_bulk_alloc_lock)
+#endif
 		return ptypes;
 	return NULL;
 }
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..701107b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -390,12 +390,32 @@ uint16_t ixgbe_recv_pkts_lro_single_alloc(void *rx_queue,
 uint16_t ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 
+uint16_t ixgbe_recv_pkts_lock(void *rx_queue,
+			      struct rte_mbuf **rx_pkts,
+			      uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_bulk_alloc_lock(void *rx_queue,
+					 struct rte_mbuf **rx_pkts,
+					 uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_single_alloc_lock(void *rx_queue,
+					       struct rte_mbuf **rx_pkts,
+					       uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_lro_bulk_alloc_lock(void *rx_queue,
+					     struct rte_mbuf **rx_pkts,
+					     uint16_t nb_pkts);
+
 uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbe_xmit_pkts_lock(void *tx_queue,
+			      struct rte_mbuf **tx_pkts,
+			      uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_simple_lock(void *tx_queue,
+				     struct rte_mbuf **tx_pkts,
+				     uint16_t nb_pkts);
+
 int ixgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..a45d115 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -353,6 +353,8 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 	return nb_tx;
 }
 
+GENERATE_TX_LOCK(ixgbe_xmit_pkts_simple, ixgbe)
+
 static inline void
 ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
 		volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
@@ -904,6 +906,8 @@ end_of_tx:
 	return nb_tx;
 }
 
+GENERATE_TX_LOCK(ixgbe_xmit_pkts, ixgbe)
+
 /*********************************************************************
  *
  *  RX functions
@@ -1524,6 +1528,8 @@ ixgbe_recv_pkts_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_pkts_bulk_alloc, ixgbe)
+
 uint16_t
 ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts)
@@ -1712,6 +1718,8 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_pkts, ixgbe)
+
 /**
  * Detect an RSC descriptor.
  */
@@ -2071,6 +2079,8 @@ ixgbe_recv_pkts_lro_single_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, false);
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_pkts_lro_single_alloc, ixgbe)
+
 uint16_t
 ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
 			       uint16_t nb_pkts)
@@ -2078,6 +2088,8 @@ ixgbe_recv_pkts_lro_bulk_alloc(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return ixgbe_recv_pkts_lro(rx_queue, rx_pkts, nb_pkts, true);
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_pkts_lro_bulk_alloc, ixgbe)
+
 /*********************************************************************
  *
  *  Queue management functions
@@ -2186,10 +2198,12 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
 					ixgbe_txq_vec_setup(txq) == 0)) {
 			PMD_INIT_LOG(DEBUG, "Vector tx enabled.");
-			dev->tx_pkt_burst = ixgbe_xmit_pkts_vec;
+			dev->tx_pkt_burst =
+				TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts_vec);
 		} else
 #endif
-		dev->tx_pkt_burst = ixgbe_xmit_pkts_simple;
+		dev->tx_pkt_burst =
+			TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts_simple);
 	} else {
 		PMD_INIT_LOG(DEBUG, "Using full-featured tx code path");
 		PMD_INIT_LOG(DEBUG,
@@ -2200,7 +2214,7 @@ ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 				" - tx_rs_thresh = %lu " "[RTE_PMD_IXGBE_TX_MAX_BURST=%lu]",
 				(unsigned long)txq->tx_rs_thresh,
 				(unsigned long)RTE_PMD_IXGBE_TX_MAX_BURST);
-		dev->tx_pkt_burst = ixgbe_xmit_pkts;
+		dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, ixgbe_xmit_pkts);
 	}
 }
 
@@ -2347,6 +2361,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->txq_flags = tx_conf->txq_flags;
 	txq->ops = &def_txq_ops;
 	txq->tx_deferred_start = tx_conf->tx_deferred_start;
+	rte_spinlock_init(&txq->tx_lock);
 
 	/*
 	 * Modification to set VFTDT for virtual function if vf is detected
@@ -2625,6 +2640,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
 							0 : ETHER_CRC_LEN);
 	rxq->drop_en = rx_conf->rx_drop_en;
 	rxq->rx_deferred_start = rx_conf->rx_deferred_start;
+	rte_spinlock_init(&rxq->rx_lock);
 
 	/*
 	 * The packet type in RX descriptor is different for different NICs.
@@ -4172,11 +4188,15 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
 		if (adapter->rx_bulk_alloc_allowed) {
 			PMD_INIT_LOG(DEBUG, "LRO is requested. Using a bulk "
 					   "allocation version");
-			dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					ixgbe_recv_pkts_lro_bulk_alloc);
 		} else {
 			PMD_INIT_LOG(DEBUG, "LRO is requested. Using a single "
 					   "allocation version");
-			dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					ixgbe_recv_pkts_lro_single_alloc);
 		}
 	} else if (dev->data->scattered_rx) {
 		/*
@@ -4188,12 +4208,16 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
 					    "callback (port=%d).",
 				     dev->data->port_id);
 
-			dev->rx_pkt_burst = ixgbe_recv_scattered_pkts_vec;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					ixgbe_recv_scattered_pkts_vec);
 		} else if (adapter->rx_bulk_alloc_allowed) {
 			PMD_INIT_LOG(DEBUG, "Using a Scattered with bulk "
 					   "allocation callback (port=%d).",
 				     dev->data->port_id);
-			dev->rx_pkt_burst = ixgbe_recv_pkts_lro_bulk_alloc;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					ixgbe_recv_pkts_lro_bulk_alloc);
 		} else {
 			PMD_INIT_LOG(DEBUG, "Using Regualr (non-vector, "
 					    "single allocation) "
@@ -4201,7 +4225,9 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
 					    "(port=%d).",
 				     dev->data->port_id);
 
-			dev->rx_pkt_burst = ixgbe_recv_pkts_lro_single_alloc;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					ixgbe_recv_pkts_lro_single_alloc);
 		}
 	/*
 	 * Below we set "simple" callbacks according to port/queues parameters.
@@ -4217,28 +4243,36 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
 			     RTE_IXGBE_DESCS_PER_LOOP,
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = ixgbe_recv_pkts_vec;
+		dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, ixgbe_recv_pkts_vec);
 	} else if (adapter->rx_bulk_alloc_allowed) {
 		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are "
 				    "satisfied. Rx Burst Bulk Alloc function "
 				    "will be used on port=%d.",
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = ixgbe_recv_pkts_bulk_alloc;
+		dev->rx_pkt_burst =
+			RX_LOCK_FUNCTION(dev,
+				ixgbe_recv_pkts_bulk_alloc);
 	} else {
 		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are not "
 				    "satisfied, or Scattered Rx is requested "
 				    "(port=%d).",
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = ixgbe_recv_pkts;
+		dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, ixgbe_recv_pkts);
 	}
 
 	/* Propagate information about RX function choice through all queues. */
 
 	rx_using_sse =
 		(dev->rx_pkt_burst == ixgbe_recv_scattered_pkts_vec ||
+#ifndef RTE_NEXT_ABI
 		dev->rx_pkt_burst == ixgbe_recv_pkts_vec);
+#else
+		 dev->rx_pkt_burst == ixgbe_recv_pkts_vec ||
+		 dev->rx_pkt_burst == ixgbe_recv_scattered_pkts_vec_lock ||
+		 dev->rx_pkt_burst == ixgbe_recv_pkts_vec_lock);
+#endif
 
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 		struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
@@ -5225,6 +5259,15 @@ ixgbe_recv_pkts_vec(
 }
 
 uint16_t __attribute__((weak))
+ixgbe_recv_pkts_vec_lock(
+	void __rte_unused *rx_queue,
+	struct rte_mbuf __rte_unused **rx_pkts,
+	uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+uint16_t __attribute__((weak))
 ixgbe_recv_scattered_pkts_vec(
 	void __rte_unused *rx_queue,
 	struct rte_mbuf __rte_unused **rx_pkts,
@@ -5233,6 +5276,15 @@ ixgbe_recv_scattered_pkts_vec(
 	return 0;
 }
 
+uint16_t __attribute__((weak))
+ixgbe_recv_scattered_pkts_vec_lock(
+	void __rte_unused *rx_queue,
+	struct rte_mbuf __rte_unused **rx_pkts,
+	uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
 int __attribute__((weak))
 ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
 {
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 3691a19..5f0ca1f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -34,6 +34,8 @@
 #ifndef _IXGBE_RXTX_H_
 #define _IXGBE_RXTX_H_
 
+#include <rte_spinlock.h>
+
 /*
  * Rings setup and release.
  *
@@ -126,6 +128,7 @@ struct ixgbe_rx_queue {
 	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
 	struct rte_mbuf *pkt_last_seg; /**< Last segment of current packet. */
 	uint64_t            mbuf_initializer; /**< value to init mbufs */
+	rte_spinlock_t      rx_lock; /**< Lock for packet reception. */
 	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
 	uint16_t            rx_tail;  /**< current value of RDT register. */
 	uint16_t            nb_rx_hold; /**< number of held free RX desc. */
@@ -212,6 +215,7 @@ struct ixgbe_tx_queue {
 		struct ixgbe_tx_entry_v *sw_ring_v; /**< address of SW ring for vector PMD */
 	};
 	volatile uint32_t   *tdt_reg_addr; /**< Address of TDT register. */
+	rte_spinlock_t      tx_lock; /**< Lock for packet transmission. */
 	uint16_t            nb_tx_desc;    /**< number of TX descriptors. */
 	uint16_t            tx_tail;       /**< current value of TDT reg. */
 	/**< Start freeing TX buffers if there are less free descriptors than
@@ -301,6 +305,12 @@ uint16_t ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 uint16_t ixgbe_recv_scattered_pkts_vec(void *rx_queue,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t ixgbe_recv_pkts_vec_lock(void *rx_queue,
+				  struct rte_mbuf **rx_pkts,
+				  uint16_t nb_pkts);
+uint16_t ixgbe_recv_scattered_pkts_vec_lock(void *rx_queue,
+					    struct rte_mbuf **rx_pkts,
+					    uint16_t nb_pkts);
 int ixgbe_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
 int ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq);
 void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
@@ -309,6 +319,9 @@ void ixgbe_rx_queue_release_mbufs_vec(struct ixgbe_rx_queue *rxq);
 
 uint16_t ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
+uint16_t ixgbe_xmit_pkts_vec_lock(void *tx_queue,
+				  struct rte_mbuf **tx_pkts,
+				  uint16_t nb_pkts);
 int ixgbe_txq_vec_setup(struct ixgbe_tx_queue *txq);
 
 #endif /* RTE_IXGBE_INC_VECTOR */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index e97ea82..32ecbd2 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -420,6 +420,8 @@ ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return _recv_raw_pkts_vec(rx_queue, rx_pkts, nb_pkts, NULL);
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_pkts_vec, ixgbe)
+
 static inline uint16_t
 reassemble_packets(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
@@ -526,6 +528,8 @@ ixgbe_recv_scattered_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 		&split_flags[i]);
 }
 
+GENERATE_RX_LOCK(ixgbe_recv_scattered_pkts_vec, ixgbe)
+
 static inline void
 vtx1(volatile union ixgbe_adv_tx_desc *txdp,
 		struct rte_mbuf *pkt, uint64_t flags)
@@ -680,6 +684,8 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 	return nb_pkts;
 }
 
+GENERATE_TX_LOCK(ixgbe_xmit_pkts_vec, ixgbe)
+
 static void __attribute__((cold))
 ixgbe_tx_queue_release_mbufs_vec(struct ixgbe_tx_queue *txq)
 {
-- 
1.9.3


* [PATCH 4/8] ixgbe: implement device reset on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (2 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 3/8] ixgbe: RX/TX with lock on VF Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 5/8] igb: RX/TX with lock " Wenzhuo Lu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

Implement the device reset function.
1. Add the fake RX/TX functions.
2. The reset function stops RX/TX by replacing the RX/TX
   functions with the fake ones and taking the queue locks,
   which ensures that any regular RX/TX in flight has finished.
3. After RX/TX has stopped, reset the VF port, then
   release the locks and restore the RX/TX functions.
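
The three steps above can be modeled in a few lines of plain C. This is a
single-threaded sketch, not the driver code: the demo_* names are
hypothetical, only the RX side is shown, and demo_hw_reset() replaces the
real stop/start sequence. Publishing the function pointers safely to
other threads is not modeled here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NB_QUEUES 2 /* hypothetical queue count */

typedef uint16_t (*demo_burst_t)(void *queue, uint16_t nb_pkts);

struct demo_queue {
	atomic_flag lock; /* stands in for rx_lock / tx_lock */
};

struct demo_dev {
	demo_burst_t      rx_burst;
	demo_burst_t      rx_backup;
	struct demo_queue rxq[DEMO_NB_QUEUES];
	int               hw_resets;         /* simulated hardware resets */
	int               quiesced_at_reset; /* 1 if fake burst was installed */
};

/* Fake burst: reports no packets so pollers back off harmlessly. */
static uint16_t
demo_rx_fake(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	(void)nb_pkts;
	return 0;
}

/* Regular burst: pretends every requested packet was received. */
static uint16_t
demo_rx_real(void *queue, uint16_t nb_pkts)
{
	(void)queue;
	return nb_pkts;
}

static void
demo_queue_lock(struct demo_queue *q)
{
	/* Spin until any in-flight burst releases the lock. */
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;
}

static void
demo_queue_unlock(struct demo_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Simulated hardware reset; records whether the datapath was parked. */
static void
demo_hw_reset(struct demo_dev *dev)
{
	dev->quiesced_at_reset = (dev->rx_burst == demo_rx_fake);
	dev->hw_resets++;
}

static int
demo_dev_reset(struct demo_dev *dev)
{
	int i;

	/* Step 2a: park new callers on the fake burst function. */
	dev->rx_backup = dev->rx_burst;
	dev->rx_burst = demo_rx_fake;

	/* Step 2b: wait out bursts already running by taking every lock. */
	for (i = 0; i < DEMO_NB_QUEUES; i++)
		demo_queue_lock(&dev->rxq[i]);

	/* Step 3: reset, then release the locks and restore the datapath. */
	demo_hw_reset(dev);

	for (i = 0; i < DEMO_NB_QUEUES; i++)
		demo_queue_unlock(&dev->rxq[i]);
	dev->rx_burst = dev->rx_backup;

	return 0;
}
```

Swapping in the fake function first keeps new callers from contending for
the locks; taking the locks afterwards covers callers that had already
entered the regular burst function.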

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |   9 +++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 108 ++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  12 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c         |  42 ++++++++++++-
 4 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
+* **Added device reset support for ixgbe VF.**
+
+  Added the device reset API. The APP can call this API to reset the VF
+  port when it is not working.
+  Based on the mailbox interruption support, when the VF receives a control
+  message from the PF, the PF link state has changed. The VF uses the reset
+  callback in the message handler to notify the APP, and the APP needs to
+  call the device reset API to reset the VF port.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index fd2682f..1e3520b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
 static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *udp_tunnel);
 
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
 	.reta_query           = ixgbe_dev_rss_reta_query,
 	.rss_hash_update      = ixgbe_dev_rss_hash_update,
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
+	.dev_reset            = ixgbevf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -4060,7 +4063,8 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 		ETH_VLAN_EXTEND_MASK;
 	ixgbevf_vlan_offload_set(dev, mask);
 
-	ixgbevf_dev_rxtx_start(dev);
+	if (ixgbevf_dev_rxtx_start(dev))
+		return -1;
 
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7193,6 +7197,108 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ixgbe_adapter *adapter =
+		(struct ixgbe_adapter *)dev->data->dev_private;
+	int diag = 0;
+	uint32_t vteiam;
+	uint16_t i;
+	struct ixgbe_rx_queue *rxq;
+	struct ixgbe_tx_queue *txq;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/**
+	 * Stop RX/TX by fake functions and locks.
+	 * Fake functions are used to make RX/TX lock easier.
+	 */
+	adapter->rx_backup = dev->rx_pkt_burst;
+	adapter->tx_backup = dev->tx_pkt_burst;
+	dev->rx_pkt_burst = ixgbevf_recv_pkts_fake;
+	dev->tx_pkt_burst = ixgbevf_xmit_pkts_fake;
+
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_lock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_lock(&txq->tx_lock);
+		}
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		ixgbevf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = ixgbevf_dev_start(dev);
+		/* If the device fails to start, stop/start it again. */
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+				     "Failed to start device.");
+			continue;
+		}
+		dev->data->dev_started = 1;
+		ixgbevf_dev_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+			diag = 0;
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot operate its registers.
+		 * Check whether the registers were written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+	/* Reference ixgbevf_intr_enable when checking */
+	} while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+	/**
+	 * Release the locks for queues.
+	 * Restore the RX/TX functions.
+	 */
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_unlock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_unlock(&txq->tx_lock);
+		}
+
+	dev->rx_pkt_burst = adapter->rx_backup;
+	dev->tx_pkt_burst = adapter->tx_backup;
+
+	return 0;
+}
+
+static int
 ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
 {
 	uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 701107b..d50fad4 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -289,6 +289,8 @@ struct ixgbe_adapter {
 	struct rte_timecounter      systime_tc;
 	struct rte_timecounter      rx_tstamp_tc;
 	struct rte_timecounter      tx_tstamp_tc;
+	eth_rx_burst_t              rx_backup;
+	eth_tx_burst_t              tx_backup;
 };
 
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
@@ -377,7 +379,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);
 
 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
 
-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
@@ -409,6 +411,14 @@ uint16_t ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 uint16_t ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t ixgbevf_recv_pkts_fake(void *rx_queue,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts);
+
+uint16_t ixgbevf_xmit_pkts_fake(void *tx_queue,
+				struct rte_mbuf **tx_pkts,
+				uint16_t nb_pkts);
+
 uint16_t ixgbe_xmit_pkts_lock(void *tx_queue,
 			      struct rte_mbuf **tx_pkts,
 			      uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index a45d115..b4e7659 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5181,7 +5181,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
 /*
  * [VF] Start Transmit and Receive Units.
  */
-void __attribute__((cold))
+int __attribute__((cold))
 ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw     *hw;
@@ -5218,7 +5218,15 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
 		} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
 		if (!poll_ms)
+#ifndef RTE_NEXT_ABI
 			PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+#else
+		{
+			PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+			if (dev->data->dev_conf.txmode.lock_mode)
+				return -1;
+		}
+#endif
 	}
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 
@@ -5235,11 +5243,21 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
 		} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
 		if (!poll_ms)
+#ifndef RTE_NEXT_ABI
+			PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+#else
+		{
 			PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+			if (dev->data->dev_conf.rxmode.lock_mode)
+				return -1;
+		}
+#endif
 		rte_wmb();
 		IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);
 
 	}
+
+	return 0;
 }
 
 /* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
@@ -5290,3 +5308,25 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue __rte_unused *rxq)
 {
 	return -1;
 }
+
+/**
+ * A fake function to stop reception.
+ */
+uint16_t
+ixgbevf_recv_pkts_fake(void __rte_unused *rx_queue,
+		       struct rte_mbuf __rte_unused **rx_pkts,
+		       uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+/**
+ * A fake function to stop transmission.
+ */
+uint16_t
+ixgbevf_xmit_pkts_fake(void __rte_unused *tx_queue,
+		       struct rte_mbuf __rte_unused **tx_pkts,
+		       uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
-- 
1.9.3


* [PATCH 5/8] igb: RX/TX with lock on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (3 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 4/8] ixgbe: implement device reset " Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 6/8] igb: implement device reset " Wenzhuo Lu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

Add RX/TX paths with lock for VF. They are used when
link reset on the VF is needed.
With the RX/TX lock in place, RX/TX can be stopped,
which gives us a chance to reset the VF link.

Please be aware there's a performance drop if the lock
path is chosen.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
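Note for readers: the GENERATE_RX_LOCK()/RX_LOCK_FUNCTION() macros used
below come from patch 2/8 and are not shown in this mail. The following is
only a minimal sketch of what a generated *_lock wrapper might look like,
assuming it takes the queue lock with a try-lock and reports zero packets
when the lock is already held. All demo_* names are hypothetical, and a C11
atomic_flag stands in for rte_spinlock_t:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver RX queue; not the real igb structure. */
struct demo_rx_queue {
	atomic_flag rx_lock;  /* stand-in for rte_spinlock_t */
	uint16_t    pending;  /* packets waiting in the pretend ring */
};

/* Unlocked burst function: "receives" up to nb_pkts packets. */
static uint16_t
demo_recv_pkts(void *rx_queue, void **rx_pkts, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t n = rxq->pending < nb_pkts ? rxq->pending : nb_pkts;

	(void)rx_pkts;
	rxq->pending = (uint16_t)(rxq->pending - n);
	return n;
}

/*
 * Locked variant (assumed shape): try to take the lock, run the normal
 * burst function, release the lock. If the lock is held, for example by
 * a reset in progress, return 0 so the caller sees "no packets".
 */
static uint16_t
demo_recv_pkts_lock(void *rx_queue, void **rx_pkts, uint16_t nb_pkts)
{
	struct demo_rx_queue *rxq = rx_queue;
	uint16_t nb_rx = 0;

	if (!atomic_flag_test_and_set(&rxq->rx_lock)) {
		nb_rx = demo_recv_pkts(rx_queue, rx_pkts, nb_pkts);
		atomic_flag_clear(&rxq->rx_lock);
	}
	return nb_rx;
}
```

With this shape the data path never blocks on the lock; it only pays for
one atomic operation per burst, which is where the performance drop
mentioned above would come from.
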
 drivers/net/e1000/e1000_ethdev.h | 10 ++++++++++
 drivers/net/e1000/igb_ethdev.c   | 14 +++++++++++---
 drivers/net/e1000/igb_rxtx.c     | 26 +++++++++++++++++++++-----
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index e8bf8da..6a42994 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -319,6 +319,16 @@ uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 uint16_t eth_igb_recv_scattered_pkts(void *rxq,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 
+uint16_t eth_igb_xmit_pkts_lock(void *txq,
+				struct rte_mbuf **tx_pkts,
+				uint16_t nb_pkts);
+uint16_t eth_igb_recv_pkts_lock(void *rxq,
+				struct rte_mbuf **rx_pkts,
+				uint16_t nb_pkts);
+uint16_t eth_igb_recv_scattered_pkts_lock(void *rxq,
+					  struct rte_mbuf **rx_pkts,
+					  uint16_t nb_pkts);
+
 int eth_igb_rss_hash_update(struct rte_eth_dev *dev,
 			    struct rte_eth_rss_conf *rss_conf);
 
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..8aad741 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -909,15 +909,17 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 	PMD_INIT_FUNC_TRACE();
 
 	eth_dev->dev_ops = &igbvf_eth_dev_ops;
-	eth_dev->rx_pkt_burst = &eth_igb_recv_pkts;
-	eth_dev->tx_pkt_burst = &eth_igb_xmit_pkts;
+	eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, eth_igb_recv_pkts);
+	eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, eth_igb_xmit_pkts);
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
 	 * RX function */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY){
 		if (eth_dev->data->scattered_rx)
-			eth_dev->rx_pkt_burst = &eth_igb_recv_scattered_pkts;
+			eth_dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(eth_dev,
+						 eth_igb_recv_scattered_pkts);
 		return 0;
 	}
 
@@ -1999,7 +2001,13 @@ eth_igb_supported_ptypes_get(struct rte_eth_dev *dev)
 	};
 
 	if (dev->rx_pkt_burst == eth_igb_recv_pkts ||
+#ifndef RTE_NEXT_ABI
 	    dev->rx_pkt_burst == eth_igb_recv_scattered_pkts)
+#else
+	    dev->rx_pkt_burst == eth_igb_recv_scattered_pkts ||
+	    dev->rx_pkt_burst == eth_igb_recv_pkts_lock ||
+	    dev->rx_pkt_burst == eth_igb_recv_scattered_pkts_lock)
+#endif
 		return ptypes;
 	return NULL;
 }
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 18aeead..7e97330 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -67,6 +67,7 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_string_fns.h>
+#include <rte_spinlock.h>
 
 #include "e1000_logs.h"
 #include "base/e1000_api.h"
@@ -107,6 +108,7 @@ struct igb_rx_queue {
 	struct igb_rx_entry *sw_ring;   /**< address of RX software ring. */
 	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
 	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	rte_spinlock_t      rx_lock; /**< Lock for packet receiption. */
 	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
 	uint16_t            rx_tail;    /**< current value of RDT register. */
 	uint16_t            nb_rx_hold; /**< number of held free RX desc. */
@@ -174,6 +176,7 @@ struct igb_tx_queue {
 	volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
 	uint64_t               tx_ring_phys_addr; /**< TX ring DMA address. */
 	struct igb_tx_entry    *sw_ring; /**< virtual address of SW ring. */
+	rte_spinlock_t         tx_lock; /**< Lock for packet transmission. */
 	volatile uint32_t      *tdt_reg_addr; /**< Address of TDT register. */
 	uint32_t               txd_type;      /**< Device-specific TXD type */
 	uint16_t               nb_tx_desc;    /**< number of TX descriptors. */
@@ -615,6 +618,8 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	return nb_tx;
 }
 
+GENERATE_TX_LOCK(eth_igb_xmit_pkts, igb)
+
 /*********************************************************************
  *
  *  RX functions
@@ -931,6 +936,8 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
+GENERATE_RX_LOCK(eth_igb_recv_pkts, igb)
+
 uint16_t
 eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 			 uint16_t nb_pkts)
@@ -1186,6 +1193,8 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rx;
 }
 
+GENERATE_RX_LOCK(eth_igb_recv_scattered_pkts, igb)
+
 /*
  * Maximum number of Ring Descriptors.
  *
@@ -1344,6 +1353,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->reg_idx = (uint16_t)((RTE_ETH_DEV_SRIOV(dev).active == 0) ?
 		queue_idx : RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx + queue_idx);
 	txq->port_id = dev->data->port_id;
+	rte_spinlock_init(&txq->tx_lock);
 
 	txq->tdt_reg_addr = E1000_PCI_REG_ADDR(hw, E1000_TDT(txq->reg_idx));
 	txq->tx_ring_phys_addr = rte_mem_phy2mch(tz->memseg_id, tz->phys_addr);
@@ -1361,7 +1371,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
 		     txq->sw_ring, txq->tx_ring, txq->tx_ring_phys_addr);
 
 	igb_reset_tx_queue(txq, dev);
-	dev->tx_pkt_burst = eth_igb_xmit_pkts;
+	dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, eth_igb_xmit_pkts);
 	dev->data->tx_queues[queue_idx] = txq;
 
 	return 0;
@@ -1467,6 +1477,7 @@ eth_igb_rx_queue_setup(struct rte_eth_dev *dev,
 	rxq->port_id = dev->data->port_id;
 	rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ? 0 :
 				  ETHER_CRC_LEN);
+	rte_spinlock_init(&rxq->rx_lock);
 
 	/*
 	 *  Allocate RX ring hardware descriptors. A memzone large enough to
@@ -2323,7 +2334,7 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
 
 	/* Configure and enable each RX queue. */
 	rctl_bsize = 0;
-	dev->rx_pkt_burst = eth_igb_recv_pkts;
+	dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, eth_igb_recv_pkts);
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 		uint64_t bus_addr;
 		uint32_t rxdctl;
@@ -2370,7 +2381,9 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
 				if (!dev->data->scattered_rx)
 					PMD_INIT_LOG(DEBUG,
 						     "forcing scatter mode");
-				dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+				dev->rx_pkt_burst =
+					RX_LOCK_FUNCTION(dev,
+						eth_igb_recv_scattered_pkts);
 				dev->data->scattered_rx = 1;
 			}
 		} else {
@@ -2381,7 +2394,9 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
 				rctl_bsize = buf_size;
 			if (!dev->data->scattered_rx)
 				PMD_INIT_LOG(DEBUG, "forcing scatter mode");
-			dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+			dev->rx_pkt_burst =
+				RX_LOCK_FUNCTION(dev,
+					eth_igb_recv_scattered_pkts);
 			dev->data->scattered_rx = 1;
 		}
 
@@ -2414,7 +2429,8 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
 	if (dev->data->dev_conf.rxmode.enable_scatter) {
 		if (!dev->data->scattered_rx)
 			PMD_INIT_LOG(DEBUG, "forcing scatter mode");
-		dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+		dev->rx_pkt_burst =
+			RX_LOCK_FUNCTION(dev, eth_igb_recv_scattered_pkts);
 		dev->data->scattered_rx = 1;
 	}
 
-- 
1.9.3


* [PATCH 6/8] igb: implement device reset on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (4 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 5/8] igb: RX/TX with lock " Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 7/8] i40e:RX/TX with lock " Wenzhuo Lu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Wenzhuo Lu

Implement the device reset function.
1, Add the fake RX/TX functions.
2, The reset function tries to stop RX/TX by replacing
   the RX/TX functions with the fake ones and taking the
   locks to make sure the regular RX/TX has finished.
3, After RX/TX has stopped, reset the VF port, then
   release the locks and restore the RX/TX functions.

BTW: The definitions of some structures are moved from the
.c file to the .h file.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
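Note for readers: the three steps in the commit message can be sketched as
follows. This is an illustration only, not the driver code: all demo_* names
are hypothetical, only the RX side is shown, a C11 atomic_flag stands in for
rte_spinlock_t, and a counter stands in for the real stop/start of the port.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint16_t (*demo_burst_t)(void *queue, void **pkts, uint16_t nb_pkts);

struct demo_queue {
	atomic_flag lock;              /* stand-in for rte_spinlock_t */
};

struct demo_dev {
	demo_burst_t       rx_pkt_burst; /* burst function the app calls */
	demo_burst_t       rx_backup;    /* saved while the reset runs */
	struct demo_queue *rx_queues;
	uint16_t           nb_rx_queues;
	int                reset_count;  /* stand-in for the hardware reset */
};

/* Regular burst function (placeholder that claims nb_pkts packets). */
static uint16_t
demo_recv_real(void *q, void **pkts, uint16_t nb_pkts)
{
	(void)q; (void)pkts;
	return nb_pkts;
}

/* Fake burst function: receives nothing while the reset is running. */
static uint16_t
demo_recv_fake(void *q, void **pkts, uint16_t nb_pkts)
{
	(void)q; (void)pkts; (void)nb_pkts;
	return 0;
}

static void
demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		; /* spin until any in-flight burst releases the lock */
}

static int
demo_dev_reset(struct demo_dev *dev)
{
	uint16_t i;

	/* Step 2: swap in the fake function, then take every queue lock. */
	dev->rx_backup = dev->rx_pkt_burst;
	dev->rx_pkt_burst = demo_recv_fake;
	for (i = 0; i < dev->nb_rx_queues; i++)
		demo_spin_lock(&dev->rx_queues[i].lock);

	/* Step 3: reset the port (placeholder for stop/start). */
	dev->reset_count++;

	/* Release the locks and restore the real burst function. */
	for (i = 0; i < dev->nb_rx_queues; i++)
		atomic_flag_clear(&dev->rx_queues[i].lock);
	dev->rx_pkt_burst = dev->rx_backup;

	return 0;
}
```

Any error path added to a function of this shape needs to release the
locks and restore the saved burst function before returning.
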
 doc/guides/rel_notes/release_16_07.rst |   2 +-
 drivers/net/e1000/e1000_ethdev.h       | 116 ++++++++++++++++++++++++++++++
 drivers/net/e1000/igb_ethdev.c         | 104 +++++++++++++++++++++++++++
 drivers/net/e1000/igb_rxtx.c           | 128 ++++++---------------------------
 4 files changed, 243 insertions(+), 107 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**
 
   Added the device reset API. APP can call this API to reset the VF port
   when it's not working.
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 6a42994..4ae03ce 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -34,6 +34,7 @@
 #ifndef _E1000_ETHDEV_H_
 #define _E1000_ETHDEV_H_
 #include <rte_time.h>
+#include <rte_spinlock.h>
 
 /* need update link, bit flag */
 #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
@@ -261,6 +262,113 @@ struct e1000_adapter {
 	struct rte_timecounter  systime_tc;
 	struct rte_timecounter  rx_tstamp_tc;
 	struct rte_timecounter  tx_tstamp_tc;
+	eth_rx_burst_t rx_backup;
+	eth_tx_burst_t tx_backup;
+};
+
+/**
+ * Structure associated with each descriptor of the RX ring of a RX queue.
+ */
+struct igb_rx_entry {
+	struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
+};
+
+/**
+ * Structure associated with each descriptor of the TX ring of a TX queue.
+ */
+struct igb_tx_entry {
+	struct rte_mbuf *mbuf; /**< mbuf associated with TX desc, if any. */
+	uint16_t next_id; /**< Index of next descriptor in ring. */
+	uint16_t last_id; /**< Index of last scattered descriptor. */
+};
+
+/**
+ * Hardware context number
+ */
+enum igb_advctx_num {
+	IGB_CTX_0    = 0, /**< CTX0    */
+	IGB_CTX_1    = 1, /**< CTX1    */
+	IGB_CTX_NUM  = 2, /**< CTX_NUM */
+};
+
+/** Offload features */
+union igb_tx_offload {
+	uint64_t data;
+	struct {
+		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
+		uint64_t vlan_tci:16;  /**< VLAN Tag Control Identifier(CPU order). */
+		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+		uint64_t tso_segsz:16; /**< TCP TSO segment size. */
+
+		/* uint64_t unused:8; */
+	};
+};
+
+/**
+ * Structure to check if a new context needs to be built
+ */
+struct igb_advctx_info {
+	uint64_t flags;           /**< ol_flags related to context build. */
+	/** tx offload: vlan, tso, l2-l3-l4 lengths. */
+	union igb_tx_offload tx_offload;
+	/** compare mask for tx offload. */
+	union igb_tx_offload tx_offload_mask;
+};
+
+/**
+ * Structure associated with each RX queue.
+ */
+struct igb_rx_queue {
+	struct rte_mempool  *mb_pool;   /**< mbuf pool to populate RX ring. */
+	volatile union e1000_adv_rx_desc *rx_ring; /**< RX ring virtual address. */
+	uint64_t            rx_ring_phys_addr; /**< RX ring DMA address. */
+	volatile uint32_t   *rdt_reg_addr; /**< RDT register address. */
+	volatile uint32_t   *rdh_reg_addr; /**< RDH register address. */
+	struct igb_rx_entry *sw_ring;   /**< address of RX software ring. */
+	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
+	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+	rte_spinlock_t      rx_lock; /**< Lock for packet reception. */
+	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
+	uint16_t            rx_tail;    /**< current value of RDT register. */
+	uint16_t            nb_rx_hold; /**< number of held free RX desc. */
+	uint16_t            rx_free_thresh; /**< max free RX desc to hold. */
+	uint16_t            queue_id;   /**< RX queue index. */
+	uint16_t            reg_idx;    /**< RX queue register index. */
+	uint8_t             port_id;    /**< Device port identifier. */
+	uint8_t             pthresh;    /**< Prefetch threshold register. */
+	uint8_t             hthresh;    /**< Host threshold register. */
+	uint8_t             wthresh;    /**< Write-back threshold register. */
+	uint8_t             crc_len;    /**< 0 if CRC stripped, 4 otherwise. */
+	uint8_t             drop_en;  /**< If not 0, set SRRCTL.Drop_En. */
+};
+
+/**
+ * Structure associated with each TX queue.
+ */
+struct igb_tx_queue {
+	volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
+	uint64_t               tx_ring_phys_addr; /**< TX ring DMA address. */
+	struct igb_tx_entry    *sw_ring; /**< virtual address of SW ring. */
+	volatile uint32_t      *tdt_reg_addr; /**< Address of TDT register. */
+	rte_spinlock_t         tx_lock; /**< Lock for packet transmission. */
+	uint32_t               txd_type;      /**< Device-specific TXD type */
+	uint16_t               nb_tx_desc;    /**< number of TX descriptors. */
+	uint16_t               tx_tail; /**< Current value of TDT register. */
+	uint16_t               tx_head;
+	/**< Index of first used TX descriptor. */
+	uint16_t               queue_id; /**< TX queue index. */
+	uint16_t               reg_idx;  /**< TX queue register index. */
+	uint8_t                port_id;  /**< Device port identifier. */
+	uint8_t                pthresh;  /**< Prefetch threshold register. */
+	uint8_t                hthresh;  /**< Host threshold register. */
+	uint8_t                wthresh;  /**< Write-back threshold register. */
+	uint32_t               ctx_curr;
+	/**< Current used hardware descriptor. */
+	uint32_t               ctx_start;
+	/**< Start context position for transmit queue. */
+	struct igb_advctx_info ctx_cache[IGB_CTX_NUM];
+	/**< Hardware context history.*/
 };
 
 #define E1000_DEV_PRIVATE(adapter) \
@@ -316,6 +424,14 @@ uint16_t eth_igb_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
 uint16_t eth_igb_recv_pkts(void *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
 
+uint16_t eth_igbvf_xmit_pkts_fake(void *txq,
+				  struct rte_mbuf **tx_pkts,
+				  uint16_t nb_pkts);
+
+uint16_t eth_igbvf_recv_pkts_fake(void *rxq,
+				  struct rte_mbuf **rx_pkts,
+				  uint16_t nb_pkts);
+
 uint16_t eth_igb_recv_scattered_pkts(void *rxq,
 		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
 
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8aad741..4b78a25 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
 static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 					void *param);
 static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
 	.mac_addr_set         = igbvf_default_mac_addr_set,
 	.get_reg_length       = igbvf_get_reg_length,
 	.get_reg              = igbvf_get_regs,
+	.dev_reset            = igbvf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -2663,6 +2665,108 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct e1000_adapter *adapter =
+		(struct e1000_adapter *)dev->data->dev_private;
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t eiam;
+	uint16_t i;
+	struct igb_rx_queue *rxq;
+	struct igb_tx_queue *txq;
+	/* Reference igbvf_intr_enable */
+	uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/**
+	 * Stop RX/TX by fake functions and locks.
+	 * Fake functions are used to make RX/TX lock easier.
+	 */
+	adapter->rx_backup = dev->rx_pkt_burst;
+	adapter->tx_backup = dev->tx_pkt_burst;
+	dev->rx_pkt_burst = eth_igbvf_recv_pkts_fake;
+	dev->tx_pkt_burst = eth_igbvf_xmit_pkts_fake;
+
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_lock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_lock(&txq->tx_lock);
+		}
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		igbvf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = igbvf_dev_start(dev);
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Igb VF reset: "
+				     "Failed to start device.");
+			return diag;
+		}
+		dev->data->dev_started = 1;
+		eth_igbvf_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot operate its registers.
+		 * Check whether the registers were written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		eiam = E1000_READ_REG(hw, E1000_EIAM);
+	} while (!(eiam & eiam_mbx));
+
+	/**
+	 * Release the locks for queues.
+	 * Restore the RX/TX functions.
+	 */
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_unlock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_unlock(&txq->tx_lock);
+		}
+
+	dev->rx_pkt_burst = adapter->rx_backup;
+	dev->tx_pkt_burst = adapter->tx_backup;
+
+	return 0;
+}
+
+static int
 eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
 {
 	struct e1000_interrupt *intr =
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 7e97330..5af7173 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -67,7 +67,6 @@
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 #include <rte_string_fns.h>
-#include <rte_spinlock.h>
 
 #include "e1000_logs.h"
 #include "base/e1000_api.h"
@@ -80,72 +79,6 @@
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG)
 
-/**
- * Structure associated with each descriptor of the RX ring of a RX queue.
- */
-struct igb_rx_entry {
-	struct rte_mbuf *mbuf; /**< mbuf associated with RX descriptor. */
-};
-
-/**
- * Structure associated with each descriptor of the TX ring of a TX queue.
- */
-struct igb_tx_entry {
-	struct rte_mbuf *mbuf; /**< mbuf associated with TX desc, if any. */
-	uint16_t next_id; /**< Index of next descriptor in ring. */
-	uint16_t last_id; /**< Index of last scattered descriptor. */
-};
-
-/**
- * Structure associated with each RX queue.
- */
-struct igb_rx_queue {
-	struct rte_mempool  *mb_pool;   /**< mbuf pool to populate RX ring. */
-	volatile union e1000_adv_rx_desc *rx_ring; /**< RX ring virtual address. */
-	uint64_t            rx_ring_phys_addr; /**< RX ring DMA address. */
-	volatile uint32_t   *rdt_reg_addr; /**< RDT register address. */
-	volatile uint32_t   *rdh_reg_addr; /**< RDH register address. */
-	struct igb_rx_entry *sw_ring;   /**< address of RX software ring. */
-	struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
-	struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
-	rte_spinlock_t      rx_lock; /**< Lock for packet receiption. */
-	uint16_t            nb_rx_desc; /**< number of RX descriptors. */
-	uint16_t            rx_tail;    /**< current value of RDT register. */
-	uint16_t            nb_rx_hold; /**< number of held free RX desc. */
-	uint16_t            rx_free_thresh; /**< max free RX desc to hold. */
-	uint16_t            queue_id;   /**< RX queue index. */
-	uint16_t            reg_idx;    /**< RX queue register index. */
-	uint8_t             port_id;    /**< Device port identifier. */
-	uint8_t             pthresh;    /**< Prefetch threshold register. */
-	uint8_t             hthresh;    /**< Host threshold register. */
-	uint8_t             wthresh;    /**< Write-back threshold register. */
-	uint8_t             crc_len;    /**< 0 if CRC stripped, 4 otherwise. */
-	uint8_t             drop_en;  /**< If not 0, set SRRCTL.Drop_En. */
-};
-
-/**
- * Hardware context number
- */
-enum igb_advctx_num {
-	IGB_CTX_0    = 0, /**< CTX0    */
-	IGB_CTX_1    = 1, /**< CTX1    */
-	IGB_CTX_NUM  = 2, /**< CTX_NUM */
-};
-
-/** Offload features */
-union igb_tx_offload {
-	uint64_t data;
-	struct {
-		uint64_t l3_len:9; /**< L3 (IP) Header Length. */
-		uint64_t l2_len:7; /**< L2 (MAC) Header Length. */
-		uint64_t vlan_tci:16;  /**< VLAN Tag Control Identifier(CPU order). */
-		uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
-		uint64_t tso_segsz:16; /**< TCP TSO segment size. */
-
-		/* uint64_t unused:8; */
-	};
-};
-
 /*
  * Compare mask for igb_tx_offload.data,
  * should be in sync with igb_tx_offload layout.
@@ -158,45 +91,6 @@ union igb_tx_offload {
 #define TX_TSO_CMP_MASK	\
 	(TX_MACIP_LEN_CMP_MASK | TX_TCP_LEN_CMP_MASK | TX_TSO_MSS_CMP_MASK)
 
-/**
- * Strucutre to check if new context need be built
- */
-struct igb_advctx_info {
-	uint64_t flags;           /**< ol_flags related to context build. */
-	/** tx offload: vlan, tso, l2-l3-l4 lengths. */
-	union igb_tx_offload tx_offload;
-	/** compare mask for tx offload. */
-	union igb_tx_offload tx_offload_mask;
-};
-
-/**
- * Structure associated with each TX queue.
- */
-struct igb_tx_queue {
-	volatile union e1000_adv_tx_desc *tx_ring; /**< TX ring address */
-	uint64_t               tx_ring_phys_addr; /**< TX ring DMA address. */
-	struct igb_tx_entry    *sw_ring; /**< virtual address of SW ring. */
-	rte_spinlock_t         tx_lock; /**< Lock for packet transmission. */
-	volatile uint32_t      *tdt_reg_addr; /**< Address of TDT register. */
-	uint32_t               txd_type;      /**< Device-specific TXD type */
-	uint16_t               nb_tx_desc;    /**< number of TX descriptors. */
-	uint16_t               tx_tail; /**< Current value of TDT register. */
-	uint16_t               tx_head;
-	/**< Index of first used TX descriptor. */
-	uint16_t               queue_id; /**< TX queue index. */
-	uint16_t               reg_idx;  /**< TX queue register index. */
-	uint8_t                port_id;  /**< Device port identifier. */
-	uint8_t                pthresh;  /**< Prefetch threshold register. */
-	uint8_t                hthresh;  /**< Host threshold register. */
-	uint8_t                wthresh;  /**< Write-back threshold register. */
-	uint32_t               ctx_curr;
-	/**< Current used hardware descriptor. */
-	uint32_t               ctx_start;
-	/**< Start context position for transmit queue. */
-	struct igb_advctx_info ctx_cache[IGB_CTX_NUM];
-	/**< Hardware context history.*/
-};
-
 #if 1
 #define RTE_PMD_USE_PREFETCH
 #endif
@@ -2530,3 +2424,25 @@ igb_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	qinfo->conf.tx_thresh.hthresh = txq->hthresh;
 	qinfo->conf.tx_thresh.wthresh = txq->wthresh;
 }
+
+/**
+ * A fake function to stop transmission.
+ */
+uint16_t
+eth_igbvf_xmit_pkts_fake(void __rte_unused *tx_queue,
+			 struct rte_mbuf __rte_unused **tx_pkts,
+			 uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+/**
+ * A fake function to stop reception.
+ */
+uint16_t
+eth_igbvf_recv_pkts_fake(void __rte_unused *rx_queue,
+			 struct rte_mbuf __rte_unused **rx_pkts,
+			 uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
-- 
1.9.3


* [PATCH 7/8] i40e:RX/TX with lock on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (5 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 6/8] igb: implement device reset " Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-06  5:40 ` [PATCH 8/8] i40e: implement device reset " Wenzhuo Lu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Zhe Tao

Add RX/TX paths with lock for VF. They are used when
link reset on the VF is needed.
With the RX/TX lock in place, RX/TX can be stopped,
which gives us a chance to reset the VF link.

Please be aware there's a performance drop if the lock
path is chosen.

Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c    |  4 ++--
 drivers/net/i40e/i40e_ethdev.h    |  4 ++++
 drivers/net/i40e/i40e_ethdev_vf.c |  4 ++--
 drivers/net/i40e/i40e_rxtx.c      | 45 +++++++++++++++++++++++++--------------
 drivers/net/i40e/i40e_rxtx.h      | 30 ++++++++++++++++++++++++++
 5 files changed, 67 insertions(+), 20 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 24777d5..1380330 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -764,8 +764,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 	PMD_INIT_FUNC_TRACE();
 
 	dev->dev_ops = &i40e_eth_dev_ops;
-	dev->rx_pkt_burst = i40e_recv_pkts;
-	dev->tx_pkt_burst = i40e_xmit_pkts;
+	dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts);
+	dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts);
 
 	/* for secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..672d920 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,10 @@ struct i40e_adapter {
 	struct rte_timecounter systime_tc;
 	struct rte_timecounter rx_tstamp_tc;
 	struct rte_timecounter tx_tstamp_tc;
+
+	/* For VF reset backup */
+	eth_rx_burst_t rx_backup;
+	eth_tx_burst_t tx_backup;
 };
 
 int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..46d8a7c 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1451,8 +1451,8 @@ i40evf_dev_init(struct rte_eth_dev *eth_dev)
 
 	/* assign ops func pointer */
 	eth_dev->dev_ops = &i40evf_eth_dev_ops;
-	eth_dev->rx_pkt_burst = &i40e_recv_pkts;
-	eth_dev->tx_pkt_burst = &i40e_xmit_pkts;
+	eth_dev->rx_pkt_burst = RX_LOCK_FUNCTION(eth_dev, i40e_recv_pkts);
+	eth_dev->tx_pkt_burst = TX_LOCK_FUNCTION(eth_dev, i40e_xmit_pkts);
 
 	/*
 	 * For secondary processes, we don't initialise any further as primary
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..0a6dcfb 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -79,10 +79,6 @@
 		PKT_TX_TCP_SEG |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
-static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
-				      struct rte_mbuf **tx_pkts,
-				      uint16_t nb_pkts);
-
 static inline void
 i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp)
 {
@@ -1144,7 +1140,7 @@ rx_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	return 0;
 }
 
-static uint16_t
+uint16_t
 i40e_recv_pkts_bulk_alloc(void *rx_queue,
 			  struct rte_mbuf **rx_pkts,
 			  uint16_t nb_pkts)
@@ -1169,7 +1165,7 @@ i40e_recv_pkts_bulk_alloc(void *rx_queue,
 	return nb_rx;
 }
 #else
-static uint16_t
+uint16_t
 i40e_recv_pkts_bulk_alloc(void __rte_unused *rx_queue,
 			  struct rte_mbuf __rte_unused **rx_pkts,
 			  uint16_t __rte_unused nb_pkts)
@@ -1892,7 +1888,7 @@ tx_xmit_pkts(struct i40e_tx_queue *txq,
 	return nb_pkts;
 }
 
-static uint16_t
+uint16_t
 i40e_xmit_pkts_simple(void *tx_queue,
 		      struct rte_mbuf **tx_pkts,
 		      uint16_t nb_pkts)
@@ -2121,10 +2117,13 @@ i40e_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 	};
 
 	if (dev->rx_pkt_burst == i40e_recv_pkts ||
+	    dev->rx_pkt_burst == i40e_recv_pkts_lock ||
 #ifdef RTE_LIBRTE_I40E_RX_ALLOW_BULK_ALLOC
 	    dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc ||
+	    dev->rx_pkt_burst == i40e_recv_pkts_bulk_alloc_lock ||
 #endif
-	    dev->rx_pkt_burst == i40e_recv_scattered_pkts)
+	    dev->rx_pkt_burst == i40e_recv_scattered_pkts ||
+	    dev->rx_pkt_burst == i40e_recv_scattered_pkts_lock)
 		return ptypes;
 	return NULL;
 }
@@ -2648,6 +2647,7 @@ i40e_reset_rx_queue(struct i40e_rx_queue *rxq)
 
 	rxq->rxrearm_start = 0;
 	rxq->rxrearm_nb = 0;
+	rte_spinlock_init(&rxq->rx_lock);
 }
 
 void
@@ -2704,6 +2704,7 @@ i40e_reset_tx_queue(struct i40e_tx_queue *txq)
 
 	txq->last_desc_cleaned = (uint16_t)(txq->nb_tx_desc - 1);
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_desc - 1);
+	rte_spinlock_init(&txq->tx_lock);
 }
 
 /* Init the TX queue in hardware */
@@ -3155,12 +3156,12 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
 					    "callback (port=%d).",
 				     dev->data->port_id);
 
-			dev->rx_pkt_burst = i40e_recv_scattered_pkts_vec;
+			dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_scattered_pkts_vec);
 		} else {
 			PMD_INIT_LOG(DEBUG, "Using a Scattered with bulk "
 					   "allocation callback (port=%d).",
 				     dev->data->port_id);
-			dev->rx_pkt_burst = i40e_recv_scattered_pkts;
+			dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_scattered_pkts);
 		}
 	/* If parameters allow we are going to choose between the following
 	 * callbacks:
@@ -3174,27 +3175,29 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
 			     RTE_I40E_DESCS_PER_LOOP,
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = i40e_recv_pkts_vec;
+		dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts_vec);
 	} else if (ad->rx_bulk_alloc_allowed) {
 		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are "
 				    "satisfied. Rx Burst Bulk Alloc function "
 				    "will be used on port=%d.",
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = i40e_recv_pkts_bulk_alloc;
+		dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts_bulk_alloc);
 	} else {
 		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are not "
 				    "satisfied, or Scattered Rx is requested "
 				    "(port=%d).",
 			     dev->data->port_id);
 
-		dev->rx_pkt_burst = i40e_recv_pkts;
+		dev->rx_pkt_burst = RX_LOCK_FUNCTION(dev, i40e_recv_pkts);
 	}
 
 	/* Propagate information about RX function choice through all queues. */
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		rx_using_sse =
 			(dev->rx_pkt_burst == i40e_recv_scattered_pkts_vec ||
+			 dev->rx_pkt_burst == i40e_recv_scattered_pkts_vec_lock ||
+			 dev->rx_pkt_burst == i40e_recv_pkts_vec_lock ||
 			 dev->rx_pkt_burst == i40e_recv_pkts_vec);
 
 		for (i = 0; i < dev->data->nb_rx_queues; i++) {
@@ -3250,14 +3253,14 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
 	if (ad->tx_simple_allowed) {
 		if (ad->tx_vec_allowed) {
 			PMD_INIT_LOG(DEBUG, "Vector tx finally be used.");
-			dev->tx_pkt_burst = i40e_xmit_pkts_vec;
+			dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts_vec);
 		} else {
 			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
-			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
+			dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts_simple);
 		}
 	} else {
 		PMD_INIT_LOG(DEBUG, "Xmit tx finally be used.");
-		dev->tx_pkt_burst = i40e_xmit_pkts;
+		dev->tx_pkt_burst = TX_LOCK_FUNCTION(dev, i40e_xmit_pkts);
 	}
 }
 
@@ -3311,3 +3314,13 @@ i40e_xmit_pkts_vec(void __rte_unused *tx_queue,
 {
 	return 0;
 }
+
+GENERATE_RX_LOCK(i40e_recv_pkts, i40e)
+GENERATE_RX_LOCK(i40e_recv_pkts_vec, i40e)
+GENERATE_RX_LOCK(i40e_recv_pkts_bulk_alloc, i40e)
+GENERATE_RX_LOCK(i40e_recv_scattered_pkts, i40e)
+GENERATE_RX_LOCK(i40e_recv_scattered_pkts_vec, i40e)
+
+GENERATE_TX_LOCK(i40e_xmit_pkts, i40e)
+GENERATE_TX_LOCK(i40e_xmit_pkts_vec, i40e)
+GENERATE_TX_LOCK(i40e_xmit_pkts_simple, i40e)
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..a1c13b8 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,7 @@ struct i40e_rx_queue {
 	bool rx_deferred_start; /**< don't start this queue in dev start */
 	uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
 	uint8_t dcb_tc;         /**< Traffic class of rx queue */
+	rte_spinlock_t rx_lock; /**< lock for rx path */
 };
 
 struct i40e_tx_entry {
@@ -181,6 +182,7 @@ struct i40e_tx_queue {
 	bool q_set; /**< indicate if tx queue has been configured */
 	bool tx_deferred_start; /**< don't start this queue in dev start */
 	uint8_t dcb_tc;         /**< Traffic class of tx queue */
+	rte_spinlock_t tx_lock; /**< lock for tx path */
 };
 
 /** Offload features */
@@ -223,6 +225,27 @@ uint16_t i40e_recv_scattered_pkts(void *rx_queue,
 uint16_t i40e_xmit_pkts(void *tx_queue,
 			struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_lock(void *tx_queue,
+			struct rte_mbuf **tx_pkts,
+			uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_simple(void *tx_queue,
+		      struct rte_mbuf **tx_pkts,
+		      uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_simple_lock(void *tx_queue,
+		      struct rte_mbuf **tx_pkts,
+		      uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_lock(void *rx_queue,
+			struct rte_mbuf **rx_pkts,
+			uint16_t nb_pkts);
+uint16_t i40e_recv_scattered_pkts_lock(void *rx_queue,
+				  struct rte_mbuf **rx_pkts,
+				  uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_bulk_alloc(void *rx_queue,
+			  struct rte_mbuf **rx_pkts,
+			  uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_bulk_alloc_lock(void *rx_queue,
+			  struct rte_mbuf **rx_pkts,
+			  uint16_t nb_pkts);
 int i40e_tx_queue_init(struct i40e_tx_queue *txq);
 int i40e_rx_queue_init(struct i40e_rx_queue *rxq);
 void i40e_free_tx_resources(struct i40e_tx_queue *txq);
@@ -244,12 +267,19 @@ uint16_t i40e_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t i40e_recv_scattered_pkts_vec(void *rx_queue,
 				      struct rte_mbuf **rx_pkts,
 				      uint16_t nb_pkts);
+uint16_t i40e_recv_pkts_vec_lock(void *rx_queue, struct rte_mbuf **rx_pkts,
+			    uint16_t nb_pkts);
+uint16_t i40e_recv_scattered_pkts_vec_lock(void *rx_queue,
+				      struct rte_mbuf **rx_pkts,
+				      uint16_t nb_pkts);
 int i40e_rx_vec_dev_conf_condition_check(struct rte_eth_dev *dev);
 int i40e_rxq_vec_setup(struct i40e_rx_queue *rxq);
 int i40e_txq_vec_setup(struct i40e_tx_queue *txq);
 void i40e_rx_queue_release_mbufs_vec(struct i40e_rx_queue *rxq);
 uint16_t i40e_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
 			    uint16_t nb_pkts);
+uint16_t i40e_xmit_pkts_vec_lock(void *tx_queue, struct rte_mbuf **tx_pkts,
+			    uint16_t nb_pkts);
 void i40e_set_rx_function(struct rte_eth_dev *dev);
 void i40e_set_tx_function_flag(struct rte_eth_dev *dev,
 			       struct i40e_tx_queue *txq);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 8/8] i40e: implement device reset on VF
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (6 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 7/8] i40e:RX/TX with lock " Wenzhuo Lu
@ 2016-06-06  5:40 ` Wenzhuo Lu
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
  9 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-06  5:40 UTC (permalink / raw)
  To: dev; +Cc: Zhe Tao

Implement the device reset function.
1. Add fake RX/TX functions that do nothing and return 0.
2. The reset function stops RX/TX by replacing the RX/TX
   functions with the fake ones and then taking the queue
   locks, which ensures that any in-flight RX/TX has finished.
3. Once RX/TX has stopped, reset the VF port and then
   release the locks.
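
The sequence in steps 2 and 3 can be sketched roughly as follows. This is
a simplified single-queue illustration, not the driver code: `struct port`,
`burst_fn`, `rx_locked`, `rx_detached`, `port_rx` and `port_reset` are
hypothetical names, a C11 `atomic_flag` stands in for `rte_spinlock_t`, and
the try-lock behaviour of the wrapper is an assumption about how the
`*_lock` variants behave.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PMD structures; not the i40e definitions. */
struct port;
typedef uint16_t (*burst_fn)(struct port *p, uint16_t nb_pkts);

struct port {
	_Atomic(burst_fn) rx_burst; /* current RX callback */
	atomic_flag rx_lock;        /* stands in for rte_spinlock_t */
	unsigned int resets;        /* number of completed resets */
};

static void q_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void q_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Regular RX path; a placeholder that just reports nb_pkts as received. */
static uint16_t rx_real(struct port *p, uint16_t nb_pkts)
{
	(void)p;
	return nb_pkts;
}

/* Locked wrapper (assumed try-lock): skip the burst if a reset holds the lock. */
static uint16_t rx_locked(struct port *p, uint16_t nb_pkts)
{
	uint16_t n;

	if (atomic_flag_test_and_set_explicit(&p->rx_lock, memory_order_acquire))
		return 0;
	n = rx_real(p, nb_pkts);
	q_unlock(&p->rx_lock);
	return n;
}

/* Step 1: fake RX function installed while the port is being reset. */
static uint16_t rx_detached(struct port *p, uint16_t nb_pkts)
{
	(void)p;
	(void)nb_pkts;
	return 0;
}

static void port_init(struct port *p)
{
	atomic_init(&p->rx_burst, rx_locked);
	atomic_flag_clear(&p->rx_lock);
	p->resets = 0;
}

/* What a dataplane core calls on every poll. */
static uint16_t port_rx(struct port *p, uint16_t nb_pkts)
{
	burst_fn fn = atomic_load(&p->rx_burst);

	return fn(p, nb_pkts);
}

static void port_reset(struct port *p)
{
	atomic_store(&p->rx_burst, rx_detached); /* step 2: new calls do nothing */
	q_lock(&p->rx_lock);                     /* step 2: wait for an in-flight burst */
	p->resets++;                             /* step 3: the VF port reset would go here */
	atomic_store(&p->rx_burst, rx_locked);   /* restore the regular callback */
	q_unlock(&p->rx_lock);                   /* step 3: release the lock */
}
```

A real driver would repeat the lock/unlock step for every RX and TX queue;
the sketch keeps one queue so the ordering is easy to follow.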

Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |   5 ++
 drivers/net/i40e/i40e_ethdev.h         |   7 +-
 drivers/net/i40e/i40e_ethdev_vf.c      | 141 +++++++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.h           |   4 +
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..f43b867 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,11 @@ New Features
   callback in the message handler to notice the APP. APP need call the device
   reset API to reset the VF port.
 
+* **Added VF reset support for i40e VF driver.**
+
+  Added a new implementation to allow i40e VF driver to
+  reset the functionality and state of itself.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 672d920..dcd6e0f 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -541,9 +541,8 @@ struct i40e_adapter {
 	struct rte_timecounter rx_tstamp_tc;
 	struct rte_timecounter tx_tstamp_tc;
 
-	/* For VF reset backup */
-	eth_rx_burst_t rx_backup;
-	eth_tx_burst_t tx_backup;
+	/* For VF reset */
+	uint8_t reset_number;
 };
 
 int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -597,6 +596,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
 
+void i40evf_emulate_vf_reset(uint8_t port_id);
+
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
 	(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 46d8a7c..9fc121b 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
 static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 				   uint8_t *msg,
 				   uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
 	.reta_query           = i40evf_dev_rss_reta_query,
 	.rss_hash_update      = i40evf_dev_rss_hash_update,
 	.rss_hash_conf_get    = i40evf_dev_rss_hash_conf_get,
+	.dev_reset            = i40evf_handle_vf_reset
 };
 
 /*
@@ -1309,6 +1316,140 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
 }
 
 static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+		if (rxq->q_set) {
+			i40e_dev_rx_queue_setup(dev,
+						rxq->queue_id,
+						rxq->nb_rx_desc,
+						rxq->socket_id,
+						&rxq->rxconf,
+						rxq->mp);
+		}
+
+		rxq = dev->data->rx_queues[i];
+		rte_spinlock_trylock(&rxq->rx_lock);
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+		if (txq->q_set) {
+			i40e_dev_tx_queue_setup(dev,
+						txq->queue_id,
+						txq->nb_tx_desc,
+						txq->socket_id,
+						&txq->txconf);
+		}
+
+		txq = dev->data->tx_queues[i];
+		rte_spinlock_trylock(&txq->tx_lock);
+	}
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+	i40evf_dev_close(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+	i40evf_dev_uninit(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+	memset(dev->data->dev_private, 0,
+	       (uint64_t)&adapter->reset_number - (uint64_t)adapter);
+
+	i40evf_dev_configure(dev);
+	i40evf_dev_init(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+	i40e_vf_queue_reset(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+	i40evf_dev_start(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static uint16_t
+i40evf_recv_pkts_detach(void __rte_unused *rx_queue,
+			struct rte_mbuf __rte_unused **rx_pkts,
+			uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+static uint16_t
+i40evf_xmit_pkts_detach(void __rte_unused *tx_queue,
+			struct rte_mbuf __rte_unused **tx_pkts,
+			uint16_t __rte_unused nb_pkts)
+{
+	return 0;
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+	uint16_t i = 0;
+	struct i40e_rx_queue *rxq;
+	struct i40e_tx_queue *txq;
+
+	if (!dev->data->dev_started)
+		return 0;
+
+	adapter->reset_number = 1;
+
+	/**
+	 * Stop RX/TX by fake functions and locks.
+	 * Fake functions are used to make RX/TX lock easier.
+	 */
+	dev->rx_pkt_burst = i40evf_recv_pkts_detach;
+	dev->tx_pkt_burst = i40evf_xmit_pkts_detach;
+
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_lock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_lock(&txq->tx_lock);
+		}
+
+	i40e_vf_reset_dev(dev);
+
+	adapter->reset_number = 0;
+
+	if (dev->data->rx_queues)
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			rxq = dev->data->rx_queues[i];
+			rte_spinlock_unlock(&rxq->rx_lock);
+		}
+
+	if (dev->data->tx_queues)
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			txq = dev->data->tx_queues[i];
+			rte_spinlock_unlock(&txq->tx_lock);
+		}
+
+	return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	i40evf_handle_vf_reset(dev);
+}
+
+static void
 i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 			   uint8_t *msg,
 			   __rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index a1c13b8..7ee33dc 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -141,6 +141,8 @@ struct i40e_rx_queue {
 	uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
 	uint8_t dcb_tc;         /**< Traffic class of rx queue */
 	rte_spinlock_t rx_lock; /**< lock for rx path */
+	uint8_t socket_id;
+	struct rte_eth_rxconf rxconf;
 };
 
 struct i40e_tx_entry {
@@ -183,6 +185,8 @@ struct i40e_tx_queue {
 	bool tx_deferred_start; /**< don't start this queue in dev start */
 	uint8_t dcb_tc;         /**< Traffic class of tx queue */
 	rte_spinlock_t tx_lock; /**< lock for tx path */
+	uint8_t socket_id;
+	struct rte_eth_txconf txconf;
 };
 
 /** Offload features */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-06  5:40 ` [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode Wenzhuo Lu
@ 2016-06-08  2:15   ` Stephen Hemminger
  2016-06-08  7:34     ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Stephen Hemminger @ 2016-06-08  2:15 UTC (permalink / raw)
  To: Wenzhuo Lu; +Cc: dev, Zhe Tao

On Mon,  6 Jun 2016 13:40:47 +0800
Wenzhuo Lu <wenzhuo.lu@intel.com> wrote:

> Define lock mode for RX/TX queue. Because when resetting
> the device we want the resetting thread to get the lock
> of the RX/TX queue to make sure the RX/TX is stopped.
> 
> Using next ABI macro for this ABI change as it has too
> much impact. 7 APIs and 1 global variable are impacted.
> 
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> Signed-off-by: Zhe Tao <zhe.tao@intel.com>

Why does this patch set make a different assumption than the rest of the DPDK?

The rest of the DPDK operates on the principle that the application
is smart enough to stop the device before making changes. There is no
equivalent to the Linux kernel RTNL mutex. The API assumes application
threads are well behaved and will not try and sabotage each other.

If you restrict the reset operation to only being available when RX/TX is stopped,
then no lock is needed.

The fact that it requires lots more locking inside each device driver implies
to me this is not the correct way to architect this.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-08  2:15   ` Stephen Hemminger
@ 2016-06-08  7:34     ` Lu, Wenzhuo
  2016-06-09  7:50       ` Olivier Matz
  2016-06-10 18:12       ` Stephen Hemminger
  0 siblings, 2 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-08  7:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Tao, Zhe

Hi Stephen,


> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, June 8, 2016 10:16 AM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Tao, Zhe
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> 
> On Mon,  6 Jun 2016 13:40:47 +0800
> Wenzhuo Lu <wenzhuo.lu@intel.com> wrote:
> 
> > Define lock mode for RX/TX queue. Because when resetting the device we
> > want the resetting thread to get the lock of the RX/TX queue to make
> > sure the RX/TX is stopped.
> >
> > Using next ABI macro for this ABI change as it has too much impact. 7
> > APIs and 1 global variable are impacted.
> >
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> > Signed-off-by: Zhe Tao <zhe.tao@intel.com>
> 
> Why does this patch set make a different assumption the rest of the DPDK?
> 
> The rest of the DPDK operates on the principle that the application is smart
> enough to stop the device before making changes. There is no equivalent to the
> Linux kernel RTNL mutex. The API assumes application threads are well behaved
> and will not try and sabotage each other.
> 
> If you restrict the reset operation to only being available when RX/TX is stopped,
> then no lock is needed.
> 
> The fact that it requires lots more locking inside each device driver implies to me
> this is not correct way to architect this.
It's a good question. This patch set doesn't follow the regular assumption of DPDK.
But it's a requirement we've got from some customers. The users want the driver to do as much as it can. Ideally, the link state change is transparent to the users.
The patch set tries to provide another choice for users who don't want to stop their rx/tx to handle the reset event.

And as discussed in the other thread, most probably we will move the lock from the PMD layer to the rte layer. That will avoid changing every device driver.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-08  7:34     ` Lu, Wenzhuo
@ 2016-06-09  7:50       ` Olivier Matz
  2016-06-12  5:25         ` Lu, Wenzhuo
  2016-06-10 18:12       ` Stephen Hemminger
  1 sibling, 1 reply; 72+ messages in thread
From: Olivier Matz @ 2016-06-09  7:50 UTC (permalink / raw)
  To: Lu, Wenzhuo, Stephen Hemminger; +Cc: dev, Tao, Zhe

Hi,

On 06/08/2016 09:34 AM, Lu, Wenzhuo wrote:
> Hi Stephen,
> 
> 
>> -----Original Message-----
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Wednesday, June 8, 2016 10:16 AM
>> To: Lu, Wenzhuo
>> Cc: dev@dpdk.org; Tao, Zhe
>> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
>>
>> On Mon,  6 Jun 2016 13:40:47 +0800
>> Wenzhuo Lu <wenzhuo.lu@intel.com> wrote:
>>
>>> Define lock mode for RX/TX queue. Because when resetting the device we
>>> want the resetting thread to get the lock of the RX/TX queue to make
>>> sure the RX/TX is stopped.
>>>
>>> Using next ABI macro for this ABI change as it has too much impact. 7
>>> APIs and 1 global variable are impacted.
>>>
>>> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
>>> Signed-off-by: Zhe Tao <zhe.tao@intel.com>
>>
>> Why does this patch set make a different assumption the rest of the DPDK?
>>
>> The rest of the DPDK operates on the principle that the application is smart
>> enough to stop the device before making changes. There is no equivalent to the
>> Linux kernel RTNL mutex. The API assumes application threads are well behaved
>> and will not try and sabotage each other.
>>
>> If you restrict the reset operation to only being available when RX/TX is stopped,
>> then no lock is needed.
>>
>> The fact that it requires lots more locking inside each device driver implies to me
>> this is not correct way to architect this.

+1

I'm not sure adding locks is the proper way to do this.
It is the application's responsibility to ensure that:
- control functions are not called concurrently on the same port
- rx/tx functions are not called when the device is stopped/reset/...

However, I do think the usage paradigms of the ethdev api should be
better documented in rte_ethdev.h (ex: which functions can be called
concurrently). This would be a first step.

If we really want a helper API to do that in DPDK, the _next_ step
could be to add them in the ethdev api to achieve this. Maybe
something like (the function names could be better):

- to be called on one control thread:

  rte_eth_stop_rxtx(port)
  rte_eth_start_rxtx(port)

  rte_eth_get_rxtx_state(port)
     -> return "running" if at least one core is inside the rx/tx code
     -> return "stopped" if all cores are outside the rx/tx code

- to be called on dataplane cores:

  /* same as rte_eth_rx_burst(), but checks if rx/tx is allowed
   * first, else do nothing */
  rte_eth_rx_burst_interruptible()
  rte_eth_tx_burst_interruptible()


The code of control thread could be:

  rte_eth_stop_rxtx(port);
  /* wait until all dataplane cores have finished their processing */
  while (rte_eth_get_rxtx_state(port) != stopped)
      ;
  rte_eth_some_control_operation(port);
  rte_eth_start_rxtx(port);


I think this could be done without any lock, just with the proper
memory barriers and a per-core status.

But this API may impose a paradigm on the application, and I'm not
sure the DPDK should do that.
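
A minimal sketch of the lock-free idea, assuming C11 sequentially
consistent atomics in place of DPDK's own barrier primitives. Every name
here (`struct rxtx_gate`, `gate_stop`, `gate_start`, `gate_stopped`,
`rx_burst_interruptible`, `MAX_CORES`) is hypothetical and only mirrors the
function names suggested above; the burst itself is a placeholder.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CORES 8 /* arbitrary size for the sketch */

struct rxtx_gate {
	atomic_bool stop_requested;      /* written by the control thread */
	atomic_bool in_burst[MAX_CORES]; /* one status flag per dataplane core */
};

static void gate_init(struct rxtx_gate *g)
{
	unsigned int c;

	atomic_init(&g->stop_requested, false);
	for (c = 0; c < MAX_CORES; c++)
		atomic_init(&g->in_burst[c], false);
}

/* Control thread: forbid or allow entry into the rx/tx code. */
static void gate_stop(struct rxtx_gate *g)
{
	atomic_store(&g->stop_requested, true);
}

static void gate_start(struct rxtx_gate *g)
{
	atomic_store(&g->stop_requested, false);
}

/* Control thread: true when every core is outside the rx/tx code. */
static bool gate_stopped(struct rxtx_gate *g)
{
	unsigned int c;

	for (c = 0; c < MAX_CORES; c++)
		if (atomic_load(&g->in_burst[c]))
			return false;
	return true;
}

/* Dataplane core: returns 0 without touching the device while stopped. */
static uint16_t rx_burst_interruptible(struct rxtx_gate *g, unsigned int core,
				       uint16_t nb_pkts)
{
	uint16_t n = 0;

	/* Publish "inside" first, then check the stop flag. With sequentially
	 * consistent accesses, either this core sees the stop request or the
	 * control thread sees this core as still inside. */
	atomic_store(&g->in_burst[core], true);
	if (!atomic_load(&g->stop_requested))
		n = nb_pkts; /* placeholder for the real rte_eth_rx_burst() call */
	atomic_store(&g->in_burst[core], false);
	return n;
}
```

The control thread would call gate_stop(), poll gate_stopped() until it
returns true, perform the control operation, and then call gate_start().
In a real implementation the per-core flags would probably be padded to
separate cache lines to avoid false sharing.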

Regards,
Olivier

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-08  7:34     ` Lu, Wenzhuo
  2016-06-09  7:50       ` Olivier Matz
@ 2016-06-10 18:12       ` Stephen Hemminger
  2016-06-12  5:27         ` Lu, Wenzhuo
  1 sibling, 1 reply; 72+ messages in thread
From: Stephen Hemminger @ 2016-06-10 18:12 UTC (permalink / raw)
  To: Lu, Wenzhuo; +Cc: dev, Tao, Zhe

On Wed, 8 Jun 2016 07:34:43 +0000
"Lu, Wenzhuo" <wenzhuo.lu@intel.com> wrote:

> > 
> > The fact that it requires lots more locking inside each device driver implies to me
> > this is not correct way to architect this.  
> It's a good question. This patch set doesn't follow the regular assumption of DPDK.
> But it's a requirement we've got from some customers. The users want the driver does as much as it can. The best is the link state change is transparent to the  users.
> The patch set tries to provide another choice if the users don't want to stop their rx/tx to handle the reset event.

Then bring those uses to the development world (on the users mailing list) and let's
start the discussion there.  Requirements creeping in through the backdoor also worry me.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-09  7:50       ` Olivier Matz
@ 2016-06-12  5:25         ` Lu, Wenzhuo
  0 siblings, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-12  5:25 UTC (permalink / raw)
  To: Olivier Matz, Stephen Hemminger; +Cc: dev, Tao, Zhe

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, June 9, 2016 3:51 PM
> To: Lu, Wenzhuo; Stephen Hemminger
> Cc: dev@dpdk.org; Tao, Zhe
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> 
> Hi,
> 
> On 06/08/2016 09:34 AM, Lu, Wenzhuo wrote:
> > Hi Stephen,
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Wednesday, June 8, 2016 10:16 AM
> >> To: Lu, Wenzhuo
> >> Cc: dev@dpdk.org; Tao, Zhe
> >> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX
> >> lock mode
> >>
> >> On Mon,  6 Jun 2016 13:40:47 +0800
> >> Wenzhuo Lu <wenzhuo.lu@intel.com> wrote:
> >>
> >>> Define lock mode for RX/TX queue. Because when resetting the device
> >>> we want the resetting thread to get the lock of the RX/TX queue to
> >>> make sure the RX/TX is stopped.
> >>>
> >>> Using next ABI macro for this ABI change as it has too much impact.
> >>> 7 APIs and 1 global variable are impacted.
> >>>
> >>> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> >>> Signed-off-by: Zhe Tao <zhe.tao@intel.com>
> >>
> >> Why does this patch set make a different assumption the rest of the DPDK?
> >>
> >> The rest of the DPDK operates on the principle that the application
> >> is smart enough to stop the device before making changes. There is no
> >> equivalent to the Linux kernel RTNL mutex. The API assumes
> >> application threads are well behaved and will not try and sabotage each
> other.
> >>
> >> If you restrict the reset operation to only being available when
> >> RX/TX is stopped, then no lock is needed.
> >>
> >> The fact that it requires lots more locking inside each device driver
> >> implies to me this is not correct way to architect this.
> 
> +1
> 
> I'm not sure adding locks is the proper way to do.
> This is the application responsibility to ensure that:
> - control functions are not called concurrently on the same port
> - rx/tx functions are not called when the device is stopped/reset/...
> 
> However, I do think the usage paradigms of the ethdev api should be better
> documented in rte_ethdev.h (ex: which functions can be called concurrently).
> This would be a first step.
> 
> If we really want a helper API to do that in DPDK, the _next_ step could be to
> add them in the ethdev api to achieve this. Maybe something like (the function
> names could be better):
> 
> - to be called on one control thread:
> 
>   rte_eth_stop_rxtx(port)
>   rte_eth_start_rxtx(port)
> 
>   rte_eth_get_rxtx_state(port)
>      -> return "running" if at least one core is inside the rx/tx code
>      -> return "stopped" if all cores are outside the rx/tx code
> 
> - to be called on dataplane cores:
> 
>   /* same than rte_eth_rx_burst(), but checks if rx/tx is allowed
>    * first, else do nothing */
>   rte_eth_rx_burst_interruptible()
>   rte_eth_tx_burst_interruptible()
> 
> 
> The code of control thread could be:
> 
>   rte_eth_stop_rxtx(port);
>   /* wait that all dataplane cores finished their processing */
>   while (rte_eth_get_rxtx_state(port) != stopped)
>       ;
>   rte_eth_some_control_operation(port);
>   rte_eth_start_rxtx(port);
> 
> 
> I think this could be done without any lock, just with the proper memory barriers
> and a per-core status.
> 
> But this API may impose a paradigm to the application, and I'm not sure the
> DPDK should do that.
I don't quite catch your point. It seems your solution still needs the APP to change its code. I think it's more complex than just letting the APP stop the rx/tx and reset the port. The purpose of this patch set is to let the APP do as little as possible, so making it more complex is not a good choice.
It also seems hard to stop and start rx/tx in the rte layer. Normally the APP should do that. In my opinion, we would have to introduce a lock in rte to achieve that.

> 
> Regards,
> Olivier

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
  2016-06-10 18:12       ` Stephen Hemminger
@ 2016-06-12  5:27         ` Lu, Wenzhuo
  0 siblings, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-12  5:27 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Tao, Zhe

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, June 11, 2016 2:12 AM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Tao, Zhe
> Subject: Re: [dpdk-dev] [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode
> 
> On Wed, 8 Jun 2016 07:34:43 +0000
> "Lu, Wenzhuo" <wenzhuo.lu@intel.com> wrote:
> 
> > >
> > > The fact that it requires lots more locking inside each device
> > > driver implies to me this is not correct way to architect this.
> > It's a good question. This patch set doesn't follow the regular assumption of
> DPDK.
> > But it's a requirement we've got from some customers. The users want the
> driver does as much as it can. The best is the link state change is transparent to
> the  users.
> > The patch set tries to provide another choice if the users don't want to stop
> their rx/tx to handle the reset event.
> 
> Then bring those uses to the development world (on users mailing list) and lets
> start the discussion there.  The requirements creeping in through the backdoor
> also worries me.
Got it. Then how about we only provide a reset API and let the APP stop/start the rx/tx and call the API to reset the port? Thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 0/4] support reset of VF link
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (7 preceding siblings ...)
  2016-06-06  5:40 ` [PATCH 8/8] i40e: implement device reset " Wenzhuo Lu
@ 2016-06-15  3:03 ` Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
                     ` (3 more replies)
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
  9 siblings, 4 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-15  3:03 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang

If the PF link goes down and comes back up, the VF link does not
recover accordingly.
This patch set adds support for VF link reset. When the VF
receives the physical link down/up messages, the APP can reset the
VF link and let it recover.

PS: This patch set is split from a previous patch set,
*automatic link recovery on ixgbe/igb VF*, and it's based on the
patch set *support mailbox interruption on ixgbe/igb VF*.

Wenzhuo Lu (3):
  lib/librte_ether: support device reset
  ixgbe: implement device reset on VF
  igb: implement device reset on VF

Zhe Tao (1):
  i40e: implement device reset on VF

v1:
- Added the implementation for the VF reset functionality.
v2:
- Changed the i40e related operations during VF reset.
v3:
- Resent the patches because of a mail sending issue.
v4:
- Removed some VF reset emulation code.
v5:
- Removed all the code related to locking.

 doc/guides/rel_notes/release_16_07.rst | 13 ++++++
 drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev.h         |  4 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.c           | 10 ++++
 drivers/net/i40e/i40e_rxtx.h           |  4 ++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
 lib/librte_ether/rte_ethdev.c          | 17 +++++++
 lib/librte_ether/rte_ethdev.h          | 14 ++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 12 files changed, 284 insertions(+), 5 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v5 1/4] lib/librte_ether: support device reset
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
@ 2016-06-15  3:03   ` Wenzhuo Lu
  2016-06-16 15:31     ` Bruce Richardson
  2016-06-16 15:36     ` Thomas Monjalon
  2016-06-15  3:03   ` [PATCH v5 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-15  3:03 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, Wenzhuo Lu

Add an API to reset the device.
It's for the VF device in this scenario: kernel PF + DPDK VF.
When the PF port goes down and comes back up, the APP should call
this API to reset the VF port. Most likely, the APP should call it
in its management thread and guarantee thread safety itself. That
means the APP should stop rx/tx and the device, then reset the
device, then restart the device and rx/tx.
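
The call order expected from the APP might look like the sketch below. It
is written against stubs so that it compiles without DPDK headers: the
`stub_dev_*` functions stand in for rte_eth_dev_stop(), rte_eth_dev_reset()
and rte_eth_dev_start(), while `app_pause_rxtx`, `app_resume_rxtx` and
`app_handle_vf_reset` are hypothetical application hooks. What to do when
the reset fails is also an assumption; here the datapath is simply left
paused.

```c
#include <stdint.h>

/* Records the order of operations so the sequence can be checked without a NIC. */
enum op { OP_PAUSE_RXTX, OP_DEV_STOP, OP_DEV_RESET, OP_DEV_START, OP_RESUME_RXTX };

static enum op trace[8];
static int trace_len;
static int stub_reset_result; /* value returned by the stubbed reset; 0 is success */

static void record(enum op o)
{
	if (trace_len < 8)
		trace[trace_len++] = o;
}

/* Hypothetical application hooks that park and release the dataplane lcores. */
static void app_pause_rxtx(uint8_t port)  { (void)port; record(OP_PAUSE_RXTX); }
static void app_resume_rxtx(uint8_t port) { (void)port; record(OP_RESUME_RXTX); }

/* Stubs standing in for rte_eth_dev_stop(), rte_eth_dev_reset() and
 * rte_eth_dev_start(). */
static void stub_dev_stop(uint8_t port)  { (void)port; record(OP_DEV_STOP); }
static int stub_dev_reset(uint8_t port)
{
	(void)port;
	record(OP_DEV_RESET);
	return stub_reset_result;
}
static int stub_dev_start(uint8_t port)  { (void)port; record(OP_DEV_START); return 0; }

/* Called from the management thread after the PF link went down and came back. */
static int app_handle_vf_reset(uint8_t port)
{
	int ret;

	app_pause_rxtx(port);        /* no core may be inside rx/tx burst calls */
	stub_dev_stop(port);         /* stop the device */
	ret = stub_dev_reset(port);  /* reset the VF port */
	if (ret != 0)
		return ret;          /* RX/TX stays paused; the caller decides what next */
	ret = stub_dev_start(port);  /* restart the device */
	if (ret != 0)
		return ret;
	app_resume_rxtx(port);       /* let the dataplane cores poll again */
	return 0;
}
```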

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 lib/librte_ether/rte_ethdev.c          | 17 +++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 14 ++++++++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++++++
 3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..e43dca9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				-ENOTSUP);
 	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
 }
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev;
+	int diag;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+	diag = (*dev->dev_ops->dev_reset)(dev);
+
+	return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..74e895f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
 	 uint8_t en);
 /**< @internal enable/disable the l2 tunnel offload functions */
 
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
 #ifdef RTE_NIC_BYPASS
 
 enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
 	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
 	/** Enable/disable l2 tunnel offload functions */
 	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+	/** Reset device. */
+	eth_dev_reset_t dev_reset;
 };
 
 /**
@@ -4253,6 +4258,15 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				  uint32_t mask,
 				  uint8_t en);
 
+/**
+ * Reset an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
 	rte_eth_tx_buffer_set_err_callback;
 
 } DPDK_2.2;
+
+DPDK_16.07 {
+	global:
+
+	rte_eth_dev_reset;
+
+} DPDK_16.04;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 2/4] ixgbe: implement device reset on VF
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
@ 2016-06-15  3:03   ` Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 3/4] igb: " Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 4/4] i40e: " Wenzhuo Lu
  3 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-15  3:03 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, Wenzhuo Lu

Implement the device reset function.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  9 +++++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++++--
 4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
+* **Added device reset support for ixgbe VF.**
+
+  Added the device reset API. APP can call this API to reset the VF port
+  when it's not working.
+  Based on the mailbox interruption support, when VF receives the control
+  message from PF, it means the PF link state changes, VF uses the reset
+  callback in the message handler to notice the APP. APP need call the device
+  reset API to reset the VF port.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..4e62cbb 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
 static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *udp_tunnel);
 
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
 	.reta_query           = ixgbe_dev_rss_reta_query,
 	.rss_hash_update      = ixgbe_dev_rss_hash_update,
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
+	.dev_reset            = ixgbevf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -4052,7 +4055,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 		ETH_VLAN_EXTEND_MASK;
 	ixgbevf_vlan_offload_set(dev, mask);
 
-	ixgbevf_dev_rxtx_start(dev);
+	err = ixgbevf_dev_rxtx_start(dev);
+	if (err)
+		return err;
 
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7185,6 +7190,63 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t vteiam;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		ixgbevf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = ixgbevf_dev_start(dev);
+		/* If the device fails to start, stop/start it again. */
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+				     "Failed to start device.");
+			continue;
+		}
+		dev->data->dev_started = 1;
+		ixgbevf_dev_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+			diag = 0;
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot access its registers.
+		 * Check whether the register was written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+	/* Reference ixgbevf_intr_enable when checking */
+	} while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+	return 0;
+}
+
+static int
 ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
 {
 	uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..bc68b43 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -377,7 +377,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);
 
 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
 
-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..aa26c12 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5147,7 +5147,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
 /*
  * [VF] Start Transmit and Receive Units.
  */
-void __attribute__((cold))
+int __attribute__((cold))
 ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw     *hw;
@@ -5183,8 +5183,10 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			rte_delay_ms(1);
 			txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
 		} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+			return -1;
+		}
 	}
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 
@@ -5200,12 +5202,16 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			rte_delay_ms(1);
 			rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
 		} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+			return -1;
+		}
 		rte_wmb();
 		IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);
 
 	}
+
+	return 0;
 }
 
 /* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 3/4] igb: implement device reset on VF
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
@ 2016-06-15  3:03   ` Wenzhuo Lu
  2016-06-15  3:03   ` [PATCH v5 4/4] i40e: " Wenzhuo Lu
  3 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-15  3:03 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, Wenzhuo Lu

Implement the device reset function.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  2 +-
 drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**
 
   Added the device reset API. APP can call this API to reset the VF port
   when it's not working.
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..f1ac4b5 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
 static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 					void *param);
 static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
 	.mac_addr_set         = igbvf_default_mac_addr_set,
 	.get_reg_length       = igbvf_get_reg_length,
 	.get_reg              = igbvf_get_regs,
+	.dev_reset            = igbvf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -2655,6 +2657,63 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t eiam;
+	/* Reference igbvf_intr_enable */
+	uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		igbvf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = igbvf_dev_start(dev);
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Igb VF reset: "
+				     "Failed to start device.");
+			return diag;
+		}
+		dev->data->dev_started = 1;
+		eth_igbvf_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot access its registers.
+		 * Check whether the register was written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		eiam = E1000_READ_REG(hw, E1000_EIAM);
+	} while (!(eiam & eiam_mbx));
+
+	return 0;
+}
+
+static int
 eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
 {
 	struct e1000_interrupt *intr =
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v5 4/4] i40e: implement device reset on VF
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
                     ` (2 preceding siblings ...)
  2016-06-15  3:03   ` [PATCH v5 3/4] igb: " Wenzhuo Lu
@ 2016-06-15  3:03   ` Wenzhuo Lu
  3 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-15  3:03 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, Zhe Tao

Implement the device reset function.
This reset function detaches the device, attaches it again,
reconfigures it, and sets up the Rx/Tx queues again.

Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  4 ++
 drivers/net/i40e/i40e_ethdev.h         |  4 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.c           | 10 ++++
 drivers/net/i40e/i40e_rxtx.h           |  4 ++
 5 files changed, 105 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..6661b07 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,10 @@ New Features
   callback in the message handler to notice the APP. APP need call the device
   reset API to reset the VF port.
 
+* **Added VF reset support for i40e VF driver.**
+
+  Added a new implementation to allow i40e VF driver to
+  reset the functionality and state of itself.
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..4e0df3b 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,8 @@ struct i40e_adapter {
 	struct rte_timecounter systime_tc;
 	struct rte_timecounter rx_tstamp_tc;
 	struct rte_timecounter tx_tstamp_tc;
+	/* For VF reset */
+	uint8_t reset_number;
 };
 
 int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -593,6 +595,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
 
+void i40evf_emulate_vf_reset(uint8_t port_id);
+
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
 	(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..2f65a29 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
 static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 				   uint8_t *msg,
 				   uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
 	.reta_query           = i40evf_dev_rss_reta_query,
 	.rss_hash_update      = i40evf_dev_rss_hash_update,
 	.rss_hash_conf_get    = i40evf_dev_rss_hash_conf_get,
+	.dev_reset            = i40evf_handle_vf_reset
 };
 
 /*
@@ -1309,6 +1316,82 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
 }
 
 static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+		if (rxq->q_set) {
+			i40e_dev_rx_queue_setup(dev,
+						rxq->queue_id,
+						rxq->nb_rx_desc,
+						rxq->socket_id,
+						&rxq->rxconf,
+						rxq->mp);
+		}
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+		if (txq->q_set) {
+			i40e_dev_tx_queue_setup(dev,
+						txq->queue_id,
+						txq->nb_tx_desc,
+						txq->socket_id,
+						&txq->txconf);
+		}
+	}
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+	i40evf_dev_close(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+	i40evf_dev_uninit(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+	memset(dev->data->dev_private, 0,
+	       (uint64_t)&adapter->reset_number - (uint64_t)adapter);
+
+	i40evf_dev_configure(dev);
+	i40evf_dev_init(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+	i40e_vf_queue_reset(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+	i40evf_dev_start(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+	if (!dev->data->dev_started)
+		return 0;
+
+	adapter->reset_number = 1;
+	i40e_vf_reset_dev(dev);
+	adapter->reset_number = 0;
+
+	return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	i40evf_handle_vf_reset(dev);
+}
+
+static void
 i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 			   uint8_t *msg,
 			   __rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..8dbc64c 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2148,6 +2148,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len, i;
 	uint16_t base, bsf, tc_mapping;
 	int use_def_burst_func = 1;
+	struct rte_eth_rxconf conf = *rx_conf;
 
 	if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
 		struct i40e_vf *vf =
@@ -2186,6 +2187,8 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	rxq->mp = mp;
+	rxq->socket_id = socket_id;
+	rxq->rxconf = conf;
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -2365,6 +2368,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	uint32_t ring_size;
 	uint16_t tx_rs_thresh, tx_free_thresh;
 	uint16_t i, base, bsf, tc_mapping;
+	struct rte_eth_txconf conf = *tx_conf;
 
 	if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
 		struct i40e_vf *vf =
@@ -2488,6 +2492,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	txq->nb_tx_desc = nb_desc;
+	txq->socket_id = socket_id;
+	txq->txconf = conf;
 	txq->tx_rs_thresh = tx_rs_thresh;
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
@@ -2950,8 +2956,12 @@ void
 i40e_dev_free_queues(struct rte_eth_dev *dev)
 {
 	uint16_t i;
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 
 	PMD_INIT_FUNC_TRACE();
+	if (adapter->reset_number)
+		return;
 
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 		i40e_dev_rx_queue_release(dev->data->rx_queues[i]);
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..9e1b05a 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,8 @@ struct i40e_rx_queue {
 	bool rx_deferred_start; /**< don't start this queue in dev start */
 	uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
 	uint8_t dcb_tc;         /**< Traffic class of rx queue */
+	uint8_t socket_id;
+	struct rte_eth_rxconf rxconf;
 };
 
 struct i40e_tx_entry {
@@ -181,6 +183,8 @@ struct i40e_tx_queue {
 	bool q_set; /**< indicate if tx queue has been configured */
 	bool tx_deferred_start; /**< don't start this queue in dev start */
 	uint8_t dcb_tc;         /**< Traffic class of tx queue */
+	uint8_t socket_id;
+	struct rte_eth_txconf txconf;
 };
 
 /** Offload features */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 1/4] lib/librte_ether: support device reset
  2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
@ 2016-06-16 15:31     ` Bruce Richardson
  2016-06-16 15:36     ` Thomas Monjalon
  1 sibling, 0 replies; 72+ messages in thread
From: Bruce Richardson @ 2016-06-16 15:31 UTC (permalink / raw)
  To: Wenzhuo Lu
  Cc: dev, konstantin.ananyev, jing.d.chen, cunming.liang, jingjing.wu,
	helin.zhang

On Wed, Jun 15, 2016 at 11:03:31AM +0800, Wenzhuo Lu wrote:
> Add an API to reset the device.
> It is intended for a VF device in the kernel PF + DPDK VF scenario.
> When the PF port goes down and comes back up, the application
> should call this API to reset the VF port. Most likely, the
> application should call it from its management thread and must
> guarantee thread safety itself. This means the application should
> stop rx/tx and the device, then reset the device, then restore
> the device and rx/tx.
> 
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>

Since this is adding a new ethdev feature, I think you should also add a new
row to the NIC feature overview matrix so we can record the PMDs which support
it.

/Bruce

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v5 1/4] lib/librte_ether: support device reset
  2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
  2016-06-16 15:31     ` Bruce Richardson
@ 2016-06-16 15:36     ` Thomas Monjalon
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Monjalon @ 2016-06-16 15:36 UTC (permalink / raw)
  To: Wenzhuo Lu
  Cc: dev, konstantin.ananyev, bruce.richardson, jing.d.chen,
	cunming.liang, jingjing.wu, helin.zhang

2016-06-15 11:03, Wenzhuo Lu:
> +/**
> + * Reset an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + */
> +int
> +rte_eth_dev_reset(uint8_t port_id);

Please explain in the doxygen comment what a reset means.
We must understand why and when an application should call it.
And it must be clear to a PMD developer how to implement it.
What is the return value?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 0/4] support reset of VF link
  2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
                   ` (8 preceding siblings ...)
  2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
@ 2016-06-20  6:24 ` Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 1/4] lib/librte_ether: support device reset Wenzhuo Lu
                     ` (4 more replies)
  9 siblings, 5 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-20  6:24 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, thomas.monjalon

If the PF link goes down and comes back up, the VF link does not
recover accordingly.
This patch set adds support for VF link reset. So, when the VF
receives the message of a physical link down/up, the application
can reset the VF link and let it recover.

PS: This patch set is split from a previous patch set,
*automatic link recovery on ixgbe/igb VF*, and it is based on the
patch set *support mailbox interruption on ixgbe/igb VF*.
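
As a rough illustration of the intended usage (a sketch only, not part of
this patch set): the link-event callback just records that a reset is
needed, and the management thread later pauses the datapath, resets the
port, and resumes. `mock_eth_dev_reset()` is a hypothetical stand-in for
rte_eth_dev_reset() so that the example builds without DPDK; `struct
app_port` and the `app_*` helpers are likewise assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_eth_dev_reset(); counts its invocations. */
static int reset_calls;

static int mock_eth_dev_reset(unsigned char port_id)
{
	(void)port_id;
	reset_calls++;
	return 0;
}

/* Hypothetical per-port state owned by the application. */
struct app_port {
	unsigned char port_id;
	volatile bool datapath_enabled; /* checked by the rx/tx loop */
	volatile bool reset_pending;    /* set when a link event arrives */
};

/* Called from the link-event callback: only record the request. */
static void app_on_link_event(struct app_port *p)
{
	assert(p != NULL);
	p->reset_pending = true;
}

/* Called periodically from the management thread. */
static int app_mgmt_poll(struct app_port *p)
{
	int ret;

	assert(p != NULL);
	if (!p->reset_pending)
		return 0;

	/* Stop rx/tx first. A real application must also wait here
	 * until every worker has actually left its rx/tx burst calls. */
	p->datapath_enabled = false;

	ret = mock_eth_dev_reset(p->port_id);
	if (ret == 0) {
		p->reset_pending = false;
		p->datapath_enabled = true;
	}
	/* On failure the request stays pending and rx/tx stays stopped. */
	return ret;
}
```

Deferring the reset to the management thread keeps the callback short and
leaves all synchronisation with the rx/tx threads to the application.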

Wenzhuo Lu (3):
  lib/librte_ether: support device reset
  ixgbe: implement device reset on VF
  igb: implement device reset on VF

Zhe Tao (1):
  i40e: implement device reset on VF

v1:
- Added the implementation for the VF reset functionality.
v2:
- Changed the i40e related operations during VF reset.
v3:
- Resent the patches because of a mail sending issue.
v4:
- Removed some VF reset emulation code.
v5:
- Removed all the code related to locking.
v6:
- Updated the NIC feature overview matrix.
- Added more explanation in the doxygen comment of reset API.
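
To make the return values concrete, below is a simplified, self-contained
sketch of the dispatch pattern used by the new API: an invalid port gives
-ENODEV, a driver without a reset callback gives -ENOTSUP, otherwise the
driver's own result is returned. All `sketch_*` names and SKETCH_MAX_PORTS
are hypothetical and exist only for this illustration; the real code uses
struct rte_eth_dev, struct eth_dev_ops and the RTE_* checking macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 4 /* hypothetical limit, not RTE_MAX_ETHPORTS */

struct sketch_dev;
typedef int (*sketch_dev_reset_t)(struct sketch_dev *dev);

/* Per-driver callback table; a NULL entry means "not supported". */
struct sketch_dev_ops {
	sketch_dev_reset_t dev_reset;
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	int attached; /* non-zero once the port is in use */
	int resets;   /* number of resets performed, for illustration */
};

static struct sketch_dev sketch_devices[SKETCH_MAX_PORTS];

/* Generic entry point, analogous in shape to rte_eth_dev_reset(). */
static int sketch_dev_reset(unsigned int port_id)
{
	struct sketch_dev *dev;

	if (port_id >= SKETCH_MAX_PORTS || !sketch_devices[port_id].attached)
		return -ENODEV;

	dev = &sketch_devices[port_id];
	assert(dev->ops != NULL);

	if (dev->ops->dev_reset == NULL)
		return -ENOTSUP;

	return dev->ops->dev_reset(dev);
}

/* A driver that implements reset (as the VF drivers do in this series). */
static int sketch_vf_reset(struct sketch_dev *dev)
{
	dev->resets++;
	return 0;
}

static const struct sketch_dev_ops sketch_vf_ops = {
	.dev_reset = sketch_vf_reset,
};

/* A driver that leaves the callback unset. */
static const struct sketch_dev_ops sketch_other_ops = {
	.dev_reset = NULL,
};
```

Because a missing callback is reported as -ENOTSUP by the generic layer,
drivers that do not support reset need no changes.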

 doc/guides/nics/overview.rst           |  1 +
 doc/guides/rel_notes/release_16_07.rst | 13 ++++++
 drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
 drivers/net/i40e/i40e_ethdev.h         |  4 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.c           | 10 ++++
 drivers/net/i40e/i40e_rxtx.h           |  4 ++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
 lib/librte_ether/rte_ethdev.c          | 17 +++++++
 lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++
 13 files changed, 295 insertions(+), 5 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
@ 2016-06-20  6:24   ` Wenzhuo Lu
  2016-06-20  9:14     ` Jerin Jacob
  2016-06-20  6:24   ` [PATCH v6 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-20  6:24 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, thomas.monjalon, Wenzhuo Lu

Add an API to reset the device.
It is intended for a VF device in the kernel PF + DPDK VF scenario.
When the PF port goes down and comes back up, the application
should call this API to reset the VF port. Most likely, the
application should call it from its management thread and must
guarantee thread safety itself. This means the application should
stop rx/tx and the device, then reset the device, then restore
the device and rx/tx.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/nics/overview.rst           |  1 +
 lib/librte_ether/rte_ethdev.c          | 17 +++++++++++++++++
 lib/librte_ether/rte_ethdev.h          | 24 ++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++++++
 4 files changed, 49 insertions(+)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index 0bd8fae..c8a4985 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -89,6 +89,7 @@ Most of these differences are summarized below.
    Speed capabilities
    Link status            Y Y   Y Y   Y Y Y     Y   Y Y Y Y         Y Y         Y Y   Y Y Y Y
    Link status event      Y Y     Y     Y Y     Y   Y Y             Y Y         Y Y     Y
+   Link reset                               Y Y   Y     Y Y
    Queue status event                                                                   Y
    Rx interrupt                   Y     Y Y Y Y Y Y Y Y Y Y Y Y Y Y
    Queue start/stop             Y   Y Y Y Y Y Y     Y Y     Y Y Y Y Y Y               Y   Y Y
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e148028..6c0449b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				-ENOTSUP);
 	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
 }
+
+int
+rte_eth_dev_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev;
+	int diag;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
+
+	diag = (*dev->dev_ops->dev_reset)(dev);
+
+	return diag;
+}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..5b3ba12 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
 	 uint8_t en);
 /**< @internal enable/disable the l2 tunnel offload functions */
 
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to reset a configured Ethernet device. */
+
 #ifdef RTE_NIC_BYPASS
 
 enum {
@@ -1508,6 +1511,8 @@ struct eth_dev_ops {
 	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
 	/** Enable/disable l2 tunnel offload functions */
 	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+	/** Reset device. */
+	eth_dev_reset_t dev_reset;
 };
 
 /**
@@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				  uint32_t mask,
 				  uint8_t en);
 
+/**
+ * Reset an ethernet device when it's not working. One scenario is, after PF
+ * port is down and up, the related VF port should be reset.
+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.
+ * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
+ * APP can drop the packets and release the buffer instead of sending them.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if port identifier is invalid.
+ *   - (-ENOTSUP) if hardware doesn't support this function.
+ */
+int
+rte_eth_dev_reset(uint8_t port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c34207e 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
 	rte_eth_tx_buffer_set_err_callback;
 
 } DPDK_2.2;
+
+DPDK_16.07 {
+	global:
+
+	rte_eth_dev_reset;
+
+} DPDK_16.04;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 2/4] ixgbe: implement device reset on VF
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 1/4] lib/librte_ether: support device reset Wenzhuo Lu
@ 2016-06-20  6:24   ` Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 3/4] igb: " Wenzhuo Lu
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-20  6:24 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, thomas.monjalon, Wenzhuo Lu

Implement the device reset function.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  9 +++++
 drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
 drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++++--
 4 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a761e3c..d36c4b1 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,6 +53,15 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
+* **Added device reset support for ixgbe VF.**
+
+  Added the device reset API. APP can call this API to reset the VF port
+  when it's not working.
+  Based on the mailbox interruption support, when VF receives the control
+  message from PF, it means the PF link state changes, VF uses the reset
+  callback in the message handler to notice the APP. APP need call the device
+  reset API to reset the VF port.
+
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 05f4f29..4e62cbb 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -381,6 +381,8 @@ static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
 static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *udp_tunnel);
 
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -586,6 +588,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = {
 	.reta_query           = ixgbe_dev_rss_reta_query,
 	.rss_hash_update      = ixgbe_dev_rss_hash_update,
 	.rss_hash_conf_get    = ixgbe_dev_rss_hash_conf_get,
+	.dev_reset            = ixgbevf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -4052,7 +4055,9 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
 		ETH_VLAN_EXTEND_MASK;
 	ixgbevf_vlan_offload_set(dev, mask);
 
-	ixgbevf_dev_rxtx_start(dev);
+	err = ixgbevf_dev_rxtx_start(dev);
+	if (err)
+		return err;
 
 	/* check and configure queue intr-vector mapping */
 	if (dev->data->dev_conf.intr_conf.rxq != 0) {
@@ -7185,6 +7190,63 @@ static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+ixgbevf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t vteiam;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		ixgbevf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = ixgbevf_dev_start(dev);
+		/* If the device fails to start, stop/start it again. */
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
+				     "Failed to start device.");
+			continue;
+		}
+		dev->data->dev_started = 1;
+		ixgbevf_dev_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = ixgbe_dev_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Ixgbe VF reset: "
+				     "Failed to update link.");
+			diag = 0;
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot operate its registers.
+		 * Check whether the registers were written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		vteiam = IXGBE_READ_REG(hw, IXGBE_VTEIAM);
+	/* Reference ixgbevf_intr_enable when checking */
+	} while (diag || vteiam != IXGBE_VF_IRQ_ENABLE_MASK);
+
+	return 0;
+}
+
+static int
 ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev)
 {
 	uint32_t eicr;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 4ff6338..bc68b43 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -377,7 +377,7 @@ int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);
 
 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
 
-void ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
+int ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev);
 
 uint16_t ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 9c6eaf2..aa26c12 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5147,7 +5147,7 @@ ixgbevf_dev_tx_init(struct rte_eth_dev *dev)
 /*
  * [VF] Start Transmit and Receive Units.
  */
-void __attribute__((cold))
+int __attribute__((cold))
 ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw     *hw;
@@ -5183,8 +5183,10 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			rte_delay_ms(1);
 			txdctl = IXGBE_READ_REG(hw, IXGBE_VFTXDCTL(i));
 		} while (--poll_ms && !(txdctl & IXGBE_TXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Tx Queue %d", i);
+			return -1;
+		}
 	}
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 
@@ -5200,12 +5202,16 @@ ixgbevf_dev_rxtx_start(struct rte_eth_dev *dev)
 			rte_delay_ms(1);
 			rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
 		} while (--poll_ms && !(rxdctl & IXGBE_RXDCTL_ENABLE));
-		if (!poll_ms)
+		if (!poll_ms) {
 			PMD_INIT_LOG(ERR, "Could not enable Rx Queue %d", i);
+			return -1;
+		}
 		rte_wmb();
 		IXGBE_WRITE_REG(hw, IXGBE_VFRDT(i), rxq->nb_rx_desc - 1);
 
 	}
+
+	return 0;
 }
 
 /* Stubs needed for linkage when CONFIG_RTE_IXGBE_INC_VECTOR is set to 'n' */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 3/4] igb: implement device reset on VF
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 1/4] lib/librte_ether: support device reset Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
@ 2016-06-20  6:24   ` Wenzhuo Lu
  2016-06-20  6:24   ` [PATCH v6 4/4] i40e: " Wenzhuo Lu
  2016-07-04 15:48   ` [PATCH v6 0/4] support reset of VF link Luca Boccassi
  4 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-20  6:24 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, thomas.monjalon, Wenzhuo Lu

Implement the device reset function.

Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  2 +-
 drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index d36c4b1..a4c0cc3 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -53,7 +53,7 @@ New Features
   VF. To handle this link up/down event, add the mailbox interruption
   support to receive the message.
 
-* **Added device reset support for ixgbe VF.**
+* **Added device reset support for ixgbe/igb VF.**
 
   Added the device reset API. APP can call this API to reset the VF port
   when it's not working.
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index b0e5e6a..f1ac4b5 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -268,6 +268,7 @@ static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
 static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 					void *param);
 static void igbvf_mbx_process(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -409,6 +410,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = {
 	.mac_addr_set         = igbvf_default_mac_addr_set,
 	.get_reg_length       = igbvf_get_reg_length,
 	.get_reg              = igbvf_get_regs,
+	.dev_reset            = igbvf_dev_reset,
 };
 
 /* store statistics names and its offset in stats structure */
@@ -2655,6 +2657,63 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)
 }
 
 static int
+igbvf_dev_reset(struct rte_eth_dev *dev)
+{
+	struct e1000_hw *hw =
+		E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	int diag = 0;
+	uint32_t eiam;
+	/* Reference igbvf_intr_enable */
+	uint32_t eiam_mbx = 1 << E1000_VTIVAR_MISC_MAILBOX;
+
+	/* Nothing needs to be done if the device is not started. */
+	if (!dev->data->dev_started)
+		return 0;
+
+	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
+
+	/* Perform VF reset. */
+	do {
+		dev->data->dev_started = 0;
+		igbvf_dev_stop(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+		rte_delay_ms(1000);
+
+		diag = igbvf_dev_start(dev);
+		if (diag) {
+			PMD_INIT_LOG(ERR, "Igb VF reset: "
+				     "Failed to start device.");
+			return diag;
+		}
+		dev->data->dev_started = 1;
+		eth_igbvf_stats_reset(dev);
+		if (dev->data->dev_conf.intr_conf.lsc == 0)
+			diag = eth_igb_link_update(dev, 0);
+		if (diag) {
+			PMD_INIT_LOG(INFO, "Igb VF reset: "
+				     "Failed to update link.");
+		}
+
+		/**
+		 * When the PF link is down, there is a chance
+		 * that the VF cannot operate its registers.
+		 * Check whether the registers were written
+		 * successfully. If not, repeat stop/start until
+		 * the PF link is up, in other words, until the
+		 * registers can be written.
+		 */
+		eiam = E1000_READ_REG(hw, E1000_EIAM);
+	} while (!(eiam & eiam_mbx));
+
+	return 0;
+}
+
+static int
 eth_igbvf_interrupt_action(struct rte_eth_dev *dev)
 {
 	struct e1000_interrupt *intr =
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH v6 4/4] i40e: implement device reset on VF
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
                     ` (2 preceding siblings ...)
  2016-06-20  6:24   ` [PATCH v6 3/4] igb: " Wenzhuo Lu
@ 2016-06-20  6:24   ` Wenzhuo Lu
  2016-07-04 15:48   ` [PATCH v6 0/4] support reset of VF link Luca Boccassi
  4 siblings, 0 replies; 72+ messages in thread
From: Wenzhuo Lu @ 2016-06-20  6:24 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, bruce.richardson, jing.d.chen, cunming.liang,
	jingjing.wu, helin.zhang, thomas.monjalon, Zhe Tao

Implement the device reset function.
This reset function detaches the device, then attaches it again,
reconfigures the device, and sets up the Rx/Tx queues again.

Signed-off-by: Zhe Tao <zhe.tao@intel.com>
---
 doc/guides/rel_notes/release_16_07.rst |  4 ++
 drivers/net/i40e/i40e_ethdev.h         |  4 ++
 drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.c           | 10 ++++
 drivers/net/i40e/i40e_rxtx.h           |  4 ++
 5 files changed, 105 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a4c0cc3..6661b07 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -62,6 +62,10 @@ New Features
   callback in the message handler to notice the APP. APP need call the device
   reset API to reset the VF port.
 
+* **Added VF reset support for i40e VF driver.**
+
+  Added a new implementation to allow the i40e VF driver to
+  reset its own functionality and state.
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cfd2399..4e0df3b 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -540,6 +540,8 @@ struct i40e_adapter {
 	struct rte_timecounter systime_tc;
 	struct rte_timecounter rx_tstamp_tc;
 	struct rte_timecounter tx_tstamp_tc;
+	/* For VF reset */
+	uint8_t reset_number;
 };
 
 int i40e_dev_switch_queues(struct i40e_pf *pf, bool on);
@@ -593,6 +595,8 @@ void i40e_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 void i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
 	struct rte_eth_txq_info *qinfo);
 
+void i40evf_emulate_vf_reset(uint8_t port_id);
+
 /* I40E_DEV_PRIVATE_TO */
 #define I40E_DEV_PRIVATE_TO_PF(adapter) \
 	(&((struct i40e_adapter *)adapter)->pf)
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index 90682ac..2f65a29 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,6 +157,12 @@ i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
 static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 				   uint8_t *msg,
 				   uint16_t msglen);
+static int i40evf_dev_uninit(struct rte_eth_dev *eth_dev);
+static int i40evf_dev_init(struct rte_eth_dev *eth_dev);
+static void i40evf_dev_close(struct rte_eth_dev *dev);
+static int i40evf_dev_start(struct rte_eth_dev *dev);
+static int i40evf_dev_configure(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -223,6 +229,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = {
 	.reta_query           = i40evf_dev_rss_reta_query,
 	.rss_hash_update      = i40evf_dev_rss_hash_update,
 	.rss_hash_conf_get    = i40evf_dev_rss_hash_conf_get,
+	.dev_reset            = i40evf_handle_vf_reset
 };
 
 /*
@@ -1309,6 +1316,82 @@ i40evf_uninit_vf(struct rte_eth_dev *dev)
 }
 
 static void
+i40e_vf_queue_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct i40e_rx_queue *rxq = dev->data->rx_queues[i];
+
+		if (rxq->q_set) {
+			i40e_dev_rx_queue_setup(dev,
+						rxq->queue_id,
+						rxq->nb_rx_desc,
+						rxq->socket_id,
+						&rxq->rxconf,
+						rxq->mp);
+		}
+	}
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct i40e_tx_queue *txq = dev->data->tx_queues[i];
+
+		if (txq->q_set) {
+			i40e_dev_tx_queue_setup(dev,
+						txq->queue_id,
+						txq->nb_tx_desc,
+						txq->socket_id,
+						&txq->txconf);
+		}
+	}
+}
+
+static void
+i40e_vf_reset_dev(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+	i40evf_dev_close(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev close complete");
+	i40evf_dev_uninit(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev detached");
+	memset(dev->data->dev_private, 0,
+	       (uintptr_t)&adapter->reset_number - (uintptr_t)adapter);
+
+	i40evf_dev_configure(dev);
+	i40evf_dev_init(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev attached");
+	i40e_vf_queue_reset(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf queue reset");
+	i40evf_dev_start(dev);
+	PMD_DRV_LOG(DEBUG, "i40evf dev restart");
+}
+
+static int
+i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+{
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+	if (!dev->data->dev_started)
+		return 0;
+
+	adapter->reset_number = 1;
+	i40e_vf_reset_dev(dev);
+	adapter->reset_number = 0;
+
+	return 0;
+}
+
+void
+i40evf_emulate_vf_reset(uint8_t port_id)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+	i40evf_handle_vf_reset(dev);
+}
+
+static void
 i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
 			   uint8_t *msg,
 			   __rte_unused uint16_t msglen)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..8dbc64c 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -2148,6 +2148,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	uint16_t len, i;
 	uint16_t base, bsf, tc_mapping;
 	int use_def_burst_func = 1;
+	struct rte_eth_rxconf conf = *rx_conf;
 
 	if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
 		struct i40e_vf *vf =
@@ -2186,6 +2187,8 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
 		return -ENOMEM;
 	}
 	rxq->mp = mp;
+	rxq->socket_id = socket_id;
+	rxq->rxconf = conf;
 	rxq->nb_rx_desc = nb_desc;
 	rxq->rx_free_thresh = rx_conf->rx_free_thresh;
 	rxq->queue_id = queue_idx;
@@ -2365,6 +2368,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	uint32_t ring_size;
 	uint16_t tx_rs_thresh, tx_free_thresh;
 	uint16_t i, base, bsf, tc_mapping;
+	struct rte_eth_txconf conf = *tx_conf;
 
 	if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
 		struct i40e_vf *vf =
@@ -2488,6 +2492,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	}
 
 	txq->nb_tx_desc = nb_desc;
+	txq->socket_id = socket_id;
+	txq->txconf = conf;
 	txq->tx_rs_thresh = tx_rs_thresh;
 	txq->tx_free_thresh = tx_free_thresh;
 	txq->pthresh = tx_conf->tx_thresh.pthresh;
@@ -2950,8 +2956,12 @@ void
 i40e_dev_free_queues(struct rte_eth_dev *dev)
 {
 	uint16_t i;
+	struct i40e_adapter *adapter =
+		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 
 	PMD_INIT_FUNC_TRACE();
+	if (adapter->reset_number)
+		return;
 
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 		i40e_dev_rx_queue_release(dev->data->rx_queues[i]);
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 98179f0..9e1b05a 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -140,6 +140,8 @@ struct i40e_rx_queue {
 	bool rx_deferred_start; /**< don't start this queue in dev start */
 	uint16_t rx_using_sse; /**<flag indicate the usage of vPMD for rx */
 	uint8_t dcb_tc;         /**< Traffic class of rx queue */
+	uint8_t socket_id;
+	struct rte_eth_rxconf rxconf;
 };
 
 struct i40e_tx_entry {
@@ -181,6 +183,8 @@ struct i40e_tx_queue {
 	bool q_set; /**< indicate if tx queue has been configured */
 	bool tx_deferred_start; /**< don't start this queue in dev start */
 	uint8_t dcb_tc;         /**< Traffic class of tx queue */
+	uint8_t socket_id;
+	struct rte_eth_txconf txconf;
 };
 
 /** Offload features */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-20  6:24   ` [PATCH v6 1/4] lib/librte_ether: support device reset Wenzhuo Lu
@ 2016-06-20  9:14     ` Jerin Jacob
  2016-06-20 16:17       ` Stephen Hemminger
  2016-06-21  0:51       ` Lu, Wenzhuo
  0 siblings, 2 replies; 72+ messages in thread
From: Jerin Jacob @ 2016-06-20  9:14 UTC (permalink / raw)
  To: Wenzhuo Lu
  Cc: dev, konstantin.ananyev, bruce.richardson, jing.d.chen,
	cunming.liang, jingjing.wu, helin.zhang, thomas.monjalon

On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> Add an API to reset the device.
> It's for VF device in this scenario, kernel PF + DPDK VF.
> When the PF port down->up, APP should call this API to
> reset VF port. Most likely, APP should call it in its
> management thread and guarantee the thread safe. It means
> APP should stop the rx/tx and the device, then reset the
> device, then recover the device and rx/tx.

The following is _a_ use-case for device reset, but it may not be _the_
use-case. IMO, we need to first state the expected behavior of this API and
add a use-case later.

Another use-case would be a PCIe VF with function level reset for SR-IOV
migration.
Are we on the same page?

> 
> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> ---
>  doc/guides/nics/overview.rst           |  1 +
>  lib/librte_ether/rte_ethdev.c          | 17 +++++++++++++++++
>  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++++++++++++++++
>  lib/librte_ether/rte_ether_version.map |  7 +++++++
>  4 files changed, 49 insertions(+)
> 
> diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
> index 0bd8fae..c8a4985 100644
> --- a/doc/guides/nics/overview.rst
> +++ b/doc/guides/nics/overview.rst
> @@ -89,6 +89,7 @@ Most of these differences are summarized below.
>     Speed capabilities
>     Link status            Y Y   Y Y   Y Y Y     Y   Y Y Y Y         Y Y         Y Y   Y Y Y Y
>     Link status event      Y Y     Y     Y Y     Y   Y Y             Y Y         Y Y     Y
> +   Link reset                               Y Y   Y     Y Y

"Device reset" would be more appropriate, right?

>     Queue status event                                                                   Y
>     Rx interrupt                   Y     Y Y Y Y Y Y Y Y Y Y Y Y Y Y
>     Queue start/stop             Y   Y Y Y Y Y Y     Y Y     Y Y Y Y Y Y               Y   Y Y
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index e148028..6c0449b 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
>  				-ENOTSUP);
>  	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask, en);
>  }
> +
> +int
> +rte_eth_dev_reset(uint8_t port_id)
> +{
> +	struct rte_eth_dev *dev;
> +	int diag;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> +	dev = &rte_eth_devices[port_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
> +
> +	diag = (*dev->dev_ops->dev_reset)(dev);
> +
> +	return diag;
> +}
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 2757510..5b3ba12 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
>  	 uint8_t en);
>  /**< @internal enable/disable the l2 tunnel offload functions */
>  
> +typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
> +/**< @internal Function used to reset a configured Ethernet device. */
> +
>  #ifdef RTE_NIC_BYPASS
>  
>  enum {
> @@ -1508,6 +1511,8 @@ struct eth_dev_ops {
>  	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
>  	/** Enable/disable l2 tunnel offload functions */
>  	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
> +	/** Reset device. */
> +	eth_dev_reset_t dev_reset;
>  };
>  
>  /**
> @@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
>  				  uint32_t mask,
>  				  uint8_t en);
>  
> +/**
> + * Reset an ethernet device when it's not working. One scenario is, after PF
> + * port is down and up, the related VF port should be reset.
> + * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
> + * queues, restart the port.
> + * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
> + * APP can drop the packets and release the buffer instead of sending them.

Same as first comment.

> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if port identifier is invalid.
> + *   - (-ENOTSUP) if hardware doesn't support this function.
> + */
> +int
> +rte_eth_dev_reset(uint8_t port_id);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
> index 214ecc7..c34207e 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -132,3 +132,10 @@ DPDK_16.04 {
>  	rte_eth_tx_buffer_set_err_callback;
>  
>  } DPDK_2.2;
> +
> +DPDK_16.07 {
> +	global:
> +
> +	rte_eth_dev_reset;
> +
> +} DPDK_16.04;
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-20  9:14     ` Jerin Jacob
@ 2016-06-20 16:17       ` Stephen Hemminger
  2016-06-21  3:51         ` Jerin Jacob
  2016-06-21  0:51       ` Lu, Wenzhuo
  1 sibling, 1 reply; 72+ messages in thread
From: Stephen Hemminger @ 2016-06-20 16:17 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Wenzhuo Lu, dev, konstantin.ananyev, bruce.richardson,
	jing.d.chen, cunming.liang, jingjing.wu, helin.zhang,
	thomas.monjalon

On Mon, 20 Jun 2016 14:44:11 +0530
Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > Add an API to reset the device.
> > It's for VF device in this scenario, kernel PF + DPDK VF.
> > When the PF port down->up, APP should call this API to
> > reset VF port. Most likely, APP should call it in its
> > management thread and guarantee the thread safe. It means
> > APP should stop the rx/tx and the device, then reset the
> > device, then recover the device and rx/tx.
> 
> Following is _a_ use-case for Device reset. But may be not be _the_ use
> case. IMO, We need to first say expected behavior of this API and add a use-case
> later.
> 
> Other use-case would be, PCIe VF with functional level reset for SRIOV
> migration.
> Are we on same page?


In my experience with Linux devices, this is normally handled by the
device driver in the start routine.  Since any use case which needs
this is going to do a stop/reset/start sequence, why not just have
the VF device driver do this in the start routine?

Adding yet another API and state transition when it is not necessary
increases the complexity and required test cases for all devices.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-20  9:14     ` Jerin Jacob
  2016-06-20 16:17       ` Stephen Hemminger
@ 2016-06-21  0:51       ` Lu, Wenzhuo
  1 sibling, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-21  0:51 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Richardson, Bruce, Chen, Jing D, Liang,
	Cunming, Wu, Jingjing, Zhang, Helin, thomas.monjalon

Hi Jerin,


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Monday, June 20, 2016 5:14 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang,
> Cunming; Wu, Jingjing; Zhang, Helin; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > Add an API to reset the device.
> > It's for VF device in this scenario, kernel PF + DPDK VF.
> > When the PF port down->up, APP should call this API to reset VF port.
> > Most likely, APP should call it in its management thread and guarantee
> > the thread safe. It means APP should stop the rx/tx and the device,
> > then reset the device, then recover the device and rx/tx.
> 
> Following is _a_ use-case for Device reset. But may be not be _the_ use case.
> IMO, We need to first say expected behavior of this API and add a use-case later.
Thanks for the suggestion, I'll reword it.

> 
> Other use-case would be, PCIe VF with functional level reset for SRIOV migration.
> Are we on same page?
I'm not sure :) Does this SR-IOV migration mean the migration of a logical domain that has a VF assigned to it?

> 
> >
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> > ---
> >  doc/guides/nics/overview.rst           |  1 +
> >  lib/librte_ether/rte_ethdev.c          | 17 +++++++++++++++++
> >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++++++++++++++++
> >  lib/librte_ether/rte_ether_version.map |  7 +++++++
> >  4 files changed, 49 insertions(+)
> >
> > diff --git a/doc/guides/nics/overview.rst
> > b/doc/guides/nics/overview.rst index 0bd8fae..c8a4985 100644
> > --- a/doc/guides/nics/overview.rst
> > +++ b/doc/guides/nics/overview.rst
> > @@ -89,6 +89,7 @@ Most of these differences are summarized below.
> >     Speed capabilities
> >     Link status            Y Y   Y Y   Y Y Y     Y   Y Y Y Y         Y Y         Y Y   Y Y Y Y
> >     Link status event      Y Y     Y     Y Y     Y   Y Y             Y Y         Y Y     Y
> > +   Link reset                               Y Y   Y     Y Y
> 
> More appropriate would be "Device reset" ? Right?
Yes, sounds better :)

> 
> >     Queue status event                                                                   Y
> >     Rx interrupt                   Y     Y Y Y Y Y Y Y Y Y Y Y Y Y Y
> >     Queue start/stop             Y   Y Y Y Y Y Y     Y Y     Y Y Y Y Y Y               Y   Y Y
> > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c index e148028..6c0449b 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -3346,3 +3346,20 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t
> port_id,
> >  				-ENOTSUP);
> >  	return (*dev->dev_ops->l2_tunnel_offload_set)(dev, l2_tunnel, mask,
> > en);  }
> > +
> > +int
> > +rte_eth_dev_reset(uint8_t port_id)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	int diag;
> > +
> > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +
> > +	dev = &rte_eth_devices[port_id];
> > +
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
> > +
> > +	diag = (*dev->dev_ops->dev_reset)(dev);
> > +
> > +	return diag;
> > +}
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 2757510..5b3ba12 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1318,6 +1318,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
> >  	 uint8_t en);
> >  /**< @internal enable/disable the l2 tunnel offload functions */
> >
> > +typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev); /**<
> > +@internal Function used to reset a configured Ethernet device. */
> > +
> >  #ifdef RTE_NIC_BYPASS
> >
> >  enum {
> > @@ -1508,6 +1511,8 @@ struct eth_dev_ops {
> >  	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
> >  	/** Enable/disable l2 tunnel offload functions */
> >  	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
> > +	/** Reset device. */
> > +	eth_dev_reset_t dev_reset;
> >  };
> >
> >  /**
> > @@ -4253,6 +4258,25 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t
> port_id,
> >  				  uint32_t mask,
> >  				  uint8_t en);
> >
> > +/**
> > + * Reset an ethernet device when it's not working. One scenario is,
> > +after PF
> > + * port is down and up, the related VF port should be reset.
> > + * The API will stop the port, clear the rx/tx queues, re-setup the
> > +rx/tx
> > + * queues, restart the port.
> > + * Before calling this API, APP should stop the rx/tx. When tx is
> > +being stopped,
> > + * APP can drop the packets and release the buffer instead of sending them.
> 
> Same as first comment.
I'll reword it.

> 
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + *
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENODEV) if port identifier is invalid.
> > + *   - (-ENOTSUP) if hardware doesn't support this function.
> > + */
> > +int
> > +rte_eth_dev_reset(uint8_t port_id);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_ether/rte_ether_version.map
> > b/lib/librte_ether/rte_ether_version.map
> > index 214ecc7..c34207e 100644
> > --- a/lib/librte_ether/rte_ether_version.map
> > +++ b/lib/librte_ether/rte_ether_version.map
> > @@ -132,3 +132,10 @@ DPDK_16.04 {
> >  	rte_eth_tx_buffer_set_err_callback;
> >
> >  } DPDK_2.2;
> > +
> > +DPDK_16.07 {
> > +	global:
> > +
> > +	rte_eth_dev_reset;
> > +
> > +} DPDK_16.04;
> > --
> > 1.9.3
> >

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-20 16:17       ` Stephen Hemminger
@ 2016-06-21  3:51         ` Jerin Jacob
  2016-06-21  6:14           ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21  3:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Wenzhuo Lu, dev, konstantin.ananyev, bruce.richardson,
	jing.d.chen, cunming.liang, jingjing.wu, helin.zhang,
	thomas.monjalon

On Mon, Jun 20, 2016 at 09:17:14AM -0700, Stephen Hemminger wrote:
> On Mon, 20 Jun 2016 14:44:11 +0530
> Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> 
> > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > Add an API to reset the device.
> > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > When the PF port down->up, APP should call this API to
> > > reset VF port. Most likely, APP should call it in its
> > > management thread and guarantee the thread safe. It means
> > > APP should stop the rx/tx and the device, then reset the
> > > device, then recover the device and rx/tx.
> > 
> > Following is _a_ use-case for Device reset. But may be not be _the_ use
> > case. IMO, We need to first say expected behavior of this API and add a use-case
> > later.
> > 
> > Other use-case would be, PCIe VF with functional level reset for SRIOV
> > migration.
> > Are we on same page?
> 
> 
> In my experience with Linux devices, this is normally handled by the
> device driver in the start routine.  Since any use case which needs
> this is going to do a stop/reset/start sequence, why not just have
> the VF device driver do this in the start routine?.
> 
> Adding yet another API and state transistion if not necessary increases
> the complexity and required test cases for all devices.

I agree with Stephen here. I think if the application needs to call start
after the device reset, then we could add this logic in start itself
rather than exposing yet another API.

> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  3:51         ` Jerin Jacob
@ 2016-06-21  6:14           ` Lu, Wenzhuo
  2016-06-21  7:37             ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-21  6:14 UTC (permalink / raw)
  To: Jerin Jacob, Stephen Hemminger
  Cc: dev, Ananyev, Konstantin, Richardson, Bruce, Chen, Jing D, Liang,
	Cunming, Wu, Jingjing, Zhang, Helin, thomas.monjalon

Hi Jerin, Stephen,


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, June 21, 2016 11:51 AM
> To: Stephen Hemminger
> Cc: Lu, Wenzhuo; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen,
> Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Mon, Jun 20, 2016 at 09:17:14AM -0700, Stephen Hemminger wrote:
> > On Mon, 20 Jun 2016 14:44:11 +0530
> > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> >
> > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > Add an API to reset the device.
> > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > When the PF port down->up, APP should call this API to reset VF
> > > > port. Most likely, APP should call it in its management thread and
> > > > guarantee the thread safe. It means APP should stop the rx/tx and
> > > > the device, then reset the device, then recover the device and
> > > > rx/tx.
> > >
> > > Following is _a_ use-case for Device reset. But may be not be _the_
> > > use case. IMO, We need to first say expected behavior of this API
> > > and add a use-case later.
> > >
> > > Other use-case would be, PCIe VF with functional level reset for
> > > SRIOV migration.
> > > Are we on same page?
> >
> >
> > In my experience with Linux devices, this is normally handled by the
> > device driver in the start routine.  Since any use case which needs
> > this is going to do a stop/reset/start sequence, why not just have the
> > VF device driver do this in the start routine?.
> >
> > Adding yet another API and state transistion if not necessary
> > increases the complexity and required test cases for all devices.
> 
> I agree with Stephen here.I think if application needs to call start after the
> device reset then we could add this logic in start itself rather exposing a yet
> another API
Do you mean changing device_start to include all of these actions: stop device -> stop queue -> re-setup queue -> start queue -> start device?

> 
> >


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  6:14           ` Lu, Wenzhuo
@ 2016-06-21  7:37             ` Jerin Jacob
  2016-06-21  8:24               ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21  7:37 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Stephen Hemminger, dev, Ananyev, Konstantin, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Tue, Jun 21, 2016 at 06:14:29AM +0000, Lu, Wenzhuo wrote:
> Hi Jerin, Stephen,
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, June 21, 2016 11:51 AM
> > To: Stephen Hemminger
> > Cc: Lu, Wenzhuo; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen,
> > Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Mon, Jun 20, 2016 at 09:17:14AM -0700, Stephen Hemminger wrote:
> > > On Mon, 20 Jun 2016 14:44:11 +0530
> > > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:
> > >
> > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > Add an API to reset the device.
> > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > When the PF port down->up, APP should call this API to reset VF
> > > > > port. Most likely, APP should call it in its management thread and
> > > > > guarantee the thread safe. It means APP should stop the rx/tx and
> > > > > the device, then reset the device, then recover the device and
> > > > > rx/tx.
> > > >
> > > > Following is _a_ use-case for Device reset. But may be not be _the_
> > > > use case. IMO, We need to first say expected behavior of this API
> > > > and add a use-case later.
> > > >
> > > > Other use-case would be, PCIe VF with functional level reset for
> > > > SRIOV migration.
> > > > Are we on same page?
> > >
> > >
> > > In my experience with Linux devices, this is normally handled by the
> > > device driver in the start routine.  Since any use case which needs
> > > this is going to do a stop/reset/start sequence, why not just have the
> > > VF device driver do this in the start routine?.
> > >
> > > Adding yet another API and state transistion if not necessary
> > > increases the complexity and required test cases for all devices.
> > 
> > I agree with Stephen here.I think if application needs to call start after the
> > device reset then we could add this logic in start itself rather exposing a yet
> > another API
> Do you mean changing the device_start to include all these actions, stop device -> stop queue -> re-setup queue -> start queue -> start device ?

What was the expected API call sequence when you introduced this API?

The point was to have an implicit device reset in the API call
sequence (wherever it makes sense for the specific PMD).

Jerin

> 
> > 
> > >


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  7:37             ` Jerin Jacob
@ 2016-06-21  8:24               ` Lu, Wenzhuo
  2016-06-21  8:55                 ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-21  8:24 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Stephen Hemminger, dev, Ananyev, Konstantin, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, June 21, 2016 3:37 PM
> To: Lu, Wenzhuo
> Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson,
> Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Tue, Jun 21, 2016 at 06:14:29AM +0000, Lu, Wenzhuo wrote:
> > Hi Jerin, Stephen,
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Tuesday, June 21, 2016 11:51 AM
> > > To: Stephen Hemminger
> > > Cc: Lu, Wenzhuo; dev@dpdk.org; Ananyev, Konstantin; Richardson,
> > > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > > thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > device reset
> > >
> > > On Mon, Jun 20, 2016 at 09:17:14AM -0700, Stephen Hemminger wrote:
> > > > On Mon, 20 Jun 2016 14:44:11 +0530 Jerin Jacob
> > > > <jerin.jacob@caviumnetworks.com> wrote:
> > > >
> > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > Add an API to reset the device.
> > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > VF port. Most likely, APP should call it in its management
> > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > the device and rx/tx.
> > > > >
> > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > this API and add a use-case later.
> > > > >
> > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > SRIOV migration.
> > > > > Are we on same page?
> > > >
> > > >
> > > > In my experience with Linux devices, this is normally handled by
> > > > the device driver in the start routine.  Since any use case which
> > > > needs this is going to do a stop/reset/start sequence, why not
> > > > just have the VF device driver do this in the start routine?.
> > > >
> > > > Adding yet another API and state transistion if not necessary
> > > > increases the complexity and required test cases for all devices.
> > >
> > > I agree with Stephen here.I think if application needs to call start
> > > after the device reset then we could add this logic in start itself
> > > rather exposing a yet another API
> > Do you mean changing the device_start to include all these actions, stop
> device -> stop queue -> re-setup queue -> start queue -> start device ?
> 
> What was the expected API call sequence when you were introduced this API?
> 
> Point was to have implicit device reset in the API call sequence(Wherever make
> sense for specific PMD)
I think the API call sequence depends on the implementation of the APP. Suppose this reset API did not exist: the APP could still handle the PF link down/up event with this call sequence: rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -> rte_eth_dev_start.
Actually, our purpose is to use this reset API instead of that call sequence. As you can see, the reset API is not strictly necessary. The benefit is that it saves code in the APP.
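To make the comparison concrete, here is a rough, self-contained sketch of the two options. Everything in it is a stand-in: struct mock_port and the mock_* functions are hypothetical and only mimic the order of the calls named above; they are not the DPDK functions and not the code from this patch set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a VF port; not a DPDK structure. */
struct mock_port {
	int started;    /* 1 while the port is running */
	int rxq_ready;  /* 1 once the RX queue has been set up */
	int txq_ready;  /* 1 once the TX queue has been set up */
	char log[128];  /* records the order of the calls */
};

/* Append one step name to the call log, truncating if it is full. */
static void note(struct mock_port *p, const char *step)
{
	strncat(p->log, step, sizeof(p->log) - strlen(p->log) - 1);
}

/* Mock versions of the four calls in the sequence above. */
static void mock_dev_close(struct mock_port *p)
{
	p->started = p->rxq_ready = p->txq_ready = 0;
	note(p, "close;");
}

static void mock_rx_queue_setup(struct mock_port *p)
{
	p->rxq_ready = 1;
	note(p, "rx_setup;");
}

static void mock_tx_queue_setup(struct mock_port *p)
{
	p->txq_ready = 1;
	note(p, "tx_setup;");
}

static int mock_dev_start(struct mock_port *p)
{
	if (!p->rxq_ready || !p->txq_ready)
		return -1;  /* queues must be set up before start */
	p->started = 1;
	note(p, "start;");
	return 0;
}

/* What the APP has to write itself on a PF link down/up event. */
static int recover_manually(struct mock_port *p)
{
	assert(p != NULL);
	mock_dev_close(p);
	mock_rx_queue_setup(p);
	mock_tx_queue_setup(p);
	return mock_dev_start(p);
}

/* What a reset call would fold into one entry point (assumed behavior). */
static int mock_dev_reset(struct mock_port *p)
{
	return recover_manually(p);
}
```

In this sketch the APP-side handler shrinks to a single mock_dev_reset() call, which is the code saving mentioned above; the work done underneath is the same.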

> 
> Jerin
> 
> >
> > >
> > > >


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  8:24               ` Lu, Wenzhuo
@ 2016-06-21  8:55                 ` Jerin Jacob
  2016-06-21  9:26                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21  8:55 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Stephen Hemminger, dev, Ananyev, Konstantin, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> Hi Jerin,

Hi Wenzhuo,

> > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > Add an API to reset the device.
> > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > the device and rx/tx.
> > > > > >
> > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > this API and add a use-case later.
> > > > > >
> > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > SRIOV migration.
> > > > > > Are we on same page?
> > > > >
> > > > >
> > > > > In my experience with Linux devices, this is normally handled by
> > > > > the device driver in the start routine.  Since any use case which
> > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > just have the VF device driver do this in the start routine?.
> > > > >
> > > > > Adding yet another API and state transistion if not necessary
> > > > > increases the complexity and required test cases for all devices.
> > > >
> > > > I agree with Stephen here.I think if application needs to call start
> > > > after the device reset then we could add this logic in start itself
> > > > rather exposing a yet another API
> > > Do you mean changing the device_start to include all these actions, stop
> > device -> stop queue -> re-setup queue -> start queue -> start device ?
> > 
> > What was the expected API call sequence when you were introduced this API?
> > 
> > Point was to have implicit device reset in the API call sequence(Wherever make
> > sense for specific PMD)
> I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -> rte_eth_dev_start. 
> Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit is to save the code for APP.

Then I am a bit confused by the original commit log description.
|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
I was under the impression that it is a low-level reset API for this
device. Isn't it?

The other issue is how general the API is. Certain PMDs will not have a
PF link down/up event: the link is connected only to the VF, and the PF
is used only for configuration.

How about fixing it more transparently in the PMD itself? Since the
PMD knows about the PF link up/down event, is it possible to
recover the VF on that event if it is only a matter of resetting it?

Jerin


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  8:55                 ` Jerin Jacob
@ 2016-06-21  9:26                   ` Ananyev, Konstantin
  2016-06-21 10:57                     ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Ananyev, Konstantin @ 2016-06-21  9:26 UTC (permalink / raw)
  To: Jerin Jacob, Lu, Wenzhuo
  Cc: Stephen Hemminger, dev, Richardson, Bruce, Chen, Jing D, Liang,
	Cunming, Wu, Jingjing, Zhang, Helin, thomas.monjalon

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, June 21, 2016 9:56 AM
> To: Lu, Wenzhuo
> Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang,
> Helin; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> > Hi Jerin,
> 
> Hi Wenzhuo,
> 
> > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > Add an API to reset the device.
> > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > the device and rx/tx.
> > > > > > >
> > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > this API and add a use-case later.
> > > > > > >
> > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > SRIOV migration.
> > > > > > > Are we on same page?
> > > > > >
> > > > > >
> > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > the device driver in the start routine.  Since any use case which
> > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > just have the VF device driver do this in the start routine?.
> > > > > >
> > > > > > Adding yet another API and state transistion if not necessary
> > > > > > increases the complexity and required test cases for all devices.
> > > > >
> > > > > I agree with Stephen here.I think if application needs to call start
> > > > > after the device reset then we could add this logic in start itself
> > > > > rather exposing a yet another API
> > > > Do you mean changing the device_start to include all these actions, stop
> > > device -> stop queue -> re-setup queue -> start queue -> start device ?
> > >
> > > What was the expected API call sequence when you were introduced this API?
> > >
> > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > sense for specific PMD)
> > I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API
> call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> rte_eth_dev_start.
> > Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit
> is to save the code for APP.
> 
> Then I am bit confused with original commit log description.
> |
> |It means APP should stop the rx/tx and the device, then reset the
> |device, then recover the device and rx/tx.
> |
> I was under impression that it a low level reset API for this device? Is
> n't it?
> 
> The other issue is generalized outlook of the API, Certain PMD will not
> have PF link down/up event? Link down/up and only connected to VF and PF
> only for configuration.
> 
> How about fixing it more transparently in PMD driver itself as
> PMD driver knows the PF link up/down event, Is it possible to
> recover the VF on that event if its only matter of resetting it?

I think we already went through that discussion on the list.
Unfortunately, with the current DPDK design it is hardly possible.
To achieve that we would need to introduce some sort of synchronisation
between the IO and control APIs (locking or similar).
Actually, I am not sure why having a special reset function would be a problem.
Yes, it would exist only for VFs; for PFs it could be left unimplemented.
It definitely seems more convenient from the user's point of view, though:
they would know that to handle a VF reset event they just need to call that
particular function rather than re-implement their own.

Konstantin

> 
> Jerin


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21  9:26                   ` Ananyev, Konstantin
@ 2016-06-21 10:57                     ` Jerin Jacob
  2016-06-21 13:10                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21 10:57 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Lu, Wenzhuo, Stephen Hemminger, dev, Richardson, Bruce, Chen,
	Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Tue, Jun 21, 2016 at 09:26:12AM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,

> Hi Jerin,
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, June 21, 2016 9:56 AM
> > To: Lu, Wenzhuo
> > Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang,
> > Helin; thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> > > Hi Jerin,
> > 
> > Hi Wenzhuo,
> > 
> > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > > Add an API to reset the device.
> > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > > the device and rx/tx.
> > > > > > > >
> > > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > > this API and add a use-case later.
> > > > > > > >
> > > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > > SRIOV migration.
> > > > > > > > Are we on same page?
> > > > > > >
> > > > > > >
> > > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > > the device driver in the start routine.  Since any use case which
> > > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > > just have the VF device driver do this in the start routine?.
> > > > > > >
> > > > > > > Adding yet another API and state transistion if not necessary
> > > > > > > increases the complexity and required test cases for all devices.
> > > > > >
> > > > > > I agree with Stephen here.I think if application needs to call start
> > > > > > after the device reset then we could add this logic in start itself
> > > > > > rather exposing a yet another API
> > > > > Do you mean changing the device_start to include all these actions, stop
> > > > device -> stop queue -> re-setup queue -> start queue -> start device ?
> > > >
> > > > What was the expected API call sequence when you were introduced this API?
> > > >
> > > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > > sense for specific PMD)
> > > I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this API
> > call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > rte_eth_dev_start.
> > > Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The benefit
> > is to save the code for APP.
> > 
> > Then I am bit confused with original commit log description.
> > |
> > |It means APP should stop the rx/tx and the device, then reset the
> > |device, then recover the device and rx/tx.
> > |
> > I was under impression that it a low level reset API for this device? Is
> > n't it?
> > 
> > The other issue is generalized outlook of the API, Certain PMD will not
> > have PF link down/up event? Link down/up and only connected to VF and PF
> > only for configuration.
> > 
> > How about fixing it more transparently in PMD driver itself as
> > PMD driver knows the PF link up/down event, Is it possible to
> > recover the VF on that event if its only matter of resetting it?
> 
> I think we already went through that discussion on the list.
> Unfortunately with current dpdk design it is hardly possible.
> To achieve that we need to introduce some sort of synchronisation
> between IO and control APIs (locking or so).
> Actually I am not sure why having a special reset function will be a problem.

|
|It means APP should stop the rx/tx and the device, then reset the
|device, then recover the device and rx/tx.
|
Just to understand: if the application still needs to do the stop, then
what value does the reset API bring to the table?


> Yes, it would exist only for VFs, for PF it could be left unimplemented.
> Though it definitely seems more convenient from user point of view,
> they would know: to handle VF reset event, they just need to call that
> particular function, not to re-implement their own.
What if the driver returns "not implemented"? Then the application will
have to do the generic rte_eth_dev_stop/rte_eth_dev_start. That way, from
the application's perspective, we are NOT solving any problem.
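As a rough illustration of that fallback, the sketch below models a driver whose reset hook may be missing. struct mock_dev, the mock_* functions and app_recover() are hypothetical names made up for this example, and returning -ENOTSUP for an unimplemented hook is an assumption, not something taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_dev;
typedef int (*mock_reset_t)(struct mock_dev *);

/* Hypothetical device: 'reset' stays NULL when the driver has no reset hook. */
struct mock_dev {
	mock_reset_t reset;
	int running;   /* 1 while the device is started */
	int restarts;  /* how many times start has been called */
};

static void mock_dev_stop(struct mock_dev *d)
{
	d->running = 0;
}

static int mock_dev_start(struct mock_dev *d)
{
	d->running = 1;
	d->restarts++;
	return 0;
}

/* Generic entry point: reports -ENOTSUP when the hook is missing (assumed). */
static int mock_dev_reset(struct mock_dev *d)
{
	assert(d != NULL);
	if (d->reset == NULL)
		return -ENOTSUP;
	return d->reset(d);
}

/* Example VF-side hook: here it is simply stop followed by start. */
static int vf_reset_impl(struct mock_dev *d)
{
	mock_dev_stop(d);
	return mock_dev_start(d);
}

/* The APP ends up carrying both paths. */
static int app_recover(struct mock_dev *d)
{
	int ret = mock_dev_reset(d);

	if (ret != -ENOTSUP)
		return ret;      /* the driver handled it, or failed for real */
	mock_dev_stop(d);        /* generic fallback */
	return mock_dev_start(d);
}
```

The point of the sketch is that app_recover() still contains the stop/start path, so a reset hook that some drivers leave out does not remove that code from the application.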

Jerin


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21 10:57                     ` Jerin Jacob
@ 2016-06-21 13:10                       ` Ananyev, Konstantin
  2016-06-21 13:30                         ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Ananyev, Konstantin @ 2016-06-21 13:10 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Lu, Wenzhuo, Stephen Hemminger, dev, Richardson, Bruce, Chen,
	Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon



> 
> Hi Konstantin,
> 
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Tuesday, June 21, 2016 9:56 AM
> > > To: Lu, Wenzhuo
> > > Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> Zhang,
> > > Helin; thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > >
> > > On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> > > > Hi Jerin,
> > >
> > > Hi Wenzhuo,
> > >
> > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > > > the device and rx/tx.
> > > > > > > > >
> > > > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > > > this API and add a use-case later.
> > > > > > > > >
> > > > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > > > SRIOV migration.
> > > > > > > > > Are we on same page?
> > > > > > > >
> > > > > > > >
> > > > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > > > the device driver in the start routine.  Since any use case which
> > > > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > > > just have the VF device driver do this in the start routine?.
> > > > > > > >
> > > > > > > > Adding yet another API and state transistion if not necessary
> > > > > > > > increases the complexity and required test cases for all devices.
> > > > > > >
> > > > > > > I agree with Stephen here.I think if application needs to call start
> > > > > > > after the device reset then we could add this logic in start itself
> > > > > > > rather exposing a yet another API
> > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > device -> stop queue -> re-setup queue -> start queue -> start device ?
> > > > >
> > > > > What was the expected API call sequence when you were introduced this API?
> > > > >
> > > > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > > > sense for specific PMD)
> > > > I think the API call sequence depends on the implementation of the APP. Let's say if there's not this reset API, APP can use this
> API
> > > call sequence to handle the PF link down/up event, rte_eth_dev_close -> rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup -
> >
> > > rte_eth_dev_start.
> > > > Actually our purpose is to use this reset API instead of the API call sequence. You can see the reset API is not necessary. The
> benefit
> > > is to save the code for APP.
> > >
> > > Then I am bit confused with original commit log description.
> > > |
> > > |It means APP should stop the rx/tx and the device, then reset the
> > > |device, then recover the device and rx/tx.
> > > |
> > > I was under impression that it a low level reset API for this device? Is
> > > n't it?
> > >
> > > The other issue is generalized outlook of the API, Certain PMD will not
> > > have PF link down/up event? Link down/up and only connected to VF and PF
> > > only for configuration.
> > >
> > > How about fixing it more transparently in PMD driver itself as
> > > PMD driver knows the PF link up/down event, Is it possible to
> > > recover the VF on that event if its only matter of resetting it?
> >
> > I think we already went through that discussion on the list.
> > Unfortunately with current dpdk design it is hardly possible.
> > To achieve that we need to introduce some sort of synchronisation
> > between IO and control APIs (locking or so).
> > Actually I am not sure why having a special reset function will be a problem.
> 
> |
> |It means APP should stop the rx/tx and the device, then reset the
> |device, then recover the device and rx/tx.
> |
> Just to understand, If application still need  to do the stop then what
> value addtion reset API brings on the table?

If the application calls dev_reset(), it doesn't need to call dev_stop() before it;
dev_reset() will take care of that.
But it needs to make sure that no other thread will try to modify that device's state
(either dev_stop/start, or eth_rx_burst/eth_tx_burst) while the reset op is in progress.
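One way an application could provide that guarantee is a small gate shared between its IO threads and its management thread. The sketch below is only an illustration using C11 atomics: struct port_gate and the gate_* functions are hypothetical names, and this is not the locking scheme from the patch set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port gate owned by the application. */
struct port_gate {
	atomic_bool reset_pending;  /* set by the management thread */
	atomic_int  in_datapath;    /* IO threads currently inside rx/tx */
};

/*
 * IO thread: call before each rx/tx burst.
 * Returns false when a reset is pending and the burst must be skipped.
 */
static bool gate_enter(struct port_gate *g)
{
	atomic_fetch_add(&g->in_datapath, 1);
	if (atomic_load(&g->reset_pending)) {
		atomic_fetch_sub(&g->in_datapath, 1);
		return false;
	}
	return true;
}

/* IO thread: call after the burst has returned. */
static void gate_leave(struct port_gate *g)
{
	atomic_fetch_sub(&g->in_datapath, 1);
}

/* Management thread: block new bursts, then wait for running ones to finish. */
static void gate_begin_reset(struct port_gate *g)
{
	assert(g != NULL);
	atomic_store(&g->reset_pending, true);
	while (atomic_load(&g->in_datapath) != 0)
		;  /* spin until every IO thread has left the data path */
}

/* Management thread: call once the reset has completed. */
static void gate_end_reset(struct port_gate *g)
{
	atomic_store(&g->reset_pending, false);
}
```

The management thread would call gate_begin_reset(), then the reset, then gate_end_reset(); each IO thread would wrap its burst calls in gate_enter()/gate_leave(). The cost is two atomic operations per burst on the fast path, which is the kind of overhead the discussion about IO/control synchronisation is concerned with.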

> 
> 
> > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > Though it definitely seems more convenient from user point of view,
> > they would know: to handle VF reset event, they just need to call that
> > particular function, not to re-implement their own.
> What if driver returns "not implemented" then application will have do
> generic rte_eth_dev_stop/rte_eth_dev_start.
>That way in application  perspective we are NOT solving any problem.

True, but as I said, for a PF the application would just never receive such an event.
I suppose it is possible to implement one for PFs too; I just don't see
much point, as probably no one will ever use it.

Konstantin


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21 13:10                       ` Ananyev, Konstantin
@ 2016-06-21 13:30                         ` Jerin Jacob
  2016-06-21 14:03                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21 13:30 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Lu, Wenzhuo, Stephen Hemminger, dev, Richardson, Bruce, Chen,
	Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Tue, Jun 21, 2016 at 01:10:40PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > 
> > Hi Konstantin,
> > 
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Tuesday, June 21, 2016 9:56 AM
> > > > To: Lu, Wenzhuo
> > > > Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > Zhang,
> > > > Helin; thomas.monjalon@6wind.com
> > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > > >
> > > > On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> > > > > Hi Jerin,
> > > >
> > > > Hi Wenzhuo,
> > > >
> > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > > > > the device and rx/tx.
> > > > > > > > > >
> > > > > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > > > > this API and add a use-case later.
> > > > > > > > > >
> > > > > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > > > > SRIOV migration.
> > > > > > > > > > Are we on same page?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > > > > the device driver in the start routine.  Since any use case which
> > > > > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > > > > just have the VF device driver do this in the start routine?.
> > > > > > > > >
> > > > > > > > > Adding yet another API and state transistion if not necessary
> > > > > > > > > increases the complexity and required test cases for all devices.
> > > > > > > >
> > > > > > > > I agree with Stephen here.I think if application needs to call start
> > > > > > > > after the device reset then we could add this logic in start itself
> > > > > > > > rather exposing a yet another API
> > > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > > device -> stop queue -> re-setup queue -> start queue -> start device ?
> > > > > >
> > > > > > What was the expected API call sequence when you were introduced this API?
> > > > > >
> > > > > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > > > > sense for specific PMD)
> > > > > > I think the API call sequence depends on the
> > > > > > implementation of the APP. Let's say if there's not this
> > > > > > reset API, APP can use this API call sequence to handle
> > > > > > the PF link down/up event, rte_eth_dev_close ->
> > > > > > rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > > > > > rte_eth_dev_start.
> > > > > > Actually our purpose is to use this reset API instead of
> > > > > > the API call sequence. You can see the reset API is not
> > > > > > necessary. The benefit is to save the code for APP.
> > > >
> > > > Then I am bit confused with original commit log description.
> > > > |
> > > > |It means APP should stop the rx/tx and the device, then reset the
> > > > |device, then recover the device and rx/tx.
> > > > |
> > > > I was under impression that it a low level reset API for this device? Is
> > > > n't it?
> > > >
> > > > The other issue is generalized outlook of the API, Certain PMD will not
> > > > have PF link down/up event? Link down/up and only connected to VF and PF
> > > > only for configuration.
> > > >
> > > > How about fixing it more transparently in PMD driver itself as
> > > > PMD driver knows the PF link up/down event, Is it possible to
> > > > recover the VF on that event if its only matter of resetting it?
> > >
> > > I think we already went through that discussion on the list.
> > > Unfortunately with current dpdk design it is hardly possible.
> > > To achieve that we need to introduce some sort of synchronisation
> > > between IO and control APIs (locking or so).
> > > Actually I am not sure why having a special reset function will be a problem.
> > 
> > |
> > |It means APP should stop the rx/tx and the device, then reset the
> > |device, then recover the device and rx/tx.
> > |
> > Just to understand, If application still need  to do the stop then what
> > value addtion reset API brings on the table?
> 
> If application calls dev_reset() it doesn't need to call dev_stop() before it.
> dev_reset() will take care of it. 
> But it needs to make sure that no other thread will try to modify that device state
> (either dev_stop/start, or eth_rx_busrst/eth_tx_burst) while the reset op is in place.

OK. This description differs from the commit log and the API doxygen comment. Please fix them.
How about a different name for this API? "Device reset" is too generic.

> 
> > 
> > 
> > > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > > Though it definitely seems more convenient from user point of view,
> > > they would know: to handle VF reset event, they just need to call that
> > > particular function, not to re-implement their own.
> > What if driver returns "not implemented" then application will have do
> > generic rte_eth_dev_stop/rte_eth_dev_start.
> >That way in application  perspective we are NOT solving any problem.
> 
> True, but as I said for PF application would just never receive such event.
What is this event? Is it the VF link up/down event?

No, I was referring to the VF itself: other VF PMD drivers in drivers/net
where this callback is not implemented.

Jerin


> I suppose it is possible to implement one for PF too, I just don't see
> much point - as probably no-one will ever use it.
> 
> Konstantin


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21 13:30                         ` Jerin Jacob
@ 2016-06-21 14:03                           ` Ananyev, Konstantin
  2016-06-21 14:29                             ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Ananyev, Konstantin @ 2016-06-21 14:03 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Lu, Wenzhuo, Stephen Hemminger, dev, Richardson, Bruce, Chen,
	Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, June 21, 2016 2:31 PM
> To: Ananyev, Konstantin
> Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Tue, Jun 21, 2016 at 01:10:40PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > >
> > > Hi Konstantin,
> > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Tuesday, June 21, 2016 9:56 AM
> > > > > To: Lu, Wenzhuo
> > > > > Cc: Stephen Hemminger; dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > > Zhang,
> > > > > Helin; thomas.monjalon@6wind.com
> > > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > > > >
> > > > > On Tue, Jun 21, 2016 at 08:24:36AM +0000, Lu, Wenzhuo wrote:
> > > > > > Hi Jerin,
> > > > >
> > > > > Hi Wenzhuo,
> > > > >
> > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > > > > > the device and rx/tx.
> > > > > > > > > > >
> > > > > > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > > > > > this API and add a use-case later.
> > > > > > > > > > >
> > > > > > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > > > > > SRIOV migration.
> > > > > > > > > > > Are we on same page?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > > > > > the device driver in the start routine.  Since any use case which
> > > > > > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > > > > > just have the VF device driver do this in the start routine?.
> > > > > > > > > >
> > > > > > > > > > Adding yet another API and state transistion if not necessary
> > > > > > > > > > increases the complexity and required test cases for all devices.
> > > > > > > > >
> > > > > > > > > I agree with Stephen here.I think if application needs to call start
> > > > > > > > > after the device reset then we could add this logic in start itself
> > > > > > > > > rather exposing a yet another API
> > > > > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > > > > > device -> stop queue -> re-setup queue -> start queue -> start device?
> > > > > > >
> > > > > > > What was the expected API call sequence when you were introduced this API?
> > > > > > >
> > > > > > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > > > > > sense for specific PMD)
> > > > > > > I think the API call sequence depends on the
> > > > > > > implementation of the APP. Let's say if there's not this
> > > > > > > reset API, APP can use this API call sequence to handle
> > > > > > > the PF link down/up event, rte_eth_dev_close ->
> > > > > > > rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > > > > > > rte_eth_dev_start.
> > > > > > > Actually our purpose is to use this reset API instead of
> > > > > > > the API call sequence. You can see the reset API is not
> > > > > > > necessary. The benefit is to save the code for APP.
> > > > >
> > > > > Then I am bit confused with original commit log description.
> > > > > |
> > > > > |It means APP should stop the rx/tx and the device, then reset the
> > > > > |device, then recover the device and rx/tx.
> > > > > |
> > > > > I was under impression that it a low level reset API for this device? Is
> > > > > n't it?
> > > > >
> > > > > The other issue is generalized outlook of the API, Certain PMD will not
> > > > > have PF link down/up event? Link down/up and only connected to VF and PF
> > > > > only for configuration.
> > > > >
> > > > > How about fixing it more transparently in PMD driver itself as
> > > > > PMD driver knows the PF link up/down event, Is it possible to
> > > > > recover the VF on that event if its only matter of resetting it?
> > > >
> > > > I think we already went through that discussion on the list.
> > > > Unfortunately with current dpdk design it is hardly possible.
> > > > To achieve that we need to introduce some sort of synchronisation
> > > > between IO and control APIs (locking or so).
> > > > Actually I am not sure why having a special reset function will be a problem.
> > >
> > > |
> > > |It means APP should stop the rx/tx and the device, then reset the
> > > |device, then recover the device and rx/tx.
> > > |
> > > Just to understand, If application still need  to do the stop then what
> > > value addtion reset API brings on the table?
> >
> > If application calls dev_reset() it doesn't need to call dev_stop() before it.
> > dev_reset() will take care of it.
> > But it needs to make sure that no other thread will try to modify that device state
> > (either dev_stop/start, or eth_rx_busrst/eth_tx_burst) while the reset op is in place.
> 
> OK. This description looks different than commit log and API doxygen comment. Please fix it.
> How about a different name for this API. Device reset is too generic?
> 
> >
> > >
> > >
> > > > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > > > Though it definitely seems more convenient from user point of view,
> > > > they would know: to handle VF reset event, they just need to call that
> > > > particular function, not to re-implement their own.
> > > What if driver returns "not implemented" then application will have do
> > > generic rte_eth_dev_stop/rte_eth_dev_start.
> > >That way in application  perspective we are NOT solving any problem.
> >
> > True, but as I said for PF application would just never receive such event.
> What is this event ? Is it VF Link up/down event?
> 
> No I was referring to VF itself, Other VF PMD drivers in drivers/net
> where this callback is not implemented.

Hmm, the only suggestion I have here:
maintainers/developers of non-Intel PMDs implement it for their VFs,
provided of course that they need to handle a similar event.
If not, I suppose there is no harm in leaving it unimplemented.
Konstantin
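
The "not implemented" case discussed above can be sketched as follows. This is a minimal, self-contained illustration and not code from the patch set: `struct port`, `struct port_ops`, `port_reset()` and `app_recover_port()` are hypothetical stand-ins for the ethdev layer and the application, and the stop/start fallback is the generic sequence mentioned in the thread, assumed here to be sufficient for recovery.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the DPDK definitions. */
struct port;

struct port_ops {
	int  (*dev_reset)(struct port *p);  /* optional, may be NULL */
	void (*dev_stop)(struct port *p);
	int  (*dev_start)(struct port *p);
};

struct port {
	const struct port_ops *ops;
	int started;
};

/* Generic layer: report -ENOTSUP when the driver leaves reset unimplemented. */
static int port_reset(struct port *p)
{
	assert(p != NULL && p->ops != NULL);
	if (p->ops->dev_reset == NULL)
		return -ENOTSUP;
	return p->ops->dev_reset(p);
}

/* APP side: prefer the reset op, fall back to the generic stop/start. */
static int app_recover_port(struct port *p)
{
	int ret = port_reset(p);

	if (ret != -ENOTSUP)
		return ret;
	p->ops->dev_stop(p);
	return p->ops->dev_start(p);
}

/* Two toy drivers: one without a reset op, one with it. */
static void demo_stop(struct port *p)  { p->started = 0; }
static int  demo_start(struct port *p) { p->started = 1; return 0; }
static int  demo_reset(struct port *p) { p->started = 1; return 0; }

static const struct port_ops ops_without_reset = { NULL, demo_stop, demo_start };
static const struct port_ops ops_with_reset = { demo_reset, demo_stop, demo_start };
```

With `ops_without_reset` the application ends up in the stop/start path; with `ops_with_reset` it never needs it. As in the thread, the caller is still assumed to keep rx/tx and other control calls off the port while either path runs.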


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21 14:03                           ` Ananyev, Konstantin
@ 2016-06-21 14:29                             ` Jerin Jacob
  2016-06-22  1:35                               ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-21 14:29 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Lu, Wenzhuo, Stephen Hemminger, dev, Richardson, Bruce, Chen,
	Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > > > > > Hi Wenzhuo,
> > > > > >
> > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu wrote:
> > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > > When the PF port down->up, APP should call this API to reset
> > > > > > > > > > > > > VF port. Most likely, APP should call it in its management
> > > > > > > > > > > > > thread and guarantee the thread safe. It means APP should stop
> > > > > > > > > > > > > the rx/tx and the device, then reset the device, then recover
> > > > > > > > > > > > > the device and rx/tx.
> > > > > > > > > > > >
> > > > > > > > > > > > Following is _a_ use-case for Device reset. But may be not be
> > > > > > > > > > > > _the_ use case. IMO, We need to first say expected behavior of
> > > > > > > > > > > > this API and add a use-case later.
> > > > > > > > > > > >
> > > > > > > > > > > > Other use-case would be, PCIe VF with functional level reset for
> > > > > > > > > > > > SRIOV migration.
> > > > > > > > > > > > Are we on same page?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > In my experience with Linux devices, this is normally handled by
> > > > > > > > > > > the device driver in the start routine.  Since any use case which
> > > > > > > > > > > needs this is going to do a stop/reset/start sequence, why not
> > > > > > > > > > > just have the VF device driver do this in the start routine?.
> > > > > > > > > > >
> > > > > > > > > > > Adding yet another API and state transistion if not necessary
> > > > > > > > > > > increases the complexity and required test cases for all devices.
> > > > > > > > > >
> > > > > > > > > > I agree with Stephen here.I think if application needs to call start
> > > > > > > > > > after the device reset then we could add this logic in start itself
> > > > > > > > > > rather exposing a yet another API
> > > > > > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > > > > > > device -> stop queue -> re-setup queue -> start queue -> start device?
> > > > > > > >
> > > > > > > > What was the expected API call sequence when you were introduced this API?
> > > > > > > >
> > > > > > > > Point was to have implicit device reset in the API call sequence(Wherever make
> > > > > > > > sense for specific PMD)
> > > > > > > > I think the API call sequence depends on the
> > > > > > > > implementation of the APP. Let's say if there's not this
> > > > > > > > reset API, APP can use this API call sequence to handle
> > > > > > > > the PF link down/up event, rte_eth_dev_close ->
> > > > > > > > rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > > > > > > > rte_eth_dev_start.
> > > > > > > > Actually our purpose is to use this reset API instead of
> > > > > > > > the API call sequence. You can see the reset API is not
> > > > > > > > necessary. The benefit is to save the code for APP.
> > > > > >
> > > > > > Then I am bit confused with original commit log description.
> > > > > > |
> > > > > > |It means APP should stop the rx/tx and the device, then reset the
> > > > > > |device, then recover the device and rx/tx.
> > > > > > |
> > > > > > I was under impression that it a low level reset API for this device? Is
> > > > > > n't it?
> > > > > >
> > > > > > The other issue is generalized outlook of the API, Certain PMD will not
> > > > > > have PF link down/up event? Link down/up and only connected to VF and PF
> > > > > > only for configuration.
> > > > > >
> > > > > > How about fixing it more transparently in PMD driver itself as
> > > > > > PMD driver knows the PF link up/down event, Is it possible to
> > > > > > recover the VF on that event if its only matter of resetting it?
> > > > >
> > > > > I think we already went through that discussion on the list.
> > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > To achieve that we need to introduce some sort of synchronisation
> > > > > between IO and control APIs (locking or so).
> > > > > Actually I am not sure why having a special reset function will be a problem.
> > > >
> > > > |
> > > > |It means APP should stop the rx/tx and the device, then reset the
> > > > |device, then recover the device and rx/tx.
> > > > |
> > > > Just to understand, If application still need  to do the stop then what
> > > > value addtion reset API brings on the table?
> > >
> > > If application calls dev_reset() it doesn't need to call dev_stop() before it.
> > > dev_reset() will take care of it.
> > > But it needs to make sure that no other thread will try to modify that device state
> > > (either dev_stop/start, or eth_rx_busrst/eth_tx_burst) while the reset op is in place.
> > 
> > OK. This description looks different than commit log and API doxygen comment. Please fix it.
> > How about a different name for this API. Device reset is too generic?
> > 
> > >
> > > >
> > > >
> > > > > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > > > > Though it definitely seems more convenient from user point of view,
> > > > > they would know: to handle VF reset event, they just need to call that
> > > > > particular function, not to re-implement their own.
> > > > What if driver returns "not implemented" then application will have do
> > > > generic rte_eth_dev_stop/rte_eth_dev_start.
> > > >That way in application  perspective we are NOT solving any problem.
> > >
> > > True, but as I said for PF application would just never receive such event.
> > What is this event ? Is it VF Link up/down event?
> > 
> > No I was referring to VF itself, Other VF PMD drivers in drivers/net
> > where this callback is not implemented.
> 
> Hmm, the only suggestion I have here -
> Maintainers/developers of non-Intel PMD will implement it for their VFs?

That's fine. But we have to know what to implement here from the PMD perspective.
That's the reason I keep asking about the API expectations and application usage :-)

> In case of course they do need to handle similar event.
Which event is this, and how does the application get notified of it?

> if not I suppose there is no harm to left it unimplemented.
OK. If it is for the VF/PF link down/up event, then I will make it a 'nop'.

Jerin

> Konstantin
> 


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-21 14:29                             ` Jerin Jacob
@ 2016-06-22  1:35                               ` Lu, Wenzhuo
  2016-06-22  2:37                                 ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-22  1:35 UTC (permalink / raw)
  To: Jerin Jacob, Ananyev, Konstantin
  Cc: Stephen Hemminger, dev, Richardson, Bruce, Chen, Jing D, Liang,
	Cunming, Wu, Jingjing, Zhang, Helin, thomas.monjalon

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, June 21, 2016 10:29 PM
> To: Ananyev, Konstantin
> Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson, Bruce; Chen,
> Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > > > > > Hi Wenzhuo,
> > > > > > >
> > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu
> wrote:
> > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > > > When the PF port down->up, APP should call
> > > > > > > > > > > > > > this API to reset VF port. Most likely, APP
> > > > > > > > > > > > > > should call it in its management thread and
> > > > > > > > > > > > > > guarantee the thread safe. It means APP should
> > > > > > > > > > > > > > stop the rx/tx and the device, then reset the device, then
> recover the device and rx/tx.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Following is _a_ use-case for Device reset. But
> > > > > > > > > > > > > may be not be _the_ use case. IMO, We need to
> > > > > > > > > > > > > first say expected behavior of this API and add a use-case
> later.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Other use-case would be, PCIe VF with functional
> > > > > > > > > > > > > level reset for SRIOV migration.
> > > > > > > > > > > > > Are we on same page?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > In my experience with Linux devices, this is
> > > > > > > > > > > > normally handled by the device driver in the start
> > > > > > > > > > > > routine.  Since any use case which needs this is
> > > > > > > > > > > > going to do a stop/reset/start sequence, why not just have
> the VF device driver do this in the start routine?.
> > > > > > > > > > > >
> > > > > > > > > > > > Adding yet another API and state transistion if
> > > > > > > > > > > > not necessary increases the complexity and required test
> cases for all devices.
> > > > > > > > > > >
> > > > > > > > > > > I agree with Stephen here.I think if application
> > > > > > > > > > > needs to call start after the device reset then we
> > > > > > > > > > > could add this logic in start itself rather exposing
> > > > > > > > > > > a yet another API
> > > > > > > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > > > > > > > device -> stop queue -> re-setup queue -> start queue -> start device?
> > > > > > > > >
> > > > > > > > > What was the expected API call sequence when you were
> introduced this API?
> > > > > > > > >
> > > > > > > > > Point was to have implicit device reset in the API call
> > > > > > > > > sequence(Wherever make sense for specific PMD)
> > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > implementation of the APP. Let's say if there's not this
> > > > > > > > > reset API, APP can use this API call sequence to handle
> > > > > > > > > the PF link down/up event, rte_eth_dev_close ->
> > > > > > > > > rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > > > > > > > > rte_eth_dev_start.
> > > > > > > > > Actually our purpose is to use this reset API instead of
> > > > > > > > > the API call sequence. You can see the reset API is not
> > > > > > > > > necessary. The benefit is to save the code for APP.
> > > > > > >
> > > > > > > Then I am bit confused with original commit log description.
> > > > > > > |
> > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > |
> > > > > > > I was under impression that it a low level reset API for
> > > > > > > this device? Is n't it?
> > > > > > >
> > > > > > > The other issue is generalized outlook of the API, Certain
> > > > > > > PMD will not have PF link down/up event? Link down/up and
> > > > > > > only connected to VF and PF only for configuration.
> > > > > > >
> > > > > > > How about fixing it more transparently in PMD driver itself
> > > > > > > as PMD driver knows the PF link up/down event, Is it
> > > > > > > possible to recover the VF on that event if its only matter of resetting
> it?
> > > > > >
> > > > > > I think we already went through that discussion on the list.
> > > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > > To achieve that we need to introduce some sort of
> > > > > > synchronisation between IO and control APIs (locking or so).
> > > > > > Actually I am not sure why having a special reset function will be a
> problem.
> > > > >
> > > > > |
> > > > > |It means APP should stop the rx/tx and the device, then reset
> > > > > |the device, then recover the device and rx/tx.
> > > > > |
> > > > > Just to understand, If application still need  to do the stop
> > > > > then what value addtion reset API brings on the table?
> > > >
> > > > If application calls dev_reset() it doesn't need to call dev_stop() before it.
> > > > dev_reset() will take care of it.
> > > > But it needs to make sure that no other thread will try to modify
> > > > that device state (either dev_stop/start, or eth_rx_busrst/eth_tx_burst)
> while the reset op is in place.
> > >
> > > OK. This description looks different than commit log and API doxygen
> comment. Please fix it.
> > > How about a different name for this API. Device reset is too generic?
Any suggestions? I used this name because I believe what this API does is reset the device.

> > >
> > > >
> > > > >
> > > > >
> > > > > > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > > > > > Though it definitely seems more convenient from user point of
> > > > > > view, they would know: to handle VF reset event, they just
> > > > > > need to call that particular function, not to re-implement their own.
> > > > > What if driver returns "not implemented" then application will
> > > > >have do  generic rte_eth_dev_stop/rte_eth_dev_start.
> > > > >That way in application  perspective we are NOT solving any problem.
> > > >
> > > > True, but as I said for PF application would just never receive such event.
> > > What is this event ? Is it VF Link up/down event?
> > >
> > > No I was referring to VF itself, Other VF PMD drivers in drivers/net
> > > where this callback is not implemented.
> >
> > Hmm, the only suggestion I have here - Maintainers/developers of
> > non-Intel PMD will implement it for their VFs?
> 
> That's fine. But, We have to know what to implement here in PMD perspective?
> That's reason being asking about the API expectation and application usage :-)
> 
> > In case of course they do need to handle similar event.
> Which is this event and How application get notify it.
When the PF link goes down/up, the PF uses the mailbox to send a message to the VF. The event here means that the VF receives that message from the PF, so the VF knows that the physical link state has changed. As you can see, it's only for the VF; the PF will not receive this kind of message.
We use the callback mechanism to notify the APP. The APP should register a callback function. When the VF driver receives the message, it calls the callback function, so the APP knows about the event.
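
The notification path described above can be sketched as follows. This is only an illustration under stated assumptions, not code from the patch set: `struct vf_port`, `vf_callback_register()`, `vf_mailbox_link_msg()` and `app_on_pf_link_change()` are hypothetical stand-ins. In DPDK itself the application would register through rte_eth_dev_callback_register(); the sketch avoids the real API so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the DPDK definitions. */
enum vf_event { VF_EVENT_PF_LINK_CHANGE };

typedef void (*vf_event_cb)(uint8_t port_id, enum vf_event ev, void *arg);

struct vf_port {
	uint8_t port_id;
	vf_event_cb cb;   /* callback registered by the application */
	void *cb_arg;     /* opaque argument handed back to the callback */
};

/* APP side: register the function to be called on a PF link change. */
static void vf_callback_register(struct vf_port *p, vf_event_cb cb, void *arg)
{
	assert(p != NULL && cb != NULL);
	p->cb = cb;
	p->cb_arg = arg;
}

/* Driver side: run when the VF reads a "PF link changed" mailbox message. */
static void vf_mailbox_link_msg(struct vf_port *p)
{
	if (p->cb != NULL)
		p->cb(p->port_id, VF_EVENT_PF_LINK_CHANGE, p->cb_arg);
}

/* APP callback: only records that a reset is needed. The management
 * thread is assumed to do the actual reset later, once rx/tx on the
 * port has been stopped. */
static void app_on_pf_link_change(uint8_t port_id, enum vf_event ev, void *arg)
{
	int *reset_pending = arg;

	(void)port_id;
	(void)ev;
	*reset_pending = 1;
}
```

Keeping the callback down to setting a flag is a design choice of this sketch: it leaves the reset itself to the management thread, which matches the commit log's expectation that the APP guarantees thread safety.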

> 
> > if not I suppose there is no harm to left it unimplemented.
> OK. If it is for VF/PF link down-up event then I will make it as 'nop'.
As explained above, the event is not the VF/PF link going down/up itself. It is the VF being notified that the PF link went down/up.

And in my opinion, although we currently implement the reset API only for the VF, I believe nothing prevents us from implementing this API for the PF if we find a scenario in which we need to reset the PF link. The reset API is a reset API: it can be used for the event described above, but it is not bound to this event.
> 
> Jerin
> 
> > Konstantin
> >


* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  1:35                               ` Lu, Wenzhuo
@ 2016-06-22  2:37                                 ` Jerin Jacob
  2016-06-22  3:32                                   ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-22  2:37 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Wed, Jun 22, 2016 at 01:35:37AM +0000, Lu, Wenzhuo wrote:
> Hi Jerin,
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, June 21, 2016 10:29 PM
> > To: Ananyev, Konstantin
> > Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson, Bruce; Chen,
> > Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > > > > > Hi Wenzhuo,
> > > > > > > >
> > > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800, Wenzhuo Lu
> > wrote:
> > > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > > > > When the PF port down->up, APP should call
> > > > > > > > > > > > > > > this API to reset VF port. Most likely, APP
> > > > > > > > > > > > > > > should call it in its management thread and
> > > > > > > > > > > > > > > guarantee the thread safe. It means APP should
> > > > > > > > > > > > > > > stop the rx/tx and the device, then reset the device, then
> > recover the device and rx/tx.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Following is _a_ use-case for Device reset. But
> > > > > > > > > > > > > > may be not be _the_ use case. IMO, We need to
> > > > > > > > > > > > > > first say expected behavior of this API and add a use-case
> > later.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Other use-case would be, PCIe VF with functional
> > > > > > > > > > > > > > level reset for SRIOV migration.
> > > > > > > > > > > > > > Are we on same page?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > In my experience with Linux devices, this is
> > > > > > > > > > > > > normally handled by the device driver in the start
> > > > > > > > > > > > > routine.  Since any use case which needs this is
> > > > > > > > > > > > > going to do a stop/reset/start sequence, why not just have
> > the VF device driver do this in the start routine?.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Adding yet another API and state transistion if
> > > > > > > > > > > > > not necessary increases the complexity and required test
> > cases for all devices.
> > > > > > > > > > > >
> > > > > > > > > > > > I agree with Stephen here.I think if application
> > > > > > > > > > > > needs to call start after the device reset then we
> > > > > > > > > > > > could add this logic in start itself rather exposing
> > > > > > > > > > > > a yet another API
> > > > > > > > > > > > Do you mean changing the device_start to include all these actions, stop
> > > > > > > > > > > > device -> stop queue -> re-setup queue -> start queue -> start device?
> > > > > > > > > >
> > > > > > > > > > What was the expected API call sequence when you were
> > introduced this API?
> > > > > > > > > >
> > > > > > > > > > Point was to have implicit device reset in the API call
> > > > > > > > > > sequence(Wherever make sense for specific PMD)
> > > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > > implementation of the APP. Let's say if there's not this
> > > > > > > > > > reset API, APP can use this API call sequence to handle
> > > > > > > > > > the PF link down/up event, rte_eth_dev_close ->
> > > > > > > > > > rte_eth_rx_queue_setup -> rte_eth_tx_queue_setup ->
> > > > > > > > > > rte_eth_dev_start.
> > > > > > > > > > Actually our purpose is to use this reset API instead of
> > > > > > > > > > the API call sequence. You can see the reset API is not
> > > > > > > > > > necessary. The benefit is to save the code for APP.
> > > > > > > >
> > > > > > > > Then I am bit confused with original commit log description.
> > > > > > > > |
> > > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > > |
> > > > > > > > I was under impression that it a low level reset API for
> > > > > > > > this device? Is n't it?
> > > > > > > >
> > > > > > > > The other issue is generalized outlook of the API, Certain
> > > > > > > > PMD will not have PF link down/up event? Link down/up and
> > > > > > > > only connected to VF and PF only for configuration.
> > > > > > > >
> > > > > > > > How about fixing it more transparently in PMD driver itself
> > > > > > > > as PMD driver knows the PF link up/down event, Is it
> > > > > > > > possible to recover the VF on that event if its only matter of resetting
> > it?
> > > > > > >
> > > > > > > I think we already went through that discussion on the list.
> > > > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > > > To achieve that we need to introduce some sort of
> > > > > > > synchronisation between IO and control APIs (locking or so).
> > > > > > > Actually I am not sure why having a special reset function will be a
> > problem.
> > > > > >
> > > > > > |
> > > > > > |It means APP should stop the rx/tx and the device, then reset
> > > > > > |the device, then recover the device and rx/tx.
> > > > > > |
> > > > > > Just to understand, If application still need  to do the stop
> > > > > > then what value addtion reset API brings on the table?
> > > > >
> > > > > If application calls dev_reset() it doesn't need to call dev_stop() before it.
> > > > > dev_reset() will take care of it.
> > > > > But it needs to make sure that no other thread will try to modify
> > > > > that device state (either dev_stop/start, or eth_rx_busrst/eth_tx_burst)
> > while the reset op is in place.
> > > >
> > > > OK. This description looks different than commit log and API doxygen
> > comment. Please fix it.
> > > > How about a different name for this API. Device reset is too generic?
> Any suggestion? I use this name because I believe what this API do is to reset the device.
> 
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > Yes, it would exist only for VFs, for PF it could be left unimplemented.
> > > > > > > Though it definitely seems more convenient from user point of
> > > > > > > view, they would know: to handle VF reset event, they just
> > > > > > > need to call that particular function, not to re-implement their own.
> > > > > > What if driver returns "not implemented" then application will
> > > > > >have do  generic rte_eth_dev_stop/rte_eth_dev_start.
> > > > > >That way in application  perspective we are NOT solving any problem.
> > > > >
> > > > > True, but as I said for PF application would just never receive such event.
> > > > What is this event ? Is it VF Link up/down event?
> > > >
> > > > No I was referring to VF itself, Other VF PMD drivers in drivers/net
> > > > where this callback is not implemented.
> > >
> > > Hmm, the only suggestion I have here - Maintainers/developers of
> > > non-Intel PMD will implement it for their VFs?
> > 
> > That's fine. But, We have to know what to implement here in PMD perspective?
> > That's reason being asking about the API expectation and application usage :-)
> > 
> > > In case of course they do need to handle similar event.
> > Which is this event and How application get notify it.
> When the PF link is down/up, the PF will use the mailbox to send a message to VF. The event here means the VF receives that message from PF. So VF can know the physical link state changed. You see it's only for VF. PF will not receive such kind of message.
> And we use the callback mechanism to let APP notified. APP should register a callback function. When VF driver receives the message it will call the callback function, then APP can know that.

How about standardizing a name for that event, like
RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or
RTE_ETH_EVENT_INTR_PF_LSC or similar (like RTE_ETH_EVENT_INTR_RESET),
and adding a counterpart API in the VF to handle that specific event, with
an API name similar to the selected event name rather than eth_dev_reset?
("reset" sounds more like a HW reset; from a PCIe device perspective, FLR etc.)

OR

How about handling it in a more generic way, where the PF sends the VF a
generic alert message like RTE_ETH_EVENT_INTR_PF_ALERT or similar,
and there is only one handler function on the VF side, so that in future
we can keep adding new functionality without introducing a new counterpart API in the VF?
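
To make the second option concrete, here is a rough sketch of what a
single handler on the VF side could look like. Everything named demo_*
or DEMO_* is hypothetical and made up for illustration; none of it exists
in ethdev today, and the set of alert kinds is only an assumption.

```c
/* Hypothetical alert kinds the PF could carry in its message to the VF. */
enum demo_pf_alert {
	DEMO_PF_ALERT_LINK_CHANGE,	/* PF physical link went down/up */
	DEMO_PF_ALERT_RESET,		/* PF itself was reset */
	DEMO_PF_ALERT_MAX
};

/* Hypothetical actions the VF side could take in response. */
enum demo_vf_action {
	DEMO_VF_ACTION_NONE,
	DEMO_VF_ACTION_RECOVER_LINK,
	DEMO_VF_ACTION_REINIT
};

/*
 * Single entry point on the VF side. Supporting a new alert kind means
 * adding a case here, not adding another public API.
 */
static enum demo_vf_action
demo_vf_handle_pf_alert(enum demo_pf_alert alert)
{
	switch (alert) {
	case DEMO_PF_ALERT_LINK_CHANGE:
		return DEMO_VF_ACTION_RECOVER_LINK;
	case DEMO_PF_ALERT_RESET:
		return DEMO_VF_ACTION_REINIT;
	default:
		/* Unknown alerts are ignored so older VFs keep working. */
		return DEMO_VF_ACTION_NONE;
	}
}
```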

Jerin

> 
> > 
> > > if not I suppose there is no harm to left it unimplemented.
> > OK. If it is for VF/PF link down-up event then I will make it as 'nop'.
> As explained above, the event is not VF/PF link down-up. Actually it's that VF is notified the PF link is down-up.
> 
> And to my opinion, although now we only implement the reset API for VF, I believe there's nothing preventing us to implement this API for PF if we can find some scenario that we need to reset the PF link. The reset API is reset API, it can be used for the event described above. But it's not bound to this event.
> > 
> > Jerin
> > 
> > > Konstantin
> > >

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  2:37                                 ` Jerin Jacob
@ 2016-06-22  3:32                                   ` Lu, Wenzhuo
  2016-06-22  4:14                                     ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-22  3:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, June 22, 2016 10:38 AM
> To: Lu, Wenzhuo
> Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Wed, Jun 22, 2016 at 01:35:37AM +0000, Lu, Wenzhuo wrote:
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Tuesday, June 21, 2016 10:29 PM
> > > To: Ananyev, Konstantin
> > > Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson, Bruce;
> > > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > > thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > device reset
> > >
> > > On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > > > > > Hi Wenzhuo,
> > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800,
> > > > > > > > > > > > > > > Wenzhuo Lu
> > > wrote:
> > > > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > > > > > When the PF port down->up, APP should call
> > > > > > > > > > > > > > > > this API to reset VF port. Most likely,
> > > > > > > > > > > > > > > > APP should call it in its management
> > > > > > > > > > > > > > > > thread and guarantee the thread safe. It
> > > > > > > > > > > > > > > > means APP should stop the rx/tx and the
> > > > > > > > > > > > > > > > device, then reset the device, then
> > > recover the device and rx/tx.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Following is _a_ use-case for Device reset.
> > > > > > > > > > > > > > > But may be not be _the_ use case. IMO, We
> > > > > > > > > > > > > > > need to first say expected behavior of this
> > > > > > > > > > > > > > > API and add a use-case
> > > later.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Other use-case would be, PCIe VF with
> > > > > > > > > > > > > > > functional level reset for SRIOV migration.
> > > > > > > > > > > > > > > Are we on same page?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In my experience with Linux devices, this is
> > > > > > > > > > > > > > normally handled by the device driver in the
> > > > > > > > > > > > > > start routine.  Since any use case which needs
> > > > > > > > > > > > > > this is going to do a stop/reset/start
> > > > > > > > > > > > > > sequence, why not just have
> > > the VF device driver do this in the start routine?.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Adding yet another API and state transistion
> > > > > > > > > > > > > > if not necessary increases the complexity and
> > > > > > > > > > > > > > required test
> > > cases for all devices.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I agree with Stephen here.I think if application
> > > > > > > > > > > > > needs to call start after the device reset then
> > > > > > > > > > > > > we could add this logic in start itself rather
> > > > > > > > > > > > > exposing a yet another API
> > > > > > > > > > > > Do you mean changing the device_start to include
> > > > > > > > > > > > all these actions, stop
> > > > > > > > > > > device -> stop queue -> re-setup queue -> start
> > > > > > > > > > > queue -> start
> > > device ?
> > > > > > > > > > >
> > > > > > > > > > > What was the expected API call sequence when you
> > > > > > > > > > > were
> > > introduced this API?
> > > > > > > > > > >
> > > > > > > > > > > Point was to have implicit device reset in the API
> > > > > > > > > > > call sequence(Wherever make sense for specific PMD)
> > > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > > implementation of the APP. Let's say if there's not
> > > > > > > > > > this reset API, APP can use
> > > > > this
> > > > > > > API
> > > > > > > > > call sequence to handle the PF link down/up event,
> > > > > > > > > rte_eth_dev_close -> rte_eth_rx_queue_setup ->
> > > > > rte_eth_tx_queue_setup -
> > > > > > > >
> > > > > > > > > rte_eth_dev_start.
> > > > > > > > > > Actually our purpose is to use this reset API instead
> > > > > > > > > > of the API call sequence. You can see the reset API is
> > > > > > > > > > not necessary. The
> > > > > > > benefit
> > > > > > > > > is to save the code for APP.
> > > > > > > > >
> > > > > > > > > Then I am bit confused with original commit log description.
> > > > > > > > > |
> > > > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > > > |
> > > > > > > > > I was under impression that it a low level reset API for
> > > > > > > > > this device? Is n't it?
> > > > > > > > >
> > > > > > > > > The other issue is generalized outlook of the API,
> > > > > > > > > Certain PMD will not have PF link down/up event? Link
> > > > > > > > > down/up and only connected to VF and PF only for configuration.
> > > > > > > > >
> > > > > > > > > How about fixing it more transparently in PMD driver
> > > > > > > > > itself as PMD driver knows the PF link up/down event, Is
> > > > > > > > > it possible to recover the VF on that event if its only
> > > > > > > > > matter of resetting
> > > it?
> > > > > > > >
> > > > > > > > I think we already went through that discussion on the list.
> > > > > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > > > > To achieve that we need to introduce some sort of
> > > > > > > > synchronisation between IO and control APIs (locking or so).
> > > > > > > > Actually I am not sure why having a special reset function
> > > > > > > > will be a
> > > problem.
> > > > > > >
> > > > > > > |
> > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > |
> > > > > > > Just to understand, If application still need  to do the
> > > > > > > stop then what value addtion reset API brings on the table?
> > > > > >
> > > > > > If application calls dev_reset() it doesn't need to call dev_stop() before
> it.
> > > > > > dev_reset() will take care of it.
> > > > > > But it needs to make sure that no other thread will try to
> > > > > > modify that device state (either dev_stop/start, or
> > > > > > eth_rx_busrst/eth_tx_burst)
> > > while the reset op is in place.
> > > > >
> > > > > OK. This description looks different than commit log and API
> > > > > doxygen
> > > comment. Please fix it.
> > > > > How about a different name for this API. Device reset is too generic?
> > Any suggestion? I use this name because I believe what this API do is to reset
> the device.
> >
> > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Yes, it would exist only for VFs, for PF it could be left
> unimplemented.
> > > > > > > > Though it definitely seems more convenient from user point
> > > > > > > > of view, they would know: to handle VF reset event, they
> > > > > > > > just need to call that particular function, not to re-implement their
> own.
> > > > > > > What if driver returns "not implemented" then application
> > > > > > >will have do  generic rte_eth_dev_stop/rte_eth_dev_start.
> > > > > > >That way in application  perspective we are NOT solving any problem.
> > > > > >
> > > > > > True, but as I said for PF application would just never receive such event.
> > > > > What is this event ? Is it VF Link up/down event?
> > > > >
> > > > > No I was referring to VF itself, Other VF PMD drivers in
> > > > > drivers/net where this callback is not implemented.
> > > >
> > > > Hmm, the only suggestion I have here - Maintainers/developers of
> > > > non-Intel PMD will implement it for their VFs?
> > >
> > > That's fine. But, We have to know what to implement here in PMD
> perspective?
> > > That's reason being asking about the API expectation and application
> > > usage :-)
> > >
> > > > In case of course they do need to handle similar event.
> > > Which is this event and How application get notify it.
> > When the PF link is down/up, the PF will use the mailbox to send a message to
> VF. The event here means the VF receives that message from PF. So VF can know
> the physical link state changed. You see it's only for VF. PF will not receive such
> kind of message.
> > And we use the callback mechanism to let APP notified. APP should register a
> callback function. When VF driver receives the message it will call the callback
> function, then APP can know that.
> 
> How about the standardizing  a name for that event like
> RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or RTE_ETH_EVENT_INTR_PF_LSC
> or similar (like RTE_ETH_EVENT_INTR_RESET) and counter API in VF to handle
> the specific event whose API name similar to selected event name not
> eth_dev_reset(reset sounds like more like HW reset, In PCIe device perspective
> FLR etc)
> 
> OR
> 
> How about handling in more generic way where a generic alert message send by
> PF to VF like RTE_ETH_EVENT_INTR_PF_ALERT or similar.
> And have only one handle functions in VF side so that in future we can keep
> adding new functionality with out introducing new counter API in VF
> 
> Jerin
I'm lost here. I think these RTE_ETH_EVENT values are used to connect the APP callback functions with the events.
Actually, I want the APP to register a callback function, reset_event_callback, for the reset event, like this:
		/* register reset interrupt callback */
		rte_eth_dev_callback_register(portid,
			RTE_ETH_EVENT_INTR_RESET, reset_event_callback, NULL);
When the VF driver detects the PF link going down/up, it should call _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to invoke the callback provided by the APP, i.e. reset_event_callback here.
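
For illustration only, the register/dispatch pattern described above can be sketched without the DPDK headers. All demo_* and DEMO_* names are hypothetical stand-ins that only mimic the shape of the ethdev callback API; the real types and signatures live in rte_ethdev.h and may differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the ethdev event types; not the real enum. */
enum demo_event_type {
	DEMO_EVENT_INTR_LSC,
	DEMO_EVENT_INTR_RESET,
	DEMO_EVENT_MAX
};

typedef void (*demo_cb_fn)(unsigned int port_id, enum demo_event_type event,
			   void *cb_arg);

struct demo_cb {
	demo_cb_fn fn;
	void *arg;
};

#define DEMO_MAX_PORTS 4

/* One callback slot per (port, event) pair keeps the sketch short. */
static struct demo_cb demo_cbs[DEMO_MAX_PORTS][DEMO_EVENT_MAX];

/* APP side: attach a callback to an event on a port. */
static int
demo_callback_register(unsigned int port_id, enum demo_event_type event,
		       demo_cb_fn fn, void *arg)
{
	if (port_id >= DEMO_MAX_PORTS || event >= DEMO_EVENT_MAX || fn == NULL)
		return -1;
	demo_cbs[port_id][event].fn = fn;
	demo_cbs[port_id][event].arg = arg;
	return 0;
}

/* Driver side: called when the VF gets the mailbox message from the PF. */
static void
demo_callback_process(unsigned int port_id, enum demo_event_type event)
{
	if (port_id >= DEMO_MAX_PORTS || event >= DEMO_EVENT_MAX)
		return;
	if (demo_cbs[port_id][event].fn != NULL)
		demo_cbs[port_id][event].fn(port_id, event,
					    demo_cbs[port_id][event].arg);
}

/* Example APP callback: only records that a reset was requested. */
static void
reset_event_callback(unsigned int port_id, enum demo_event_type event,
		     void *cb_arg)
{
	int *reset_requested = cb_arg;

	(void)port_id;
	(void)event;
	*reset_requested = 1;
}
```

The point of the sketch is that the driver never calls the APP directly; it only raises the event, and whatever the APP registered for that event runs.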

> 
> >
> > >
> > > > if not I suppose there is no harm to left it unimplemented.
> > > OK. If it is for VF/PF link down-up event then I will make it as 'nop'.
> > As explained above, the event is not VF/PF link down-up. Actually it's that VF is
> notified the PF link is down-up.
> >
> > And to my opinion, although now we only implement the reset API for VF, I
> believe there's nothing preventing us to implement this API for PF if we can find
> some scenario that we need to reset the PF link. The reset API is reset API, it can
> be used for the event described above. But it's not bound to this event.
> > >
> > > Jerin
> > >
> > > > Konstantin
> > > >

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  3:32                                   ` Lu, Wenzhuo
@ 2016-06-22  4:14                                     ` Jerin Jacob
  2016-06-22  5:05                                       ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-22  4:14 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Wed, Jun 22, 2016 at 03:32:16AM +0000, Lu, Wenzhuo wrote:
> Hi Jerin,
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 10:38 AM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Wed, Jun 22, 2016 at 01:35:37AM +0000, Lu, Wenzhuo wrote:
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Tuesday, June 21, 2016 10:29 PM
> > > > To: Ananyev, Konstantin
> > > > Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson, Bruce;
> > > > Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > > > thomas.monjalon@6wind.com
> > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > > device reset
> > > >
> > > > On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > > > > > Hi Wenzhuo,
> > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM +0800,
> > > > > > > > > > > > > > > > Wenzhuo Lu
> > > > wrote:
> > > > > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK VF.
> > > > > > > > > > > > > > > > > When the PF port down->up, APP should call
> > > > > > > > > > > > > > > > > this API to reset VF port. Most likely,
> > > > > > > > > > > > > > > > > APP should call it in its management
> > > > > > > > > > > > > > > > > thread and guarantee the thread safe. It
> > > > > > > > > > > > > > > > > means APP should stop the rx/tx and the
> > > > > > > > > > > > > > > > > device, then reset the device, then
> > > > recover the device and rx/tx.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Following is _a_ use-case for Device reset.
> > > > > > > > > > > > > > > > But may be not be _the_ use case. IMO, We
> > > > > > > > > > > > > > > > need to first say expected behavior of this
> > > > > > > > > > > > > > > > API and add a use-case
> > > > later.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Other use-case would be, PCIe VF with
> > > > > > > > > > > > > > > > functional level reset for SRIOV migration.
> > > > > > > > > > > > > > > > Are we on same page?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In my experience with Linux devices, this is
> > > > > > > > > > > > > > > normally handled by the device driver in the
> > > > > > > > > > > > > > > start routine.  Since any use case which needs
> > > > > > > > > > > > > > > this is going to do a stop/reset/start
> > > > > > > > > > > > > > > sequence, why not just have
> > > > the VF device driver do this in the start routine?.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Adding yet another API and state transistion
> > > > > > > > > > > > > > > if not necessary increases the complexity and
> > > > > > > > > > > > > > > required test
> > > > cases for all devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I agree with Stephen here.I think if application
> > > > > > > > > > > > > > needs to call start after the device reset then
> > > > > > > > > > > > > > we could add this logic in start itself rather
> > > > > > > > > > > > > > exposing a yet another API
> > > > > > > > > > > > > Do you mean changing the device_start to include
> > > > > > > > > > > > > all these actions, stop
> > > > > > > > > > > > device -> stop queue -> re-setup queue -> start
> > > > > > > > > > > > queue -> start
> > > > device ?
> > > > > > > > > > > >
> > > > > > > > > > > > What was the expected API call sequence when you
> > > > > > > > > > > > were
> > > > introduced this API?
> > > > > > > > > > > >
> > > > > > > > > > > > Point was to have implicit device reset in the API
> > > > > > > > > > > > call sequence(Wherever make sense for specific PMD)
> > > > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > > > implementation of the APP. Let's say if there's not
> > > > > > > > > > > this reset API, APP can use
> > > > > > this
> > > > > > > > API
> > > > > > > > > > call sequence to handle the PF link down/up event,
> > > > > > > > > > rte_eth_dev_close -> rte_eth_rx_queue_setup ->
> > > > > > rte_eth_tx_queue_setup -
> > > > > > > > >
> > > > > > > > > > rte_eth_dev_start.
> > > > > > > > > > > Actually our purpose is to use this reset API instead
> > > > > > > > > > > of the API call sequence. You can see the reset API is
> > > > > > > > > > > not necessary. The
> > > > > > > > benefit
> > > > > > > > > > is to save the code for APP.
> > > > > > > > > >
> > > > > > > > > > Then I am bit confused with original commit log description.
> > > > > > > > > > |
> > > > > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > > > > |
> > > > > > > > > > I was under impression that it a low level reset API for
> > > > > > > > > > this device? Is n't it?
> > > > > > > > > >
> > > > > > > > > > The other issue is generalized outlook of the API,
> > > > > > > > > > Certain PMD will not have PF link down/up event? Link
> > > > > > > > > > down/up and only connected to VF and PF only for configuration.
> > > > > > > > > >
> > > > > > > > > > How about fixing it more transparently in PMD driver
> > > > > > > > > > itself as PMD driver knows the PF link up/down event, Is
> > > > > > > > > > it possible to recover the VF on that event if its only
> > > > > > > > > > matter of resetting
> > > > it?
> > > > > > > > >
> > > > > > > > > I think we already went through that discussion on the list.
> > > > > > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > > > > > To achieve that we need to introduce some sort of
> > > > > > > > > synchronisation between IO and control APIs (locking or so).
> > > > > > > > > Actually I am not sure why having a special reset function
> > > > > > > > > will be a
> > > > problem.
> > > > > > > >
> > > > > > > > |
> > > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > > |
> > > > > > > > Just to understand, If application still need  to do the
> > > > > > > > stop then what value addtion reset API brings on the table?
> > > > > > >
> > > > > > > If application calls dev_reset() it doesn't need to call dev_stop() before
> > it.
> > > > > > > dev_reset() will take care of it.
> > > > > > > But it needs to make sure that no other thread will try to
> > > > > > > modify that device state (either dev_stop/start, or
> > > > > > > eth_rx_busrst/eth_tx_burst)
> > > > while the reset op is in place.
> > > > > >
> > > > > > OK. This description looks different than commit log and API
> > > > > > doxygen
> > > > comment. Please fix it.
> > > > > > How about a different name for this API. Device reset is too generic?
> > > Any suggestion? I use this name because I believe what this API do is to reset
> > the device.
> > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Yes, it would exist only for VFs, for PF it could be left
> > unimplemented.
> > > > > > > > > Though it definitely seems more convenient from user point
> > > > > > > > > of view, they would know: to handle VF reset event, they
> > > > > > > > > just need to call that particular function, not to re-implement their
> > own.
> > > > > > > > What if driver returns "not implemented" then application
> > > > > > > >will have do  generic rte_eth_dev_stop/rte_eth_dev_start.
> > > > > > > >That way in application  perspective we are NOT solving any problem.
> > > > > > >
> > > > > > > True, but as I said for PF application would just never receive such event.
> > > > > > What is this event ? Is it VF Link up/down event?
> > > > > >
> > > > > > No I was referring to VF itself, Other VF PMD drivers in
> > > > > > drivers/net where this callback is not implemented.
> > > > >
> > > > > Hmm, the only suggestion I have here - Maintainers/developers of
> > > > > non-Intel PMD will implement it for their VFs?
> > > >
> > > > That's fine. But, We have to know what to implement here in PMD
> > perspective?
> > > > That's reason being asking about the API expectation and application
> > > > usage :-)
> > > >
> > > > > In case of course they do need to handle similar event.
> > > > Which is this event and How application get notify it.
> > > When the PF link is down/up, the PF will use the mailbox to send a message to
> > VF. The event here means the VF receives that message from PF. So VF can know
> > the physical link state changed. You see it's only for VF. PF will not receive such
> > kind of message.
> > > And we use the callback mechanism to let APP notified. APP should register a
> > callback function. When VF driver receives the message it will call the callback
> > function, then APP can know that.
> > 
> > How about the standardizing  a name for that event like
> > RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or RTE_ETH_EVENT_INTR_PF_LSC
> > or similar (like RTE_ETH_EVENT_INTR_RESET) and counter API in VF to handle
> > the specific event whose API name similar to selected event name not
> > eth_dev_reset(reset sounds like more like HW reset, In PCIe device perspective
> > FLR etc)
> > 
> > OR
> > 
> > How about handling in more generic way where a generic alert message send by
> > PF to VF like RTE_ETH_EVENT_INTR_PF_ALERT or similar.
> > And have only one handle functions in VF side so that in future we can keep
> > adding new functionality with out introducing new counter API in VF
> > 
> > Jerin
> Lost here. I think these RTE_ETH_EVENTs are used to connect the APP call back functions with the events.
> Actually I want the APP to register a callback function reset_event_callback for the reset event. Like this,
> 		/* register reset interrupt callback */
> 		rte_eth_dev_callback_register(portid,
> 			RTE_ETH_EVENT_INTR_RESET, reset_event_callback, NULL);
> And when the VF driver finds PF link down/up, it  should  use _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into the callback which is provided by APP. Means reset_event_callback here.

Me too. There is an existing RTE_ETH_EVENT_INTR_RESET event to notify of a PF
reset. I guess either it is not meant for the PF link change, or it is for a
generic VF reset request initiated by the PF for everything.

file: lib/librte_ether/rte_ethdev.h
        RTE_ETH_EVENT_INTR_RESET,
		/**< reset interrupt event, sent to VF on PF reset */
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the application needs to call rte_ethdev_reset() on the RTE_ETH_EVENT_INTR_RESET
event, then please mention it in the commit log or API description.
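
As a rough sketch of the application usage I would expect to see
documented: the event callback only sets a flag, and the management thread
quiesces rx/tx before calling the reset, since nothing else may touch the
device while the reset is in progress. This assumes the callback can run
in a different thread from the data path. demo_dev_reset() is a
hypothetical stand-in for the proposed reset API, and all other demo_*
names are made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool reset_pending;	/* set from the event callback */
static atomic_bool io_paused;		/* management thread asks workers to pause */
static atomic_int io_in_flight;		/* workers currently inside an rx/tx burst */
static int reset_count;			/* how often the stand-in reset ran */

/* Hypothetical stand-in for the proposed device reset API. */
static int
demo_dev_reset(unsigned int port_id)
{
	(void)port_id;
	reset_count++;
	return 0;
}

/* Event callback: only flag the request, do not reset from here. */
static void
demo_reset_event_cb(void)
{
	atomic_store(&reset_pending, true);
}

/* Worker thread: one poll iteration; returns false if it was skipped. */
static bool
demo_io_poll(void)
{
	atomic_fetch_add(&io_in_flight, 1);
	if (atomic_load(&io_paused)) {
		atomic_fetch_sub(&io_in_flight, 1);
		return false;
	}
	/* The rx/tx burst calls would go here. */
	atomic_fetch_sub(&io_in_flight, 1);
	return true;
}

/* Management thread: returns 1 if idle, else the reset result. */
static int
demo_mgmt_poll(unsigned int port_id)
{
	int ret;

	if (!atomic_exchange(&reset_pending, false))
		return 1;

	atomic_store(&io_paused, true);
	while (atomic_load(&io_in_flight) != 0)
		;	/* wait until every worker has left its burst */

	ret = demo_dev_reset(port_id);
	atomic_store(&io_paused, false);
	return ret;
}
```

Each worker announces itself in io_in_flight before checking io_paused,
so the management thread either sees the worker in flight and waits, or
the worker sees the pause and backs off.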

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  4:14                                     ` Jerin Jacob
@ 2016-06-22  5:05                                       ` Lu, Wenzhuo
  2016-06-22  6:10                                         ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-22  5:05 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, June 22, 2016 12:15 PM
> To: Lu, Wenzhuo
> Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Wed, Jun 22, 2016 at 03:32:16AM +0000, Lu, Wenzhuo wrote:
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, June 22, 2016 10:38 AM
> > > To: Lu, Wenzhuo
> > > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org;
> > > Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > > Zhang, Helin; thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > device reset
> > >
> > > On Wed, Jun 22, 2016 at 01:35:37AM +0000, Lu, Wenzhuo wrote:
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Tuesday, June 21, 2016 10:29 PM
> > > > > To: Ananyev, Konstantin
> > > > > Cc: Lu, Wenzhuo; Stephen Hemminger; dev@dpdk.org; Richardson,
> > > > > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > > > > thomas.monjalon@6wind.com
> > > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > > > device reset
> > > > >
> > > > > On Tue, Jun 21, 2016 at 02:03:15PM +0000, Ananyev, Konstantin wrote:
> > > > > >
> > > > > >
> > > > > > > > > > > Hi Wenzhuo,
> > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Jun 20, 2016 at 02:24:27PM
> > > > > > > > > > > > > > > > > +0800, Wenzhuo Lu
> > > > > wrote:
> > > > > > > > > > > > > > > > > > Add an API to reset the device.
> > > > > > > > > > > > > > > > > > It's for VF device in this scenario, kernel PF + DPDK
> VF.
> > > > > > > > > > > > > > > > > > When the PF port down->up, APP should
> > > > > > > > > > > > > > > > > > call this API to reset VF port. Most
> > > > > > > > > > > > > > > > > > likely, APP should call it in its
> > > > > > > > > > > > > > > > > > management thread and guarantee the
> > > > > > > > > > > > > > > > > > thread safe. It means APP should stop
> > > > > > > > > > > > > > > > > > the rx/tx and the device, then reset
> > > > > > > > > > > > > > > > > > the device, then
> > > > > recover the device and rx/tx.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Following is _a_ use-case for Device reset.
> > > > > > > > > > > > > > > > > But may be not be _the_ use case. IMO,
> > > > > > > > > > > > > > > > > We need to first say expected behavior
> > > > > > > > > > > > > > > > > of this API and add a use-case
> > > > > later.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Other use-case would be, PCIe VF with
> > > > > > > > > > > > > > > > > functional level reset for SRIOV migration.
> > > > > > > > > > > > > > > > > Are we on same page?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In my experience with Linux devices, this
> > > > > > > > > > > > > > > > is normally handled by the device driver
> > > > > > > > > > > > > > > > in the start routine.  Since any use case
> > > > > > > > > > > > > > > > which needs this is going to do a
> > > > > > > > > > > > > > > > stop/reset/start sequence, why not just
> > > > > > > > > > > > > > > > have
> > > > > the VF device driver do this in the start routine?.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Adding yet another API and state
> > > > > > > > > > > > > > > > transistion if not necessary increases the
> > > > > > > > > > > > > > > > complexity and required test
> > > > > cases for all devices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree with Stephen here.I think if
> > > > > > > > > > > > > > > application needs to call start after the
> > > > > > > > > > > > > > > device reset then we could add this logic in
> > > > > > > > > > > > > > > start itself rather exposing a yet another
> > > > > > > > > > > > > > > API
> > > > > > > > > > > > > > Do you mean changing the device_start to
> > > > > > > > > > > > > > include all these actions, stop
> > > > > > > > > > > > > device -> stop queue -> re-setup queue -> start
> > > > > > > > > > > > > queue -> start
> > > > > device ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > What was the expected API call sequence when you
> > > > > > > > > > > > > were
> > > > > introduced this API?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Point was to have implicit device reset in the
> > > > > > > > > > > > > API call sequence(Wherever make sense for
> > > > > > > > > > > > > specific PMD)
> > > > > > > > > > > > > I think the API call sequence depends on the
> > > > > > > > > > > > > implementation of the APP. Let's say if there's
> > > > > > > > > > > > > not this reset API, APP can use this API call
> > > > > > > > > > > > > sequence to handle the PF link down/up event,
> > > > > > > > > > > > > rte_eth_dev_close -> rte_eth_rx_queue_setup ->
> > > > > > > > > > > > > rte_eth_tx_queue_setup -> rte_eth_dev_start.
> > > > > > > > > > > > > Actually our purpose is to use this reset API
> > > > > > > > > > > > > instead of the API call sequence. You can see the
> > > > > > > > > > > > > reset API is not necessary. The benefit is to save
> > > > > > > > > > > > > the code for APP.
> > > > > > > > > > >
> > > > > > > > > > > Then I am bit confused with original commit log description.
> > > > > > > > > > > |
> > > > > > > > > > > |It means APP should stop the rx/tx and the device,
> > > > > > > > > > > |then reset the device, then recover the device and rx/tx.
> > > > > > > > > > > |
> > > > > > > > > > > I was under impression that it a low level reset API
> > > > > > > > > > > for this device? Is n't it?
> > > > > > > > > > >
> > > > > > > > > > > The other issue is generalized outlook of the API,
> > > > > > > > > > > Certain PMD will not have PF link down/up event?
> > > > > > > > > > > Link down/up and only connected to VF and PF only for
> configuration.
> > > > > > > > > > >
> > > > > > > > > > > How about fixing it more transparently in PMD driver
> > > > > > > > > > > itself as PMD driver knows the PF link up/down
> > > > > > > > > > > event, Is it possible to recover the VF on that
> > > > > > > > > > > event if its only matter of resetting
> > > > > it?
> > > > > > > > > >
> > > > > > > > > > I think we already went through that discussion on the list.
> > > > > > > > > > Unfortunately with current dpdk design it is hardly possible.
> > > > > > > > > > To achieve that we need to introduce some sort of
> > > > > > > > > > synchronisation between IO and control APIs (locking or so).
> > > > > > > > > > Actually I am not sure why having a special reset
> > > > > > > > > > function will be a
> > > > > problem.
> > > > > > > > >
> > > > > > > > > |
> > > > > > > > > |It means APP should stop the rx/tx and the device, then
> > > > > > > > > |reset the device, then recover the device and rx/tx.
> > > > > > > > > |
> > > > > > > > > Just to understand, If application still need  to do the
> > > > > > > > > stop then what value addtion reset API brings on the table?
> > > > > > > >
> > > > > > > > If application calls dev_reset() it doesn't need to call
> > > > > > > > dev_stop() before
> > > it.
> > > > > > > > dev_reset() will take care of it.
> > > > > > > > But it needs to make sure that no other thread will try to
> > > > > > > > modify that device state (either dev_stop/start, or
> > > > > > > > eth_rx_busrst/eth_tx_burst)
> > > > > while the reset op is in place.
> > > > > > >
> > > > > > > OK. This description looks different than commit log and API
> > > > > > > doxygen
> > > > > comment. Please fix it.
> > > > > > > How about a different name for this API. Device reset is too generic?
> > > > Any suggestion? I use this name because I believe what this API do
> > > > is to reset
> > > the device.
> > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Yes, it would exist only for VFs, for PF it could be
> > > > > > > > > > left
> > > unimplemented.
> > > > > > > > > > Though it definitely seems more convenient from user
> > > > > > > > > > point of view, they would know: to handle VF reset
> > > > > > > > > > event, they just need to call that particular
> > > > > > > > > > function, not to re-implement their
> > > own.
> > > > > > > > > What if driver returns "not implemented" then
> > > > > > > > >application will have do  generic
> rte_eth_dev_stop/rte_eth_dev_start.
> > > > > > > > >That way in application  perspective we are NOT solving any
> problem.
> > > > > > > >
> > > > > > > > True, but as I said for PF application would just never receive such
> event.
> > > > > > > What is this event ? Is it VF Link up/down event?
> > > > > > >
> > > > > > > No I was referring to VF itself, Other VF PMD drivers in
> > > > > > > drivers/net where this callback is not implemented.
> > > > > >
> > > > > > Hmm, the only suggestion I have here - Maintainers/developers
> > > > > > of non-Intel PMD will implement it for their VFs?
> > > > >
> > > > > That's fine. But, We have to know what to implement here in PMD
> > > perspective?
> > > > > That's reason being asking about the API expectation and
> > > > > application usage :-)
> > > > >
> > > > > > In case of course they do need to handle similar event.
> > > > > Which is this event and How application get notify it.
> > > > > When the PF link is down/up, the PF will use the mailbox to send a
> > > > > message to VF. The event here means the VF receives that message
> > > > > from PF. So VF can know the physical link state changed. You see
> > > > > it's only for VF. PF will not receive such kind of message.
> > > > > And we use the callback mechanism to let APP notified. APP should
> > > > > register a callback function. When VF driver receives the message
> > > > > it will call the callback function, then APP can know that.
> > >
> > > How about the standardizing  a name for that event like
> > > RTE_ETH_EVENT_INTR_DOWNSTREAM_LSC or
> RTE_ETH_EVENT_INTR_PF_LSC or
> > > similar (like RTE_ETH_EVENT_INTR_RESET) and counter API in VF to
> > > handle the specific event whose API name similar to selected event
> > > name not eth_dev_reset(reset sounds like more like HW reset, In PCIe
> > > device perspective FLR etc)
> > >
> > > OR
> > >
> > > How about handling in more generic way where a generic alert message
> > > send by PF to VF like RTE_ETH_EVENT_INTR_PF_ALERT or similar.
> > > And have only one handle functions in VF side so that in future we
> > > can keep adding new functionality with out introducing new counter
> > > API in VF
> > >
> > > Jerin
> > Lost here. I think these RTE_ETH_EVENTs are used to connect the APP
> > call back functions with the events.
> > Actually I want the APP to register a callback function
> > reset_event_callback for the reset event. Like this,
> > 		/* register reset interrupt callback */
> > 		rte_eth_dev_callback_register(portid,
> > 			RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > 			NULL);
> > And when the VF driver finds PF link down/up, it should use
> > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into
> > the callback which is provided by APP. Means reset_event_callback here.
> 
> me too. Their is existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
> reset.I guess it is not for the PF link change or it isfor generic VF reset request
> initiated by PF for everything.
I think this event is for device reset in general, for a VF as well as for a PF. We can use it whenever the driver wants the APP to reset the device. A VF reset caused by PF link down/up is one such case.

> 
> file: lib/librte_ether/rte_ethdev.h
>         RTE_ETH_EVENT_INTR_RESET,
> 		/**< reset interrupt event, sent to VF on PF reset */
>                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> if application need to call rte_ethdev_reset() on  RTE_ETH_EVENT_INTR_RESET
> event then please mention it commit log or API description.
Good suggestion. I'll try to find a good place to add more explanation.
> 

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  5:05                                       ` Lu, Wenzhuo
@ 2016-06-22  6:10                                         ` Jerin Jacob
  2016-06-22  6:42                                           ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-22  6:10 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Wed, Jun 22, 2016 at 05:05:14AM +0000, Lu, Wenzhuo wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 12:15 PM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Wed, Jun 22, 2016 at 03:32:16AM +0000, Lu, Wenzhuo wrote:
> > > > Lost here. I think these RTE_ETH_EVENTs are used to connect the APP
> > > > call back functions with the events.
> > > > Actually I want the APP to register a callback function
> > > > reset_event_callback for the reset event. Like this,
> > > > 		/* register reset interrupt callback */
> > > > 		rte_eth_dev_callback_register(portid,
> > > > 			RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > > > 			NULL);
> > > > And when the VF driver finds PF link down/up, it should use
> > > > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run into
> > > > the callback which is provided by APP. Means reset_event_callback here.
> > 
> > me too. Their is existing RTE_ETH_EVENT_INTR_RESET event to notify the PF
> > reset.I guess it is not for the PF link change or it isfor generic VF reset request
> > initiated by PF for everything.
> I think this event is for device reset not only for PF but also can for VF. I think we can use this event when the driver want the APP to reset the device. The PF link down/up caused VF reset is one of the cases.

Then please correct the description of RTE_ETH_EVENT_INTR_RESET
in lib/librte_ether/rte_ethdev.h:
"/**< reset interrupt event, sent to VF on PF reset */"

> 
> > 
> > file: lib/librte_ether/rte_ethdev.h
> >         RTE_ETH_EVENT_INTR_RESET,
> > 		/**< reset interrupt event, sent to VF on PF reset */
> >                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > if application need to call rte_ethdev_reset() on  RTE_ETH_EVENT_INTR_RESET
> > event then please mention it commit log or API description.
> Good suggestion. I'll try to find where's the good place to add more explanation.

I guess the reset API can then be renamed to rte_ethdev_process_reset_intr() or
similar, to reflect the use case (an API called by the application on a reset
event from the PF).

For PMDs where the PF does not generate RTE_ETH_EVENT_INTR_RESET to the VF,
the VF's reset PMD callback shall be a 'nop'.

Jerin

> > 
> 

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  6:10                                         ` Jerin Jacob
@ 2016-06-22  6:42                                           ` Lu, Wenzhuo
  2016-06-22  7:59                                             ` Jerin Jacob
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-22  6:42 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

Hi Jerin,


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, June 22, 2016 2:10 PM
> To: Lu, Wenzhuo
> Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Wed, Jun 22, 2016 at 05:05:14AM +0000, Lu, Wenzhuo wrote:
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, June 22, 2016 12:15 PM
> > > To: Lu, Wenzhuo
> > > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org;
> > > Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > > Zhang, Helin; thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > device reset
> > >
> > > On Wed, Jun 22, 2016 at 03:32:16AM +0000, Lu, Wenzhuo wrote:
> > > > > Lost here. I think these RTE_ETH_EVENTs are used to connect the
> > > > > APP call back functions with the events.
> > > > > Actually I want the APP to register a callback function
> > > > > reset_event_callback for the reset event. Like this,
> > > > > 		/* register reset interrupt callback */
> > > > > 		rte_eth_dev_callback_register(portid,
> > > > > 			RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > > > > 			NULL);
> > > > > And when the VF driver finds PF link down/up, it should use
> > > > > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run
> > > > > into the callback which is provided by APP. Means reset_event_callback here.
> > >
> > > me too. Their is existing RTE_ETH_EVENT_INTR_RESET event to notify
> > > the PF reset.I guess it is not for the PF link change or it isfor
> > > generic VF reset request initiated by PF for everything.
> > I think this event is for device reset not only for PF but also can for VF. I think
> we can use this event when the driver want the APP to reset the device. The PF
> link down/up caused VF reset is one of the cases.
> 
> Then please correct description for the RTE_ETH_EVENT_INTR_RESET in
> lib/librte_ether/rte_ethdev.h "/**< reset interrupt event, sent to VF on PF reset
> */"
> 
> >
> > >
> > > file: lib/librte_ether/rte_ethdev.h
> > >         RTE_ETH_EVENT_INTR_RESET,
> > > 		/**< reset interrupt event, sent to VF on PF reset */
> > >
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > if application need to call rte_ethdev_reset() on
> > > RTE_ETH_EVENT_INTR_RESET event then please mention it commit log or
> API description.
> > Good suggestion. I'll try to find where's the good place to add more
> explanation.
> 
> I guess then reset API can be changed to rte_ethdev_process_reset_intr() or
> similar to reflect the use case(API called by application on reset event from PF)
> 
> The PMDs were PF does not generate the RTE_ETH_EVENT_INTR_RESET to VF
> then VF's reset PMD callback shall be a 'nop'
> 
> Jerin
But I don't think it's appropriate to bind the RTE_ETH_EVENTs to the APIs. This patch set provides a reset API to reset the device. That doesn't mean the reset API can only be used when the APP hits the RTE_ETH_EVENT_INTR_RESET event. I can add some comments suggesting that the user call the reset API at that time, but I think the APP can call the reset API any time it thinks it's necessary. So I don't like the name *process_reset_intr*: it hints that this API is only for the INTR_RESET event.
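
The separation argued for here can be sketched roughly as follows. This is a minimal, self-contained mock and not DPDK code: the mock_* names, struct mock_port and app_poll() are hypothetical stand-ins for rte_eth_dev_callback_register(), _rte_eth_dev_callback_process() and the proposed reset API. It also assumes the APP callback only records the request and the APP performs the reset later from its own loop, which is one possible design and not something the patch set mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ethdev event types (not the real DPDK enum). */
enum mock_event { MOCK_EVENT_INTR_LSC, MOCK_EVENT_INTR_RESET, MOCK_EVENT_MAX };

typedef void (*mock_event_cb)(int port_id, enum mock_event ev, void *arg);

struct mock_port {
	mock_event_cb cb[MOCK_EVENT_MAX];  /* one APP callback per event */
	void *cb_arg[MOCK_EVENT_MAX];
	int reset_count;                   /* how many resets were performed */
};

/* APP side: attach a callback to one event. */
static int mock_callback_register(struct mock_port *p, enum mock_event ev,
				  mock_event_cb cb, void *arg)
{
	assert(p != NULL);
	if (ev >= MOCK_EVENT_MAX || cb == NULL)
		return -1;
	p->cb[ev] = cb;
	p->cb_arg[ev] = arg;
	return 0;
}

/* Driver side: invoked when the VF sees the PF mailbox message. */
static void mock_callback_process(struct mock_port *p, int port_id,
				  enum mock_event ev)
{
	if (ev < MOCK_EVENT_MAX && p->cb[ev] != NULL)
		p->cb[ev](port_id, ev, p->cb_arg[ev]);
}

/* The reset API itself: it knows nothing about events. */
static int mock_dev_reset(struct mock_port *p)
{
	p->reset_count++;
	return 0;
}

/* APP callback: only records that a reset was requested. */
static void reset_event_callback(int port_id, enum mock_event ev, void *arg)
{
	(void)port_id;
	(void)ev;
	*(bool *)arg = true;
}

/* APP main-loop step: performs the reset outside the callback. */
static void app_poll(struct mock_port *p, bool *reset_pending)
{
	if (*reset_pending) {
		mock_dev_reset(p);
		*reset_pending = false;
	}
}
```

In this sketch the event only tells the APP that a reset is advisable; mock_dev_reset() can equally be called with no event at all, which is the point about not tying the API name to INTR_RESET.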

> 
> > >
> >

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  6:42                                           ` Lu, Wenzhuo
@ 2016-06-22  7:59                                             ` Jerin Jacob
  2016-06-22  8:17                                               ` Thomas Monjalon
  0 siblings, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-22  7:59 UTC (permalink / raw)
  To: Lu, Wenzhuo
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin,
	thomas.monjalon

On Wed, Jun 22, 2016 at 06:42:43AM +0000, Lu, Wenzhuo wrote:
> Hi Jerin,
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, June 22, 2016 2:10 PM
> > To: Lu, Wenzhuo
> > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org; Richardson,
> > Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin;
> > thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> > 
> > On Wed, Jun 22, 2016 at 05:05:14AM +0000, Lu, Wenzhuo wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > Sent: Wednesday, June 22, 2016 12:15 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org;
> > > > Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing;
> > > > Zhang, Helin; thomas.monjalon@6wind.com
> > > > Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support
> > > > device reset
> > > >
> > > > On Wed, Jun 22, 2016 at 03:32:16AM +0000, Lu, Wenzhuo wrote:
> > > > > > Lost here. I think these RTE_ETH_EVENTs are used to connect the
> > > > > > APP call back functions with the events.
> > > > > > Actually I want the APP to register a callback function
> > > > > > reset_event_callback for the reset event. Like this,
> > > > > > 		/* register reset interrupt callback */
> > > > > > 		rte_eth_dev_callback_register(portid,
> > > > > > 			RTE_ETH_EVENT_INTR_RESET, reset_event_callback,
> > > > > > 			NULL);
> > > > > > And when the VF driver finds PF link down/up, it should use
> > > > > > _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET) to run
> > > > > > into the callback which is provided by APP. Means reset_event_callback here.
> > > >
> > > > me too. Their is existing RTE_ETH_EVENT_INTR_RESET event to notify
> > > > the PF reset.I guess it is not for the PF link change or it isfor
> > > > generic VF reset request initiated by PF for everything.
> > > I think this event is for device reset not only for PF but also can for VF. I think
> > we can use this event when the driver want the APP to reset the device. The PF
> > link down/up caused VF reset is one of the cases.
> > 
> > Then please correct description for the RTE_ETH_EVENT_INTR_RESET in
> > lib/librte_ether/rte_ethdev.h "/**< reset interrupt event, sent to VF on PF reset
> > */"
> > 
> > >
> > > >
> > > > file: lib/librte_ether/rte_ethdev.h
> > > >         RTE_ETH_EVENT_INTR_RESET,
> > > > 		/**< reset interrupt event, sent to VF on PF reset */
> > > >
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > if application need to call rte_ethdev_reset() on
> > > > RTE_ETH_EVENT_INTR_RESET event then please mention it commit log or
> > API description.
> > > Good suggestion. I'll try to find where's the good place to add more
> > explanation.
> > 
> > I guess then reset API can be changed to rte_ethdev_process_reset_intr() or
> > similar to reflect the use case(API called by application on reset event from PF)
> > 
> > The PMDs were PF does not generate the RTE_ETH_EVENT_INTR_RESET to VF
> > then VF's reset PMD callback shall be a 'nop'
> > 
> > Jerin
> But I don't think it's appropriate to bind the RTE_ETH_EVENTs with the APIs. This patch set provide a reset API to reset the device. Don't mean this reset API only can be used when the APP hit the event RTE_ETH_EVENT_INTR_RESET. I can add some comments to suggest the user to call the reset API at that time. But I think APP can call the reset API anytime when it thinks it's necessary. So I don't like the name *process_reset_intr*, it hints that this API is only for the INTR_RESET event.

That's where the scope of the API and the PMD implementation is not clear.
Can you tell me any other use case where we need to call this API from the application?
The name rte_ethdev_reset() is too generic. If you are going with that
generic name then you may need to add a lot of detail to the API description.

Thomas,
As the librte_ether maintainer, any comments on this?


> 
> > 
> > > >
> > >

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  7:59                                             ` Jerin Jacob
@ 2016-06-22  8:17                                               ` Thomas Monjalon
  2016-06-22  8:25                                                 ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Monjalon @ 2016-06-22  8:17 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Lu, Wenzhuo, Ananyev, Konstantin, Stephen Hemminger, dev,
	Richardson, Bruce, Chen, Jing D, Liang, Cunming, Wu, Jingjing,
	Zhang, Helin

2016-06-22 13:29, Jerin Jacob:
> Thomas,
> As a librte_ether maintainer any comments on this?

+1 for adding details and making sure the naming is good.
I don't really need to comment here because I already made this
comment earlier:
	http://dpdk.org/ml/archives/dev/2016-June/041845.html
Thank you for insisting.

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  8:17                                               ` Thomas Monjalon
@ 2016-06-22  8:25                                                 ` Lu, Wenzhuo
  2016-06-22  9:18                                                   ` Thomas Monjalon
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-22  8:25 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob
  Cc: Ananyev, Konstantin, Stephen Hemminger, dev, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin

Hi Thomas,


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, June 22, 2016 4:17 PM
> To: Jerin Jacob
> Cc: Lu, Wenzhuo; Ananyev, Konstantin; Stephen Hemminger; dev@dpdk.org;
> Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> 2016-06-22 13:29, Jerin Jacob:
> > Thomas,
> > As a librte_ether maintainer any comments on this?
> 
> +1 for adding details and make sure naming is good.
> I don't really need to comment here because I have already done this comment
> earlier:
> 	http://dpdk.org/ml/archives/dev/2016-June/041845.html
> Thank you for insisting.
I've added some details in this patch set. If it's not enough, please let me know.
And I think this discussion is about what the API name should be. Actually I think all the existing names describe what the API does, not when and where it should be used, like dev_start/stop.
But anyway I'm open to changing the name. Is process_reset_intr the name you prefer? Thanks.

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  8:25                                                 ` Lu, Wenzhuo
@ 2016-06-22  9:18                                                   ` Thomas Monjalon
  2016-06-22 11:06                                                     ` Jerin Jacob
  2016-06-23  0:39                                                     ` Lu, Wenzhuo
  0 siblings, 2 replies; 72+ messages in thread
From: Thomas Monjalon @ 2016-06-22  9:18 UTC (permalink / raw)
  To: Lu, Wenzhuo, Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Stephen Hemminger, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin

2016-06-22 08:25, Lu, Wenzhuo:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-06-22 13:29, Jerin Jacob:
> > > Thomas,
> > > As a librte_ether maintainer any comments on this?
> > 
> > +1 for adding details and make sure naming is good.
> > I don't really need to comment here because I have already done this comment
> > earlier:
> > 	http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > Thank you for insisting.
> I've add some details in this patch set. If it's not enough, please let me know.
> And I think this discussion is about what the API name should be like. Actually I think all the existing name is describing what is done by the API not when and where it should be used, like dev_start/stop.

You're right, I overlooked it:

+ * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
+ * queues, restart the port.

Jerin, which detail do you think is needed?

Wenzhuo, why is this function needed?
All these actions are already possible independently.
When looking at the ixgbe implementation, I see:
	ixgbevf_dev_stats_reset() which is not documented in the API
	rte_delay_ms(1000);
	do {} while
These look like hacks.
If you really need some workarounds to handle some tricky situations,
maybe the API is not detailed enough.

> But anyway I'm open for changing the name. Is the name process_reset_intr you prefer? Thanks.

Not sure.
If you really intend to add a generic reset, maybe rte_eth_dev_reset()
is a good name. We just need more justification.
After reading the doc, the user can understand it is just a wrapper of
existing functions. But it appears in the code that it does more and can
help in some situations.

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  9:18                                                   ` Thomas Monjalon
@ 2016-06-22 11:06                                                     ` Jerin Jacob
  2016-06-23  0:45                                                       ` Lu, Wenzhuo
  2016-06-23  0:39                                                     ` Lu, Wenzhuo
  1 sibling, 1 reply; 72+ messages in thread
From: Jerin Jacob @ 2016-06-22 11:06 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Lu, Wenzhuo, dev, Ananyev, Konstantin, Stephen Hemminger,
	Richardson, Bruce, Chen, Jing D, Liang, Cunming, Wu, Jingjing,
	Zhang, Helin

On Wed, Jun 22, 2016 at 11:18:21AM +0200, Thomas Monjalon wrote:
> 2016-06-22 08:25, Lu, Wenzhuo:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-06-22 13:29, Jerin Jacob:
> > > > Thomas,
> > > > As a librte_ether maintainer any comments on this?
> > > 
> > > +1 for adding details and make sure naming is good.
> > > I don't really need to comment here because I have already done this comment
> > > earlier:
> > > 	http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > > Thank you for insisting.
> > I've add some details in this patch set. If it's not enough, please let me know.
> > And I think this discussion is about what the API name should be like. Actually I think all the existing name is describing what is done by the API not when and where it should be used, like dev_start/stop.
> 
> You're right, I overlooked it:
> 
> + * The API will stop the port, clear the rx/tx queues, re-setup the rx/tx
> + * queues, restart the port.
> 
> Jerin, which detail do you think is needed?

When to use what? In what scenarios does the application need to use
generic stop/start vs this new API?

How about calling it rte_eth_dev_restart()?

If the existing stop followed by start is functionally the same as the
new API, how about having a generic implementation of rte_eth_dev_restart()
for when PMD-specific restart handlers are NOT found?

That way the application needs to call only rte_eth_dev_restart() for a
port restart. It can internally decide between an optimized stop/start
and the generic restart.
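
A minimal sketch of that fallback, assuming a per-driver ops table with an optional restart handler. Everything here is a hypothetical mock (struct mock_dev, struct mock_dev_ops, mock_eth_dev_restart() and the toy_* drivers); it is not the ethdev ops structure and not an existing DPDK function.

```c
#include <assert.h>
#include <stddef.h>

struct mock_dev;

/* Hypothetical per-driver ops table; dev_restart is optional. */
struct mock_dev_ops {
	int  (*dev_start)(struct mock_dev *dev);
	void (*dev_stop)(struct mock_dev *dev);
	int  (*dev_restart)(struct mock_dev *dev);
};

struct mock_dev {
	const struct mock_dev_ops *ops;
	int started;                  /* 1 while the port is running */
	int stops, starts, restarts;  /* call counters, for illustration only */
};

/*
 * Generic restart: use the PMD-specific handler when the driver provides
 * one, otherwise fall back to a plain stop followed by start.
 */
static int mock_eth_dev_restart(struct mock_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);

	if (dev->ops->dev_restart != NULL)
		return dev->ops->dev_restart(dev);

	if (dev->ops->dev_stop == NULL || dev->ops->dev_start == NULL)
		return -1;  /* neither path is available */

	dev->ops->dev_stop(dev);
	return dev->ops->dev_start(dev);
}

/* Toy driver callbacks that only update the counters. */
static int toy_start(struct mock_dev *d)
{
	d->started = 1;
	d->starts++;
	return 0;
}

static void toy_stop(struct mock_dev *d)
{
	d->started = 0;
	d->stops++;
}

static int toy_restart(struct mock_dev *d)
{
	d->started = 1;
	d->restarts++;
	return 0;
}

/* A driver without a restart handler, and one with it. */
static const struct mock_dev_ops generic_ops = {
	.dev_start = toy_start, .dev_stop = toy_stop, .dev_restart = NULL,
};
static const struct mock_dev_ops vf_ops = {
	.dev_start = toy_start, .dev_stop = toy_stop, .dev_restart = toy_restart,
};
```

With this shape the application makes the same call for every port, and only drivers that need special handling have to supply the extra callback.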

Jerin

> 
> Wenzhuo, why this function is needed?
> All these actions are already possible independently.
> When looking at ixgbe implementation, I see:
> 	ixgbevf_dev_stats_reset() which is not documented in the API
> 	rte_delay_ms(1000);
> 	do {} while
> It looks to be some hacks.
> If you really need some workarounds to handle some tricky situations,
> maybe that the API is not detailed enough.
> 
> > But anyway I'm open for changing the name. Is the name process_reset_intr you prefer? Thanks.
> 
> Not sure.
> If you really intend to add a generic reset, maybe rte_eth_dev_reset()
> is a good name. We just need more justification.
> After reading the doc, the user can understand it is just a wrapper of
> existing functions. But it appears in the code that it does more and can
> help in some situations.

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22  9:18                                                   ` Thomas Monjalon
  2016-06-22 11:06                                                     ` Jerin Jacob
@ 2016-06-23  0:39                                                     ` Lu, Wenzhuo
  1 sibling, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-23  0:39 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob
  Cc: dev, Ananyev, Konstantin, Stephen Hemminger, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin

Hi Thomas,


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Wednesday, June 22, 2016 5:18 PM
> To: Lu, Wenzhuo; Jerin Jacob
> Cc: dev@dpdk.org; Ananyev, Konstantin; Stephen Hemminger; Richardson,
> Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> 2016-06-22 08:25, Lu, Wenzhuo:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-06-22 13:29, Jerin Jacob:
> > > > Thomas,
> > > > As a librte_ether maintainer any comments on this?
> > >
> > > +1 for adding details and make sure naming is good.
> > > I don't really need to comment here because I have already done this
> > > comment
> > > earlier:
> > > 	http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > > Thank you for insisting.
> > I've add some details in this patch set. If it's not enough, please let me know.
> > And I think this discussion is about what the API name should be like. Actually I
> think all the existing name is describing what is done by the API not when and
> where it should be used, like dev_start/stop.
> 
> You're right, I overlooked it:
> 
> + * The API will stop the port, clear the rx/tx queues, re-setup the
> + rx/tx
> + * queues, restart the port.
> 
> Jerin, which detail do you think is needed?
> 
> Wenzhuo, why this function is needed?
As you said below and as discussed before, it's a wrapper around the existing functions. The benefit is that it saves users from a complex implementation when they want to stop and restart the device.

> All these actions are already possible independently.
> When looking at ixgbe implementation, I see:
> 	ixgbevf_dev_stats_reset() which is not documented in the API
> 	rte_delay_ms(1000);
> 	do {} while
> It looks to be some hacks.
> If you really need some workarounds to handle some tricky situations, maybe
> that the API is not detailed enough.
Yes, you're right. Something is still missing. I'll add more detail.

> 
> > But anyway I'm open for changing the name. Is the name process_reset_intr
> you prefer? Thanks.
> 
> Not sure.
> If you really intend to add a generic reset, maybe rte_eth_dev_reset() is a good
> name. We just need more justification.
> After reading the doc, the user can understand it is just a wrapper of existing
> functions. But it appears in the code that it does more and can help in some
> situations.
I'll add more info. Thanks.

* Re: [PATCH v6 1/4] lib/librte_ether: support device reset
  2016-06-22 11:06                                                     ` Jerin Jacob
@ 2016-06-23  0:45                                                       ` Lu, Wenzhuo
  0 siblings, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-06-23  0:45 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon
  Cc: dev, Ananyev, Konstantin, Stephen Hemminger, Richardson, Bruce,
	Chen, Jing D, Liang, Cunming, Wu, Jingjing, Zhang, Helin

Hi Jerin,


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, June 22, 2016 7:07 PM
> To: Thomas Monjalon
> Cc: Lu, Wenzhuo; dev@dpdk.org; Ananyev, Konstantin; Stephen Hemminger;
> Richardson, Bruce; Chen, Jing D; Liang, Cunming; Wu, Jingjing; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_ether: support device reset
> 
> On Wed, Jun 22, 2016 at 11:18:21AM +0200, Thomas Monjalon wrote:
> > 2016-06-22 08:25, Lu, Wenzhuo:
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2016-06-22 13:29, Jerin Jacob:
> > > > > Thomas,
> > > > > As a librte_ether maintainer any comments on this?
> > > >
> > > > +1 for adding details and make sure naming is good.
> > > > I don't really need to comment here because I have already done
> > > > this comment
> > > > earlier:
> > > > 	http://dpdk.org/ml/archives/dev/2016-June/041845.html
> > > > Thank you for insisting.
> > > I've added some details in this patch set. If it's not enough, please let me know.
> > > And I think this discussion is about what the API name should be like. Actually
> > > I think all the existing names describe what is done by the API, not when and
> > > where it should be used, like dev_start/stop.
> >
> > You're right, I overlooked it:
> >
> > + * The API will stop the port, clear the rx/tx queues, re-setup the
> > + rx/tx
> > + * queues, restart the port.
> >
> > Jerin, which detail do you think is needed?
> 
> When to use what? In what scenarios does an application need to use generic stop/start
> vs this new API?
I'll add more explanation. Actually, I've written an example, but after discussion we agreed it's not a good idea to add a totally new example just for one function. I'm now thinking about how to fold this example into testpmd.

> 
> How about calling it rte_eth_dev_restart()?
Sounds good :)

> 
> If the existing stop followed by start is functionally the same as the new API, how
> about having a generic implementation of rte_eth_dev_restart() if PMD-specific
> restart handlers are NOT found?
Good suggestion, thanks.
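
A minimal sketch of the fallback suggested above, using mock types rather than the real ethdev structures. Every name below (mock_dev, mock_dev_ops, mock_dev_restart, the toy_* drivers) is hypothetical and only illustrates the dispatch: use the PMD-specific restart handler when one is registered, otherwise fall back to a plain stop followed by start.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of a driver ops table; these are NOT the real ethdev structures. */
struct mock_dev;

struct mock_dev_ops {
	int  (*dev_start)(struct mock_dev *dev);
	void (*dev_stop)(struct mock_dev *dev);
	int  (*dev_restart)(struct mock_dev *dev); /* optional, may be NULL */
};

struct mock_dev {
	const struct mock_dev_ops *ops;
	int started;       /* 1 while the port is running */
	int used_pmd_path; /* set only by the PMD-specific handler */
};

/* Hypothetical wrapper: prefer the PMD handler, else stop + start. */
int
mock_dev_restart(struct mock_dev *dev)
{
	assert(dev != NULL && dev->ops != NULL);

	if (dev->ops->dev_restart != NULL)
		return dev->ops->dev_restart(dev);

	if (dev->ops->dev_stop == NULL || dev->ops->dev_start == NULL)
		return -1; /* no usable handlers at all */

	dev->ops->dev_stop(dev);
	return dev->ops->dev_start(dev);
}

/* Two toy drivers used to exercise both paths. */
static int  toy_start(struct mock_dev *dev) { dev->started = 1; return 0; }
static void toy_stop(struct mock_dev *dev)  { dev->started = 0; }
static int  toy_restart(struct mock_dev *dev)
{
	dev->used_pmd_path = 1;
	dev->started = 1;
	return 0;
}

/* Driver without a restart handler: takes the generic path. */
const struct mock_dev_ops generic_ops = { toy_start, toy_stop, NULL };
/* Driver with its own restart handler: takes the optimized path. */
const struct mock_dev_ops pmd_ops = { toy_start, toy_stop, toy_restart };
```

With this shape the application calls only one function per port, and whether the optimized or the generic path runs stays an internal decision.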

> 
> That way the application needs to call only rte_eth_dev_restart() for port restart. It
> can internally choose between an optimized stop/start and a generic restart.
> 
> Jerin
> 
> >
> > Wenzhuo, why this function is needed?
> > All these actions are already possible independently.
> > When looking at ixgbe implementation, I see:
> > 	ixgbevf_dev_stats_reset() which is not documented in the API
> > 	rte_delay_ms(1000);
> > 	do {} while
> > It looks to be some hacks.
> > If you really need some workarounds to handle some tricky situations,
> > maybe that the API is not detailed enough.
> >
> > > But anyway I'm open to changing the name. Is process_reset_intr the name
> > > you prefer? Thanks.
> >
> > Not sure.
> > If you really intend to add a generic reset, maybe rte_eth_dev_reset()
> > is a good name. We just need more justification.
> > After reading the doc, the user can understand it is just a wrapper of
> > existing functions. But it appears in the code that it does more and
> > can help in some situations.


* Re: [PATCH v6 0/4] support reset of VF link
  2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
                     ` (3 preceding siblings ...)
  2016-06-20  6:24   ` [PATCH v6 4/4] i40e: " Wenzhuo Lu
@ 2016-07-04 15:48   ` Luca Boccassi
  2016-07-05  0:52     ` Lu, Wenzhuo
  4 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-04 15:48 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> If the PF link is down and up, VF link will not work accordingly.
> This patch set adds the support of VF link reset. So, when the VF
> receives the messages of physical link down/up, the APP can reset the
> VF link and let it recover.
> 
> PS: This patch set is split from a previous patch set,
> *automatic link recovery on ixgbe/igb VF*, and it's based on the
> patch set *support mailbox interruption on ixgbe/igb VF*.
> 
> Wenzhuo Lu (3):
>   lib/librte_ether: support device reset
>   ixgbe: implement device reset on VF
>   igb: implement device reset on VF
> 
> Zhe Tao (1):
>   i40e: implement device reset on VF
> 
> v1:
> - Added the implementation for the VF reset functionality.
> v2:
> - Changed the i40e related operations during VF reset.
> v3:
> - Resent the patches because of the mail sent issue.
> v4:
> - Removed some VF reset emulation code.
> v5:
> - Removed all the code related with lock.
> v6:
> - Updated the NIC feature overview matrix.
> - Added more explanation in the doxygen comment of reset API.
> 
>  doc/guides/nics/overview.rst           |  1 +
>  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
>  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
>  drivers/net/i40e/i40e_ethdev.h         |  4 ++
>  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
>  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
>  drivers/net/i40e/i40e_rxtx.h           |  4 ++
>  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
>  lib/librte_ether/rte_ethdev.c          | 17 +++++++
>  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
>  lib/librte_ether/rte_ether_version.map |  7 +++
>  13 files changed, 295 insertions(+), 5 deletions(-)

Hello Wenzhuo,

I'm testing this patchset, but I am sporadically running into an issue
where the VFs reset fails after the PF flaps.

I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.

I am calling rte_eth_dev_reset in response to a
RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear in
the log:

PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.

Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
The application has ~500 2MB hugepages, and there's 2GB of free memory
available on top of that.

Have you seen this before? Any pointer or suggestion for debugging?

Thanks!

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-04 15:48   ` [PATCH v6 0/4] support reset of VF link Luca Boccassi
@ 2016-07-05  0:52     ` Lu, Wenzhuo
  2016-07-05  9:52       ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-05  0:52 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

Hi Luca,


> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Monday, July 4, 2016 11:48 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > If the PF link is down and up, VF link will not work accordingly.
> > This patch set adds the support of VF link reset. So, when the VF
> > receives the messages of physical link down/up, the APP can reset the VF
> > link and let it recover.
> >
> > PS: This patch set is split from a previous patch set, *automatic
> > link recovery on ixgbe/igb VF*, and it's based on the patch set
> > *support mailbox interruption on ixgbe/igb VF*.
> >
> > Wenzhuo Lu (3):
> >   lib/librte_ether: support device reset
> >   ixgbe: implement device reset on VF
> >   igb: implement device reset on VF
> >
> > Zhe Tao (1):
> >   i40e: implement device reset on VF
> >
> > v1:
> > - Added the implementation for the VF reset functionality.
> > v2:
> > - Changed the i40e related operations during VF reset.
> > v3:
> > - Resent the patches because of the mail sent issue.
> > v4:
> > - Removed some VF reset emulation code.
> > v5:
> > - Removed all the code related with lock.
> > v6:
> > - Updated the NIC feature overview matrix.
> > - Added more explanation in the doxygen comment of reset API.
> >
> >  doc/guides/nics/overview.rst           |  1 +
> >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> >  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
> >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> >  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
> >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
> >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> >  lib/librte_ether/rte_ether_version.map |  7 +++
> >  13 files changed, 295 insertions(+), 5 deletions(-)
> 
> Hello Wenzhuo,
> 
> I'm testing this patchset, but I am sporadically running into an issue where the
> VFs reset fails after the PF flaps.
> 
> I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> 
> I am calling rte_eth_dev_reset in response to a
> RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear in the
> log:
> 
> PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
> PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> 
> Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
> ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
> The application has ~500 2MB hugepages, and there's 2GB of free memory
> available on top of that.
> 
> Have you seen this before? Any pointer or suggestion for debugging?
> 
> Thanks!
> 
> --
> Kind regards,
> Luca Boccassi
I think the problem is that the mbufs occupied by the packets are not released. This memory has to be released by the APP, so my patches haven’t covered this. Actually, an example is needed to show how to use the reset API. I plan to modify testpmd.
You may have noticed that this feature is postponed to 16.11. Would you like to wait for the new version, which will include an example?
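
To make the suspected failure mode concrete, here is a toy model of a fixed-size buffer pool. It is not the DPDK mempool API: toy_pool, toy_pool_get, toy_pool_put and toy_rx_ring_refill are hypothetical names, and the pool sizes used with it are illustrative only. It shows why refilling the RX ring after a reset can stop part-way (as in the log above) when buffers taken earlier were never returned to the pool.

```c
#include <assert.h>
#include <stddef.h>

/* Toy fixed-size buffer pool; a stand-in for a mempool, not the DPDK API. */
struct toy_pool {
	unsigned int capacity; /* total buffers in the pool */
	unsigned int in_use;   /* buffers currently handed out */
};

/* Take one buffer; returns 0 on success, -1 when the pool is empty. */
int
toy_pool_get(struct toy_pool *p)
{
	assert(p != NULL);
	if (p->in_use == p->capacity)
		return -1;
	p->in_use++;
	return 0;
}

/* Return n buffers to the pool. */
void
toy_pool_put(struct toy_pool *p, unsigned int n)
{
	assert(p != NULL && n <= p->in_use);
	p->in_use -= n;
}

/*
 * Refill an RX ring of ring_size descriptors after a reset.
 * Returns the number of descriptors filled; a value below ring_size
 * means an allocation failed part-way through the ring.
 */
unsigned int
toy_rx_ring_refill(struct toy_pool *p, unsigned int ring_size)
{
	unsigned int i;

	for (i = 0; i < ring_size; i++)
		if (toy_pool_get(p) != 0)
			break;
	return i;
}
```

For example, with a pool of 8192 buffers of which 8128 are still held, refilling a 2048-entry ring stops after 64 buffers; once the held buffers are returned, the same refill completes.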


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-05  0:52     ` Lu, Wenzhuo
@ 2016-07-05  9:52       ` Luca Boccassi
  2016-07-06  0:45         ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-05  9:52 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> Hi Luca,
> 
> 
> > -----Original Message-----
> > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > Sent: Monday, July 4, 2016 11:48 PM
> > To: Lu, Wenzhuo
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > 
> > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > If the PF link is down and up, VF link will not work accordingly.
> > > This patch set adds the support of VF link reset. So, when the VF
> > > receives the messages of physical link down/up, the APP can reset the VF
> > > link and let it recover.
> > >
> > > PS: This patch set is split from a previous patch set, *automatic
> > > link recovery on ixgbe/igb VF*, and it's based on the patch set
> > > *support mailbox interruption on ixgbe/igb VF*.
> > >
> > > Wenzhuo Lu (3):
> > >   lib/librte_ether: support device reset
> > >   ixgbe: implement device reset on VF
> > >   igb: implement device reset on VF
> > >
> > > Zhe Tao (1):
> > >   i40e: implement device reset on VF
> > >
> > > v1:
> > > - Added the implementation for the VF reset functionality.
> > > v2:
> > > - Changed the i40e related operations during VF reset.
> > > v3:
> > > - Resent the patches because of the mail sent issue.
> > > v4:
> > > - Removed some VF reset emulation code.
> > > v5:
> > > - Removed all the code related with lock.
> > > v6:
> > > - Updated the NIC feature overview matrix.
> > > - Added more explanation in the doxygen comment of reset API.
> > >
> > >  doc/guides/nics/overview.rst           |  1 +
> > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > >  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
> > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
> > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
> > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > 
> > Hello Wenzhuo,
> > 
> > I'm testing this patchset, but I am sporadically running into an issue where the
> > VFs reset fails after the PF flaps.
> > 
> > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > 
> > I am calling rte_eth_dev_reset in response to a
> > RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear in the
> > log:
> > 
> > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
> > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > 
> > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
> > ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
> > The application has ~500 2MB hugepages, and there's 2GB of free memory
> > available on top of that.
> > 
> > Have you seen this before? Any pointer or suggestion for debugging?
> > 
> > Thanks!
> > 
> > --
> > Kind regards,
> > Luca Boccassi
> I think the problem is that the mbufs occupied by the packets are not released. This memory has to be released by the APP, so my patches haven’t covered this. Actually, an example is needed to show how to use the reset API. I plan to modify testpmd.
> You may have noticed that this feature is postponed to 16.11. Would you like to wait for the new version, which will include an example?

Hi,

Unfortunately we need the VF reset working sooner than that, so one way
or the other I'll need to sort it out. Given I've got a use case where
this is happening, if it can be helpful for you I'm more than happy to
help as a guinea pig. If you could please give some guidance/guidelines
with regards to which API to use to sort the mbuf problem, I can try it
out and give back some feedback.

Thanks!

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-05  9:52       ` Luca Boccassi
@ 2016-07-06  0:45         ` Lu, Wenzhuo
  2016-07-06 16:26           ` Luca Boccassi
       [not found]           ` <1467822182.32466.34.camel@brocade.com>
  0 siblings, 2 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-06  0:45 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

[-- Attachment #1: Type: text/plain, Size: 5088 bytes --]

Hi Luca,

> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Tuesday, July 5, 2016 5:53 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> > Hi Luca,
> >
> >
> > > -----Original Message-----
> > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > Sent: Monday, July 4, 2016 11:48 PM
> > > To: Lu, Wenzhuo
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > >
> > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > > If the PF link is down and up, VF link will not work accordingly.
> > > > This patch set adds the support of VF link reset. So, when the VF
> > > > receives the messages of physical link down/up, the APP can reset the
> > > > VF link and let it recover.
> > > >
> > > > PS: This patch set is split from a previous patch set,
> > > > *automatic link recovery on ixgbe/igb VF*, and it's based on the
> > > > patch set *support mailbox interruption on ixgbe/igb VF*.
> > > >
> > > > Wenzhuo Lu (3):
> > > >   lib/librte_ether: support device reset
> > > >   ixgbe: implement device reset on VF
> > > >   igb: implement device reset on VF
> > > >
> > > > Zhe Tao (1):
> > > >   i40e: implement device reset on VF
> > > >
> > > > v1:
> > > > - Added the implementation for the VF reset functionality.
> > > > v2:
> > > > - Changed the i40e related operations during VF reset.
> > > > v3:
> > > > - Resent the patches because of the mail sent issue.
> > > > v4:
> > > > - Removed some VF reset emulation code.
> > > > v5:
> > > > - Removed all the code related with lock.
> > > > v6:
> > > > - Updated the NIC feature overview matrix.
> > > > - Added more explanation in the doxygen comment of reset API.
> > > >
> > > >  doc/guides/nics/overview.rst           |  1 +
> > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > >  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
> > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
> > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
> > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > >
> > > Hello Wenzhuo,
> > >
> > > I'm testing this patchset, but I am sporadically running into an
> > > issue where the VFs reset fails after the PF flaps.
> > >
> > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > >
> > > > I am calling rte_eth_dev_reset in response to a
> > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear
> > > in the
> > > log:
> > >
> > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
> > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > >
> > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
> > > ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
> > > The application has ~500 2MB hugepages, and there's 2GB of free
> > > memory available on top of that.
> > >
> > > Have you seen this before? Any pointer or suggestion for debugging?
> > >
> > > Thanks!
> > >
> > > --
> > > Kind regards,
> > > Luca Boccassi
> > > I think the problem is that the mbufs occupied by the packets are not released.
> > > This memory has to be released by the APP, so my patches haven’t covered this.
> > > Actually, an example is needed to show how to use the reset API. I plan to
> > > modify testpmd.
> > > You may have noticed that this feature is postponed to 16.11. Would you like
> > > to wait for the new version, which will include an example?
> 
> Hi,
> 
> Unfortunately we need the VF reset working sooner than that, so one way or
> the other I'll need to sort it out. Given I've got a use case where this is happening,
> if it can be helpful for you I'm more than happy to help as a guinea pig. If you
> could please give some guidance/guidelines with regards to which API to use to
> sort the mbuf problem, I can try it out and give back some feedback.
> 
> Thanks!
I made a stupid mistake and deleted all my code, so I have to take some time to rewrite it :(
Attached is the example I used to test the reset API. It's modified from the l2fwd example, so you can compare it with l2fwd to see what needs to be added.
Hopefully it can help :)
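
The main addition relative to l2fwd is the per-port try-lock around the RX/TX calls. A reduced, DPDK-free sketch of that pattern follows. It uses a C11 atomic_flag in place of rte_spinlock_t, and all toy_* names are hypothetical. It also assumes the reset path takes the same lock for the duration of the reset; that half of the protocol is an assumption here, not something stated in the mail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal try-lock built on C11 atomic_flag; stands in for a spinlock. */
struct toy_lock {
	atomic_flag held;
};

bool
toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

void
toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct toy_port {
	struct toy_lock rx_lock;
	unsigned int rx_polls; /* polls that actually ran */
	unsigned int skipped;  /* polls skipped while a reset held the lock */
	unsigned int resets;   /* completed resets */
};

/* Data path: never blocks; skips the port while it is being reset. */
void
toy_poll_port(struct toy_port *port)
{
	if (!toy_trylock(&port->rx_lock)) {
		port->skipped++;
		return;
	}
	port->rx_polls++; /* the real RX burst would go here */
	toy_unlock(&port->rx_lock);
}

/* Control path, first half: spin until the data path lets go. */
void
toy_reset_begin(struct toy_port *port)
{
	while (!toy_trylock(&port->rx_lock))
		;
}

/* Control path, second half: count the reset and release the port. */
void
toy_reset_end(struct toy_port *port)
{
	port->resets++;
	toy_unlock(&port->rx_lock);
}
```

The data path pays only for an uncontended try-lock in the normal case, while any poll that races with a reset is skipped rather than touching queues that are being torn down.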

> 
> --
> Kind regards,
> Luca Boccassi

[-- Attachment #2: main.c --]
[-- Type: text/plain, Size: 20990 bytes --]

/*-
 *   BSD LICENSE
 *
 *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 *   All rights reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
 *   are met:
 *
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in
 *       the documentation and/or other materials provided with the
 *       distribution.
 *     * Neither the name of Intel Corporation nor the names of its
 *       contributors may be used to endorse or promote products derived
 *       from this software without specific prior written permission.
 *
 *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/types.h>
#include <sys/queue.h>
#include <netinet/in.h>
#include <setjmp.h>
#include <stdarg.h>
#include <ctype.h>
#include <errno.h>
#include <getopt.h>
#include <signal.h>
#include <stdbool.h>

#include <rte_common.h>
#include <rte_log.h>
#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_memcpy.h>
#include <rte_memzone.h>
#include <rte_eal.h>
#include <rte_per_lcore.h>
#include <rte_launch.h>
#include <rte_atomic.h>
#include <rte_cycles.h>
#include <rte_prefetch.h>
#include <rte_lcore.h>
#include <rte_spinlock.h>
#include <rte_branch_prediction.h>
#include <rte_interrupts.h>
#include <rte_pci.h>
#include <rte_random.h>
#include <rte_debug.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_ring.h>
#include <rte_mempool.h>
#include <rte_mbuf.h>

static volatile bool force_quit;

#define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1

#define NB_MBUF   8192

#define MAX_PKT_BURST 32
#define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */

/*
 * Configurable number of RX/TX ring descriptors
 */
#define RTE_TEST_RX_DESC_DEFAULT 128
#define RTE_TEST_TX_DESC_DEFAULT 512
static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;

/* ethernet addresses of ports */
static struct ether_addr l2fwd_ports_eth_addr[RTE_MAX_ETHPORTS];

/* mask of enabled ports */
static uint32_t l2fwd_enabled_port_mask;

/* list of enabled ports */
static uint32_t l2fwd_dst_ports[RTE_MAX_ETHPORTS];

static unsigned int l2fwd_rx_queue_per_lcore = 1;

#define MAX_RX_QUEUE_PER_LCORE 16
#define MAX_TX_QUEUE_PER_PORT 16
struct lcore_queue_conf {
	unsigned n_rx_port;
	unsigned rx_port_list[MAX_RX_QUEUE_PER_LCORE];
} __rte_cache_aligned;
struct lcore_queue_conf lcore_queue_conf[RTE_MAX_LCORE];

static struct rte_eth_dev_tx_buffer *tx_buffer[RTE_MAX_ETHPORTS];

static const struct rte_eth_conf port_conf = {
	.rxmode = {
		.split_hdr_size = 0,
		.header_split   = 0, /**< Header Split disabled */
		.hw_ip_checksum = 0, /**< IP checksum offload disabled */
		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
		.hw_strip_crc   = 0, /**< CRC stripping by hardware disabled */
	},
	.txmode = {
		.mq_mode = ETH_MQ_TX_NONE,
	},
};

struct rte_mempool *l2fwd_pktmbuf_pool;

/* Per-port statistics struct */
struct l2fwd_port_statistics {
	uint64_t tx;
	uint64_t rx;
	uint64_t dropped;
} __rte_cache_aligned;
struct l2fwd_port_statistics port_statistics[RTE_MAX_ETHPORTS];

/* A tsc-based timer responsible for triggering statistics printout */
#define TIMER_MILLISECOND 2000000ULL /* around 1ms at 2 GHz */
#define MAX_TIMER_PERIOD 86400 /* 1 day max */
/* default period is 10 seconds */
static int64_t timer_period = 10 * TIMER_MILLISECOND * 1000;

static uint32_t stop_forwarding;
static uint8_t reset_port;

static rte_spinlock_t ports_rx_lock[RTE_MAX_ETHPORTS];
static rte_spinlock_t ports_tx_lock[RTE_MAX_ETHPORTS];

/* Print out statistics on packets dropped */
static void
print_stats(void)
{
	uint64_t total_packets_dropped, total_packets_tx, total_packets_rx;
	unsigned portid;

	total_packets_dropped = 0;
	total_packets_tx = 0;
	total_packets_rx = 0;

	const char clr[] = { 27, '[', '2', 'J', '\0' };
	const char topLeft[] = { 27, '[', '1', ';', '1', 'H', '\0' };

	/* Clear screen and move to top left */
	printf("%s%s", clr, topLeft);

	printf("\nPort statistics ====================================");

	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
		/* skip disabled ports */
		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
			continue;
		printf("\nStatistics for port %u ------------------------------"
			   "\nPackets sent: %24"PRIu64
			   "\nPackets received: %20"PRIu64
			   "\nPackets dropped: %21"PRIu64,
			   portid,
			   port_statistics[portid].tx,
			   port_statistics[portid].rx,
			   port_statistics[portid].dropped);

		total_packets_dropped += port_statistics[portid].dropped;
		total_packets_tx += port_statistics[portid].tx;
		total_packets_rx += port_statistics[portid].rx;
	}
	printf("\nAggregate statistics ==============================="
		   "\nTotal packets sent: %18"PRIu64
		   "\nTotal packets received: %14"PRIu64
		   "\nTotal packets dropped: %15"PRIu64,
		   total_packets_tx,
		   total_packets_rx,
		   total_packets_dropped);
	printf("\n====================================================\n");
}

static void
l2fwd_simple_forward(struct rte_mbuf *m, unsigned portid)
{
	struct ether_hdr *eth;
	void *tmp;
	unsigned dst_port;
	int sent;
	struct rte_eth_dev_tx_buffer *buffer;

	dst_port = l2fwd_dst_ports[portid];
	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);

	/* 02:00:00:00:00:xx */
	tmp = &eth->d_addr.addr_bytes[0];
	*((uint64_t *)tmp) = 0x000000000002 + ((uint64_t)dst_port << 40);

	/* src addr */
	ether_addr_copy(&l2fwd_ports_eth_addr[dst_port], &eth->s_addr);

	buffer = tx_buffer[dst_port];
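	if (!rte_spinlock_trylock(&ports_tx_lock[dst_port])) {
		/* Port is being reset: drop what is buffered, then this packet. */
		rte_eth_tx_buffer_drop_callback
			(buffer->pkts, buffer->length, NULL);
		buffer->length = 0; /* buffered mbufs are freed, forget them */
		rte_pktmbuf_free(m);
	} else {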
		sent = rte_eth_tx_buffer(dst_port, 0, buffer, m);
		if (sent)
			port_statistics[dst_port].tx += sent;
		rte_spinlock_unlock(&ports_tx_lock[dst_port]);
	}
}

/* main processing loop */
static void
l2fwd_main_loop(void)
{
	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
	struct rte_mbuf *m;
	int sent;
	unsigned lcore_id;
	uint64_t prev_tsc, diff_tsc, cur_tsc, timer_tsc;
	unsigned i, j, portid, nb_rx;
	struct lcore_queue_conf *qconf;
	const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) /
				   US_PER_S * BURST_TX_DRAIN_US;
	struct rte_eth_dev_tx_buffer *buffer;

	prev_tsc = 0;
	timer_tsc = 0;

	lcore_id = rte_lcore_id();
	qconf = &lcore_queue_conf[lcore_id];

	if (qconf->n_rx_port == 0) {
		RTE_LOG(INFO, L2FWD, "lcore %u has nothing to do\n", lcore_id);
		return;
	}

	RTE_LOG(INFO, L2FWD, "entering main loop on lcore %u\n", lcore_id);

	for (i = 0; i < qconf->n_rx_port; i++) {

		portid = qconf->rx_port_list[i];
		RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
			portid);

	}

	while (!force_quit) {

		cur_tsc = rte_rdtsc();

		/*
		 * TX burst queue drain
		 */
		diff_tsc = cur_tsc - prev_tsc;
		if (unlikely(diff_tsc > drain_tsc)) {

			for (i = 0; i < qconf->n_rx_port; i++) {
				portid =
					l2fwd_dst_ports[qconf->rx_port_list[i]];
				buffer = tx_buffer[portid];
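				if (!rte_spinlock_trylock(&ports_tx_lock[portid])) {
					/* Port is being reset: drop the buffered packets. */
					rte_eth_tx_buffer_drop_callback
						(buffer->pkts, buffer->length, NULL);
					buffer->length = 0; /* avoid freeing them twice */
					continue;
				}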

				sent = rte_eth_tx_buffer_flush(portid,
							       0,
							       buffer);
				if (sent)
					port_statistics[portid].tx += sent;
				rte_spinlock_unlock(&ports_tx_lock[portid]);
			}

			/* if timer is enabled */
			if (timer_period > 0) {

				/* advance the timer */
				timer_tsc += diff_tsc;

				/* if timer has reached its timeout */
				if (unlikely(timer_tsc >=
					     (uint64_t) timer_period)) {
					/* do this only on master core */
					if (lcore_id ==
					    rte_get_master_lcore()) {
						print_stats();
						/* reset the timer */
						timer_tsc = 0;
					}
				}
			}

			prev_tsc = cur_tsc;
		}

		/*
		 * Read packet from RX queues
		 */
		for (i = 0; i < qconf->n_rx_port; i++) {

			portid = qconf->rx_port_list[i];
			if (!rte_spinlock_trylock(&ports_rx_lock[portid]))
				continue;

			nb_rx = rte_eth_rx_burst((uint8_t) portid, 0,
						 pkts_burst, MAX_PKT_BURST);

			port_statistics[portid].rx += nb_rx;

			rte_spinlock_unlock(&ports_rx_lock[portid]);

			for (j = 0; j < nb_rx; j++) {
				m = pkts_burst[j];
				rte_prefetch0(rte_pktmbuf_mtod(m, void *));
				l2fwd_simple_forward(m, portid);
			}
		}
	}
}

static int
l2fwd_launch_one_lcore(__attribute__((unused)) void *dummy)
{
	l2fwd_main_loop();
	return 0;
}

/* display usage */
static void
l2fwd_usage(const char *prgname)
{
	printf("%s [EAL options] -- -p PORTMASK [-q NQ]\n"
	       "  -p PORTMASK: hexadecimal bitmask of ports to configure\n"
	       "  -q NQ: number of queue (=ports) per lcore (default is 1)\n"
		   "  -T PERIOD: statistics will be refreshed each PERIOD seconds (0 to disable, 10 default, 86400 maximum)\n",
	       prgname);
}

static int
l2fwd_parse_portmask(const char *portmask)
{
	char *end = NULL;
	unsigned long pm;

	/* parse hexadecimal string */
	pm = strtoul(portmask, &end, 16);
	if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0'))
		return -1;

	if (pm == 0)
		return -1;

	return pm;
}

static unsigned int
l2fwd_parse_nqueue(const char *q_arg)
{
	char *end = NULL;
	unsigned long n;

	/* parse decimal string */
	n = strtoul(q_arg, &end, 10);
	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
		return 0;
	if (n == 0)
		return 0;
	if (n >= MAX_RX_QUEUE_PER_LCORE)
		return 0;

	return n;
}

static int
l2fwd_parse_timer_period(const char *q_arg)
{
	char *end = NULL;
	int n;

	/* parse number string */
	n = strtol(q_arg, &end, 10);
	if ((q_arg[0] == '\0') || (end == NULL) || (*end != '\0'))
		return -1;
	if (n >= MAX_TIMER_PERIOD)
		return -1;

	return n;
}

/* Parse the argument given in the command line of the application */
static int
l2fwd_parse_args(int argc, char **argv)
{
	int opt, ret;
	char **argvopt;
	int option_index;
	char *prgname = argv[0];
	static struct option lgopts[] = {
		{NULL, 0, 0, 0}
	};

	argvopt = argv;

	while ((opt = getopt_long(argc, argvopt, "p:q:T:",
				  lgopts, &option_index)) != EOF) {

		switch (opt) {
		/* portmask */
		case 'p':
			l2fwd_enabled_port_mask = l2fwd_parse_portmask(optarg);
			if (l2fwd_enabled_port_mask == 0) {
				printf("invalid portmask\n");
				l2fwd_usage(prgname);
				return -1;
			}
			break;

		/* nqueue */
		case 'q':
			l2fwd_rx_queue_per_lcore = l2fwd_parse_nqueue(optarg);
			if (l2fwd_rx_queue_per_lcore == 0) {
				printf("invalid queue number\n");
				l2fwd_usage(prgname);
				return -1;
			}
			break;

		/* timer period */
		case 'T':
			timer_period = l2fwd_parse_timer_period(optarg) *
				       1000 * TIMER_MILLISECOND;
			if (timer_period < 0) {
				printf("invalid timer period\n");
				l2fwd_usage(prgname);
				return -1;
			}
			break;

		/* long options */
		case 0:
			l2fwd_usage(prgname);
			return -1;

		default:
			l2fwd_usage(prgname);
			return -1;
		}
	}

	if (optind >= 0)
		argv[optind-1] = prgname;

	ret = optind-1;
	optind = 0; /* reset getopt lib */
	return ret;
}

/* Check the link status of all ports in up to 9s, and print them finally */
static void
check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
{
#define CHECK_INTERVAL 100 /* 100ms */
#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */
	uint8_t portid, count, all_ports_up, print_flag = 0;
	struct rte_eth_link link;

	printf("\nChecking link status");
	fflush(stdout);
	for (count = 0; count <= MAX_CHECK_TIME; count++) {
		if (force_quit)
			return;
		all_ports_up = 1;
		for (portid = 0; portid < port_num; portid++) {
			if (force_quit)
				return;
			if ((port_mask & (1 << portid)) == 0)
				continue;
			memset(&link, 0, sizeof(link));
			rte_eth_link_get_nowait(portid, &link);
			/* print link status if flag set */
			if (print_flag == 1) {
				if (link.link_status)
					printf("Port %d Link Up - speed %u "
						"Mbps - %s\n", (uint8_t)portid,
						(unsigned)link.link_speed,
				(link.link_duplex == ETH_LINK_FULL_DUPLEX) ?
					("full-duplex") : ("half-duplex"));
				else
					printf("Port %d Link Down\n",
						(uint8_t)portid);
				continue;
			}
			/* clear all_ports_up flag if any link down */
			if (link.link_status == ETH_LINK_DOWN) {
				all_ports_up = 0;
				break;
			}
		}
		/* after finally printing all link status, get out */
		if (print_flag == 1)
			break;

		if (all_ports_up == 0) {
			printf(".");
			fflush(stdout);
			rte_delay_ms(CHECK_INTERVAL);
		}

		/* set the print_flag if all ports up or timeout */
		if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) {
			print_flag = 1;
			printf("done\n");
		}
	}
}

static void
signal_handler(int signum)
{
	if (signum == SIGINT || signum == SIGTERM) {
		printf("\n\nSignal %d received, preparing to exit...\n",
				signum);
		force_quit = true;
	}
}

static void
reset_event_callback(uint8_t port_id, enum rte_eth_event_type type, void *param)
{
	RTE_SET_USED(param);

	printf("\n\nIn registered callback...\n");
	printf("Event type: %s on port %d\n",
		type == RTE_ETH_EVENT_INTR_RESET ? "RESET interrupt" :
		"unknown event", port_id);
	reset_port = port_id;
	rte_compiler_barrier(); /* prevent compiler reordering */
	stop_forwarding = 1;
}

int
main(int argc, char **argv)
{
	struct lcore_queue_conf *qconf;
	struct rte_eth_dev_info dev_info;
	int ret;
	uint8_t nb_ports;
	uint8_t nb_ports_available;
	uint8_t portid, last_port;
	unsigned lcore_id, rx_lcore_id;
	unsigned nb_ports_in_mask = 0;

	/* init EAL */
	ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n");
	argc -= ret;
	argv += ret;

	force_quit = false;
	signal(SIGINT, signal_handler);
	signal(SIGTERM, signal_handler);

	/* parse application arguments (after the EAL ones) */
	ret = l2fwd_parse_args(argc, argv);
	if (ret < 0)
		rte_exit(EXIT_FAILURE, "Invalid L2FWD arguments\n");

	/* create the mbuf pool */
	l2fwd_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF, 32,
		0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
	if (l2fwd_pktmbuf_pool == NULL)
		rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n");

	nb_ports = rte_eth_dev_count();
	if (nb_ports == 0)
		rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

	if (nb_ports > RTE_MAX_ETHPORTS)
		nb_ports = RTE_MAX_ETHPORTS;

	/* reset l2fwd_dst_ports */
	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
		l2fwd_dst_ports[portid] = 0;
	last_port = 0;

	/* init ports rx/tx lock */
	for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++) {
		rte_spinlock_init(&ports_rx_lock[portid]);
		rte_spinlock_init(&ports_tx_lock[portid]);
	}

	/*
	 * Pair up the enabled ports: each port forwards to the other port of
	 * its pair (an unpaired last port forwards to itself).
	 */
	for (portid = 0; portid < nb_ports; portid++) {
		/* skip ports that are not enabled */
		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
			continue;

		if (nb_ports_in_mask % 2) {
			l2fwd_dst_ports[portid] = last_port;
			l2fwd_dst_ports[last_port] = portid;
		} else
			last_port = portid;

		nb_ports_in_mask++;

		rte_eth_dev_info_get(portid, &dev_info);
	}
	if (nb_ports_in_mask % 2) {
		printf("Notice: odd number of ports in portmask.\n");
		l2fwd_dst_ports[last_port] = last_port;
	}

	rx_lcore_id = 1;
	qconf = NULL;

	/* Initialize the port/queue configuration of each logical core */
	for (portid = 0; portid < nb_ports; portid++) {
		/* skip ports that are not enabled */
		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
			continue;

		/* get the lcore_id for this port */
		while (rte_lcore_is_enabled(rx_lcore_id) == 0 ||
		       lcore_queue_conf[rx_lcore_id].n_rx_port ==
		       l2fwd_rx_queue_per_lcore) {
			rx_lcore_id++;
			if (rx_lcore_id >= RTE_MAX_LCORE)
				rte_exit(EXIT_FAILURE, "Not enough cores\n");
		}

		if (qconf != &lcore_queue_conf[rx_lcore_id])
			/* Assigned a new logical core in the loop above. */
			qconf = &lcore_queue_conf[rx_lcore_id];

		qconf->rx_port_list[qconf->n_rx_port] = portid;
		qconf->n_rx_port++;
		printf("Lcore %u: RX port %u\n",
		       rx_lcore_id,
		       (unsigned) portid);
	}

	nb_ports_available = nb_ports;

	/* Initialise each port */
	for (portid = 0; portid < nb_ports; portid++) {
		/* skip ports that are not enabled */
		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
			printf("Skipping disabled port %u\n",
			       (unsigned) portid);
			nb_ports_available--;
			continue;
		}
		/* init port */
		printf("Initializing port %u... ", (unsigned) portid);
		fflush(stdout);
		ret = rte_eth_dev_configure(portid, 1, 1, &port_conf);
		if (ret < 0)
			rte_exit(EXIT_FAILURE,
				 "Cannot configure device: err=%d, port=%u\n",
				 ret, (unsigned) portid);

		/* register reset interrupt callback */
		rte_eth_dev_callback_register(portid,
			RTE_ETH_EVENT_INTR_RESET, reset_event_callback, NULL);

		rte_eth_macaddr_get(portid, &l2fwd_ports_eth_addr[portid]);

		/* init one RX queue */
		fflush(stdout);
		ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
					     rte_eth_dev_socket_id(portid),
					     NULL,
					     l2fwd_pktmbuf_pool);
		if (ret < 0)
			rte_exit(EXIT_FAILURE,
				 "rte_eth_rx_queue_setup:err=%d, port=%u\n",
				 ret, (unsigned) portid);

		/* init one TX queue on each port */
		fflush(stdout);
		ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
				rte_eth_dev_socket_id(portid),
				NULL);
		if (ret < 0)
			rte_exit(EXIT_FAILURE,
				 "rte_eth_tx_queue_setup:err=%d, port=%u\n",
				 ret, (unsigned) portid);

		/* Initialize TX buffers */
		tx_buffer[portid] = rte_zmalloc_socket("tx_buffer",
				RTE_ETH_TX_BUFFER_SIZE(MAX_PKT_BURST), 0,
				rte_eth_dev_socket_id(portid));
		if (tx_buffer[portid] == NULL)
			rte_exit(EXIT_FAILURE,
				 "Cannot allocate buffer for tx on port %u\n",
				 (unsigned) portid);

		rte_eth_tx_buffer_init(tx_buffer[portid], MAX_PKT_BURST);

		ret = rte_eth_tx_buffer_set_err_callback(tx_buffer[portid],
				rte_eth_tx_buffer_count_callback,
				&port_statistics[portid].dropped);
		if (ret < 0)
			rte_exit(EXIT_FAILURE, "Cannot set error callback for "
				 "tx buffer on port %u\n", (unsigned) portid);

		/* Start device */
		ret = rte_eth_dev_start(portid);
		if (ret < 0)
			rte_exit(EXIT_FAILURE,
				 "rte_eth_dev_start:err=%d, port=%u\n",
				 ret, (unsigned) portid);

		printf("done:\n");

		rte_eth_promiscuous_enable(portid);

		printf("Port %u, MAC address: "
		       "%02X:%02X:%02X:%02X:%02X:%02X\n\n",
		       (unsigned) portid,
		       l2fwd_ports_eth_addr[portid].addr_bytes[0],
		       l2fwd_ports_eth_addr[portid].addr_bytes[1],
		       l2fwd_ports_eth_addr[portid].addr_bytes[2],
		       l2fwd_ports_eth_addr[portid].addr_bytes[3],
		       l2fwd_ports_eth_addr[portid].addr_bytes[4],
		       l2fwd_ports_eth_addr[portid].addr_bytes[5]);

		/* initialize port stats */
		memset(&port_statistics, 0, sizeof(port_statistics));
	}

	if (!nb_ports_available) {
		rte_exit(EXIT_FAILURE,
			"All available ports are disabled. Please set portmask.\n");
	}

	check_all_ports_link_status(nb_ports, l2fwd_enabled_port_mask);

	ret = 0;
	/* launch per-lcore init on every lcore */
	rte_eal_mp_remote_launch(l2fwd_launch_one_lcore, NULL, SKIP_MASTER);

	printf("\nwaiting..");
	while (1) {
		rte_delay_ms(1000);
		printf("..");
		if (stop_forwarding == 1) {
			printf("\nreset port %u\n", reset_port);
			rte_spinlock_lock(&ports_rx_lock[reset_port]);
			rte_spinlock_lock(&ports_tx_lock[reset_port]);
			rte_eth_dev_reset(reset_port);
			rte_spinlock_unlock(&ports_rx_lock[reset_port]);
			rte_spinlock_unlock(&ports_tx_lock[reset_port]);
			stop_forwarding = 0;
		}
		if (force_quit)
			break;
	}

	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
		if (rte_eal_wait_lcore(lcore_id) < 0) {
			ret = -1;
			break;
		}
	}

	for (portid = 0; portid < nb_ports; portid++) {
		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0)
			continue;
		printf("Closing port %d...", portid);
		rte_eth_dev_stop(portid);
		rte_eth_dev_close(portid);
		printf(" Done\n");
	}
	printf("Bye...\n");

	return ret;
}

* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-06  0:45         ` Lu, Wenzhuo
@ 2016-07-06 16:26           ` Luca Boccassi
       [not found]           ` <1467822182.32466.34.camel@brocade.com>
  1 sibling, 0 replies; 72+ messages in thread
From: Luca Boccassi @ 2016-07-06 16:26 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote:
> Hi Luca,
> 
> > -----Original Message-----
> > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > Sent: Tuesday, July 5, 2016 5:53 PM
> > To: Lu, Wenzhuo
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > 
> > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> > > Hi Luca,
> > >
> > >
> > > > -----Original Message-----
> > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > Sent: Monday, July 4, 2016 11:48 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > >
> > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > > > If the PF link goes down and comes back up, the VF link will not
> > > > > work accordingly. This patch set adds support for VF link reset,
> > > > > so that when the VF receives the physical link down/up message,
> > > > > the APP can reset the VF link and let it recover.
> > > > >
> > > > > PS: This patch set is split from a previous patch set,
> > > > > *automatic link recovery on ixgbe/igb VF*, and it is based on the
> > > > > patch set *support mailbox interruption on ixgbe/igb VF*.
> > > > >
> > > > > Wenzhuo Lu (3):
> > > > >   lib/librte_ether: support device reset
> > > > >   ixgbe: implement device reset on VF
> > > > >   igb: implement device reset on VF
> > > > >
> > > > > Zhe Tao (1):
> > > > >   i40e: implement device reset on VF
> > > > >
> > > > > v1:
> > > > > - Added the implementation for the VF reset functionality.
> > > > > v2:
> > > > > - Changed the i40e related operations during VF reset.
> > > > > v3:
> > > > > - Resent the patches because of a mail sending issue.
> > > > > v4:
> > > > > - Removed some VF reset emulation code.
> > > > > v5:
> > > > > - Removed all the lock-related code.
> > > > > v6:
> > > > > - Updated the NIC feature overview matrix.
> > > > > - Added more explanation in the doxygen comment of reset API.
> > > > >
> > > > >  doc/guides/nics/overview.rst           |  1 +
> > > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > > >  drivers/net/e1000/igb_ethdev.c         | 59 ++++++++++++++++++++++++
> > > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83 ++++++++++++++++++++++++++++++++++
> > > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64 +++++++++++++++++++++++++-
> > > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > > >
> > > > Hello Wenzhuo,
> > > >
> > > > I'm testing this patchset, but I am sporadically running into an
> > > > issue where the VFs reset fails after the PF flaps.
> > > >
> > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > > >
> > > > I am calling rte_eth_dev_reset in response to a
> > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors appear
> > > > in the log:
> > > >
> > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
> > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > >
> > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
> > > > ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
> > > > The application has ~500 2MB hugepages, and there's 2GB of free
> > > > memory available on top of that.
> > > >
> > > > Have you seen this before? Any pointer or suggestion for debugging?
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Kind regards,
> > > > Luca Boccassi
> > > I think the problem is that the mbufs occupied by the packets are not
> > > released. This memory has to be released by the APP, so my patches
> > > don't cover this. Actually, an example is needed to show how to use the
> > > reset API. I plan to modify testpmd.
> > > You may notice this feature is postponed to 16.11. Would you like to
> > > wait for the new version, which will include an example?
> > 
> > Hi,
> > 
> > Unfortunately we need the VF reset working sooner than that, so one way or
> > the other I'll need to sort it out. Given I've got a use case where this is happening,
> > if it can be helpful for you I'm more than happy to help as a guinea pig. If you
> > could please give some guidance/guidelines with regards to which API to use to
> > sort the mbuf problem, I can try it out and give back some feedback.
> > 
> > Thanks!
> I made a stupid mistake and deleted all my code, so I have to take some
> time to rewrite it :(
> Attached is the example I used to test the reset API. It's modified from
> the l2fwd example, so you can compare it with l2fwd to see what needs to
> be added.
> Hopefully it can help :)

Thanks! That made me understand a couple of things more, and I've got
past the problem.
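
For anyone else hitting the -12 (ENOMEM) above: as far as I understand
it, the RX ring refill fails because the application is still holding
buffers from the same pool when the driver tries to repopulate the ring.
Below is a minimal sketch of that mechanism in plain C, using a toy
counter-based pool. It is not the DPDK mempool or the ixgbe code;
toy_pool, toy_pool_get, toy_pool_put and toy_rx_ring_refill are made-up
names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy fixed-size buffer pool; a stand-in for an mbuf pool, NOT the DPDK API. */
#define POOL_SIZE 8

struct toy_pool {
	unsigned int in_use; /* buffers currently handed out */
};

/* Take one buffer; returns 0 on success, -ENOMEM when the pool is empty. */
static int
toy_pool_get(struct toy_pool *p)
{
	assert(p != NULL);
	if (p->in_use == POOL_SIZE)
		return -ENOMEM;
	p->in_use++;
	return 0;
}

/* Give n buffers back to the pool. */
static void
toy_pool_put(struct toy_pool *p, unsigned int n)
{
	assert(p != NULL && n <= p->in_use);
	p->in_use -= n;
}

/*
 * Refill an RX ring of ring_size descriptors, roughly as a driver would
 * when the port is restarted. Returns 0 on success or -ENOMEM; *filled
 * reports how many descriptors got a buffer. On failure the buffers
 * already taken are left for the caller to return.
 */
static int
toy_rx_ring_refill(struct toy_pool *p, unsigned int ring_size,
		   unsigned int *filled)
{
	unsigned int i;

	assert(filled != NULL);
	for (i = 0; i < ring_size; i++) {
		if (toy_pool_get(p) < 0) {
			*filled = i;
			return -ENOMEM;
		}
	}
	*filled = ring_size;
	return 0;
}
```

With a pool of 8 buffers of which the application still holds 5,
refilling a 4-entry ring stops at the fourth buffer with -ENOMEM; once
the application has put its buffers back, the same refill succeeds.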

Unfortunately now there's a bigger issue - rte_eth_dev_reset is a
blocking call. The _RESET event callback is fired when the PF goes down,
but when I call rte_eth_dev_reset it will block until the PF goes back
up. There is no way, as far as I can see, to know if the PF is back up
before calling rte_eth_dev_reset.

This is a problem because, as far as I understand, I have to call all
the rte_eth_dev_ APIs from the same thread, in my case the master
thread, and I can't have that block potentially indefinitely.

Would it be possible to have 2 events instead of 1, one when the PF goes
down and one when it goes up? This way an application would be able to
soft-stop the port (drain queues, etc) when the PF is down, and then
call the reset API when it goes back up.

Thanks!

-- 
Kind regards,
Luca Boccassi

* Re: [PATCH v6 0/4] support reset of VF link
       [not found]           ` <1467822182.32466.34.camel@brocade.com>
@ 2016-07-07  1:09             ` Lu, Wenzhuo
  2016-07-07 10:20               ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-07  1:09 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

Hi Luca,


> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Thursday, July 7, 2016 12:23 AM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> Thanks! That made me understand a couple of things more, and I've got past the
> problem.
> 
> Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking call.
> The _RESET event callback is fired when the PF goes down, but when I call
> rte_eth_dev_reset it will block until the PF goes back up. There is no way, as far
> as I can see, to know if the PF is back up before calling rte_eth_dev_reset.
> 
> This is a problem because, as far as I understand, I have to call all the
> rte_eth_dev_ APIs from the same thread, in my case the master thread, and I
> can't have that block potentially indefinitely.
> 
> Would it be possible to have 2 events instead of 1, one when the PF goes down
> and one when it goes up? This way an application would be able to soft-stop the
> port (drain queues, etc) when the PF is down, and then call the reset API when it
> goes back up.
> 
> Thanks!
Sorry, we cannot have 2 events for now. There are 2 problems with
having 2 events:
1. Normally the kernel driver is used for the PF. Currently the kernel
   driver has only one kind of message for both link down and link up,
   so the VF cannot tell whether the link went down or came up.
2. When the PF is down, the VF does not work unless it is reset, and it
   cannot receive any message from the PF, so the VF cannot know when
   the PF comes back up. This means that normally the VF has to be
   reset twice when the PF goes down and up. (Of course, after
   receiving the message from the PF we could wait a while until the PF
   is up, but we cannot tell how long a wait is appropriate, so this
   *wait a while* approach may work only for a brief link flap.)
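
Since the right waiting time is not known in advance, one option on the
application side is to retry with a growing delay and an overall time
budget instead of a single fixed wait. A rough sketch follows, assuming
a reset attempt that returns an error while the PF is still down (which
may not match how the current API behaves). struct backoff,
backoff_init and backoff_next are hypothetical helpers, not part of
DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical retry state kept by the application; not a DPDK type. */
struct backoff {
	uint32_t delay_ms;     /* wait before the next attempt */
	uint32_t max_delay_ms; /* cap for a single wait */
	uint32_t waited_ms;    /* total time waited so far */
	uint32_t budget_ms;    /* give up once this much time was spent */
};

static void
backoff_init(struct backoff *b, uint32_t first_ms, uint32_t max_ms,
	     uint32_t budget_ms)
{
	assert(b != NULL && first_ms > 0 && max_ms >= first_ms);
	b->delay_ms = first_ms;
	b->max_delay_ms = max_ms;
	b->waited_ms = 0;
	b->budget_ms = budget_ms;
}

/*
 * Returns the number of milliseconds to wait before the next attempt,
 * or 0 when the budget is exhausted and the caller should give up.
 */
static uint32_t
backoff_next(struct backoff *b)
{
	uint32_t wait;

	assert(b != NULL);
	if (b->waited_ms >= b->budget_ms)
		return 0;

	/* never wait past the end of the budget */
	wait = b->delay_ms;
	if (wait > b->budget_ms - b->waited_ms)
		wait = b->budget_ms - b->waited_ms;
	b->waited_ms += wait;

	/* double the delay for the next round, capped at max_delay_ms */
	if (b->delay_ms > b->max_delay_ms / 2)
		b->delay_ms = b->max_delay_ms;
	else
		b->delay_ms *= 2;

	return wait;
}
```

The caller would attempt the reset and, on failure, sleep for the value
returned by backoff_next() before trying again, giving up when it
returns 0.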

> 
> --
> Kind regards,
> Luca Boccassi

* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-07  1:09             ` Lu, Wenzhuo
@ 2016-07-07 10:20               ` Luca Boccassi
  2016-07-07 13:12                 ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-07 10:20 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote:
> Hi Luca,
> 
> 
> > -----Original Message-----
> > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > Sent: Thursday, July 7, 2016 12:23 AM
> > To: Lu, Wenzhuo
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > 
> > Thanks! That made me understand a couple of things more, and I've got past the
> > problem.
> > 
> > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking call.
> > The _RESET event callback is fired when the PF goes down, but when I call
> > rte_eth_dev_reset it will block until the PF goes back up. There is no way, as far
> > as I can see, to know if the PF is back up before calling rte_eth_dev_reset.
> > 
> > This is a problem because, as far as I understand, I have to call all the
> > rte_eth_dev_ APIs from the same thread, in my case the master thread, and I
> > can't have that block potentially indefinitely.
> > 
> > Would it be possible to have 2 events instead of 1, one when the PF goes down
> > and one when it goes up? This way an application would be able to soft-stop the
> > port (drain queues, etc) when the PF is down, and then call the reset API when it
> > goes back up.
> > 
> > Thanks!
> Sorry, we cannot have 2 events for now. There are 2 problems with
> having 2 events:
> 1. Normally the kernel driver is used for the PF. Currently the kernel
>    driver has only one kind of message for both link down and link up,
>    so the VF cannot tell whether the link went down or came up.
> 2. When the PF is down, the VF does not work unless it is reset, and it
>    cannot receive any message from the PF, so the VF cannot know when
>    the PF comes back up. This means that normally the VF has to be
>    reset twice when the PF goes down and up. (Of course, after
>    receiving the message from the PF we could wait a while until the PF
>    is up, but we cannot tell how long a wait is appropriate, so this
>    *wait a while* approach may work only for a brief link flap.)

Thanks for the clarification, I understand.

The problem with a blocking call is that we basically need to spawn one
thread per rte_eth_dev_reset call, since there is no way of knowing if a
PF is down for good or just flapping, and we can't have a single thread
that manages all the interfaces blocked forever (e.g. PFs 1 and 2 go
down, the thread blocks on the PF 1 reset call, which never returns;
meanwhile PF 2 goes back up, but its reset call is never made).

A colleague of mine, Eric Kinzie, suggested adding a blocking boolean
parameter to the rte_eth_dev_reset API. If set to false, the call would
not block: it would make a single attempt and return an error
(EAGAIN?). Would this be an acceptable proposal?
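
To make the proposal a bit more concrete, here is a sketch of how a
single control thread could service several ports with such a
non-blocking call. toy_try_reset stands in for the proposed
rte_eth_dev_reset variant, which does not exist in this patch set; the
toy_port structure and its pf_up field are only there to simulate the
PF state, which a real VF cannot observe directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_port {
	bool pf_up;         /* simulated PF link state (illustration only) */
	bool reset_pending; /* set when the reset event was seen */
};

/*
 * Stand-in for the proposed non-blocking reset: one attempt, no waiting.
 * Hypothetical; the rte_eth_dev_reset in this patch set takes no such flag.
 */
static int
toy_try_reset(struct toy_port *port)
{
	assert(port != NULL);
	return port->pf_up ? 0 : -EAGAIN;
}

/*
 * One pass over all ports from a single control thread. Ports whose PF is
 * still down stay pending and are retried on the next pass, so one dead PF
 * cannot starve the others. Returns the number of ports still pending.
 */
static size_t
service_pending_resets(struct toy_port *ports, size_t n)
{
	size_t i, pending = 0;

	assert(ports != NULL);
	for (i = 0; i < n; i++) {
		if (!ports[i].reset_pending)
			continue;
		if (toy_try_reset(&ports[i]) == 0)
			ports[i].reset_pending = false;
		else
			pending++;
	}
	return pending;
}
```

In the scenario above (PF 1 and PF 2 both go down, only PF 2 comes
back), the first pass leaves both ports pending and the next pass
resets port 2 while port 1 simply stays on the list.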

-- 
Kind regards,
Luca Boccassi

* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-07 10:20               ` Luca Boccassi
@ 2016-07-07 13:12                 ` Lu, Wenzhuo
  2016-07-07 16:19                   ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-07 13:12 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Thursday, July 7, 2016 6:21 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote:
> > Hi Luca,
> >
> >
> > > -----Original Message-----
> > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > Sent: Thursday, July 7, 2016 12:23 AM
> > > To: Lu, Wenzhuo
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > >
> > > On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote:
> > > > Hi Luca,
> > > >
> > > > > -----Original Message-----
> > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > > Sent: Tuesday, July 5, 2016 5:53 PM
> > > > > To: Lu, Wenzhuo
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > > >
> > > > > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> > > > > > Hi Luca,
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > > > > Sent: Monday, July 4, 2016 11:48 PM
> > > > > > > To: Lu, Wenzhuo
> > > > > > > Cc: dev@dpdk.org
> > > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF
> > > > > > > link
> > > > > > >
> > > > > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > > > > > > If the PF link is down and up, VF link will not work accordingly.
> > > > > > > > This patch set addes the support of VF link reset. So,
> > > > > > > > when VF receices the messges of physical link down/up. APP
> > > > > > > > can reset the VF link and let it recover.
> > > > > > > >
> > > > > > > > PS: This patch set is splitted from a previous patch set,
> > > > > > > > *automatic link recovery on ixgbe/igb VF*, and it's base
> > > > > > > > on the patch set *support mailbox interruption on ixgbe/igb VF*.
> > > > > > > >
> > > > > > > > Wenzhuo Lu (3):
> > > > > > > >   lib/librte_ether: support device reset
> > > > > > > >   ixgbe: implement device reset on VF
> > > > > > > >   igb: implement device reset on VF
> > > > > > > >
> > > > > > > > Zhe Tao (1):
> > > > > > > >   i40e: implement device reset on VF
> > > > > > > >
> > > > > > > > v1:
> > > > > > > > - Added the implementation for the VF reset functionality.
> > > > > > > > v2:
> > > > > > > > - Changed the i40e related operations during VF reset.
> > > > > > > > v3:
> > > > > > > > - Resent the patches because of the mail sent issue.
> > > > > > > > v4:
> > > > > > > > - Removed some VF reset emulation code.
> > > > > > > > v5:
> > > > > > > > - Removed all the code related with lock.
> > > > > > > > v6:
> > > > > > > > - Updated the NIC feature overview matrix.
> > > > > > > > - Added more explanation in the doxygen comment of reset API.
> > > > > > > >
> > > > > > > >  doc/guides/nics/overview.rst           |  1 +
> > > > > > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > > > > > >  drivers/net/e1000/igb_ethdev.c         | 59
> ++++++++++++++++++++++++
> > > > > > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > > > > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83
> > > > > > > ++++++++++++++++++++++++++++++++++
> > > > > > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > > > > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64
> > > +++++++++++++++++++++++++-
> > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > > > > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > > > > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > > > > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > > > > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > > > > > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > Hello Wenzhuo,
> > > > > > >
> > > > > > > I'm testing this patchset, but I am sporadically running
> > > > > > > into an issue where the VFs reset fails after the PF flaps.
> > > > > > >
> > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > > > > > >
> > > > > > > I am using calling rte_eth_dev_reset in response to a
> > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors
> > > > > > > appear in the
> > > > > > > log:
> > > > > > >
> > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
> > > > > > > queue_id=0
> > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware
> > > > > > > (-12)
> > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > > >
> > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call
> > > > > > > in ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of
> 2048.
> > > > > > > The application has ~500 2MB hugepages, and there's 2GB of
> > > > > > > free memory available on top of that.
> > > > > > >
> > > > > > > Have you seen this before? Any pointer or suggestion for debugging?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > --
> > > > > > > Kind regards,
> > > > > > > Luca Boccassi
> > > > > > I think the problem is the mbuf occupied by the packets is not
> > > > > > released. This
> > > > > memory has to be released by the APP, so my patches haven’t covered
> this.
> > > > > Actually an example is needed to show how to use the reset API.
> > > > > I plan to modify the testpmd.
> > > > > > You may notice this feature is postponed to 16.11. Would you
> > > > > > like to wait for
> > > > > the new version that will include an example?
> > > > >
> > > > > Hi,
> > > > >
> > > > > Unfortunately we need the VF reset working sooner than that, so
> > > > > one way or the other I'll need to sort it out. Given I've got a
> > > > > use case where this is happening, if it can be helpful for you
> > > > > I'm more than happy to help as a guinea pig. If you could please
> > > > > give some guidance/guidelines with regards to which API to use
> > > > > to sort the mbuf
> > > problem, I can try it out and give back some feedback.
> > > > >
> > > > > Thanks!
> > > > I made a stupid mistake and deleted all my code. So, I have to
> > > > take some time to rewrite it :( Attached the example I used to
> > > > test the reset API. It's
> > > modified from the l2fwd example. So you can compare it with l2fwd to
> > > see what need to be added.
> > > > Hopefully it can help :)
> > >
> > > Thanks! That made me understand a couple of things more, and I've
> > > got past the problem.
> > >
> > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking
> call.
> > > the _RESET event callback is fired when the PF goes down, but when I
> > > call rte_eth_dev_reset it will block until the PF goes back up.
> > > There is no way, as far as I can see, to know if the PF is back up before
> calling rte_eth_dev_reset.
> > >
> > > This is a problem because, as far as I understand, I have to call
> > > all the rte_eth_dev_ APIs from the same thread, in my case the
> > > master thread, and I can't have that block potentially indefinitely.
> > >
> > > Would it be possible to have 2 events instead of 1, one when the PF
> > > goes down and one when it goes up? This way an application would be
> > > able to soft-stop the port (drain queues, etc) when the PF is down,
> > > and then call the reset API when it goes back up.
> > >
> > > Thanks!
> > Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
> > 1, Normally we use kernel driver for PF. Now the kernel driver only have one
> kind of message for link down and up. So we cannot tell if it's down or up.
> > 2, When the PF is down, if we don't reset the VF, VF is not working.
> > It cannot receive any message from PF. So we cannot know that when PF
> > is up. It means normally we have to reset VF twice when PF down and
> > up. (Surely we can wait a while when we receive the message from PF
> > until PF is up. But we cannot tell how long the time is appropriate.
> > So this *wait a while* may work for flash.)
> 
> Thanks for the clarification, I understand.
> 
> The problem with a blocking call is that we basically need to spawn one thread
> per rte_eth_dev_reset call, since there is no way of knowing if a PF is down for
> good or just flapping, and we can't have a single thread managing all the
> interfaces being blocked forever (EG: PF 1 and 2 go down, thread blocks on PF 1
> reset call but it never returns, meanwhile PF 2 goes back up but call is never
> made).
> 
> A colleague of mine, Eric Kinzie, suggested to add a blocking boolean parameter
> to rte_eth_dev_reset API. If set to false, then the call will not block and just does
> one try and return an error (EAGAIN ?). Would this be an acceptable proposition?
It's a good suggestion.
And I think that if the parameter is set to false and the link is not up after one try, it will be the APP's responsibility to set up a timer or something similar to keep trying to bring the link up.
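
A minimal sketch of such application-side retry logic, assuming the
non-blocking variant discussed above. Everything here is hypothetical:
struct port_state, try_reset() (a stand-in for a non-blocking
rte_eth_dev_reset call) and retry_pending_resets() are illustration
only, and the PF link state is simulated with a plain flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define N_PORTS 2

/* Hypothetical per-port bookkeeping kept by the application. */
struct port_state {
	int needs_reset; /* set when the RESET event callback fires */
	int pf_up;       /* simulated PF link state, for this sketch only */
};

static struct port_state ports[N_PORTS];

/* Stand-in for a non-blocking reset: one attempt, never waits. */
static int
try_reset(uint8_t port_id)
{
	assert(port_id < N_PORTS);
	return ports[port_id].pf_up ? 0 : -EAGAIN;
}

/*
 * Meant to be called periodically (e.g. from a timer) on the thread that
 * owns the ports. Tries each pending port once and returns how many
 * ports still need a reset, so one stuck PF does not starve the others.
 */
static int
retry_pending_resets(void)
{
	int pending = 0;
	uint8_t p;

	for (p = 0; p < N_PORTS; p++) {
		if (!ports[p].needs_reset)
			continue;
		if (try_reset(p) == 0)
			ports[p].needs_reset = 0;
		else
			pending++;
	}
	return pending;
}
```

This covers the two-PF case from earlier in the thread: if the first PF
stays down, the second port is still reset as soon as its PF returns.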

> 
> --
> Kind regards,
> Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-07 13:12                 ` Lu, Wenzhuo
@ 2016-07-07 16:19                   ` Luca Boccassi
  2016-07-08  0:14                     ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-07 16:19 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Thu, 2016-07-07 at 13:12 +0000, Lu, Wenzhuo wrote:
> > -----Original Message-----
> > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > Sent: Thursday, July 7, 2016 6:21 PM
> > To: Lu, Wenzhuo
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > 
> > On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote:
> > > Hi Luca,
> > >
> > >
> > > > -----Original Message-----
> > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > Sent: Thursday, July 7, 2016 12:23 AM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > >
> > > > On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote:
> > > > > Hi Luca,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > > > Sent: Tuesday, July 5, 2016 5:53 PM
> > > > > > To: Lu, Wenzhuo
> > > > > > Cc: dev@dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > > > >
> > > > > > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> > > > > > > Hi Luca,
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Luca Boccassi [mailto:lboccass@Brocade.com]
> > > > > > > > Sent: Monday, July 4, 2016 11:48 PM
> > > > > > > > To: Lu, Wenzhuo
> > > > > > > > Cc: dev@dpdk.org
> > > > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF
> > > > > > > > link
> > > > > > > >
> > > > > > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > > > > > > > If the PF link is down and up, VF link will not work accordingly.
> > > > > > > > > This patch set addes the support of VF link reset. So,
> > > > > > > > > when VF receices the messges of physical link down/up. APP
> > > > > > > > > can reset the VF link and let it recover.
> > > > > > > > >
> > > > > > > > > PS: This patch set is splitted from a previous patch set,
> > > > > > > > > *automatic link recovery on ixgbe/igb VF*, and it's base
> > > > > > > > > on the patch set *support mailbox interruption on ixgbe/igb VF*.
> > > > > > > > >
> > > > > > > > > Wenzhuo Lu (3):
> > > > > > > > >   lib/librte_ether: support device reset
> > > > > > > > >   ixgbe: implement device reset on VF
> > > > > > > > >   igb: implement device reset on VF
> > > > > > > > >
> > > > > > > > > Zhe Tao (1):
> > > > > > > > >   i40e: implement device reset on VF
> > > > > > > > >
> > > > > > > > > v1:
> > > > > > > > > - Added the implementation for the VF reset functionality.
> > > > > > > > > v2:
> > > > > > > > > - Changed the i40e related operations during VF reset.
> > > > > > > > > v3:
> > > > > > > > > - Resent the patches because of the mail sent issue.
> > > > > > > > > v4:
> > > > > > > > > - Removed some VF reset emulation code.
> > > > > > > > > v5:
> > > > > > > > > - Removed all the code related with lock.
> > > > > > > > > v6:
> > > > > > > > > - Updated the NIC feature overview matrix.
> > > > > > > > > - Added more explanation in the doxygen comment of reset API.
> > > > > > > > >
> > > > > > > > >  doc/guides/nics/overview.rst           |  1 +
> > > > > > > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > > > > > > >  drivers/net/e1000/igb_ethdev.c         | 59
> > ++++++++++++++++++++++++
> > > > > > > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > > > > > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83
> > > > > > > > ++++++++++++++++++++++++++++++++++
> > > > > > > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > > > > > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64
> > > > +++++++++++++++++++++++++-
> > > > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > > > > > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > > > > > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > > > > > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > > > > > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > > > > > > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > > > > > > >
> > > > > > > > Hello Wenzhuo,
> > > > > > > >
> > > > > > > > I'm testing this patchset, but I am sporadically running
> > > > > > > > into an issue where the VFs reset fails after the PF flaps.
> > > > > > > >
> > > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > > > > > > >
> > > > > > > > I am using calling rte_eth_dev_reset in response to a
> > > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors
> > > > > > > > appear in the
> > > > > > > > log:
> > > > > > > >
> > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
> > > > > > > > queue_id=0
> > > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware
> > > > > > > > (-12)
> > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > > > >
> > > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call
> > > > > > > > in ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of
> > 2048.
> > > > > > > > The application has ~500 2MB hugepages, and there's 2GB of
> > > > > > > > free memory available on top of that.
> > > > > > > >
> > > > > > > > Have you seen this before? Any pointer or suggestion for debugging?
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > --
> > > > > > > > Kind regards,
> > > > > > > > Luca Boccassi
> > > > > > > I think the problem is the mbuf occupied by the packets is not
> > > > > > > released. This
> > > > > > memory has to be released by the APP, so my patches haven’t covered
> > this.
> > > > > > Actually an example is needed to show how to use the reset API.
> > > > > > I plan to modify the testpmd.
> > > > > > > You may notice this feature is postponed to 16.11. Would you
> > > > > > > like to wait for
> > > > > > the new version that will include an example?
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Unfortunately we need the VF reset working sooner than that, so
> > > > > > one way or the other I'll need to sort it out. Given I've got a
> > > > > > use case where this is happening, if it can be helpful for you
> > > > > > I'm more than happy to help as a guinea pig. If you could please
> > > > > > give some guidance/guidelines with regards to which API to use
> > > > > > to sort the mbuf
> > > > problem, I can try it out and give back some feedback.
> > > > > >
> > > > > > Thanks!
> > > > > I made a stupid mistake and deleted all my code. So, I have to
> > > > > take some time to rewrite it :( Attached the example I used to
> > > > > test the reset API. It's
> > > > modified from the l2fwd example. So you can compare it with l2fwd to
> > > > see what need to be added.
> > > > > Hopefully it can help :)
> > > >
> > > > Thanks! That made me understand a couple of things more, and I've
> > > > got past the problem.
> > > >
> > > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking
> > call.
> > > > the _RESET event callback is fired when the PF goes down, but when I
> > > > call rte_eth_dev_reset it will block until the PF goes back up.
> > > > There is no way, as far as I can see, to know if the PF is back up before
> > calling rte_eth_dev_reset.
> > > >
> > > > This is a problem because, as far as I understand, I have to call
> > > > all the rte_eth_dev_ APIs from the same thread, in my case the
> > > > master thread, and I can't have that block potentially indefinitely.
> > > >
> > > > Would it be possible to have 2 events instead of 1, one when the PF
> > > > goes down and one when it goes up? This way an application would be
> > > > able to soft-stop the port (drain queues, etc) when the PF is down,
> > > > and then call the reset API when it goes back up.
> > > >
> > > > Thanks!
> > > Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
> > > 1, Normally we use kernel driver for PF. Now the kernel driver only have one
> > kind of message for link down and up. So we cannot tell if it's down or up.
> > > 2, When the PF is down, if we don't reset the VF, VF is not working.
> > > It cannot receive any message from PF. So we cannot know that when PF
> > > is up. It means normally we have to reset VF twice when PF down and
> > > up. (Surely we can wait a while when we receive the message from PF
> > > until PF is up. But we cannot tell how long the time is appropriate.
> > > So this *wait a while* may work for flash.)
> > 
> > Thanks for the clarification, I understand.
> > 
> > The problem with a blocking call is that we basically need to spawn one thread
> > per rte_eth_dev_reset call, since there is no way of knowing if a PF is down for
> > good or just flapping, and we can't have a single thread managing all the
> > interfaces being blocked forever (EG: PF 1 and 2 go down, thread blocks on PF 1
> > reset call but it never returns, meanwhile PF 2 goes back up but call is never
> > made).
> > 
> > A colleague of mine, Eric Kinzie, suggested to add a blocking boolean parameter
> > to rte_eth_dev_reset API. If set to false, then the call will not block and just does
> > one try and return an error (EAGAIN ?). Would this be an acceptable proposition?
> It's a good suggestion. 
> And I think if the parameter is set to false and the link is not up after trying once, it will be APP's responsibility to setup a timer or something like that to keep trying to bring up the link.

That seems reasonable. I've thrown together a quick diff and played with
it on top of your patches and DPDK 2.2; it seems to work as intended, and
I'm attaching it for reference. Feel free to pick it up, adapt it or
ignore it :-)

Also, I've noticed that ixgbe is the only driver that actually blocks:
e1000 already returns immediately if dev_start fails (perhaps it should
be changed to be consistent?), and i40e does weird things that I'm not
sure about, but I couldn't spot a loop in there :-)

Also, I've used int instead of bool because
drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
compilation fails when including stdbool.h and using bool in
rte_ethdev.h.

-- 
Kind regards,
Luca Boccassi

--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -260,7 +260,7 @@ static void eth_igb_configure_msix_intr(
 static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
 					void *param);
 static void igbvf_mbx_process(struct rte_eth_dev *dev);
-static int igbvf_dev_reset(struct rte_eth_dev *dev);
+static int igbvf_dev_reset(struct rte_eth_dev *dev, int blocking);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -2598,7 +2598,7 @@ void igbvf_mbx_process(struct rte_eth_de
 }
 
 static int
-igbvf_dev_reset(struct rte_eth_dev *dev)
+igbvf_dev_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
 {
 	struct e1000_hw *hw =
 		 E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -2626,12 +2626,12 @@ igbvf_dev_reset(struct rte_eth_dev *dev)
 		 rte_delay_ms(1000);
 
 		 diag = igbvf_dev_start(dev);
+		 dev->data->dev_started = 1;
 		 if (diag) {
 			  PMD_INIT_LOG(ERR, "Igb VF reset: "
 					 "Failed to start device.");
-			  return diag;
+			  return -EAGAIN;
 		 }
-		 dev->data->dev_started = 1;
 		 eth_igbvf_stats_reset(dev);
 		 if (dev->data->dev_conf.intr_conf.lsc == 0)
 			  diag = eth_igb_link_update(dev, 0);
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -157,7 +157,7 @@ static int i40evf_dev_init(struct rte_et
 static void i40evf_dev_close(struct rte_eth_dev *dev);
 static int i40evf_dev_start(struct rte_eth_dev *dev);
 static int i40evf_dev_configure(struct rte_eth_dev *dev);
-static int i40evf_handle_vf_reset(struct rte_eth_dev *dev);
+static int i40evf_handle_vf_reset(struct rte_eth_dev *dev, int blocking);
 
 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -1498,7 +1498,7 @@ i40e_vf_reset_dev(struct rte_eth_dev *de
 }
 
 static int
-i40evf_handle_vf_reset(struct rte_eth_dev *dev)
+i40evf_handle_vf_reset(struct rte_eth_dev *dev, __rte_unused int blocking)
 {
 	struct i40e_adapter *adapter =
 		 I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
@@ -1518,7 +1518,7 @@ i40evf_emulate_vf_reset(uint8_t port_id)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
-	i40evf_handle_vf_reset(dev);
+	i40evf_handle_vf_reset(dev, 0);
 }
 
 static int
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -379,7 +379,7 @@ static void ixgbevf_dev_interrupt_handle
 		(r) = (h)->bitmap[idx] >> bit & 1;\
 	}while(0)
 
-static int ixgbevf_dev_reset(struct rte_eth_dev *dev);
+static int ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking);
 
 /*
  * The set of PCI devices this driver supports
@@ -6227,7 +6227,7 @@ static void ixgbevf_mbx_process(struct r
 }
 
 static int
-ixgbevf_dev_reset(struct rte_eth_dev *dev)
+ixgbevf_dev_reset(struct rte_eth_dev *dev, int blocking)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	int diag = 0;
@@ -6256,7 +6256,12 @@ ixgbevf_dev_reset(struct rte_eth_dev *de
 		 if (diag) {
 			  PMD_INIT_LOG(ERR, "Ixgbe VF reset: "
 					 "Failed to start device.");
-			  continue;
+			if (blocking)
+				continue;
+			else {
+				dev->data->dev_started = 1;
+				return -EAGAIN;
+			}
 		 }
 		 dev->data->dev_started = 1;
 		 ixgbevf_dev_stats_reset(dev);
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3370,7 +3370,7 @@ rte_eth_copy_pci_info(struct rte_eth_dev
 }
 
 int
-rte_eth_dev_reset(uint8_t port_id)
+rte_eth_dev_reset(uint8_t port_id, int blocking)
 {
 	struct rte_eth_dev *dev;
 	int diag;
@@ -3381,7 +3381,7 @@ rte_eth_dev_reset(uint8_t port_id)
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
 
-	diag = (*dev->dev_ops->dev_reset)(dev);
+	diag = (*dev->dev_ops->dev_reset)(dev, blocking);
 
 	return diag;
 }
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1262,7 +1262,7 @@ typedef int (*eth_set_eeprom_t)(struct r
 				struct rte_dev_eeprom_info *info);
 /**< @internal Program eeprom data  */
 
-typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev);
+typedef int  (*eth_dev_reset_t)(struct rte_eth_dev *dev, int blocking);
 /**< @internal Function used to reset a configured Ethernet device. */
 
 #ifdef RTE_NIC_BYPASS
@@ -3927,17 +3927,21 @@ rte_eth_dma_zone_reserve(const struct rt
  * queues, restart the port.
  * Before calling this API, APP should stop the rx/tx. When tx is being stopped,
  * APP can drop the packets and release the buffer instead of sending them.
+ * This call will block until the PF is up again, unless blocking is false.
  *
  * @param port_id
  *   The port identifier of the Ethernet device.
+ * @param blocking
+ *   Whether or not to block if the PF is not yet UP.
  *
  * @return
  *   - (0) if successful.
  *   - (-ENODEV) if port identifier is invalid.
  *   - (-ENOTSUP) if hardware doesn't support this function.
+ *   - (-EAGAIN) if PF is not up and blocking was false.
  */
 int
-rte_eth_dev_reset(uint8_t port_id);
+rte_eth_dev_reset(uint8_t port_id, int blocking);
 
 #ifdef __cplusplus
 }



* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-07 16:19                   ` Luca Boccassi
@ 2016-07-08  0:14                     ` Lu, Wenzhuo
  2016-07-08 17:15                       ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-08  0:14 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

> > > > > > > > > Hello Wenzhuo,
> > > > > > > > >
> > > > > > > > > I'm testing this patchset, but I am sporadically running
> > > > > > > > > into an issue where the VFs reset fails after the PF flaps.
> > > > > > > > >
> > > > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs
> in.
> > > > > > > > >
> > > > > > > > > I am using calling rte_eth_dev_reset in response to a
> > > > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following
> > > > > > > > > errors appear in the
> > > > > > > > > log:
> > > > > > > > >
> > > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
> > > > > > > > > queue_id=0
> > > > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX
> > > > > > > > > hardware
> > > > > > > > > (-12)
> > > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > > > > >
> > > > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc
> > > > > > > > > call in ixgbe_alloc_rx_queue_mbufs returns NULL at
> > > > > > > > > iteration 64 out of
> > > 2048.
> > > > > > > > > The application has ~500 2MB hugepages, and there's 2GB
> > > > > > > > > of free memory available on top of that.
> > > > > > > > >
> > > > > > > > > Have you seen this before? Any pointer or suggestion for
> debugging?
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Kind regards,
> > > > > > > > > Luca Boccassi
> > > > > > > > I think the problem is the mbuf occupied by the packets is
> > > > > > > > not released. This
> > > > > > > memory has to be released by the APP, so my patches haven’t
> > > > > > > covered
> > > this.
> > > > > > > Actually an example is needed to show how to use the reset API.
> > > > > > > I plan to modify the testpmd.
> > > > > > > > You may notice this feature is postponed to 16.11. Would
> > > > > > > > you like to wait for
> > > > > > > the new version that will include an example?
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Unfortunately we need the VF reset working sooner than that,
> > > > > > > so one way or the other I'll need to sort it out. Given I've
> > > > > > > got a use case where this is happening, if it can be helpful
> > > > > > > for you I'm more than happy to help as a guinea pig. If you
> > > > > > > could please give some guidance/guidelines with regards to
> > > > > > > which API to use to sort the mbuf
> > > > > problem, I can try it out and give back some feedback.
> > > > > > >
> > > > > > > Thanks!
> > > > > > I made a stupid mistake and deleted all my code. So, I have to
> > > > > > take some time to rewrite it :( Attached the example I used to
> > > > > > test the reset API. It's
> > > > > modified from the l2fwd example. So you can compare it with
> > > > > l2fwd to see what need to be added.
> > > > > > Hopefully it can help :)
> > > > >
> > > > > Thanks! That made me understand a couple of things more, and
> > > > > I've got past the problem.
> > > > >
> > > > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is
> > > > > a blocking
> > > call.
> > > > > the _RESET event callback is fired when the PF goes down, but
> > > > > when I call rte_eth_dev_reset it will block until the PF goes back up.
> > > > > There is no way, as far as I can see, to know if the PF is back
> > > > > up before
> > > calling rte_eth_dev_reset.
> > > > >
> > > > > This is a problem because, as far as I understand, I have to
> > > > > call all the rte_eth_dev_ APIs from the same thread, in my case
> > > > > the master thread, and I can't have that block potentially indefinitely.
> > > > >
> > > > > Would it be possible to have 2 events instead of 1, one when the
> > > > > PF goes down and one when it goes up? This way an application
> > > > > would be able to soft-stop the port (drain queues, etc) when the
> > > > > PF is down, and then call the reset API when it goes back up.
> > > > >
> > > > > Thanks!
> > > > Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
> > > > 1, Normally we use kernel driver for PF. Now the kernel driver
> > > > only have one
> > > kind of message for link down and up. So we cannot tell if it's down or up.
> > > > 2, When the PF is down, if we don't reset the VF, VF is not working.
> > > > It cannot receive any message from PF. So we cannot know that when
> > > > PF is up. It means normally we have to reset VF twice when PF down
> > > > and up. (Surely we can wait a while when we receive the message
> > > > from PF until PF is up. But we cannot tell how long the time is appropriate.
> > > > So this *wait a while* may work for flash.)
> > >
> > > Thanks for the clarification, I understand.
> > >
> > > The problem with a blocking call is that we basically need to spawn
> > > one thread per rte_eth_dev_reset call, since there is no way of
> > > knowing if a PF is down for good or just flapping, and we can't have
> > > a single thread managing all the interfaces being blocked forever
> > > (EG: PF 1 and 2 go down, thread blocks on PF 1 reset call but it
> > > never returns, meanwhile PF 2 goes back up but call is never made).
> > >
> > > A colleague of mine, Eric Kinzie, suggested to add a blocking
> > > boolean parameter to rte_eth_dev_reset API. If set to false, then
> > > the call will not block and just does one try and return an error (EAGAIN ?).
> Would this be an acceptable proposition?
> > It's a good suggestion.
> > And I think if the parameter is set to false and the link is not up after trying
> once, it will be APP's responsibility to setup a timer or something like that to
> keep trying to bring up the link.
> 
> That seems reasonable. I've thrown together a quick diff and played with it on
> top of your patches and DPDK 2.2, seems to work as intended, I'm attaching it
> for reference. Feel free to pick it up, adapt it or ignore it :-)
> 
> Also I've noticed that the ixgbe is the only one that actually blocks,
> e1000 returns already immediately if the dev_start fails (perhaps it should be
> changed to be consistent?) and ixgb40 does weird things that I'm not sure about,
> but couldn't spot a loop in there :-)
> 
> Also I've used int instead of bool because
> drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
> compilation fails when including stdbool.h and using bool in rte_ethdev.h
> 
> --
> Kind regards,
> Luca Boccassi
Glad to know it's working now. Thanks for your patch. I'll certainly try to include it in the next version :)



* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-08  0:14                     ` Lu, Wenzhuo
@ 2016-07-08 17:15                       ` Luca Boccassi
  2016-07-11  1:32                         ` Lu, Wenzhuo
  0 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-08 17:15 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Fri, 2016-07-08 at 00:14 +0000, Lu, Wenzhuo wrote:
> > > > > > > > > > Hello Wenzhuo,
> > > > > > > > > >
> > > > > > > > > > I'm testing this patchset, but I am sporadically running
> > > > > > > > > > into an issue where the VFs reset fails after the PF flaps.
> > > > > > > > > >
> > > > > > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs
> > in.
> > > > > > > > > >
> > > > > > > > > > I am using calling rte_eth_dev_reset in response to a
> > > > > > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following
> > > > > > > > > > errors appear in the
> > > > > > > > > > log:
> > > > > > > > > >
> > > > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
> > > > > > > > > > queue_id=0
> > > > > > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX
> > > > > > > > > > hardware
> > > > > > > > > > (-12)
> > > > > > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > > > > > >
> > > > > > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc
> > > > > > > > > > call in ixgbe_alloc_rx_queue_mbufs returns NULL at
> > > > > > > > > > iteration 64 out of
> > > > 2048.
> > > > > > > > > > The application has ~500 2MB hugepages, and there's 2GB
> > > > > > > > > > of free memory available on top of that.
> > > > > > > > > >
> > > > > > > > > > Have you seen this before? Any pointer or suggestion for
> > debugging?
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Kind regards,
> > > > > > > > > > Luca Boccassi
> > > > > > > > > I think the problem is the mbuf occupied by the packets is
> > > > > > > > > not released. This
> > > > > > > > memory has to be released by the APP, so my patches haven’t
> > > > > > > > covered
> > > > this.
> > > > > > > > Actually an example is needed to show how to use the reset API.
> > > > > > > > I plan to modify the testpmd.
> > > > > > > > > You may notice this feature is postponed to 16.11. Would
> > > > > > > > > you like to wait for
> > > > > > > > the new version that will include an example?
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Unfortunately we need the VF reset working sooner than that,
> > > > > > > > so one way or the other I'll need to sort it out. Given I've
> > > > > > > > got a use case where this is happening, if it can be helpful
> > > > > > > > for you I'm more than happy to help as a guinea pig. If you
> > > > > > > > could please give some guidance/guidelines with regards to
> > > > > > > > which API to use to sort the mbuf
> > > > > > problem, I can try it out and give back some feedback.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > I made a stupid mistake and deleted all my code. So, I have to
> > > > > > > take some time to rewrite it :( Attached the example I used to
> > > > > > > test the reset API. It's
> > > > > > modified from the l2fwd example. So you can compare it with
> > > > > > l2fwd to see what need to be added.
> > > > > > > Hopefully it can help :)
> > > > > >
> > > > > > Thanks! That made me understand a couple of things more, and
> > > > > > I've got past the problem.
> > > > > >
> > > > > > Unfortunately now there's a bigger issue - rte_eth_dev_reset is
> > > > > > a blocking
> > > > call.
> > > > > > the _RESET event callback is fired when the PF goes down, but
> > > > > > when I call rte_eth_dev_reset it will block until the PF goes back up.
> > > > > > There is no way, as far as I can see, to know if the PF is back
> > > > > > up before
> > > > calling rte_eth_dev_reset.
> > > > > >
> > > > > > This is a problem because, as far as I understand, I have to
> > > > > > call all the rte_eth_dev_ APIs from the same thread, in my case
> > > > > > the master thread, and I can't have that block potentially indefinitely.
> > > > > >
> > > > > > Would it be possible to have 2 events instead of 1, one when the
> > > > > > PF goes down and one when it goes up? This way an application
> > > > > > would be able to soft-stop the port (drain queues, etc) when the
> > > > > > PF is down, and then call the reset API when it goes back up.
> > > > > >
> > > > > > Thanks!
> > > > > Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
> > > > > 1, Normally we use kernel driver for PF. Now the kernel driver
> > > > > only have one
> > > > kind of message for link down and up. So we cannot tell if it's down or up.
> > > > > 2, When the PF is down, if we don't reset the VF, VF is not working.
> > > > > It cannot receive any message from PF. So we cannot know that when
> > > > > PF is up. It means normally we have to reset VF twice when PF down
> > > > > and up. (Surely we can wait a while when we receive the message
> > > > > from PF until PF is up. But we cannot tell how long the time is appropriate.
> > > > > So this *wait a while* may work for flash.)
> > > >
> > > > Thanks for the clarification, I understand.
> > > >
> > > > The problem with a blocking call is that we basically need to spawn
> > > > one thread per rte_eth_dev_reset call, since there is no way of
> > > > knowing if a PF is down for good or just flapping, and we can't have
> > > > a single thread managing all the interfaces being blocked forever
> > > > (EG: PF 1 and 2 go down, thread blocks on PF 1 reset call but it
> > > > never returns, meanwhile PF 2 goes back up but call is never made).
> > > >
> > > > A colleague of mine, Eric Kinzie, suggested to add a blocking
> > > > boolean parameter to rte_eth_dev_reset API. If set to false, then
> > > > the call will not block and just does one try and return an error
> > > > (EAGAIN ?). Would this be an acceptable proposition?
> > > It's a good suggestion.
> > > And I think if the parameter is set to false and the link is not up after
> > > trying once, it will be APP's responsibility to setup a timer or something
> > > like that to keep trying to bring up the link.
> > 
> > That seems reasonable. I've thrown together a quick diff and played with it on
> > top of your patches and DPDK 2.2, seems to work as intended, I'm attaching it
> > for reference. Feel free to pick it up, adapt it or ignore it :-)
> > 
> > Also I've noticed that the ixgbe is the only one that actually blocks,
> > e1000 returns already immediately if the dev_start fails (perhaps it should be
> > changed to be consistent?) and ixgb40 does weird things that I'm not sure about,
> > but couldn't spot a loop in there :-)
> > 
> > Also I've used int instead of bool because
> > drivers/net/e1000/base/e1000_osdep.h redefines bool and true/false, so
> > compilation fails when including stdbool.h and using bool in rte_ethdev.h
> > 
> > --
> > Kind regards,
> > Luca Boccassi
> Glad to know it's working now.  Thanks for your patch.  Surely I'll try to include it in the next version :)

Great, thanks!

Unfortunately I found one issue: if PF is down, and then the VF on the
guest is down as well (ip link down) and then goes back up before the
PF, then calling rte_eth_dev_reset will return 0 (success), even though
the PF is still down and it should fail. This is with ixgbe. Any idea
what could be the problem?

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-08 17:15                       ` Luca Boccassi
@ 2016-07-11  1:32                         ` Lu, Wenzhuo
  2016-07-11 12:02                           ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-11  1:32 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

> 
> Unfortunately I found one issue: if PF is down, and then the VF on the guest is
> down as well (ip link down) and then goes back up before the PF, then calling
> rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
> should fail. This is with ixgbe. Any idea what could be the problem?
I've looked into this and found something interesting. I believe it's a HW difference between igb and ixgbe. When the PF link is down, an ixgbe VF can be reset successfully but an igb VF cannot. This shows up in register access: the registers of the ixgbe VF can still be accessed while the PF link is down, but those of the igb VF cannot.
So on ixgbe, when the PF link goes down, we reset the VF link. Then, when the PF link comes up, we receive the message again and reset the VF link again.
But on igb, when the PF link is down, we cannot reset the VF link successfully, so when the PF link comes up, we cannot receive the message. There is no trigger for us to reset the VF link again. That's why on igb we have to keep retrying until the reset succeeds, i.e. until the PF link is up.
So a return value of 0 from rte_eth_dev_reset means that the reset succeeded; it does not mean that rx/tx is ready. Rx/tx still depends on the PF link being up.
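
To make the difference concrete, here is a toy model of the behaviour described above. It is only a sketch under my assumptions, not driver code: vf_model, pf_set_link, vf_reset and vf_can_rx_tx are hypothetical names, the -1 error value is arbitrary, and the single reset_events counter stands in for the one kind of PF message we have for both link down and link up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two VF behaviours; not taken from any driver. */
enum vf_kind { VF_IXGBE_LIKE, VF_IGB_LIKE };

struct vf_model {
	enum vf_kind kind;
	bool pf_link_up;
	bool mailbox_ok;   /* VF is able to receive messages from the PF */
	int reset_events;  /* reset notifications delivered to the application */
};

/* The PF link changes state. The same message is used for down and up. */
static void pf_set_link(struct vf_model *vf, bool up)
{
	assert(vf != NULL);
	vf->pf_link_up = up;
	if (vf->mailbox_ok)
		vf->reset_events++;
	if (!up)
		vf->mailbox_ok = false; /* VF stops working until it is reset */
}

/* Returns 0 when the reset itself succeeded, -1 (arbitrary) otherwise. */
static int vf_reset(struct vf_model *vf)
{
	assert(vf != NULL);
	/* igb-like: registers are not accessible while the PF link is down */
	if (vf->kind == VF_IGB_LIKE && !vf->pf_link_up)
		return -1;
	vf->mailbox_ok = true;
	return 0; /* says nothing about whether rx/tx can pass traffic */
}

/* rx/tx needs both a successfully reset VF and the PF link being up. */
static bool vf_can_rx_tx(const struct vf_model *vf)
{
	assert(vf != NULL);
	return vf->mailbox_ok && vf->pf_link_up;
}
```

In this model, an ixgbe-like VF that is reset after the first event gets a second event when the PF comes back, and both resets return 0. An igb-like VF fails the reset while the PF is down and never sees a second event, so the application has to keep retrying.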

> 
> --
> Kind regards,
> Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-11  1:32                         ` Lu, Wenzhuo
@ 2016-07-11 12:02                           ` Luca Boccassi
  2016-07-11 15:43                             ` Luca Boccassi
  0 siblings, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-07-11 12:02 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > 
> > Unfortunately I found one issue: if PF is down, and then the VF on the guest is
> > down as well (ip link down) and then goes back up before the PF, then calling
> > rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
> > should fail. This is with ixgbe. Any idea what could be the problem?
> I've found this interesting thing. I believe it’s the HW difference between igb and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb VF cannot. The expression is the  registers of the ixgbe VF can be accessed when the PF link is down but igb VF cannot.
> It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link is up, we receive the message again and reset the VF link again. 

What message do you refer to here? I am seeing the RESET callback only
when the PF goes down, not when it goes up.

At the moment, with ixgbe, this happens:

PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
VF link has no-carrier, and traffic does NOT go through

The problem is that there is just no way of being notified that PF is
up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
need to run it again.

> But on igb, when PF link is down, we cannot reset VF link successfully, so when the PF link is up, we cannot receive the message. No trigger for us to reset the VF link again. That's why on igb we have to try again and again until it succeed, means until PF link is up.
> So the return 0 by rte_eth_dev_reset means the resetting succeeded, not mean the rx/tx is ready. Rx/tx has to depend on the PF link is up.

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-11 12:02                           ` Luca Boccassi
@ 2016-07-11 15:43                             ` Luca Boccassi
  2016-07-12  1:19                               ` Lu, Wenzhuo
  2016-08-26 12:58                               ` Luca Boccassi
  0 siblings, 2 replies; 72+ messages in thread
From: Luca Boccassi @ 2016-07-11 15:43 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Mon, 2016-07-11 at 13:02 +0100, Luca Boccassi wrote:
> On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > > 
> > > Unfortunately I found one issue: if PF is down, and then the VF on the guest is
> > > down as well (ip link down) and then goes back up before the PF, then calling
> > > rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
> > > should fail. This is with ixgbe. Any idea what could be the problem?
> > I've found this interesting thing. I believe it’s the HW difference between igb and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb VF cannot. The expression is the  registers of the ixgbe VF can be accessed when the PF link is down but igb VF cannot.
> > It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link is up, we receive the message again and reset the VF link again. 
> 
> What message do you refer to here? I am seeing the RESET callback only
> when the PF goes down, not when it goes up.
> 
> At the moment, with ixgbe, this happens:
> 
> PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
> down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
> VF link has no-carrier, and traffic does NOT go through
> 
> The problem is that there is just no way of being notified that PF is
> up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
> need to run it again.

I have now been able to solve this use case by having the
rte_eth_dev_reset implementations return -EAGAIN if the dev is not up.
This way I know, in the application, that I have to try again. What do
you think?

IMHO it makes sense, as the reset does not actually succeed, and the
caller should try again. The diff is trivial, and is attached for
reference.
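
On the application side, the handling I have in mind looks roughly like the sketch below. It is an illustration under assumptions, not the code I am running: reset_fn_t is a hypothetical stand-in for rte_eth_dev_reset (whose final signature is still under discussion), fake_reset is a made-up driver stub for demonstration, and in a real application each loop iteration would be a timer expiry rather than a tight loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_eth_dev_reset(); ctx is only for the demo. */
typedef int (*reset_fn_t)(unsigned int port_id, void *ctx);

enum reset_outcome { RESET_DONE, RESET_RETRY_LATER, RESET_FAILED };

/* One non-blocking attempt: classify the return code for the caller. */
static enum reset_outcome
try_reset_once(reset_fn_t reset_fn, unsigned int port_id, void *ctx)
{
	int rc;

	assert(reset_fn != NULL);
	rc = reset_fn(port_id, ctx);
	if (rc == 0)
		return RESET_DONE;
	if (rc == -EAGAIN)
		return RESET_RETRY_LATER; /* device not up yet: try again later */
	return RESET_FAILED;
}

/*
 * Bounded retry. Returns the number of attempts used on success,
 * -EAGAIN if the budget ran out, -EIO on any other driver error.
 */
static int
reset_with_retries(reset_fn_t reset_fn, unsigned int port_id, void *ctx,
		   unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		enum reset_outcome o = try_reset_once(reset_fn, port_id, ctx);

		if (o == RESET_DONE)
			return (int)(i + 1);
		if (o == RESET_FAILED)
			return -EIO;
	}
	return -EAGAIN;
}

/* Demo stub: returns -EAGAIN until the counter behind ctx reaches zero. */
static int fake_reset(unsigned int port_id, void *ctx)
{
	unsigned int *remaining = ctx;

	(void)port_id;
	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

Keeping the single attempt separate from the retry policy means no thread is ever blocked on one port, so one thread can look after several ports and retry each of them independently.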

-- 
Kind regards,
Luca Boccassi


Make rte_eth_dev_reset return -EAGAIN if VF down

If the VF is down the reset will not happen, so the driver should return
-EAGAIN to signal to the application that it needs to call
rte_eth_dev_reset again.

Signed-off-by: Luca Boccassi <lboccass@brocade.com>
---
 drivers/net/e1000/igb_ethdev.c    |    2 +-
 drivers/net/i40e/i40e_ethdev_vf.c |    2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -6235,7 +6235,7 @@ ixgbevf_dev_reset(struct rte_eth_dev *de
 
 	/* Nothing needs to be done if the device is not started. */
 	if (!dev->data->dev_started)
-		 return 0;
+		 return -EAGAIN;
 
 	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
 
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1504,7 +1504,7 @@ i40evf_handle_vf_reset(struct rte_eth_de
 		 I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 
 	if (!dev->data->dev_started)
-		 return 0;
+		 return -EAGAIN;
 
 	adapter->reset_number = 1;
 	i40e_vf_reset_dev(dev);
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2609,7 +2609,7 @@ igbvf_dev_reset(struct rte_eth_dev *dev,
 
 	/* Nothing needs to be done if the device is not started. */
 	if (!dev->data->dev_started)
-		 return 0;
+		 return -EAGAIN;
 
 	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
 
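
For reference, the effect of the guard change can be shown with a small stand-alone model. This is only a sketch: dev_model and model_dev_reset are hypothetical names, and the real functions are the three patched above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the fields the guard looks at; not a DPDK struct. */
struct dev_model {
	int dev_started;  /* mirrors dev->data->dev_started */
	int resets_done;  /* counts resets that were actually carried out */
};

/* Patched behaviour: a device that is not started asks the caller to retry. */
static int model_dev_reset(struct dev_model *dev)
{
	assert(dev != NULL);
	if (!dev->dev_started)
		return -EAGAIN; /* was "return 0" before the patch */
	dev->resets_done++;
	return 0;
}
```

With the old "return 0" the caller could not tell a skipped reset from a completed one; with -EAGAIN it knows that it has to call again.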


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-11 15:43                             ` Luca Boccassi
@ 2016-07-12  1:19                               ` Lu, Wenzhuo
  2016-08-26 12:58                               ` Luca Boccassi
  1 sibling, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-07-12  1:19 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Monday, July 11, 2016 11:43 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> On Mon, 2016-07-11 at 13:02 +0100, Luca Boccassi wrote:
> > On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > > >
> > > > Unfortunately I found one issue: if PF is down, and then the VF on
> > > > the guest is down as well (ip link down) and then goes back up
> > > > before the PF, then calling rte_eth_dev_reset will return 0
> > > > (success), even though the PF is still down and it should fail. This is with
> ixgbe. Any idea what could be the problem?
> > > I've found this interesting thing. I believe it’s the HW difference between igb
> and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb VF
> cannot. The expression is the  registers of the ixgbe VF can be accessed when
> the PF link is down but igb VF cannot.
> > > It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link is
> up, we receive the message again and reset the VF link again.
> >
> > What message do you refer to here? I am seeing the RESET callback only
> > when the PF goes down, not when it goes up.
> >
> > At the moment, with ixgbe, this happens:
> >
> > PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
> > down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up
> > -> VF link has no-carrier, and traffic does NOT go through
> >
> > The problem is that there is just no way of being notified that PF is
> > up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
> > need to run it again.
> 
> I was now able to solve this use case, by having the rte_eth_dev_reset
> implementations return -EAGAIN if the dev is not up. This way I know, in the
> application, that I have to try again. What do you think?
> 
> IMHO it makes sense, as the reset does not actually succeeds, and the caller
> should try again. The diff is very trivial, and attached for reference.
Yes, I think the change is reasonable. Sorry, I didn't realize you were talking about the code you had changed. Maybe we weren't on the same page in the earlier discussion :)

> 
> --
> Kind regards,
> Luca Boccassi
> 
> 
> Make rte_eth_dev_reset return EAGAIN if VF down
> 
> If VF is down the reset will not happen, so the driver should return
> EAGAIN to signal the application that it needs to call again
> rte_eth_dev_reset.
> 
> Signed-off-by: Luca Boccassi <lboccass@brocade.com
> ---
>  drivers/net/e1000/igb_ethdev.c    |    2 +-
>  drivers/net/i40e/i40e_ethdev_vf.c |    2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c  |    2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -6235,7 +6235,7 @@ ixgbevf_dev_reset(struct rte_eth_dev *de
> 
>  	/* Nothing needs to be done if the device is not started. */
>  	if (!dev->data->dev_started)
> -		 return 0;
> +		 return -EAGAIN;
> 
>  	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
> 
> --- a/drivers/net/i40e/i40e_ethdev_vf.c
> +++ b/drivers/net/i40e/i40e_ethdev_vf.c
> @@ -1504,7 +1504,7 @@ i40evf_handle_vf_reset(struct rte_eth_de
>  		 I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> 
>  	if (!dev->data->dev_started)
> -		 return 0;
> +		 return -EAGAIN;
> 
>  	adapter->reset_number = 1;
>  	i40e_vf_reset_dev(dev);
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -2609,7 +2609,7 @@ igbvf_dev_reset(struct rte_eth_dev *dev,
> 
>  	/* Nothing needs to be done if the device is not started. */
>  	if (!dev->data->dev_started)
> -		 return 0;
> +		 return -EAGAIN;
> 
>  	PMD_DRV_LOG(DEBUG, "Link up/down event detected.");
> 


* Re: [PATCH v6 0/4] support reset of VF link
  2016-07-11 15:43                             ` Luca Boccassi
  2016-07-12  1:19                               ` Lu, Wenzhuo
@ 2016-08-26 12:58                               ` Luca Boccassi
  2016-08-29  1:04                                 ` Lu, Wenzhuo
  1 sibling, 1 reply; 72+ messages in thread
From: Luca Boccassi @ 2016-08-26 12:58 UTC (permalink / raw)
  To: wenzhuo.lu; +Cc: dev

On Mon, 2016-07-11 at 15:43 +0000, Luca Boccassi wrote:
> On Mon, 2016-07-11 at 13:02 +0100, Luca Boccassi wrote:
> > On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > > > 
> > > > Unfortunately I found one issue: if PF is down, and then the VF on the guest is
> > > > down as well (ip link down) and then goes back up before the PF, then calling
> > > > rte_eth_dev_reset will return 0 (success), even though the PF is still down and it
> > > > should fail. This is with ixgbe. Any idea what could be the problem?
> > > I've found this interesting thing. I believe it’s the HW difference between igb and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb VF cannot. The expression is the  registers of the ixgbe VF can be accessed when the PF link is down but igb VF cannot.
> > > It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link is up, we receive the message again and reset the VF link again. 
> > 
> > What message do you refer to here? I am seeing the RESET callback only
> > when the PF goes down, not when it goes up.
> > 
> > At the moment, with ixgbe, this happens:
> > 
> > PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
> > down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up ->
> > VF link has no-carrier, and traffic does NOT go through
> > 
> > The problem is that there is just no way of being notified that PF is
> > up, and if rte_eth_dev_reset succeeds I have no way of knowing that I
> > need to run it again.
> 
> I was now able to solve this use case, by having the rte_eth_dev_reset
> implementations return -EAGAIN if the dev is not up. This way I know, in
> the application, that I have to try again. What do you think?
> 
> IMHO it makes sense, as the reset does not actually succeeds, and the
> caller should try again. The diff is very trivial, and attached for
> reference.

Hi,

Is there any update on resubmitting this patchset for 16.11? Thanks!

-- 
Kind regards,
Luca Boccassi


* Re: [PATCH v6 0/4] support reset of VF link
  2016-08-26 12:58                               ` Luca Boccassi
@ 2016-08-29  1:04                                 ` Lu, Wenzhuo
  0 siblings, 0 replies; 72+ messages in thread
From: Lu, Wenzhuo @ 2016-08-29  1:04 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

Hi Luca,


> -----Original Message-----
> From: Luca Boccassi [mailto:lboccass@Brocade.com]
> Sent: Friday, August 26, 2016 8:58 PM
> To: Lu, Wenzhuo
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> 
> On Mon, 2016-07-11 at 15:43 +0000, Luca Boccassi wrote:
> > On Mon, 2016-07-11 at 13:02 +0100, Luca Boccassi wrote:
> > > On Mon, 2016-07-11 at 01:32 +0000, Lu, Wenzhuo wrote:
> > > > >
> > > > > Unfortunately I found one issue: if PF is down, and then the VF
> > > > > on the guest is down as well (ip link down) and then goes back
> > > > > up before the PF, then calling rte_eth_dev_reset will return 0
> > > > > (success), even though the PF is still down and it should fail. This is with
> ixgbe. Any idea what could be the problem?
> > > > I've found this interesting thing. I believe it’s the HW difference between
> igb and ixgbe. When the link is down, ixgbe VF can be reset successfully but igb
> VF cannot. The expression is the  registers of the ixgbe VF can be accessed when
> the PF link is down but igb VF cannot.
> > > > It means, on ixgbe, when PF link is down, we reset the VF link. Then PF link
> is up, we receive the message again and reset the VF link again.
> > >
> > > What message do you refer to here? I am seeing the RESET callback
> > > only when the PF goes down, not when it goes up.
> > >
> > > At the moment, with ixgbe, this happens:
> > >
> > > PF down -> reset notification, rte_eth_dev_reset keeps failing -> VF
> > > down -> VF up -> rte_eth_dev_reset in a loop/timer succeeds -> PF up
> > > -> VF link has no-carrier, and traffic does NOT go through
> > >
> > > The problem is that there is just no way of being notified that PF
> > > is up, and if rte_eth_dev_reset succeeds I have no way of knowing
> > > that I need to run it again.
> >
> > I was now able to solve this use case, by having the rte_eth_dev_reset
> > implementations return -EAGAIN if the dev is not up. This way I know,
> > in the application, that I have to try again. What do you think?
> >
> > IMHO it makes sense, as the reset does not actually succeeds, and the
> > caller should try again. The diff is very trivial, and attached for
> > reference.
> 
> Hi,
> 
> Is there any update on resubmitting this patchset for 16.11? Thanks!
Sorry, we're short-handed, so this feature is now planned for release 17.02.

> 
> --
> Kind regards,
> Luca Boccassi


end of thread, 2016-08-29  1:04 UTC

Thread overview: 72+ messages
2016-06-06  5:40 [PATCH 0/8] support reset of VF link Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 1/8] lib/librte_ether: support device reset Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 2/8] lib/librte_ether: defind RX/TX lock mode Wenzhuo Lu
2016-06-08  2:15   ` Stephen Hemminger
2016-06-08  7:34     ` Lu, Wenzhuo
2016-06-09  7:50       ` Olivier Matz
2016-06-12  5:25         ` Lu, Wenzhuo
2016-06-10 18:12       ` Stephen Hemminger
2016-06-12  5:27         ` Lu, Wenzhuo
2016-06-06  5:40 ` [PATCH 3/8] ixgbe: RX/TX with lock on VF Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 4/8] ixgbe: implement device reset " Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 5/8] igb: RX/TX with lock " Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 6/8] igb: implement device reset " Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 7/8] i40e:RX/TX with lock " Wenzhuo Lu
2016-06-06  5:40 ` [PATCH 8/8] i40e: implement device reset " Wenzhuo Lu
2016-06-15  3:03 ` [PATCH v5 0/4] support reset of VF link Wenzhuo Lu
2016-06-15  3:03   ` [PATCH v5 1/4] lib/librte_ether: support device reset Wenzhuo Lu
2016-06-16 15:31     ` Bruce Richardson
2016-06-16 15:36     ` Thomas Monjalon
2016-06-15  3:03   ` [PATCH v5 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
2016-06-15  3:03   ` [PATCH v5 3/4] igb: " Wenzhuo Lu
2016-06-15  3:03   ` [PATCH v5 4/4] i40e: " Wenzhuo Lu
2016-06-20  6:24 ` [PATCH v6 0/4] support reset of VF link Wenzhuo Lu
2016-06-20  6:24   ` [PATCH v6 1/4] lib/librte_ether: support device reset Wenzhuo Lu
2016-06-20  9:14     ` Jerin Jacob
2016-06-20 16:17       ` Stephen Hemminger
2016-06-21  3:51         ` Jerin Jacob
2016-06-21  6:14           ` Lu, Wenzhuo
2016-06-21  7:37             ` Jerin Jacob
2016-06-21  8:24               ` Lu, Wenzhuo
2016-06-21  8:55                 ` Jerin Jacob
2016-06-21  9:26                   ` Ananyev, Konstantin
2016-06-21 10:57                     ` Jerin Jacob
2016-06-21 13:10                       ` Ananyev, Konstantin
2016-06-21 13:30                         ` Jerin Jacob
2016-06-21 14:03                           ` Ananyev, Konstantin
2016-06-21 14:29                             ` Jerin Jacob
2016-06-22  1:35                               ` Lu, Wenzhuo
2016-06-22  2:37                                 ` Jerin Jacob
2016-06-22  3:32                                   ` Lu, Wenzhuo
2016-06-22  4:14                                     ` Jerin Jacob
2016-06-22  5:05                                       ` Lu, Wenzhuo
2016-06-22  6:10                                         ` Jerin Jacob
2016-06-22  6:42                                           ` Lu, Wenzhuo
2016-06-22  7:59                                             ` Jerin Jacob
2016-06-22  8:17                                               ` Thomas Monjalon
2016-06-22  8:25                                                 ` Lu, Wenzhuo
2016-06-22  9:18                                                   ` Thomas Monjalon
2016-06-22 11:06                                                     ` Jerin Jacob
2016-06-23  0:45                                                       ` Lu, Wenzhuo
2016-06-23  0:39                                                     ` Lu, Wenzhuo
2016-06-21  0:51       ` Lu, Wenzhuo
2016-06-20  6:24   ` [PATCH v6 2/4] ixgbe: implement device reset on VF Wenzhuo Lu
2016-06-20  6:24   ` [PATCH v6 3/4] igb: " Wenzhuo Lu
2016-06-20  6:24   ` [PATCH v6 4/4] i40e: " Wenzhuo Lu
2016-07-04 15:48   ` [PATCH v6 0/4] support reset of VF link Luca Boccassi
2016-07-05  0:52     ` Lu, Wenzhuo
2016-07-05  9:52       ` Luca Boccassi
2016-07-06  0:45         ` Lu, Wenzhuo
2016-07-06 16:26           ` Luca Boccassi
     [not found]           ` <1467822182.32466.34.camel@brocade.com>
2016-07-07  1:09             ` Lu, Wenzhuo
2016-07-07 10:20               ` Luca Boccassi
2016-07-07 13:12                 ` Lu, Wenzhuo
2016-07-07 16:19                   ` Luca Boccassi
2016-07-08  0:14                     ` Lu, Wenzhuo
2016-07-08 17:15                       ` Luca Boccassi
2016-07-11  1:32                         ` Lu, Wenzhuo
2016-07-11 12:02                           ` Luca Boccassi
2016-07-11 15:43                             ` Luca Boccassi
2016-07-12  1:19                               ` Lu, Wenzhuo
2016-08-26 12:58                               ` Luca Boccassi
2016-08-29  1:04                                 ` Lu, Wenzhuo
