From: Jacob Keller <jacob.e.keller@intel.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH v2 07/18] fm10k: wait for queues to drain if stop_hw() fails once
Date: Tue,  7 Jun 2016 16:08:51 -0700
Message-ID: <20160607230902.5457-8-jacob.e.keller@intel.com>
In-Reply-To: <20160607230902.5457-1-jacob.e.keller@intel.com>

It turns out that sometimes during a reset the Tx queues will be
temporarily stuck longer than .stop_hw() expects. Work around this issue
by attempting .stop_hw() first. If it fails with
FM10K_ERR_REQUESTS_PENDING, poll for up to 25 attempts, sleeping 10-20ms
before each check, until the Tx queues appear to be drained. After this,
attempt .stop_hw() again. This ensures that we avoid waiting when we
don't need to, such as during the first initialization of a VF, while
still allowing enough time to recover from most situations. It is
possible that the hardware is actually stuck. For PFs, this is usually
fixed by a datapath reset. Unfortunately, the VF cannot request a
similar reset for itself.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
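The stop/drain/stop flow described above can be sketched outside the
driver as follows. This is only an illustration under stated assumptions,
not the driver code: every sketch_* name, the fixed-size pending[] array,
and the simulated wait are hypothetical stand-ins for the fm10k
structures, fm10k_get_tx_pending() and usleep_range().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real fm10k definitions. */
#define SKETCH_ERR_REQUESTS_PENDING	(-1)
#define SKETCH_DRAIN_RETRIES		25
#define SKETCH_MAX_QUEUES		4

struct sketch_dev {
	size_t num_tx_queues;
	unsigned int pending[SKETCH_MAX_QUEUES]; /* simulated in-flight Tx work */
	unsigned int polls;			 /* number of drain-loop waits */
};

/* Simulated .stop_hw(): fails while any queue still has pending work. */
static int sketch_stop_hw(const struct sketch_dev *dev)
{
	for (size_t i = 0; i < dev->num_tx_queues; i++)
		if (dev->pending[i])
			return SKETCH_ERR_REQUESTS_PENDING;
	return 0;
}

/* Simulated sleep: every busy queue completes one unit of work. */
static void sketch_wait(struct sketch_dev *dev)
{
	for (size_t i = 0; i < dev->num_tx_queues; i++)
		if (dev->pending[i])
			dev->pending[i]--;
	dev->polls++;
}

/* Try to stop; only if requests are pending, wait for a drain and retry. */
static int sketch_down(struct sketch_dev *dev)
{
	size_t i = 0;
	int count;

	assert(dev->num_tx_queues <= SKETCH_MAX_QUEUES);

	if (sketch_stop_hw(dev) == SKETCH_ERR_REQUESTS_PENDING) {
		for (count = 0; count < SKETCH_DRAIN_RETRIES; count++) {
			sketch_wait(dev);

			/* resume at the last queue seen with pending work */
			for (; i < dev->num_tx_queues; i++)
				if (dev->pending[i])
					break;

			/* every queue is drained, stop waiting */
			if (i == dev->num_tx_queues)
				break;
		}
	}

	/* second (or only meaningful) attempt to stop the hardware */
	return sketch_stop_hw(dev);
}
```

In this sketch an idle device never enters the wait loop, which is the
property the commit message relies on for the first initialization of a
VF. Because the queue index is not reset between retries, queues already
seen as drained are not checked again.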
 drivers/net/ethernet/intel/fm10k/fm10k.h      |  1 +
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |  2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c  | 44 +++++++++++++++++++++++----
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index c8d0817766bf..c4cf08dcf5af 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -458,6 +458,7 @@ __be16 fm10k_tx_encap_offload(struct sk_buff *skb);
 netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
 				  struct fm10k_ring *tx_ring);
 void fm10k_tx_timeout_reset(struct fm10k_intfc *interface);
+u64 fm10k_get_tx_pending(struct fm10k_ring *ring);
 bool fm10k_check_tx_hang(struct fm10k_ring *tx_ring);
 void fm10k_alloc_rx_buffers(struct fm10k_ring *rx_ring, u16 cleaned_count);
 
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index c6a464551577..c85fc98945fa 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1128,7 +1128,7 @@ static u64 fm10k_get_tx_completed(struct fm10k_ring *ring)
 	return ring->stats.packets;
 }
 
-static u64 fm10k_get_tx_pending(struct fm10k_ring *ring)
+u64 fm10k_get_tx_pending(struct fm10k_ring *ring)
 {
 	struct fm10k_intfc *interface = ring->q_vector->interface;
 	struct fm10k_hw *hw = &interface->hw;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 4dfd1284a8de..7c9b20c6b6c1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1613,7 +1613,7 @@ void fm10k_down(struct fm10k_intfc *interface)
 {
 	struct net_device *netdev = interface->netdev;
 	struct fm10k_hw *hw = &interface->hw;
-	int err;
+	int err, i = 0, count = 0;
 
 	/* signal that we are down to the interrupt handler and service task */
 	if (test_and_set_bit(__FM10K_DOWN, &interface->state))
@@ -1629,9 +1629,6 @@ void fm10k_down(struct fm10k_intfc *interface)
 	/* reset Rx filters */
 	fm10k_reset_rx_state(interface);
 
-	/* allow 10ms for device to quiesce */
-	usleep_range(10000, 20000);
-
 	/* disable polling routines */
 	fm10k_napi_disable_all(interface);
 
@@ -1642,11 +1639,46 @@ void fm10k_down(struct fm10k_intfc *interface)
 	while (test_and_set_bit(__FM10K_UPDATING_STATS, &interface->state))
 		usleep_range(1000, 2000);
 
+	/* skip waiting for TX DMA if we lost PCIe link */
+	if (FM10K_REMOVED(hw->hw_addr))
+		goto skip_tx_dma_drain;
+
+	/* In some rare circumstances it can take a while for Tx queues to
+	 * quiesce and be fully disabled. Attempt to .stop_hw() first, and
+	 * then if we get ERR_REQUESTS_PENDING, go ahead and wait in a loop
+	 * until the Tx queues have emptied, or until a number of retries. If
+	 * we fail to clear within the retry loop, we will issue a warning
+	 * indicating that Tx DMA is probably hung. Note this means we call
+	 * .stop_hw() twice but this shouldn't cause any problems.
+	 */
+	err = hw->mac.ops.stop_hw(hw);
+	if (err != FM10K_ERR_REQUESTS_PENDING)
+		goto skip_tx_dma_drain;
+
+#define TX_DMA_DRAIN_RETRIES 25
+	for (count = 0; count < TX_DMA_DRAIN_RETRIES; count++) {
+		usleep_range(10000, 20000);
+
+		/* start checking at the last ring to have pending Tx */
+		for (; i < interface->num_tx_queues; i++)
+			if (fm10k_get_tx_pending(interface->tx_ring[i]))
+				break;
+
+		/* if all the queues are drained, we can break now */
+		if (i == interface->num_tx_queues)
+			break;
+	}
+
+	if (count >= TX_DMA_DRAIN_RETRIES)
+		dev_err(&interface->pdev->dev,
+			"Tx queues failed to drain after %d tries. Tx DMA is probably hung.\n",
+			count);
+skip_tx_dma_drain:
 	/* Disable DMA engine for Tx/Rx */
 	err = hw->mac.ops.stop_hw(hw);
 	if (err == FM10K_ERR_REQUESTS_PENDING)
-		dev_info(&interface->pdev->dev,
-			 "due to pending requests hw was not shut down gracefully\n");
+		dev_err(&interface->pdev->dev,
+			"due to pending requests hw was not shut down gracefully\n");
 	else if (err)
 		dev_err(&interface->pdev->dev, "stop_hw failed: %d\n", err);
 
-- 
2.9.0.rc1.405.g81f467e


Thread overview: 37+ messages
2016-06-07 23:08 [Intel-wired-lan] [PATCH v2 00/18] fm10k fixes for suspend/resume and related Jacob Keller
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 01/18] fm10k: prevent multiple threads updating statistics Jacob Keller
2016-07-11 19:24   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 02/18] fm10k: Reset mailbox global interrupts Jacob Keller
2016-06-23 23:19   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 03/18] fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING Jacob Keller
2016-06-23 23:19   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 04/18] fm10k: perform data path reset even when switch is not ready Jacob Keller
2016-06-23 23:20   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 05/18] fm10k: use actual hardware registers when checking for pending Tx Jacob Keller
2016-06-23 23:22   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 06/18] fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING Jacob Keller
2016-06-23 23:23   ` Singh, Krishneil K
2016-06-07 23:08 ` Jacob Keller [this message]
2016-06-23 23:24   ` [Intel-wired-lan] [PATCH v2 07/18] fm10k: wait for queues to drain if stop_hw() fails once Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 08/18] fm10k: split fm10k_reinit into two functions Jacob Keller
2016-06-23 23:25   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 09/18] fm10k: implement prepare_suspend and handle_resume Jacob Keller
2016-06-23 23:25   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 10/18] fm10k: use common reset flow when handling io errors from PCI stack Jacob Keller
2016-06-23 23:26   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 11/18] fm10k: implement reset_notify handler for PCIe FLR events Jacob Keller
2016-06-23 23:27   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 12/18] fm10k: use common flow for suspend and resume Jacob Keller
2016-06-23 23:28   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 13/18] fm10k: enable bus master after every reset Jacob Keller
2016-06-23 23:29   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 14/18] fm10k: check if PCIe link is restored Jacob Keller
2016-06-23 23:29   ` Singh, Krishneil K
2016-06-07 23:08 ` [Intel-wired-lan] [PATCH v2 15/18] fm10k: implement request_lport_map pointer Jacob Keller
2016-06-23 23:30   ` Singh, Krishneil K
2016-06-07 23:09 ` [Intel-wired-lan] [PATCH v2 16/18] fm10k: force link to remain down for at least a second on resume events Jacob Keller
2016-06-23 23:31   ` Singh, Krishneil K
2016-06-07 23:09 ` [Intel-wired-lan] [PATCH v2 17/18] fm10k: return proper error code when pci_enable_msix_range fails Jacob Keller
2016-06-23 23:31   ` Singh, Krishneil K
2016-06-07 23:09 ` [Intel-wired-lan] [PATCH v2 18/18] fm10k: bump version number Jacob Keller
2016-06-14 16:47   ` Singh, Krishneil K
