From: Saeed Mahameed <saeedm@mellanox.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Huy Nguyen <huyn@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>
Subject: [net 2/6] net/mlx5: Cancel health poll before sending panic teardown command
Date: Tue,  7 Nov 2017 23:21:38 -0800
Message-ID: <20171108072142.30870-3-saeedm@mellanox.com>
In-Reply-To: <20171108072142.30870-1-saeedm@mellanox.com>

From: Huy Nguyen <huyn@mellanox.com>

After the panic teardown firmware command is sent, health_care() detects an
error on the PCI bus and calls mlx5_pci_err_detected(). This health_care
flow is no longer needed, because the panic teardown firmware command
brings down PCI bus communication with the HCA. Running it anyway ends in
the crash shown in the trace below.

The solution is to cancel the health poll timer and drain its pending
workqueue work before sending the panic teardown firmware command. If the
command fails, health polling is restarted.
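The ordering this patch adds to mlx5_try_fast_unload() can be illustrated with a small userspace model. This is a hypothetical simulation, not driver code: struct model_dev and every model_* function are invented for illustration, and the drop_new_work flag encodes an assumption that draining the health workqueue also stops new health work from being queued.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the ordering in this patch. None of
 * these names are mlx5 APIs; they only mimic the roles of the health
 * poll timer, the health workqueue and the teardown command.
 */
struct model_dev {
	bool poll_armed;     /* health poll timer is running */
	bool drop_new_work;  /* assumed: draining blocks new health work */
	int queued_work;     /* health work queued but not yet run */
	int err_handled;     /* times the PCI error path would be entered */
	bool link_up;        /* PCI communication with the HCA */
	bool fw_ok;          /* simulated result of the teardown command */
};

/* Poll tick: a dead link looks like a bad device, so work is queued. */
static void model_poll_tick(struct model_dev *d)
{
	if (d->poll_armed && !d->link_up && !d->drop_new_work)
		d->queued_work++;
}

/* Running queued work is what leads to the PCI error-detected path. */
static void model_run_work(struct model_dev *d)
{
	d->err_handled += d->queued_work;
	d->queued_work = 0;
}

static void model_drain_health_wq(struct model_dev *d)
{
	d->drop_new_work = true; /* refuse new work from now on */
	d->queued_work = 0;      /* discard work already queued */
}

static void model_stop_health_poll(struct model_dev *d)
{
	d->poll_armed = false;
}

static void model_start_health_poll(struct model_dev *d)
{
	d->drop_new_work = false; /* model choice: re-enable health work */
	d->poll_armed = true;
}

static int model_force_teardown(struct model_dev *d)
{
	if (!d->fw_ok)
		return -1;
	d->link_up = false; /* teardown ends PCI communication */
	return 0;
}

/* Same order as the patch: quiesce health checking, then tear down. */
static int model_fast_unload(struct model_dev *d)
{
	int ret;

	model_drain_health_wq(d);
	model_stop_health_poll(d);

	ret = model_force_teardown(d);
	if (ret) {
		/* Fast unload failed; resume health monitoring. */
		model_start_health_poll(d);
		return ret;
	}
	return 0;
}
```

In this model, draining before stopping the timer means a poll tick that fires between the two calls cannot queue new work, and a late tick after a successful teardown never reaches the error path. On a failed teardown the link is still up, so polling is simply re-armed.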

Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0d2c8dcd6eae..06562c9a6b9c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1482,9 +1482,16 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *dev)
 		return -EAGAIN;
 	}
 
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health poll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev);
+
 	ret = mlx5_cmd_force_teardown_hca(dev);
 	if (ret) {
 		mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+		mlx5_start_health_poll(dev);
 		return ret;
 	}
 
-- 
2.14.2

