From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	netdev@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>, Aya Levin <ayal@nvidia.com>
Subject: [net 08/14] net/mlx5: Fix sync reset event handler error flow
Date: Mon, 21 Nov 2022 18:25:53 -0800
Message-ID: <20221122022559.89459-9-saeed@kernel.org>
In-Reply-To: <20221122022559.89459-1-saeed@kernel.org>

From: Moshe Shemesh <moshe@nvidia.com>

When handling of the sync reset now event fails in
mlx5_pci_link_toggle(), no reset is performed. However, since
mlx5_cmd_fast_teardown_hca() has already run, the firmware function is
closed and the driver is left without firmware functionality.

Fix it by setting the device error state and reopening the firmware
resources. The reopening is done by the thread that handles devlink
reload fw_activate, as it already holds the devlink lock.

Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 9d908a0ccfef..1e46f9afa40e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -9,7 +9,8 @@ enum {
 	MLX5_FW_RESET_FLAGS_RESET_REQUESTED,
 	MLX5_FW_RESET_FLAGS_NACK_RESET_REQUEST,
 	MLX5_FW_RESET_FLAGS_PENDING_COMP,
-	MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS
+	MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS,
+	MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED
 };
 
 struct mlx5_fw_reset {
@@ -406,7 +407,7 @@ static void mlx5_sync_reset_now_event(struct work_struct *work)
 	err = mlx5_pci_link_toggle(dev);
 	if (err) {
 		mlx5_core_warn(dev, "mlx5_pci_link_toggle failed, no reset done, err %d\n", err);
-		goto done;
+		set_bit(MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED, &fw_reset->reset_flags);
 	}
 
 	mlx5_enter_error_state(dev, true);
@@ -482,6 +483,10 @@ int mlx5_fw_reset_wait_reset_done(struct mlx5_core_dev *dev)
 		goto out;
 	}
 	err = fw_reset->ret;
+	if (test_and_clear_bit(MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED, &fw_reset->reset_flags)) {
+		mlx5_unload_one_devl_locked(dev);
+		mlx5_load_one_devl_locked(dev, false);
+	}
 out:
 	clear_bit(MLX5_FW_RESET_FLAGS_PENDING_COMP, &fw_reset->reset_flags);
 	return err;
-- 
2.38.1
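
The hand-off described in the commit message can be sketched in plain
userspace C. This is only an illustration of the pattern (the event
handler records that a reload is needed, and the waiting thread does the
reload), not the driver code. struct fake_dev, FLAG_RELOAD_REQUIRED,
reset_now_event() and wait_reset_done() are hypothetical stand-ins, and
the plain bit operations replace the atomic set_bit() and
test_and_clear_bit() used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MLX5_FW_RESET_FLAGS_RELOAD_REQUIRED bit. */
enum { FLAG_RELOAD_REQUIRED = 1u << 0 };

/* Hypothetical stand-in for the device and its fw_reset state. */
struct fake_dev {
	unsigned int reset_flags; /* models fw_reset->reset_flags */
	int ret;                  /* models fw_reset->ret */
	int reloads;              /* counts unload + load cycles */
};

/*
 * Models the reset-now work item. link_toggle_err is the result that
 * the link toggle step would have returned. On failure the handler no
 * longer bails out early; it only marks that a reload is required.
 */
static void reset_now_event(struct fake_dev *dev, int link_toggle_err)
{
	assert(dev != NULL);
	if (link_toggle_err)
		dev->reset_flags |= FLAG_RELOAD_REQUIRED;
	dev->ret = link_toggle_err;
}

/*
 * Models the waiter, which runs in the thread that already holds the
 * lock. It consumes the flag once and performs the reload itself.
 */
static int wait_reset_done(struct fake_dev *dev)
{
	int err;

	assert(dev != NULL);
	err = dev->ret;
	if (dev->reset_flags & FLAG_RELOAD_REQUIRED) {
		dev->reset_flags &= ~FLAG_RELOAD_REQUIRED; /* test and clear */
		dev->reloads++;                            /* unload, then load */
	}
	return err;
}
```

Deferring the reload to the waiter keeps the lock-holding work in a
single thread; the handler only communicates through the flag. The
error code is still returned to the caller after the reload.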

