netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 00/14] bnxt_en: health and error recovery.
@ 2019-08-26  3:54 Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

This patchset implements adapter health and error recovery.  The status
is reported through several devlink reporters and the driver will
initiate and complete the coordinated recovery process.

Michael Chan (11):
  bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent
    mode.
  bnxt_en: Prepare bnxt_init_one() to be called multiple times.
  bnxt_en: Refactor bnxt_sriov_enable().
  bnxt_en: Handle firmware reset status during IF_UP.
  bnxt_en: Discover firmware error recovery capabilities.
  bnxt_en: Pre-map the firmware health monitoring registers.
  bnxt_en: Enable health monitoring.
  bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs.
  bnxt_en: Handle RESET_NOTIFY async event from firmware.
  bnxt_en: Do not send firmware messages if firmware is in error state.
  bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().

Vasundhara Volam (3):
  bnxt_en: Add new FW devlink_health_reporter
  bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.
  bnxt_en: Add FW fatal devlink_health_reporter

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 795 ++++++++++++++++++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  88 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 189 +++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   |  96 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h   |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c     |   3 +
 9 files changed, 1092 insertions(+), 88 deletions(-)

-- 
2.5.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  5:15   ` David Miller
  2019-08-26  3:54 ` [PATCH net-next 02/14] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

If the silent parameter is set, suppress all messages when there is
no response from firmware.  When polling for firmware to come out of
reset, no response may be normal and we want to suppress the error
messages.  Also, don't poll for the firmware DMA response if Bus Master
is disabled.  This is in preparation for error recovery when firmware
may be in error or reset state or Bus Master is disabled.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b9ad43d..b1d4c2d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4129,6 +4129,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 	/* Ring channel doorbell */
 	writel(1, bp->bar0 + doorbell_offset);
 
+	if (!pci_is_enabled(bp->pdev))
+		return 0;
+
 	if (!timeout)
 		timeout = DFLT_HWRM_CMD_TIMEOUT;
 	/* convert timeout to usec */
@@ -4160,6 +4163,8 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (bp->hwrm_intr_seq_id != (u16)~seq_id) {
+			if (silent)
+				return -EBUSY;
 			netdev_err(bp->dev, "Resp cmpl intr err msg: 0x%x\n",
 				   le16_to_cpu(req->req_type));
 			return -1;
@@ -4186,6 +4191,8 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (i >= tmo_count) {
+			if (silent)
+				return -EBUSY;
 			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d\n",
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
@@ -4204,6 +4211,8 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (j >= HWRM_VALID_BIT_DELAY_USEC) {
+			if (silent)
+				return -EBUSY;
 			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d v:%d\n",
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 02/14] bnxt_en: Prepare bnxt_init_one() to be called multiple times.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

In preparation for the new firmware reset feature, some of the logic
in bnxt_init_one() will be called again after firmware has reset.
Reset some of the flags and capabilities so that everything that can
change can be re-initialized.  Refactor some functions to probe
firmware versions and capabilities.  Check some buffers before
allocating as they may have been allocated previously.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 123 +++++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   5 +-
 3 files changed, 90 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b1d4c2d..3e7a559 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3555,6 +3555,9 @@ static int bnxt_alloc_kong_hwrm_resources(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwrm_cmd_kong_resp_addr)
+		return 0;
+
 	bp->hwrm_cmd_kong_resp_addr =
 		dma_alloc_coherent(&pdev->dev, PAGE_SIZE,
 				   &bp->hwrm_cmd_kong_resp_dma_addr,
@@ -3594,6 +3597,9 @@ static int bnxt_alloc_hwrm_short_cmd_req(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwrm_short_cmd_req_addr)
+		return 0;
+
 	bp->hwrm_short_cmd_req_addr =
 		dma_alloc_coherent(&pdev->dev, bp->hwrm_max_ext_req_len,
 				   &bp->hwrm_short_cmd_req_dma_addr,
@@ -4995,6 +5001,7 @@ static int bnxt_hwrm_vnic_qcaps(struct bnxt *bp)
 	int rc;
 
 	bp->hw_ring_stats_size = sizeof(struct ctx_hw_stats);
+	bp->flags &= ~(BNXT_FLAG_NEW_RSS_CAP | BNXT_FLAG_ROCE_MIRROR_CAP);
 	if (bp->hwrm_spec_code < 0x10600)
 		return 0;
 
@@ -6857,6 +6864,7 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
 		pf->max_tx_wm_flows = le32_to_cpu(resp->max_tx_wm_flows);
 		pf->max_rx_em_flows = le32_to_cpu(resp->max_rx_em_flows);
 		pf->max_rx_wm_flows = le32_to_cpu(resp->max_rx_wm_flows);
+		bp->flags &= ~BNXT_FLAG_WOL_CAP;
 		if (flags & FUNC_QCAPS_RESP_FLAGS_WOL_MAGICPKT_SUPPORTED)
 			bp->flags |= BNXT_FLAG_WOL_CAP;
 	} else {
@@ -6985,20 +6993,30 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 	return rc;
 }
 
-static int bnxt_hwrm_ver_get(struct bnxt *bp)
+static int __bnxt_hwrm_ver_get(struct bnxt *bp, bool silent)
 {
-	int rc;
 	struct hwrm_ver_get_input req = {0};
-	struct hwrm_ver_get_output *resp = bp->hwrm_cmd_resp_addr;
-	u32 dev_caps_cfg;
+	int rc;
 
-	bp->hwrm_max_req_len = HWRM_MAX_REQ_LEN;
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_VER_GET, -1, -1);
 	req.hwrm_intf_maj = HWRM_VERSION_MAJOR;
 	req.hwrm_intf_min = HWRM_VERSION_MINOR;
 	req.hwrm_intf_upd = HWRM_VERSION_UPDATE;
+
+	rc = bnxt_hwrm_do_send_msg(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT,
+				   silent);
+	return rc;
+}
+
+static int bnxt_hwrm_ver_get(struct bnxt *bp)
+{
+	struct hwrm_ver_get_output *resp = bp->hwrm_cmd_resp_addr;
+	u32 dev_caps_cfg;
+	int rc;
+
+	bp->hwrm_max_req_len = HWRM_MAX_REQ_LEN;
 	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	rc = __bnxt_hwrm_ver_get(bp, false);
 	if (rc)
 		goto hwrm_ver_get_exit;
 
@@ -8171,6 +8189,9 @@ static int bnxt_hwrm_phy_qcaps(struct bnxt *bp)
 	struct hwrm_port_phy_qcaps_output *resp = bp->hwrm_cmd_resp_addr;
 	struct bnxt_link_info *link_info = &bp->link_info;
 
+	bp->flags &= ~BNXT_FLAG_EEE_CAP;
+	if (bp->test_info)
+		bp->test_info->flags &= ~BNXT_TEST_FL_EXT_LPBK;
 	if (bp->hwrm_spec_code < 0x10201)
 		return 0;
 
@@ -8536,6 +8557,7 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
 	struct bnxt_pf_info *pf = &bp->pf;
 	int rc;
 
+	bp->num_leds = 0;
 	if (BNXT_VF(bp) || bp->hwrm_spec_code < 0x10601)
 		return 0;
 
@@ -8630,6 +8652,7 @@ static void bnxt_get_wol_settings(struct bnxt *bp)
 {
 	u16 handle = 0;
 
+	bp->wol = 0;
 	if (!BNXT_PF(bp) || !(bp->flags & BNXT_FLAG_WOL_CAP))
 		return;
 
@@ -9992,6 +10015,53 @@ static int bnxt_fw_init_one_p2(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_set_dflt_rss_hash_type(struct bnxt *bp)
+{
+	bp->flags &= ~BNXT_FLAG_UDP_RSS_CAP;
+	bp->rss_hash_cfg = VNIC_RSS_CFG_REQ_HASH_TYPE_IPV4 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV6;
+	if (BNXT_CHIP_P4(bp) && bp->hwrm_spec_code >= 0x10501) {
+		bp->flags |= BNXT_FLAG_UDP_RSS_CAP;
+		bp->rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV4 |
+				    VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV6;
+	}
+}
+
+static void bnxt_set_dflt_rfs(struct bnxt *bp)
+{
+	struct net_device *dev = bp->dev;
+
+	dev->hw_features &= ~NETIF_F_NTUPLE;
+	dev->features &= ~NETIF_F_NTUPLE;
+	bp->flags &= ~BNXT_FLAG_RFS;
+	if (bnxt_rfs_supported(bp)) {
+		dev->hw_features |= NETIF_F_NTUPLE;
+		if (bnxt_rfs_capable(bp)) {
+			bp->flags |= BNXT_FLAG_RFS;
+			dev->features |= NETIF_F_NTUPLE;
+		}
+	}
+}
+
+static void bnxt_fw_init_one_p3(struct bnxt *bp)
+{
+	struct pci_dev *pdev = bp->pdev;
+
+	bnxt_set_dflt_rss_hash_type(bp);
+	bnxt_set_dflt_rfs(bp);
+
+	bnxt_get_wol_settings(bp);
+	if (bp->flags & BNXT_FLAG_WOL_CAP)
+		device_set_wakeup_enable(&pdev->dev, bp->wol);
+	else
+		device_set_wakeup_capable(&pdev->dev, false);
+
+	bnxt_hwrm_set_cache_line_size(bp, cache_line_size());
+	bnxt_hwrm_coal_params_qcaps(bp);
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
@@ -10597,7 +10667,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 	free_netdev(dev);
 }
 
-static int bnxt_probe_phy(struct bnxt *bp)
+static int bnxt_probe_phy(struct bnxt *bp, bool fw_dflt)
 {
 	int rc = 0;
 	struct bnxt_link_info *link_info = &bp->link_info;
@@ -10608,8 +10678,6 @@ static int bnxt_probe_phy(struct bnxt *bp)
 			   rc);
 		return rc;
 	}
-	mutex_init(&bp->link_lock);
-
 	rc = bnxt_update_link(bp, false);
 	if (rc) {
 		netdev_err(bp->dev, "Probe phy can't update link (rc: %x)\n",
@@ -10623,6 +10691,9 @@ static int bnxt_probe_phy(struct bnxt *bp)
 	if (link_info->auto_link_speeds && !link_info->support_auto_speeds)
 		link_info->support_auto_speeds = link_info->support_speeds;
 
+	if (!fw_dflt)
+		return 0;
+
 	/*initialize the ethool setting copy with NVM settings */
 	if (BNXT_AUTO_MODE(link_info->auto_mode)) {
 		link_info->autoneg = BNXT_AUTONEG_SPEED;
@@ -10643,7 +10714,7 @@ static int bnxt_probe_phy(struct bnxt *bp)
 			link_info->auto_pause_setting & BNXT_LINK_PAUSE_BOTH;
 	else
 		link_info->req_flow_ctrl = link_info->force_pause_setting;
-	return rc;
+	return 0;
 }
 
 static int bnxt_get_max_irq(struct pci_dev *pdev)
@@ -10947,6 +11018,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto init_err_pci_clean;
 
 	mutex_init(&bp->hwrm_cmd_lock);
+	mutex_init(&bp->link_lock);
 
 	rc = bnxt_fw_init_one_p1(bp);
 	if (rc)
@@ -11022,7 +11094,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->min_mtu = ETH_ZLEN;
 	dev->max_mtu = bp->max_mtu;
 
-	rc = bnxt_probe_phy(bp);
+	rc = bnxt_probe_phy(bp, true);
 	if (rc)
 		goto init_err_pci_clean;
 
@@ -11036,24 +11108,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto init_err_pci_clean;
 	}
 
-	/* Default RSS hash cfg. */
-	bp->rss_hash_cfg = VNIC_RSS_CFG_REQ_HASH_TYPE_IPV4 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV6;
-	if (BNXT_CHIP_P4(bp) && bp->hwrm_spec_code >= 0x10501) {
-		bp->flags |= BNXT_FLAG_UDP_RSS_CAP;
-		bp->rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV4 |
-				    VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV6;
-	}
-
-	if (bnxt_rfs_supported(bp)) {
-		dev->hw_features |= NETIF_F_NTUPLE;
-		if (bnxt_rfs_capable(bp)) {
-			bp->flags |= BNXT_FLAG_RFS;
-			dev->features |= NETIF_F_NTUPLE;
-		}
-	}
+	bnxt_fw_init_one_p3(bp);
 
 	if (dev->hw_features & NETIF_F_HW_VLAN_CTAG_RX)
 		bp->flags |= BNXT_FLAG_STRIP_VLAN;
@@ -11067,16 +11122,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 */
 	bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
 
-	bnxt_get_wol_settings(bp);
-	if (bp->flags & BNXT_FLAG_WOL_CAP)
-		device_set_wakeup_enable(&pdev->dev, bp->wol);
-	else
-		device_set_wakeup_capable(&pdev->dev, false);
-
-	bnxt_hwrm_set_cache_line_size(bp, cache_line_size());
-
-	bnxt_hwrm_coal_params_qcaps(bp);
-
 	if (BNXT_PF(bp)) {
 		if (!bnxt_pf_wq) {
 			bnxt_pf_wq =
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index 07301cb..e3ed792 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -391,6 +391,7 @@ static int bnxt_hwrm_queue_dscp_qcaps(struct bnxt *bp)
 	struct hwrm_queue_dscp_qcaps_input req = {0};
 	int rc;
 
+	bp->max_dscp_value = 0;
 	if (bp->hwrm_spec_code < 0x10800 || BNXT_VF(bp))
 		return 0;
 
@@ -722,6 +723,7 @@ static const struct dcbnl_rtnl_ops dcbnl_ops = {
 
 void bnxt_dcb_init(struct bnxt *bp)
 {
+	bp->dcbx_cap = 0;
 	if (bp->hwrm_spec_code < 0x10501)
 		return;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index b624174..ec87f0d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -3362,6 +3362,7 @@ void bnxt_ethtool_init(struct bnxt *bp)
 	if (!(bp->fw_cap & BNXT_FW_CAP_PKG_VER))
 		bnxt_get_pkgver(dev);
 
+	bp->num_tests = 0;
 	if (bp->hwrm_spec_code < 0x10704 || !BNXT_SINGLE_PF(bp))
 		return;
 
@@ -3371,7 +3372,9 @@ void bnxt_ethtool_init(struct bnxt *bp)
 	if (rc)
 		goto ethtool_init_exit;
 
-	test_info = kzalloc(sizeof(*bp->test_info), GFP_KERNEL);
+	test_info = bp->test_info;
+	if (!test_info)
+		test_info = kzalloc(sizeof(*bp->test_info), GFP_KERNEL);
 	if (!test_info)
 		goto ethtool_init_exit;
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 02/14] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  5:36   ` David Miller
  2019-08-26  6:00   ` Leon Romanovsky
  2019-08-26  3:54 ` [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

Refactor the hardware/firmware configuration portion in
bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
new function can be called after a firmware reset to reconfigure the
VFs previously enabled.

When VFs need to be reconfigured dynamically after firmwware reset, the
configuration sequence on the PF needs to be changed to register the VF
buffers first.  Otherwise, some VF firmware commands may not succeed as
there may not be PF buffers ready for the re-directed firmware commands.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 50 +++++++++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  1 +
 2 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 2b90a2b..93524d7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -687,6 +687,32 @@ static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
 		return bnxt_hwrm_func_cfg(bp, num_vfs);
 }
 
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+{
+	int rc;
+
+	/* Register buffers for VFs */
+	rc = bnxt_hwrm_func_buf_rgtr(bp);
+	if (rc)
+		return rc;
+
+	/* Reserve resources for VFs */
+	rc = bnxt_func_cfg(bp, *num_vfs);
+	if (rc != *num_vfs) {
+		if (rc <= 0) {
+			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
+			*num_vfs = 0;
+			return rc;
+		}
+		netdev_warn(bp->dev, "Only able to reserve resources for %d VFs.\n",
+			    rc);
+		*num_vfs = rc;
+	}
+
+	bnxt_ulp_sriov_cfg(bp, *num_vfs);
+	return 0;
+}
+
 static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 {
 	int rc = 0, vfs_supported;
@@ -752,25 +778,10 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	if (rc)
 		goto err_out1;
 
-	/* Reserve resources for VFs */
-	rc = bnxt_func_cfg(bp, *num_vfs);
-	if (rc != *num_vfs) {
-		if (rc <= 0) {
-			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
-			*num_vfs = 0;
-			goto err_out2;
-		}
-		netdev_warn(bp->dev, "Only able to reserve resources for %d VFs.\n", rc);
-		*num_vfs = rc;
-	}
-
-	/* Register buffers for VFs */
-	rc = bnxt_hwrm_func_buf_rgtr(bp);
+	rc = bnxt_cfg_hw_sriov(bp, num_vfs);
 	if (rc)
 		goto err_out2;
 
-	bnxt_ulp_sriov_cfg(bp, *num_vfs);
-
 	rc = pci_enable_sriov(bp->pdev, *num_vfs);
 	if (rc)
 		goto err_out2;
@@ -1190,6 +1201,13 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 }
 #else
 
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+{
+	if (*num_vfs)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
 void bnxt_sriov_disable(struct bnxt *bp)
 {
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 2eed9ed..0abf18e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -36,6 +36,7 @@ int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (2 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  5:47   ` David Miller
  2019-08-26  3:54 ` [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities Michael Chan
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset.  Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary.  If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 112 ++++++++++++++++++++++++------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   2 +
 2 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 3e7a559..94641c9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7005,7 +7005,9 @@ static int __bnxt_hwrm_ver_get(struct bnxt *bp, bool silent)
 
 	rc = bnxt_hwrm_do_send_msg(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT,
 				   silent);
-	return rc;
+	if (rc)
+		return -ENODEV;
+	return 0;
 }
 
 static int bnxt_hwrm_ver_get(struct bnxt *bp)
@@ -8513,11 +8515,14 @@ static int bnxt_hwrm_shutdown_link(struct bnxt *bp)
 	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+static int bnxt_fw_init_one(struct bnxt *bp);
+
 static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 {
 	struct hwrm_func_drv_if_change_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_if_change_input req = {0};
-	bool resc_reinit = false;
+	bool resc_reinit = false, fw_reset = false;
+	u32 flags = 0;
 	int rc;
 
 	if (!(bp->fw_cap & BNXT_FW_CAP_IF_CHANGE))
@@ -8528,26 +8533,53 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 		req.flags = cpu_to_le32(FUNC_DRV_IF_CHANGE_REQ_FLAGS_UP);
 	mutex_lock(&bp->hwrm_cmd_lock);
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (!rc && (resp->flags &
-		    cpu_to_le32(FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)))
-		resc_reinit = true;
+	if (!rc)
+		flags = le32_to_cpu(resp->flags);
 	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (rc)
+		return -EIO;
 
-	if (up && resc_reinit && BNXT_NEW_RM(bp)) {
-		struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+	if (!up)
+		return 0;
 
-		rc = bnxt_hwrm_func_resc_qcaps(bp, true);
-		hw_resc->resv_cp_rings = 0;
-		hw_resc->resv_stat_ctxs = 0;
-		hw_resc->resv_irqs = 0;
-		hw_resc->resv_tx_rings = 0;
-		hw_resc->resv_rx_rings = 0;
-		hw_resc->resv_hw_ring_grps = 0;
-		hw_resc->resv_vnics = 0;
-		bp->tx_nr_rings = 0;
-		bp->rx_nr_rings = 0;
+	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)
+		resc_reinit = true;
+	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE)
+		fw_reset = true;
+
+	if (resc_reinit || fw_reset) {
+		if (fw_reset) {
+			rc = bnxt_fw_init_one(bp);
+			if (rc) {
+				set_bit(BNXT_STATE_PROBE_ERR, &bp->state);
+				return rc;
+			}
+			bnxt_clear_int_mode(bp);
+			rc = bnxt_init_int_mode(bp);
+			if (rc) {
+				netdev_err(bp->dev, "init int mode failed\n");
+				return rc;
+			}
+			set_bit(BNXT_STATE_FW_RESET_DET, &bp->state);
+		}
+		if (BNXT_NEW_RM(bp)) {
+			struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+
+			rc = bnxt_hwrm_func_resc_qcaps(bp, true);
+			hw_resc->resv_cp_rings = 0;
+			hw_resc->resv_stat_ctxs = 0;
+			hw_resc->resv_irqs = 0;
+			hw_resc->resv_tx_rings = 0;
+			hw_resc->resv_rx_rings = 0;
+			hw_resc->resv_hw_ring_grps = 0;
+			hw_resc->resv_vnics = 0;
+			if (!fw_reset) {
+				bp->tx_nr_rings = 0;
+				bp->rx_nr_rings = 0;
+			}
+		}
 	}
-	return rc;
+	return 0;
 }
 
 static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
@@ -8699,6 +8731,9 @@ static void bnxt_hwmon_open(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwmon_dev)
+		return;
+
 	bp->hwmon_dev = hwmon_device_register_with_groups(&pdev->dev,
 							  DRV_MODULE_NAME, bp,
 							  bnxt_groups);
@@ -8964,12 +8999,26 @@ static int bnxt_open(struct net_device *dev)
 	struct bnxt *bp = netdev_priv(dev);
 	int rc;
 
-	bnxt_hwrm_if_change(bp, true);
-	rc = __bnxt_open_nic(bp, true, true);
+	if (test_bit(BNXT_STATE_PROBE_ERR, &bp->state))
+		return -ENODEV;
+
+	rc = bnxt_hwrm_if_change(bp, true);
 	if (rc)
+		return rc;
+	rc = __bnxt_open_nic(bp, true, true);
+	if (rc) {
 		bnxt_hwrm_if_change(bp, false);
+	} else {
+		if (test_and_clear_bit(BNXT_STATE_FW_RESET_DET, &bp->state) &&
+		    BNXT_PF(bp)) {
+			struct bnxt_pf_info *pf = &bp->pf;
+			int n = pf->active_vfs;
 
-	bnxt_hwmon_open(bp);
+			if (n)
+				bnxt_cfg_hw_sriov(bp, &n);
+		}
+		bnxt_hwmon_open(bp);
+	}
 
 	return rc;
 }
@@ -10062,6 +10111,27 @@ static void bnxt_fw_init_one_p3(struct bnxt *bp)
 	bnxt_hwrm_coal_params_qcaps(bp);
 }
 
+static int bnxt_fw_init_one(struct bnxt *bp)
+{
+	int rc;
+
+	rc = bnxt_fw_init_one_p1(bp);
+	if (rc) {
+		netdev_err(bp->dev, "Firmware init phase 1 failed\n");
+		return rc;
+	}
+	rc = bnxt_fw_init_one_p2(bp);
+	if (rc) {
+		netdev_err(bp->dev, "Firmware init phase 2 failed\n");
+		return rc;
+	}
+	rc = bnxt_approve_mac(bp, bp->dev->dev_addr, false);
+	if (rc)
+		return rc;
+	bnxt_fw_init_one_p3(bp);
+	return 0;
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1b1610d..b515d83e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1555,6 +1555,8 @@ struct bnxt {
 #define BNXT_STATE_OPEN		0
 #define BNXT_STATE_IN_SP_TASK	1
 #define BNXT_STATE_READ_STATS	2
+#define BNXT_STATE_FW_RESET_DET 3
+#define BNXT_STATE_PROBE_ERR	5
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (3 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  5:49   ` David Miller
  2019-08-26  3:54 ` [PATCH net-next 06/14] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 79 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 38 +++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 94641c9..ef43f9e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6833,6 +6833,8 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
 		bp->fw_cap |= BNXT_FW_CAP_PCIE_STATS_SUPPORTED;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_EXT_STATS_SUPPORTED)
 		bp->fw_cap |= BNXT_FW_CAP_EXT_STATS_SUPPORTED;
+	if (flags &  FUNC_QCAPS_RESP_FLAGS_ERROR_RECOVERY_CAPABLE)
+		bp->fw_cap |= BNXT_FW_CAP_ERROR_RECOVERY;
 
 	bp->tx_push_thresh = 0;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_PUSH_MODE_SUPPORTED)
@@ -6934,6 +6936,76 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
+{
+	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct hwrm_error_recovery_qcfg_input req = {0};
+	int rc, i;
+
+	if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+		return 0;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
+	mutex_lock(&bp->hwrm_cmd_lock);
+	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc) {
+		rc = -EOPNOTSUPP;
+		goto err_recovery_out;
+	}
+	if (!fw_health) {
+		fw_health = kzalloc(sizeof(*fw_health), GFP_KERNEL);
+		bp->fw_health = fw_health;
+		if (!fw_health) {
+			rc = -ENOMEM;
+			goto err_recovery_out;
+		}
+	}
+	fw_health->flags = le32_to_cpu(resp->flags);
+	if ((fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) &&
+	    !(bp->fw_cap & BNXT_FW_CAP_KONG_MB_CHNL)) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	fw_health->polling_dsecs = le32_to_cpu(resp->driver_polling_freq);
+	fw_health->master_func_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period);
+	fw_health->normal_func_wait_dsecs =
+		le32_to_cpu(resp->normal_func_wait_period);
+	fw_health->post_reset_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period_after_reset);
+	fw_health->post_reset_max_wait_dsecs =
+		le32_to_cpu(resp->max_bailout_time_after_reset);
+	fw_health->regs[BNXT_FW_HEALTH_REG] =
+		le32_to_cpu(resp->fw_health_status_reg);
+	fw_health->regs[BNXT_FW_HEARTBEAT_REG] =
+		le32_to_cpu(resp->fw_heartbeat_reg);
+	fw_health->regs[BNXT_FW_RESET_CNT_REG] =
+		le32_to_cpu(resp->fw_reset_cnt_reg);
+	fw_health->regs[BNXT_FW_RESET_INPROG_REG] =
+		le32_to_cpu(resp->reset_inprogress_reg);
+	fw_health->fw_reset_inprog_reg_mask =
+		le32_to_cpu(resp->reset_inprogress_reg_mask);
+	fw_health->fw_reset_seq_cnt = resp->reg_array_cnt;
+	if (fw_health->fw_reset_seq_cnt >= 16) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	for (i = 0; i < fw_health->fw_reset_seq_cnt; i++) {
+		fw_health->fw_reset_seq_regs[i] =
+			le32_to_cpu(resp->reset_reg[i]);
+		fw_health->fw_reset_seq_vals[i] =
+			le32_to_cpu(resp->reset_reg_val[i]);
+		fw_health->fw_reset_seq_delay_msec[i] =
+			resp->delay_after_reset[i];
+	}
+err_recovery_out:
+	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (rc)
+		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
+	return rc;
+}
+
 static int bnxt_hwrm_func_reset(struct bnxt *bp)
 {
 	struct hwrm_func_reset_input req = {0};
@@ -10048,6 +10120,11 @@ static int bnxt_fw_init_one_p2(struct bnxt *bp)
 		netdev_warn(bp->dev, "hwrm query adv flow mgnt failure rc: %d\n",
 			    rc);
 
+	rc = bnxt_hwrm_error_recovery_qcfg(bp);
+	if (rc)
+		netdev_warn(bp->dev, "hwrm query error recovery failure rc: %d\n",
+			    rc);
+
 	rc = bnxt_hwrm_func_drv_rgtr(bp);
 	if (rc)
 		return -ENODEV;
@@ -11228,6 +11305,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bnxt_free_ctx_mem(bp);
 	kfree(bp->ctx);
 	bp->ctx = NULL;
+	kfree(bp->fw_health);
+	bp->fw_health = NULL;
 	bnxt_cleanup_pci(bp);
 
 init_err_free:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index b515d83e..741ae68 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1333,6 +1333,41 @@ struct bnxt_ctx_mem_info {
 	struct bnxt_ctx_pg_info *tqm_mem[9];
 };
 
+struct bnxt_fw_health {
+	u32 flags;
+	u32 polling_dsecs;
+	u32 master_func_wait_dsecs;
+	u32 normal_func_wait_dsecs;
+	u32 post_reset_wait_dsecs;
+	u32 post_reset_max_wait_dsecs;
+	u32 regs[4];
+	u32 mapped_regs[4];
+#define BNXT_FW_HEALTH_REG		0
+#define BNXT_FW_HEARTBEAT_REG		1
+#define BNXT_FW_RESET_CNT_REG		2
+#define BNXT_FW_RESET_INPROG_REG	3
+	u32 fw_reset_inprog_reg_mask;
+	u32 last_fw_heartbeat;
+	u32 last_fw_reset_cnt;
+	u8 enabled:1;
+	u8 master:1;
+	u8 tmr_multiplier;
+	u8 tmr_counter;
+	u8 fw_reset_seq_cnt;
+	u32 fw_reset_seq_regs[16];
+	u32 fw_reset_seq_vals[16];
+	u32 fw_reset_seq_delay_msec[16];
+};
+
+#define BNXT_FW_HEALTH_REG_TYPE_MASK	3
+#define BNXT_FW_HEALTH_REG_TYPE_CFG	0
+#define BNXT_FW_HEALTH_REG_TYPE_GRC	1
+#define BNXT_FW_HEALTH_REG_TYPE_BAR0	2
+#define BNXT_FW_HEALTH_REG_TYPE_BAR1	3
+
+#define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
+#define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
@@ -1581,6 +1616,7 @@ struct bnxt {
 	#define BNXT_FW_CAP_KONG_MB_CHNL		0x00000080
 	#define BNXT_FW_CAP_OVS_64BIT_HANDLE		0x00000400
 	#define BNXT_FW_CAP_TRUSTED_VF			0x00000800
+	#define BNXT_FW_CAP_ERROR_RECOVERY		0x00002000
 	#define BNXT_FW_CAP_PKG_VER			0x00004000
 	#define BNXT_FW_CAP_CFA_ADV_FLOW		0x00008000
 	#define BNXT_FW_CAP_CFA_RFS_RING_TBL_IDX	0x00010000
@@ -1666,6 +1702,8 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 
+	struct bnxt_fw_health	*fw_health;
+
 	struct bnxt_hw_resc	hw_resc;
 	struct bnxt_pf_info	pf;
 	struct bnxt_ctx_mem_info	*ctx;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 06/14] bnxt_en: Pre-map the firmware health monitoring registers.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (4 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 07/14] bnxt_en: Enable health monitoring Michael Chan
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

Pre-map the GRC registers for periodic firmware health monitoring.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 29 +++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  6 ++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ef43f9e..9ab5024 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6936,6 +6936,33 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_map_fw_health_regs(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg_base = 0xffffffff;
+	int i;
+
+	/* Only pre-map the monitoring GRC registers using window 3 */
+	for (i = 0; i < 4; i++) {
+		u32 reg = fw_health->regs[i];
+
+		if (BNXT_FW_HEALTH_REG_TYPE(reg) != BNXT_FW_HEALTH_REG_TYPE_GRC)
+			continue;
+		if (reg_base == 0xffffffff)
+			reg_base = reg & BNXT_GRC_BASE_MASK;
+		if ((reg & BNXT_GRC_BASE_MASK) != reg_base)
+			return -ERANGE;
+		fw_health->mapped_regs[i] = BNXT_FW_HEALTH_WIN_BASE +
+					    (reg & BNXT_GRC_OFFSET_MASK);
+	}
+	if (reg_base == 0xffffffff)
+		return 0;
+
+	writel(reg_base, bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT +
+			 BNXT_FW_HEALTH_WIN_MAP_OFF);
+	return 0;
+}
+
 static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 {
 	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
@@ -7001,6 +7028,8 @@ static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 	}
 err_recovery_out:
 	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (!rc)
+		rc = bnxt_map_fw_health_regs(bp);
 	if (rc)
 		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 741ae68..6053dfd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1217,6 +1217,9 @@ struct bnxt_test_info {
 #define BNXT_GRCPF_REG_KONG_COMM		0xA00
 #define BNXT_GRCPF_REG_KONG_COMM_TRIGGER	0xB00
 
+#define BNXT_GRC_BASE_MASK			0xfffff000
+#define BNXT_GRC_OFFSET_MASK			0x00000ffc
+
 struct bnxt_tc_flow_stats {
 	u64		packets;
 	u64		bytes;
@@ -1368,6 +1371,9 @@ struct bnxt_fw_health {
 #define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
 #define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
 
+#define BNXT_FW_HEALTH_WIN_BASE		0x3000
+#define BNXT_FW_HEALTH_WIN_MAP_OFF	8
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 07/14] bnxt_en: Enable health monitoring.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (5 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 06/14] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  3:54 ` [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs Michael Chan
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

Handle the async event from the firmware that enables firmware health
monitoring.  Store initial health metrics.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 66 ++++++++++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  9 +++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9ab5024..f522f3b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
 static struct workqueue_struct *bnxt_pf_wq;
@@ -1896,6 +1897,33 @@ static int bnxt_force_rx_discard(struct bnxt *bp,
 	return bnxt_rx_pkt(bp, cpr, raw_cons, event);
 }
 
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->regs[reg_idx];
+	u32 reg_type, reg_off, val = 0;
+
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_read_config_dword(bp->pdev, reg_off, &val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		reg_off = fw_health->mapped_regs[reg_idx];
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		val = readl(bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		val = readl(bp->bar1 + reg_off);
+		break;
+	}
+	if (reg_idx == BNXT_FW_RESET_INPROG_REG)
+		val &= fw_health->fw_reset_inprog_reg_mask;
+	return val;
+}
+
 #define BNXT_GET_EVENT_PORT(data)	\
 	((data) &			\
 	 ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
@@ -1951,6 +1979,35 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
+		struct bnxt_fw_health *fw_health = bp->fw_health;
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
+		if (!fw_health)
+			goto async_event_process_exit;
+
+		fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1);
+		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
+		if (!fw_health->enabled)
+			break;
+
+		if (netif_msg_drv(bp))
+			netdev_info(bp->dev, "Error recovery info: error recovery[%d], master[%d], reset count[0x%x], health status: 0x%x\n",
+				    fw_health->enabled, fw_health->master,
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_RESET_CNT_REG),
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_HEALTH_REG));
+		fw_health->tmr_multiplier =
+			DIV_ROUND_UP(fw_health->polling_dsecs * HZ,
+				     bp->current_interval * 10);
+		fw_health->tmr_counter = fw_health->tmr_multiplier;
+		fw_health->last_fw_heartbeat =
+			bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+		fw_health->last_fw_reset_cnt =
+			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		goto async_event_process_exit;
+	}
 	default:
 		goto async_event_process_exit;
 	}
@@ -4286,9 +4343,14 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
 		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_ASYNC_EVENT_FWD);
 
 	memset(async_events_bmap, 0, sizeof(async_events_bmap));
-	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++)
-		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++) {
+		u16 event_id = bnxt_async_events_arr[i];
 
+		if (event_id == ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY &&
+		    !(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+			continue;
+		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	}
 	if (bmap && bmap_size) {
 		for (i = 0; i < bmap_size; i++) {
 			if (test_bit(i, bmap))
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6053dfd..96f2e12 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,14 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
+
+#define EVENT_DATA1_RECOVERY_ENABLED(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_RECOVERY_ENABLED)
+
 struct nqe_cn {
 	__le16	type;
 	#define NQ_CN_TYPE_MASK           0x3fUL
@@ -1914,6 +1922,7 @@ extern const u16 bnxt_lhint_arr[];
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		       u16 prod, gfp_t gfp);
 void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data);
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx);
 void bnxt_set_tpa_flags(struct bnxt *bp);
 void bnxt_set_ring_params(struct bnxt *);
 int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (6 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 07/14] bnxt_en: Enable health monitoring Michael Chan
@ 2019-08-26  3:54 ` Michael Chan
  2019-08-26  5:59   ` David Miller
  2019-08-26  3:55 ` [PATCH net-next 09/14] bnxt_en: Add new FW devlink_health_reporter Michael Chan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

The new flag will be set in subsequent patches when firmware is
going through reset.  Delay close and SRIOV disable operations until
the flag is cleared.

Store the registered_vfs value from the firmware.  This value will
be polled in subsequent patches to determine if SR-IOV is in quiescent
state.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 16 ++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  2 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |  4 ++++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f522f3b..be0eb1c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6326,6 +6326,8 @@ static int bnxt_hwrm_func_qcfg(struct bnxt *bp)
 		struct bnxt_vf_info *vf = &bp->vf;
 
 		vf->vlan = le16_to_cpu(resp->vlan) & VLAN_VID_MASK;
+	} else {
+		bp->pf.registered_vfs = le16_to_cpu(resp->registered_vfs);
 	}
 #endif
 	flags = le16_to_cpu(resp->flags);
@@ -8710,6 +8712,10 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE)
 		fw_reset = true;
 
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state) && !fw_reset) {
+		netdev_err(bp->dev, "RESET_DONE not set during FW reset.\n");
+		return -ENODEV;
+	}
 	if (resc_reinit || fw_reset) {
 		if (fw_reset) {
 			rc = bnxt_fw_init_one(bp);
@@ -9218,6 +9224,9 @@ static void __bnxt_close_nic(struct bnxt *bp, bool irq_re_init,
 	bnxt_debug_dev_exit(bp);
 	bnxt_disable_napi(bp);
 	del_timer_sync(&bp->timer);
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		pci_disable_device(bp->pdev);
+
 	bnxt_free_skbs(bp);
 
 	/* Save ring stats before shutdown */
@@ -9234,6 +9243,13 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
 {
 	int rc = 0;
 
+	while (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		netdev_info(bp->dev, "FW reset in progress, delaying close");
+		rtnl_unlock();
+		msleep(250);
+		rtnl_lock();
+	}
+
 #ifdef CONFIG_BNXT_SRIOV
 	if (bp->sriov_cfg) {
 		rc = wait_event_interruptible_timeout(bp->sriov_cfg_wait,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 96f2e12..c316de5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1066,6 +1066,7 @@ struct bnxt_pf_info {
 	u8	mac_addr[ETH_ALEN];
 	u32	first_vf_id;
 	u16	active_vfs;
+	u16	registered_vfs;
 	u16	max_vfs;
 	u32	max_encap_records;
 	u32	max_decap_records;
@@ -1605,6 +1606,7 @@ struct bnxt {
 #define BNXT_STATE_IN_SP_TASK	1
 #define BNXT_STATE_READ_STATS	2
 #define BNXT_STATE_FW_RESET_DET 3
+#define BNXT_STATE_IN_FW_RESET	4
 #define BNXT_STATE_PROBE_ERR	5
 
 	struct bnxt_irq	*irq_tbl;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 93524d7..e435374 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -802,6 +802,10 @@ void bnxt_sriov_disable(struct bnxt *bp)
 {
 	u16 num_vfs = pci_num_vf(bp->pdev);
 
+	while (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		netdev_info(bp->dev, "FW reset in progress, delaying SR-IOV config");
+		msleep(250);
+	}
 	if (!num_vfs)
 		return;
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 09/14] bnxt_en: Add new FW devlink_health_reporter
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (7 preceding siblings ...)
  2019-08-26  3:54 ` [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 10/14] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Create new FW devlink_health_reporter, to know the current health
status of FW.

Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw

pci/0000:af:00.0:
  name fw
    state healthy error 0 recover 0

 FW status: Healthy; Reset count: 1

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 81 +++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c316de5..5d291f5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1369,6 +1369,7 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_regs[16];
 	u32 fw_reset_seq_vals[16];
 	u32 fw_reset_seq_delay_msec[16];
+	struct devlink_health_reporter	*fw_reporter;
 };
 
 #define BNXT_FW_HEALTH_REG_TYPE_MASK	3
@@ -1383,6 +1384,8 @@ struct bnxt_fw_health {
 #define BNXT_FW_HEALTH_WIN_BASE		0x3000
 #define BNXT_FW_HEALTH_WIN_MAP_OFF	8
 
+#define BNXT_FW_STATUS_HEALTHY		0x8000
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index c05d663..7eb1a25 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -15,6 +15,84 @@
 #include "bnxt_vfr.h"
 #include "bnxt_devlink.h"
 
+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+				     struct devlink_fmsg *fmsg)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+	struct bnxt_fw_health *health = bp->fw_health;
+	u32 val, health_status;
+	int rc;
+
+	if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return 0;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
+	health_status = val & 0xffff;
+
+	if (health_status == BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Healthy;");
+		if (rc)
+			return rc;
+	} else if (health_status < BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Not yet completed initialization;");
+		if (rc)
+			return rc;
+	} else if (health_status > BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Encountered fatal error and cannot recover;");
+		if (rc)
+			return rc;
+	}
+
+	if (val >> 16) {
+		rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
+		if (rc)
+			return rc;
+	}
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
+	.name = "fw",
+	.diagnose = bnxt_fw_reporter_diagnose,
+};
+
+static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
+{
+	struct bnxt_fw_health *health = bp->fw_health;
+
+	if (!health)
+		return;
+
+	health->fw_reporter =
+		devlink_health_reporter_create(bp->dl, &bnxt_dl_fw_reporter_ops,
+					       0, false, bp);
+	if (IS_ERR(health->fw_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_reporter));
+		health->fw_reporter = NULL;
+	}
+}
+
+static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
+{
+	struct bnxt_fw_health *health = bp->fw_health;
+
+	if (!health)
+		return;
+
+	if (health->fw_reporter)
+		devlink_health_reporter_destroy(health->fw_reporter);
+}
+
 static const struct devlink_ops bnxt_dl_ops = {
 #ifdef CONFIG_BNXT_SRIOV
 	.eswitch_mode_set = bnxt_dl_eswitch_mode_set,
@@ -251,6 +329,8 @@ int bnxt_dl_register(struct bnxt *bp)
 
 	devlink_params_publish(dl);
 
+	bnxt_dl_fw_reporters_create(bp);
+
 	return 0;
 
 err_dl_port_unreg:
@@ -273,6 +353,7 @@ void bnxt_dl_unregister(struct bnxt *bp)
 	if (!dl)
 		return;
 
+	bnxt_dl_fw_reporters_destroy(bp);
 	devlink_port_params_unregister(&bp->dl_port, bnxt_dl_port_params,
 				       ARRAY_SIZE(bnxt_dl_port_params));
 	devlink_port_unregister(&bp->dl_port);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 10/14] bnxt_en: Handle RESET_NOTIFY async event from firmware.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (8 preceding siblings ...)
  2019-08-26  3:55 ` [PATCH net-next 09/14] bnxt_en: Add new FW devlink_health_reporter Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 11/14] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

This event from firmware signals a coordinated reset initiated by the
firmware.  It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.

Add devlink health reporter for this event.  The driver will perform
an orderly shutdown and will unregister with the firmware.  If the PF
has active and registered VFs, it will wait for all VFs to unregister
first before shutdown.  After that, it will poll for firmware to come
out of reset.  When firmware starts responding, the driver will re-probe
all capabilities before initializing the device.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 164 +++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  19 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c |  52 +++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c     |   3 +
 5 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index be0eb1c..12ab4ae 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY,
 	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
@@ -1139,6 +1140,14 @@ static int bnxt_discard_rx(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	return 0;
 }
 
+static void bnxt_queue_fw_reset_work(struct bnxt *bp, unsigned long delay)
+{
+	if (BNXT_PF(bp))
+		queue_delayed_work(bnxt_pf_wq, &bp->fw_reset_task, delay);
+	else
+		schedule_delayed_work(&bp->fw_reset_task, delay);
+}
+
 static void bnxt_queue_sp_work(struct bnxt *bp)
 {
 	if (BNXT_PF(bp))
@@ -1979,6 +1988,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+		bp->fw_reset_timestamp = jiffies;
+		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
+		if (!bp->fw_reset_min_dsecs)
+			bp->fw_reset_min_dsecs = 20;
+		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
+		if (!bp->fw_reset_max_dsecs)
+			bp->fw_reset_max_dsecs = 60;
+		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
+		break;
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
@@ -4377,7 +4396,8 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 			    FUNC_DRV_RGTR_REQ_ENABLES_VER);
 
 	req.os_type = cpu_to_le16(FUNC_DRV_RGTR_REQ_OS_TYPE_LINUX);
-	req.flags = cpu_to_le32(FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE);
+	req.flags = cpu_to_le32(FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE |
+				FUNC_DRV_RGTR_REQ_FLAGS_HOT_RESET_SUPPORT);
 	req.ver_maj_8b = DRV_VER_MAJ;
 	req.ver_min_8b = DRV_VER_MIN;
 	req.ver_upd_8b = DRV_VER_UPD;
@@ -9957,6 +9977,53 @@ static void bnxt_reset(struct bnxt *bp, bool silent)
 	bnxt_rtnl_unlock_sp(bp);
 }
 
+static void bnxt_fw_reset_close(struct bnxt *bp)
+{
+	__bnxt_close_nic(bp, true, false);
+	bnxt_ulp_irq_stop(bp);
+	bnxt_clear_int_mode(bp);
+	bnxt_hwrm_func_drv_unrgtr(bp);
+	bnxt_free_ctx_mem(bp);
+	kfree(bp->ctx);
+	bp->ctx = NULL;
+}
+
+void bnxt_fw_reset(struct bnxt *bp)
+{
+	int rc;
+
+	bnxt_rtnl_lock_sp(bp);
+	if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
+	    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+		if (BNXT_PF(bp) && bp->pf.active_vfs) {
+			rc = bnxt_hwrm_func_qcfg(bp);
+			if (rc) {
+				netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
+					   rc);
+				clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+				dev_close(bp->dev);
+				goto fw_reset_exit;
+			}
+			if (bp->pf.registered_vfs) {
+				u16 vf_tmo_dsecs = bp->pf.registered_vfs * 10;
+
+				if (bp->fw_reset_max_dsecs < vf_tmo_dsecs)
+					bp->fw_reset_max_dsecs = vf_tmo_dsecs;
+				bp->fw_reset_state =
+					BNXT_FW_RESET_STATE_POLL_VF;
+				bnxt_queue_fw_reset_work(bp, HZ / 10);
+				goto fw_reset_exit;
+			}
+		}
+		bnxt_fw_reset_close(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
+	}
+fw_reset_exit:
+	bnxt_rtnl_unlock_sp(bp);
+}
+
 static void bnxt_chk_missed_irq(struct bnxt *bp)
 {
 	int i;
@@ -10087,6 +10154,9 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event))
 		bnxt_reset(bp, true);
 
+	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
+		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
@@ -10316,6 +10386,97 @@ static int bnxt_fw_init_one(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_fw_reset_task(struct work_struct *work)
+{
+	struct bnxt *bp = container_of(work, struct bnxt, fw_reset_task.work);
+	int rc;
+
+	if (!test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		netdev_err(bp->dev, "bnxt_fw_reset_task() called when not in fw reset mode!\n");
+		return;
+	}
+
+	switch (bp->fw_reset_state) {
+	case BNXT_FW_RESET_STATE_POLL_VF:
+		rc = bnxt_hwrm_func_qcfg(bp);
+		if (rc) {
+			netdev_err(bp->dev, "Firmware reset aborted, subsequent func_qcfg cmd failed, rc = %d, %d msecs since reset timestamp\n",
+				   rc, jiffies_to_msecs(jiffies -
+				   bp->fw_reset_timestamp));
+			goto fw_reset_abort;
+		}
+		if (bp->pf.registered_vfs) {
+			if (time_after(jiffies, bp->fw_reset_timestamp +
+				       (bp->fw_reset_max_dsecs * HZ / 10))) {
+				clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+				bp->fw_reset_state = 0;
+				netdev_err(bp->dev, "Firmware reset aborted, %d VFs still registered\n",
+					   bp->pf.registered_vfs);
+				return;
+			}
+			bnxt_queue_fw_reset_work(bp, HZ / 10);
+			return;
+		}
+		bp->fw_reset_timestamp = jiffies;
+		rtnl_lock();
+		bnxt_fw_reset_close(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		rtnl_unlock();
+		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
+		return;
+	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		if (pci_enable_device(bp->pdev)) {
+			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
+			goto fw_reset_abort;
+		}
+		pci_set_master(bp->pdev);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_POLL_FW;
+		/* fall through */
+	case BNXT_FW_RESET_STATE_POLL_FW:
+		bp->hwrm_cmd_timeout = SHORT_HWRM_CMD_TIMEOUT;
+		rc = __bnxt_hwrm_ver_get(bp, true);
+		if (rc) {
+			if (time_after(jiffies, bp->fw_reset_timestamp +
+				       (bp->fw_reset_max_dsecs * HZ / 10))) {
+				netdev_err(bp->dev, "Firmware reset aborted\n");
+				goto fw_reset_abort;
+			}
+			bnxt_queue_fw_reset_work(bp, HZ / 5);
+			return;
+		}
+		bp->hwrm_cmd_timeout = DFLT_HWRM_CMD_TIMEOUT;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_OPENING;
+		/* fall through */
+	case BNXT_FW_RESET_STATE_OPENING:
+		while (!rtnl_trylock()) {
+			bnxt_queue_fw_reset_work(bp, HZ / 10);
+			return;
+		}
+		rc = bnxt_open(bp->dev);
+		if (rc) {
+			netdev_err(bp->dev, "bnxt_open_nic() failed\n");
+			clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+			dev_close(bp->dev);
+		}
+		bnxt_ulp_irq_restart(bp, rc);
+		rtnl_unlock();
+
+		bp->fw_reset_state = 0;
+		/* Make sure fw_reset_state is 0 before clearing the flag */
+		smp_mb__before_atomic();
+		clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+		break;
+	}
+	return;
+
+fw_reset_abort:
+	clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	bp->fw_reset_state = 0;
+	rtnl_lock();
+	dev_close(bp->dev);
+	rtnl_unlock();
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
@@ -10378,6 +10539,7 @@ static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 	pci_enable_pcie_error_reporting(pdev);
 
 	INIT_WORK(&bp->sp_task, bnxt_sp_task);
+	INIT_DELAYED_WORK(&bp->fw_reset_task, bnxt_fw_reset_task);
 
 	spin_lock_init(&bp->ntp_fltr_lock);
 #if BITS_PER_LONG == 32
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 5d291f5..cb9a9a4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -640,6 +640,7 @@ struct nqe_cn {
 #define BNXT_HWRM_MAX_REQ_LEN		(bp->hwrm_max_req_len)
 #define BNXT_HWRM_SHORT_REQ_LEN		sizeof(struct hwrm_short_input)
 #define DFLT_HWRM_CMD_TIMEOUT		500
+#define SHORT_HWRM_CMD_TIMEOUT		20
 #define HWRM_CMD_TIMEOUT		(bp->hwrm_cmd_timeout)
 #define HWRM_RESET_TIMEOUT		((HWRM_CMD_TIMEOUT) * 4)
 #define HWRM_RESP_ERR_CODE_MASK		0xffff
@@ -1370,6 +1371,11 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_vals[16];
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
+	struct devlink_health_reporter *fw_reset_reporter;
+};
+
+struct bnxt_fw_reporter_ctx {
+	unsigned long sp_event;
 };
 
 #define BNXT_FW_HEALTH_REG_TYPE_MASK	3
@@ -1720,6 +1726,18 @@ struct bnxt {
 #define BNXT_FLOW_STATS_SP_EVENT	15
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
+#define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+
+	struct delayed_work	fw_reset_task;
+	int			fw_reset_state;
+#define BNXT_FW_RESET_STATE_POLL_VF	1
+#define BNXT_FW_RESET_STATE_RESET_FW	2
+#define BNXT_FW_RESET_STATE_ENABLE_DEV	3
+#define BNXT_FW_RESET_STATE_POLL_FW	4
+#define BNXT_FW_RESET_STATE_OPENING	5
+	u16			fw_reset_min_dsecs;
+	u16			fw_reset_max_dsecs;
+	unsigned long		fw_reset_timestamp;
 
 	struct bnxt_fw_health	*fw_health;
 
@@ -1960,6 +1978,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+void bnxt_fw_reset(struct bnxt *bp);
 int bnxt_check_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
 		     int tx_xdp);
 int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 7eb1a25..c2f5890 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -65,6 +65,24 @@ static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
 	.diagnose = bnxt_fw_reporter_diagnose,
 };
 
+static int bnxt_fw_reset_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	bnxt_fw_reset(bp);
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
+	.name = "fw_reset",
+	.recover = bnxt_fw_reset_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -80,6 +98,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reporter));
 		health->fw_reporter = NULL;
 	}
+
+	health->fw_reset_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_reset_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_reset_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_reset_reporter));
+		health->fw_reset_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -91,6 +119,30 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reporter)
 		devlink_health_reporter_destroy(health->fw_reporter);
+
+	if (health->fw_reset_reporter)
+		devlink_health_reporter_destroy(health->fw_reset_reporter);
+}
+
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct bnxt_fw_reporter_ctx fw_reporter_ctx;
+
+	if (!fw_health)
+		return;
+
+	fw_reporter_ctx.sp_event = event;
+	switch (event) {
+	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (!fw_health->fw_reset_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_reset_reporter,
+				      "FW non-fatal reset event received",
+				      &fw_reporter_ctx);
+		return;
+	}
 }
 
 static const struct devlink_ops bnxt_dl_ops = {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 5b6b2c7..b97e0ba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -55,6 +55,7 @@ struct bnxt_dl_nvm_param {
 	u16 num_bits;
 };
 
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event);
 int bnxt_dl_register(struct bnxt *bp);
 void bnxt_dl_unregister(struct bnxt *bp);
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index fc77caf..b2c1609 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -226,6 +226,9 @@ static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 	struct input *req;
 	int rc;
 
+	if (ulp_id != BNXT_ROCE_ULP && bp->fw_reset_state)
+		return -EBUSY;
+
 	mutex_lock(&bp->hwrm_cmd_lock);
 	req = fw_msg->msg;
 	req->resp_addr = cpu_to_le64(bp->hwrm_cmd_resp_dma_addr);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 11/14] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (9 preceding siblings ...)
  2019-08-26  3:55 ` [PATCH net-next 10/14] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 12/14] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 50 +++++++++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  2 +-
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 12ab4ae..87e51e0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9204,7 +9204,7 @@ static int bnxt_open(struct net_device *dev)
 			int n = pf->active_vfs;
 
 			if (n)
-				bnxt_cfg_hw_sriov(bp, &n);
+				bnxt_cfg_hw_sriov(bp, &n, true);
 		}
 		bnxt_hwmon_open(bp);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index e435374..03104bf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -486,10 +486,43 @@ static int bnxt_hwrm_func_buf_rgtr(struct bnxt *bp)
 	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+/* Caller holds bp->hwrm_cmd_lock mutex lock */
+static void __bnxt_set_vf_params(struct bnxt *bp, int vf_id)
+{
+	struct hwrm_func_cfg_input req = {0};
+	struct bnxt_vf_info *vf;
+
+	vf = &bp->pf.vf[vf_id];
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_CFG, -1, -1);
+	req.fid = cpu_to_le16(vf->fw_fid);
+	req.flags = cpu_to_le32(vf->func_flags);
+
+	if (is_valid_ether_addr(vf->mac_addr)) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_MAC_ADDR);
+		memcpy(req.dflt_mac_addr, vf->mac_addr, ETH_ALEN);
+	}
+	if (vf->vlan) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_VLAN);
+		req.dflt_vlan = cpu_to_le16(vf->vlan);
+	}
+	if (vf->max_tx_rate) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MAX_BW);
+		req.max_bw = cpu_to_le32(vf->max_tx_rate);
+#ifdef HAVE_IFLA_TX_RATE
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MIN_BW);
+		req.min_bw = cpu_to_le32(vf->min_tx_rate);
+#endif
+	}
+	if (vf->flags & BNXT_VF_TRUST)
+		req.flags |= cpu_to_le32(FUNC_CFG_REQ_FLAGS_TRUSTED_VF_ENABLE);
+
+	_hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
 /* Only called by PF to reserve resources for VFs, returns actual number of
  * VFs configured, or < 0 on error.
  */
-static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	struct hwrm_func_vf_resource_cfg_input req = {0};
 	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
@@ -561,6 +594,9 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		if (reset)
+			__bnxt_set_vf_params(bp, i);
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -679,15 +715,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 	return rc;
 }
 
-static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_func_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	if (BNXT_NEW_RM(bp))
-		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs);
+		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs, reset);
 	else
 		return bnxt_hwrm_func_cfg(bp, num_vfs);
 }
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	int rc;
 
@@ -697,7 +733,7 @@ int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
 		return rc;
 
 	/* Reserve resources for VFs */
-	rc = bnxt_func_cfg(bp, *num_vfs);
+	rc = bnxt_func_cfg(bp, *num_vfs, reset);
 	if (rc != *num_vfs) {
 		if (rc <= 0) {
 			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
@@ -778,7 +814,7 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	if (rc)
 		goto err_out1;
 
-	rc = bnxt_cfg_hw_sriov(bp, num_vfs);
+	rc = bnxt_cfg_hw_sriov(bp, num_vfs, false);
 	if (rc)
 		goto err_out2;
 
@@ -1205,7 +1241,7 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 }
 #else
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	if (*num_vfs)
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 0abf18e..629641b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -36,7 +36,7 @@ int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs);
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 12/14] bnxt_en: Do not send firmware messages if firmware is in error state.
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (10 preceding siblings ...)
  2019-08-26  3:55 ` [PATCH net-next 11/14] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 13/14] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 14/14] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller.  Fix up some clean up functions to continue
and not abort when the firmware message function returns error.

This is preparation work to fully handle firmware error recovery
under fatal conditions.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 +++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 87e51e0..5d0f028 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4147,6 +4147,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 	u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM;
 	u16 dst = BNXT_HWRM_CHNL_CHIMP;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return -EBUSY;
+
 	if (msg_len > BNXT_HWRM_MAX_REQ_LEN) {
 		if (msg_len > bp->hwrm_max_ext_req_len ||
 		    !bp->hwrm_short_cmd_req_addr)
@@ -5021,8 +5024,6 @@ static int bnxt_hwrm_vnic_free_one(struct bnxt *bp, u16 vnic_id)
 			cpu_to_le32(bp->vnic_info[vnic_id].fw_vnic_id);
 
 		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-		if (rc)
-			return rc;
 		bp->vnic_info[vnic_id].fw_vnic_id = INVALID_HW_RING_ID;
 	}
 	return rc;
@@ -5162,8 +5163,6 @@ static int bnxt_hwrm_ring_grp_free(struct bnxt *bp)
 
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
-		if (rc)
-			break;
 		bp->grp_info[i].fw_grp_id = INVALID_HW_RING_ID;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
@@ -5482,6 +5481,9 @@ static int hwrm_ring_free_send_msg(struct bnxt *bp,
 	struct hwrm_ring_free_output *resp = bp->hwrm_cmd_resp_addr;
 	u16 error_code;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
+
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_RING_FREE, cmpl_ring_id, -1);
 	req.ring_type = ring_type;
 	req.ring_id = cpu_to_le16(ring->fw_ring_id);
@@ -6283,8 +6285,6 @@ static int bnxt_hwrm_stat_ctx_free(struct bnxt *bp)
 
 			rc = _hwrm_send_message(bp, &req, sizeof(req),
 						HWRM_CMD_TIMEOUT);
-			if (rc)
-				break;
 
 			cpr->hw_stats_ctx_id = INVALID_STATS_CTX_ID;
 		}
@@ -7406,6 +7406,8 @@ static int bnxt_set_tpa(struct bnxt *bp, bool set_tpa)
 
 	if (set_tpa)
 		tpa_flags = bp->flags & BNXT_FLAG_TPA;
+	else if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
 	for (i = 0; i < bp->nr_vnics; i++) {
 		rc = bnxt_hwrm_vnic_set_tpa(bp, i, tpa_flags);
 		if (rc) {
@@ -9996,7 +9998,8 @@ void bnxt_fw_reset(struct bnxt *bp)
 	if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
 	    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
 		set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
-		if (BNXT_PF(bp) && bp->pf.active_vfs) {
+		if (BNXT_PF(bp) && bp->pf.active_vfs &&
+		    test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
 			rc = bnxt_hwrm_func_qcfg(bp);
 			if (rc) {
 				netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
@@ -10425,6 +10428,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
 			goto fw_reset_abort;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index cb9a9a4..078900a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1617,6 +1617,7 @@ struct bnxt {
 #define BNXT_STATE_FW_RESET_DET 3
 #define BNXT_STATE_IN_FW_RESET	4
 #define BNXT_STATE_PROBE_ERR	5
+#define BNXT_STATE_FW_FATAL_COND	6
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 13/14] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (11 preceding siblings ...)
  2019-08-26  3:55 ` [PATCH net-next 12/14] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  2019-08-26  3:55 ` [PATCH net-next 14/14] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 64 +++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5d0f028..1687a1a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10389,6 +10389,62 @@ static int bnxt_fw_init_one(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_fw_reset_writel(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->fw_reset_seq_regs[reg_idx];
+	u32 val = fw_health->fw_reset_seq_vals[reg_idx];
+	u32 reg_type, reg_off, delay_msecs;
+
+	delay_msecs = fw_health->fw_reset_seq_delay_msec[reg_idx];
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_write_config_dword(bp->pdev, reg_off, val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		writel(reg_off & BNXT_GRC_BASE_MASK,
+		       bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
+		reg_off = (reg_off & BNXT_GRC_OFFSET_MASK) + 0x2000;
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		writel(val, bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		writel(val, bp->bar1 + reg_off);
+		break;
+	}
+	if (delay_msecs) {
+		pci_read_config_dword(bp->pdev, 0, &val);
+		msleep(delay_msecs);
+	}
+}
+
+static void bnxt_reset_all(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	int i;
+
+	if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_HOST) {
+		for (i = 0; i < fw_health->fw_reset_seq_cnt; i++)
+			bnxt_fw_reset_writel(bp, i);
+	} else if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) {
+		struct hwrm_fw_reset_input req = {0};
+		int rc;
+
+		bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FW_RESET, -1, -1);
+		req.resp_addr = cpu_to_le64(bp->hwrm_cmd_kong_resp_dma_addr);
+		req.embedded_proc_type = FW_RESET_REQ_EMBEDDED_PROC_TYPE_CHIP;
+		req.selfrst_status = FW_RESET_REQ_SELFRST_STATUS_SELFRSTASAP;
+		req.flags = FW_RESET_REQ_FLAGS_RESET_GRACEFUL;
+		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+		if (rc)
+			netdev_warn(bp->dev, "Unable to reset FW rc=%d\n", rc);
+	}
+	bp->fw_reset_timestamp = jiffies;
+}
+
 static void bnxt_fw_reset_task(struct work_struct *work)
 {
 	struct bnxt *bp = container_of(work, struct bnxt, fw_reset_task.work);
@@ -10427,6 +10483,14 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		rtnl_unlock();
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
+	case BNXT_FW_RESET_STATE_RESET_FW: {
+		u32 wait_dsecs = bp->fw_health->post_reset_wait_dsecs;
+
+		bnxt_reset_all(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+		return;
+	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next 14/14] bnxt_en: Add FW fatal devlink_health_reporter
  2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
                   ` (12 preceding siblings ...)
  2019-08-26  3:55 ` [PATCH net-next 13/14] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
@ 2019-08-26  3:55 ` Michael Chan
  13 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Health show command example and output:

$ devlink health show pci/0000:af:00.0 reporter fw_fatal

pci/0000:af:00.0:
  name fw_fatal
    state healthy error 1 recover 1 grace_period 0 auto_recover true

Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 125 +++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |   8 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c |  56 ++++++++++
 3 files changed, 186 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1687a1a..d338f5b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1988,7 +1988,9 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
-	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: {
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
 		bp->fw_reset_timestamp = jiffies;
 		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
 		if (!bp->fw_reset_min_dsecs)
@@ -1996,8 +1998,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
 		if (!bp->fw_reset_max_dsecs)
 			bp->fw_reset_max_dsecs = 60;
+		if (EVENT_DATA1_RESET_NOTIFY_FATAL(data1)) {
+			netdev_warn(bp->dev, "Firmware fatal reset event received\n");
+			set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+		} else {
+			netdev_warn(bp->dev, "Firmware non-fatal reset event received, max wait time %d msec\n",
+				    bp->fw_reset_max_dsecs * 100);
+		}
 		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
 		break;
+	}
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
@@ -4390,6 +4400,7 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 {
 	struct hwrm_func_drv_rgtr_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_rgtr_input req = {0};
+	u32 flags;
 	int rc;
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_DRV_RGTR, -1, -1);
@@ -4399,8 +4410,11 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 			    FUNC_DRV_RGTR_REQ_ENABLES_VER);
 
 	req.os_type = cpu_to_le16(FUNC_DRV_RGTR_REQ_OS_TYPE_LINUX);
-	req.flags = cpu_to_le32(FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE |
-				FUNC_DRV_RGTR_REQ_FLAGS_HOT_RESET_SUPPORT);
+	flags = FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE |
+		FUNC_DRV_RGTR_REQ_FLAGS_HOT_RESET_SUPPORT;
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		flags |= FUNC_DRV_RGTR_REQ_FLAGS_ERROR_RECOVERY_SUPPORT;
+	req.flags = cpu_to_le32(flags);
 	req.ver_maj_8b = DRV_VER_MAJ;
 	req.ver_min_8b = DRV_VER_MIN;
 	req.ver_upd_8b = DRV_VER_UPD;
@@ -9913,6 +9927,38 @@ static void bnxt_tx_timeout(struct net_device *dev)
 	bnxt_queue_sp_work(bp);
 }
 
+static void bnxt_fw_health_check(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 val;
+
+	if (!fw_health || !fw_health->enabled ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	if (fw_health->tmr_counter) {
+		fw_health->tmr_counter--;
+		return;
+	}
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		goto fw_reset;
+
+	fw_health->last_fw_heartbeat = val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		goto fw_reset;
+
+	fw_health->tmr_counter = fw_health->tmr_multiplier;
+	return;
+
+fw_reset:
+	set_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event);
+	bnxt_queue_sp_work(bp);
+}
+
 static void bnxt_timer(struct timer_list *t)
 {
 	struct bnxt *bp = from_timer(bp, t, timer);
@@ -9924,6 +9970,9 @@ static void bnxt_timer(struct timer_list *t)
 	if (atomic_read(&bp->intr_sem) != 0)
 		goto bnxt_restart_timer;
 
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		bnxt_fw_health_check(bp);
+
 	if (bp->link_info.link_up && (bp->flags & BNXT_FLAG_PORT_STATS) &&
 	    bp->stats_coal_ticks) {
 		set_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event);
@@ -9990,6 +10039,60 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
 	bp->ctx = NULL;
 }
 
+static bool is_bnxt_fw_ok(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	bool no_heartbeat = false, has_reset = false;
+	u32 val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		no_heartbeat = true;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		has_reset = true;
+
+	if (!no_heartbeat && has_reset)
+		return true;
+
+	return false;
+}
+
+/* rtnl_lock is acquired before calling this function */
+static void bnxt_force_fw_reset(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 wait_dsecs;
+
+	if (!test_bit(BNXT_STATE_OPEN, &bp->state) ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	bnxt_fw_reset_close(bp);
+	wait_dsecs = fw_health->master_func_wait_dsecs;
+	if (fw_health->master) {
+		if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU)
+			wait_dsecs = 0;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_RESET_FW;
+	} else {
+		bp->fw_reset_timestamp = jiffies + wait_dsecs * HZ / 10;
+		wait_dsecs = fw_health->normal_func_wait_dsecs;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+	}
+	bp->fw_reset_max_dsecs = fw_health->post_reset_max_wait_dsecs;
+	bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+}
+
+void bnxt_fw_exception(struct bnxt *bp)
+{
+	set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+	bnxt_rtnl_lock_sp(bp);
+	bnxt_force_fw_reset(bp);
+	bnxt_rtnl_unlock_sp(bp);
+}
+
 void bnxt_fw_reset(struct bnxt *bp)
 {
 	int rc;
@@ -10160,6 +10263,12 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
 		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
 
+	if (test_and_clear_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event)) {
+		if (!is_bnxt_fw_ok(bp))
+			bnxt_devlink_health_report(bp,
+						   BNXT_FW_EXCEPTION_SP_EVENT);
+	}
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
@@ -10492,6 +10601,16 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		return;
 	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state) &&
+		    bp->fw_health) {
+			u32 val;
+
+			val = bnxt_fw_health_readl(bp,
+						   BNXT_FW_RESET_INPROG_REG);
+			if (val)
+				netdev_warn(bp->dev, "FW reset inprog %x after min wait time.\n",
+					    val);
+		}
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 078900a..1710405 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,11 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RESET_NOTIFY_FATAL(data1)				\
+	(((data1) &							\
+	  ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK) ==\
+	 ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL)
+
 #define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
 	!!((data1) &							\
 	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
@@ -1372,6 +1377,7 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
 	struct devlink_health_reporter *fw_reset_reporter;
+	struct devlink_health_reporter *fw_fatal_reporter;
 };
 
 struct bnxt_fw_reporter_ctx {
@@ -1728,6 +1734,7 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 #define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+#define BNXT_FW_EXCEPTION_SP_EVENT	19
 
 	struct delayed_work	fw_reset_task;
 	int			fw_reset_state;
@@ -1979,6 +1986,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+void bnxt_fw_exception(struct bnxt *bp);
 void bnxt_fw_reset(struct bnxt *bp);
 int bnxt_check_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
 		     int tx_xdp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index c2f5890..2fad018 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -83,6 +83,31 @@ struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
 	.recover = bnxt_fw_reset_recover,
 };
 
+static int bnxt_fw_fatal_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+	struct bnxt_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+	unsigned long event;
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	event = fw_reporter_ctx->sp_event;
+	if (event == BNXT_FW_RESET_NOTIFY_SP_EVENT)
+		bnxt_fw_reset(bp);
+	else if (event == BNXT_FW_EXCEPTION_SP_EVENT)
+		bnxt_fw_exception(bp);
+
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_fatal_reporter_ops = {
+	.name = "fw_fatal",
+	.recover = bnxt_fw_fatal_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -108,6 +133,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reset_reporter));
 		health->fw_reset_reporter = NULL;
 	}
+
+	health->fw_fatal_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_fatal_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_fatal_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_fatal_reporter));
+		health->fw_fatal_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -122,6 +157,9 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reset_reporter)
 		devlink_health_reporter_destroy(health->fw_reset_reporter);
+
+	if (health->fw_fatal_reporter)
+		devlink_health_reporter_destroy(health->fw_fatal_reporter);
 }
 
 void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
@@ -135,6 +173,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 	fw_reporter_ctx.sp_event = event;
 	switch (event) {
 	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
+			if (!fw_health->fw_fatal_reporter)
+				return;
+
+			devlink_health_report(fw_health->fw_fatal_reporter,
+					      "FW fatal async event received",
+					      &fw_reporter_ctx);
+			return;
+		}
 		if (!fw_health->fw_reset_reporter)
 			return;
 
@@ -142,6 +189,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 				      "FW non-fatal reset event received",
 				      &fw_reporter_ctx);
 		return;
+
+	case BNXT_FW_EXCEPTION_SP_EVENT:
+		if (!fw_health->fw_fatal_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_fatal_reporter,
+				      "FW fatal error reported",
+				      &fw_reporter_ctx);
+		return;
 	}
 }
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.
  2019-08-26  3:54 ` [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
@ 2019-08-26  5:15   ` David Miller
  2019-08-26  6:17     ` Michael Chan
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2019-08-26  5:15 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:52 -0400

> If the silent parameter is set, suppress all messages when there is
> no response from firmware.  When polling for firmware to come out of
> reset, no response may be normal and we want to suppress the error
> messages.  Also, don't poll for the firmware DMA response if Bus Master
> is disabled.  This is in preparation for error recovery when firmware
> may be in error or reset state or Bus Master is disabled.
> 
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

The function bnxt_hwrm_do_send_msg() seems to be an interesting mix of return
values, what are the semantics?

It seems to use 0 for success, some error codes, and -1.  Does -1 have special
meaning?

Just curious, and really this unorthodox return value semantic should
be documented into a comment above the function.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-26  3:54 ` [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
@ 2019-08-26  5:36   ` David Miller
  2019-08-26  6:06     ` Michael Chan
  2019-08-26  6:00   ` Leon Romanovsky
  1 sibling, 1 reply; 27+ messages in thread
From: David Miller @ 2019-08-26  5:36 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:54 -0400

> @@ -687,6 +687,32 @@ static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
>  		return bnxt_hwrm_func_cfg(bp, num_vfs);
>  }
>  
> +int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
> +{
> +	int rc;
> +
> +	/* Register buffers for VFs */
> +	rc = bnxt_hwrm_func_buf_rgtr(bp);
> +	if (rc)
> +		return rc;
> +
> +	/* Reserve resources for VFs */
> +	rc = bnxt_func_cfg(bp, *num_vfs);
> +	if (rc != *num_vfs) {

I notice that these two operations are reversed here from where they were in the
bnxt_sriov_enable() function.  Does the BUF_RGTR operation have to be undone if
the bnxt_func_cfg() fails?

When it's not a straight extraction of code into a helper function one really
should do one of two things in my opinion:

1) Explain the differences in the commit message.

2) Do a straight extration in one commit, change the ordering in another.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP.
  2019-08-26  3:54 ` [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
@ 2019-08-26  5:47   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2019-08-26  5:47 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:55 -0400

> @@ -7005,7 +7005,9 @@ static int __bnxt_hwrm_ver_get(struct bnxt *bp, bool silent)
>  
>  	rc = bnxt_hwrm_do_send_msg(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT,
>  				   silent);
> -	return rc;
> +	if (rc)
> +		return -ENODEV;
> +	return 0;
>  }
>  
>  static int bnxt_hwrm_ver_get(struct bnxt *bp)
 ...
> @@ -8528,26 +8533,53 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
>  		req.flags = cpu_to_le32(FUNC_DRV_IF_CHANGE_REQ_FLAGS_UP);
>  	mutex_lock(&bp->hwrm_cmd_lock);
>  	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> -	if (!rc && (resp->flags &
> -		    cpu_to_le32(FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)))
> -		resc_reinit = true;
> +	if (!rc)
> +		flags = le32_to_cpu(resp->flags);
>  	mutex_unlock(&bp->hwrm_cmd_lock);
> +	if (rc)
> +		return -EIO;

Following up to my other review comments, if _hwrm_send_message() et
al. returned consistently proper error codes instead of sometimes -1,
couldn't you avoid at least some of these 'rc' remappings?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities.
  2019-08-26  3:54 ` [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities Michael Chan
@ 2019-08-26  5:49   ` David Miller
  2019-08-26  6:23     ` Michael Chan
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2019-08-26  5:49 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:56 -0400

> +static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
> +{
> +	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
> +	struct bnxt_fw_health *fw_health = bp->fw_health;
> +	struct hwrm_error_recovery_qcfg_input req = {0};
> +	int rc, i;
> +
> +	if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
> +		return 0;
> +
> +	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
> +	mutex_lock(&bp->hwrm_cmd_lock);
> +	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> +	if (rc) {
> +		rc = -EOPNOTSUPP;
> +		goto err_recovery_out;
> +	}

How is this logically an unsupported operation if you're guarding it's use
with an appropriate capability check?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs.
  2019-08-26  3:54 ` [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs Michael Chan
@ 2019-08-26  5:59   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2019-08-26  5:59 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:59 -0400

> @@ -9234,6 +9243,13 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
>  {
>  	int rc = 0;
>  
> +	while (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
> +		netdev_info(bp->dev, "FW reset in progress, delaying close");
> +		rtnl_unlock();
> +		msleep(250);
> +		rtnl_lock();
> +	}

Dropping the RTNL here is extremely dangerous.

Operations other than actual device close can get into the
bnxt_close_nic() code paths (changing features, ethtool ops, etc.)

So we can thus re-enter this function once you drop the RTNL mutex.

Furthermore, and I understand what pains you go into in patch #9 to
avoid this, but it's an endless loop.  If there are bugs there, we
will get stuck in this close path forever.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-26  3:54 ` [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
  2019-08-26  5:36   ` David Miller
@ 2019-08-26  6:00   ` Leon Romanovsky
  2019-08-30  9:18     ` Leon Romanovsky
  1 sibling, 1 reply; 27+ messages in thread
From: Leon Romanovsky @ 2019-08-26  6:00 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, vasundhara-v.volam, jiri, ray.jui

On Sun, Aug 25, 2019 at 11:54:54PM -0400, Michael Chan wrote:
> Refactor the hardware/firmware configuration portion in
> bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
> new function can be called after a firmware reset to reconfigure the
> VFs previously enabled.

I wonder what does it mean for already bound VFs to vfio driver?
Will you rebind them as well? Can I assume that FW error in one VF
will trigger "restart" of other VFs too?

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-26  5:36   ` David Miller
@ 2019-08-26  6:06     ` Michael Chan
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  6:06 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui

On Sun, Aug 25, 2019 at 10:36 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:54 -0400
>
> > @@ -687,6 +687,32 @@ static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
> >               return bnxt_hwrm_func_cfg(bp, num_vfs);
> >  }
> >
> > +int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
> > +{
> > +     int rc;
> > +
> > +     /* Register buffers for VFs */
> > +     rc = bnxt_hwrm_func_buf_rgtr(bp);
> > +     if (rc)
> > +             return rc;
> > +
> > +     /* Reserve resources for VFs */
> > +     rc = bnxt_func_cfg(bp, *num_vfs);
> > +     if (rc != *num_vfs) {
>
> I notice that these two operations are reversed here from where they were in the
> bnxt_sriov_enable() function.  Does the BUF_RGTR operation have to be undone if
> the bnxt_func_cfg() fails?
>
> When it's not a straight extraction of code into a helper function one really
> should do one of two things in my opinion:
>
> 1) Explain the differences in the commit message.
>
> 2) Do a straight extration in one commit, change the ordering in another.

OK.  Will break it up into 2 commits with explanations.  The reason is
that during normal init, the PF is always initialized first and the
exact bring-up sequence does not matter too much.  During error
recovery, the PF and VFs can be all trying to recover at about the
same time and the sequence matters more.  Will explain all of this
again in v2.  Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.
  2019-08-26  5:15   ` David Miller
@ 2019-08-26  6:17     ` Michael Chan
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  6:17 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui

On Sun, Aug 25, 2019 at 10:15 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:52 -0400
>
> > If the silent parameter is set, suppress all messages when there is
> > no response from firmware.  When polling for firmware to come out of
> > reset, no response may be normal and we want to suppress the error
> > messages.  Also, don't poll for the firmware DMA response if Bus Master
> > is disabled.  This is in preparation for error recovery when firmware
> > may be in error or reset state or Bus Master is disabled.
> >
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>
> The function bnxt_hwrm_do_send_msg() seems to be an interesting mix of return
> values, what are the semantics?
>
> It seems to use 0 for success, some error codes, and -1.  Does -1 have special
> meaning?
>
> Just curious, and really this unorthodox return value semantic should
> be documented into a comment above the function.

Sadly, it was coded initially to return firmware defined error codes.
But in some cases, the return code gets propagated all the way back to
userspace.  The long term goal is to convert to standard error codes
and so we try to use standard error codes whenever we add new patches
related to this function.  I will see what I can do to make this
better in v2.  Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities.
  2019-08-26  5:49   ` David Miller
@ 2019-08-26  6:23     ` Michael Chan
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Chan @ 2019-08-26  6:23 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui

On Sun, Aug 25, 2019 at 10:49 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:56 -0400
>
> > +static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
> > +{
> > +     struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
> > +     struct bnxt_fw_health *fw_health = bp->fw_health;
> > +     struct hwrm_error_recovery_qcfg_input req = {0};
> > +     int rc, i;
> > +
> > +     if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
> > +             return 0;
> > +
> > +     bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
> > +     mutex_lock(&bp->hwrm_cmd_lock);
> > +     rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> > +     if (rc) {
> > +             rc = -EOPNOTSUPP;
> > +             goto err_recovery_out;
> > +     }
>
> How is this logically an unsupported operation if you're guarding it's use
> with an appropriate capability check?

The BNXT_FW_CAP_ERROR_RECOVERY flag says that error recovery should be
supported, but if the firmware call to get the recovery parameters
fails, we return -EOPNOTSUPP to let the caller know that error
recovery cannot be supported.  Again, I will try to clean up the error
codes in v2.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-26  6:00   ` Leon Romanovsky
@ 2019-08-30  9:18     ` Leon Romanovsky
  2019-08-30 16:00       ` Michael Chan
  0 siblings, 1 reply; 27+ messages in thread
From: Leon Romanovsky @ 2019-08-30  9:18 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, vasundhara-v.volam, jiri, ray.jui

On Mon, Aug 26, 2019 at 09:00:45AM +0300, Leon Romanovsky wrote:
> On Sun, Aug 25, 2019 at 11:54:54PM -0400, Michael Chan wrote:
> > Refactor the hardware/firmware configuration portion in
> > bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
> > new function can be called after a firmware reset to reconfigure the
> > VFs previously enabled.
>
> I wonder what does it mean for already bound VFs to vfio driver?
> Will you rebind them as well? Can I assume that FW error in one VF
> will trigger "restart" of other VFs too?

Care to reply?


hanks

>
> Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-30  9:18     ` Leon Romanovsky
@ 2019-08-30 16:00       ` Michael Chan
  2019-08-31  7:41         ` Leon Romanovsky
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Chan @ 2019-08-30 16:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David Miller, Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui

On Fri, Aug 30, 2019 at 2:18 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Aug 26, 2019 at 09:00:45AM +0300, Leon Romanovsky wrote:
> > On Sun, Aug 25, 2019 at 11:54:54PM -0400, Michael Chan wrote:
> > > Refactor the hardware/firmware configuration portion in
> > > bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
> > > new function can be called after a firmware reset to reconfigure the
> > > VFs previously enabled.
> >
> > I wonder what does it mean for already bound VFs to vfio driver?
> > Will you rebind them as well? Can I assume that FW error in one VF
> > will trigger "restart" of other VFs too?
>
> Care to reply?
>
>
Sorry, I missed your email earlier.

A firmware reset/recovery has no direct effect on a VF or any function
if it is just idle.  The PCI interface of any function does not get
reset.

If a VF driver (Linux VF driver, DPDK driver, etc) has initialized on
that function, meaning it has exchanged messages with firmware to
register itself and to allocate resources (such as rings), then the
firmware reset will require all those resources to be re-discovered
and re-initialized.  These VF resources are initially assigned by the
PF.  So this refactored function on the PF is to re-assign these
resources back to the VF after the firmware reset.  Again, if the VF
is just bound to vfio and is idle, there is no effect.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-30 16:00       ` Michael Chan
@ 2019-08-31  7:41         ` Leon Romanovsky
  0 siblings, 0 replies; 27+ messages in thread
From: Leon Romanovsky @ 2019-08-31  7:41 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui

On Fri, Aug 30, 2019 at 09:00:27AM -0700, Michael Chan wrote:
> On Fri, Aug 30, 2019 at 2:18 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Aug 26, 2019 at 09:00:45AM +0300, Leon Romanovsky wrote:
> > > On Sun, Aug 25, 2019 at 11:54:54PM -0400, Michael Chan wrote:
> > > > Refactor the hardware/firmware configuration portion in
> > > > bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
> > > > new function can be called after a firmware reset to reconfigure the
> > > > VFs previously enabled.
> > >
> > > I wonder what does it mean for already bound VFs to vfio driver?
> > > Will you rebind them as well? Can I assume that FW error in one VF
> > > will trigger "restart" of other VFs too?
> >
> > Care to reply?
> >
> >
> Sorry, I missed your email earlier.
>
> A firmware reset/recovery has no direct effect on a VF or any function
> if it is just idle.  The PCI interface of any function does not get
> reset.
>
> If a VF driver (Linux VF driver, DPDK driver, etc) has initialized on
> that function, meaning it has exchanged messages with firmware to
> register itself and to allocate resources (such as rings), then the
> firmware reset will require all those resources to be re-discovered
> and re-initialized.  These VF resources are initially assigned by the
> PF.  So this refactored function on the PF is to re-assign these
> resources back to the VF after the firmware reset.  Again, if the VF
> is just bound to vfio and is idle, there is no effect.

Thanks for explaining the flow.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-08-31  7:41 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26  3:54 [PATCH net-next 00/14] bnxt_en: health and error recovery Michael Chan
2019-08-26  3:54 ` [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
2019-08-26  5:15   ` David Miller
2019-08-26  6:17     ` Michael Chan
2019-08-26  3:54 ` [PATCH net-next 02/14] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
2019-08-26  3:54 ` [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
2019-08-26  5:36   ` David Miller
2019-08-26  6:06     ` Michael Chan
2019-08-26  6:00   ` Leon Romanovsky
2019-08-30  9:18     ` Leon Romanovsky
2019-08-30 16:00       ` Michael Chan
2019-08-31  7:41         ` Leon Romanovsky
2019-08-26  3:54 ` [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
2019-08-26  5:47   ` David Miller
2019-08-26  3:54 ` [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities Michael Chan
2019-08-26  5:49   ` David Miller
2019-08-26  6:23     ` Michael Chan
2019-08-26  3:54 ` [PATCH net-next 06/14] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
2019-08-26  3:54 ` [PATCH net-next 07/14] bnxt_en: Enable health monitoring Michael Chan
2019-08-26  3:54 ` [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs Michael Chan
2019-08-26  5:59   ` David Miller
2019-08-26  3:55 ` [PATCH net-next 09/14] bnxt_en: Add new FW devlink_health_reporter Michael Chan
2019-08-26  3:55 ` [PATCH net-next 10/14] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
2019-08-26  3:55 ` [PATCH net-next 11/14] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
2019-08-26  3:55 ` [PATCH net-next 12/14] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
2019-08-26  3:55 ` [PATCH net-next 13/14] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
2019-08-26  3:55 ` [PATCH net-next 14/14] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).