All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 00/22] bnxt_en: health and error recovery.
@ 2019-08-30  3:54 Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 01/22] bnxt_en: Use a common function to print the same ethtool -f error message Michael Chan
                   ` (22 more replies)
  0 siblings, 23 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

This patchset implements adapter health and error recovery.  The status
is reported through several devlink reporters and the driver will
initiate and complete the recovery process using the devlink infrastructure.

v2: Added 4 patches at the beginning of the patchset to clean up error code
    handling related to firmware messages and to convert to use standard
    error codes.

    Removed the dropping of rtnl_lock in bnxt_close().

    Broke up the patches some more for better patch organization and
    future bisection.

Michael Chan (17):
  bnxt_en: Use a common function to print the same ethtool -f error
    message.
  bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg().
  bnxt_en: Convert error code in firmware message response to standard
    code.
  bnxt_en: Simplify error checking in the SR-IOV message forwarding
    functions.
  bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent
    mode.
  bnxt_en: Prepare bnxt_init_one() to be called multiple times.
  bnxt_en: Refactor bnxt_sriov_enable().
  bnxt_en: Handle firmware reset status during IF_UP.
  bnxt_en: Discover firmware error recovery capabilities.
  bnxt_en: Pre-map the firmware health monitoring registers.
  bnxt_en: Enable health monitoring.
  bnxt_en: Add BNXT_STATE_IN_FW_RESET state.
  bnxt_en: Handle RESET_NOTIFY async event from firmware.
  bnxt_en: Handle firmware reset.
  bnxt_en: Do not send firmware messages if firmware is in error state.
  bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().
  bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors.

Vasundhara Volam (5):
  bnxt_en: Register buffers for VFs before reserving resources.
  bnxt_en: Add new FW devlink_health_reporter
  bnxt_en: Add devlink health reset reporter.
  bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.
  bnxt_en: Add FW fatal devlink_health_reporter.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 884 +++++++++++++++++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  91 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 197 ++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  39 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   | 181 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h   |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c      |  17 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c     |   3 +
 10 files changed, 1172 insertions(+), 248 deletions(-)

-- 
2.5.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 01/22] bnxt_en: Use a common function to print the same ethtool -f error message.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 02/22] bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg() Michael Chan
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

The same message is printed 3 times in the code, so use a common function
to do that.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index b624174..72bb730 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1699,6 +1699,11 @@ static u32 bnxt_get_link(struct net_device *dev)
 	return bp->link_info.link_up;
 }
 
+static void bnxt_print_admin_err(struct bnxt *bp)
+{
+	netdev_info(bp->dev, "PF does not have admin privileges to flash or reset the device\n");
+}
+
 static int bnxt_find_nvram_item(struct net_device *dev, u16 type, u16 ordinal,
 				u16 ext, u16 *index, u32 *item_length,
 				u32 *data_length);
@@ -1739,8 +1744,7 @@ static int bnxt_flash_nvram(struct net_device *dev,
 	dma_free_coherent(&bp->pdev->dev, data_len, kmem, dma_handle);
 
 	if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
-		netdev_info(dev,
-			    "PF does not have admin privileges to flash the device\n");
+		bnxt_print_admin_err(bp);
 		rc = -EACCES;
 	} else if (rc) {
 		rc = -EIO;
@@ -1795,8 +1799,7 @@ static int bnxt_firmware_reset(struct net_device *dev,
 
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
-		netdev_info(dev,
-			    "PF does not have admin privileges to reset the device\n");
+		bnxt_print_admin_err(bp);
 		rc = -EACCES;
 	} else if (rc) {
 		rc = -EIO;
@@ -2096,8 +2099,7 @@ static int bnxt_flash_package_from_file(struct net_device *dev,
 	mutex_unlock(&bp->hwrm_cmd_lock);
 err_exit:
 	if (hwrm_err == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
-		netdev_info(dev,
-			    "PF does not have admin privileges to flash the device\n");
+		bnxt_print_admin_err(bp);
 		rc = -EACCES;
 	} else if (hwrm_err) {
 		rc = -EOPNOTSUPP;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 02/22] bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg().
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 01/22] bnxt_en: Use a common function to print the same ethtool -f error message Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 03/22] bnxt_en: Convert error code in firmware message response to standard code Michael Chan
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Replace the non-standard -1 code with -EBUSY when there is no firmware
response after waiting for the maximum timeout.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b9ad43d..c8550ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4162,7 +4162,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		if (bp->hwrm_intr_seq_id != (u16)~seq_id) {
 			netdev_err(bp->dev, "Resp cmpl intr err msg: 0x%x\n",
 				   le16_to_cpu(req->req_type));
-			return -1;
+			return -EBUSY;
 		}
 		len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
 		      HWRM_RESP_LEN_SFT;
@@ -4190,7 +4190,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len);
-			return -1;
+			return -EBUSY;
 		}
 
 		/* Last byte of resp contains valid bit */
@@ -4208,7 +4208,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len, *valid);
-			return -1;
+			return -EBUSY;
 		}
 	}
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 03/22] bnxt_en: Convert error code in firmware message response to standard code.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 01/22] bnxt_en: Use a common function to print the same ethtool -f error message Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 02/22] bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg() Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 04/22] bnxt_en: Simplify error checking in the SR-IOV message forwarding functions Michael Chan
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

The main firmware messaging function returns the firmware defined error
code and many callers have to convert to standard error code for proper
propagation to userspace.  Convert bnxt_hwrm_do_send_msg() to return
standard error code so we can do away with all the special error code
handling by the many callers.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 60 ++++++++++++++---------
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     |  4 --
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c |  8 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 20 ++------
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   | 12 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c      | 17 -------
 6 files changed, 44 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c8550ca..b9eb24e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4048,6 +4048,32 @@ void bnxt_hwrm_cmd_hdr_init(struct bnxt *bp, void *request, u16 req_type,
 		req->resp_addr = cpu_to_le64(bp->hwrm_cmd_resp_dma_addr);
 }
 
+static int bnxt_hwrm_to_stderr(u32 hwrm_err)
+{
+	switch (hwrm_err) {
+	case HWRM_ERR_CODE_SUCCESS:
+		return 0;
+	case HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED:
+		return -EACCES;
+	case HWRM_ERR_CODE_RESOURCE_ALLOC_ERROR:
+		return -ENOSPC;
+	case HWRM_ERR_CODE_INVALID_PARAMS:
+	case HWRM_ERR_CODE_INVALID_FLAGS:
+	case HWRM_ERR_CODE_INVALID_ENABLES:
+	case HWRM_ERR_CODE_UNSUPPORTED_TLV:
+	case HWRM_ERR_CODE_UNSUPPORTED_OPTION_ERR:
+		return -EINVAL;
+	case HWRM_ERR_CODE_NO_BUFFER:
+		return -ENOMEM;
+	case HWRM_ERR_CODE_HOT_RESET_PROGRESS:
+		return -EAGAIN;
+	case HWRM_ERR_CODE_CMD_NOT_SUPPORTED:
+		return -EOPNOTSUPP;
+	default:
+		return -EIO;
+	}
+}
+
 static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 				 int timeout, bool silent)
 {
@@ -4222,7 +4248,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		netdev_err(bp->dev, "hwrm req_type 0x%x seq id 0x%x error 0x%x\n",
 			   le16_to_cpu(resp->req_type),
 			   le16_to_cpu(resp->seq_id), rc);
-	return rc;
+	return bnxt_hwrm_to_stderr(rc);
 }
 
 int _hwrm_send_message(struct bnxt *bp, void *msg, u32 msg_len, int timeout)
@@ -4335,10 +4361,8 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
-	else if (resp->flags &
-		 cpu_to_le32(FUNC_DRV_RGTR_RESP_FLAGS_IF_CHANGE_SUPPORTED))
+	if (!rc && (resp->flags &
+		    cpu_to_le32(FUNC_DRV_RGTR_RESP_FLAGS_IF_CHANGE_SUPPORTED)))
 		bp->fw_cap |= BNXT_FW_CAP_IF_CHANGE;
 	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
@@ -4761,7 +4785,7 @@ static int bnxt_hwrm_vnic_set_rss_p5(struct bnxt *bp, u16 vnic_id, bool set_rss)
 		}
 		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 		if (rc)
-			return -EIO;
+			return rc;
 	}
 	return 0;
 }
@@ -5521,7 +5545,7 @@ static int bnxt_hwrm_get_rings(struct bnxt *bp)
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (rc) {
 		mutex_unlock(&bp->hwrm_cmd_lock);
-		return -EIO;
+		return rc;
 	}
 
 	hw_resc->resv_tx_rings = le16_to_cpu(resp->alloc_tx_rings);
@@ -5685,7 +5709,7 @@ bnxt_hwrm_reserve_pf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
 
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (rc)
-		return -ENOMEM;
+		return rc;
 
 	if (bp->hwrm_spec_code < 0x10601)
 		bp->hw_resc.resv_tx_rings = tx_rings;
@@ -5710,7 +5734,7 @@ bnxt_hwrm_reserve_vf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
 				     cp_rings, stats, vnics);
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (rc)
-		return -ENOMEM;
+		return rc;
 
 	rc = bnxt_hwrm_get_rings(bp);
 	return rc;
@@ -5891,9 +5915,7 @@ static int bnxt_hwrm_check_vf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
 
 	req.flags = cpu_to_le32(flags);
 	rc = hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		return -ENOMEM;
-	return 0;
+	return rc;
 }
 
 static int bnxt_hwrm_check_pf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
@@ -5921,9 +5943,7 @@ static int bnxt_hwrm_check_pf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
 
 	req.flags = cpu_to_le32(flags);
 	rc = hwrm_send_message_silent(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		return -ENOMEM;
-	return 0;
+	return rc;
 }
 
 static int bnxt_hwrm_check_rings(struct bnxt *bp, int tx_rings, int rx_rings,
@@ -6483,8 +6503,6 @@ static int bnxt_hwrm_func_backing_store_cfg(struct bnxt *bp, u32 enables)
 	}
 	req.flags = cpu_to_le32(flags);
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -6746,10 +6764,8 @@ int bnxt_hwrm_func_resc_qcaps(struct bnxt *bp, bool all)
 	mutex_lock(&bp->hwrm_cmd_lock);
 	rc = _hwrm_send_message_silent(bp, &req, sizeof(req),
 				       HWRM_CMD_TIMEOUT);
-	if (rc) {
-		rc = -EIO;
+	if (rc)
 		goto hwrm_func_resc_qcaps_exit;
-	}
 
 	hw_resc->max_tx_sch_inputs = le16_to_cpu(resp->max_tx_scheduler_inputs);
 	if (!all)
@@ -7257,8 +7273,6 @@ static int bnxt_hwrm_set_br_mode(struct bnxt *bp, u16 br_mode)
 	else
 		return -EINVAL;
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -7278,8 +7292,6 @@ static int bnxt_hwrm_set_cache_line_size(struct bnxt *bp, int size)
 		req.options = FUNC_CFG_REQ_OPTIONS_CACHE_LINESIZE_SIZE_128;
 
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index 07301cb..a2ffc3e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -377,8 +377,6 @@ static int bnxt_hwrm_set_dcbx_app(struct bnxt *bp, struct dcb_app *app,
 	set.data_len = cpu_to_le16(sizeof(*data) + sizeof(*fw_app) * n);
 	set.hdr_cnt = 1;
 	rc = hwrm_send_message(bp, &set, sizeof(set), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 
 set_app_exit:
 	dma_free_coherent(&bp->pdev->dev, data_len, data, mapping);
@@ -433,8 +431,6 @@ static int bnxt_hwrm_queue_dscp2pri_cfg(struct bnxt *bp, struct dcb_app *app,
 	dscp2pri->pri = app->priority;
 	req.entry_cnt = cpu_to_le16(1);
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 	dma_free_coherent(&bp->pdev->dev, sizeof(*dscp2pri), dscp2pri,
 			  mapping);
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index c05d663..7d9a8c4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -109,13 +109,9 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
 		memcpy(buf, data_addr, bytesize);
 
 	dma_free_coherent(&bp->pdev->dev, bytesize, data_addr, data_dma_addr);
-	if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
+	if (rc == -EACCES)
 		netdev_err(bp->dev, "PF does not have admin privileges to modify NVM config\n");
-		return -EACCES;
-	} else if (rc) {
-		return -EIO;
-	}
-	return 0;
+	return rc;
 }
 
 static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 72bb730..a3a8722 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1743,12 +1743,8 @@ static int bnxt_flash_nvram(struct net_device *dev,
 	rc = hwrm_send_message(bp, &req, sizeof(req), FLASH_NVRAM_TIMEOUT);
 	dma_free_coherent(&bp->pdev->dev, data_len, kmem, dma_handle);
 
-	if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
+	if (rc == -EACCES)
 		bnxt_print_admin_err(bp);
-		rc = -EACCES;
-	} else if (rc) {
-		rc = -EIO;
-	}
 	return rc;
 }
 
@@ -1798,12 +1794,8 @@ static int bnxt_firmware_reset(struct net_device *dev,
 	}
 
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
+	if (rc == -EACCES)
 		bnxt_print_admin_err(bp);
-		rc = -EACCES;
-	} else if (rc) {
-		rc = -EIO;
-	}
 	return rc;
 }
 
@@ -2098,12 +2090,8 @@ static int bnxt_flash_package_from_file(struct net_device *dev,
 flash_pkg_exit:
 	mutex_unlock(&bp->hwrm_cmd_lock);
 err_exit:
-	if (hwrm_err == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
+	if (hwrm_err == -EACCES)
 		bnxt_print_admin_err(bp);
-		rc = -EACCES;
-	} else if (hwrm_err) {
-		rc = -EOPNOTSUPP;
-	}
 	return rc;
 }
 
@@ -2642,8 +2630,6 @@ static int bnxt_set_phys_id(struct net_device *dev,
 		led_cfg->led_group_id = bp->leds[i].led_group_id;
 	}
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 2b90a2b..9972862 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -133,7 +133,7 @@ static int bnxt_hwrm_func_qcfg_flags(struct bnxt *bp, struct bnxt_vf_info *vf)
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 	if (rc) {
 		mutex_unlock(&bp->hwrm_cmd_lock);
-		return -EIO;
+		return rc;
 	}
 	vf->func_qcfg_flags = le16_to_cpu(resp->flags);
 	mutex_unlock(&bp->hwrm_cmd_lock);
@@ -164,9 +164,7 @@ static int bnxt_hwrm_set_trusted_vf(struct bnxt *bp, struct bnxt_vf_info *vf)
 	else
 		req.flags = cpu_to_le32(FUNC_CFG_REQ_FLAGS_TRUSTED_VF_DISABLE);
 	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (rc)
-		return -EIO;
-	return 0;
+	return rc;
 }
 
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trusted)
@@ -564,10 +562,8 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
-		if (rc) {
-			rc = -ENOMEM;
+		if (rc)
 			break;
-		}
 		pf->active_vfs = i + 1;
 		pf->vf[i].fw_fid = pf->first_vf_id + i;
 	}
@@ -664,8 +660,6 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 		total_vf_tx_rings += vf_tx_rsvd;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
-	if (rc)
-		rc = -ENOMEM;
 	if (pf->active_vfs) {
 		hw_resc->max_tx_rings -= total_vf_tx_rings;
 		hw_resc->max_rx_rings -= vf_rx_rings * num_vfs;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index dd621f6..c8062d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -319,8 +319,6 @@ static int bnxt_hwrm_cfa_flow_free(struct bnxt *bp,
 	if (rc)
 		netdev_info(bp->dev, "%s: Error rc=%d", __func__, rc);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -515,11 +513,6 @@ static int bnxt_hwrm_cfa_flow_alloc(struct bnxt *bp, struct bnxt_tc_flow *flow,
 		}
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
-
-	if (rc == HWRM_ERR_CODE_RESOURCE_ALLOC_ERROR)
-		rc = -ENOSPC;
-	else if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -591,8 +584,6 @@ static int hwrm_cfa_decap_filter_alloc(struct bnxt *bp,
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -609,8 +600,6 @@ static int hwrm_cfa_decap_filter_free(struct bnxt *bp,
 	if (rc)
 		netdev_info(bp->dev, "%s: Error rc=%d", __func__, rc);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -660,8 +649,6 @@ static int hwrm_cfa_encap_record_alloc(struct bnxt *bp,
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -678,8 +665,6 @@ static int hwrm_cfa_encap_record_free(struct bnxt *bp,
 	if (rc)
 		netdev_info(bp->dev, "%s: Error rc=%d", __func__, rc);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
@@ -1457,8 +1442,6 @@ bnxt_hwrm_cfa_flow_stats_get(struct bnxt *bp, int num_flows,
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
 
-	if (rc)
-		rc = -EIO;
 	return rc;
 }
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 04/22] bnxt_en: Simplify error checking in the SR-IOV message forwarding functions.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (2 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 03/22] bnxt_en: Convert error code in firmware message response to standard code Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 05/22] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

There are 4 functions handling message forwarding for SR-IOV.  They
check for non-zero firmware response code and then return -1.  There
is no need to do this anymore.  The main messaging function will
now return standard error code.  Since we don't need to examine the
response, we can use the hwrm_send_message() variant which will
take the mutex automatically.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 72 +++----------------------
 1 file changed, 8 insertions(+), 64 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 9972862..b6a84d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -25,7 +25,6 @@
 static int bnxt_hwrm_fwd_async_event_cmpl(struct bnxt *bp,
 					  struct bnxt_vf_info *vf, u16 event_id)
 {
-	struct hwrm_fwd_async_event_cmpl_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_fwd_async_event_cmpl_input req = {0};
 	struct hwrm_async_event_cmpl *async_cmpl;
 	int rc = 0;
@@ -40,23 +39,10 @@ static int bnxt_hwrm_fwd_async_event_cmpl(struct bnxt *bp,
 	async_cmpl->type = cpu_to_le16(ASYNC_EVENT_CMPL_TYPE_HWRM_ASYNC_EVENT);
 	async_cmpl->event_id = cpu_to_le16(event_id);
 
-	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-
-	if (rc) {
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
 		netdev_err(bp->dev, "hwrm_fwd_async_event_cmpl failed. rc:%d\n",
 			   rc);
-		goto fwd_async_event_cmpl_exit;
-	}
-
-	if (resp->error_code) {
-		netdev_err(bp->dev, "hwrm_fwd_async_event_cmpl error %d\n",
-			   resp->error_code);
-		rc = -1;
-	}
-
-fwd_async_event_cmpl_exit:
-	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
 }
 
@@ -864,7 +850,6 @@ static int bnxt_hwrm_fwd_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 {
 	int rc = 0;
 	struct hwrm_fwd_resp_input req = {0};
-	struct hwrm_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
 
 	if (BNXT_FWD_RESP_SIZE_ERR(msg_size))
 		return -EINVAL;
@@ -879,22 +864,9 @@ static int bnxt_hwrm_fwd_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 	req.encap_resp_cmpl_ring = encap_resp_cpr;
 	memcpy(req.encap_resp, encap_resp, msg_size);
 
-	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-
-	if (rc) {
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
 		netdev_err(bp->dev, "hwrm_fwd_resp failed. rc:%d\n", rc);
-		goto fwd_resp_exit;
-	}
-
-	if (resp->error_code) {
-		netdev_err(bp->dev, "hwrm_fwd_resp error %d\n",
-			   resp->error_code);
-		rc = -1;
-	}
-
-fwd_resp_exit:
-	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
 }
 
@@ -903,7 +875,6 @@ static int bnxt_hwrm_fwd_err_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 {
 	int rc = 0;
 	struct hwrm_reject_fwd_resp_input req = {0};
-	struct hwrm_reject_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
 
 	if (BNXT_REJ_FWD_RESP_SIZE_ERR(msg_size))
 		return -EINVAL;
@@ -914,22 +885,9 @@ static int bnxt_hwrm_fwd_err_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 	req.encap_resp_target_id = cpu_to_le16(vf->fw_fid);
 	memcpy(req.encap_request, vf->hwrm_cmd_req_addr, msg_size);
 
-	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-
-	if (rc) {
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
 		netdev_err(bp->dev, "hwrm_fwd_err_resp failed. rc:%d\n", rc);
-		goto fwd_err_resp_exit;
-	}
-
-	if (resp->error_code) {
-		netdev_err(bp->dev, "hwrm_fwd_err_resp error %d\n",
-			   resp->error_code);
-		rc = -1;
-	}
-
-fwd_err_resp_exit:
-	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
 }
 
@@ -938,7 +896,6 @@ static int bnxt_hwrm_exec_fwd_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 {
 	int rc = 0;
 	struct hwrm_exec_fwd_resp_input req = {0};
-	struct hwrm_exec_fwd_resp_output *resp = bp->hwrm_cmd_resp_addr;
 
 	if (BNXT_EXEC_FWD_RESP_SIZE_ERR(msg_size))
 		return -EINVAL;
@@ -949,22 +906,9 @@ static int bnxt_hwrm_exec_fwd_resp(struct bnxt *bp, struct bnxt_vf_info *vf,
 	req.encap_resp_target_id = cpu_to_le16(vf->fw_fid);
 	memcpy(req.encap_request, vf->hwrm_cmd_req_addr, msg_size);
 
-	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-
-	if (rc) {
+	rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
 		netdev_err(bp->dev, "hwrm_exec_fw_resp failed. rc:%d\n", rc);
-		goto exec_fwd_resp_exit;
-	}
-
-	if (resp->error_code) {
-		netdev_err(bp->dev, "hwrm_exec_fw_resp error %d\n",
-			   resp->error_code);
-		rc = -1;
-	}
-
-exec_fwd_resp_exit:
-	mutex_unlock(&bp->hwrm_cmd_lock);
 	return rc;
 }
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 05/22] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (3 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 04/22] bnxt_en: Simplify error checking in the SR-IOV message forwarding functions Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 06/22] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

If the silent parameter is set, suppress all messages when there is
no response from firmware.  When polling for firmware to come out of
reset, no response may be normal and we want to suppress the error
messages.  Also, don't poll for the firmware DMA response if Bus Master
is disabled.  This is in preparation for error recovery when firmware
may be in error or reset state or Bus Master is disabled.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b9eb24e..3e48bcb 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4155,6 +4155,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 	/* Ring channel doorbell */
 	writel(1, bp->bar0 + doorbell_offset);
 
+	if (!pci_is_enabled(bp->pdev))
+		return 0;
+
 	if (!timeout)
 		timeout = DFLT_HWRM_CMD_TIMEOUT;
 	/* convert timeout to usec */
@@ -4186,8 +4189,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (bp->hwrm_intr_seq_id != (u16)~seq_id) {
-			netdev_err(bp->dev, "Resp cmpl intr err msg: 0x%x\n",
-				   le16_to_cpu(req->req_type));
+			if (!silent)
+				netdev_err(bp->dev, "Resp cmpl intr err msg: 0x%x\n",
+					   le16_to_cpu(req->req_type));
 			return -EBUSY;
 		}
 		len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
@@ -4212,10 +4216,11 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (i >= tmo_count) {
-			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d\n",
-				   HWRM_TOTAL_TIMEOUT(i),
-				   le16_to_cpu(req->req_type),
-				   le16_to_cpu(req->seq_id), len);
+			if (!silent)
+				netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d\n",
+					   HWRM_TOTAL_TIMEOUT(i),
+					   le16_to_cpu(req->req_type),
+					   le16_to_cpu(req->seq_id), len);
 			return -EBUSY;
 		}
 
@@ -4230,10 +4235,12 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		}
 
 		if (j >= HWRM_VALID_BIT_DELAY_USEC) {
-			netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d v:%d\n",
-				   HWRM_TOTAL_TIMEOUT(i),
-				   le16_to_cpu(req->req_type),
-				   le16_to_cpu(req->seq_id), len, *valid);
+			if (!silent)
+				netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 0x%x} len:%d v:%d\n",
+					   HWRM_TOTAL_TIMEOUT(i),
+					   le16_to_cpu(req->req_type),
+					   le16_to_cpu(req->seq_id), len,
+					   *valid);
 			return -EBUSY;
 		}
 	}
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 06/22] bnxt_en: Prepare bnxt_init_one() to be called multiple times.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (4 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 05/22] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 07/22] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

In preparation for the new firmware reset feature, some of the logic
in bnxt_init_one() and related functions will be called again after
firmware has reset.  Reset some of the flags and capabilities so that
everything that can change can be re-initialized.  Refactor some
functions to probe firmware versions and capabilities.  Check some
buffers before allocating as they may have been allocated previously.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 126 +++++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   5 +-
 3 files changed, 93 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 3e48bcb..696a548 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3555,6 +3555,9 @@ static int bnxt_alloc_kong_hwrm_resources(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwrm_cmd_kong_resp_addr)
+		return 0;
+
 	bp->hwrm_cmd_kong_resp_addr =
 		dma_alloc_coherent(&pdev->dev, PAGE_SIZE,
 				   &bp->hwrm_cmd_kong_resp_dma_addr,
@@ -3594,6 +3597,9 @@ static int bnxt_alloc_hwrm_short_cmd_req(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwrm_short_cmd_req_addr)
+		return 0;
+
 	bp->hwrm_short_cmd_req_addr =
 		dma_alloc_coherent(&pdev->dev, bp->hwrm_max_ext_req_len,
 				   &bp->hwrm_short_cmd_req_dma_addr,
@@ -5017,6 +5023,7 @@ static int bnxt_hwrm_vnic_qcaps(struct bnxt *bp)
 	int rc;
 
 	bp->hw_ring_stats_size = sizeof(struct ctx_hw_stats);
+	bp->flags &= ~(BNXT_FLAG_NEW_RSS_CAP | BNXT_FLAG_ROCE_MIRROR_CAP);
 	if (bp->hwrm_spec_code < 0x10600)
 		return 0;
 
@@ -6871,6 +6878,7 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
 		pf->max_tx_wm_flows = le32_to_cpu(resp->max_tx_wm_flows);
 		pf->max_rx_em_flows = le32_to_cpu(resp->max_rx_em_flows);
 		pf->max_rx_wm_flows = le32_to_cpu(resp->max_rx_wm_flows);
+		bp->flags &= ~BNXT_FLAG_WOL_CAP;
 		if (flags & FUNC_QCAPS_RESP_FLAGS_WOL_MAGICPKT_SUPPORTED)
 			bp->flags |= BNXT_FLAG_WOL_CAP;
 	} else {
@@ -6999,20 +7007,30 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 	return rc;
 }
 
-static int bnxt_hwrm_ver_get(struct bnxt *bp)
+static int __bnxt_hwrm_ver_get(struct bnxt *bp, bool silent)
 {
-	int rc;
 	struct hwrm_ver_get_input req = {0};
-	struct hwrm_ver_get_output *resp = bp->hwrm_cmd_resp_addr;
-	u32 dev_caps_cfg;
+	int rc;
 
-	bp->hwrm_max_req_len = HWRM_MAX_REQ_LEN;
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_VER_GET, -1, -1);
 	req.hwrm_intf_maj = HWRM_VERSION_MAJOR;
 	req.hwrm_intf_min = HWRM_VERSION_MINOR;
 	req.hwrm_intf_upd = HWRM_VERSION_UPDATE;
+
+	rc = bnxt_hwrm_do_send_msg(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT,
+				   silent);
+	return rc;
+}
+
+static int bnxt_hwrm_ver_get(struct bnxt *bp)
+{
+	struct hwrm_ver_get_output *resp = bp->hwrm_cmd_resp_addr;
+	u32 dev_caps_cfg;
+	int rc;
+
+	bp->hwrm_max_req_len = HWRM_MAX_REQ_LEN;
 	mutex_lock(&bp->hwrm_cmd_lock);
-	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	rc = __bnxt_hwrm_ver_get(bp, false);
 	if (rc)
 		goto hwrm_ver_get_exit;
 
@@ -8181,6 +8199,9 @@ static int bnxt_hwrm_phy_qcaps(struct bnxt *bp)
 	struct hwrm_port_phy_qcaps_output *resp = bp->hwrm_cmd_resp_addr;
 	struct bnxt_link_info *link_info = &bp->link_info;
 
+	bp->flags &= ~BNXT_FLAG_EEE_CAP;
+	if (bp->test_info)
+		bp->test_info->flags &= ~BNXT_TEST_FL_EXT_LPBK;
 	if (bp->hwrm_spec_code < 0x10201)
 		return 0;
 
@@ -8546,6 +8567,7 @@ static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
 	struct bnxt_pf_info *pf = &bp->pf;
 	int rc;
 
+	bp->num_leds = 0;
 	if (BNXT_VF(bp) || bp->hwrm_spec_code < 0x10601)
 		return 0;
 
@@ -8640,6 +8662,7 @@ static void bnxt_get_wol_settings(struct bnxt *bp)
 {
 	u16 handle = 0;
 
+	bp->wol = 0;
 	if (!BNXT_PF(bp) || !(bp->flags & BNXT_FLAG_WOL_CAP))
 		return;
 
@@ -8686,6 +8709,9 @@ static void bnxt_hwmon_open(struct bnxt *bp)
 {
 	struct pci_dev *pdev = bp->pdev;
 
+	if (bp->hwmon_dev)
+		return;
+
 	bp->hwmon_dev = hwmon_device_register_with_groups(&pdev->dev,
 							  DRV_MODULE_NAME, bp,
 							  bnxt_groups);
@@ -10002,6 +10028,53 @@ static int bnxt_fw_init_one_p2(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_set_dflt_rss_hash_type(struct bnxt *bp)
+{
+	bp->flags &= ~BNXT_FLAG_UDP_RSS_CAP;
+	bp->rss_hash_cfg = VNIC_RSS_CFG_REQ_HASH_TYPE_IPV4 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 |
+			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV6;
+	if (BNXT_CHIP_P4(bp) && bp->hwrm_spec_code >= 0x10501) {
+		bp->flags |= BNXT_FLAG_UDP_RSS_CAP;
+		bp->rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV4 |
+				    VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV6;
+	}
+}
+
+static void bnxt_set_dflt_rfs(struct bnxt *bp)
+{
+	struct net_device *dev = bp->dev;
+
+	dev->hw_features &= ~NETIF_F_NTUPLE;
+	dev->features &= ~NETIF_F_NTUPLE;
+	bp->flags &= ~BNXT_FLAG_RFS;
+	if (bnxt_rfs_supported(bp)) {
+		dev->hw_features |= NETIF_F_NTUPLE;
+		if (bnxt_rfs_capable(bp)) {
+			bp->flags |= BNXT_FLAG_RFS;
+			dev->features |= NETIF_F_NTUPLE;
+		}
+	}
+}
+
+static void bnxt_fw_init_one_p3(struct bnxt *bp)
+{
+	struct pci_dev *pdev = bp->pdev;
+
+	bnxt_set_dflt_rss_hash_type(bp);
+	bnxt_set_dflt_rfs(bp);
+
+	bnxt_get_wol_settings(bp);
+	if (bp->flags & BNXT_FLAG_WOL_CAP)
+		device_set_wakeup_enable(&pdev->dev, bp->wol);
+	else
+		device_set_wakeup_capable(&pdev->dev, false);
+
+	bnxt_hwrm_set_cache_line_size(bp, cache_line_size());
+	bnxt_hwrm_coal_params_qcaps(bp);
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
@@ -10607,7 +10680,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 	free_netdev(dev);
 }
 
-static int bnxt_probe_phy(struct bnxt *bp)
+static int bnxt_probe_phy(struct bnxt *bp, bool fw_dflt)
 {
 	int rc = 0;
 	struct bnxt_link_info *link_info = &bp->link_info;
@@ -10618,8 +10691,6 @@ static int bnxt_probe_phy(struct bnxt *bp)
 			   rc);
 		return rc;
 	}
-	mutex_init(&bp->link_lock);
-
 	rc = bnxt_update_link(bp, false);
 	if (rc) {
 		netdev_err(bp->dev, "Probe phy can't update link (rc: %x)\n",
@@ -10633,6 +10704,9 @@ static int bnxt_probe_phy(struct bnxt *bp)
 	if (link_info->auto_link_speeds && !link_info->support_auto_speeds)
 		link_info->support_auto_speeds = link_info->support_speeds;
 
+	if (!fw_dflt)
+		return 0;
+
 	/*initialize the ethool setting copy with NVM settings */
 	if (BNXT_AUTO_MODE(link_info->auto_mode)) {
 		link_info->autoneg = BNXT_AUTONEG_SPEED;
@@ -10653,7 +10727,7 @@ static int bnxt_probe_phy(struct bnxt *bp)
 			link_info->auto_pause_setting & BNXT_LINK_PAUSE_BOTH;
 	else
 		link_info->req_flow_ctrl = link_info->force_pause_setting;
-	return rc;
+	return 0;
 }
 
 static int bnxt_get_max_irq(struct pci_dev *pdev)
@@ -10957,6 +11031,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto init_err_pci_clean;
 
 	mutex_init(&bp->hwrm_cmd_lock);
+	mutex_init(&bp->link_lock);
 
 	rc = bnxt_fw_init_one_p1(bp);
 	if (rc)
@@ -11032,7 +11107,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->min_mtu = ETH_ZLEN;
 	dev->max_mtu = bp->max_mtu;
 
-	rc = bnxt_probe_phy(bp);
+	rc = bnxt_probe_phy(bp, true);
 	if (rc)
 		goto init_err_pci_clean;
 
@@ -11046,24 +11121,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto init_err_pci_clean;
 	}
 
-	/* Default RSS hash cfg. */
-	bp->rss_hash_cfg = VNIC_RSS_CFG_REQ_HASH_TYPE_IPV4 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 |
-			   VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV6;
-	if (BNXT_CHIP_P4(bp) && bp->hwrm_spec_code >= 0x10501) {
-		bp->flags |= BNXT_FLAG_UDP_RSS_CAP;
-		bp->rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV4 |
-				    VNIC_RSS_CFG_REQ_HASH_TYPE_UDP_IPV6;
-	}
-
-	if (bnxt_rfs_supported(bp)) {
-		dev->hw_features |= NETIF_F_NTUPLE;
-		if (bnxt_rfs_capable(bp)) {
-			bp->flags |= BNXT_FLAG_RFS;
-			dev->features |= NETIF_F_NTUPLE;
-		}
-	}
+	bnxt_fw_init_one_p3(bp);
 
 	if (dev->hw_features & NETIF_F_HW_VLAN_CTAG_RX)
 		bp->flags |= BNXT_FLAG_STRIP_VLAN;
@@ -11077,16 +11135,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 */
 	bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
 
-	bnxt_get_wol_settings(bp);
-	if (bp->flags & BNXT_FLAG_WOL_CAP)
-		device_set_wakeup_enable(&pdev->dev, bp->wol);
-	else
-		device_set_wakeup_capable(&pdev->dev, false);
-
-	bnxt_hwrm_set_cache_line_size(bp, cache_line_size());
-
-	bnxt_hwrm_coal_params_qcaps(bp);
-
 	if (BNXT_PF(bp)) {
 		if (!bnxt_pf_wq) {
 			bnxt_pf_wq =
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index a2ffc3e..fb6f30d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -389,6 +389,7 @@ static int bnxt_hwrm_queue_dscp_qcaps(struct bnxt *bp)
 	struct hwrm_queue_dscp_qcaps_input req = {0};
 	int rc;
 
+	bp->max_dscp_value = 0;
 	if (bp->hwrm_spec_code < 0x10800 || BNXT_VF(bp))
 		return 0;
 
@@ -718,6 +719,7 @@ static const struct dcbnl_rtnl_ops dcbnl_ops = {
 
 void bnxt_dcb_init(struct bnxt *bp)
 {
+	bp->dcbx_cap = 0;
 	if (bp->hwrm_spec_code < 0x10501)
 		return;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index a3a8722..235265e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -3350,6 +3350,7 @@ void bnxt_ethtool_init(struct bnxt *bp)
 	if (!(bp->fw_cap & BNXT_FW_CAP_PKG_VER))
 		bnxt_get_pkgver(dev);
 
+	bp->num_tests = 0;
 	if (bp->hwrm_spec_code < 0x10704 || !BNXT_SINGLE_PF(bp))
 		return;
 
@@ -3359,7 +3360,9 @@ void bnxt_ethtool_init(struct bnxt *bp)
 	if (rc)
 		goto ethtool_init_exit;
 
-	test_info = kzalloc(sizeof(*bp->test_info), GFP_KERNEL);
+	test_info = bp->test_info;
+	if (!test_info)
+		test_info = kzalloc(sizeof(*bp->test_info), GFP_KERNEL);
 	if (!test_info)
 		goto ethtool_init_exit;
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 07/22] bnxt_en: Refactor bnxt_sriov_enable().
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (5 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 06/22] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 08/22] bnxt_en: Register buffers for VFs before reserving resources Michael Chan
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Refactor the hardware/firmware configuration portion in
bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
new function can be called after a firmware reset to reconfigure the
VFs previously enabled.

v2: straight refactor of the code.  Reordering done in the next patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 50 +++++++++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  1 +
 2 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index b6a84d0..7506d20 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -667,6 +667,32 @@ static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
 		return bnxt_hwrm_func_cfg(bp, num_vfs);
 }
 
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+{
+	int rc;
+
+	/* Reserve resources for VFs */
+	rc = bnxt_func_cfg(bp, *num_vfs);
+	if (rc != *num_vfs) {
+		if (rc <= 0) {
+			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
+			*num_vfs = 0;
+			return rc;
+		}
+		netdev_warn(bp->dev, "Only able to reserve resources for %d VFs.\n",
+			    rc);
+		*num_vfs = rc;
+	}
+
+	/* Register buffers for VFs */
+	rc = bnxt_hwrm_func_buf_rgtr(bp);
+	if (rc)
+		return rc;
+
+	bnxt_ulp_sriov_cfg(bp, *num_vfs);
+	return 0;
+}
+
 static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 {
 	int rc = 0, vfs_supported;
@@ -732,25 +758,10 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	if (rc)
 		goto err_out1;
 
-	/* Reserve resources for VFs */
-	rc = bnxt_func_cfg(bp, *num_vfs);
-	if (rc != *num_vfs) {
-		if (rc <= 0) {
-			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
-			*num_vfs = 0;
-			goto err_out2;
-		}
-		netdev_warn(bp->dev, "Only able to reserve resources for %d VFs.\n", rc);
-		*num_vfs = rc;
-	}
-
-	/* Register buffers for VFs */
-	rc = bnxt_hwrm_func_buf_rgtr(bp);
+	rc = bnxt_cfg_hw_sriov(bp, num_vfs);
 	if (rc)
 		goto err_out2;
 
-	bnxt_ulp_sriov_cfg(bp, *num_vfs);
-
 	rc = pci_enable_sriov(bp->pdev, *num_vfs);
 	if (rc)
 		goto err_out2;
@@ -1128,6 +1139,13 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 }
 #else
 
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+{
+	if (*num_vfs)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
 void bnxt_sriov_disable(struct bnxt *bp)
 {
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 2eed9ed..0abf18e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -36,6 +36,7 @@ int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 08/22] bnxt_en: Register buffers for VFs before reserving resources.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (6 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 07/22] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 09/22] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

When VFs need to be reconfigured dynamically after firmwware reset, the
configuration sequence on the PF needs to be changed to register the VF
buffers first.  Otherwise, some VF firmware commands may not succeed as
there may not be PF buffers ready for the re-directed firmware commands.

This sequencing did not matter much before when we only supported
the normal bring-up of VFs.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 7506d20..ac890ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -671,6 +671,11 @@ int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
 {
 	int rc;
 
+	/* Register buffers for VFs */
+	rc = bnxt_hwrm_func_buf_rgtr(bp);
+	if (rc)
+		return rc;
+
 	/* Reserve resources for VFs */
 	rc = bnxt_func_cfg(bp, *num_vfs);
 	if (rc != *num_vfs) {
@@ -684,11 +689,6 @@ int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
 		*num_vfs = rc;
 	}
 
-	/* Register buffers for VFs */
-	rc = bnxt_hwrm_func_buf_rgtr(bp);
-	if (rc)
-		return rc;
-
 	bnxt_ulp_sriov_cfg(bp, *num_vfs);
 	return 0;
 }
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 09/22] bnxt_en: Handle firmware reset status during IF_UP.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (7 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 08/22] bnxt_en: Register buffers for VFs before reserving resources Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 10/22] bnxt_en: Discover firmware error recovery capabilities Michael Chan
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

During IF_UP, newer firmware has a new status flag that indicates that
firmware has reset.  Add new function bnxt_fw_init_one() to re-probe the
firmware and re-setup VF resources on the PF if necessary.  If the
re-probe fails, set a flag to prevent bnxt_open() from proceeding again.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 107 ++++++++++++++++++++++++------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   2 +
 2 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 696a548..303933b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8523,11 +8523,14 @@ static int bnxt_hwrm_shutdown_link(struct bnxt *bp)
 	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+static int bnxt_fw_init_one(struct bnxt *bp);
+
 static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 {
 	struct hwrm_func_drv_if_change_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_if_change_input req = {0};
-	bool resc_reinit = false;
+	bool resc_reinit = false, fw_reset = false;
+	u32 flags = 0;
 	int rc;
 
 	if (!(bp->fw_cap & BNXT_FW_CAP_IF_CHANGE))
@@ -8538,26 +8541,53 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 		req.flags = cpu_to_le32(FUNC_DRV_IF_CHANGE_REQ_FLAGS_UP);
 	mutex_lock(&bp->hwrm_cmd_lock);
 	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-	if (!rc && (resp->flags &
-		    cpu_to_le32(FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)))
-		resc_reinit = true;
+	if (!rc)
+		flags = le32_to_cpu(resp->flags);
 	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (rc)
+		return rc;
 
-	if (up && resc_reinit && BNXT_NEW_RM(bp)) {
-		struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+	if (!up)
+		return 0;
 
-		rc = bnxt_hwrm_func_resc_qcaps(bp, true);
-		hw_resc->resv_cp_rings = 0;
-		hw_resc->resv_stat_ctxs = 0;
-		hw_resc->resv_irqs = 0;
-		hw_resc->resv_tx_rings = 0;
-		hw_resc->resv_rx_rings = 0;
-		hw_resc->resv_hw_ring_grps = 0;
-		hw_resc->resv_vnics = 0;
-		bp->tx_nr_rings = 0;
-		bp->rx_nr_rings = 0;
+	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)
+		resc_reinit = true;
+	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE)
+		fw_reset = true;
+
+	if (resc_reinit || fw_reset) {
+		if (fw_reset) {
+			rc = bnxt_fw_init_one(bp);
+			if (rc) {
+				set_bit(BNXT_STATE_ABORT_ERR, &bp->state);
+				return rc;
+			}
+			bnxt_clear_int_mode(bp);
+			rc = bnxt_init_int_mode(bp);
+			if (rc) {
+				netdev_err(bp->dev, "init int mode failed\n");
+				return rc;
+			}
+			set_bit(BNXT_STATE_FW_RESET_DET, &bp->state);
+		}
+		if (BNXT_NEW_RM(bp)) {
+			struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
+
+			rc = bnxt_hwrm_func_resc_qcaps(bp, true);
+			hw_resc->resv_cp_rings = 0;
+			hw_resc->resv_stat_ctxs = 0;
+			hw_resc->resv_irqs = 0;
+			hw_resc->resv_tx_rings = 0;
+			hw_resc->resv_rx_rings = 0;
+			hw_resc->resv_hw_ring_grps = 0;
+			hw_resc->resv_vnics = 0;
+			if (!fw_reset) {
+				bp->tx_nr_rings = 0;
+				bp->rx_nr_rings = 0;
+			}
+		}
 	}
-	return rc;
+	return 0;
 }
 
 static int bnxt_hwrm_port_led_qcaps(struct bnxt *bp)
@@ -8977,12 +9007,28 @@ static int bnxt_open(struct net_device *dev)
 	struct bnxt *bp = netdev_priv(dev);
 	int rc;
 
-	bnxt_hwrm_if_change(bp, true);
-	rc = __bnxt_open_nic(bp, true, true);
+	if (test_bit(BNXT_STATE_ABORT_ERR, &bp->state)) {
+		netdev_err(bp->dev, "A previous firmware reset did not complete, aborting\n");
+		return -ENODEV;
+	}
+
+	rc = bnxt_hwrm_if_change(bp, true);
 	if (rc)
+		return rc;
+	rc = __bnxt_open_nic(bp, true, true);
+	if (rc) {
 		bnxt_hwrm_if_change(bp, false);
+	} else {
+		if (test_and_clear_bit(BNXT_STATE_FW_RESET_DET, &bp->state) &&
+		    BNXT_PF(bp)) {
+			struct bnxt_pf_info *pf = &bp->pf;
+			int n = pf->active_vfs;
 
-	bnxt_hwmon_open(bp);
+			if (n)
+				bnxt_cfg_hw_sriov(bp, &n);
+		}
+		bnxt_hwmon_open(bp);
+	}
 
 	return rc;
 }
@@ -10075,6 +10121,27 @@ static void bnxt_fw_init_one_p3(struct bnxt *bp)
 	bnxt_hwrm_coal_params_qcaps(bp);
 }
 
+static int bnxt_fw_init_one(struct bnxt *bp)
+{
+	int rc;
+
+	rc = bnxt_fw_init_one_p1(bp);
+	if (rc) {
+		netdev_err(bp->dev, "Firmware init phase 1 failed\n");
+		return rc;
+	}
+	rc = bnxt_fw_init_one_p2(bp);
+	if (rc) {
+		netdev_err(bp->dev, "Firmware init phase 2 failed\n");
+		return rc;
+	}
+	rc = bnxt_approve_mac(bp, bp->dev->dev_addr, false);
+	if (rc)
+		return rc;
+	bnxt_fw_init_one_p3(bp);
+	return 0;
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1b1610d..3e0fcc2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1555,6 +1555,8 @@ struct bnxt {
 #define BNXT_STATE_OPEN		0
 #define BNXT_STATE_IN_SP_TASK	1
 #define BNXT_STATE_READ_STATS	2
+#define BNXT_STATE_FW_RESET_DET 3
+#define BNXT_STATE_ABORT_ERR	5
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 10/22] bnxt_en: Discover firmware error recovery capabilities.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (8 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 09/22] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 11/22] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 77 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 38 +++++++++++++++
 2 files changed, 115 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 303933b..825a7f9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6847,6 +6847,8 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
 		bp->fw_cap |= BNXT_FW_CAP_PCIE_STATS_SUPPORTED;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_EXT_STATS_SUPPORTED)
 		bp->fw_cap |= BNXT_FW_CAP_EXT_STATS_SUPPORTED;
+	if (flags &  FUNC_QCAPS_RESP_FLAGS_ERROR_RECOVERY_CAPABLE)
+		bp->fw_cap |= BNXT_FW_CAP_ERROR_RECOVERY;
 
 	bp->tx_push_thresh = 0;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_PUSH_MODE_SUPPORTED)
@@ -6948,6 +6950,74 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
+{
+	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct hwrm_error_recovery_qcfg_input req = {0};
+	int rc, i;
+
+	if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+		return 0;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
+	mutex_lock(&bp->hwrm_cmd_lock);
+	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		goto err_recovery_out;
+	if (!fw_health) {
+		fw_health = kzalloc(sizeof(*fw_health), GFP_KERNEL);
+		bp->fw_health = fw_health;
+		if (!fw_health) {
+			rc = -ENOMEM;
+			goto err_recovery_out;
+		}
+	}
+	fw_health->flags = le32_to_cpu(resp->flags);
+	if ((fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) &&
+	    !(bp->fw_cap & BNXT_FW_CAP_KONG_MB_CHNL)) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	fw_health->polling_dsecs = le32_to_cpu(resp->driver_polling_freq);
+	fw_health->master_func_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period);
+	fw_health->normal_func_wait_dsecs =
+		le32_to_cpu(resp->normal_func_wait_period);
+	fw_health->post_reset_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period_after_reset);
+	fw_health->post_reset_max_wait_dsecs =
+		le32_to_cpu(resp->max_bailout_time_after_reset);
+	fw_health->regs[BNXT_FW_HEALTH_REG] =
+		le32_to_cpu(resp->fw_health_status_reg);
+	fw_health->regs[BNXT_FW_HEARTBEAT_REG] =
+		le32_to_cpu(resp->fw_heartbeat_reg);
+	fw_health->regs[BNXT_FW_RESET_CNT_REG] =
+		le32_to_cpu(resp->fw_reset_cnt_reg);
+	fw_health->regs[BNXT_FW_RESET_INPROG_REG] =
+		le32_to_cpu(resp->reset_inprogress_reg);
+	fw_health->fw_reset_inprog_reg_mask =
+		le32_to_cpu(resp->reset_inprogress_reg_mask);
+	fw_health->fw_reset_seq_cnt = resp->reg_array_cnt;
+	if (fw_health->fw_reset_seq_cnt >= 16) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	for (i = 0; i < fw_health->fw_reset_seq_cnt; i++) {
+		fw_health->fw_reset_seq_regs[i] =
+			le32_to_cpu(resp->reset_reg[i]);
+		fw_health->fw_reset_seq_vals[i] =
+			le32_to_cpu(resp->reset_reg_val[i]);
+		fw_health->fw_reset_seq_delay_msec[i] =
+			resp->delay_after_reset[i];
+	}
+err_recovery_out:
+	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (rc)
+		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
+	return rc;
+}
+
 static int bnxt_hwrm_func_reset(struct bnxt *bp)
 {
 	struct hwrm_func_reset_input req = {0};
@@ -10058,6 +10128,11 @@ static int bnxt_fw_init_one_p2(struct bnxt *bp)
 		netdev_warn(bp->dev, "hwrm query adv flow mgnt failure rc: %d\n",
 			    rc);
 
+	rc = bnxt_hwrm_error_recovery_qcfg(bp);
+	if (rc)
+		netdev_warn(bp->dev, "hwrm query error recovery failure rc: %d\n",
+			    rc);
+
 	rc = bnxt_hwrm_func_drv_rgtr(bp);
 	if (rc)
 		return -ENODEV;
@@ -11238,6 +11313,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bnxt_free_ctx_mem(bp);
 	kfree(bp->ctx);
 	bp->ctx = NULL;
+	kfree(bp->fw_health);
+	bp->fw_health = NULL;
 	bnxt_cleanup_pci(bp);
 
 init_err_free:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3e0fcc2..ce535e5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1333,6 +1333,41 @@ struct bnxt_ctx_mem_info {
 	struct bnxt_ctx_pg_info *tqm_mem[9];
 };
 
+struct bnxt_fw_health {
+	u32 flags;
+	u32 polling_dsecs;
+	u32 master_func_wait_dsecs;
+	u32 normal_func_wait_dsecs;
+	u32 post_reset_wait_dsecs;
+	u32 post_reset_max_wait_dsecs;
+	u32 regs[4];
+	u32 mapped_regs[4];
+#define BNXT_FW_HEALTH_REG		0
+#define BNXT_FW_HEARTBEAT_REG		1
+#define BNXT_FW_RESET_CNT_REG		2
+#define BNXT_FW_RESET_INPROG_REG	3
+	u32 fw_reset_inprog_reg_mask;
+	u32 last_fw_heartbeat;
+	u32 last_fw_reset_cnt;
+	u8 enabled:1;
+	u8 master:1;
+	u8 tmr_multiplier;
+	u8 tmr_counter;
+	u8 fw_reset_seq_cnt;
+	u32 fw_reset_seq_regs[16];
+	u32 fw_reset_seq_vals[16];
+	u32 fw_reset_seq_delay_msec[16];
+};
+
+#define BNXT_FW_HEALTH_REG_TYPE_MASK	3
+#define BNXT_FW_HEALTH_REG_TYPE_CFG	0
+#define BNXT_FW_HEALTH_REG_TYPE_GRC	1
+#define BNXT_FW_HEALTH_REG_TYPE_BAR0	2
+#define BNXT_FW_HEALTH_REG_TYPE_BAR1	3
+
+#define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
+#define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
@@ -1581,6 +1616,7 @@ struct bnxt {
 	#define BNXT_FW_CAP_KONG_MB_CHNL		0x00000080
 	#define BNXT_FW_CAP_OVS_64BIT_HANDLE		0x00000400
 	#define BNXT_FW_CAP_TRUSTED_VF			0x00000800
+	#define BNXT_FW_CAP_ERROR_RECOVERY		0x00002000
 	#define BNXT_FW_CAP_PKG_VER			0x00004000
 	#define BNXT_FW_CAP_CFA_ADV_FLOW		0x00008000
 	#define BNXT_FW_CAP_CFA_RFS_RING_TBL_IDX	0x00010000
@@ -1666,6 +1702,8 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 
+	struct bnxt_fw_health	*fw_health;
+
 	struct bnxt_hw_resc	hw_resc;
 	struct bnxt_pf_info	pf;
 	struct bnxt_ctx_mem_info	*ctx;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 11/22] bnxt_en: Pre-map the firmware health monitoring registers.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (9 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 10/22] bnxt_en: Discover firmware error recovery capabilities Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 12/22] bnxt_en: Enable health monitoring Michael Chan
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Pre-map the GRC registers for periodic firmware health monitoring.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 29 +++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  6 ++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 825a7f9..8ec41d6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6950,6 +6950,33 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_map_fw_health_regs(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg_base = 0xffffffff;
+	int i;
+
+	/* Only pre-map the monitoring GRC registers using window 3 */
+	for (i = 0; i < 4; i++) {
+		u32 reg = fw_health->regs[i];
+
+		if (BNXT_FW_HEALTH_REG_TYPE(reg) != BNXT_FW_HEALTH_REG_TYPE_GRC)
+			continue;
+		if (reg_base == 0xffffffff)
+			reg_base = reg & BNXT_GRC_BASE_MASK;
+		if ((reg & BNXT_GRC_BASE_MASK) != reg_base)
+			return -ERANGE;
+		fw_health->mapped_regs[i] = BNXT_FW_HEALTH_WIN_BASE +
+					    (reg & BNXT_GRC_OFFSET_MASK);
+	}
+	if (reg_base == 0xffffffff)
+		return 0;
+
+	writel(reg_base, bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT +
+			 BNXT_FW_HEALTH_WIN_MAP_OFF);
+	return 0;
+}
+
 static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 {
 	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
@@ -7013,6 +7040,8 @@ static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 	}
 err_recovery_out:
 	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (!rc)
+		rc = bnxt_map_fw_health_regs(bp);
 	if (rc)
 		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index ce535e5..78fd585 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1217,6 +1217,9 @@ struct bnxt_test_info {
 #define BNXT_GRCPF_REG_KONG_COMM		0xA00
 #define BNXT_GRCPF_REG_KONG_COMM_TRIGGER	0xB00
 
+#define BNXT_GRC_BASE_MASK			0xfffff000
+#define BNXT_GRC_OFFSET_MASK			0x00000ffc
+
 struct bnxt_tc_flow_stats {
 	u64		packets;
 	u64		bytes;
@@ -1368,6 +1371,9 @@ struct bnxt_fw_health {
 #define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
 #define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
 
+#define BNXT_FW_HEALTH_WIN_BASE		0x3000
+#define BNXT_FW_HEALTH_WIN_MAP_OFF	8
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 12/22] bnxt_en: Enable health monitoring.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (10 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 11/22] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 13/22] bnxt_en: Add BNXT_STATE_IN_FW_RESET state Michael Chan
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Handle the async event from the firmware that enables firmware health
monitoring.  Store initial health metrics.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 66 ++++++++++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  9 +++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8ec41d6..27fc047 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
 static struct workqueue_struct *bnxt_pf_wq;
@@ -1896,6 +1897,33 @@ static int bnxt_force_rx_discard(struct bnxt *bp,
 	return bnxt_rx_pkt(bp, cpr, raw_cons, event);
 }
 
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->regs[reg_idx];
+	u32 reg_type, reg_off, val = 0;
+
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_read_config_dword(bp->pdev, reg_off, &val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		reg_off = fw_health->mapped_regs[reg_idx];
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		val = readl(bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		val = readl(bp->bar1 + reg_off);
+		break;
+	}
+	if (reg_idx == BNXT_FW_RESET_INPROG_REG)
+		val &= fw_health->fw_reset_inprog_reg_mask;
+	return val;
+}
+
 #define BNXT_GET_EVENT_PORT(data)	\
 	((data) &			\
 	 ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
@@ -1951,6 +1979,35 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
+		struct bnxt_fw_health *fw_health = bp->fw_health;
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
+		if (!fw_health)
+			goto async_event_process_exit;
+
+		fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1);
+		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
+		if (!fw_health->enabled)
+			break;
+
+		if (netif_msg_drv(bp))
+			netdev_info(bp->dev, "Error recovery info: error recovery[%d], master[%d], reset count[0x%x], health status: 0x%x\n",
+				    fw_health->enabled, fw_health->master,
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_RESET_CNT_REG),
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_HEALTH_REG));
+		fw_health->tmr_multiplier =
+			DIV_ROUND_UP(fw_health->polling_dsecs * HZ,
+				     bp->current_interval * 10);
+		fw_health->tmr_counter = fw_health->tmr_multiplier;
+		fw_health->last_fw_heartbeat =
+			bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+		fw_health->last_fw_reset_cnt =
+			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		goto async_event_process_exit;
+	}
 	default:
 		goto async_event_process_exit;
 	}
@@ -4310,9 +4367,14 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
 		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_ASYNC_EVENT_FWD);
 
 	memset(async_events_bmap, 0, sizeof(async_events_bmap));
-	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++)
-		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++) {
+		u16 event_id = bnxt_async_events_arr[i];
 
+		if (event_id == ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY &&
+		    !(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+			continue;
+		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	}
 	if (bmap && bmap_size) {
 		for (i = 0; i < bmap_size; i++) {
 			if (test_bit(i, bmap))
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 78fd585..eb55519 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,14 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
+
+#define EVENT_DATA1_RECOVERY_ENABLED(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_RECOVERY_ENABLED)
+
 struct nqe_cn {
 	__le16	type;
 	#define NQ_CN_TYPE_MASK           0x3fUL
@@ -1914,6 +1922,7 @@ extern const u16 bnxt_lhint_arr[];
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		       u16 prod, gfp_t gfp);
 void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data);
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx);
 void bnxt_set_tpa_flags(struct bnxt *bp);
 void bnxt_set_ring_params(struct bnxt *);
 int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 13/22] bnxt_en: Add BNXT_STATE_IN_FW_RESET state.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (11 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 12/22] bnxt_en: Enable health monitoring Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter Michael Chan
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

The new flag will be set in subsequent patches when firmware is
going through reset.  If bnxt_close() is called while the new flag
is set, the FW reset sequence will have to be aborted because the
NIC is prematurely closed before FW reset has completed.  We also
reject SRIOV configurations while FW reset is in progress.

v2: No longer drop rtnl_lock() in close and wait for FW reset to complete.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 20 ++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |  5 +++++
 3 files changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 27fc047..4caacab 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8716,6 +8716,10 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 	if (flags & FUNC_DRV_IF_CHANGE_RESP_FLAGS_HOT_FW_RESET_DONE)
 		fw_reset = true;
 
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state) && !fw_reset) {
+		netdev_err(bp->dev, "RESET_DONE not set during FW reset.\n");
+		return -ENODEV;
+	}
 	if (resc_reinit || fw_reset) {
 		if (fw_reset) {
 			rc = bnxt_fw_init_one(bp);
@@ -9226,6 +9230,10 @@ static void __bnxt_close_nic(struct bnxt *bp, bool irq_re_init,
 	bnxt_debug_dev_exit(bp);
 	bnxt_disable_napi(bp);
 	del_timer_sync(&bp->timer);
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state) &&
+	    pci_is_enabled(bp->pdev))
+		pci_disable_device(bp->pdev);
+
 	bnxt_free_skbs(bp);
 
 	/* Save ring stats before shutdown */
@@ -9242,6 +9250,18 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
 {
 	int rc = 0;
 
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		/* If we get here, it means firmware reset is in progress
+		 * while we are trying to close.  We can safely proceed with
+		 * the close because we are holding rtnl_lock().  Some firmware
+		 * messages may fail as we proceed to close.  We set the
+		 * ABORT_ERR flag here so that the FW reset thread will later
+		 * abort when it gets the rtnl_lock() and sees the flag.
+		 */
+		netdev_warn(bp->dev, "FW reset in progress during close, FW reset will be aborted\n");
+		set_bit(BNXT_STATE_ABORT_ERR, &bp->state);
+	}
+
 #ifdef CONFIG_BNXT_SRIOV
 	if (bp->sriov_cfg) {
 		rc = wait_event_interruptible_timeout(bp->sriov_cfg_wait,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index eb55519..6f7aa7c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1605,6 +1605,7 @@ struct bnxt {
 #define BNXT_STATE_IN_SP_TASK	1
 #define BNXT_STATE_READ_STATS	2
 #define BNXT_STATE_FW_RESET_DET 3
+#define BNXT_STATE_IN_FW_RESET	4
 #define BNXT_STATE_ABORT_ERR	5
 
 	struct bnxt_irq	*irq_tbl;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index ac890ca..4aacfc3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -828,6 +828,11 @@ int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs)
 		rtnl_unlock();
 		return 0;
 	}
+	if (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		netdev_warn(dev, "Reject SRIOV config request when FW reset is in progress\n");
+		rtnl_unlock();
+		return 0;
+	}
 	bp->sriov_cfg = true;
 	rtnl_unlock();
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (12 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 13/22] bnxt_en: Add BNXT_STATE_IN_FW_RESET state Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-10-07  9:56   ` Jiri Pirko
  2019-08-30  3:54 ` [PATCH net-next v2 15/22] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui, Jiri Pirko

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Create new FW devlink_health_reporter, to know the current health
status of FW.

Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw

pci/0000:af:00.0:
  name fw
    state healthy error 0 recover 0

 FW status: Healthy; Reset count: 1

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 81 +++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6f7aa7c..a75fe16 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1368,6 +1368,7 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_regs[16];
 	u32 fw_reset_seq_vals[16];
 	u32 fw_reset_seq_delay_msec[16];
+	struct devlink_health_reporter	*fw_reporter;
 };
 
 #define BNXT_FW_HEALTH_REG_TYPE_MASK	3
@@ -1382,6 +1383,8 @@ struct bnxt_fw_health {
 #define BNXT_FW_HEALTH_WIN_BASE		0x3000
 #define BNXT_FW_HEALTH_WIN_MAP_OFF	8
 
+#define BNXT_FW_STATUS_HEALTHY		0x8000
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 7d9a8c4..e15d5dc 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -15,6 +15,84 @@
 #include "bnxt_vfr.h"
 #include "bnxt_devlink.h"
 
+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
+				     struct devlink_fmsg *fmsg)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+	struct bnxt_fw_health *health = bp->fw_health;
+	u32 val, health_status;
+	int rc;
+
+	if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return 0;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
+	health_status = val & 0xffff;
+
+	if (health_status == BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Healthy;");
+		if (rc)
+			return rc;
+	} else if (health_status < BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Not yet completed initialization;");
+		if (rc)
+			return rc;
+	} else if (health_status > BNXT_FW_STATUS_HEALTHY) {
+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
+						  "Encountered fatal error and cannot recover;");
+		if (rc)
+			return rc;
+	}
+
+	if (val >> 16) {
+		rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
+		if (rc)
+			return rc;
+	}
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
+	.name = "fw",
+	.diagnose = bnxt_fw_reporter_diagnose,
+};
+
+static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
+{
+	struct bnxt_fw_health *health = bp->fw_health;
+
+	if (!health)
+		return;
+
+	health->fw_reporter =
+		devlink_health_reporter_create(bp->dl, &bnxt_dl_fw_reporter_ops,
+					       0, false, bp);
+	if (IS_ERR(health->fw_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_reporter));
+		health->fw_reporter = NULL;
+	}
+}
+
+static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
+{
+	struct bnxt_fw_health *health = bp->fw_health;
+
+	if (!health)
+		return;
+
+	if (health->fw_reporter)
+		devlink_health_reporter_destroy(health->fw_reporter);
+}
+
 static const struct devlink_ops bnxt_dl_ops = {
 #ifdef CONFIG_BNXT_SRIOV
 	.eswitch_mode_set = bnxt_dl_eswitch_mode_set,
@@ -247,6 +325,8 @@ int bnxt_dl_register(struct bnxt *bp)
 
 	devlink_params_publish(dl);
 
+	bnxt_dl_fw_reporters_create(bp);
+
 	return 0;
 
 err_dl_port_unreg:
@@ -269,6 +349,7 @@ void bnxt_dl_unregister(struct bnxt *bp)
 	if (!dl)
 		return;
 
+	bnxt_dl_fw_reporters_destroy(bp);
 	devlink_port_params_unregister(&bp->dl_port, bnxt_dl_port_params,
 				       ARRAY_SIZE(bnxt_dl_port_params));
 	devlink_port_unregister(&bp->dl_port);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 15/22] bnxt_en: Handle RESET_NOTIFY async event from firmware.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (13 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30  3:54 ` [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset Michael Chan
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

This event from firmware signals a coordinated reset initiated by the
firmware.  It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.

We store the parameters from this event.  Subsequent patches will
add logic to handle reset itself using devlink reporters.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 11 +++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  7 +++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4caacab..d1d33f6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY,
 	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
@@ -1979,6 +1980,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+		bp->fw_reset_timestamp = jiffies;
+		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
+		if (!bp->fw_reset_min_dsecs)
+			bp->fw_reset_min_dsecs = BNXT_DFLT_FW_RST_MIN_DSECS;
+		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
+		if (!bp->fw_reset_max_dsecs)
+			bp->fw_reset_max_dsecs = BNXT_DFLT_FW_RST_MAX_DSECS;
+		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
+		break;
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a75fe16..858dc40 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1719,6 +1719,13 @@ struct bnxt {
 #define BNXT_FLOW_STATS_SP_EVENT	15
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
+#define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+
+	u16			fw_reset_min_dsecs;
+#define BNXT_DFLT_FW_RST_MIN_DSECS	20
+	u16			fw_reset_max_dsecs;
+#define BNXT_DFLT_FW_RST_MAX_DSECS	60
+	unsigned long		fw_reset_timestamp;
 
 	struct bnxt_fw_health	*fw_health;
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (14 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 15/22] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
@ 2019-08-30  3:54 ` Michael Chan
  2019-08-30 17:56   ` kbuild test robot
  2019-08-30  3:55 ` [PATCH net-next v2 17/22] bnxt_en: Add devlink health reset reporter Michael Chan
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Add the bnxt_fw_reset() main function to handle firmware reset.  This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected.  bnxt_fw_reset()
will first wait for all VFs to shutdown and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.

The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 150 ++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  11 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c |   3 +
 3 files changed, 164 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d1d33f6..98b15551 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1140,6 +1140,14 @@ static int bnxt_discard_rx(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	return 0;
 }
 
+static void bnxt_queue_fw_reset_work(struct bnxt *bp, unsigned long delay)
+{
+	if (BNXT_PF(bp))
+		queue_delayed_work(bnxt_pf_wq, &bp->fw_reset_task, delay);
+	else
+		schedule_delayed_work(&bp->fw_reset_task, delay);
+}
+
 static void bnxt_queue_sp_work(struct bnxt *bp)
 {
 	if (BNXT_PF(bp))
@@ -6355,6 +6363,8 @@ static int bnxt_hwrm_func_qcfg(struct bnxt *bp)
 		struct bnxt_vf_info *vf = &bp->vf;
 
 		vf->vlan = le16_to_cpu(resp->vlan) & VLAN_VID_MASK;
+	} else {
+		bp->pf.registered_vfs = le16_to_cpu(resp->registered_vfs);
 	}
 #endif
 	flags = le16_to_cpu(resp->flags);
@@ -9980,6 +9990,53 @@ static void bnxt_reset(struct bnxt *bp, bool silent)
 	bnxt_rtnl_unlock_sp(bp);
 }
 
+static void bnxt_fw_reset_close(struct bnxt *bp)
+{
+	__bnxt_close_nic(bp, true, false);
+	bnxt_ulp_irq_stop(bp);
+	bnxt_clear_int_mode(bp);
+	bnxt_hwrm_func_drv_unrgtr(bp);
+	bnxt_free_ctx_mem(bp);
+	kfree(bp->ctx);
+	bp->ctx = NULL;
+}
+
+void bnxt_fw_reset(struct bnxt *bp)
+{
+	int rc;
+
+	bnxt_rtnl_lock_sp(bp);
+	if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
+	    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+		if (BNXT_PF(bp) && bp->pf.active_vfs) {
+			rc = bnxt_hwrm_func_qcfg(bp);
+			if (rc) {
+				netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
+					   rc);
+				clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+				dev_close(bp->dev);
+				goto fw_reset_exit;
+			}
+			if (bp->pf.registered_vfs || bp->sriov_cfg) {
+				u16 vf_tmo_dsecs = bp->pf.registered_vfs * 10;
+
+				if (bp->fw_reset_max_dsecs < vf_tmo_dsecs)
+					bp->fw_reset_max_dsecs = vf_tmo_dsecs;
+				bp->fw_reset_state =
+					BNXT_FW_RESET_STATE_POLL_VF;
+				bnxt_queue_fw_reset_work(bp, HZ / 10);
+				goto fw_reset_exit;
+			}
+		}
+		bnxt_fw_reset_close(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
+	}
+fw_reset_exit:
+	bnxt_rtnl_unlock_sp(bp);
+}
+
 static void bnxt_chk_missed_irq(struct bnxt *bp)
 {
 	int i;
@@ -10339,6 +10396,98 @@ static int bnxt_fw_init_one(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_fw_reset_task(struct work_struct *work)
+{
+	struct bnxt *bp = container_of(work, struct bnxt, fw_reset_task.work);
+	int rc;
+
+	if (!test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+		netdev_err(bp->dev, "bnxt_fw_reset_task() called when not in fw reset mode!\n");
+		return;
+	}
+
+	switch (bp->fw_reset_state) {
+	case BNXT_FW_RESET_STATE_POLL_VF:
+		rc = bnxt_hwrm_func_qcfg(bp);
+		if (rc) {
+			netdev_err(bp->dev, "Firmware reset aborted, subsequent func_qcfg cmd failed, rc = %d, %d msecs since reset timestamp\n",
+				   rc, jiffies_to_msecs(jiffies -
+				   bp->fw_reset_timestamp));
+			goto fw_reset_abort;
+		}
+		if (bp->pf.registered_vfs || bp->sriov_cfg) {
+			if (time_after(jiffies, bp->fw_reset_timestamp +
+				       (bp->fw_reset_max_dsecs * HZ / 10))) {
+				clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+				bp->fw_reset_state = 0;
+				netdev_err(bp->dev, "Firmware reset aborted, %d VFs still registered, sriov_cfg %d\n",
+					   bp->pf.registered_vfs,
+					   bp->sriov_cfg);
+				return;
+			}
+			bnxt_queue_fw_reset_work(bp, HZ / 10);
+			return;
+		}
+		bp->fw_reset_timestamp = jiffies;
+		rtnl_lock();
+		bnxt_fw_reset_close(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		rtnl_unlock();
+		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
+		return;
+	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		if (pci_enable_device(bp->pdev)) {
+			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
+			goto fw_reset_abort;
+		}
+		pci_set_master(bp->pdev);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_POLL_FW;
+		/* fall through */
+	case BNXT_FW_RESET_STATE_POLL_FW:
+		bp->hwrm_cmd_timeout = SHORT_HWRM_CMD_TIMEOUT;
+		rc = __bnxt_hwrm_ver_get(bp, true);
+		if (rc) {
+			if (time_after(jiffies, bp->fw_reset_timestamp +
+				       (bp->fw_reset_max_dsecs * HZ / 10))) {
+				netdev_err(bp->dev, "Firmware reset aborted\n");
+				goto fw_reset_abort;
+			}
+			bnxt_queue_fw_reset_work(bp, HZ / 5);
+			return;
+		}
+		bp->hwrm_cmd_timeout = DFLT_HWRM_CMD_TIMEOUT;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_OPENING;
+		/* fall through */
+	case BNXT_FW_RESET_STATE_OPENING:
+		while (!rtnl_trylock()) {
+			bnxt_queue_fw_reset_work(bp, HZ / 10);
+			return;
+		}
+		rc = bnxt_open(bp->dev);
+		if (rc) {
+			netdev_err(bp->dev, "bnxt_open_nic() failed\n");
+			clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+			dev_close(bp->dev);
+		}
+		bnxt_ulp_irq_restart(bp, rc);
+		rtnl_unlock();
+
+		bp->fw_reset_state = 0;
+		/* Make sure fw_reset_state is 0 before clearing the flag */
+		smp_mb__before_atomic();
+		clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+		break;
+	}
+	return;
+
+fw_reset_abort:
+	clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	bp->fw_reset_state = 0;
+	rtnl_lock();
+	dev_close(bp->dev);
+	rtnl_unlock();
+}
+
 static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 {
 	int rc;
@@ -10401,6 +10550,7 @@ static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 	pci_enable_pcie_error_reporting(pdev);
 
 	INIT_WORK(&bp->sp_task, bnxt_sp_task);
+	INIT_DELAYED_WORK(&bp->fw_reset_task, bnxt_fw_reset_task);
 
 	spin_lock_init(&bp->ntp_fltr_lock);
 #if BITS_PER_LONG == 32
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 858dc40..c78aa51 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -640,6 +640,7 @@ struct nqe_cn {
 #define BNXT_HWRM_MAX_REQ_LEN		(bp->hwrm_max_req_len)
 #define BNXT_HWRM_SHORT_REQ_LEN		sizeof(struct hwrm_short_input)
 #define DFLT_HWRM_CMD_TIMEOUT		500
+#define SHORT_HWRM_CMD_TIMEOUT		20
 #define HWRM_CMD_TIMEOUT		(bp->hwrm_cmd_timeout)
 #define HWRM_RESET_TIMEOUT		((HWRM_CMD_TIMEOUT) * 4)
 #define HWRM_RESP_ERR_CODE_MASK		0xffff
@@ -1066,6 +1067,7 @@ struct bnxt_pf_info {
 	u8	mac_addr[ETH_ALEN];
 	u32	first_vf_id;
 	u16	active_vfs;
+	u16	registered_vfs;
 	u16	max_vfs;
 	u32	max_encap_records;
 	u32	max_decap_records;
@@ -1721,6 +1723,14 @@ struct bnxt {
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 #define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
 
+	struct delayed_work	fw_reset_task;
+	int			fw_reset_state;
+#define BNXT_FW_RESET_STATE_POLL_VF	1
+#define BNXT_FW_RESET_STATE_RESET_FW	2
+#define BNXT_FW_RESET_STATE_ENABLE_DEV	3
+#define BNXT_FW_RESET_STATE_POLL_FW	4
+#define BNXT_FW_RESET_STATE_OPENING	5
+
 	u16			fw_reset_min_dsecs;
 #define BNXT_DFLT_FW_RST_MIN_DSECS	20
 	u16			fw_reset_max_dsecs;
@@ -1966,6 +1976,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+void bnxt_fw_reset(struct bnxt *bp);
 int bnxt_check_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
 		     int tx_xdp);
 int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index fc77caf..b2c1609 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -226,6 +226,9 @@ static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 	struct input *req;
 	int rc;
 
+	if (ulp_id != BNXT_ROCE_ULP && bp->fw_reset_state)
+		return -EBUSY;
+
 	mutex_lock(&bp->hwrm_cmd_lock);
 	req = fw_msg->msg;
 	req->resp_addr = cpu_to_le64(bp->hwrm_cmd_resp_dma_addr);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 17/22] bnxt_en: Add devlink health reset reporter.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (15 preceding siblings ...)
  2019-08-30  3:54 ` [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30  3:55 ` [PATCH net-next v2 18/22] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui, Jiri Pirko

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Add devlink health reporter for the firmware reset event.  Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  3 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  5 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 52 +++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 +
 4 files changed, 61 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 98b15551..cd20067 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10167,6 +10167,9 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event))
 		bnxt_reset(bp, true);
 
+	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
+		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c78aa51..d65625f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1371,6 +1371,11 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_vals[16];
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
+	struct devlink_health_reporter *fw_reset_reporter;
+};
+
+struct bnxt_fw_reporter_ctx {
+	unsigned long sp_event;
 };
 
 #define BNXT_FW_HEALTH_REG_TYPE_MASK	3
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index e15d5dc..8512467 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -65,6 +65,24 @@ static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
 	.diagnose = bnxt_fw_reporter_diagnose,
 };
 
+static int bnxt_fw_reset_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	bnxt_fw_reset(bp);
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
+	.name = "fw_reset",
+	.recover = bnxt_fw_reset_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -80,6 +98,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reporter));
 		health->fw_reporter = NULL;
 	}
+
+	health->fw_reset_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_reset_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_reset_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_reset_reporter));
+		health->fw_reset_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -91,6 +119,30 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reporter)
 		devlink_health_reporter_destroy(health->fw_reporter);
+
+	if (health->fw_reset_reporter)
+		devlink_health_reporter_destroy(health->fw_reset_reporter);
+}
+
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct bnxt_fw_reporter_ctx fw_reporter_ctx;
+
+	if (!fw_health)
+		return;
+
+	fw_reporter_ctx.sp_event = event;
+	switch (event) {
+	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (!fw_health->fw_reset_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_reset_reporter,
+				      "FW non-fatal reset event received",
+				      &fw_reporter_ctx);
+		return;
+	}
 }
 
 static const struct devlink_ops bnxt_dl_ops = {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 5b6b2c7..b97e0ba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -55,6 +55,7 @@ struct bnxt_dl_nvm_param {
 	u16 num_bits;
 };
 
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event);
 int bnxt_dl_register(struct bnxt *bp);
 void bnxt_dl_unregister(struct bnxt *bp);
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 18/22] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (16 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 17/22] bnxt_en: Add devlink health reset reporter Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30  3:55 ` [PATCH net-next v2 19/22] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 50 +++++++++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  2 +-
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cd20067..ff911d6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9211,7 +9211,7 @@ static int bnxt_open(struct net_device *dev)
 			int n = pf->active_vfs;
 
 			if (n)
-				bnxt_cfg_hw_sriov(bp, &n);
+				bnxt_cfg_hw_sriov(bp, &n, true);
 		}
 		bnxt_hwmon_open(bp);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 4aacfc3..f6f3454 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -470,10 +470,43 @@ static int bnxt_hwrm_func_buf_rgtr(struct bnxt *bp)
 	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+/* Caller holds bp->hwrm_cmd_lock mutex lock */
+static void __bnxt_set_vf_params(struct bnxt *bp, int vf_id)
+{
+	struct hwrm_func_cfg_input req = {0};
+	struct bnxt_vf_info *vf;
+
+	vf = &bp->pf.vf[vf_id];
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_CFG, -1, -1);
+	req.fid = cpu_to_le16(vf->fw_fid);
+	req.flags = cpu_to_le32(vf->func_flags);
+
+	if (is_valid_ether_addr(vf->mac_addr)) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_MAC_ADDR);
+		memcpy(req.dflt_mac_addr, vf->mac_addr, ETH_ALEN);
+	}
+	if (vf->vlan) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_VLAN);
+		req.dflt_vlan = cpu_to_le16(vf->vlan);
+	}
+	if (vf->max_tx_rate) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MAX_BW);
+		req.max_bw = cpu_to_le32(vf->max_tx_rate);
+#ifdef HAVE_IFLA_TX_RATE
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MIN_BW);
+		req.min_bw = cpu_to_le32(vf->min_tx_rate);
+#endif
+	}
+	if (vf->flags & BNXT_VF_TRUST)
+		req.flags |= cpu_to_le32(FUNC_CFG_REQ_FLAGS_TRUSTED_VF_ENABLE);
+
+	_hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
 /* Only called by PF to reserve resources for VFs, returns actual number of
  * VFs configured, or < 0 on error.
  */
-static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	struct hwrm_func_vf_resource_cfg_input req = {0};
 	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
@@ -545,6 +578,9 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		if (reset)
+			__bnxt_set_vf_params(bp, i);
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -659,15 +695,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 	return rc;
 }
 
-static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_func_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	if (BNXT_NEW_RM(bp))
-		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs);
+		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs, reset);
 	else
 		return bnxt_hwrm_func_cfg(bp, num_vfs);
 }
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	int rc;
 
@@ -677,7 +713,7 @@ int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
 		return rc;
 
 	/* Reserve resources for VFs */
-	rc = bnxt_func_cfg(bp, *num_vfs);
+	rc = bnxt_func_cfg(bp, *num_vfs, reset);
 	if (rc != *num_vfs) {
 		if (rc <= 0) {
 			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
@@ -758,7 +794,7 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	if (rc)
 		goto err_out1;
 
-	rc = bnxt_cfg_hw_sriov(bp, num_vfs);
+	rc = bnxt_cfg_hw_sriov(bp, num_vfs, false);
 	if (rc)
 		goto err_out2;
 
@@ -1144,7 +1180,7 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 }
 #else
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	if (*num_vfs)
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 0abf18e..629641b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -36,7 +36,7 @@ int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs);
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 19/22] bnxt_en: Do not send firmware messages if firmware is in error state.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (17 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 18/22] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30  3:55 ` [PATCH net-next v2 20/22] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller.  Fix up some clean up functions to continue
and not abort when the firmware message function returns error.

This is preparation work to fully handle firmware error recovery
under fatal conditions.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 +++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff911d6..f584a6b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4173,6 +4173,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 	u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM;
 	u16 dst = BNXT_HWRM_CHNL_CHIMP;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return -EBUSY;
+
 	if (msg_len > BNXT_HWRM_MAX_REQ_LEN) {
 		if (msg_len > bp->hwrm_max_ext_req_len ||
 		    !bp->hwrm_short_cmd_req_addr)
@@ -5042,8 +5045,6 @@ static int bnxt_hwrm_vnic_free_one(struct bnxt *bp, u16 vnic_id)
 			cpu_to_le32(bp->vnic_info[vnic_id].fw_vnic_id);
 
 		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-		if (rc)
-			return rc;
 		bp->vnic_info[vnic_id].fw_vnic_id = INVALID_HW_RING_ID;
 	}
 	return rc;
@@ -5183,8 +5184,6 @@ static int bnxt_hwrm_ring_grp_free(struct bnxt *bp)
 
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
-		if (rc)
-			break;
 		bp->grp_info[i].fw_grp_id = INVALID_HW_RING_ID;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
@@ -5503,6 +5502,9 @@ static int hwrm_ring_free_send_msg(struct bnxt *bp,
 	struct hwrm_ring_free_output *resp = bp->hwrm_cmd_resp_addr;
 	u16 error_code;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
+
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_RING_FREE, cmpl_ring_id, -1);
 	req.ring_type = ring_type;
 	req.ring_id = cpu_to_le16(ring->fw_ring_id);
@@ -6300,8 +6302,6 @@ static int bnxt_hwrm_stat_ctx_free(struct bnxt *bp)
 
 			rc = _hwrm_send_message(bp, &req, sizeof(req),
 						HWRM_CMD_TIMEOUT);
-			if (rc)
-				break;
 
 			cpr->hw_stats_ctx_id = INVALID_STATS_CTX_ID;
 		}
@@ -7415,6 +7415,8 @@ static int bnxt_set_tpa(struct bnxt *bp, bool set_tpa)
 
 	if (set_tpa)
 		tpa_flags = bp->flags & BNXT_FLAG_TPA;
+	else if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
 	for (i = 0; i < bp->nr_vnics; i++) {
 		rc = bnxt_hwrm_vnic_set_tpa(bp, i, tpa_flags);
 		if (rc) {
@@ -10009,7 +10011,8 @@ void bnxt_fw_reset(struct bnxt *bp)
 	if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
 	    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
 		set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
-		if (BNXT_PF(bp) && bp->pf.active_vfs) {
+		if (BNXT_PF(bp) && bp->pf.active_vfs &&
+		    !test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
 			rc = bnxt_hwrm_func_qcfg(bp);
 			if (rc) {
 				netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
@@ -10439,6 +10442,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
 			goto fw_reset_abort;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d65625f..f3a6aad 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1617,6 +1617,7 @@ struct bnxt {
 #define BNXT_STATE_FW_RESET_DET 3
 #define BNXT_STATE_IN_FW_RESET	4
 #define BNXT_STATE_ABORT_ERR	5
+#define BNXT_STATE_FW_FATAL_COND	6
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 20/22] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (18 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 19/22] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30  3:55 ` [PATCH net-next v2 21/22] bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors Michael Chan
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 64 +++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f584a6b..51cf679 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10402,6 +10402,62 @@ static int bnxt_fw_init_one(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_fw_reset_writel(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->fw_reset_seq_regs[reg_idx];
+	u32 val = fw_health->fw_reset_seq_vals[reg_idx];
+	u32 reg_type, reg_off, delay_msecs;
+
+	delay_msecs = fw_health->fw_reset_seq_delay_msec[reg_idx];
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_write_config_dword(bp->pdev, reg_off, val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		writel(reg_off & BNXT_GRC_BASE_MASK,
+		       bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
+		reg_off = (reg_off & BNXT_GRC_OFFSET_MASK) + 0x2000;
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		writel(val, bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		writel(val, bp->bar1 + reg_off);
+		break;
+	}
+	if (delay_msecs) {
+		pci_read_config_dword(bp->pdev, 0, &val);
+		msleep(delay_msecs);
+	}
+}
+
+static void bnxt_reset_all(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	int i;
+
+	if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_HOST) {
+		for (i = 0; i < fw_health->fw_reset_seq_cnt; i++)
+			bnxt_fw_reset_writel(bp, i);
+	} else if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) {
+		struct hwrm_fw_reset_input req = {0};
+		int rc;
+
+		bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FW_RESET, -1, -1);
+		req.resp_addr = cpu_to_le64(bp->hwrm_cmd_kong_resp_dma_addr);
+		req.embedded_proc_type = FW_RESET_REQ_EMBEDDED_PROC_TYPE_CHIP;
+		req.selfrst_status = FW_RESET_REQ_SELFRST_STATUS_SELFRSTASAP;
+		req.flags = FW_RESET_REQ_FLAGS_RESET_GRACEFUL;
+		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+		if (rc)
+			netdev_warn(bp->dev, "Unable to reset FW rc=%d\n", rc);
+	}
+	bp->fw_reset_timestamp = jiffies;
+}
+
 static void bnxt_fw_reset_task(struct work_struct *work)
 {
 	struct bnxt *bp = container_of(work, struct bnxt, fw_reset_task.work);
@@ -10441,6 +10497,14 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		rtnl_unlock();
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
+	case BNXT_FW_RESET_STATE_RESET_FW: {
+		u32 wait_dsecs = bp->fw_health->post_reset_wait_dsecs;
+
+		bnxt_reset_all(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+		return;
+	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 21/22] bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (19 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 20/22] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30  3:55 ` [PATCH net-next v2 22/22] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan
  2019-08-30 21:02 ` [PATCH net-next v2 00/22] bnxt_en: health and error recovery David Miller
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui

This call will handle fatal firmware errors by forcing a reset on the
firmware.  The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages.  From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.

The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 44 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 45 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 51cf679..5c7379e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10003,6 +10003,40 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
 	bp->ctx = NULL;
 }
 
+/* rtnl_lock is acquired before calling this function */
+static void bnxt_force_fw_reset(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 wait_dsecs;
+
+	if (!test_bit(BNXT_STATE_OPEN, &bp->state) ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	bnxt_fw_reset_close(bp);
+	wait_dsecs = fw_health->master_func_wait_dsecs;
+	if (fw_health->master) {
+		if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU)
+			wait_dsecs = 0;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_RESET_FW;
+	} else {
+		bp->fw_reset_timestamp = jiffies + wait_dsecs * HZ / 10;
+		wait_dsecs = fw_health->normal_func_wait_dsecs;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+	}
+	bp->fw_reset_max_dsecs = fw_health->post_reset_max_wait_dsecs;
+	bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+}
+
+void bnxt_fw_exception(struct bnxt *bp)
+{
+	set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+	bnxt_rtnl_lock_sp(bp);
+	bnxt_force_fw_reset(bp);
+	bnxt_rtnl_unlock_sp(bp);
+}
+
 void bnxt_fw_reset(struct bnxt *bp)
 {
 	int rc;
@@ -10506,6 +10540,16 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		return;
 	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state) &&
+		    bp->fw_health) {
+			u32 val;
+
+			val = bnxt_fw_health_readl(bp,
+						   BNXT_FW_RESET_INPROG_REG);
+			if (val)
+				netdev_warn(bp->dev, "FW reset inprog %x after min wait time.\n",
+					    val);
+		}
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index f3a6aad..3459b2a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1982,6 +1982,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+void bnxt_fw_exception(struct bnxt *bp);
 void bnxt_fw_reset(struct bnxt *bp);
 int bnxt_check_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
 		     int tx_xdp);
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH net-next v2 22/22] bnxt_en: Add FW fatal devlink_health_reporter.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (20 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 21/22] bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors Michael Chan
@ 2019-08-30  3:55 ` Michael Chan
  2019-08-30 21:02 ` [PATCH net-next v2 00/22] bnxt_en: health and error recovery David Miller
  22 siblings, 0 replies; 28+ messages in thread
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui, Jiri Pirko

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Health show command example and output:

$ devlink health show pci/0000:af:00.0 reporter fw_fatal

pci/0000:af:00.0:
  name fw_fatal
    state healthy error 1 recover 1 grace_period 0 auto_recover true

Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.

We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 80 ++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  7 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 56 ++++++++++++++++
 3 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5c7379e..f8a834f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1988,7 +1988,9 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
-	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: {
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
 		bp->fw_reset_timestamp = jiffies;
 		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
 		if (!bp->fw_reset_min_dsecs)
@@ -1996,8 +1998,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
 		if (!bp->fw_reset_max_dsecs)
 			bp->fw_reset_max_dsecs = BNXT_DFLT_FW_RST_MAX_DSECS;
+		if (EVENT_DATA1_RESET_NOTIFY_FATAL(data1)) {
+			netdev_warn(bp->dev, "Firmware fatal reset event received\n");
+			set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+		} else {
+			netdev_warn(bp->dev, "Firmware non-fatal reset event received, max wait time %d msec\n",
+				    bp->fw_reset_max_dsecs * 100);
+		}
 		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
 		break;
+	}
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
@@ -4414,6 +4424,7 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 {
 	struct hwrm_func_drv_rgtr_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_rgtr_input req = {0};
+	u32 flags;
 	int rc;
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_DRV_RGTR, -1, -1);
@@ -4423,7 +4434,11 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 			    FUNC_DRV_RGTR_REQ_ENABLES_VER);
 
 	req.os_type = cpu_to_le16(FUNC_DRV_RGTR_REQ_OS_TYPE_LINUX);
-	req.flags = cpu_to_le32(FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE);
+	flags = FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE |
+		FUNC_DRV_RGTR_REQ_FLAGS_HOT_RESET_SUPPORT;
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		flags |= FUNC_DRV_RGTR_REQ_FLAGS_ERROR_RECOVERY_SUPPORT;
+	req.flags = cpu_to_le32(flags);
 	req.ver_maj_8b = DRV_VER_MAJ;
 	req.ver_min_8b = DRV_VER_MIN;
 	req.ver_upd_8b = DRV_VER_UPD;
@@ -9926,6 +9941,38 @@ static void bnxt_tx_timeout(struct net_device *dev)
 	bnxt_queue_sp_work(bp);
 }
 
+static void bnxt_fw_health_check(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 val;
+
+	if (!fw_health || !fw_health->enabled ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	if (fw_health->tmr_counter) {
+		fw_health->tmr_counter--;
+		return;
+	}
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		goto fw_reset;
+
+	fw_health->last_fw_heartbeat = val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		goto fw_reset;
+
+	fw_health->tmr_counter = fw_health->tmr_multiplier;
+	return;
+
+fw_reset:
+	set_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event);
+	bnxt_queue_sp_work(bp);
+}
+
 static void bnxt_timer(struct timer_list *t)
 {
 	struct bnxt *bp = from_timer(bp, t, timer);
@@ -9937,6 +9984,9 @@ static void bnxt_timer(struct timer_list *t)
 	if (atomic_read(&bp->intr_sem) != 0)
 		goto bnxt_restart_timer;
 
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		bnxt_fw_health_check(bp);
+
 	if (bp->link_info.link_up && (bp->flags & BNXT_FLAG_PORT_STATS) &&
 	    bp->stats_coal_ticks) {
 		set_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event);
@@ -10003,6 +10053,26 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
 	bp->ctx = NULL;
 }
 
+static bool is_bnxt_fw_ok(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	bool no_heartbeat = false, has_reset = false;
+	u32 val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		no_heartbeat = true;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		has_reset = true;
+
+	if (!no_heartbeat && has_reset)
+		return true;
+
+	return false;
+}
+
 /* rtnl_lock is acquired before calling this function */
 static void bnxt_force_fw_reset(struct bnxt *bp)
 {
@@ -10207,6 +10277,12 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
 		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
 
+	if (test_and_clear_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event)) {
+		if (!is_bnxt_fw_ok(bp))
+			bnxt_devlink_health_report(bp,
+						   BNXT_FW_EXCEPTION_SP_EVENT);
+	}
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3459b2a..333b0a8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,11 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RESET_NOTIFY_FATAL(data1)				\
+	(((data1) &							\
+	  ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK) ==\
+	 ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL)
+
 #define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
 	!!((data1) &							\
 	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
@@ -1372,6 +1377,7 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
 	struct devlink_health_reporter *fw_reset_reporter;
+	struct devlink_health_reporter *fw_fatal_reporter;
 };
 
 struct bnxt_fw_reporter_ctx {
@@ -1728,6 +1734,7 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 #define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+#define BNXT_FW_EXCEPTION_SP_EVENT	19
 
 	struct delayed_work	fw_reset_task;
 	int			fw_reset_state;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 8512467..e664392 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -83,6 +83,31 @@ struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
 	.recover = bnxt_fw_reset_recover,
 };
 
+static int bnxt_fw_fatal_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+	struct bnxt_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+	unsigned long event;
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	event = fw_reporter_ctx->sp_event;
+	if (event == BNXT_FW_RESET_NOTIFY_SP_EVENT)
+		bnxt_fw_reset(bp);
+	else if (event == BNXT_FW_EXCEPTION_SP_EVENT)
+		bnxt_fw_exception(bp);
+
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_fatal_reporter_ops = {
+	.name = "fw_fatal",
+	.recover = bnxt_fw_fatal_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -108,6 +133,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reset_reporter));
 		health->fw_reset_reporter = NULL;
 	}
+
+	health->fw_fatal_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_fatal_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_fatal_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_fatal_reporter));
+		health->fw_fatal_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -122,6 +157,9 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reset_reporter)
 		devlink_health_reporter_destroy(health->fw_reset_reporter);
+
+	if (health->fw_fatal_reporter)
+		devlink_health_reporter_destroy(health->fw_fatal_reporter);
 }
 
 void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
@@ -135,6 +173,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 	fw_reporter_ctx.sp_event = event;
 	switch (event) {
 	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
+			if (!fw_health->fw_fatal_reporter)
+				return;
+
+			devlink_health_report(fw_health->fw_fatal_reporter,
+					      "FW fatal async event received",
+					      &fw_reporter_ctx);
+			return;
+		}
 		if (!fw_health->fw_reset_reporter)
 			return;
 
@@ -142,6 +189,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 				      "FW non-fatal reset event received",
 				      &fw_reporter_ctx);
 		return;
+
+	case BNXT_FW_EXCEPTION_SP_EVENT:
+		if (!fw_health->fw_fatal_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_fatal_reporter,
+				      "FW fatal error reported",
+				      &fw_reporter_ctx);
+		return;
 	}
 }
 
-- 
2.5.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset.
  2019-08-30  3:54 ` [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset Michael Chan
@ 2019-08-30 17:56   ` kbuild test robot
  0 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2019-08-30 17:56 UTC (permalink / raw)
  To: Michael Chan; +Cc: kbuild-all, davem, netdev, vasundhara-v.volam, ray.jui

[-- Attachment #1: Type: text/plain, Size: 2764 bytes --]

Hi Michael,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Michael-Chan/bnxt_en-health-and-error-recovery/20190830-164635
config: x86_64-randconfig-f001-201934 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-11) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_fw_reset':
>> drivers/net/ethernet/broadcom/bnxt/bnxt.c:10021:35: error: 'struct bnxt' has no member named 'sriov_cfg'
       if (bp->pf.registered_vfs || bp->sriov_cfg) {
                                      ^~
   drivers/net/ethernet/broadcom/bnxt/bnxt.c: In function 'bnxt_fw_reset_task':
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:10418:34: error: 'struct bnxt' has no member named 'sriov_cfg'
      if (bp->pf.registered_vfs || bp->sriov_cfg) {
                                     ^~
   drivers/net/ethernet/broadcom/bnxt/bnxt.c:10425:11: error: 'struct bnxt' has no member named 'sriov_cfg'
            bp->sriov_cfg);
              ^~

vim +10021 drivers/net/ethernet/broadcom/bnxt/bnxt.c

 10003	
 10004	void bnxt_fw_reset(struct bnxt *bp)
 10005	{
 10006		int rc;
 10007	
 10008		bnxt_rtnl_lock_sp(bp);
 10009		if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
 10010		    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
 10011			set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
 10012			if (BNXT_PF(bp) && bp->pf.active_vfs) {
 10013				rc = bnxt_hwrm_func_qcfg(bp);
 10014				if (rc) {
 10015					netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
 10016						   rc);
 10017					clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
 10018					dev_close(bp->dev);
 10019					goto fw_reset_exit;
 10020				}
 10021				if (bp->pf.registered_vfs || bp->sriov_cfg) {
 10022					u16 vf_tmo_dsecs = bp->pf.registered_vfs * 10;
 10023	
 10024					if (bp->fw_reset_max_dsecs < vf_tmo_dsecs)
 10025						bp->fw_reset_max_dsecs = vf_tmo_dsecs;
 10026					bp->fw_reset_state =
 10027						BNXT_FW_RESET_STATE_POLL_VF;
 10028					bnxt_queue_fw_reset_work(bp, HZ / 10);
 10029					goto fw_reset_exit;
 10030				}
 10031			}
 10032			bnxt_fw_reset_close(bp);
 10033			bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
 10034			bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 10035		}
 10036	fw_reset_exit:
 10037		bnxt_rtnl_unlock_sp(bp);
 10038	}
 10039	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33761 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next v2 00/22] bnxt_en: health and error recovery.
  2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
                   ` (21 preceding siblings ...)
  2019-08-30  3:55 ` [PATCH net-next v2 22/22] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan
@ 2019-08-30 21:02 ` David Miller
  22 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2019-08-30 21:02 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, ray.jui

From: Michael Chan <michael.chan@broadcom.com>
Date: Thu, 29 Aug 2019 23:54:43 -0400

> This patchset implements adapter health and error recovery.  The status
> is reported through several devlink reporters and the driver will
> initiate and complete the recovery process using the devlink infrastructure.
> 
> v2: Added 4 patches at the beginning of the patchset to clean up error code
>     handling related to firmware messages and to convert to use standard
>     error codes.
> 
>     Removed the dropping of rtnl_lock in bnxt_close().
> 
>     Broke up the patches some more for better patch organization and
>     future bisection.

The return value handling looks a lot better now, thanks for cleaning that
up.

Series applied, thanks Michael.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter
  2019-08-30  3:54 ` [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter Michael Chan
@ 2019-10-07  9:56   ` Jiri Pirko
  2019-10-09  4:55     ` Vasundhara Volam
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Pirko @ 2019-10-07  9:56 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, netdev, vasundhara-v.volam, ray.jui, Jiri Pirko, ayal, moshe

Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@broadcom.com wrote:
>From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
>
>Create new FW devlink_health_reporter, to know the current health
>status of FW.
>
>Command example and output:
>$ devlink health show pci/0000:af:00.0 reporter fw
>
>pci/0000:af:00.0:
>  name fw
>    state healthy error 0 recover 0
>
> FW status: Healthy; Reset count: 1

I'm puzzled how did you get this output, since you put "FW status" into
"diagnose" callback fmsg and that is called upon "devlink health diagnose".

[...]


>+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
>+				     struct devlink_fmsg *fmsg)
>+{
>+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
>+	struct bnxt_fw_health *health = bp->fw_health;
>+	u32 val, health_status;
>+	int rc;
>+
>+	if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
>+		return 0;
>+
>+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
>+	health_status = val & 0xffff;
>+
>+	if (health_status == BNXT_FW_STATUS_HEALTHY) {
>+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>+						  "Healthy;");

First of all, the ";" is just wrong. You should put plain string if
anything. You are trying to format user output here. Don't do that
please.

Please see json output:
$ devlink health show pci/0000:af:00.0 reporter fw -j -p

Please remove ";" from the strings.


Second, I do not understand why you need this "FW status" at all. The
reporter itself has state healthy/error:
pci/0000:af:00.0:
  name fw
    state healthy error 0 recover 0
          ^^^^^^^

"FW" is redundant of course as the reporter name is "fw".

Please remove "FW status" and replace with some pair indicating the
actual error state.

In mlx5 they call it "Description".


>+		if (rc)
>+			return rc;
>+	} else if (health_status < BNXT_FW_STATUS_HEALTHY) {
>+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>+						  "Not yet completed initialization;");
>+		if (rc)
>+			return rc;
>+	} else if (health_status > BNXT_FW_STATUS_HEALTHY) {
>+		rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>+						  "Encountered fatal error and cannot recover;");
>+		if (rc)
>+			return rc;
>+	}
>+
>+	if (val >> 16) {
>+		rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);

Perhaps rather call this "Error code"?


>+		if (rc)
>+			return rc;
>+	}
>+
>+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
>+	rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);

What is this counter counting? Number of recoveries?
If so, that is also already counted internally by devlink.

[...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter
  2019-10-07  9:56   ` Jiri Pirko
@ 2019-10-09  4:55     ` Vasundhara Volam
  2019-10-09  7:04       ` Jiri Pirko
  0 siblings, 1 reply; 28+ messages in thread
From: Vasundhara Volam @ 2019-10-09  4:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Michael Chan, David Miller, Netdev, Ray Jui, Jiri Pirko, ayal,
	Moshe Shemesh

On Mon, Oct 7, 2019 at 3:26 PM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@broadcom.com wrote:
> >From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
> >
> >Create new FW devlink_health_reporter, to know the current health
> >status of FW.
> >
> >Command example and output:
> >$ devlink health show pci/0000:af:00.0 reporter fw
> >
> >pci/0000:af:00.0:
> >  name fw
> >    state healthy error 0 recover 0
> >
> > FW status: Healthy; Reset count: 1
>
> I'm puzzled how did you get this output, since you put "FW status" into
> "diagnose" callback fmsg and that is called upon "devlink health diagnose".
>
> [...]
Jiri, you are right last line is output of diagnose command. Command
is missing here.

$ devlink health diagnose pci/0000:af:00.0 reporter fw
 FW status: Healthy; Reset count: 0

>
> >+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
> >+                                   struct devlink_fmsg *fmsg)
> >+{
> >+      struct bnxt *bp = devlink_health_reporter_priv(reporter);
> >+      struct bnxt_fw_health *health = bp->fw_health;
> >+      u32 val, health_status;
> >+      int rc;
> >+
> >+      if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
> >+              return 0;
> >+
> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
> >+      health_status = val & 0xffff;
> >+
> >+      if (health_status == BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Healthy;");
>
> First of all, the ";" is just wrong. You should put plain string if
> anything. You are trying to format user output here. Don't do that
> please.
>
> Please see json output:
> $ devlink health show pci/0000:af:00.0 reporter fw -j -p
>
> Please remove ";" from the strings.
Okay, I will send a patch for removing ";"

>
>
> Second, I do not understand why you need this "FW status" at all. The
> reporter itself has state healthy/error:
> pci/0000:af:00.0:
>   name fw
>     state healthy error 0 recover 0
>           ^^^^^^^
>
> "FW" is redundant of course as the reporter name is "fw".
>
> Please remove "FW status" and replace with some pair indicating the
> actual error state.
Okay, I can rename to "Status description" so that "FW" name will not
be repeated.

>
> In mlx5 they call it "Description".
>
>
> >+              if (rc)
> >+                      return rc;
> >+      } else if (health_status < BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Not yet completed initialization;");
> >+              if (rc)
> >+                      return rc;
> >+      } else if (health_status > BNXT_FW_STATUS_HEALTHY) {
> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
> >+                                                "Encountered fatal error and cannot recover;");
> >+              if (rc)
> >+                      return rc;
> >+      }
> >+
> >+      if (val >> 16) {
> >+              rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
>
> Perhaps rather call this "Error code"?
Okay.

>
>
> >+              if (rc)
> >+                      return rc;
> >+      }
> >+
> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
> >+      rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
>
> What is this counter counting? Number of recoveries?
> If so, that is also already counted internally by devlink.
"Reset count" is the counter that displays the number of times
firmware has gone for
reset through different mechanisms and devlink is one of it. Firmware
could have gone
for a reset through other tools as well.

Driver gets the information from firmware health register, when
diagnose command is invoked.

>
> [...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter
  2019-10-09  4:55     ` Vasundhara Volam
@ 2019-10-09  7:04       ` Jiri Pirko
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Pirko @ 2019-10-09  7:04 UTC (permalink / raw)
  To: Vasundhara Volam
  Cc: Michael Chan, David Miller, Netdev, Ray Jui, Jiri Pirko, ayal,
	Moshe Shemesh

Wed, Oct 09, 2019 at 06:55:45AM CEST, vasundhara-v.volam@broadcom.com wrote:
>On Mon, Oct 7, 2019 at 3:26 PM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Fri, Aug 30, 2019 at 05:54:57AM CEST, michael.chan@broadcom.com wrote:
>> >From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
>> >
>> >Create new FW devlink_health_reporter, to know the current health
>> >status of FW.
>> >
>> >Command example and output:
>> >$ devlink health show pci/0000:af:00.0 reporter fw
>> >
>> >pci/0000:af:00.0:
>> >  name fw
>> >    state healthy error 0 recover 0
>> >
>> > FW status: Healthy; Reset count: 1
>>
>> I'm puzzled how did you get this output, since you put "FW status" into
>> "diagnose" callback fmsg and that is called upon "devlink health diagnose".
>>
>> [...]
>Jiri, you are right last line is output of diagnose command. Command
>is missing here.
>
>$ devlink health diagnose pci/0000:af:00.0 reporter fw
> FW status: Healthy; Reset count: 0
>
>>
>> >+static int bnxt_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
>> >+                                   struct devlink_fmsg *fmsg)
>> >+{
>> >+      struct bnxt *bp = devlink_health_reporter_priv(reporter);
>> >+      struct bnxt_fw_health *health = bp->fw_health;
>> >+      u32 val, health_status;
>> >+      int rc;
>> >+
>> >+      if (!health || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
>> >+              return 0;
>> >+
>> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG);
>> >+      health_status = val & 0xffff;
>> >+
>> >+      if (health_status == BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Healthy;");
>>
>> First of all, the ";" is just wrong. You should put plain string if
>> anything. You are trying to format user output here. Don't do that
>> please.
>>
>> Please see json output:
>> $ devlink health show pci/0000:af:00.0 reporter fw -j -p
>>
>> Please remove ";" from the strings.
>Okay, I will send a patch for removing ";"
>
>>
>>
>> Second, I do not understand why you need this "FW status" at all. The
>> reporter itself has state healthy/error:
>> pci/0000:af:00.0:
>>   name fw
>>     state healthy error 0 recover 0
>>           ^^^^^^^
>>
>> "FW" is redundant of course as the reporter name is "fw".
>>
>> Please remove "FW status" and replace with some pair indicating the
>> actual error state.
>Okay, I can rename to "Status description" so that "FW" name will not
>be repeated.

Also please remove the redundant "Healthy" information. That is an
attribute of the reporter itself.


>
>>
>> In mlx5 they call it "Description".
>>
>>
>> >+              if (rc)
>> >+                      return rc;
>> >+      } else if (health_status < BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Not yet completed initialization;");
>> >+              if (rc)
>> >+                      return rc;
>> >+      } else if (health_status > BNXT_FW_STATUS_HEALTHY) {
>> >+              rc = devlink_fmsg_string_pair_put(fmsg, "FW status",
>> >+                                                "Encountered fatal error and cannot recover;");
>> >+              if (rc)
>> >+                      return rc;
>> >+      }
>> >+
>> >+      if (val >> 16) {
>> >+              rc = devlink_fmsg_u32_pair_put(fmsg, "Error", val >> 16);
>>
>> Perhaps rather call this "Error code"?
>Okay.
>
>>
>>
>> >+              if (rc)
>> >+                      return rc;
>> >+      }
>> >+
>> >+      val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
>> >+      rc = devlink_fmsg_u32_pair_put(fmsg, "Reset count", val);
>>
>> What is this counter counting? Number of recoveries?
>> If so, that is also already counted internally by devlink.
>"Reset count" is the counter that displays the number of times
>firmware has gone for
>reset through different mechanisms and devlink is one of it. Firmware
>could have gone
>for a reset through other tools as well.
>
>Driver gets the information from firmware health register, when
>diagnose command is invoked.

okay, sounds sane.
Thanks!

>
>>
>> [...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-10-09  7:04 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-30  3:54 [PATCH net-next v2 00/22] bnxt_en: health and error recovery Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 01/22] bnxt_en: Use a common function to print the same ethtool -f error message Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 02/22] bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg() Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 03/22] bnxt_en: Convert error code in firmware message response to standard code Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 04/22] bnxt_en: Simplify error checking in the SR-IOV message forwarding functions Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 05/22] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 06/22] bnxt_en: Prepare bnxt_init_one() to be called multiple times Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 07/22] bnxt_en: Refactor bnxt_sriov_enable() Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 08/22] bnxt_en: Register buffers for VFs before reserving resources Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 09/22] bnxt_en: Handle firmware reset status during IF_UP Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 10/22] bnxt_en: Discover firmware error recovery capabilities Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 11/22] bnxt_en: Pre-map the firmware health monitoring registers Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 12/22] bnxt_en: Enable health monitoring Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 13/22] bnxt_en: Add BNXT_STATE_IN_FW_RESET state Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 14/22] bnxt_en: Add new FW devlink_health_reporter Michael Chan
2019-10-07  9:56   ` Jiri Pirko
2019-10-09  4:55     ` Vasundhara Volam
2019-10-09  7:04       ` Jiri Pirko
2019-08-30  3:54 ` [PATCH net-next v2 15/22] bnxt_en: Handle RESET_NOTIFY async event from firmware Michael Chan
2019-08-30  3:54 ` [PATCH net-next v2 16/22] bnxt_en: Handle firmware reset Michael Chan
2019-08-30 17:56   ` kbuild test robot
2019-08-30  3:55 ` [PATCH net-next v2 17/22] bnxt_en: Add devlink health reset reporter Michael Chan
2019-08-30  3:55 ` [PATCH net-next v2 18/22] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event Michael Chan
2019-08-30  3:55 ` [PATCH net-next v2 19/22] bnxt_en: Do not send firmware messages if firmware is in error state Michael Chan
2019-08-30  3:55 ` [PATCH net-next v2 20/22] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task() Michael Chan
2019-08-30  3:55 ` [PATCH net-next v2 21/22] bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors Michael Chan
2019-08-30  3:55 ` [PATCH net-next v2 22/22] bnxt_en: Add FW fatal devlink_health_reporter Michael Chan
2019-08-30 21:02 ` [PATCH net-next v2 00/22] bnxt_en: health and error recovery David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.