All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
@ 2023-02-09 19:06 Tony Nguyen
  2023-02-09 19:06 ` [PATCH net-next 1/5] ice: remove FW logging code Tony Nguyen
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:06 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Tony Nguyen, netdev, paul.m.stillwell.jr, jacob.e.keller, jiri, idosch

Paul Stillwell says:

FW log support was added to the ice driver, but that version is no
longer supported. There is a newer version of FW logging (v2) that
adds more control knobs to get the exact data out of the FW
for debugging.

Additionally, instead of dumping the FW log output to syslog,
dump it to debugfs. The FW log data is really just binary
data that the FW log team decodes to determine what happens so the
translation from binary to some text output just slows things down
and results in potential dropped data. The structure for the debugfs
entry is: /sys/kernel/debug/ice/<pci device>/fwlog

Once enabled the FW log data is received as ARQ events that the driver
processes.

The FW logging is across all the PFs on the device, so restrict the
commands to only PF0.

The following new device parameters are added:
- fwlog_supported (read-only): does the FW support FW logging
- fwlog_enabled (read/write): is FW logging currently running
- fwlog_level (read/write): the log level enabled, valid values are
    Each level includes the messages from the previous/lower level
	0 - no logging
	1 - error logging
	2 - warning logging
	3 - normal logging
	4 - verbose logging
- fwlog_resolution (read/write): the number of log messages to included
  in a single ARQ event. The range is 1-128 (1 means push every log
  message, 128 means push only when the max AQ command buffer is full).
  The suggested value is 10.

This patch series adds the following set of devlink commands:

devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime
---
Previous discussion:
https://lore.kernel.org/netdev/fea3e7bc-db75-ce15-1330-d80483267ee2@intel.com/

The following are changes since commit 5131a053f2927158fb42880c69b5dc0d2e28ddee:
  Merge tag 'linux-can-next-for-6.3-20230208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE

Paul M Stillwell Jr (5):
  ice: remove FW logging code
  ice: enable devlink to check FW logging status
  ice: add ability to query/set FW log level and resolution
  ice: disable FW logging on driver unload
  ice: use debugfs to output FW log data

 Documentation/networking/devlink/ice.rst      |  39 ++
 drivers/net/ethernet/intel/ice/Makefile       |   5 +-
 drivers/net/ethernet/intel/ice/ice.h          |  22 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 160 ++++----
 drivers/net/ethernet/intel/ice/ice_common.c   | 218 +----------
 drivers/net/ethernet/intel/ice/ice_common.h   |   1 -
 drivers/net/ethernet/intel/ice/ice_debugfs.c  | 116 ++++++
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 181 ++++++++-
 drivers/net/ethernet/intel/ice/ice_fwlog.c    | 367 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_fwlog.h    |  57 +++
 drivers/net/ethernet/intel/ice/ice_main.c     | 101 ++++-
 drivers/net/ethernet/intel/ice/ice_type.h     |  23 +-
 12 files changed, 973 insertions(+), 317 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_debugfs.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_fwlog.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_fwlog.h

-- 
2.38.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/5] ice: remove FW logging code
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
@ 2023-02-09 19:06 ` Tony Nguyen
  2023-02-09 19:06 ` [PATCH net-next 2/5] ice: enable devlink to check FW logging status Tony Nguyen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:06 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul M Stillwell Jr, netdev, anthony.l.nguyen, jacob.e.keller,
	jiri, idosch, Gurucharan G

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

The FW logging code doesn't work because there is no way to set
cq_ena or uart_ena so remove the code. This code is the original
(v1) way of FW logging so it should be replaced with the v2 way.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  78 -------
 drivers/net/ethernet/intel/ice/ice_common.c   | 218 ------------------
 drivers/net/ethernet/intel/ice/ice_common.h   |   1 -
 drivers/net/ethernet/intel/ice/ice_main.c     |   3 -
 drivers/net/ethernet/intel/ice/ice_type.h     |  20 --
 5 files changed, 320 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 838d9b274d68..e1b95cc3a538 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1953,78 +1953,6 @@ struct ice_aqc_add_rdma_qset_data {
 	struct ice_aqc_add_tx_rdma_qset_entry rdma_qsets[];
 };
 
-/* Configure Firmware Logging Command (indirect 0xFF09)
- * Logging Information Read Response (indirect 0xFF10)
- * Note: The 0xFF10 command has no input parameters.
- */
-struct ice_aqc_fw_logging {
-	u8 log_ctrl;
-#define ICE_AQC_FW_LOG_AQ_EN		BIT(0)
-#define ICE_AQC_FW_LOG_UART_EN		BIT(1)
-	u8 rsvd0;
-	u8 log_ctrl_valid; /* Not used by 0xFF10 Response */
-#define ICE_AQC_FW_LOG_AQ_VALID		BIT(0)
-#define ICE_AQC_FW_LOG_UART_VALID	BIT(1)
-	u8 rsvd1[5];
-	__le32 addr_high;
-	__le32 addr_low;
-};
-
-enum ice_aqc_fw_logging_mod {
-	ICE_AQC_FW_LOG_ID_GENERAL = 0,
-	ICE_AQC_FW_LOG_ID_CTRL,
-	ICE_AQC_FW_LOG_ID_LINK,
-	ICE_AQC_FW_LOG_ID_LINK_TOPO,
-	ICE_AQC_FW_LOG_ID_DNL,
-	ICE_AQC_FW_LOG_ID_I2C,
-	ICE_AQC_FW_LOG_ID_SDP,
-	ICE_AQC_FW_LOG_ID_MDIO,
-	ICE_AQC_FW_LOG_ID_ADMINQ,
-	ICE_AQC_FW_LOG_ID_HDMA,
-	ICE_AQC_FW_LOG_ID_LLDP,
-	ICE_AQC_FW_LOG_ID_DCBX,
-	ICE_AQC_FW_LOG_ID_DCB,
-	ICE_AQC_FW_LOG_ID_NETPROXY,
-	ICE_AQC_FW_LOG_ID_NVM,
-	ICE_AQC_FW_LOG_ID_AUTH,
-	ICE_AQC_FW_LOG_ID_VPD,
-	ICE_AQC_FW_LOG_ID_IOSF,
-	ICE_AQC_FW_LOG_ID_PARSER,
-	ICE_AQC_FW_LOG_ID_SW,
-	ICE_AQC_FW_LOG_ID_SCHEDULER,
-	ICE_AQC_FW_LOG_ID_TXQ,
-	ICE_AQC_FW_LOG_ID_RSVD,
-	ICE_AQC_FW_LOG_ID_POST,
-	ICE_AQC_FW_LOG_ID_WATCHDOG,
-	ICE_AQC_FW_LOG_ID_TASK_DISPATCH,
-	ICE_AQC_FW_LOG_ID_MNG,
-	ICE_AQC_FW_LOG_ID_MAX,
-};
-
-/* Defines for both above FW logging command/response buffers */
-#define ICE_AQC_FW_LOG_ID_S		0
-#define ICE_AQC_FW_LOG_ID_M		(0xFFF << ICE_AQC_FW_LOG_ID_S)
-
-#define ICE_AQC_FW_LOG_CONF_SUCCESS	0	/* Used by response */
-#define ICE_AQC_FW_LOG_CONF_BAD_INDX	BIT(12)	/* Used by response */
-
-#define ICE_AQC_FW_LOG_EN_S		12
-#define ICE_AQC_FW_LOG_EN_M		(0xF << ICE_AQC_FW_LOG_EN_S)
-#define ICE_AQC_FW_LOG_INFO_EN		BIT(12)	/* Used by command */
-#define ICE_AQC_FW_LOG_INIT_EN		BIT(13)	/* Used by command */
-#define ICE_AQC_FW_LOG_FLOW_EN		BIT(14)	/* Used by command */
-#define ICE_AQC_FW_LOG_ERR_EN		BIT(15)	/* Used by command */
-
-/* Get/Clear FW Log (indirect 0xFF11) */
-struct ice_aqc_get_clear_fw_log {
-	u8 flags;
-#define ICE_AQC_FW_LOG_CLEAR		BIT(0)
-#define ICE_AQC_FW_LOG_MORE_DATA_AVAIL	BIT(1)
-	u8 rsvd1[7];
-	__le32 addr_high;
-	__le32 addr_low;
-};
-
 /* Download Package (indirect 0x0C40) */
 /* Also used for Update Package (indirect 0x0C41 and 0x0C42) */
 struct ice_aqc_download_pkg {
@@ -2184,8 +2112,6 @@ struct ice_aq_desc {
 		struct ice_aqc_add_rdma_qset add_rdma_qset;
 		struct ice_aqc_add_get_update_free_vsi vsi_cmd;
 		struct ice_aqc_add_update_free_vsi_resp add_update_free_vsi_res;
-		struct ice_aqc_fw_logging fw_logging;
-		struct ice_aqc_get_clear_fw_log get_clear_fw_log;
 		struct ice_aqc_download_pkg download_pkg;
 		struct ice_aqc_driver_shared_params drv_shared_params;
 		struct ice_aqc_set_mac_lb set_mac_lb;
@@ -2368,10 +2294,6 @@ enum ice_adminq_opc {
 
 	/* Standalone Commands/Events */
 	ice_aqc_opc_event_lan_overflow			= 0x1001,
-
-	/* debug commands */
-	ice_aqc_opc_fw_logging				= 0xFF09,
-	ice_aqc_opc_fw_logging_info			= 0xFF10,
 };
 
 #endif /* _ICE_ADMINQ_CMD_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 75f8b8bca7e8..21d415d29387 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -797,217 +797,6 @@ static void ice_cleanup_fltr_mgmt_struct(struct ice_hw *hw)
 	devm_kfree(ice_hw_to_dev(hw), sw);
 }
 
-/**
- * ice_get_fw_log_cfg - get FW logging configuration
- * @hw: pointer to the HW struct
- */
-static int ice_get_fw_log_cfg(struct ice_hw *hw)
-{
-	struct ice_aq_desc desc;
-	__le16 *config;
-	int status;
-	u16 size;
-
-	size = sizeof(*config) * ICE_AQC_FW_LOG_ID_MAX;
-	config = devm_kzalloc(ice_hw_to_dev(hw), size, GFP_KERNEL);
-	if (!config)
-		return -ENOMEM;
-
-	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_fw_logging_info);
-
-	status = ice_aq_send_cmd(hw, &desc, config, size, NULL);
-	if (!status) {
-		u16 i;
-
-		/* Save FW logging information into the HW structure */
-		for (i = 0; i < ICE_AQC_FW_LOG_ID_MAX; i++) {
-			u16 v, m, flgs;
-
-			v = le16_to_cpu(config[i]);
-			m = (v & ICE_AQC_FW_LOG_ID_M) >> ICE_AQC_FW_LOG_ID_S;
-			flgs = (v & ICE_AQC_FW_LOG_EN_M) >> ICE_AQC_FW_LOG_EN_S;
-
-			if (m < ICE_AQC_FW_LOG_ID_MAX)
-				hw->fw_log.evnts[m].cur = flgs;
-		}
-	}
-
-	devm_kfree(ice_hw_to_dev(hw), config);
-
-	return status;
-}
-
-/**
- * ice_cfg_fw_log - configure FW logging
- * @hw: pointer to the HW struct
- * @enable: enable certain FW logging events if true, disable all if false
- *
- * This function enables/disables the FW logging via Rx CQ events and a UART
- * port based on predetermined configurations. FW logging via the Rx CQ can be
- * enabled/disabled for individual PF's. However, FW logging via the UART can
- * only be enabled/disabled for all PFs on the same device.
- *
- * To enable overall FW logging, the "cq_en" and "uart_en" enable bits in
- * hw->fw_log need to be set accordingly, e.g. based on user-provided input,
- * before initializing the device.
- *
- * When re/configuring FW logging, callers need to update the "cfg" elements of
- * the hw->fw_log.evnts array with the desired logging event configurations for
- * modules of interest. When disabling FW logging completely, the callers can
- * just pass false in the "enable" parameter. On completion, the function will
- * update the "cur" element of the hw->fw_log.evnts array with the resulting
- * logging event configurations of the modules that are being re/configured. FW
- * logging modules that are not part of a reconfiguration operation retain their
- * previous states.
- *
- * Before resetting the device, it is recommended that the driver disables FW
- * logging before shutting down the control queue. When disabling FW logging
- * ("enable" = false), the latest configurations of FW logging events stored in
- * hw->fw_log.evnts[] are not overridden to allow them to be reconfigured after
- * a device reset.
- *
- * When enabling FW logging to emit log messages via the Rx CQ during the
- * device's initialization phase, a mechanism alternative to interrupt handlers
- * needs to be used to extract FW log messages from the Rx CQ periodically and
- * to prevent the Rx CQ from being full and stalling other types of control
- * messages from FW to SW. Interrupts are typically disabled during the device's
- * initialization phase.
- */
-static int ice_cfg_fw_log(struct ice_hw *hw, bool enable)
-{
-	struct ice_aqc_fw_logging *cmd;
-	u16 i, chgs = 0, len = 0;
-	struct ice_aq_desc desc;
-	__le16 *data = NULL;
-	u8 actv_evnts = 0;
-	void *buf = NULL;
-	int status = 0;
-
-	if (!hw->fw_log.cq_en && !hw->fw_log.uart_en)
-		return 0;
-
-	/* Disable FW logging only when the control queue is still responsive */
-	if (!enable &&
-	    (!hw->fw_log.actv_evnts || !ice_check_sq_alive(hw, &hw->adminq)))
-		return 0;
-
-	/* Get current FW log settings */
-	status = ice_get_fw_log_cfg(hw);
-	if (status)
-		return status;
-
-	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_fw_logging);
-	cmd = &desc.params.fw_logging;
-
-	/* Indicate which controls are valid */
-	if (hw->fw_log.cq_en)
-		cmd->log_ctrl_valid |= ICE_AQC_FW_LOG_AQ_VALID;
-
-	if (hw->fw_log.uart_en)
-		cmd->log_ctrl_valid |= ICE_AQC_FW_LOG_UART_VALID;
-
-	if (enable) {
-		/* Fill in an array of entries with FW logging modules and
-		 * logging events being reconfigured.
-		 */
-		for (i = 0; i < ICE_AQC_FW_LOG_ID_MAX; i++) {
-			u16 val;
-
-			/* Keep track of enabled event types */
-			actv_evnts |= hw->fw_log.evnts[i].cfg;
-
-			if (hw->fw_log.evnts[i].cfg == hw->fw_log.evnts[i].cur)
-				continue;
-
-			if (!data) {
-				data = devm_kcalloc(ice_hw_to_dev(hw),
-						    ICE_AQC_FW_LOG_ID_MAX,
-						    sizeof(*data),
-						    GFP_KERNEL);
-				if (!data)
-					return -ENOMEM;
-			}
-
-			val = i << ICE_AQC_FW_LOG_ID_S;
-			val |= hw->fw_log.evnts[i].cfg << ICE_AQC_FW_LOG_EN_S;
-			data[chgs++] = cpu_to_le16(val);
-		}
-
-		/* Only enable FW logging if at least one module is specified.
-		 * If FW logging is currently enabled but all modules are not
-		 * enabled to emit log messages, disable FW logging altogether.
-		 */
-		if (actv_evnts) {
-			/* Leave if there is effectively no change */
-			if (!chgs)
-				goto out;
-
-			if (hw->fw_log.cq_en)
-				cmd->log_ctrl |= ICE_AQC_FW_LOG_AQ_EN;
-
-			if (hw->fw_log.uart_en)
-				cmd->log_ctrl |= ICE_AQC_FW_LOG_UART_EN;
-
-			buf = data;
-			len = sizeof(*data) * chgs;
-			desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
-		}
-	}
-
-	status = ice_aq_send_cmd(hw, &desc, buf, len, NULL);
-	if (!status) {
-		/* Update the current configuration to reflect events enabled.
-		 * hw->fw_log.cq_en and hw->fw_log.uart_en indicate if the FW
-		 * logging mode is enabled for the device. They do not reflect
-		 * actual modules being enabled to emit log messages. So, their
-		 * values remain unchanged even when all modules are disabled.
-		 */
-		u16 cnt = enable ? chgs : (u16)ICE_AQC_FW_LOG_ID_MAX;
-
-		hw->fw_log.actv_evnts = actv_evnts;
-		for (i = 0; i < cnt; i++) {
-			u16 v, m;
-
-			if (!enable) {
-				/* When disabling all FW logging events as part
-				 * of device's de-initialization, the original
-				 * configurations are retained, and can be used
-				 * to reconfigure FW logging later if the device
-				 * is re-initialized.
-				 */
-				hw->fw_log.evnts[i].cur = 0;
-				continue;
-			}
-
-			v = le16_to_cpu(data[i]);
-			m = (v & ICE_AQC_FW_LOG_ID_M) >> ICE_AQC_FW_LOG_ID_S;
-			hw->fw_log.evnts[m].cur = hw->fw_log.evnts[m].cfg;
-		}
-	}
-
-out:
-	if (data)
-		devm_kfree(ice_hw_to_dev(hw), data);
-
-	return status;
-}
-
-/**
- * ice_output_fw_log
- * @hw: pointer to the HW struct
- * @desc: pointer to the AQ message descriptor
- * @buf: pointer to the buffer accompanying the AQ message
- *
- * Formats a FW Log message and outputs it via the standard driver logs.
- */
-void ice_output_fw_log(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf)
-{
-	ice_debug(hw, ICE_DBG_FW_LOG, "[ FW Log Msg Start ]\n");
-	ice_debug_array(hw, ICE_DBG_FW_LOG, 16, 1, (u8 *)buf,
-			le16_to_cpu(desc->datalen));
-	ice_debug(hw, ICE_DBG_FW_LOG, "[ FW Log Msg End ]\n");
-}
-
 /**
  * ice_get_itr_intrl_gran
  * @hw: pointer to the HW struct
@@ -1065,11 +854,6 @@ int ice_init_hw(struct ice_hw *hw)
 	if (status)
 		goto err_unroll_cqinit;
 
-	/* Enable FW logging. Not fatal if this fails. */
-	status = ice_cfg_fw_log(hw, true);
-	if (status)
-		ice_debug(hw, ICE_DBG_INIT, "Failed to enable FW logging.\n");
-
 	status = ice_clear_pf_cfg(hw);
 	if (status)
 		goto err_unroll_cqinit;
@@ -1219,8 +1003,6 @@ void ice_deinit_hw(struct ice_hw *hw)
 	ice_free_hw_tbls(hw);
 	mutex_destroy(&hw->tnl_lock);
 
-	/* Attempt to disable FW logging before shutting down control queues */
-	ice_cfg_fw_log(hw, false);
 	ice_destroy_all_ctrlq(hw);
 
 	/* Clear VSI contexts if not already cleared */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 98aa8d124730..1c558ad8d35c 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -188,7 +188,6 @@ ice_ena_vsi_txq(struct ice_port_info *pi, u16 vsi_handle, u8 tc, u16 q_handle,
 		struct ice_sq_cd *cd);
 int ice_replay_vsi(struct ice_hw *hw, u16 vsi_handle);
 void ice_replay_post(struct ice_hw *hw);
-void ice_output_fw_log(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf);
 struct ice_q_ctx *
 ice_get_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 q_handle);
 int ice_sbq_rw_reg(struct ice_hw *hw, struct ice_sbq_msg_input *in);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b71ff7e2ebf2..055849dc608a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1485,9 +1485,6 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
 			if (!ice_is_malicious_vf(pf, &event, i, pending))
 				ice_vc_process_vf_msg(pf, &event);
 			break;
-		case ice_aqc_opc_fw_logging:
-			ice_output_fw_log(hw, &event.desc, event.msg_buf);
-			break;
 		case ice_aqc_opc_lldp_set_mib_change:
 			ice_dcb_process_lldp_set_mib_change(pf, &event);
 			break;
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index e3f622cad425..126605b7eb3b 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -727,24 +727,6 @@ struct ice_switch_info {
 	DECLARE_BITMAP(prof_res_bm[ICE_MAX_NUM_PROFILES], ICE_MAX_FV_WORDS);
 };
 
-/* FW logging configuration */
-struct ice_fw_log_evnt {
-	u8 cfg : 4;	/* New event enables to configure */
-	u8 cur : 4;	/* Current/active event enables */
-};
-
-struct ice_fw_log_cfg {
-	u8 cq_en : 1;    /* FW logging is enabled via the control queue */
-	u8 uart_en : 1;  /* FW logging is enabled via UART for all PFs */
-	u8 actv_evnts;   /* Cumulation of currently enabled log events */
-
-#define ICE_FW_LOG_EVNT_INFO	(ICE_AQC_FW_LOG_INFO_EN >> ICE_AQC_FW_LOG_EN_S)
-#define ICE_FW_LOG_EVNT_INIT	(ICE_AQC_FW_LOG_INIT_EN >> ICE_AQC_FW_LOG_EN_S)
-#define ICE_FW_LOG_EVNT_FLOW	(ICE_AQC_FW_LOG_FLOW_EN >> ICE_AQC_FW_LOG_EN_S)
-#define ICE_FW_LOG_EVNT_ERR	(ICE_AQC_FW_LOG_ERR_EN >> ICE_AQC_FW_LOG_EN_S)
-	struct ice_fw_log_evnt evnts[ICE_AQC_FW_LOG_ID_MAX];
-};
-
 /* Enum defining the different states of the mailbox snapshot in the
  * PF-VF mailbox overflow detection algorithm. The snapshot can be in
  * states:
@@ -877,8 +859,6 @@ struct ice_hw {
 	u8 fw_patch;		/* firmware patch version */
 	u32 fw_build;		/* firmware build number */
 
-	struct ice_fw_log_cfg fw_log;
-
 /* Device max aggregate bandwidths corresponding to the GL_PWR_MODE_CTL
  * register. Used for determining the ITR/INTRL granularity during
  * initialization.
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 2/5] ice: enable devlink to check FW logging status
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
  2023-02-09 19:06 ` [PATCH net-next 1/5] ice: remove FW logging code Tony Nguyen
@ 2023-02-09 19:06 ` Tony Nguyen
  2023-02-09 19:07 ` [PATCH net-next 3/5] ice: add ability to query/set FW log level and resolution Tony Nguyen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:06 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul M Stillwell Jr, netdev, anthony.l.nguyen, jacob.e.keller,
	jiri, idosch, Gurucharan G

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

Users want the ability to debug FW issues by retrieving the
FW logs from the E8xx devices. Enable devlink to query the driver
to see if the NVM image allows FW logging and to see if FW
logging is currently running. The set command is not supported
at this time.

This is the beginning of the v2 for FW logging.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/Makefile       |   3 +-
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  81 ++++++++++++
 drivers/net/ethernet/intel/ice/ice_common.c   |   2 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  |  64 +++++++++-
 drivers/net/ethernet/intel/ice/ice_fwlog.c    | 117 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_fwlog.h    |  52 ++++++++
 drivers/net/ethernet/intel/ice/ice_type.h     |   4 +
 7 files changed, 321 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_fwlog.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_fwlog.h

diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index f269952d207d..6e4680ad097c 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -33,7 +33,8 @@ ice-y := ice_main.o	\
 	 ice_lag.o	\
 	 ice_ethtool.o  \
 	 ice_repr.o	\
-	 ice_tc_lib.o
+	 ice_tc_lib.o	\
+	 ice_fwlog.o
 ice-$(CONFIG_PCI_IOV) +=	\
 	ice_sriov.o		\
 	ice_virtchnl.o		\
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index e1b95cc3a538..0d560287ec23 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -2038,6 +2038,83 @@ struct ice_aqc_event_lan_overflow {
 	u8 reserved[8];
 };
 
+enum ice_aqc_fw_logging_mod {
+	ICE_AQC_FW_LOG_ID_GENERAL = 0,
+	ICE_AQC_FW_LOG_ID_CTRL,
+	ICE_AQC_FW_LOG_ID_LINK,
+	ICE_AQC_FW_LOG_ID_LINK_TOPO,
+	ICE_AQC_FW_LOG_ID_DNL,
+	ICE_AQC_FW_LOG_ID_I2C,
+	ICE_AQC_FW_LOG_ID_SDP,
+	ICE_AQC_FW_LOG_ID_MDIO,
+	ICE_AQC_FW_LOG_ID_ADMINQ,
+	ICE_AQC_FW_LOG_ID_HDMA,
+	ICE_AQC_FW_LOG_ID_LLDP,
+	ICE_AQC_FW_LOG_ID_DCBX,
+	ICE_AQC_FW_LOG_ID_DCB,
+	ICE_AQC_FW_LOG_ID_XLR,
+	ICE_AQC_FW_LOG_ID_NVM,
+	ICE_AQC_FW_LOG_ID_AUTH,
+	ICE_AQC_FW_LOG_ID_VPD,
+	ICE_AQC_FW_LOG_ID_IOSF,
+	ICE_AQC_FW_LOG_ID_PARSER,
+	ICE_AQC_FW_LOG_ID_SW,
+	ICE_AQC_FW_LOG_ID_SCHEDULER,
+	ICE_AQC_FW_LOG_ID_TXQ,
+	ICE_AQC_FW_LOG_ID_RSVD,
+	ICE_AQC_FW_LOG_ID_POST,
+	ICE_AQC_FW_LOG_ID_WATCHDOG,
+	ICE_AQC_FW_LOG_ID_TASK_DISPATCH,
+	ICE_AQC_FW_LOG_ID_MNG,
+	ICE_AQC_FW_LOG_ID_SYNCE,
+	ICE_AQC_FW_LOG_ID_HEALTH,
+	ICE_AQC_FW_LOG_ID_TSDRV,
+	ICE_AQC_FW_LOG_ID_PFREG,
+	ICE_AQC_FW_LOG_ID_MDLVER,
+	ICE_AQC_FW_LOG_ID_MAX,
+};
+
+/* Set FW Logging configuration (indirect 0xFF30)
+ * Register for FW Logging (indirect 0xFF31)
+ * Query FW Logging (indirect 0xFF32)
+ */
+struct ice_aqc_fw_log {
+	u8 cmd_flags;
+#define ICE_AQC_FW_LOG_CONF_UART_EN	BIT(0)
+#define ICE_AQC_FW_LOG_CONF_AQ_EN	BIT(1)
+#define ICE_AQC_FW_LOG_QUERY_REGISTERED	BIT(2)
+#define ICE_AQC_FW_LOG_CONF_SET_VALID	BIT(3)
+#define ICE_AQC_FW_LOG_AQ_REGISTER	BIT(0)
+#define ICE_AQC_FW_LOG_AQ_QUERY		BIT(2)
+
+	u8 rsp_flag;
+	__le16 fw_rt_msb;
+	union {
+		struct {
+			__le32 fw_rt_lsb;
+		} sync;
+		struct {
+			__le16 log_resolution;
+#define ICE_AQC_FW_LOG_MIN_RESOLUTION		(1)
+#define ICE_AQC_FW_LOG_MAX_RESOLUTION		(128)
+
+			__le16 mdl_cnt;
+		} cfg;
+	} ops;
+	__le32 addr_high;
+	__le32 addr_low;
+};
+
+/* Response Buffer for:
+ *    Set Firmware Logging Configuration (0xFF30)
+ *    Query FW Logging (0xFF32)
+ */
+struct ice_aqc_fw_log_cfg_resp {
+	__le16 module_identifier;
+	u8 log_level;
+	u8 rsvd0;
+};
+
 /**
  * struct ice_aq_desc - Admin Queue (AQ) descriptor
  * @flags: ICE_AQ_FLAG_* flags
@@ -2114,6 +2191,7 @@ struct ice_aq_desc {
 		struct ice_aqc_add_update_free_vsi_resp add_update_free_vsi_res;
 		struct ice_aqc_download_pkg download_pkg;
 		struct ice_aqc_driver_shared_params drv_shared_params;
+		struct ice_aqc_fw_log fw_log;
 		struct ice_aqc_set_mac_lb set_mac_lb;
 		struct ice_aqc_alloc_free_res_cmd sw_res_ctrl;
 		struct ice_aqc_set_mac_cfg set_mac_cfg;
@@ -2294,6 +2372,9 @@ enum ice_adminq_opc {
 
 	/* Standalone Commands/Events */
 	ice_aqc_opc_event_lan_overflow			= 0x1001,
+
+	/* FW Logging Commands */
+	ice_aqc_opc_fw_logs_query			= 0xFF32,
 };
 
 #endif /* _ICE_ADMINQ_CMD_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 21d415d29387..3549d3573e73 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -854,6 +854,8 @@ int ice_init_hw(struct ice_hw *hw)
 	if (status)
 		goto err_unroll_cqinit;
 
+	ice_fwlog_set_supported(hw);
+
 	status = ice_clear_pf_cfg(hw);
 	if (status)
 		goto err_unroll_cqinit;
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 63acb7b96053..39776a3de9fa 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -369,6 +369,13 @@ static int ice_devlink_info_get(struct devlink *devlink,
 	return err;
 }
 
+enum ice_devlink_param_id {
+	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	ICE_DEVLINK_PARAM_ID_TX_BALANCE,
+	ICE_DEVLINK_PARAM_ID_FWLOG_SUPPORTED,
+	ICE_DEVLINK_PARAM_ID_FWLOG_ENABLED,
+};
+
 /**
  * ice_devlink_reload_empr_start - Start EMP reset to activate new firmware
  * @pf: pointer to the pf instance
@@ -1383,6 +1390,50 @@ ice_devlink_enable_iw_validate(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int
+ice_devlink_fwlog_supported_get(struct devlink *devlink, u32 id,
+				struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	ctx->val.vbool = ice_fwlog_supported(&pf->hw);
+	return 0;
+}
+
+static int
+ice_devlink_fwlog_supported_set(struct devlink *devlink, u32 id,
+				struct devlink_param_gset_ctx *ctx)
+{
+	/* set operation is unsupported */
+	return -EOPNOTSUPP;
+}
+
+static int
+ice_devlink_fwlog_enabled_get(struct devlink *devlink, u32 id,
+			      struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	ctx->val.vbool = pf->hw.fwlog_ena;
+	return 0;
+}
+
+static int
+ice_devlink_fwlog_enabled_set(struct devlink *devlink, u32 id,
+			      struct devlink_param_gset_ctx *ctx)
+{
+	/* set operation is unsupported at this time */
+	return -EOPNOTSUPP;
+}
+
 static const struct devlink_param ice_devlink_params[] = {
 	DEVLINK_PARAM_GENERIC(ENABLE_ROCE, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			      ice_devlink_enable_roce_get,
@@ -1392,7 +1443,18 @@ static const struct devlink_param ice_devlink_params[] = {
 			      ice_devlink_enable_iw_get,
 			      ice_devlink_enable_iw_set,
 			      ice_devlink_enable_iw_validate),
-
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_FWLOG_SUPPORTED,
+			     "fwlog_supported", DEVLINK_PARAM_TYPE_BOOL,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     ice_devlink_fwlog_supported_get,
+			     ice_devlink_fwlog_supported_set,
+			     NULL),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_FWLOG_ENABLED,
+			     "fwlog_enabled", DEVLINK_PARAM_TYPE_BOOL,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     ice_devlink_fwlog_enabled_get,
+			     ice_devlink_fwlog_enabled_set,
+			     NULL),
 };
 
 static void ice_devlink_free(void *devlink_ptr)
diff --git a/drivers/net/ethernet/intel/ice/ice_fwlog.c b/drivers/net/ethernet/intel/ice/ice_fwlog.c
new file mode 100644
index 000000000000..fac970f03dd0
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_fwlog.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, Intel Corporation. */
+
+#include "ice_common.h"
+#include "ice_fwlog.h"
+
+/**
+ * ice_fwlog_supported - Cached for whether FW supports FW logging or not
+ * @hw: pointer to the HW structure
+ *
+ * This will always return false if called before ice_init_hw(), so it must be
+ * called after ice_init_hw().
+ */
+bool ice_fwlog_supported(struct ice_hw *hw)
+{
+	return hw->fwlog_supported;
+}
+
+/**
+ * ice_aq_fwlog_get - Get the current firmware logging configuration (0xFF32)
+ * @hw: pointer to the HW structure
+ * @cfg: firmware logging configuration to populate
+ */
+static int
+ice_aq_fwlog_get(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
+{
+	struct ice_aqc_fw_log_cfg_resp *fw_modules;
+	struct ice_aqc_fw_log *cmd;
+	struct ice_aq_desc desc;
+	u16 module_id_cnt;
+	int status;
+	void *buf;
+	int i;
+
+	memset(cfg, 0, sizeof(*cfg));
+
+	buf = kzalloc(ICE_AQ_MAX_BUF_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_fw_logs_query);
+	cmd = &desc.params.fw_log;
+
+	cmd->cmd_flags = ICE_AQC_FW_LOG_AQ_QUERY;
+
+	status = ice_aq_send_cmd(hw, &desc, buf, ICE_AQ_MAX_BUF_LEN, NULL);
+	if (status) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "Failed to get FW log configuration\n");
+		goto status_out;
+	}
+
+	module_id_cnt = le16_to_cpu(cmd->ops.cfg.mdl_cnt);
+	if (module_id_cnt < ICE_AQC_FW_LOG_ID_MAX) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "FW returned less than the expected number of FW log module IDs\n");
+	} else if (module_id_cnt > ICE_AQC_FW_LOG_ID_MAX) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "FW returned more than expected number of FW log module IDs, setting module_id_cnt to software expected max %u\n",
+			  ICE_AQC_FW_LOG_ID_MAX);
+		module_id_cnt = ICE_AQC_FW_LOG_ID_MAX;
+	}
+
+	cfg->log_resolution = le16_to_cpu(cmd->ops.cfg.log_resolution);
+	if (cmd->cmd_flags & ICE_AQC_FW_LOG_CONF_AQ_EN)
+		cfg->options |= ICE_FWLOG_OPTION_ARQ_ENA;
+	if (cmd->cmd_flags & ICE_AQC_FW_LOG_CONF_UART_EN)
+		cfg->options |= ICE_FWLOG_OPTION_UART_ENA;
+	if (cmd->cmd_flags & ICE_AQC_FW_LOG_QUERY_REGISTERED)
+		cfg->options |= ICE_FWLOG_OPTION_IS_REGISTERED;
+
+	fw_modules = (struct ice_aqc_fw_log_cfg_resp *)buf;
+
+	for (i = 0; i < module_id_cnt; i++) {
+		struct ice_aqc_fw_log_cfg_resp *fw_module = &fw_modules[i];
+
+		cfg->module_entries[i].module_id =
+			le16_to_cpu(fw_module->module_identifier);
+		cfg->module_entries[i].log_level = fw_module->log_level;
+	}
+
+status_out:
+	kfree(buf);
+	return status;
+}
+
+/**
+ * ice_fwlog_set_supported - Set if FW logging is supported by FW
+ * @hw: pointer to the HW struct
+ *
+ * If FW returns success to the ice_aq_fwlog_get call then it supports FW
+ * logging, else it doesn't. Set the fwlog_supported flag accordingly.
+ *
+ * This function is only meant to be called during driver init to determine if
+ * the FW support FW logging.
+ */
+void ice_fwlog_set_supported(struct ice_hw *hw)
+{
+	struct ice_fwlog_cfg *cfg;
+	int status;
+
+	hw->fwlog_supported = false;
+
+	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
+	if (!cfg)
+		return;
+
+	/* don't call ice_fwlog_get() because that would overwrite the cached
+	 * configuration from the call to ice_fwlog_init(), which is expected to
+	 * be called prior to this function
+	 */
+	status = ice_aq_fwlog_get(hw, cfg);
+	if (status)
+		ice_debug(hw, ICE_DBG_FW_LOG, "ice_aq_fwlog_get failed, FW logging is not supported on this version of FW, status %d\n",
+			  status);
+	else
+		hw->fwlog_supported = true;
+
+	kfree(cfg);
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_fwlog.h b/drivers/net/ethernet/intel/ice/ice_fwlog.h
new file mode 100644
index 000000000000..3a2c83502763
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_fwlog.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2022, Intel Corporation. */
+
+#ifndef _ICE_FWLOG_H_
+#define _ICE_FWLOG_H_
+#include "ice_adminq_cmd.h"
+
+struct ice_hw;
+
+/* Only a single log level should be set and all log levels under the set value
+ * are enabled, e.g. if log level is set to ICE_FW_LOG_LEVEL_VERBOSE, then all
+ * other log levels are included (except ICE_FW_LOG_LEVEL_NONE)
+ */
+enum ice_fwlog_level {
+	ICE_FWLOG_LEVEL_NONE = 0,
+	ICE_FWLOG_LEVEL_ERROR = 1,
+	ICE_FWLOG_LEVEL_WARNING = 2,
+	ICE_FWLOG_LEVEL_NORMAL = 3,
+	ICE_FWLOG_LEVEL_VERBOSE = 4,
+	ICE_FWLOG_LEVEL_INVALID, /* all values >= this entry are invalid */
+};
+
+struct ice_fwlog_module_entry {
+	/* module ID for the corresponding firmware logging event */
+	u16 module_id;
+	/* verbosity level for the module_id */
+	u8 log_level;
+};
+
+struct ice_fwlog_cfg {
+	/* list of modules for configuring log level */
+	struct ice_fwlog_module_entry module_entries[ICE_AQC_FW_LOG_ID_MAX];
+	/* options used to configure firmware logging */
+	u16 options;
+#define ICE_FWLOG_OPTION_ARQ_ENA		BIT(0)
+#define ICE_FWLOG_OPTION_UART_ENA		BIT(1)
+	/* set before calling ice_fwlog_init() so the PF registers for firmware
+	 * logging on initialization
+	 */
+#define ICE_FWLOG_OPTION_REGISTER_ON_INIT	BIT(2)
+	/* set in the ice_fwlog_get() response if the PF is registered for FW
+	 * logging events over ARQ
+	 */
+#define ICE_FWLOG_OPTION_IS_REGISTERED		BIT(3)
+
+	/* minimum number of log events sent per Admin Receive Queue event */
+	u16 log_resolution;
+};
+
+void ice_fwlog_set_supported(struct ice_hw *hw);
+bool ice_fwlog_supported(struct ice_hw *hw);
+#endif /* _ICE_FWLOG_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 126605b7eb3b..1284fe8d78f2 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -17,6 +17,7 @@
 #include "ice_protocol_type.h"
 #include "ice_sbq_cmd.h"
 #include "ice_vlan_mode.h"
+#include "ice_fwlog.h"
 
 static inline bool ice_is_tc_ena(unsigned long bitmap, u8 tc)
 {
@@ -859,6 +860,9 @@ struct ice_hw {
 	u8 fw_patch;		/* firmware patch version */
 	u32 fw_build;		/* firmware build number */
 
+	bool fwlog_supported; /* does hardware support FW logging? */
+	bool fwlog_ena; /* currently logging? */
+
 /* Device max aggregate bandwidths corresponding to the GL_PWR_MODE_CTL
  * register. Used for determining the ITR/INTRL granularity during
  * initialization.
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 3/5] ice: add ability to query/set FW log level and resolution
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
  2023-02-09 19:06 ` [PATCH net-next 1/5] ice: remove FW logging code Tony Nguyen
  2023-02-09 19:06 ` [PATCH net-next 2/5] ice: enable devlink to check FW logging status Tony Nguyen
@ 2023-02-09 19:07 ` Tony Nguyen
  2023-02-09 19:07 ` [PATCH net-next 4/5] ice: disable FW logging on driver unload Tony Nguyen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:07 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul M Stillwell Jr, netdev, anthony.l.nguyen, jacob.e.keller,
	jiri, idosch, Gurucharan G

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

The E8xx has the ability to change the FW log level and
the granularity at which the logs get output. Enable
the ability to see what the current values are and to
change them.

The following devlink commands are now supported:

devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |   4 +-
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 121 +++++++-
 drivers/net/ethernet/intel/ice/ice_fwlog.c    | 260 +++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_fwlog.h    |   5 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 6 files changed, 385 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 0d560287ec23..1af036beeb45 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -2374,6 +2374,8 @@ enum ice_adminq_opc {
 	ice_aqc_opc_event_lan_overflow			= 0x1001,
 
 	/* FW Logging Commands */
+	ice_aqc_opc_fw_logs_config			= 0xFF30,
+	ice_aqc_opc_fw_logs_register			= 0xFF31,
 	ice_aqc_opc_fw_logs_query			= 0xFF32,
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 3549d3573e73..5fa1f76b0ab0 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -854,7 +854,9 @@ int ice_init_hw(struct ice_hw *hw)
 	if (status)
 		goto err_unroll_cqinit;
 
-	ice_fwlog_set_supported(hw);
+	status = ice_fwlog_init(hw);
+	if (status)
+		ice_debug(hw, ICE_DBG_FW_LOG, "Error initializing FW logging: %d\n", status);
 
 	status = ice_clear_pf_cfg(hw);
 	if (status)
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 39776a3de9fa..5321bae75c20 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -374,6 +374,8 @@ enum ice_devlink_param_id {
 	ICE_DEVLINK_PARAM_ID_TX_BALANCE,
 	ICE_DEVLINK_PARAM_ID_FWLOG_SUPPORTED,
 	ICE_DEVLINK_PARAM_ID_FWLOG_ENABLED,
+	ICE_DEVLINK_PARAM_ID_FWLOG_LEVEL,
+	ICE_DEVLINK_PARAM_ID_FWLOG_RESOLUTION,
 };
 
 /**
@@ -1430,8 +1432,111 @@ static int
 ice_devlink_fwlog_enabled_set(struct devlink *devlink, u32 id,
 			      struct devlink_param_gset_ctx *ctx)
 {
-	/* set operation is unsupported at this time */
-	return -EOPNOTSUPP;
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+	int status;
+
+	/* only support fw log commands on PF 0 */
+	if (hw->bus.func)
+		return -EOPNOTSUPP;
+
+	if (hw->fwlog_ena == ctx->val.vbool)
+		return 0;
+
+	hw->fwlog_ena = ctx->val.vbool;
+
+	if (hw->fwlog_ena)
+		hw->fwlog_cfg.options |= ICE_FWLOG_OPTION_ARQ_ENA;
+	else
+		hw->fwlog_cfg.options &= ~ICE_FWLOG_OPTION_ARQ_ENA;
+
+	/* send the cfg to FW and register for events */
+	status = ice_fwlog_set(hw, &hw->fwlog_cfg);
+	if (status)
+		return status;
+
+	return hw->fwlog_ena ? ice_fwlog_register(hw) : ice_fwlog_unregister(hw);
+}
+
+static int
+ice_devlink_fwlog_level_get(struct devlink *devlink, u32 id,
+			    struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	/* all the log levels are the same so pick the first one */
+	ctx->val.vu8 = pf->hw.fwlog_cfg.module_entries[0].log_level;
+
+	return 0;
+}
+
+static int
+ice_devlink_fwlog_level_set(struct devlink *devlink, u32 id,
+			    struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_fwlog_cfg *cfg;
+	int i;
+
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	if (ctx->val.vu8 >= ICE_FWLOG_LEVEL_INVALID) {
+		dev_err(ice_pf_to_dev(pf), "Log level is greater than allowed! %d\n",
+			ctx->val.vu8);
+		return -EINVAL;
+	}
+
+	cfg = &pf->hw.fwlog_cfg;
+
+	/* update the log level for all modules to the same thing. this gets
+	 * written to the FW when the user says enable logging
+	 */
+	for (i = 0; i < ICE_AQC_FW_LOG_ID_MAX; i++)
+		cfg->module_entries[i].log_level = ctx->val.vu8;
+
+	return 0;
+}
+
+static int
+ice_devlink_fwlog_resolution_get(struct devlink *devlink, u32 id,
+				 struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	ctx->val.vu8 = pf->hw.fwlog_cfg.log_resolution;
+
+	return 0;
+}
+
+static int
+ice_devlink_fwlog_resolution_set(struct devlink *devlink, u32 id,
+				 struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return -EOPNOTSUPP;
+
+	if (ctx->val.vu8 < ICE_AQC_FW_LOG_MIN_RESOLUTION ||
+	    ctx->val.vu8 > ICE_AQC_FW_LOG_MAX_RESOLUTION) {
+		dev_err(ice_pf_to_dev(pf), "Log resolution is out of range. Should be between 1 - 128: %d\n",
+			ctx->val.vu8);
+		return -EINVAL;
+	}
+
+	pf->hw.fwlog_cfg.log_resolution = ctx->val.vu8;
+
+	return 0;
 }
 
 static const struct devlink_param ice_devlink_params[] = {
@@ -1455,6 +1560,18 @@ static const struct devlink_param ice_devlink_params[] = {
 			     ice_devlink_fwlog_enabled_get,
 			     ice_devlink_fwlog_enabled_set,
 			     NULL),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_FWLOG_LEVEL,
+			     "fwlog_level", DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     ice_devlink_fwlog_level_get,
+			     ice_devlink_fwlog_level_set,
+			     NULL),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_FWLOG_RESOLUTION,
+			     "fwlog_resolution", DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     ice_devlink_fwlog_resolution_get,
+			     ice_devlink_fwlog_resolution_set,
+			     NULL),
 };
 
 static void ice_devlink_free(void *devlink_ptr)
diff --git a/drivers/net/ethernet/intel/ice/ice_fwlog.c b/drivers/net/ethernet/intel/ice/ice_fwlog.c
index fac970f03dd0..a6b468123bc0 100644
--- a/drivers/net/ethernet/intel/ice/ice_fwlog.c
+++ b/drivers/net/ethernet/intel/ice/ice_fwlog.c
@@ -4,6 +4,146 @@
 #include "ice_common.h"
 #include "ice_fwlog.h"
 
+/**
+ * valid_module_entries - validate all the module entry IDs and log levels
+ * @hw: pointer to the HW structure
+ * @entries: entries to validate
+ * @num_entries: number of entries to validate
+ */
+static bool
+valid_module_entries(struct ice_hw *hw, struct ice_fwlog_module_entry *entries,
+		     u16 num_entries)
+{
+	int i;
+
+	if (!entries) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "Null ice_fwlog_module_entry array\n");
+		return false;
+	}
+
+	if (!num_entries) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "num_entries must be non-zero\n");
+		return false;
+	}
+
+	for (i = 0; i < num_entries; i++) {
+		struct ice_fwlog_module_entry *entry = &entries[i];
+
+		if (entry->module_id >= ICE_AQC_FW_LOG_ID_MAX) {
+			ice_debug(hw, ICE_DBG_FW_LOG, "Invalid module_id %u, max valid module_id is %u\n",
+				  entry->module_id, ICE_AQC_FW_LOG_ID_MAX - 1);
+			return false;
+		}
+
+		if (entry->log_level >= ICE_FWLOG_LEVEL_INVALID) {
+			ice_debug(hw, ICE_DBG_FW_LOG, "Invalid log_level %u, max valid log_level is %u\n",
+				  entry->log_level,
+				  ICE_AQC_FW_LOG_ID_MAX - 1);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/**
+ * valid_cfg - validate entire configuration
+ * @hw: pointer to the HW structure
+ * @cfg: config to validate
+ */
+static bool valid_cfg(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
+{
+	if (!cfg) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "Null ice_fwlog_cfg\n");
+		return false;
+	}
+
+	if (cfg->log_resolution < ICE_AQC_FW_LOG_MIN_RESOLUTION ||
+	    cfg->log_resolution > ICE_AQC_FW_LOG_MAX_RESOLUTION) {
+		ice_debug(hw, ICE_DBG_FW_LOG, "Unsupported log_resolution %u, must be between %u and %u\n",
+			  cfg->log_resolution, ICE_AQC_FW_LOG_MIN_RESOLUTION,
+			  ICE_AQC_FW_LOG_MAX_RESOLUTION);
+		return false;
+	}
+
+	return valid_module_entries(hw, cfg->module_entries,
+				    ICE_AQC_FW_LOG_ID_MAX);
+}
+
+/**
+ * ice_fwlog_init - Initialize FW logging configuration
+ * @hw: pointer to the HW structure
+ *
+ * This function should be called on driver initialization during
+ * ice_init_hw().
+ */
+int ice_fwlog_init(struct ice_hw *hw)
+{
+	int status;
+
+	ice_fwlog_set_supported(hw);
+
+	if (ice_fwlog_supported(hw)) {
+		/* read the current config from the FW and store it */
+		status = ice_fwlog_get(hw, &hw->fwlog_cfg);
+		if (status)
+			return status;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_aq_fwlog_set - Set FW logging configuration AQ command (0xFF30)
+ * @hw: pointer to the HW structure
+ * @entries: entries to configure
+ * @num_entries: number of @entries
+ * @options: options from ice_fwlog_cfg->options structure
+ * @log_resolution: logging resolution
+ */
+static int
+ice_aq_fwlog_set(struct ice_hw *hw, struct ice_fwlog_module_entry *entries,
+		 u16 num_entries, u16 options, u16 log_resolution)
+{
+	struct ice_aqc_fw_log_cfg_resp *fw_modules;
+	struct ice_aqc_fw_log *cmd;
+	struct ice_aq_desc desc;
+	int status;
+	int i;
+
+	fw_modules = kcalloc(num_entries, sizeof(*fw_modules), GFP_KERNEL);
+	if (!fw_modules)
+		return -ENOMEM;
+
+	for (i = 0; i < num_entries; i++) {
+		fw_modules[i].module_identifier =
+			cpu_to_le16(entries[i].module_id);
+		fw_modules[i].log_level = entries[i].log_level;
+	}
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_fw_logs_config);
+	desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
+
+	cmd = &desc.params.fw_log;
+
+	cmd->cmd_flags = ICE_AQC_FW_LOG_CONF_SET_VALID;
+	cmd->ops.cfg.log_resolution = cpu_to_le16(log_resolution);
+	cmd->ops.cfg.mdl_cnt = cpu_to_le16(num_entries);
+
+	if (options & ICE_FWLOG_OPTION_ARQ_ENA)
+		cmd->cmd_flags |= ICE_AQC_FW_LOG_CONF_AQ_EN;
+	if (options & ICE_FWLOG_OPTION_UART_ENA)
+		cmd->cmd_flags |= ICE_AQC_FW_LOG_CONF_UART_EN;
+
+	status = ice_aq_send_cmd(hw, &desc, fw_modules,
+				 sizeof(*fw_modules) * num_entries,
+				 NULL);
+
+	kfree(fw_modules);
+
+	return status;
+}
+
 /**
  * ice_fwlog_supported - Cached for whether FW supports FW logging or not
  * @hw: pointer to the HW structure
@@ -16,13 +156,102 @@ bool ice_fwlog_supported(struct ice_hw *hw)
 	return hw->fwlog_supported;
 }
 
+/**
+ * ice_fwlog_set - Set the firmware logging settings
+ * @hw: pointer to the HW structure
+ * @cfg: config used to set firmware logging
+ *
+ * This function should be called whenever the driver needs to set the firmware
+ * logging configuration. It can be called on initialization, reset, or during
+ * runtime.
+ *
+ * If the PF wishes to receive FW logging then it must register via
+ * ice_fwlog_register. Note, that ice_fwlog_register does not need to be called
+ * for init.
+ */
+int ice_fwlog_set(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
+{
+	int status;
+
+	if (!ice_fwlog_supported(hw))
+		return -EOPNOTSUPP;
+
+	if (!valid_cfg(hw, cfg))
+		return -EINVAL;
+
+	status = ice_aq_fwlog_set(hw, cfg->module_entries,
+				  ICE_AQC_FW_LOG_ID_MAX, cfg->options,
+				  cfg->log_resolution);
+
+	return status;
+}
+
+/**
+ * ice_aq_fwlog_register - Register PF for firmware logging events (0xFF31)
+ * @hw: pointer to the HW structure
+ * @reg: true to register and false to unregister
+ */
+static int ice_aq_fwlog_register(struct ice_hw *hw, bool reg)
+{
+	struct ice_aq_desc desc;
+
+	ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_fw_logs_register);
+
+	if (reg)
+		desc.params.fw_log.cmd_flags = ICE_AQC_FW_LOG_AQ_REGISTER;
+
+	return ice_aq_send_cmd(hw, &desc, NULL, 0, NULL);
+}
+
+/**
+ * ice_fwlog_register - Register the PF for firmware logging
+ * @hw: pointer to the HW structure
+ *
+ * After this call the PF will start to receive firmware logging based on the
+ * configuration set in ice_fwlog_set.
+ */
+int ice_fwlog_register(struct ice_hw *hw)
+{
+	int status;
+
+	if (!ice_fwlog_supported(hw))
+		return -EOPNOTSUPP;
+
+	status = ice_aq_fwlog_register(hw, true);
+	if (status)
+		ice_debug(hw, ICE_DBG_FW_LOG, "Failed to register for firmware logging events over ARQ\n");
+	else
+		hw->fwlog_cfg.options |= ICE_FWLOG_OPTION_IS_REGISTERED;
+
+	return status;
+}
+
+/**
+ * ice_fwlog_unregister - Unregister the PF from firmware logging
+ * @hw: pointer to the HW structure
+ */
+int ice_fwlog_unregister(struct ice_hw *hw)
+{
+	int status;
+
+	if (!ice_fwlog_supported(hw))
+		return -EOPNOTSUPP;
+
+	status = ice_aq_fwlog_register(hw, false);
+	if (status)
+		ice_debug(hw, ICE_DBG_FW_LOG, "Failed to unregister from firmware logging events over ARQ\n");
+	else
+		hw->fwlog_cfg.options &= ~ICE_FWLOG_OPTION_IS_REGISTERED;
+
+	return status;
+}
+
 /**
  * ice_aq_fwlog_get - Get the current firmware logging configuration (0xFF32)
  * @hw: pointer to the HW structure
  * @cfg: firmware logging configuration to populate
  */
-static int
-ice_aq_fwlog_get(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
+static int ice_aq_fwlog_get(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
 {
 	struct ice_aqc_fw_log_cfg_resp *fw_modules;
 	struct ice_aqc_fw_log *cmd;
@@ -102,9 +331,8 @@ void ice_fwlog_set_supported(struct ice_hw *hw)
 	if (!cfg)
 		return;
 
-	/* don't call ice_fwlog_get() because that would overwrite the cached
-	 * configuration from the call to ice_fwlog_init(), which is expected to
-	 * be called prior to this function
+	/* don't call ice_fwlog_get() because that would check to see if FW
+	 * logging is supported which is what the driver is determining now
 	 */
 	status = ice_aq_fwlog_get(hw, cfg);
 	if (status)
@@ -115,3 +343,25 @@ void ice_fwlog_set_supported(struct ice_hw *hw)
 
 	kfree(cfg);
 }
+
+/**
+ * ice_fwlog_get - Get the firmware logging settings
+ * @hw: pointer to the HW structure
+ * @cfg: config to populate based on current firmware logging settings
+ */
+int ice_fwlog_get(struct ice_hw *hw, struct ice_fwlog_cfg *cfg)
+{
+	int status;
+
+	if (!ice_fwlog_supported(hw))
+		return -EOPNOTSUPP;
+
+	if (!cfg)
+		return -EINVAL;
+
+	status = ice_aq_fwlog_get(hw, cfg);
+	if (status)
+		return status;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_fwlog.h b/drivers/net/ethernet/intel/ice/ice_fwlog.h
index 3a2c83502763..fcfceb9f6ec2 100644
--- a/drivers/net/ethernet/intel/ice/ice_fwlog.h
+++ b/drivers/net/ethernet/intel/ice/ice_fwlog.h
@@ -49,4 +49,9 @@ struct ice_fwlog_cfg {
 
 void ice_fwlog_set_supported(struct ice_hw *hw);
 bool ice_fwlog_supported(struct ice_hw *hw);
+int ice_fwlog_init(struct ice_hw *hw);
+int ice_fwlog_set(struct ice_hw *hw, struct ice_fwlog_cfg *cfg);
+int ice_fwlog_get(struct ice_hw *hw, struct ice_fwlog_cfg *cfg);
+int ice_fwlog_register(struct ice_hw *hw);
+int ice_fwlog_unregister(struct ice_hw *hw);
 #endif /* _ICE_FWLOG_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 1284fe8d78f2..5161757f07a6 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -860,6 +860,7 @@ struct ice_hw {
 	u8 fw_patch;		/* firmware patch version */
 	u32 fw_build;		/* firmware build number */
 
+	struct ice_fwlog_cfg fwlog_cfg;
 	bool fwlog_supported; /* does hardware support FW logging? */
 	bool fwlog_ena; /* currently logging? */
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 4/5] ice: disable FW logging on driver unload
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
                   ` (2 preceding siblings ...)
  2023-02-09 19:07 ` [PATCH net-next 3/5] ice: add ability to query/set FW log level and resolution Tony Nguyen
@ 2023-02-09 19:07 ` Tony Nguyen
  2023-02-09 19:07 ` [PATCH net-next 5/5] ice: use debugfs to output FW log data Tony Nguyen
  2023-02-11  4:23 ` [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Jakub Kicinski
  5 siblings, 0 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:07 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul M Stillwell Jr, netdev, anthony.l.nguyen, jacob.e.keller,
	jiri, idosch, Gurucharan G

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

The FW is running in it's own context irregardless of what the driver
is doing. In this case, if the driver previously registered for FW
log events and then the driver unloads without informing the FW to
unregister for FW log events then the FW still has a timer running to
output FW logs.

The next time the driver loads and tries to register for FW log events
then the FW returns an error, but still enables the continued
outputting of FW logs. This causes an IO error to devlink which isn't
intuitive since the logs are still being output.

Fix this by disabling FW logging when the driver is being unloaded.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 29 +++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 055849dc608a..cf8c9220a6d7 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4526,6 +4526,33 @@ static void ice_unregister_netdev(struct ice_vsi *vsi)
 	clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state);
 }
 
+/**
+ * ice_pf_fwlog_deinit - clear FW logging metadata on device exit
+ * @pf: pointer to the PF struct
+ */
+static void ice_pf_fwlog_deinit(struct ice_pf *pf)
+{
+	struct ice_hw *hw = &pf->hw;
+
+	/* make sure FW logging is disabled to not put the FW in a weird state
+	 * for the next driver load
+	 */
+	if (hw->fwlog_ena) {
+		int status;
+
+		hw->fwlog_cfg.options &= ~ICE_FWLOG_OPTION_ARQ_ENA;
+		status = ice_fwlog_set(hw, &hw->fwlog_cfg);
+		if (status)
+			dev_warn(ice_pf_to_dev(pf), "Unable to turn off FW logging, status: %d\n",
+				 status);
+
+		status = ice_fwlog_unregister(hw);
+		if (status)
+			dev_warn(ice_pf_to_dev(pf), "Unable to unregister FW logging, status: %d\n",
+				 status);
+	}
+}
+
 /**
  * ice_cfg_netdev - Allocate, configure and register a netdev
  * @vsi: the VSI associated with the new netdev
@@ -5266,6 +5293,8 @@ static void ice_remove(struct pci_dev *pdev)
 		msleep(100);
 	}
 
+	ice_pf_fwlog_deinit(pf);
+
 	if (test_bit(ICE_FLAG_SRIOV_ENA, pf->flags)) {
 		set_bit(ICE_VF_RESETS_DISABLED, pf->state);
 		ice_free_vfs(pf);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 5/5] ice: use debugfs to output FW log data
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
                   ` (3 preceding siblings ...)
  2023-02-09 19:07 ` [PATCH net-next 4/5] ice: disable FW logging on driver unload Tony Nguyen
@ 2023-02-09 19:07 ` Tony Nguyen
  2023-02-11  4:23 ` [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Jakub Kicinski
  5 siblings, 0 replies; 17+ messages in thread
From: Tony Nguyen @ 2023-02-09 19:07 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul M Stillwell Jr, netdev, anthony.l.nguyen, jacob.e.keller,
	jiri, idosch, Gurucharan G

From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

The FW log data can be quite large so we don't want to use syslog.
Instead take advantage of debugfs to write the data to.

The file is binary data and users should send them to us to
work with the FW team to decode them.

An example of how to retrieve the data using debugfs is:

cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog > fwlog

Also updated the documentation to add the new parameters.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 Documentation/networking/devlink/ice.rst      |  39 ++++++
 drivers/net/ethernet/intel/ice/Makefile       |   4 +-
 drivers/net/ethernet/intel/ice/ice.h          |  22 +++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_debugfs.c  | 116 ++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_main.c     | 125 ++++++++++++++----
 6 files changed, 279 insertions(+), 28 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_debugfs.c

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index 10f282c2117c..3bc2ca31759f 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -7,6 +7,45 @@ ice devlink support
 This document describes the devlink features implemented by the ``ice``
 device driver.
 
+Parameters
+=============
+
+.. list-table:: Driver-specific parameters implemented
+   :widths: 5 5 5 85
+
+   * - Name
+     - Type
+     - Mode
+     - Description
+   * - ``fwlog_supported``
+     - Boolean
+     - runtime
+     - This parameter indicates to the user whether FW loggiing is supported
+       or not in the currently loaded FW.
+   * - ``fwlog_enabled``
+     - Boolean
+     - runtime
+     - This parameter indicates to the user whether the driver is currently
+       getting FW logs or not.
+   * - ``fwlog_level``
+     - u8
+     - runtime
+     - This parameter indicates the current log level. Each level includes the
+       messages from the previous/lower level. Valid values are
+
+          * ``0`` - no logging
+          * ``1`` - error logging
+          * ``2`` - warning logging
+          * ``3`` - normal logging
+          * ``4`` - verbose logging
+   * - ``fwlog_resolution``
+     - u8
+     - runtime
+     - This parameter indicates the number of log messages to included in a
+       single ARQ event. The range is 1-128 (1 means push every log message,
+       128 means push only when the max AQ command buffer is full). The
+       suggested value is 10.
+
 Info versions
 =============
 
diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 6e4680ad097c..452a440a9810 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -34,7 +34,8 @@ ice-y := ice_main.o	\
 	 ice_ethtool.o  \
 	 ice_repr.o	\
 	 ice_tc_lib.o	\
-	 ice_fwlog.o
+	 ice_fwlog.o	\
+	 ice_debugfs.o
 ice-$(CONFIG_PCI_IOV) +=	\
 	ice_sriov.o		\
 	ice_virtchnl.o		\
@@ -49,3 +50,4 @@ ice-$(CONFIG_RFS_ACCEL) += ice_arfs.o
 ice-$(CONFIG_XDP_SOCKETS) += ice_xsk.o
 ice-$(CONFIG_ICE_SWITCHDEV) += ice_eswitch.o
 ice-$(CONFIG_ICE_GNSS) += ice_gnss.o
+ice-$(CONFIG_DEBUG_FS) += ice_debugfs.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 7bd50e49312c..f599e0e7b7dc 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -550,6 +550,7 @@ struct ice_pf {
 	struct ice_vsi_stats **vsi_stats;
 	struct ice_sw *first_sw;	/* first switch created by firmware */
 	u16 eswitch_mode;		/* current mode of eswitch */
+	struct dentry *ice_debugfs_pf;
 	struct ice_vfs vfs;
 	DECLARE_BITMAP(features, ICE_F_MAX);
 	DECLARE_BITMAP(state, ICE_STATE_NBITS);
@@ -632,6 +633,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+	struct list_head fwlog_data_list;
+	u8 fwlog_list_count;
 };
 
 struct ice_netdev_priv {
@@ -646,6 +649,15 @@ struct ice_netdev_priv {
 	struct list_head tc_indr_block_priv_list;
 };
 
+struct ice_fwlog_data {
+	struct list_head list;
+	u16 data_size;
+	u8 *data;
+};
+
+/* define the maximum number of items that can be in the list */
+#define ICE_FWLOG_MAX_SIZE	128
+
 /**
  * ice_vector_ch_enabled
  * @qv: pointer to q_vector, can be NULL
@@ -870,6 +882,16 @@ static inline bool ice_is_adq_active(struct ice_pf *pf)
 	return false;
 }
 
+#ifdef CONFIG_DEBUG_FS
+void ice_debugfs_fwlog_init(struct ice_pf *pf);
+void ice_debugfs_init(void);
+void ice_debugfs_exit(void);
+#else
+static inline void ice_debugfs_fwlog_init(struct ice_pf *pf) { }
+static inline void ice_debugfs_init(void) { }
+static inline void ice_debugfs_exit(void) { }
+#endif /* CONFIG_DEBUG_FS */
+
 bool netif_is_ice(struct net_device *dev);
 int ice_vsi_setup_tx_rings(struct ice_vsi *vsi);
 int ice_vsi_setup_rx_rings(struct ice_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 1af036beeb45..27c2cea29c51 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -2377,6 +2377,7 @@ enum ice_adminq_opc {
 	ice_aqc_opc_fw_logs_config			= 0xFF30,
 	ice_aqc_opc_fw_logs_register			= 0xFF31,
 	ice_aqc_opc_fw_logs_query			= 0xFF32,
+	ice_aqc_opc_fw_logs_event			= 0xFF33,
 };
 
 #endif /* _ICE_ADMINQ_CMD_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_debugfs.c b/drivers/net/ethernet/intel/ice/ice_debugfs.c
new file mode 100644
index 000000000000..c097e9465070
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_debugfs.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, Intel Corporation. */
+
+#include <linux/vmalloc.h>
+
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+#include <linux/random.h>
+#include "ice.h"
+
+static struct dentry *ice_debugfs_root;
+
+/**
+ * ice_debugfs_command_read - read from command datum
+ * @filp: the opened file
+ * @buffer: where to write the data for the user to read
+ * @count: the size of the user's buffer
+ * @ppos: file position offset
+ */
+static ssize_t ice_debugfs_command_read(struct file *filp, char __user *buffer,
+					size_t count, loff_t *ppos)
+{
+	struct ice_pf *pf = filp->private_data;
+	struct ice_fwlog_data *log, *tmp_log;
+	int data_copied = 0;
+	int copy_count = 0;
+
+	if (list_empty(&pf->fwlog_data_list))
+		return 0;
+
+	list_for_each_entry(log, &pf->fwlog_data_list, list) {
+		u16 cur_buf_len = log->data_size;
+		int retval;
+
+		if (cur_buf_len >= count)
+			break;
+
+		retval = copy_to_user(buffer, log->data, cur_buf_len);
+		if (retval)
+			return -EFAULT;
+
+		data_copied += cur_buf_len;
+		buffer += cur_buf_len;
+		count -= cur_buf_len;
+		*ppos += cur_buf_len;
+		copy_count++;
+	}
+
+	/* only free the data once we know there weren't any errors */
+	list_for_each_entry_safe(log, tmp_log, &pf->fwlog_data_list, list) {
+		if (!copy_count)
+			break;
+
+		vfree(log->data);
+		list_del(&log->list);
+		vfree(log);
+		pf->fwlog_list_count--;
+		copy_count--;
+	}
+
+	return data_copied;
+}
+
+static const struct file_operations ice_debugfs_command_fops = {
+	.owner = THIS_MODULE,
+	.open  = simple_open,
+	.read = ice_debugfs_command_read,
+};
+
+/**
+ * ice_debugfs_fwlog_init - setup the debugfs directory
+ * @pf: the ice that is starting up
+ */
+void ice_debugfs_fwlog_init(struct ice_pf *pf)
+{
+	const char *name = pci_name(pf->pdev);
+	struct dentry *pfile;
+
+	/* only support fw log commands on PF 0 */
+	if (pf->hw.bus.func)
+		return;
+
+	pf->ice_debugfs_pf = debugfs_create_dir(name, ice_debugfs_root);
+	if (IS_ERR(pf->ice_debugfs_pf))
+		return;
+
+	pfile = debugfs_create_file("fwlog", 0400, pf->ice_debugfs_pf, pf,
+				    &ice_debugfs_command_fops);
+	if (!pfile)
+		goto create_failed;
+
+	return;
+
+create_failed:
+	dev_err(ice_pf_to_dev(pf), "debugfs dir/file for %s failed\n", name);
+	debugfs_remove_recursive(pf->ice_debugfs_pf);
+}
+
+/**
+ * ice_debugfs_init - create root directory for debugfs entries
+ */
+void ice_debugfs_init(void)
+{
+	ice_debugfs_root = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	if (IS_ERR(ice_debugfs_root))
+		pr_info("init of debugfs failed\n");
+}
+
+/**
+ * ice_debugfs_exit - remove debugfs entries
+ */
+void ice_debugfs_exit(void)
+{
+	debugfs_remove_recursive(ice_debugfs_root);
+	ice_debugfs_root = NULL;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index cf8c9220a6d7..31df142c5558 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5,6 +5,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/vmalloc.h>
 #include <generated/utsrelease.h>
 #include "ice.h"
 #include "ice_base.h"
@@ -1212,6 +1213,43 @@ ice_handle_link_event(struct ice_pf *pf, struct ice_rq_event_info *event)
 	return status;
 }
 
+/**
+ * ice_get_fwlog_data - copy the FW log data from ARQ event
+ * @pf: PF that the FW log event is associated with
+ * @event: event structure containing FW log data
+ */
+static void
+ice_get_fwlog_data(struct ice_pf *pf, struct ice_rq_event_info *event)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	struct ice_fwlog_data *fwlog;
+
+	if (pf->fwlog_list_count >= ICE_FWLOG_MAX_SIZE)
+		return;
+
+	fwlog = vmalloc(sizeof(*fwlog));
+	if (!fwlog) {
+		dev_warn(dev, "Couldn't allocate memory for FWlog element\n");
+		return;
+	}
+
+	INIT_LIST_HEAD(&fwlog->list);
+
+	fwlog->data_size = le16_to_cpu(event->desc.datalen);
+	fwlog->data = vzalloc(fwlog->data_size);
+	if (!fwlog->data) {
+		dev_warn(dev, "Couldn't allocate memory for FWlog data\n");
+		vfree(fwlog);
+		return;
+	}
+
+	memcpy(fwlog->data, event->msg_buf, fwlog->data_size);
+
+	list_add_tail(&fwlog->list, &pf->fwlog_data_list);
+
+	pf->fwlog_list_count++;
+}
+
 enum ice_aq_task_state {
 	ICE_AQ_TASK_WAITING = 0,
 	ICE_AQ_TASK_COMPLETE,
@@ -1485,6 +1523,9 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
 			if (!ice_is_malicious_vf(pf, &event, i, pending))
 				ice_vc_process_vf_msg(pf, &event);
 			break;
+		case ice_aqc_opc_fw_logs_event:
+			ice_get_fwlog_data(pf, &event);
+			break;
 		case ice_aqc_opc_lldp_set_mib_change:
 			ice_dcb_process_lldp_set_mib_change(pf, &event);
 			break;
@@ -4526,33 +4567,6 @@ static void ice_unregister_netdev(struct ice_vsi *vsi)
 	clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state);
 }
 
-/**
- * ice_pf_fwlog_deinit - clear FW logging metadata on device exit
- * @pf: pointer to the PF struct
- */
-static void ice_pf_fwlog_deinit(struct ice_pf *pf)
-{
-	struct ice_hw *hw = &pf->hw;
-
-	/* make sure FW logging is disabled to not put the FW in a weird state
-	 * for the next driver load
-	 */
-	if (hw->fwlog_ena) {
-		int status;
-
-		hw->fwlog_cfg.options &= ~ICE_FWLOG_OPTION_ARQ_ENA;
-		status = ice_fwlog_set(hw, &hw->fwlog_cfg);
-		if (status)
-			dev_warn(ice_pf_to_dev(pf), "Unable to turn off FW logging, status: %d\n",
-				 status);
-
-		status = ice_fwlog_unregister(hw);
-		if (status)
-			dev_warn(ice_pf_to_dev(pf), "Unable to unregister FW logging, status: %d\n",
-				 status);
-	}
-}
-
 /**
  * ice_cfg_netdev - Allocate, configure and register a netdev
  * @vsi: the VSI associated with the new netdev
@@ -4679,6 +4693,56 @@ static void ice_deinit_eth(struct ice_pf *pf)
 	ice_decfg_netdev(vsi);
 }
 
+/**
+ * ice_pf_fwlog_init - initialize FW logging on device init
+ * @pf: pointer to the PF struct
+ */
+static void ice_pf_fwlog_init(struct ice_pf *pf)
+{
+	/* only supported on PF 0 */
+	if (pf->hw.bus.func)
+		return;
+
+	INIT_LIST_HEAD(&pf->fwlog_data_list);
+}
+
+/**
+ * ice_pf_fwlog_deinit - clear FW logging metadata on device exit
+ * @pf: pointer to the PF struct
+ */
+static void ice_pf_fwlog_deinit(struct ice_pf *pf)
+{
+	struct ice_fwlog_data *fwlog, *fwlog_tmp;
+	struct ice_hw *hw = &pf->hw;
+
+	/* only supported on PF 0 */
+	if (pf->hw.bus.func)
+		return;
+
+	/* make sure FW logging is disabled to not put the FW in a weird state
+	 * for the next driver load
+	 */
+	if (hw->fwlog_ena) {
+		int status;
+
+		hw->fwlog_cfg.options &= ~ICE_FWLOG_OPTION_ARQ_ENA;
+		status = ice_fwlog_set(hw, &hw->fwlog_cfg);
+		if (status)
+			dev_warn(ice_pf_to_dev(pf), "Unable to turn off FW logging, status: %d\n",
+				 status);
+
+		status = ice_fwlog_unregister(hw);
+		if (status)
+			dev_warn(ice_pf_to_dev(pf), "Unable to unregister FW logging, status: %d\n",
+				 status);
+	}
+
+	list_for_each_entry_safe(fwlog, fwlog_tmp, &pf->fwlog_data_list, list) {
+		vfree(fwlog->data);
+		vfree(fwlog);
+	}
+}
+
 static int ice_init_dev(struct ice_pf *pf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
@@ -5186,6 +5250,9 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		hw->debug_mask = debug;
 #endif
 
+	ice_pf_fwlog_init(pf);
+	ice_debugfs_fwlog_init(pf);
+
 	err = ice_init(pf);
 	if (err)
 		goto err_init;
@@ -5294,6 +5361,7 @@ static void ice_remove(struct pci_dev *pdev)
 	}
 
 	ice_pf_fwlog_deinit(pf);
+	ice_debugfs_exit();
 
 	if (test_bit(ICE_FLAG_SRIOV_ENA, pf->flags)) {
 		set_bit(ICE_VF_RESETS_DISABLED, pf->state);
@@ -5756,10 +5824,13 @@ static int __init ice_module_init(void)
 		return -ENOMEM;
 	}
 
+	ice_debugfs_init();
+
 	status = pci_register_driver(&ice_driver);
 	if (status) {
 		pr_err("failed to register PCI driver, err %d\n", status);
 		destroy_workqueue(ice_wq);
+		ice_debugfs_exit();
 	}
 
 	return status;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
                   ` (4 preceding siblings ...)
  2023-02-09 19:07 ` [PATCH net-next 5/5] ice: use debugfs to output FW log data Tony Nguyen
@ 2023-02-11  4:23 ` Jakub Kicinski
  2023-02-13 23:46   ` Paul M Stillwell Jr
  5 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-11  4:23 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, pabeni, edumazet, netdev, paul.m.stillwell.jr,
	jacob.e.keller, jiri, idosch

Can you describe how this is used a little bit?
The FW log is captured at some level always (e.g. warns) 
or unless user enables _nothing_ will come out?

On Thu,  9 Feb 2023 11:06:57 -0800 Tony Nguyen wrote:
> devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
> devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
> devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime

If you're using debugfs as a pipe you should put these enable knobs 
in there as well. Or add a proper devlink command to carry all this
information via structured netlink (fw log + level + enable are hardly
Intel specific).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-11  4:23 ` [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Jakub Kicinski
@ 2023-02-13 23:46   ` Paul M Stillwell Jr
  2023-02-14  0:40     ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Paul M Stillwell Jr @ 2023-02-13 23:46 UTC (permalink / raw)
  To: Jakub Kicinski, Tony Nguyen
  Cc: davem, pabeni, edumazet, netdev, jacob.e.keller, jiri, idosch

On 2/10/2023 8:23 PM, Jakub Kicinski wrote:
> Can you describe how this is used a little bit?
> The FW log is captured at some level always (e.g. warns)
> or unless user enables _nothing_ will come out?
> 

My understanding is that the FW is constantly logging data into internal 
buffers. When the user indicates what data they want and what level they 
want then the data is filtered and output via either the UART or the 
Admin queues. These patches retrieve the FW logs via the admin queue 
commands.

The output from the FW is a binary blob that a user would send back to 
Intel to be decoded. This is only used for troubleshooting issues where 
a user is working with someone from Intel on a specific problem.

Does that help?

> On Thu,  9 Feb 2023 11:06:57 -0800 Tony Nguyen wrote:
>> devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
>> devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
>> devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime
> 
> If you're using debugfs as a pipe you should put these enable knobs
> in there as well.

My understanding is that debugfs use as a write mechanism is frowned on. 
If that's not true and if we were to submit patches that used debugfs 
instead of devlink and they would be accepted then I'll happily do that. :)

Or add a proper devlink command to carry all this
> information via structured netlink (fw log + level + enable are hardly
> Intel specific).

I don't know how other companies FW interface works so wouldn't assume 
that I could come up with an interface that would work across all devices.

Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-13 23:46   ` Paul M Stillwell Jr
@ 2023-02-14  0:40     ` Jakub Kicinski
  2023-02-14 16:14       ` Paul M Stillwell Jr
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-14  0:40 UTC (permalink / raw)
  To: Paul M Stillwell Jr
  Cc: Tony Nguyen, davem, pabeni, edumazet, netdev, jacob.e.keller,
	jiri, idosch

On Mon, 13 Feb 2023 15:46:53 -0800 Paul M Stillwell Jr wrote:
> On 2/10/2023 8:23 PM, Jakub Kicinski wrote:
> > Can you describe how this is used a little bit?
> > The FW log is captured at some level always (e.g. warns)
> > or unless user enables _nothing_ will come out?
> 
> My understanding is that the FW is constantly logging data into internal 
> buffers. When the user indicates what data they want and what level they 
> want then the data is filtered and output via either the UART or the 
> Admin queues. These patches retrieve the FW logs via the admin queue 
> commands.

What's the trigger to perform the collection?

If it's some error condition / assert in FW then maybe it's worth
wrapping it up (or at least some portion of the functionality) into
devlink health?

AFAIU the purpose of devlink health is exactly to bubble up to the host
asserts / errors / crashes in the FW, with associated "dump".

> The output from the FW is a binary blob that a user would send back to 
> Intel to be decoded. This is only used for troubleshooting issues where 
> a user is working with someone from Intel on a specific problem.

I believe that's in line with devlink health. The devlink health log 
is "formatted" but I really doubt that any user can get far in debugging
without vendor support.

> > On Thu,  9 Feb 2023 11:06:57 -0800 Tony Nguyen wrote:  
> >> devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
> >> devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
> >> devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime  
> > 
> > If you're using debugfs as a pipe you should put these enable knobs
> > in there as well.  
> 
> My understanding is that debugfs use as a write mechanism is frowned on. 
> If that's not true and if we were to submit patches that used debugfs 
> instead of devlink and they would be accepted then I'll happily do that. :)

Frowned upon, but any vendor specific write API is frowned up, I don't
think the API is the matter of devlink vs debugfs. To put it differently -
a lot of people try to use devlink params or debugfs without stopping
to think about how the interface can be used and shared across vendors.
Or even more sadly - how the end user will integrate them into their
operations / fleet management.

> Or add a proper devlink command to carry all this
> > information via structured netlink (fw log + level + enable are hardly
> > Intel specific).  
> 
> I don't know how other companies FW interface works so wouldn't assume 
> that I could come up with an interface that would work across all devices.

Let's think about devlink health first.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-14  0:40     ` Jakub Kicinski
@ 2023-02-14 16:14       ` Paul M Stillwell Jr
  2023-02-14 20:44         ` Jakub Kicinski
  2023-02-14 22:39         ` Jacob Keller
  0 siblings, 2 replies; 17+ messages in thread
From: Paul M Stillwell Jr @ 2023-02-14 16:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, netdev, jacob.e.keller,
	jiri, idosch

On 2/13/2023 4:40 PM, Jakub Kicinski wrote:
> On Mon, 13 Feb 2023 15:46:53 -0800 Paul M Stillwell Jr wrote:
>> On 2/10/2023 8:23 PM, Jakub Kicinski wrote:
>>> Can you describe how this is used a little bit?
>>> The FW log is captured at some level always (e.g. warns)
>>> or unless user enables _nothing_ will come out?
>>
>> My understanding is that the FW is constantly logging data into internal
>> buffers. When the user indicates what data they want and what level they
>> want then the data is filtered and output via either the UART or the
>> Admin queues. These patches retrieve the FW logs via the admin queue
>> commands.
> 
> What's the trigger to perform the collection?
> 
> If it's some error condition / assert in FW then maybe it's worth
> wrapping it up (or at least some portion of the functionality) into
> devlink health?

The trigger is the user asking to collect the FW logs. There isn't 
anything within the FW that triggers the logging; generally there is 
some issue on the user side and we think there may be some issue in the 
FW or that FW can provide more info on what is going on so we request FW 
logs. As an example, sometimes users report issues with link flap and we 
request FW logs to see what the FW link management code thinks is 
happening. In this example there is no "error" per se, but the user is 
seeing some undesired behavior and we are looking for more information 
on what could be going on.

> 
> AFAIU the purpose of devlink health is exactly to bubble up to the host
> asserts / errors / crashes in the FW, with associated "dump".
> 

Maybe it is, but when I look at devlink health it doesn't seem like it 
is designed for something like this. It looks like (based on my reading 
of the documentation) that it responds to errors from the device; that's 
not really what is happening in our case. The user is seeing some 
behavior that they don't like and we are asking the FW to shed some 
light on what the FW thinks is happening.

Link flap is an excellent example of this. The FW is doing what it 
believes to be the correct thing, but due to some change on the link 
partner that the FW doesn't handle correctly then there is some issue. 
This is a classic bug, the code thinks it's doing the correct thing and 
in reality it is not.

In the above example nothing on the device is reporting an error so I 
don't see how the health reporter would get triggered.

Also, devlink health seems like it is geared towards a model of the 
device has an error, the error gets reported to the driver, the driver 
gets some info to report to the user, and the driver moves on. The FW 
logging is different from that in that we want to see data across a long 
period of time generally because we can't always pinpoint the time that 
the thing we want to see happened.

>> The output from the FW is a binary blob that a user would send back to
>> Intel to be decoded. This is only used for troubleshooting issues where
>> a user is working with someone from Intel on a specific problem.
> 
> I believe that's in line with devlink health. The devlink health log
> is "formatted" but I really doubt that any user can get far in debugging
> without vendor support.
> 

I agree, I just don't see what the trigger is in our case for FW logging.

>>> On Thu,  9 Feb 2023 11:06:57 -0800 Tony Nguyen wrote:
>>>> devlink dev param set <pci dev> name fwlog_enabled value <true/false> cmode runtime
>>>> devlink dev param set <pci dev> name fwlog_level value <0-4> cmode runtime
>>>> devlink dev param set <pci dev> name fwlog_resolution value <1-128> cmode runtime
>>>
>>> If you're using debugfs as a pipe you should put these enable knobs
>>> in there as well.
>>
>> My understanding is that debugfs use as a write mechanism is frowned on.
>> If that's not true and if we were to submit patches that used debugfs
>> instead of devlink and they would be accepted then I'll happily do that. :)
> 
> Frowned upon, but any vendor specific write API is frowned up, I don't
> think the API is the matter of devlink vs debugfs. To put it differently -
> a lot of people try to use devlink params or debugfs without stopping
> to think about how the interface can be used and shared across vendors.
> Or even more sadly - how the end user will integrate them into their
> operations / fleet management.
> 
>> Or add a proper devlink command to carry all this
>>> information via structured netlink (fw log + level + enable are hardly
>>> Intel specific).
>>
>> I don't know how other companies FW interface works so wouldn't assume
>> that I could come up with an interface that would work across all devices.
> 
> Let's think about devlink health first.

I'm happy to think about it, but as I said I don't see how our FW 
logging model fits into the paradigm of devlink health. I'm open to 
suggestions because I may not have thought about this in a way that 
would fit into devlink health.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-14 16:14       ` Paul M Stillwell Jr
@ 2023-02-14 20:44         ` Jakub Kicinski
  2023-02-14 22:39         ` Jacob Keller
  1 sibling, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-14 20:44 UTC (permalink / raw)
  To: Paul M Stillwell Jr
  Cc: Tony Nguyen, davem, pabeni, edumazet, netdev, jacob.e.keller,
	jiri, idosch

On Tue, 14 Feb 2023 08:14:04 -0800 Paul M Stillwell Jr wrote:
> >> I don't know how other companies FW interface works so wouldn't assume
> >> that I could come up with an interface that would work across all devices.  
> > 
> > Let's think about devlink health first.  
> 
> I'm happy to think about it, but as I said I don't see how our FW 
> logging model fits into the paradigm of devlink health. I'm open to 
> suggestions because I may not have thought about this in a way that 
> would fit into devlink health.

Maybe just use ethtool set_dump / get_dump_data then?
That's more of an ad-hoc collection API.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-14 16:14       ` Paul M Stillwell Jr
  2023-02-14 20:44         ` Jakub Kicinski
@ 2023-02-14 22:39         ` Jacob Keller
  2023-02-14 23:19           ` Jakub Kicinski
  1 sibling, 1 reply; 17+ messages in thread
From: Jacob Keller @ 2023-02-14 22:39 UTC (permalink / raw)
  To: Paul M Stillwell Jr, Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, netdev, jiri, idosch



On 2/14/2023 8:14 AM, Paul M Stillwell Jr wrote:
> On 2/13/2023 4:40 PM, Jakub Kicinski wrote:
>> On Mon, 13 Feb 2023 15:46:53 -0800 Paul M Stillwell Jr wrote:
>>> On 2/10/2023 8:23 PM, Jakub Kicinski wrote:
>>>> Can you describe how this is used a little bit?
>>>> The FW log is captured at some level always (e.g. warns)
>>>> or unless user enables _nothing_ will come out?
>>>
>>> My understanding is that the FW is constantly logging data into internal
>>> buffers. When the user indicates what data they want and what level they
>>> want then the data is filtered and output via either the UART or the
>>> Admin queues. These patches retrieve the FW logs via the admin queue
>>> commands.
>>
>> What's the trigger to perform the collection?
>>
>> If it's some error condition / assert in FW then maybe it's worth
>> wrapping it up (or at least some portion of the functionality) into
>> devlink health?
> 
> The trigger is the user asking to collect the FW logs. There isn't 
> anything within the FW that triggers the logging; generally there is 
> some issue on the user side and we think there may be some issue in the 
> FW or that FW can provide more info on what is going on so we request FW 
> logs. As an example, sometimes users report issues with link flap and we 
> request FW logs to see what the FW link management code thinks is 
> happening. In this example there is no "error" per se, but the user is 
> seeing some undesired behavior and we are looking for more information 
> on what could be going on.
> 
>>
>> AFAIU the purpose of devlink health is exactly to bubble up to the host
>> asserts / errors / crashes in the FW, with associated "dump".
>>
> 
> Maybe it is, but when I look at devlink health it doesn't seem like it 
> is designed for something like this. It looks like (based on my reading 
> of the documentation) that it responds to errors from the device; that's 
> not really what is happening in our case. The user is seeing some 
> behavior that they don't like and we are asking the FW to shed some 
> light on what the FW thinks is happening.
> 
> Link flap is an excellent example of this. The FW is doing what it 
> believes to be the correct thing, but due to some change on the link 
> partner that the FW doesn't handle correctly then there is some issue. 
> This is a classic bug, the code thinks it's doing the correct thing and 
> in reality it is not.
> 
> In the above example nothing on the device is reporting an error so I 
> don't see how the health reporter would get triggered.
> 
> Also, devlink health seems like it is geared towards a model of the 
> device has an error, the error gets reported to the driver, the driver 
> gets some info to report to the user, and the driver moves on. The FW 
> logging is different from that in that we want to see data across a long 
> period of time generally because we can't always pinpoint the time that 
> the thing we want to see happened.
> 
>>> The output from the FW is a binary blob that a user would send back to
>>> Intel to be decoded. This is only used for troubleshooting issues where
>>> a user is working with someone from Intel on a specific problem.
>>
>> I believe that's in line with devlink health. The devlink health log
>> is "formatted" but I really doubt that any user can get far in debugging
>> without vendor support.
>>
> 
> I agree, I just don't see what the trigger is in our case for FW logging.
> 

Here's the thoughts I had for devlink health:

1) support health reporters storing more than a single event. Currently
all health reporters respond to a single event and then do not allow
storing new captures until the current one is processed. This breaks for
our firmware logging because we get separate events from firmware for
each buffer of messages. We could make this configurable such that we
limit the total maximum to prevent kernel memory overrun. (and some
policy for how to discard events when the buffer is full?)

2a) add some knobs to enable/disable a health reporter

2b) add some firmware logging specific knobs as a "build on top of
health reporters" or by creating a separate firmware logging bit that
ties into a reporter. These knows would be how to set level, etc.

3) for ice, once the health reporter is enabled we request the firmware
to send us logging, then we get our admin queue message and simply copy
this into the health reporter as a new event

4) user space is in charge of monitoring health reports and can decide
how to copy events out to disk and when to delete the health reports
from the kernel.

Basically: extend health reporters to allow multiple captures and add a
related module to configure firmware logging via a health reporter,
where the "event" is just "I have a new blob to store".

How does this sound?

For the specifics of 2b) I think we can probably agree that levels is
fairly generic (i.e. the specifics of what each level are is vendor
specific but the fact that there are numbers and that higher or lower
numbers means more severe is fairly standard)

I know the ice firmware has many such modules we can enable or disable
and we would ideally be able to set which modules are active or not.
However all messages come through in the same blobs so we can't separate
them and report them to individual health reporter events. I think we
could have modules as a separate option for toggling which ones are on
or off. I would expect other vendors to have something similar or have
no modules at all and just an on/off switch?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-14 22:39         ` Jacob Keller
@ 2023-02-14 23:19           ` Jakub Kicinski
  2023-02-15  0:07             ` Jacob Keller
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-14 23:19 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Paul M Stillwell Jr, Tony Nguyen, davem, pabeni, edumazet,
	netdev, jiri, idosch

On Tue, 14 Feb 2023 14:39:18 -0800 Jacob Keller wrote:
> On 2/14/2023 8:14 AM, Paul M Stillwell Jr wrote:
> >> I believe that's in line with devlink health. The devlink health log
> >> is "formatted" but I really doubt that any user can get far in debugging
> >> without vendor support.
> >>  
> > 
> > I agree, I just don't see what the trigger is in our case for FW logging.
> >   
> 
> Here's the thoughts I had for devlink health:
> 
> 1) support health reporters storing more than a single event. Currently
> all health reporters respond to a single event and then do not allow
> storing new captures until the current one is processed. This breaks for
> our firmware logging because we get separate events from firmware for
> each buffer of messages. We could make this configurable such that we
> limit the total maximum to prevent kernel memory overrun. (and some
> policy for how to discard events when the buffer is full?)

I think the idea is that the system keeps a continuous ring buffer of
logs and dumps it whenever bad events happens. That's in case of logs,
obviously you can expose other types of state with health.

> 2a) add some knobs to enable/disable a health reporter

For ad-hoc collection of the ring buffer there are dump and diagnose
callbacks which can be triggered at any time.

> 2b) add some firmware logging specific knobs as a "build on top of
> health reporters" or by creating a separate firmware logging bit that
> ties into a reporter. These knows would be how to set level, etc.

Right, the level setting is the part that I'm the least sure of.
That sounds like something more fitting to ethtool dumps.

> 3) for ice, once the health reporter is enabled we request the firmware
> to send us logging, then we get our admin queue message and simply copy
> this into the health reporter as a new event
> 
> 4) user space is in charge of monitoring health reports and can decide
> how to copy events out to disk and when to delete the health reports
> from the kernel.

That's also out of what's expected with health reporters. User should
not have to run vendor tools with devlink health. Decoding of the dump
may require vendor tools but checking if system is healthy or something
crashed should happen without any user space involvement.

> Basically: extend health reporters to allow multiple captures and add a
> related module to configure firmware logging via a health reporter,
> where the "event" is just "I have a new blob to store".
> 
> How does this sound?
> 
> For the specifics of 2b) I think we can probably agree that levels is
> fairly generic (i.e. the specifics of what each level are is vendor
> specific but the fact that there are numbers and that higher or lower
> numbers means more severe is fairly standard)
> 
> I know the ice firmware has many such modules we can enable or disable
> and we would ideally be able to set which modules are active or not.
> However all messages come through in the same blobs so we can't separate
> them and report them to individual health reporter events. I think we
> could have modules as a separate option for toggling which ones are on
> or off. I would expect other vendors to have something similar or have
> no modules at all and just an on/off switch?

I bet all vendors at this point have separate modules in the FW.
It's been the case for a while, that's why we have multiple versions
supported in devlink dev info.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-14 23:19           ` Jakub Kicinski
@ 2023-02-15  0:07             ` Jacob Keller
  2023-02-15  1:16               ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Keller @ 2023-02-15  0:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paul M Stillwell Jr, Tony Nguyen, davem, pabeni, edumazet,
	netdev, jiri, idosch



On 2/14/2023 3:19 PM, Jakub Kicinski wrote:
> On Tue, 14 Feb 2023 14:39:18 -0800 Jacob Keller wrote:
>> On 2/14/2023 8:14 AM, Paul M Stillwell Jr wrote:
>>>> I believe that's in line with devlink health. The devlink health log
>>>> is "formatted" but I really doubt that any user can get far in debugging
>>>> without vendor support.
>>>>  
>>>
>>> I agree, I just don't see what the trigger is in our case for FW logging.
>>>   
>>
>> Here's the thoughts I had for devlink health:
>>
>> 1) support health reporters storing more than a single event. Currently
>> all health reporters respond to a single event and then do not allow
>> storing new captures until the current one is processed. This breaks for
>> our firmware logging because we get separate events from firmware for
>> each buffer of messages. We could make this configurable such that we
>> limit the total maximum to prevent kernel memory overrun. (and some
>> policy for how to discard events when the buffer is full?)
> 
> I think the idea is that the system keeps a continuous ring buffer of
> logs and dumps it whenever bad events happens. That's in case of logs,
> obviously you can expose other types of state with health.
> 

Ok so in our case the firmware presumably has some internal storage for
messages and some set of masks for what messages it will "store" and
forward to us.

We can enable logging and enable the modules and logging levels. This
this enables firmware to log messages to its buffer, which is then
periodically flushed to us. The messages may be for "bad" events but may
not be. Sometimes its tracing information, or other details about how
the firmware. This is all encoded, and we (driver devs) are not given
the tools to decode it. My understanding is that this partly is from
fear of tying to specific driver versions.

We're supposed to dump this binary contents and give it to firmware devs
who can then decode it using the appropriate decoder for the given version.

The information is very helpful for debugging problems, but we can't
exactly translate it into usable information on our own. The mere
presence of some information (once logging is enabled) is *not*
sufficient to know if there was a problem. Basically every flow can be
enabled to log all kinds of detail.

Additionally, we get messages in blocks, and don't have any mechanism in
the driver to split these blocks into individual messages, or to tell
what log level or module the messages in a block are from.

>> 2a) add some knobs to enable/disable a health reporter
> 
> For ad-hoc collection of the ring buffer there are dump and diagnose
> callbacks which can be triggered at any time.

Sure. The point I was making is that we need some indication from user
space whether to enable or disable the logging.. because otherwise we'd
potentially just consume memory indefinitely, or we'd have limited space
to store messages and we'd overflow it.

The volume of logs can be very high depending on what modules or log
level is enabled. Thus storing all messages all the time is not really
viable.

One of our early prototypes for this feature hex-encoded the messages to
syslog but we pretty much always lost messages doing this because we'd
overflow the syslog buffer. That is what led us to use debugfs for
reading/writing the binary data. (Not to mention that since its binary
data only decodable by Intel there is no reason to spam the syslog
buffer...)

> 
>> 2b) add some firmware logging specific knobs as a "build on top of
>> health reporters" or by creating a separate firmware logging bit that
>> ties into a reporter. These knows would be how to set level, etc.
> 
> Right, the level setting is the part that I'm the least sure of.
> That sounds like something more fitting to ethtool dumps.
> 

I don't feel like this fits into ethtool at all as its not network
specific and tying it to a netdev feels weird.

>> 3) for ice, once the health reporter is enabled we request the firmware
>> to send us logging, then we get our admin queue message and simply copy
>> this into the health reporter as a new event
>>
>> 4) user space is in charge of monitoring health reports and can decide
>> how to copy events out to disk and when to delete the health reports
>> from the kernel.
> 
> That's also out of what's expected with health reporters. User should
> not have to run vendor tools with devlink health. Decoding of the dump
> may require vendor tools but checking if system is healthy or something
> crashed should happen without any user space involvement.
> 

So this wasn't about using a specific "vendor" tool, but more that
devlink health can decide when to delete a given dump?

Ultimately we have to take the binary data and give it to a vendor
specific tool to decode (whether I like that or not...). The information
required to decode the messages is not something we have permission to
share and code into the driver.

>> Basically: extend health reporters to allow multiple captures and add a
>> related module to configure firmware logging via a health reporter,
>> where the "event" is just "I have a new blob to store".
>>
>> How does this sound?
>>
>> For the specifics of 2b) I think we can probably agree that levels is
>> fairly generic (i.e. the specifics of what each level are is vendor
>> specific but the fact that there are numbers and that higher or lower
>> numbers means more severe is fairly standard)
>>
>> I know the ice firmware has many such modules we can enable or disable
>> and we would ideally be able to set which modules are active or not.
>> However all messages come through in the same blobs so we can't separate
>> them and report them to individual health reporter events. I think we
>> could have modules as a separate option for toggling which ones are on
>> or off. I would expect other vendors to have something similar or have
>> no modules at all and just an on/off switch?
> 
> I bet all vendors at this point have separate modules in the FW.
> It's been the case for a while, that's why we have multiple versions
> supported in devlink dev info.

So one key here is that module for us refers to various sub-components
of our main firmware, and does not tie into the devlink info modules at
all, nor would that even make sense to us.

Its more like sections i.e.

DCB,
MDIO,
NVM,
Scheduler,
Tx queue management,
SyncE,
LLDP,
Link Management,
...

I believe when a firmware dev adds a log message they choose an
appropriate section and log level for when it should be reported.

This makes me think the right approach is to add a new "devlink fwlog"
section entirely where we can define its semantics. It doesn't quite
line up with the current intention of health reporters.

We also considered some sort of extension to devlink regions, where each
new batch of messages from firmware would be a new snapshot.

Again this still requires some form of controls for whether to enable
logging, how many snapshots to store, how to discard old snapshots if we
run out of space, and what modules and log levels to enable.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-15  0:07             ` Jacob Keller
@ 2023-02-15  1:16               ` Jakub Kicinski
  2023-02-15  1:33                 ` Jacob Keller
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-15  1:16 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Paul M Stillwell Jr, Tony Nguyen, davem, pabeni, edumazet,
	netdev, jiri, idosch

On Tue, 14 Feb 2023 16:07:04 -0800 Jacob Keller wrote:
> >> 2b) add some firmware logging specific knobs as a "build on top of
> >> health reporters" or by creating a separate firmware logging bit that
> >> ties into a reporter. These knows would be how to set level, etc.  
> > 
> > Right, the level setting is the part that I'm the least sure of.
> > That sounds like something more fitting to ethtool dumps.
> 
> I don't feel like this fits into ethtool at all as its not network
> specific and tying it to a netdev feels weird.

Yes, I know, all NICs are generic IO devices now. While the only
example of what can go wrong we heard so far is a link flap...

Reimplementing a similar API in devlink with a backward compat
is definitely an option.

> >> 3) for ice, once the health reporter is enabled we request the firmware
> >> to send us logging, then we get our admin queue message and simply copy
> >> this into the health reporter as a new event
> >>
> >> 4) user space is in charge of monitoring health reports and can decide
> >> how to copy events out to disk and when to delete the health reports
> >> from the kernel.  
> > 
> > That's also out of what's expected with health reporters. User should
> > not have to run vendor tools with devlink health. Decoding of the dump
> > may require vendor tools but checking if system is healthy or something
> > crashed should happen without any user space involvement.
> 
> So this wasn't about using a specific "vendor" tool, but more that
> devlink health can decide when to delete a given dump?
> 
> Ultimately we have to take the binary data and give it to a vendor
> specific tool to decode (whether I like that or not...). The information
> required to decode the messages is not something we have permission to
> share and code into the driver.
> 
> > I bet all vendors at this point have separate modules in the FW.
> > It's been the case for a while, that's why we have multiple versions
> > supported in devlink dev info.  
> 
> So one key here is that module for us refers to various sub-components
> of our main firmware, and does not tie into the devlink info modules at
> all, nor would that even make sense to us.
> 
> Its more like sections i.e.
> 
> DCB,
> MDIO,
> NVM,
> Scheduler,
> Tx queue management,
> SyncE,
> LLDP,
> Link Management,
> ...
> 
> I believe when a firmware dev adds a log message they choose an
> appropriate section and log level for when it should be reported.
> 
> This makes me think the right approach is to add a new "devlink fwlog"
> section entirely where we can define its semantics. It doesn't quite
> line up with the current intention of health reporters.
> 
> We also considered some sort of extension to devlink regions, where each
> new batch of messages from firmware would be a new snapshot.
> 
> Again this still requires some form of controls for whether to enable
> logging, how many snapshots to store, how to discard old snapshots if we
> run out of space, and what modules and log levels to enable.

Yeah, it doesn't fit into health or regions if there's no signal on 
when things go wrong. If ethtool set_dump / get_dump doesn't fit a new
command may be better.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-15  1:16               ` Jakub Kicinski
@ 2023-02-15  1:33                 ` Jacob Keller
  2023-02-15  2:07                   ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Keller @ 2023-02-15  1:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paul M Stillwell Jr, Tony Nguyen, davem, pabeni, edumazet,
	netdev, jiri, idosch



On 2/14/2023 5:16 PM, Jakub Kicinski wrote:
> On Tue, 14 Feb 2023 16:07:04 -0800 Jacob Keller wrote:
>>>> 2b) add some firmware logging specific knobs as a "build on top of
>>>> health reporters" or by creating a separate firmware logging bit that
>>>> ties into a reporter. These knows would be how to set level, etc.  
>>>
>>> Right, the level setting is the part that I'm the least sure of.
>>> That sounds like something more fitting to ethtool dumps.
>>
>> I don't feel like this fits into ethtool at all as its not network
>> specific and tying it to a netdev feels weird.
> 
> Yes, I know, all NICs are generic IO devices now. While the only
> example of what can go wrong we heard so far is a link flap...
> 
> Reimplementing a similar API in devlink with a backward compat
> is definitely an option.
> 

Sure. Well the interface is more of a way for firmware team to get
debugging information out of the firmware. Its sort of like a "print"
debugging, where information about the state of firmware during
different activities can be recorded.

The idea is that when a problem is detected by a user, they can enable
firmware logging to capture this data and then that can aid us in
determining what really went wrong.

It isn't a "we detected a problem" interface. It's a "here's a bunch of
debugging logging you asked for!" interface.

>> I believe when a firmware dev adds a log message they choose an
>> appropriate section and log level for when it should be reported.
>>
>> This makes me think the right approach is to add a new "devlink fwlog"
>> section entirely where we can define its semantics. It doesn't quite
>> line up with the current intention of health reporters.
>>
>> We also considered some sort of extension to devlink regions, where each
>> new batch of messages from firmware would be a new snapshot.
>>
>> Again this still requires some form of controls for whether to enable
>> logging, how many snapshots to store, how to discard old snapshots if we
>> run out of space, and what modules and log levels to enable.
> 
> Yeah, it doesn't fit into health or regions if there's no signal on 
> when things go wrong. If ethtool set_dump / get_dump doesn't fit a new
> command may be better.

I am not sure how ETHTOOL_SET_DUMP works for configuring the settings of
what modules to enable or what log level. I guess it has "@flag" which
would be the settings. This is u32 which might not be enough information
to specify the log level and the module config. It also doesn't expose
those values to allow users to actually understand what they're
configuring on or off.

It does look like it would work for this patch set via the following flow:

ETHTOOL_SET_DUMP -> enable dumping with the appropriate flags (driver
defined...)

Once activated, the driver would then capture and store data as we do
now, and user space could retrieve it via ETHTOOL_GET_DUMP_FLAG and
ETHTOOL_GET_DUMP_DATA

But this doesn't really help solve the problem of how to expose the
module and log levels. Are we accepting just a binary flags that are
driver specific?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver
  2023-02-15  1:33                 ` Jacob Keller
@ 2023-02-15  2:07                   ` Jakub Kicinski
  0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2023-02-15  2:07 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Paul M Stillwell Jr, Tony Nguyen, davem, pabeni, edumazet,
	netdev, jiri, idosch

On Tue, 14 Feb 2023 17:33:25 -0800 Jacob Keller wrote:
> > Yes, I know, all NICs are generic IO devices now. While the only
> > example of what can go wrong we heard so far is a link flap...
> > 
> > Reimplementing a similar API in devlink with a backward compat
> > is definitely an option.
> 
> Sure. Well the interface is more of a way for firmware team to get
> debugging information out of the firmware. Its sort of like a "print"
> debugging, where information about the state of firmware during
> different activities can be recorded.
> 
> The idea is that when a problem is detected by a user, they can enable
> firmware logging to capture this data and then that can aid us in
> determining what really went wrong.
> 
> It isn't a "we detected a problem" interface. It's a "here's a bunch of
> debugging logging you asked for!" interface.

Yes, none of this sounds very usable in production. So maybe just put
the basic interface in debugfs and keep the per-module etc. whetevers
in the out of tree driver? That seems like a level of granularity
useful for development.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-02-15  2:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 19:06 [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Tony Nguyen
2023-02-09 19:06 ` [PATCH net-next 1/5] ice: remove FW logging code Tony Nguyen
2023-02-09 19:06 ` [PATCH net-next 2/5] ice: enable devlink to check FW logging status Tony Nguyen
2023-02-09 19:07 ` [PATCH net-next 3/5] ice: add ability to query/set FW log level and resolution Tony Nguyen
2023-02-09 19:07 ` [PATCH net-next 4/5] ice: disable FW logging on driver unload Tony Nguyen
2023-02-09 19:07 ` [PATCH net-next 5/5] ice: use debugfs to output FW log data Tony Nguyen
2023-02-11  4:23 ` [PATCH net-next 0/5][pull request] add v2 FW logging for ice driver Jakub Kicinski
2023-02-13 23:46   ` Paul M Stillwell Jr
2023-02-14  0:40     ` Jakub Kicinski
2023-02-14 16:14       ` Paul M Stillwell Jr
2023-02-14 20:44         ` Jakub Kicinski
2023-02-14 22:39         ` Jacob Keller
2023-02-14 23:19           ` Jakub Kicinski
2023-02-15  0:07             ` Jacob Keller
2023-02-15  1:16               ` Jakub Kicinski
2023-02-15  1:33                 ` Jacob Keller
2023-02-15  2:07                   ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.