[Intel-wired-lan] [PATCH iwl-next v4 0/5] ice: Support 5 layer Tx scheduler topology

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-wired-lan] [PATCH iwl-next v4 0/5] ice: Support 5 layer Tx scheduler topology
@ 2024-02-19 10:05 ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Mateusz Polchlopek, horms, przemyslaw.kitszel

For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.

---
v4:
- restored the initial way of passing firmware data to ice_cfg_tx_topo
  function in ice_init_tx_topology function in ice_main.c file. In v2
  and v3 version it was passed as const u8 parameter which caused kernel
  crash. Because of this change I decided to drop all Reviewed-by tags.

v3:
- fixed documentation warnings
https://lore.kernel.org/netdev/20231009090711.136777-1-mateusz.polchlopek@intel.com/

v2:
- updated documentation
- reorder of variables list (default-init first)
- comments changed to be more descriptive
- added elseif's instead of few if's
- returned error when ice_request_fw fails
- ice_cfg_tx_topo() changed to take const u8 as parameter (get rid of copy
  buffer)
- renamed all "balance" occurences to the new one
- prevent fail of ice_aq_read_nvm() function
- unified variables names (int err instead of int status in few
  functions)
- some smaller fixes, typo fixes
https://lore.kernel.org/netdev/20231006110212.96305-1-mateusz.polchlopek@intel.com/

v1:
https://lore.kernel.org/netdev/20230523174008.3585300-1-anthony.l.nguyen@intel.com/
---

Lukasz Czapnik (1):
  ice: Add tx_scheduling_layers devlink param

Michal Wilczynski (2):
  ice: Enable switching default Tx scheduler topology
  ice: Document tx_scheduling_layers parameter

Raj Victor (2):
  ice: Support 5 layer topology
  ice: Adjust the VSI/Aggregator layers

 Documentation/networking/devlink/ice.rst      |  41 ++++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  31 +++
 drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
 drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 169 +++++++++++++++
 .../net/ethernet/intel/ice/ice_fw_update.c    |   7 +-
 .../net/ethernet/intel/ice/ice_fw_update.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 102 +++++++--
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   7 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 drivers/net/ethernet/intel/ice/ice_sched.c    |  37 ++--
 drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 14 files changed, 565 insertions(+), 45 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 0/5] ice: Support 5 layer Tx scheduler topology
@ 2024-02-19 10:05 ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, horms, przemyslaw.kitszel, Mateusz Polchlopek

For performance reasons there is a need to have support for selectable
Tx scheduler topology. Currently firmware supports only the default
9-layer and 5-layer topology. This patch series enables switch from
default to 5-layer topology, if user decides to opt-in.

---
v4:
- restored the initial way of passing firmware data to ice_cfg_tx_topo
  function in ice_init_tx_topology function in ice_main.c file. In v2
  and v3 version it was passed as const u8 parameter which caused kernel
  crash. Because of this change I decided to drop all Reviewed-by tags.

v3:
- fixed documentation warnings
https://lore.kernel.org/netdev/20231009090711.136777-1-mateusz.polchlopek@intel.com/

v2:
- updated documentation
- reorder of variables list (default-init first)
- comments changed to be more descriptive
- added elseif's instead of few if's
- returned error when ice_request_fw fails
- ice_cfg_tx_topo() changed to take const u8 as parameter (get rid of copy
  buffer)
- renamed all "balance" occurences to the new one
- prevent fail of ice_aq_read_nvm() function
- unified variables names (int err instead of int status in few
  functions)
- some smaller fixes, typo fixes
https://lore.kernel.org/netdev/20231006110212.96305-1-mateusz.polchlopek@intel.com/

v1:
https://lore.kernel.org/netdev/20230523174008.3585300-1-anthony.l.nguyen@intel.com/
---

Lukasz Czapnik (1):
  ice: Add tx_scheduling_layers devlink param

Michal Wilczynski (2):
  ice: Enable switching default Tx scheduler topology
  ice: Document tx_scheduling_layers parameter

Raj Victor (2):
  ice: Support 5 layer topology
  ice: Adjust the VSI/Aggregator layers

 Documentation/networking/devlink/ice.rst      |  41 ++++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  31 +++
 drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
 drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 169 +++++++++++++++
 .../net/ethernet/intel/ice/ice_fw_update.c    |   7 +-
 .../net/ethernet/intel/ice/ice_fw_update.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 102 +++++++--
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   7 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 drivers/net/ethernet/intel/ice/ice_sched.c    |  37 ++--
 drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 14 files changed, 565 insertions(+), 45 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 1/5] ice: Support 5 layer topology
  2024-02-19 10:05 ` Mateusz Polchlopek
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Michal Wilczynski, netdev, Raj Victor, horms, Mateusz Polchlopek,
	przemyslaw.kitszel

From: Raj Victor <victor.raj@intel.com>

There is a performance issue when the number of VSIs are not multiple
of 8. This is caused due to the max children limitation per node(8) in
9 layer topology. The BW credits are shared evenly among the children
by default. Assume one node has 8 children and the other has 1.
The parent of these nodes share the BW credit equally among them.
Apparently this causes a problem for the first node which has 8 children.
The 9th VM get more BW credits than the first 8 VMs.

Example:

1) With 8 VM's:
for x in 0 1 2 3 4 5 6 7;
do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done

tx_queue_0_packets: 23283027
tx_queue_1_packets: 23292289
tx_queue_2_packets: 23276136
tx_queue_3_packets: 23279828
tx_queue_4_packets: 23279828
tx_queue_5_packets: 23279333
tx_queue_6_packets: 23277745
tx_queue_7_packets: 23279950
tx_queue_8_packets: 0

2) With 9 VM's:
for x in 0 1 2 3 4 5 6 7 8;
do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done

tx_queue_0_packets: 24163396
tx_queue_1_packets: 24164623
tx_queue_2_packets: 24163188
tx_queue_3_packets: 24163701
tx_queue_4_packets: 24163683
tx_queue_5_packets: 24164668
tx_queue_6_packets: 23327200
tx_queue_7_packets: 24163853
tx_queue_8_packets: 91101417

So on average queue 8 statistics show that 3.7 times more packets were
send there than to the other queues.

The FW starting with version 3.20, has increased the max number of
children per node by reducing the number of layers from 9 to 5. Reflect
this on driver side.

Signed-off-by: Raj Victor <victor.raj@intel.com>
Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  23 ++
 drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
 drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 6 files changed, 233 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index b315c734455a..02102e937b30 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -121,6 +121,7 @@ struct ice_aqc_list_caps_elem {
 #define ICE_AQC_CAPS_PCIE_RESET_AVOIDANCE		0x0076
 #define ICE_AQC_CAPS_POST_UPDATE_RESET_RESTRICT		0x0077
 #define ICE_AQC_CAPS_NVM_MGMT				0x0080
+#define ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE		0x0085
 #define ICE_AQC_CAPS_FW_LAG_SUPPORT			0x0092
 #define ICE_AQC_BIT_ROCEV2_LAG				0x01
 #define ICE_AQC_BIT_SRIOV_LAG				0x02
@@ -819,6 +820,23 @@ struct ice_aqc_get_topo {
 	__le32 addr_low;
 };
 
+/* Get/Set Tx Topology (indirect 0x0418/0x0417) */
+struct ice_aqc_get_set_tx_topo {
+	u8 set_flags;
+#define ICE_AQC_TX_TOPO_FLAGS_CORRER		BIT(0)
+#define ICE_AQC_TX_TOPO_FLAGS_SRC_RAM		BIT(1)
+#define ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW		BIT(4)
+#define ICE_AQC_TX_TOPO_FLAGS_ISSUED		BIT(5)
+
+	u8 get_flags;
+#define ICE_AQC_TX_TOPO_GET_RAM		2
+
+	__le16 reserved1;
+	__le32 reserved2;
+	__le32 addr_high;
+	__le32 addr_low;
+};
+
 /* Update TSE (indirect 0x0403)
  * Get TSE (indirect 0x0404)
  * Add TSE (indirect 0x0401)
@@ -2547,6 +2565,7 @@ struct ice_aq_desc {
 		struct ice_aqc_get_link_topo get_link_topo;
 		struct ice_aqc_i2c read_write_i2c;
 		struct ice_aqc_read_i2c_resp read_i2c_resp;
+		struct ice_aqc_get_set_tx_topo get_set_tx_topo;
 	} params;
 };
 
@@ -2653,6 +2672,10 @@ enum ice_adminq_opc {
 	ice_aqc_opc_query_sched_res			= 0x0412,
 	ice_aqc_opc_remove_rl_profiles			= 0x0415,
 
+	/* tx topology commands */
+	ice_aqc_opc_set_tx_topo				= 0x0417,
+	ice_aqc_opc_get_tx_topo				= 0x0418,
+
 	/* PHY commands */
 	ice_aqc_opc_get_phy_caps			= 0x0600,
 	ice_aqc_opc_set_phy_cfg				= 0x0601,
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 090a2b8b5ff2..175091011251 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1622,6 +1622,8 @@ ice_aq_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf,
 	case ice_aqc_opc_set_port_params:
 	case ice_aqc_opc_get_vlan_mode_parameters:
 	case ice_aqc_opc_set_vlan_mode_parameters:
+	case ice_aqc_opc_set_tx_topo:
+	case ice_aqc_opc_get_tx_topo:
 	case ice_aqc_opc_add_recipe:
 	case ice_aqc_opc_recipe_to_profile:
 	case ice_aqc_opc_get_recipe:
@@ -2178,6 +2180,9 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps,
 		ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n",
 			  prefix, caps->sriov_lag);
 		break;
+	case ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE:
+		caps->tx_sched_topo_comp_mode_en = (number == 1);
+		break;
 	default:
 		/* Not one of the recognized common capabilities */
 		found = false;
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 7532d11ad7f3..766437944774 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -4,6 +4,7 @@
 #include "ice_common.h"
 #include "ice.h"
 #include "ice_ddp.h"
+#include "ice_sched.h"
 
 /* For supporting double VLAN mode, it is necessary to enable or disable certain
  * boost tcam entries. The metadata labels names that match the following
@@ -2263,3 +2264,201 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
 
 	return state;
 }
+
+/**
+ * ice_get_set_tx_topo - get or set Tx topology
+ * @hw: pointer to the HW struct
+ * @buf: pointer to Tx topology buffer
+ * @buf_size: buffer size
+ * @cd: pointer to command details structure or NULL
+ * @flags: pointer to descriptor flags
+ * @set: 0-get, 1-set topology
+ *
+ * The function will get or set Tx topology
+ */
+static int
+ice_get_set_tx_topo(struct ice_hw *hw, u8 *buf, u16 buf_size,
+		    struct ice_sq_cd *cd, u8 *flags, bool set)
+{
+	struct ice_aqc_get_set_tx_topo *cmd;
+	struct ice_aq_desc desc;
+	int status;
+
+	cmd = &desc.params.get_set_tx_topo;
+	if (set) {
+		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_tx_topo);
+		cmd->set_flags = ICE_AQC_TX_TOPO_FLAGS_ISSUED;
+		/* requested to update a new topology, not a default topology */
+		if (buf)
+			cmd->set_flags |= ICE_AQC_TX_TOPO_FLAGS_SRC_RAM |
+					  ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW;
+	} else {
+		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_tx_topo);
+		cmd->get_flags = ICE_AQC_TX_TOPO_GET_RAM;
+	}
+	desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
+	status = ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
+	if (status)
+		return status;
+	/* read the return flag values (first byte) for get operation */
+	if (!set && flags)
+		*flags = desc.params.get_set_tx_topo.set_flags;
+
+	return 0;
+}
+
+/**
+ * ice_cfg_tx_topo - Initialize new Tx topology if available
+ * @hw: pointer to the HW struct
+ * @buf: pointer to Tx topology buffer
+ * @len: buffer size
+ *
+ * The function will apply the new Tx topology from the package buffer
+ * if available.
+ */
+int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
+{
+	u8 *current_topo, *new_topo = NULL;
+	struct ice_run_time_cfg_seg *seg;
+	struct ice_buf_hdr *section;
+	struct ice_pkg_hdr *pkg_hdr;
+	enum ice_ddp_state state;
+	u16 offset, size = 0;
+	u32 reg = 0;
+	int status;
+	u8 flags;
+
+	if (!buf || !len)
+		return -EINVAL;
+
+	/* Does FW support new Tx topology mode ? */
+	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
+		ice_debug(hw, ICE_DBG_INIT, "FW doesn't support compatibility mode\n");
+		return -EOPNOTSUPP;
+	}
+
+	current_topo = kzalloc(ICE_AQ_MAX_BUF_LEN, GFP_KERNEL);
+	if (!current_topo)
+		return -ENOMEM;
+
+	/* Get the current Tx topology */
+	status = ice_get_set_tx_topo(hw, current_topo, ICE_AQ_MAX_BUF_LEN, NULL,
+				     &flags, false);
+
+	kfree(current_topo);
+
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Get current topology is failed\n");
+		return status;
+	}
+
+	/* Is default topology already applied ? */
+	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "Default topology already applied\n");
+		return -EEXIST;
+	}
+
+	/* Is new topology already applied ? */
+	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "New topology already applied\n");
+		return -EEXIST;
+	}
+
+	/* Setting topology already issued? */
+	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
+		ice_debug(hw, ICE_DBG_INIT, "Update Tx topology was done by another PF\n");
+		/* Add a small delay before exiting */
+		msleep(2000);
+		return -EEXIST;
+	}
+
+	/* Change the topology from new to default (5 to 9) */
+	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
+		goto update_topo;
+	}
+
+	pkg_hdr = (struct ice_pkg_hdr *)buf;
+	state = ice_verify_pkg(pkg_hdr, len);
+	if (state) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to verify pkg (err: %d)\n",
+			  state);
+		return -EIO;
+	}
+
+	/* Find runtime configuration segment */
+	seg = (struct ice_run_time_cfg_seg *)
+	      ice_find_seg_in_pkg(hw, SEGMENT_TYPE_ICE_RUN_TIME_CFG, pkg_hdr);
+	if (!seg) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment is missing\n");
+		return -EIO;
+	}
+
+	if (le32_to_cpu(seg->buf_table.buf_count) < ICE_MIN_S_COUNT) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment count(%d) is wrong\n",
+			  seg->buf_table.buf_count);
+		return -EIO;
+	}
+
+	section = ice_pkg_val_buf(seg->buf_table.buf_array);
+	if (!section || le32_to_cpu(section->section_entry[0].type) !=
+		ICE_SID_TX_5_LAYER_TOPO) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section type is wrong\n");
+		return -EIO;
+	}
+
+	size = le16_to_cpu(section->section_entry[0].size);
+	offset = le16_to_cpu(section->section_entry[0].offset);
+	if (size < ICE_MIN_S_SZ || size > ICE_MAX_S_SZ) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section size is wrong\n");
+		return -EIO;
+	}
+
+	/* Make sure the section fits in the buffer */
+	if (offset + size > ICE_PKG_BUF_SIZE) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology buffer > 4K\n");
+		return -EIO;
+	}
+
+	/* Get the new topology buffer */
+	new_topo = ((u8 *)section) + offset;
+
+update_topo:
+	/* Acquire global lock to make sure that set topology issued
+	 * by one PF.
+	 */
+	status = ice_acquire_res(hw, ICE_GLOBAL_CFG_LOCK_RES_ID, ICE_RES_WRITE,
+				 ICE_GLOBAL_CFG_LOCK_TIMEOUT);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to acquire global lock\n");
+		return status;
+	}
+
+	/* Check if reset was triggered already. */
+	reg = rd32(hw, GLGEN_RSTAT);
+	if (reg & GLGEN_RSTAT_DEVSTATE_M) {
+		/* Reset is in progress, re-init the HW again */
+		ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. Layer topology might be applied already\n");
+		ice_check_reset(hw);
+		return 0;
+	}
+
+	/* Set new topology */
+	status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology\n");
+		return status;
+	}
+
+	/* New topology is updated, delay 1 second before issuing the CORER */
+	msleep(1000);
+	ice_reset(hw, ICE_RESET_CORER);
+	/* CORER will clear the global lock, so no explicit call
+	 * required for release.
+	 */
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.h b/drivers/net/ethernet/intel/ice/ice_ddp.h
index ff66c2ffb1a2..622543f08b43 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.h
@@ -454,4 +454,6 @@ u16 ice_pkg_buf_get_active_sections(struct ice_buf_build *bld);
 void *ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 			   u32 sect_type);
 
+int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len);
+
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h
index 1aef05ea5a57..9baff6a857d8 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.h
+++ b/drivers/net/ethernet/intel/ice/ice_sched.h
@@ -6,6 +6,9 @@
 
 #include "ice_common.h"
 
+#define ICE_SCHED_5_LAYERS	5
+#define ICE_SCHED_9_LAYERS	9
+
 #define SCHED_NODE_NAME_MAX_LEN 32
 
 #define ICE_QGRP_LAYER_OFFSET	2
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 657f97e2105f..f964f26664d0 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -296,6 +296,7 @@ struct ice_hw_common_caps {
 	bool pcie_reset_avoidance;
 	/* Post update reset restriction */
 	bool reset_restrict_support;
+	bool tx_sched_topo_comp_mode_en;
 };
 
 /* IEEE 1588 TIME_SYNC specific info */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 1/5] ice: Support 5 layer topology
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Raj Victor, Michal Wilczynski,
	Mateusz Polchlopek

From: Raj Victor <victor.raj@intel.com>

There is a performance issue when the number of VSIs are not multiple
of 8. This is caused due to the max children limitation per node(8) in
9 layer topology. The BW credits are shared evenly among the children
by default. Assume one node has 8 children and the other has 1.
The parent of these nodes share the BW credit equally among them.
Apparently this causes a problem for the first node which has 8 children.
The 9th VM get more BW credits than the first 8 VMs.

Example:

1) With 8 VM's:
for x in 0 1 2 3 4 5 6 7;
do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done

tx_queue_0_packets: 23283027
tx_queue_1_packets: 23292289
tx_queue_2_packets: 23276136
tx_queue_3_packets: 23279828
tx_queue_4_packets: 23279828
tx_queue_5_packets: 23279333
tx_queue_6_packets: 23277745
tx_queue_7_packets: 23279950
tx_queue_8_packets: 0

2) With 9 VM's:
for x in 0 1 2 3 4 5 6 7 8;
do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done

tx_queue_0_packets: 24163396
tx_queue_1_packets: 24164623
tx_queue_2_packets: 24163188
tx_queue_3_packets: 24163701
tx_queue_4_packets: 24163683
tx_queue_5_packets: 24164668
tx_queue_6_packets: 23327200
tx_queue_7_packets: 24163853
tx_queue_8_packets: 91101417

So on average queue 8 statistics show that 3.7 times more packets were
send there than to the other queues.

The FW starting with version 3.20, has increased the max number of
children per node by reducing the number of layers from 9 to 5. Reflect
this on driver side.

Signed-off-by: Raj Victor <victor.raj@intel.com>
Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  23 ++
 drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
 drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 6 files changed, 233 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index b315c734455a..02102e937b30 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -121,6 +121,7 @@ struct ice_aqc_list_caps_elem {
 #define ICE_AQC_CAPS_PCIE_RESET_AVOIDANCE		0x0076
 #define ICE_AQC_CAPS_POST_UPDATE_RESET_RESTRICT		0x0077
 #define ICE_AQC_CAPS_NVM_MGMT				0x0080
+#define ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE		0x0085
 #define ICE_AQC_CAPS_FW_LAG_SUPPORT			0x0092
 #define ICE_AQC_BIT_ROCEV2_LAG				0x01
 #define ICE_AQC_BIT_SRIOV_LAG				0x02
@@ -819,6 +820,23 @@ struct ice_aqc_get_topo {
 	__le32 addr_low;
 };
 
+/* Get/Set Tx Topology (indirect 0x0418/0x0417) */
+struct ice_aqc_get_set_tx_topo {
+	u8 set_flags;
+#define ICE_AQC_TX_TOPO_FLAGS_CORRER		BIT(0)
+#define ICE_AQC_TX_TOPO_FLAGS_SRC_RAM		BIT(1)
+#define ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW		BIT(4)
+#define ICE_AQC_TX_TOPO_FLAGS_ISSUED		BIT(5)
+
+	u8 get_flags;
+#define ICE_AQC_TX_TOPO_GET_RAM		2
+
+	__le16 reserved1;
+	__le32 reserved2;
+	__le32 addr_high;
+	__le32 addr_low;
+};
+
 /* Update TSE (indirect 0x0403)
  * Get TSE (indirect 0x0404)
  * Add TSE (indirect 0x0401)
@@ -2547,6 +2565,7 @@ struct ice_aq_desc {
 		struct ice_aqc_get_link_topo get_link_topo;
 		struct ice_aqc_i2c read_write_i2c;
 		struct ice_aqc_read_i2c_resp read_i2c_resp;
+		struct ice_aqc_get_set_tx_topo get_set_tx_topo;
 	} params;
 };
 
@@ -2653,6 +2672,10 @@ enum ice_adminq_opc {
 	ice_aqc_opc_query_sched_res			= 0x0412,
 	ice_aqc_opc_remove_rl_profiles			= 0x0415,
 
+	/* tx topology commands */
+	ice_aqc_opc_set_tx_topo				= 0x0417,
+	ice_aqc_opc_get_tx_topo				= 0x0418,
+
 	/* PHY commands */
 	ice_aqc_opc_get_phy_caps			= 0x0600,
 	ice_aqc_opc_set_phy_cfg				= 0x0601,
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 090a2b8b5ff2..175091011251 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1622,6 +1622,8 @@ ice_aq_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf,
 	case ice_aqc_opc_set_port_params:
 	case ice_aqc_opc_get_vlan_mode_parameters:
 	case ice_aqc_opc_set_vlan_mode_parameters:
+	case ice_aqc_opc_set_tx_topo:
+	case ice_aqc_opc_get_tx_topo:
 	case ice_aqc_opc_add_recipe:
 	case ice_aqc_opc_recipe_to_profile:
 	case ice_aqc_opc_get_recipe:
@@ -2178,6 +2180,9 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps,
 		ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n",
 			  prefix, caps->sriov_lag);
 		break;
+	case ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE:
+		caps->tx_sched_topo_comp_mode_en = (number == 1);
+		break;
 	default:
 		/* Not one of the recognized common capabilities */
 		found = false;
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
index 7532d11ad7f3..766437944774 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
@@ -4,6 +4,7 @@
 #include "ice_common.h"
 #include "ice.h"
 #include "ice_ddp.h"
+#include "ice_sched.h"
 
 /* For supporting double VLAN mode, it is necessary to enable or disable certain
  * boost tcam entries. The metadata labels names that match the following
@@ -2263,3 +2264,201 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
 
 	return state;
 }
+
+/**
+ * ice_get_set_tx_topo - get or set Tx topology
+ * @hw: pointer to the HW struct
+ * @buf: pointer to Tx topology buffer
+ * @buf_size: buffer size
+ * @cd: pointer to command details structure or NULL
+ * @flags: pointer to descriptor flags
+ * @set: 0-get, 1-set topology
+ *
+ * The function will get or set Tx topology
+ */
+static int
+ice_get_set_tx_topo(struct ice_hw *hw, u8 *buf, u16 buf_size,
+		    struct ice_sq_cd *cd, u8 *flags, bool set)
+{
+	struct ice_aqc_get_set_tx_topo *cmd;
+	struct ice_aq_desc desc;
+	int status;
+
+	cmd = &desc.params.get_set_tx_topo;
+	if (set) {
+		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_tx_topo);
+		cmd->set_flags = ICE_AQC_TX_TOPO_FLAGS_ISSUED;
+		/* requested to update a new topology, not a default topology */
+		if (buf)
+			cmd->set_flags |= ICE_AQC_TX_TOPO_FLAGS_SRC_RAM |
+					  ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW;
+	} else {
+		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_tx_topo);
+		cmd->get_flags = ICE_AQC_TX_TOPO_GET_RAM;
+	}
+	desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
+	status = ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
+	if (status)
+		return status;
+	/* read the return flag values (first byte) for get operation */
+	if (!set && flags)
+		*flags = desc.params.get_set_tx_topo.set_flags;
+
+	return 0;
+}
+
+/**
+ * ice_cfg_tx_topo - Initialize new Tx topology if available
+ * @hw: pointer to the HW struct
+ * @buf: pointer to Tx topology buffer
+ * @len: buffer size
+ *
+ * The function will apply the new Tx topology from the package buffer
+ * if available.
+ */
+int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
+{
+	u8 *current_topo, *new_topo = NULL;
+	struct ice_run_time_cfg_seg *seg;
+	struct ice_buf_hdr *section;
+	struct ice_pkg_hdr *pkg_hdr;
+	enum ice_ddp_state state;
+	u16 offset, size = 0;
+	u32 reg = 0;
+	int status;
+	u8 flags;
+
+	if (!buf || !len)
+		return -EINVAL;
+
+	/* Does FW support new Tx topology mode ? */
+	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
+		ice_debug(hw, ICE_DBG_INIT, "FW doesn't support compatibility mode\n");
+		return -EOPNOTSUPP;
+	}
+
+	current_topo = kzalloc(ICE_AQ_MAX_BUF_LEN, GFP_KERNEL);
+	if (!current_topo)
+		return -ENOMEM;
+
+	/* Get the current Tx topology */
+	status = ice_get_set_tx_topo(hw, current_topo, ICE_AQ_MAX_BUF_LEN, NULL,
+				     &flags, false);
+
+	kfree(current_topo);
+
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Get current topology is failed\n");
+		return status;
+	}
+
+	/* Is default topology already applied ? */
+	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "Default topology already applied\n");
+		return -EEXIST;
+	}
+
+	/* Is new topology already applied ? */
+	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "New topology already applied\n");
+		return -EEXIST;
+	}
+
+	/* Setting topology already issued? */
+	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
+		ice_debug(hw, ICE_DBG_INIT, "Update Tx topology was done by another PF\n");
+		/* Add a small delay before exiting */
+		msleep(2000);
+		return -EEXIST;
+	}
+
+	/* Change the topology from new to default (5 to 9) */
+	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
+	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
+		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
+		goto update_topo;
+	}
+
+	pkg_hdr = (struct ice_pkg_hdr *)buf;
+	state = ice_verify_pkg(pkg_hdr, len);
+	if (state) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to verify pkg (err: %d)\n",
+			  state);
+		return -EIO;
+	}
+
+	/* Find runtime configuration segment */
+	seg = (struct ice_run_time_cfg_seg *)
+	      ice_find_seg_in_pkg(hw, SEGMENT_TYPE_ICE_RUN_TIME_CFG, pkg_hdr);
+	if (!seg) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment is missing\n");
+		return -EIO;
+	}
+
+	if (le32_to_cpu(seg->buf_table.buf_count) < ICE_MIN_S_COUNT) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment count(%d) is wrong\n",
+			  seg->buf_table.buf_count);
+		return -EIO;
+	}
+
+	section = ice_pkg_val_buf(seg->buf_table.buf_array);
+	if (!section || le32_to_cpu(section->section_entry[0].type) !=
+		ICE_SID_TX_5_LAYER_TOPO) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section type is wrong\n");
+		return -EIO;
+	}
+
+	size = le16_to_cpu(section->section_entry[0].size);
+	offset = le16_to_cpu(section->section_entry[0].offset);
+	if (size < ICE_MIN_S_SZ || size > ICE_MAX_S_SZ) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section size is wrong\n");
+		return -EIO;
+	}
+
+	/* Make sure the section fits in the buffer */
+	if (offset + size > ICE_PKG_BUF_SIZE) {
+		ice_debug(hw, ICE_DBG_INIT, "5 layer topology buffer > 4K\n");
+		return -EIO;
+	}
+
+	/* Get the new topology buffer */
+	new_topo = ((u8 *)section) + offset;
+
+update_topo:
+	/* Acquire global lock to make sure that set topology issued
+	 * by one PF.
+	 */
+	status = ice_acquire_res(hw, ICE_GLOBAL_CFG_LOCK_RES_ID, ICE_RES_WRITE,
+				 ICE_GLOBAL_CFG_LOCK_TIMEOUT);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to acquire global lock\n");
+		return status;
+	}
+
+	/* Check if reset was triggered already. */
+	reg = rd32(hw, GLGEN_RSTAT);
+	if (reg & GLGEN_RSTAT_DEVSTATE_M) {
+		/* Reset is in progress, re-init the HW again */
+		ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. Layer topology might be applied already\n");
+		ice_check_reset(hw);
+		return 0;
+	}
+
+	/* Set new topology */
+	status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
+	if (status) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology\n");
+		return status;
+	}
+
+	/* New topology is updated, delay 1 second before issuing the CORER */
+	msleep(1000);
+	ice_reset(hw, ICE_RESET_CORER);
+	/* CORER will clear the global lock, so no explicit call
+	 * required for release.
+	 */
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.h b/drivers/net/ethernet/intel/ice/ice_ddp.h
index ff66c2ffb1a2..622543f08b43 100644
--- a/drivers/net/ethernet/intel/ice/ice_ddp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ddp.h
@@ -454,4 +454,6 @@ u16 ice_pkg_buf_get_active_sections(struct ice_buf_build *bld);
 void *ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 			   u32 sect_type);
 
+int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len);
+
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h
index 1aef05ea5a57..9baff6a857d8 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.h
+++ b/drivers/net/ethernet/intel/ice/ice_sched.h
@@ -6,6 +6,9 @@
 
 #include "ice_common.h"
 
+#define ICE_SCHED_5_LAYERS	5
+#define ICE_SCHED_9_LAYERS	9
+
 #define SCHED_NODE_NAME_MAX_LEN 32
 
 #define ICE_QGRP_LAYER_OFFSET	2
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 657f97e2105f..f964f26664d0 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -296,6 +296,7 @@ struct ice_hw_common_caps {
 	bool pcie_reset_avoidance;
 	/* Post update reset restriction */
 	bool reset_restrict_support;
+	bool tx_sched_topo_comp_mode_en;
 };
 
 /* IEEE 1588 TIME_SYNC specific info */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 2/5] ice: Adjust the VSI/Aggregator layers
  2024-02-19 10:05 ` Mateusz Polchlopek
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Raj Victor, Michal Wilczynski,
	Mateusz Polchlopek

From: Raj Victor <victor.raj@intel.com>

Adjust the VSI/Aggregator layers based on the number of logical layers
supported by the FW. Currently the VSI and Aggregator layers are
fixed based on the 9 layer scheduler tree layout. Due to performance
reasons the number of layers of the scheduler tree is changing from
9 to 5. It requires a readjustment of these VSI/Aggregator layer values.

Signed-off-by: Raj Victor <victor.raj@intel.com>
Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_sched.c | 37 +++++++++++-----------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index d174a4eeb899..1ac3686328ae 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -1128,12 +1128,11 @@ u8 ice_sched_get_vsi_layer(struct ice_hw *hw)
 	 *     5 or less       sw_entry_point_layer
 	 */
 	/* calculate the VSI layer based on number of layers. */
-	if (hw->num_tx_sched_layers > ICE_VSI_LAYER_OFFSET + 1) {
-		u8 layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-
-		if (layer > hw->sw_entry_point_layer)
-			return layer;
-	}
+	if (hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS)
+		return hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
+	else if (hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS)
+		/* qgroup and VSI layers are same */
+		return hw->num_tx_sched_layers - ICE_QGRP_LAYER_OFFSET;
 	return hw->sw_entry_point_layer;
 }
 
@@ -1150,13 +1149,10 @@ u8 ice_sched_get_agg_layer(struct ice_hw *hw)
 	 *     7 or less       sw_entry_point_layer
 	 */
 	/* calculate the aggregator layer based on number of layers. */
-	if (hw->num_tx_sched_layers > ICE_AGG_LAYER_OFFSET + 1) {
-		u8 layer = hw->num_tx_sched_layers - ICE_AGG_LAYER_OFFSET;
-
-		if (layer > hw->sw_entry_point_layer)
-			return layer;
-	}
-	return hw->sw_entry_point_layer;
+	if (hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS)
+		return hw->num_tx_sched_layers - ICE_AGG_LAYER_OFFSET;
+	else
+		return hw->sw_entry_point_layer;
 }
 
 /**
@@ -1510,10 +1506,11 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 {
 	struct ice_sched_node *vsi_node, *qgrp_node;
 	struct ice_vsi_ctx *vsi_ctx;
+	u8 qgrp_layer, vsi_layer;
 	u16 max_children;
-	u8 qgrp_layer;
 
 	qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
+	vsi_layer = ice_sched_get_vsi_layer(pi->hw);
 	max_children = pi->hw->max_children[qgrp_layer];
 
 	vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1524,6 +1521,12 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	if (!vsi_node)
 		return NULL;
 
+	/* If the queue group and VSI layer are same then queues
+	 * are all attached directly to VSI
+	 */
+	if (qgrp_layer == vsi_layer)
+		return vsi_node;
+
 	/* get the first queue group node from VSI sub-tree */
 	qgrp_node = ice_sched_get_first_node(pi, vsi_node, qgrp_layer);
 	while (qgrp_node) {
@@ -3199,7 +3202,7 @@ ice_sched_add_rl_profile(struct ice_port_info *pi,
 	u8 profile_type;
 	int status;
 
-	if (layer_num >= ICE_AQC_TOPO_MAX_LEVEL_NUM)
+	if (!pi || layer_num >= pi->hw->num_tx_sched_layers)
 		return NULL;
 	switch (rl_type) {
 	case ICE_MIN_BW:
@@ -3215,8 +3218,6 @@ ice_sched_add_rl_profile(struct ice_port_info *pi,
 		return NULL;
 	}
 
-	if (!pi)
-		return NULL;
 	hw = pi->hw;
 	list_for_each_entry(rl_prof_elem, &pi->rl_prof_list[layer_num],
 			    list_entry)
@@ -3446,7 +3447,7 @@ ice_sched_rm_rl_profile(struct ice_port_info *pi, u8 layer_num, u8 profile_type,
 	struct ice_aqc_rl_profile_info *rl_prof_elem;
 	int status = 0;
 
-	if (layer_num >= ICE_AQC_TOPO_MAX_LEVEL_NUM)
+	if (layer_num >= pi->hw->num_tx_sched_layers)
 		return -EINVAL;
 	/* Check the existing list for RL profile */
 	list_for_each_entry(rl_prof_elem, &pi->rl_prof_list[layer_num],
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 2/5] ice: Adjust the VSI/Aggregator layers
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Michal Wilczynski, netdev, Raj Victor, horms, Mateusz Polchlopek,
	przemyslaw.kitszel

From: Raj Victor <victor.raj@intel.com>

Adjust the VSI/Aggregator layers based on the number of logical layers
supported by the FW. Currently the VSI and Aggregator layers are
fixed based on the 9 layer scheduler tree layout. Due to performance
reasons the number of layers of the scheduler tree is changing from
9 to 5. It requires a readjustment of these VSI/Aggregator layer values.

Signed-off-by: Raj Victor <victor.raj@intel.com>
Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_sched.c | 37 +++++++++++-----------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index d174a4eeb899..1ac3686328ae 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -1128,12 +1128,11 @@ u8 ice_sched_get_vsi_layer(struct ice_hw *hw)
 	 *     5 or less       sw_entry_point_layer
 	 */
 	/* calculate the VSI layer based on number of layers. */
-	if (hw->num_tx_sched_layers > ICE_VSI_LAYER_OFFSET + 1) {
-		u8 layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-
-		if (layer > hw->sw_entry_point_layer)
-			return layer;
-	}
+	if (hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS)
+		return hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
+	else if (hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS)
+		/* qgroup and VSI layers are same */
+		return hw->num_tx_sched_layers - ICE_QGRP_LAYER_OFFSET;
 	return hw->sw_entry_point_layer;
 }
 
@@ -1150,13 +1149,10 @@ u8 ice_sched_get_agg_layer(struct ice_hw *hw)
 	 *     7 or less       sw_entry_point_layer
 	 */
 	/* calculate the aggregator layer based on number of layers. */
-	if (hw->num_tx_sched_layers > ICE_AGG_LAYER_OFFSET + 1) {
-		u8 layer = hw->num_tx_sched_layers - ICE_AGG_LAYER_OFFSET;
-
-		if (layer > hw->sw_entry_point_layer)
-			return layer;
-	}
-	return hw->sw_entry_point_layer;
+	if (hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS)
+		return hw->num_tx_sched_layers - ICE_AGG_LAYER_OFFSET;
+	else
+		return hw->sw_entry_point_layer;
 }
 
 /**
@@ -1510,10 +1506,11 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 {
 	struct ice_sched_node *vsi_node, *qgrp_node;
 	struct ice_vsi_ctx *vsi_ctx;
+	u8 qgrp_layer, vsi_layer;
 	u16 max_children;
-	u8 qgrp_layer;
 
 	qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
+	vsi_layer = ice_sched_get_vsi_layer(pi->hw);
 	max_children = pi->hw->max_children[qgrp_layer];
 
 	vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1524,6 +1521,12 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	if (!vsi_node)
 		return NULL;
 
+	/* If the queue group and VSI layer are same then queues
+	 * are all attached directly to VSI
+	 */
+	if (qgrp_layer == vsi_layer)
+		return vsi_node;
+
 	/* get the first queue group node from VSI sub-tree */
 	qgrp_node = ice_sched_get_first_node(pi, vsi_node, qgrp_layer);
 	while (qgrp_node) {
@@ -3199,7 +3202,7 @@ ice_sched_add_rl_profile(struct ice_port_info *pi,
 	u8 profile_type;
 	int status;
 
-	if (layer_num >= ICE_AQC_TOPO_MAX_LEVEL_NUM)
+	if (!pi || layer_num >= pi->hw->num_tx_sched_layers)
 		return NULL;
 	switch (rl_type) {
 	case ICE_MIN_BW:
@@ -3215,8 +3218,6 @@ ice_sched_add_rl_profile(struct ice_port_info *pi,
 		return NULL;
 	}
 
-	if (!pi)
-		return NULL;
 	hw = pi->hw;
 	list_for_each_entry(rl_prof_elem, &pi->rl_prof_list[layer_num],
 			    list_entry)
@@ -3446,7 +3447,7 @@ ice_sched_rm_rl_profile(struct ice_port_info *pi, u8 layer_num, u8 profile_type,
 	struct ice_aqc_rl_profile_info *rl_prof_elem;
 	int status = 0;
 
-	if (layer_num >= ICE_AQC_TOPO_MAX_LEVEL_NUM)
+	if (layer_num >= pi->hw->num_tx_sched_layers)
 		return -EINVAL;
 	/* Check the existing list for RL profile */
 	list_for_each_entry(rl_prof_elem, &pi->rl_prof_list[layer_num],
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 3/5] ice: Enable switching default Tx scheduler topology
  2024-02-19 10:05 ` Mateusz Polchlopek
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Michal Wilczynski, Mateusz Polchlopek

From: Michal Wilczynski <michal.wilczynski@intel.com>

Introduce support for Tx scheduler topology change, based on user
selection, from default 9-layer to 5-layer.
Change requires NVM (version 3.20 or newer) and DDP package (OS Package
1.3.30 or newer - available for over a year in linux-firmware, since
commit aed71f296637 in linux-firmware ("ice: Update package to 1.3.30.0"))
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=aed71f296637

Enable 5-layer topology switch in init path of the driver. To accomplish
that upload of the DDP package needs to be delayed, until change in Tx
topology is finished. To trigger the Tx change user selection should be
changed in NVM using devlink. Then the platform should be rebooted.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 102 ++++++++++++++++++----
 1 file changed, 83 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0dbbda218ec5..a04c08053e7d 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4417,11 +4417,11 @@ static char *ice_get_opt_fw_name(struct ice_pf *pf)
 /**
  * ice_request_fw - Device initialization routine
  * @pf: pointer to the PF instance
+ * @firmware: double pointer to firmware struct
  */
-static void ice_request_fw(struct ice_pf *pf)
+static int ice_request_fw(struct ice_pf *pf, const struct firmware **firmware)
 {
 	char *opt_fw_filename = ice_get_opt_fw_name(pf);
-	const struct firmware *firmware = NULL;
 	struct device *dev = ice_pf_to_dev(pf);
 	int err = 0;
 
@@ -4430,29 +4430,91 @@ static void ice_request_fw(struct ice_pf *pf)
 	 * and warning messages for other errors.
 	 */
 	if (opt_fw_filename) {
-		err = firmware_request_nowarn(&firmware, opt_fw_filename, dev);
-		if (err) {
-			kfree(opt_fw_filename);
-			goto dflt_pkg_load;
-		}
-
-		/* request for firmware was successful. Download to device */
-		ice_load_pkg(firmware, pf);
+		err = firmware_request_nowarn(firmware, opt_fw_filename, dev);
 		kfree(opt_fw_filename);
-		release_firmware(firmware);
-		return;
+		if (!err)
+			return err;
+	}
+	err = request_firmware(firmware, ICE_DDP_PKG_FILE, dev);
+	if (err)
+		dev_err(dev, "The DDP package file was not found or could not be read. Entering Safe Mode\n");
+
+	return err;
+}
+
+/**
+ * ice_init_tx_topology - performs Tx topology initialization
+ * @hw: pointer to the hardware structure
+ * @firmware: pointer to firmware structure
+ */
+static int
+ice_init_tx_topology(struct ice_hw *hw, const struct firmware *firmware)
+{
+	u8 num_tx_sched_layers = hw->num_tx_sched_layers;
+	struct ice_pf *pf = hw->back;
+	struct device *dev;
+	u8 *buf_copy;
+	int err;
+
+	dev = ice_pf_to_dev(pf);
+	/* ice_cfg_tx_topo buf argument is not a constant,
+	 * so we have to make a copy
+	 */
+	buf_copy = kmemdup(firmware->data, firmware->size, GFP_KERNEL);
+
+	err = ice_cfg_tx_topo(hw, buf_copy, firmware->size);
+	if (!err) {
+		if (hw->num_tx_sched_layers > num_tx_sched_layers)
+			dev_info(dev, "Tx scheduling layers switching feature disabled\n");
+		else
+			dev_info(dev, "Tx scheduling layers switching feature enabled\n");
+		/* if there was a change in topology ice_cfg_tx_topo triggered
+		 * a CORER and we need to re-init hw
+		 */
+		ice_deinit_hw(hw);
+		err = ice_init_hw(hw);
+
+		return err;
+	} else if (err == -EIO) {
+		dev_info(dev, "DDP package does not support Tx scheduling layers switching feature - please update to the latest DDP package and try again\n");
+	}
+
+	return 0;
+}
+
+/**
+ * ice_init_ddp_config - DDP related configuration
+ * @hw: pointer to the hardware structure
+ * @pf: pointer to pf structure
+ *
+ * This function loads DDP file from the disk, then initializes Tx
+ * topology. At the end DDP package is loaded on the card.
+ */
+static int ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	const struct firmware *firmware = NULL;
+	int err;
+
+	err = ice_request_fw(pf, &firmware);
+	if (err) {
+		dev_err(dev, "Fail during requesting FW: %d\n", err);
+		return err;
 	}
 
-dflt_pkg_load:
-	err = request_firmware(&firmware, ICE_DDP_PKG_FILE, dev);
+	err = ice_init_tx_topology(hw, firmware);
 	if (err) {
-		dev_err(dev, "The DDP package file was not found or could not be read. Entering Safe Mode\n");
-		return;
+		dev_err(dev, "Fail during initialization of Tx topology: %d\n",
+			err);
+		release_firmware(firmware);
+		return err;
 	}
 
-	/* request for firmware was successful. Download to device */
+	/* Download firmware to device */
 	ice_load_pkg(firmware, pf);
 	release_firmware(firmware);
+
+	return 0;
 }
 
 /**
@@ -4625,9 +4687,11 @@ int ice_init_dev(struct ice_pf *pf)
 
 	ice_init_feature_support(pf);
 
-	ice_request_fw(pf);
+	err = ice_init_ddp_config(hw, pf);
+	if (err)
+		return err;
 
-	/* if ice_request_fw fails, ICE_FLAG_ADV_FEATURES bit won't be
+	/* if ice_init_ddp_config fails, ICE_FLAG_ADV_FEATURES bit won't be
 	 * set in pf->state, which will cause ice_is_safe_mode to return
 	 * true
 	 */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 3/5] ice: Enable switching default Tx scheduler topology
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Mateusz Polchlopek, Michal Wilczynski, horms, przemyslaw.kitszel

From: Michal Wilczynski <michal.wilczynski@intel.com>

Introduce support for Tx scheduler topology change, based on user
selection, from default 9-layer to 5-layer.
Change requires NVM (version 3.20 or newer) and DDP package (OS Package
1.3.30 or newer - available for over a year in linux-firmware, since
commit aed71f296637 in linux-firmware ("ice: Update package to 1.3.30.0"))
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=aed71f296637

Enable 5-layer topology switch in init path of the driver. To accomplish
that upload of the DDP package needs to be delayed, until change in Tx
topology is finished. To trigger the Tx change user selection should be
changed in NVM using devlink. Then the platform should be rebooted.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 102 ++++++++++++++++++----
 1 file changed, 83 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 0dbbda218ec5..a04c08053e7d 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4417,11 +4417,11 @@ static char *ice_get_opt_fw_name(struct ice_pf *pf)
 /**
  * ice_request_fw - Device initialization routine
  * @pf: pointer to the PF instance
+ * @firmware: double pointer to firmware struct
  */
-static void ice_request_fw(struct ice_pf *pf)
+static int ice_request_fw(struct ice_pf *pf, const struct firmware **firmware)
 {
 	char *opt_fw_filename = ice_get_opt_fw_name(pf);
-	const struct firmware *firmware = NULL;
 	struct device *dev = ice_pf_to_dev(pf);
 	int err = 0;
 
@@ -4430,29 +4430,91 @@ static void ice_request_fw(struct ice_pf *pf)
 	 * and warning messages for other errors.
 	 */
 	if (opt_fw_filename) {
-		err = firmware_request_nowarn(&firmware, opt_fw_filename, dev);
-		if (err) {
-			kfree(opt_fw_filename);
-			goto dflt_pkg_load;
-		}
-
-		/* request for firmware was successful. Download to device */
-		ice_load_pkg(firmware, pf);
+		err = firmware_request_nowarn(firmware, opt_fw_filename, dev);
 		kfree(opt_fw_filename);
-		release_firmware(firmware);
-		return;
+		if (!err)
+			return err;
+	}
+	err = request_firmware(firmware, ICE_DDP_PKG_FILE, dev);
+	if (err)
+		dev_err(dev, "The DDP package file was not found or could not be read. Entering Safe Mode\n");
+
+	return err;
+}
+
+/**
+ * ice_init_tx_topology - performs Tx topology initialization
+ * @hw: pointer to the hardware structure
+ * @firmware: pointer to firmware structure
+ */
+static int
+ice_init_tx_topology(struct ice_hw *hw, const struct firmware *firmware)
+{
+	u8 num_tx_sched_layers = hw->num_tx_sched_layers;
+	struct ice_pf *pf = hw->back;
+	struct device *dev;
+	u8 *buf_copy;
+	int err;
+
+	dev = ice_pf_to_dev(pf);
+	/* ice_cfg_tx_topo buf argument is not a constant,
+	 * so we have to make a copy
+	 */
+	buf_copy = kmemdup(firmware->data, firmware->size, GFP_KERNEL);
+
+	err = ice_cfg_tx_topo(hw, buf_copy, firmware->size);
+	if (!err) {
+		if (hw->num_tx_sched_layers > num_tx_sched_layers)
+			dev_info(dev, "Tx scheduling layers switching feature disabled\n");
+		else
+			dev_info(dev, "Tx scheduling layers switching feature enabled\n");
+		/* if there was a change in topology ice_cfg_tx_topo triggered
+		 * a CORER and we need to re-init hw
+		 */
+		ice_deinit_hw(hw);
+		err = ice_init_hw(hw);
+
+		return err;
+	} else if (err == -EIO) {
+		dev_info(dev, "DDP package does not support Tx scheduling layers switching feature - please update to the latest DDP package and try again\n");
+	}
+
+	return 0;
+}
+
+/**
+ * ice_init_ddp_config - DDP related configuration
+ * @hw: pointer to the hardware structure
+ * @pf: pointer to pf structure
+ *
+ * This function loads DDP file from the disk, then initializes Tx
+ * topology. At the end DDP package is loaded on the card.
+ */
+static int ice_init_ddp_config(struct ice_hw *hw, struct ice_pf *pf)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+	const struct firmware *firmware = NULL;
+	int err;
+
+	err = ice_request_fw(pf, &firmware);
+	if (err) {
+		dev_err(dev, "Fail during requesting FW: %d\n", err);
+		return err;
 	}
 
-dflt_pkg_load:
-	err = request_firmware(&firmware, ICE_DDP_PKG_FILE, dev);
+	err = ice_init_tx_topology(hw, firmware);
 	if (err) {
-		dev_err(dev, "The DDP package file was not found or could not be read. Entering Safe Mode\n");
-		return;
+		dev_err(dev, "Fail during initialization of Tx topology: %d\n",
+			err);
+		release_firmware(firmware);
+		return err;
 	}
 
-	/* request for firmware was successful. Download to device */
+	/* Download firmware to device */
 	ice_load_pkg(firmware, pf);
 	release_firmware(firmware);
+
+	return 0;
 }
 
 /**
@@ -4625,9 +4687,11 @@ int ice_init_dev(struct ice_pf *pf)
 
 	ice_init_feature_support(pf);
 
-	ice_request_fw(pf);
+	err = ice_init_ddp_config(hw, pf);
+	if (err)
+		return err;
 
-	/* if ice_request_fw fails, ICE_FLAG_ADV_FEATURES bit won't be
+	/* if ice_init_ddp_config fails, ICE_FLAG_ADV_FEATURES bit won't be
 	 * set in pf->state, which will cause ice_is_safe_mode to return
 	 * true
 	 */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-19 10:05 ` Mateusz Polchlopek
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Lukasz Czapnik, Mateusz Polchlopek

From: Lukasz Czapnik <lukasz.czapnik@intel.com>

It was observed that Tx performance was inconsistent across all queues
and/or VSIs and that it was directly connected to existing 9-layer
topology of the Tx scheduler.

Introduce new private devlink param - tx_scheduling_layers. This parameter
gives user flexibility to choose the 5-layer transmit scheduler topology
which helps to smooth out the transmit performance.

Allowed parameter values are 5 and 9.

Example usage:

Show:
devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
pci/0000:4b:00.0:
  name tx_scheduling_layers type driver-specific
    values:
      cmode permanent value 9

Set:
devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
cmode permanent

devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 9
cmode permanent

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 169 ++++++++++++++++++
 .../net/ethernet/intel/ice/ice_fw_update.c    |   7 +-
 .../net/ethernet/intel/ice/ice_fw_update.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   7 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 6 files changed, 189 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 02102e937b30..4143b50bd15d 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1692,6 +1692,14 @@ struct ice_aqc_nvm {
 };
 
 #define ICE_AQC_NVM_START_POINT			0
+#define ICE_AQC_NVM_TX_TOPO_MOD_ID             0x14B
+
+struct ice_aqc_nvm_tx_topo_user_sel {
+	__le16 length;
+	u8 data;
+#define ICE_AQC_NVM_TX_TOPO_USER_SEL	BIT(4)
+	u8 reserved;
+};
 
 /* NVM Checksum Command (direct, 0x0706) */
 struct ice_aqc_nvm_checksum {
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index cc717175178b..db4872990e51 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -770,6 +770,167 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_get_tx_topo_user_sel - Read user's choice from flash
+ * @pf: pointer to pf structure
+ * @layers: value read from flash will be saved here
+ *
+ * Reads user's preference for Tx Scheduler Topology Tree from PFA TLV.
+ *
+ * Returns zero when read was successful, negative values otherwise.
+ */
+static int ice_get_tx_topo_user_sel(struct ice_pf *pf, uint8_t *layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			     sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (usr_sel.data & ICE_AQC_NVM_TX_TOPO_USER_SEL)
+		*layers = ICE_SCHED_5_LAYERS;
+       else
+		*layers = ICE_SCHED_9_LAYERS;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_update_tx_topo_user_sel - Save user's preference in flash
+ * @pf: pointer to pf structure
+ * @layers: value to be saved in flash
+ *
+ * Variable "layers" defines user's preference about number of layers in Tx
+ * Scheduler Topology Tree. This choice should be stored in PFA TLV field
+ * and be picked up by driver, next time during init.
+ *
+ * Returns zero when save was successful, negative values otherwise.
+ */
+static int ice_update_tx_topo_user_sel(struct ice_pf *pf, int layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_WRITE);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			      sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (layers == ICE_SCHED_5_LAYERS)
+		usr_sel.data |= ICE_AQC_NVM_TX_TOPO_USER_SEL;
+	else
+		usr_sel.data &= ~ICE_AQC_NVM_TX_TOPO_USER_SEL;
+
+	err = ice_write_one_nvm_block(pf, ICE_AQC_NVM_TX_TOPO_MOD_ID, 2,
+				      sizeof(usr_sel.data), &usr_sel.data,
+				      true, NULL, NULL);
+	if (err)
+		err = -EIO;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_get - Get tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID to set
+ * @ctx: context to store the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_get(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_get_tx_topo_user_sel(pf, &ctx->val.vu8);
+	if (err) {
+		dev_warn(dev, "Failed to read Tx Scheduler Tree - User Selection data from flash\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_set - Set tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID to set
+ * @ctx: context to get the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_set(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_update_tx_topo_user_sel(pf, ctx->val.vu8);
+	if (err)
+		return -EIO;
+
+	dev_warn(dev, "Tx scheduling layers have been changed on this device. You must reboot the system for the change to take effect.");
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_validate - Validate passed tx_scheduling_layers
+ *                                       parameter value
+ * @devlink: unused pointer to devlink instance
+ * @id: the parameter ID to validate
+ * @val: value to validate
+ * @extack: netlink extended ACK structure
+ *
+ * Supported values are:
+ * - 5 - five layers Tx Scheduler Topology Tree
+ * - 9 - nine layers Tx Scheduler Topology Tree
+ *
+ * Returns zero when passed parameter value is supported. Negative value on
+ * error.
+ */
+static int ice_devlink_tx_sched_layers_validate(struct devlink *devlink, u32 id,
+					        union devlink_param_value val,
+					        struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+
+	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Requested feature is not supported by the FW on this device. Update the FW and run this command again.");
+		return -EOPNOTSUPP;
+	}
+
+	if (val.vu8 != ICE_SCHED_5_LAYERS && val.vu8 != ICE_SCHED_9_LAYERS) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Wrong number of tx scheduler layers provided.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * ice_tear_down_devlink_rate_tree - removes devlink-rate exported tree
  * @pf: pf struct
@@ -1601,6 +1762,7 @@ static int ice_devlink_loopback_validate(struct devlink *devlink, u32 id,
 enum ice_param_id {
 	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	ICE_DEVLINK_PARAM_ID_LOOPBACK,
+	ICE_DEVLINK_PARAM_ID_TX_BALANCE,
 };
 
 static const struct devlink_param ice_devlink_params[] = {
@@ -1618,6 +1780,13 @@ static const struct devlink_param ice_devlink_params[] = {
 			     ice_devlink_loopback_get,
 			     ice_devlink_loopback_set,
 			     ice_devlink_loopback_validate),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_BALANCE,
+			     "tx_scheduling_layers",
+			     DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+			     ice_devlink_tx_sched_layers_get,
+			     ice_devlink_tx_sched_layers_set,
+			     ice_devlink_tx_sched_layers_validate),
 };
 
 static void ice_devlink_free(void *devlink_ptr)
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.c b/drivers/net/ethernet/intel/ice/ice_fw_update.c
index 319a2d6fe26c..f81db6c107c8 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.c
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.c
@@ -286,10 +286,9 @@ ice_send_component_table(struct pldmfw *context, struct pldmfw_component *compon
  *
  * Returns: zero on success, or a negative error code on failure.
  */
-static int
-ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
-			u16 block_size, u8 *block, bool last_cmd,
-			u8 *reset_level, struct netlink_ext_ack *extack)
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack)
 {
 	u16 completion_module, completion_retval;
 	struct device *dev = ice_pf_to_dev(pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.h b/drivers/net/ethernet/intel/ice/ice_fw_update.h
index 750574885716..04b200462757 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.h
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.h
@@ -9,5 +9,8 @@ int ice_devlink_flash_update(struct devlink *devlink,
 			     struct netlink_ext_ack *extack);
 int ice_get_pending_updates(struct ice_pf *pf, u8 *pending,
 			    struct netlink_ext_ack *extack);
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack);
 
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index d4e05d2cb30c..84eab92dc03c 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -18,10 +18,9 @@
  *
  * Read the NVM using the admin queue commands (0x0701)
  */
-static int
-ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
-		void *data, bool last_command, bool read_shadow_ram,
-		struct ice_sq_cd *cd)
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
 	struct ice_aqc_nvm *cmd;
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 774c2317967d..63cdc6bdac58 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -14,6 +14,9 @@ struct ice_orom_civd_info {
 
 int ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access);
 void ice_release_nvm(struct ice_hw *hw);
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd);
 int
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, horms, przemyslaw.kitszel

From: Lukasz Czapnik <lukasz.czapnik@intel.com>

It was observed that Tx performance was inconsistent across all queues
and/or VSIs and that it was directly connected to existing 9-layer
topology of the Tx scheduler.

Introduce new private devlink param - tx_scheduling_layers. This parameter
gives user flexibility to choose the 5-layer transmit scheduler topology
which helps to smooth out the transmit performance.

Allowed parameter values are 5 and 9.

Example usage:

Show:
devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
pci/0000:4b:00.0:
  name tx_scheduling_layers type driver-specific
    values:
      cmode permanent value 9

Set:
devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
cmode permanent

devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 9
cmode permanent

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 169 ++++++++++++++++++
 .../net/ethernet/intel/ice/ice_fw_update.c    |   7 +-
 .../net/ethernet/intel/ice/ice_fw_update.h    |   3 +
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   7 +-
 drivers/net/ethernet/intel/ice/ice_nvm.h      |   3 +
 6 files changed, 189 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 02102e937b30..4143b50bd15d 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1692,6 +1692,14 @@ struct ice_aqc_nvm {
 };
 
 #define ICE_AQC_NVM_START_POINT			0
+#define ICE_AQC_NVM_TX_TOPO_MOD_ID             0x14B
+
+struct ice_aqc_nvm_tx_topo_user_sel {
+	__le16 length;
+	u8 data;
+#define ICE_AQC_NVM_TX_TOPO_USER_SEL	BIT(4)
+	u8 reserved;
+};
 
 /* NVM Checksum Command (direct, 0x0706) */
 struct ice_aqc_nvm_checksum {
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index cc717175178b..db4872990e51 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -770,6 +770,167 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_get_tx_topo_user_sel - Read user's choice from flash
+ * @pf: pointer to pf structure
+ * @layers: value read from flash will be saved here
+ *
+ * Reads user's preference for Tx Scheduler Topology Tree from PFA TLV.
+ *
+ * Returns zero when read was successful, negative values otherwise.
+ */
+static int ice_get_tx_topo_user_sel(struct ice_pf *pf, uint8_t *layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_READ);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			     sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (usr_sel.data & ICE_AQC_NVM_TX_TOPO_USER_SEL)
+		*layers = ICE_SCHED_5_LAYERS;
+       else
+		*layers = ICE_SCHED_9_LAYERS;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_update_tx_topo_user_sel - Save user's preference in flash
+ * @pf: pointer to pf structure
+ * @layers: value to be saved in flash
+ *
+ * Variable "layers" defines user's preference about number of layers in Tx
+ * Scheduler Topology Tree. This choice should be stored in PFA TLV field
+ * and be picked up by driver, next time during init.
+ *
+ * Returns zero when save was successful, negative values otherwise.
+ */
+static int ice_update_tx_topo_user_sel(struct ice_pf *pf, int layers)
+{
+	struct ice_aqc_nvm_tx_topo_user_sel usr_sel = {};
+	struct ice_hw *hw = &pf->hw;
+	int err;
+
+	err = ice_acquire_nvm(hw, ICE_RES_WRITE);
+	if (err)
+		return err;
+
+	err = ice_aq_read_nvm(hw, ICE_AQC_NVM_TX_TOPO_MOD_ID, 0,
+			      sizeof(usr_sel), &usr_sel, true, true, NULL);
+	if (err)
+		goto exit_release_res;
+
+	if (layers == ICE_SCHED_5_LAYERS)
+		usr_sel.data |= ICE_AQC_NVM_TX_TOPO_USER_SEL;
+	else
+		usr_sel.data &= ~ICE_AQC_NVM_TX_TOPO_USER_SEL;
+
+	err = ice_write_one_nvm_block(pf, ICE_AQC_NVM_TX_TOPO_MOD_ID, 2,
+				      sizeof(usr_sel.data), &usr_sel.data,
+				      true, NULL, NULL);
+	if (err)
+		err = -EIO;
+
+exit_release_res:
+	ice_release_nvm(hw);
+
+	return err;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_get - Get tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID to set
+ * @ctx: context to store the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_get(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_get_tx_topo_user_sel(pf, &ctx->val.vu8);
+	if (err) {
+		dev_warn(dev, "Failed to read Tx Scheduler Tree - User Selection data from flash\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_set - Set tx_scheduling_layers parameter
+ * @devlink: pointer to the devlink instance
+ * @id: the parameter ID to set
+ * @ctx: context to get the parameter value
+ *
+ * Returns zero on success and negative value on failure.
+ */
+static int ice_devlink_tx_sched_layers_set(struct devlink *devlink, u32 id,
+					   struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct device *dev = ice_pf_to_dev(pf);
+	int err;
+
+	err = ice_update_tx_topo_user_sel(pf, ctx->val.vu8);
+	if (err)
+		return -EIO;
+
+	dev_warn(dev, "Tx scheduling layers have been changed on this device. You must reboot the system for the change to take effect.");
+
+	return 0;
+}
+
+/**
+ * ice_devlink_tx_sched_layers_validate - Validate passed tx_scheduling_layers
+ *                                       parameter value
+ * @devlink: unused pointer to devlink instance
+ * @id: the parameter ID to validate
+ * @val: value to validate
+ * @extack: netlink extended ACK structure
+ *
+ * Supported values are:
+ * - 5 - five layers Tx Scheduler Topology Tree
+ * - 9 - nine layers Tx Scheduler Topology Tree
+ *
+ * Returns zero when passed parameter value is supported. Negative value on
+ * error.
+ */
+static int ice_devlink_tx_sched_layers_validate(struct devlink *devlink, u32 id,
+					        union devlink_param_value val,
+					        struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct ice_hw *hw = &pf->hw;
+
+	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Requested feature is not supported by the FW on this device. Update the FW and run this command again.");
+		return -EOPNOTSUPP;
+	}
+
+	if (val.vu8 != ICE_SCHED_5_LAYERS && val.vu8 != ICE_SCHED_9_LAYERS) {
+		NL_SET_ERR_MSG_MOD(extack, "Error: Wrong number of tx scheduler layers provided.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * ice_tear_down_devlink_rate_tree - removes devlink-rate exported tree
  * @pf: pf struct
@@ -1601,6 +1762,7 @@ static int ice_devlink_loopback_validate(struct devlink *devlink, u32 id,
 enum ice_param_id {
 	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	ICE_DEVLINK_PARAM_ID_LOOPBACK,
+	ICE_DEVLINK_PARAM_ID_TX_BALANCE,
 };
 
 static const struct devlink_param ice_devlink_params[] = {
@@ -1618,6 +1780,13 @@ static const struct devlink_param ice_devlink_params[] = {
 			     ice_devlink_loopback_get,
 			     ice_devlink_loopback_set,
 			     ice_devlink_loopback_validate),
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_BALANCE,
+			     "tx_scheduling_layers",
+			     DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+			     ice_devlink_tx_sched_layers_get,
+			     ice_devlink_tx_sched_layers_set,
+			     ice_devlink_tx_sched_layers_validate),
 };
 
 static void ice_devlink_free(void *devlink_ptr)
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.c b/drivers/net/ethernet/intel/ice/ice_fw_update.c
index 319a2d6fe26c..f81db6c107c8 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.c
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.c
@@ -286,10 +286,9 @@ ice_send_component_table(struct pldmfw *context, struct pldmfw_component *compon
  *
  * Returns: zero on success, or a negative error code on failure.
  */
-static int
-ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
-			u16 block_size, u8 *block, bool last_cmd,
-			u8 *reset_level, struct netlink_ext_ack *extack)
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack)
 {
 	u16 completion_module, completion_retval;
 	struct device *dev = ice_pf_to_dev(pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_fw_update.h b/drivers/net/ethernet/intel/ice/ice_fw_update.h
index 750574885716..04b200462757 100644
--- a/drivers/net/ethernet/intel/ice/ice_fw_update.h
+++ b/drivers/net/ethernet/intel/ice/ice_fw_update.h
@@ -9,5 +9,8 @@ int ice_devlink_flash_update(struct devlink *devlink,
 			     struct netlink_ext_ack *extack);
 int ice_get_pending_updates(struct ice_pf *pf, u8 *pending,
 			    struct netlink_ext_ack *extack);
+int ice_write_one_nvm_block(struct ice_pf *pf, u16 module, u32 offset,
+			    u16 block_size, u8 *block, bool last_cmd,
+			    u8 *reset_level, struct netlink_ext_ack *extack);
 
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c
index d4e05d2cb30c..84eab92dc03c 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -18,10 +18,9 @@
  *
  * Read the NVM using the admin queue commands (0x0701)
  */
-static int
-ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset, u16 length,
-		void *data, bool last_command, bool read_shadow_ram,
-		struct ice_sq_cd *cd)
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd)
 {
 	struct ice_aq_desc desc;
 	struct ice_aqc_nvm *cmd;
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.h b/drivers/net/ethernet/intel/ice/ice_nvm.h
index 774c2317967d..63cdc6bdac58 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.h
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.h
@@ -14,6 +14,9 @@ struct ice_orom_civd_info {
 
 int ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access);
 void ice_release_nvm(struct ice_hw *hw);
+int ice_aq_read_nvm(struct ice_hw *hw, u16 module_typeid, u32 offset,
+		    u16 length, void *data, bool last_command,
+		    bool read_shadow_ram, struct ice_sq_cd *cd);
 int
 ice_read_flat_nvm(struct ice_hw *hw, u32 offset, u32 *length, u8 *data,
 		  bool read_shadow_ram);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 5/5] ice: Document tx_scheduling_layers parameter
  2024-02-19 10:05 ` Mateusz Polchlopek
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Michal Wilczynski, Mateusz Polchlopek

From: Michal Wilczynski <michal.wilczynski@intel.com>

New driver specific parameter 'tx_scheduling_layers' was introduced.
Describe parameter in the documentation.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 Documentation/networking/devlink/ice.rst | 41 ++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index efc6be109dc3..1ae46dee0fd5 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -36,6 +36,47 @@ Parameters
        The latter allows for bandwidth higher than external port speed
        when looping back traffic between VFs. Works with 8x10G and 4x25G
        cards.
+   * - ``tx_scheduling_layers``
+     - permanent
+     - The ice hardware uses hierarchical scheduling for Tx with a fixed
+       number of layers in the scheduling tree. Root node is representing a
+       port, while all the leaves represents the queues. This way of
+       configuring Tx scheduler allows features like DCB or devlink-rate
+       (documented below) for fine-grained configuration how much BW is given
+       to any given queue or group of queues, as scheduling parameters can be
+       configured at any given layer of the tree. By default 9-layer tree
+       topology was deemed best for most workloads, as it gives optimal
+       performance to configurability ratio. However for some specific cases,
+       this might not be the case. A great example would be sending traffic to
+       queues that is not a multiple of 8. Since in 9-layer topology maximum
+       number of children is limited to 8, the 9th queue has a different parent
+       than the rest, and it's given more BW credits. This causes a problem
+       when the system is sending traffic to 9 queues:
+
+       | tx_queue_0_packets: 24163396
+       | tx_queue_1_packets: 24164623
+       | tx_queue_2_packets: 24163188
+       | tx_queue_3_packets: 24163701
+       | tx_queue_4_packets: 24163683
+       | tx_queue_5_packets: 24164668
+       | tx_queue_6_packets: 23327200
+       | tx_queue_7_packets: 24163853
+       | tx_queue_8_packets: 91101417 < Too much traffic is sent to 9th
+
+       Sometimes this might be a big concern, so the idea is to empower the
+       user to switch to 5-layer topology, enabling performance gains but
+       sacrificing configurability for features like DCB and devlink-rate.
+
+       This parameter gives user flexibility to choose the 5-layer transmit
+       scheduler topology. After switching parameter reboot is required for
+       the feature to start working.
+
+       User could choose 9 (the default) or 5 as a value of parameter, e.g.:
+       $ devlink dev param set pci/0000:16:00.0 name tx_scheduling_layers
+       value 5 cmode permanent
+
+       And verify that value has been set:
+       $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
 
 Info versions
 =============
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 5/5] ice: Document tx_scheduling_layers parameter
@ 2024-02-19 10:05   ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:05 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Mateusz Polchlopek, Michal Wilczynski, horms, przemyslaw.kitszel

From: Michal Wilczynski <michal.wilczynski@intel.com>

New driver specific parameter 'tx_scheduling_layers' was introduced.
Describe parameter in the documentation.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 Documentation/networking/devlink/ice.rst | 41 ++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index efc6be109dc3..1ae46dee0fd5 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -36,6 +36,47 @@ Parameters
        The latter allows for bandwidth higher than external port speed
        when looping back traffic between VFs. Works with 8x10G and 4x25G
        cards.
+   * - ``tx_scheduling_layers``
+     - permanent
+     - The ice hardware uses hierarchical scheduling for Tx with a fixed
+       number of layers in the scheduling tree. Root node is representing a
+       port, while all the leaves represents the queues. This way of
+       configuring Tx scheduler allows features like DCB or devlink-rate
+       (documented below) for fine-grained configuration how much BW is given
+       to any given queue or group of queues, as scheduling parameters can be
+       configured at any given layer of the tree. By default 9-layer tree
+       topology was deemed best for most workloads, as it gives optimal
+       performance to configurability ratio. However for some specific cases,
+       this might not be the case. A great example would be sending traffic to
+       queues that is not a multiple of 8. Since in 9-layer topology maximum
+       number of children is limited to 8, the 9th queue has a different parent
+       than the rest, and it's given more BW credits. This causes a problem
+       when the system is sending traffic to 9 queues:
+
+       | tx_queue_0_packets: 24163396
+       | tx_queue_1_packets: 24164623
+       | tx_queue_2_packets: 24163188
+       | tx_queue_3_packets: 24163701
+       | tx_queue_4_packets: 24163683
+       | tx_queue_5_packets: 24164668
+       | tx_queue_6_packets: 23327200
+       | tx_queue_7_packets: 24163853
+       | tx_queue_8_packets: 91101417 < Too much traffic is sent to 9th
+
+       Sometimes this might be a big concern, so the idea is to empower the
+       user to switch to 5-layer topology, enabling performance gains but
+       sacrificing configurability for features like DCB and devlink-rate.
+
+       This parameter gives user flexibility to choose the 5-layer transmit
+       scheduler topology. After switching parameter reboot is required for
+       the feature to start working.
+
+       User could choose 9 (the default) or 5 as a value of parameter, e.g.:
+       $ devlink dev param set pci/0000:16:00.0 name tx_scheduling_layers
+       value 5 cmode permanent
+
+       And verify that value has been set:
+       $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
 
 Info versions
 =============
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 1/5] ice: Support 5 layer topology
  2024-02-19 10:05   ` Mateusz Polchlopek
@ 2024-02-19 10:16     ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:16 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Michal Wilczynski, Raj Victor, horms, przemyslaw.kitszel

On 2/19/2024 11:05 AM, Mateusz Polchlopek wrote:
> From: Raj Victor <victor.raj@intel.com>
> 
> There is a performance issue when the number of VSIs are not multiple
> of 8. This is caused due to the max children limitation per node(8) in
> 9 layer topology. The BW credits are shared evenly among the children
> by default. Assume one node has 8 children and the other has 1.
> The parent of these nodes share the BW credit equally among them.
> Apparently this causes a problem for the first node which has 8 children.
> The 9th VM get more BW credits than the first 8 VMs.
> 
> Example:
> 
> 1) With 8 VM's:
> for x in 0 1 2 3 4 5 6 7;
> do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done
> 
> tx_queue_0_packets: 23283027
> tx_queue_1_packets: 23292289
> tx_queue_2_packets: 23276136
> tx_queue_3_packets: 23279828
> tx_queue_4_packets: 23279828
> tx_queue_5_packets: 23279333
> tx_queue_6_packets: 23277745
> tx_queue_7_packets: 23279950
> tx_queue_8_packets: 0
> 
> 2) With 9 VM's:
> for x in 0 1 2 3 4 5 6 7 8;
> do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done
> 
> tx_queue_0_packets: 24163396
> tx_queue_1_packets: 24164623
> tx_queue_2_packets: 24163188
> tx_queue_3_packets: 24163701
> tx_queue_4_packets: 24163683
> tx_queue_5_packets: 24164668
> tx_queue_6_packets: 23327200
> tx_queue_7_packets: 24163853
> tx_queue_8_packets: 91101417
> 
> So on average queue 8 statistics show that 3.7 times more packets were
> send there than to the other queues.
> 
> The FW starting with version 3.20, has increased the max number of
> children per node by reducing the number of layers from 9 to 5. Reflect
> this on driver side.
> 
> Signed-off-by: Raj Victor <victor.raj@intel.com>
> Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> ---
>   .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  23 ++
>   drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
>   drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
>   drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   6 files changed, 233 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> index b315c734455a..02102e937b30 100644
> --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> @@ -121,6 +121,7 @@ struct ice_aqc_list_caps_elem {
>   #define ICE_AQC_CAPS_PCIE_RESET_AVOIDANCE		0x0076
>   #define ICE_AQC_CAPS_POST_UPDATE_RESET_RESTRICT		0x0077
>   #define ICE_AQC_CAPS_NVM_MGMT				0x0080
> +#define ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE		0x0085
>   #define ICE_AQC_CAPS_FW_LAG_SUPPORT			0x0092
>   #define ICE_AQC_BIT_ROCEV2_LAG				0x01
>   #define ICE_AQC_BIT_SRIOV_LAG				0x02
> @@ -819,6 +820,23 @@ struct ice_aqc_get_topo {
>   	__le32 addr_low;
>   };
>   
> +/* Get/Set Tx Topology (indirect 0x0418/0x0417) */
> +struct ice_aqc_get_set_tx_topo {
> +	u8 set_flags;
> +#define ICE_AQC_TX_TOPO_FLAGS_CORRER		BIT(0)
> +#define ICE_AQC_TX_TOPO_FLAGS_SRC_RAM		BIT(1)
> +#define ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW		BIT(4)
> +#define ICE_AQC_TX_TOPO_FLAGS_ISSUED		BIT(5)
> +
> +	u8 get_flags;
> +#define ICE_AQC_TX_TOPO_GET_RAM		2
> +
> +	__le16 reserved1;
> +	__le32 reserved2;
> +	__le32 addr_high;
> +	__le32 addr_low;
> +};
> +
>   /* Update TSE (indirect 0x0403)
>    * Get TSE (indirect 0x0404)
>    * Add TSE (indirect 0x0401)
> @@ -2547,6 +2565,7 @@ struct ice_aq_desc {
>   		struct ice_aqc_get_link_topo get_link_topo;
>   		struct ice_aqc_i2c read_write_i2c;
>   		struct ice_aqc_read_i2c_resp read_i2c_resp;
> +		struct ice_aqc_get_set_tx_topo get_set_tx_topo;
>   	} params;
>   };
>   
> @@ -2653,6 +2672,10 @@ enum ice_adminq_opc {
>   	ice_aqc_opc_query_sched_res			= 0x0412,
>   	ice_aqc_opc_remove_rl_profiles			= 0x0415,
>   
> +	/* tx topology commands */
> +	ice_aqc_opc_set_tx_topo				= 0x0417,
> +	ice_aqc_opc_get_tx_topo				= 0x0418,
> +
>   	/* PHY commands */
>   	ice_aqc_opc_get_phy_caps			= 0x0600,
>   	ice_aqc_opc_set_phy_cfg				= 0x0601,
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index 090a2b8b5ff2..175091011251 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -1622,6 +1622,8 @@ ice_aq_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf,
>   	case ice_aqc_opc_set_port_params:
>   	case ice_aqc_opc_get_vlan_mode_parameters:
>   	case ice_aqc_opc_set_vlan_mode_parameters:
> +	case ice_aqc_opc_set_tx_topo:
> +	case ice_aqc_opc_get_tx_topo:
>   	case ice_aqc_opc_add_recipe:
>   	case ice_aqc_opc_recipe_to_profile:
>   	case ice_aqc_opc_get_recipe:
> @@ -2178,6 +2180,9 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps,
>   		ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n",
>   			  prefix, caps->sriov_lag);
>   		break;
> +	case ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE:
> +		caps->tx_sched_topo_comp_mode_en = (number == 1);
> +		break;
>   	default:
>   		/* Not one of the recognized common capabilities */
>   		found = false;
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
> index 7532d11ad7f3..766437944774 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
> @@ -4,6 +4,7 @@
>   #include "ice_common.h"
>   #include "ice.h"
>   #include "ice_ddp.h"
> +#include "ice_sched.h"
>   
>   /* For supporting double VLAN mode, it is necessary to enable or disable certain
>    * boost tcam entries. The metadata labels names that match the following
> @@ -2263,3 +2264,201 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
>   
>   	return state;
>   }
> +
> +/**
> + * ice_get_set_tx_topo - get or set Tx topology
> + * @hw: pointer to the HW struct
> + * @buf: pointer to Tx topology buffer
> + * @buf_size: buffer size
> + * @cd: pointer to command details structure or NULL
> + * @flags: pointer to descriptor flags
> + * @set: 0-get, 1-set topology
> + *
> + * The function will get or set Tx topology
> + */
> +static int
> +ice_get_set_tx_topo(struct ice_hw *hw, u8 *buf, u16 buf_size,
> +		    struct ice_sq_cd *cd, u8 *flags, bool set)
> +{
> +	struct ice_aqc_get_set_tx_topo *cmd;
> +	struct ice_aq_desc desc;
> +	int status;
> +
> +	cmd = &desc.params.get_set_tx_topo;
> +	if (set) {
> +		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_tx_topo);
> +		cmd->set_flags = ICE_AQC_TX_TOPO_FLAGS_ISSUED;
> +		/* requested to update a new topology, not a default topology */
> +		if (buf)
> +			cmd->set_flags |= ICE_AQC_TX_TOPO_FLAGS_SRC_RAM |
> +					  ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW;
> +	} else {
> +		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_tx_topo);
> +		cmd->get_flags = ICE_AQC_TX_TOPO_GET_RAM;
> +	}
> +	desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
> +	status = ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
> +	if (status)
> +		return status;
> +	/* read the return flag values (first byte) for get operation */
> +	if (!set && flags)
> +		*flags = desc.params.get_set_tx_topo.set_flags;
> +
> +	return 0;
> +}
> +
> +/**
> + * ice_cfg_tx_topo - Initialize new Tx topology if available
> + * @hw: pointer to the HW struct
> + * @buf: pointer to Tx topology buffer
> + * @len: buffer size
> + *
> + * The function will apply the new Tx topology from the package buffer
> + * if available.
> + */
> +int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
> +{
> +	u8 *current_topo, *new_topo = NULL;
> +	struct ice_run_time_cfg_seg *seg;
> +	struct ice_buf_hdr *section;
> +	struct ice_pkg_hdr *pkg_hdr;
> +	enum ice_ddp_state state;
> +	u16 offset, size = 0;
> +	u32 reg = 0;
> +	int status;
> +	u8 flags;
> +
> +	if (!buf || !len)
> +		return -EINVAL;
> +
> +	/* Does FW support new Tx topology mode ? */
> +	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
> +		ice_debug(hw, ICE_DBG_INIT, "FW doesn't support compatibility mode\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	current_topo = kzalloc(ICE_AQ_MAX_BUF_LEN, GFP_KERNEL);
> +	if (!current_topo)
> +		return -ENOMEM;
> +
> +	/* Get the current Tx topology */
> +	status = ice_get_set_tx_topo(hw, current_topo, ICE_AQ_MAX_BUF_LEN, NULL,
> +				     &flags, false);
> +
> +	kfree(current_topo);
> +
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Get current topology is failed\n");
> +		return status;
> +	}
> +
> +	/* Is default topology already applied ? */
> +	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "Default topology already applied\n");
> +		return -EEXIST;
> +	}
> +
> +	/* Is new topology already applied ? */
> +	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "New topology already applied\n");
> +		return -EEXIST;
> +	}
> +
> +	/* Setting topology already issued? */
> +	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
> +		ice_debug(hw, ICE_DBG_INIT, "Update Tx topology was done by another PF\n");
> +		/* Add a small delay before exiting */
> +		msleep(2000);
> +		return -EEXIST;
> +	}
> +
> +	/* Change the topology from new to default (5 to 9) */
> +	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
> +		goto update_topo;
> +	}
> +
> +	pkg_hdr = (struct ice_pkg_hdr *)buf;
> +	state = ice_verify_pkg(pkg_hdr, len);
> +	if (state) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed to verify pkg (err: %d)\n",
> +			  state);
> +		return -EIO;
> +	}
> +
> +	/* Find runtime configuration segment */
> +	seg = (struct ice_run_time_cfg_seg *)
> +	      ice_find_seg_in_pkg(hw, SEGMENT_TYPE_ICE_RUN_TIME_CFG, pkg_hdr);
> +	if (!seg) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment is missing\n");
> +		return -EIO;
> +	}
> +
> +	if (le32_to_cpu(seg->buf_table.buf_count) < ICE_MIN_S_COUNT) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment count(%d) is wrong\n",
> +			  seg->buf_table.buf_count);
> +		return -EIO;
> +	}
> +
> +	section = ice_pkg_val_buf(seg->buf_table.buf_array);
> +	if (!section || le32_to_cpu(section->section_entry[0].type) !=
> +		ICE_SID_TX_5_LAYER_TOPO) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section type is wrong\n");
> +		return -EIO;
> +	}
> +
> +	size = le16_to_cpu(section->section_entry[0].size);
> +	offset = le16_to_cpu(section->section_entry[0].offset);
> +	if (size < ICE_MIN_S_SZ || size > ICE_MAX_S_SZ) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section size is wrong\n");
> +		return -EIO;
> +	}
> +
> +	/* Make sure the section fits in the buffer */
> +	if (offset + size > ICE_PKG_BUF_SIZE) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology buffer > 4K\n");
> +		return -EIO;
> +	}
> +
> +	/* Get the new topology buffer */
> +	new_topo = ((u8 *)section) + offset;
> +
> +update_topo:
> +	/* Acquire global lock to make sure that set topology issued
> +	 * by one PF.
> +	 */
> +	status = ice_acquire_res(hw, ICE_GLOBAL_CFG_LOCK_RES_ID, ICE_RES_WRITE,
> +				 ICE_GLOBAL_CFG_LOCK_TIMEOUT);
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed to acquire global lock\n");
> +		return status;
> +	}
> +
> +	/* Check if reset was triggered already. */
> +	reg = rd32(hw, GLGEN_RSTAT);
> +	if (reg & GLGEN_RSTAT_DEVSTATE_M) {
> +		/* Reset is in progress, re-init the HW again */
> +		ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. Layer topology might be applied already\n");
> +		ice_check_reset(hw);
> +		return 0;
> +	}
> +
> +	/* Set new topology */
> +	status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology\n");
> +		return status;
> +	}
> +
> +	/* New topology is updated, delay 1 second before issuing the CORER */
> +	msleep(1000);
> +	ice_reset(hw, ICE_RESET_CORER);
> +	/* CORER will clear the global lock, so no explicit call
> +	 * required for release.
> +	 */
> +
> +	return 0;
> +}
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.h b/drivers/net/ethernet/intel/ice/ice_ddp.h
> index ff66c2ffb1a2..622543f08b43 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.h
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.h
> @@ -454,4 +454,6 @@ u16 ice_pkg_buf_get_active_sections(struct ice_buf_build *bld);
>   void *ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
>   			   u32 sect_type);
>   
> +int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len);
> +
>   #endif
> diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h
> index 1aef05ea5a57..9baff6a857d8 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sched.h
> +++ b/drivers/net/ethernet/intel/ice/ice_sched.h
> @@ -6,6 +6,9 @@
>   
>   #include "ice_common.h"
>   
> +#define ICE_SCHED_5_LAYERS	5
> +#define ICE_SCHED_9_LAYERS	9
> +
>   #define SCHED_NODE_NAME_MAX_LEN 32
>   
>   #define ICE_QGRP_LAYER_OFFSET	2
> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
> index 657f97e2105f..f964f26664d0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_type.h
> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
> @@ -296,6 +296,7 @@ struct ice_hw_common_caps {
>   	bool pcie_reset_avoidance;
>   	/* Post update reset restriction */
>   	bool reset_restrict_support;
> +	bool tx_sched_topo_comp_mode_en;
>   };
>   
>   /* IEEE 1588 TIME_SYNC specific info */

This is of course v4, not v1, sorry for the mistake in tag
Mateusz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 1/5] ice: Support 5 layer topology
@ 2024-02-19 10:16     ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-19 10:16 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, horms, przemyslaw.kitszel, Raj Victor, Michal Wilczynski

On 2/19/2024 11:05 AM, Mateusz Polchlopek wrote:
> From: Raj Victor <victor.raj@intel.com>
> 
> There is a performance issue when the number of VSIs are not multiple
> of 8. This is caused due to the max children limitation per node(8) in
> 9 layer topology. The BW credits are shared evenly among the children
> by default. Assume one node has 8 children and the other has 1.
> The parent of these nodes share the BW credit equally among them.
> Apparently this causes a problem for the first node which has 8 children.
> The 9th VM get more BW credits than the first 8 VMs.
> 
> Example:
> 
> 1) With 8 VM's:
> for x in 0 1 2 3 4 5 6 7;
> do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done
> 
> tx_queue_0_packets: 23283027
> tx_queue_1_packets: 23292289
> tx_queue_2_packets: 23276136
> tx_queue_3_packets: 23279828
> tx_queue_4_packets: 23279828
> tx_queue_5_packets: 23279333
> tx_queue_6_packets: 23277745
> tx_queue_7_packets: 23279950
> tx_queue_8_packets: 0
> 
> 2) With 9 VM's:
> for x in 0 1 2 3 4 5 6 7 8;
> do taskset -c ${x} netperf -P0 -H 172.68.169.125 &  sleep .1 ; done
> 
> tx_queue_0_packets: 24163396
> tx_queue_1_packets: 24164623
> tx_queue_2_packets: 24163188
> tx_queue_3_packets: 24163701
> tx_queue_4_packets: 24163683
> tx_queue_5_packets: 24164668
> tx_queue_6_packets: 23327200
> tx_queue_7_packets: 24163853
> tx_queue_8_packets: 91101417
> 
> So on average queue 8 statistics show that 3.7 times more packets were
> send there than to the other queues.
> 
> The FW starting with version 3.20, has increased the max number of
> children per node by reducing the number of layers from 9 to 5. Reflect
> this on driver side.
> 
> Signed-off-by: Raj Victor <victor.raj@intel.com>
> Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com>
> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> ---
>   .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  23 ++
>   drivers/net/ethernet/intel/ice/ice_common.c   |   5 +
>   drivers/net/ethernet/intel/ice/ice_ddp.c      | 199 ++++++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_ddp.h      |   2 +
>   drivers/net/ethernet/intel/ice/ice_sched.h    |   3 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   6 files changed, 233 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> index b315c734455a..02102e937b30 100644
> --- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> +++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
> @@ -121,6 +121,7 @@ struct ice_aqc_list_caps_elem {
>   #define ICE_AQC_CAPS_PCIE_RESET_AVOIDANCE		0x0076
>   #define ICE_AQC_CAPS_POST_UPDATE_RESET_RESTRICT		0x0077
>   #define ICE_AQC_CAPS_NVM_MGMT				0x0080
> +#define ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE		0x0085
>   #define ICE_AQC_CAPS_FW_LAG_SUPPORT			0x0092
>   #define ICE_AQC_BIT_ROCEV2_LAG				0x01
>   #define ICE_AQC_BIT_SRIOV_LAG				0x02
> @@ -819,6 +820,23 @@ struct ice_aqc_get_topo {
>   	__le32 addr_low;
>   };
>   
> +/* Get/Set Tx Topology (indirect 0x0418/0x0417) */
> +struct ice_aqc_get_set_tx_topo {
> +	u8 set_flags;
> +#define ICE_AQC_TX_TOPO_FLAGS_CORRER		BIT(0)
> +#define ICE_AQC_TX_TOPO_FLAGS_SRC_RAM		BIT(1)
> +#define ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW		BIT(4)
> +#define ICE_AQC_TX_TOPO_FLAGS_ISSUED		BIT(5)
> +
> +	u8 get_flags;
> +#define ICE_AQC_TX_TOPO_GET_RAM		2
> +
> +	__le16 reserved1;
> +	__le32 reserved2;
> +	__le32 addr_high;
> +	__le32 addr_low;
> +};
> +
>   /* Update TSE (indirect 0x0403)
>    * Get TSE (indirect 0x0404)
>    * Add TSE (indirect 0x0401)
> @@ -2547,6 +2565,7 @@ struct ice_aq_desc {
>   		struct ice_aqc_get_link_topo get_link_topo;
>   		struct ice_aqc_i2c read_write_i2c;
>   		struct ice_aqc_read_i2c_resp read_i2c_resp;
> +		struct ice_aqc_get_set_tx_topo get_set_tx_topo;
>   	} params;
>   };
>   
> @@ -2653,6 +2672,10 @@ enum ice_adminq_opc {
>   	ice_aqc_opc_query_sched_res			= 0x0412,
>   	ice_aqc_opc_remove_rl_profiles			= 0x0415,
>   
> +	/* tx topology commands */
> +	ice_aqc_opc_set_tx_topo				= 0x0417,
> +	ice_aqc_opc_get_tx_topo				= 0x0418,
> +
>   	/* PHY commands */
>   	ice_aqc_opc_get_phy_caps			= 0x0600,
>   	ice_aqc_opc_set_phy_cfg				= 0x0601,
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index 090a2b8b5ff2..175091011251 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -1622,6 +1622,8 @@ ice_aq_send_cmd(struct ice_hw *hw, struct ice_aq_desc *desc, void *buf,
>   	case ice_aqc_opc_set_port_params:
>   	case ice_aqc_opc_get_vlan_mode_parameters:
>   	case ice_aqc_opc_set_vlan_mode_parameters:
> +	case ice_aqc_opc_set_tx_topo:
> +	case ice_aqc_opc_get_tx_topo:
>   	case ice_aqc_opc_add_recipe:
>   	case ice_aqc_opc_recipe_to_profile:
>   	case ice_aqc_opc_get_recipe:
> @@ -2178,6 +2180,9 @@ ice_parse_common_caps(struct ice_hw *hw, struct ice_hw_common_caps *caps,
>   		ice_debug(hw, ICE_DBG_INIT, "%s: sriov_lag = %u\n",
>   			  prefix, caps->sriov_lag);
>   		break;
> +	case ICE_AQC_CAPS_TX_SCHED_TOPO_COMP_MODE:
> +		caps->tx_sched_topo_comp_mode_en = (number == 1);
> +		break;
>   	default:
>   		/* Not one of the recognized common capabilities */
>   		found = false;
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.c b/drivers/net/ethernet/intel/ice/ice_ddp.c
> index 7532d11ad7f3..766437944774 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.c
> @@ -4,6 +4,7 @@
>   #include "ice_common.h"
>   #include "ice.h"
>   #include "ice_ddp.h"
> +#include "ice_sched.h"
>   
>   /* For supporting double VLAN mode, it is necessary to enable or disable certain
>    * boost tcam entries. The metadata labels names that match the following
> @@ -2263,3 +2264,201 @@ enum ice_ddp_state ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf,
>   
>   	return state;
>   }
> +
> +/**
> + * ice_get_set_tx_topo - get or set Tx topology
> + * @hw: pointer to the HW struct
> + * @buf: pointer to Tx topology buffer
> + * @buf_size: buffer size
> + * @cd: pointer to command details structure or NULL
> + * @flags: pointer to descriptor flags
> + * @set: 0-get, 1-set topology
> + *
> + * The function will get or set Tx topology
> + */
> +static int
> +ice_get_set_tx_topo(struct ice_hw *hw, u8 *buf, u16 buf_size,
> +		    struct ice_sq_cd *cd, u8 *flags, bool set)
> +{
> +	struct ice_aqc_get_set_tx_topo *cmd;
> +	struct ice_aq_desc desc;
> +	int status;
> +
> +	cmd = &desc.params.get_set_tx_topo;
> +	if (set) {
> +		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_set_tx_topo);
> +		cmd->set_flags = ICE_AQC_TX_TOPO_FLAGS_ISSUED;
> +		/* requested to update a new topology, not a default topology */
> +		if (buf)
> +			cmd->set_flags |= ICE_AQC_TX_TOPO_FLAGS_SRC_RAM |
> +					  ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW;
> +	} else {
> +		ice_fill_dflt_direct_cmd_desc(&desc, ice_aqc_opc_get_tx_topo);
> +		cmd->get_flags = ICE_AQC_TX_TOPO_GET_RAM;
> +	}
> +	desc.flags |= cpu_to_le16(ICE_AQ_FLAG_RD);
> +	status = ice_aq_send_cmd(hw, &desc, buf, buf_size, cd);
> +	if (status)
> +		return status;
> +	/* read the return flag values (first byte) for get operation */
> +	if (!set && flags)
> +		*flags = desc.params.get_set_tx_topo.set_flags;
> +
> +	return 0;
> +}
> +
> +/**
> + * ice_cfg_tx_topo - Initialize new Tx topology if available
> + * @hw: pointer to the HW struct
> + * @buf: pointer to Tx topology buffer
> + * @len: buffer size
> + *
> + * The function will apply the new Tx topology from the package buffer
> + * if available.
> + */
> +int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
> +{
> +	u8 *current_topo, *new_topo = NULL;
> +	struct ice_run_time_cfg_seg *seg;
> +	struct ice_buf_hdr *section;
> +	struct ice_pkg_hdr *pkg_hdr;
> +	enum ice_ddp_state state;
> +	u16 offset, size = 0;
> +	u32 reg = 0;
> +	int status;
> +	u8 flags;
> +
> +	if (!buf || !len)
> +		return -EINVAL;
> +
> +	/* Does FW support new Tx topology mode ? */
> +	if (!hw->func_caps.common_cap.tx_sched_topo_comp_mode_en) {
> +		ice_debug(hw, ICE_DBG_INIT, "FW doesn't support compatibility mode\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	current_topo = kzalloc(ICE_AQ_MAX_BUF_LEN, GFP_KERNEL);
> +	if (!current_topo)
> +		return -ENOMEM;
> +
> +	/* Get the current Tx topology */
> +	status = ice_get_set_tx_topo(hw, current_topo, ICE_AQ_MAX_BUF_LEN, NULL,
> +				     &flags, false);
> +
> +	kfree(current_topo);
> +
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Get current topology is failed\n");
> +		return status;
> +	}
> +
> +	/* Is default topology already applied ? */
> +	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_9_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "Default topology already applied\n");
> +		return -EEXIST;
> +	}
> +
> +	/* Is new topology already applied ? */
> +	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "New topology already applied\n");
> +		return -EEXIST;
> +	}
> +
> +	/* Setting topology already issued? */
> +	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
> +		ice_debug(hw, ICE_DBG_INIT, "Update Tx topology was done by another PF\n");
> +		/* Add a small delay before exiting */
> +		msleep(2000);
> +		return -EEXIST;
> +	}
> +
> +	/* Change the topology from new to default (5 to 9) */
> +	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
> +	    hw->num_tx_sched_layers == ICE_SCHED_5_LAYERS) {
> +		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
> +		goto update_topo;
> +	}
> +
> +	pkg_hdr = (struct ice_pkg_hdr *)buf;
> +	state = ice_verify_pkg(pkg_hdr, len);
> +	if (state) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed to verify pkg (err: %d)\n",
> +			  state);
> +		return -EIO;
> +	}
> +
> +	/* Find runtime configuration segment */
> +	seg = (struct ice_run_time_cfg_seg *)
> +	      ice_find_seg_in_pkg(hw, SEGMENT_TYPE_ICE_RUN_TIME_CFG, pkg_hdr);
> +	if (!seg) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment is missing\n");
> +		return -EIO;
> +	}
> +
> +	if (le32_to_cpu(seg->buf_table.buf_count) < ICE_MIN_S_COUNT) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology segment count(%d) is wrong\n",
> +			  seg->buf_table.buf_count);
> +		return -EIO;
> +	}
> +
> +	section = ice_pkg_val_buf(seg->buf_table.buf_array);
> +	if (!section || le32_to_cpu(section->section_entry[0].type) !=
> +		ICE_SID_TX_5_LAYER_TOPO) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section type is wrong\n");
> +		return -EIO;
> +	}
> +
> +	size = le16_to_cpu(section->section_entry[0].size);
> +	offset = le16_to_cpu(section->section_entry[0].offset);
> +	if (size < ICE_MIN_S_SZ || size > ICE_MAX_S_SZ) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology section size is wrong\n");
> +		return -EIO;
> +	}
> +
> +	/* Make sure the section fits in the buffer */
> +	if (offset + size > ICE_PKG_BUF_SIZE) {
> +		ice_debug(hw, ICE_DBG_INIT, "5 layer topology buffer > 4K\n");
> +		return -EIO;
> +	}
> +
> +	/* Get the new topology buffer */
> +	new_topo = ((u8 *)section) + offset;
> +
> +update_topo:
> +	/* Acquire global lock to make sure that set topology issued
> +	 * by one PF.
> +	 */
> +	status = ice_acquire_res(hw, ICE_GLOBAL_CFG_LOCK_RES_ID, ICE_RES_WRITE,
> +				 ICE_GLOBAL_CFG_LOCK_TIMEOUT);
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed to acquire global lock\n");
> +		return status;
> +	}
> +
> +	/* Check if reset was triggered already. */
> +	reg = rd32(hw, GLGEN_RSTAT);
> +	if (reg & GLGEN_RSTAT_DEVSTATE_M) {
> +		/* Reset is in progress, re-init the HW again */
> +		ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. Layer topology might be applied already\n");
> +		ice_check_reset(hw);
> +		return 0;
> +	}
> +
> +	/* Set new topology */
> +	status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
> +	if (status) {
> +		ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology\n");
> +		return status;
> +	}
> +
> +	/* New topology is updated, delay 1 second before issuing the CORER */
> +	msleep(1000);
> +	ice_reset(hw, ICE_RESET_CORER);
> +	/* CORER will clear the global lock, so no explicit call
> +	 * required for release.
> +	 */
> +
> +	return 0;
> +}
> diff --git a/drivers/net/ethernet/intel/ice/ice_ddp.h b/drivers/net/ethernet/intel/ice/ice_ddp.h
> index ff66c2ffb1a2..622543f08b43 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ddp.h
> +++ b/drivers/net/ethernet/intel/ice/ice_ddp.h
> @@ -454,4 +454,6 @@ u16 ice_pkg_buf_get_active_sections(struct ice_buf_build *bld);
>   void *ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
>   			   u32 sect_type);
>   
> +int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len);
> +
>   #endif
> diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h
> index 1aef05ea5a57..9baff6a857d8 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sched.h
> +++ b/drivers/net/ethernet/intel/ice/ice_sched.h
> @@ -6,6 +6,9 @@
>   
>   #include "ice_common.h"
>   
> +#define ICE_SCHED_5_LAYERS	5
> +#define ICE_SCHED_9_LAYERS	9
> +
>   #define SCHED_NODE_NAME_MAX_LEN 32
>   
>   #define ICE_QGRP_LAYER_OFFSET	2
> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
> index 657f97e2105f..f964f26664d0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_type.h
> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
> @@ -296,6 +296,7 @@ struct ice_hw_common_caps {
>   	bool pcie_reset_avoidance;
>   	/* Post update reset restriction */
>   	bool reset_restrict_support;
> +	bool tx_sched_topo_comp_mode_en;
>   };
>   
>   /* IEEE 1588 TIME_SYNC specific info */

This is of course v4, not v1, sorry for the mistake in tag
Mateusz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-19 10:05   ` Mateusz Polchlopek
@ 2024-02-19 12:37     ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-19 12:37 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, horms, przemyslaw.kitszel, Lukasz Czapnik

Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>
>It was observed that Tx performance was inconsistent across all queues
>and/or VSIs and that it was directly connected to existing 9-layer
>topology of the Tx scheduler.
>
>Introduce new private devlink param - tx_scheduling_layers. This parameter
>gives user flexibility to choose the 5-layer transmit scheduler topology
>which helps to smooth out the transmit performance.
>
>Allowed parameter values are 5 and 9.
>
>Example usage:
>
>Show:
>devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>pci/0000:4b:00.0:
>  name tx_scheduling_layers type driver-specific
>    values:
>      cmode permanent value 9
>
>Set:
>devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>cmode permanent

This is kind of proprietary param similar to number of which were shot
down for mlx5 in past. Jakub?

Also, given this is apparently nvconfig configuration, there could be
probably more suitable to use some provisioning tool. This is related to
the mlx5 misc driver.

Until be figure out the plan, this has my nack:

NAcked-by: Jiri Pirko <jiri@nvidia.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-19 12:37     ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-19 12:37 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms, przemyslaw.kitszel

Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>
>It was observed that Tx performance was inconsistent across all queues
>and/or VSIs and that it was directly connected to existing 9-layer
>topology of the Tx scheduler.
>
>Introduce new private devlink param - tx_scheduling_layers. This parameter
>gives user flexibility to choose the 5-layer transmit scheduler topology
>which helps to smooth out the transmit performance.
>
>Allowed parameter values are 5 and 9.
>
>Example usage:
>
>Show:
>devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>pci/0000:4b:00.0:
>  name tx_scheduling_layers type driver-specific
>    values:
>      cmode permanent value 9
>
>Set:
>devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>cmode permanent

This is kind of proprietary param similar to number of which were shot
down for mlx5 in past. Jakub?

Also, given this is apparently nvconfig configuration, there could be
probably more suitable to use some provisioning tool. This is related to
the mlx5 misc driver.

Until be figure out the plan, this has my nack:

NAcked-by: Jiri Pirko <jiri@nvidia.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-19 12:37     ` Jiri Pirko
@ 2024-02-19 13:33       ` Przemek Kitszel
  -1 siblings, 0 replies; 46+ messages in thread
From: Przemek Kitszel @ 2024-02-19 13:33 UTC (permalink / raw)
  To: Jiri Pirko, Mateusz Polchlopek
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms

On 2/19/24 13:37, Jiri Pirko wrote:
> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>>
>> It was observed that Tx performance was inconsistent across all queues
>> and/or VSIs and that it was directly connected to existing 9-layer
>> topology of the Tx scheduler.
>>
>> Introduce new private devlink param - tx_scheduling_layers. This parameter
>> gives user flexibility to choose the 5-layer transmit scheduler topology
>> which helps to smooth out the transmit performance.
>>
>> Allowed parameter values are 5 and 9.
>>
>> Example usage:
>>
>> Show:
>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> pci/0000:4b:00.0:
>>   name tx_scheduling_layers type driver-specific
>>     values:
>>       cmode permanent value 9
>>
>> Set:
>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> cmode permanent
> 
> This is kind of proprietary param similar to number of which were shot

not sure if this is the same kind of param, but for sure proprietary one

> down for mlx5 in past. Jakub?

I'm not that familiar with the history/ies around mlx5, but this case is
somewhat different, at least for me:
we have a performance fix for the tree inside the FW/HW, while you
(IIRC) were about to introduce some nice and general abstraction layer,
which could be used by other HW vendors too, but instead it was mlx-only

> 
> Also, given this is apparently nvconfig configuration, there could be
> probably more suitable to use some provisioning tool. 

TBH, we will want to add some other NVM related params, but that does
not justify yet another tool to configure PF. (And then there would be
a big debate if FW update should be moved there too for consistency).

> This is related to the mlx5 misc driver.
> 
> Until be figure out the plan, this has my nack:
> 
> NAcked-by: Jiri Pirko <jiri@nvidia.com>

IMO this is an easy case, but would like to hear from netdev maintainers



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-19 13:33       ` Przemek Kitszel
  0 siblings, 0 replies; 46+ messages in thread
From: Przemek Kitszel @ 2024-02-19 13:33 UTC (permalink / raw)
  To: Jiri Pirko, Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, horms, Lukasz Czapnik

On 2/19/24 13:37, Jiri Pirko wrote:
> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>>
>> It was observed that Tx performance was inconsistent across all queues
>> and/or VSIs and that it was directly connected to existing 9-layer
>> topology of the Tx scheduler.
>>
>> Introduce new private devlink param - tx_scheduling_layers. This parameter
>> gives user flexibility to choose the 5-layer transmit scheduler topology
>> which helps to smooth out the transmit performance.
>>
>> Allowed parameter values are 5 and 9.
>>
>> Example usage:
>>
>> Show:
>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> pci/0000:4b:00.0:
>>   name tx_scheduling_layers type driver-specific
>>     values:
>>       cmode permanent value 9
>>
>> Set:
>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> cmode permanent
> 
> This is kind of proprietary param similar to number of which were shot

not sure if this is the same kind of param, but for sure proprietary one

> down for mlx5 in past. Jakub?

I'm not that familiar with the history/ies around mlx5, but this case is
somewhat different, at least for me:
we have a performance fix for the tree inside the FW/HW, while you
(IIRC) were about to introduce some nice and general abstraction layer,
which could be used by other HW vendors too, but instead it was mlx-only

> 
> Also, given this is apparently nvconfig configuration, there could be
> probably more suitable to use some provisioning tool. 

TBH, we will want to add some other NVM related params, but that does
not justify yet another tool to configure PF. (And then there would be
a big debate if FW update should be moved there too for consistency).

> This is related to the mlx5 misc driver.
> 
> Until be figure out the plan, this has my nack:
> 
> NAcked-by: Jiri Pirko <jiri@nvidia.com>

IMO this is an easy case, but would like to hear from netdev maintainers



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-19 13:33       ` Przemek Kitszel
@ 2024-02-19 17:15         ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-19 17:15 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms, Mateusz Polchlopek

Mon, Feb 19, 2024 at 02:33:54PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/19/24 13:37, Jiri Pirko wrote:
>> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> > From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>> > 
>> > It was observed that Tx performance was inconsistent across all queues
>> > and/or VSIs and that it was directly connected to existing 9-layer
>> > topology of the Tx scheduler.
>> > 
>> > Introduce new private devlink param - tx_scheduling_layers. This parameter
>> > gives user flexibility to choose the 5-layer transmit scheduler topology
>> > which helps to smooth out the transmit performance.
>> > 
>> > Allowed parameter values are 5 and 9.
>> > 
>> > Example usage:
>> > 
>> > Show:
>> > devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> > pci/0000:4b:00.0:
>> >   name tx_scheduling_layers type driver-specific
>> >     values:
>> >       cmode permanent value 9
>> > 
>> > Set:
>> > devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> > cmode permanent
>> 
>> This is kind of proprietary param similar to number of which were shot
>
>not sure if this is the same kind of param, but for sure proprietary one
>
>> down for mlx5 in past. Jakub?
>
>I'm not that familiar with the history/ies around mlx5, but this case is
>somewhat different, at least for me:
>we have a performance fix for the tree inside the FW/HW, while you
>(IIRC) were about to introduce some nice and general abstraction layer,
>which could be used by other HW vendors too, but instead it was mlx-only

Nope. Same thing. Vendor/device specific FW/HW knob. Nothing to
abstract.


>
>> 
>> Also, given this is apparently nvconfig configuration, there could be
>> probably more suitable to use some provisioning tool.
>
>TBH, we will want to add some other NVM related params, but that does
>not justify yet another tool to configure PF. (And then there would be
>a big debate if FW update should be moved there too for consistency).
>
>> This is related to the mlx5 misc driver.
>> 
>> Until be figure out the plan, this has my nack:
>> 
>> NAcked-by: Jiri Pirko <jiri@nvidia.com>
>
>IMO this is an easy case, but would like to hear from netdev maintainers
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-19 17:15         ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-19 17:15 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms, Lukasz Czapnik

Mon, Feb 19, 2024 at 02:33:54PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/19/24 13:37, Jiri Pirko wrote:
>> Mon, Feb 19, 2024 at 11:05:57AM CET, mateusz.polchlopek@intel.com wrote:
>> > From: Lukasz Czapnik <lukasz.czapnik@intel.com>
>> > 
>> > It was observed that Tx performance was inconsistent across all queues
>> > and/or VSIs and that it was directly connected to existing 9-layer
>> > topology of the Tx scheduler.
>> > 
>> > Introduce new private devlink param - tx_scheduling_layers. This parameter
>> > gives user flexibility to choose the 5-layer transmit scheduler topology
>> > which helps to smooth out the transmit performance.
>> > 
>> > Allowed parameter values are 5 and 9.
>> > 
>> > Example usage:
>> > 
>> > Show:
>> > devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>> > pci/0000:4b:00.0:
>> >   name tx_scheduling_layers type driver-specific
>> >     values:
>> >       cmode permanent value 9
>> > 
>> > Set:
>> > devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>> > cmode permanent
>> 
>> This is kind of proprietary param similar to number of which were shot
>
>not sure if this is the same kind of param, but for sure proprietary one
>
>> down for mlx5 in past. Jakub?
>
>I'm not that familiar with the history/ies around mlx5, but this case is
>somewhat different, at least for me:
>we have a performance fix for the tree inside the FW/HW, while you
>(IIRC) were about to introduce some nice and general abstraction layer,
>which could be used by other HW vendors too, but instead it was mlx-only

Nope. Same thing. Vendor/device specific FW/HW knob. Nothing to
abstract.


>
>> 
>> Also, given this is apparently nvconfig configuration, there could be
>> probably more suitable to use some provisioning tool.
>
>TBH, we will want to add some other NVM related params, but that does
>not justify yet another tool to configure PF. (And then there would be
>a big debate if FW update should be moved there too for consistency).
>
>> This is related to the mlx5 misc driver.
>> 
>> Until be figure out the plan, this has my nack:
>> 
>> NAcked-by: Jiri Pirko <jiri@nvidia.com>
>
>IMO this is an easy case, but would like to hear from netdev maintainers
>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-19 12:37     ` Jiri Pirko
@ 2024-02-21 23:38       ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-21 23:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
> >devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
> >pci/0000:4b:00.0:
> >  name tx_scheduling_layers type driver-specific
> >    values:
> >      cmode permanent value 9
> >
> >Set:
> >devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
> >cmode permanent  

It's been almost a year since v1 was posted, could you iterate quicker? :( 

> This is kind of proprietary param similar to number of which were shot
> down for mlx5 in past. Jakub?

I remain somewhat confused about what this does.
Specifically IIUC the problem is that the radix of each node is
limited, so we need to start creating multi-layer hierarchies 
if we want a higher radix. Or in the "5-layer mode" the radix 
is automatically higher?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-21 23:38       ` Jakub Kicinski
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-21 23:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
> >devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
> >pci/0000:4b:00.0:
> >  name tx_scheduling_layers type driver-specific
> >    values:
> >      cmode permanent value 9
> >
> >Set:
> >devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
> >cmode permanent  

It's been almost a year since v1 was posted, could you iterate quicker? :( 

> This is kind of proprietary param similar to number of which were shot
> down for mlx5 in past. Jakub?

I remain somewhat confused about what this does.
Specifically IIUC the problem is that the radix of each node is
limited, so we need to start creating multi-layer hierarchies 
if we want a higher radix. Or in the "5-layer mode" the radix 
is automatically higher?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-21 23:38       ` Jakub Kicinski
@ 2024-02-22 13:25         ` Mateusz Polchlopek
  -1 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-22 13:25 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: intel-wired-lan, netdev, horms, przemyslaw.kitszel, Lukasz Czapnik



On 2/22/2024 12:38 AM, Jakub Kicinski wrote:
> On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
>>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>>> pci/0000:4b:00.0:
>>>   name tx_scheduling_layers type driver-specific
>>>     values:
>>>       cmode permanent value 9
>>>
>>> Set:
>>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>>> cmode permanent
> 
> It's been almost a year since v1 was posted, could you iterate quicker? :(
> 

Yes... Sorry for iterating so slowly, I promise to do it quicker in the 
future. That issue was kind of problematic for us, so it took quite a 
long time.

>> This is kind of proprietary param similar to number of which were shot
>> down for mlx5 in past. Jakub?
> 
> I remain somewhat confused about what this does.
> Specifically IIUC the problem is that the radix of each node is
> limited, so we need to start creating multi-layer hierarchies
> if we want a higher radix. Or in the "5-layer mode" the radix
> is automatically higher?

Basically, switching from 9 to 5 layers topology allows us to have 512 
leaves instead of 8 leaves which improves performance. I will add this 
information to the commit message and Documentation too, when we get an 
ACK for devlink parameter.

In the future we plan to extend this feature to support other layer 
counts too.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-22 13:25         ` Mateusz Polchlopek
  0 siblings, 0 replies; 46+ messages in thread
From: Mateusz Polchlopek @ 2024-02-22 13:25 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms, przemyslaw.kitszel



On 2/22/2024 12:38 AM, Jakub Kicinski wrote:
> On Mon, 19 Feb 2024 13:37:36 +0100 Jiri Pirko wrote:
>>> devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers
>>> pci/0000:4b:00.0:
>>>   name tx_scheduling_layers type driver-specific
>>>     values:
>>>       cmode permanent value 9
>>>
>>> Set:
>>> devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5
>>> cmode permanent
> 
> It's been almost a year since v1 was posted, could you iterate quicker? :(
> 

Yes... Sorry for iterating so slowly, I promise to do it quicker in the 
future. That issue was kind of problematic for us, so it took quite a 
long time.

>> This is kind of proprietary param similar to number of which were shot
>> down for mlx5 in past. Jakub?
> 
> I remain somewhat confused about what this does.
> Specifically IIUC the problem is that the radix of each node is
> limited, so we need to start creating multi-layer hierarchies
> if we want a higher radix. Or in the "5-layer mode" the radix
> is automatically higher?

Basically, switching from 9 to 5 layers topology allows us to have 512 
leaves instead of 8 leaves which improves performance. I will add this 
information to the commit message and Documentation too, when we get an 
ACK for devlink parameter.

In the future we plan to extend this feature to support other layer 
counts too.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-22 13:25         ` Mateusz Polchlopek
@ 2024-02-22 23:07           ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-22 23:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
> >> This is kind of proprietary param similar to number of which were shot
> >> down for mlx5 in past. Jakub?  
> > 
> > I remain somewhat confused about what this does.
> > Specifically IIUC the problem is that the radix of each node is
> > limited, so we need to start creating multi-layer hierarchies
> > if we want a higher radix. Or in the "5-layer mode" the radix
> > is automatically higher?  
> 
> Basically, switching from 9 to 5 layers topology allows us to have 512 
> leaves instead of 8 leaves which improves performance. I will add this 
> information to the commit message and Documentation too, when we get an 
> ACK for devlink parameter.

Sounds fine. Please update the doc to focus on the radix, rather than
the layers. Layers are not so important to the user. And maybe give an
example of things which won't be possible with 5-layer config.

Jiri, I'm not aware of any other devices with this sort of trade off.
We shouldn't add the param if either:
 - this can be changed dynamically as user instantiates rate limiters;
 - we know other devices have similar needs.
If neither of those is true, param seems fine to me..

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-22 23:07           ` Jakub Kicinski
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-22 23:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
> >> This is kind of proprietary param similar to number of which were shot
> >> down for mlx5 in past. Jakub?  
> > 
> > I remain somewhat confused about what this does.
> > Specifically IIUC the problem is that the radix of each node is
> > limited, so we need to start creating multi-layer hierarchies
> > if we want a higher radix. Or in the "5-layer mode" the radix
> > is automatically higher?  
> 
> Basically, switching from 9 to 5 layers topology allows us to have 512 
> leaves instead of 8 leaves which improves performance. I will add this 
> information to the commit message and Documentation too, when we get an 
> ACK for devlink parameter.

Sounds fine. Please update the doc to focus on the radix, rather than
the layers. Layers are not so important to the user. And maybe give an
example of things which won't be possible with 5-layer config.

Jiri, I'm not aware of any other devices with this sort of trade off.
We shouldn't add the param if either:
 - this can be changed dynamically as user instantiates rate limiters;
 - we know other devices have similar needs.
If neither of those is true, param seems fine to me..

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-22 23:07           ` Jakub Kicinski
@ 2024-02-23  9:45             ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-23  9:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

Fri, Feb 23, 2024 at 12:07:17AM CET, kuba@kernel.org wrote:
>On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
>> >> This is kind of proprietary param similar to number of which were shot
>> >> down for mlx5 in past. Jakub?  
>> > 
>> > I remain somewhat confused about what this does.
>> > Specifically IIUC the problem is that the radix of each node is
>> > limited, so we need to start creating multi-layer hierarchies
>> > if we want a higher radix. Or in the "5-layer mode" the radix
>> > is automatically higher?  
>> 
>> Basically, switching from 9 to 5 layers topology allows us to have 512 
>> leaves instead of 8 leaves which improves performance. I will add this 
>> information to the commit message and Documentation too, when we get an 
>> ACK for devlink parameter.
>
>Sounds fine. Please update the doc to focus on the radix, rather than
>the layers. Layers are not so important to the user. And maybe give an
>example of things which won't be possible with 5-layer config.
>
>Jiri, I'm not aware of any other devices with this sort of trade off.
>We shouldn't add the param if either:
> - this can be changed dynamically as user instantiates rate limiters;
> - we know other devices have similar needs.
>If neither of those is true, param seems fine to me..

Where is this policy documented? If not, could you please? Let's make
this policy clear for now and for the future.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-23  9:45             ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-23  9:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

Fri, Feb 23, 2024 at 12:07:17AM CET, kuba@kernel.org wrote:
>On Thu, 22 Feb 2024 14:25:21 +0100 Mateusz Polchlopek wrote:
>> >> This is kind of proprietary param similar to number of which were shot
>> >> down for mlx5 in past. Jakub?  
>> > 
>> > I remain somewhat confused about what this does.
>> > Specifically IIUC the problem is that the radix of each node is
>> > limited, so we need to start creating multi-layer hierarchies
>> > if we want a higher radix. Or in the "5-layer mode" the radix
>> > is automatically higher?  
>> 
>> Basically, switching from 9 to 5 layers topology allows us to have 512 
>> leaves instead of 8 leaves which improves performance. I will add this 
>> information to the commit message and Documentation too, when we get an 
>> ACK for devlink parameter.
>
>Sounds fine. Please update the doc to focus on the radix, rather than
>the layers. Layers are not so important to the user. And maybe give an
>example of things which won't be possible with 5-layer config.
>
>Jiri, I'm not aware of any other devices with this sort of trade off.
>We shouldn't add the param if either:
> - this can be changed dynamically as user instantiates rate limiters;
> - we know other devices have similar needs.
>If neither of those is true, param seems fine to me..

Where is this policy documented? If not, could you please? Let's make
this policy clear for now and for the future.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-23  9:45             ` Jiri Pirko
@ 2024-02-23 14:27               ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-23 14:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>> Jiri, I'm not aware of any other devices with this sort of trade off.
>> We shouldn't add the param if either:
>>  - this can be changed dynamically as user instantiates rate limiters;
>>  - we know other devices have similar needs.
>> If neither of those is true, param seems fine to me..  
> 
> Where is this policy documented? If not, could you please? Let's make
> this policy clear for now and for the future.

Because you think it's good as a policy or because not so much?
Both of the points are a judgment call, at least from upstream
perspective since we're working with very limited information.
So enshrining this as a "policy" is not very practical.

Do you recall any specific param that got rejected from mlx5?
Y'all were allowed to add the eq sizing params, which I think
is not going to be mlx5-only for long. Otherwise I only remember
cases where I'd try to push people to use the resource API, which
IMO is better for setting limits and delegating resources.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-23 14:27               ` Jakub Kicinski
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-23 14:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>> Jiri, I'm not aware of any other devices with this sort of trade off.
>> We shouldn't add the param if either:
>>  - this can be changed dynamically as user instantiates rate limiters;
>>  - we know other devices have similar needs.
>> If neither of those is true, param seems fine to me..  
> 
> Where is this policy documented? If not, could you please? Let's make
> this policy clear for now and for the future.

Because you think it's good as a policy or because not so much?
Both of the points are a judgment call, at least from upstream
perspective since we're working with very limited information.
So enshrining this as a "policy" is not very practical.

Do you recall any specific param that got rejected from mlx5?
Y'all were allowed to add the eq sizing params, which I think
is not going to be mlx5-only for long. Otherwise I only remember
cases where I'd try to push people to use the resource API, which
IMO is better for setting limits and delegating resources.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-23 14:27               ` Jakub Kicinski
@ 2024-02-25  7:18                 ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-25  7:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

Fri, Feb 23, 2024 at 03:27:57PM CET, kuba@kernel.org wrote:
>On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>>> Jiri, I'm not aware of any other devices with this sort of trade off.
>>> We shouldn't add the param if either:
>>>  - this can be changed dynamically as user instantiates rate limiters;
>>>  - we know other devices have similar needs.
>>> If neither of those is true, param seems fine to me..  
>> 
>> Where is this policy documented? If not, could you please? Let's make
>> this policy clear for now and for the future.
>
>Because you think it's good as a policy or because not so much?
>Both of the points are a judgment call, at least from upstream
>perspective since we're working with very limited information.
>So enshrining this as a "policy" is not very practical.

No, I don't mind the policy. Up to you. Makes sense to me. I'm just
saying it would be great to have this written down so everyone can
easily tell which kind of param is and is not acceptable.


>
>Do you recall any specific param that got rejected from mlx5?
>Y'all were allowed to add the eq sizing params, which I think
>is not going to be mlx5-only for long. Otherwise I only remember
>cases where I'd try to push people to use the resource API, which
>IMO is better for setting limits and delegating resources.

I don't have anything solid in mind, I would have to look it up. But
there is certainly quite big amount of uncertainties among my
colleagues to jundge is some param would or would not be acceptable to
you. That's why I believe it would save a lot of people time to write
the policy down in details, with examples, etc. Could you please?

Thanks!


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-25  7:18                 ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-25  7:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

Fri, Feb 23, 2024 at 03:27:57PM CET, kuba@kernel.org wrote:
>On Fri, 23 Feb 2024 10:45:01 +0100 Jiri Pirko wrote:
>>> Jiri, I'm not aware of any other devices with this sort of trade off.
>>> We shouldn't add the param if either:
>>>  - this can be changed dynamically as user instantiates rate limiters;
>>>  - we know other devices have similar needs.
>>> If neither of those is true, param seems fine to me..  
>> 
>> Where is this policy documented? If not, could you please? Let's make
>> this policy clear for now and for the future.
>
>Because you think it's good as a policy or because not so much?
>Both of the points are a judgment call, at least from upstream
>perspective since we're working with very limited information.
>So enshrining this as a "policy" is not very practical.

No, I don't mind the policy. Up to you. Makes sense to me. I'm just
saying it would be great to have this written down so everyone can
easily tell which kind of param is and is not acceptable.


>
>Do you recall any specific param that got rejected from mlx5?
>Y'all were allowed to add the eq sizing params, which I think
>is not going to be mlx5-only for long. Otherwise I only remember
>cases where I'd try to push people to use the resource API, which
>IMO is better for setting limits and delegating resources.

I don't have anything solid in mind, I would have to look it up. But
there is certainly quite big amount of uncertainties among my
colleagues to jundge is some param would or would not be acceptable to
you. That's why I believe it would save a lot of people time to write
the policy down in details, with examples, etc. Could you please?

Thanks!


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-25  7:18                 ` Jiri Pirko
@ 2024-02-27  2:37                   ` Jakub Kicinski
  -1 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-27  2:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
> >Do you recall any specific param that got rejected from mlx5?
> >Y'all were allowed to add the eq sizing params, which I think
> >is not going to be mlx5-only for long. Otherwise I only remember
> >cases where I'd try to push people to use the resource API, which
> >IMO is better for setting limits and delegating resources.  
> 
> I don't have anything solid in mind, I would have to look it up. But
> there is certainly quite big amount of uncertainties among my
> colleagues to jundge is some param would or would not be acceptable to
> you. That's why I believe it would save a lot of people time to write
> the policy down in details, with examples, etc. Could you please?

How about this? (BTW took me half an hour to write, just in case 
you're wondering)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 4e01dc32bc08..f1eef6d065be 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
 level, it can be used to provide configuration that may affect multiple
 ports on a single device.
 
-This document describes a number of generic parameters that are supported
-across multiple drivers. Each driver is also free to add their own
-parameters. Each driver must document the specific parameters they support,
-whether generic or not.
+There are two categories of devlink parameters - generic parameters
+and device-specific quirks. Generic devlink parameters are configuration
+knobs which don't fit into any larger API, but are supported across multiple
+drivers. This document describes a number of generic parameters.
+Each driver can also add its own parameters, which are documented in driver
+specific files.
 
 Configuration modes
 ===================
@@ -137,3 +139,32 @@ own name.
    * - ``event_eq_size``
      - u32
      - Control the size of asynchronous control events EQ.
+
+Adding new params
+=================
+
+Addition of new devlink params is carefully scrutinized upstream.
+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
+devlink params should never be used in their place e.g. to allow easier
+delivery via out-of-tree modules, or to save development time.
+
+devlink parameters must always be thoroughly documented, both from technical
+perspective (to allow meaningful upstream review), and from user perspective
+(to allow users to make informed decisions).
+
+The requirements above should make it obvious that any "automatic" /
+"pass-through" registration of devlink parameters, based on strings
+read from the device, will not be accepted.
+
+There are two broad categories of devlink params which had been accepted
+in the past:
+
+ - device-specific configuration knobs, which cannot be inferred from
+   other device configuration. Note that the author is expected to study
+   other drivers to make sure that the configuration is in fact unique
+   to the implementation.
+
+ - configuration which must be set at device initialization time.
+   Allowing user to enable features at runtime is always preferable
+   but in reality most devices allow certain features to be enabled/disabled
+   only by changing configuration stored in NVM.

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27  2:37                   ` Jakub Kicinski
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2024-02-27  2:37 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
> >Do you recall any specific param that got rejected from mlx5?
> >Y'all were allowed to add the eq sizing params, which I think
> >is not going to be mlx5-only for long. Otherwise I only remember
> >cases where I'd try to push people to use the resource API, which
> >IMO is better for setting limits and delegating resources.  
> 
> I don't have anything solid in mind, I would have to look it up. But
> there is certainly quite big amount of uncertainties among my
> colleagues to jundge is some param would or would not be acceptable to
> you. That's why I believe it would save a lot of people time to write
> the policy down in details, with examples, etc. Could you please?

How about this? (BTW took me half an hour to write, just in case 
you're wondering)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 4e01dc32bc08..f1eef6d065be 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
 level, it can be used to provide configuration that may affect multiple
 ports on a single device.
 
-This document describes a number of generic parameters that are supported
-across multiple drivers. Each driver is also free to add their own
-parameters. Each driver must document the specific parameters they support,
-whether generic or not.
+There are two categories of devlink parameters - generic parameters
+and device-specific quirks. Generic devlink parameters are configuration
+knobs which don't fit into any larger API, but are supported across multiple
+drivers. This document describes a number of generic parameters.
+Each driver can also add its own parameters, which are documented in driver
+specific files.
 
 Configuration modes
 ===================
@@ -137,3 +139,32 @@ own name.
    * - ``event_eq_size``
      - u32
      - Control the size of asynchronous control events EQ.
+
+Adding new params
+=================
+
+Addition of new devlink params is carefully scrutinized upstream.
+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
+devlink params should never be used in their place e.g. to allow easier
+delivery via out-of-tree modules, or to save development time.
+
+devlink parameters must always be thoroughly documented, both from technical
+perspective (to allow meaningful upstream review), and from user perspective
+(to allow users to make informed decisions).
+
+The requirements above should make it obvious that any "automatic" /
+"pass-through" registration of devlink parameters, based on strings
+read from the device, will not be accepted.
+
+There are two broad categories of devlink params which had been accepted
+in the past:
+
+ - device-specific configuration knobs, which cannot be inferred from
+   other device configuration. Note that the author is expected to study
+   other drivers to make sure that the configuration is in fact unique
+   to the implementation.
+
+ - configuration which must be set at device initialization time.
+   Allowing user to enable features at runtime is always preferable
+   but in reality most devices allow certain features to be enabled/disabled
+   only by changing configuration stored in NVM.

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27  2:37                   ` Jakub Kicinski
@ 2024-02-27 12:17                     ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 12:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms,
	przemyslaw.kitszel, Lukasz Czapnik

Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> >Do you recall any specific param that got rejected from mlx5?
>> >Y'all were allowed to add the eq sizing params, which I think
>> >is not going to be mlx5-only for long. Otherwise I only remember
>> >cases where I'd try to push people to use the resource API, which
>> >IMO is better for setting limits and delegating resources.  
>> 
>> I don't have anything solid in mind, I would have to look it up. But
>> there is certainly quite big amount of uncertainties among my
>> colleagues to jundge is some param would or would not be acceptable to
>> you. That's why I believe it would save a lot of people time to write
>> the policy down in details, with examples, etc. Could you please?
>
>How about this? (BTW took me half an hour to write, just in case 
>you're wondering)
>
>diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>index 4e01dc32bc08..f1eef6d065be 100644
>--- a/Documentation/networking/devlink/devlink-params.rst
>+++ b/Documentation/networking/devlink/devlink-params.rst
>@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
> level, it can be used to provide configuration that may affect multiple
> ports on a single device.
> 
>-This document describes a number of generic parameters that are supported
>-across multiple drivers. Each driver is also free to add their own
>-parameters. Each driver must document the specific parameters they support,
>-whether generic or not.
>+There are two categories of devlink parameters - generic parameters
>+and device-specific quirks. Generic devlink parameters are configuration
>+knobs which don't fit into any larger API, but are supported across multiple
>+drivers. This document describes a number of generic parameters.
>+Each driver can also add its own parameters, which are documented in driver
>+specific files.
> 
> Configuration modes
> ===================
>@@ -137,3 +139,32 @@ own name.
>    * - ``event_eq_size``
>      - u32
>      - Control the size of asynchronous control events EQ.
>+
>+Adding new params
>+=================
>+
>+Addition of new devlink params is carefully scrutinized upstream.
>+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>+devlink params should never be used in their place e.g. to allow easier
>+delivery via out-of-tree modules, or to save development time.
>+
>+devlink parameters must always be thoroughly documented, both from technical
>+perspective (to allow meaningful upstream review), and from user perspective
>+(to allow users to make informed decisions).
>+
>+The requirements above should make it obvious that any "automatic" /
>+"pass-through" registration of devlink parameters, based on strings
>+read from the device, will not be accepted.
>+
>+There are two broad categories of devlink params which had been accepted
>+in the past:
>+
>+ - device-specific configuration knobs, which cannot be inferred from
>+   other device configuration. Note that the author is expected to study
>+   other drivers to make sure that the configuration is in fact unique
>+   to the implementation.
>+
>+ - configuration which must be set at device initialization time.
>+   Allowing user to enable features at runtime is always preferable
>+   but in reality most devices allow certain features to be enabled/disabled
>+   only by changing configuration stored in NVM.

Looks like most of the generic params does not fit either of these 2
categories. Did you mean these 2 categories for driver specific?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 12:17                     ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 12:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mateusz Polchlopek, netdev, Lukasz Czapnik, intel-wired-lan,
	horms, przemyslaw.kitszel

Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> >Do you recall any specific param that got rejected from mlx5?
>> >Y'all were allowed to add the eq sizing params, which I think
>> >is not going to be mlx5-only for long. Otherwise I only remember
>> >cases where I'd try to push people to use the resource API, which
>> >IMO is better for setting limits and delegating resources.  
>> 
>> I don't have anything solid in mind, I would have to look it up. But
>> there is certainly quite big amount of uncertainties among my
>> colleagues to jundge is some param would or would not be acceptable to
>> you. That's why I believe it would save a lot of people time to write
>> the policy down in details, with examples, etc. Could you please?
>
>How about this? (BTW took me half an hour to write, just in case 
>you're wondering)
>
>diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>index 4e01dc32bc08..f1eef6d065be 100644
>--- a/Documentation/networking/devlink/devlink-params.rst
>+++ b/Documentation/networking/devlink/devlink-params.rst
>@@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
> level, it can be used to provide configuration that may affect multiple
> ports on a single device.
> 
>-This document describes a number of generic parameters that are supported
>-across multiple drivers. Each driver is also free to add their own
>-parameters. Each driver must document the specific parameters they support,
>-whether generic or not.
>+There are two categories of devlink parameters - generic parameters
>+and device-specific quirks. Generic devlink parameters are configuration
>+knobs which don't fit into any larger API, but are supported across multiple
>+drivers. This document describes a number of generic parameters.
>+Each driver can also add its own parameters, which are documented in driver
>+specific files.
> 
> Configuration modes
> ===================
>@@ -137,3 +139,32 @@ own name.
>    * - ``event_eq_size``
>      - u32
>      - Control the size of asynchronous control events EQ.
>+
>+Adding new params
>+=================
>+
>+Addition of new devlink params is carefully scrutinized upstream.
>+More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>+devlink params should never be used in their place e.g. to allow easier
>+delivery via out-of-tree modules, or to save development time.
>+
>+devlink parameters must always be thoroughly documented, both from technical
>+perspective (to allow meaningful upstream review), and from user perspective
>+(to allow users to make informed decisions).
>+
>+The requirements above should make it obvious that any "automatic" /
>+"pass-through" registration of devlink parameters, based on strings
>+read from the device, will not be accepted.
>+
>+There are two broad categories of devlink params which had been accepted
>+in the past:
>+
>+ - device-specific configuration knobs, which cannot be inferred from
>+   other device configuration. Note that the author is expected to study
>+   other drivers to make sure that the configuration is in fact unique
>+   to the implementation.
>+
>+ - configuration which must be set at device initialization time.
>+   Allowing user to enable features at runtime is always preferable
>+   but in reality most devices allow certain features to be enabled/disabled
>+   only by changing configuration stored in NVM.

Looks like most of the generic params does not fit either of these 2
categories. Did you mean these 2 categories for driver specific?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27 12:17                     ` Jiri Pirko
@ 2024-02-27 13:05                       ` Przemek Kitszel
  -1 siblings, 0 replies; 46+ messages in thread
From: Przemek Kitszel @ 2024-02-27 13:05 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms, Mateusz Polchlopek

On 2/27/24 13:17, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>>>> Do you recall any specific param that got rejected from mlx5?
>>>> Y'all were allowed to add the eq sizing params, which I think
>>>> is not going to be mlx5-only for long. Otherwise I only remember
>>>> cases where I'd try to push people to use the resource API, which
>>>> IMO is better for setting limits and delegating resources.
>>>
>>> I don't have anything solid in mind, I would have to look it up. But
>>> there is certainly quite big amount of uncertainties among my
>>> colleagues to jundge is some param would or would not be acceptable to
>>> you. That's why I believe it would save a lot of people time to write
>>> the policy down in details, with examples, etc. Could you please?
>>
>> How about this? (BTW took me half an hour to write, just in case
>> you're wondering)

Thank you!

>>
>> diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> index 4e01dc32bc08..f1eef6d065be 100644
>> --- a/Documentation/networking/devlink/devlink-params.rst
>> +++ b/Documentation/networking/devlink/devlink-params.rst
>> @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> level, it can be used to provide configuration that may affect multiple
>> ports on a single device.
>>
>> -This document describes a number of generic parameters that are supported
>> -across multiple drivers. Each driver is also free to add their own
>> -parameters. Each driver must document the specific parameters they support,
>> -whether generic or not.
>> +There are two categories of devlink parameters - generic parameters
>> +and device-specific quirks. Generic devlink parameters are configuration
>> +knobs which don't fit into any larger API, but are supported across multiple

re Jiri: Generic ones are described here.

>> +drivers. This document describes a number of generic parameters.
>> +Each driver can also add its own parameters, which are documented in driver
>> +specific files.
>>
>> Configuration modes
>> ===================
>> @@ -137,3 +139,32 @@ own name.
>>     * - ``event_eq_size``
>>       - u32
>>       - Control the size of asynchronous control events EQ.
>> +
>> +Adding new params
>> +=================
>> +
>> +Addition of new devlink params is carefully scrutinized upstream.
>> +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> +devlink params should never be used in their place e.g. to allow easier
>> +delivery via out-of-tree modules, or to save development time.
>> +
>> +devlink parameters must always be thoroughly documented, both from technical
>> +perspective (to allow meaningful upstream review), and from user perspective
>> +(to allow users to make informed decisions).
>> +
>> +The requirements above should make it obvious that any "automatic" /
>> +"pass-through" registration of devlink parameters, based on strings
>> +read from the device, will not be accepted.
>> +
>> +There are two broad categories of devlink params which had been accepted
>> +in the past:
>> +
>> + - device-specific configuration knobs, which cannot be inferred from
>> +   other device configuration. Note that the author is expected to study
>> +   other drivers to make sure that the configuration is in fact unique
>> +   to the implementation.

What if it would not be unique, should they then proceed to add generic
(other word would be "common") param, and make the other driver/s use
it? Without deprecating the old method ofc.

What about knob being vendor specific, but given vendor has multiple,
very similar drivers? (ugh)

>> +
>> + - configuration which must be set at device initialization time.
>> +   Allowing user to enable features at runtime is always preferable
>> +   but in reality most devices allow certain features to be enabled/disabled
>> +   only by changing configuration stored in NVM.
> 
> Looks like most of the generic params does not fit either of these 2
> categories. Did you mean these 2 categories for driver specific?

If you mean the two paragraphs above (both started by "-"), this is for
vendor specific knobs, and reads fine.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 13:05                       ` Przemek Kitszel
  0 siblings, 0 replies; 46+ messages in thread
From: Przemek Kitszel @ 2024-02-27 13:05 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Mateusz Polchlopek, intel-wired-lan, netdev, horms, Lukasz Czapnik

On 2/27/24 13:17, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>>>> Do you recall any specific param that got rejected from mlx5?
>>>> Y'all were allowed to add the eq sizing params, which I think
>>>> is not going to be mlx5-only for long. Otherwise I only remember
>>>> cases where I'd try to push people to use the resource API, which
>>>> IMO is better for setting limits and delegating resources.
>>>
>>> I don't have anything solid in mind, I would have to look it up. But
>>> there is certainly quite big amount of uncertainties among my
>>> colleagues to jundge is some param would or would not be acceptable to
>>> you. That's why I believe it would save a lot of people time to write
>>> the policy down in details, with examples, etc. Could you please?
>>
>> How about this? (BTW took me half an hour to write, just in case
>> you're wondering)

Thank you!

>>
>> diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> index 4e01dc32bc08..f1eef6d065be 100644
>> --- a/Documentation/networking/devlink/devlink-params.rst
>> +++ b/Documentation/networking/devlink/devlink-params.rst
>> @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> level, it can be used to provide configuration that may affect multiple
>> ports on a single device.
>>
>> -This document describes a number of generic parameters that are supported
>> -across multiple drivers. Each driver is also free to add their own
>> -parameters. Each driver must document the specific parameters they support,
>> -whether generic or not.
>> +There are two categories of devlink parameters - generic parameters
>> +and device-specific quirks. Generic devlink parameters are configuration
>> +knobs which don't fit into any larger API, but are supported across multiple

re Jiri: Generic ones are described here.

>> +drivers. This document describes a number of generic parameters.
>> +Each driver can also add its own parameters, which are documented in driver
>> +specific files.
>>
>> Configuration modes
>> ===================
>> @@ -137,3 +139,32 @@ own name.
>>     * - ``event_eq_size``
>>       - u32
>>       - Control the size of asynchronous control events EQ.
>> +
>> +Adding new params
>> +=================
>> +
>> +Addition of new devlink params is carefully scrutinized upstream.
>> +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> +devlink params should never be used in their place e.g. to allow easier
>> +delivery via out-of-tree modules, or to save development time.
>> +
>> +devlink parameters must always be thoroughly documented, both from technical
>> +perspective (to allow meaningful upstream review), and from user perspective
>> +(to allow users to make informed decisions).
>> +
>> +The requirements above should make it obvious that any "automatic" /
>> +"pass-through" registration of devlink parameters, based on strings
>> +read from the device, will not be accepted.
>> +
>> +There are two broad categories of devlink params which had been accepted
>> +in the past:
>> +
>> + - device-specific configuration knobs, which cannot be inferred from
>> +   other device configuration. Note that the author is expected to study
>> +   other drivers to make sure that the configuration is in fact unique
>> +   to the implementation.

What if it would not be unique, should they then proceed to add generic
(other word would be "common") param, and make the other driver/s use
it? Without deprecating the old method ofc.

What about knob being vendor specific, but given vendor has multiple,
very similar drivers? (ugh)

>> +
>> + - configuration which must be set at device initialization time.
>> +   Allowing user to enable features at runtime is always preferable
>> +   but in reality most devices allow certain features to be enabled/disabled
>> +   only by changing configuration stored in NVM.
> 
> Looks like most of the generic params does not fit either of these 2
> categories. Did you mean these 2 categories for driver specific?

If you mean the two paragraphs above (both started by "-"), this is for
vendor specific knobs, and reads fine.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27 13:05                       ` Przemek Kitszel
@ 2024-02-27 15:39                         ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 15:39 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Jakub Kicinski, Mateusz Polchlopek, intel-wired-lan, netdev,
	horms, Lukasz Czapnik

Tue, Feb 27, 2024 at 02:05:45PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/27/24 13:17, Jiri Pirko wrote:
>> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> > On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> > > > Do you recall any specific param that got rejected from mlx5?
>> > > > Y'all were allowed to add the eq sizing params, which I think
>> > > > is not going to be mlx5-only for long. Otherwise I only remember
>> > > > cases where I'd try to push people to use the resource API, which
>> > > > IMO is better for setting limits and delegating resources.
>> > > 
>> > > I don't have anything solid in mind, I would have to look it up. But
>> > > there is certainly quite big amount of uncertainties among my
>> > > colleagues to jundge is some param would or would not be acceptable to
>> > > you. That's why I believe it would save a lot of people time to write
>> > > the policy down in details, with examples, etc. Could you please?
>> > 
>> > How about this? (BTW took me half an hour to write, just in case
>> > you're wondering)
>
>Thank you!
>
>> > 
>> > diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> > index 4e01dc32bc08..f1eef6d065be 100644
>> > --- a/Documentation/networking/devlink/devlink-params.rst
>> > +++ b/Documentation/networking/devlink/devlink-params.rst
>> > @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> > level, it can be used to provide configuration that may affect multiple
>> > ports on a single device.
>> > 
>> > -This document describes a number of generic parameters that are supported
>> > -across multiple drivers. Each driver is also free to add their own
>> > -parameters. Each driver must document the specific parameters they support,
>> > -whether generic or not.
>> > +There are two categories of devlink parameters - generic parameters
>> > +and device-specific quirks. Generic devlink parameters are configuration
>> > +knobs which don't fit into any larger API, but are supported across multiple
>
>re Jiri: Generic ones are described here.
>
>> > +drivers. This document describes a number of generic parameters.
>> > +Each driver can also add its own parameters, which are documented in driver
>> > +specific files.
>> > 
>> > Configuration modes
>> > ===================
>> > @@ -137,3 +139,32 @@ own name.
>> >     * - ``event_eq_size``
>> >       - u32
>> >       - Control the size of asynchronous control events EQ.
>> > +
>> > +Adding new params
>> > +=================
>> > +
>> > +Addition of new devlink params is carefully scrutinized upstream.
>> > +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> > +devlink params should never be used in their place e.g. to allow easier
>> > +delivery via out-of-tree modules, or to save development time.
>> > +
>> > +devlink parameters must always be thoroughly documented, both from technical
>> > +perspective (to allow meaningful upstream review), and from user perspective
>> > +(to allow users to make informed decisions).
>> > +
>> > +The requirements above should make it obvious that any "automatic" /
>> > +"pass-through" registration of devlink parameters, based on strings
>> > +read from the device, will not be accepted.
>> > +
>> > +There are two broad categories of devlink params which had been accepted
>> > +in the past:
>> > +
>> > + - device-specific configuration knobs, which cannot be inferred from
>> > +   other device configuration. Note that the author is expected to study
>> > +   other drivers to make sure that the configuration is in fact unique
>> > +   to the implementation.
>
>What if it would not be unique, should they then proceed to add generic
>(other word would be "common") param, and make the other driver/s use
>it? Without deprecating the old method ofc.
>
>What about knob being vendor specific, but given vendor has multiple,
>very similar drivers? (ugh)
>
>> > +
>> > + - configuration which must be set at device initialization time.
>> > +   Allowing user to enable features at runtime is always preferable
>> > +   but in reality most devices allow certain features to be enabled/disabled
>> > +   only by changing configuration stored in NVM.
>> 
>> Looks like most of the generic params does not fit either of these 2
>> categories. Did you mean these 2 categories for driver specific?
>
>If you mean the two paragraphs above (both started by "-"), this is for
>vendor specific knobs, and reads fine.

Do you assume or read it somewhere? I don't see it. I have the same
assumption though :)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 15:39                         ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 15:39 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, Lukasz Czapnik, intel-wired-lan, horms,
	Mateusz Polchlopek, Jakub Kicinski

Tue, Feb 27, 2024 at 02:05:45PM CET, przemyslaw.kitszel@intel.com wrote:
>On 2/27/24 13:17, Jiri Pirko wrote:
>> Tue, Feb 27, 2024 at 03:37:00AM CET, kuba@kernel.org wrote:
>> > On Sun, 25 Feb 2024 08:18:00 +0100 Jiri Pirko wrote:
>> > > > Do you recall any specific param that got rejected from mlx5?
>> > > > Y'all were allowed to add the eq sizing params, which I think
>> > > > is not going to be mlx5-only for long. Otherwise I only remember
>> > > > cases where I'd try to push people to use the resource API, which
>> > > > IMO is better for setting limits and delegating resources.
>> > > 
>> > > I don't have anything solid in mind, I would have to look it up. But
>> > > there is certainly quite big amount of uncertainties among my
>> > > colleagues to jundge is some param would or would not be acceptable to
>> > > you. That's why I believe it would save a lot of people time to write
>> > > the policy down in details, with examples, etc. Could you please?
>> > 
>> > How about this? (BTW took me half an hour to write, just in case
>> > you're wondering)
>
>Thank you!
>
>> > 
>> > diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
>> > index 4e01dc32bc08..f1eef6d065be 100644
>> > --- a/Documentation/networking/devlink/devlink-params.rst
>> > +++ b/Documentation/networking/devlink/devlink-params.rst
>> > @@ -9,10 +9,12 @@ level device functionality. Since devlink can operate at the device-wide
>> > level, it can be used to provide configuration that may affect multiple
>> > ports on a single device.
>> > 
>> > -This document describes a number of generic parameters that are supported
>> > -across multiple drivers. Each driver is also free to add their own
>> > -parameters. Each driver must document the specific parameters they support,
>> > -whether generic or not.
>> > +There are two categories of devlink parameters - generic parameters
>> > +and device-specific quirks. Generic devlink parameters are configuration
>> > +knobs which don't fit into any larger API, but are supported across multiple
>
>re Jiri: Generic ones are described here.
>
>> > +drivers. This document describes a number of generic parameters.
>> > +Each driver can also add its own parameters, which are documented in driver
>> > +specific files.
>> > 
>> > Configuration modes
>> > ===================
>> > @@ -137,3 +139,32 @@ own name.
>> >     * - ``event_eq_size``
>> >       - u32
>> >       - Control the size of asynchronous control events EQ.
>> > +
>> > +Adding new params
>> > +=================
>> > +
>> > +Addition of new devlink params is carefully scrutinized upstream.
>> > +More complete APIs (in devlink, ethtool, netdev etc.) are always preferred,
>> > +devlink params should never be used in their place e.g. to allow easier
>> > +delivery via out-of-tree modules, or to save development time.
>> > +
>> > +devlink parameters must always be thoroughly documented, both from technical
>> > +perspective (to allow meaningful upstream review), and from user perspective
>> > +(to allow users to make informed decisions).
>> > +
>> > +The requirements above should make it obvious that any "automatic" /
>> > +"pass-through" registration of devlink parameters, based on strings
>> > +read from the device, will not be accepted.
>> > +
>> > +There are two broad categories of devlink params which had been accepted
>> > +in the past:
>> > +
>> > + - device-specific configuration knobs, which cannot be inferred from
>> > +   other device configuration. Note that the author is expected to study
>> > +   other drivers to make sure that the configuration is in fact unique
>> > +   to the implementation.
>
>What if it would not be unique, should they then proceed to add generic
>(other word would be "common") param, and make the other driver/s use
>it? Without deprecating the old method ofc.
>
>What about knob being vendor specific, but given vendor has multiple,
>very similar drivers? (ugh)
>
>> > +
>> > + - configuration which must be set at device initialization time.
>> > +   Allowing user to enable features at runtime is always preferable
>> > +   but in reality most devices allow certain features to be enabled/disabled
>> > +   only by changing configuration stored in NVM.
>> 
>> Looks like most of the generic params does not fit either of these 2
>> categories. Did you mean these 2 categories for driver specific?
>
>If you mean the two paragraphs above (both started by "-"), this is for
>vendor specific knobs, and reads fine.

Do you assume or read it somewhere? I don't see it. I have the same
assumption though :)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27 13:05                       ` Przemek Kitszel
@ 2024-02-27 15:41                         ` Andrew Lunn
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrew Lunn @ 2024-02-27 15:41 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Jiri Pirko, Jakub Kicinski, Mateusz Polchlopek, intel-wired-lan,
	netdev, horms, Lukasz Czapnik

> What if it would not be unique, should they then proceed to add generic
> (other word would be "common") param, and make the other driver/s use
> it? Without deprecating the old method ofc.

If it is useful, somebody else will copy it and it will become
common. If nobody copies it, its probably not useful :-)

A lot of what we do in networking comes from standard. Its the
standards which gives us interoperability. Also, there is the saying,
form follows function. There are only so many ways you can implement
the same thing.

Is anybody truly building unique hardware, whos form somehow does not
follow function and is yet still standards compliant? More likely,
they are just the first, and others will copy or re-invent it sooner
or later.

So for me, unique is a pretty high bar to reach.

	Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 15:41                         ` Andrew Lunn
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Lunn @ 2024-02-27 15:41 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Jiri Pirko, netdev, Lukasz Czapnik, intel-wired-lan, horms,
	Mateusz Polchlopek, Jakub Kicinski

> What if it would not be unique, should they then proceed to add generic
> (other word would be "common") param, and make the other driver/s use
> it? Without deprecating the old method ofc.

If it is useful, somebody else will copy it and it will become
common. If nobody copies it, its probably not useful :-)

A lot of what we do in networking comes from standard. Its the
standards which gives us interoperability. Also, there is the saying,
form follows function. There are only so many ways you can implement
the same thing.

Is anybody truly building unique hardware, whos form somehow does not
follow function and is yet still standards compliant? More likely,
they are just the first, and others will copy or re-invent it sooner
or later.

So for me, unique is a pretty high bar to reach.

	Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27 15:41                         ` Andrew Lunn
@ 2024-02-27 16:04                           ` Jiri Pirko
  -1 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 16:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Przemek Kitszel, Jakub Kicinski, Mateusz Polchlopek,
	intel-wired-lan, netdev, horms, Lukasz Czapnik

Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
>> What if it would not be unique, should they then proceed to add generic
>> (other word would be "common") param, and make the other driver/s use
>> it? Without deprecating the old method ofc.
>
>If it is useful, somebody else will copy it and it will become
>common. If nobody copies it, its probably not useful :-)
>
>A lot of what we do in networking comes from standard. Its the
>standards which gives us interoperability. Also, there is the saying,
>form follows function. There are only so many ways you can implement
>the same thing.
>
>Is anybody truly building unique hardware, whos form somehow does not
>follow function and is yet still standards compliant? More likely,
>they are just the first, and others will copy or re-invent it sooner
>or later.

Wait, standard in protocol sense is completely parallel to the hw/fw
implementations. They may be (and in reality they are) a lots of
tunables to tweak specific hw/fw internals. So modern nics are very
unique. Still providing the same inputs and outputs, protocol-wise.


>
>So for me, unique is a pretty high bar to reach.
>
>	Andrew
>	

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 16:04                           ` Jiri Pirko
  0 siblings, 0 replies; 46+ messages in thread
From: Jiri Pirko @ 2024-02-27 16:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Przemek Kitszel, Lukasz Czapnik, netdev, intel-wired-lan, horms,
	Mateusz Polchlopek, Jakub Kicinski

Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
>> What if it would not be unique, should they then proceed to add generic
>> (other word would be "common") param, and make the other driver/s use
>> it? Without deprecating the old method ofc.
>
>If it is useful, somebody else will copy it and it will become
>common. If nobody copies it, its probably not useful :-)
>
>A lot of what we do in networking comes from standard. Its the
>standards which gives us interoperability. Also, there is the saying,
>form follows function. There are only so many ways you can implement
>the same thing.
>
>Is anybody truly building unique hardware, whos form somehow does not
>follow function and is yet still standards compliant? More likely,
>they are just the first, and others will copy or re-invent it sooner
>or later.

Wait, standard in protocol sense is completely parallel to the hw/fw
implementations. They may be (and in reality they are) a lots of
tunables to tweak specific hw/fw internals. So modern nics are very
unique. Still providing the same inputs and outputs, protocol-wise.


>
>So for me, unique is a pretty high bar to reach.
>
>	Andrew
>	

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
  2024-02-27 16:04                           ` Jiri Pirko
@ 2024-02-27 20:38                             ` Andrew Lunn
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrew Lunn @ 2024-02-27 20:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Przemek Kitszel, Jakub Kicinski, Mateusz Polchlopek,
	intel-wired-lan, netdev, horms, Lukasz Czapnik

On Tue, Feb 27, 2024 at 05:04:47PM +0100, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
> >> What if it would not be unique, should they then proceed to add generic
> >> (other word would be "common") param, and make the other driver/s use
> >> it? Without deprecating the old method ofc.
> >
> >If it is useful, somebody else will copy it and it will become
> >common. If nobody copies it, its probably not useful :-)
> >
> >A lot of what we do in networking comes from standard. Its the
> >standards which gives us interoperability. Also, there is the saying,
> >form follows function. There are only so many ways you can implement
> >the same thing.
> >
> >Is anybody truly building unique hardware, whos form somehow does not
> >follow function and is yet still standards compliant? More likely,
> >they are just the first, and others will copy or re-invent it sooner
> >or later.
> 
> Wait, standard in protocol sense is completely parallel to the hw/fw
> implementations. They may be (and in reality they are) a lots of
> tunables to tweak specific hw/fw internals. So modern nics are very
> unique. Still providing the same inputs and outputs, protocol-wise.

I think there is a general trickle down of technologies. What is top
of the line now, because normal everyday in 5 - 10 years time. Think
of a top of the line 10G Ethernet from 10 years ago. Is its design
that different to what you get integrated into today's SoC?  Are the
same or similar tunables from 10 year old top the line NICs also in
todays everyday SoCs?

Every PC is going to be an AI PC, if you believe the market hype at
the moment. But don't you think every PC will also have a network
processor of some sort in 5 - 10 years, derived from today network
processor. It will just be another tile in the SoC, next to the AI
tile, the GPU tile, and the CPU tiles. My guess would be, those
tunables in todays hardware will trickle down into those tiles in 5-10
years because they have been shown to be useful, they work, lets
re-use what we have.

       Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param
@ 2024-02-27 20:38                             ` Andrew Lunn
  0 siblings, 0 replies; 46+ messages in thread
From: Andrew Lunn @ 2024-02-27 20:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Przemek Kitszel, Lukasz Czapnik, netdev, intel-wired-lan, horms,
	Mateusz Polchlopek, Jakub Kicinski

On Tue, Feb 27, 2024 at 05:04:47PM +0100, Jiri Pirko wrote:
> Tue, Feb 27, 2024 at 04:41:52PM CET, andrew@lunn.ch wrote:
> >> What if it would not be unique, should they then proceed to add generic
> >> (other word would be "common") param, and make the other driver/s use
> >> it? Without deprecating the old method ofc.
> >
> >If it is useful, somebody else will copy it and it will become
> >common. If nobody copies it, its probably not useful :-)
> >
> >A lot of what we do in networking comes from standard. Its the
> >standards which gives us interoperability. Also, there is the saying,
> >form follows function. There are only so many ways you can implement
> >the same thing.
> >
> >Is anybody truly building unique hardware, whos form somehow does not
> >follow function and is yet still standards compliant? More likely,
> >they are just the first, and others will copy or re-invent it sooner
> >or later.
> 
> Wait, standard in protocol sense is completely parallel to the hw/fw
> implementations. They may be (and in reality they are) a lots of
> tunables to tweak specific hw/fw internals. So modern nics are very
> unique. Still providing the same inputs and outputs, protocol-wise.

I think there is a general trickle down of technologies. What is top
of the line now, because normal everyday in 5 - 10 years time. Think
of a top of the line 10G Ethernet from 10 years ago. Is its design
that different to what you get integrated into today's SoC?  Are the
same or similar tunables from 10 year old top the line NICs also in
todays everyday SoCs?

Every PC is going to be an AI PC, if you believe the market hype at
the moment. But don't you think every PC will also have a network
processor of some sort in 5 - 10 years, derived from today network
processor. It will just be another tile in the SoC, next to the AI
tile, the GPU tile, and the CPU tiles. My guess would be, those
tunables in todays hardware will trickle down into those tiles in 5-10
years because they have been shown to be useful, they work, lets
re-use what we have.

       Andrew

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2024-02-27 20:37 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-19 10:05 [Intel-wired-lan] [PATCH iwl-next v4 0/5] ice: Support 5 layer Tx scheduler topology Mateusz Polchlopek
2024-02-19 10:05 ` Mateusz Polchlopek
2024-02-19 10:05 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] ice: Support 5 layer topology Mateusz Polchlopek
2024-02-19 10:05   ` Mateusz Polchlopek
2024-02-19 10:16   ` Mateusz Polchlopek
2024-02-19 10:16     ` Mateusz Polchlopek
2024-02-19 10:05 ` [Intel-wired-lan] [PATCH iwl-next v4 2/5] ice: Adjust the VSI/Aggregator layers Mateusz Polchlopek
2024-02-19 10:05   ` Mateusz Polchlopek
2024-02-19 10:05 ` [Intel-wired-lan] [PATCH iwl-next v4 3/5] ice: Enable switching default Tx scheduler topology Mateusz Polchlopek
2024-02-19 10:05   ` Mateusz Polchlopek
2024-02-19 10:05 ` [Intel-wired-lan] [PATCH iwl-next v4 4/5] ice: Add tx_scheduling_layers devlink param Mateusz Polchlopek
2024-02-19 10:05   ` Mateusz Polchlopek
2024-02-19 12:37   ` Jiri Pirko
2024-02-19 12:37     ` Jiri Pirko
2024-02-19 13:33     ` Przemek Kitszel
2024-02-19 13:33       ` Przemek Kitszel
2024-02-19 17:15       ` Jiri Pirko
2024-02-19 17:15         ` Jiri Pirko
2024-02-21 23:38     ` Jakub Kicinski
2024-02-21 23:38       ` Jakub Kicinski
2024-02-22 13:25       ` Mateusz Polchlopek
2024-02-22 13:25         ` Mateusz Polchlopek
2024-02-22 23:07         ` Jakub Kicinski
2024-02-22 23:07           ` Jakub Kicinski
2024-02-23  9:45           ` Jiri Pirko
2024-02-23  9:45             ` Jiri Pirko
2024-02-23 14:27             ` Jakub Kicinski
2024-02-23 14:27               ` Jakub Kicinski
2024-02-25  7:18               ` Jiri Pirko
2024-02-25  7:18                 ` Jiri Pirko
2024-02-27  2:37                 ` Jakub Kicinski
2024-02-27  2:37                   ` Jakub Kicinski
2024-02-27 12:17                   ` Jiri Pirko
2024-02-27 12:17                     ` Jiri Pirko
2024-02-27 13:05                     ` Przemek Kitszel
2024-02-27 13:05                       ` Przemek Kitszel
2024-02-27 15:39                       ` Jiri Pirko
2024-02-27 15:39                         ` Jiri Pirko
2024-02-27 15:41                       ` Andrew Lunn
2024-02-27 15:41                         ` Andrew Lunn
2024-02-27 16:04                         ` Jiri Pirko
2024-02-27 16:04                           ` Jiri Pirko
2024-02-27 20:38                           ` Andrew Lunn
2024-02-27 20:38                             ` Andrew Lunn
2024-02-19 10:05 ` [Intel-wired-lan] [PATCH iwl-next v4 5/5] ice: Document tx_scheduling_layers parameter Mateusz Polchlopek
2024-02-19 10:05   ` Mateusz Polchlopek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.