All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support
@ 2023-07-27  2:10 Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
                   ` (8 more replies)
  0 siblings, 9 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang, Wenjun Wu

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/


Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 388 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 113 +++++
 18 files changed, 1225 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
@ 2023-07-27  2:10 ` Wenjun Wu
  2023-07-31 22:22   ` Tony Nguyen
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang, Wenjun Wu

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 113 +++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c15221dcb75e..f1250ddd063d 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1367,6 +1376,83 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[1];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(44, virtchnl_qos_cap_list);
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[1];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_queues_bw_cfg);
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 /**
  * virtchnl_vc_validate_vf_msg
  * @ver: Virtchnl version info
@@ -1558,6 +1644,33 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = sizeof(struct virtchnl_queues_bw_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+			valid_len += (q_bw->num_queues - 1) *
+					 sizeof(q_bw->cfg[0]);
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
@ 2023-07-27  2:10 ` Wenjun Wu
  2023-07-31 22:23   ` Tony Nguyen
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 3/5] iavf: Add devlink and devlink port support Wenjun Wu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang, Wenjun Wu

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 377 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 125a2e753e29..25267ae6ab62 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -637,6 +637,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 n_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 4a12316f7b46..c5274d1eb5bf 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 6899f6af1866..606823ed68e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2470,6 +2470,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2487,6 +2504,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 20f40dfeb761..999bd4633d4f 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -500,5 +500,13 @@
 #define PFPM_WUS_FW_RST_WK_M			BIT(31)
 #define VFINT_DYN_CTLN(_i)			(0x00003800 + ((_i) * 4))
 #define VFINT_DYN_CTLN_CLEARPBA_M		BIT(1)
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 
 #endif /* _ICE_HW_AUTOGEN_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index bf47936e396a..3e17a1e7c6be 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -843,6 +843,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 
 	u16 max_burst_size;	/* driver sets this value */
 
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 67172fdd9bc2..6499d83cc706 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u16 queue_id;
+	u32 committed;
+	u32 peak;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 85d996531502..016b7e1d6e91 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,174 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = {0};
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(*cap_list) + sizeof(cap_list->cap[0]) * (numtc - 1);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	u32 p_rate;
+	int ret;
+	u16 i;
+	u8 tc;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return VIRTCHNL_STATUS_ERR_PARAM;
+
+	for (i = 0; i < num_queues; i++) {
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate) {
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		} else {
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		}
+		if (ret)
+			return ret;
+	}
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
+	begin_id = hw->logical_pf_id * per_pf;
+	n_used = pf->n_quanta_prof_used;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->n_quanta_prof_used++;
+		} else {
+			return VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		}
+	}
+
+	reg = rd32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx));
+	reg &= ~GLCOMM_QUANTA_PROF_QUANTA_SIZE_M;
+	reg |= quanta_size << GLCOMM_QUANTA_PROF_QUANTA_SIZE_S;
+	reg &= ~GLCOMM_QUANTA_PROF_MAX_CMD_M;
+	reg |= n_cmd << GLCOMM_QUANTA_PROF_MAX_CMD_S;
+	reg &= ~GLCOMM_QUANTA_PROF_MAX_DESC_M;
+	reg |= n_desc << GLCOMM_QUANTA_PROF_MAX_DESC_S;
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return VIRTCHNL_STATUS_SUCCESS;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1758,137 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vf_qs_bw *qs_bw;
+	struct ice_vsi *vsi;
+	size_t len;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
+	qs_bw = kzalloc(len, GFP_KERNEL);
+	if (!qs_bw) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		goto err_bw;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+	memcpy(vf->qs_bw, qs_bw, len);
+
+err_bw:
+	kfree(qs_bw);
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	num_queues = qquanta->queue_select.num_queues;
+	quanta_size = qquanta->quanta_size;
+	end_qid = start_qid + num_queues;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				    v_ret, NULL, 0);
+	return ret;
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2012,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3992,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4040,6 +4348,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 3/5] iavf: Add devlink and devlink port support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
@ 2023-07-27  2:10 ` Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 4/5] iavf: Add devlink port function rate API support Wenjun Wu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 9bc0a9519899..f916b8ef6acb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 8cbdebc5b698..519aeaec793c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,10 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	/* devlink & port data */
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..991d041e5922
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register allocated devlink instance for iavf adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Register the devlink instance associated with this iavf adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	/* Allocate devlink instance */
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	/* Init iavf adapter devlink */
+	adapter->devlink = devlink;
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	/* Create devlink port: attr/port flavour, port index */
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..5c122278611a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+/* iavf devlink structure pointing to iavf adapter */
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 06ea61f30b6f..c9ee1e8712a8 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2036,6 +2036,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2707,6 +2708,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2748,6 +2752,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4994,6 +4999,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err)
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+
+	/* Keep driver interface even on devlink registration failure */
 	return 0;
 
 err_ioremap:
@@ -5138,6 +5149,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 4/5] iavf: Add devlink port function rate API support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
                   ` (2 preceding siblings ...)
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 3/5] iavf: Add devlink and devlink port support Wenjun Wu
@ 2023-07-27  2:10 ` Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 270 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 991d041e5922..e8469fda054d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,273 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num, size, i;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strncpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		size = q_num * sizeof(struct iavf_dev_rate_node);
+		dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		memset(dl_priv->queue_nodes, 0, size);
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag ==
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+		node->tx_max = node->tx_max_temp;
+		node->tx_share = node->tx_share_temp;
+	} else {
+		return 0;
+	}
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_max_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_share_set - devlink_rate API for setting tx share
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_share_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register allocated devlink instance for iavf adapter
@@ -30,7 +296,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	adapter->devlink = devlink;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 5c122278611a..897ff5fc87af 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,14 +4,35 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 /* iavf devlink structure pointing to iavf adapter */
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index c9ee1e8712a8..b621e44e8890 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2036,6 +2036,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2708,8 +2709,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2752,6 +2755,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5149,6 +5153,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v1 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
                   ` (3 preceding siblings ...)
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 4/5] iavf: Add devlink port function rate API support Wenjun Wu
@ 2023-07-27  2:10 ` Wenjun Wu
  2023-07-31 22:21 ` [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Tony Nguyen
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-07-27  2:10 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++++++++++-
 5 files changed, 313 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 519aeaec793c..e9b781cacffa 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -374,6 +380,7 @@ struct iavf_adapter {
 	/* devlink & port data */
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -423,6 +430,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -431,6 +440,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -577,6 +587,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index e8469fda054d..40af7f6e0a86 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -96,6 +96,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -107,6 +131,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -294,6 +322,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	/* Init iavf adapter devlink */
 	adapter->devlink = devlink;
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 897ff5fc87af..a8a41f343f56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -34,5 +34,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index b621e44e8890..f19d5eb2a5fc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2129,6 +2129,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2711,7 +2726,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3135,6 +3152,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4899,7 +4929,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -5003,10 +5033,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps)
+		goto err_ioremap;
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
-	if (err)
+	if (err) {
 		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		kfree(adapter->qos_caps);
+		goto err_ioremap;
+	}
 
 	/* Keep driver interface even on devlink registration failure */
 	return 0;
@@ -5156,6 +5194,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index be3c007ce90a..68f4df27f2ee 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1479,6 +1480,209 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues - 1);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	int q_idx;
+	int i, j;
+	u16 tc;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues - 1);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2138,6 +2342,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
                   ` (4 preceding siblings ...)
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
@ 2023-07-31 22:21 ` Tony Nguyen
  2023-08-01 18:43   ` Zhang, Xuejun
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 115+ messages in thread
From: Tony Nguyen @ 2023-07-31 22:21 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang



On 7/26/2023 7:10 PM, Wenjun Wu wrote:
> To allow user to configure queue bandwidth, devlink port support
> is added to support devlink port rate API. [1]
> 
> Add devlink framework registration/unregistration on iavf driver
> initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
> is created to be associated iavf netdevice.
> 
> iavf rate tree with root node, queue nodes, and leaf node is created
> and registered with devlink rate when iavf adapter is configured, and
> if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
> Capability Exchange.
> 
> [root@localhost ~]# devlink port function rate show
> pci/0000:af:01.0/txq_15: type node parent iavf_root
> pci/0000:af:01.0/txq_14: type node parent iavf_root
> pci/0000:af:01.0/txq_13: type node parent iavf_root
> pci/0000:af:01.0/txq_12: type node parent iavf_root
> pci/0000:af:01.0/txq_11: type node parent iavf_root
> pci/0000:af:01.0/txq_10: type node parent iavf_root
> pci/0000:af:01.0/txq_9: type node parent iavf_root
> pci/0000:af:01.0/txq_8: type node parent iavf_root
> pci/0000:af:01.0/txq_7: type node parent iavf_root
> pci/0000:af:01.0/txq_6: type node parent iavf_root
> pci/0000:af:01.0/txq_5: type node parent iavf_root
> pci/0000:af:01.0/txq_4: type node parent iavf_root
> pci/0000:af:01.0/txq_3: type node parent iavf_root
> pci/0000:af:01.0/txq_2: type node parent iavf_root
> pci/0000:af:01.0/txq_1: type node parent iavf_root
> pci/0000:af:01.0/txq_0: type node parent iavf_root
> pci/0000:af:01.0/iavf_root: type node
> 
> 
>                           +---------+
>                           |   root  |
>                           +----+----+
>                                |
>              |-----------------|-----------------|
>         +----v----+       +----v----+       +----v----+
>         |  txq_0  |       |  txq_1  |       |  txq_x  |
>         +----+----+       +----+----+       +----+----+
> 
> User can configure the tx_max and tx_share of each queue. Once any one of the
> queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
> and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
> to VF
> 
> Example:
> 
> 1.To Set the queue tx_share:
> devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps
> 
> 2.To Set the queue tx_max:
> devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps
> 
> 3.To Show Current devlink port rate info:
> devlink port function rate function show
> [root@localhost ~]# devlink port function rate show
> pci/0000:af:01.0/txq_15: type node parent iavf_root
> pci/0000:af:01.0/txq_14: type node parent iavf_root
> pci/0000:af:01.0/txq_13: type node parent iavf_root
> pci/0000:af:01.0/txq_12: type node parent iavf_root
> pci/0000:af:01.0/txq_11: type node parent iavf_root
> pci/0000:af:01.0/txq_10: type node parent iavf_root
> pci/0000:af:01.0/txq_9: type node parent iavf_root
> pci/0000:af:01.0/txq_8: type node parent iavf_root
> pci/0000:af:01.0/txq_7: type node parent iavf_root
> pci/0000:af:01.0/txq_6: type node parent iavf_root
> pci/0000:af:01.0/txq_5: type node parent iavf_root
> pci/0000:af:01.0/txq_4: type node parent iavf_root
> pci/0000:af:01.0/txq_3: type node parent iavf_root
> pci/0000:af:01.0/txq_2: type node parent iavf_root
> pci/0000:af:01.0/txq_1: type node parent iavf_root
> pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
> pci/0000:af:01.0/iavf_root: type node
> 
> 
> [1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/
> 
> 
> Jun Zhang (3):
>    iavf: Add devlink and devlink port support
>    iavf: Add devlink port function rate API support
>    iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
> 
> Wenjun Wu (2):
>    virtchnl: support queue rate limit and quanta size configuration
>    ice: Support VF queue rate limit and quanta size configuration


This series does not apply.

>   drivers/net/ethernet/intel/Kconfig            |   1 +
>   drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
>   drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 388 ++++++++++++++++++
>   .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
>   drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++-
>   drivers/net/ethernet/intel/ice/ice.h          |   2 +
>   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
>   drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
>   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
>   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
>   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
>   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
>   include/linux/avf/virtchnl.h                  | 113 +++++
>   18 files changed, 1225 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
@ 2023-07-31 22:22   ` Tony Nguyen
  2023-08-01  9:24     ` Wu, Wenjun1
  0 siblings, 1 reply; 115+ messages in thread
From: Tony Nguyen @ 2023-07-31 22:22 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang

On 7/26/2023 7:10 PM, Wenjun Wu wrote:
> This patch adds new virtchnl opcodes and structures for rate limit
> and quanta size configuration, which include:
> 1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
> VF per queue.
> 2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
> 3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
> as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
> configuration is previously set by DCB and PF, and now is the potential
> QoS capability of VF. VF can take it as reference to configure queue TC
> mapping.
> 
> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> ---
>   include/linux/avf/virtchnl.h | 113 +++++++++++++++++++++++++++++++++++
>   1 file changed, 113 insertions(+)
> 
> diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
> index c15221dcb75e..f1250ddd063d 100644
> --- a/include/linux/avf/virtchnl.h
> +++ b/include/linux/avf/virtchnl.h
> @@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
>   	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
>   };
>   
> +enum virtchnl_bw_limit_type {
> +	VIRTCHNL_BW_SHAPER = 0,
> +};
>   /* END GENERIC DEFINES */
>   
>   /* Opcodes for VF-PF communication. These are placed in the v_opcode field
> @@ -145,6 +148,11 @@ enum virtchnl_ops {
>   	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
>   	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
>   	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
> +	/* opcode 57 - 65 are reserved */
> +	VIRTCHNL_OP_GET_QOS_CAPS = 66,
> +	/* opcode 68 through 111 are reserved */
> +	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
> +	VIRTCHNL_OP_CONFIG_QUANTA = 113,
>   	VIRTCHNL_OP_MAX,
>   };
>   
> @@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
>   #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
>   #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
>   #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
> +#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
>   
>   #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
>   			       VIRTCHNL_VF_OFFLOAD_VLAN | \
> @@ -1367,6 +1376,83 @@ struct virtchnl_fdir_del {
>   
>   VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
>   
> +struct virtchnl_shaper_bw {
> +	/* Unit is Kbps */
> +	u32 committed;
> +	u32 peak;
> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
> +
> +/* VIRTCHNL_OP_GET_QOS_CAPS
> + * VF sends this message to get its QoS Caps, such as
> + * TC number, Arbiter and Bandwidth.
> + */
> +struct virtchnl_qos_cap_elem {
> +	u8 tc_num;
> +	u8 tc_prio;
> +#define VIRTCHNL_ABITER_STRICT      0
> +#define VIRTCHNL_ABITER_ETS         2
> +	u8 arbiter;
> +#define VIRTCHNL_STRICT_WEIGHT      1
> +	u8 weight;
> +	enum virtchnl_bw_limit_type type;
> +	union {
> +		struct virtchnl_shaper_bw shaper;
> +		u8 pad2[32];
> +	};
> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
> +
> +struct virtchnl_qos_cap_list {
> +	u16 vsi_id;
> +	u16 num_elem;
> +	struct virtchnl_qos_cap_elem cap[1];
> +};

If it's not too late to use a flex arrays, we should. Otherwise, this 
should model after Olek's work [1].

Adding Olek in case he has input.

> +
> +VIRTCHNL_CHECK_STRUCT_LEN(44, virtchnl_qos_cap_list);
> +
> +/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
> +struct virtchnl_queue_bw {
> +	u16 queue_id;
> +	u8 tc;
> +	u8 pad;
> +	struct virtchnl_shaper_bw shaper;
> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
> +
> +struct virtchnl_queues_bw_cfg {
> +	u16 vsi_id;
> +	u16 num_queues;
> +	struct virtchnl_queue_bw cfg[1];

same here

> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_queues_bw_cfg);
> +
> +enum virtchnl_queue_type {
> +	VIRTCHNL_QUEUE_TYPE_TX			= 0,
> +	VIRTCHNL_QUEUE_TYPE_RX			= 1,
> +};
> +
> +/* structure to specify a chunk of contiguous queues */
> +struct virtchnl_queue_chunk {
> +	/* see enum virtchnl_queue_type */
> +	s32 type;
> +	u16 start_queue_id;
> +	u16 num_queues;
> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
> +
> +struct virtchnl_quanta_cfg {
> +	u16 quanta_size;
> +	struct virtchnl_queue_chunk queue_select;
> +};
> +
> +VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
> +
>   /**
>    * virtchnl_vc_validate_vf_msg
>    * @ver: Virtchnl version info
> @@ -1558,6 +1644,33 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
>   	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
>   		valid_len = sizeof(struct virtchnl_vlan_setting);
>   		break;
> +	case VIRTCHNL_OP_GET_QOS_CAPS:
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +		valid_len = sizeof(struct virtchnl_queues_bw_cfg);
> +		if (msglen >= valid_len) {
> +			struct virtchnl_queues_bw_cfg *q_bw =
> +				(struct virtchnl_queues_bw_cfg *)msg;

missing newline here.

> +			if (q_bw->num_queues == 0) {
> +				err_msg_format = true;
> +				break;
> +			}
> +			valid_len += (q_bw->num_queues - 1) *
> +					 sizeof(q_bw->cfg[0]);

See referenced series for changes to this too.

> +		}
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUANTA:
> +		valid_len = sizeof(struct virtchnl_quanta_cfg);
> +		if (msglen >= valid_len) {
> +			struct virtchnl_quanta_cfg *q_quanta =
> +				(struct virtchnl_quanta_cfg *)msg;

need newline

> +			if (q_quanta->quanta_size == 0 ||
> +			    q_quanta->queue_select.num_queues == 0) {
> +				err_msg_format = true;
> +				break;
> +			}
> +		}
> +		break;
>   	/* These are always errors coming from the VF. */
>   	case VIRTCHNL_OP_EVENT:
>   	case VIRTCHNL_OP_UNKNOWN:

[1] 
https://lore.kernel.org/netdev/20230728155207.10042-1-aleksander.lobakin@intel.com/#t
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
@ 2023-07-31 22:23   ` Tony Nguyen
  2023-08-01  9:30     ` Wu, Wenjun1
  0 siblings, 1 reply; 115+ messages in thread
From: Tony Nguyen @ 2023-07-31 22:23 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang



On 7/26/2023 7:10 PM, Wenjun Wu wrote:
> Add support to configure VF queue rate limit and quanta size.
> 
> For quanta size configuration, the quanta profiles are divided evenly
> by PF numbers. For each port, the first quanta profile is reserved for
> default. When VF is asked to set queue quanta size, PF will search for
> an available profile, change the fields and assigned this profile to the
> queue.
> 
> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> ---
>   drivers/net/ethernet/intel/ice/ice.h          |   2 +
>   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
>   drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
>   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
>   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
>   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
>   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
>   10 files changed, 377 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 125a2e753e29..25267ae6ab62 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -637,6 +637,8 @@ struct ice_pf {
>   #define ICE_VF_AGG_NODE_ID_START	65
>   #define ICE_MAX_VF_AGG_NODES		32
>   	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
> +
> +	u8 n_quanta_prof_used;
>   };
>   
>   extern struct workqueue_struct *ice_lag_wq;
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index 4a12316f7b46..c5274d1eb5bf 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
>   		break;
>   	}
>   
> +	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
> +
>   	tlan_ctx->tso_ena = ICE_TX_LEGACY;
>   	tlan_ctx->tso_qnum = pf_q;
>   
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index 6899f6af1866..606823ed68e8 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -2470,6 +2470,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
>   	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
>   }
>   
> +/**
> + * ice_func_id_to_logical_id - map from function id to logical pf id
> + * @active_function_bitmap: active function bitmap
> + * @pf_id: function number of device
> + */
> +static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
> +{
> +	u8 logical_id = 0;
> +	u8 i;
> +
> +	for (i = 0; i < pf_id; i++)
> +		if (active_function_bitmap & BIT(i))
> +			logical_id++;
> +
> +	return logical_id;
> +}
> +
>   /**
>    * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
>    * @hw: pointer to the HW struct
> @@ -2487,6 +2504,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
>   	dev_p->num_funcs = hweight32(number);
>   	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
>   		  dev_p->num_funcs);
> +
> +	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
>   }
>   
>   /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> index 20f40dfeb761..999bd4633d4f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> @@ -500,5 +500,13 @@
>   #define PFPM_WUS_FW_RST_WK_M			BIT(31)
>   #define VFINT_DYN_CTLN(_i)			(0x00003800 + ((_i) * 4))
>   #define VFINT_DYN_CTLN_CLEARPBA_M		BIT(1)
> +#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
> +#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)

These don't look like the right placement within the file. Please 
check/correct.

>   
>   #endif /* _ICE_HW_AUTOGEN_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
> index 166413fc33f4..7e152ab5b727 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -381,6 +381,8 @@ struct ice_tx_ring {
>   	u8 flags;
>   	u8 dcb_tc;			/* Traffic class of ring */
>   	u8 ptp_tx;
> +
> +	u16 quanta_prof_id;
>   } ____cacheline_internodealigned_in_smp;
>   
>   static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
> index bf47936e396a..3e17a1e7c6be 100644
> --- a/drivers/net/ethernet/intel/ice/ice_type.h
> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
> @@ -843,6 +843,7 @@ struct ice_hw {
>   	u8 revision_id;
>   
>   	u8 pf_id;		/* device profile info */
> +	u8 logical_pf_id;
>   
>   	u16 max_burst_size;	/* driver sets this value */
>   
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> index 67172fdd9bc2..6499d83cc706 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> @@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
>   	u16 last_printed;
>   };
>   
> +struct ice_vf_qs_bw {
> +	u16 queue_id;
> +	u32 committed;
> +	u32 peak;
> +	u8 tc;
> +};
> +
>   /* VF operations */
>   struct ice_vf_ops {
>   	enum ice_disq_rst_src reset_type;
> @@ -133,6 +140,8 @@ struct ice_vf {
>   
>   	/* devlink port data */
>   	struct devlink_port devlink_port;
> +
> +	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
>   };
>   
>   /* Flags for controlling behavior of ice_reset_vf */
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index 85d996531502..016b7e1d6e91 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
>   	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
>   		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
>   
> +	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
> +		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
> +
>   	vfres->num_vsis = 1;
>   	/* Tx and Rx queue are equal for VF */
>   	vfres->num_queue_pairs = vsi->num_txq;
> @@ -985,6 +988,174 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
>   				     NULL, 0);
>   }
>   
> +/**
> + * ice_vc_get_qos_caps - Get current QoS caps from PF
> + * @vf: pointer to the VF info
> + *
> + * Get VF's QoS capabilities, such as TC number, arbiter and
> + * bandwidth from PF.
> + */
> +static int ice_vc_get_qos_caps(struct ice_vf *vf)
> +{
> +	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +	struct virtchnl_qos_cap_list *cap_list = NULL;
> +	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = {0};

init with { } is preferred

> +	struct virtchnl_qos_cap_elem *cfg = NULL;
> +	struct ice_vsi_ctx *vsi_ctx;
> +	struct ice_pf *pf = vf->pf;
> +	struct ice_port_info *pi;
> +	struct ice_vsi *vsi;
> +	u8 numtc, tc;
> +	u16 len = 0;
> +	int ret, i;
> +
> +	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	vsi = ice_get_vf_vsi(vf);
> +	if (!vsi) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	pi = pf->hw.port_info;
> +	numtc = vsi->tc_cfg.numtc;
> +
> +	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
> +	if (!vsi_ctx) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	len = sizeof(*cap_list) + sizeof(cap_list->cap[0]) * (numtc - 1);

I believe struct_size helper can be used here

> +	cap_list = kzalloc(len, GFP_KERNEL);
> +	if (!cap_list) {
> +		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +		len = 0;
> +		goto err;
> +	}
> +
> +	cap_list->vsi_id = vsi->vsi_num;
> +	cap_list->num_elem = numtc;
> +
> +	/* Store the UP2TC configuration from DCB to a user priority bitmap
> +	 * of each TC. Each element of prio_of_tc represents one TC. Each
> +	 * bitmap indicates the user priorities belong to this TC.
> +	 */
> +	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
> +		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
> +		tc_prio[tc] |= BIT(i);
> +	}
> +
> +	for (i = 0; i < numtc; i++) {
> +		cfg = &cap_list->cap[i];
> +		cfg->tc_num = i;
> +		cfg->tc_prio = tc_prio[i];
> +		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
> +		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
> +		cfg->type = VIRTCHNL_BW_SHAPER;
> +		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
> +		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
> +	}
> +
> +err:
> +	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
> +				    (u8 *)cap_list, len);
> +	kfree(cap_list);
> +	return ret;
> +}
> +
> +/**
> + * ice_vf_cfg_qs_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @num_queues: number of queues to be configured
> + *
> + * Configure per queue bandwidth.
> + */
> +static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
> +{
> +	struct ice_hw *hw = &vf->pf->hw;
> +	struct ice_vsi *vsi;
> +	u32 p_rate;
> +	int ret;
> +	u16 i;
> +	u8 tc;
> +
> +	vsi = ice_get_vf_vsi(vf);
> +	if (!vsi)
> +		return VIRTCHNL_STATUS_ERR_PARAM;
> +
> +	for (i = 0; i < num_queues; i++) {
> +		p_rate = vf->qs_bw[i].peak;
> +		tc = vf->qs_bw[i].tc;
> +		if (p_rate) {
> +			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
> +					       vf->qs_bw[i].queue_id,
> +					       ICE_MAX_BW, p_rate);
> +		} else {
> +			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
> +						    vf->qs_bw[i].queue_id,
> +						    ICE_MAX_BW);

These functions return kernel error codes...

> +		}
> +		if (ret)
> +			return ret;
> +	}
> + > +	return VIRTCHNL_STATUS_SUCCESS;

... this and the error above are returning VIRTCHNL errors. These are 
not returning consistent types.

> +}
> +
> +/**
> + * ice_vf_cfg_q_quanta_profile
> + * @vf: pointer to the VF info
> + * @quanta_prof_idx: pointer to the quanta profile index
> + * @quanta_size: quanta size to be set
> + *
> + * This function chooses available quanta profile and configures the register.
> + * The quanta profile is evenly divided by the number of device ports, and then
> + * available to the specific PF and VFs. The first profile for each PF is a
> + * reserved default profile. Only quanta size of the rest unused profile can be
> + * modified.
> + */
> +static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
> +				       u16 *quanta_prof_idx)
> +{
> +	const u16 n_desc = calc_quanta_desc(quanta_size);
> +	struct ice_hw *hw = &vf->pf->hw;
> +	const u16 n_cmd = 2 * n_desc;
> +	struct ice_pf *pf = vf->pf;
> +	u16 per_pf, begin_id;
> +	u8 n_used;
> +	u32 reg;
> +
> +	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
> +	begin_id = hw->logical_pf_id * per_pf;
> +	n_used = pf->n_quanta_prof_used;
> +
> +	if (quanta_size == ICE_DFLT_QUANTA) {
> +		*quanta_prof_idx = begin_id;
> +	} else {
> +		if (n_used < per_pf) {
> +			*quanta_prof_idx = begin_id + 1 + n_used;
> +			pf->n_quanta_prof_used++;
> +		} else {
> +			return VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
> +		}
> +	}
> +
> +	reg = rd32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx));
> +	reg &= ~GLCOMM_QUANTA_PROF_QUANTA_SIZE_M;
> +	reg |= quanta_size << GLCOMM_QUANTA_PROF_QUANTA_SIZE_S;
> +	reg &= ~GLCOMM_QUANTA_PROF_MAX_CMD_M;
> +	reg |= n_cmd << GLCOMM_QUANTA_PROF_MAX_CMD_S;
> +	reg &= ~GLCOMM_QUANTA_PROF_MAX_DESC_M;
> +	reg |= n_desc << GLCOMM_QUANTA_PROF_MAX_DESC_S;
> +	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
> +
> +	return VIRTCHNL_STATUS_SUCCESS;

Is this really supposed to return VIRTCHNL codes? That's not a standard 
convention the driver is doing.

> +}
> +
>   /**
>    * ice_vc_cfg_promiscuous_mode_msg
>    * @vf: pointer to the VF info
> @@ -1587,6 +1758,137 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
>   				     NULL, 0);
>   }
>   
> +/**
> + * ice_vc_cfg_q_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues bandwidth.
> + */
> +static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
> +{
> +	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +	struct virtchnl_queues_bw_cfg *qbw =
> +		(struct virtchnl_queues_bw_cfg *)msg;
> +	struct ice_vf_qs_bw *qs_bw;
> +	struct ice_vsi *vsi;
> +	size_t len;
> +	u16 i;
> +
> +	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
> +	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	vsi = ice_get_vf_vsi(vf);
> +	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
> +	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
> +	qs_bw = kzalloc(len, GFP_KERNEL);
> +	if (!qs_bw) {
> +		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +		goto err_bw;
> +	}
> +
> +	for (i = 0; i < qbw->num_queues; i++) {
> +		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
> +		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
> +		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
> +		qs_bw[i].tc = qbw->cfg[i].tc;
> +	}
> +
> +	memcpy(vf->qs_bw, qs_bw, len);
> +
> +err_bw:
> +	kfree(qs_bw);
> +
> +err:
> +	/* send the response to the VF */
> +	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +				    v_ret, NULL, 0);
> +}
> +
> +/**
> + * ice_vc_cfg_q_quanta - Configure per queue quanta
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues quanta.
> + */
> +static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
> +{
> +	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
> +	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +	struct virtchnl_quanta_cfg *qquanta =
> +		(struct virtchnl_quanta_cfg *)msg;
> +	struct ice_vsi *vsi;
> +	int ret;
> +
> +	start_qid = qquanta->queue_select.start_queue_id;
> +	num_queues = qquanta->queue_select.num_queues;
> +	quanta_size = qquanta->quanta_size;
> +	end_qid = start_qid + num_queues;
> +
> +	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	vsi = ice_get_vf_vsi(vf);
> +	if (!vsi) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
> +	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
> +	    quanta_size < ICE_MIN_QUANTA_SIZE) {
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	if (quanta_size % 64) {
> +		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
> +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +		goto err;
> +	}
> +
> +	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
> +					  &quanta_prof_id);
> +	if (ret) {
> +		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
> +		goto err;
> +	}
> +
> +	for (i = start_qid; i < end_qid; i++)
> +		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
> +
> +err:
> +	/* send the response to the VF */
> +	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
> +				    v_ret, NULL, 0);
> +	return ret;

return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
			    v_ret, NULL, 0);

> +}
> +
>   /**
>    * ice_vc_cfg_qs_msg
>    * @vf: pointer to the VF info
> @@ -1710,6 +2012,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
>   		}
>   	}
>   
> +	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
> +		goto error_param;
> +
>   	/* send the response to the VF */
>   	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
>   				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> @@ -3687,6 +3992,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
>   	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
>   	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
>   	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
> +	.get_qos_caps = ice_vc_get_qos_caps,
> +	.cfg_q_bw = ice_vc_cfg_q_bw,
> +	.cfg_q_quanta = ice_vc_cfg_q_quanta,
>   };
>   
>   /**
> @@ -4040,6 +4348,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
>   	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
>   		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
>   		break;
> +	case VIRTCHNL_OP_GET_QOS_CAPS:
> +		err = ops->get_qos_caps(vf);
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +		err = ops->cfg_q_bw(vf, msg);
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUANTA:
> +		err = ops->cfg_q_quanta(vf, msg);
> +		break;
>   	case VIRTCHNL_OP_UNKNOWN:
>   	default:
>   		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> index cd747718de73..0efb9c0f669a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> @@ -13,6 +13,13 @@
>   /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
>   #define ICE_MAX_VLAN_PER_VF		8
>   
> +#define ICE_DFLT_QUANTA 1024
> +#define ICE_MAX_QUANTA_SIZE 4096
> +#define ICE_MIN_QUANTA_SIZE 256
> +
> +#define calc_quanta_desc(x)	\
> +	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
> +
>   /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
>    * broadcast, and 16 for additional unicast/multicast filters
>    */
> @@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
>   	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
>   	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
>   	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
> +	int (*get_qos_caps)(struct ice_vf *vf);
> +	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
> +	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
> +	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
>   };
>   
>   #ifdef CONFIG_PCI_IOV
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> index 7d547fa616fa..2e3f63a429cd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> @@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
>   	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
>   };
>   
> +static const u32 tc_allowlist_opcodes[] = {
> +	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +	VIRTCHNL_OP_CONFIG_QUANTA,
> +};
> +
>   struct allowlist_opcode_info {
>   	const u32 *opcodes;
>   	size_t size;
> @@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
>   	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
>   	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
>   	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
> +	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
>   };
>   
>   /**
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-07-31 22:22   ` Tony Nguyen
@ 2023-08-01  9:24     ` Wu, Wenjun1
  0 siblings, 0 replies; 115+ messages in thread
From: Wu, Wenjun1 @ 2023-08-01  9:24 UTC (permalink / raw)
  To: Nguyen, Anthony L, intel-wired-lan; +Cc: Aggarwal, Mitu, Zhang, Qi Z

Thanks for the careful review!

> -----Original Message-----
> From: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Sent: Tuesday, August 1, 2023 6:23 AM
> To: Wu, Wenjun1 <wenjun1.wu@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Aggarwal, Mitu <mitu.aggarwal@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Lobakin, Aleksander
> <aleksander.lobakin@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue
> rate limit and quanta size configuration
> 
> On 7/26/2023 7:10 PM, Wenjun Wu wrote:
> > This patch adds new virtchnl opcodes and structures for rate limit and
> > quanta size configuration, which include:
> > 1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for
> each VF
> > per queue.
> > 2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
> > 3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration,
> > such as enabled TCs, arbiter type, up2tc and bandwidth of VSI node.
> > The configuration is previously set by DCB and PF, and now is the
> > potential QoS capability of VF. VF can take it as reference to
> > configure queue TC mapping.
> >
> > Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> > ---
> >   include/linux/avf/virtchnl.h | 113
> +++++++++++++++++++++++++++++++++++
> >   1 file changed, 113 insertions(+)
> > +VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
> > +
> > +struct virtchnl_qos_cap_list {
> > +	u16 vsi_id;
> > +	u16 num_elem;
> > +	struct virtchnl_qos_cap_elem cap[1]; };
> 
> If it's not too late to use a flex arrays, we should. Otherwise, this should
> model after Olek's work [1].
> 
> Adding Olek in case he has input.

Yes, I will change it to flex array in the next version.

> 
> > +
> > +VIRTCHNL_CHECK_STRUCT_LEN(44, virtchnl_qos_cap_list);
> > +
> > +/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
> > +struct virtchnl_queue_bw {
> > +	u16 queue_id;
> > +	u8 tc;
> > +	u8 pad;
> > +	struct virtchnl_shaper_bw shaper;
> > +};
> > +
> > +VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
> > +
> > +struct virtchnl_queues_bw_cfg {
> > +	u16 vsi_id;
> > +	u16 num_queues;
> > +	struct virtchnl_queue_bw cfg[1];
> 
> same here

Got it.

> 
> > +};
> > +
> > +VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_queues_bw_cfg);
> > +
> > +enum virtchnl_queue_type {
> > +	VIRTCHNL_QUEUE_TYPE_TX			= 0,
> > +	VIRTCHNL_QUEUE_TYPE_RX			= 1,
> > +};
> > +
> > +/* structure to specify a chunk of contiguous queues */ struct
> > +virtchnl_queue_chunk {
> > +	/* see enum virtchnl_queue_type */
> > +	s32 type;
> > +	u16 start_queue_id;
> > +	u16 num_queues;
> > +};
> > +
> > +VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
> > +
> > +struct virtchnl_quanta_cfg {
> > +	u16 quanta_size;
> > +	struct virtchnl_queue_chunk queue_select; };
> > +
> > +VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
> > +
> >   /**
> >    * virtchnl_vc_validate_vf_msg
> >    * @ver: Virtchnl version info
> > @@ -1558,6 +1644,33 @@ virtchnl_vc_validate_vf_msg(struct
> virtchnl_version_info *ver, u32 v_opcode,
> >   	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
> >   		valid_len = sizeof(struct virtchnl_vlan_setting);
> >   		break;
> > +	case VIRTCHNL_OP_GET_QOS_CAPS:
> > +		break;
> > +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> > +		valid_len = sizeof(struct virtchnl_queues_bw_cfg);
> > +		if (msglen >= valid_len) {
> > +			struct virtchnl_queues_bw_cfg *q_bw =
> > +				(struct virtchnl_queues_bw_cfg *)msg;
> 
> missing newline here.

I will add a new line here in the next version.

> 
> > +			if (q_bw->num_queues == 0) {
> > +				err_msg_format = true;
> > +				break;
> > +			}
> > +			valid_len += (q_bw->num_queues - 1) *
> > +					 sizeof(q_bw->cfg[0]);
> 
> See referenced series for changes to this too.

I will use struct_size here in the next version.

> 
> > +		}
> > +		break;
> > +	case VIRTCHNL_OP_CONFIG_QUANTA:
> > +		valid_len = sizeof(struct virtchnl_quanta_cfg);
> > +		if (msglen >= valid_len) {
> > +			struct virtchnl_quanta_cfg *q_quanta =
> > +				(struct virtchnl_quanta_cfg *)msg;
> 
> need newline

Got it.

> 
> > +			if (q_quanta->quanta_size == 0 ||
> > +			    q_quanta->queue_select.num_queues == 0) {
> > +				err_msg_format = true;
> > +				break;
> > +			}
> > +		}
> > +		break;
> >   	/* These are always errors coming from the VF. */
> >   	case VIRTCHNL_OP_EVENT:
> >   	case VIRTCHNL_OP_UNKNOWN:
> 
> [1]
> https://lore.kernel.org/netdev/20230728155207.10042-1-
> aleksander.lobakin@intel.com/#t
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-07-31 22:23   ` Tony Nguyen
@ 2023-08-01  9:30     ` Wu, Wenjun1
  0 siblings, 0 replies; 115+ messages in thread
From: Wu, Wenjun1 @ 2023-08-01  9:30 UTC (permalink / raw)
  To: Nguyen, Anthony L, intel-wired-lan; +Cc: Aggarwal, Mitu, Zhang, Qi Z

Thanks for the careful review!

> -----Original Message-----
> From: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Sent: Tuesday, August 1, 2023 6:23 AM
> To: Wu, Wenjun1 <wenjun1.wu@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: Aggarwal, Mitu <mitu.aggarwal@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF queue
> rate limit and quanta size configuration
> 
> 
> 
> On 7/26/2023 7:10 PM, Wenjun Wu wrote:
> > Add support to configure VF queue rate limit and quanta size.
> >
> > For quanta size configuration, the quanta profiles are divided evenly
> > by PF numbers. For each port, the first quanta profile is reserved for
> > default. When VF is asked to set queue quanta size, PF will search for
> > an available profile, change the fields and assigned this profile to
> > the queue.
> >
> > Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> > ---
> >   drivers/net/ethernet/intel/ice/ice.h          |   2 +
> >   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
> >   drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
> >   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
> >   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
> >   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
> >   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
> >   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++++++
> >   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
> >   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
> >   10 files changed, 377 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice.h
> > b/drivers/net/ethernet/intel/ice/ice.h
> > index 125a2e753e29..25267ae6ab62 100644
> > --- a/drivers/net/ethernet/intel/ice/ice.h
> > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > @@ -637,6 +637,8 @@ struct ice_pf {
> > diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> > b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> > index 20f40dfeb761..999bd4633d4f 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> > @@ -500,5 +500,13 @@
> >   #define PFPM_WUS_FW_RST_WK_M			BIT(31)
> >   #define VFINT_DYN_CTLN(_i)			(0x00003800 + ((_i) *
> 4))
> >   #define VFINT_DYN_CTLN_CLEARPBA_M		BIT(1)
> > +#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68
> + ((_i) * 4))
> > +#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
> > +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
> > +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF,
> 0)
> > +#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
> > +#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF,
> 16)
> > +#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
> > +#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F,
> 24)
> 
> These don't look like the right placement within the file. Please check/correct.

Yes, I will fix it.

> 
> >
> >   #endif /* _ICE_HW_AUTOGEN_H_ */
> > +/**
> > + * ice_vc_get_qos_caps - Get current QoS caps from PF
> > + * @vf: pointer to the VF info
> > + *
> > + * Get VF's QoS capabilities, such as TC number, arbiter and
> > + * bandwidth from PF.
> > + */
> > +static int ice_vc_get_qos_caps(struct ice_vf *vf) {
> > +	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> > +	struct virtchnl_qos_cap_list *cap_list = NULL;
> > +	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = {0};
> 
> init with { } is preferred

Perhaps you mean u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
I will change it in the next version.

> 
> > +	struct virtchnl_qos_cap_elem *cfg = NULL;
> > +	struct ice_vsi_ctx *vsi_ctx;
> > +	struct ice_pf *pf = vf->pf;
> > +	struct ice_port_info *pi;
> > +	struct ice_vsi *vsi;
> > +	u8 numtc, tc;
> > +	u16 len = 0;
> > +	int ret, i;
> > +
> > +	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> > +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> > +		goto err;
> > +	}
> > +
> > +	vsi = ice_get_vf_vsi(vf);
> > +	if (!vsi) {
> > +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> > +		goto err;
> > +	}
> > +
> > +	pi = pf->hw.port_info;
> > +	numtc = vsi->tc_cfg.numtc;
> > +
> > +	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
> > +	if (!vsi_ctx) {
> > +		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> > +		goto err;
> > +	}
> > +
> > +	len = sizeof(*cap_list) + sizeof(cap_list->cap[0]) * (numtc - 1);
> 
> I believe struct_size helper can be used here

Yes, I agree.

> 
> > +	cap_list = kzalloc(len, GFP_KERNEL);
> > +	if (!cap_list) {
> > +		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> > +		len = 0;
> > +		goto err;
> > +	}
> > +
> > +	cap_list->vsi_id = vsi->vsi_num;
> > +	cap_list->num_elem = numtc;
> > +
> > +	/* Store the UP2TC configuration from DCB to a user priority bitmap
> > +	 * of each TC. Each element of prio_of_tc represents one TC. Each
> > +	 * bitmap indicates the user priorities belong to this TC.
> > +	 */
> > +	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
> > +		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
> > +		tc_prio[tc] |= BIT(i);
> > +	}
> > +
> > +	for (i = 0; i < numtc; i++) {
> > +		cfg = &cap_list->cap[i];
> > +		cfg->tc_num = i;
> > +		cfg->tc_prio = tc_prio[i];
> > +		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
> > +		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
> > +		cfg->type = VIRTCHNL_BW_SHAPER;
> > +		cfg->shaper.committed = vsi_ctx-
> >sched.bw_t_info[i].cir_bw.bw;
> > +		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
> > +	}
> > +
> > +err:
> > +	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS,
> v_ret,
> > +				    (u8 *)cap_list, len);
> > +	kfree(cap_list);
> > +	return ret;
> > +}
> > +
> > +/**
> > + * ice_vf_cfg_qs_bw - Configure per queue bandwidth
> > + * @vf: pointer to the VF info
> > + * @num_queues: number of queues to be configured
> > + *
> > + * Configure per queue bandwidth.
> > + */
> > +static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues) {
> > +	struct ice_hw *hw = &vf->pf->hw;
> > +	struct ice_vsi *vsi;
> > +	u32 p_rate;
> > +	int ret;
> > +	u16 i;
> > +	u8 tc;
> > +
> > +	vsi = ice_get_vf_vsi(vf);
> > +	if (!vsi)
> > +		return VIRTCHNL_STATUS_ERR_PARAM;
> > +
> > +	for (i = 0; i < num_queues; i++) {
> > +		p_rate = vf->qs_bw[i].peak;
> > +		tc = vf->qs_bw[i].tc;
> > +		if (p_rate) {
> > +			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
> > +					       vf->qs_bw[i].queue_id,
> > +					       ICE_MAX_BW, p_rate);
> > +		} else {
> > +			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx,
> tc,
> > +						    vf->qs_bw[i].queue_id,
> > +						    ICE_MAX_BW);
> 
> These functions return kernel error codes...
> 
> > +		}
> > +		if (ret)
> > +			return ret;
> > +	}
> > + > +	return VIRTCHNL_STATUS_SUCCESS;
> 
> ... this and the error above are returning VIRTCHNL errors. These are not
> returning consistent types.

I will change it to return 0; to align to the kernel error codes.

> 
> > +}
> > +
> > +/**
> > + * ice_vf_cfg_q_quanta_profile
> > + * @vf: pointer to the VF info
> > + * @quanta_prof_idx: pointer to the quanta profile index
> > + * @quanta_size: quanta size to be set
> > + *
> > + * This function chooses available quanta profile and configures the
> register.
> > + * The quanta profile is evenly divided by the number of device
> > +ports, and then
> > + * available to the specific PF and VFs. The first profile for each
> > +PF is a
> > + * reserved default profile. Only quanta size of the rest unused
> > +profile can be
> > + * modified.
> > + */
> > +static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
> > +				       u16 *quanta_prof_idx)
> > +{
> > +	const u16 n_desc = calc_quanta_desc(quanta_size);
> > +	struct ice_hw *hw = &vf->pf->hw;
> > +	const u16 n_cmd = 2 * n_desc;
> > +	struct ice_pf *pf = vf->pf;
> > +	u16 per_pf, begin_id;
> > +	u8 n_used;
> > +	u32 reg;
> > +
> > +	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw-
> >dev_caps.num_funcs;
> > +	begin_id = hw->logical_pf_id * per_pf;
> > +	n_used = pf->n_quanta_prof_used;
> > +
> > +	if (quanta_size == ICE_DFLT_QUANTA) {
> > +		*quanta_prof_idx = begin_id;
> > +	} else {
> > +		if (n_used < per_pf) {
> > +			*quanta_prof_idx = begin_id + 1 + n_used;
> > +			pf->n_quanta_prof_used++;
> > +		} else {
> > +			return VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
> > +		}
> > +	}
> > +
> > +	reg = rd32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx));
> > +	reg &= ~GLCOMM_QUANTA_PROF_QUANTA_SIZE_M;
> > +	reg |= quanta_size << GLCOMM_QUANTA_PROF_QUANTA_SIZE_S;
> > +	reg &= ~GLCOMM_QUANTA_PROF_MAX_CMD_M;
> > +	reg |= n_cmd << GLCOMM_QUANTA_PROF_MAX_CMD_S;
> > +	reg &= ~GLCOMM_QUANTA_PROF_MAX_DESC_M;
> > +	reg |= n_desc << GLCOMM_QUANTA_PROF_MAX_DESC_S;
> > +	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
> > +
> > +	return VIRTCHNL_STATUS_SUCCESS;
> 
> Is this really supposed to return VIRTCHNL codes? That's not a standard
> convention the driver is doing.

It seems better to use kernel error code in this function,
I will change it in the next version.

> 
> > +err:
> > +	/* send the response to the VF */
> > +	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
> > +				    v_ret, NULL, 0);
> > +	return ret;
> 
> return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
> 			    v_ret, NULL, 0);

Got it.

> 
> >
> >   /**
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support
  2023-07-31 22:21 ` [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Tony Nguyen
@ 2023-08-01 18:43   ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-01 18:43 UTC (permalink / raw)
  To: Tony Nguyen, Wenjun Wu, intel-wired-lan; +Cc: mitu.aggarwal, qi.z.zhang


On 7/31/2023 3:21 PM, Tony Nguyen wrote:
>
>
> On 7/26/2023 7:10 PM, Wenjun Wu wrote:
>> To allow user to configure queue bandwidth, devlink port support
>> is added to support devlink port rate API. [1]
>>
>> Add devlink framework registration/unregistration on iavf driver
>> initialization and remove, and devlink port of 
>> DEVLINK_PORT_FLAVOUR_VIRTUAL
>> is created to be associated iavf netdevice.
>>
>> iavf rate tree with root node, queue nodes, and leaf node is created
>> and registered with devlink rate when iavf adapter is configured, and
>> if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
>> Capability Exchange.
>>
>> [root@localhost ~]# devlink port function rate show
>> pci/0000:af:01.0/txq_15: type node parent iavf_root
>> pci/0000:af:01.0/txq_14: type node parent iavf_root
>> pci/0000:af:01.0/txq_13: type node parent iavf_root
>> pci/0000:af:01.0/txq_12: type node parent iavf_root
>> pci/0000:af:01.0/txq_11: type node parent iavf_root
>> pci/0000:af:01.0/txq_10: type node parent iavf_root
>> pci/0000:af:01.0/txq_9: type node parent iavf_root
>> pci/0000:af:01.0/txq_8: type node parent iavf_root
>> pci/0000:af:01.0/txq_7: type node parent iavf_root
>> pci/0000:af:01.0/txq_6: type node parent iavf_root
>> pci/0000:af:01.0/txq_5: type node parent iavf_root
>> pci/0000:af:01.0/txq_4: type node parent iavf_root
>> pci/0000:af:01.0/txq_3: type node parent iavf_root
>> pci/0000:af:01.0/txq_2: type node parent iavf_root
>> pci/0000:af:01.0/txq_1: type node parent iavf_root
>> pci/0000:af:01.0/txq_0: type node parent iavf_root
>> pci/0000:af:01.0/iavf_root: type node
>>
>>
>>                           +---------+
>>                           |   root  |
>>                           +----+----+
>>                                |
>>              |-----------------|-----------------|
>>         +----v----+       +----v----+       +----v----+
>>         |  txq_0  |       |  txq_1  |       |  txq_x  |
>>         +----+----+       +----+----+       +----+----+
>>
>> User can configure the tx_max and tx_share of each queue. Once any 
>> one of the
>> queues are fully configured, VIRTCHNL opcodes of 
>> VIRTCHNL_OP_CONFIG_QUEUE_BW
>> and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues 
>> allocated
>> to VF
>>
>> Example:
>>
>> 1.To Set the queue tx_share:
>> devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps
>>
>> 2.To Set the queue tx_max:
>> devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps
>>
>> 3.To Show Current devlink port rate info:
>> devlink port function rate function show
>> [root@localhost ~]# devlink port function rate show
>> pci/0000:af:01.0/txq_15: type node parent iavf_root
>> pci/0000:af:01.0/txq_14: type node parent iavf_root
>> pci/0000:af:01.0/txq_13: type node parent iavf_root
>> pci/0000:af:01.0/txq_12: type node parent iavf_root
>> pci/0000:af:01.0/txq_11: type node parent iavf_root
>> pci/0000:af:01.0/txq_10: type node parent iavf_root
>> pci/0000:af:01.0/txq_9: type node parent iavf_root
>> pci/0000:af:01.0/txq_8: type node parent iavf_root
>> pci/0000:af:01.0/txq_7: type node parent iavf_root
>> pci/0000:af:01.0/txq_6: type node parent iavf_root
>> pci/0000:af:01.0/txq_5: type node parent iavf_root
>> pci/0000:af:01.0/txq_4: type node parent iavf_root
>> pci/0000:af:01.0/txq_3: type node parent iavf_root
>> pci/0000:af:01.0/txq_2: type node parent iavf_root
>> pci/0000:af:01.0/txq_1: type node parent iavf_root
>> pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit 
>> parent iavf_root
>> pci/0000:af:01.0/iavf_root: type node
>>
>>
>> [1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/ 
>>
>>
>>
>> Jun Zhang (3):
>>    iavf: Add devlink and devlink port support
>>    iavf: Add devlink port function rate API support
>>    iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
>>
>> Wenjun Wu (2):
>>    virtchnl: support queue rate limit and quanta size configuration
>>    ice: Support VF queue rate limit and quanta size configuration
>
>
> This series does not apply.
Nice find. Will do a rebase on V2
>
>> drivers/net/ethernet/intel/Kconfig            |   1 +
>>   drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
>>   drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
>>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 388 ++++++++++++++++++
>>   .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
>>   drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
>>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++-
>>   drivers/net/ethernet/intel/ice/ice.h          |   2 +
>>   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
>>   drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
>>   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
>>   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
>>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>>   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
>>   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 317 ++++++++++++++
>>   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
>>   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
>>   include/linux/avf/virtchnl.h                  | 113 +++++
>>   18 files changed, 1225 insertions(+), 3 deletions(-)
>>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
>>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
>>
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 0/5] iavf: Add devlink and devlink rate support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
@ 2023-08-08  1:57   ` Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields

---
Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 388 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 114 +++++
 18 files changed, 1221 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 0/5] iavf: Add devlink and devlink rate support
@ 2023-08-08  1:57   ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields

---
Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 388 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 114 +++++
 18 files changed, 1221 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08  1:57     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 114 +++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c15221dcb75e..10566a1458bb 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1367,6 +1376,83 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 /**
  * virtchnl_vc_validate_vf_msg
  * @ver: Virtchnl version info
@@ -1558,6 +1644,34 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = sizeof(struct virtchnl_queues_bw_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+			valid_len = struct_size(q_bw, cfg, q_bw->num_queues);
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 1/5] virtchnl: support queue rate limit and quanta size configuration
@ 2023-08-08  1:57     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 114 +++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index c15221dcb75e..10566a1458bb 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1367,6 +1376,83 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 /**
  * virtchnl_vc_validate_vf_msg
  * @ver: Virtchnl version info
@@ -1558,6 +1644,34 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = sizeof(struct virtchnl_queues_bw_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+			valid_len = struct_size(q_bw, cfg, q_bw->num_queues);
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08  1:57     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 372 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 34be1cb1e28f..677ab9571b3f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -642,6 +642,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 9ab9fb558b5e..efd01874434f 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index a0e43599eb55..910525a19a4c 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index a5429eca4350..504b367f1c77 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -850,6 +850,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 67172fdd9bc2..6499d83cc706 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u16 queue_id;
+	u32 committed;
+	u32 peak;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 85d996531502..9fc1a9d1bcd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	u32 p_rate;
+	int ret;
+	u16 i;
+	u8 tc;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate) {
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		} else {
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		}
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
+	begin_id = hw->logical_pf_id * per_pf;
+	n_used = pf->num_quanta_prof_used;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vf_qs_bw *qs_bw;
+	struct ice_vsi *vsi;
+	size_t len;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
+	qs_bw = kzalloc(len, GFP_KERNEL);
+	if (!qs_bw) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		goto err_bw;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+	memcpy(vf->qs_bw, qs_bw, len);
+
+err_bw:
+	kfree(qs_bw);
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	num_queues = qquanta->queue_select.num_queues;
+	quanta_size = qquanta->quanta_size;
+	end_qid = start_qid + num_queues;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4040,6 +4343,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: Support VF queue rate limit and quanta size configuration
@ 2023-08-08  1:57     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 372 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 34be1cb1e28f..677ab9571b3f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -642,6 +642,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 9ab9fb558b5e..efd01874434f 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index a0e43599eb55..910525a19a4c 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index a5429eca4350..504b367f1c77 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -850,6 +850,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 67172fdd9bc2..6499d83cc706 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u16 queue_id;
+	u32 committed;
+	u32 peak;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 85d996531502..9fc1a9d1bcd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	u32 p_rate;
+	int ret;
+	u16 i;
+	u8 tc;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate) {
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		} else {
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		}
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
+	begin_id = hw->logical_pf_id * per_pf;
+	n_used = pf->num_quanta_prof_used;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vf_qs_bw *qs_bw;
+	struct ice_vsi *vsi;
+	size_t len;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
+	qs_bw = kzalloc(len, GFP_KERNEL);
+	if (!qs_bw) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		goto err_bw;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+	memcpy(vf->qs_bw, qs_bw, len);
+
+err_bw:
+	kfree(qs_bw);
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	num_queues = qquanta->queue_select.num_queues;
+	quanta_size = qquanta->quanta_size;
+	end_qid = start_qid + num_queues;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4040,6 +4343,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 3/5] iavf: Add devlink and devlink port support
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08  1:57     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 9bc0a9519899..f916b8ef6acb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 8cbdebc5b698..519aeaec793c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,10 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	/* devlink & port data */
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..991d041e5922
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register allocated devlink instance for iavf adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Register the devlink instance associated with this iavf adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	/* Allocate devlink instance */
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	/* Init iavf adapter devlink */
+	adapter->devlink = devlink;
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	/* Create devlink port: attr/port flavour, port index */
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..5c122278611a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+/* iavf devlink structure pointing to iavf adapter */
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 7b300c86ceda..db010e68d5d2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2708,6 +2709,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2749,6 +2753,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4995,6 +5000,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err)
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+
+	/* Keep driver interface even on devlink registration failure */
 	return 0;
 
 err_ioremap:
@@ -5139,6 +5150,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 3/5] iavf: Add devlink and devlink port support
@ 2023-08-08  1:57     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 9bc0a9519899..f916b8ef6acb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 8cbdebc5b698..519aeaec793c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,10 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	/* devlink & port data */
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..991d041e5922
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register allocated devlink instance for iavf adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Register the devlink instance associated with this iavf adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	/* Allocate devlink instance */
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	/* Init iavf adapter devlink */
+	adapter->devlink = devlink;
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	/* Create devlink port: attr/port flavour, port index */
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..5c122278611a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+/* iavf devlink structure pointing to iavf adapter */
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 7b300c86ceda..db010e68d5d2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2708,6 +2709,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2749,6 +2753,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4995,6 +5000,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err)
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+
+	/* Keep driver interface even on devlink registration failure */
 	return 0;
 
 err_ioremap:
@@ -5139,6 +5150,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08  1:57     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 270 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 991d041e5922..a2bd5295c216 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,273 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num, size, i;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		size = q_num * sizeof(struct iavf_dev_rate_node);
+		dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		memset(dl_priv->queue_nodes, 0, size);
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag ==
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+		node->tx_max = node->tx_max_temp;
+		node->tx_share = node->tx_share_temp;
+	} else {
+		return 0;
+	}
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_max_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_share_set - devlink_rate API for setting tx share
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_share_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register allocated devlink instance for iavf adapter
@@ -30,7 +296,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	adapter->devlink = devlink;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 5c122278611a..897ff5fc87af 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,14 +4,35 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 /* iavf devlink structure pointing to iavf adapter */
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index db010e68d5d2..7348b65f9f19 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2709,8 +2710,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2753,6 +2756,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5150,6 +5154,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
@ 2023-08-08  1:57     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 270 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 991d041e5922..a2bd5295c216 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,273 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num, size, i;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		size = q_num * sizeof(struct iavf_dev_rate_node);
+		dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		memset(dl_priv->queue_nodes, 0, size);
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag ==
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+		node->tx_max = node->tx_max_temp;
+		node->tx_share = node->tx_share_temp;
+	} else {
+		return 0;
+	}
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_max_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+/**
+ * iavf_devlink_rate_node_tx_share_set - devlink_rate API for setting tx share
+ * @rate_node: devlink rate struct instance
+ *
+ * This function implements rate_node_tx_share_set function of devlink_ops
+ */
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register allocated devlink instance for iavf adapter
@@ -30,7 +296,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	adapter->devlink = devlink;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 5c122278611a..897ff5fc87af 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,14 +4,35 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 /* iavf devlink structure pointing to iavf adapter */
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index db010e68d5d2..7348b65f9f19 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2709,8 +2710,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2753,6 +2756,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5150,6 +5154,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08  1:57     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++++++++++-
 5 files changed, 313 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 519aeaec793c..e9b781cacffa 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -374,6 +380,7 @@ struct iavf_adapter {
 	/* devlink & port data */
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -423,6 +430,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -431,6 +440,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -577,6 +587,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index a2bd5295c216..dbe88eb538a8 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -96,6 +96,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -107,6 +131,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -294,6 +322,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	/* Init iavf adapter devlink */
 	adapter->devlink = devlink;
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 897ff5fc87af..a8a41f343f56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -34,5 +34,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 7348b65f9f19..5e27131e5104 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2130,6 +2130,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2712,7 +2727,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3135,6 +3152,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4900,7 +4930,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -5004,10 +5034,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps)
+		goto err_ioremap;
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
-	if (err)
+	if (err) {
 		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		kfree(adapter->qos_caps);
+		goto err_ioremap;
+	}
 
 	/* Keep driver interface even on devlink registration failure */
 	return 0;
@@ -5157,6 +5195,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index be3c007ce90a..7de4ad5029fb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1479,6 +1480,209 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	int q_idx;
+	int i, j;
+	u16 tc;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2138,6 +2342,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-08  1:57     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-08  1:57 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++++++++++-
 5 files changed, 313 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 519aeaec793c..e9b781cacffa 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -374,6 +380,7 @@ struct iavf_adapter {
 	/* devlink & port data */
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -423,6 +430,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -431,6 +440,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -577,6 +587,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index a2bd5295c216..dbe88eb538a8 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -96,6 +96,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -107,6 +131,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -294,6 +322,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	/* Init iavf adapter devlink */
 	adapter->devlink = devlink;
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 897ff5fc87af..a8a41f343f56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -34,5 +34,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 7348b65f9f19..5e27131e5104 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2130,6 +2130,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2712,7 +2727,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3135,6 +3152,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4900,7 +4930,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -5004,10 +5034,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps)
+		goto err_ioremap;
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
-	if (err)
+	if (err) {
 		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		kfree(adapter->qos_caps);
+		goto err_ioremap;
+	}
 
 	/* Keep driver interface even on devlink registration failure */
 	return 0;
@@ -5157,6 +5195,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index be3c007ce90a..7de4ad5029fb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1479,6 +1480,209 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	int q_idx;
+	int i, j;
+	u16 tc;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2138,6 +2342,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08 20:49       ` Simon Horman
  -1 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-08 20:49 UTC (permalink / raw)
  To: Wenjun Wu
  Cc: intel-wired-lan, netdev, xuejun.zhang, madhu.chittim, qi.z.zhang,
	anthony.l.nguyen

On Tue, Aug 08, 2023 at 09:57:33AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue based parameters, devlink port function
> rate api functions are added for setting node tx_max and tx_share
> parameters.
> 
> iavf rate tree with root node and  queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>

...

> +/**
> + * iavf_update_queue_tx_max - sets tx max parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets max BW limit.
> + */
> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
> +				    struct iavf_dev_rate_node *node,
> +				    u64 bw, struct netlink_ext_ack *extack)
> +{
> +	/* Keep in kbps */
> +	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +	if (ADV_LINK_SUPPORT(adapter)) {
> +		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
> +			return -EINVAL;
> +	}
> +
> +	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
> +
> +	return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
> + * @rate_node: devlink rate struct instance

Hi Jun Zhang,

Please describe all the parameters of iavf_devlink_rate_node_tx_max_set
in it's kernel doc.

./scripts/kernel-doc -none is your friend here.

> + *
> + * This function implements rate_node_tx_max_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
> +					     void *priv, u64 tx_max,
> +					     struct netlink_ext_ack *extack)

...

-- 
pw-bot: changes-requested

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
@ 2023-08-08 20:49       ` Simon Horman
  0 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-08 20:49 UTC (permalink / raw)
  To: Wenjun Wu; +Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan

On Tue, Aug 08, 2023 at 09:57:33AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue based parameters, devlink port function
> rate api functions are added for setting node tx_max and tx_share
> parameters.
> 
> iavf rate tree with root node and  queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>

...

> +/**
> + * iavf_update_queue_tx_max - sets tx max parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets max BW limit.
> + */
> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
> +				    struct iavf_dev_rate_node *node,
> +				    u64 bw, struct netlink_ext_ack *extack)
> +{
> +	/* Keep in kbps */
> +	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +	if (ADV_LINK_SUPPORT(adapter)) {
> +		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
> +			return -EINVAL;
> +	}
> +
> +	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
> +
> +	return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
> + * @rate_node: devlink rate struct instance

Hi Jun Zhang,

Please describe all the parameters of iavf_devlink_rate_node_tx_max_set
in it's kernel doc.

./scripts/kernel-doc -none is your friend here.

> + *
> + * This function implements rate_node_tx_max_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
> +					     void *priv, u64 tx_max,
> +					     struct netlink_ext_ack *extack)

...

-- 
pw-bot: changes-requested
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-08 20:54       ` Simon Horman
  -1 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-08 20:54 UTC (permalink / raw)
  To: Wenjun Wu
  Cc: intel-wired-lan, netdev, xuejun.zhang, madhu.chittim, qi.z.zhang,
	anthony.l.nguyen

On Tue, Aug 08, 2023 at 09:57:34AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>

...

> @@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
>  		if (!v_retval)
>  			iavf_netdev_features_vlan_strip_set(netdev, false);
>  		break;
> +	case VIRTCHNL_OP_GET_QOS_CAPS:
> +		u16 len = struct_size(adapter->qos_caps, cap,
> +				      IAVF_MAX_QOS_TC_NUM);

Hi Jun Zhang and Wenju Wu,

clang-16 complains about this quite a lot.
I think it is because it wants the declaration of len - and thus
the rest of this case - inside a block ({}).

 .../iavf_virtchnl.c:2691:3: error: expected expression
                 u16 len = struct_size(adapter->qos_caps, cap,
                 ^
 .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
                 memcpy(adapter->qos_caps, msg, min(msglen, len));
                                                            ^
 .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'

> +		memcpy(adapter->qos_caps, msg, min(msglen, len));
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUANTA:
> +		iavf_notify_queue_config_complete(adapter);
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +		break;
>  	default:
>  		if (adapter->current_op && (v_opcode != adapter->current_op))
>  			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
> -- 
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-08 20:54       ` Simon Horman
  0 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-08 20:54 UTC (permalink / raw)
  To: Wenjun Wu; +Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan

On Tue, Aug 08, 2023 at 09:57:34AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>

...

> @@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
>  		if (!v_retval)
>  			iavf_netdev_features_vlan_strip_set(netdev, false);
>  		break;
> +	case VIRTCHNL_OP_GET_QOS_CAPS:
> +		u16 len = struct_size(adapter->qos_caps, cap,
> +				      IAVF_MAX_QOS_TC_NUM);

Hi Jun Zhang and Wenju Wu,

clang-16 complains about this quite a lot.
I think it is because it wants the declaration of len - and thus
the rest of this case - inside a block ({}).

 .../iavf_virtchnl.c:2691:3: error: expected expression
                 u16 len = struct_size(adapter->qos_caps, cap,
                 ^
 .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
                 memcpy(adapter->qos_caps, msg, min(msglen, len));
                                                            ^
 .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'

> +		memcpy(adapter->qos_caps, msg, min(msglen, len));
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUANTA:
> +		iavf_notify_queue_config_complete(adapter);
> +		break;
> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +		break;
>  	default:
>  		if (adapter->current_op && (v_opcode != adapter->current_op))
>  			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
> -- 
> 2.34.1
> 
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
  2023-08-08 20:49       ` [Intel-wired-lan] " Simon Horman
@ 2023-08-09 18:43         ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-09 18:43 UTC (permalink / raw)
  To: Simon Horman, Wenjun Wu
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang


On 8/8/2023 1:49 PM, Simon Horman wrote:
> On Tue, Aug 08, 2023 at 09:57:33AM +0800, Wenjun Wu wrote:
>> From: Jun Zhang <xuejun.zhang@intel.com>
>>
>> To allow user to configure queue based parameters, devlink port function
>> rate api functions are added for setting node tx_max and tx_share
>> parameters.
>>
>> iavf rate tree with root node and  queue nodes is created and registered
>> with devlink rate when iavf adapter is configured.
>>
>> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ...
>
>> +/**
>> + * iavf_update_queue_tx_max - sets tx max parameter
>> + * @adapter: iavf adapter struct instance
>> + * @node: iavf rate node struct instance
>> + * @bw: bandwidth in bytes per second
>> + * @extack: extended netdev ack structure
>> + *
>> + * This function sets max BW limit.
>> + */
>> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
>> +				    struct iavf_dev_rate_node *node,
>> +				    u64 bw, struct netlink_ext_ack *extack)
>> +{
>> +	/* Keep in kbps */
>> +	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
>> +	if (ADV_LINK_SUPPORT(adapter)) {
>> +		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
>> +			return -EINVAL;
>> +	}
>> +
>> +	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
>> +
>> +	return iavf_check_update_config(adapter, node);
>> +}
>> +
>> +/**
>> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
>> + * @rate_node: devlink rate struct instance
> Hi Jun Zhang,
>
> Please describe all the parameters of iavf_devlink_rate_node_tx_max_set
> in it's kernel doc.
>
> ./scripts/kernel-doc -none is your friend here.

Thanks.

As this function is an implementation of Kernel API. To be in sync w/ 
kernel api definition documentation, will omit the function description 
here.

>> + *
>> + * This function implements rate_node_tx_max_set function of devlink_ops
>> + */
>> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
>> +					     void *priv, u64 tx_max,
>> +					     struct netlink_ext_ack *extack)
> ...
>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
@ 2023-08-09 18:43         ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-09 18:43 UTC (permalink / raw)
  To: Simon Horman, Wenjun Wu
  Cc: intel-wired-lan, netdev, madhu.chittim, qi.z.zhang, anthony.l.nguyen


On 8/8/2023 1:49 PM, Simon Horman wrote:
> On Tue, Aug 08, 2023 at 09:57:33AM +0800, Wenjun Wu wrote:
>> From: Jun Zhang <xuejun.zhang@intel.com>
>>
>> To allow user to configure queue based parameters, devlink port function
>> rate api functions are added for setting node tx_max and tx_share
>> parameters.
>>
>> iavf rate tree with root node and  queue nodes is created and registered
>> with devlink rate when iavf adapter is configured.
>>
>> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ...
>
>> +/**
>> + * iavf_update_queue_tx_max - sets tx max parameter
>> + * @adapter: iavf adapter struct instance
>> + * @node: iavf rate node struct instance
>> + * @bw: bandwidth in bytes per second
>> + * @extack: extended netdev ack structure
>> + *
>> + * This function sets max BW limit.
>> + */
>> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
>> +				    struct iavf_dev_rate_node *node,
>> +				    u64 bw, struct netlink_ext_ack *extack)
>> +{
>> +	/* Keep in kbps */
>> +	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
>> +	if (ADV_LINK_SUPPORT(adapter)) {
>> +		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
>> +			return -EINVAL;
>> +	}
>> +
>> +	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
>> +
>> +	return iavf_check_update_config(adapter, node);
>> +}
>> +
>> +/**
>> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
>> + * @rate_node: devlink rate struct instance
> Hi Jun Zhang,
>
> Please describe all the parameters of iavf_devlink_rate_node_tx_max_set
> in it's kernel doc.
>
> ./scripts/kernel-doc -none is your friend here.

Thanks.

As this function is an implementation of Kernel API. To be in sync w/ 
kernel api definition documentation, will omit the function description 
here.

>> + *
>> + * This function implements rate_node_tx_max_set function of devlink_ops
>> + */
>> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
>> +					     void *priv, u64 tx_max,
>> +					     struct netlink_ext_ack *extack)
> ...
>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-08 20:54       ` [Intel-wired-lan] " Simon Horman
@ 2023-08-09 18:44         ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-09 18:44 UTC (permalink / raw)
  To: Simon Horman, Wenjun Wu
  Cc: intel-wired-lan, netdev, madhu.chittim, qi.z.zhang, anthony.l.nguyen


On 8/8/2023 1:54 PM, Simon Horman wrote:
> On Tue, Aug 08, 2023 at 09:57:34AM +0800, Wenjun Wu wrote:
>> From: Jun Zhang <xuejun.zhang@intel.com>
> ...
>
>> @@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
>>   		if (!v_retval)
>>   			iavf_netdev_features_vlan_strip_set(netdev, false);
>>   		break;
>> +	case VIRTCHNL_OP_GET_QOS_CAPS:
>> +		u16 len = struct_size(adapter->qos_caps, cap,
>> +				      IAVF_MAX_QOS_TC_NUM);
> Hi Jun Zhang and Wenju Wu,
>
> clang-16 complains about this quite a lot.
> I think it is because it wants the declaration of len - and thus
> the rest of this case - inside a block ({}).
>
>   .../iavf_virtchnl.c:2691:3: error: expected expression
>                   u16 len = struct_size(adapter->qos_caps, cap,
>                   ^
>   .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
>                   memcpy(adapter->qos_caps, msg, min(msglen, len));
>                                                              ^
>   .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
Thanks for the solution.
>> +		memcpy(adapter->qos_caps, msg, min(msglen, len));
>> +		break;
>> +	case VIRTCHNL_OP_CONFIG_QUANTA:
>> +		iavf_notify_queue_config_complete(adapter);
>> +		break;
>> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
>> +		break;
>>   	default:
>>   		if (adapter->current_op && (v_opcode != adapter->current_op))
>>   			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
>> -- 
>> 2.34.1
>>
>>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-09 18:44         ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-09 18:44 UTC (permalink / raw)
  To: Simon Horman, Wenjun Wu
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang


On 8/8/2023 1:54 PM, Simon Horman wrote:
> On Tue, Aug 08, 2023 at 09:57:34AM +0800, Wenjun Wu wrote:
>> From: Jun Zhang <xuejun.zhang@intel.com>
> ...
>
>> @@ -2471,6 +2687,16 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
>>   		if (!v_retval)
>>   			iavf_netdev_features_vlan_strip_set(netdev, false);
>>   		break;
>> +	case VIRTCHNL_OP_GET_QOS_CAPS:
>> +		u16 len = struct_size(adapter->qos_caps, cap,
>> +				      IAVF_MAX_QOS_TC_NUM);
> Hi Jun Zhang and Wenju Wu,
>
> clang-16 complains about this quite a lot.
> I think it is because it wants the declaration of len - and thus
> the rest of this case - inside a block ({}).
>
>   .../iavf_virtchnl.c:2691:3: error: expected expression
>                   u16 len = struct_size(adapter->qos_caps, cap,
>                   ^
>   .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
>                   memcpy(adapter->qos_caps, msg, min(msglen, len));
>                                                              ^
>   .../iavf_virtchnl.c:2693:46: error: use of undeclared identifier 'len'
Thanks for the solution.
>> +		memcpy(adapter->qos_caps, msg, min(msglen, len));
>> +		break;
>> +	case VIRTCHNL_OP_CONFIG_QUANTA:
>> +		iavf_notify_queue_config_complete(adapter);
>> +		break;
>> +	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
>> +		break;
>>   	default:
>>   		if (adapter->current_op && (v_opcode != adapter->current_op))
>>   			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
>> -- 
>> 2.34.1
>>
>>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 0/5] iavf: Add devlink and devlink rate support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
@ 2023-08-16  3:33   ` Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v3:
- Rebase the code
- Changed rate node max/share set function description
- Put variable in local scope

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields

---
Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 376 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 230 ++++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 +++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 119 ++++++
 18 files changed, 1216 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 0/5] iavf: Add devlink and devlink rate support
@ 2023-08-16  3:33   ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v3:
- Rebase the code
- Changed rate node max/share set function description
- Put variable in local scope

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields

---
Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 376 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  39 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  60 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 230 ++++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 +++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 119 ++++++
 18 files changed, 1216 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  3:33     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 119 +++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index d0807ad43f93..0132c002ca06 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1377,6 +1386,85 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+#define virtchnl_qos_cap_list_LEGACY_SIZEOF	44
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+#define virtchnl_queues_bw_cfg_LEGACY_SIZEOF	16
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 #define __vss_byone(p, member, count, old)				      \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
 
@@ -1399,6 +1487,8 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 		 __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c),  \
 		 __vss(virtchnl_tc_info, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c),     \
+		 __vss(virtchnl_qos_cap_list, __vss_byelem, p, m, c),	      \
+		 __vss(virtchnl_queues_bw_cfg, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c),		      \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))
 
@@ -1595,6 +1685,35 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = virtchnl_queues_bw_cfg_LEGACY_SIZEOF;
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			valid_len = virtchnl_struct_size(q_bw, cfg,
+							 q_bw->num_queues);
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 1/5] virtchnl: support queue rate limit and quanta size configuration
@ 2023-08-16  3:33     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 119 +++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index d0807ad43f93..0132c002ca06 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1377,6 +1386,85 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+#define virtchnl_qos_cap_list_LEGACY_SIZEOF	44
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+#define virtchnl_queues_bw_cfg_LEGACY_SIZEOF	16
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 #define __vss_byone(p, member, count, old)				      \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
 
@@ -1399,6 +1487,8 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 		 __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c),  \
 		 __vss(virtchnl_tc_info, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c),     \
+		 __vss(virtchnl_qos_cap_list, __vss_byelem, p, m, c),	      \
+		 __vss(virtchnl_queues_bw_cfg, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c),		      \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))
 
@@ -1595,6 +1685,35 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = virtchnl_queues_bw_cfg_LEGACY_SIZEOF;
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			valid_len = virtchnl_struct_size(q_bw, cfg,
+							 q_bw->num_queues);
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  3:33     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 372 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 5d307bacf7c6..a4c9e6523fba 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -644,6 +644,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 7fa43827a3f0..2b9319801dc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2652e4f5c4a2..7076bc1d85ab 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2251,6 +2251,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2268,6 +2285,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index f6061b508857..e9164c866315 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -833,6 +833,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 48fea6fa0362..a6078b583b79 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u16 queue_id;
+	u32 committed;
+	u32 peak;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index b03426ac932b..3aec6b5ad3aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	u32 p_rate;
+	int ret;
+	u16 i;
+	u8 tc;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate) {
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		} else {
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		}
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
+	begin_id = hw->logical_pf_id * per_pf;
+	n_used = pf->num_quanta_prof_used;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vf_qs_bw *qs_bw;
+	struct ice_vsi *vsi;
+	size_t len;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
+	qs_bw = kzalloc(len, GFP_KERNEL);
+	if (!qs_bw) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		goto err_bw;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+	memcpy(vf->qs_bw, qs_bw, len);
+
+err_bw:
+	kfree(qs_bw);
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	num_queues = qquanta->queue_select.num_queues;
+	quanta_size = qquanta->quanta_size;
+	end_qid = start_qid + num_queues;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4039,6 +4342,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 2/5] ice: Support VF queue rate limit and quanta size configuration
@ 2023-08-16  3:33     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 372 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 5d307bacf7c6..a4c9e6523fba 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -644,6 +644,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 7fa43827a3f0..2b9319801dc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2652e4f5c4a2..7076bc1d85ab 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2251,6 +2251,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2268,6 +2285,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index f6061b508857..e9164c866315 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -833,6 +833,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 48fea6fa0362..a6078b583b79 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u16 queue_id;
+	u32 committed;
+	u32 peak;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index b03426ac932b..3aec6b5ad3aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	u32 p_rate;
+	int ret;
+	u16 i;
+	u8 tc;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate) {
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		} else {
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		}
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
+	begin_id = hw->logical_pf_id * per_pf;
+	n_used = pf->num_quanta_prof_used;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vf_qs_bw *qs_bw;
+	struct ice_vsi *vsi;
+	size_t len;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
+	qs_bw = kzalloc(len, GFP_KERNEL);
+	if (!qs_bw) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		goto err_bw;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+	memcpy(vf->qs_bw, qs_bw, len);
+
+err_bw:
+	kfree(qs_bw);
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	num_queues = qquanta->queue_select.num_queues;
+	quanta_size = qquanta->quanta_size;
+	end_qid = start_qid + num_queues;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4039,6 +4342,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 3/5] iavf: Add devlink and devlink port support
  2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  3:33     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index d57f70d6e4d4..5bda31c4c652 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 85fba85fbb23..eec294b5a426 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,10 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	/* devlink & port data */
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..991d041e5922
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register allocated devlink instance for iavf adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Register the devlink instance associated with this iavf adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	/* Allocate devlink instance */
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	/* Init iavf adapter devlink */
+	adapter->devlink = devlink;
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	/* Create devlink port: attr/port flavour, port index */
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..5c122278611a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+/* iavf devlink structure pointing to iavf adapter */
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index b23ca9d80189..1fb14f3f1ad0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2709,6 +2710,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2750,6 +2754,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4996,6 +5001,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err)
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+
+	/* Keep driver interface even on devlink registration failure */
 	return 0;
 
 err_ioremap:
@@ -5140,6 +5151,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 3/5] iavf: Add devlink and devlink port support
@ 2023-08-16  3:33     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
 6 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index d57f70d6e4d4..5bda31c4c652 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 85fba85fbb23..eec294b5a426 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,10 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	/* devlink & port data */
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..991d041e5922
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register allocated devlink instance for iavf adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Register the devlink instance associated with this iavf adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	/* Allocate devlink instance */
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	if (!devlink)
+		return -ENOMEM;
+
+	/* Init iavf adapter devlink */
+	adapter->devlink = devlink;
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	/* Create devlink port: attr/port flavour, port index */
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..5c122278611a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+/* iavf devlink structure pointing to iavf adapter */
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index b23ca9d80189..1fb14f3f1ad0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2709,6 +2710,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2750,6 +2754,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4996,6 +5001,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err)
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+
+	/* Keep driver interface even on devlink registration failure */
 	return 0;
 
 err_ioremap:
@@ -5140,6 +5151,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 4/5] iavf: Add devlink port function rate API support
  2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  3:33     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 258 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 283 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 991d041e5922..24ba3744859a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,261 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num, size, i;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		size = q_num * sizeof(struct iavf_dev_rate_node);
+		dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		memset(dl_priv->queue_nodes, 0, size);
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag ==
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+		node->tx_max = node->tx_max_temp;
+		node->tx_share = node->tx_share_temp;
+	} else {
+		return 0;
+	}
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register allocated devlink instance for iavf adapter
@@ -30,7 +284,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	adapter->devlink = devlink;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 5c122278611a..897ff5fc87af 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,14 +4,35 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 /* iavf devlink structure pointing to iavf adapter */
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 1fb14f3f1ad0..2aec6427d5e2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2710,8 +2711,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2754,6 +2757,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5151,6 +5155,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 4/5] iavf: Add devlink port function rate API support
@ 2023-08-16  3:33     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 258 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 283 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 991d041e5922..24ba3744859a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,261 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num, size, i;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		size = q_num * sizeof(struct iavf_dev_rate_node);
+		dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		memset(dl_priv->queue_nodes, 0, size);
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag ==
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+		node->tx_max = node->tx_max_temp;
+		node->tx_share = node->tx_share_temp;
+	} else {
+		return 0;
+	}
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register allocated devlink instance for iavf adapter
@@ -30,7 +284,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	adapter->devlink = devlink;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 5c122278611a..897ff5fc87af 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,14 +4,35 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 /* iavf devlink structure pointing to iavf adapter */
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 1fb14f3f1ad0..2aec6427d5e2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2710,8 +2711,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2754,6 +2757,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5151,6 +5155,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v3 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  3:33     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 230 +++++++++++++++++-
 5 files changed, 315 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index eec294b5a426..27a230f58816 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -374,6 +380,7 @@ struct iavf_adapter {
 	/* devlink & port data */
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -423,6 +430,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -431,6 +440,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -577,6 +587,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 24ba3744859a..0ab9a0a9823e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -96,6 +96,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -107,6 +131,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -282,6 +310,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	/* Init iavf adapter devlink */
 	adapter->devlink = devlink;
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 897ff5fc87af..a8a41f343f56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -34,5 +34,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 2aec6427d5e2..58795a15c09b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2131,6 +2131,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2713,7 +2728,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3136,6 +3153,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4901,7 +4931,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -5005,10 +5035,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps)
+		goto err_ioremap;
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
-	if (err)
+	if (err) {
 		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		kfree(adapter->qos_caps);
+		goto err_ioremap;
+	}
 
 	/* Keep driver interface even on devlink registration failure */
 	return 0;
@@ -5158,6 +5196,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index f9727e9c3d63..146f06831bd3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1465,6 +1466,209 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	int q_idx;
+	int i, j;
+	u16 tc;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2124,6 +2328,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2456,6 +2672,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS: {
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v3 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-16  3:33     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-16  3:33 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 230 +++++++++++++++++-
 5 files changed, 315 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index eec294b5a426..27a230f58816 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -374,6 +380,7 @@ struct iavf_adapter {
 	/* devlink & port data */
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -423,6 +430,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -431,6 +440,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -577,6 +587,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 24ba3744859a..0ab9a0a9823e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -96,6 +96,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -107,6 +131,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -282,6 +310,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	/* Init iavf adapter devlink */
 	adapter->devlink = devlink;
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 897ff5fc87af..a8a41f343f56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -34,5 +34,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 2aec6427d5e2..58795a15c09b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2131,6 +2131,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2713,7 +2728,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3136,6 +3153,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4901,7 +4931,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -5005,10 +5035,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps)
+		goto err_ioremap;
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
-	if (err)
+	if (err) {
 		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		kfree(adapter->qos_caps);
+		goto err_ioremap;
+	}
 
 	/* Keep driver interface even on devlink registration failure */
 	return 0;
@@ -5158,6 +5196,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index f9727e9c3d63..146f06831bd3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1465,6 +1466,209 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	int q_idx;
+	int i, j;
+	u16 tc;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2124,6 +2328,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2456,6 +2672,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS: {
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v3 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16  9:14       ` Simon Horman
  -1 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-16  9:14 UTC (permalink / raw)
  To: Wenjun Wu
  Cc: intel-wired-lan, netdev, xuejun.zhang, madhu.chittim, qi.z.zhang,
	anthony.l.nguyen

On Wed, Aug 16, 2023 at 11:33:53AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> iavf rate tree with root node and queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> User can configure the tx_max and tx_share of each queue. If any one of
> the queues have been fully updated by user, i.e. both tx_max and
> tx_share have been updated for that queue, VIRTCHNL opcodes of
> VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
> to PF to configure queues allocated to VF if PF indicates support of
> VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>

...

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c

...

> @@ -5005,10 +5035,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	/* Setup the wait queue for indicating virtchannel events */
>  	init_waitqueue_head(&adapter->vc_waitqueue);
>  
> +	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
> +	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
> +	if (!adapter->qos_caps)

Hi Jun Zhang and Wenjun Wu,

The goto below leads to the function returning err.
Should err be set to an error value here?

As flagged by Smatch and Coccinelle.

> +		goto err_ioremap;
> +
>  	/* Register iavf adapter with devlink */
>  	err = iavf_devlink_register(adapter);
> -	if (err)
> +	if (err) {
>  		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
> +		kfree(adapter->qos_caps);
> +		goto err_ioremap;
> +	}
>  
>  	/* Keep driver interface even on devlink registration failure */
>  	return 0;

...

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-16  9:14       ` Simon Horman
  0 siblings, 0 replies; 115+ messages in thread
From: Simon Horman @ 2023-08-16  9:14 UTC (permalink / raw)
  To: Wenjun Wu; +Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan

On Wed, Aug 16, 2023 at 11:33:53AM +0800, Wenjun Wu wrote:
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> iavf rate tree with root node and queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> User can configure the tx_max and tx_share of each queue. If any one of
> the queues have been fully updated by user, i.e. both tx_max and
> tx_share have been updated for that queue, VIRTCHNL opcodes of
> VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
> to PF to configure queues allocated to VF if PF indicates support of
> VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>

...

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c

...

> @@ -5005,10 +5035,18 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	/* Setup the wait queue for indicating virtchannel events */
>  	init_waitqueue_head(&adapter->vc_waitqueue);
>  
> +	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
> +	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
> +	if (!adapter->qos_caps)

Hi Jun Zhang and Wenjun Wu,

The goto below leads to the function returning err.
Should err be set to an error value here?

As flagged by Smatch and Coccinelle.

> +		goto err_ioremap;
> +
>  	/* Register iavf adapter with devlink */
>  	err = iavf_devlink_register(adapter);
> -	if (err)
> +	if (err) {
>  		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
> +		kfree(adapter->qos_caps);
> +		goto err_ioremap;
> +	}
>  
>  	/* Keep driver interface even on devlink registration failure */
>  	return 0;

...
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16 16:54       ` Brett Creeley
  -1 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 16:54 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> Add support to configure VF queue rate limit and quanta size.
> 
> For quanta size configuration, the quanta profiles are divided evenly
> by PF numbers. For each port, the first quanta profile is reserved for
> default. When VF is asked to set queue quanta size, PF will search for
> an available profile, change the fields and assigned this profile to the
> queue.
> 
> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> ---
>   drivers/net/ethernet/intel/ice/ice.h          |   2 +
>   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
>   drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
>   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
>   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
>   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
>   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
>   10 files changed, 372 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 34be1cb1e28f..677ab9571b3f 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -642,6 +642,8 @@ struct ice_pf {
>   #define ICE_VF_AGG_NODE_ID_START       65
>   #define ICE_MAX_VF_AGG_NODES           32
>          struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
> +
> +       u8 num_quanta_prof_used;
>   };
> 
>   extern struct workqueue_struct *ice_lag_wq;
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index 9ab9fb558b5e..efd01874434f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
>                  break;
>          }
> 
> +       tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
> +
>          tlan_ctx->tso_ena = ICE_TX_LEGACY;
>          tlan_ctx->tso_qnum = pf_q;
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index a0e43599eb55..910525a19a4c 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
>          ice_recalc_port_limited_caps(hw, &func_p->common_cap);
>   }
> 
> +/**
> + * ice_func_id_to_logical_id - map from function id to logical pf id
> + * @active_function_bitmap: active function bitmap
> + * @pf_id: function number of device
> + */
> +static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
> +{
> +       u8 logical_id = 0;
> +       u8 i;
> +
> +       for (i = 0; i < pf_id; i++)
> +               if (active_function_bitmap & BIT(i))
> +                       logical_id++;
> +
> +       return logical_id;
> +}
> +
>   /**
>    * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
>    * @hw: pointer to the HW struct
> @@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
>          dev_p->num_funcs = hweight32(number);
>          ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
>                    dev_p->num_funcs);
> +
> +       hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
>   }
> 
>   /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> index 6756f3d51d14..9da94e000394 100644
> --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> @@ -6,6 +6,14 @@
>   #ifndef _ICE_HW_AUTOGEN_H_
>   #define _ICE_HW_AUTOGEN_H_
> 
> +#define GLCOMM_QUANTA_PROF(_i)                 (0x002D2D68 + ((_i) * 4))
> +#define GLCOMM_QUANTA_PROF_MAX_INDEX           15
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S       0
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M       ICE_M(0x3FFF, 0)
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_S           16
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_M           ICE_M(0xFF, 16)
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_S          24
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_M          ICE_M(0x3F, 24)
>   #define QTX_COMM_DBELL(_DBQM)                  (0x002C0000 + ((_DBQM) * 4))
>   #define QTX_COMM_HEAD(_DBQM)                   (0x000E0000 + ((_DBQM) * 4))
>   #define QTX_COMM_HEAD_HEAD_S                   0
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
> index 166413fc33f4..7e152ab5b727 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -381,6 +381,8 @@ struct ice_tx_ring {
>          u8 flags;
>          u8 dcb_tc;                      /* Traffic class of ring */
>          u8 ptp_tx;
> +
> +       u16 quanta_prof_id;
>   } ____cacheline_internodealigned_in_smp;
> 
>   static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
> index a5429eca4350..504b367f1c77 100644
> --- a/drivers/net/ethernet/intel/ice/ice_type.h
> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
> @@ -850,6 +850,7 @@ struct ice_hw {
>          u8 revision_id;
> 
>          u8 pf_id;               /* device profile info */
> +       u8 logical_pf_id;
>          enum ice_phy_model phy_model;
> 
>          u16 max_burst_size;     /* driver sets this value */
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> index 67172fdd9bc2..6499d83cc706 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> @@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
>          u16 last_printed;
>   };
> 
> +struct ice_vf_qs_bw {
> +       u16 queue_id;
> +       u32 committed;
> +       u32 peak;
> +       u8 tc;
> +};

Nit, but you should re-arrange this struct to have the largest numbers 
first.

> +
>   /* VF operations */
>   struct ice_vf_ops {
>          enum ice_disq_rst_src reset_type;
> @@ -133,6 +140,8 @@ struct ice_vf {
> 
>          /* devlink port data */
>          struct devlink_port devlink_port;
> +
> +       struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
>   };
> 
>   /* Flags for controlling behavior of ice_reset_vf */
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index 85d996531502..9fc1a9d1bcd4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
>          if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
>                  vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
> 
> +       if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
> +               vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
> +
>          vfres->num_vsis = 1;
>          /* Tx and Rx queue are equal for VF */
>          vfres->num_queue_pairs = vsi->num_txq;
> @@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
>                                       NULL, 0);
>   }
> 
> +/**
> + * ice_vc_get_qos_caps - Get current QoS caps from PF
> + * @vf: pointer to the VF info
> + *
> + * Get VF's QoS capabilities, such as TC number, arbiter and
> + * bandwidth from PF.
> + */
> +static int ice_vc_get_qos_caps(struct ice_vf *vf)
> +{
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS; > +       struct virtchnl_qos_cap_list *cap_list = NULL;
> +       u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
> +       struct virtchnl_qos_cap_elem *cfg = NULL;
> +       struct ice_vsi_ctx *vsi_ctx;
> +       struct ice_pf *pf = vf->pf;
> +       struct ice_port_info *pi;
> +       struct ice_vsi *vsi;
> +       u8 numtc, tc;
> +       u16 len = 0;
> +       int ret, i;
> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       pi = pf->hw.port_info;
> +       numtc = vsi->tc_cfg.numtc;
> +
> +       vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
> +       if (!vsi_ctx) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       len = struct_size(cap_list, cap, numtc);
> +       cap_list = kzalloc(len, GFP_KERNEL);
> +       if (!cap_list) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +               len = 0;
> +               goto err;
> +       }
> +
> +       cap_list->vsi_id = vsi->vsi_num;
> +       cap_list->num_elem = numtc;
> +
> +       /* Store the UP2TC configuration from DCB to a user priority bitmap
> +        * of each TC. Each element of prio_of_tc represents one TC. Each
> +        * bitmap indicates the user priorities belong to this TC.
> +        */
> +       for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
> +               tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
> +               tc_prio[tc] |= BIT(i);
> +       }
> +
> +       for (i = 0; i < numtc; i++) {
> +               cfg = &cap_list->cap[i];
> +               cfg->tc_num = i;
> +               cfg->tc_prio = tc_prio[i];
> +               cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
> +               cfg->weight = VIRTCHNL_STRICT_WEIGHT;
> +               cfg->type = VIRTCHNL_BW_SHAPER;
> +               cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
> +               cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
> +       }
> +
> +err:
> +       ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
> +                                   (u8 *)cap_list, len);
> +       kfree(cap_list);
> +       return ret;
> +}
> +
> +/**
> + * ice_vf_cfg_qs_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @num_queues: number of queues to be configured
> + *
> + * Configure per queue bandwidth.
> + */
> +static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
> +{
> +       struct ice_hw *hw = &vf->pf->hw;
> +       struct ice_vsi *vsi;
> +       u32 p_rate;
> +       int ret; > +       u16 i;
> +       u8 tc;

Nit, the scope of p_rate and tc can be reduced to the for loop below.

> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi)
> +               return -EINVAL;
> +
> +       for (i = 0; i < num_queues; i++) {
> +               p_rate = vf->qs_bw[i].peak;
> +               tc = vf->qs_bw[i].tc;
> +               if (p_rate) {
> +                       ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
> +                                              vf->qs_bw[i].queue_id,
> +                                              ICE_MAX_BW, p_rate);
> +               } else {
> +                       ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
> +                                                   vf->qs_bw[i].queue_id,
> +                                                   ICE_MAX_BW);
> +               }

Nit, brackets not needed for single statement in the if/else blocks.

> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * ice_vf_cfg_q_quanta_profile
> + * @vf: pointer to the VF info
> + * @quanta_prof_idx: pointer to the quanta profile index
> + * @quanta_size: quanta size to be set
> + *
> + * This function chooses available quanta profile and configures the register.
> + * The quanta profile is evenly divided by the number of device ports, and then
> + * available to the specific PF and VFs. The first profile for each PF is a
> + * reserved default profile. Only quanta size of the rest unused profile can be
> + * modified.
> + */
> +static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
> +                                      u16 *quanta_prof_idx)
> +{
> +       const u16 n_desc = calc_quanta_desc(quanta_size);
> +       struct ice_hw *hw = &vf->pf->hw;
> +       const u16 n_cmd = 2 * n_desc;
> +       struct ice_pf *pf = vf->pf;
> +       u16 per_pf, begin_id;
> +       u8 n_used;
> +       u32 reg;
> +
> +       per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
> +       begin_id = hw->logical_pf_id * per_pf;
> +       n_used = pf->num_quanta_prof_used;

Nit, per_pf and n_used can be local to the first else block below.

> +
> +       if (quanta_size == ICE_DFLT_QUANTA) {
> +               *quanta_prof_idx = begin_id;
> +       } else {
> +               if (n_used < per_pf) {
> +                       *quanta_prof_idx = begin_id + 1 + n_used;
> +                       pf->num_quanta_prof_used++;
> +               } else {
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
> +             FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
> +             FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
> +       wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
> +
> +       return 0;
> +}
> +
>   /**
>    * ice_vc_cfg_promiscuous_mode_msg
>    * @vf: pointer to the VF info
> @@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
>                                       NULL, 0);
>   }
> 
> +/**
> + * ice_vc_cfg_q_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues bandwidth.
> + */
> +static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
> +{
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +       struct virtchnl_queues_bw_cfg *qbw =
> +               (struct virtchnl_queues_bw_cfg *)msg;
> +       struct ice_vf_qs_bw *qs_bw;
> +       struct ice_vsi *vsi;
> +       size_t len;
> +       u16 i;
> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
> +           !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi || vsi->vsi_num != qbw->vsi_id) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
> +           qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +               dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +                       vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
> +       qs_bw = kzalloc(len, GFP_KERNEL);
> +       if (!qs_bw) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +               goto err_bw;

If there's a reason you need to allocate this, the goto label should be 
"err" instead of "err_bw".

> +       }
> +
> +       for (i = 0; i < qbw->num_queues; i++) {
> +               qs_bw[i].queue_id = qbw->cfg[i].queue_id;
> +               qs_bw[i].peak = qbw->cfg[i].shaper.peak;
> +               qs_bw[i].committed = qbw->cfg[i].shaper.committed;
> +               qs_bw[i].tc = qbw->cfg[i].tc;
> +       }

Do you need to allocate qs_bw? What's stopping you from setting 
vf->qs_bw directly here? This would remove the kzalloc above and memcpy 
below. It would also get rid of the "err_bw" label.

Also, does the virtchnl_queues_bw_cfg need to be validated in any way?

> +
> +       memcpy(vf->qs_bw, qs_bw, len);
> +
> +err_bw:
> +       kfree(qs_bw);
> +
> +err:
> +       /* send the response to the VF */
> +       return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +                                   v_ret, NULL, 0);
> +}
> +
> +/**
> + * ice_vc_cfg_q_quanta - Configure per queue quanta
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues quanta.
> + */
> +static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
> +{
> +       u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +       struct virtchnl_quanta_cfg *qquanta =
> +               (struct virtchnl_quanta_cfg *)msg;
> +       struct ice_vsi *vsi;
> +       int ret;
> +
> +       start_qid = qquanta->queue_select.start_queue_id;
> +       num_queues = qquanta->queue_select.num_queues;
> +       quanta_size = qquanta->quanta_size;
> +       end_qid = start_qid + num_queues;

Does it make sense to set these right before they are used instead?

> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
> +           end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +               dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +                       vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (quanta_size > ICE_MAX_QUANTA_SIZE ||
> +           quanta_size < ICE_MIN_QUANTA_SIZE) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (quanta_size % 64) {
> +               dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
> +                                         &quanta_prof_id);
> +       if (ret) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
> +               goto err;
> +       }
> +
> +       for (i = start_qid; i < end_qid; i++)
> +               vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
> +
> +err:
> +       /* send the response to the VF */
> +       return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
> +                                    v_ret, NULL, 0);
> +}
> +
>   /**
>    * ice_vc_cfg_qs_msg
>    * @vf: pointer to the VF info
> @@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
>                  }
>          }
> 
> +       if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
> +               goto error_param;
> +
>          /* send the response to the VF */
>          return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
>                                       VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> @@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
>          .dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
>          .ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
>          .dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
> +       .get_qos_caps = ice_vc_get_qos_caps,
> +       .cfg_q_bw = ice_vc_cfg_q_bw,
> +       .cfg_q_quanta = ice_vc_cfg_q_quanta,
>   };
> 
>   /**
> @@ -4040,6 +4343,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
>          case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
>                  err = ops->dis_vlan_insertion_v2_msg(vf, msg);
>                  break;
> +       case VIRTCHNL_OP_GET_QOS_CAPS:
> +               err = ops->get_qos_caps(vf);
> +               break;
> +       case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +               err = ops->cfg_q_bw(vf, msg);
> +               break;
> +       case VIRTCHNL_OP_CONFIG_QUANTA:
> +               err = ops->cfg_q_quanta(vf, msg);
> +               break;
>          case VIRTCHNL_OP_UNKNOWN:
>          default:
>                  dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> index cd747718de73..0efb9c0f669a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> @@ -13,6 +13,13 @@
>   /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
>   #define ICE_MAX_VLAN_PER_VF            8
> 
> +#define ICE_DFLT_QUANTA 1024
> +#define ICE_MAX_QUANTA_SIZE 4096
> +#define ICE_MIN_QUANTA_SIZE 256
> +
> +#define calc_quanta_desc(x)    \
> +       max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
> +
>   /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
>    * broadcast, and 16 for additional unicast/multicast filters
>    */
> @@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
>          int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
>          int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
>          int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
> +       int (*get_qos_caps)(struct ice_vf *vf);
> +       int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
> +       int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
> +       int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
>   };
> 
>   #ifdef CONFIG_PCI_IOV
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> index 7d547fa616fa..2e3f63a429cd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> @@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
>          VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
>   };
> 
> +static const u32 tc_allowlist_opcodes[] = {
> +       VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +       VIRTCHNL_OP_CONFIG_QUANTA,
> +};
> +
>   struct allowlist_opcode_info {
>          const u32 *opcodes;
>          size_t size;
> @@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
> +       ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
>   };
> 
>   /**
> --
> 2.34.1
> 
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 2/5] ice: Support VF queue rate limit and quanta size configuration
@ 2023-08-16 16:54       ` Brett Creeley
  0 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 16:54 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> Add support to configure VF queue rate limit and quanta size.
> 
> For quanta size configuration, the quanta profiles are divided evenly
> by PF numbers. For each port, the first quanta profile is reserved for
> default. When VF is asked to set queue quanta size, PF will search for
> an available profile, change the fields and assigned this profile to the
> queue.
> 
> Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
> ---
>   drivers/net/ethernet/intel/ice/ice.h          |   2 +
>   drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
>   drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
>   .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
>   drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
>   drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
>   drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
>   drivers/net/ethernet/intel/ice/ice_virtchnl.c | 312 ++++++++++++++++++
>   drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
>   .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
>   10 files changed, 372 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 34be1cb1e28f..677ab9571b3f 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -642,6 +642,8 @@ struct ice_pf {
>   #define ICE_VF_AGG_NODE_ID_START       65
>   #define ICE_MAX_VF_AGG_NODES           32
>          struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
> +
> +       u8 num_quanta_prof_used;
>   };
> 
>   extern struct workqueue_struct *ice_lag_wq;
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
> index 9ab9fb558b5e..efd01874434f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
>                  break;
>          }
> 
> +       tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
> +
>          tlan_ctx->tso_ena = ICE_TX_LEGACY;
>          tlan_ctx->tso_qnum = pf_q;
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index a0e43599eb55..910525a19a4c 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
>          ice_recalc_port_limited_caps(hw, &func_p->common_cap);
>   }
> 
> +/**
> + * ice_func_id_to_logical_id - map from function id to logical pf id
> + * @active_function_bitmap: active function bitmap
> + * @pf_id: function number of device
> + */
> +static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
> +{
> +       u8 logical_id = 0;
> +       u8 i;
> +
> +       for (i = 0; i < pf_id; i++)
> +               if (active_function_bitmap & BIT(i))
> +                       logical_id++;
> +
> +       return logical_id;
> +}
> +
>   /**
>    * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
>    * @hw: pointer to the HW struct
> @@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
>          dev_p->num_funcs = hweight32(number);
>          ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
>                    dev_p->num_funcs);
> +
> +       hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
>   }
> 
>   /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> index 6756f3d51d14..9da94e000394 100644
> --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> @@ -6,6 +6,14 @@
>   #ifndef _ICE_HW_AUTOGEN_H_
>   #define _ICE_HW_AUTOGEN_H_
> 
> +#define GLCOMM_QUANTA_PROF(_i)                 (0x002D2D68 + ((_i) * 4))
> +#define GLCOMM_QUANTA_PROF_MAX_INDEX           15
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S       0
> +#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M       ICE_M(0x3FFF, 0)
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_S           16
> +#define GLCOMM_QUANTA_PROF_MAX_CMD_M           ICE_M(0xFF, 16)
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_S          24
> +#define GLCOMM_QUANTA_PROF_MAX_DESC_M          ICE_M(0x3F, 24)
>   #define QTX_COMM_DBELL(_DBQM)                  (0x002C0000 + ((_DBQM) * 4))
>   #define QTX_COMM_HEAD(_DBQM)                   (0x000E0000 + ((_DBQM) * 4))
>   #define QTX_COMM_HEAD_HEAD_S                   0
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
> index 166413fc33f4..7e152ab5b727 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -381,6 +381,8 @@ struct ice_tx_ring {
>          u8 flags;
>          u8 dcb_tc;                      /* Traffic class of ring */
>          u8 ptp_tx;
> +
> +       u16 quanta_prof_id;
>   } ____cacheline_internodealigned_in_smp;
> 
>   static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
> diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
> index a5429eca4350..504b367f1c77 100644
> --- a/drivers/net/ethernet/intel/ice/ice_type.h
> +++ b/drivers/net/ethernet/intel/ice/ice_type.h
> @@ -850,6 +850,7 @@ struct ice_hw {
>          u8 revision_id;
> 
>          u8 pf_id;               /* device profile info */
> +       u8 logical_pf_id;
>          enum ice_phy_model phy_model;
> 
>          u16 max_burst_size;     /* driver sets this value */
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> index 67172fdd9bc2..6499d83cc706 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> @@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
>          u16 last_printed;
>   };
> 
> +struct ice_vf_qs_bw {
> +       u16 queue_id;
> +       u32 committed;
> +       u32 peak;
> +       u8 tc;
> +};

Nit, but you should re-arrange this struct to have the largest numbers 
first.

> +
>   /* VF operations */
>   struct ice_vf_ops {
>          enum ice_disq_rst_src reset_type;
> @@ -133,6 +140,8 @@ struct ice_vf {
> 
>          /* devlink port data */
>          struct devlink_port devlink_port;
> +
> +       struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
>   };
> 
>   /* Flags for controlling behavior of ice_reset_vf */
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index 85d996531502..9fc1a9d1bcd4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
>          if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
>                  vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
> 
> +       if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
> +               vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
> +
>          vfres->num_vsis = 1;
>          /* Tx and Rx queue are equal for VF */
>          vfres->num_queue_pairs = vsi->num_txq;
> @@ -985,6 +988,170 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
>                                       NULL, 0);
>   }
> 
> +/**
> + * ice_vc_get_qos_caps - Get current QoS caps from PF
> + * @vf: pointer to the VF info
> + *
> + * Get VF's QoS capabilities, such as TC number, arbiter and
> + * bandwidth from PF.
> + */
> +static int ice_vc_get_qos_caps(struct ice_vf *vf)
> +{
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS; > +       struct virtchnl_qos_cap_list *cap_list = NULL;
> +       u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
> +       struct virtchnl_qos_cap_elem *cfg = NULL;
> +       struct ice_vsi_ctx *vsi_ctx;
> +       struct ice_pf *pf = vf->pf;
> +       struct ice_port_info *pi;
> +       struct ice_vsi *vsi;
> +       u8 numtc, tc;
> +       u16 len = 0;
> +       int ret, i;
> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       pi = pf->hw.port_info;
> +       numtc = vsi->tc_cfg.numtc;
> +
> +       vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
> +       if (!vsi_ctx) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       len = struct_size(cap_list, cap, numtc);
> +       cap_list = kzalloc(len, GFP_KERNEL);
> +       if (!cap_list) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +               len = 0;
> +               goto err;
> +       }
> +
> +       cap_list->vsi_id = vsi->vsi_num;
> +       cap_list->num_elem = numtc;
> +
> +       /* Store the UP2TC configuration from DCB to a user priority bitmap
> +        * of each TC. Each element of prio_of_tc represents one TC. Each
> +        * bitmap indicates the user priorities belong to this TC.
> +        */
> +       for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
> +               tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
> +               tc_prio[tc] |= BIT(i);
> +       }
> +
> +       for (i = 0; i < numtc; i++) {
> +               cfg = &cap_list->cap[i];
> +               cfg->tc_num = i;
> +               cfg->tc_prio = tc_prio[i];
> +               cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
> +               cfg->weight = VIRTCHNL_STRICT_WEIGHT;
> +               cfg->type = VIRTCHNL_BW_SHAPER;
> +               cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
> +               cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
> +       }
> +
> +err:
> +       ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
> +                                   (u8 *)cap_list, len);
> +       kfree(cap_list);
> +       return ret;
> +}
> +
> +/**
> + * ice_vf_cfg_qs_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @num_queues: number of queues to be configured
> + *
> + * Configure per queue bandwidth.
> + */
> +static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
> +{
> +       struct ice_hw *hw = &vf->pf->hw;
> +       struct ice_vsi *vsi;
> +       u32 p_rate;
> +       int ret; > +       u16 i;
> +       u8 tc;

Nit, the scope of p_rate and tc can be reduced to the for loop below.

> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi)
> +               return -EINVAL;
> +
> +       for (i = 0; i < num_queues; i++) {
> +               p_rate = vf->qs_bw[i].peak;
> +               tc = vf->qs_bw[i].tc;
> +               if (p_rate) {
> +                       ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
> +                                              vf->qs_bw[i].queue_id,
> +                                              ICE_MAX_BW, p_rate);
> +               } else {
> +                       ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
> +                                                   vf->qs_bw[i].queue_id,
> +                                                   ICE_MAX_BW);
> +               }

Nit, brackets not needed for single statement in the if/else blocks.

> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * ice_vf_cfg_q_quanta_profile
> + * @vf: pointer to the VF info
> + * @quanta_prof_idx: pointer to the quanta profile index
> + * @quanta_size: quanta size to be set
> + *
> + * This function chooses available quanta profile and configures the register.
> + * The quanta profile is evenly divided by the number of device ports, and then
> + * available to the specific PF and VFs. The first profile for each PF is a
> + * reserved default profile. Only quanta size of the rest unused profile can be
> + * modified.
> + */
> +static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
> +                                      u16 *quanta_prof_idx)
> +{
> +       const u16 n_desc = calc_quanta_desc(quanta_size);
> +       struct ice_hw *hw = &vf->pf->hw;
> +       const u16 n_cmd = 2 * n_desc;
> +       struct ice_pf *pf = vf->pf;
> +       u16 per_pf, begin_id;
> +       u8 n_used;
> +       u32 reg;
> +
> +       per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs;
> +       begin_id = hw->logical_pf_id * per_pf;
> +       n_used = pf->num_quanta_prof_used;

Nit, per_pf and n_used can be local to the first else block below.

> +
> +       if (quanta_size == ICE_DFLT_QUANTA) {
> +               *quanta_prof_idx = begin_id;
> +       } else {
> +               if (n_used < per_pf) {
> +                       *quanta_prof_idx = begin_id + 1 + n_used;
> +                       pf->num_quanta_prof_used++;
> +               } else {
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
> +             FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
> +             FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
> +       wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
> +
> +       return 0;
> +}
> +
>   /**
>    * ice_vc_cfg_promiscuous_mode_msg
>    * @vf: pointer to the VF info
> @@ -1587,6 +1754,136 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
>                                       NULL, 0);
>   }
> 
> +/**
> + * ice_vc_cfg_q_bw - Configure per queue bandwidth
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues bandwidth.
> + */
> +static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
> +{
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +       struct virtchnl_queues_bw_cfg *qbw =
> +               (struct virtchnl_queues_bw_cfg *)msg;
> +       struct ice_vf_qs_bw *qs_bw;
> +       struct ice_vsi *vsi;
> +       size_t len;
> +       u16 i;
> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
> +           !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi || vsi->vsi_num != qbw->vsi_id) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
> +           qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +               dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +                       vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       len = sizeof(struct ice_vf_qs_bw) * qbw->num_queues;
> +       qs_bw = kzalloc(len, GFP_KERNEL);
> +       if (!qs_bw) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +               goto err_bw;

If there's a reason you need to allocate this, the goto label should be 
"err" instead of "err_bw".

> +       }
> +
> +       for (i = 0; i < qbw->num_queues; i++) {
> +               qs_bw[i].queue_id = qbw->cfg[i].queue_id;
> +               qs_bw[i].peak = qbw->cfg[i].shaper.peak;
> +               qs_bw[i].committed = qbw->cfg[i].shaper.committed;
> +               qs_bw[i].tc = qbw->cfg[i].tc;
> +       }

Do you need to allocate qs_bw? What's stopping you from setting 
vf->qs_bw directly here? This would remove the kzalloc above and memcpy 
below. It would also get rid of the "err_bw" label.

Also, does the virtchnl_queues_bw_cfg need to be validated in any way?

> +
> +       memcpy(vf->qs_bw, qs_bw, len);
> +
> +err_bw:
> +       kfree(qs_bw);
> +
> +err:
> +       /* send the response to the VF */
> +       return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +                                   v_ret, NULL, 0);
> +}
> +
> +/**
> + * ice_vc_cfg_q_quanta - Configure per queue quanta
> + * @vf: pointer to the VF info
> + * @msg: pointer to the msg buffer which holds the command descriptor
> + *
> + * Configure VF queues quanta.
> + */
> +static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
> +{
> +       u16 quanta_prof_id, quanta_size, start_qid, num_queues, end_qid, i;
> +       enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
> +       struct virtchnl_quanta_cfg *qquanta =
> +               (struct virtchnl_quanta_cfg *)msg;
> +       struct ice_vsi *vsi;
> +       int ret;
> +
> +       start_qid = qquanta->queue_select.start_queue_id;
> +       num_queues = qquanta->queue_select.num_queues;
> +       quanta_size = qquanta->quanta_size;
> +       end_qid = start_qid + num_queues;

Does it make sense to set these right before they are used instead?

> +
> +       if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       vsi = ice_get_vf_vsi(vf);
> +       if (!vsi) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
> +           end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
> +               dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
> +                       vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (quanta_size > ICE_MAX_QUANTA_SIZE ||
> +           quanta_size < ICE_MIN_QUANTA_SIZE) {
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       if (quanta_size % 64) {
> +               dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
> +               v_ret = VIRTCHNL_STATUS_ERR_PARAM;
> +               goto err;
> +       }
> +
> +       ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
> +                                         &quanta_prof_id);
> +       if (ret) {
> +               v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
> +               goto err;
> +       }
> +
> +       for (i = start_qid; i < end_qid; i++)
> +               vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
> +
> +err:
> +       /* send the response to the VF */
> +       return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
> +                                    v_ret, NULL, 0);
> +}
> +
>   /**
>    * ice_vc_cfg_qs_msg
>    * @vf: pointer to the VF info
> @@ -1710,6 +2007,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
>                  }
>          }
> 
> +       if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
> +               goto error_param;
> +
>          /* send the response to the VF */
>          return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
>                                       VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> @@ -3687,6 +3987,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
>          .dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
>          .ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
>          .dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
> +       .get_qos_caps = ice_vc_get_qos_caps,
> +       .cfg_q_bw = ice_vc_cfg_q_bw,
> +       .cfg_q_quanta = ice_vc_cfg_q_quanta,
>   };
> 
>   /**
> @@ -4040,6 +4343,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
>          case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
>                  err = ops->dis_vlan_insertion_v2_msg(vf, msg);
>                  break;
> +       case VIRTCHNL_OP_GET_QOS_CAPS:
> +               err = ops->get_qos_caps(vf);
> +               break;
> +       case VIRTCHNL_OP_CONFIG_QUEUE_BW:
> +               err = ops->cfg_q_bw(vf, msg);
> +               break;
> +       case VIRTCHNL_OP_CONFIG_QUANTA:
> +               err = ops->cfg_q_quanta(vf, msg);
> +               break;
>          case VIRTCHNL_OP_UNKNOWN:
>          default:
>                  dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> index cd747718de73..0efb9c0f669a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> @@ -13,6 +13,13 @@
>   /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
>   #define ICE_MAX_VLAN_PER_VF            8
> 
> +#define ICE_DFLT_QUANTA 1024
> +#define ICE_MAX_QUANTA_SIZE 4096
> +#define ICE_MIN_QUANTA_SIZE 256
> +
> +#define calc_quanta_desc(x)    \
> +       max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
> +
>   /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
>    * broadcast, and 16 for additional unicast/multicast filters
>    */
> @@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
>          int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
>          int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
>          int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
> +       int (*get_qos_caps)(struct ice_vf *vf);
> +       int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
> +       int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
> +       int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
>   };
> 
>   #ifdef CONFIG_PCI_IOV
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> index 7d547fa616fa..2e3f63a429cd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
> @@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
>          VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
>   };
> 
> +static const u32 tc_allowlist_opcodes[] = {
> +       VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +       VIRTCHNL_OP_CONFIG_QUANTA,
> +};
> +
>   struct allowlist_opcode_info {
>          const u32 *opcodes;
>          size_t size;
> @@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
>          ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
> +       ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
>   };
> 
>   /**
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 3/5] iavf: Add devlink and devlink port support
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16 17:11       ` Brett Creeley
  -1 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:11 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue bandwidth, devlink port support
> is added to support devlink port rate API.
> 
> Add devlink framework registration/unregistration on iavf driver
> initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
> is created to be associated iavf net device.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   drivers/net/ethernet/intel/Kconfig            |  1 +
>   drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
>   drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
>   .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
>   6 files changed, 132 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
> 
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 9bc0a9519899..f916b8ef6acb 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -256,6 +256,7 @@ config I40EVF
>          tristate "Intel(R) Ethernet Adaptive Virtual Function support"
>          select IAVF
>          depends on PCI_MSI
> +       select NET_DEVLINK
>          help
>            This driver supports virtual functions for Intel XL710,
>            X710, X722, XXV710, and all devices advertising support for
> diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
> index 9c3e45c54d01..b5d7db97ab8b 100644
> --- a/drivers/net/ethernet/intel/iavf/Makefile
> +++ b/drivers/net/ethernet/intel/iavf/Makefile
> @@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
>   obj-$(CONFIG_IAVF) += iavf.o
> 
>   iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
> -            iavf_adv_rss.o \
> +            iavf_adv_rss.o iavf_devlink.o \
>               iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
> index 8cbdebc5b698..519aeaec793c 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -33,9 +33,11 @@
>   #include <net/udp.h>
>   #include <net/tc_act/tc_gact.h>
>   #include <net/tc_act/tc_mirred.h>
> +#include <net/devlink.h>
> 
>   #include "iavf_type.h"
>   #include <linux/avf/virtchnl.h>
> +#include "iavf_devlink.h"
>   #include "iavf_txrx.h"
>   #include "iavf_fdir.h"
>   #include "iavf_adv_rss.h"
> @@ -369,6 +371,10 @@ struct iavf_adapter {
>          struct net_device *netdev;
>          struct pci_dev *pdev;
> 
> +       /* devlink & port data */

Nit, this comment doesn't add anything to the reader.

> +       struct devlink *devlink;
> +       struct devlink_port devlink_port;
> +
>          struct iavf_hw hw; /* defined in iavf_type.h */
> 
>          enum iavf_state_t state;
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> new file mode 100644
> index 000000000000..991d041e5922
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2023 Intel Corporation */
> +
> +#include "iavf.h"
> +#include "iavf_devlink.h"
> +
> +static const struct devlink_ops iavf_devlink_ops = {};
> +
> +/**
> + * iavf_devlink_register - Register allocated devlink instance for iavf adapter
> + * @adapter: the iavf adapter to register the devlink for.
> + *
> + * Register the devlink instance associated with this iavf adapter

Nit, seems like this is a duplicate of what you above.

> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int iavf_devlink_register(struct iavf_adapter *adapter)
> +{
> +       struct device *dev = &adapter->pdev->dev;
> +       struct iavf_devlink *ref;
> +       struct devlink *devlink;
> +
> +       /* Allocate devlink instance */

Nit, unnecessary comment.

> +       devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
> +                               dev);
> +       if (!devlink)
> +               return -ENOMEM;
> +
> +       /* Init iavf adapter devlink */

Nit, unnecessary comment.

> +       adapter->devlink = devlink;
> +       ref = devlink_priv(devlink);
> +       ref->devlink_ref = adapter;
> +
> +       devlink_register(devlink);
> +
> +       return 0;
> +}
> +
> +/**
> + * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
> + * @adapter: the iavf adapter structure
> + *
> + * Releases resources used by devlink and cleans up associated memory.
> + */
> +void iavf_devlink_unregister(struct iavf_adapter *adapter)
> +{
> +       devlink_unregister(adapter->devlink);
> +       devlink_free(adapter->devlink);

Seems like you should bail out if (!adapter->devlink) since allocation 
of adapter->devlink can fail in iavf_devlink_register().

> +}
> +
> +/**
> + * iavf_devlink_port_register - Register devlink port for iavf adapter
> + * @adapter: the iavf adapter to register the devlink port for.
> + *
> + * Register the devlink port instance associated with this iavf adapter
> + * before iavf adapter registers with netdevice
> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int iavf_devlink_port_register(struct iavf_adapter *adapter)
> +{
> +       struct device *dev = &adapter->pdev->dev;
> +       struct devlink_port_attrs attrs = {};
> +       int err;
> +
> +       /* Create devlink port: attr/port flavour, port index */

Nit, unnecessary comment.

> +       SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
> +       attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
> +       memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
> +       devlink_port_attrs_set(&adapter->devlink_port, &attrs);
> +
> +       /* Register with driver specific index (device id) */
> +       err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
> +                                   adapter->hw.bus.device);
> +       if (err)
> +               dev_err(dev, "devlink port registration failed: %d\n", err);
> +
> +       return err;
> +}
> +
> +/**
> + * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
> + * @adapter: the iavf adapter structure
> + *
> + * Releases resources used by devlink port and registration with devlink.
> + */
> +void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
> +{
> +       if (!adapter->devlink_port.registered)
> +               return;
> +
> +       devlink_port_unregister(&adapter->devlink_port);
> +}
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> new file mode 100644
> index 000000000000..5c122278611a
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2023 Intel Corporation */
> +
> +#ifndef _IAVF_DEVLINK_H_
> +#define _IAVF_DEVLINK_H_
> +
> +/* iavf devlink structure pointing to iavf adapter */

Nit, unnecessary comment.

> +struct iavf_devlink {
> +       struct iavf_adapter *devlink_ref;       /* ref to iavf adapter */
> +};
> +
> +int iavf_devlink_register(struct iavf_adapter *adapter);
> +void iavf_devlink_unregister(struct iavf_adapter *adapter);
> +int iavf_devlink_port_register(struct iavf_adapter *adapter);
> +void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
> +
> +#endif /* _IAVF_DEVLINK_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 7b300c86ceda..db010e68d5d2 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
>                                  iavf_free_rss(adapter);
>                                  iavf_free_misc_irq(adapter);
>                                  iavf_reset_interrupt_capability(adapter);
> +                               iavf_devlink_port_unregister(adapter);
>                                  iavf_change_state(adapter,
>                                                    __IAVF_INIT_CONFIG_ADAPTER);
>                                  goto out;
> @@ -2708,6 +2709,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>          if (err)
>                  goto err_sw_init;
> 
> +       if (!adapter->netdev_registered)
> +               iavf_devlink_port_register(adapter);
> +
>          netif_carrier_off(netdev);
>          adapter->link_up = false;
>          netif_tx_stop_all_queues(netdev);
> @@ -2749,6 +2753,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>   err_mem:
>          iavf_free_rss(adapter);
>          iavf_free_misc_irq(adapter);
> +       iavf_devlink_port_unregister(adapter);
>   err_sw_init:
>          iavf_reset_interrupt_capability(adapter);
>   err:
> @@ -4995,6 +5000,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>          /* Setup the wait queue for indicating virtchannel events */
>          init_waitqueue_head(&adapter->vc_waitqueue);
> 
> +       /* Register iavf adapter with devlink */
> +       err = iavf_devlink_register(adapter);
> +       if (err)
> +               dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
> +
> +       /* Keep driver interface even on devlink registration failure */
>          return 0;
> 
>   err_ioremap:
> @@ -5139,6 +5150,9 @@ static void iavf_remove(struct pci_dev *pdev)
>                                   err);
>          }
> 
> +       iavf_devlink_port_unregister(adapter);
> +       iavf_devlink_unregister(adapter);
> +
>          mutex_lock(&adapter->crit_lock);
>          dev_info(&adapter->pdev->dev, "Removing device\n");
>          iavf_change_state(adapter, __IAVF_REMOVE);
> --
> 2.34.1
> 
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 3/5] iavf: Add devlink and devlink port support
@ 2023-08-16 17:11       ` Brett Creeley
  0 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:11 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue bandwidth, devlink port support
> is added to support devlink port rate API.
> 
> Add devlink framework registration/unregistration on iavf driver
> initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
> is created to be associated iavf net device.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   drivers/net/ethernet/intel/Kconfig            |  1 +
>   drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
>   drivers/net/ethernet/intel/iavf/iavf.h        |  6 ++
>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 93 +++++++++++++++++++
>   .../net/ethernet/intel/iavf/iavf_devlink.h    | 17 ++++
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 14 +++
>   6 files changed, 132 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
>   create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
> 
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 9bc0a9519899..f916b8ef6acb 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -256,6 +256,7 @@ config I40EVF
>          tristate "Intel(R) Ethernet Adaptive Virtual Function support"
>          select IAVF
>          depends on PCI_MSI
> +       select NET_DEVLINK
>          help
>            This driver supports virtual functions for Intel XL710,
>            X710, X722, XXV710, and all devices advertising support for
> diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
> index 9c3e45c54d01..b5d7db97ab8b 100644
> --- a/drivers/net/ethernet/intel/iavf/Makefile
> +++ b/drivers/net/ethernet/intel/iavf/Makefile
> @@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
>   obj-$(CONFIG_IAVF) += iavf.o
> 
>   iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
> -            iavf_adv_rss.o \
> +            iavf_adv_rss.o iavf_devlink.o \
>               iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
> index 8cbdebc5b698..519aeaec793c 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -33,9 +33,11 @@
>   #include <net/udp.h>
>   #include <net/tc_act/tc_gact.h>
>   #include <net/tc_act/tc_mirred.h>
> +#include <net/devlink.h>
> 
>   #include "iavf_type.h"
>   #include <linux/avf/virtchnl.h>
> +#include "iavf_devlink.h"
>   #include "iavf_txrx.h"
>   #include "iavf_fdir.h"
>   #include "iavf_adv_rss.h"
> @@ -369,6 +371,10 @@ struct iavf_adapter {
>          struct net_device *netdev;
>          struct pci_dev *pdev;
> 
> +       /* devlink & port data */

Nit, this comment doesn't add anything to the reader.

> +       struct devlink *devlink;
> +       struct devlink_port devlink_port;
> +
>          struct iavf_hw hw; /* defined in iavf_type.h */
> 
>          enum iavf_state_t state;
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> new file mode 100644
> index 000000000000..991d041e5922
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> @@ -0,0 +1,93 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2023 Intel Corporation */
> +
> +#include "iavf.h"
> +#include "iavf_devlink.h"
> +
> +static const struct devlink_ops iavf_devlink_ops = {};
> +
> +/**
> + * iavf_devlink_register - Register allocated devlink instance for iavf adapter
> + * @adapter: the iavf adapter to register the devlink for.
> + *
> + * Register the devlink instance associated with this iavf adapter

Nit, seems like this is a duplicate of what you above.

> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int iavf_devlink_register(struct iavf_adapter *adapter)
> +{
> +       struct device *dev = &adapter->pdev->dev;
> +       struct iavf_devlink *ref;
> +       struct devlink *devlink;
> +
> +       /* Allocate devlink instance */

Nit, unnecessary comment.

> +       devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
> +                               dev);
> +       if (!devlink)
> +               return -ENOMEM;
> +
> +       /* Init iavf adapter devlink */

Nit, unnecessary comment.

> +       adapter->devlink = devlink;
> +       ref = devlink_priv(devlink);
> +       ref->devlink_ref = adapter;
> +
> +       devlink_register(devlink);
> +
> +       return 0;
> +}
> +
> +/**
> + * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
> + * @adapter: the iavf adapter structure
> + *
> + * Releases resources used by devlink and cleans up associated memory.
> + */
> +void iavf_devlink_unregister(struct iavf_adapter *adapter)
> +{
> +       devlink_unregister(adapter->devlink);
> +       devlink_free(adapter->devlink);

Seems like you should bail out if (!adapter->devlink) since allocation 
of adapter->devlink can fail in iavf_devlink_register().

> +}
> +
> +/**
> + * iavf_devlink_port_register - Register devlink port for iavf adapter
> + * @adapter: the iavf adapter to register the devlink port for.
> + *
> + * Register the devlink port instance associated with this iavf adapter
> + * before iavf adapter registers with netdevice
> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int iavf_devlink_port_register(struct iavf_adapter *adapter)
> +{
> +       struct device *dev = &adapter->pdev->dev;
> +       struct devlink_port_attrs attrs = {};
> +       int err;
> +
> +       /* Create devlink port: attr/port flavour, port index */

Nit, unnecessary comment.

> +       SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
> +       attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
> +       memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
> +       devlink_port_attrs_set(&adapter->devlink_port, &attrs);
> +
> +       /* Register with driver specific index (device id) */
> +       err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
> +                                   adapter->hw.bus.device);
> +       if (err)
> +               dev_err(dev, "devlink port registration failed: %d\n", err);
> +
> +       return err;
> +}
> +
> +/**
> + * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
> + * @adapter: the iavf adapter structure
> + *
> + * Releases resources used by devlink port and registration with devlink.
> + */
> +void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
> +{
> +       if (!adapter->devlink_port.registered)
> +               return;
> +
> +       devlink_port_unregister(&adapter->devlink_port);
> +}
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> new file mode 100644
> index 000000000000..5c122278611a
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2023 Intel Corporation */
> +
> +#ifndef _IAVF_DEVLINK_H_
> +#define _IAVF_DEVLINK_H_
> +
> +/* iavf devlink structure pointing to iavf adapter */

Nit, unnecessary comment.

> +struct iavf_devlink {
> +       struct iavf_adapter *devlink_ref;       /* ref to iavf adapter */
> +};
> +
> +int iavf_devlink_register(struct iavf_adapter *adapter);
> +void iavf_devlink_unregister(struct iavf_adapter *adapter);
> +int iavf_devlink_port_register(struct iavf_adapter *adapter);
> +void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
> +
> +#endif /* _IAVF_DEVLINK_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 7b300c86ceda..db010e68d5d2 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
>                                  iavf_free_rss(adapter);
>                                  iavf_free_misc_irq(adapter);
>                                  iavf_reset_interrupt_capability(adapter);
> +                               iavf_devlink_port_unregister(adapter);
>                                  iavf_change_state(adapter,
>                                                    __IAVF_INIT_CONFIG_ADAPTER);
>                                  goto out;
> @@ -2708,6 +2709,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>          if (err)
>                  goto err_sw_init;
> 
> +       if (!adapter->netdev_registered)
> +               iavf_devlink_port_register(adapter);
> +
>          netif_carrier_off(netdev);
>          adapter->link_up = false;
>          netif_tx_stop_all_queues(netdev);
> @@ -2749,6 +2753,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>   err_mem:
>          iavf_free_rss(adapter);
>          iavf_free_misc_irq(adapter);
> +       iavf_devlink_port_unregister(adapter);
>   err_sw_init:
>          iavf_reset_interrupt_capability(adapter);
>   err:
> @@ -4995,6 +5000,12 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>          /* Setup the wait queue for indicating virtchannel events */
>          init_waitqueue_head(&adapter->vc_waitqueue);
> 
> +       /* Register iavf adapter with devlink */
> +       err = iavf_devlink_register(adapter);
> +       if (err)
> +               dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
> +
> +       /* Keep driver interface even on devlink registration failure */
>          return 0;
> 
>   err_ioremap:
> @@ -5139,6 +5150,9 @@ static void iavf_remove(struct pci_dev *pdev)
>                                   err);
>          }
> 
> +       iavf_devlink_port_unregister(adapter);
> +       iavf_devlink_unregister(adapter);
> +
>          mutex_lock(&adapter->crit_lock);
>          dev_info(&adapter->pdev->dev, "Removing device\n");
>          iavf_change_state(adapter, __IAVF_REMOVE);
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16 17:27       ` Brett Creeley
  -1 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:27 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue based parameters, devlink port function
> rate api functions are added for setting node tx_max and tx_share
> parameters.
> 
> iavf rate tree with root node and  queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 270 +++++++++++++++++-
>   .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
>   drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
>   3 files changed, 295 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> index 991d041e5922..a2bd5295c216 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> @@ -4,7 +4,273 @@
>   #include "iavf.h"
>   #include "iavf_devlink.h"
> 
> -static const struct devlink_ops iavf_devlink_ops = {};
> +/**
> + * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
> + * @adapter: iavf adapter struct instance
> + *
> + * This function builds Rate Tree based on iavf adapter configuration
> + * and exports it's contents to devlink rate.
> + */
> +void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       struct iavf_dev_rate_node *iavf_r_node;
> +       struct iavf_dev_rate_node *iavf_q_node;
> +       struct devlink_rate *dl_root_node;
> +       struct devlink_rate *dl_tmp_node;
> +       int q_num, size, i;
> +
> +       if (!adapter->devlink_port.registered)
> +               return;
> +
> +       iavf_r_node = &dl_priv->root_node;
> +       memset(iavf_r_node, 0, sizeof(*iavf_r_node));
> +       iavf_r_node->tx_max = adapter->link_speed;
> +       strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
> +
> +       devl_lock(adapter->devlink);
> +       dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
> +                                            iavf_r_node->name, NULL);
> +       if (!dl_root_node || IS_ERR(dl_root_node))
> +               goto err_node;
> +
> +       iavf_r_node->rate_node = dl_root_node;
> +
> +       /* Allocate queue nodes, and chain them under root */
> +       q_num = adapter->num_active_queues;
> +       if (q_num > 0) {
> +               size = q_num * sizeof(struct iavf_dev_rate_node);
> +               dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
> +               if (!dl_priv->queue_nodes)
> +                       goto err_node;
> +
> +               memset(dl_priv->queue_nodes, 0, size);

Why not just use kcalloc() here? Also, it seems like there's no need to 
zero the memory here.

> +
> +               for (i = 0; i < q_num; ++i) {
> +                       iavf_q_node = &dl_priv->queue_nodes[i];
> +                       snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
> +                                "txq_%d", i);
> +                       dl_tmp_node = devl_rate_node_create(adapter->devlink,
> +                                                           iavf_q_node,
> +                                                           iavf_q_node->name,
> +                                                           dl_root_node);
> +                       if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
> +                               kfree(dl_priv->queue_nodes); > +                               goto err_node;
> +                       }
> +
> +                       iavf_q_node->rate_node = dl_tmp_node;
> +                       iavf_q_node->tx_max = IAVF_TX_DEFAULT;
> +                       iavf_q_node->tx_share = 0;
> +               }
> +       }
> +
> +       dl_priv->update_in_progress = false;
> +       dl_priv->iavf_dev_rate_initialized = true;
> +       devl_unlock(adapter->devlink);
> +       return;
> +err_node:
> +       devl_rate_nodes_destroy(adapter->devlink);
> +       dl_priv->iavf_dev_rate_initialized = false;
> +       devl_unlock(adapter->devlink);
> +}
> +
> +/**
> + * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
> + * @adapter: iavf adapter struct instance
> + *
> + * This function unregisters the current iavf rate tree registered with devlink
> + * rate and frees resources.
> + */
> +void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +
> +       if (!dl_priv->iavf_dev_rate_initialized)
> +               return;
> +
> +       devl_lock(adapter->devlink);
> +       devl_rate_leaf_destroy(&adapter->devlink_port);
> +       devl_rate_nodes_destroy(adapter->devlink);
> +       kfree(dl_priv->queue_nodes);
> +       devl_unlock(adapter->devlink);
> +}
> +
> +/**
> + * iavf_check_update_config - check if updating queue parameters needed
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + *
> + * This function sets queue bw & quanta size configuration if all
> + * queue parameters are set
> + */
> +static int iavf_check_update_config(struct iavf_adapter *adapter,
> +                                   struct iavf_dev_rate_node *node)
> +{
> +       /* Update queue bw if any one of the queues have been fully updated by
> +        * user, the other queues either use the default value or the last
> +        * fully updated value
> +        */
> +       if (node->tx_update_flag ==
> +           (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
> +               node->tx_max = node->tx_max_temp;
> +               node->tx_share = node->tx_share_temp;
> +       } else {
> +               return 0;
> +       }

I think it would more readable to do the following:

if (node->tx_update_flag !=
     (IAVE_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED))
	return 0;

/* rest of function */
> +
> +       /* Reconfig queue bw only when iavf driver on running state */
> +       if (adapter->state != __IAVF_RUNNING)
> +               return -EBUSY;
> +
> +       return 0;
> +}
> +
> +/**
> + * iavf_update_queue_tx_share - sets tx min parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets min BW limit.
> + */
> +static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
> +                                     struct iavf_dev_rate_node *node,
> +                                     u64 bw, struct netlink_ext_ack *extack)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       u64 tx_share_sum = 0;
> +
> +       /* Keep in kbps */
> +       node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +
> +       if (ADV_LINK_SUPPORT(adapter)) {
> +               int i;
> +
> +               for (i = 0; i < adapter->num_active_queues; ++i) {
> +                       if (node != &dl_priv->queue_nodes[i])
> +                               tx_share_sum +=
> +                                       dl_priv->queue_nodes[i].tx_share;
> +                       else
> +                               tx_share_sum += node->tx_share_temp;
> +               }
> +
> +               if (tx_share_sum / 1000  > adapter->link_speed_mbps)
> +                       return -EINVAL;
> +       }
> +
> +       node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
> +       return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_update_queue_tx_max - sets tx max parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets max BW limit.
> + */
> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
> +                                   struct iavf_dev_rate_node *node,
> +                                   u64 bw, struct netlink_ext_ack *extack)
> +{
> +       /* Keep in kbps */
> +       node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +       if (ADV_LINK_SUPPORT(adapter)) {
> +               if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
> +                       return -EINVAL;
> +       }
> +
> +       node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
> +
> +       return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
> + * @rate_node: devlink rate struct instance
> + *
> + * This function implements rate_node_tx_max_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
> +                                            void *priv, u64 tx_max,
> +                                            struct netlink_ext_ack *extack)
> +{
> +       struct iavf_dev_rate_node *node = priv;
> +       struct iavf_devlink *dl_priv;
> +       struct iavf_adapter *adapter;
> +
> +       if (!node)
> +               return 0;
> +
> +       dl_priv = devlink_priv(rate_node->devlink);
> +       adapter = dl_priv->devlink_ref;
> +
> +       /* Check if last update is in progress */
> +       if (dl_priv->update_in_progress)
> +               return -EBUSY;
> +
> +       if (node == &dl_priv->root_node)
> +               return 0;
> +
> +       return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_share_set - devlink_rate API for setting tx share
> + * @rate_node: devlink rate struct instance
> + *
> + * This function implements rate_node_tx_share_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
> +                                              void *priv, u64 tx_share,
> +                                              struct netlink_ext_ack *extack)
> +{
> +       struct iavf_dev_rate_node *node = priv;
> +       struct iavf_devlink *dl_priv;
> +       struct iavf_adapter *adapter;
> +
> +       if (!node)
> +               return 0;
> +
> +       dl_priv = devlink_priv(rate_node->devlink);
> +       adapter = dl_priv->devlink_ref;
> +
> +       /* Check if last update is in progress */
> +       if (dl_priv->update_in_progress)
> +               return -EBUSY;
> +
> +       if (node == &dl_priv->root_node)
> +               return 0;
> +
> +       return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
> +}
> +
> +static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
> +                                     void *priv,
> +                                     struct netlink_ext_ack *extack)
> +{
> +       return -EINVAL;
> +}
> +
> +static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
> +                                  struct devlink_rate *parent,
> +                                  void *priv, void *parent_priv,
> +                                  struct netlink_ext_ack *extack)
> +{
> +       return -EINVAL;
> +}
> +
> +static const struct devlink_ops iavf_devlink_ops = {
> +       .rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
> +       .rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
> +       .rate_node_del = iavf_devlink_rate_node_del,
> +       .rate_leaf_parent_set = iavf_devlink_set_parent,
> +       .rate_node_parent_set = iavf_devlink_set_parent,
> +};
> 
>   /**
>    * iavf_devlink_register - Register allocated devlink instance for iavf adapter
> @@ -30,7 +296,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
>          adapter->devlink = devlink;
>          ref = devlink_priv(devlink);
>          ref->devlink_ref = adapter;
> -
> +       ref->iavf_dev_rate_initialized = false;
>          devlink_register(devlink);
> 
>          return 0;
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> index 5c122278611a..897ff5fc87af 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> @@ -4,14 +4,35 @@
>   #ifndef _IAVF_DEVLINK_H_
>   #define _IAVF_DEVLINK_H_
> 
> +#define IAVF_RATE_NODE_NAME                    12
> +struct iavf_dev_rate_node {
> +       char name[IAVF_RATE_NODE_NAME];
> +       struct devlink_rate *rate_node;
> +       u8 tx_update_flag;
> +#define IAVF_FLAG_TX_SHARE_UPDATED             BIT(0)
> +#define IAVF_FLAG_TX_MAX_UPDATED               BIT(1)
> +       u64 tx_max;
> +       u64 tx_share;
> +       u64 tx_max_temp;
> +       u64 tx_share_temp;
> +#define IAVF_RATE_DIV_FACTOR                   125
> +#define IAVF_TX_DEFAULT                                100000
> +};
> +
>   /* iavf devlink structure pointing to iavf adapter */
>   struct iavf_devlink {
>          struct iavf_adapter *devlink_ref;       /* ref to iavf adapter */
> +       struct iavf_dev_rate_node root_node;
> +       struct iavf_dev_rate_node *queue_nodes;
> +       bool iavf_dev_rate_initialized;
> +       bool update_in_progress;

It seems like this is never true until patch 5/5, so IMO it makes sense 
to add it there. Also, why is this bool needed? Is there not another 
flag or lock that can be used instead of adding this new flag/bit?

>   };
> 
>   int iavf_devlink_register(struct iavf_adapter *adapter);
>   void iavf_devlink_unregister(struct iavf_adapter *adapter);
>   int iavf_devlink_port_register(struct iavf_adapter *adapter);
>   void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
> +void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
> +void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
> 
>   #endif /* _IAVF_DEVLINK_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index db010e68d5d2..7348b65f9f19 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
>                                  iavf_free_rss(adapter);
>                                  iavf_free_misc_irq(adapter);
>                                  iavf_reset_interrupt_capability(adapter);
> +                               iavf_devlink_rate_deinit_rate_tree(adapter);
>                                  iavf_devlink_port_unregister(adapter);
>                                  iavf_change_state(adapter,
>                                                    __IAVF_INIT_CONFIG_ADAPTER);
> @@ -2709,8 +2710,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>          if (err)
>                  goto err_sw_init;
> 
> -       if (!adapter->netdev_registered)
> +       if (!adapter->netdev_registered) {
>                  iavf_devlink_port_register(adapter);
> +               iavf_devlink_rate_init_rate_tree(adapter);
> +       }
> 
>          netif_carrier_off(netdev);
>          adapter->link_up = false;
> @@ -2753,6 +2756,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>   err_mem:
>          iavf_free_rss(adapter);
>          iavf_free_misc_irq(adapter);
> +       iavf_devlink_rate_deinit_rate_tree(adapter);
>          iavf_devlink_port_unregister(adapter);
>   err_sw_init:
>          iavf_reset_interrupt_capability(adapter);
> @@ -5150,6 +5154,7 @@ static void iavf_remove(struct pci_dev *pdev)
>                                   err);
>          }
> 
> +       iavf_devlink_rate_deinit_rate_tree(adapter);
>          iavf_devlink_port_unregister(adapter);
>          iavf_devlink_unregister(adapter);
> 
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support
@ 2023-08-16 17:27       ` Brett Creeley
  0 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:27 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> To allow user to configure queue based parameters, devlink port function
> rate api functions are added for setting node tx_max and tx_share
> parameters.
> 
> iavf rate tree with root node and  queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   .../net/ethernet/intel/iavf/iavf_devlink.c    | 270 +++++++++++++++++-
>   .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
>   drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
>   3 files changed, 295 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> index 991d041e5922..a2bd5295c216 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
> @@ -4,7 +4,273 @@
>   #include "iavf.h"
>   #include "iavf_devlink.h"
> 
> -static const struct devlink_ops iavf_devlink_ops = {};
> +/**
> + * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
> + * @adapter: iavf adapter struct instance
> + *
> + * This function builds Rate Tree based on iavf adapter configuration
> + * and exports it's contents to devlink rate.
> + */
> +void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       struct iavf_dev_rate_node *iavf_r_node;
> +       struct iavf_dev_rate_node *iavf_q_node;
> +       struct devlink_rate *dl_root_node;
> +       struct devlink_rate *dl_tmp_node;
> +       int q_num, size, i;
> +
> +       if (!adapter->devlink_port.registered)
> +               return;
> +
> +       iavf_r_node = &dl_priv->root_node;
> +       memset(iavf_r_node, 0, sizeof(*iavf_r_node));
> +       iavf_r_node->tx_max = adapter->link_speed;
> +       strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
> +
> +       devl_lock(adapter->devlink);
> +       dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
> +                                            iavf_r_node->name, NULL);
> +       if (!dl_root_node || IS_ERR(dl_root_node))
> +               goto err_node;
> +
> +       iavf_r_node->rate_node = dl_root_node;
> +
> +       /* Allocate queue nodes, and chain them under root */
> +       q_num = adapter->num_active_queues;
> +       if (q_num > 0) {
> +               size = q_num * sizeof(struct iavf_dev_rate_node);
> +               dl_priv->queue_nodes = kzalloc(size, GFP_KERNEL);
> +               if (!dl_priv->queue_nodes)
> +                       goto err_node;
> +
> +               memset(dl_priv->queue_nodes, 0, size);

Why not just use kcalloc() here? Also, it seems like there's no need to 
zero the memory here.

> +
> +               for (i = 0; i < q_num; ++i) {
> +                       iavf_q_node = &dl_priv->queue_nodes[i];
> +                       snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
> +                                "txq_%d", i);
> +                       dl_tmp_node = devl_rate_node_create(adapter->devlink,
> +                                                           iavf_q_node,
> +                                                           iavf_q_node->name,
> +                                                           dl_root_node);
> +                       if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
> +                               kfree(dl_priv->queue_nodes); > +                               goto err_node;
> +                       }
> +
> +                       iavf_q_node->rate_node = dl_tmp_node;
> +                       iavf_q_node->tx_max = IAVF_TX_DEFAULT;
> +                       iavf_q_node->tx_share = 0;
> +               }
> +       }
> +
> +       dl_priv->update_in_progress = false;
> +       dl_priv->iavf_dev_rate_initialized = true;
> +       devl_unlock(adapter->devlink);
> +       return;
> +err_node:
> +       devl_rate_nodes_destroy(adapter->devlink);
> +       dl_priv->iavf_dev_rate_initialized = false;
> +       devl_unlock(adapter->devlink);
> +}
> +
> +/**
> + * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
> + * @adapter: iavf adapter struct instance
> + *
> + * This function unregisters the current iavf rate tree registered with devlink
> + * rate and frees resources.
> + */
> +void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +
> +       if (!dl_priv->iavf_dev_rate_initialized)
> +               return;
> +
> +       devl_lock(adapter->devlink);
> +       devl_rate_leaf_destroy(&adapter->devlink_port);
> +       devl_rate_nodes_destroy(adapter->devlink);
> +       kfree(dl_priv->queue_nodes);
> +       devl_unlock(adapter->devlink);
> +}
> +
> +/**
> + * iavf_check_update_config - check if updating queue parameters needed
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + *
> + * This function sets queue bw & quanta size configuration if all
> + * queue parameters are set
> + */
> +static int iavf_check_update_config(struct iavf_adapter *adapter,
> +                                   struct iavf_dev_rate_node *node)
> +{
> +       /* Update queue bw if any one of the queues have been fully updated by
> +        * user, the other queues either use the default value or the last
> +        * fully updated value
> +        */
> +       if (node->tx_update_flag ==
> +           (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
> +               node->tx_max = node->tx_max_temp;
> +               node->tx_share = node->tx_share_temp;
> +       } else {
> +               return 0;
> +       }

I think it would more readable to do the following:

if (node->tx_update_flag !=
     (IAVE_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED))
	return 0;

/* rest of function */
> +
> +       /* Reconfig queue bw only when iavf driver on running state */
> +       if (adapter->state != __IAVF_RUNNING)
> +               return -EBUSY;
> +
> +       return 0;
> +}
> +
> +/**
> + * iavf_update_queue_tx_share - sets tx min parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets min BW limit.
> + */
> +static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
> +                                     struct iavf_dev_rate_node *node,
> +                                     u64 bw, struct netlink_ext_ack *extack)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       u64 tx_share_sum = 0;
> +
> +       /* Keep in kbps */
> +       node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +
> +       if (ADV_LINK_SUPPORT(adapter)) {
> +               int i;
> +
> +               for (i = 0; i < adapter->num_active_queues; ++i) {
> +                       if (node != &dl_priv->queue_nodes[i])
> +                               tx_share_sum +=
> +                                       dl_priv->queue_nodes[i].tx_share;
> +                       else
> +                               tx_share_sum += node->tx_share_temp;
> +               }
> +
> +               if (tx_share_sum / 1000  > adapter->link_speed_mbps)
> +                       return -EINVAL;
> +       }
> +
> +       node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
> +       return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_update_queue_tx_max - sets tx max parameter
> + * @adapter: iavf adapter struct instance
> + * @node: iavf rate node struct instance
> + * @bw: bandwidth in bytes per second
> + * @extack: extended netdev ack structure
> + *
> + * This function sets max BW limit.
> + */
> +static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
> +                                   struct iavf_dev_rate_node *node,
> +                                   u64 bw, struct netlink_ext_ack *extack)
> +{
> +       /* Keep in kbps */
> +       node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
> +       if (ADV_LINK_SUPPORT(adapter)) {
> +               if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
> +                       return -EINVAL;
> +       }
> +
> +       node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
> +
> +       return iavf_check_update_config(adapter, node);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_max_set - devlink_rate API for setting tx max
> + * @rate_node: devlink rate struct instance
> + *
> + * This function implements rate_node_tx_max_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
> +                                            void *priv, u64 tx_max,
> +                                            struct netlink_ext_ack *extack)
> +{
> +       struct iavf_dev_rate_node *node = priv;
> +       struct iavf_devlink *dl_priv;
> +       struct iavf_adapter *adapter;
> +
> +       if (!node)
> +               return 0;
> +
> +       dl_priv = devlink_priv(rate_node->devlink);
> +       adapter = dl_priv->devlink_ref;
> +
> +       /* Check if last update is in progress */
> +       if (dl_priv->update_in_progress)
> +               return -EBUSY;
> +
> +       if (node == &dl_priv->root_node)
> +               return 0;
> +
> +       return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
> +}
> +
> +/**
> + * iavf_devlink_rate_node_tx_share_set - devlink_rate API for setting tx share
> + * @rate_node: devlink rate struct instance
> + *
> + * This function implements rate_node_tx_share_set function of devlink_ops
> + */
> +static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
> +                                              void *priv, u64 tx_share,
> +                                              struct netlink_ext_ack *extack)
> +{
> +       struct iavf_dev_rate_node *node = priv;
> +       struct iavf_devlink *dl_priv;
> +       struct iavf_adapter *adapter;
> +
> +       if (!node)
> +               return 0;
> +
> +       dl_priv = devlink_priv(rate_node->devlink);
> +       adapter = dl_priv->devlink_ref;
> +
> +       /* Check if last update is in progress */
> +       if (dl_priv->update_in_progress)
> +               return -EBUSY;
> +
> +       if (node == &dl_priv->root_node)
> +               return 0;
> +
> +       return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
> +}
> +
> +static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
> +                                     void *priv,
> +                                     struct netlink_ext_ack *extack)
> +{
> +       return -EINVAL;
> +}
> +
> +static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
> +                                  struct devlink_rate *parent,
> +                                  void *priv, void *parent_priv,
> +                                  struct netlink_ext_ack *extack)
> +{
> +       return -EINVAL;
> +}
> +
> +static const struct devlink_ops iavf_devlink_ops = {
> +       .rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
> +       .rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
> +       .rate_node_del = iavf_devlink_rate_node_del,
> +       .rate_leaf_parent_set = iavf_devlink_set_parent,
> +       .rate_node_parent_set = iavf_devlink_set_parent,
> +};
> 
>   /**
>    * iavf_devlink_register - Register allocated devlink instance for iavf adapter
> @@ -30,7 +296,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
>          adapter->devlink = devlink;
>          ref = devlink_priv(devlink);
>          ref->devlink_ref = adapter;
> -
> +       ref->iavf_dev_rate_initialized = false;
>          devlink_register(devlink);
> 
>          return 0;
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> index 5c122278611a..897ff5fc87af 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
> @@ -4,14 +4,35 @@
>   #ifndef _IAVF_DEVLINK_H_
>   #define _IAVF_DEVLINK_H_
> 
> +#define IAVF_RATE_NODE_NAME                    12
> +struct iavf_dev_rate_node {
> +       char name[IAVF_RATE_NODE_NAME];
> +       struct devlink_rate *rate_node;
> +       u8 tx_update_flag;
> +#define IAVF_FLAG_TX_SHARE_UPDATED             BIT(0)
> +#define IAVF_FLAG_TX_MAX_UPDATED               BIT(1)
> +       u64 tx_max;
> +       u64 tx_share;
> +       u64 tx_max_temp;
> +       u64 tx_share_temp;
> +#define IAVF_RATE_DIV_FACTOR                   125
> +#define IAVF_TX_DEFAULT                                100000
> +};
> +
>   /* iavf devlink structure pointing to iavf adapter */
>   struct iavf_devlink {
>          struct iavf_adapter *devlink_ref;       /* ref to iavf adapter */
> +       struct iavf_dev_rate_node root_node;
> +       struct iavf_dev_rate_node *queue_nodes;
> +       bool iavf_dev_rate_initialized;
> +       bool update_in_progress;

It seems like this is never true until patch 5/5, so IMO it makes sense 
to add it there. Also, why is this bool needed? Is there not another 
flag or lock that can be used instead of adding this new flag/bit?

>   };
> 
>   int iavf_devlink_register(struct iavf_adapter *adapter);
>   void iavf_devlink_unregister(struct iavf_adapter *adapter);
>   int iavf_devlink_port_register(struct iavf_adapter *adapter);
>   void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
> +void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
> +void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
> 
>   #endif /* _IAVF_DEVLINK_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index db010e68d5d2..7348b65f9f19 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2037,6 +2037,7 @@ static void iavf_finish_config(struct work_struct *work)
>                                  iavf_free_rss(adapter);
>                                  iavf_free_misc_irq(adapter);
>                                  iavf_reset_interrupt_capability(adapter);
> +                               iavf_devlink_rate_deinit_rate_tree(adapter);
>                                  iavf_devlink_port_unregister(adapter);
>                                  iavf_change_state(adapter,
>                                                    __IAVF_INIT_CONFIG_ADAPTER);
> @@ -2709,8 +2710,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>          if (err)
>                  goto err_sw_init;
> 
> -       if (!adapter->netdev_registered)
> +       if (!adapter->netdev_registered) {
>                  iavf_devlink_port_register(adapter);
> +               iavf_devlink_rate_init_rate_tree(adapter);
> +       }
> 
>          netif_carrier_off(netdev);
>          adapter->link_up = false;
> @@ -2753,6 +2756,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
>   err_mem:
>          iavf_free_rss(adapter);
>          iavf_free_misc_irq(adapter);
> +       iavf_devlink_rate_deinit_rate_tree(adapter);
>          iavf_devlink_port_unregister(adapter);
>   err_sw_init:
>          iavf_reset_interrupt_capability(adapter);
> @@ -5150,6 +5154,7 @@ static void iavf_remove(struct pci_dev *pdev)
>                                   err);
>          }
> 
> +       iavf_devlink_rate_deinit_rate_tree(adapter);
>          iavf_devlink_port_unregister(adapter);
>          iavf_devlink_unregister(adapter);
> 
> --
> 2.34.1
> 
> 
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-16 17:32       ` Brett Creeley
  -1 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:32 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> iavf rate tree with root node and queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> User can configure the tx_max and tx_share of each queue. If any one of
> the queues have been fully updated by user, i.e. both tx_max and
> tx_share have been updated for that queue, VIRTCHNL opcodes of
> VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
> to PF to configure queues allocated to VF if PF indicates support of
> VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
>   .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
>   .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
>   drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++++++++++-
>   5 files changed, 313 insertions(+), 4 deletions(-)
> 

[...]

> +/**
> + * iavf_set_tc_queue_bw - set bw of allocated tc/queues
> + * @adapter: iavf adapter struct instance
> + *
> + * This function requests PF to set queue bw of multiple tc(s)
> + */
> +static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       struct virtchnl_queues_bw_cfg *queues_bw_cfg;
> +       struct iavf_dev_rate_node *queue_rate;
> +       u16 queue_to_tc[256];
> +       size_t len;
> +       int q_idx;
> +       int i, j;


> +       u16 tc;
> +
> +       if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
> +               /* bail because we already have a command pending */
> +               dev_err(&adapter->pdev->dev,
> +                       "Cannot set tc queue bw, command %d pending\n",
> +                       adapter->current_op);
> +               return;
> +       }
> +
> +       len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
> +       queues_bw_cfg = kzalloc(len, GFP_KERNEL);
> +       if (!queues_bw_cfg)
> +               return;
> +
> +       queue_rate = dl_priv->queue_nodes;
> +       queues_bw_cfg->vsi_id = adapter->vsi.id;
> +       queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
> +
> +       /* build tc[queue] */
> +       for (i = 0; i < adapter->num_tc; i++) {

Nit, the scope of q_idx and j can have their scope reduced to this for loop.

> +               for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
> +                       q_idx = j + adapter->ch_config.ch_info[i].offset;
> +                       queue_to_tc[q_idx] = i;
> +               }
> +       }
> +
> +       for (i = 0; i < queues_bw_cfg->num_queues; i++) {
> +               tc = queue_to_tc[i];
> +               queues_bw_cfg->cfg[i].queue_id = i;
> +               queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
> +               queues_bw_cfg->cfg[i].shaper.committed =
> +                                                   queue_rate[i].tx_share;
> +               queues_bw_cfg->cfg[i].tc = tc;
> +       }
> +
> +       adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
> +       adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
> +       iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +                        (u8 *)queues_bw_cfg, len);
> +       kfree(queues_bw_cfg);
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-16 17:32       ` Brett Creeley
  0 siblings, 0 replies; 115+ messages in thread
From: Brett Creeley @ 2023-08-16 17:32 UTC (permalink / raw)
  To: Wenjun Wu, intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

On 8/7/2023 6:57 PM, Wenjun Wu wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> From: Jun Zhang <xuejun.zhang@intel.com>
> 
> iavf rate tree with root node and queue nodes is created and registered
> with devlink rate when iavf adapter is configured.
> 
> User can configure the tx_max and tx_share of each queue. If any one of
> the queues have been fully updated by user, i.e. both tx_max and
> tx_share have been updated for that queue, VIRTCHNL opcodes of
> VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
> to PF to configure queues allocated to VF if PF indicates support of
> VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.
> 
> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
> ---
>   drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
>   .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
>   .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
>   drivers/net/ethernet/intel/iavf/iavf_main.c   |  45 +++-
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 228 +++++++++++++++++-
>   5 files changed, 313 insertions(+), 4 deletions(-)
> 

[...]

> +/**
> + * iavf_set_tc_queue_bw - set bw of allocated tc/queues
> + * @adapter: iavf adapter struct instance
> + *
> + * This function requests PF to set queue bw of multiple tc(s)
> + */
> +static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
> +{
> +       struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
> +       struct virtchnl_queues_bw_cfg *queues_bw_cfg;
> +       struct iavf_dev_rate_node *queue_rate;
> +       u16 queue_to_tc[256];
> +       size_t len;
> +       int q_idx;
> +       int i, j;


> +       u16 tc;
> +
> +       if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
> +               /* bail because we already have a command pending */
> +               dev_err(&adapter->pdev->dev,
> +                       "Cannot set tc queue bw, command %d pending\n",
> +                       adapter->current_op);
> +               return;
> +       }
> +
> +       len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
> +       queues_bw_cfg = kzalloc(len, GFP_KERNEL);
> +       if (!queues_bw_cfg)
> +               return;
> +
> +       queue_rate = dl_priv->queue_nodes;
> +       queues_bw_cfg->vsi_id = adapter->vsi.id;
> +       queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
> +
> +       /* build tc[queue] */
> +       for (i = 0; i < adapter->num_tc; i++) {

Nit, the scope of q_idx and j can have their scope reduced to this for loop.

> +               for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
> +                       q_idx = j + adapter->ch_config.ch_info[i].offset;
> +                       queue_to_tc[q_idx] = i;
> +               }
> +       }
> +
> +       for (i = 0; i < queues_bw_cfg->num_queues; i++) {
> +               tc = queue_to_tc[i];
> +               queues_bw_cfg->cfg[i].queue_id = i;
> +               queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
> +               queues_bw_cfg->cfg[i].shaper.committed =
> +                                                   queue_rate[i].tx_share;
> +               queues_bw_cfg->cfg[i].tc = tc;
> +       }
> +
> +       adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
> +       adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
> +       iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
> +                        (u8 *)queues_bw_cfg, len);
> +       kfree(queues_bw_cfg);
> +}
> +

[...]
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
  2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
@ 2023-08-22  3:39   ` Wenjun Wu
  2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:39 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v4:
- Rearrange the ice_vf_qs_bw structure, put the largest number first
- Minimize the scope of values
- Remove the unnecessary brackets
- Remove the unnecessary memory allocation.
- Added Error Code and moved devlink registration before aq lock initialization
- Changed devlink registration for error handling in case of allocation failure
- Used kcalloc for object array memory allocation and initialization
- Changed functions & comments for readability

v3:
- Rebase the code
- Changed rate node max/share set function description
- Put variable in local scope

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields
---

Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  19 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 377 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  38 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  64 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 ++++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 119 ++++++
 18 files changed, 1218 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
@ 2023-08-22  3:39   ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:39 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API. [1]

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf netdevice.

iavf rate tree with root node, queue nodes, and leaf node is created
and registered with devlink rate when iavf adapter is configured, and
if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
Capability Exchange.

[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node parent iavf_root
pci/0000:af:01.0/iavf_root: type node


                         +---------+
                         |   root  |
                         +----+----+
                              |
            |-----------------|-----------------|
       +----v----+       +----v----+       +----v----+
       |  txq_0  |       |  txq_1  |       |  txq_x  |
       +----+----+       +----+----+       +----+----+

User can configure the tx_max and tx_share of each queue. Once any one of the
queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
to VF

Example:

1.To Set the queue tx_share:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps

2.To Set the queue tx_max:
devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps

3.To Show Current devlink port rate info:
devlink port function rate function show
[root@localhost ~]# devlink port function rate show
pci/0000:af:01.0/txq_15: type node parent iavf_root
pci/0000:af:01.0/txq_14: type node parent iavf_root
pci/0000:af:01.0/txq_13: type node parent iavf_root
pci/0000:af:01.0/txq_12: type node parent iavf_root
pci/0000:af:01.0/txq_11: type node parent iavf_root
pci/0000:af:01.0/txq_10: type node parent iavf_root
pci/0000:af:01.0/txq_9: type node parent iavf_root
pci/0000:af:01.0/txq_8: type node parent iavf_root
pci/0000:af:01.0/txq_7: type node parent iavf_root
pci/0000:af:01.0/txq_6: type node parent iavf_root
pci/0000:af:01.0/txq_5: type node parent iavf_root
pci/0000:af:01.0/txq_4: type node parent iavf_root
pci/0000:af:01.0/txq_3: type node parent iavf_root
pci/0000:af:01.0/txq_2: type node parent iavf_root
pci/0000:af:01.0/txq_1: type node parent iavf_root
pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
pci/0000:af:01.0/iavf_root: type node


[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/

Change log:

v4:
- Rearrange the ice_vf_qs_bw structure, put the largest number first
- Minimize the scope of values
- Remove the unnecessary brackets
- Remove the unnecessary memory allocation.
- Added Error Code and moved devlink registration before aq lock initialization
- Changed devlink registration for error handling in case of allocation failure
- Used kcalloc for object array memory allocation and initialization
- Changed functions & comments for readability

v3:
- Rebase the code
- Changed rate node max/share set function description
- Put variable in local scope

v2:
- Change static array to flex array
- Use struct_size helper
- Align all the error code types in the function
- Move the register field definitions to the right place in the file
- Fix coding style
- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields
---

Jun Zhang (3):
  iavf: Add devlink and devlink port support
  iavf: Add devlink port function rate API support
  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting

Wenjun Wu (2):
  virtchnl: support queue rate limit and quanta size configuration
  ice: Support VF queue rate limit and quanta size configuration

 drivers/net/ethernet/intel/Kconfig            |   1 +
 drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  19 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 377 ++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  38 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  64 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 ++++++++++-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 119 ++++++
 18 files changed, 1218 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 1/5] virtchnl: support queue rate limit and quanta size configuration
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  3:39     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:39 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 119 +++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index d0807ad43f93..0132c002ca06 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1377,6 +1386,85 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+#define virtchnl_qos_cap_list_LEGACY_SIZEOF	44
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+#define virtchnl_queues_bw_cfg_LEGACY_SIZEOF	16
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 #define __vss_byone(p, member, count, old)				      \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
 
@@ -1399,6 +1487,8 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 		 __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c),  \
 		 __vss(virtchnl_tc_info, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c),     \
+		 __vss(virtchnl_qos_cap_list, __vss_byelem, p, m, c),	      \
+		 __vss(virtchnl_queues_bw_cfg, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c),		      \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))
 
@@ -1595,6 +1685,35 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = virtchnl_queues_bw_cfg_LEGACY_SIZEOF;
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			valid_len = virtchnl_struct_size(q_bw, cfg,
+							 q_bw->num_queues);
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 1/5] virtchnl: support queue rate limit and quanta size configuration
@ 2023-08-22  3:39     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:39 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, VF queries current QoS configuration, such
as enabled TCs, arbiter type, up2tc and bandwidth of VSI node. The
configuration is previously set by DCB and PF, and now is the potential
QoS capability of VF. VF can take it as reference to configure queue TC
mapping.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 include/linux/avf/virtchnl.h | 119 +++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index d0807ad43f93..0132c002ca06 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -84,6 +84,9 @@ enum virtchnl_rx_hsplit {
 	VIRTCHNL_RX_HSPLIT_SPLIT_SCTP    = 8,
 };
 
+enum virtchnl_bw_limit_type {
+	VIRTCHNL_BW_SHAPER = 0,
+};
 /* END GENERIC DEFINES */
 
 /* Opcodes for VF-PF communication. These are placed in the v_opcode field
@@ -145,6 +148,11 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 57 - 65 are reserved */
+	VIRTCHNL_OP_GET_QOS_CAPS = 66,
+	/* opcode 68 through 111 are reserved */
+	VIRTCHNL_OP_CONFIG_QUEUE_BW = 112,
+	VIRTCHNL_OP_CONFIG_QUANTA = 113,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -253,6 +261,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_OFFLOAD_QOS			BIT(29)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1377,6 +1386,85 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+struct virtchnl_shaper_bw {
+	/* Unit is Kbps */
+	u32 committed;
+	u32 peak;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_shaper_bw);
+
+/* VIRTCHNL_OP_GET_QOS_CAPS
+ * VF sends this message to get its QoS Caps, such as
+ * TC number, Arbiter and Bandwidth.
+ */
+struct virtchnl_qos_cap_elem {
+	u8 tc_num;
+	u8 tc_prio;
+#define VIRTCHNL_ABITER_STRICT      0
+#define VIRTCHNL_ABITER_ETS         2
+	u8 arbiter;
+#define VIRTCHNL_STRICT_WEIGHT      1
+	u8 weight;
+	enum virtchnl_bw_limit_type type;
+	union {
+		struct virtchnl_shaper_bw shaper;
+		u8 pad2[32];
+	};
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(40, virtchnl_qos_cap_elem);
+
+struct virtchnl_qos_cap_list {
+	u16 vsi_id;
+	u16 num_elem;
+	struct virtchnl_qos_cap_elem cap[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_qos_cap_list);
+#define virtchnl_qos_cap_list_LEGACY_SIZEOF	44
+
+/* VIRTCHNL_OP_CONFIG_QUEUE_BW */
+struct virtchnl_queue_bw {
+	u16 queue_id;
+	u8 tc;
+	u8 pad;
+	struct virtchnl_shaper_bw shaper;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_queue_bw);
+
+struct virtchnl_queues_bw_cfg {
+	u16 vsi_id;
+	u16 num_queues;
+	struct virtchnl_queue_bw cfg[];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(4, virtchnl_queues_bw_cfg);
+#define virtchnl_queues_bw_cfg_LEGACY_SIZEOF	16
+
+enum virtchnl_queue_type {
+	VIRTCHNL_QUEUE_TYPE_TX			= 0,
+	VIRTCHNL_QUEUE_TYPE_RX			= 1,
+};
+
+/* structure to specify a chunk of contiguous queues */
+struct virtchnl_queue_chunk {
+	/* see enum virtchnl_queue_type */
+	s32 type;
+	u16 start_queue_id;
+	u16 num_queues;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(8, virtchnl_queue_chunk);
+
+struct virtchnl_quanta_cfg {
+	u16 quanta_size;
+	struct virtchnl_queue_chunk queue_select;
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_quanta_cfg);
+
 #define __vss_byone(p, member, count, old)				      \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
 
@@ -1399,6 +1487,8 @@ VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 		 __vss(virtchnl_vlan_filter_list_v2, __vss_byelem, p, m, c),  \
 		 __vss(virtchnl_tc_info, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rdma_qvlist_info, __vss_byelem, p, m, c),     \
+		 __vss(virtchnl_qos_cap_list, __vss_byelem, p, m, c),	      \
+		 __vss(virtchnl_queues_bw_cfg, __vss_byelem, p, m, c),	      \
 		 __vss(virtchnl_rss_key, __vss_byone, p, m, c),		      \
 		 __vss(virtchnl_rss_lut, __vss_byone, p, m, c))
 
@@ -1595,6 +1685,35 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		valid_len = virtchnl_queues_bw_cfg_LEGACY_SIZEOF;
+		if (msglen >= valid_len) {
+			struct virtchnl_queues_bw_cfg *q_bw =
+				(struct virtchnl_queues_bw_cfg *)msg;
+
+			valid_len = virtchnl_struct_size(q_bw, cfg,
+							 q_bw->num_queues);
+			if (q_bw->num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		valid_len = sizeof(struct virtchnl_quanta_cfg);
+		if (msglen >= valid_len) {
+			struct virtchnl_quanta_cfg *q_quanta =
+				(struct virtchnl_quanta_cfg *)msg;
+
+			if (q_quanta->quanta_size == 0 ||
+			    q_quanta->queue_select.num_queues == 0) {
+				err_msg_format = true;
+				break;
+			}
+		}
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 2/5] ice: Support VF queue rate limit and quanta size configuration
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  3:40     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 370 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index cf6c961e8d9b..25cdf8623063 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -641,6 +641,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 7fa43827a3f0..2b9319801dc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2a19802847a5..86128ca1b7a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index a5429eca4350..504b367f1c77 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -850,6 +850,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 48fea6fa0362..7fe81208c62c 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u32 committed;
+	u32 peak;
+	u16 queue_id;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index b03426ac932b..b1b14377559e 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,172 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	int ret;
+	u16 i;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		u32 p_rate;
+		u8 tc;
+
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate)
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		else
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	begin_id = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs *
+		   hw->logical_pf_id;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) /
+			 hw->dev_caps.num_funcs;
+		n_used = pf->num_quanta_prof_used;
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1756,132 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vsi *vsi;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		if (qbw->cfg[i].shaper.peak != 0 && vf->max_tx_rate != 0 &&
+		    qbw->cfg[i].shaper.peak > vf->max_tx_rate)
+			dev_warn(ice_pf_to_dev(vf->pf), "The maximum queue %d rate limit configuration may not take effect because the maximum TX rate for VF-%d is %d\n",
+				 qbw->cfg[i].queue_id, vf->vf_id, vf->max_tx_rate);
+		if (qbw->cfg[i].shaper.committed != 0 && vf->min_tx_rate != 0 &&
+		    qbw->cfg[i].shaper.committed < vf->min_tx_rate)
+			dev_warn(ice_pf_to_dev(vf->pf), "The minimum queue %d rate limit configuration may not take effect because the minimum TX rate for VF-%d is %d\n",
+				 qbw->cfg[i].queue_id, vf->vf_id, vf->max_tx_rate);
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		vf->qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		vf->qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		vf->qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		vf->qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	end_qid = qquanta->queue_select.start_queue_id +
+		  qquanta->queue_select.num_queues;
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	quanta_size = qquanta->quanta_size;
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2005,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3985,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4039,6 +4340,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 2/5] ice: Support VF queue rate limit and quanta size configuration
@ 2023-08-22  3:40     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, Wenjun Wu, qi.z.zhang

Add support to configure VF queue rate limit and quanta size.

For quanta size configuration, the quanta profiles are divided evenly
by PF numbers. For each port, the first quanta profile is reserved for
default. When VF is asked to set queue quanta size, PF will search for
an available profile, change the fields and assigned this profile to the
queue.

Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  19 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 10 files changed, 370 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index cf6c961e8d9b..25cdf8623063 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -641,6 +641,8 @@ struct ice_pf {
 #define ICE_VF_AGG_NODE_ID_START	65
 #define ICE_MAX_VF_AGG_NODES		32
 	struct ice_agg_node vf_agg_node[ICE_MAX_VF_AGG_NODES];
+
+	u8 num_quanta_prof_used;
 };
 
 extern struct workqueue_struct *ice_lag_wq;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 7fa43827a3f0..2b9319801dc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -377,6 +377,8 @@ ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf
 		break;
 	}
 
+	tlan_ctx->quanta_prof_idx = ring->quanta_prof_id;
+
 	tlan_ctx->tso_ena = ICE_TX_LEGACY;
 	tlan_ctx->tso_qnum = pf_q;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 2a19802847a5..86128ca1b7a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2463,6 +2463,23 @@ ice_parse_func_caps(struct ice_hw *hw, struct ice_hw_func_caps *func_p,
 	ice_recalc_port_limited_caps(hw, &func_p->common_cap);
 }
 
+/**
+ * ice_func_id_to_logical_id - map from function id to logical pf id
+ * @active_function_bitmap: active function bitmap
+ * @pf_id: function number of device
+ */
+static int ice_func_id_to_logical_id(u32 active_function_bitmap, u8 pf_id)
+{
+	u8 logical_id = 0;
+	u8 i;
+
+	for (i = 0; i < pf_id; i++)
+		if (active_function_bitmap & BIT(i))
+			logical_id++;
+
+	return logical_id;
+}
+
 /**
  * ice_parse_valid_functions_cap - Parse ICE_AQC_CAPS_VALID_FUNCTIONS caps
  * @hw: pointer to the HW struct
@@ -2480,6 +2497,8 @@ ice_parse_valid_functions_cap(struct ice_hw *hw, struct ice_hw_dev_caps *dev_p,
 	dev_p->num_funcs = hweight32(number);
 	ice_debug(hw, ICE_DBG_INIT, "dev caps: num_funcs = %d\n",
 		  dev_p->num_funcs);
+
+	hw->logical_pf_id = ice_func_id_to_logical_id(number, hw->pf_id);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 6756f3d51d14..9da94e000394 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -6,6 +6,14 @@
 #ifndef _ICE_HW_AUTOGEN_H_
 #define _ICE_HW_AUTOGEN_H_
 
+#define GLCOMM_QUANTA_PROF(_i)			(0x002D2D68 + ((_i) * 4))
+#define GLCOMM_QUANTA_PROF_MAX_INDEX		15
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_S	0
+#define GLCOMM_QUANTA_PROF_QUANTA_SIZE_M	ICE_M(0x3FFF, 0)
+#define GLCOMM_QUANTA_PROF_MAX_CMD_S		16
+#define GLCOMM_QUANTA_PROF_MAX_CMD_M		ICE_M(0xFF, 16)
+#define GLCOMM_QUANTA_PROF_MAX_DESC_S		24
+#define GLCOMM_QUANTA_PROF_MAX_DESC_M		ICE_M(0x3F, 24)
 #define QTX_COMM_DBELL(_DBQM)			(0x002C0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD(_DBQM)			(0x000E0000 + ((_DBQM) * 4))
 #define QTX_COMM_HEAD_HEAD_S			0
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 166413fc33f4..7e152ab5b727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -381,6 +381,8 @@ struct ice_tx_ring {
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
 	u8 ptp_tx;
+
+	u16 quanta_prof_id;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_rx_ring *ring)
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index a5429eca4350..504b367f1c77 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -850,6 +850,7 @@ struct ice_hw {
 	u8 revision_id;
 
 	u8 pf_id;		/* device profile info */
+	u8 logical_pf_id;
 	enum ice_phy_model phy_model;
 
 	u16 max_burst_size;	/* driver sets this value */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 48fea6fa0362..7fe81208c62c 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -52,6 +52,13 @@ struct ice_mdd_vf_events {
 	u16 last_printed;
 };
 
+struct ice_vf_qs_bw {
+	u32 committed;
+	u32 peak;
+	u16 queue_id;
+	u8 tc;
+};
+
 /* VF operations */
 struct ice_vf_ops {
 	enum ice_disq_rst_src reset_type;
@@ -133,6 +140,8 @@ struct ice_vf {
 
 	/* devlink port data */
 	struct devlink_port devlink_port;
+
+	struct ice_vf_qs_bw qs_bw[ICE_MAX_RSS_QS_PER_VF];
 };
 
 /* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index b03426ac932b..b1b14377559e 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_QOS)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_QOS;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -985,6 +988,172 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_get_qos_caps - Get current QoS caps from PF
+ * @vf: pointer to the VF info
+ *
+ * Get VF's QoS capabilities, such as TC number, arbiter and
+ * bandwidth from PF.
+ */
+static int ice_vc_get_qos_caps(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_qos_cap_list *cap_list = NULL;
+	u8 tc_prio[ICE_MAX_TRAFFIC_CLASS] = { 0 };
+	struct virtchnl_qos_cap_elem *cfg = NULL;
+	struct ice_vsi_ctx *vsi_ctx;
+	struct ice_pf *pf = vf->pf;
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	u8 numtc, tc;
+	u16 len = 0;
+	int ret, i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	pi = pf->hw.port_info;
+	numtc = vsi->tc_cfg.numtc;
+
+	vsi_ctx = ice_get_vsi_ctx(pi->hw, vf->lan_vsi_idx);
+	if (!vsi_ctx) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = struct_size(cap_list, cap, numtc);
+	cap_list = kzalloc(len, GFP_KERNEL);
+	if (!cap_list) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	cap_list->vsi_id = vsi->vsi_num;
+	cap_list->num_elem = numtc;
+
+	/* Store the UP2TC configuration from DCB to a user priority bitmap
+	 * of each TC. Each element of prio_of_tc represents one TC. Each
+	 * bitmap indicates the user priorities belong to this TC.
+	 */
+	for (i = 0; i < ICE_MAX_USER_PRIORITY; i++) {
+		tc = pi->qos_cfg.local_dcbx_cfg.etscfg.prio_table[i];
+		tc_prio[tc] |= BIT(i);
+	}
+
+	for (i = 0; i < numtc; i++) {
+		cfg = &cap_list->cap[i];
+		cfg->tc_num = i;
+		cfg->tc_prio = tc_prio[i];
+		cfg->arbiter = pi->qos_cfg.local_dcbx_cfg.etscfg.tsatable[i];
+		cfg->weight = VIRTCHNL_STRICT_WEIGHT;
+		cfg->type = VIRTCHNL_BW_SHAPER;
+		cfg->shaper.committed = vsi_ctx->sched.bw_t_info[i].cir_bw.bw;
+		cfg->shaper.peak = vsi_ctx->sched.bw_t_info[i].eir_bw.bw;
+	}
+
+err:
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_QOS_CAPS, v_ret,
+				    (u8 *)cap_list, len);
+	kfree(cap_list);
+	return ret;
+}
+
+/**
+ * ice_vf_cfg_qs_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @num_queues: number of queues to be configured
+ *
+ * Configure per queue bandwidth.
+ */
+static int ice_vf_cfg_qs_bw(struct ice_vf *vf, u16 num_queues)
+{
+	struct ice_hw *hw = &vf->pf->hw;
+	struct ice_vsi *vsi;
+	int ret;
+	u16 i;
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi)
+		return -EINVAL;
+
+	for (i = 0; i < num_queues; i++) {
+		u32 p_rate;
+		u8 tc;
+
+		p_rate = vf->qs_bw[i].peak;
+		tc = vf->qs_bw[i].tc;
+		if (p_rate)
+			ret = ice_cfg_q_bw_lmt(hw->port_info, vsi->idx, tc,
+					       vf->qs_bw[i].queue_id,
+					       ICE_MAX_BW, p_rate);
+		else
+			ret = ice_cfg_q_bw_dflt_lmt(hw->port_info, vsi->idx, tc,
+						    vf->qs_bw[i].queue_id,
+						    ICE_MAX_BW);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * ice_vf_cfg_q_quanta_profile
+ * @vf: pointer to the VF info
+ * @quanta_prof_idx: pointer to the quanta profile index
+ * @quanta_size: quanta size to be set
+ *
+ * This function chooses available quanta profile and configures the register.
+ * The quanta profile is evenly divided by the number of device ports, and then
+ * available to the specific PF and VFs. The first profile for each PF is a
+ * reserved default profile. Only quanta size of the rest unused profile can be
+ * modified.
+ */
+static int ice_vf_cfg_q_quanta_profile(struct ice_vf *vf, u16 quanta_size,
+				       u16 *quanta_prof_idx)
+{
+	const u16 n_desc = calc_quanta_desc(quanta_size);
+	struct ice_hw *hw = &vf->pf->hw;
+	const u16 n_cmd = 2 * n_desc;
+	struct ice_pf *pf = vf->pf;
+	u16 per_pf, begin_id;
+	u8 n_used;
+	u32 reg;
+
+	begin_id = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) / hw->dev_caps.num_funcs *
+		   hw->logical_pf_id;
+
+	if (quanta_size == ICE_DFLT_QUANTA) {
+		*quanta_prof_idx = begin_id;
+	} else {
+		per_pf = (GLCOMM_QUANTA_PROF_MAX_INDEX + 1) /
+			 hw->dev_caps.num_funcs;
+		n_used = pf->num_quanta_prof_used;
+		if (n_used < per_pf) {
+			*quanta_prof_idx = begin_id + 1 + n_used;
+			pf->num_quanta_prof_used++;
+		} else {
+			return -EINVAL;
+		}
+	}
+
+	reg = FIELD_PREP(GLCOMM_QUANTA_PROF_QUANTA_SIZE_M, quanta_size) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_CMD_M, n_cmd) |
+	      FIELD_PREP(GLCOMM_QUANTA_PROF_MAX_DESC_M, n_desc);
+	wr32(hw, GLCOMM_QUANTA_PROF(*quanta_prof_idx), reg);
+
+	return 0;
+}
+
 /**
  * ice_vc_cfg_promiscuous_mode_msg
  * @vf: pointer to the VF info
@@ -1587,6 +1756,132 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
 				     NULL, 0);
 }
 
+/**
+ * ice_vc_cfg_q_bw - Configure per queue bandwidth
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues bandwidth.
+ */
+static int ice_vc_cfg_q_bw(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_queues_bw_cfg *qbw =
+		(struct virtchnl_queues_bw_cfg *)msg;
+	struct ice_vsi *vsi;
+	u16 i;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states) ||
+	    !ice_vc_isvalid_vsi_id(vf, qbw->vsi_id)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi || vsi->vsi_num != qbw->vsi_id) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (qbw->num_queues > ICE_MAX_RSS_QS_PER_VF ||
+	    qbw->num_queues > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		if (qbw->cfg[i].shaper.peak != 0 && vf->max_tx_rate != 0 &&
+		    qbw->cfg[i].shaper.peak > vf->max_tx_rate)
+			dev_warn(ice_pf_to_dev(vf->pf), "The maximum queue %d rate limit configuration may not take effect because the maximum TX rate for VF-%d is %d\n",
+				 qbw->cfg[i].queue_id, vf->vf_id, vf->max_tx_rate);
+		if (qbw->cfg[i].shaper.committed != 0 && vf->min_tx_rate != 0 &&
+		    qbw->cfg[i].shaper.committed < vf->min_tx_rate)
+			dev_warn(ice_pf_to_dev(vf->pf), "The minimum queue %d rate limit configuration may not take effect because the minimum TX rate for VF-%d is %d\n",
+				 qbw->cfg[i].queue_id, vf->vf_id, vf->max_tx_rate);
+	}
+
+	for (i = 0; i < qbw->num_queues; i++) {
+		vf->qs_bw[i].queue_id = qbw->cfg[i].queue_id;
+		vf->qs_bw[i].peak = qbw->cfg[i].shaper.peak;
+		vf->qs_bw[i].committed = qbw->cfg[i].shaper.committed;
+		vf->qs_bw[i].tc = qbw->cfg[i].tc;
+	}
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+				    v_ret, NULL, 0);
+}
+
+/**
+ * ice_vc_cfg_q_quanta - Configure per queue quanta
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ *
+ * Configure VF queues quanta.
+ */
+static int ice_vc_cfg_q_quanta(struct ice_vf *vf, u8 *msg)
+{
+	u16 quanta_prof_id, quanta_size, start_qid, end_qid, i;
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_quanta_cfg *qquanta =
+		(struct virtchnl_quanta_cfg *)msg;
+	struct ice_vsi *vsi;
+	int ret;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	vsi = ice_get_vf_vsi(vf);
+	if (!vsi) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	end_qid = qquanta->queue_select.start_queue_id +
+		  qquanta->queue_select.num_queues;
+	if (end_qid > ICE_MAX_RSS_QS_PER_VF ||
+	    end_qid > min_t(u16, vsi->alloc_txq, vsi->alloc_rxq)) {
+		dev_err(ice_pf_to_dev(vf->pf), "VF-%d trying to configure more than allocated number of queues: %d\n",
+			vf->vf_id, min_t(u16, vsi->alloc_txq, vsi->alloc_rxq));
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	quanta_size = qquanta->quanta_size;
+	if (quanta_size > ICE_MAX_QUANTA_SIZE ||
+	    quanta_size < ICE_MIN_QUANTA_SIZE) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	if (quanta_size % 64) {
+		dev_err(ice_pf_to_dev(vf->pf), "quanta size should be the product of 64\n");
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	ret = ice_vf_cfg_q_quanta_profile(vf, quanta_size,
+					  &quanta_prof_id);
+	if (ret) {
+		v_ret = VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
+		goto err;
+	}
+
+	start_qid = qquanta->queue_select.start_queue_id;
+	for (i = start_qid; i < end_qid; i++)
+		vsi->tx_rings[i]->quanta_prof_id = quanta_prof_id;
+
+err:
+	/* send the response to the VF */
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_QUANTA,
+				     v_ret, NULL, 0);
+}
+
 /**
  * ice_vc_cfg_qs_msg
  * @vf: pointer to the VF info
@@ -1710,6 +2005,9 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 		}
 	}
 
+	if (ice_vf_cfg_qs_bw(vf, qci->num_queue_pairs))
+		goto error_param;
+
 	/* send the response to the VF */
 	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
 				     VIRTCHNL_STATUS_SUCCESS, NULL, 0);
@@ -3687,6 +3985,9 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_qos_caps = ice_vc_get_qos_caps,
+	.cfg_q_bw = ice_vc_cfg_q_bw,
+	.cfg_q_quanta = ice_vc_cfg_q_quanta,
 };
 
 /**
@@ -4039,6 +4340,15 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS:
+		err = ops->get_qos_caps(vf);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		err = ops->cfg_q_bw(vf, msg);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		err = ops->cfg_q_quanta(vf, msg);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..0efb9c0f669a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -13,6 +13,13 @@
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
 
+#define ICE_DFLT_QUANTA 1024
+#define ICE_MAX_QUANTA_SIZE 4096
+#define ICE_MIN_QUANTA_SIZE 256
+
+#define calc_quanta_desc(x)	\
+	max_t(u16, 12, min_t(u16, 63, (((x) + 66) / 132) * 2 + 4))
+
 /* MAC filters: 1 is reserved for the VF's default/perm_addr/LAA MAC, 1 for
  * broadcast, and 16 for additional unicast/multicast filters
  */
@@ -51,6 +58,10 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_qos_caps)(struct ice_vf *vf);
+	int (*cfg_q_tc_map)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_bw)(struct ice_vf *vf, u8 *msg);
+	int (*cfg_q_quanta)(struct ice_vf *vf, u8 *msg);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa..2e3f63a429cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -85,6 +85,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+static const u32 tc_allowlist_opcodes[] = {
+	VIRTCHNL_OP_GET_QOS_CAPS, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+	VIRTCHNL_OP_CONFIG_QUANTA,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -105,6 +110,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_QOS, tc_allowlist_opcodes),
 };
 
 /**
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 3/5] iavf: Add devlink and devlink port support
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  3:40     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  5 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 94 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 16 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 17 ++++
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 9bc0a9519899..f916b8ef6acb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 85fba85fbb23..72a68061e396 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,9 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..1cace56e3f56
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register devlink interface for this VF adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Allocate a devlinkin instance for this VF, and register the devlink
+ * instance associated with this VF adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	adapter->devlink = devlink;
+	if (!devlink)
+		return -ENOMEM;
+
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink)
+		return;
+
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..65e453bbd1a8
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index b23ca9d80189..3a93d0cac60c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2709,6 +2710,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2750,6 +2754,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4960,6 +4965,13 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.func = PCI_FUNC(pdev->devfn);
 	hw->bus.bus_id = pdev->bus->number;
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err) {
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		goto err_devlink_reg;
+	}
+
 	/* set up the locks for the AQ, do this only once in probe
 	 * and destroy them only once in remove
 	 */
@@ -4998,6 +5010,8 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	return 0;
 
+err_devlink_reg:
+	iounmap(hw->hw_addr);
 err_ioremap:
 	destroy_workqueue(adapter->wq);
 err_alloc_wq:
@@ -5140,6 +5154,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 3/5] iavf: Add devlink and devlink port support
@ 2023-08-22  3:40     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue bandwidth, devlink port support
is added to support devlink port rate API.

Add devlink framework registration/unregistration on iavf driver
initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
is created to be associated iavf net device.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/Kconfig            |  1 +
 drivers/net/ethernet/intel/iavf/Makefile      |  2 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  5 +
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 94 +++++++++++++++++++
 .../net/ethernet/intel/iavf/iavf_devlink.h    | 16 ++++
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 17 ++++
 6 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 9bc0a9519899..f916b8ef6acb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -256,6 +256,7 @@ config I40EVF
 	tristate "Intel(R) Ethernet Adaptive Virtual Function support"
 	select IAVF
 	depends on PCI_MSI
+	select NET_DEVLINK
 	help
 	  This driver supports virtual functions for Intel XL710,
 	  X710, X722, XXV710, and all devices advertising support for
diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 9c3e45c54d01..b5d7db97ab8b 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,5 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-objs := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	     iavf_adv_rss.o \
+	     iavf_adv_rss.o iavf_devlink.o \
 	     iavf_txrx.o iavf_common.o iavf_adminq.o iavf_client.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 85fba85fbb23..72a68061e396 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -33,9 +33,11 @@
 #include <net/udp.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_mirred.h>
+#include <net/devlink.h>
 
 #include "iavf_type.h"
 #include <linux/avf/virtchnl.h>
+#include "iavf_devlink.h"
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
@@ -369,6 +371,9 @@ struct iavf_adapter {
 	struct net_device *netdev;
 	struct pci_dev *pdev;
 
+	struct devlink *devlink;
+	struct devlink_port devlink_port;
+
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
 	enum iavf_state_t state;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
new file mode 100644
index 000000000000..1cace56e3f56
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "iavf.h"
+#include "iavf_devlink.h"
+
+static const struct devlink_ops iavf_devlink_ops = {};
+
+/**
+ * iavf_devlink_register - Register devlink interface for this VF adapter
+ * @adapter: the iavf adapter to register the devlink for.
+ *
+ * Allocate a devlinkin instance for this VF, and register the devlink
+ * instance associated with this VF adapter
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_devlink *ref;
+	struct devlink *devlink;
+
+	devlink = devlink_alloc(&iavf_devlink_ops, sizeof(struct iavf_devlink),
+				dev);
+	adapter->devlink = devlink;
+	if (!devlink)
+		return -ENOMEM;
+
+	ref = devlink_priv(devlink);
+	ref->devlink_ref = adapter;
+
+	devlink_register(devlink);
+
+	return 0;
+}
+
+/**
+ * iavf_devlink_unregister - Unregister devlink resources for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink and cleans up associated memory.
+ */
+void iavf_devlink_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink)
+		return;
+
+	devlink_unregister(adapter->devlink);
+	devlink_free(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_port_register - Register devlink port for iavf adapter
+ * @adapter: the iavf adapter to register the devlink port for.
+ *
+ * Register the devlink port instance associated with this iavf adapter
+ * before iavf adapter registers with netdevice
+ *
+ * Return: zero on success or an error code on failure.
+ */
+int iavf_devlink_port_register(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct devlink_port_attrs attrs = {};
+	int err;
+
+	SET_NETDEV_DEVLINK_PORT(adapter->netdev, &adapter->devlink_port);
+	attrs.flavour = DEVLINK_PORT_FLAVOUR_VIRTUAL;
+	memset(&adapter->devlink_port, 0, sizeof(adapter->devlink_port));
+	devlink_port_attrs_set(&adapter->devlink_port, &attrs);
+
+	/* Register with driver specific index (device id) */
+	err = devlink_port_register(adapter->devlink, &adapter->devlink_port,
+				    adapter->hw.bus.device);
+	if (err)
+		dev_err(dev, "devlink port registration failed: %d\n", err);
+
+	return err;
+}
+
+/**
+ * iavf_devlink_port_unregister - Unregister devlink port for iavf adapter.
+ * @adapter: the iavf adapter structure
+ *
+ * Releases resources used by devlink port and registration with devlink.
+ */
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter)
+{
+	if (!adapter->devlink_port.registered)
+		return;
+
+	devlink_port_unregister(&adapter->devlink_port);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
new file mode 100644
index 000000000000..65e453bbd1a8
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IAVF_DEVLINK_H_
+#define _IAVF_DEVLINK_H_
+
+struct iavf_devlink {
+	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+};
+
+int iavf_devlink_register(struct iavf_adapter *adapter);
+void iavf_devlink_unregister(struct iavf_adapter *adapter);
+int iavf_devlink_port_register(struct iavf_adapter *adapter);
+void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+
+#endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index b23ca9d80189..3a93d0cac60c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
 				goto out;
@@ -2709,6 +2710,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
+	if (!adapter->netdev_registered)
+		iavf_devlink_port_register(adapter);
+
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
 	netif_tx_stop_all_queues(netdev);
@@ -2750,6 +2754,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
 err:
@@ -4960,6 +4965,13 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.func = PCI_FUNC(pdev->devfn);
 	hw->bus.bus_id = pdev->bus->number;
 
+	/* Register iavf adapter with devlink */
+	err = iavf_devlink_register(adapter);
+	if (err) {
+		dev_err(&pdev->dev, "devlink registration failed: %d\n", err);
+		goto err_devlink_reg;
+	}
+
 	/* set up the locks for the AQ, do this only once in probe
 	 * and destroy them only once in remove
 	 */
@@ -4998,6 +5010,8 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	return 0;
 
+err_devlink_reg:
+	iounmap(hw->hw_addr);
 err_ioremap:
 	destroy_workqueue(adapter->wq);
 err_alloc_wq:
@@ -5140,6 +5154,9 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_port_unregister(adapter);
+	iavf_devlink_unregister(adapter);
+
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
 	iavf_change_state(adapter, __IAVF_REMOVE);
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 4/5] iavf: Add devlink port function rate API support
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  3:40     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 258 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 283 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 1cace56e3f56..732076c2126f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,261 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		int i;
+
+		dl_priv->queue_nodes =
+			kcalloc(q_num, sizeof(struct iavf_dev_rate_node),
+				GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag !=
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED))
+		return 0;
+
+	node->tx_max = node->tx_max_temp;
+	node->tx_share = node->tx_share_temp;
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register devlink interface for this VF adapter
@@ -29,7 +283,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 65e453bbd1a8..751e9e093ab1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,13 +4,34 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 3a93d0cac60c..699c6375200a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2710,8 +2711,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2754,6 +2757,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5154,6 +5158,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 4/5] iavf: Add devlink port function rate API support
@ 2023-08-22  3:40     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

To allow user to configure queue based parameters, devlink port function
rate api functions are added for setting node tx_max and tx_share
parameters.

iavf rate tree with root node and  queue nodes is created and registered
with devlink rate when iavf adapter is configured.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 .../net/ethernet/intel/iavf/iavf_devlink.c    | 258 +++++++++++++++++-
 .../net/ethernet/intel/iavf/iavf_devlink.h    |  21 ++
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   7 +-
 3 files changed, 283 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 1cace56e3f56..732076c2126f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -4,7 +4,261 @@
 #include "iavf.h"
 #include "iavf_devlink.h"
 
-static const struct devlink_ops iavf_devlink_ops = {};
+/**
+ * iavf_devlink_rate_init_rate_tree - export rate tree to devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function builds Rate Tree based on iavf adapter configuration
+ * and exports it's contents to devlink rate.
+ */
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct iavf_dev_rate_node *iavf_r_node;
+	struct iavf_dev_rate_node *iavf_q_node;
+	struct devlink_rate *dl_root_node;
+	struct devlink_rate *dl_tmp_node;
+	int q_num;
+
+	if (!adapter->devlink_port.registered)
+		return;
+
+	iavf_r_node = &dl_priv->root_node;
+	memset(iavf_r_node, 0, sizeof(*iavf_r_node));
+	iavf_r_node->tx_max = adapter->link_speed;
+	strscpy(iavf_r_node->name, "iavf_root", IAVF_RATE_NODE_NAME);
+
+	devl_lock(adapter->devlink);
+	dl_root_node = devl_rate_node_create(adapter->devlink, iavf_r_node,
+					     iavf_r_node->name, NULL);
+	if (!dl_root_node || IS_ERR(dl_root_node))
+		goto err_node;
+
+	iavf_r_node->rate_node = dl_root_node;
+
+	/* Allocate queue nodes, and chain them under root */
+	q_num = adapter->num_active_queues;
+	if (q_num > 0) {
+		int i;
+
+		dl_priv->queue_nodes =
+			kcalloc(q_num, sizeof(struct iavf_dev_rate_node),
+				GFP_KERNEL);
+		if (!dl_priv->queue_nodes)
+			goto err_node;
+
+		for (i = 0; i < q_num; ++i) {
+			iavf_q_node = &dl_priv->queue_nodes[i];
+			snprintf(iavf_q_node->name, IAVF_RATE_NODE_NAME,
+				 "txq_%d", i);
+			dl_tmp_node = devl_rate_node_create(adapter->devlink,
+							    iavf_q_node,
+							    iavf_q_node->name,
+							    dl_root_node);
+			if (!dl_tmp_node || IS_ERR(dl_tmp_node)) {
+				kfree(dl_priv->queue_nodes);
+				goto err_node;
+			}
+
+			iavf_q_node->rate_node = dl_tmp_node;
+			iavf_q_node->tx_max = IAVF_TX_DEFAULT;
+			iavf_q_node->tx_share = 0;
+		}
+	}
+
+	dl_priv->update_in_progress = false;
+	dl_priv->iavf_dev_rate_initialized = true;
+	devl_unlock(adapter->devlink);
+	return;
+err_node:
+	devl_rate_nodes_destroy(adapter->devlink);
+	dl_priv->iavf_dev_rate_initialized = false;
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_devlink_rate_deinit_rate_tree - Unregister rate tree with devlink rate
+ * @adapter: iavf adapter struct instance
+ *
+ * This function unregisters the current iavf rate tree registered with devlink
+ * rate and frees resources.
+ */
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
+	if (!dl_priv->iavf_dev_rate_initialized)
+		return;
+
+	devl_lock(adapter->devlink);
+	devl_rate_leaf_destroy(&adapter->devlink_port);
+	devl_rate_nodes_destroy(adapter->devlink);
+	kfree(dl_priv->queue_nodes);
+	devl_unlock(adapter->devlink);
+}
+
+/**
+ * iavf_check_update_config - check if updating queue parameters needed
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ *
+ * This function sets queue bw & quanta size configuration if all
+ * queue parameters are set
+ */
+static int iavf_check_update_config(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node)
+{
+	/* Update queue bw if any one of the queues have been fully updated by
+	 * user, the other queues either use the default value or the last
+	 * fully updated value
+	 */
+	if (node->tx_update_flag !=
+	    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED))
+		return 0;
+
+	node->tx_max = node->tx_max_temp;
+	node->tx_share = node->tx_share_temp;
+
+	/* Reconfig queue bw only when iavf driver on running state */
+	if (adapter->state != __IAVF_RUNNING)
+		return -EBUSY;
+
+	return 0;
+}
+
+/**
+ * iavf_update_queue_tx_share - sets tx min parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets min BW limit.
+ */
+static int iavf_update_queue_tx_share(struct iavf_adapter *adapter,
+				      struct iavf_dev_rate_node *node,
+				      u64 bw, struct netlink_ext_ack *extack)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	u64 tx_share_sum = 0;
+
+	/* Keep in kbps */
+	node->tx_share_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+
+	if (ADV_LINK_SUPPORT(adapter)) {
+		int i;
+
+		for (i = 0; i < adapter->num_active_queues; ++i) {
+			if (node != &dl_priv->queue_nodes[i])
+				tx_share_sum +=
+					dl_priv->queue_nodes[i].tx_share;
+			else
+				tx_share_sum += node->tx_share_temp;
+		}
+
+		if (tx_share_sum / 1000  > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_SHARE_UPDATED;
+	return iavf_check_update_config(adapter, node);
+}
+
+/**
+ * iavf_update_queue_tx_max - sets tx max parameter
+ * @adapter: iavf adapter struct instance
+ * @node: iavf rate node struct instance
+ * @bw: bandwidth in bytes per second
+ * @extack: extended netdev ack structure
+ *
+ * This function sets max BW limit.
+ */
+static int iavf_update_queue_tx_max(struct iavf_adapter *adapter,
+				    struct iavf_dev_rate_node *node,
+				    u64 bw, struct netlink_ext_ack *extack)
+{
+	/* Keep in kbps */
+	node->tx_max_temp = div_u64(bw, IAVF_RATE_DIV_FACTOR);
+	if (ADV_LINK_SUPPORT(adapter)) {
+		if (node->tx_max_temp / 1000 > adapter->link_speed_mbps)
+			return -EINVAL;
+	}
+
+	node->tx_update_flag |= IAVF_FLAG_TX_MAX_UPDATED;
+
+	return iavf_check_update_config(adapter, node);
+}
+
+static int iavf_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node,
+					     void *priv, u64 tx_max,
+					     struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_max(adapter, node, tx_max, extack);
+}
+
+static int iavf_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node,
+					       void *priv, u64 tx_share,
+					       struct netlink_ext_ack *extack)
+{
+	struct iavf_dev_rate_node *node = priv;
+	struct iavf_devlink *dl_priv;
+	struct iavf_adapter *adapter;
+
+	if (!node)
+		return 0;
+
+	dl_priv = devlink_priv(rate_node->devlink);
+	adapter = dl_priv->devlink_ref;
+
+	/* Check if last update is in progress */
+	if (dl_priv->update_in_progress)
+		return -EBUSY;
+
+	if (node == &dl_priv->root_node)
+		return 0;
+
+	return iavf_update_queue_tx_share(adapter, node, tx_share, extack);
+}
+
+static int iavf_devlink_rate_node_del(struct devlink_rate *rate_node,
+				      void *priv,
+				      struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static int iavf_devlink_set_parent(struct devlink_rate *devlink_rate,
+				   struct devlink_rate *parent,
+				   void *priv, void *parent_priv,
+				   struct netlink_ext_ack *extack)
+{
+	return -EINVAL;
+}
+
+static const struct devlink_ops iavf_devlink_ops = {
+	.rate_node_tx_share_set = iavf_devlink_rate_node_tx_share_set,
+	.rate_node_tx_max_set = iavf_devlink_rate_node_tx_max_set,
+	.rate_node_del = iavf_devlink_rate_node_del,
+	.rate_leaf_parent_set = iavf_devlink_set_parent,
+	.rate_node_parent_set = iavf_devlink_set_parent,
+};
 
 /**
  * iavf_devlink_register - Register devlink interface for this VF adapter
@@ -29,7 +283,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
-
+	ref->iavf_dev_rate_initialized = false;
 	devlink_register(devlink);
 
 	return 0;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 65e453bbd1a8..751e9e093ab1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -4,13 +4,34 @@
 #ifndef _IAVF_DEVLINK_H_
 #define _IAVF_DEVLINK_H_
 
+#define IAVF_RATE_NODE_NAME			12
+struct iavf_dev_rate_node {
+	char name[IAVF_RATE_NODE_NAME];
+	struct devlink_rate *rate_node;
+	u8 tx_update_flag;
+#define IAVF_FLAG_TX_SHARE_UPDATED		BIT(0)
+#define IAVF_FLAG_TX_MAX_UPDATED		BIT(1)
+	u64 tx_max;
+	u64 tx_share;
+	u64 tx_max_temp;
+	u64 tx_share_temp;
+#define IAVF_RATE_DIV_FACTOR			125
+#define IAVF_TX_DEFAULT				100000
+};
+
 struct iavf_devlink {
 	struct iavf_adapter *devlink_ref;	/* ref to iavf adapter */
+	struct iavf_dev_rate_node root_node;
+	struct iavf_dev_rate_node *queue_nodes;
+	bool iavf_dev_rate_initialized;
+	bool update_in_progress;
 };
 
 int iavf_devlink_register(struct iavf_adapter *adapter);
 void iavf_devlink_unregister(struct iavf_adapter *adapter);
 int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
+void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
+void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 3a93d0cac60c..699c6375200a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2038,6 +2038,7 @@ static void iavf_finish_config(struct work_struct *work)
 				iavf_free_rss(adapter);
 				iavf_free_misc_irq(adapter);
 				iavf_reset_interrupt_capability(adapter);
+				iavf_devlink_rate_deinit_rate_tree(adapter);
 				iavf_devlink_port_unregister(adapter);
 				iavf_change_state(adapter,
 						  __IAVF_INIT_CONFIG_ADAPTER);
@@ -2710,8 +2711,10 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 	if (err)
 		goto err_sw_init;
 
-	if (!adapter->netdev_registered)
+	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
+		iavf_devlink_rate_init_rate_tree(adapter);
+	}
 
 	netif_carrier_off(netdev);
 	adapter->link_up = false;
@@ -2754,6 +2757,7 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 err_mem:
 	iavf_free_rss(adapter);
 	iavf_free_misc_irq(adapter);
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 err_sw_init:
 	iavf_reset_interrupt_capability(adapter);
@@ -5154,6 +5158,7 @@ static void iavf_remove(struct pci_dev *pdev)
 				 err);
 	}
 
+	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
 
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH iwl-next v4 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  3:40     ` Wenjun Wu
  -1 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev
  Cc: xuejun.zhang, madhu.chittim, qi.z.zhang, anthony.l.nguyen

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  46 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 +++++++++++++++++-
 5 files changed, 317 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 72a68061e396..c04cd1d45be7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -373,6 +379,7 @@ struct iavf_adapter {
 
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -422,6 +429,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -430,6 +439,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -576,6 +586,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 732076c2126f..aefe707aafbc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -97,6 +97,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -108,6 +132,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -281,6 +309,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	if (!devlink)
 		return -ENOMEM;
 
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 751e9e093ab1..4709aa1a0341 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -33,5 +33,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 699c6375200a..c69c8beab3b5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2131,6 +2131,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2713,7 +2728,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3136,6 +3153,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4901,7 +4931,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -4969,6 +4999,13 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.func = PCI_FUNC(pdev->devfn);
 	hw->bus.bus_id = pdev->bus->number;
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps) {
+		err = -ENOMEM;
+		goto err_alloc_qos_cap;
+	}
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
 	if (err) {
@@ -5014,8 +5051,10 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	return 0;
 
-err_devlink_reg:
+err_alloc_qos_cap:
 	iounmap(hw->hw_addr);
+err_devlink_reg:
+	kfree(adapter->qos_caps);
 err_ioremap:
 	destroy_workqueue(adapter->wq);
 err_alloc_wq:
@@ -5161,6 +5200,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index f9727e9c3d63..2eaa93705527 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1465,6 +1466,210 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	u16 tc;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		int j, q_idx;
+
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2124,6 +2329,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2456,6 +2673,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS: {
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v4 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
@ 2023-08-22  3:40     ` Wenjun Wu
  0 siblings, 0 replies; 115+ messages in thread
From: Wenjun Wu @ 2023-08-22  3:40 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: anthony.l.nguyen, qi.z.zhang

From: Jun Zhang <xuejun.zhang@intel.com>

iavf rate tree with root node and queue nodes is created and registered
with devlink rate when iavf adapter is configured.

User can configure the tx_max and tx_share of each queue. If any one of
the queues have been fully updated by user, i.e. both tx_max and
tx_share have been updated for that queue, VIRTCHNL opcodes of
VIRTCHNL_OP_CONFIG_QUEUE_BW and VIRTCHNL_OP_CONFIG_QUANTA will be sent
to PF to configure queues allocated to VF if PF indicates support of
VIRTCHNL_VF_OFFLOAD_QOS through VF Resource / Capability Exchange.

Signed-off-by: Jun Zhang <xuejun.zhang@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  14 ++
 .../net/ethernet/intel/iavf/iavf_devlink.c    |  29 +++
 .../net/ethernet/intel/iavf/iavf_devlink.h    |   1 +
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  46 +++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 +++++++++++++++++-
 5 files changed, 317 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 72a68061e396..c04cd1d45be7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -252,6 +252,9 @@ struct iavf_cloud_filter {
 #define IAVF_RESET_WAIT_DETECTED_COUNT 500
 #define IAVF_RESET_WAIT_COMPLETE_COUNT 2000
 
+#define IAVF_MAX_QOS_TC_NUM		8
+#define IAVF_DEFAULT_QUANTA_SIZE	1024
+
 /* board specific private data structure */
 struct iavf_adapter {
 	struct workqueue_struct *wq;
@@ -351,6 +354,9 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW		BIT_ULL(39)
+#define IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE	BIT_ULL(40)
+#define IAVF_FLAG_AQ_GET_QOS_CAPS			BIT_ULL(41)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -373,6 +379,7 @@ struct iavf_adapter {
 
 	struct devlink *devlink;
 	struct devlink_port devlink_port;
+	bool devlink_update;
 
 	struct iavf_hw hw; /* defined in iavf_type.h */
 
@@ -422,6 +429,8 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define QOS_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_OFFLOAD_QOS)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -430,6 +439,7 @@ struct iavf_adapter {
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
+	struct virtchnl_qos_cap_list *qos_caps;
 	struct iavf_vsi vsi;
 	u32 aq_wait_count;
 	/* RSS stuff */
@@ -576,6 +586,10 @@ void iavf_notify_client_message(struct iavf_vsi *vsi, u8 *msg, u16 len);
 void iavf_notify_client_l2_params(struct iavf_vsi *vsi);
 void iavf_notify_client_open(struct iavf_vsi *vsi);
 void iavf_notify_client_close(struct iavf_vsi *vsi, bool reset);
+void iavf_update_queue_config(struct iavf_adapter *adapter);
+void iavf_configure_queues_bw(struct iavf_adapter *adapter);
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter);
+void iavf_get_qos_caps(struct iavf_adapter *adapter);
 void iavf_enable_channels(struct iavf_adapter *adapter);
 void iavf_disable_channels(struct iavf_adapter *adapter);
 void iavf_add_cloud_filter(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.c b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
index 732076c2126f..aefe707aafbc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.c
@@ -97,6 +97,30 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 	devl_unlock(adapter->devlink);
 }
 
+/**
+ * iavf_notify_queue_config_complete - notify updating queue completion
+ * @adapter: iavf adapter struct instance
+ *
+ * This function sets the queue configuration update status when all
+ * queue parameters have been sent to PF
+ */
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	int q_num = adapter->num_active_queues;
+	int i;
+
+	/* clean up rate tree update flags*/
+	for (i = 0; i < q_num; i++)
+		if (dl_priv->queue_nodes[i].tx_update_flag ==
+		    (IAVF_FLAG_TX_MAX_UPDATED | IAVF_FLAG_TX_SHARE_UPDATED)) {
+			dl_priv->queue_nodes[i].tx_update_flag = 0;
+			break;
+		}
+
+	dl_priv->update_in_progress = false;
+}
+
 /**
  * iavf_check_update_config - check if updating queue parameters needed
  * @adapter: iavf adapter struct instance
@@ -108,6 +132,8 @@ void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter)
 static int iavf_check_update_config(struct iavf_adapter *adapter,
 				    struct iavf_dev_rate_node *node)
 {
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+
 	/* Update queue bw if any one of the queues have been fully updated by
 	 * user, the other queues either use the default value or the last
 	 * fully updated value
@@ -123,6 +149,8 @@ static int iavf_check_update_config(struct iavf_adapter *adapter,
 	if (adapter->state != __IAVF_RUNNING)
 		return -EBUSY;
 
+	dl_priv->update_in_progress = true;
+	iavf_update_queue_config(adapter);
 	return 0;
 }
 
@@ -281,6 +309,7 @@ int iavf_devlink_register(struct iavf_adapter *adapter)
 	if (!devlink)
 		return -ENOMEM;
 
+	adapter->devlink_update = false;
 	ref = devlink_priv(devlink);
 	ref->devlink_ref = adapter;
 	ref->iavf_dev_rate_initialized = false;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_devlink.h b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
index 751e9e093ab1..4709aa1a0341 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_devlink.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_devlink.h
@@ -33,5 +33,6 @@ int iavf_devlink_port_register(struct iavf_adapter *adapter);
 void iavf_devlink_port_unregister(struct iavf_adapter *adapter);
 void iavf_devlink_rate_init_rate_tree(struct iavf_adapter *adapter);
 void iavf_devlink_rate_deinit_rate_tree(struct iavf_adapter *adapter);
+void iavf_notify_queue_config_complete(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 699c6375200a..c69c8beab3b5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2131,6 +2131,21 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return 0;
 	}
 
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW) {
+		iavf_configure_queues_bw(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_QOS_CAPS) {
+		iavf_get_qos_caps(adapter);
+		return 0;
+	}
+
+	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE) {
+		iavf_configure_queues_quanta_size(adapter);
+		return 0;
+	}
+
 	if (adapter->aq_required & IAVF_FLAG_AQ_CONFIGURE_QUEUES) {
 		iavf_configure_queues(adapter);
 		return 0;
@@ -2713,7 +2728,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 
 	if (!adapter->netdev_registered) {
 		iavf_devlink_port_register(adapter);
-		iavf_devlink_rate_init_rate_tree(adapter);
+
+		if (QOS_ALLOWED(adapter))
+			iavf_devlink_rate_init_rate_tree(adapter);
 	}
 
 	netif_carrier_off(netdev);
@@ -3136,6 +3153,19 @@ static void iavf_reset_task(struct work_struct *work)
 		err = iavf_reinit_interrupt_scheme(adapter, running);
 		if (err)
 			goto reset_err;
+
+		if (QOS_ALLOWED(adapter)) {
+			iavf_devlink_rate_deinit_rate_tree(adapter);
+			iavf_devlink_rate_init_rate_tree(adapter);
+		}
+	}
+
+	if (adapter->devlink_update) {
+		adapter->aq_required |= IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+		adapter->aq_required |= IAVF_FLAG_AQ_GET_QOS_CAPS;
+		adapter->aq_required |=
+				IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+		adapter->devlink_update = false;
 	}
 
 	if (RSS_AQ(adapter)) {
@@ -4901,7 +4931,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct net_device *netdev;
 	struct iavf_adapter *adapter = NULL;
 	struct iavf_hw *hw = NULL;
-	int err;
+	int err, len;
 
 	err = pci_enable_device(pdev);
 	if (err)
@@ -4969,6 +4999,13 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	hw->bus.func = PCI_FUNC(pdev->devfn);
 	hw->bus.bus_id = pdev->bus->number;
 
+	len = struct_size(adapter->qos_caps, cap, IAVF_MAX_QOS_TC_NUM);
+	adapter->qos_caps = kzalloc(len, GFP_KERNEL);
+	if (!adapter->qos_caps) {
+		err = -ENOMEM;
+		goto err_alloc_qos_cap;
+	}
+
 	/* Register iavf adapter with devlink */
 	err = iavf_devlink_register(adapter);
 	if (err) {
@@ -5014,8 +5051,10 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	return 0;
 
-err_devlink_reg:
+err_alloc_qos_cap:
 	iounmap(hw->hw_addr);
+err_devlink_reg:
+	kfree(adapter->qos_caps);
 err_ioremap:
 	destroy_workqueue(adapter->wq);
 err_alloc_wq:
@@ -5161,6 +5200,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	iavf_devlink_rate_deinit_rate_tree(adapter);
 	iavf_devlink_port_unregister(adapter);
 	iavf_devlink_unregister(adapter);
+	kfree(adapter->qos_caps);
 
 	mutex_lock(&adapter->crit_lock);
 	dev_info(&adapter->pdev->dev, "Removing device\n");
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index f9727e9c3d63..2eaa93705527 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,7 +148,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
 	       VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF |
-	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
+	       VIRTCHNL_VF_CAP_ADV_LINK_SPEED |
+	       VIRTCHNL_VF_OFFLOAD_QOS;
 
 	adapter->current_op = VIRTCHNL_OP_GET_VF_RESOURCES;
 	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_CONFIG;
@@ -1465,6 +1466,210 @@ iavf_set_adapter_link_speed_from_vpe(struct iavf_adapter *adapter,
 		adapter->link_speed = vpe->event_data.link_event.link_speed;
 }
 
+/**
+ * iavf_get_qos_caps - get qos caps support
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF for Supported QoS Caps.
+ */
+void iavf_get_qos_caps(struct iavf_adapter *adapter)
+{
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot get qos caps, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_GET_QOS_CAPS;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_QOS_CAPS;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_QOS_CAPS, NULL, 0);
+}
+
+/**
+ * iavf_set_quanta_size - set quanta size of queue chunk
+ * @adapter: iavf adapter struct instance
+ * @quanta_size: quanta size in bytes
+ * @queue_index: starting index of queue chunk
+ * @num_queues: number of queues in the queue chunk
+ *
+ * This function requests PF to set quanta size of queue chunk
+ * starting at queue_index.
+ */
+static void
+iavf_set_quanta_size(struct iavf_adapter *adapter, u16 quanta_size,
+		     u16 queue_index, u16 num_queues)
+{
+	struct virtchnl_quanta_cfg quanta_cfg;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set queue quanta size, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUANTA;
+	quanta_cfg.quanta_size = quanta_size;
+	quanta_cfg.queue_select.type = VIRTCHNL_QUEUE_TYPE_TX;
+	quanta_cfg.queue_select.start_queue_id = queue_index;
+	quanta_cfg.queue_select.num_queues = num_queues;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_QUANTA_SIZE;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUANTA,
+			 (u8 *)&quanta_cfg, sizeof(quanta_cfg));
+}
+
+/**
+ * iavf_set_queue_bw - set bw of allocated queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of tc0 queues
+ */
+static void iavf_set_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	size_t len;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->num_active_queues;
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = 0;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_set_tc_queue_bw - set bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to set queue bw of multiple tc(s)
+ */
+static void iavf_set_tc_queue_bw(struct iavf_adapter *adapter)
+{
+	struct iavf_devlink *dl_priv = devlink_priv(adapter->devlink);
+	struct virtchnl_queues_bw_cfg *queues_bw_cfg;
+	struct iavf_dev_rate_node *queue_rate;
+	u16 queue_to_tc[256];
+	size_t len;
+	u16 tc;
+	int i;
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(&adapter->pdev->dev,
+			"Cannot set tc queue bw, command %d pending\n",
+			adapter->current_op);
+		return;
+	}
+
+	len = struct_size(queues_bw_cfg, cfg, adapter->num_active_queues);
+	queues_bw_cfg = kzalloc(len, GFP_KERNEL);
+	if (!queues_bw_cfg)
+		return;
+
+	queue_rate = dl_priv->queue_nodes;
+	queues_bw_cfg->vsi_id = adapter->vsi.id;
+	queues_bw_cfg->num_queues = adapter->ch_config.total_qps;
+
+	/* build tc[queue] */
+	for (i = 0; i < adapter->num_tc; i++) {
+		int j, q_idx;
+
+		for (j = 0; j < adapter->ch_config.ch_info[i].count; ++j) {
+			q_idx = j + adapter->ch_config.ch_info[i].offset;
+			queue_to_tc[q_idx] = i;
+		}
+	}
+
+	for (i = 0; i < queues_bw_cfg->num_queues; i++) {
+		tc = queue_to_tc[i];
+		queues_bw_cfg->cfg[i].queue_id = i;
+		queues_bw_cfg->cfg[i].shaper.peak = queue_rate[i].tx_max;
+		queues_bw_cfg->cfg[i].shaper.committed =
+						    queue_rate[i].tx_share;
+		queues_bw_cfg->cfg[i].tc = tc;
+	}
+
+	adapter->current_op = VIRTCHNL_OP_CONFIG_QUEUE_BW;
+	adapter->aq_required &= ~IAVF_FLAG_AQ_CONFIGURE_QUEUES_BW;
+	iavf_send_pf_msg(adapter, VIRTCHNL_OP_CONFIG_QUEUE_BW,
+			 (u8 *)queues_bw_cfg, len);
+	kfree(queues_bw_cfg);
+}
+
+/**
+ * iavf_configure_queues_bw - configure bw of allocated tc/queues
+ * @adapter: iavf adapter struct instance
+ *
+ * This function requests PF to configure queue bw of allocated
+ * tc/queues
+ */
+void iavf_configure_queues_bw(struct iavf_adapter *adapter)
+{
+	/* Set Queue bw */
+	if (adapter->ch_config.state == __IAVF_TC_INVALID)
+		iavf_set_queue_bw(adapter);
+	else
+		iavf_set_tc_queue_bw(adapter);
+}
+
+/**
+ * iavf_configure_queues_quanta_size - configure quanta size of queues
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure quanta size of allocated queues.
+ **/
+void iavf_configure_queues_quanta_size(struct iavf_adapter *adapter)
+{
+	int quanta_size = IAVF_DEFAULT_QUANTA_SIZE;
+
+	/* Set Queue Quanta Size to default */
+	iavf_set_quanta_size(adapter, quanta_size, 0,
+			     adapter->num_active_queues);
+}
+
+/**
+ * iavf_update_queue_config - request queue configuration update
+ * @adapter: adapter structure
+ *
+ * Request that the PF configure queue quanta size and queue bw
+ * of allocated queues.
+ **/
+void iavf_update_queue_config(struct iavf_adapter *adapter)
+{
+	adapter->devlink_update = true;
+	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+}
+
 /**
  * iavf_enable_channels
  * @adapter: adapter structure
@@ -2124,6 +2329,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			dev_warn(&adapter->pdev->dev, "Failed to add VLAN filter, error %s\n",
 				 iavf_stat_str(&adapter->hw, v_retval));
 			break;
+		case VIRTCHNL_OP_GET_QOS_CAPS:
+			dev_warn(&adapter->pdev->dev, "Failed to Get Qos CAPs, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUANTA:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Quanta, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
+		case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+			dev_warn(&adapter->pdev->dev, "Failed to Config Queue BW, error %s\n",
+				 iavf_stat_str(&adapter->hw, v_retval));
+			break;
 		default:
 			dev_err(&adapter->pdev->dev, "PF returned error %d (%s) to our request %d\n",
 				v_retval, iavf_stat_str(&adapter->hw, v_retval),
@@ -2456,6 +2673,18 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		if (!v_retval)
 			iavf_netdev_features_vlan_strip_set(netdev, false);
 		break;
+	case VIRTCHNL_OP_GET_QOS_CAPS: {
+		u16 len = struct_size(adapter->qos_caps, cap,
+				      IAVF_MAX_QOS_TC_NUM);
+
+		memcpy(adapter->qos_caps, msg, min(msglen, len));
+		}
+		break;
+	case VIRTCHNL_OP_CONFIG_QUANTA:
+		iavf_notify_queue_config_complete(adapter);
+		break;
+	case VIRTCHNL_OP_CONFIG_QUEUE_BW:
+		break;
 	default:
 		if (adapter->current_op && (v_opcode != adapter->current_op))
 			dev_warn(&adapter->pdev->dev, "Expected response %d from PF, received %d\n",
-- 
2.34.1

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
  2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
@ 2023-08-22  6:12     ` Jiri Pirko
  -1 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-22  6:12 UTC (permalink / raw)
  To: Wenjun Wu
  Cc: intel-wired-lan, netdev, xuejun.zhang, madhu.chittim, qi.z.zhang,
	anthony.l.nguyen

Tue, Aug 22, 2023 at 05:39:58AM CEST, wenjun1.wu@intel.com wrote:
>To allow user to configure queue bandwidth, devlink port support
>is added to support devlink port rate API. [1]
>
>Add devlink framework registration/unregistration on iavf driver
>initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
>is created to be associated iavf netdevice.
>
>iavf rate tree with root node, queue nodes, and leaf node is created
>and registered with devlink rate when iavf adapter is configured, and
>if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
>Capability Exchange.

NACK! Port function is there to configure the VF/SF from the eswitch
side. Yet you use it for the configureation of the actual VF, which is
clear misuse. Please don't


>
>[root@localhost ~]# devlink port function rate show
>pci/0000:af:01.0/txq_15: type node parent iavf_root
>pci/0000:af:01.0/txq_14: type node parent iavf_root
>pci/0000:af:01.0/txq_13: type node parent iavf_root
>pci/0000:af:01.0/txq_12: type node parent iavf_root
>pci/0000:af:01.0/txq_11: type node parent iavf_root
>pci/0000:af:01.0/txq_10: type node parent iavf_root
>pci/0000:af:01.0/txq_9: type node parent iavf_root
>pci/0000:af:01.0/txq_8: type node parent iavf_root
>pci/0000:af:01.0/txq_7: type node parent iavf_root
>pci/0000:af:01.0/txq_6: type node parent iavf_root
>pci/0000:af:01.0/txq_5: type node parent iavf_root
>pci/0000:af:01.0/txq_4: type node parent iavf_root
>pci/0000:af:01.0/txq_3: type node parent iavf_root
>pci/0000:af:01.0/txq_2: type node parent iavf_root
>pci/0000:af:01.0/txq_1: type node parent iavf_root
>pci/0000:af:01.0/txq_0: type node parent iavf_root
>pci/0000:af:01.0/iavf_root: type node
>
>
>                         +---------+
>                         |   root  |
>                         +----+----+
>                              |
>            |-----------------|-----------------|
>       +----v----+       +----v----+       +----v----+
>       |  txq_0  |       |  txq_1  |       |  txq_x  |
>       +----+----+       +----+----+       +----+----+
>
>User can configure the tx_max and tx_share of each queue. Once any one of the
>queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
>and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
>to VF
>
>Example:
>
>1.To Set the queue tx_share:
>devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps
>
>2.To Set the queue tx_max:
>devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps
>
>3.To Show Current devlink port rate info:
>devlink port function rate function show
>[root@localhost ~]# devlink port function rate show
>pci/0000:af:01.0/txq_15: type node parent iavf_root
>pci/0000:af:01.0/txq_14: type node parent iavf_root
>pci/0000:af:01.0/txq_13: type node parent iavf_root
>pci/0000:af:01.0/txq_12: type node parent iavf_root
>pci/0000:af:01.0/txq_11: type node parent iavf_root
>pci/0000:af:01.0/txq_10: type node parent iavf_root
>pci/0000:af:01.0/txq_9: type node parent iavf_root
>pci/0000:af:01.0/txq_8: type node parent iavf_root
>pci/0000:af:01.0/txq_7: type node parent iavf_root
>pci/0000:af:01.0/txq_6: type node parent iavf_root
>pci/0000:af:01.0/txq_5: type node parent iavf_root
>pci/0000:af:01.0/txq_4: type node parent iavf_root
>pci/0000:af:01.0/txq_3: type node parent iavf_root
>pci/0000:af:01.0/txq_2: type node parent iavf_root
>pci/0000:af:01.0/txq_1: type node parent iavf_root
>pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
>pci/0000:af:01.0/iavf_root: type node
>
>
>[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/
>
>Change log:
>
>v4:
>- Rearrange the ice_vf_qs_bw structure, put the largest number first
>- Minimize the scope of values
>- Remove the unnecessary brackets
>- Remove the unnecessary memory allocation.
>- Added Error Code and moved devlink registration before aq lock initialization
>- Changed devlink registration for error handling in case of allocation failure
>- Used kcalloc for object array memory allocation and initialization
>- Changed functions & comments for readability
>
>v3:
>- Rebase the code
>- Changed rate node max/share set function description
>- Put variable in local scope
>
>v2:
>- Change static array to flex array
>- Use struct_size helper
>- Align all the error code types in the function
>- Move the register field definitions to the right place in the file
>- Fix coding style
>- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields
>---
>
>Jun Zhang (3):
>  iavf: Add devlink and devlink port support
>  iavf: Add devlink port function rate API support
>  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
>
>Wenjun Wu (2):
>  virtchnl: support queue rate limit and quanta size configuration
>  ice: Support VF queue rate limit and quanta size configuration
>
> drivers/net/ethernet/intel/Kconfig            |   1 +
> drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
> drivers/net/ethernet/intel/iavf/iavf.h        |  19 +
> .../net/ethernet/intel/iavf/iavf_devlink.c    | 377 ++++++++++++++++++
> .../net/ethernet/intel/iavf/iavf_devlink.h    |  38 ++
> drivers/net/ethernet/intel/iavf/iavf_main.c   |  64 ++-
> .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 ++++++++++-
> drivers/net/ethernet/intel/ice/ice.h          |   2 +
> drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
> drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
> .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
> drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
> drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
> drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++
> drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
> .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
> include/linux/avf/virtchnl.h                  | 119 ++++++
> 18 files changed, 1218 insertions(+), 3 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
> create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
>
>-- 
>2.34.1
>
>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
@ 2023-08-22  6:12     ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-22  6:12 UTC (permalink / raw)
  To: Wenjun Wu; +Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan

Tue, Aug 22, 2023 at 05:39:58AM CEST, wenjun1.wu@intel.com wrote:
>To allow user to configure queue bandwidth, devlink port support
>is added to support devlink port rate API. [1]
>
>Add devlink framework registration/unregistration on iavf driver
>initialization and remove, and devlink port of DEVLINK_PORT_FLAVOUR_VIRTUAL
>is created to be associated iavf netdevice.
>
>iavf rate tree with root node, queue nodes, and leaf node is created
>and registered with devlink rate when iavf adapter is configured, and
>if PF indicates support of VIRTCHNL_VF_OFFLOAD_QOS through VF Resource /
>Capability Exchange.

NACK! Port function is there to configure the VF/SF from the eswitch
side. Yet you use it for the configureation of the actual VF, which is
clear misuse. Please don't


>
>[root@localhost ~]# devlink port function rate show
>pci/0000:af:01.0/txq_15: type node parent iavf_root
>pci/0000:af:01.0/txq_14: type node parent iavf_root
>pci/0000:af:01.0/txq_13: type node parent iavf_root
>pci/0000:af:01.0/txq_12: type node parent iavf_root
>pci/0000:af:01.0/txq_11: type node parent iavf_root
>pci/0000:af:01.0/txq_10: type node parent iavf_root
>pci/0000:af:01.0/txq_9: type node parent iavf_root
>pci/0000:af:01.0/txq_8: type node parent iavf_root
>pci/0000:af:01.0/txq_7: type node parent iavf_root
>pci/0000:af:01.0/txq_6: type node parent iavf_root
>pci/0000:af:01.0/txq_5: type node parent iavf_root
>pci/0000:af:01.0/txq_4: type node parent iavf_root
>pci/0000:af:01.0/txq_3: type node parent iavf_root
>pci/0000:af:01.0/txq_2: type node parent iavf_root
>pci/0000:af:01.0/txq_1: type node parent iavf_root
>pci/0000:af:01.0/txq_0: type node parent iavf_root
>pci/0000:af:01.0/iavf_root: type node
>
>
>                         +---------+
>                         |   root  |
>                         +----+----+
>                              |
>            |-----------------|-----------------|
>       +----v----+       +----v----+       +----v----+
>       |  txq_0  |       |  txq_1  |       |  txq_x  |
>       +----+----+       +----+----+       +----+----+
>
>User can configure the tx_max and tx_share of each queue. Once any one of the
>queues are fully configured, VIRTCHNL opcodes of VIRTCHNL_OP_CONFIG_QUEUE_BW
>and VIRTCHNL_OP_CONFIG_QUANTA will be sent to PF to configure queues allocated
>to VF
>
>Example:
>
>1.To Set the queue tx_share:
>devlink port function rate set pci/0000:af:01.0 txq_0 tx_share 100 MBps
>
>2.To Set the queue tx_max:
>devlink port function rate set pci/0000:af:01.0 txq_0 tx_max 200 MBps
>
>3.To Show Current devlink port rate info:
>devlink port function rate function show
>[root@localhost ~]# devlink port function rate show
>pci/0000:af:01.0/txq_15: type node parent iavf_root
>pci/0000:af:01.0/txq_14: type node parent iavf_root
>pci/0000:af:01.0/txq_13: type node parent iavf_root
>pci/0000:af:01.0/txq_12: type node parent iavf_root
>pci/0000:af:01.0/txq_11: type node parent iavf_root
>pci/0000:af:01.0/txq_10: type node parent iavf_root
>pci/0000:af:01.0/txq_9: type node parent iavf_root
>pci/0000:af:01.0/txq_8: type node parent iavf_root
>pci/0000:af:01.0/txq_7: type node parent iavf_root
>pci/0000:af:01.0/txq_6: type node parent iavf_root
>pci/0000:af:01.0/txq_5: type node parent iavf_root
>pci/0000:af:01.0/txq_4: type node parent iavf_root
>pci/0000:af:01.0/txq_3: type node parent iavf_root
>pci/0000:af:01.0/txq_2: type node parent iavf_root
>pci/0000:af:01.0/txq_1: type node parent iavf_root
>pci/0000:af:01.0/txq_0: type node tx_share 800Mbit tx_max 1600Mbit parent iavf_root
>pci/0000:af:01.0/iavf_root: type node
>
>
>[1]https://lore.kernel.org/netdev/20221115104825.172668-1-michal.wilczynski@intel.com/
>
>Change log:
>
>v4:
>- Rearrange the ice_vf_qs_bw structure, put the largest number first
>- Minimize the scope of values
>- Remove the unnecessary brackets
>- Remove the unnecessary memory allocation.
>- Added Error Code and moved devlink registration before aq lock initialization
>- Changed devlink registration for error handling in case of allocation failure
>- Used kcalloc for object array memory allocation and initialization
>- Changed functions & comments for readability
>
>v3:
>- Rebase the code
>- Changed rate node max/share set function description
>- Put variable in local scope
>
>v2:
>- Change static array to flex array
>- Use struct_size helper
>- Align all the error code types in the function
>- Move the register field definitions to the right place in the file
>- Fix coding style
>- Adapted to queue bw cfg and qos cap list virtchnl message with flex array fields
>---
>
>Jun Zhang (3):
>  iavf: Add devlink and devlink port support
>  iavf: Add devlink port function rate API support
>  iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting
>
>Wenjun Wu (2):
>  virtchnl: support queue rate limit and quanta size configuration
>  ice: Support VF queue rate limit and quanta size configuration
>
> drivers/net/ethernet/intel/Kconfig            |   1 +
> drivers/net/ethernet/intel/iavf/Makefile      |   2 +-
> drivers/net/ethernet/intel/iavf/iavf.h        |  19 +
> .../net/ethernet/intel/iavf/iavf_devlink.c    | 377 ++++++++++++++++++
> .../net/ethernet/intel/iavf/iavf_devlink.h    |  38 ++
> drivers/net/ethernet/intel/iavf/iavf_main.c   |  64 ++-
> .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 231 ++++++++++-
> drivers/net/ethernet/intel/ice/ice.h          |   2 +
> drivers/net/ethernet/intel/ice/ice_base.c     |   2 +
> drivers/net/ethernet/intel/ice/ice_common.c   |  19 +
> .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 +
> drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +
> drivers/net/ethernet/intel/ice/ice_type.h     |   1 +
> drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   9 +
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 310 ++++++++++++++
> drivers/net/ethernet/intel/ice/ice_virtchnl.h |  11 +
> .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
> include/linux/avf/virtchnl.h                  | 119 ++++++
> 18 files changed, 1218 insertions(+), 3 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.c
> create mode 100644 drivers/net/ethernet/intel/iavf/iavf_devlink.h
>
>-- 
>2.34.1
>
>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
  2023-08-22  6:12     ` [Intel-wired-lan] " Jiri Pirko
@ 2023-08-22 15:12       ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-08-22 15:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, intel-wired-lan, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
> NACK! Port function is there to configure the VF/SF from the eswitch
> side. Yet you use it for the configureation of the actual VF, which is
> clear misuse. Please don't

Stating where they are supposed to configure the rate would be helpful.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support
@ 2023-08-22 15:12       ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-08-22 15:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Wenjun Wu, intel-wired-lan, netdev, xuejun.zhang, madhu.chittim,
	qi.z.zhang, anthony.l.nguyen

On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
> NACK! Port function is there to configure the VF/SF from the eswitch
> side. Yet you use it for the configureation of the actual VF, which is
> clear misuse. Please don't

Stating where they are supposed to configure the rate would be helpful.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-22 15:12       ` Jakub Kicinski
@ 2023-08-22 15:34         ` Jiri Pirko
  -1 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-22 15:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wenjun Wu, intel-wired-lan, netdev, xuejun.zhang, madhu.chittim,
	qi.z.zhang, anthony.l.nguyen

Tue, Aug 22, 2023 at 05:12:55PM CEST, kuba@kernel.org wrote:
>On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> NACK! Port function is there to configure the VF/SF from the eswitch
>> side. Yet you use it for the configureation of the actual VF, which is
>> clear misuse. Please don't
>
>Stating where they are supposed to configure the rate would be helpful.

TC?

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-08-22 15:34         ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-22 15:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, intel-wired-lan, qi.z.zhang, anthony.l.nguyen, Wenjun Wu

Tue, Aug 22, 2023 at 05:12:55PM CEST, kuba@kernel.org wrote:
>On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> NACK! Port function is there to configure the VF/SF from the eswitch
>> side. Yet you use it for the configureation of the actual VF, which is
>> clear misuse. Please don't
>
>Stating where they are supposed to configure the rate would be helpful.

TC?
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-22 15:34         ` [Intel-wired-lan] " Jiri Pirko
  (?)
@ 2023-08-23 19:13         ` Zhang, Xuejun
  2023-08-24  7:04             ` [Intel-wired-lan] " Jiri Pirko
  -1 siblings, 1 reply; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-23 19:13 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, Wenjun Wu


[-- Attachment #1.1: Type: text/plain, Size: 735 bytes --]


On 8/22/2023 8:34 AM, Jiri Pirko wrote:
> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>> side. Yet you use it for the configureation of the actual VF, which is
>>> clear misuse. Please don't
>> Stating where they are supposed to configure the rate would be helpful.
> TC?

Our implementation is an extension to this commit 42c2eb6b1f43 ice: 
Implement devlink-rate API).

We are setting the Tx max & share rates of individual queues in a VF 
using the devlink rate API.

Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the 
port to distinguish it from being eswitch.

[-- Attachment #1.2: Type: text/html, Size: 1969 bytes --]

[-- Attachment #2: Type: text/plain, Size: 162 bytes --]

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-22 15:34         ` [Intel-wired-lan] " Jiri Pirko
@ 2023-08-23 21:39           ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-23 21:39 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: netdev, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, Wenjun Wu


On 8/22/2023 8:34 AM, Jiri Pirko wrote:
> Tue, Aug 22, 2023 at 05:12:55PM CEST, kuba@kernel.org wrote:
>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>> side. Yet you use it for the configureation of the actual VF, which is
>>> clear misuse. Please don't
>> Stating where they are supposed to configure the rate would be helpful.
> TC?

Our implementation is an extension to this commit 42c2eb6b1f43 ice: 
Implement devlink-rate API).

We are setting the Tx max & share rates of individual queues in a VF 
using the devlink rate API.

Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the 
port to distinguish it from being eswitch.

[resend in plain text only]

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-08-23 21:39           ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-23 21:39 UTC (permalink / raw)
  To: Jiri Pirko, Jakub Kicinski
  Cc: Wenjun Wu, intel-wired-lan, netdev, madhu.chittim, qi.z.zhang,
	anthony.l.nguyen


On 8/22/2023 8:34 AM, Jiri Pirko wrote:
> Tue, Aug 22, 2023 at 05:12:55PM CEST, kuba@kernel.org wrote:
>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>> side. Yet you use it for the configureation of the actual VF, which is
>>> clear misuse. Please don't
>> Stating where they are supposed to configure the rate would be helpful.
> TC?

Our implementation is an extension to this commit 42c2eb6b1f43 ice: 
Implement devlink-rate API).

We are setting the Tx max & share rates of individual queues in a VF 
using the devlink rate API.

Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the 
port to distinguish it from being eswitch.

[resend in plain text only]


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-23 19:13         ` Zhang, Xuejun
@ 2023-08-24  7:04             ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-24  7:04 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jakub Kicinski, Wenjun Wu, intel-wired-lan, netdev,
	madhu.chittim, qi.z.zhang, anthony.l.nguyen

Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>
>On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>> > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> > > NACK! Port function is there to configure the VF/SF from the eswitch
>> > > side. Yet you use it for the configureation of the actual VF, which is
>> > > clear misuse. Please don't
>> > Stating where they are supposed to configure the rate would be helpful.
>> TC?
>
>Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
>devlink-rate API).
>
>We are setting the Tx max & share rates of individual queues in a VF using
>the devlink rate API.
>
>Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
>to distinguish it from being eswitch.

I understand, that is a wrong object. So again, you should use
"function" subobject of devlink port to configure "the other side of the
wire", that means the function related to a eswitch port. Here, you are
doing it for the VF directly, which is wrong. If you need some rate
limiting to be configured on an actual VF, use what you use for any
other nic. Offload TC.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-08-24  7:04             ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-08-24  7:04 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu

Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>
>On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>> > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> > > NACK! Port function is there to configure the VF/SF from the eswitch
>> > > side. Yet you use it for the configureation of the actual VF, which is
>> > > clear misuse. Please don't
>> > Stating where they are supposed to configure the rate would be helpful.
>> TC?
>
>Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
>devlink-rate API).
>
>We are setting the Tx max & share rates of individual queues in a VF using
>the devlink rate API.
>
>Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
>to distinguish it from being eswitch.

I understand, that is a wrong object. So again, you should use
"function" subobject of devlink port to configure "the other side of the
wire", that means the function related to a eswitch port. Here, you are
doing it for the VF directly, which is wrong. If you need some rate
limiting to be configured on an actual VF, use what you use for any
other nic. Offload TC.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-24  7:04             ` [Intel-wired-lan] " Jiri Pirko
@ 2023-08-28 22:46               ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-28 22:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Wenjun Wu, intel-wired-lan, netdev,
	madhu.chittim, qi.z.zhang, anthony.l.nguyen


On 8/24/2023 12:04 AM, Jiri Pirko wrote:
> Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>> On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>>>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>>>> side. Yet you use it for the configureation of the actual VF, which is
>>>>> clear misuse. Please don't
>>>> Stating where they are supposed to configure the rate would be helpful.
>>> TC?
>> Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
>> devlink-rate API).
>>
>> We are setting the Tx max & share rates of individual queues in a VF using
>> the devlink rate API.
>>
>> Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
>> to distinguish it from being eswitch.
> I understand, that is a wrong object. So again, you should use
> "function" subobject of devlink port to configure "the other side of the
> wire", that means the function related to a eswitch port. Here, you are
> doing it for the VF directly, which is wrong. If you need some rate
> limiting to be configured on an actual VF, use what you use for any
> other nic. Offload TC.
Thanks for detailed explanation and suggestions. Sorry for late reply as 
it took a bit longer to understand options.

As sysfs has similar rate configuration on per queue basis with 
tx_maxrate, is it a viable option for our use case (i.e allow user to 
configure tx rate for each allocated queue in a VF).

Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible on 
the current framework.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-08-28 22:46               ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-08-28 22:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu


On 8/24/2023 12:04 AM, Jiri Pirko wrote:
> Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>> On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>>>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>>>> side. Yet you use it for the configureation of the actual VF, which is
>>>>> clear misuse. Please don't
>>>> Stating where they are supposed to configure the rate would be helpful.
>>> TC?
>> Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
>> devlink-rate API).
>>
>> We are setting the Tx max & share rates of individual queues in a VF using
>> the devlink rate API.
>>
>> Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
>> to distinguish it from being eswitch.
> I understand, that is a wrong object. So again, you should use
> "function" subobject of devlink port to configure "the other side of the
> wire", that means the function related to a eswitch port. Here, you are
> doing it for the VF directly, which is wrong. If you need some rate
> limiting to be configured on an actual VF, use what you use for any
> other nic. Offload TC.
Thanks for detailed explanation and suggestions. Sorry for late reply as 
it took a bit longer to understand options.

As sysfs has similar rate configuration on per queue basis with 
tx_maxrate, is it a viable option for our use case (i.e allow user to 
configure tx rate for each allocated queue in a VF).

Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible on 
the current framework.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-24  7:04             ` [Intel-wired-lan] " Jiri Pirko
@ 2023-10-18  9:05               ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-10-18  9:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Wenjun Wu, intel-wired-lan, netdev,
	madhu.chittim, qi.z.zhang, anthony.l.nguyen, Zhang, Xuejun

Hi,

please allow me to revive this old thread...

On Thu, 2023-08-24 at 09:04 +0200, Jiri Pirko wrote:
> > Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
> > > > 
> > > > On 8/22/2023 8:34 AM, Jiri Pirko wrote:
> > > > > > Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
> > > > > > > > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
> > > > > > > > > > NACK! Port function is there to configure the VF/SF from the eswitch
> > > > > > > > > > side. Yet you use it for the configureation of the actual VF, which is
> > > > > > > > > > clear misuse. Please don't
> > > > > > > > Stating where they are supposed to configure the rate would be helpful.
> > > > > > TC?
> > > > 
> > > > Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
> > > > devlink-rate API).
> > > > 
> > > > We are setting the Tx max & share rates of individual queues in a VF using
> > > > the devlink rate API.
> > > > 
> > > > Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
> > > > to distinguish it from being eswitch.
> > 
> > I understand, that is a wrong object. So again, you should use
> > "function" subobject of devlink port to configure "the other side of the
> > wire", that means the function related to a eswitch port. Here, you are
> > doing it for the VF directly, which is wrong. If you need some rate
> > limiting to be configured on an actual VF, use what you use for any
> > other nic. Offload TC.

I have a doubt WRT the above. Don't we need something more/different
here? I mean: a possible intent is limiting the amount of resources (BW
in the VF -> esw direction) that the application owing the VF could
use.

If that is enforced via TC on the VF side (say, a different namespace
or VM), the VF user could circumvent such limit - changing the tc
configuration - either by mistake or malicious action. 

Looking at the thing from a different perspective, the TX B/W on the VF
side is the RX B/W on the eswitch side, so the same effect could be
obtained with a (new/different) API formally touching only eswitch side
object. WDYT?

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-10-18  9:05               ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-10-18  9:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Zhang, Xuejun, anthony.l.nguyen, madhu.chittim,
	intel-wired-lan, qi.z.zhang, Jakub Kicinski, Wenjun Wu

Hi,

please allow me to revive this old thread...

On Thu, 2023-08-24 at 09:04 +0200, Jiri Pirko wrote:
> > Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
> > > > 
> > > > On 8/22/2023 8:34 AM, Jiri Pirko wrote:
> > > > > > Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
> > > > > > > > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
> > > > > > > > > > NACK! Port function is there to configure the VF/SF from the eswitch
> > > > > > > > > > side. Yet you use it for the configureation of the actual VF, which is
> > > > > > > > > > clear misuse. Please don't
> > > > > > > > Stating where they are supposed to configure the rate would be helpful.
> > > > > > TC?
> > > > 
> > > > Our implementation is an extension to this commit 42c2eb6b1f43 ice: Implement
> > > > devlink-rate API).
> > > > 
> > > > We are setting the Tx max & share rates of individual queues in a VF using
> > > > the devlink rate API.
> > > > 
> > > > Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for the port
> > > > to distinguish it from being eswitch.
> > 
> > I understand, that is a wrong object. So again, you should use
> > "function" subobject of devlink port to configure "the other side of the
> > wire", that means the function related to a eswitch port. Here, you are
> > doing it for the VF directly, which is wrong. If you need some rate
> > limiting to be configured on an actual VF, use what you use for any
> > other nic. Offload TC.

I have a doubt WRT the above. Don't we need something more/different
here? I mean: a possible intent is limiting the amount of resources (BW
in the VF -> esw direction) that the application owing the VF could
use.

If that is enforced via TC on the VF side (say, a different namespace
or VM), the VF user could circumvent such limit - changing the tc
configuration - either by mistake or malicious action. 

Looking at the thing from a different perspective, the TX B/W on the VF
side is the RX B/W on the eswitch side, so the same effect could be
obtained with a (new/different) API formally touching only eswitch side
object. WDYT?

Thanks,

Paolo


_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-08-28 22:46               ` [Intel-wired-lan] " Zhang, Xuejun
@ 2023-11-17  5:52                 ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-17  5:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni

Hello Jiri & Jakub,

Thanks for looking into our last patch with devlink API. Really 
appreciate your candid review.

Following your suggestion, we have looked into 3 tc offload options to 
support queue rate limiting

#1 mq + matchall + police

#2 mq + tbf

#3 htb

all 3 tc offload options require some level of tc extensions to support 
VF tx queue rate limiting (tx_maxrate & tx_minrate)

htb offload requires minimal tc changes or no change with similar change 
done @ driver (we can share patch for review).

After discussing with Maxim Mikityanskiy( 
https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/ 
), looks like sysfs interface with tx_minrate extension could be the 
option we can take.

Look forward your opinion & guidance. Thanks for your time!

Regards,

Jun

On 8/28/2023 3:46 PM, Zhang, Xuejun wrote:
>
> On 8/24/2023 12:04 AM, Jiri Pirko wrote:
>> Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>>> On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>>>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>>>>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>>>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>>>>> side. Yet you use it for the configureation of the actual VF, 
>>>>>> which is
>>>>>> clear misuse. Please don't
>>>>> Stating where they are supposed to configure the rate would be 
>>>>> helpful.
>>>> TC?
>>> Our implementation is an extension to this commit 42c2eb6b1f43 ice: 
>>> Implement
>>> devlink-rate API).
>>>
>>> We are setting the Tx max & share rates of individual queues in a VF 
>>> using
>>> the devlink rate API.
>>>
>>> Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for 
>>> the port
>>> to distinguish it from being eswitch.
>> I understand, that is a wrong object. So again, you should use
>> "function" subobject of devlink port to configure "the other side of the
>> wire", that means the function related to a eswitch port. Here, you are
>> doing it for the VF directly, which is wrong. If you need some rate
>> limiting to be configured on an actual VF, use what you use for any
>> other nic. Offload TC.
> Thanks for detailed explanation and suggestions. Sorry for late reply 
> as it took a bit longer to understand options.
>
> As sysfs has similar rate configuration on per queue basis with 
> tx_maxrate, is it a viable option for our use case (i.e allow user to 
> configure tx rate for each allocated queue in a VF).
>
> Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible 
> on the current framework.
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-17  5:52                 ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-17  5:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Samudrala, Sridhar, netdev, maxtram95, qi.z.zhang, Chittim,
	Madhu, intel-wired-lan, Jakub Kicinski, anthony.l.nguyen, pabeni,
	Wenjun Wu

Hello Jiri & Jakub,

Thanks for looking into our last patch with devlink API. Really 
appreciate your candid review.

Following your suggestion, we have looked into 3 tc offload options to 
support queue rate limiting

#1 mq + matchall + police

#2 mq + tbf

#3 htb

all 3 tc offload options require some level of tc extensions to support 
VF tx queue rate limiting (tx_maxrate & tx_minrate)

htb offload requires minimal tc changes or no change with similar change 
done @ driver (we can share patch for review).

After discussing with Maxim Mikityanskiy( 
https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/ 
), looks like sysfs interface with tx_minrate extension could be the 
option we can take.

Look forward your opinion & guidance. Thanks for your time!

Regards,

Jun

On 8/28/2023 3:46 PM, Zhang, Xuejun wrote:
>
> On 8/24/2023 12:04 AM, Jiri Pirko wrote:
>> Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>>> On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>>>> Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>>>>> On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>>>>>> NACK! Port function is there to configure the VF/SF from the eswitch
>>>>>> side. Yet you use it for the configureation of the actual VF, 
>>>>>> which is
>>>>>> clear misuse. Please don't
>>>>> Stating where they are supposed to configure the rate would be 
>>>>> helpful.
>>>> TC?
>>> Our implementation is an extension to this commit 42c2eb6b1f43 ice: 
>>> Implement
>>> devlink-rate API).
>>>
>>> We are setting the Tx max & share rates of individual queues in a VF 
>>> using
>>> the devlink rate API.
>>>
>>> Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute for 
>>> the port
>>> to distinguish it from being eswitch.
>> I understand, that is a wrong object. So again, you should use
>> "function" subobject of devlink port to configure "the other side of the
>> wire", that means the function related to a eswitch port. Here, you are
>> doing it for the VF directly, which is wrong. If you need some rate
>> limiting to be configured on an actual VF, use what you use for any
>> other nic. Offload TC.
> Thanks for detailed explanation and suggestions. Sorry for late reply 
> as it took a bit longer to understand options.
>
> As sysfs has similar rate configuration on per queue basis with 
> tx_maxrate, is it a viable option for our use case (i.e allow user to 
> configure tx rate for each allocated queue in a VF).
>
> Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible 
> on the current framework.
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-17  5:52                 ` Zhang, Xuejun
@ 2023-11-17 11:21                   ` Jiri Pirko
  -1 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-11-17 11:21 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni

Fri, Nov 17, 2023 at 06:52:49AM CET, xuejun.zhang@intel.com wrote:
>Hello Jiri & Jakub,
>
>Thanks for looking into our last patch with devlink API. Really appreciate
>your candid review.
>
>Following your suggestion, we have looked into 3 tc offload options to
>support queue rate limiting
>
>#1 mq + matchall + police

This looks most suitable. Why it would not work?

>
>#2 mq + tbf
>
>#3 htb
>
>all 3 tc offload options require some level of tc extensions to support VF tx
>queue rate limiting (tx_maxrate & tx_minrate)
>
>htb offload requires minimal tc changes or no change with similar change done
>@ driver (we can share patch for review).
>
>After discussing with Maxim Mikityanskiy( https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/
>), looks like sysfs interface with tx_minrate extension could be the option

I don't undestand how any sysfs know is related to any of the tree tc
solutions above.


>we can take.
>
>Look forward your opinion & guidance. Thanks for your time!
>
>Regards,
>
>Jun
>
>On 8/28/2023 3:46 PM, Zhang, Xuejun wrote:
>> 
>> On 8/24/2023 12:04 AM, Jiri Pirko wrote:
>> > Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>> > > On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>> > > > Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>> > > > > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> > > > > > NACK! Port function is there to configure the VF/SF from the eswitch
>> > > > > > side. Yet you use it for the configureation of the
>> > > > > > actual VF, which is
>> > > > > > clear misuse. Please don't
>> > > > > Stating where they are supposed to configure the rate
>> > > > > would be helpful.
>> > > > TC?
>> > > Our implementation is an extension to this commit 42c2eb6b1f43
>> > > ice: Implement
>> > > devlink-rate API).
>> > > 
>> > > We are setting the Tx max & share rates of individual queues in a
>> > > VF using
>> > > the devlink rate API.
>> > > 
>> > > Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute
>> > > for the port
>> > > to distinguish it from being eswitch.
>> > I understand, that is a wrong object. So again, you should use
>> > "function" subobject of devlink port to configure "the other side of the
>> > wire", that means the function related to a eswitch port. Here, you are
>> > doing it for the VF directly, which is wrong. If you need some rate
>> > limiting to be configured on an actual VF, use what you use for any
>> > other nic. Offload TC.
>> Thanks for detailed explanation and suggestions. Sorry for late reply as
>> it took a bit longer to understand options.
>> 
>> As sysfs has similar rate configuration on per queue basis with
>> tx_maxrate, is it a viable option for our use case (i.e allow user to
>> configure tx rate for each allocated queue in a VF).
>> 
>> Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible on
>> the current framework.
>> _______________________________________________
>> Intel-wired-lan mailing list
>> Intel-wired-lan@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-17 11:21                   ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-11-17 11:21 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Samudrala, Sridhar, netdev, maxtram95, qi.z.zhang, Chittim,
	Madhu, intel-wired-lan, Jakub Kicinski, anthony.l.nguyen, pabeni,
	Wenjun Wu

Fri, Nov 17, 2023 at 06:52:49AM CET, xuejun.zhang@intel.com wrote:
>Hello Jiri & Jakub,
>
>Thanks for looking into our last patch with devlink API. Really appreciate
>your candid review.
>
>Following your suggestion, we have looked into 3 tc offload options to
>support queue rate limiting
>
>#1 mq + matchall + police

This looks most suitable. Why it would not work?

>
>#2 mq + tbf
>
>#3 htb
>
>all 3 tc offload options require some level of tc extensions to support VF tx
>queue rate limiting (tx_maxrate & tx_minrate)
>
>htb offload requires minimal tc changes or no change with similar change done
>@ driver (we can share patch for review).
>
>After discussing with Maxim Mikityanskiy( https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/
>), looks like sysfs interface with tx_minrate extension could be the option

I don't undestand how any sysfs know is related to any of the tree tc
solutions above.


>we can take.
>
>Look forward your opinion & guidance. Thanks for your time!
>
>Regards,
>
>Jun
>
>On 8/28/2023 3:46 PM, Zhang, Xuejun wrote:
>> 
>> On 8/24/2023 12:04 AM, Jiri Pirko wrote:
>> > Wed, Aug 23, 2023 at 09:13:34PM CEST, xuejun.zhang@intel.com wrote:
>> > > On 8/22/2023 8:34 AM, Jiri Pirko wrote:
>> > > > Tue, Aug 22, 2023 at 05:12:55PM CEST,kuba@kernel.org  wrote:
>> > > > > On Tue, 22 Aug 2023 08:12:28 +0200 Jiri Pirko wrote:
>> > > > > > NACK! Port function is there to configure the VF/SF from the eswitch
>> > > > > > side. Yet you use it for the configureation of the
>> > > > > > actual VF, which is
>> > > > > > clear misuse. Please don't
>> > > > > Stating where they are supposed to configure the rate
>> > > > > would be helpful.
>> > > > TC?
>> > > Our implementation is an extension to this commit 42c2eb6b1f43
>> > > ice: Implement
>> > > devlink-rate API).
>> > > 
>> > > We are setting the Tx max & share rates of individual queues in a
>> > > VF using
>> > > the devlink rate API.
>> > > 
>> > > Here we are using DEVLINK_PORT_FLAVOUR_VIRTUAL as the attribute
>> > > for the port
>> > > to distinguish it from being eswitch.
>> > I understand, that is a wrong object. So again, you should use
>> > "function" subobject of devlink port to configure "the other side of the
>> > wire", that means the function related to a eswitch port. Here, you are
>> > doing it for the VF directly, which is wrong. If you need some rate
>> > limiting to be configured on an actual VF, use what you use for any
>> > other nic. Offload TC.
>> Thanks for detailed explanation and suggestions. Sorry for late reply as
>> it took a bit longer to understand options.
>> 
>> As sysfs has similar rate configuration on per queue basis with
>> tx_maxrate, is it a viable option for our use case (i.e allow user to
>> configure tx rate for each allocated queue in a VF).
>> 
>> Pls aslo see If adding tx_minrate to sysfs tx queue entry is feasible on
>> the current framework.
>> _______________________________________________
>> Intel-wired-lan mailing list
>> Intel-wired-lan@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-17  5:52                 ` Zhang, Xuejun
@ 2023-11-18 16:48                   ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-18 16:48 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni

On Thu, 16 Nov 2023 21:52:49 -0800 Zhang, Xuejun wrote:
> Thanks for looking into our last patch with devlink API. Really 
> appreciate your candid review.
> 
> Following your suggestion, we have looked into 3 tc offload options to 
> support queue rate limiting
> 
> #1 mq + matchall + police
> 
> #2 mq + tbf

You can extend mqprio, too, if you wanted.

> #3 htb
> 
> all 3 tc offload options require some level of tc extensions to support 
> VF tx queue rate limiting (tx_maxrate & tx_minrate)
> 
> htb offload requires minimal tc changes or no change with similar change 
> done @ driver (we can share patch for review).
> 
> After discussing with Maxim Mikityanskiy( 
> https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/ 
> ), looks like sysfs interface with tx_minrate extension could be the 
> option we can take.
> 
> Look forward your opinion & guidance. Thanks for your time!

My least favorite thing to do is to configure the same piece of silicon
with 4 different SW interfaces. It's okay if we have 4 different uAPIs
(user level APIs) but the driver should not be exposed to all these
options.

I'm saying 4 but really I can think of 6 ways of setting maxrate :(

IMHO we need to be a bit more realistic about the notion of "offloading
the SW thing" for qdiscs specifically. Normally we offload SW constructs
to have a fallback and have a clear definition of functionality.
I bet most data-centers will use BPF+FQ these days, so the "fallback"
argument does not apply. And the "clear definition" when it comes to
basic rate limiting is.. moot.

Besides we already have mqprio, sysfs maxrate, sriov ndo, devlink rate,
none of which have SW fallback.

So since you asked for my opinion - my opinion is that step 1 is to
create a common representation of what we already have and feed it
to the drivers via a single interface. I could just be taking sysfs
maxrate and feeding it to the driver via the devlink rate interface.
If we have the right internals I give 0 cares about what uAPI you pick.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-18 16:48                   ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-18 16:48 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, pabeni,
	Wenjun Wu

On Thu, 16 Nov 2023 21:52:49 -0800 Zhang, Xuejun wrote:
> Thanks for looking into our last patch with devlink API. Really 
> appreciate your candid review.
> 
> Following your suggestion, we have looked into 3 tc offload options to 
> support queue rate limiting
> 
> #1 mq + matchall + police
> 
> #2 mq + tbf

You can extend mqprio, too, if you wanted.

> #3 htb
> 
> all 3 tc offload options require some level of tc extensions to support 
> VF tx queue rate limiting (tx_maxrate & tx_minrate)
> 
> htb offload requires minimal tc changes or no change with similar change 
> done @ driver (we can share patch for review).
> 
> After discussing with Maxim Mikityanskiy( 
> https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/ 
> ), looks like sysfs interface with tx_minrate extension could be the 
> option we can take.
> 
> Look forward your opinion & guidance. Thanks for your time!

My least favorite thing to do is to configure the same piece of silicon
with 4 different SW interfaces. It's okay if we have 4 different uAPIs
(user level APIs) but the driver should not be exposed to all these
options.

I'm saying 4 but really I can think of 6 ways of setting maxrate :(

IMHO we need to be a bit more realistic about the notion of "offloading
the SW thing" for qdiscs specifically. Normally we offload SW constructs
to have a fallback and have a clear definition of functionality.
I bet most data-centers will use BPF+FQ these days, so the "fallback"
argument does not apply. And the "clear definition" when it comes to
basic rate limiting is.. moot.

Besides we already have mqprio, sysfs maxrate, sriov ndo, devlink rate,
none of which have SW fallback.

So since you asked for my opinion - my opinion is that step 1 is to
create a common representation of what we already have and feed it
to the drivers via a single interface. I could just be taking sysfs
maxrate and feeding it to the driver via the devlink rate interface.
If we have the right internals I give 0 cares about what uAPI you pick.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-17 11:21                   ` Jiri Pirko
@ 2023-11-21  9:04                     ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-11-21  9:04 UTC (permalink / raw)
  To: Jiri Pirko, Zhang, Xuejun
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar

On Fri, 2023-11-17 at 12:21 +0100, Jiri Pirko wrote:
> Fri, Nov 17, 2023 at 06:52:49AM CET, xuejun.zhang@intel.com wrote:
> > Hello Jiri & Jakub,
> > 
> > Thanks for looking into our last patch with devlink API. Really appreciate
> > your candid review.
> > 
> > Following your suggestion, we have looked into 3 tc offload options to
> > support queue rate limiting
> > 
> > #1 mq + matchall + police
> 
> This looks most suitable. Why it would not work?

AFAICS, it should work, but it does not look the most suitable to me:
beyond splitting a "simple" task in separate entities, it poses a
constraint on the classification performed on the egress device.

Suppose the admin wants to limit the egress bandwidth on the given tx
queue _and_ do some application specific packet classification and
actions. That would not be possible right?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-21  9:04                     ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-11-21  9:04 UTC (permalink / raw)
  To: Jiri Pirko, Zhang, Xuejun
  Cc: Samudrala, Sridhar, netdev, maxtram95, qi.z.zhang, Chittim,
	Madhu, intel-wired-lan, Jakub Kicinski, anthony.l.nguyen,
	Wenjun Wu

On Fri, 2023-11-17 at 12:21 +0100, Jiri Pirko wrote:
> Fri, Nov 17, 2023 at 06:52:49AM CET, xuejun.zhang@intel.com wrote:
> > Hello Jiri & Jakub,
> > 
> > Thanks for looking into our last patch with devlink API. Really appreciate
> > your candid review.
> > 
> > Following your suggestion, we have looked into 3 tc offload options to
> > support queue rate limiting
> > 
> > #1 mq + matchall + police
> 
> This looks most suitable. Why it would not work?

AFAICS, it should work, but it does not look the most suitable to me:
beyond splitting a "simple" task in separate entities, it poses a
constraint on the classification performed on the egress device.

Suppose the admin wants to limit the egress bandwidth on the given tx
queue _and_ do some application specific packet classification and
actions. That would not be possible right?

Thanks!

Paolo

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-18 16:48                   ` Jakub Kicinski
@ 2023-11-22 22:19                     ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-22 22:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni


On 11/18/2023 8:48 AM, Jakub Kicinski wrote:
> On Thu, 16 Nov 2023 21:52:49 -0800 Zhang, Xuejun wrote:
>> Thanks for looking into our last patch with devlink API. Really
>> appreciate your candid review.
>>
>> Following your suggestion, we have looked into 3 tc offload options to
>> support queue rate limiting
>>
>> #1 mq + matchall + police
>>
>> #2 mq + tbf
> You can extend mqprio, too, if you wanted.
>
>> #3 htb
>>
>> all 3 tc offload options require some level of tc extensions to support
>> VF tx queue rate limiting (tx_maxrate & tx_minrate)
>>
>> htb offload requires minimal tc changes or no change with similar change
>> done @ driver (we can share patch for review).
>>
>> After discussing with Maxim Mikityanskiy(
>> https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/
>> ), looks like sysfs interface with tx_minrate extension could be the
>> option we can take.
>>
>> Look forward your opinion & guidance. Thanks for your time!
> My least favorite thing to do is to configure the same piece of silicon
> with 4 different SW interfaces. It's okay if we have 4 different uAPIs
> (user level APIs) but the driver should not be exposed to all these
> options.
>
> I'm saying 4 but really I can think of 6 ways of setting maxrate :(
>
> IMHO we need to be a bit more realistic about the notion of "offloading
> the SW thing" for qdiscs specifically. Normally we offload SW constructs
> to have a fallback and have a clear definition of functionality.
> I bet most data-centers will use BPF+FQ these days, so the "fallback"
> argument does not apply. And the "clear definition" when it comes to
> basic rate limiting is.. moot.
>
> Besides we already have mqprio, sysfs maxrate, sriov ndo, devlink rate,
> none of which have SW fallback.
>
> So since you asked for my opinion - my opinion is that step 1 is to
> create a common representation of what we already have and feed it
> to the drivers via a single interface. I could just be taking sysfs
> maxrate and feeding it to the driver via the devlink rate interface.
> If we have the right internals I give 0 cares about what uAPI you pick.

There is an existing ndo api for setting tx_maxrate API

int (*ndo_set_tx_maxrate)(struct net_device *dev,
                           int queue_index,
               u32 maxrate);

we could expand and modify the above API with tx_minrate and burst 
parameters as below
int (*ndo_set_tx_rate)(struct net_device *dev,
                  int queue_index,
                int maxrate , int minrate, int burst);

queue_index: tx queue index of net device
maxrate: tx maxrate in mbps
minrate: tx min rate in mbps
burst: data burst in bytes


The proposed API would incur net/core and driver changes as follows
a) existing drivers with ndo_set_tx_maxrate support upgraded to use new 
ndo_set_tx_rate
b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with 
minrate and burst set to 0, -1 means ignore)
c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it 
is currently
d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate & 
/sys/class/net/ethx/queues/tx_nn/burst

Let us know your thoughts.


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-22 22:19                     ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-22 22:19 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, pabeni,
	Wenjun Wu


On 11/18/2023 8:48 AM, Jakub Kicinski wrote:
> On Thu, 16 Nov 2023 21:52:49 -0800 Zhang, Xuejun wrote:
>> Thanks for looking into our last patch with devlink API. Really
>> appreciate your candid review.
>>
>> Following your suggestion, we have looked into 3 tc offload options to
>> support queue rate limiting
>>
>> #1 mq + matchall + police
>>
>> #2 mq + tbf
> You can extend mqprio, too, if you wanted.
>
>> #3 htb
>>
>> all 3 tc offload options require some level of tc extensions to support
>> VF tx queue rate limiting (tx_maxrate & tx_minrate)
>>
>> htb offload requires minimal tc changes or no change with similar change
>> done @ driver (we can share patch for review).
>>
>> After discussing with Maxim Mikityanskiy(
>> https://lore.kernel.org/netdev/54a7dd27-a612-46f1-80dd-b43e28f8e4ce@intel.com/
>> ), looks like sysfs interface with tx_minrate extension could be the
>> option we can take.
>>
>> Look forward your opinion & guidance. Thanks for your time!
> My least favorite thing to do is to configure the same piece of silicon
> with 4 different SW interfaces. It's okay if we have 4 different uAPIs
> (user level APIs) but the driver should not be exposed to all these
> options.
>
> I'm saying 4 but really I can think of 6 ways of setting maxrate :(
>
> IMHO we need to be a bit more realistic about the notion of "offloading
> the SW thing" for qdiscs specifically. Normally we offload SW constructs
> to have a fallback and have a clear definition of functionality.
> I bet most data-centers will use BPF+FQ these days, so the "fallback"
> argument does not apply. And the "clear definition" when it comes to
> basic rate limiting is.. moot.
>
> Besides we already have mqprio, sysfs maxrate, sriov ndo, devlink rate,
> none of which have SW fallback.
>
> So since you asked for my opinion - my opinion is that step 1 is to
> create a common representation of what we already have and feed it
> to the drivers via a single interface. I could just be taking sysfs
> maxrate and feeding it to the driver via the devlink rate interface.
> If we have the right internals I give 0 cares about what uAPI you pick.

There is an existing ndo api for setting tx_maxrate API

int (*ndo_set_tx_maxrate)(struct net_device *dev,
                           int queue_index,
               u32 maxrate);

we could expand and modify the above API with tx_minrate and burst 
parameters as below
int (*ndo_set_tx_rate)(struct net_device *dev,
                  int queue_index,
                int maxrate , int minrate, int burst);

queue_index: tx queue index of net device
maxrate: tx maxrate in mbps
minrate: tx min rate in mbps
burst: data burst in bytes


The proposed API would incur net/core and driver changes as follows
a) existing drivers with ndo_set_tx_maxrate support upgraded to use new 
ndo_set_tx_rate
b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with 
minrate and burst set to 0, -1 means ignore)
c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it 
is currently
d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate & 
/sys/class/net/ethx/queues/tx_nn/burst

Let us know your thoughts.

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-22 22:19                     ` Zhang, Xuejun
@ 2023-11-23  3:22                       ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-23  3:22 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni

On Wed, 22 Nov 2023 14:19:14 -0800 Zhang, Xuejun wrote:
> The proposed API would incur net/core and driver changes as follows
> a) existing drivers with ndo_set_tx_maxrate support upgraded to use new 
> ndo_set_tx_rate
> b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with 
> minrate and burst set to 0, -1 means ignore)
> c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it 
> is currently
> d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate & 
> /sys/class/net/ethx/queues/tx_nn/burst

You described extending the sysfs API (which the ndo you mention 
is for) and nothing about converging the other existing APIs.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-23  3:22                       ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-23  3:22 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, pabeni,
	Wenjun Wu

On Wed, 22 Nov 2023 14:19:14 -0800 Zhang, Xuejun wrote:
> The proposed API would incur net/core and driver changes as follows
> a) existing drivers with ndo_set_tx_maxrate support upgraded to use new 
> ndo_set_tx_rate
> b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with 
> minrate and burst set to 0, -1 means ignore)
> c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it 
> is currently
> d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate & 
> /sys/class/net/ethx/queues/tx_nn/burst

You described extending the sysfs API (which the ndo you mention 
is for) and nothing about converging the other existing APIs.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-23  3:22                       ` Jakub Kicinski
@ 2023-11-28  0:15                         ` Zhang, Xuejun
  -1 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-28  0:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni


On 11/22/2023 7:22 PM, Jakub Kicinski wrote:
> On Wed, 22 Nov 2023 14:19:14 -0800 Zhang, Xuejun wrote:
>> The proposed API would incur net/core and driver changes as follows
>> a) existing drivers with ndo_set_tx_maxrate support upgraded to use new
>> ndo_set_tx_rate
>> b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with
>> minrate and burst set to 0, -1 means ignore)
>> c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it
>> is currently
>> d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate &
>> /sys/class/net/ethx/queues/tx_nn/burst
> You described extending the sysfs API (which the ndo you mention
> is for) and nothing about converging the other existing APIs.
This is extension of ndo_set_tx_maxrate to include per queue parameters 
of tx_minrate and burst.

devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
port rate configurations.

With regarding to tc mqprio, it is being used to configure queue group 
per tc.

For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
configuration, not for queue based rate configuration.

It seems there are differences on intent of the aforementioned APIs.

Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
max rate & min rate per VF queue.Hence we are inclined to 
ndo_set_tx_maxrate extension.


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-28  0:15                         ` Zhang, Xuejun
  0 siblings, 0 replies; 115+ messages in thread
From: Zhang, Xuejun @ 2023-11-28  0:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, pabeni,
	Wenjun Wu


On 11/22/2023 7:22 PM, Jakub Kicinski wrote:
> On Wed, 22 Nov 2023 14:19:14 -0800 Zhang, Xuejun wrote:
>> The proposed API would incur net/core and driver changes as follows
>> a) existing drivers with ndo_set_tx_maxrate support upgraded to use new
>> ndo_set_tx_rate
>> b) net sysfs (replacing ndo_set_maxrate with ndo_set_tx_rate with
>> minrate and burst set to 0, -1 means ignore)
>> c) Keep the existing /sys/class/net/ethx/queues/tx_nn/tx_maxrate as it
>> is currently
>> d) Add sysfs entry as /sys/class/net/ethx/queues/tx_nn/tx_minrate &
>> /sys/class/net/ethx/queues/tx_nn/burst
> You described extending the sysfs API (which the ndo you mention
> is for) and nothing about converging the other existing APIs.
This is extension of ndo_set_tx_maxrate to include per queue parameters 
of tx_minrate and burst.

devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
port rate configurations.

With regarding to tc mqprio, it is being used to configure queue group 
per tc.

For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
configuration, not for queue based rate configuration.

It seems there are differences on intent of the aforementioned APIs.

Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
max rate & min rate per VF queue.Hence we are inclined to 
ndo_set_tx_maxrate extension.

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-28  0:15                         ` Zhang, Xuejun
@ 2023-11-28  1:43                           ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-28  1:43 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan, pabeni,
	Wenjun Wu

On Mon, 27 Nov 2023 16:15:47 -0800 Zhang, Xuejun wrote:
> This is extension of ndo_set_tx_maxrate to include per queue parameters 
> of tx_minrate and burst.
> 
> devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
> port rate configurations.
> 
> With regarding to tc mqprio, it is being used to configure queue group 
> per tc.
> 
> For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
> configuration, not for queue based rate configuration.
> 
> It seems there are differences on intent of the aforementioned APIs.
> 
> Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
> max rate & min rate per VF queue.Hence we are inclined to 
> ndo_set_tx_maxrate extension.

I said:

  So since you asked for my opinion - my opinion is that step 1 is to
  create a common representation of what we already have and feed it
  to the drivers via a single interface. I could just be taking sysfs
  maxrate and feeding it to the driver via the devlink rate interface.
  If we have the right internals I give 0 cares about what uAPI you pick.

https://lore.kernel.org/all/20231118084843.70c344d9@kernel.org/

Again, the first step is creating a common kernel <> driver interface
which can be used to send to the driver the configuration from the
existing 4 interfaces.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-11-28  1:43                           ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-11-28  1:43 UTC (permalink / raw)
  To: Zhang, Xuejun
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, pabeni

On Mon, 27 Nov 2023 16:15:47 -0800 Zhang, Xuejun wrote:
> This is extension of ndo_set_tx_maxrate to include per queue parameters 
> of tx_minrate and burst.
> 
> devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
> port rate configurations.
> 
> With regarding to tc mqprio, it is being used to configure queue group 
> per tc.
> 
> For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
> configuration, not for queue based rate configuration.
> 
> It seems there are differences on intent of the aforementioned APIs.
> 
> Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
> max rate & min rate per VF queue.Hence we are inclined to 
> ndo_set_tx_maxrate extension.

I said:

  So since you asked for my opinion - my opinion is that step 1 is to
  create a common representation of what we already have and feed it
  to the drivers via a single interface. I could just be taking sysfs
  maxrate and feeding it to the driver via the devlink rate interface.
  If we have the right internals I give 0 cares about what uAPI you pick.

https://lore.kernel.org/all/20231118084843.70c344d9@kernel.org/

Again, the first step is creating a common kernel <> driver interface
which can be used to send to the driver the configuration from the
existing 4 interfaces.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-11-28  1:43                           ` Jakub Kicinski
@ 2023-12-14 20:29                             ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-14 20:29 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Samudrala, Sridhar, netdev, Simon Horman, anthony.l.nguyen,
	Chittim, Madhu, maxtram95, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Mon, 2023-11-27 at 17:43 -0800, Jakub Kicinski wrote:
> On Mon, 27 Nov 2023 16:15:47 -0800 Zhang, Xuejun wrote:
> > This is extension of ndo_set_tx_maxrate to include per queue parameters 
> > of tx_minrate and burst.
> > 
> > devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
> > port rate configurations.
> > 
> > With regarding to tc mqprio, it is being used to configure queue group 
> > per tc.
> > 
> > For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
> > configuration, not for queue based rate configuration.
> > 
> > It seems there are differences on intent of the aforementioned APIs.
> > 
> > Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
> > max rate & min rate per VF queue.Hence we are inclined to 
> > ndo_set_tx_maxrate extension.
> 
> I said:
> 
>   So since you asked for my opinion - my opinion is that step 1 is to
>   create a common representation of what we already have and feed it
>   to the drivers via a single interface. I could just be taking sysfs
>   maxrate and feeding it to the driver via the devlink rate interface.
>   If we have the right internals I give 0 cares about what uAPI you pick.
> 
> https://lore.kernel.org/all/20231118084843.70c344d9@kernel.org/
> 
> Again, the first step is creating a common kernel <> driver interface
> which can be used to send to the driver the configuration from the
> existing 4 interfaces.

Together with Simon, I spent some time on the above. We think the
ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
this offloads, with some small extensions (adding a 'max_rate' param,
too).

The idea would be:
- 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
is enabled
- extend sch_btf to support max rate
- do the relevant ice implementation
- ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
latter interface is a strict super-set of former)
- ndo_set_vf_rate could also be replaced with the mentioned ndo call
(with another small extension to the offload data)

I think mqprio deserves it's own separate offload interface, as it
covers multiple tasks other than shaping (grouping queues and mapping
priority to classes)

In the long run we could have a generic implementation of the
ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
generic way to fetch the devlink_port instance corresponding to the
given netdev and mapping the TBF features to the devlink_rate API.

Not starting this due to what Jiri mentioned [1].

WDYT?

Thanks,

Paolo and Simon

[1] https://lore.kernel.org/netdev/ZORRzEBcUDEjMniz@nanopsycho/

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-14 20:29                             ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-14 20:29 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang, Wenjun Wu,
	maxtram95, Chittim, Madhu, Samudrala, Sridhar, Simon Horman

On Mon, 2023-11-27 at 17:43 -0800, Jakub Kicinski wrote:
> On Mon, 27 Nov 2023 16:15:47 -0800 Zhang, Xuejun wrote:
> > This is extension of ndo_set_tx_maxrate to include per queue parameters 
> > of tx_minrate and burst.
> > 
> > devlink rate api includes tx_maxrate and tx_minrate, it is intended for 
> > port rate configurations.
> > 
> > With regarding to tc mqprio, it is being used to configure queue group 
> > per tc.
> > 
> > For sriov ndo ndo_set_vf_rate, that has been used for overall VF rate 
> > configuration, not for queue based rate configuration.
> > 
> > It seems there are differences on intent of the aforementioned APIs.
> > 
> > Our use case here is to allow user (i.e @ uAPI) to configure tx rates of 
> > max rate & min rate per VF queue.Hence we are inclined to 
> > ndo_set_tx_maxrate extension.
> 
> I said:
> 
>   So since you asked for my opinion - my opinion is that step 1 is to
>   create a common representation of what we already have and feed it
>   to the drivers via a single interface. I could just be taking sysfs
>   maxrate and feeding it to the driver via the devlink rate interface.
>   If we have the right internals I give 0 cares about what uAPI you pick.
> 
> https://lore.kernel.org/all/20231118084843.70c344d9@kernel.org/
> 
> Again, the first step is creating a common kernel <> driver interface
> which can be used to send to the driver the configuration from the
> existing 4 interfaces.

Together with Simon, I spent some time on the above. We think the
ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
this offloads, with some small extensions (adding a 'max_rate' param,
too).

The idea would be:
- 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
is enabled
- extend sch_btf to support max rate
- do the relevant ice implementation
- ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
latter interface is a strict super-set of former)
- ndo_set_vf_rate could also be replaced with the mentioned ndo call
(with another small extension to the offload data)

I think mqprio deserves it's own separate offload interface, as it
covers multiple tasks other than shaping (grouping queues and mapping
priority to classes)

In the long run we could have a generic implementation of the
ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
generic way to fetch the devlink_port instance corresponding to the
given netdev and mapping the TBF features to the devlink_rate API.

Not starting this due to what Jiri mentioned [1].

WDYT?

Thanks,

Paolo and Simon

[1] https://lore.kernel.org/netdev/ZORRzEBcUDEjMniz@nanopsycho/


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-14 20:29                             ` Paolo Abeni
@ 2023-12-15  1:46                               ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-15  1:46 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
> Together with Simon, I spent some time on the above. We think the
> ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
> this offloads, with some small extensions (adding a 'max_rate' param,
> too).

uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
to implement common basis?

Is it not cleaner to have a separate driver API, with its ops
and capabilities?

> The idea would be:
> - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
> is enabled
> - extend sch_btf to support max rate
> - do the relevant ice implementation
> - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
> latter interface is a strict super-set of former)
> - ndo_set_vf_rate could also be replaced with the mentioned ndo call
> (with another small extension to the offload data)
> 
> I think mqprio deserves it's own separate offload interface, as it
> covers multiple tasks other than shaping (grouping queues and mapping
> priority to classes)
> 
> In the long run we could have a generic implementation of the
> ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
> generic way to fetch the devlink_port instance corresponding to the
> given netdev and mapping the TBF features to the devlink_rate API.
> 
> Not starting this due to what Jiri mentioned [1].

Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
rate limiting. That's separate from the internal representation.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15  1:46                               ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-15  1:46 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Simon Horman,
	Chittim, Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
> Together with Simon, I spent some time on the above. We think the
> ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
> this offloads, with some small extensions (adding a 'max_rate' param,
> too).

uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
to implement common basis?

Is it not cleaner to have a separate driver API, with its ops
and capabilities?

> The idea would be:
> - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
> is enabled
> - extend sch_btf to support max rate
> - do the relevant ice implementation
> - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
> latter interface is a strict super-set of former)
> - ndo_set_vf_rate could also be replaced with the mentioned ndo call
> (with another small extension to the offload data)
> 
> I think mqprio deserves it's own separate offload interface, as it
> covers multiple tasks other than shaping (grouping queues and mapping
> priority to classes)
> 
> In the long run we could have a generic implementation of the
> ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
> generic way to fetch the devlink_port instance corresponding to the
> given netdev and mapping the TBF features to the devlink_rate API.
> 
> Not starting this due to what Jiri mentioned [1].

Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
rate limiting. That's separate from the internal representation.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15  1:46                               ` Jakub Kicinski
@ 2023-12-15 11:06                                 ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-15 11:06 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang, Wenjun Wu,
	maxtram95, Chittim, Madhu, Samudrala, Sridhar, Simon Horman

On Thu, 2023-12-14 at 17:46 -0800, Jakub Kicinski wrote:
> On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
> > Together with Simon, I spent some time on the above. We think the
> > ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
> > this offloads, with some small extensions (adding a 'max_rate' param,
> > too).
> 
> uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
> to implement common basis?
> 
> Is it not cleaner to have a separate driver API, with its ops
> and capabilities?

We understand one of the end goal is consolidating the existing rate-
related in kernel interfaces.  Adding a new one does not feel a good
starting to reach that goal, see [1] & [2] ;). ndo_setup_tc() feels
like the natural choice for H/W offload and TBF is the existing
interface IMHO nearest to the requirements here.

The devlink rate API could be a possible alternative...

> > The idea would be:
> > - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
> > is enabled
> > - extend sch_btf to support max rate
> > - do the relevant ice implementation
> > - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
> > latter interface is a strict super-set of former)
> > - ndo_set_vf_rate could also be replaced with the mentioned ndo call
> > (with another small extension to the offload data)
> > 
> > I think mqprio deserves it's own separate offload interface, as it
> > covers multiple tasks other than shaping (grouping queues and mapping
> > priority to classes)
> > 
> > In the long run we could have a generic implementation of the
> > ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
> > generic way to fetch the devlink_port instance corresponding to the
> > given netdev and mapping the TBF features to the devlink_rate API.
> > 
> > Not starting this due to what Jiri mentioned [1].
> 
> Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
> rate limiting. That's separate from the internal representation.

... with a couples of caveats:

1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
queue representation and intel want to be able to configure shaping on
per queue basis. I think/hope we don't want to bring the discussion to
extending the devlink interface with queue support, I fear that will
block us for a long time. Perhaps I’m missing or misunderstanding
something here. Otherwise in retrospect this looks like a reasonable
point to completely avoid devlink here.

2) My understanding of Jiri statement was more restrictive. @Jiri it
would great if could share your genuine interpretation: are you ok with
using the devlink_port rate API as a basis to replace
ndo_set_tx_maxrate() (via dev->devlink_port->devlink->) and possibly
ndo_set_vf_rate(). Note the given the previous point, this option would
still feel problematic.

Cheers,

Paolo

[1] https://xkcd.com/927/
[2] https://www.youtube.com/watch?v=f8kO_L-pDwo


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15 11:06                                 ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-15 11:06 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Samudrala, Sridhar, netdev, Simon Horman, anthony.l.nguyen,
	Chittim, Madhu, maxtram95, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Thu, 2023-12-14 at 17:46 -0800, Jakub Kicinski wrote:
> On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
> > Together with Simon, I spent some time on the above. We think the
> > ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
> > this offloads, with some small extensions (adding a 'max_rate' param,
> > too).
> 
> uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
> to implement common basis?
> 
> Is it not cleaner to have a separate driver API, with its ops
> and capabilities?

We understand one of the end goal is consolidating the existing rate-
related in kernel interfaces.  Adding a new one does not feel a good
starting to reach that goal, see [1] & [2] ;). ndo_setup_tc() feels
like the natural choice for H/W offload and TBF is the existing
interface IMHO nearest to the requirements here.

The devlink rate API could be a possible alternative...

> > The idea would be:
> > - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
> > is enabled
> > - extend sch_btf to support max rate
> > - do the relevant ice implementation
> > - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
> > latter interface is a strict super-set of former)
> > - ndo_set_vf_rate could also be replaced with the mentioned ndo call
> > (with another small extension to the offload data)
> > 
> > I think mqprio deserves it's own separate offload interface, as it
> > covers multiple tasks other than shaping (grouping queues and mapping
> > priority to classes)
> > 
> > In the long run we could have a generic implementation of the
> > ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
> > generic way to fetch the devlink_port instance corresponding to the
> > given netdev and mapping the TBF features to the devlink_rate API.
> > 
> > Not starting this due to what Jiri mentioned [1].
> 
> Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
> rate limiting. That's separate from the internal representation.

... with a couples of caveats:

1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
queue representation and intel want to be able to configure shaping on
per queue basis. I think/hope we don't want to bring the discussion to
extending the devlink interface with queue support, I fear that will
block us for a long time. Perhaps I’m missing or misunderstanding
something here. Otherwise in retrospect this looks like a reasonable
point to completely avoid devlink here.

2) My understanding of Jiri statement was more restrictive. @Jiri it
would great if could share your genuine interpretation: are you ok with
using the devlink_port rate API as a basis to replace
ndo_set_tx_maxrate() (via dev->devlink_port->devlink->) and possibly
ndo_set_vf_rate(). Note the given the previous point, this option would
still feel problematic.

Cheers,

Paolo

[1] https://xkcd.com/927/
[2] https://www.youtube.com/watch?v=f8kO_L-pDwo

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15 11:06                                 ` Paolo Abeni
@ 2023-12-15 11:47                                   ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-15 11:47 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: Samudrala, Sridhar, netdev, Simon Horman, anthony.l.nguyen,
	Chittim, Madhu, maxtram95, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Fri, 2023-12-15 at 12:06 +0100, Paolo Abeni wrote:
> 1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
> queue representation and intel want to be able to configure shaping on
> per queue basis. I think/hope we don't want to bring the discussion to
> extending the devlink interface with queue support, I fear that will
> block us for a long time. Perhaps I’m missing or misunderstanding
> something here. Otherwise in retrospect this looks like a reasonable
> point to completely avoid devlink here.

Note to self: never send a message to the ML before my 3rd morning
coffee.

This thread started with Intel trying to using devlink rate for their
use-case, apparently slamming my doubt above.

My understanding is that in the patches the queue devlink <> queue
relationship was kept inside the driver and not exposed to the devlink
level.

If we want to use the devlink rate api to replace e.g.
ndo_set_tx_maxrate, we would need a devlink queue(id) or the like,
hence this point.

Cheer,

Paolo

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15 11:47                                   ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-15 11:47 UTC (permalink / raw)
  To: Jakub Kicinski, Jiri Pirko
  Cc: netdev, anthony.l.nguyen, intel-wired-lan, qi.z.zhang, Wenjun Wu,
	maxtram95, Chittim, Madhu, Samudrala, Sridhar, Simon Horman

On Fri, 2023-12-15 at 12:06 +0100, Paolo Abeni wrote:
> 1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
> queue representation and intel want to be able to configure shaping on
> per queue basis. I think/hope we don't want to bring the discussion to
> extending the devlink interface with queue support, I fear that will
> block us for a long time. Perhaps I’m missing or misunderstanding
> something here. Otherwise in retrospect this looks like a reasonable
> point to completely avoid devlink here.

Note to self: never send a message to the ML before my 3rd morning
coffee.

This thread started with Intel trying to using devlink rate for their
use-case, apparently slamming my doubt above.

My understanding is that in the patches the queue devlink <> queue
relationship was kept inside the driver and not exposed to the devlink
level.

If we want to use the devlink rate api to replace e.g.
ndo_set_tx_maxrate, we would need a devlink queue(id) or the like,
hence this point.

Cheer,

Paolo


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15  1:46                               ` Jakub Kicinski
@ 2023-12-15 12:22                                 ` Jiri Pirko
  -1 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-12-15 12:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Samudrala, Sridhar, netdev, maxtram95, Simon Horman, Chittim,
	Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan,
	Paolo Abeni, Wenjun Wu

Fri, Dec 15, 2023 at 02:46:04AM CET, kuba@kernel.org wrote:
>On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
>> Together with Simon, I spent some time on the above. We think the
>> ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
>> this offloads, with some small extensions (adding a 'max_rate' param,
>> too).
>
>uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
>to implement common basis?
>
>Is it not cleaner to have a separate driver API, with its ops
>and capabilities?
>
>> The idea would be:
>> - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
>> is enabled
>> - extend sch_btf to support max rate
>> - do the relevant ice implementation
>> - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
>> latter interface is a strict super-set of former)
>> - ndo_set_vf_rate could also be replaced with the mentioned ndo call
>> (with another small extension to the offload data)
>> 
>> I think mqprio deserves it's own separate offload interface, as it
>> covers multiple tasks other than shaping (grouping queues and mapping
>> priority to classes)
>> 
>> In the long run we could have a generic implementation of the
>> ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
>> generic way to fetch the devlink_port instance corresponding to the
>> given netdev and mapping the TBF features to the devlink_rate API.
>> 
>> Not starting this due to what Jiri mentioned [1].
>
>Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
>rate limiting. That's separate from the internal representation.

Devlink rate was introduced for configuring port functions that are
connected to eswitch port. I don't see any reason to extend it for
configuration of netdev on the host. We have netdev instance and other
means to do it.

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15 12:22                                 ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-12-15 12:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

Fri, Dec 15, 2023 at 02:46:04AM CET, kuba@kernel.org wrote:
>On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
>> Together with Simon, I spent some time on the above. We think the
>> ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
>> this offloads, with some small extensions (adding a 'max_rate' param,
>> too).
>
>uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
>to implement common basis?
>
>Is it not cleaner to have a separate driver API, with its ops
>and capabilities?
>
>> The idea would be:
>> - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
>> is enabled
>> - extend sch_btf to support max rate
>> - do the relevant ice implementation
>> - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
>> latter interface is a strict super-set of former)
>> - ndo_set_vf_rate could also be replaced with the mentioned ndo call
>> (with another small extension to the offload data)
>> 
>> I think mqprio deserves it's own separate offload interface, as it
>> covers multiple tasks other than shaping (grouping queues and mapping
>> priority to classes)
>> 
>> In the long run we could have a generic implementation of the
>> ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
>> generic way to fetch the devlink_port instance corresponding to the
>> given netdev and mapping the TBF features to the devlink_rate API.
>> 
>> Not starting this due to what Jiri mentioned [1].
>
>Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
>rate limiting. That's separate from the internal representation.

Devlink rate was introduced for configuring port functions that are
connected to eswitch port. I don't see any reason to extend it for
configuration of netdev on the host. We have netdev instance and other
means to do it.


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15 11:06                                 ` Paolo Abeni
@ 2023-12-15 12:30                                   ` Jiri Pirko
  -1 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-12-15 12:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Samudrala, Sridhar, netdev, maxtram95, Simon Horman,
	anthony.l.nguyen, Chittim, Madhu, intel-wired-lan, qi.z.zhang,
	Jakub Kicinski, Wenjun Wu

Fri, Dec 15, 2023 at 12:06:52PM CET, pabeni@redhat.com wrote:
>On Thu, 2023-12-14 at 17:46 -0800, Jakub Kicinski wrote:
>> On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
>> > Together with Simon, I spent some time on the above. We think the
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
>> > this offloads, with some small extensions (adding a 'max_rate' param,
>> > too).
>> 
>> uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
>> to implement common basis?
>> 
>> Is it not cleaner to have a separate driver API, with its ops
>> and capabilities?
>
>We understand one of the end goal is consolidating the existing rate-
>related in kernel interfaces.  Adding a new one does not feel a good
>starting to reach that goal, see [1] & [2] ;). ndo_setup_tc() feels
>like the natural choice for H/W offload and TBF is the existing
>interface IMHO nearest to the requirements here.
>
>The devlink rate API could be a possible alternative...

Again, devlink rate was introduced for the rate configuration of the
entity that is not present (by netdev for example) on a host.
If we have netdev, let's use it.


>
>> > The idea would be:
>> > - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
>> > is enabled
>> > - extend sch_btf to support max rate
>> > - do the relevant ice implementation
>> > - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
>> > latter interface is a strict super-set of former)
>> > - ndo_set_vf_rate could also be replaced with the mentioned ndo call
>> > (with another small extension to the offload data)
>> > 
>> > I think mqprio deserves it's own separate offload interface, as it
>> > covers multiple tasks other than shaping (grouping queues and mapping
>> > priority to classes)
>> > 
>> > In the long run we could have a generic implementation of the
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
>> > generic way to fetch the devlink_port instance corresponding to the
>> > given netdev and mapping the TBF features to the devlink_rate API.
>> > 
>> > Not starting this due to what Jiri mentioned [1].
>> 
>> Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
>> rate limiting. That's separate from the internal representation.
>
>... with a couples of caveats:
>
>1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
>queue representation and intel want to be able to configure shaping on
>per queue basis. I think/hope we don't want to bring the discussion to
>extending the devlink interface with queue support, I fear that will
>block us for a long time. Perhaps I’m missing or misunderstanding
>something here. Otherwise in retrospect this looks like a reasonable
>point to completely avoid devlink here.
>
>2) My understanding of Jiri statement was more restrictive. @Jiri it
>would great if could share your genuine interpretation: are you ok with
>using the devlink_port rate API as a basis to replace
>ndo_set_tx_maxrate() (via dev->devlink_port->devlink->) and possibly

Does not make any sense to me.


>ndo_set_vf_rate(). Note the given the previous point, this option would

ndo_set_vf_rate() (and the rest of ndo_[gs]et_vf_*() ndo) is the
legacy way. Devlink rate replaced that when switchdev eswich mode is
configured by:
$ sudo devlink dev eswitch set pci/0000:08:00.1 mode switchdev

In drivers, ndo_set_vf_rate() and devlink rate are implemented in the
same way. See mlx5 for example:
mlx5_esw_qos_set_vport_rate()
mlx5_esw_devlink_rate_leaf_tx_share_set()



>still feel problematic.
>
>Cheers,
>
>Paolo
>
>[1] https://xkcd.com/927/
>[2] https://www.youtube.com/watch?v=f8kO_L-pDwo
>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15 12:30                                   ` Jiri Pirko
  0 siblings, 0 replies; 115+ messages in thread
From: Jiri Pirko @ 2023-12-15 12:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

Fri, Dec 15, 2023 at 12:06:52PM CET, pabeni@redhat.com wrote:
>On Thu, 2023-12-14 at 17:46 -0800, Jakub Kicinski wrote:
>> On Thu, 14 Dec 2023 21:29:51 +0100 Paolo Abeni wrote:
>> > Together with Simon, I spent some time on the above. We think the
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) hook could be used as common basis for
>> > this offloads, with some small extensions (adding a 'max_rate' param,
>> > too).
>> 
>> uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
>> to implement common basis?
>> 
>> Is it not cleaner to have a separate driver API, with its ops
>> and capabilities?
>
>We understand one of the end goal is consolidating the existing rate-
>related in kernel interfaces.  Adding a new one does not feel a good
>starting to reach that goal, see [1] & [2] ;). ndo_setup_tc() feels
>like the natural choice for H/W offload and TBF is the existing
>interface IMHO nearest to the requirements here.
>
>The devlink rate API could be a possible alternative...

Again, devlink rate was introduced for the rate configuration of the
entity that is not present (by netdev for example) on a host.
If we have netdev, let's use it.


>
>> > The idea would be:
>> > - 'fixing' sch_btf so that the s/w path became a no-op when h/w offload
>> > is enabled
>> > - extend sch_btf to support max rate
>> > - do the relevant ice implementation
>> > - ndo_set_tx_maxrate could be replaced with the mentioned ndo call (the
>> > latter interface is a strict super-set of former)
>> > - ndo_set_vf_rate could also be replaced with the mentioned ndo call
>> > (with another small extension to the offload data)
>> > 
>> > I think mqprio deserves it's own separate offload interface, as it
>> > covers multiple tasks other than shaping (grouping queues and mapping
>> > priority to classes)
>> > 
>> > In the long run we could have a generic implementation of the
>> > ndo_setup_tc(TC_SETUP_QDISC_TBF) in term of devlink rate adding a
>> > generic way to fetch the devlink_port instance corresponding to the
>> > given netdev and mapping the TBF features to the devlink_rate API.
>> > 
>> > Not starting this due to what Jiri mentioned [1].
>> 
>> Jiri, AFAIU, is against using devlink rate *uAPI* to configure network
>> rate limiting. That's separate from the internal representation.
>
>... with a couples of caveats:
>
>1) AFAICS devlink (and/or devlink_port) does not have fine grained, per
>queue representation and intel want to be able to configure shaping on
>per queue basis. I think/hope we don't want to bring the discussion to
>extending the devlink interface with queue support, I fear that will
>block us for a long time. Perhaps I’m missing or misunderstanding
>something here. Otherwise in retrospect this looks like a reasonable
>point to completely avoid devlink here.
>
>2) My understanding of Jiri statement was more restrictive. @Jiri it
>would great if could share your genuine interpretation: are you ok with
>using the devlink_port rate API as a basis to replace
>ndo_set_tx_maxrate() (via dev->devlink_port->devlink->) and possibly

Does not make any sense to me.


>ndo_set_vf_rate(). Note the given the previous point, this option would

ndo_set_vf_rate() (and the rest of ndo_[gs]et_vf_*() ndo) is the
legacy way. Devlink rate replaced that when switchdev eswich mode is
configured by:
$ sudo devlink dev eswitch set pci/0000:08:00.1 mode switchdev

In drivers, ndo_set_vf_rate() and devlink rate are implemented in the
same way. See mlx5 for example:
mlx5_esw_qos_set_vport_rate()
mlx5_esw_devlink_rate_leaf_tx_share_set()



>still feel problematic.
>
>Cheers,
>
>Paolo
>
>[1] https://xkcd.com/927/
>[2] https://www.youtube.com/watch?v=f8kO_L-pDwo
>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15 11:06                                 ` Paolo Abeni
@ 2023-12-15 22:41                                   ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-15 22:41 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

On Fri, 15 Dec 2023 12:06:52 +0100 Paolo Abeni wrote:
> > uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
> > to implement common basis?
> > 
> > Is it not cleaner to have a separate driver API, with its ops
> > and capabilities?  
> 
> We understand one of the end goal is consolidating the existing rate-
> related in kernel interfaces.  Adding a new one does not feel a good
> starting to reach that goal, see [1] & [2] ;)

ndo_setup_tc(TC_SETUP_QDISC_TBF) is a new API, too, very much so.
These attempts to build on top of existing interfaces with small
tweaks is leading us to a fragmented and incompatible driver landscape.

I explained before (perhaps on the netdev call) - Qdiscs have two
different offload models. "local" and "switchdev", here we want "local"
AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
method and which drivers support it today).

"We'll extend TBF" is very much adding a new API. You'll have to add
"local offload" support in TBF and no NIC driver today supports it.
I'm not saying TBF is bad, but I disagree that it's any different
than a new NDO for all practical purposes.

> ndo_setup_tc() feels like the natural choice for H/W offload and TBF
> is the existing interface IMHO nearest to the requirements here.

I question whether something as basic as scheduling and ACLs should
follow the "offload SW constructs" mantra. You are exposed to more
diverse users so please don't hesitate to disagree, but AFAICT
the transparent offload (user installs SW constructs and if offload
is available - offload, otherwise use SW is good enough) has not
played out like we have hoped.

Let's figure out what is the abstract model of scheduling / shaping
within a NIC that we want to target. And then come up with a way of
representing it in SW. Not which uAPI we can shoehorn into the use
case.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-15 22:41                                   ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-15 22:41 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Simon Horman,
	Chittim, Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Fri, 15 Dec 2023 12:06:52 +0100 Paolo Abeni wrote:
> > uAPI aside, why would we use ndo_setup_tc(TC_SETUP_QDISC_TBF)
> > to implement common basis?
> > 
> > Is it not cleaner to have a separate driver API, with its ops
> > and capabilities?  
> 
> We understand one of the end goal is consolidating the existing rate-
> related in kernel interfaces.  Adding a new one does not feel a good
> starting to reach that goal, see [1] & [2] ;)

ndo_setup_tc(TC_SETUP_QDISC_TBF) is a new API, too, very much so.
These attempts to build on top of existing interfaces with small
tweaks is leading us to a fragmented and incompatible driver landscape.

I explained before (perhaps on the netdev call) - Qdiscs have two
different offload models. "local" and "switchdev", here we want "local"
AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
method and which drivers support it today).

"We'll extend TBF" is very much adding a new API. You'll have to add
"local offload" support in TBF and no NIC driver today supports it.
I'm not saying TBF is bad, but I disagree that it's any different
than a new NDO for all practical purposes.

> ndo_setup_tc() feels like the natural choice for H/W offload and TBF
> is the existing interface IMHO nearest to the requirements here.

I question whether something as basic as scheduling and ACLs should
follow the "offload SW constructs" mantra. You are exposed to more
diverse users so please don't hesitate to disagree, but AFAICT
the transparent offload (user installs SW constructs and if offload
is available - offload, otherwise use SW is good enough) has not
played out like we have hoped.

Let's figure out what is the abstract model of scheduling / shaping
within a NIC that we want to target. And then come up with a way of
representing it in SW. Not which uAPI we can shoehorn into the use
case.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-15 22:41                                   ` Jakub Kicinski
@ 2023-12-18 20:12                                     ` Paolo Abeni
  -1 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-18 20:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

On Fri, 2023-12-15 at 14:41 -0800, Jakub Kicinski wrote:
> I explained before (perhaps on the netdev call) - Qdiscs have two
> different offload models. "local" and "switchdev", here we want "local"
> AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
> method and which drivers support it today).

I must admit the above is not yet clear to me.

I initially thought you meant that "local" offloads properly
reconfigure the S/W datapath so that locally generated traffic would go
through the expected processing (e.g. shaping) just once, while with
"switchdev" offload locally generated traffic will see shaping done
both by the S/W and the H/W[1].

Reading the above I now think you mean that local offloads has only
effect for locally generated traffic but not on traffic forwarded via
eswitch, and vice versa[2]. 

The drivers I looked at did not show any clue (to me).

FTR, I think that [1] is a bug worth fixing and [2] is evil ;)

Could you please clarify which is the difference exactly between them?

> "We'll extend TBF" is very much adding a new API. You'll have to add
> "local offload" support in TBF and no NIC driver today supports it.
> I'm not saying TBF is bad, but I disagree that it's any different
> than a new NDO for all practical purposes.
> 
> > ndo_setup_tc() feels like the natural choice for H/W offload and TBF
> > is the existing interface IMHO nearest to the requirements here.
> 
> I question whether something as basic as scheduling and ACLs should
> follow the "offload SW constructs" mantra. You are exposed to more
> diverse users so please don't hesitate to disagree, but AFAICT
> the transparent offload (user installs SW constructs and if offload
> is available - offload, otherwise use SW is good enough) has not
> played out like we have hoped.
> 
> Let's figure out what is the abstract model of scheduling / shaping
> within a NIC that we want to target. And then come up with a way of
> representing it in SW. Not which uAPI we can shoehorn into the use
> case.

I thought the model was quite well defined since the initial submission
from Intel, and is quite simple: expose TX shaping on per tx queue
basis, with min rate, max rate (in bps) and burst (in bytes).

I think that making it more complex (e.g. with nesting, pkt overhead,
etc) we will still not cover every possible use case and will add
considerable complexity.
> 
Cheers,

Paolo


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-18 20:12                                     ` Paolo Abeni
  0 siblings, 0 replies; 115+ messages in thread
From: Paolo Abeni @ 2023-12-18 20:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Simon Horman,
	Chittim, Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Fri, 2023-12-15 at 14:41 -0800, Jakub Kicinski wrote:
> I explained before (perhaps on the netdev call) - Qdiscs have two
> different offload models. "local" and "switchdev", here we want "local"
> AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
> method and which drivers support it today).

I must admit the above is not yet clear to me.

I initially thought you meant that "local" offloads properly
reconfigure the S/W datapath so that locally generated traffic would go
through the expected processing (e.g. shaping) just once, while with
"switchdev" offload locally generated traffic will see shaping done
both by the S/W and the H/W[1].

Reading the above I now think you mean that local offloads has only
effect for locally generated traffic but not on traffic forwarded via
eswitch, and vice versa[2]. 

The drivers I looked at did not show any clue (to me).

FTR, I think that [1] is a bug worth fixing and [2] is evil ;)

Could you please clarify which is the difference exactly between them?

> "We'll extend TBF" is very much adding a new API. You'll have to add
> "local offload" support in TBF and no NIC driver today supports it.
> I'm not saying TBF is bad, but I disagree that it's any different
> than a new NDO for all practical purposes.
> 
> > ndo_setup_tc() feels like the natural choice for H/W offload and TBF
> > is the existing interface IMHO nearest to the requirements here.
> 
> I question whether something as basic as scheduling and ACLs should
> follow the "offload SW constructs" mantra. You are exposed to more
> diverse users so please don't hesitate to disagree, but AFAICT
> the transparent offload (user installs SW constructs and if offload
> is available - offload, otherwise use SW is good enough) has not
> played out like we have hoped.
> 
> Let's figure out what is the abstract model of scheduling / shaping
> within a NIC that we want to target. And then come up with a way of
> representing it in SW. Not which uAPI we can shoehorn into the use
> case.

I thought the model was quite well defined since the initial submission
from Intel, and is quite simple: expose TX shaping on per tx queue
basis, with min rate, max rate (in bps) and burst (in bytes).

I think that making it more complex (e.g. with nesting, pkt overhead,
etc) we will still not cover every possible use case and will add
considerable complexity.
> 
Cheers,

Paolo

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
  2023-12-18 20:12                                     ` Paolo Abeni
@ 2023-12-18 21:33                                       ` Jakub Kicinski
  -1 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-18 21:33 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, netdev, anthony.l.nguyen, intel-wired-lan,
	qi.z.zhang, Wenjun Wu, maxtram95, Chittim, Madhu, Samudrala,
	Sridhar, Simon Horman

On Mon, 18 Dec 2023 21:12:35 +0100 Paolo Abeni wrote:
> On Fri, 2023-12-15 at 14:41 -0800, Jakub Kicinski wrote:
> > I explained before (perhaps on the netdev call) - Qdiscs have two
> > different offload models. "local" and "switchdev", here we want "local"
> > AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
> > method and which drivers support it today).  
> 
> I must admit the above is not yet clear to me.
> 
> I initially thought you meant that "local" offloads properly
> reconfigure the S/W datapath so that locally generated traffic would go
> through the expected processing (e.g. shaping) just once, while with
> "switchdev" offload locally generated traffic will see shaping done
> both by the S/W and the H/W[1].
> 
> Reading the above I now think you mean that local offloads has only
> effect for locally generated traffic but not on traffic forwarded via
> eswitch, and vice versa[2]. 
> 
> The drivers I looked at did not show any clue (to me).
> 
> FTR, I think that [1] is a bug worth fixing and [2] is evil ;)
> 
> Could you please clarify which is the difference exactly between them?

The practical difference which you can see in the code is that
"locally offloaded" qdiscs will act like a FIFO in the SW path (at least
to some extent). While "switchdev" offload qdiscs act exactly the same
regardless of the offload.

Neither is wrong, they are offloading different things. Qdisc offload
on a representor (switchdev) offloads from the switch perspective, i.e.
"ingress to host". Only fallback goes thru SW path, and should be
negligible.

"Local" offload can be implemented as admission control (and is
sometimes work conserving), it's on the "real" interface, it's egress,
and doesn't take part in forwarding.

> > I question whether something as basic as scheduling and ACLs should
> > follow the "offload SW constructs" mantra. You are exposed to more
> > diverse users so please don't hesitate to disagree, but AFAICT
> > the transparent offload (user installs SW constructs and if offload
> > is available - offload, otherwise use SW is good enough) has not
> > played out like we have hoped.
> > 
> > Let's figure out what is the abstract model of scheduling / shaping
> > within a NIC that we want to target. And then come up with a way of
> > representing it in SW. Not which uAPI we can shoehorn into the use
> > case.  
> 
> I thought the model was quite well defined since the initial submission
> from Intel, and is quite simple: expose TX shaping on per tx queue
> basis, with min rate, max rate (in bps) and burst (in bytes).

For some definition of a model, I guess. Given the confusion about
switchdev vs local (ingress vs egress) - I can't agree that the model
is well defined :(

What I mean is - given piece of functionality like "Tx queue shaping"
you can come up with a reasonable uAPI that you can hijack and it makes
sense to you. But someone else (switchdev ingress) can chose the same
API to implement a different offload. Not to mention that yet another
person will chose a different API to implement the same things as you :(

Off the top of my head we have at least:

 - Tx DMA admission control / scheduling (which Tx DMA queue will NIC 
   pull from)
 - Rx DMA scheduling (which Rx queue will NIC push to)

 - buffer/queue configuration (how to deal with buildup of packets in
   NIC SRAM, usually mostly for ingress)
 - NIC buffer configuration (how the SRAM is allocated to queues)

 - policers in the NIC forwarding logic


Let's extend this list so that it covers all reasonable NIC designs,
and them work on mapping how each of them is configured?

> I think that making it more complex (e.g. with nesting, pkt overhead,
> etc) we will still not cover every possible use case and will add
> considerable complexity.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support'
@ 2023-12-18 21:33                                       ` Jakub Kicinski
  0 siblings, 0 replies; 115+ messages in thread
From: Jakub Kicinski @ 2023-12-18 21:33 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jiri Pirko, Samudrala, Sridhar, netdev, maxtram95, Simon Horman,
	Chittim, Madhu, anthony.l.nguyen, qi.z.zhang, intel-wired-lan,
	Wenjun Wu

On Mon, 18 Dec 2023 21:12:35 +0100 Paolo Abeni wrote:
> On Fri, 2023-12-15 at 14:41 -0800, Jakub Kicinski wrote:
> > I explained before (perhaps on the netdev call) - Qdiscs have two
> > different offload models. "local" and "switchdev", here we want "local"
> > AFAIU and TBF only has "switchdev" offload (take a look at the enqueue
> > method and which drivers support it today).  
> 
> I must admit the above is not yet clear to me.
> 
> I initially thought you meant that "local" offloads properly
> reconfigure the S/W datapath so that locally generated traffic would go
> through the expected processing (e.g. shaping) just once, while with
> "switchdev" offload locally generated traffic will see shaping done
> both by the S/W and the H/W[1].
> 
> Reading the above I now think you mean that local offloads has only
> effect for locally generated traffic but not on traffic forwarded via
> eswitch, and vice versa[2]. 
> 
> The drivers I looked at did not show any clue (to me).
> 
> FTR, I think that [1] is a bug worth fixing and [2] is evil ;)
> 
> Could you please clarify which is the difference exactly between them?

The practical difference which you can see in the code is that
"locally offloaded" qdiscs will act like a FIFO in the SW path (at least
to some extent). While "switchdev" offload qdiscs act exactly the same
regardless of the offload.

Neither is wrong, they are offloading different things. Qdisc offload
on a representor (switchdev) offloads from the switch perspective, i.e.
"ingress to host". Only fallback goes thru SW path, and should be
negligible.

"Local" offload can be implemented as admission control (and is
sometimes work conserving), it's on the "real" interface, it's egress,
and doesn't take part in forwarding.

> > I question whether something as basic as scheduling and ACLs should
> > follow the "offload SW constructs" mantra. You are exposed to more
> > diverse users so please don't hesitate to disagree, but AFAICT
> > the transparent offload (user installs SW constructs and if offload
> > is available - offload, otherwise use SW is good enough) has not
> > played out like we have hoped.
> > 
> > Let's figure out what is the abstract model of scheduling / shaping
> > within a NIC that we want to target. And then come up with a way of
> > representing it in SW. Not which uAPI we can shoehorn into the use
> > case.  
> 
> I thought the model was quite well defined since the initial submission
> from Intel, and is quite simple: expose TX shaping on per tx queue
> basis, with min rate, max rate (in bps) and burst (in bytes).

For some definition of a model, I guess. Given the confusion about
switchdev vs local (ingress vs egress) - I can't agree that the model
is well defined :(

What I mean is - given piece of functionality like "Tx queue shaping"
you can come up with a reasonable uAPI that you can hijack and it makes
sense to you. But someone else (switchdev ingress) can chose the same
API to implement a different offload. Not to mention that yet another
person will chose a different API to implement the same things as you :(

Off the top of my head we have at least:

 - Tx DMA admission control / scheduling (which Tx DMA queue will NIC 
   pull from)
 - Rx DMA scheduling (which Rx queue will NIC push to)

 - buffer/queue configuration (how to deal with buildup of packets in
   NIC SRAM, usually mostly for ingress)
 - NIC buffer configuration (how the SRAM is allocated to queues)

 - policers in the NIC forwarding logic


Let's extend this list so that it covers all reasonable NIC designs,
and them work on mapping how each of them is configured?

> I think that making it more complex (e.g. with nesting, pkt overhead,
> etc) we will still not cover every possible use case and will add
> considerable complexity.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 115+ messages in thread

end of thread, other threads:[~2023-12-18 21:33 UTC | newest]

Thread overview: 115+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-27  2:10 [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
2023-07-31 22:22   ` Tony Nguyen
2023-08-01  9:24     ` Wu, Wenjun1
2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 2/5] ice: Support VF " Wenjun Wu
2023-07-31 22:23   ` Tony Nguyen
2023-08-01  9:30     ` Wu, Wenjun1
2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 3/5] iavf: Add devlink and devlink port support Wenjun Wu
2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 4/5] iavf: Add devlink port function rate API support Wenjun Wu
2023-07-27  2:10 ` [Intel-wired-lan] [PATCH iwl-next v1 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
2023-07-31 22:21 ` [Intel-wired-lan] [PATCH iwl-next v1 0/5] iavf: Add devlink and devlink rate support Tony Nguyen
2023-08-01 18:43   ` Zhang, Xuejun
2023-08-08  1:57 ` [PATCH iwl-next v2 " Wenjun Wu
2023-08-08  1:57   ` [Intel-wired-lan] " Wenjun Wu
2023-08-08  1:57   ` [PATCH iwl-next v2 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
2023-08-08  1:57   ` [PATCH iwl-next v2 2/5] ice: Support VF " Wenjun Wu
2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16 16:54     ` Brett Creeley
2023-08-16 16:54       ` Brett Creeley
2023-08-08  1:57   ` [PATCH iwl-next v2 3/5] iavf: Add devlink and devlink port support Wenjun Wu
2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16 17:11     ` Brett Creeley
2023-08-16 17:11       ` Brett Creeley
2023-08-08  1:57   ` [PATCH iwl-next v2 4/5] iavf: Add devlink port function rate API support Wenjun Wu
2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
2023-08-08 20:49     ` Simon Horman
2023-08-08 20:49       ` [Intel-wired-lan] " Simon Horman
2023-08-09 18:43       ` Zhang, Xuejun
2023-08-09 18:43         ` Zhang, Xuejun
2023-08-16 17:27     ` Brett Creeley
2023-08-16 17:27       ` [Intel-wired-lan] " Brett Creeley
2023-08-08  1:57   ` [PATCH iwl-next v2 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
2023-08-08  1:57     ` [Intel-wired-lan] " Wenjun Wu
2023-08-08 20:54     ` Simon Horman
2023-08-08 20:54       ` [Intel-wired-lan] " Simon Horman
2023-08-09 18:44       ` Zhang, Xuejun
2023-08-09 18:44         ` [Intel-wired-lan] " Zhang, Xuejun
2023-08-16 17:32     ` Brett Creeley
2023-08-16 17:32       ` [Intel-wired-lan] " Brett Creeley
2023-08-16  3:33 ` [PATCH iwl-next v3 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
2023-08-16  3:33   ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  3:33   ` [PATCH iwl-next v3 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  3:33   ` [PATCH iwl-next v3 2/5] ice: Support VF " Wenjun Wu
2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  3:33   ` [PATCH iwl-next v3 3/5] iavf: Add devlink and devlink port support Wenjun Wu
2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  3:33   ` [PATCH iwl-next v3 4/5] iavf: Add devlink port function rate API support Wenjun Wu
2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  3:33   ` [PATCH iwl-next v3 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
2023-08-16  3:33     ` [Intel-wired-lan] " Wenjun Wu
2023-08-16  9:14     ` Simon Horman
2023-08-16  9:14       ` [Intel-wired-lan] " Simon Horman
2023-08-22  3:39 ` [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support Wenjun Wu
2023-08-22  3:39   ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  3:39   ` [PATCH iwl-next v4 1/5] virtchnl: support queue rate limit and quanta size configuration Wenjun Wu
2023-08-22  3:39     ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  3:40   ` [PATCH iwl-next v4 2/5] ice: Support VF " Wenjun Wu
2023-08-22  3:40     ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  3:40   ` [PATCH iwl-next v4 3/5] iavf: Add devlink and devlink port support Wenjun Wu
2023-08-22  3:40     ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  3:40   ` [PATCH iwl-next v4 4/5] iavf: Add devlink port function rate API support Wenjun Wu
2023-08-22  3:40     ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  3:40   ` [PATCH iwl-next v4 5/5] iavf: Add VIRTCHNL Opcodes Support for Queue bw Setting Wenjun Wu
2023-08-22  3:40     ` [Intel-wired-lan] " Wenjun Wu
2023-08-22  6:12   ` [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support Jiri Pirko
2023-08-22  6:12     ` [Intel-wired-lan] " Jiri Pirko
2023-08-22 15:12     ` Jakub Kicinski
2023-08-22 15:12       ` Jakub Kicinski
2023-08-22 15:34       ` [PATCH iwl-next v4 0/5] iavf: Add devlink and devlink rate support' Jiri Pirko
2023-08-22 15:34         ` [Intel-wired-lan] " Jiri Pirko
2023-08-23 19:13         ` Zhang, Xuejun
2023-08-24  7:04           ` Jiri Pirko
2023-08-24  7:04             ` [Intel-wired-lan] " Jiri Pirko
2023-08-28 22:46             ` Zhang, Xuejun
2023-08-28 22:46               ` [Intel-wired-lan] " Zhang, Xuejun
2023-11-17  5:52               ` Zhang, Xuejun
2023-11-17  5:52                 ` Zhang, Xuejun
2023-11-17 11:21                 ` Jiri Pirko
2023-11-17 11:21                   ` Jiri Pirko
2023-11-21  9:04                   ` Paolo Abeni
2023-11-21  9:04                     ` Paolo Abeni
2023-11-18 16:48                 ` Jakub Kicinski
2023-11-18 16:48                   ` Jakub Kicinski
2023-11-22 22:19                   ` Zhang, Xuejun
2023-11-22 22:19                     ` Zhang, Xuejun
2023-11-23  3:22                     ` Jakub Kicinski
2023-11-23  3:22                       ` Jakub Kicinski
2023-11-28  0:15                       ` Zhang, Xuejun
2023-11-28  0:15                         ` Zhang, Xuejun
2023-11-28  1:43                         ` Jakub Kicinski
2023-11-28  1:43                           ` Jakub Kicinski
2023-12-14 20:29                           ` Paolo Abeni
2023-12-14 20:29                             ` Paolo Abeni
2023-12-15  1:46                             ` Jakub Kicinski
2023-12-15  1:46                               ` Jakub Kicinski
2023-12-15 11:06                               ` Paolo Abeni
2023-12-15 11:06                                 ` Paolo Abeni
2023-12-15 11:47                                 ` Paolo Abeni
2023-12-15 11:47                                   ` Paolo Abeni
2023-12-15 12:30                                 ` Jiri Pirko
2023-12-15 12:30                                   ` Jiri Pirko
2023-12-15 22:41                                 ` Jakub Kicinski
2023-12-15 22:41                                   ` Jakub Kicinski
2023-12-18 20:12                                   ` Paolo Abeni
2023-12-18 20:12                                     ` Paolo Abeni
2023-12-18 21:33                                     ` Jakub Kicinski
2023-12-18 21:33                                       ` Jakub Kicinski
2023-12-15 12:22                               ` Jiri Pirko
2023-12-15 12:22                                 ` Jiri Pirko
2023-10-18  9:05             ` Paolo Abeni
2023-10-18  9:05               ` [Intel-wired-lan] " Paolo Abeni
2023-08-23 21:39         ` Zhang, Xuejun
2023-08-23 21:39           ` Zhang, Xuejun

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.