* [PATCH net-next v8 0/9] Implement devlink-rate API and extend it
@ 2022-10-28 10:51 Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

This is a follow up on:
https://lore.kernel.org/netdev/20221018123543.1210217-1-michal.wilczynski@intel.com/

This patch series implements devlink-rate for the ice driver. Unfortunately,
the current API isn't flexible enough for our use case, so it needs to be
extended. New functions have been introduced to let the driver export its
current Tx scheduling configuration.

The justification for this series, copied from the commit that implements
devlink-rate in the ice driver (part of this series):

The ice driver needs to support modification of the Tx scheduler tree.
This allows the user to control the Tx settings of each node in the
internal hierarchy of nodes, so that Hierarchical QoS can be implemented
entirely in hardware.

This patch implements the devlink-rate API. It also exports the initial
default hierarchy. This is mostly dictated by the fact that the tree
can't be removed entirely; all we can do is enable the user to modify
it. For example, the root node must never be removed, and nodes that
have children are off-limits.
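The two removal rules above can be illustrated with a minimal userspace sketch. It is not the ice or devlink code: `struct sched_node`, `node_attach()` and `node_remove()` are hypothetical names, and the chosen error codes (-EPERM for the root, -EBUSY for a node with children) are assumptions made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of one node in the Tx scheduler tree. */
struct sched_node {
	const char *name;
	struct sched_node *parent;  /* NULL only for the root */
	unsigned int num_children;  /* kept in sync by node_attach()/node_remove() */
};

static void node_attach(struct sched_node *node, struct sched_node *parent)
{
	assert(node && parent && node != parent);
	node->parent = parent;
	parent->num_children++;
}

/* Returns 0 on success or a negative errno (the codes are illustrative). */
static int node_remove(struct sched_node *node)
{
	if (!node->parent)
		return -EPERM;   /* the root must never be removed */
	if (node->num_children)
		return -EBUSY;   /* nodes that still have children are off-limits */
	node->parent->num_children--;
	node->parent = NULL;     /* detached from the tree */
	return 0;
}
```

With node_0 -> node_26 -> node_27, removing node_0 or node_26 is refused, while node_27 can go first, after which node_26 becomes removable.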

Example initial tree with 2 VFs:

[root@fedora ~]# devlink port function rate show
pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27


Let me visualize part of the tree:

                        +---------+
                        |  node_0 |
                        +---------+
                             |
                        +----v----+
                        | node_26 |
                        +----+----+
                             |
                        +----v----+
                        | node_27 |
                        +----+----+
                             |
                    |-----------------|
               +----v----+       +----v----+
               |   VF 1  |       |   VF 2  |
               +----+----+       +----+----+
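The path drawn above (VF 1 -> node_27 -> node_26 -> node_0) can be recovered mechanically from the `rate show` listing by following the "parent" field of each line. The sketch below is a hypothetical helper, not part of devlink or iproute2; it assumes the exact line shape printed above ("<bus>/<dev>/<name>: type <type> [parent <name>]").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64

/* One line of "devlink port function rate show" output (hypothetical model). */
struct rate_entry {
	char name[NAME_LEN];
	char parent[NAME_LEN];  /* empty string when there is no parent (root) */
};

/* Parse e.g. "pci/0000:4b:00.0/node_27: type node parent node_26".
 * Returns 1 on success, 0 if the line does not match the expected shape. */
static int parse_rate_line(const char *line, struct rate_entry *e)
{
	char type[16];
	int n = sscanf(line, "%*[^/]/%*[^/]/%63[^:]: type %15s parent %63s",
		       e->name, type, e->parent);

	if (n < 2)
		return 0;
	if (n == 2)
		e->parent[0] = '\0';  /* no "parent" field: this is the root */
	return 1;
}

/* Follow parent links from @start up to the root, writing the names into
 * @out separated by " -> ". Returns the number of hops taken. */
static int trace_to_root(const struct rate_entry *tbl, int count,
			 const char *start, char *out, size_t out_len)
{
	const char *cur = start;
	int hops = 0;

	assert(out && out_len > 0);
	snprintf(out, out_len, "%s", start);
	/* bounded by count so a malformed (cyclic) listing cannot loop forever */
	while (hops < count) {
		const struct rate_entry *e = NULL;
		size_t used;

		for (int i = 0; i < count && !e; i++)
			if (strcmp(tbl[i].name, cur) == 0)
				e = &tbl[i];
		if (!e || e->parent[0] == '\0')
			break;
		used = strlen(out);
		snprintf(out + used, out_len - used, " -> %s", e->parent);
		cur = e->parent;
		hops++;
	}
	return hops;
}
```

Fed the node_0, node_26, node_27 and "1" lines from the listing, tracing "1" yields "1 -> node_27 -> node_26 -> node_0" in three hops.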

At this point there are a couple of things that can be done.
For example, we could simply assign parameters to the VFs.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This caps the bandwidth of VF 1 at 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.

The following parameters are supported for each node: tx_max, tx_share,
tx_priority and tx_weight.


V8:
- address minor formatting issues
- fix memory leak
- address warnings

V7:
- split into smaller commits
- paste justification for this series to cover letter

V6:
- replaced strncpy with strscpy
- renamed rate_vport -> rate_leaf

V5:
- removed queue support per community request
- fix division of 64bit variable with 32bit divisor by using div_u64()
- remove RDMA, ADQ exclusion as it's not necessary anymore
- changed how driver exports configuration, as queues are not supported
  anymore
- changed IDA to Xarray for unique node identification


V4:
- changed static variable counter to per port IDA to
  uniquely identify nodes

V3:
- removed shift macros, since FIELD_PREP is used
- added static_assert for struct
- removed unnecessary functions
- used tab instead of space in define

V2:
- addressed Alexandr's comments
- refactored code to fix checkpatch issues
- added mutual exclusion for RDMA, DCB


Michal Wilczynski (9):
  devlink: Introduce new parameter 'tx_priority' to devlink-rate
  devlink: Introduce new parameter 'tx_weight' to devlink-rate
  devlink: Enable creation of the devlink-rate nodes from the driver
  devlink: Allow for devlink-rate nodes parent reassignment
  devlink: Allow to set up parent in devl_rate_leaf_create()
  devlink: Allow to change priv in devlink-rate from parent_set
    callbacks
  ice: Introduce new parameters in ice_sched_node
  ice: Implement devlink-rate API
  ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler

 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   4 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |   3 +
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   4 +
 drivers/net/ethernet/intel/ice/ice_devlink.c  | 477 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h  |   2 +
 drivers/net/ethernet/intel/ice/ice_idc.c      |   5 +
 drivers/net/ethernet/intel/ice/ice_repr.c     |  13 +
 drivers/net/ethernet/intel/ice/ice_sched.c    |  79 ++-
 drivers/net/ethernet/intel/ice/ice_sched.h    |  25 +
 drivers/net/ethernet/intel/ice/ice_type.h     |   8 +
 .../mellanox/mlx5/core/esw/devlink_port.c     |   4 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c |   4 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |   2 +-
 drivers/net/netdevsim/dev.c                   |  10 +-
 include/net/devlink.h                         |  21 +-
 include/uapi/linux/devlink.h                  |   3 +
 net/core/devlink.c                            | 145 +++++-
 17 files changed, 777 insertions(+), 32 deletions(-)

-- 
2.37.2



* [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-31 10:13   ` Jiri Pirko
  2022-10-28 10:51 ` [PATCH net-next v8 2/9] devlink: Introduce new parameter 'tx_weight' " Michal Wilczynski
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

To fully utilize the QoS offload capabilities of the Intel 100G card, a
new parameter 'tx_priority' needs to be introduced. This parameter enables
a strict priority arbiter among siblings. This arbitration scheme attempts
to schedule nodes based on their priority as long as the nodes remain
within their bandwidth limit.

Introduce a new parameter in devlink-rate that allows configuration of
strict priority.
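The strict priority rule described above (highest priority first, but only among siblings still within their bandwidth limit) can be sketched as a selection function. This is a simplified userspace model, not the hardware arbiter: `struct sp_sibling` and `sp_pick()` are hypothetical, a larger `tx_priority` value is assumed to mean higher priority, and the bandwidth limit is reduced to a per-interval byte budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sp_sibling {
	const char *name;
	uint16_t tx_priority;  /* assumption: larger value = higher priority */
	bool backlogged;       /* has packets waiting */
	uint64_t sent;         /* bytes sent in the current interval */
	uint64_t limit;        /* byte budget for the interval (0 = unlimited) */
};

static bool sp_within_limit(const struct sp_sibling *s)
{
	return s->limit == 0 || s->sent < s->limit;
}

/* Strict priority: pick the highest-priority sibling that has work and is
 * still within its bandwidth limit; ties go to the lowest array index. */
static struct sp_sibling *sp_pick(struct sp_sibling *sibs, size_t count)
{
	struct sp_sibling *best = NULL;

	assert(sibs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		struct sp_sibling *s = &sibs[i];

		if (!s->backlogged || !sp_within_limit(s))
			continue;
		if (!best || s->tx_priority > best->tx_priority)
			best = s;
	}
	return best;
}
```

A priority-7 sibling is always chosen over a priority-1 sibling until it exhausts its budget; only then does the lower-priority sibling get scheduled.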

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 include/net/devlink.h        |  6 ++++++
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 29 +++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index ba6b8b094943..9d2b0c3c4ad3 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -114,6 +114,8 @@ struct devlink_rate {
 			refcount_t refcnt;
 		};
 	};
+
+	u16 tx_priority;
 };
 
 struct devlink_port {
@@ -1493,10 +1495,14 @@ struct devlink_ops {
 				      u64 tx_share, struct netlink_ext_ack *extack);
 	int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
 				    u64 tx_max, struct netlink_ext_ack *extack);
+	int (*rate_leaf_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
+					 u64 tx_priority, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
 				      u64 tx_share, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
 				    u64 tx_max, struct netlink_ext_ack *extack);
+	int (*rate_node_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
+					 u64 tx_priority, struct netlink_ext_ack *extack);
 	int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
 			     struct netlink_ext_ack *extack);
 	int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 2f24b53a87a5..b3df5bc45ba5 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -607,6 +607,7 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_SELFTESTS,			/* nested */
 
+	DEVLINK_ATTR_RATE_TX_PRIORITY,		/* u16 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 89baa7c0938b..2586b1307cb4 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1184,6 +1184,9 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
 			      devlink_rate->tx_max, DEVLINK_ATTR_PAD))
 		goto nla_put_failure;
 
+	if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TX_PRIORITY,
+			devlink_rate->tx_priority))
+		goto nla_put_failure;
 	if (devlink_rate->parent)
 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
 				   devlink_rate->parent->name))
@@ -1924,6 +1927,7 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 {
 	struct nlattr *nla_parent, **attrs = info->attrs;
 	int err = -EOPNOTSUPP;
+	u16 priority;
 	u64 rate;
 
 	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
@@ -1952,6 +1956,20 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 		devlink_rate->tx_max = rate;
 	}
 
+	if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) {
+		priority = nla_get_u16(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]);
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv,
+							priority, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv,
+							priority, info->extack);
+
+		if (err)
+			return err;
+		devlink_rate->tx_priority = priority;
+	}
+
 	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
 	if (nla_parent) {
 		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
@@ -1983,6 +2001,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX priority set isn't supported for the leafs");
+			return false;
+		}
 	} else if (type == DEVLINK_RATE_TYPE_NODE) {
 		if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
 			NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes");
@@ -1997,6 +2020,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX priority set isn't supported for the nodes");
+			return false;
+		}
 	} else {
 		WARN(1, "Unknown type of rate object");
 		return false;
@@ -9172,6 +9200,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
+	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U16 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
-- 
2.37.2



* [PATCH net-next v8 2/9] devlink: Introduce new parameter 'tx_weight' to devlink-rate
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver Michal Wilczynski
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

To fully utilize the QoS offload capabilities of the Intel 100G card, a
new parameter 'tx_weight' needs to be introduced. This parameter enables
the Weighted Fair Queueing arbitration scheme among siblings. This
arbitration scheme can be used simultaneously with strict priority.

Introduce a new parameter in devlink-rate that allows configuration of
Weighted Fair Queueing.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 include/net/devlink.h        |  5 +++++
 include/uapi/linux/devlink.h |  2 ++
 net/core/devlink.c           | 31 +++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9d2b0c3c4ad3..929cb72ef412 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -116,6 +116,7 @@ struct devlink_rate {
 	};
 
 	u16 tx_priority;
+	u16 tx_weight;
 };
 
 struct devlink_port {
@@ -1497,12 +1498,16 @@ struct devlink_ops {
 				    u64 tx_max, struct netlink_ext_ack *extack);
 	int (*rate_leaf_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
 					 u64 tx_priority, struct netlink_ext_ack *extack);
+	int (*rate_leaf_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
+				       u64 tx_weight, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
 				      u64 tx_share, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
 				    u64 tx_max, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
 					 u64 tx_priority, struct netlink_ext_ack *extack);
+	int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
+				       u64 tx_weight, struct netlink_ext_ack *extack);
 	int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
 			     struct netlink_ext_ack *extack);
 	int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b3df5bc45ba5..9f3916e02a64 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -608,6 +608,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_SELFTESTS,			/* nested */
 
 	DEVLINK_ATTR_RATE_TX_PRIORITY,		/* u16 */
+	DEVLINK_ATTR_RATE_TX_WEIGHT,		/* u16 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 2586b1307cb4..b97c077cf66e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1187,6 +1187,11 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
 	if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TX_PRIORITY,
 			devlink_rate->tx_priority))
 		goto nla_put_failure;
+
+	if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TX_WEIGHT,
+			devlink_rate->tx_weight))
+		goto nla_put_failure;
+
 	if (devlink_rate->parent)
 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
 				   devlink_rate->parent->name))
@@ -1928,6 +1933,7 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 	struct nlattr *nla_parent, **attrs = info->attrs;
 	int err = -EOPNOTSUPP;
 	u16 priority;
+	u16 weight;
 	u64 rate;
 
 	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
@@ -1970,6 +1976,20 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 		devlink_rate->tx_priority = priority;
 	}
 
+	if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]) {
+		weight = nla_get_u16(attrs[DEVLINK_ATTR_RATE_TX_WEIGHT]);
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_tx_weight_set(devlink_rate, devlink_rate->priv,
+							 weight, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_tx_weight_set(devlink_rate, devlink_rate->priv,
+							weight, info->extack);
+
+		if (err)
+			return err;
+		devlink_rate->tx_weight = weight;
+	}
+
 	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
 	if (nla_parent) {
 		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
@@ -2006,6 +2026,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 					   "TX priority set isn't supported for the leafs");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_leaf_tx_weight_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX weight set isn't supported for the leafs");
+			return false;
+		}
 	} else if (type == DEVLINK_RATE_TYPE_NODE) {
 		if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
 			NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes");
@@ -2025,6 +2050,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 					   "TX priority set isn't supported for the nodes");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_WEIGHT] && !ops->rate_node_tx_weight_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX weight set isn't supported for the nodes");
+			return false;
+		}
 	} else {
 		WARN(1, "Unknown type of rate object");
 		return false;
@@ -9201,6 +9231,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
 	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U16 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
-- 
2.37.2



* [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 2/9] devlink: Introduce new parameter 'tx_weight' " Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-31 10:19   ` Jiri Pirko
  2022-10-28 10:51 ` [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment Michal Wilczynski
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

The Intel 100G card's internal firmware hierarchy for Hierarchical QoS is
very rigid and can't be easily removed. This requires the ability to
export the default hierarchy so that the user can modify it. Currently the
driver is only able to create 'leaf' nodes, which usually represent a
vport. This is not enough for the HQoS implemented in Intel hardware.

Introduce a new function devl_rate_node_create() that allows the driver
to create devlink-rate nodes.
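The checks performed by devl_rate_node_create() (duplicate name, missing parent, parent refcount, name length cap) can be modeled in userspace as below. This is a sketch under stated assumptions, not the kernel function: `struct rate`, `struct rate_table`, `rate_get_by_name()` and `rate_node_create()` are hypothetical, lookups return NULL rather than an ERR_PTR, and the parent reference is deliberately taken only after every allocation has succeeded so that no error path has to drop it again.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define RATE_NAME_MAX_LEN 30  /* mirrors DEVLINK_RATE_NAME_MAX_LEN */
#define MAX_RATES 16

/* Hypothetical userspace stand-in for struct devlink_rate. */
struct rate {
	char *name;
	struct rate *parent;
	unsigned int refcnt;
	void *priv;
};

struct rate_table {
	struct rate *items[MAX_RATES];
	int count;
};

static struct rate *rate_get_by_name(struct rate_table *t, const char *name)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->items[i]->name, name) == 0)
			return t->items[i];
	return NULL;
}

/* Copy at most max bytes of s, always NUL-terminated (like kstrndup()). */
static char *dup_name(const char *s, size_t max)
{
	size_t len = strlen(s);
	char *p;

	if (len > max)
		len = max;
	p = malloc(len + 1);
	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

static int rate_node_create(struct rate_table *t, void *priv,
			    const char *node_name, const char *parent_name)
{
	struct rate *parent = NULL, *node;

	assert(t && node_name);
	if (rate_get_by_name(t, node_name))
		return -EEXIST;
	if (t->count == MAX_RATES)
		return -ENOSPC;
	if (parent_name) {
		parent = rate_get_by_name(t, parent_name);
		if (!parent)
			return -ENODEV;
	}

	node = calloc(1, sizeof(*node));
	if (!node)
		return -ENOMEM;
	node->name = dup_name(node_name, RATE_NAME_MAX_LEN);
	if (!node->name) {
		free(node);
		return -ENOMEM;
	}

	/* Take the parent reference last, once no later step can fail,
	 * so the error paths above never need to drop it again. */
	if (parent) {
		node->parent = parent;
		parent->refcnt++;
	}
	node->priv = priv;
	node->refcnt = 1;
	t->items[t->count++] = node;
	return 0;
}
```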

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 include/net/devlink.h |  4 ++++
 net/core/devlink.c    | 49 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 929cb72ef412..9d0a424712fd 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -98,6 +98,8 @@ struct devlink_port_attrs {
 	};
 };
 
+#define DEVLINK_RATE_NAME_MAX_LEN 30
+
 struct devlink_rate {
 	struct list_head list;
 	enum devlink_rate_type type;
@@ -1601,6 +1603,8 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
 				   u32 controller, u16 pf, u32 sf,
 				   bool external);
 int devl_rate_leaf_create(struct devlink_port *port, void *priv);
+int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
+			  char *parent_name);
 void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
 void devl_rate_nodes_destroy(struct devlink *devlink);
 void devlink_port_linecard_set(struct devlink_port *devlink_port,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index b97c077cf66e..08f1bbd54c43 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -10270,6 +10270,55 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 contro
 }
 EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set);
 
+/**
+ * devl_rate_node_create - create devlink rate node
+ * @devlink: devlink instance
+ * @priv: driver private data
+ * @node_name: name of the resulting node
+ * @parent_name: name of the parent node
+ *
+ * Create devlink rate object of type node
+ */
+int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, char *parent_name)
+{
+	struct devlink_rate *rate_node;
+	struct devlink_rate *parent;
+
+	rate_node = devlink_rate_node_get_by_name(devlink, node_name);
+	if (!IS_ERR(rate_node))
+		return -EEXIST;
+
+	rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
+	if (!rate_node)
+		return -ENOMEM;
+
+	if (parent_name) {
+		parent = devlink_rate_node_get_by_name(devlink, parent_name);
+		if (IS_ERR(parent)) {
+			kfree(rate_node);
+			return -ENODEV;
+		}
+		rate_node->parent = parent;
+		refcount_inc(&rate_node->parent->refcnt);
+	}
+
+	rate_node->type = DEVLINK_RATE_TYPE_NODE;
+	rate_node->devlink = devlink;
+	rate_node->priv = priv;
+
+	rate_node->name = kstrndup(node_name, DEVLINK_RATE_NAME_MAX_LEN, GFP_KERNEL);
+	if (!rate_node->name) {
+		kfree(rate_node);
+		return -ENOMEM;
+	}
+
+	refcount_set(&rate_node->refcnt, 1);
+	list_add(&rate_node->list, &devlink->rate_list);
+	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devl_rate_node_create);
+
 /**
  * devl_rate_leaf_create - create devlink rate leaf
  * @devlink_port: devlink port object to create rate object on
-- 
2.37.2



* [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (2 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-31 10:25   ` Jiri Pirko
  2022-10-28 10:51 ` [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create() Michal Wilczynski
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

Currently it's not possible to reassign the parent of a node using one
command. Since the previous commit introduced a way to export the entire
hierarchy from the driver, being able to modify and reassign parents
becomes important. This way the user can easily change QoS settings
without interrupting traffic.

Example command:
devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1

This reassigns the parent of leaf 1 to node_custom_1.
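The refcount handling of a one-step reassignment can be sketched in userspace: release the old parent's reference, take one on the new parent, then update the pointer. `struct rnode`, `is_ancestor()` and `rate_parent_set()` are hypothetical names; the loop check and its -EINVAL return are assumptions added to keep the model's tree well-formed, not something taken from the hunk below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a devlink-rate object. */
struct rnode {
	const char *name;
	struct rnode *parent;
	unsigned int refcnt;  /* 1 for the object itself + 1 per child */
};

/* Returns 1 if @anc is @n or one of @n's ancestors. */
static int is_ancestor(const struct rnode *anc, const struct rnode *n)
{
	for (; n; n = n->parent)
		if (n == anc)
			return 1;
	return 0;
}

/* Reassign @obj to @new_parent in a single step; NULL detaches it.
 * The error code for loops is an assumption of this sketch. */
static int rate_parent_set(struct rnode *obj, struct rnode *new_parent)
{
	assert(obj);
	if (new_parent && is_ancestor(obj, new_parent))
		return -EINVAL;  /* self-parenting or a loop in the tree */

	/* (the driver's parent_set callback would reprogram hardware here) */
	if (obj->parent)
		obj->parent->refcnt--;  /* release the old parent */
	if (new_parent)
		new_parent->refcnt++;
	obj->parent = new_parent;
	return 0;
}
```

Moving leaf 1 from node_27 to node_custom_1 drops node_27's count back to 1 and raises node_custom_1's to 2, without an intermediate detach step.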

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 net/core/devlink.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 08f1bbd54c43..9bdbc158c36a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1875,10 +1875,8 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 	int err = -EOPNOTSUPP;
 
 	parent = devlink_rate->parent;
-	if (parent && len) {
-		NL_SET_ERR_MSG_MOD(info->extack, "Rate object already has parent.");
-		return -EBUSY;
-	} else if (parent && !len) {
+
+	if (parent && !len) {
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_parent_set(devlink_rate, NULL,
 							devlink_rate->priv, NULL,
@@ -1892,7 +1890,7 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 
 		refcount_dec(&parent->refcnt);
 		devlink_rate->parent = NULL;
-	} else if (!parent && len) {
+	} else if (len) {
 		parent = devlink_rate_node_get_by_name(devlink, parent_name);
 		if (IS_ERR(parent))
 			return -ENODEV;
@@ -1919,6 +1917,10 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 		if (err)
 			return err;
 
+		if (devlink_rate->parent)
+			/* we're reassigning to other parent in this case */
+			refcount_dec(&devlink_rate->parent->refcnt);
+
 		refcount_inc(&parent->refcnt);
 		devlink_rate->parent = parent;
 	}
-- 
2.37.2



* [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create()
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (3 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-31 10:26   ` Jiri Pirko
  2022-10-28 10:51 ` [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks Michal Wilczynski
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

Currently the driver is able to create leaf nodes for devlink-rate, but
is unable to set a parent for them. This wasn't an issue before the
driver could export its hierarchy. With the export feature added, the
driver must be able to supply a parent name to devl_rate_leaf_create()
in order to report the correct hierarchy.

Introduce a new parameter 'parent_name' in devl_rate_leaf_create().
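Because creation with a parent name fails when that parent does not exist yet, a driver exporting its tree has to create parents before children. One way to guarantee that is a pre-order walk of the driver's own tree, sketched below. Everything here is hypothetical: `struct hw_node` stands in for the driver's scheduler node, and `create_rate()` only models the parent check of devl_rate_node_create()/devl_rate_leaf_create().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_EXPORTED 32

/* Hypothetical driver-side scheduler node (stand-in for the real tree). */
struct hw_node {
	const char *name;
	struct hw_node *children[4];
	int num_children;
};

/* Hypothetical stand-in for the devlink side: records created objects. */
struct export_log {
	const char *names[MAX_EXPORTED];
	int count;
};

static int log_has(const struct export_log *log, const char *name)
{
	for (int i = 0; i < log->count; i++)
		if (strcmp(log->names[i], name) == 0)
			return 1;
	return 0;
}

/* Models creation with a parent name: fails if the named parent has not
 * been created yet. */
static int create_rate(struct export_log *log, const char *name,
		       const char *parent)
{
	if (parent && !log_has(log, parent))
		return -ENODEV;
	if (log->count == MAX_EXPORTED)
		return -ENOSPC;
	log->names[log->count++] = name;
	return 0;
}

/* Pre-order walk: a node is always exported before any of its children,
 * so every parent name refers to an object that already exists. */
static int export_tree(struct export_log *log, const struct hw_node *n,
		       const char *parent)
{
	int err;

	assert(log && n);
	err = create_rate(log, n->name, parent);
	if (err)
		return err;
	for (int i = 0; i < n->num_children; i++) {
		err = export_tree(log, n->children[i], n->name);
		if (err)
			return err;
	}
	return 0;
}
```

For the node_0 -> node_26 -> node_27 -> {1, 2} branch from the cover letter, the walk creates node_0, node_26, node_27, 1 and 2 in that order, whereas creating leaf 1 first would fail.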

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 .../ethernet/mellanox/mlx5/core/esw/devlink_port.c |  4 ++--
 drivers/net/netdevsim/dev.c                        |  2 +-
 include/net/devlink.h                              |  2 +-
 net/core/devlink.c                                 | 14 +++++++++++++-
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index 9bc7be95db54..084a910bb4e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -91,7 +91,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
 	if (err)
 		goto reg_err;
 
-	err = devl_rate_leaf_create(dl_port, vport);
+	err = devl_rate_leaf_create(dl_port, vport, NULL);
 	if (err)
 		goto rate_err;
 
@@ -160,7 +160,7 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p
 	if (err)
 		return err;
 
-	err = devl_rate_leaf_create(dl_port, vport);
+	err = devl_rate_leaf_create(dl_port, vport, NULL);
 	if (err)
 		goto rate_err;
 
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 794fc0cc73b8..10e5c4de6b02 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1392,7 +1392,7 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev, enum nsim_dev_port_typ
 
 	if (nsim_dev_port_is_vf(nsim_dev_port)) {
 		err = devl_rate_leaf_create(&nsim_dev_port->devlink_port,
-					    nsim_dev_port);
+					    nsim_dev_port, NULL);
 		if (err)
 			goto err_nsim_destroy;
 	}
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9d0a424712fd..2ccb69606d23 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1602,7 +1602,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
 void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
 				   u32 controller, u16 pf, u32 sf,
 				   bool external);
-int devl_rate_leaf_create(struct devlink_port *port, void *priv);
+int devl_rate_leaf_create(struct devlink_port *port, void *priv, char *parent_name);
 int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 			  char *parent_name);
 void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9bdbc158c36a..140336c09bd5 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -10325,13 +10325,15 @@ EXPORT_SYMBOL_GPL(devl_rate_node_create);
  * devl_rate_leaf_create - create devlink rate leaf
  * @devlink_port: devlink port object to create rate object on
  * @priv: driver private data
+ * @parent_name: name of the parent node
  *
  * Create devlink rate object of type leaf on provided @devlink_port.
  */
-int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
+int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, char *parent_name)
 {
 	struct devlink *devlink = devlink_port->devlink;
 	struct devlink_rate *devlink_rate;
+	struct devlink_rate *parent;
 
 	devl_assert_locked(devlink_port->devlink);
 
@@ -10342,6 +10344,16 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
 	if (!devlink_rate)
 		return -ENOMEM;
 
+	if (parent_name) {
+		parent = devlink_rate_node_get_by_name(devlink, parent_name);
+		if (IS_ERR(parent)) {
+			kfree(devlink_rate);
+			return -ENODEV;
+		}
+		devlink_rate->parent = parent;
+		refcount_inc(&devlink_rate->parent->refcnt);
+	}
+
 	devlink_rate->type = DEVLINK_RATE_TYPE_LEAF;
 	devlink_rate->devlink = devlink;
 	devlink_rate->devlink_port = devlink_port;
-- 
2.37.2



* [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (4 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create() Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-31 12:22   ` Jiri Pirko
  2022-10-28 10:51 ` [PATCH net-next v8 7/9] ice: Introduce new parameters in ice_sched_node Michal Wilczynski
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC (permalink / raw)
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

From the driver's perspective, it makes no sense to change the internal
HQoS tree while a newly created node has no parent, so a node created
without a parent does not need to be initialized in the driver yet.
Support this scenario by allowing the parent_set callbacks to modify priv.

Change the priv parameter to a double pointer so that priv can be set
during the parent set phase.
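A minimal userspace sketch of why the double pointer helps: the callback receives the address of the core's stored priv, so it can allocate driver state the first time a parent is assigned and hand that pointer back to the caller. `struct hw_node` and `demo_parent_set()` are hypothetical stand-ins, not the devlink or ice API, and a negative `parent_id` stands in for the NULL parent used when a node is detached.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for driver-side scheduling state (not an ice struct). */
struct hw_node {
	int parent_id;
};

/*
 * Hypothetical parent_set-style callback. priv points at the caller's stored
 * driver pointer; a node created without a parent has *priv == NULL, so the
 * state is allocated lazily on the first parent assignment and published back
 * by writing through the double pointer.
 */
static int demo_parent_set(void **priv, int parent_id)
{
	struct hw_node *node = *priv;

	if (parent_id < 0) {		/* detach: release driver state */
		free(node);
		*priv = NULL;
		return 0;
	}

	if (!node) {			/* first parent: create lazily */
		node = calloc(1, sizeof(*node));
		if (!node)
			return -ENOMEM;
		*priv = node;		/* caller's stored priv now sees it */
	}

	node->parent_id = parent_id;
	return 0;
}
```

With a single pointer, the callback could only modify a local copy, and the allocation would be invisible to the caller.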

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h |  2 +-
 drivers/net/netdevsim/dev.c                       |  8 ++++----
 include/net/devlink.h                             |  4 ++--
 net/core/devlink.c                                | 12 ++++++------
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 4f8a24d84a86..0b55a1e477f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -940,11 +940,11 @@ int mlx5_esw_qos_vport_update_group(struct mlx5_eswitch *esw,
 
 int mlx5_esw_devlink_rate_parent_set(struct devlink_rate *devlink_rate,
 				     struct devlink_rate *parent,
-				     void *priv, void *parent_priv,
+				     void **priv, void *parent_priv,
 				     struct netlink_ext_ack *extack)
 {
 	struct mlx5_esw_rate_group *group;
-	struct mlx5_vport *vport = priv;
+	struct mlx5_vport *vport = *priv;
 
 	if (!parent)
 		return mlx5_esw_qos_vport_update_group(vport->dev->priv.eswitch,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
index 0141e9d52037..d3b3ce26883b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
@@ -24,7 +24,7 @@ int mlx5_esw_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
 				   struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_parent_set(struct devlink_rate *devlink_rate,
 				     struct devlink_rate *parent,
-				     void *priv, void *parent_priv,
+				     void **priv, void *parent_priv,
 				     struct netlink_ext_ack *extack);
 #endif
 
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 10e5c4de6b02..f5ae4aed8679 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1275,10 +1275,10 @@ static int nsim_rate_node_del(struct devlink_rate *node, void *priv,
 
 static int nsim_rate_leaf_parent_set(struct devlink_rate *child,
 				     struct devlink_rate *parent,
-				     void *priv_child, void *priv_parent,
+				     void **priv_child, void *priv_parent,
 				     struct netlink_ext_ack *extack)
 {
-	struct nsim_dev_port *nsim_dev_port = priv_child;
+	struct nsim_dev_port *nsim_dev_port = *priv_child;
 
 	if (parent)
 		nsim_dev_port->parent_name = parent->name;
@@ -1289,10 +1289,10 @@ static int nsim_rate_leaf_parent_set(struct devlink_rate *child,
 
 static int nsim_rate_node_parent_set(struct devlink_rate *child,
 				     struct devlink_rate *parent,
-				     void *priv_child, void *priv_parent,
+				     void **priv_child, void *priv_parent,
 				     struct netlink_ext_ack *extack)
 {
-	struct nsim_rate_node *nsim_node = priv_child;
+	struct nsim_rate_node *nsim_node = *priv_child;
 
 	if (parent)
 		nsim_node->parent_name = parent->name;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2ccb69606d23..3085b018e635 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1516,11 +1516,11 @@ struct devlink_ops {
 			     struct netlink_ext_ack *extack);
 	int (*rate_leaf_parent_set)(struct devlink_rate *child,
 				    struct devlink_rate *parent,
-				    void *priv_child, void *priv_parent,
+				    void **priv_child, void *priv_parent,
 				    struct netlink_ext_ack *extack);
 	int (*rate_node_parent_set)(struct devlink_rate *child,
 				    struct devlink_rate *parent,
-				    void *priv_child, void *priv_parent,
+				    void **priv_child, void *priv_parent,
 				    struct netlink_ext_ack *extack);
 	/**
 	 * selftests_check() - queries if selftest is supported
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 140336c09bd5..2d40ed440a33 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1879,11 +1879,11 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 	if (parent && !len) {
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_parent_set(devlink_rate, NULL,
-							devlink_rate->priv, NULL,
+							&devlink_rate->priv, NULL,
 							info->extack);
 		else if (devlink_rate_is_node(devlink_rate))
 			err = ops->rate_node_parent_set(devlink_rate, NULL,
-							devlink_rate->priv, NULL,
+							&devlink_rate->priv, NULL,
 							info->extack);
 		if (err)
 			return err;
@@ -1908,11 +1908,11 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_parent_set(devlink_rate, parent,
-							devlink_rate->priv, parent->priv,
+							&devlink_rate->priv, parent->priv,
 							info->extack);
 		else if (devlink_rate_is_node(devlink_rate))
 			err = ops->rate_node_parent_set(devlink_rate, parent,
-							devlink_rate->priv, parent->priv,
+							&devlink_rate->priv, parent->priv,
 							info->extack);
 		if (err)
 			return err;
@@ -10410,10 +10410,10 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 
 		refcount_dec(&devlink_rate->parent->refcnt);
 		if (devlink_rate_is_leaf(devlink_rate))
-			ops->rate_leaf_parent_set(devlink_rate, NULL, devlink_rate->priv,
+			ops->rate_leaf_parent_set(devlink_rate, NULL, &devlink_rate->priv,
 						  NULL, NULL);
 		else if (devlink_rate_is_node(devlink_rate))
-			ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv,
+			ops->rate_node_parent_set(devlink_rate, NULL, &devlink_rate->priv,
 						  NULL, NULL);
 	}
 	list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_list, list) {
-- 
2.37.2



* [PATCH net-next v8 7/9] ice: Introduce new parameters in ice_sched_node
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (5 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 8/9] ice: Implement devlink-rate API Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 9/9] ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler Michal Wilczynski
  8 siblings, 0 replies; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC (permalink / raw)
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

To support the new devlink-rate API, the ice_sched_node struct needs to
store a number of additional parameters: tx_max, tx_share, tx_weight and
tx_priority.
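The value ranges below come from the validation in the ice callbacks added later in this series (priority 0-7, weight 1-200). This is only a userspace sketch of that validation; `struct rate_params` and the helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the four per-node parameters. */
struct rate_params {
	uint64_t tx_max;	/* upper bandwidth limit (ICE_MAX_BW) */
	uint64_t tx_share;	/* minimum bandwidth (ICE_MIN_BW) */
	uint16_t tx_priority;	/* priority among siblings, 0..7 */
	uint16_t tx_weight;	/* WFQ weight among siblings, 1..200 */
};

/* Same bound as the ice_set_object_tx_priority() check. */
static int check_priority(uint16_t prio)
{
	return prio < 8 ? 0 : -EINVAL;
}

/* Same bounds as the ice_set_object_tx_weight() check. */
static int check_weight(uint16_t weight)
{
	return (weight >= 1 && weight <= 200) ? 0 : -EINVAL;
}

/* Store both values only if both are valid; leave p untouched otherwise. */
static int set_prio_weight(struct rate_params *p, uint16_t prio, uint16_t weight)
{
	int err = check_priority(prio);

	if (!err)
		err = check_weight(weight);
	if (err)
		return err;

	p->tx_priority = prio;
	p->tx_weight = weight;
	return 0;
}
```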

Add the new fields to the ice_sched_node struct and add new functions to
program these parameters into the hardware. Introduce a new xarray to
identify nodes uniquely.
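A userspace sketch of the unique-ID scheme, assuming lowest-free-index allocation like `xa_alloc()` on an xarray initialized with `XA_FLAGS_ALLOC`. The fixed-size array stands in for the xarray (which is kernel-only), and `id_alloc()`, `id_free()`, `node_name()`, `MAX_IDS` and `NAME_LEN` are hypothetical; only the `node_%u` name format comes from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_IDS  64	/* hypothetical capacity */
#define NAME_LEN 32	/* hypothetical; ice uses DEVLINK_RATE_NAME_MAX_LEN */

static bool id_used[MAX_IDS];

/* Hand out the lowest free ID; -EBUSY when full, as xa_alloc() reports. */
static int id_alloc(unsigned int *id)
{
	for (unsigned int i = 0; i < MAX_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			*id = i;
			return 0;
		}
	}
	return -EBUSY;
}

/* Release an ID so it can be reused (cf. xa_erase() in ice_free_sched_node()). */
static void id_free(unsigned int id)
{
	if (id < MAX_IDS)
		id_used[id] = false;
}

/* Build the devlink-rate node name from the unique ID. */
static void node_name(char *buf, size_t len, unsigned int id)
{
	snprintf(buf, len, "node_%u", id);
}
```

Because freed IDs are reused, a node name such as `node_0` identifies a node uniquely only among the nodes that currently exist.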

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  4 +-
 drivers/net/ethernet/intel/ice/ice_common.c   |  3 +
 drivers/net/ethernet/intel/ice/ice_sched.c    | 79 +++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_sched.h    | 25 ++++++
 drivers/net/ethernet/intel/ice/ice_type.h     |  7 ++
 5 files changed, 111 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 1bdc70aa979d..958c1e435232 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -848,9 +848,9 @@ struct ice_aqc_txsched_elem {
 	u8 generic;
 #define ICE_AQC_ELEM_GENERIC_MODE_M		0x1
 #define ICE_AQC_ELEM_GENERIC_PRIO_S		0x1
-#define ICE_AQC_ELEM_GENERIC_PRIO_M	(0x7 << ICE_AQC_ELEM_GENERIC_PRIO_S)
+#define ICE_AQC_ELEM_GENERIC_PRIO_M	        GENMASK(3, 1)
 #define ICE_AQC_ELEM_GENERIC_SP_S		0x4
-#define ICE_AQC_ELEM_GENERIC_SP_M	(0x1 << ICE_AQC_ELEM_GENERIC_SP_S)
+#define ICE_AQC_ELEM_GENERIC_SP_M	        GENMASK(4, 4)
 #define ICE_AQC_ELEM_GENERIC_ADJUST_VAL_S	0x5
 #define ICE_AQC_ELEM_GENERIC_ADJUST_VAL_M	\
 	(0x3 << ICE_AQC_ELEM_GENERIC_ADJUST_VAL_S)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 039342a0ed15..e2e661010176 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1105,6 +1105,9 @@ int ice_init_hw(struct ice_hw *hw)
 
 	hw->evb_veb = true;
 
+	/* init xarray for identifying scheduling nodes uniquely */
+	xa_init_flags(&hw->port_info->sched_node_ids, XA_FLAGS_ALLOC);
+
 	/* Query the allocated resources for Tx scheduler */
 	status = ice_sched_query_res_alloc(hw);
 	if (status) {
diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index 118595763bba..782e46488c1e 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2018, Intel Corporation. */
 
+#include <net/devlink.h>
 #include "ice_sched.h"
 
 /**
@@ -355,6 +356,9 @@ void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node)
 	/* leaf nodes have no children */
 	if (node->children)
 		devm_kfree(ice_hw_to_dev(hw), node->children);
+
+	kfree(node->name);
+	xa_erase(&pi->sched_node_ids, node->id);
 	devm_kfree(ice_hw_to_dev(hw), node);
 }
 
@@ -875,7 +879,7 @@ void ice_sched_cleanup_all(struct ice_hw *hw)
  *
  * This function add nodes to HW as well as to SW DB for a given layer
  */
-static int
+int
 ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
 		    struct ice_sched_node *parent, u8 layer, u16 num_nodes,
 		    u16 *num_nodes_added, u32 *first_node_teid)
@@ -940,6 +944,22 @@ ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
 
 		new_node->sibling = NULL;
 		new_node->tc_num = tc_node->tc_num;
+		new_node->tx_weight = ICE_SCHED_DFLT_BW_WT;
+		new_node->tx_share = ICE_SCHED_DFLT_BW;
+		new_node->tx_max = ICE_SCHED_DFLT_BW;
+		new_node->name = kzalloc(DEVLINK_RATE_NAME_MAX_LEN, GFP_KERNEL);
+		if (!new_node->name)
+			return -ENOMEM;
+
+		status = xa_alloc(&pi->sched_node_ids, &new_node->id, NULL, XA_LIMIT(0, UINT_MAX),
+				  GFP_KERNEL);
+		if (status) {
+			ice_debug(hw, ICE_DBG_SCHED, "xa_alloc failed for sched node status =%d\n",
+				  status);
+			break;
+		}
+
+		snprintf(new_node->name, DEVLINK_RATE_NAME_MAX_LEN, "node_%u", new_node->id);
 
 		/* add it to previous node sibling pointer */
 		/* Note: siblings are not linked across branches */
@@ -2154,7 +2174,7 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct ice_sched_node *node,
  * This function removes the child from the old parent and adds it to a new
  * parent
  */
-static void
+void
 ice_sched_update_parent(struct ice_sched_node *new_parent,
 			struct ice_sched_node *node)
 {
@@ -2188,7 +2208,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent,
  *
  * This function move the child nodes to a given parent.
  */
-static int
+int
 ice_sched_move_nodes(struct ice_port_info *pi, struct ice_sched_node *parent,
 		     u16 num_items, u32 *list)
 {
@@ -3560,7 +3580,7 @@ ice_sched_set_eir_srl_excl(struct ice_port_info *pi,
  * node's RL profile ID of type CIR, EIR, or SRL, and removes old profile
  * ID from local database. The caller needs to hold scheduler lock.
  */
-static int
+int
 ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node,
 		      enum ice_rl_type rl_type, u32 bw, u8 layer_num)
 {
@@ -3596,6 +3616,55 @@ ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node,
 				       ICE_AQC_RL_PROFILE_TYPE_M, old_id);
 }
 
+/**
+ * ice_sched_set_node_priority - set node's priority
+ * @pi: port information structure
+ * @node: tree node
+ * @priority: number 0-7 representing priority among siblings
+ *
+ * This function sets the priority of a node among its siblings.
+ */
+int
+ice_sched_set_node_priority(struct ice_port_info *pi, struct ice_sched_node *node,
+			    u16 priority)
+{
+	struct ice_aqc_txsched_elem_data buf;
+	struct ice_aqc_txsched_elem *data;
+
+	buf = node->info;
+	data = &buf.data;
+
+	data->valid_sections |= ICE_AQC_ELEM_VALID_GENERIC;
+	data->generic |= FIELD_PREP(ICE_AQC_ELEM_GENERIC_MODE_M, 0x1);
+	data->generic |= FIELD_PREP(ICE_AQC_ELEM_GENERIC_PRIO_M, priority);
+
+	return ice_sched_update_elem(pi->hw, node, &buf);
+}
+
+/**
+ * ice_sched_set_node_weight - set node's weight
+ * @pi: port information structure
+ * @node: tree node
+ * @weight: number 1-200 representing weight for WFQ
+ *
+ * This function sets weight of the node for WFQ algorithm.
+ */
+int
+ice_sched_set_node_weight(struct ice_port_info *pi, struct ice_sched_node *node, u16 weight)
+{
+	struct ice_aqc_txsched_elem_data buf;
+	struct ice_aqc_txsched_elem *data;
+
+	buf = node->info;
+	data = &buf.data;
+
+	data->valid_sections = ICE_AQC_ELEM_VALID_CIR | ICE_AQC_ELEM_VALID_EIR;
+	data->cir_bw.bw_alloc = cpu_to_le16(weight);
+	data->eir_bw.bw_alloc = cpu_to_le16(weight);
+
+	return ice_sched_update_elem(pi->hw, node, &buf);
+}
+
 /**
  * ice_sched_set_node_bw_lmt - set node's BW limit
  * @pi: port information structure
@@ -3606,7 +3675,7 @@ ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node,
  * It updates node's BW limit parameters like BW RL profile ID of type CIR,
  * EIR, or SRL. The caller needs to hold scheduler lock.
  */
-static int
+int
 ice_sched_set_node_bw_lmt(struct ice_port_info *pi, struct ice_sched_node *node,
 			  enum ice_rl_type rl_type, u32 bw)
 {
diff --git a/drivers/net/ethernet/intel/ice/ice_sched.h b/drivers/net/ethernet/intel/ice/ice_sched.h
index 4f91577fed56..581148888b6f 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.h
+++ b/drivers/net/ethernet/intel/ice/ice_sched.h
@@ -69,6 +69,28 @@ int
 ice_aq_query_sched_elems(struct ice_hw *hw, u16 elems_req,
 			 struct ice_aqc_txsched_elem_data *buf, u16 buf_size,
 			 u16 *elems_ret, struct ice_sq_cd *cd);
+
+int
+ice_sched_set_node_bw_lmt(struct ice_port_info *pi, struct ice_sched_node *node,
+			  enum ice_rl_type rl_type, u32 bw);
+
+int
+ice_sched_set_node_bw(struct ice_port_info *pi, struct ice_sched_node *node,
+		      enum ice_rl_type rl_type, u32 bw, u8 layer_num);
+
+int
+ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
+		    struct ice_sched_node *parent, u8 layer, u16 num_nodes,
+		    u16 *num_nodes_added, u32 *first_node_teid);
+
+int
+ice_sched_move_nodes(struct ice_port_info *pi, struct ice_sched_node *parent,
+		     u16 num_items, u32 *list);
+
+int ice_sched_set_node_priority(struct ice_port_info *pi, struct ice_sched_node *node,
+				u16 priority);
+int ice_sched_set_node_weight(struct ice_port_info *pi, struct ice_sched_node *node, u16 weight);
+
 int ice_sched_init_port(struct ice_port_info *pi);
 int ice_sched_query_res_alloc(struct ice_hw *hw);
 void ice_sched_get_psm_clk_freq(struct ice_hw *hw);
@@ -82,6 +104,9 @@ ice_sched_find_node_by_teid(struct ice_sched_node *start_node, u32 teid);
 int
 ice_sched_add_node(struct ice_port_info *pi, u8 layer,
 		   struct ice_aqc_txsched_elem_data *info);
+void
+ice_sched_update_parent(struct ice_sched_node *new_parent,
+			struct ice_sched_node *node);
 void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node);
 struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc);
 struct ice_sched_node *
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index e1abfcee96dc..3b6d317371cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -524,8 +524,14 @@ struct ice_sched_node {
 	struct ice_sched_node *sibling; /* next sibling in the same layer */
 	struct ice_sched_node **children;
 	struct ice_aqc_txsched_elem_data info;
+	char *name;
+	u64 tx_max;
+	u64 tx_share;
 	u32 agg_id;			/* aggregator group ID */
+	u32 id;
 	u16 vsi_handle;
+	u16 tx_priority;
+	u16 tx_weight;
 	u8 in_use;			/* suspended or in use */
 	u8 tx_sched_layer;		/* Logical Layer (1-9) */
 	u8 num_children;
@@ -706,6 +712,7 @@ struct ice_port_info {
 	/* List contain profile ID(s) and other params per layer */
 	struct list_head rl_prof_list[ICE_AQC_TOPO_MAX_LEVEL_NUM];
 	struct ice_qos_cfg qos_cfg;
+	struct xarray sched_node_ids;
 	u8 is_vf:1;
 };
 
-- 
2.37.2



* [PATCH net-next v8 8/9] ice: Implement devlink-rate API
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (6 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 7/9] ice: Introduce new parameters in ice_sched_node Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  2022-10-28 10:51 ` [PATCH net-next v8 9/9] ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler Michal Wilczynski
  8 siblings, 0 replies; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC (permalink / raw)
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

There is a need to support modification of the Tx scheduler tree in the
ice driver. This will allow the user to control the Tx settings of each
node in the internal hierarchy of nodes. As a result, the user will be
able to use Hierarchical QoS implemented entirely in the hardware.

This patch implements the devlink-rate API. It also exports the initial
default hierarchy. This is mostly dictated by the fact that the tree
can't be removed entirely; all we can do is enable the user to modify
it. For example, the root node should never be removed, and nodes that
have children are off-limits.
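The deletion rule stated above can be sketched as a simple predicate. `struct tnode` and `can_delete()` are hypothetical; the driver applies the same policy to `struct ice_sched_node` in `ice_devlink_rate_node_del()`, where both cases return `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical minimal tree node. */
struct tnode {
	struct tnode *parent;
	int num_children;
};

/*
 * The root (a node without a parent) must never be removed, and a node that
 * still has children is off-limits because removing it would orphan its
 * subtree.
 */
static int can_delete(const struct tnode *n)
{
	if (!n || !n->parent)
		return -EINVAL;
	if (n->num_children > 0)
		return -EINVAL;
	return 0;
}
```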

Example initial tree with 2 VFs:

[root@fedora ~]# devlink port function rate show

pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27

Let me visualize part of the tree:

                    +---------+
                    |  node_0 |
                    +---------+
                         |
                    +----v----+
                    | node_26 |
                    +----+----+
                         |
                    +----v----+
                    | node_27 |
                    +----+----+
                         |
                |-----------------|
           +----v----+       +----v----+
           |   VF 1  |       |   VF 2  |
           +----+----+       +----+----+
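Each entry in the `show` output names only its parent, so the branch drawn above is recovered by following parent links from a leaf up to the root. A userspace sketch using a subset of the listing; `struct rate_obj`, `find()` and `path_to_root()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Subset of the "devlink port function rate show" output: name -> parent. */
struct rate_obj {
	const char *name;
	const char *parent;	/* NULL for the root */
};

static const struct rate_obj objs[] = {
	{ "node_0",  NULL },
	{ "node_26", "node_0" },
	{ "node_27", "node_26" },
	{ "1",       "node_27" },
	{ "2",       "node_27" },
};

static const struct rate_obj *find(const char *name)
{
	for (size_t i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
		if (strcmp(objs[i].name, name) == 0)
			return &objs[i];
	return NULL;
}

/* Write "leaf -> ... -> root" into buf; return the hop count, or -1. */
static int path_to_root(const char *name, char *buf, size_t len)
{
	const struct rate_obj *o = find(name);
	size_t off = 0;
	int hops = 0;

	buf[0] = '\0';
	while (o) {
		int n = snprintf(buf + off, len - off, "%s%s",
				 hops ? " -> " : "", o->name);
		if (n < 0 || (size_t)n >= len - off)
			return -1;	/* buffer too small */
		off += (size_t)n;
		if (!o->parent)
			return hops;	/* reached the root */
		o = find(o->parent);
		hops++;
	}
	return -1;			/* unknown name or dangling parent */
}
```

For VF 1 this yields `1 -> node_27 -> node_26 -> node_0`, matching the drawing.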

At this point there are a couple of things that can be done.
For example, we could assign parameters only to the VFs.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
                 tx_max 5Gbps

This caps the bandwidth of VF 1 at 5Gbps.
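The arithmetic behind such a cap can be sketched as follows, assuming devlink carries rate values in bytes per second and the Tx scheduler limit is programmed in Kbps. Both are assumptions for this sketch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "5Gbps" on the command line is a rate in bits per second. */
static uint64_t bits_to_bytes_per_sec(uint64_t bits_per_sec)
{
	return bits_per_sec / 8;
}

/* 1 Kbit/s = 1000 bit/s = 125 byte/s. */
static uint64_t bytes_per_sec_to_kbps(uint64_t bytes_per_sec)
{
	return bytes_per_sec / 125;
}
```

Under these assumptions, 5Gbps is 625,000,000 bytes/s, which corresponds to a limit of 5,000,000 Kbps.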

But suppose you want to create a completely new branch.
This can be done as follows:

[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
                 pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
                 pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.
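Reassigning VF 1 amounts to detaching it from its old parent's child array and appending it to the new parent, which is the role `ice_sched_update_parent()` plays in the driver. A simplified userspace sketch with a hypothetical fixed-size child array; it does not check for cycles (for example, moving a node under its own descendant).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8	/* hypothetical fan-out limit */

struct snode {
	struct snode *parent;
	struct snode *children[MAX_CHILDREN];
	int num_children;
};

/* Remove n from its parent's child array, keeping the array compact. */
static void detach(struct snode *n)
{
	struct snode *p = n->parent;

	if (!p)
		return;
	for (int i = 0; i < p->num_children; i++) {
		if (p->children[i] == n) {
			for (int j = i; j < p->num_children - 1; j++)
				p->children[j] = p->children[j + 1];
			p->num_children--;
			break;
		}
	}
	n->parent = NULL;
}

/* Move n under new_parent; return -1 if the new parent has no free slot. */
static int reparent(struct snode *n, struct snode *new_parent)
{
	if (new_parent->num_children >= MAX_CHILDREN)
		return -1;
	detach(n);
	new_parent->children[new_parent->num_children++] = n;
	n->parent = new_parent;
	return 0;
}
```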

The following parameters are supported on each node: tx_max, tx_share,
tx_priority and tx_weight.
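tx_priority is stored in bits 3:1 of the scheduler element's `generic` byte (`ICE_AQC_ELEM_GENERIC_PRIO_M`, i.e. `GENMASK(3, 1)` in patch 7), and bit 0 is the mode bit that `ice_sched_set_node_priority()` sets. The userspace sketch below uses hypothetical helper names with plain masks in place of `FIELD_PREP()`. It does a read-modify-write that clears the field first, because OR-ing a new value into an already populated field could leave stale bits set and make it impossible to lower a priority.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout from the ICE_AQC_ELEM_GENERIC_* definitions. */
#define GENERIC_MODE_M	0x01u				/* bit 0 */
#define GENERIC_PRIO_S	1
#define GENERIC_PRIO_M	(0x7u << GENERIC_PRIO_S)	/* bits 3:1 */

/* Replace the priority field, preserving unrelated bits such as SP (bit 4). */
static uint8_t set_priority(uint8_t generic, uint8_t prio)
{
	generic &= (uint8_t)~GENERIC_PRIO_M;	/* drop the old priority */
	generic |= GENERIC_MODE_M;
	generic |= (uint8_t)((prio << GENERIC_PRIO_S) & GENERIC_PRIO_M);
	return generic;
}

static uint8_t get_priority(uint8_t generic)
{
	return (uint8_t)((generic & GENERIC_PRIO_M) >> GENERIC_PRIO_S);
}
```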

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_devlink.c | 406 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_devlink.h |   2 +
 drivers/net/ethernet/intel/ice/ice_repr.c    |  13 +
 3 files changed, 421 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index e6ec20079ced..9742ad75b72a 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -713,6 +713,394 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_traverse_tx_tree - traverse Tx scheduler tree
+ * @devlink: devlink struct
+ * @node: current node, used for recursion
+ * @tc_node: tc_node struct, that is treated as a root
+ * @pf: pf struct
+ *
+ * This function traverses Tx scheduler tree and exports
+ * entire structure to the devlink-rate.
+ */
+static void ice_traverse_tx_tree(struct devlink *devlink, struct ice_sched_node *node,
+				 struct ice_sched_node *tc_node, struct ice_pf *pf)
+{
+	struct ice_vf *vf;
+	int i;
+
+	devl_lock(devlink);
+
+	if (node->parent == tc_node) {
+		/* create root node */
+		devl_rate_node_create(devlink, node, node->name, NULL);
+	} else if (node->vsi_handle &&
+		   pf->vsi[node->vsi_handle]->vf) {
+		vf = pf->vsi[node->vsi_handle]->vf;
+		if (!vf->devlink_port.devlink_rate)
+			devl_rate_leaf_create(&vf->devlink_port, node, node->parent->name);
+	} else if (node->info.data.elem_type != ICE_AQC_ELEM_TYPE_LEAF &&
+		   node->parent->name) {
+		devl_rate_node_create(devlink, node, node->name, node->parent->name);
+	}
+
+	devl_unlock(devlink);
+
+	for (i = 0; i < node->num_children; i++)
+		ice_traverse_tx_tree(devlink, node->children[i], tc_node, pf);
+}
+
+/**
+ * ice_devlink_rate_init_tx_topology - export Tx scheduler tree to devlink rate
+ * @devlink: devlink struct
+ * @vsi: main vsi struct
+ *
+ * This function finds the root node, then calls ice_traverse_tx_tree(), which
+ * traverses the tree and exports its contents to devlink rate.
+ */
+int ice_devlink_rate_init_tx_topology(struct devlink *devlink, struct ice_vsi *vsi)
+{
+	struct ice_port_info *pi = vsi->port_info;
+	struct ice_sched_node *tc_node;
+	struct ice_pf *pf = vsi->back;
+	int i;
+
+	tc_node = pi->root->children[0];
+	mutex_lock(&pi->sched_lock);
+	for (i = 0; i < tc_node->num_children; i++)
+		ice_traverse_tx_tree(devlink, tc_node->children[i], tc_node, pf);
+	mutex_unlock(&pi->sched_lock);
+
+	return 0;
+}
+
+/**
+ * ice_set_object_tx_share - sets node scheduling parameter
+ * @pi: port information structure
+ * @node: node struct instance
+ * @extack: extended netdev ack structure
+ *
+ * This function sets ICE_MIN_BW scheduling BW limit.
+ */
+static int ice_set_object_tx_share(struct ice_port_info *pi, struct ice_sched_node *node,
+				   struct netlink_ext_ack *extack)
+{
+	int status;
+
+	mutex_lock(&pi->sched_lock);
+	status = ice_sched_set_node_bw_lmt(pi, node, ICE_MIN_BW, node->tx_share);
+	mutex_unlock(&pi->sched_lock);
+
+	if (status)
+		NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_share");
+
+	return status;
+}
+
+/**
+ * ice_set_object_tx_max - sets node scheduling parameter
+ * @pi: port information structure
+ * @node: node struct instance
+ * @extack: extended netdev ack structure
+ *
+ * This function sets ICE_MAX_BW scheduling BW limit.
+ */
+static int ice_set_object_tx_max(struct ice_port_info *pi, struct ice_sched_node *node,
+				 struct netlink_ext_ack *extack)
+{
+	int status;
+
+	mutex_lock(&pi->sched_lock);
+	status = ice_sched_set_node_bw_lmt(pi, node, ICE_MAX_BW, node->tx_max);
+	mutex_unlock(&pi->sched_lock);
+
+	if (status)
+		NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_max");
+
+	return status;
+}
+
+/**
+ * ice_set_object_tx_priority - sets node scheduling parameter
+ * @pi: port information structure
+ * @node: node struct instance
+ * @extack: extended netdev ack structure
+ *
+ * This function sets priority of node among siblings.
+ */
+static int ice_set_object_tx_priority(struct ice_port_info *pi, struct ice_sched_node *node,
+				      struct netlink_ext_ack *extack)
+{
+	int status;
+
+	if (node->tx_priority >= 8) {
+		NL_SET_ERR_MSG_MOD(extack, "Priority should be less than 8");
+		return -EINVAL;
+	}
+
+	mutex_lock(&pi->sched_lock);
+	status = ice_sched_set_node_priority(pi, node, node->tx_priority);
+	mutex_unlock(&pi->sched_lock);
+
+	if (status)
+		NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_priority");
+
+	return status;
+}
+
+/**
+ * ice_set_object_tx_weight - sets node scheduling parameter
+ * @pi: port information structure
+ * @node: node struct instance
+ * @extack: extended netdev ack structure
+ *
+ * This function sets node weight for WFQ algorithm.
+ */
+static int ice_set_object_tx_weight(struct ice_port_info *pi, struct ice_sched_node *node,
+				    struct netlink_ext_ack *extack)
+{
+	int status;
+
+	if (node->tx_weight > 200 || node->tx_weight < 1) {
+		NL_SET_ERR_MSG_MOD(extack, "Weight must be between 1 and 200");
+		return -EINVAL;
+	}
+
+	mutex_lock(&pi->sched_lock);
+	status = ice_sched_set_node_weight(pi, node, node->tx_weight);
+	mutex_unlock(&pi->sched_lock);
+
+	if (status)
+		NL_SET_ERR_MSG_MOD(extack, "Can't set scheduling node tx_weight");
+
+	return status;
+}
+
+/**
+ * ice_get_pi_from_dev_rate - get port info from devlink_rate
+ * @rate_node: devlink_rate struct instance
+ *
+ * This function returns corresponding port_info struct of devlink_rate
+ */
+static struct ice_port_info *ice_get_pi_from_dev_rate(struct devlink_rate *rate_node)
+{
+	struct ice_pf *pf = devlink_priv(rate_node->devlink);
+
+	return ice_get_main_vsi(pf)->port_info;
+}
+
+static int ice_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv,
+				     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static int ice_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
+				     struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node, *tc_node;
+	struct ice_port_info *pi;
+
+	pi = ice_get_pi_from_dev_rate(rate_node);
+	tc_node = pi->root->children[0];
+	node = priv;
+
+	if (!rate_node->parent || !node || tc_node == node || !extack)
+		return 0;
+
+	/* can't allow to delete a node with children */
+	if (node->num_children)
+		return -EINVAL;
+
+	mutex_lock(&pi->sched_lock);
+	ice_free_sched_node(pi, node);
+	mutex_unlock(&pi->sched_lock);
+
+	return 0;
+}
+
+static int ice_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv,
+					    u64 tx_max, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_max = div_u64(tx_max, 10);
+
+	return ice_set_object_tx_max(ice_get_pi_from_dev_rate(rate_leaf), node, extack);
+}
+
+static int ice_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void *priv,
+					      u64 tx_share, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_share = div_u64(tx_share, 10);
+
+	return ice_set_object_tx_share(ice_get_pi_from_dev_rate(rate_leaf), node, extack);
+}
+
+static int ice_devlink_rate_leaf_tx_priority_set(struct devlink_rate *rate_leaf, void *priv,
+						 u64 tx_priority, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_priority = tx_priority;
+
+	return ice_set_object_tx_priority(ice_get_pi_from_dev_rate(rate_leaf), node, extack);
+}
+
+static int ice_devlink_rate_leaf_tx_weight_set(struct devlink_rate *rate_leaf, void *priv,
+					       u64 tx_weight, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_weight = tx_weight;
+
+	return ice_set_object_tx_weight(ice_get_pi_from_dev_rate(rate_leaf), node, extack);
+}
+
+static int ice_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv,
+					    u64 tx_max, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_max = div_u64(tx_max, 10);
+
+	return ice_set_object_tx_max(ice_get_pi_from_dev_rate(rate_node), node, extack);
+}
+
+static int ice_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
+					      u64 tx_share, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_share = div_u64(tx_share, 10);
+
+	return ice_set_object_tx_share(ice_get_pi_from_dev_rate(rate_node), node, extack);
+}
+
+static int ice_devlink_rate_node_tx_priority_set(struct devlink_rate *rate_node, void *priv,
+						 u64 tx_priority, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_priority = tx_priority;
+
+	return ice_set_object_tx_priority(ice_get_pi_from_dev_rate(rate_node), node, extack);
+}
+
+static int ice_devlink_rate_node_tx_weight_set(struct devlink_rate *rate_node, void *priv,
+					       u64 tx_weight, struct netlink_ext_ack *extack)
+{
+	struct ice_sched_node *node = priv;
+
+	if (!node)
+		return 0;
+
+	node->tx_weight = tx_weight;
+
+	return ice_set_object_tx_weight(ice_get_pi_from_dev_rate(rate_node), node, extack);
+}
+
+static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
+				  struct devlink_rate *parent,
+				  void **priv, void *parent_priv,
+				  struct netlink_ext_ack *extack)
+{
+	struct ice_port_info *pi = ice_get_pi_from_dev_rate(devlink_rate);
+	struct ice_sched_node *tc_node, *node, *parent_node;
+	u16 num_nodes_added;
+	u32 first_node_teid;
+	u32 node_teid;
+	int status;
+
+	tc_node = pi->root->children[0];
+	node = *priv;
+
+	if (!extack)
+		return 0;
+
+	if (!parent) {
+		if (!node || tc_node == node || node->num_children)
+			return -EINVAL;
+
+		mutex_lock(&pi->sched_lock);
+		ice_free_sched_node(pi, node);
+		mutex_unlock(&pi->sched_lock);
+
+		return 0;
+	}
+
+	parent_node = parent_priv;
+
+	/* if the node doesn't exist, create it */
+	if (!node) {
+		mutex_lock(&pi->sched_lock);
+
+		status = ice_sched_add_elems(pi, tc_node, parent_node,
+					     parent_node->tx_sched_layer + 1,
+					     1, &num_nodes_added, &first_node_teid);
+
+		mutex_unlock(&pi->sched_lock);
+
+		if (status) {
+			NL_SET_ERR_MSG_MOD(extack, "Can't add a new node");
+			return status;
+		}
+
+		node = ice_sched_find_node_by_teid(parent_node, first_node_teid);
+		*priv = node;
+
+		if (devlink_rate->tx_share) {
+			node->tx_share = devlink_rate->tx_share;
+			ice_set_object_tx_share(pi, node, extack);
+		}
+		if (devlink_rate->tx_max) {
+			node->tx_max = devlink_rate->tx_max;
+			ice_set_object_tx_max(pi, node, extack);
+		}
+		if (devlink_rate->tx_priority) {
+			node->tx_priority = devlink_rate->tx_priority;
+			ice_set_object_tx_priority(pi, node, extack);
+		}
+		if (devlink_rate->tx_weight) {
+			node->tx_weight = devlink_rate->tx_weight;
+			ice_set_object_tx_weight(pi, node, extack);
+		}
+	} else {
+		node_teid = le32_to_cpu(node->info.node_teid);
+		mutex_lock(&pi->sched_lock);
+		status = ice_sched_move_nodes(pi, parent_node, 1, &node_teid);
+		mutex_unlock(&pi->sched_lock);
+
+		if (status)
+			NL_SET_ERR_MSG_MOD(extack, "Can't move existing node to a new parent");
+	}
+
+	return status;
+}
+
 static const struct devlink_ops ice_devlink_ops = {
 	.supported_flash_update_params = DEVLINK_SUPPORT_FLASH_UPDATE_OVERWRITE_MASK,
 	.reload_actions = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
@@ -725,6 +1113,22 @@ static const struct devlink_ops ice_devlink_ops = {
 	.eswitch_mode_set = ice_eswitch_mode_set,
 	.info_get = ice_devlink_info_get,
 	.flash_update = ice_devlink_flash_update,
+
+	.rate_node_new = ice_devlink_rate_node_new,
+	.rate_node_del = ice_devlink_rate_node_del,
+
+	.rate_leaf_tx_max_set = ice_devlink_rate_leaf_tx_max_set,
+	.rate_leaf_tx_share_set = ice_devlink_rate_leaf_tx_share_set,
+	.rate_leaf_tx_priority_set = ice_devlink_rate_leaf_tx_priority_set,
+	.rate_leaf_tx_weight_set = ice_devlink_rate_leaf_tx_weight_set,
+
+	.rate_node_tx_max_set = ice_devlink_rate_node_tx_max_set,
+	.rate_node_tx_share_set = ice_devlink_rate_node_tx_share_set,
+	.rate_node_tx_priority_set = ice_devlink_rate_node_tx_priority_set,
+	.rate_node_tx_weight_set = ice_devlink_rate_node_tx_weight_set,
+
+	.rate_leaf_parent_set = ice_devlink_set_parent,
+	.rate_node_parent_set = ice_devlink_set_parent,
 };
 
 static int
@@ -1098,6 +1502,8 @@ void ice_devlink_destroy_vf_port(struct ice_vf *vf)
 
 	devlink_port = &vf->devlink_port;
 
+	devl_rate_leaf_destroy(devlink_port);
+
 	devlink_port_type_clear(devlink_port);
 	devlink_port_unregister(devlink_port);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index fe006d9946f8..8bfed9ee2c4c 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -18,4 +18,6 @@ void ice_devlink_destroy_vf_port(struct ice_vf *vf);
 void ice_devlink_init_regions(struct ice_pf *pf);
 void ice_devlink_destroy_regions(struct ice_pf *pf);
 
+int ice_devlink_rate_init_tx_topology(struct devlink *devlink, struct ice_vsi *vsi);
+
 #endif /* _ICE_DEVLINK_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_repr.c b/drivers/net/ethernet/intel/ice/ice_repr.c
index bd31748aae1b..837c353f7dbb 100644
--- a/drivers/net/ethernet/intel/ice/ice_repr.c
+++ b/drivers/net/ethernet/intel/ice/ice_repr.c
@@ -399,6 +399,7 @@ static void ice_repr_rem(struct ice_vf *vf)
  */
 void ice_repr_rem_from_all_vfs(struct ice_pf *pf)
 {
+	struct devlink *devlink;
 	struct ice_vf *vf;
 	unsigned int bkt;
 
@@ -406,6 +407,14 @@ void ice_repr_rem_from_all_vfs(struct ice_pf *pf)
 
 	ice_for_each_vf(pf, bkt, vf)
 		ice_repr_rem(vf);
+
+	/* since all port representors are destroyed, there is
+	 * no point in keeping the nodes
+	 */
+	devlink = priv_to_devlink(pf);
+	devl_lock(devlink);
+	devl_rate_nodes_destroy(devlink);
+	devl_unlock(devlink);
 }
 
 /**
@@ -414,6 +423,7 @@ void ice_repr_rem_from_all_vfs(struct ice_pf *pf)
  */
 int ice_repr_add_for_all_vfs(struct ice_pf *pf)
 {
+	struct devlink *devlink;
 	struct ice_vf *vf;
 	unsigned int bkt;
 	int err;
@@ -426,6 +436,9 @@ int ice_repr_add_for_all_vfs(struct ice_pf *pf)
 			goto err;
 	}
 
+	devlink = priv_to_devlink(pf);
+	ice_devlink_rate_init_tx_topology(devlink, ice_get_main_vsi(pf));
+
 	return 0;
 
 err:
-- 
2.37.2



* [PATCH net-next v8 9/9] ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler
  2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
                   ` (7 preceding siblings ...)
  2022-10-28 10:51 ` [PATCH net-next v8 8/9] ice: Implement devlink-rate API Michal Wilczynski
@ 2022-10-28 10:51 ` Michal Wilczynski
  8 siblings, 0 replies; 18+ messages in thread
From: Michal Wilczynski @ 2022-10-28 10:51 UTC (permalink / raw)
  To: netdev
  Cc: alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx, jiri,
	Michal Wilczynski

ADQ, DCB and RDMA might interfere with Custom Tx Scheduler changes that the
user might introduce using the devlink-rate API.

Check whether ADQ, DCB or RDMA is active when the user tries to change any
setting in the exported Tx scheduler tree. If any of them is active, block
the user from doing so and log an appropriate message.
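The check-then-latch behaviour described above can be sketched in plain C. This is a hedged userspace illustration, not driver code: `struct pf_state`, `enable_custom_tx()` and `can_enable_dcb()` are hypothetical stand-ins for the PF state, `ice_enable_custom_tx()` and the DCB/RDMA-side checks in this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the state consulted by ice_enable_custom_tx();
 * the field names are illustrative only.
 */
struct pf_state {
	bool adq_active;
	bool dcb_active;
	bool rdma_plugged;
	bool custom_tx_enabled;
};

/* Latch the custom Tx feature on first use, refusing if a conflicting
 * feature is already active. Once latched, it stays enabled.
 */
static bool enable_custom_tx(struct pf_state *pf)
{
	if (pf->custom_tx_enabled)
		return true;

	if (pf->adq_active) {
		fprintf(stderr, "ADQ active, can't modify Tx scheduler tree\n");
		return false;
	}
	if (pf->dcb_active) {
		fprintf(stderr, "DCB active, can't modify Tx scheduler tree\n");
		return false;
	}
	if (pf->rdma_plugged) {
		fprintf(stderr, "RDMA active, can't modify Tx scheduler tree\n");
		return false;
	}

	pf->custom_tx_enabled = true;
	return true;
}

/* The reverse direction: DCB (and likewise RDMA) refuses to start while
 * the custom Tx latch is set; the patch returns -EBUSY in that case.
 */
static bool can_enable_dcb(const struct pf_state *pf)
{
	return !pf->custom_tx_enabled;
}
```

The latch is one-way in this sketch, mirroring the patch: nothing clears `custom_tx_enabled`, so the exclusion holds for as long as the flag stays set.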

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c |  4 ++
 drivers/net/ethernet/intel/ice/ice_devlink.c | 71 ++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_idc.c     |  5 ++
 drivers/net/ethernet/intel/ice/ice_type.h    |  1 +
 4 files changed, 81 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index add90e75f05c..8d7fc76f49af 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -364,6 +364,10 @@ int ice_pf_dcb_cfg(struct ice_pf *pf, struct ice_dcbx_cfg *new_cfg, bool locked)
 	/* Enable DCB tagging only when more than one TC */
 	if (ice_dcb_get_num_tc(new_cfg) > 1) {
 		dev_dbg(dev, "DCB tagging enabled (num TC > 1)\n");
+		if (pf->hw.port_info->is_custom_tx_enabled) {
+			dev_err(dev, "Custom Tx scheduler feature enabled, can't configure DCB\n");
+			return -EBUSY;
+		}
 		set_bit(ICE_FLAG_DCB_ENA, pf->flags);
 	} else {
 		dev_dbg(dev, "DCB tagging disabled (num TC = 1)\n");
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index 9742ad75b72a..249778b44f3e 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -8,6 +8,7 @@
 #include "ice_devlink.h"
 #include "ice_eswitch.h"
 #include "ice_fw_update.h"
+#include "ice_dcb_lib.h"
 
 static int ice_active_port_option = -1;
 
@@ -713,6 +714,43 @@ ice_devlink_port_unsplit(struct devlink *devlink, struct devlink_port *port,
 	return ice_devlink_port_split(devlink, port, 1, extack);
 }
 
+/**
+ * ice_enable_custom_tx - try to enable custom Tx feature
+ * @pf: PF struct
+ *
+ * This function tries to enable the custom Tx feature. It's not possible
+ * to enable it if ADQ, DCB or RDMA is active.
+ */
+static bool ice_enable_custom_tx(struct ice_pf *pf)
+{
+	struct ice_port_info *pi = ice_get_main_vsi(pf)->port_info;
+	struct device *dev = ice_pf_to_dev(pf);
+
+	if (pi->is_custom_tx_enabled)
+		/* already enabled, return true */
+		return true;
+
+	if (ice_is_adq_active(pf)) {
+		dev_err(dev, "ADQ active, can't modify Tx scheduler tree\n");
+		return false;
+	}
+
+	if (ice_is_dcb_active(pf)) {
+		dev_err(dev, "DCB active, can't modify Tx scheduler tree\n");
+		return false;
+	}
+
+	/* check if the RDMA auxiliary device is plugged */
+	if (pf->adev) {
+		dev_err(dev, "RDMA active, can't modify Tx scheduler tree\n");
+		return false;
+	}
+
+	pi->is_custom_tx_enabled = true;
+
+	return true;
+}
+
 /**
  * ice_traverse_tx_tree - traverse Tx scheduler tree
  * @devlink: devlink struct
@@ -892,6 +930,9 @@ static struct ice_port_info *ice_get_pi_from_dev_rate(struct devlink_rate *rate_
 static int ice_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv,
 				     struct netlink_ext_ack *extack)
 {
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	return 0;
 }
 
@@ -905,6 +946,9 @@ static int ice_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
 	tc_node = pi->root->children[0];
 	node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	if (!rate_node->parent || !node || tc_node == node || !extack)
 		return 0;
 
@@ -924,6 +968,9 @@ static int ice_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -937,6 +984,9 @@ static int ice_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, vo
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -950,6 +1000,9 @@ static int ice_devlink_rate_leaf_tx_priority_set(struct devlink_rate *rate_leaf,
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -963,6 +1016,9 @@ static int ice_devlink_rate_leaf_tx_weight_set(struct devlink_rate *rate_leaf, v
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_leaf->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -976,6 +1032,9 @@ static int ice_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -989,6 +1048,9 @@ static int ice_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, vo
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -1002,6 +1064,9 @@ static int ice_devlink_rate_node_tx_priority_set(struct devlink_rate *rate_node,
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -1015,6 +1080,9 @@ static int ice_devlink_rate_node_tx_weight_set(struct devlink_rate *rate_node, v
 {
 	struct ice_sched_node *node = priv;
 
+	if (!ice_enable_custom_tx(devlink_priv(rate_node->devlink)))
+		return -EBUSY;
+
 	if (!node)
 		return 0;
 
@@ -1041,6 +1109,9 @@ static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
 	if (!extack)
 		return 0;
 
+	if (!ice_enable_custom_tx(devlink_priv(devlink_rate->devlink)))
+		return -EBUSY;
+
 	if (!parent) {
 		if (!node || tc_node == node || node->num_children)
 			return -EINVAL;
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 895c32bcc8b5..f702bd5272f2 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -273,6 +273,11 @@ int ice_plug_aux_dev(struct ice_pf *pf)
 	if (!ice_is_rdma_ena(pf))
 		return 0;
 
+	if (pf->hw.port_info->is_custom_tx_enabled) {
+		dev_err(ice_pf_to_dev(pf), "Custom Tx scheduler enabled, it's mutually exclusive with RDMA\n");
+		return -EBUSY;
+	}
+
 	iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
 	if (!iadev)
 		return -ENOMEM;
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 3b6d317371cd..05eb30f34871 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -714,6 +714,7 @@ struct ice_port_info {
 	struct ice_qos_cfg qos_cfg;
 	struct xarray sched_node_ids;
 	u8 is_vf:1;
+	u8 is_custom_tx_enabled:1;
 };
 
 struct ice_switch_info {
-- 
2.37.2



* Re: [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate
  2022-10-28 10:51 ` [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
@ 2022-10-31 10:13   ` Jiri Pirko
  2022-11-02 10:38     ` Wilczynski, Michal
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Pirko @ 2022-10-31 10:13 UTC (permalink / raw)
  To: Michal Wilczynski
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx

Fri, Oct 28, 2022 at 12:51:35PM CEST, michal.wilczynski@intel.com wrote:
>To fully utilize the QoS offload capabilities of the Intel 100G card, a
>new parameter 'tx_priority' needs to be introduced. This parameter allows

It is highly confusing to call this "parameter". Devlink parameters are
a totally different thing. This is just another netlink attribute for
devlink rate object.


>the use of a strict priority arbiter among siblings. This arbitration
>scheme attempts to schedule nodes based on their priority as long as the
>nodes remain within their bandwidth limit.
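A minimal sketch of the arbitration rule described above, under stated assumptions: `struct sibling` and `strict_priority_pick()` are hypothetical, a larger `tx_priority` is assumed to mean higher priority, and "within their bandwidth limit" is modelled as a simple per-interval byte budget. Real hardware arbiters are more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sibling descriptor; not a devlink or ice structure. */
struct sibling {
	const char *name;
	uint16_t tx_priority;	/* assumed: larger value = higher priority */
	uint64_t sent_bytes;	/* bytes sent in the current interval */
	uint64_t limit_bytes;	/* bandwidth budget for the interval */
	int has_data;		/* non-zero if the sibling has traffic queued */
};

/* Return the index of the sibling to schedule next, or -1 if none is
 * eligible. Only siblings with queued data that are still under their
 * budget compete; among those the highest priority wins and ties go to
 * the lowest index.
 */
static int strict_priority_pick(const struct sibling *s, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!s[i].has_data || s[i].sent_bytes >= s[i].limit_bytes)
			continue;
		if (best < 0 || s[i].tx_priority > s[best].tx_priority)
			best = (int)i;
	}
	return best;
}
```

A high-priority sibling that has exhausted its budget drops out of the competition, which is what lets lower-priority siblings make progress.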
>
>Introduce a new parameter in devlink-rate that will allow for
>configuration of strict priority.
>
>Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>---
> include/net/devlink.h        |  6 ++++++
> include/uapi/linux/devlink.h |  1 +
> net/core/devlink.c           | 29 +++++++++++++++++++++++++++++
> 3 files changed, 36 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index ba6b8b094943..9d2b0c3c4ad3 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -114,6 +114,8 @@ struct devlink_rate {
> 			refcount_t refcnt;
> 		};
> 	};
>+
>+	u16 tx_priority;
> };
> 
> struct devlink_port {
>@@ -1493,10 +1495,14 @@ struct devlink_ops {
> 				      u64 tx_share, struct netlink_ext_ack *extack);
> 	int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
> 				    u64 tx_max, struct netlink_ext_ack *extack);
>+	int (*rate_leaf_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
>+					 u64 tx_priority, struct netlink_ext_ack *extack);
> 	int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
> 				      u64 tx_share, struct netlink_ext_ack *extack);
> 	int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
> 				    u64 tx_max, struct netlink_ext_ack *extack);
>+	int (*rate_node_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
>+					 u64 tx_priority, struct netlink_ext_ack *extack);
> 	int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
> 			     struct netlink_ext_ack *extack);
> 	int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 2f24b53a87a5..b3df5bc45ba5 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -607,6 +607,7 @@ enum devlink_attr {
> 
> 	DEVLINK_ATTR_SELFTESTS,			/* nested */
> 
>+	DEVLINK_ATTR_RATE_TX_PRIORITY,		/* u16 */
> 	/* add new attributes above here, update the policy in devlink.c */
> 
> 	__DEVLINK_ATTR_MAX,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 89baa7c0938b..2586b1307cb4 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -1184,6 +1184,9 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
> 			      devlink_rate->tx_max, DEVLINK_ATTR_PAD))
> 		goto nla_put_failure;
> 
>+	if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TX_PRIORITY,
>+			devlink_rate->tx_priority))
>+		goto nla_put_failure;
> 	if (devlink_rate->parent)
> 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
> 				   devlink_rate->parent->name))
>@@ -1924,6 +1927,7 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
> {
> 	struct nlattr *nla_parent, **attrs = info->attrs;
> 	int err = -EOPNOTSUPP;
>+	u16 priority;
> 	u64 rate;
> 
> 	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
>@@ -1952,6 +1956,20 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
> 		devlink_rate->tx_max = rate;
> 	}
> 
>+	if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) {
>+		priority = nla_get_u16(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]);
>+		if (devlink_rate_is_leaf(devlink_rate))
>+			err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv,
>+							priority, info->extack);
>+		else if (devlink_rate_is_node(devlink_rate))
>+			err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv,
>+							priority, info->extack);
>+
>+		if (err)
>+			return err;
>+		devlink_rate->tx_priority = priority;
>+	}
>+
> 	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
> 	if (nla_parent) {
> 		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
>@@ -1983,6 +2001,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
> 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs");
> 			return false;
> 		}
>+		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) {
>+			NL_SET_ERR_MSG_MOD(info->extack,
>+					   "TX priority set isn't supported for the leafs");
>+			return false;
>+		}
> 	} else if (type == DEVLINK_RATE_TYPE_NODE) {
> 		if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
> 			NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes");
>@@ -1997,6 +2020,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
> 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes");
> 			return false;
> 		}
>+		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) {
>+			NL_SET_ERR_MSG_MOD(info->extack,
>+					   "TX priority set isn't supported for the nodes");
>+			return false;
>+		}
> 	} else {
> 		WARN(1, "Unknown type of rate object");
> 		return false;
>@@ -9172,6 +9200,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
> 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
> 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
> 	[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
>+	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U16 },

Why not u32?
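One relevant detail for the u16 vs u32 question: netlink pads every attribute to a 4-byte boundary behind a 4-byte header, so a u16 attribute takes the same space in a message as a u32 one. The sketch below uses local copies of the uapi `<linux/netlink.h>` alignment rules; `nla_total_size_sketch()` and `struct nlattr_sketch` are hypothetical names mirroring the kernel's `nla_total_size()` and `struct nlattr`.

```c
#include <stdint.h>

/* Local copy of the netlink attribute header layout (struct nlattr). */
struct nlattr_sketch {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Local copies of the uapi alignment macros. */
#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	((int)NLA_ALIGN(sizeof(struct nlattr_sketch)))

/* Bytes an attribute with the given payload occupies in a message. */
static int nla_total_size_sketch(int payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}
```

With these rules a u16 payload and a u32 payload both occupy 8 bytes, while a u64 payload occupies 12.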


> };
> 
> static const struct genl_small_ops devlink_nl_ops[] = {
>-- 
>2.37.2
>


* Re: [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver
  2022-10-28 10:51 ` [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver Michal Wilczynski
@ 2022-10-31 10:19   ` Jiri Pirko
  2022-11-04 14:34     ` Wilczynski, Michal
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Pirko @ 2022-10-31 10:19 UTC (permalink / raw)
  To: Michal Wilczynski
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx

Fri, Oct 28, 2022 at 12:51:37PM CEST, michal.wilczynski@intel.com wrote:
>Intel 100G card internal firmware hierarchy for Hierarchical QoS is very
>rigid and can't be easily removed. This requires the ability to export the
>default hierarchy to allow the user to modify it. Currently the driver is
>only able to create the 'leaf' nodes, which usually represent the vport.
>This is not enough for HQoS implemented in Intel hardware.
>
>Introduce new function devl_rate_node_create() that allows for creation
>of the devlink-rate nodes from the driver.
>
>Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>---
> include/net/devlink.h |  4 ++++
> net/core/devlink.c    | 49 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 53 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 929cb72ef412..9d0a424712fd 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -98,6 +98,8 @@ struct devlink_port_attrs {
> 	};
> };
> 
>+#define DEVLINK_RATE_NAME_MAX_LEN 30
>+
> struct devlink_rate {
> 	struct list_head list;
> 	enum devlink_rate_type type;
>@@ -1601,6 +1603,8 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
> 				   u32 controller, u16 pf, u32 sf,
> 				   bool external);
> int devl_rate_leaf_create(struct devlink_port *port, void *priv);
>+int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
>+			  char *parent_name);
> void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
> void devl_rate_nodes_destroy(struct devlink *devlink);
> void devlink_port_linecard_set(struct devlink_port *devlink_port,
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index b97c077cf66e..08f1bbd54c43 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -10270,6 +10270,55 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 contro
> }
> EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set);
> 
>+/**
>+ * devl_rate_node_create - create devlink rate node
>+ * @devlink: devlink instance
>+ * @priv: driver private data
>+ * @node_name: name of the resulting node
>+ * @parent_name: name of the parent node
>+ *
>+ * Create devlink rate object of type node
>+ */
>+int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, char *parent_name)

Nope, this is certainly incorrect. Do not refer to a kernel object by
string. You also don't have an internal kernel API based on ifname to refer
to a struct net_device instance.

Please have "struct devlink_rate *parent" refer to the parent node and
make this function return "struct devlink_rate *".
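A hedged userspace sketch of the interface shape suggested here: the parent is passed as a pointer instead of being looked up by name, and the created object (or an encoded error) is returned. `struct rate_obj` and `rate_node_create()` are hypothetical, the `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` helpers are local re-implementations of the kernel's `<linux/err.h>`, and the real devlink signature may differ. In line with the review comment below, the name is copied without a length cap.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO	4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed-down rate object. */
struct rate_obj {
	char *name;
	void *priv;
	struct rate_obj *parent;
	int refcnt;
};

/* Create a node under @parent (may be NULL) and return it, so the driver
 * can pass the result as the parent of objects it creates later.
 */
static struct rate_obj *rate_node_create(void *priv, const char *name,
					 struct rate_obj *parent)
{
	struct rate_obj *node = calloc(1, sizeof(*node));

	if (!node)
		return ERR_PTR(-ENOMEM);

	node->name = strdup(name);	/* no artificial length limit */
	if (!node->name) {
		free(node);
		return ERR_PTR(-ENOMEM);
	}

	node->priv = priv;
	node->refcnt = 1;
	if (parent) {
		node->parent = parent;
		parent->refcnt++;	/* child holds a reference on its parent */
	}
	return node;
}
```

Returning the object also removes the need for the name-based `-EEXIST`/`-ENODEV` lookups in the driver path.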


>+{
>+	struct devlink_rate *rate_node;
>+	struct devlink_rate *parent;
>+
>+	rate_node = devlink_rate_node_get_by_name(devlink, node_name);
>+	if (!IS_ERR(rate_node))
>+		return -EEXIST;
>+
>+	rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
>+	if (!rate_node)
>+		return -ENOMEM;
>+
>+	if (parent_name) {
>+		parent = devlink_rate_node_get_by_name(devlink, parent_name);
>+		if (IS_ERR(parent)) {
>+			kfree(rate_node);
>+			return -ENODEV;
>+		}
>+		rate_node->parent = parent;
>+		refcount_inc(&rate_node->parent->refcnt);
>+	}
>+
>+	rate_node->type = DEVLINK_RATE_TYPE_NODE;
>+	rate_node->devlink = devlink;
>+	rate_node->priv = priv;
>+
>+	rate_node->name = kstrndup(node_name, DEVLINK_RATE_NAME_MAX_LEN, GFP_KERNEL);

Why do you limit the name length? We don't limit the length passed from
user space, so I see no reason to do it for the driver.


>+	if (!rate_node->name) {
>+		kfree(rate_node);
>+		return -ENOMEM;
>+	}
>+
>+	refcount_set(&rate_node->refcnt, 1);
>+	list_add(&rate_node->list, &devlink->rate_list);
>+	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
>+	return 0;
>+}
>+EXPORT_SYMBOL_GPL(devl_rate_node_create);
>+
> /**
>  * devl_rate_leaf_create - create devlink rate leaf
>  * @devlink_port: devlink port object to create rate object on
>-- 
>2.37.2
>


* Re: [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment
  2022-10-28 10:51 ` [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment Michal Wilczynski
@ 2022-10-31 10:25   ` Jiri Pirko
  0 siblings, 0 replies; 18+ messages in thread
From: Jiri Pirko @ 2022-10-31 10:25 UTC (permalink / raw)
  To: Michal Wilczynski
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx

Fri, Oct 28, 2022 at 12:51:38PM CEST, michal.wilczynski@intel.com wrote:
>Currently it's not possible to reassign the parent of a node using a single
>command. As the previous commit introduced a way to export the entire
>hierarchy from the driver, being able to modify and reassign parents
>becomes important. This way the user can easily change QoS settings without
>interrupting traffic.
>
>Example command:
>devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1
>
>This reassigns the parent of the leaf node to node_custom_1.
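A minimal sketch of what a one-step reassignment has to do, assuming the refcount scheme visible in `devl_rate_node_create()` (a child holds one reference on its parent). `struct rnode`, `is_ancestor()` and `rate_reparent()` are hypothetical, and the loop check is an assumption about what any such operation must reject, not a copy of the devlink code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rate object: a child holds one reference on its parent. */
struct rnode {
	const char *name;
	struct rnode *parent;
	int refcnt;
};

/* True if @anc is @n itself or one of @n's ancestors. */
static bool is_ancestor(const struct rnode *anc, const struct rnode *n)
{
	for (; n; n = n->parent)
		if (n == anc)
			return true;
	return false;
}

/* Move @n under @new_parent in one step. Rejects moves that would create
 * a loop (@new_parent is @n or one of its descendants), then drops the
 * reference on the old parent and takes one on the new parent.
 */
static int rate_reparent(struct rnode *n, struct rnode *new_parent)
{
	if (new_parent && is_ancestor(n, new_parent))
		return -EINVAL;

	if (n->parent)
		n->parent->refcnt--;
	n->parent = new_parent;
	if (new_parent)
		new_parent->refcnt++;
	return 0;
}
```

Because the old reference is dropped and the new one taken in the same call, the node is never left without a parent in between, which matches the goal of changing QoS settings without interrupting traffic.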
>
>Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>


* Re: [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create()
  2022-10-28 10:51 ` [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create() Michal Wilczynski
@ 2022-10-31 10:26   ` Jiri Pirko
  0 siblings, 0 replies; 18+ messages in thread
From: Jiri Pirko @ 2022-10-31 10:26 UTC (permalink / raw)
  To: Michal Wilczynski
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx

Fri, Oct 28, 2022 at 12:51:39PM CEST, michal.wilczynski@intel.com wrote:
>Currently the driver is able to create leaf nodes for the devlink-rate,
>but is unable to set a parent for them. This wasn't an issue before the
>possibility to export the hierarchy from the driver. After adding the export
>feature, in order for the driver to supply the correct hierarchy, it's
>necessary for it to be able to supply a parent name to
>devl_rate_leaf_create().
>
>Introduce a new parameter 'parent_name' in devl_rate_leaf_create().
>
>Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>---
> .../ethernet/mellanox/mlx5/core/esw/devlink_port.c |  4 ++--
> drivers/net/netdevsim/dev.c                        |  2 +-
> include/net/devlink.h                              |  2 +-
> net/core/devlink.c                                 | 14 +++++++++++++-
> 4 files changed, 17 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
>index 9bc7be95db54..084a910bb4e7 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
>@@ -91,7 +91,7 @@ int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, u16 vport_
> 	if (err)
> 		goto reg_err;
> 
>-	err = devl_rate_leaf_create(dl_port, vport);
>+	err = devl_rate_leaf_create(dl_port, vport, NULL);
> 	if (err)
> 		goto rate_err;
> 
>@@ -160,7 +160,7 @@ int mlx5_esw_devlink_sf_port_register(struct mlx5_eswitch *esw, struct devlink_p
> 	if (err)
> 		return err;
> 
>-	err = devl_rate_leaf_create(dl_port, vport);
>+	err = devl_rate_leaf_create(dl_port, vport, NULL);
> 	if (err)
> 		goto rate_err;
> 
>diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
>index 794fc0cc73b8..10e5c4de6b02 100644
>--- a/drivers/net/netdevsim/dev.c
>+++ b/drivers/net/netdevsim/dev.c
>@@ -1392,7 +1392,7 @@ static int __nsim_dev_port_add(struct nsim_dev *nsim_dev, enum nsim_dev_port_typ
> 
> 	if (nsim_dev_port_is_vf(nsim_dev_port)) {
> 		err = devl_rate_leaf_create(&nsim_dev_port->devlink_port,
>-					    nsim_dev_port);
>+					    nsim_dev_port, NULL);
> 		if (err)
> 			goto err_nsim_destroy;
> 	}
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 9d0a424712fd..2ccb69606d23 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -1602,7 +1602,7 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
> void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
> 				   u32 controller, u16 pf, u32 sf,
> 				   bool external);
>-int devl_rate_leaf_create(struct devlink_port *port, void *priv);
>+int devl_rate_leaf_create(struct devlink_port *port, void *priv, char *parent_name);
> int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
> 			  char *parent_name);
> void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
>diff --git a/net/core/devlink.c b/net/core/devlink.c
>index 9bdbc158c36a..140336c09bd5 100644
>--- a/net/core/devlink.c
>+++ b/net/core/devlink.c
>@@ -10325,13 +10325,15 @@ EXPORT_SYMBOL_GPL(devl_rate_node_create);
>  * devl_rate_leaf_create - create devlink rate leaf
>  * @devlink_port: devlink port object to create rate object on
>  * @priv: driver private data
>+ * @parent_name: name of the parent node
>  *
>  * Create devlink rate object of type leaf on provided @devlink_port.
>  */
>-int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
>+int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv, char *parent_name)

Again, don't refer to the parent object by string, but rather by a pointer
to the struct.


> {
> 	struct devlink *devlink = devlink_port->devlink;
> 	struct devlink_rate *devlink_rate;
>+	struct devlink_rate *parent;
> 
> 	devl_assert_locked(devlink_port->devlink);
> 
>@@ -10342,6 +10344,16 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv)
> 	if (!devlink_rate)
> 		return -ENOMEM;
> 
>+	if (parent_name) {
>+		parent = devlink_rate_node_get_by_name(devlink, parent_name);
>+		if (IS_ERR(parent)) {
>+			kfree(devlink_rate);
>+			return -ENODEV;
>+		}
>+		devlink_rate->parent = parent;
>+		refcount_inc(&devlink_rate->parent->refcnt);
>+	}
>+
> 	devlink_rate->type = DEVLINK_RATE_TYPE_LEAF;
> 	devlink_rate->devlink = devlink;
> 	devlink_rate->devlink_port = devlink_port;
>-- 
>2.37.2
>


* Re: [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks
  2022-10-28 10:51 ` [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks Michal Wilczynski
@ 2022-10-31 12:22   ` Jiri Pirko
  2022-11-04 14:38     ` Wilczynski, Michal
  0 siblings, 1 reply; 18+ messages in thread
From: Jiri Pirko @ 2022-10-31 12:22 UTC (permalink / raw)
  To: Michal Wilczynski
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx

Fri, Oct 28, 2022 at 12:51:40PM CEST, michal.wilczynski@intel.com wrote:
>From the driver's perspective it doesn't make any sense to make any changes to
>the internal HQoS tree if the created node doesn't have a parent. So a
>node created without any parent doesn't have to be initialized in the
>driver. Allow for such a scenario by allowing priv to be modified in the
>parent_set callbacks.
>
>Change the priv parameter to a double pointer, to allow for setting priv
>during the parent set phase.
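The C mechanics of the change described above, as a hedged sketch with hypothetical names (`struct drv_node`, `parent_set_cb()`): with a plain `void *priv` argument the callee only receives a copy of the pointer, so assigning to it is invisible to the caller, while `void **priv` lets the callee store a lazily created object back into the caller's slot. This illustrates only the pointer mechanics, not the devlink callback itself.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver-side node, allocated lazily. */
struct drv_node {
	int hw_id;
};

/* Create the driver node on the first parent assignment and store it
 * through @priv so the caller sees it; later calls reuse the same node.
 */
static int parent_set_cb(void **priv, int new_hw_id)
{
	struct drv_node *node = *priv;

	if (!node) {
		node = malloc(sizeof(*node));
		if (!node)
			return -ENOMEM;
		*priv = node;	/* visible to the caller */
	}
	node->hw_id = new_hw_id;
	return 0;
}
```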

I fail to understand the reason for this patch, but anyway, it looks
very hacky. The priv is something the leaf/node is created with.
Changing it from the callback smells awfully like a wrong design. Please
don't do that.


* Re: [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate
  2022-10-31 10:13   ` Jiri Pirko
@ 2022-11-02 10:38     ` Wilczynski, Michal
  0 siblings, 0 replies; 18+ messages in thread
From: Wilczynski, Michal @ 2022-11-02 10:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx



On 10/31/2022 11:13 AM, Jiri Pirko wrote:
> Fri, Oct 28, 2022 at 12:51:35PM CEST, michal.wilczynski@intel.com wrote:
>> To fully utilize the QoS offload capabilities of the Intel 100G card, a
>> new parameter 'tx_priority' needs to be introduced. This parameter allows
> It is highly confusing to call this "parameter". Devlink parameters are
> a totally different thing. This is just another netlink attribute for
> devlink rate object.

Hi,
Thanks for reviewing this so quickly,
I will change this.

>
>
>> the use of a strict priority arbiter among siblings. This arbitration
>> scheme attempts to schedule nodes based on their priority as long as the
>> nodes remain within their bandwidth limit.
>>
>> Introduce a new parameter in devlink-rate that will allow for
>> configuration of strict priority.
>>
>> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>> ---
>> include/net/devlink.h        |  6 ++++++
>> include/uapi/linux/devlink.h |  1 +
>> net/core/devlink.c           | 29 +++++++++++++++++++++++++++++
>> 3 files changed, 36 insertions(+)
>>
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index ba6b8b094943..9d2b0c3c4ad3 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -114,6 +114,8 @@ struct devlink_rate {
>> 			refcount_t refcnt;
>> 		};
>> 	};
>> +
>> +	u16 tx_priority;
>> };
>>
>> struct devlink_port {
>> @@ -1493,10 +1495,14 @@ struct devlink_ops {
>> 				      u64 tx_share, struct netlink_ext_ack *extack);
>> 	int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
>> 				    u64 tx_max, struct netlink_ext_ack *extack);
>> +	int (*rate_leaf_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
>> +					 u64 tx_priority, struct netlink_ext_ack *extack);
>> 	int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
>> 				      u64 tx_share, struct netlink_ext_ack *extack);
>> 	int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
>> 				    u64 tx_max, struct netlink_ext_ack *extack);
>> +	int (*rate_node_tx_priority_set)(struct devlink_rate *devlink_rate, void *priv,
>> +					 u64 tx_priority, struct netlink_ext_ack *extack);
>> 	int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
>> 			     struct netlink_ext_ack *extack);
>> 	int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
>> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>> index 2f24b53a87a5..b3df5bc45ba5 100644
>> --- a/include/uapi/linux/devlink.h
>> +++ b/include/uapi/linux/devlink.h
>> @@ -607,6 +607,7 @@ enum devlink_attr {
>>
>> 	DEVLINK_ATTR_SELFTESTS,			/* nested */
>>
>> +	DEVLINK_ATTR_RATE_TX_PRIORITY,		/* u16 */
>> 	/* add new attributes above here, update the policy in devlink.c */
>>
>> 	__DEVLINK_ATTR_MAX,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index 89baa7c0938b..2586b1307cb4 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -1184,6 +1184,9 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
>> 			      devlink_rate->tx_max, DEVLINK_ATTR_PAD))
>> 		goto nla_put_failure;
>>
>> +	if (nla_put_u16(msg, DEVLINK_ATTR_RATE_TX_PRIORITY,
>> +			devlink_rate->tx_priority))
>> +		goto nla_put_failure;
>> 	if (devlink_rate->parent)
>> 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
>> 				   devlink_rate->parent->name))
>> @@ -1924,6 +1927,7 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
>> {
>> 	struct nlattr *nla_parent, **attrs = info->attrs;
>> 	int err = -EOPNOTSUPP;
>> +	u16 priority;
>> 	u64 rate;
>>
>> 	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
>> @@ -1952,6 +1956,20 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
>> 		devlink_rate->tx_max = rate;
>> 	}
>>
>> +	if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]) {
>> +		priority = nla_get_u16(attrs[DEVLINK_ATTR_RATE_TX_PRIORITY]);
>> +		if (devlink_rate_is_leaf(devlink_rate))
>> +			err = ops->rate_leaf_tx_priority_set(devlink_rate, devlink_rate->priv,
>> +							priority, info->extack);
>> +		else if (devlink_rate_is_node(devlink_rate))
>> +			err = ops->rate_node_tx_priority_set(devlink_rate, devlink_rate->priv,
>> +							priority, info->extack);
>> +
>> +		if (err)
>> +			return err;
>> +		devlink_rate->tx_priority = priority;
>> +	}
>> +
>> 	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
>> 	if (nla_parent) {
>> 		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
>> @@ -1983,6 +2001,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
>> 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the leafs");
>> 			return false;
>> 		}
>> +		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_leaf_tx_priority_set) {
>> +			NL_SET_ERR_MSG_MOD(info->extack,
>> +					   "TX priority set isn't supported for the leafs");
>> +			return false;
>> +		}
>> 	} else if (type == DEVLINK_RATE_TYPE_NODE) {
>> 		if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
>> 			NL_SET_ERR_MSG_MOD(info->extack, "TX share set isn't supported for the nodes");
>> @@ -1997,6 +2020,11 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
>> 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes");
>> 			return false;
>> 		}
>> +		if (attrs[DEVLINK_ATTR_RATE_TX_PRIORITY] && !ops->rate_node_tx_priority_set) {
>> +			NL_SET_ERR_MSG_MOD(info->extack,
>> +					   "TX priority set isn't supported for the nodes");
>> +			return false;
>> +		}
>> 	} else {
>> 		WARN(1, "Unknown type of rate object");
>> 		return false;
>> @@ -9172,6 +9200,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>> 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
>> 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
>> 	[DEVLINK_ATTR_SELFTESTS] = { .type = NLA_NESTED },
>> +	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U16 },
> Why not u32?

I felt that u32 would be too wide for these variables, because they
represent priority and weight among siblings in the tree.
Currently we don't allow that many siblings in the tree, so
frankly this could even be u8, but I don't want to arbitrarily
limit this to Intel hardware only, so u16 seems like
a sweet spot.

BR,
Michał


>
>
>> };
>>
>> static const struct genl_small_ops devlink_nl_ops[] = {
>> -- 
>> 2.37.2
>>
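The arbitration described in the quoted commit message (schedule siblings by priority, as long as each stays within its bandwidth limit) can be illustrated with a small user-space model. This is a hypothetical sketch, not the ice firmware scheduler or the devlink code: `struct sibling`, its `sent`/`backlog` fields and `strict_prio_pick()` are illustrative names, and it assumes a larger `tx_priority` value means higher priority, which the patch does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one child node under a common parent. */
struct sibling {
	uint16_t tx_priority; /* assumption: larger value = higher priority */
	uint64_t tx_max;      /* bandwidth budget for the current interval */
	uint64_t sent;        /* bytes already sent in this interval */
	uint64_t backlog;     /* bytes waiting to be sent */
};

/*
 * Return the index of the sibling to serve next, or -1 if none is eligible.
 * A sibling is eligible when it has queued data and has not used up its
 * budget; among eligible siblings the highest tx_priority wins, and ties
 * go to the lowest index.
 */
static int strict_prio_pick(const struct sibling *s, size_t n)
{
	int best = -1;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (s[i].backlog == 0 || s[i].sent >= s[i].tx_max)
			continue; /* nothing to send, or over its limit */
		if (best < 0 || s[i].tx_priority > s[best].tx_priority)
			best = (int)i;
	}
	return best;
}
```

For siblings with priorities 1, 5 and 3 where the priority-5 sibling has exhausted its budget, the model serves the priority-3 sibling; once that budget is replenished, the priority-5 sibling wins again. This is the "strict priority within bandwidth limits" behaviour the commit message describes, reduced to a single selection step.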



* Re: [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver
  2022-10-31 10:19   ` Jiri Pirko
@ 2022-11-04 14:34     ` Wilczynski, Michal
  0 siblings, 0 replies; 18+ messages in thread
From: Wilczynski, Michal @ 2022-11-04 14:34 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx



On 10/31/2022 11:19 AM, Jiri Pirko wrote:
> Fri, Oct 28, 2022 at 12:51:37PM CEST, michal.wilczynski@intel.com wrote:
>> Intel 100G card internal firmware hierarchy for Hierarchical QoS is very
>> rigid and can't be easily removed. This requires the ability to export the
>> default hierarchy so that the user can modify it. Currently the driver is
>> only able to create the 'leaf' nodes, which usually represent the vport.
>> This is not enough for HQoS implemented in Intel hardware.
>>
>> Introduce new function devl_rate_node_create() that allows for creation
>> of the devlink-rate nodes from the driver.
>>
>> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
>> ---
>> include/net/devlink.h |  4 ++++
>> net/core/devlink.c    | 49 +++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 53 insertions(+)
>>
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index 929cb72ef412..9d0a424712fd 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -98,6 +98,8 @@ struct devlink_port_attrs {
>> 	};
>> };
>>
>> +#define DEVLINK_RATE_NAME_MAX_LEN 30
>> +
>> struct devlink_rate {
>> 	struct list_head list;
>> 	enum devlink_rate_type type;
>> @@ -1601,6 +1603,8 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
>> 				   u32 controller, u16 pf, u32 sf,
>> 				   bool external);
>> int devl_rate_leaf_create(struct devlink_port *port, void *priv);
>> +int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
>> +			  char *parent_name);
>> void devl_rate_leaf_destroy(struct devlink_port *devlink_port);
>> void devl_rate_nodes_destroy(struct devlink *devlink);
>> void devlink_port_linecard_set(struct devlink_port *devlink_port,
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index b97c077cf66e..08f1bbd54c43 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -10270,6 +10270,55 @@ void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, u32 contro
>> }
>> EXPORT_SYMBOL_GPL(devlink_port_attrs_pci_sf_set);
>>
>> +/**
>> + * devl_rate_node_create - create devlink rate node
>> + * @devlink: devlink instance
>> + * @priv: driver private data
>> + * @node_name: name of the resulting node
>> + * @parent_name: name of the parent node
>> + *
>> + * Create devlink rate object of type node
>> + */
>> +int devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name, char *parent_name)
> Nope, this is certainly incorrect. Do not refer to a kernel object by
> string. You also don't have an internal kernel API based on ifname to refer
> to a struct net_device instance.
>
> Please have "struct devlink_rate *parent" to refer to parent node and
> make this function return "struct devlink_rate *".

Okay, I changed that and re-sent. The downside is I have to
store devlink_rate pointers in the driver instead of just names.
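Jiri's suggestion (pass `struct devlink_rate *parent` and return the new `struct devlink_rate *`) can be illustrated with a small user-space model. Everything here is hypothetical: `struct rate_node`, `rate_node_create()` and `rate_node_destroy()` only mimic the shape of the requested API. They are not the v9 devlink code, and a plain `int refcnt` stands in for `refcount_t`. One design choice worth noting is that the parent reference is taken only after every allocation has succeeded, so no error path needs to drop it. In the quoted v8 code, by contrast, a `kstrndup()` failure returns after `refcount_inc()` on the parent without a matching decrement.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct devlink_rate (node type only). */
struct rate_node {
	char *name;
	void *priv;
	struct rate_node *parent; /* NULL for a node with no parent */
	int refcnt;               /* stands in for refcount_t */
};

/*
 * Create a node under @parent (may be NULL) and return it, or NULL on
 * allocation failure. The caller keeps the returned pointer, e.g. to pass
 * it as @parent when creating the next level of the hierarchy.
 */
static struct rate_node *rate_node_create(void *priv, const char *name,
					  struct rate_node *parent)
{
	struct rate_node *node;

	assert(name != NULL);
	node = calloc(1, sizeof(*node));
	if (!node)
		return NULL;

	node->name = strdup(name); /* no length cap: copy the full name */
	if (!node->name) {
		free(node);
		return NULL;
	}

	node->priv = priv;
	node->refcnt = 1;
	/* Take the parent reference last so no error path has to undo it. */
	if (parent) {
		node->parent = parent;
		parent->refcnt++;
	}
	return node;
}

/* Destroy a node that has no children, dropping its parent reference. */
static int rate_node_destroy(struct rate_node *node)
{
	if (node->refcnt > 1)
		return -1; /* children still hold references: refuse */
	if (node->parent)
		node->parent->refcnt--;
	free(node->name);
	free(node);
	return 0;
}
```

As Michał notes, the cost is that the driver must keep the returned pointers (for example, one per exported scheduler node) instead of just their names.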

>
>
>> +{
>> +	struct devlink_rate *rate_node;
>> +	struct devlink_rate *parent;
>> +
>> +	rate_node = devlink_rate_node_get_by_name(devlink, node_name);
>> +	if (!IS_ERR(rate_node))
>> +		return -EEXIST;
>> +
>> +	rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
>> +	if (!rate_node)
>> +		return -ENOMEM;
>> +
>> +	if (parent_name) {
>> +		parent = devlink_rate_node_get_by_name(devlink, parent_name);
>> +		if (IS_ERR(parent)) {
>> +			kfree(rate_node);
>> +			return -ENODEV;
>> +		}
>> +		rate_node->parent = parent;
>> +		refcount_inc(&rate_node->parent->refcnt);
>> +	}
>> +
>> +	rate_node->type = DEVLINK_RATE_TYPE_NODE;
>> +	rate_node->devlink = devlink;
>> +	rate_node->priv = priv;
>> +
>> +	rate_node->name = kstrndup(node_name, DEVLINK_RATE_NAME_MAX_LEN, GFP_KERNEL);
> Why do you limit the name length? We don't limit the length passed from
> the user, so I see no reason to do it for the driver.

I thought it was safer to limit this to avoid a buffer overflow.
Changed this in v9.
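A note on the length cap: `kstrndup()` allocates according to the bounded length of the string it actually copies, and `kstrdup()` does the same with the full length, so dropping the cap does not by itself create an overflow risk. What the cap adds is silent truncation. The user-space analogue below uses POSIX `strndup()` with a hypothetical `NAME_MAX_LEN` of 30 mirroring `DEVLINK_RATE_NAME_MAX_LEN`. It shows two distinct names being stored identically. Assuming the name lookup compares full strings, the duplicate check in the quoted code (which looks up the untruncated `node_name`) would not catch such a collision.

```c
#define _POSIX_C_SOURCE 200809L /* for strndup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 30 /* hypothetical mirror of DEVLINK_RATE_NAME_MAX_LEN */

/* Copy @name the way the v8 patch did: at most NAME_MAX_LEN characters. */
static char *copy_name_capped(const char *name)
{
	assert(name != NULL);
	return strndup(name, NAME_MAX_LEN);
}

/* Return 1 if two requested names end up stored identically, else 0. */
static int names_collide(const char *a, const char *b)
{
	char *sa = copy_name_capped(a);
	char *sb = copy_name_capped(b);
	int same = sa && sb && strcmp(sa, sb) == 0;

	free(sa);
	free(sb);
	return same;
}
```

Two 31-character names that differ only in their last character collide, while short names such as `node_26` and `node_27` do not.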

>
>
>> +	if (!rate_node->name) {
>> +		kfree(rate_node);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	refcount_set(&rate_node->refcnt, 1);
>> +	list_add(&rate_node->list, &devlink->rate_list);
>> +	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(devl_rate_node_create);
>> +
>> /**
>>   * devl_rate_leaf_create - create devlink rate leaf
>>   * @devlink_port: devlink port object to create rate object on
>> -- 
>> 2.37.2
>>



* Re: [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks
  2022-10-31 12:22   ` Jiri Pirko
@ 2022-11-04 14:38     ` Wilczynski, Michal
  0 siblings, 0 replies; 18+ messages in thread
From: Wilczynski, Michal @ 2022-11-04 14:38 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, alexandr.lobakin, jacob.e.keller, jesse.brandeburg,
	przemyslaw.kitszel, anthony.l.nguyen, kuba, ecree.xilinx



On 10/31/2022 1:22 PM, Jiri Pirko wrote:
> Fri, Oct 28, 2022 at 12:51:40PM CEST, michal.wilczynski@intel.com wrote:
>> From driver perspective it doesn't make any sense to make any changes to
>> the internal HQoS tree if the created node doesn't have a parent. So a
>> node created without any parent doesn't have to be initialized in the
>> driver. Allow for such scenario by allowing to modify priv in parent_set
>> callbacks.
>>
>> Change priv parameter to double pointer, to allow for setting priv during
>> the parent set phase.
> I fail to understand the reason for this patch, but anyway, it looks
> very hacky. The priv is something the leaf/node is created with.
> Changing it from the callback awfully smells like wrong design. Please
> don't do that.

I was trying to point out that nodes without any parent or children
don't actually exist in any hierarchy, so internally the driver doesn't
really need objects representing them.
Anyway, I removed this commit in v9. This involved pre-allocating
ice_sched_node, so it's not ideal for me either, but it solves the problem.

Thanks for reviewing.

BR,
Michał
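The pre-allocation approach Michał describes (allocate the driver object when the node is created, so `priv` never has to change) can be modelled as below. The node-creation callback mirrors the `void **priv` out-parameter that `rate_node_new` already has in the quoted `devlink_ops`. This is a hypothetical sketch, not the ice v9 code: `struct hw_node`, `node_new()` and `node_parent_set()` are illustrative names, and `attached` simply records whether the node has been linked into a simulated hardware tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical driver-side object, allocated up front and used as priv. */
struct hw_node {
	struct hw_node *parent;
	bool attached; /* linked into the simulated hardware tree yet? */
};

/* node_new-style callback: allocate now, but do not touch hardware. */
static int node_new(void **priv)
{
	struct hw_node *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;
	*priv = n; /* priv is fixed from here on */
	return 0;
}

/* parent_set-style callback: gets the same priv and only links it in. */
static int node_parent_set(void *priv, void *parent_priv)
{
	struct hw_node *n = priv;

	assert(n != NULL);
	n->parent = parent_priv;
	n->attached = parent_priv != NULL; /* detached again if parent cleared */
	return 0;
}
```

The trade-off is the one mentioned above: memory is committed for nodes that may never be placed in the hierarchy, in exchange for keeping `priv` immutable after creation.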




end of thread, other threads:[~2022-11-04 14:40 UTC | newest]

Thread overview: 18+ messages
2022-10-28 10:51 [PATCH net-next v8 0/9] Implement devlink-rate API and extend it Michal Wilczynski
2022-10-28 10:51 ` [PATCH net-next v8 1/9] devlink: Introduce new parameter 'tx_priority' to devlink-rate Michal Wilczynski
2022-10-31 10:13   ` Jiri Pirko
2022-11-02 10:38     ` Wilczynski, Michal
2022-10-28 10:51 ` [PATCH net-next v8 2/9] devlink: Introduce new parameter 'tx_weight' " Michal Wilczynski
2022-10-28 10:51 ` [PATCH net-next v8 3/9] devlink: Enable creation of the devlink-rate nodes from the driver Michal Wilczynski
2022-10-31 10:19   ` Jiri Pirko
2022-11-04 14:34     ` Wilczynski, Michal
2022-10-28 10:51 ` [PATCH net-next v8 4/9] devlink: Allow for devlink-rate nodes parent reassignment Michal Wilczynski
2022-10-31 10:25   ` Jiri Pirko
2022-10-28 10:51 ` [PATCH net-next v8 5/9] devlink: Allow to set up parent in devl_rate_leaf_create() Michal Wilczynski
2022-10-31 10:26   ` Jiri Pirko
2022-10-28 10:51 ` [PATCH net-next v8 6/9] devlink: Allow to change priv in devlink-rate from parent_set callbacks Michal Wilczynski
2022-10-31 12:22   ` Jiri Pirko
2022-11-04 14:38     ` Wilczynski, Michal
2022-10-28 10:51 ` [PATCH net-next v8 7/9] ice: Introduce new parameters in ice_sched_node Michal Wilczynski
2022-10-28 10:51 ` [PATCH net-next v8 8/9] ice: Implement devlink-rate API Michal Wilczynski
2022-10-28 10:51 ` [PATCH net-next v8 9/9] ice: Prevent ADQ, DCB, RDMA coexistence with Custom Tx scheduler Michal Wilczynski
