All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] devlink rate police limiter
@ 2022-06-20 15:26 Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 1/5] devlink: Introduce limit_type attr for rate objects Dima Chumak
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Currently, kernel provides a way to limit tx rate of a VF via devlink
rate function of a port. The underlying mechanism is a shaper applied to
all traffic passing through the target VF or a group of VFs. By its
essence, a shaper naturally works with outbound traffic, and in
practice, it's rarely seen to be implemented for inbound traffic.
Nevertheless, there is a user request to have a mechanism for limiting
inbound traffic as well. It is usually done by using some form of
traffic policing, dropping excess packets over the configured limit that
set by a user. Thus, introducing another limiting mechanism to the port
function can help close this gap.

This series introduces devlink attrs, along with their ops, to manage
rate policing of a single port as well as a port group. It is based on
the existing notion of leaf and node rate objects, and extends their
attributes to support both RX and TX limiting, for a number of packets
per second and/or a number of bytes per second. Additionally, there is a
second set of parameters for specifying the size of buffering performed,
called "burst", that controls the allowed level of spikes in traffic
before it starts getting dropped.

A new sub-type of a devlink_rate object is introduced, called
"limit_type". It can be either "shaping", the default, or "police".
A single leaf or a node object can be switched from one limit type to
another, but it cannot do both types of rate limiting simultaneously.
A node and a leaf object that have parent-child relationship must have
the same limit type. In other words, it's only possible to group rate
objects of the same limit type as their group's limit_type.

devlink_ops extended with following callbacks:
- rate_{leaf|node}_tx_{burst|pkts|pkts_burst}_set
- rate_{leaf|node}_rx_{max|burst|pkts|pkts_burst}_set

UAPI provides:
- setting tx_{burst|pkts|pkts_burst} and rx_{max|burst|pkts|pkts_burst}
  of a rate object

Added devlink_rate police attrs support for netdevsim driver.

Issues/open questions:
- Current implementation requires a user to set both "rate" and "burst"
  parameters explicitly, in order to activate police rate limiting. For
  example, "rx_max 200Mbit rx_burst 16mb". Is it necessary to
  automagically deduce "burst" value when it's omitted by the user?
  For example when user only sets "rx_max 200Mbit".
- If answer is positive to the first question, at which level it's
  better to be done, at user-space iproute2, at kernel devlink core or
  at vendor driver that implements devlink_ops for police attrs?

CLI examples:

  $ devlink port function rate show
  netdevsim/netdevsim10/128: type leaf limit_type unset
  netdevsim/netdevsim10/129: type leaf limit_type unset
  netdevsim/netdevsim10/130: type leaf limit_type unset

  # Set police rate limiting of inbound traffic
  $ devlink port function rate set netdevsim/netdevsim10/128 \
            limit_type police rx_max 100mbit rx_burst 10mbit
  $ devlink port function rate show
  netdevsim/netdevsim10/128: type leaf limit_type police rx_max 100Mbit rx_burst 10485Kbit

  # Set shaping rate limiting of outbound traffic (default limit_type)
  $ devlink port function rate set netdevsim/netdevsim10/129 tx_max 200mbit
  $ devlink port function rate show
  netdevsim/netdevsim10/129: type leaf limit_type shaping tx_max 200Mbit

  # Attempt to set police attr with the default shaping limit_type
  $ devlink port function rate set netdevsim/netdevsim10/129 rx_max 400mbit
  Unsupported option "rx_max" for limit_type "shaping"

  # Set police rate attr for a port that already has active shaping
  $ devlink port function rate set netdevsim/netdevsim10/129 limit_type police rx_max 400mbit
  Error: devlink: Cannot change limit_type of the rate leaf object, reset current rate attributes first.
  kernel answers: Device or resource busy

  # Create a rate group
  $ devlink port function rate add netdevsim/netdevsim10/g1 \
            limit_type police rx_max 1Gbit
  $ devlink port function rate show
  netdevsim/netdevsim10/g1: type node limit_type police rx_max 1Gbit

  # Add port to the group
  $ devlink port function rate set netdevsim/netdevsim10/128 parent g1
  $ devlink port function rate show
  netdevsim/netdevsim10/g1: type node limit_type police rx_max 1Gbit
  netdevsim/netdevsim10/128: type leaf limit_type police rx_max 100Mbit rx_burst 10485Kbit parent g1
  netdevsim/netdevsim10/129: type leaf limit_type shaping tx_max 200Mbit
  netdevsim/netdevsim10/130: type leaf limit_type unset

  # Try to add a port with a non-matching limit_type to the group
  $ devlink port function rate set netdevsim/netdevsim10/129 parent g1
  Error: devlink: Parent and object should be of the same limit_type.
  kernel answers: Invalid argument

  # Adding a port with "unset" limit_type to a group inherits the
  # group's limit_type
  $ devlink port function rate set netdevsim/netdevsim10/130 parent g1
  $ devlink port function rate show
  netdevsim/netdevsim10/130: type leaf limit_type police parent g1

  # Set all police parameters
  $ devlink port func rate set netdevsim/netdevsim10/130 \
            limit_type police tx_max 10GBps tx_burst 1gb \
                              rx_max 25GBps rx_burst 2gb \
                              tx_pkts 10000 tx_pkts_burst 1gb \
                              rx_pkts 20000 rx_pkts_burst 2gb

Dima Chumak (5):
  devlink: Introduce limit_type attr for rate objects
  devlink: Introduce police rate limit type
  netdevsim: Support devlink rate limit_type police
  selftest: netdevsim: Add devlink rate police sub-test
  Documentation: devlink rate objects limit_type

 .../networking/devlink/devlink-port.rst       |  44 ++-
 .../networking/devlink/netdevsim.rst          |   3 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c |  28 +-
 drivers/net/netdevsim/dev.c                   | 211 ++++++++++-
 drivers/net/netdevsim/netdevsim.h             |  11 +-
 include/net/devlink.h                         |  52 ++-
 include/uapi/linux/devlink.h                  |  15 +
 net/core/devlink.c                            | 336 ++++++++++++++++--
 .../drivers/net/netdevsim/devlink.sh          | 215 ++++++++++-
 9 files changed, 853 insertions(+), 62 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net-next 1/5] devlink: Introduce limit_type attr for rate objects
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
@ 2022-06-20 15:26 ` Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 2/5] devlink: Introduce police rate limit type Dima Chumak
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Lay foundation to support different kinds of rate limiting that may be
performed by rate objects.

Existing rate limiting type is dubbed as 'shaping' and it is assumed by
default when limit_type attribute isn't set explicitly. Following patch
in the series will introduce new limit type 'police'.

Leaf rate objects inherit their limit_type from a parent node object if
it hasn't been explicitly set for the leaf object.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c |  28 +++-
 include/net/devlink.h                         |  12 +-
 include/uapi/linux/devlink.h                  |   7 +
 net/core/devlink.c                            | 123 ++++++++++++++----
 4 files changed, 140 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 694c54066955..50bd4536fab1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -874,8 +874,8 @@ int mlx5_esw_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *
 int mlx5_esw_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv,
 				   struct netlink_ext_ack *extack)
 {
-	struct mlx5_esw_rate_group *group;
 	struct mlx5_eswitch *esw;
+	void *group;
 	int err = 0;
 
 	esw = mlx5_devlink_eswitch_get(rate_node->devlink);
@@ -890,7 +890,17 @@ int mlx5_esw_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv,
 		goto unlock;
 	}
 
-	group = esw_qos_create_rate_group(esw, extack);
+	switch (rate_node->limit_type) {
+	case DEVLINK_RATE_LIMIT_TYPE_UNSET:
+		group = ERR_PTR(-EINVAL);
+		break;
+	case DEVLINK_RATE_LIMIT_TYPE_SHAPING:
+		group = esw_qos_create_rate_group(esw, extack);
+		break;
+	default:
+		group = ERR_PTR(-EOPNOTSUPP);
+	}
+
 	if (IS_ERR(group)) {
 		err = PTR_ERR(group);
 		goto unlock;
@@ -905,7 +915,6 @@ int mlx5_esw_devlink_rate_node_new(struct devlink_rate *rate_node, void **priv,
 int mlx5_esw_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
 				   struct netlink_ext_ack *extack)
 {
-	struct mlx5_esw_rate_group *group = priv;
 	struct mlx5_eswitch *esw;
 	int err;
 
@@ -914,7 +923,18 @@ int mlx5_esw_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
 		return PTR_ERR(esw);
 
 	mutex_lock(&esw->state_lock);
-	err = esw_qos_destroy_rate_group(esw, group, extack);
+
+	switch (rate_node->limit_type) {
+	case DEVLINK_RATE_LIMIT_TYPE_UNSET:
+		err = -EINVAL;
+		break;
+	case DEVLINK_RATE_LIMIT_TYPE_SHAPING:
+		err = esw_qos_destroy_rate_group(esw, priv, extack);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+	}
+
 	mutex_unlock(&esw->state_lock);
 	return err;
 }
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 2a2a2a0c93f7..4fe8e657da44 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -98,13 +98,21 @@ struct devlink_port_attrs {
 	};
 };
 
+struct devlink_rate_shaping_attrs {
+	u64 tx_max;
+	u64 tx_share;
+};
+
 struct devlink_rate {
 	struct list_head list;
+	enum devlink_rate_limit_type limit_type;
 	enum devlink_rate_type type;
 	struct devlink *devlink;
 	void *priv;
-	u64 tx_share;
-	u64 tx_max;
+
+	union { /* on limit_type */
+		struct devlink_rate_shaping_attrs shaping_attrs;
+	};
 
 	struct devlink_rate *parent;
 	union {
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b3d40a5d72ff..53aad0d09231 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -221,6 +221,11 @@ enum devlink_rate_type {
 	DEVLINK_RATE_TYPE_NODE,
 };
 
+enum devlink_rate_limit_type {
+	DEVLINK_RATE_LIMIT_TYPE_UNSET,
+	DEVLINK_RATE_LIMIT_TYPE_SHAPING,
+};
+
 enum devlink_param_cmode {
 	DEVLINK_PARAM_CMODE_RUNTIME,
 	DEVLINK_PARAM_CMODE_DRIVERINIT,
@@ -576,6 +581,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
 	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
+	DEVLINK_ATTR_RATE_LIMIT_TYPE,		/* u16 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index db61f3a341cb..756d95c72b4d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -354,6 +354,18 @@ devlink_rate_is_node(struct devlink_rate *devlink_rate)
 	return devlink_rate->type == DEVLINK_RATE_TYPE_NODE;
 }
 
+static inline bool
+devlink_rate_is_unset(struct devlink_rate *devlink_rate)
+{
+	return devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_UNSET;
+}
+
+static inline bool
+devlink_rate_is_shaping(struct devlink_rate *devlink_rate)
+{
+	return devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+}
+
 static struct devlink_rate *
 devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
 {
@@ -1093,13 +1105,27 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
 			goto nla_put_failure;
 	}
 
-	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE,
-			      devlink_rate->tx_share, DEVLINK_ATTR_PAD))
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_LIMIT_TYPE,
+			      devlink_rate->limit_type, DEVLINK_ATTR_PAD))
 		goto nla_put_failure;
 
-	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX,
-			      devlink_rate->tx_max, DEVLINK_ATTR_PAD))
-		goto nla_put_failure;
+	if (devlink_rate_is_unset(devlink_rate)) {
+		/* For backward compatibility with older user-space clients that
+		 * don't understatnd DEVLINK_ATTR_RATE_LIMIT_TYPE, report tx_max
+		 * and tx_share as being "unlimited".
+		 */
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX, 0, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE, 0, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+	} else if (devlink_rate_is_shaping(devlink_rate)) {
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX,
+				      devlink_rate->shaping_attrs.tx_max, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE,
+				      devlink_rate->shaping_attrs.tx_share, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+	}
 
 	if (devlink_rate->parent)
 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
@@ -1850,6 +1876,12 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 			return -EEXIST;
 		}
 
+		if (parent->limit_type != devlink_rate->limit_type) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "Parent and object should be of the same limit_type");
+			return -EINVAL;
+		}
+
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_parent_set(devlink_rate, parent,
 							devlink_rate->priv, parent->priv,
@@ -1873,44 +1905,82 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 			       struct genl_info *info)
 {
 	struct nlattr *nla_parent, **attrs = info->attrs;
-	int err = -EOPNOTSUPP;
-	u64 rate;
+	struct devlink_rate *parent;
+	int err = 0;
+	u16 new_limit_type;
+	u64 new_val;
+
+	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
+
+	if (attrs[DEVLINK_ATTR_RATE_LIMIT_TYPE]) {
+		new_limit_type = nla_get_u16(attrs[DEVLINK_ATTR_RATE_LIMIT_TYPE]);
+		if (devlink_rate_is_unset(devlink_rate))
+			devlink_rate->limit_type = new_limit_type;
+		if (devlink_rate->limit_type != new_limit_type) {
+			if (devlink_rate_is_node(devlink_rate)) {
+				NL_SET_ERR_MSG_MOD(info->extack,
+						   "Cannot change limit_type of the rate node object, delete and add a new one instead.");
+				return -EINVAL;
+			}
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "Cannot change limit_type of the rate leaf object, reset current rate attributes first.");
+			return -EBUSY;
+		}
+	}
+
+	if (devlink_rate_is_unset(devlink_rate)) {
+		if (nla_parent) {
+			parent = devlink_rate_node_get_by_name(devlink_rate->devlink,
+							       nla_data(nla_parent));
+			if (!IS_ERR(parent))
+				devlink_rate->limit_type = parent->limit_type;
+			else
+				devlink_rate->limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+		} else {
+			devlink_rate->limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+		}
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && devlink_rate_is_shaping(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_SHARE]);
 
-	if (attrs[DEVLINK_ATTR_RATE_TX_SHARE]) {
-		rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_SHARE]);
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_tx_share_set(devlink_rate, devlink_rate->priv,
-							  rate, info->extack);
+							  new_val, info->extack);
 		else if (devlink_rate_is_node(devlink_rate))
 			err = ops->rate_node_tx_share_set(devlink_rate, devlink_rate->priv,
-							  rate, info->extack);
+							  new_val, info->extack);
 		if (err)
 			return err;
-		devlink_rate->tx_share = rate;
+		devlink_rate->shaping_attrs.tx_share = new_val;
 	}
 
 	if (attrs[DEVLINK_ATTR_RATE_TX_MAX]) {
-		rate = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_MAX]);
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_MAX]);
+
 		if (devlink_rate_is_leaf(devlink_rate))
 			err = ops->rate_leaf_tx_max_set(devlink_rate, devlink_rate->priv,
-							rate, info->extack);
+							new_val, info->extack);
 		else if (devlink_rate_is_node(devlink_rate))
 			err = ops->rate_node_tx_max_set(devlink_rate, devlink_rate->priv,
-							rate, info->extack);
+							new_val, info->extack);
 		if (err)
 			return err;
-		devlink_rate->tx_max = rate;
+		devlink_rate->shaping_attrs.tx_max = new_val;
 	}
 
-	nla_parent = attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME];
-	if (nla_parent) {
-		err = devlink_nl_rate_parent_node_set(devlink_rate, info,
-						      nla_parent);
-		if (err)
-			return err;
-	}
+	if (nla_parent)
+		err = devlink_nl_rate_parent_node_set(devlink_rate, info, nla_parent);
 
-	return 0;
+	/* reset limit_type when all attrs have been cleared, relevant only for
+	 * leaf objects as node objects get deleted altogether
+	 */
+	if (devlink_rate_is_leaf(devlink_rate) && !devlink_rate->parent &&
+	    ((devlink_rate_is_shaping(devlink_rate) &&
+	      !devlink_rate->shaping_attrs.tx_max && !devlink_rate->shaping_attrs.tx_share)))
+		devlink_rate->limit_type = DEVLINK_RATE_LIMIT_TYPE_UNSET;
+
+	return err;
 }
 
 static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
@@ -2002,6 +2072,10 @@ static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb,
 
 	rate_node->devlink = devlink;
 	rate_node->type = DEVLINK_RATE_TYPE_NODE;
+	if (info->attrs[DEVLINK_ATTR_RATE_LIMIT_TYPE])
+		rate_node->limit_type = nla_get_u16(info->attrs[DEVLINK_ATTR_RATE_LIMIT_TYPE]);
+	if (rate_node->limit_type == DEVLINK_RATE_LIMIT_TYPE_UNSET)
+		rate_node->limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
 	rate_node->name = nla_strdup(info->attrs[DEVLINK_ATTR_RATE_NODE_NAME], GFP_KERNEL);
 	if (!rate_node->name) {
 		err = -ENOMEM;
@@ -9000,6 +9074,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_RATE_LIMIT_TYPE] = { .type = NLA_U16 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next 2/5] devlink: Introduce police rate limit type
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 1/5] devlink: Introduce limit_type attr for rate objects Dima Chumak
@ 2022-06-20 15:26 ` Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 3/5] netdevsim: Support devlink rate limit_type police Dima Chumak
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Define a new DEVLINK_RATE_LIMIT_TYPE_POLICE along with the related
devlink_rate object attributes and their devlink_ops.

The new attributes are optional and specific to 'limit_type police'
only. Driver implementations are allowed to support any or none of them.

Example of limiting inbound traffic with the new limit type:

$ devlink port function rate set netdevsim/netdevsim10/1 \
          limit_type police rx_max 10mbit rx_burst 1mb

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 include/net/devlink.h        |  40 +++++++
 include/uapi/linux/devlink.h |   8 ++
 net/core/devlink.c           | 223 ++++++++++++++++++++++++++++++++++-
 3 files changed, 268 insertions(+), 3 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 4fe8e657da44..3de1cc42b10c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -103,6 +103,17 @@ struct devlink_rate_shaping_attrs {
 	u64 tx_share;
 };
 
+struct devlink_rate_police_attrs {
+	u64 tx_max;
+	u64 tx_burst;
+	u64 rx_max;
+	u64 rx_burst;
+	u64 tx_pkts;
+	u64 tx_pkts_burst;
+	u64 rx_pkts;
+	u64 rx_pkts_burst;
+};
+
 struct devlink_rate {
 	struct list_head list;
 	enum devlink_rate_limit_type limit_type;
@@ -112,6 +123,7 @@ struct devlink_rate {
 
 	union { /* on limit_type */
 		struct devlink_rate_shaping_attrs shaping_attrs;
+		struct devlink_rate_police_attrs police_attrs;
 	};
 
 	struct devlink_rate *parent;
@@ -1501,10 +1513,38 @@ struct devlink_ops {
 				      u64 tx_share, struct netlink_ext_ack *extack);
 	int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
 				    u64 tx_max, struct netlink_ext_ack *extack);
+	int (*rate_leaf_tx_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+				      u64 tx_burst, struct netlink_ext_ack *extack);
+	int (*rate_leaf_rx_max_set)(struct devlink_rate *devlink_rate, void *priv,
+				    u64 rx_max, struct netlink_ext_ack *extack);
+	int (*rate_leaf_rx_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+				      u64 rx_burst, struct netlink_ext_ack *extack);
+	int (*rate_leaf_tx_pkts_set)(struct devlink_rate *devlink_rate, void *priv,
+				     u64 tx_pkts, struct netlink_ext_ack *extack);
+	int (*rate_leaf_tx_pkts_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+					   u64 tx_pkts_burst, struct netlink_ext_ack *extack);
+	int (*rate_leaf_rx_pkts_set)(struct devlink_rate *devlink_rate, void *priv,
+				     u64 rx_pkts, struct netlink_ext_ack *extack);
+	int (*rate_leaf_rx_pkts_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+					   u64 rx_pkts_burst, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
 				      u64 tx_share, struct netlink_ext_ack *extack);
 	int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
 				    u64 tx_max, struct netlink_ext_ack *extack);
+	int (*rate_node_tx_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+				      u64 tx_burst, struct netlink_ext_ack *extack);
+	int (*rate_node_rx_max_set)(struct devlink_rate *devlink_rate, void *priv,
+				    u64 rx_max, struct netlink_ext_ack *extack);
+	int (*rate_node_rx_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+				      u64 rx_burst, struct netlink_ext_ack *extack);
+	int (*rate_node_tx_pkts_set)(struct devlink_rate *devlink_rate, void *priv,
+				     u64 tx_pkts, struct netlink_ext_ack *extack);
+	int (*rate_node_tx_pkts_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+					   u64 tx_pkts_burst, struct netlink_ext_ack *extack);
+	int (*rate_node_rx_pkts_set)(struct devlink_rate *devlink_rate, void *priv,
+				     u64 rx_pkts, struct netlink_ext_ack *extack);
+	int (*rate_node_rx_pkts_burst_set)(struct devlink_rate *devlink_rate, void *priv,
+					   u64 rx_pkts_burst, struct netlink_ext_ack *extack);
 	int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
 			     struct netlink_ext_ack *extack);
 	int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 53aad0d09231..4903f7b6dc93 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -224,6 +224,7 @@ enum devlink_rate_type {
 enum devlink_rate_limit_type {
 	DEVLINK_RATE_LIMIT_TYPE_UNSET,
 	DEVLINK_RATE_LIMIT_TYPE_SHAPING,
+	DEVLINK_RATE_LIMIT_TYPE_POLICE,
 };
 
 enum devlink_param_cmode {
@@ -582,6 +583,13 @@ enum devlink_attr {
 	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
 	DEVLINK_ATTR_RATE_LIMIT_TYPE,		/* u16 */
+	DEVLINK_ATTR_RATE_TX_BURST,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_MAX,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_BURST,		/* u64 */
+	DEVLINK_ATTR_RATE_TX_PKTS,		/* u64 */
+	DEVLINK_ATTR_RATE_TX_PKTS_BURST,	/* u64 */
+	DEVLINK_ATTR_RATE_RX_PKTS,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_PKTS_BURST,	/* u64 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 756d95c72b4d..c74cdd0bd44d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -366,6 +366,12 @@ devlink_rate_is_shaping(struct devlink_rate *devlink_rate)
 	return devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_SHAPING;
 }
 
+static inline bool
+devlink_rate_is_police(struct devlink_rate *devlink_rate)
+{
+	return devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_POLICE;
+}
+
 static struct devlink_rate *
 devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
 {
@@ -1125,6 +1131,31 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
 		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_SHARE,
 				      devlink_rate->shaping_attrs.tx_share, DEVLINK_ATTR_PAD))
 			goto nla_put_failure;
+	} else if (devlink_rate_is_police(devlink_rate)) {
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_MAX,
+				      devlink_rate->shaping_attrs.tx_max, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_BURST,
+				      devlink_rate->police_attrs.tx_burst, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_RX_MAX,
+				      devlink_rate->police_attrs.rx_max, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_RX_BURST,
+				      devlink_rate->police_attrs.rx_burst, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_PKTS,
+				      devlink_rate->police_attrs.tx_pkts, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_TX_PKTS_BURST,
+				      devlink_rate->police_attrs.tx_pkts_burst, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_RX_PKTS,
+				      devlink_rate->police_attrs.rx_pkts, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+		if (nla_put_u64_64bit(msg, DEVLINK_ATTR_RATE_RX_PKTS_BURST,
+				      devlink_rate->police_attrs.rx_pkts_burst, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
 	}
 
 	if (devlink_rate->parent)
@@ -1966,7 +1997,110 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 							new_val, info->extack);
 		if (err)
 			return err;
-		devlink_rate->shaping_attrs.tx_max = new_val;
+
+		if (devlink_rate_is_police(devlink_rate))
+			devlink_rate->police_attrs.tx_max = new_val;
+		else
+			devlink_rate->shaping_attrs.tx_max = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_TX_BURST] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_BURST]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_tx_burst_set(devlink_rate, devlink_rate->priv,
+							  new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_tx_burst_set(devlink_rate, devlink_rate->priv,
+							  new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.tx_burst = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_RX_MAX] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_RX_MAX]);
+
+		if (devlink_rate_is_leaf(devlink_rate)) {
+			err = ops->rate_leaf_rx_max_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		} else if (devlink_rate_is_node(devlink_rate)) {
+			err = ops->rate_node_rx_max_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		}
+		if (err)
+			return err;
+		devlink_rate->police_attrs.rx_max = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_RX_BURST] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_RX_BURST]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_rx_burst_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_rx_burst_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.rx_burst = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_TX_PKTS] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_PKTS]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_tx_pkts_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_tx_pkts_set(devlink_rate, devlink_rate->priv,
+							new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.tx_pkts = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_TX_PKTS_BURST] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_TX_PKTS_BURST]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_tx_pkts_burst_set(devlink_rate, devlink_rate->priv,
+							       new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_tx_pkts_burst_set(devlink_rate, devlink_rate->priv,
+							       new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.tx_pkts_burst = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_RX_PKTS] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_RX_PKTS]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_rx_pkts_set(devlink_rate, devlink_rate->priv,
+							 new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_rx_pkts_set(devlink_rate, devlink_rate->priv,
+							 new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.rx_pkts = new_val;
+	}
+
+	if (attrs[DEVLINK_ATTR_RATE_RX_PKTS_BURST] && devlink_rate_is_police(devlink_rate)) {
+		new_val = nla_get_u64(attrs[DEVLINK_ATTR_RATE_RX_PKTS_BURST]);
+
+		if (devlink_rate_is_leaf(devlink_rate))
+			err = ops->rate_leaf_rx_pkts_burst_set(devlink_rate, devlink_rate->priv,
+							       new_val, info->extack);
+		else if (devlink_rate_is_node(devlink_rate))
+			err = ops->rate_node_rx_pkts_burst_set(devlink_rate, devlink_rate->priv,
+							       new_val, info->extack);
+		if (err)
+			return err;
+		devlink_rate->police_attrs.rx_pkts_burst = new_val;
 	}
 
 	if (nla_parent)
@@ -1977,7 +2111,12 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 	 */
 	if (devlink_rate_is_leaf(devlink_rate) && !devlink_rate->parent &&
 	    ((devlink_rate_is_shaping(devlink_rate) &&
-	      !devlink_rate->shaping_attrs.tx_max && !devlink_rate->shaping_attrs.tx_share)))
+	      !devlink_rate->shaping_attrs.tx_max && !devlink_rate->shaping_attrs.tx_share) ||
+	     (devlink_rate_is_police(devlink_rate) &&
+	      !devlink_rate->police_attrs.tx_max && !devlink_rate->police_attrs.tx_burst &&
+	      !devlink_rate->police_attrs.rx_max && !devlink_rate->police_attrs.rx_burst &&
+	      !devlink_rate->police_attrs.tx_pkts && !devlink_rate->police_attrs.tx_pkts_burst &&
+	      !devlink_rate->police_attrs.rx_pkts && !devlink_rate->police_attrs.rx_pkts_burst)))
 		devlink_rate->limit_type = DEVLINK_RATE_LIMIT_TYPE_UNSET;
 
 	return err;
@@ -1995,7 +2134,43 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 			return false;
 		}
 		if (attrs[DEVLINK_ATTR_RATE_TX_MAX] && !ops->rate_leaf_tx_max_set) {
-			NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the leafs");
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX max set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_BURST] && !ops->rate_leaf_tx_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX burst set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_MAX] && !ops->rate_leaf_rx_max_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX max set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_BURST] && !ops->rate_leaf_rx_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX burst set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PKTS] && !ops->rate_leaf_tx_pkts_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX pkts set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PKTS_BURST] && !ops->rate_leaf_tx_pkts_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX pkts burst set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_PKTS] && !ops->rate_leaf_rx_pkts_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX pkts set isn't supported for the leafs");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_PKTS_BURST] && !ops->rate_leaf_rx_pkts_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX pkts burst set isn't supported for the leafs");
 			return false;
 		}
 		if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] &&
@@ -2012,6 +2187,41 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
 			NL_SET_ERR_MSG_MOD(info->extack, "TX max set isn't supported for the nodes");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_BURST] && !ops->rate_node_tx_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX burst set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_MAX] && !ops->rate_node_rx_max_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX max set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_BURST] && !ops->rate_node_rx_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX burst set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PKTS] && !ops->rate_node_tx_pkts_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX pkts set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TX_PKTS_BURST] && !ops->rate_node_tx_pkts_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "TX pkts burst set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_PKTS] && !ops->rate_node_rx_pkts_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX pkts set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_RX_PKTS_BURST] && !ops->rate_node_rx_pkts_burst_set) {
+			NL_SET_ERR_MSG_MOD(info->extack,
+					   "RX pkts burst set isn't supported for the nodes");
+			return false;
+		}
 		if (attrs[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] &&
 		    !ops->rate_node_parent_set) {
 			NL_SET_ERR_MSG_MOD(info->extack, "Parent set isn't supported for the nodes");
@@ -9075,6 +9285,13 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_LINECARD_INDEX] = { .type = NLA_U32 },
 	[DEVLINK_ATTR_LINECARD_TYPE] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_RATE_LIMIT_TYPE] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_RATE_TX_BURST] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_RX_MAX] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_RX_BURST] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_TX_PKTS] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_TX_PKTS_BURST] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_RX_PKTS] = { .type = NLA_U64 },
+	[DEVLINK_ATTR_RATE_RX_PKTS_BURST] = { .type = NLA_U64 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next 3/5] netdevsim: Support devlink rate limit_type police
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 1/5] devlink: Introduce limit_type attr for rate objects Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 2/5] devlink: Introduce police rate limit type Dima Chumak
@ 2022-06-20 15:26 ` Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 4/5] selftest: netdevsim: Add devlink rate police sub-test Dima Chumak
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Implement devlink_ops that enable setting devlink_rate attributes for
the new DEVLINK_RATE_LIMIT_TYPE_POLICE type of rate objects via devlink
API.

The new rate values of VF ports and rate nodes are exposed to netdevsim
debugfs.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 drivers/net/netdevsim/dev.c       | 211 ++++++++++++++++++++++++++++--
 drivers/net/netdevsim/netdevsim.h |  11 +-
 2 files changed, 207 insertions(+), 15 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 57a3ac893792..9ac78ab09a58 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -406,10 +406,24 @@ static int nsim_dev_port_debugfs_init(struct nsim_dev *nsim_dev,
 	if (nsim_dev_port_is_vf(nsim_dev_port)) {
 		unsigned int vf_id = nsim_dev_port_index_to_vf_index(port_index);
 
-		debugfs_create_u16("tx_share", 0400, nsim_dev_port->ddir,
+		debugfs_create_u64("tx_share", 0400, nsim_dev_port->ddir,
 				   &nsim_dev->vfconfigs[vf_id].min_tx_rate);
-		debugfs_create_u16("tx_max", 0400, nsim_dev_port->ddir,
+		debugfs_create_u64("tx_max", 0400, nsim_dev_port->ddir,
 				   &nsim_dev->vfconfigs[vf_id].max_tx_rate);
+		debugfs_create_u64("tx_burst", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].tx_burst);
+		debugfs_create_u64("rx_max", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].rx_max);
+		debugfs_create_u64("rx_burst", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].rx_burst);
+		debugfs_create_u64("tx_pkts", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].tx_pkts);
+		debugfs_create_u64("tx_pkts_burst", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].tx_pkts_burst);
+		debugfs_create_u64("rx_pkts", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].rx_pkts);
+		debugfs_create_u64("rx_pkts_burst", 0400, nsim_dev_port->ddir,
+				   &nsim_dev->vfconfigs[vf_id].rx_pkts_burst);
 		nsim_dev_port->rate_parent = debugfs_create_file("rate_parent",
 								 0400,
 								 nsim_dev_port->ddir,
@@ -1192,20 +1206,106 @@ static int nsim_leaf_tx_max_set(struct devlink_rate *devlink_rate, void *priv,
 	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
 	int err;
 
-	err = nsim_rate_bytes_to_units("tx_max", &tx_max, extack);
-	if (err)
-		return err;
+	if (devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_SHAPING) {
+		err = nsim_rate_bytes_to_units("tx_max", &tx_max, extack);
+		if (err)
+			return err;
+	}
 
 	nsim_dev->vfconfigs[vf_id].max_tx_rate = tx_max;
 	return 0;
 }
 
+static int nsim_leaf_tx_burst_set(struct devlink_rate *rate_leaf, void *priv,
+				  u64 tx_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].tx_burst = tx_burst;
+	return 0;
+}
+
+static int nsim_leaf_rx_max_set(struct devlink_rate *rate_leaf, void *priv,
+				u64 rx_max, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].rx_max = rx_max;
+	return 0;
+}
+
+static int nsim_leaf_rx_burst_set(struct devlink_rate *rate_leaf, void *priv,
+				  u64 rx_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].rx_burst = rx_burst;
+	return 0;
+}
+
+static int nsim_leaf_tx_pkts_set(struct devlink_rate *rate_leaf, void *priv,
+				 u64 tx_pkts, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].tx_pkts = tx_pkts;
+	return 0;
+}
+
+static int nsim_leaf_tx_pkts_burst_set(struct devlink_rate *rate_leaf, void *priv,
+				       u64 tx_pkts_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].tx_pkts_burst = tx_pkts_burst;
+	return 0;
+}
+
+static int nsim_leaf_rx_pkts_set(struct devlink_rate *rate_leaf, void *priv,
+				 u64 rx_pkts, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].rx_pkts = rx_pkts;
+	return 0;
+}
+
+static int nsim_leaf_rx_pkts_burst_set(struct devlink_rate *rate_leaf, void *priv,
+				       u64 rx_pkts_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	int vf_id = nsim_dev_port_index_to_vf_index(nsim_dev_port->port_index);
+
+	nsim_dev->vfconfigs[vf_id].rx_pkts_burst = rx_pkts_burst;
+	return 0;
+}
+
 struct nsim_rate_node {
 	struct dentry *ddir;
 	struct dentry *rate_parent;
 	char *parent_name;
-	u16 tx_share;
-	u16 tx_max;
+	u64 tx_share;
+	u64 tx_max;
+	u64 tx_burst;
+	u64 rx_max;
+	u64 rx_burst;
+	u64 tx_pkts;
+	u64 tx_pkts_burst;
+	u64 rx_pkts;
+	u64 rx_pkts_burst;
 };
 
 static int nsim_node_tx_share_set(struct devlink_rate *devlink_rate, void *priv,
@@ -1228,14 +1328,79 @@ static int nsim_node_tx_max_set(struct devlink_rate *devlink_rate, void *priv,
 	struct nsim_rate_node *nsim_node = priv;
 	int err;
 
-	err = nsim_rate_bytes_to_units("tx_max", &tx_max, extack);
-	if (err)
-		return err;
+	if (devlink_rate->limit_type == DEVLINK_RATE_LIMIT_TYPE_SHAPING) {
+		err = nsim_rate_bytes_to_units("tx_max", &tx_max, extack);
+		if (err)
+			return err;
+	}
 
 	nsim_node->tx_max = tx_max;
 	return 0;
 }
 
+static int nsim_node_tx_burst_set(struct devlink_rate *rate_node, void *priv,
+				  u64 tx_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->tx_burst = tx_burst;
+	return 0;
+}
+
+static int nsim_node_rx_max_set(struct devlink_rate *rate_node, void *priv,
+				u64 rx_max, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->rx_max = rx_max;
+	return 0;
+}
+
+static int nsim_node_rx_burst_set(struct devlink_rate *rate_node, void *priv,
+				  u64 rx_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->rx_burst = rx_burst;
+	return 0;
+}
+
+static int nsim_node_tx_pkts_set(struct devlink_rate *rate_node, void *priv,
+				 u64 tx_pkts, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->tx_pkts = tx_pkts;
+	return 0;
+}
+
+static int nsim_node_tx_pkts_burst_set(struct devlink_rate *rate_node, void *priv,
+				       u64 tx_pkts_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->tx_pkts_burst = tx_pkts_burst;
+	return 0;
+}
+
+static int nsim_node_rx_pkts_set(struct devlink_rate *rate_node, void *priv,
+				 u64 rx_pkts, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->rx_pkts = rx_pkts;
+	return 0;
+}
+
+static int nsim_node_rx_pkts_burst_set(struct devlink_rate *rate_node, void *priv,
+				       u64 rx_pkts_burst, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+
+	nsim_node->rx_pkts_burst = rx_pkts_burst;
+	return 0;
+}
+
 static int nsim_rate_node_new(struct devlink_rate *node, void **priv,
 			      struct netlink_ext_ack *extack)
 {
@@ -1253,13 +1418,19 @@ static int nsim_rate_node_new(struct devlink_rate *node, void **priv,
 
 	nsim_node->ddir = debugfs_create_dir(node->name, nsim_dev->nodes_ddir);
 
-	debugfs_create_u16("tx_share", 0400, nsim_node->ddir, &nsim_node->tx_share);
-	debugfs_create_u16("tx_max", 0400, nsim_node->ddir, &nsim_node->tx_max);
+	debugfs_create_u64("tx_share", 0400, nsim_node->ddir, &nsim_node->tx_share);
+	debugfs_create_u64("tx_max", 0400, nsim_node->ddir, &nsim_node->tx_max);
+	debugfs_create_u64("tx_burst", 0400, nsim_node->ddir, &nsim_node->tx_burst);
+	debugfs_create_u64("rx_max", 0400, nsim_node->ddir, &nsim_node->rx_max);
+	debugfs_create_u64("rx_burst", 0400, nsim_node->ddir, &nsim_node->rx_burst);
+	debugfs_create_u64("tx_pkts", 0400, nsim_node->ddir, &nsim_node->tx_pkts);
+	debugfs_create_u64("tx_pkts_burst", 0400, nsim_node->ddir, &nsim_node->tx_pkts_burst);
+	debugfs_create_u64("rx_pkts", 0400, nsim_node->ddir, &nsim_node->rx_pkts);
+	debugfs_create_u64("rx_pkts_burst", 0400, nsim_node->ddir, &nsim_node->rx_pkts_burst);
 	nsim_node->rate_parent = debugfs_create_file("rate_parent", 0400,
 						     nsim_node->ddir,
 						     &nsim_node->parent_name,
 						     &nsim_dev_rate_parent_fops);
-
 	*priv = nsim_node;
 	return 0;
 }
@@ -1337,8 +1508,22 @@ static const struct devlink_ops nsim_dev_devlink_ops = {
 	.trap_policer_counter_get = nsim_dev_devlink_trap_policer_counter_get,
 	.rate_leaf_tx_share_set = nsim_leaf_tx_share_set,
 	.rate_leaf_tx_max_set = nsim_leaf_tx_max_set,
+	.rate_leaf_tx_burst_set = nsim_leaf_tx_burst_set,
+	.rate_leaf_rx_max_set = nsim_leaf_rx_max_set,
+	.rate_leaf_rx_burst_set = nsim_leaf_rx_burst_set,
+	.rate_leaf_tx_pkts_set = nsim_leaf_tx_pkts_set,
+	.rate_leaf_tx_pkts_burst_set = nsim_leaf_tx_pkts_burst_set,
+	.rate_leaf_rx_pkts_set = nsim_leaf_rx_pkts_set,
+	.rate_leaf_rx_pkts_burst_set = nsim_leaf_rx_pkts_burst_set,
 	.rate_node_tx_share_set = nsim_node_tx_share_set,
 	.rate_node_tx_max_set = nsim_node_tx_max_set,
+	.rate_node_tx_burst_set = nsim_node_tx_burst_set,
+	.rate_node_rx_max_set = nsim_node_rx_max_set,
+	.rate_node_rx_burst_set = nsim_node_rx_burst_set,
+	.rate_node_tx_pkts_set = nsim_node_tx_pkts_set,
+	.rate_node_tx_pkts_burst_set = nsim_node_tx_pkts_burst_set,
+	.rate_node_rx_pkts_set = nsim_node_rx_pkts_set,
+	.rate_node_rx_pkts_burst_set = nsim_node_rx_pkts_burst_set,
 	.rate_node_new = nsim_rate_node_new,
 	.rate_node_del = nsim_rate_node_del,
 	.rate_leaf_parent_set = nsim_rate_leaf_parent_set,
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 0b122872b2c9..2040b95e5f93 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -241,8 +241,15 @@ struct nsim_dev_port {
 
 struct nsim_vf_config {
 	int link_state;
-	u16 min_tx_rate;
-	u16 max_tx_rate;
+	u64 min_tx_rate;
+	u64 max_tx_rate;
+	u64 tx_burst;
+	u64 rx_max;
+	u64 rx_burst;
+	u64 tx_pkts;
+	u64 tx_pkts_burst;
+	u64 rx_pkts;
+	u64 rx_pkts_burst;
 	u16 vlan;
 	__be16 vlan_proto;
 	u16 qos;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next 4/5] selftest: netdevsim: Add devlink rate police sub-test
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (2 preceding siblings ...)
  2022-06-20 15:26 ` [PATCH net-next 3/5] netdevsim: Support devlink rate limit_type police Dima Chumak
@ 2022-06-20 15:26 ` Dima Chumak
  2022-06-20 15:26 ` [PATCH net-next 5/5] Documentation: devlink rate objects limit_type Dima Chumak
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Test verifies that netdevsim VFs and groups can set and retrieve
the new rate limit_type police attributes via devlink API.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 .../drivers/net/netdevsim/devlink.sh          | 215 ++++++++++++++++--
 1 file changed, 200 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 9de1d123f4f5..40392dcbb30e 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -534,6 +534,15 @@ rate_attr_set()
 	devlink port function rate set $handle $name $value$units
 }
 
+rate_police_attr_set()
+{
+	local handle=$1
+	local name=$2
+	local value=$3
+
+	devlink port function rate set $handle limit_type police $name $value
+}
+
 rate_attr_get()
 {
 	local handle=$1
@@ -542,7 +551,7 @@ rate_attr_get()
 	cmd_jq "devlink port function rate show $handle -j" '.[][].'$name
 }
 
-rate_attr_tx_rate_check()
+rate_attr_shaping_rate_check()
 {
 	local handle=$1
 	local name=$2
@@ -563,6 +572,35 @@ rate_attr_tx_rate_check()
 	check_err $? "Unexpected $name attr value $api_value != $rate"
 }
 
+rate_attr_police_rate_check()
+{
+	local handle=$1
+	local name=$2
+	local rate=$3
+	local debug_file=$4
+
+	rate_police_attr_set $handle $name $rate
+	check_err $? "Failed to set $name value"
+
+	local debug_value=$(cat $debug_file)
+	check_err $? "Failed to read $name value from debugfs"
+
+	# undo bits->bytes conversion forced by devlink
+	case $name in ([rt]x_max) debug_value=$((debug_value * 8)) ;; esac
+
+	[ "$debug_value" == "$rate" ]
+	check_err $? "Unexpected $name debug value $debug_value != $rate"
+
+	local api_value=$(rate_attr_get $handle $name)
+	check_err $? "Failed to get $name attr value"
+
+	# undo bits->bytes conversion forced by devlink
+	case $name in ([rt]x_max) api_value=$((api_value * 8)) ;; esac
+
+	[ "$api_value" == "$rate" ]
+	check_err $? "Unexpected $name attr value $api_value != $rate"
+}
+
 rate_attr_parent_check()
 {
 	local handle=$1
@@ -586,8 +624,9 @@ rate_attr_parent_check()
 rate_node_add()
 {
 	local handle=$1
+	local limit_type=${2:+limit_type $2}
 
-	devlink port function rate add $handle
+	devlink port function rate add $handle $limit_type
 }
 
 rate_node_del()
@@ -597,21 +636,14 @@ rate_node_del()
 	devlink port function rate del $handle
 }
 
-rate_test()
+rate_shaping_test()
 {
-	RET=0
-
-	echo $VF_COUNT > /sys/bus/netdevsim/devices/$DEV_NAME/sriov_numvfs
-	devlink dev eswitch set $DL_HANDLE mode switchdev
-	local leafs=`rate_leafs_get $DL_HANDLE`
-	local num_leafs=`echo $leafs | wc -w`
-	[ "$num_leafs" == "$VF_COUNT" ]
-	check_err $? "Expected $VF_COUNT rate leafs but got $num_leafs"
+	local leafs=$1
 
 	rate=10
 	for r_obj in $leafs
 	do
-		rate_attr_tx_rate_check $r_obj tx_share $rate \
+		rate_attr_shaping_rate_check $r_obj tx_share $rate \
 			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_share
 		rate=$(($rate+10))
 	done
@@ -619,11 +651,19 @@ rate_test()
 	rate=100
 	for r_obj in $leafs
 	do
-		rate_attr_tx_rate_check $r_obj tx_max $rate \
+		rate_attr_shaping_rate_check $r_obj tx_max $rate \
 			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_max
 		rate=$(($rate+100))
 	done
 
+	for r_obj in $leafs
+	do
+		rate_attr_shaping_rate_check $r_obj tx_share 0 \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_share
+		rate_attr_shaping_rate_check $r_obj tx_max 0 \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_max
+	done
+
 	local node1_name='group1'
 	local node1="$DL_HANDLE/$node1_name"
 	rate_node_add "$node1"
@@ -634,11 +674,11 @@ rate_test()
 	check_err $? "Expected 1 rate node in output but got $num_nodes"
 
 	local node_tx_share=10
-	rate_attr_tx_rate_check $node1 tx_share $node_tx_share \
+	rate_attr_shaping_rate_check $node1 tx_share $node_tx_share \
 		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_share
 
 	local node_tx_max=100
-	rate_attr_tx_rate_check $node1 tx_max $node_tx_max \
+	rate_attr_shaping_rate_check $node1 tx_max $node_tx_max \
 		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_max
 
 	rate_node_del "$node1"
@@ -668,6 +708,151 @@ rate_test()
 	check_err $? "Failed to unset $r_obj parent node"
 	rate_node_del "$node1"
 	check_err $? "Failed to delete node $node1"
+}
+
+rate_police_test()
+{
+	local leafs=$1
+
+	local rate=$((100 * 1000**2 * 8))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj tx_max $rate \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_max
+		rate=$(($rate + 10 * 1000**2 * 8))
+	done
+
+	local size=$((1024**2))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj tx_burst $size \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_burst
+		size=$(($size * 2))
+	done
+
+	rate=$((100 * 1000**2 * 8))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj rx_max $rate \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/rx_max
+		rate=$(($rate + 10 * 1000**2 * 8))
+	done
+
+	size=$((1024**2))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj rx_burst $size \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/rx_burst
+		size=$(($size * 2))
+	done
+
+	local packets=1000
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj tx_pkts $packets \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_pkts
+		packets=$(($packets * 2))
+	done
+
+	size=$((1024**2))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj tx_pkts_burst $size \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/tx_pkts_burst
+		size=$(($size * 2))
+	done
+
+	packets=1000
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj rx_pkts $packets \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/rx_pkts
+		packets=$(($packets * 2))
+	done
+
+	size=$((1024**2))
+	for r_obj in $leafs
+	do
+		rate_attr_police_rate_check $r_obj rx_pkts_burst $size \
+			$DEBUGFS_DIR/ports/${r_obj##*/}/rx_pkts_burst
+		size=$(($size * 2))
+	done
+
+	local node1_name='group1'
+	local node1="$DL_HANDLE/$node1_name"
+	rate_node_add "$node1" police
+	check_err $? "Failed to add node $node1"
+
+	local num_nodes=`rate_nodes_get $DL_HANDLE | wc -w`
+	[ $num_nodes == 1 ]
+	check_err $? "Expected 1 rate node in output but got $num_nodes"
+
+	rate_attr_police_rate_check $node1 tx_max $((200 * 1000**2 * 8)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_max
+
+	rate_attr_police_rate_check $node1 tx_burst $((2 * 1024**2)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_burst
+
+	rate_attr_police_rate_check $node1 rx_max $((300 * 1000**2 * 8)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/rx_max
+
+	rate_attr_police_rate_check $node1 rx_burst $((3 * 1024**2)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/rx_burst
+
+	rate_attr_police_rate_check $node1 tx_pkts 4000 \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_pkts
+
+	rate_attr_police_rate_check $node1 tx_pkts_burst $((4 * 1024**2)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_pkts_burst
+
+	rate_attr_police_rate_check $node1 rx_pkts 5000 \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/rx_pkts
+
+	rate_attr_police_rate_check $node1 rx_pkts_burst $((5 * 1024**2)) \
+		$DEBUGFS_DIR/rate_nodes/${node1##*/}/rx_pkts_burst
+
+	rate_node_del "$node1"
+	check_err $? "Failed to delete node $node1"
+	num_nodes=`rate_nodes_get $DL_HANDLE | wc -w`
+	[ $num_nodes == 0 ]
+	check_err $? "Expected 0 rate node but got $num_nodes"
+
+	rate_node_add "$node1" police
+	check_err $? "Failed to add node $node1"
+
+	rate_attr_parent_check $r_obj $node1_name \
+		$DEBUGFS_DIR/ports/${r_obj##*/}/rate_parent
+
+	local node2_name='group2'
+	local node2="$DL_HANDLE/$node2_name"
+	rate_node_add "$node2" police
+	check_err $? "Failed to add node $node2"
+
+	rate_attr_parent_check $node2 $node1_name \
+		$DEBUGFS_DIR/rate_nodes/$node2_name/rate_parent
+	rate_node_del "$node2"
+	check_err $? "Failed to delete node $node2"
+	rate_attr_set "$r_obj" noparent
+	check_err $? "Failed to unset $r_obj parent node"
+	rate_node_del "$node1"
+	check_err $? "Failed to delete node $node1"
+}
+
+rate_test()
+{
+	RET=0
+
+	echo $VF_COUNT > /sys/bus/netdevsim/devices/$DEV_NAME/sriov_numvfs
+	devlink dev eswitch set $DL_HANDLE mode switchdev
+	local leafs=`rate_leafs_get $DL_HANDLE`
+	local num_leafs=`echo $leafs | wc -w`
+	[ "$num_leafs" == "$VF_COUNT" ]
+	check_err $? "Expected $VF_COUNT rate leafs but got $num_leafs"
+
+	rate_shaping_test "$leafs"
+	if devlink port function rate help |& grep -qF 'limit_type police' ; then
+		rate_police_test "$leafs"
+	fi
 
 	log_test "rate test"
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH net-next 5/5] Documentation: devlink rate objects limit_type
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (3 preceding siblings ...)
  2022-06-20 15:26 ` [PATCH net-next 4/5] selftest: netdevsim: Add devlink rate police sub-test Dima Chumak
@ 2022-06-20 15:26 ` Dima Chumak
  2022-06-20 15:35 ` [PATCH iproute2-next 1/5] uapi: devlink.h DEVLINK_ATTR_RATE_LIMIT_TYPE Dima Chumak
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Dima Chumak

Add devlink rate limit_type attribute information in devlink port
documentation.
Add devlink rate 'limit_type police' attributes in netdevsim devlink
documentation.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 .../networking/devlink/devlink-port.rst       | 44 +++++++++++++++++--
 .../networking/devlink/netdevsim.rst          |  3 +-
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 7627b1da01f2..c2cd97a4ec4f 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -184,20 +184,58 @@ This is done through rate objects, which can be one of the two types:
 
 API allows to configure following rate object's parameters:
 
+``limit_type``
+  Type of rate limiting performed by a rate object. Supported types are
+  ``shaping`` and ``police``. Shaping type is a form of back pressure mechanism
+  that can delay traffic until there is a capacity, available at the lower
+  level, to process it. Police type is a simple packet counting and immediate
+  dropping of those packets that exceed the threshold. Some of the parameters
+  may be specific only to one of the limit types.
+
 ``tx_share``
   Minimum TX rate value shared among all other rate objects, or rate objects
-  that parts of the parent group, if it is a part of the same group.
+  that parts of the parent group, if it is a part of the same group. Specific to
+  ``shaping`` limit type.
 
 ``tx_max``
   Maximum TX rate value.
 
+``tx_burst``
+  Size of a bucket that's used to buffer spikes when traffic exceeds ``tx_max``
+  limit. Specific to ``police`` limit type.
+
+``rx_max``
+  Maximum RX rate value.
+
+``rx_burst``
+  Size of a bucket that's used to buffer spikes when traffic exceeds ``rx_max``
+  limit. Specific to ``police`` limit type.
+
+``tx_pkts``
+  Maximum TX rate in packets per second.
+
+``tx_pkts_burst``
+  Size of a bucket that's used to buffer spikes when traffic exceeds ``tx_pkts``
+  limit. Specific to ``police`` limit type.
+
+``rx_pkts``
+  Maximum RX rate in packets per second.
+
+``rx_pkts_burst``
+  Size of a bucket that's used to buffer spikes when traffic exceeds ``rx_pkts``
+  limit. Specific to ``police`` limit type.
+
 ``parent``
   Parent node name. Parent node rate limits are considered as additional limits
   to all node children limits. ``tx_max`` is an upper limit for children.
-  ``tx_share`` is a total bandwidth distributed among children.
+  ``tx_share`` is a total bandwidth distributed among children. It's important
+  that ``limit_type`` of a child object and the parent node should match. In
+  other words, it's only possible to group rate objects of the same
+  ``limit_type``.
 
 Driver implementations are allowed to support both or either rate object types
-and setting methods of their parameters.
+and setting methods of their parameters. The same holds for limit types, a
+driver implementation may support all or only some of them.
 
 Terms and Definitions
 =====================
diff --git a/Documentation/networking/devlink/netdevsim.rst b/Documentation/networking/devlink/netdevsim.rst
index 8a292fb5aaea..32d3171ff281 100644
--- a/Documentation/networking/devlink/netdevsim.rst
+++ b/Documentation/networking/devlink/netdevsim.rst
@@ -64,7 +64,8 @@ The ``netdevsim`` driver supports rate objects management, which includes:
 
 - registerging/unregistering leaf rate objects per VF devlink port;
 - creation/deletion node rate objects;
-- setting tx_share and tx_max rate values for any rate object type;
+- setting limit_type, tx_share, tx_max, tx_burst, rx_max, rx_burst, tx_pkts,
+  tx_pkts_burst, rx_pkts and rx_pkts_burst rate values for any rate object type;
 - setting parent node for any rate object type.
 
 Rate nodes and it's parameters are exposed in ``netdevsim`` debugfs in RO mode.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH iproute2-next 1/5] uapi: devlink.h DEVLINK_ATTR_RATE_LIMIT_TYPE
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (4 preceding siblings ...)
  2022-06-20 15:26 ` [PATCH net-next 5/5] Documentation: devlink rate objects limit_type Dima Chumak
@ 2022-06-20 15:35 ` Dima Chumak
  2022-06-20 15:35 ` [PATCH iproute2-next 2/5] devlink: Add port rate limit_type support Dima Chumak
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Jakub Kicinski, Jiri Pirko, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Dima Chumak

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 include/uapi/linux/devlink.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index da0f1ba8f7a0..4b2653b1c11c 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -221,6 +221,11 @@ enum devlink_rate_type {
 	DEVLINK_RATE_TYPE_NODE,
 };
 
+enum devlink_rate_limit_type {
+	DEVLINK_RATE_LIMIT_TYPE_UNSET,
+	DEVLINK_RATE_LIMIT_TYPE_SHAPING,
+};
+
 enum devlink_param_cmode {
 	DEVLINK_PARAM_CMODE_RUNTIME,
 	DEVLINK_PARAM_CMODE_DRIVERINIT,
@@ -576,6 +581,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_LINECARD_TYPE,		/* string */
 	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
+	DEVLINK_ATTR_RATE_LIMIT_TYPE,		/* u16 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH iproute2-next 2/5] devlink: Add port rate limit_type support
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (5 preceding siblings ...)
  2022-06-20 15:35 ` [PATCH iproute2-next 1/5] uapi: devlink.h DEVLINK_ATTR_RATE_LIMIT_TYPE Dima Chumak
@ 2022-06-20 15:35 ` Dima Chumak
  2022-06-20 15:35 ` [PATCH iproute2-next 3/5] utils: Add get_size64() Dima Chumak
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Jakub Kicinski, Jiri Pirko, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Dima Chumak

Extend existing `devlink port func rate {set|add}` CLI with a new
parameter 'limit_type'. It specifies differnt kinds of rate limiting
that may be supported by a driver.

The parameter is optional and the only value it can take as of now is
'shaping', which is just a name for the existing rate limiting
mechanism.

It lays a foundation to adding other limit types. Following patches in
the series introduce new limit_type 'police'.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 devlink/devlink.c       | 121 ++++++++++++++++++++++++++++++++++------
 man/man8/devlink-rate.8 |  17 ++++++
 2 files changed, 120 insertions(+), 18 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index ddf430bbb02a..9b234f2a6825 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -294,6 +294,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_PORT_FN_RATE_TX_MAX	BIT(49)
 #define DL_OPT_PORT_FN_RATE_NODE_NAME	BIT(50)
 #define DL_OPT_PORT_FN_RATE_PARENT	BIT(51)
+#define DL_OPT_PORT_FN_RATE_LIMIT_TYPE	BIT(52)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -354,6 +355,7 @@ struct dl_opts {
 	uint64_t rate_tx_max;
 	char *rate_node_name;
 	const char *rate_parent_node;
+	uint16_t rate_limit_type;
 };
 
 struct dl {
@@ -1438,6 +1440,17 @@ static int port_fn_rate_type_get(const char *typestr, uint16_t *type)
 	return 0;
 }
 
+static int port_fn_rate_limit_type_get(const char *ltypestr, uint16_t *ltype)
+{
+	if (!strcmp(ltypestr, "unset"))
+		*ltype = DEVLINK_RATE_LIMIT_TYPE_UNSET;
+	else if (!strcmp(ltypestr, "shaping"))
+		*ltype = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+	else
+		return -EINVAL;
+	return 0;
+}
+
 static int port_fn_rate_value_get(struct dl *dl, uint64_t *rate)
 {
 	const char *ratestr;
@@ -1982,6 +1995,18 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_PORT_FN_RATE_TYPE;
+		} else if (dl_argv_match(dl, "limit_type") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_LIMIT_TYPE)) {
+			const char *ltypestr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &ltypestr);
+			if (err)
+				return err;
+			err = port_fn_rate_limit_type_get(ltypestr, &opts->rate_limit_type);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_LIMIT_TYPE;
 		} else if (dl_argv_match(dl, "tx_share") &&
 			   (o_all & DL_OPT_PORT_FN_RATE_TX_SHARE)) {
 			dl_arg_inc(dl);
@@ -2212,6 +2237,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_PORT_FN_RATE_TYPE)
 		mnl_attr_put_u16(nlh, DEVLINK_ATTR_RATE_TYPE,
 				 opts->rate_type);
+	if (opts->present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE)
+		mnl_attr_put_u16(nlh, DEVLINK_ATTR_RATE_LIMIT_TYPE,
+				 opts->rate_limit_type);
 	if (opts->present & DL_OPT_PORT_FN_RATE_TX_SHARE)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_TX_SHARE,
 				 opts->rate_tx_share);
@@ -4487,8 +4515,21 @@ static char *port_rate_type_name(uint16_t type)
 	}
 }
 
+static char *port_rate_limit_type_name(uint16_t ltype)
+{
+	switch (ltype) {
+	case DEVLINK_RATE_LIMIT_TYPE_UNSET:
+		return "unset";
+	case DEVLINK_RATE_LIMIT_TYPE_SHAPING:
+		return "shaping";
+	default:
+		return "<unknown type>";
+	}
+}
+
 static void pr_out_port_fn_rate(struct dl *dl, struct nlattr **tb)
 {
+	uint16_t ltype = DEVLINK_RATE_LIMIT_TYPE_UNSET;
 
 	if (!tb[DEVLINK_ATTR_RATE_NODE_NAME])
 		pr_out_port_handle_start(dl, tb, false);
@@ -4503,7 +4544,14 @@ static void pr_out_port_fn_rate(struct dl *dl, struct nlattr **tb)
 		print_string(PRINT_ANY, "type", "type %s",
 				port_rate_type_name(type));
 	}
-	if (tb[DEVLINK_ATTR_RATE_TX_SHARE]) {
+	if (tb[DEVLINK_ATTR_RATE_LIMIT_TYPE]) {
+		ltype = mnl_attr_get_u16(tb[DEVLINK_ATTR_RATE_LIMIT_TYPE]);
+
+		print_string(PRINT_ANY, "limit_type", " limit_type %s",
+				port_rate_limit_type_name(ltype));
+	}
+	if (tb[DEVLINK_ATTR_RATE_TX_SHARE] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_SHAPING) {
 		uint64_t rate =
 			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_SHARE]);
 
@@ -4550,10 +4598,10 @@ static void cmd_port_fn_rate_help(void)
 	pr_err("Usage: devlink port function rate help\n");
 	pr_err("       devlink port function rate show [ DEV/{ PORT_INDEX | NODE_NAME } ]\n");
 	pr_err("       devlink port function rate add DEV/NODE_NAME\n");
-	pr_err("               [ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n");
+	pr_err("               [ limit_type shaping ][ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n");
 	pr_err("       devlink port function rate del DEV/NODE_NAME\n");
 	pr_err("       devlink port function rate set DEV/{ PORT_INDEX | NODE_NAME }\n");
-	pr_err("               [ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n\n");
+	pr_err("               [ limit_type shaping ][ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n\n");
 	pr_err("       VAL - float or integer value in units of bits or bytes per second (bit|bps)\n");
 	pr_err("       and SI (k-, m-, g-, t-) or IEC (ki-, mi-, gi-, ti-) case-insensitive prefix.\n");
 	pr_err("       Bare number, means bits per second, is possible.\n\n");
@@ -4604,18 +4652,22 @@ static int port_fn_get_and_check_tx_rates(struct dl_opts *reply,
 	return port_fn_check_tx_rates(min, request->rate_tx_max);
 }
 
-static int cmd_port_fn_rate_add(struct dl *dl)
+static int port_rate_shaping_add(struct dl *dl)
 {
 	struct nlmsghdr *nlh;
 	int err;
 
+	if ((dl->opts.present & DL_OPT_PORT_FN_RATE_TX_SHARE) &&
+	    (dl->opts.present & DL_OPT_PORT_FN_RATE_TX_MAX)) {
+		err = port_fn_check_tx_rates(dl->opts.rate_tx_share,
+					     dl->opts.rate_tx_max);
+		if (err)
+			return err;
+	}
+
 	nlh = mnlu_gen_socket_cmd_prepare(&dl->nlg, DEVLINK_CMD_RATE_NEW,
 					  NLM_F_REQUEST | NLM_F_ACK);
-	err = dl_argv_parse_put(nlh, dl, DL_OPT_PORT_FN_RATE_NODE_NAME,
-				DL_OPT_PORT_FN_RATE_TX_SHARE |
-				DL_OPT_PORT_FN_RATE_TX_MAX);
-	if (err)
-		return err;
+	dl_opts_put(nlh, dl);
 
 	if ((dl->opts.present & DL_OPT_PORT_FN_RATE_TX_SHARE) &&
 	    (dl->opts.present & DL_OPT_PORT_FN_RATE_TX_MAX)) {
@@ -4628,6 +4680,27 @@ static int cmd_port_fn_rate_add(struct dl *dl)
 	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
 }
 
+#define RATE_SHAPING_OPTS	(DL_OPT_PORT_FN_RATE_TX_SHARE)
+
+static int cmd_port_fn_rate_add(struct dl *dl)
+{
+	int err;
+
+	err = dl_argv_parse(dl, DL_OPT_PORT_FN_RATE_NODE_NAME,
+			    DL_OPT_PORT_FN_RATE_LIMIT_TYPE | DL_OPT_PORT_FN_RATE_TX_MAX |
+			    RATE_SHAPING_OPTS);
+	if (err)
+		return err;
+
+	if (!(dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE)) {
+		dl->opts.rate_limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+		dl->opts.present |= DL_OPT_PORT_FN_RATE_LIMIT_TYPE;
+	}
+
+
+	return port_rate_shaping_add(dl);
+}
+
 static int cmd_port_fn_rate_del(struct dl *dl)
 {
 	struct nlmsghdr *nlh;
@@ -4664,20 +4737,12 @@ static int port_fn_get_rates_cb(const struct nlmsghdr *nlh, void *data)
 	return MNL_CB_OK;
 }
 
-static int cmd_port_fn_rate_set(struct dl *dl)
+static int port_rate_shaping_set(struct dl *dl)
 {
 	struct dl_opts tmp_opts = {0};
 	struct nlmsghdr *nlh;
 	int err;
 
-	err = dl_argv_parse(dl, DL_OPT_HANDLEP |
-				DL_OPT_PORT_FN_RATE_NODE_NAME,
-				DL_OPT_PORT_FN_RATE_TX_SHARE |
-				DL_OPT_PORT_FN_RATE_TX_MAX |
-				DL_OPT_PORT_FN_RATE_PARENT);
-	if (err)
-		return err;
-
 	if ((dl->opts.present & DL_OPT_PORT_FN_RATE_TX_SHARE) &&
 	    (dl->opts.present & DL_OPT_PORT_FN_RATE_TX_MAX)) {
 		err = port_fn_check_tx_rates(dl->opts.rate_tx_share,
@@ -4709,6 +4774,26 @@ static int cmd_port_fn_rate_set(struct dl *dl)
 	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
 }
 
+static int cmd_port_fn_rate_set(struct dl *dl)
+{
+	int err;
+
+	err = dl_argv_parse(dl, DL_OPT_HANDLEP | DL_OPT_PORT_FN_RATE_NODE_NAME,
+			    DL_OPT_PORT_FN_RATE_LIMIT_TYPE | DL_OPT_PORT_FN_RATE_TX_MAX |
+			    RATE_SHAPING_OPTS | DL_OPT_PORT_FN_RATE_PARENT);
+	if (err)
+		return err;
+
+	if (!(dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE) &&
+	    !(dl->opts.present & DL_OPT_PORT_FN_RATE_PARENT)) {
+		dl->opts.rate_limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+		dl->opts.present |= DL_OPT_PORT_FN_RATE_LIMIT_TYPE;
+	}
+
+
+	return port_rate_shaping_set(dl);
+}
+
 static int cmd_port_function_rate(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
diff --git a/man/man8/devlink-rate.8 b/man/man8/devlink-rate.8
index cc2f50c38619..6b7b179a8696 100644
--- a/man/man8/devlink-rate.8
+++ b/man/man8/devlink-rate.8
@@ -24,12 +24,16 @@ devlink-rate \- devlink rate management
 .ti -8
 .B devlink port function rate set
 .RI "{ " DEV/PORT_INDEX " | " DEV/NODE_NAME " } "
+.RB "{"
+.RB [ " limit_type \fIshaping " ]
 .RB [ " tx_share \fIVALUE " ]
 .RB [ " tx_max \fIVALUE " ]
 .RB "[ {" " parent \fINODE_NAME " | " noparent " "} ]"
 
 .ti -8
 .BI "devlink port function rate add " DEV/NODE_NAME
+.RB "{"
+.RB [ " limit_type \fIshaping " ]
 .RB [ " tx_share \fIVALUE " ]
 .RB [ " tx_max \fIVALUE " ]
 .RB "[ {" " parent \fINODE_NAME " | " noparent " "} ]"
@@ -76,6 +80,19 @@ the last occurrence is used.
 .I DEV/NODE_NAME
 - specifies devlink node rate object.
 .PP
+.BR limit_type " \fIshaping "
+- specifies a kind of rate limiting. The parameter is optional and, if omitted,
+\fIshaping\fR limit type is assumed by default. Each limit type has its own set
+of supported attributes. Some limit types may not be supported by a particular
+driver's implementation. At a high level, \fBlimit_type\fR definition is:
+.PP
+.I shaping
+- limiting traffic rate by using a back pressure mechanism, that can delay
+traffic until there is a capacity, available at the lower level, to process it.
+This type of rate limiting doesn't require packets to be dropped in order to
+ensure the requested rate, on the other hand it may suffer from excessive delays
+and it cannot be applied to inbound traffic.
+.PP
 .BI tx_share " VALUE"
 - specifies minimal tx rate value shared among all rate objects. If rate object
 is a part of some rate group, then this value shared with rate objects of this
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH iproute2-next 3/5] utils: Add get_size64()
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (6 preceding siblings ...)
  2022-06-20 15:35 ` [PATCH iproute2-next 2/5] devlink: Add port rate limit_type support Dima Chumak
@ 2022-06-20 15:35 ` Dima Chumak
  2022-06-20 15:35 ` [PATCH iproute2-next 4/5] uapi: devlink.h DEVLINK_RATE_LIMIT_TYPE_POLICE Dima Chumak
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Jakub Kicinski, Jiri Pirko, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Dima Chumak

Introduce a 64-bit version of the existing 32-bit get_size() API.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 include/utils.h  |  1 +
 lib/utils_math.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/utils.h b/include/utils.h
index 9765fdd231df..82007ec1057a 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -158,6 +158,7 @@ int get_addr64(__u64 *ap, const char *cp);
 int get_rate(unsigned int *rate, const char *str);
 int get_rate64(__u64 *rate, const char *str);
 int get_size(unsigned int *size, const char *str);
+int get_size64(__u64 *size, const char *str);
 
 int hex2mem(const char *buf, uint8_t *mem, int count);
 char *hexstring_n2a(const __u8 *str, int len, char *buf, int blen);
diff --git a/lib/utils_math.c b/lib/utils_math.c
index 9ef3dd6ed93b..903404b0f93c 100644
--- a/lib/utils_math.c
+++ b/lib/utils_math.c
@@ -121,3 +121,33 @@ int get_size(unsigned int *size, const char *str)
 
 	return 0;
 }
+
+int get_size64(__u64 *size, const char *str)
+{
+	double sz;
+	char *p;
+
+	sz = strtod(str, &p);
+	if (p == str)
+		return -1;
+
+	if (*p) {
+		if (strcasecmp(p, "kb") == 0 || strcasecmp(p, "k") == 0)
+			sz *= 1024;
+		else if (strcasecmp(p, "gb") == 0 || strcasecmp(p, "g") == 0)
+			sz *= 1024*1024*1024;
+		else if (strcasecmp(p, "gbit") == 0)
+			sz *= 1024*1024*1024/8;
+		else if (strcasecmp(p, "mb") == 0 || strcasecmp(p, "m") == 0)
+			sz *= 1024*1024;
+		else if (strcasecmp(p, "mbit") == 0)
+			sz *= 1024*1024/8;
+		else if (strcasecmp(p, "kbit") == 0)
+			sz *= 1024/8;
+		else if (strcasecmp(p, "b") != 0)
+			return -1;
+	}
+
+	*size = sz;
+	return 0;
+}
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH iproute2-next 4/5] uapi: devlink.h DEVLINK_RATE_LIMIT_TYPE_POLICE
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (7 preceding siblings ...)
  2022-06-20 15:35 ` [PATCH iproute2-next 3/5] utils: Add get_size64() Dima Chumak
@ 2022-06-20 15:35 ` Dima Chumak
  2022-06-20 15:35 ` [PATCH iproute2-next 5/5] devlink: Introduce port rate limit_type police Dima Chumak
  2022-06-20 20:04 ` [PATCH net-next 0/5] devlink rate police limiter Jakub Kicinski
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Jakub Kicinski, Jiri Pirko, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Dima Chumak

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 include/uapi/linux/devlink.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 4b2653b1c11c..b9d22448c563 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -224,6 +224,7 @@ enum devlink_rate_type {
 enum devlink_rate_limit_type {
 	DEVLINK_RATE_LIMIT_TYPE_UNSET,
 	DEVLINK_RATE_LIMIT_TYPE_SHAPING,
+	DEVLINK_RATE_LIMIT_TYPE_POLICE,
 };
 
 enum devlink_param_cmode {
@@ -582,6 +583,13 @@ enum devlink_attr {
 	DEVLINK_ATTR_LINECARD_SUPPORTED_TYPES,	/* nested */
 
 	DEVLINK_ATTR_RATE_LIMIT_TYPE,		/* u16 */
+	DEVLINK_ATTR_RATE_TX_BURST,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_MAX,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_BURST,		/* u64 */
+	DEVLINK_ATTR_RATE_TX_PKTS,		/* u64 */
+	DEVLINK_ATTR_RATE_TX_PKTS_BURST,	/* u64 */
+	DEVLINK_ATTR_RATE_RX_PKTS,		/* u64 */
+	DEVLINK_ATTR_RATE_RX_PKTS_BURST,	/* u64 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH iproute2-next 5/5] devlink: Introduce port rate limit_type police
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (8 preceding siblings ...)
  2022-06-20 15:35 ` [PATCH iproute2-next 4/5] uapi: devlink.h DEVLINK_RATE_LIMIT_TYPE_POLICE Dima Chumak
@ 2022-06-20 15:35 ` Dima Chumak
  2022-06-20 20:04 ` [PATCH net-next 0/5] devlink rate police limiter Jakub Kicinski
  10 siblings, 0 replies; 26+ messages in thread
From: Dima Chumak @ 2022-06-20 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Jakub Kicinski, Jiri Pirko, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Dima Chumak

Add a new set of `devlink port func rate {set|add}` CLI parameters that
are supported with DEVLINK_RATE_LIMIT_TYPE_POLICE kernel API. They
require explicit use of 'limit_type police', for example:

  $ devlink port func rate add pci/0000:03:00.0/g1 \
      limit_type police tx_max 10GBps tx_burst 1gb \
                        rx_max 25GBps rx_burst 2gb \
                        tx_pkts 10000 tx_pkts_burst 1gb \
                        rx_pkts 20000 rx_pkts_burst 2gb

  $ devlink port func rate set pci/0000:03:00.0/1 \
      limit_type police tx_max 2GBps rx_burst 256mb \
                        rx_max 8GBps rx_burst 512mb \
                        parent g1

  $ devlink port func rate set pci/0000:03:00.0/2 parent g1

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
---
 devlink/devlink.c       | 315 +++++++++++++++++++++++++++++++++++++++-
 man/man8/devlink-rate.8 |  92 +++++++++++-
 2 files changed, 401 insertions(+), 6 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 9b234f2a6825..8eea45dad285 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -295,6 +295,13 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_PORT_FN_RATE_NODE_NAME	BIT(50)
 #define DL_OPT_PORT_FN_RATE_PARENT	BIT(51)
 #define DL_OPT_PORT_FN_RATE_LIMIT_TYPE	BIT(52)
+#define DL_OPT_PORT_FN_RATE_TX_BURST	BIT(53)
+#define DL_OPT_PORT_FN_RATE_RX_MAX	BIT(54)
+#define DL_OPT_PORT_FN_RATE_RX_BURST	BIT(55)
+#define DL_OPT_PORT_FN_RATE_TX_PKTS	BIT(56)
+#define DL_OPT_PORT_FN_RATE_TX_PKTS_BURST	BIT(57)
+#define DL_OPT_PORT_FN_RATE_RX_PKTS	BIT(58)
+#define DL_OPT_PORT_FN_RATE_RX_PKTS_BURST	BIT(59)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -356,6 +363,13 @@ struct dl_opts {
 	char *rate_node_name;
 	const char *rate_parent_node;
 	uint16_t rate_limit_type;
+	uint64_t rate_tx_burst;
+	uint64_t rate_rx_max;
+	uint64_t rate_rx_burst;
+	uint64_t rate_tx_pkts;
+	uint64_t rate_tx_pkts_burst;
+	uint64_t rate_rx_pkts;
+	uint64_t rate_rx_pkts_burst;
 };
 
 struct dl {
@@ -1446,6 +1460,8 @@ static int port_fn_rate_limit_type_get(const char *ltypestr, uint16_t *ltype)
 		*ltype = DEVLINK_RATE_LIMIT_TYPE_UNSET;
 	else if (!strcmp(ltypestr, "shaping"))
 		*ltype = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
+	else if (!strcmp(ltypestr, "police"))
+		*ltype = DEVLINK_RATE_LIMIT_TYPE_POLICE;
 	else
 		return -EINVAL;
 	return 0;
@@ -1470,6 +1486,44 @@ static int port_fn_rate_value_get(struct dl *dl, uint64_t *rate)
 	return 0;
 }
 
+static int port_fn_rate_size_get(struct dl *dl, uint64_t *size)
+{
+	const char *sizestr;
+	__u64 size64;
+	int err;
+
+	err = dl_argv_str(dl, &sizestr);
+	if (err)
+		return err;
+	err = get_size64(&size64, sizestr);
+	if (err) {
+		pr_err("Invalid burst buffer size value: \"%s\"\n", sizestr);
+		return -EINVAL;
+	}
+
+	*size = size64;
+	return 0;
+}
+
+static int port_fn_rate_pkts_get(struct dl *dl, uint64_t *pkts)
+{
+	const char *pktsstr;
+	__u64 pkts64;
+	int err;
+
+	err = dl_argv_str(dl, &pktsstr);
+	if (err)
+		return err;
+	err = get_size64(&pkts64, pktsstr);
+	if (err) {
+		pr_err("Invalid pkts value: \"%s\"\n", pktsstr);
+		return -EINVAL;
+	}
+
+	*pkts = pkts64;
+	return 0;
+}
+
 struct dl_args_metadata {
 	uint64_t o_flag;
 	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
@@ -2021,6 +2075,55 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_PORT_FN_RATE_TX_MAX;
+		} else if (dl_argv_match(dl, "tx_burst") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_TX_BURST)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_size_get(dl, &opts->rate_tx_burst);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_TX_BURST;
+		} else if (dl_argv_match(dl, "rx_max") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_RX_MAX)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_value_get(dl, &opts->rate_rx_max);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_RX_MAX;
+		} else if (dl_argv_match(dl, "rx_burst") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_RX_BURST)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_size_get(dl, &opts->rate_rx_burst);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_RX_BURST;
+		} else if (dl_argv_match(dl, "tx_pkts") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_TX_PKTS)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_pkts_get(dl, &opts->rate_tx_pkts);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_TX_PKTS;
+		} else if (dl_argv_match(dl, "tx_pkts_burst") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_TX_PKTS_BURST)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_size_get(dl, &opts->rate_tx_pkts_burst);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_TX_PKTS_BURST;
+		} else if (dl_argv_match(dl, "rx_pkts") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_RX_PKTS)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_pkts_get(dl, &opts->rate_rx_pkts);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_RX_PKTS;
+		} else if (dl_argv_match(dl, "rx_pkts_burst") &&
+			   (o_all & DL_OPT_PORT_FN_RATE_RX_PKTS_BURST)) {
+			dl_arg_inc(dl);
+			err = port_fn_rate_size_get(dl, &opts->rate_rx_pkts_burst);
+			if (err)
+				return err;
+			o_found |= DL_OPT_PORT_FN_RATE_RX_PKTS_BURST;
 		} else if (dl_argv_match(dl, "parent") &&
 			   (o_all & DL_OPT_PORT_FN_RATE_PARENT)) {
 			dl_arg_inc(dl);
@@ -2246,6 +2349,27 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_PORT_FN_RATE_TX_MAX)
 		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_TX_MAX,
 				 opts->rate_tx_max);
+	if (opts->present & DL_OPT_PORT_FN_RATE_TX_BURST)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_TX_BURST,
+				 opts->rate_tx_burst);
+	if (opts->present & DL_OPT_PORT_FN_RATE_RX_MAX)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_RX_MAX,
+				 opts->rate_rx_max);
+	if (opts->present & DL_OPT_PORT_FN_RATE_RX_BURST)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_RX_BURST,
+				 opts->rate_rx_burst);
+	if (opts->present & DL_OPT_PORT_FN_RATE_TX_PKTS)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_TX_PKTS,
+				 opts->rate_tx_pkts);
+	if (opts->present & DL_OPT_PORT_FN_RATE_TX_PKTS_BURST)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_TX_PKTS_BURST,
+				 opts->rate_tx_pkts_burst);
+	if (opts->present & DL_OPT_PORT_FN_RATE_RX_PKTS)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_RX_PKTS,
+				 opts->rate_rx_pkts);
+	if (opts->present & DL_OPT_PORT_FN_RATE_RX_PKTS_BURST)
+		mnl_attr_put_u64(nlh, DEVLINK_ATTR_RATE_RX_PKTS_BURST,
+				 opts->rate_rx_pkts_burst);
 	if (opts->present & DL_OPT_PORT_FN_RATE_PARENT)
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
 				  opts->rate_parent_node);
@@ -4522,11 +4646,43 @@ static char *port_rate_limit_type_name(uint16_t ltype)
 		return "unset";
 	case DEVLINK_RATE_LIMIT_TYPE_SHAPING:
 		return "shaping";
+	case DEVLINK_RATE_LIMIT_TYPE_POLICE:
+		return "police";
 	default:
 		return "<unknown type>";
 	}
 }
 
+static char *port_rate_opt_name(enum devlink_rate_limit_type ltype, uint64_t present)
+{
+	switch (ltype) {
+	case DEVLINK_RATE_LIMIT_TYPE_SHAPING:
+		if (present & DL_OPT_PORT_FN_RATE_TX_SHARE)
+			return "tx_share";
+		if (present & DL_OPT_PORT_FN_RATE_TX_MAX)
+			return "tx_max";
+	case DEVLINK_RATE_LIMIT_TYPE_POLICE:
+		if (present & DL_OPT_PORT_FN_RATE_TX_MAX)
+			return "tx_max";
+		if (present & DL_OPT_PORT_FN_RATE_TX_BURST)
+			return "tx_burst";
+		if (present & DL_OPT_PORT_FN_RATE_RX_MAX)
+			return "rx_max";
+		if (present & DL_OPT_PORT_FN_RATE_RX_BURST)
+			return "rx_burst";
+		if (present & DL_OPT_PORT_FN_RATE_TX_PKTS)
+			return "tx_pkts";
+		if (present & DL_OPT_PORT_FN_RATE_TX_PKTS_BURST)
+			return "tx_pkts_burst";
+		if (present & DL_OPT_PORT_FN_RATE_RX_PKTS)
+			return "rx_pkts";
+		if (present & DL_OPT_PORT_FN_RATE_RX_PKTS_BURST)
+			return "rx_pkts_burst";
+	default:
+		return "";
+	}
+}
+
 static void pr_out_port_fn_rate(struct dl *dl, struct nlattr **tb)
 {
 	uint16_t ltype = DEVLINK_RATE_LIMIT_TYPE_UNSET;
@@ -4567,6 +4723,69 @@ static void pr_out_port_fn_rate(struct dl *dl, struct nlattr **tb)
 			print_rate(use_iec, PRINT_ANY, "tx_max",
 				   " tx_max %s", rate);
 	}
+	if (tb[DEVLINK_ATTR_RATE_TX_BURST] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t size =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_BURST]);
+
+		if (size)
+			print_rate(use_iec, PRINT_ANY, "tx_burst",
+				   " tx_burst %s", size);
+	}
+	if (tb[DEVLINK_ATTR_RATE_RX_MAX] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t rate =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_MAX]);
+
+		if (rate)
+			print_rate(use_iec, PRINT_ANY, "rx_max",
+				   " rx_max %s", rate);
+	}
+	if (tb[DEVLINK_ATTR_RATE_RX_BURST] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t size =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_BURST]);
+
+		if (size)
+			print_rate(use_iec, PRINT_ANY, "rx_burst",
+				   " rx_burst %s", size);
+	}
+	if (tb[DEVLINK_ATTR_RATE_TX_PKTS] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t rate =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_PKTS]);
+
+		if (rate)
+			print_rate(use_iec, PRINT_ANY, "tx_pkts",
+				   " tx_pkts %s", rate);
+	}
+	if (tb[DEVLINK_ATTR_RATE_TX_PKTS_BURST] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t size =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_PKTS_BURST]);
+
+		if (size)
+			print_rate(use_iec, PRINT_ANY, "tx_pkts_burst",
+				   " tx_pkts_burst %s", size);
+	}
+	if (tb[DEVLINK_ATTR_RATE_RX_PKTS] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t rate =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_PKTS]);
+
+		if (rate)
+			print_rate(use_iec, PRINT_ANY, "rx_pkts",
+				   " rx_pkts %s", rate);
+	}
+	if (tb[DEVLINK_ATTR_RATE_RX_PKTS_BURST] &&
+	    ltype == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		uint64_t size =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_PKTS_BURST]);
+
+		if (size)
+			print_rate(use_iec, PRINT_ANY, "rx_pkts_burst",
+				   " rx_pkts_burst %s", size);
+	}
 	if (tb[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]) {
 		const char *parent =
 			mnl_attr_get_str(tb[DEVLINK_ATTR_RATE_PARENT_NODE_NAME]);
@@ -4599,9 +4818,17 @@ static void cmd_port_fn_rate_help(void)
 	pr_err("       devlink port function rate show [ DEV/{ PORT_INDEX | NODE_NAME } ]\n");
 	pr_err("       devlink port function rate add DEV/NODE_NAME\n");
 	pr_err("               [ limit_type shaping ][ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n");
+	pr_err("       devlink port function rate add DEV/NODE_NAME\n");
+	pr_err("               limit_type police [ tx_max VAL [ tx_burst VAL ]][ rx_max VAL [ rx_burst VAL ]]\n");
+	pr_err("                                 [ tx_pkts VAL [ tx_pkts_burst VAL ]][ rx_pkts VAL [ rx_pkts_burst VAL ]]\n");
+	pr_err("                                 [ { parent NODE_NAME | noparent } ]\n");
 	pr_err("       devlink port function rate del DEV/NODE_NAME\n");
 	pr_err("       devlink port function rate set DEV/{ PORT_INDEX | NODE_NAME }\n");
-	pr_err("               [ limit_type shaping ][ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n\n");
+	pr_err("               [ limit_type shaping ][ tx_share VAL ][ tx_max VAL ][ { parent NODE_NAME | noparent } ]\n");
+	pr_err("       devlink port function rate set DEV/{ PORT_INDEX | NODE_NAME }\n");
+	pr_err("               limit_type police [ tx_max VAL [ tx_burst VAL ]][ rx_max VAL [ rx_burst VAL ]]\n");
+	pr_err("                                 [ tx_pkts VAL [ tx_pkts_burst VAL ]][ rx_pkts VAL [ rx_pkts_burst VAL ]]\n");
+	pr_err("                                 [ { parent NODE_NAME | noparent } ]\n\n");
 	pr_err("       VAL - float or integer value in units of bits or bytes per second (bit|bps)\n");
 	pr_err("       and SI (k-, m-, g-, t-) or IEC (ki-, mi-, gi-, ti-) case-insensitive prefix.\n");
 	pr_err("       Bare number, means bits per second, is possible.\n\n");
@@ -4680,7 +4907,24 @@ static int port_rate_shaping_add(struct dl *dl)
 	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
 }
 
+static int port_rate_police_add(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+
+	nlh = mnlu_gen_socket_cmd_prepare(&dl->nlg, DEVLINK_CMD_RATE_NEW,
+					  NLM_F_REQUEST | NLM_F_ACK);
+	dl_opts_put(nlh, dl);
+	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
+}
+
 #define RATE_SHAPING_OPTS	(DL_OPT_PORT_FN_RATE_TX_SHARE)
+#define RATE_POLICE_OPTS	(DL_OPT_PORT_FN_RATE_TX_BURST \
+				 | DL_OPT_PORT_FN_RATE_RX_MAX \
+				 | DL_OPT_PORT_FN_RATE_RX_BURST \
+				 | DL_OPT_PORT_FN_RATE_TX_PKTS \
+				 | DL_OPT_PORT_FN_RATE_TX_PKTS_BURST \
+				 | DL_OPT_PORT_FN_RATE_RX_PKTS \
+				 | DL_OPT_PORT_FN_RATE_RX_PKTS_BURST)
 
 static int cmd_port_fn_rate_add(struct dl *dl)
 {
@@ -4688,15 +4932,32 @@ static int cmd_port_fn_rate_add(struct dl *dl)
 
 	err = dl_argv_parse(dl, DL_OPT_PORT_FN_RATE_NODE_NAME,
 			    DL_OPT_PORT_FN_RATE_LIMIT_TYPE | DL_OPT_PORT_FN_RATE_TX_MAX |
-			    RATE_SHAPING_OPTS);
+			    RATE_SHAPING_OPTS | RATE_POLICE_OPTS);
 	if (err)
 		return err;
 
+	if ((dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE) &&
+	    dl->opts.rate_limit_type == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		if (dl->opts.present & RATE_SHAPING_OPTS) {
+			pr_err("Unsupported option \"%s\" for limit_type \"%s\"\n",
+			       port_rate_opt_name(dl->opts.rate_limit_type, dl->opts.present),
+			       port_rate_limit_type_name(dl->opts.rate_limit_type));
+			return -EINVAL;
+		}
+		return port_rate_police_add(dl);
+	}
+
 	if (!(dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE)) {
 		dl->opts.rate_limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
 		dl->opts.present |= DL_OPT_PORT_FN_RATE_LIMIT_TYPE;
 	}
 
+	if (dl->opts.present & RATE_POLICE_OPTS) {
+		pr_err("Unsupported option \"%s\" for limit_type \"%s\"\n",
+			port_rate_opt_name(dl->opts.rate_limit_type, dl->opts.present),
+			port_rate_limit_type_name(dl->opts.rate_limit_type));
+		return -EINVAL;
+	}
 
 	return port_rate_shaping_add(dl);
 }
@@ -4734,6 +4995,27 @@ static int port_fn_get_rates_cb(const struct nlmsghdr *nlh, void *data)
 	if (tb[DEVLINK_ATTR_RATE_TX_MAX])
 		opts->rate_tx_max =
 			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_MAX]);
+	if (tb[DEVLINK_ATTR_RATE_TX_BURST])
+		opts->rate_tx_burst =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_BURST]);
+	if (tb[DEVLINK_ATTR_RATE_RX_MAX])
+		opts->rate_rx_max =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_MAX]);
+	if (tb[DEVLINK_ATTR_RATE_RX_BURST])
+		opts->rate_rx_burst =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_BURST]);
+	if (tb[DEVLINK_ATTR_RATE_TX_PKTS])
+		opts->rate_tx_pkts =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_PKTS]);
+	if (tb[DEVLINK_ATTR_RATE_TX_PKTS_BURST])
+		opts->rate_tx_pkts_burst =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_TX_PKTS_BURST]);
+	if (tb[DEVLINK_ATTR_RATE_RX_PKTS])
+		opts->rate_rx_pkts =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_PKTS]);
+	if (tb[DEVLINK_ATTR_RATE_RX_PKTS_BURST])
+		opts->rate_rx_pkts_burst =
+			mnl_attr_get_u64(tb[DEVLINK_ATTR_RATE_RX_PKTS_BURST]);
 	return MNL_CB_OK;
 }
 
@@ -4774,22 +5056,49 @@ static int port_rate_shaping_set(struct dl *dl)
 	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
 }
 
+static int port_rate_police_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+
+	nlh = mnlu_gen_socket_cmd_prepare(&dl->nlg, DEVLINK_CMD_RATE_SET,
+					  NLM_F_REQUEST | NLM_F_ACK);
+	dl_opts_put(nlh, dl);
+	return mnlu_gen_socket_sndrcv(&dl->nlg, nlh, NULL, NULL);
+}
+
 static int cmd_port_fn_rate_set(struct dl *dl)
 {
 	int err;
 
 	err = dl_argv_parse(dl, DL_OPT_HANDLEP | DL_OPT_PORT_FN_RATE_NODE_NAME,
 			    DL_OPT_PORT_FN_RATE_LIMIT_TYPE | DL_OPT_PORT_FN_RATE_TX_MAX |
-			    RATE_SHAPING_OPTS | DL_OPT_PORT_FN_RATE_PARENT);
+			    RATE_SHAPING_OPTS | RATE_POLICE_OPTS | DL_OPT_PORT_FN_RATE_PARENT);
 	if (err)
 		return err;
 
+	if ((dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE) &&
+	    dl->opts.rate_limit_type == DEVLINK_RATE_LIMIT_TYPE_POLICE) {
+		if (dl->opts.present & RATE_SHAPING_OPTS) {
+			pr_err("Unsupported option \"%s\" for limit_type \"%s\"\n",
+			       port_rate_opt_name(dl->opts.rate_limit_type, dl->opts.present),
+			       port_rate_limit_type_name(dl->opts.rate_limit_type));
+			return -EINVAL;
+		}
+		return port_rate_police_set(dl);
+	}
+
 	if (!(dl->opts.present & DL_OPT_PORT_FN_RATE_LIMIT_TYPE) &&
 	    !(dl->opts.present & DL_OPT_PORT_FN_RATE_PARENT)) {
 		dl->opts.rate_limit_type = DEVLINK_RATE_LIMIT_TYPE_SHAPING;
 		dl->opts.present |= DL_OPT_PORT_FN_RATE_LIMIT_TYPE;
 	}
 
+	if (dl->opts.present & RATE_POLICE_OPTS) {
+		pr_err("Unsupported option \"%s\" for limit_type \"%s\"\n",
+			port_rate_opt_name(dl->opts.rate_limit_type, dl->opts.present),
+			port_rate_limit_type_name(dl->opts.rate_limit_type));
+		return -EINVAL;
+	}
 
 	return port_rate_shaping_set(dl);
 }
diff --git a/man/man8/devlink-rate.8 b/man/man8/devlink-rate.8
index 6b7b179a8696..56907590cd9a 100644
--- a/man/man8/devlink-rate.8
+++ b/man/man8/devlink-rate.8
@@ -28,6 +28,12 @@ devlink-rate \- devlink rate management
 .RB [ " limit_type \fIshaping " ]
 .RB [ " tx_share \fIVALUE " ]
 .RB [ " tx_max \fIVALUE " ]
+.RB | " limit_type \fIpolice "
+.RB [ " tx_max \fIVALUE " [ " tx_burst \fIVALUE " ] " " ]
+.RB [ " rx_max \fIVALUE " [ " rx_burst \fIVALUE " ] " " ]
+.RB [ " tx_pkts \fIVALUE " [ " tx_pkts_burst \fIVALUE " ] " " ]
+.RB [ " rx_pkts \fIVALUE " [ " rx_pkts_burst \fIVALUE " ] " " ]
+.RB "}"
 .RB "[ {" " parent \fINODE_NAME " | " noparent " "} ]"
 
 .ti -8
@@ -36,6 +42,12 @@ devlink-rate \- devlink rate management
 .RB [ " limit_type \fIshaping " ]
 .RB [ " tx_share \fIVALUE " ]
 .RB [ " tx_max \fIVALUE " ]
+.RB | " limit_type \fIpolice "
+.RB [ " tx_max \fIVALUE " [ " tx_burst \fIVALUE " ] " " ]
+.RB [ " rx_max \fIVALUE " [ " rx_burst \fIVALUE " ] " " ]
+.RB [ " tx_pkts \fIVALUE " [ " tx_pkts_burst \fIVALUE " ] " " ]
+.RB [ " rx_pkts \fIVALUE " [ " rx_pkts_burst \fIVALUE " ] " " ]
+.RB "}"
 .RB "[ {" " parent \fINODE_NAME " | " noparent " "} ]"
 
 .ti -8
@@ -80,7 +92,7 @@ the last occurrence is used.
 .I DEV/NODE_NAME
 - specifies devlink node rate object.
 .PP
-.BR limit_type " \fIshaping "
+.BR limit_type " {" " \fIshaping " | " \fIpolice " }
 - specifies a kind of rate limiting. The parameter is optional and, if omitted,
 \fIshaping\fR limit type is assumed by default. Each limit type has its own set
 of supported attributes. Some limit types may not be supported by a particular
@@ -93,10 +105,17 @@ This type of rate limiting doesn't require packets to be dropped in order to
 ensure the requested rate, on the other hand it may suffer from excessive delays
 and it cannot be applied to inbound traffic.
 .PP
+.I police
+- limiting traffic rate by dropping excessive packets. This type of rate
+limiting can be applied to both outbound and inbound traffic, and it doesn't
+suffer from delays that might occur with \fIshaping\fR limit type. On the other
+hand, by definition this type of rate limiting may be unacceptable for certain
+applications and workloads that are sensitive to packet loss.
+.PP
 .BI tx_share " VALUE"
 - specifies minimal tx rate value shared among all rate objects. If rate object
 is a part of some rate group, then this value shared with rate objects of this
-rate group.
+rate group. This parameter is specific to \fBlimit_type\fR \fIshaping\fR only.
 .PP
 .BI tx_max " VALUE"
 - specifies maximum tx rate value.
@@ -140,11 +159,72 @@ To specify in IEC units, replace the SI prefix (k-, m-, g-, t-) with IEC prefix
 (ki-, mi-, gi- and ti-) respectively. Input is case-insensitive.
 .RE
 .PP
+.BI tx_burst " VALUE"
+- specifies size of a bucket that's used to buffer spikes when traffic exceeds
+\fBtx_max\fR limit. This parameter is specific to \fBlimit_type\fR \fIpolice\fR
+only.
+.TP 8
+.I VALUE
+This parameter accept a floating point number, possibly followed by a unit.
+.RS
+.TP
+b or a bare number
+Bytes
+.TP
+k | kb
+Kilobytes
+.TP
+m | mb
+Megabytes
+.TP
+g | gb
+Gigabytes
+.TP
+kbit
+Kilobits
+.TP
+mbit
+Megabits
+.TP
+gbit
+Gigabits
+.RE
+.PP
+.BI rx_max " VALUE"
+- specifies maximum rx rate value. It accepts same values as \fBtx_max\fR. This
+parameter is specific to \fBlimit_type\fR \fIpolice\fR only.
+.PP
+.BI rx_burst " VALUE"
+- specifies size of a bucket that's used to buffer spikes when traffic exceeds
+\fBrx_max\fR limit. It accepts the same values as \fBtx_burst\fR. This parameter
+is specific to \fBlimit_type\fR \fIpolice\fR only.
+.PP
+.BI tx_pkts " VALUE"
+- specifies maximum tx packets per second value. This parameter is specific to
+\fBlimit_type\fR \fIpolice\fR only.
+.PP
+.BI tx_pkts_burst " VALUE"
+- specifies size of a bucket that's used to buffer spikes when traffic exceeds
+\fBtx_pkts\fR limit. It accepts the same values as \fBtx_burst\fR. This
+parameter is specific to \fBlimit_type\fR \fIpolice\fR only.
+.PP
+.BI rx_pkts " VALUE"
+- specifies maximum tx packets per second value. This parameter is specific to
+\fBlimit_type\fR \fIpolice\fR only.
+.PP
+.BI rx_pkts_burst " VALUE"
+- specifies size of a bucket that's used to buffer spikes when traffic exceeds
+\fBtx_pkts\fR limit. It accepts the same values as \fBtx_burst\fR. This
+parameter is specific to \fBlimit_type\fR \fIpolice\fR only.
+.PP
 .BI parent " NODE_NAME \fR| " noparent
 - set rate object parent to existing node with name \fINODE_NAME\fR or unset
 parent. Rate limits of the parent node applied to all it's children. Actual
 behaviour is details of driver's implementation. Setting parent to empty ("")
-name due to the kernel logic threated as parent unset.
+name due to the kernel logic treated as parent unset. It's important that
+\fBlimit_type\fR of the rate object and the parent node should match,
+otherwise setting parent will fail. In other words, it's only possible to group
+rate objects of the same \fBlimit_type\fR.
 
 .SS devlink port function rate add - create node rate object with specified parameters.
 Creates rate object of type node and sets parameters. Parameters same as for the
@@ -222,6 +302,8 @@ pci/0000:03:00.0/some_group type node
         "pci/0000:03:00.0/2": {
 .br
             "type": "leaf",
+.br
+            "limit_type": "shaping",
 .br
             "tx_share": 1500000
 .br
@@ -255,6 +337,10 @@ pci/0000:03:00.0/some_group type node
 # devlink port function rate set pci/0000:03:00.0/1 \\
 .br
 	tx_share 2Mbit tx_max 10Mbit
+.PP
+# devlink port function rate set pci/0000:03:00.0/2 \\
+.br
+	limit_type police rx_max 10Mbit rx_burst 4mb
 .RE
 
 .PP
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
                   ` (9 preceding siblings ...)
  2022-06-20 15:35 ` [PATCH iproute2-next 5/5] devlink: Introduce port rate limit_type police Dima Chumak
@ 2022-06-20 20:04 ` Jakub Kicinski
  2022-06-30 15:27   ` Dima Chumak
  10 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-06-20 20:04 UTC (permalink / raw)
  To: Dima Chumak
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev

On Mon, 20 Jun 2022 18:26:42 +0300 Dima Chumak wrote:
> Currently, kernel provides a way to limit tx rate of a VF via devlink
> rate function of a port. The underlying mechanism is a shaper applied to
> all traffic passing through the target VF or a group of VFs. By its
> essence, a shaper naturally works with outbound traffic, and in
> practice, it's rarely seen to be implemented for inbound traffic.
> Nevertheless, there is a user request to have a mechanism for limiting
> inbound traffic as well. It is usually done by using some form of
> traffic policing, dropping excess packets over the configured limit that
> set by a user. Thus, introducing another limiting mechanism to the port
> function can help close this gap.
> 
> This series introduces devlink attrs, along with their ops, to manage
> rate policing of a single port as well as a port group. It is based on
> the existing notion of leaf and node rate objects, and extends their
> attributes to support both RX and TX limiting, for a number of packets
> per second and/or a number of bytes per second. Additionally, there is a
> second set of parameters for specifying the size of buffering performed,
> called "burst", that controls the allowed level of spikes in traffic
> before it starts getting dropped.
> 
> A new sub-type of a devlink_rate object is introduced, called
> "limit_type". It can be either "shaping", the default, or "police".
> A single leaf or a node object can be switched from one limit type to
> another, but it cannot do both types of rate limiting simultaneously.
> A node and a leaf object that have parent-child relationship must have
> the same limit type. In other words, it's only possible to group rate
> objects of the same limit type as their group's limit_type.

TC already has the police action. Your previous patches were accepted
because there was no exact match for shaping / admission. Now you're 
"extending" that API to duplicate existing TC APIs. Infuriating. 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-06-20 20:04 ` [PATCH net-next 0/5] devlink rate police limiter Jakub Kicinski
@ 2022-06-30 15:27   ` Dima Chumak
  2022-06-30 18:13     ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Dima Chumak @ 2022-06-30 15:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev

On 6/20/22 10:04 PM, Jakub Kicinski wrote:
> 
> On Mon, 20 Jun 2022 18:26:42 +0300 Dima Chumak wrote:
>> Currently, kernel provides a way to limit tx rate of a VF via devlink
>> rate function of a port. The underlying mechanism is a shaper applied to
>> all traffic passing through the target VF or a group of VFs. By its
>> essence, a shaper naturally works with outbound traffic, and in
>> practice, it's rarely seen to be implemented for inbound traffic.
>> Nevertheless, there is a user request to have a mechanism for limiting
>> inbound traffic as well. It is usually done by using some form of
>> traffic policing, dropping excess packets over the configured limit that
>> set by a user. Thus, introducing another limiting mechanism to the port
>> function can help close this gap.
>>
>> This series introduces devlink attrs, along with their ops, to manage
>> rate policing of a single port as well as a port group. It is based on
>> the existing notion of leaf and node rate objects, and extends their
>> attributes to support both RX and TX limiting, for a number of packets
>> per second and/or a number of bytes per second. Additionally, there is a
>> second set of parameters for specifying the size of buffering performed,
>> called "burst", that controls the allowed level of spikes in traffic
>> before it starts getting dropped.
>>
>> A new sub-type of a devlink_rate object is introduced, called
>> "limit_type". It can be either "shaping", the default, or "police".
>> A single leaf or a node object can be switched from one limit type to
>> another, but it cannot do both types of rate limiting simultaneously.
>> A node and a leaf object that have parent-child relationship must have
>> the same limit type. In other words, it's only possible to group rate
>> objects of the same limit type as their group's limit_type.
> 
> TC already has the police action. Your previous patches were accepted
> because there was no exact match for shaping / admission. Now you're
> "extending" that API to duplicate existing TC APIs. Infuriating.

I'm sorry for not being able to reply promptly.

I've re-read more carefully the cover letter of the original 'devlink:
rate objects API' series by Dmytro Linkin, off of which I based my
patches, though my understanding still might be incomplete/incorrect
here.

It seems that TC, being ingress only, doesn't cover the full spectrum of
rate-limiting that's possible to achieve with devlink. TC works only
with representors and doesn't allow to configure "the other side of the
wire", where devlink port function seems to be a better match as it
connects directly to a VF.

Also, for the existing devlink-rate mechanism of VF grouping, it would be
challenging to achieve similar functionality with TC flows, as groups don't
have a net device instance where flows can be attached.

I want to apologize in case my proposed changes have come across as
being bluntly ignoring some of the pre-established agreements and
understandings of TC / devlink responsibility separation, it wasn't
intentional.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-06-30 15:27   ` Dima Chumak
@ 2022-06-30 18:13     ` Jakub Kicinski
  2022-07-07 11:20       ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-06-30 18:13 UTC (permalink / raw)
  To: Dima Chumak
  Cc: Jiri Pirko, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Simon Horman

On Thu, 30 Jun 2022 17:27:08 +0200 Dima Chumak wrote:
> I've re-read more carefully the cover letter of the original 'devlink:
> rate objects API' series by Dmytro Linkin, off of which I based my
> patches, though my understanding still might be incomplete/incorrect
> here.
> 
> It seems that TC, being ingress only, doesn't cover the full spectrum of
> rate-limiting that's possible to achieve with devlink. TC works only
> with representors and doesn't allow to configure "the other side of the
> wire", where devlink port function seems to be a better match as it
> connects directly to a VF.

Right, but you are adding Rx and Tx now, IIUC, so you're venturing into
the same "side of the wire" where tc lives.

> Also, for the existing devlink-rate mechanism of VF grouping, it would be
> challenging to achieve similar functionality with TC flows, as groups don't
> have a net device instance where flows can be attached.

You can share actions in TC. The hierarchical aspects may be more
limited, not sure.

> I want to apologize in case my proposed changes have come across as
> being bluntly ignoring some of the pre-established agreements and
> understandings of TC / devlink responsibility separation, it wasn't
> intentional.

Apologies, TBH I thought you're the same person I was arguing with last
time.

My objective is to avoid having multiple user space interfaces which 
drivers have to (a) support and (b) reconcile. We already have the VF 
rate limits in ip link, and in TC (which I believe is used by OvS
offload). 

I presume you have a mlx5 implementation ready, so how do you reconcile
those 3 APIs?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-06-30 18:13     ` Jakub Kicinski
@ 2022-07-07 11:20       ` Jiri Pirko
  2022-07-07 20:16         ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-07 11:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Dima Chumak, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Simon Horman

Thu, Jun 30, 2022 at 08:13:27PM CEST, kuba@kernel.org wrote:
>On Thu, 30 Jun 2022 17:27:08 +0200 Dima Chumak wrote:
>> I've re-read more carefully the cover letter of the original 'devlink:
>> rate objects API' series by Dmytro Linkin, off of which I based my
>> patches, though my understanding still might be incomplete/incorrect
>> here.
>> 
>> It seems that TC, being ingress only, doesn't cover the full spectrum of
>> rate-limiting that's possible to achieve with devlink. TC works only
>> with representors and doesn't allow to configure "the other side of the
>> wire", where devlink port function seems to be a better match as it
>> connects directly to a VF.
>
>Right, but you are adding Rx and Tx now, IIUC, so you're venturing into
>the same "side of the wire" where tc lives.

Wait. Lets draw the basic picture of "the wire":

--------------------------+                +--------------------------
eswitch representor netdev|=====thewire====|function (vf/sf/whatever
--------------------------+                +-------------------------

Now the rate setting Dima is talking about, it is the configuration of
the "function" side. Setting the rate is limitting the "function" TX/RX
Note that this function could be of any type - netdev, rdma, vdpa, nvme.
Configuring the TX/RX rate (including groupping) applies to all of
these.

Putting the configuration on the eswitch representor does not fit:
1) it is configuring the other side of the wire, the configuration
   should be of the eswitch port. Configuring the other side is
   confusing and misleading. For the purpose of configuring the
   "function" side, we introduced "port function" object in devlink.
2) it is confuguring netdev/ethernet however the confuguration applies
   to all queues of the function.


>
>> Also, for the existing devlink-rate mechanism of VF grouping, it would be
>> challenging to achieve similar functionality with TC flows, as groups don't
>> have a net device instance where flows can be attached.
>
>You can share actions in TC. The hierarchical aspects may be more
>limited, not sure.
>
>> I want to apologize in case my proposed changes have come across as
>> being bluntly ignoring some of the pre-established agreements and
>> understandings of TC / devlink responsibility separation, it wasn't
>> intentional.
>
>Apologies, TBH I thought you're the same person I was arguing with last
>time.
>
>My objective is to avoid having multiple user space interfaces which 
>drivers have to (a) support and (b) reconcile. We already have the VF 
>rate limits in ip link, and in TC (which I believe is used by OvS
>offload). 
>
>I presume you have a mlx5 implementation ready, so how do you reconcile
>those 3 APIs?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-07 11:20       ` Jiri Pirko
@ 2022-07-07 20:16         ` Jakub Kicinski
  2022-07-08  7:27           ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-07 20:16 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Dima Chumak, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Simon Horman

On Thu, 7 Jul 2022 13:20:12 +0200 Jiri Pirko wrote:
> Wait. Lets draw the basic picture of "the wire":
> 
> --------------------------+                +--------------------------
> eswitch representor netdev|=====thewire====|function (vf/sf/whatever
> --------------------------+                +-------------------------
> 
> Now the rate setting Dima is talking about, it is the configuration of
> the "function" side. Setting the rate is limitting the "function" TX/RX
> Note that this function could be of any type - netdev, rdma, vdpa, nvme.

The patches add policing, are you saying we're gonna drop RDMA or NVMe
I/O?

> Configuring the TX/RX rate (including groupping) applies to all of
> these.

I don't understand why the "side of the wire" matters when the patches
target both Rx and Tx. Surely that covers both directions.

> Putting the configuration on the eswitch representor does not fit:
> 1) it is configuring the other side of the wire, the configuration
>    should be of the eswitch port. Configuring the other side is
>    confusing and misleading. For the purpose of configuring the
>    "function" side, we introduced "port function" object in devlink.
> 2) it is confuguring netdev/ethernet however the confuguration applies
>    to all queues of the function.

If you think it's technically superior to put it in devlink that's fine.
I'll repeat myself - what I'm asking for is convergence so that drivers
don't have  to implement 3 different ways of configuring this. We have
devlink rate for from-VF direction shaping, tc police for bi-dir
policing and obviously legacy NDOs. None of them translate between each
other so drivers and user space have to juggle interfaces.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-07 20:16         ` Jakub Kicinski
@ 2022-07-08  7:27           ` Jiri Pirko
  2022-07-08 18:05             ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-08  7:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman

Thu, Jul 07, 2022 at 10:16:49PM CEST, kuba@kernel.org wrote:
>On Thu, 7 Jul 2022 13:20:12 +0200 Jiri Pirko wrote:
>> Wait. Lets draw the basic picture of "the wire":
>> 
>> --------------------------+                +--------------------------
>> eswitch representor netdev|=====thewire====|function (vf/sf/whatever
>> --------------------------+                +-------------------------
>> 
>> Now the rate setting Dima is talking about, it is the configuration of
>> the "function" side. Setting the rate is limitting the "function" TX/RX
>> Note that this function could be of any type - netdev, rdma, vdpa, nvme.
>
>The patches add policing, are you saying we're gonna drop RDMA or NVMe
>I/O?

Well, there is some limit to the rate of VF anyway, so at some point,
the packets need to be dropped, with or without policing.
Not really sure how that is handled in rdma and nvme.


>
>> Configuring the TX/RX rate (including groupping) applies to all of
>> these.
>
>I don't understand why the "side of the wire" matters when the patches
>target both Rx and Tx. Surely that covers both directions.

Hmm, I believe it really does. We have objects which we configure. There
is a function object, which has some configuration (including this).
Making user to configure function object via another object (eswitch
port netdevice on the other side of the wire), is quite confusing and I
feel it is wrong. The only reason is to somehow fit TC interface for
which we don't have an anchor for port function.

What about another configuration? would it be ok to use eswitch port
netdev to configure port function too, if there is an interface for it?
I believe not, that is why we introduced port function.


>
>> Putting the configuration on the eswitch representor does not fit:
>> 1) it is configuring the other side of the wire, the configuration
>>    should be of the eswitch port. Configuring the other side is
>>    confusing and misleading. For the purpose of configuring the
>>    "function" side, we introduced "port function" object in devlink.
>> 2) it is confuguring netdev/ethernet however the confuguration applies
>>    to all queues of the function.
>
>If you think it's technically superior to put it in devlink that's fine.
>I'll repeat myself - what I'm asking for is convergence so that drivers
>don't have  to implement 3 different ways of configuring this. We have
>devlink rate for from-VF direction shaping, tc police for bi-dir
>policing and obviously legacy NDOs. None of them translate between each
>other so drivers and user space have to juggle interfaces.

The legacy ndo is legacy. Drivers that implement switchdev mode do
not implement those, and should not.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-08  7:27           ` Jiri Pirko
@ 2022-07-08 18:05             ` Jakub Kicinski
  2022-07-09  5:14               ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-08 18:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

Adding Michal

On Fri, 8 Jul 2022 09:27:14 +0200 Jiri Pirko wrote:
> >> Configuring the TX/RX rate (including groupping) applies to all of
> >> these.  
> >
> >I don't understand why the "side of the wire" matters when the patches
> >target both Rx and Tx. Surely that covers both directions.  
> 
> Hmm, I believe it really does. We have objects which we configure. There
> is a function object, which has some configuration (including this).
> Making user to configure function object via another object (eswitch
> port netdevice on the other side of the wire), is quite confusing and I
> feel it is wrong. The only reason is to somehow fit TC interface for
> which we don't have an anchor for port function.
> 
> What about another configuration? would it be ok to use eswitch port
> netdev to configure port function too, if there is an interface for it?
> I believe not, that is why we introduced port function.

I resisted the port function aberration as long as I could. It's 
a limitation of your design as far as I'm concerned.

Switches use TC to configure egress queuing, that's our Linux model.
Representor is the switch side, TC qdisc on it maps to the egress
of the switch.

I don't understand where the disconnect between us is, you know that's
what mlxsw does..

> >> Putting the configuration on the eswitch representor does not fit:
> >> 1) it is configuring the other side of the wire, the configuration
> >>    should be of the eswitch port. Configuring the other side is
> >>    confusing and misleading. For the purpose of configuring the
> >>    "function" side, we introduced "port function" object in devlink.
> >> 2) it is confuguring netdev/ethernet however the confuguration applies
> >>    to all queues of the function.  
> >
> >If you think it's technically superior to put it in devlink that's fine.
> >I'll repeat myself - what I'm asking for is convergence so that drivers
> >don't have  to implement 3 different ways of configuring this. We have
> >devlink rate for from-VF direction shaping, tc police for bi-dir
> >policing and obviously legacy NDOs. None of them translate between each
> >other so drivers and user space have to juggle interfaces.  
> 
> The legacy ndo is legacy. Drivers that implement switchdev mode do
> not implement those, and should not.

That's irrelevant - what I'm saying is that in practice drivers have to
implement _all_ of these interfaces today. Just because they are not
needed in eswitch mode doesn't mean the sales department won't find a
customer who's happy with the non-switchdev mode and doesn't want to
move.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-08 18:05             ` Jakub Kicinski
@ 2022-07-09  5:14               ` Jiri Pirko
  2022-07-11 17:29                 ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-09  5:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

Fri, Jul 08, 2022 at 08:05:35PM CEST, kuba@kernel.org wrote:
>Adding Michal
>
>On Fri, 8 Jul 2022 09:27:14 +0200 Jiri Pirko wrote:
>> >> Configuring the TX/RX rate (including groupping) applies to all of
>> >> these.  
>> >
>> >I don't understand why the "side of the wire" matters when the patches
>> >target both Rx and Tx. Surely that covers both directions.  
>> 
>> Hmm, I believe it really does. We have objects which we configure. There
>> is a function object, which has some configuration (including this).
>> Making user to configure function object via another object (eswitch
>> port netdevice on the other side of the wire), is quite confusing and I
>> feel it is wrong. The only reason is to somehow fit TC interface for
>> which we don't have an anchor for port function.
>> 
>> What about another configuration? would it be ok to use eswitch port
>> netdev to configure port function too, if there is an interface for it?
>> I believe not, that is why we introduced port function.
>
>I resisted the port function aberration as long as I could. It's 

Why do you say "aberration"? It is a legitimate feature that is allowing
to solve legitimate issues. Maybe I'm missing something.


>a limitation of your design as far as I'm concerned.

What do you mean? This is not related to us only. The need to work with
port function (the other side of the wire) is definitelly nothing
specific to mlx5 driver.


>
>Switches use TC to configure egress queuing, that's our Linux model.
>Representor is the switch side, TC qdisc on it maps to the egress
>of the switch.

Sure.

>
>I don't understand where the disconnect between us is, you know that's
>what mlxsw does..

No disconnect. mlxsw works like that. However, there is no VF/SF in
mlxsw world. The other side of the wire is a different host.

However in case of VF/SF, we also need to configure the other side of
the wire, which we are orchestrating. That is the sole purpose of why we
have devlink port function. And once we have such object, why is it
incorrect to use it for the needed configuration?

Okay, if you really feel that we need to reuse TC interface for this
feature (however mismathing it might be), lets create a netdev for the
port function to hook this to. But do we want such a beast? But to hook
this to eswitch port representor seems to me plain wrong.


>
>> >> Putting the configuration on the eswitch representor does not fit:
>> >> 1) it is configuring the other side of the wire, the configuration
>> >>    should be of the eswitch port. Configuring the other side is
>> >>    confusing and misleading. For the purpose of configuring the
>> >>    "function" side, we introduced "port function" object in devlink.
>> >> 2) it is confuguring netdev/ethernet however the confuguration applies
>> >>    to all queues of the function.  
>> >
>> >If you think it's technically superior to put it in devlink that's fine.
>> >I'll repeat myself - what I'm asking for is convergence so that drivers
>> >don't have  to implement 3 different ways of configuring this. We have
>> >devlink rate for from-VF direction shaping, tc police for bi-dir
>> >policing and obviously legacy NDOs. None of them translate between each
>> >other so drivers and user space have to juggle interfaces.  
>> 
>> The legacy ndo is legacy. Drivers that implement switchdev mode do
>> not implement those, and should not.
>
>That's irrelevant - what I'm saying is that in practice drivers have to
>implement _all_ of these interfaces today. Just because they are not
>needed in eswitch mode doesn't mean the sales department won't find a
>customer who's happy with the non-switchdev mode and doesn't want to
>move.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-09  5:14               ` Jiri Pirko
@ 2022-07-11 17:29                 ` Jakub Kicinski
  2022-07-12  6:03                   ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-11 17:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

On Sat, 9 Jul 2022 07:14:31 +0200 Jiri Pirko wrote:
> >I resisted the port function aberration as long as I could. It's   
> 
> Why do you say "aberration"? It is a legitimate feature that is allowing
> to solve legitimate issues. Maybe I'm missing something.

From netdev perspective it's an implementation detail irrelevant 
to the user. The netdev model is complete without it.

> >a limitation of your design as far as I'm concerned.  
> 
> What do you mean? This is not related to us only. The need to work with
> port function (the other side of the wire) is definitelly nothing
> specific to mlx5 driver.
>
> >Switches use TC to configure egress queuing, that's our Linux model.
> >Representor is the switch side, TC qdisc on it maps to the egress
> >of the switch.  
> 
> Sure.
>
> >I don't understand where the disconnect between us is, you know that's
> >what mlxsw does..  
> 
> No disconnect. mlxsw works like that. However, there is no VF/SF in
> mlxsw world. The other side of the wire is a different host.
> 
> However in case of VF/SF, we also need to configure the other side of
> the wire, which we are orchestrating. That is the sole purpose of why we
> have devlink port function. And once we have such object, why is it
> incorrect to use it for the needed configuration?

So the function conversation _is_ relevant here, eh? Sad but it is what
it is.

> Okay, if you really feel that we need to reuse TC interface for this
> feature (however mismathing it might be),

Not what I said, I'm not gonna say it the fourth time.

> lets create a netdev for the port function to hook this to. But do we
> want such a beast? But to hook this to eswitch port representor seems
> to me plain wrong.

I presume you're being facetious. Extra netdev is gonna help nothing. 

AFAIU the problem is that you want to control endpoints which are not
ndevs with this API. Is that the main or only reason? Can we agree that
it's legitimate but will result in muddying the netdev model (which in
itself is good and complete)?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-11 17:29                 ` Jakub Kicinski
@ 2022-07-12  6:03                   ` Jiri Pirko
  2022-07-13  0:13                     ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-12  6:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

Mon, Jul 11, 2022 at 07:29:57PM CEST, kuba@kernel.org wrote:
>On Sat, 9 Jul 2022 07:14:31 +0200 Jiri Pirko wrote:
>> >I resisted the port function aberration as long as I could. It's   
>> 
>> Why do you say "aberration"? It is a legitimate feature that is allowing
>> to solve legitimate issues. Maybe I'm missing something.
>
>From netdev perspective it's an implementation detail irrelevant 
>to the user. The netdev model is complete without it.

Well it is a configuration of a device part out of the scope of netdev.
So yes, netdev model is complete without it. But does does not mean we
don't need such configuration. I may be missing your point.


>
>> >a limitation of your design as far as I'm concerned.  
>> 
>> What do you mean? This is not related to us only. The need to work with
>> port function (the other side of the wire) is definitelly nothing
>> specific to mlx5 driver.
>>
>> >Switches use TC to configure egress queuing, that's our Linux model.
>> >Representor is the switch side, TC qdisc on it maps to the egress
>> >of the switch.  
>> 
>> Sure.
>>
>> >I don't understand where the disconnect between us is, you know that's
>> >what mlxsw does..  
>> 
>> No disconnect. mlxsw works like that. However, there is no VF/SF in
>> mlxsw world. The other side of the wire is a different host.
>> 
>> However in case of VF/SF, we also need to configure the other side of
>> the wire, which we are orchestrating. That is the sole purpose of why we
>> have devlink port function. And once we have such object, why is it
>> incorrect to use it for the needed configuration?
>
>So the function conversation _is_ relevant here, eh? Sad but it is what
>it is.

I'm not sure I follow what "function conversation" you mean. :/


>
>> Okay, if you really feel that we need to reuse TC interface for this
>> feature (however mismathing it might be),
>
>Not what I said, I'm not gonna say it the fourth time.

Okay, sorry for being slow, but I still don't understand your point :/


>
>> lets create a netdev for the port function to hook this to. But do we
>> want such a beast? But to hook this to eswitch port representor seems
>> to me plain wrong.
>
>I presume you're being facetious. Extra netdev is gonna help nothing. 

I'm somewhat am, yes.


>
>AFAIU the problem is that you want to control endpoints which are not
>ndevs with this API. Is that the main or only reason? Can we agree that
>it's legitimate but will result in muddying the netdev model (which in
>itself is good and complete)?

I don't think this has anything to do with netdev model. It is actually
out of the scope of it, therefore there cannot be any mudding of it.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-12  6:03                   ` Jiri Pirko
@ 2022-07-13  0:13                     ` Jakub Kicinski
  2022-07-13  5:04                       ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-13  0:13 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

On Tue, 12 Jul 2022 08:03:40 +0200 Jiri Pirko wrote:
> >AFAIU the problem is that you want to control endpoints which are not
> >ndevs with this API. Is that the main or only reason? Can we agree that
> >it's legitimate but will result in muddying the netdev model (which in
> >itself is good and complete)?  
> 
> I don't think this has anything to do with netdev model. 
> It is actually out of the scope of it, therefore there cannot be any mudding of it.

You should have decided that rate limiting was out of scope for netdev
before we added tc qdisc and tc police support. Now those offloads are
there, used by people and it's too late.

If you want to create a common way to rate limit functions you must
provide plumbing for the existing methods (at least tc police,
preferably legacy NDO as well) to automatically populate the new API.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-13  0:13                     ` Jakub Kicinski
@ 2022-07-13  5:04                       ` Jiri Pirko
  2022-07-13 17:52                         ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-13  5:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

Wed, Jul 13, 2022 at 02:13:41AM CEST, kuba@kernel.org wrote:
>On Tue, 12 Jul 2022 08:03:40 +0200 Jiri Pirko wrote:
>> >AFAIU the problem is that you want to control endpoints which are not
>> >ndevs with this API. Is that the main or only reason? Can we agree that
>> >it's legitimate but will result in muddying the netdev model (which in
>> >itself is good and complete)?  
>> 
>> I don't think this has anything to do with netdev model. 
>> It is actually out of the scope of it, therefore there cannot be any mudding of it.
>
>You should have decided that rate limiting was out of scope for netdev
>before we added tc qdisc and tc police support. Now those offloads are
>there, used by people and it's too late.
>
>If you want to create a common way to rate limit functions you must
>provide plumbing for the existing methods (at least tc police,
>preferably legacy NDO as well) to automatically populate the new API.

Even if there is no netdevice to hook it to, because it does not exist?
I have to be missing something, sorry :/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-13  5:04                       ` Jiri Pirko
@ 2022-07-13 17:52                         ` Jakub Kicinski
  2022-07-14  4:55                           ` Jiri Pirko
  0 siblings, 1 reply; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-13 17:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

On Wed, 13 Jul 2022 07:04:04 +0200 Jiri Pirko wrote:
> Wed, Jul 13, 2022 at 02:13:41AM CEST, kuba@kernel.org wrote:
> >> I don't think this has anything to do with netdev model. 
> >> It is actually out of the scope of it, therefore there cannot be any mudding of it.  
> >
> >You should have decided that rate limiting was out of scope for netdev
> >before we added tc qdisc and tc police support. Now those offloads are
> >there, used by people and it's too late.
> >
> >If you want to create a common way to rate limit functions you must
> >provide plumbing for the existing methods (at least tc police,
> >preferably legacy NDO as well) to automatically populate the new API.  
> 
> Even if there is no netdevice to hook it to, because it does not exist?
> I have to be missing something, sorry :/

What I'm saying is that we can treat the devlink rate API as a "lower
layer interface". A layer under the netdevs. That seems sensible and
removes the API duplication which otherwise annoys me.

We want drivers to only have to implement one API.

So when user calls the legacy NDO API it should check if the device has
devlink rate support, first, and try to translate the legacy request
into devlink rate.

Same for TC police as installed by the OvS offload feature that Simon
knows far more about than I do. IIRC we use a combination of matchall
and police to do shaping.

That way drivers don't have to implement all three APIs, only devlink
rate (four APIs if we count TC qdisc but I think only NFP uses that
one and it has RED etc so that's too much).

Does this help or am I still not making sense?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-13 17:52                         ` Jakub Kicinski
@ 2022-07-14  4:55                           ` Jiri Pirko
  2022-07-14 16:07                             ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Pirko @ 2022-07-14  4:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

Wed, Jul 13, 2022 at 07:52:55PM CEST, kuba@kernel.org wrote:
>On Wed, 13 Jul 2022 07:04:04 +0200 Jiri Pirko wrote:
>> Wed, Jul 13, 2022 at 02:13:41AM CEST, kuba@kernel.org wrote:
>> >> I don't think this has anything to do with netdev model. 
>> >> It is actually out of the scope of it, therefore there cannot be any mudding of it.  
>> >
>> >You should have decided that rate limiting was out of scope for netdev
>> >before we added tc qdisc and tc police support. Now those offloads are
>> >there, used by people and it's too late.
>> >
>> >If you want to create a common way to rate limit functions you must
>> >provide plumbing for the existing methods (at least tc police,
>> >preferably legacy NDO as well) to automatically populate the new API.  
>> 
>> Even if there is no netdevice to hook it to, because it does not exist?
>> I have to be missing something, sorry :/
>
>What I'm saying is that we can treat the devlink rate API as a "lower
>layer interface". A layer under the netdevs. That seems sensible and
>removes the API duplication which otherwise annoys me.
>
>We want drivers to only have to implement one API.
>
>So when user calls the legacy NDO API it should check if the device has
>devlink rate support, first, and try to translate the legacy request
>into devlink rate.
>
>Same for TC police as installed by the OvS offload feature that Simon
>knows far more about than I do. IIRC we use a combination of matchall
>and police to do shaping.
>
>That way drivers don't have to implement all three APIs, only devlink
>rate (four APIs if we count TC qdisc but I think only NFP uses that
>one and it has RED etc so that's too much).
>
>Does this help or am I still not making sense?

I think I got it now. But in our case, there is no change for the user,
as the netdev does not exist. So user still uses devlink rate uapi as
proposed by this patchset. Only internal kernel changes requested.
Correct?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/5] devlink rate police limiter
  2022-07-14  4:55                           ` Jiri Pirko
@ 2022-07-14 16:07                             ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2022-07-14 16:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jiri Pirko, Dima Chumak, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev, Simon Horman, Michal Wilczynski

On Thu, 14 Jul 2022 06:55:05 +0200 Jiri Pirko wrote:
> Wed, Jul 13, 2022 at 07:52:55PM CEST, kuba@kernel.org wrote:
> >On Wed, 13 Jul 2022 07:04:04 +0200 Jiri Pirko wrote:  
> >> Even if there is no netdevice to hook it to, because it does not exist?
> >> I have to be missing something, sorry :/  
> >
> >What I'm saying is that we can treat the devlink rate API as a "lower
> >layer interface". A layer under the netdevs. That seems sensible and
> >removes the API duplication which otherwise annoys me.
> >
> >We want drivers to only have to implement one API.
> >
> >So when user calls the legacy NDO API it should check if the device has
> >devlink rate support, first, and try to translate the legacy request
> >into devlink rate.
> >
> >Same for TC police as installed by the OvS offload feature that Simon
> >knows far more about than I do. IIRC we use a combination of matchall
> >and police to do shaping.
> >
> >That way drivers don't have to implement all three APIs, only devlink
> >rate (four APIs if we count TC qdisc but I think only NFP uses that
> >one and it has RED etc so that's too much).
> >
> >Does this help or am I still not making sense?  
> 
> I think I got it now. But in our case, there is no change for the user,
> as the netdev does not exist. So user still uses devlink rate uapi as
> proposed by this patchset. Only internal kernel changes requested.
> Correct?

Right. If the user wants to use devlink-rate directly they can.
For legacy users the kernel will translate to devlink-rate.
The point is for drivers to only have to implement devlink-rate.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-07-14 16:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 15:26 [PATCH net-next 0/5] devlink rate police limiter Dima Chumak
2022-06-20 15:26 ` [PATCH net-next 1/5] devlink: Introduce limit_type attr for rate objects Dima Chumak
2022-06-20 15:26 ` [PATCH net-next 2/5] devlink: Introduce police rate limit type Dima Chumak
2022-06-20 15:26 ` [PATCH net-next 3/5] netdevsim: Support devlink rate limit_type police Dima Chumak
2022-06-20 15:26 ` [PATCH net-next 4/5] selftest: netdevsim: Add devlink rate police sub-test Dima Chumak
2022-06-20 15:26 ` [PATCH net-next 5/5] Documentation: devlink rate objects limit_type Dima Chumak
2022-06-20 15:35 ` [PATCH iproute2-next 1/5] uapi: devlink.h DEVLINK_ATTR_RATE_LIMIT_TYPE Dima Chumak
2022-06-20 15:35 ` [PATCH iproute2-next 2/5] devlink: Add port rate limit_type support Dima Chumak
2022-06-20 15:35 ` [PATCH iproute2-next 3/5] utils: Add get_size64() Dima Chumak
2022-06-20 15:35 ` [PATCH iproute2-next 4/5] uapi: devlink.h DEVLINK_RATE_LIMIT_TYPE_POLICE Dima Chumak
2022-06-20 15:35 ` [PATCH iproute2-next 5/5] devlink: Introduce port rate limit_type police Dima Chumak
2022-06-20 20:04 ` [PATCH net-next 0/5] devlink rate police limiter Jakub Kicinski
2022-06-30 15:27   ` Dima Chumak
2022-06-30 18:13     ` Jakub Kicinski
2022-07-07 11:20       ` Jiri Pirko
2022-07-07 20:16         ` Jakub Kicinski
2022-07-08  7:27           ` Jiri Pirko
2022-07-08 18:05             ` Jakub Kicinski
2022-07-09  5:14               ` Jiri Pirko
2022-07-11 17:29                 ` Jakub Kicinski
2022-07-12  6:03                   ` Jiri Pirko
2022-07-13  0:13                     ` Jakub Kicinski
2022-07-13  5:04                       ` Jiri Pirko
2022-07-13 17:52                         ` Jakub Kicinski
2022-07-14  4:55                           ` Jiri Pirko
2022-07-14 16:07                             ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.