* [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler
@ 2017-03-04  1:10 Cristian Dumitrescu
  2017-03-04  1:10 ` [PATCH v3 1/2] ethdev: add capability control API Cristian Dumitrescu
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
  0 siblings, 2 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-03-04  1:10 UTC
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch set introduces an ethdev-based abstraction layer for Quality of
Service (QoS) Traffic Manager, which includes: hierarchical scheduling, traffic
shaping, congestion management, packet marking. The goal is to provide a simple
generic API that is agnostic of the underlying HW, SW or mixed HW-SW
implementation.

Patch 1 builds on the mechanism introduced by rte_flow in DPDK and generalizes
it to make it available for other ethdev features/capabilities (such as the
traffic manager). The goal is to define a plugin-like mechanism to extend
the ethdev functionality in a modular way as opposed to the current monolithic
approach.

Patch 2 introduces the generic ethdev API for traffic manager using the
above plugin-like mechanism for ethdev.

Cristian Dumitrescu (2):
  ethdev: add capability control API
  ethdev: add hierarchical scheduler API

 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ethdev.c          |   13 +
 lib/librte_ether/rte_ethdev.h          |   29 +
 lib/librte_ether/rte_ether_version.map |   37 +
 lib/librte_ether/rte_tm.c              |  436 ++++++++++
 lib/librte_ether/rte_tm.h              | 1466 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  365 ++++++++
 8 files changed, 2354 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

-- 
2.5.0


* [PATCH v3 1/2] ethdev: add capability control API
  2017-03-04  1:10 [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
@ 2017-03-04  1:10 ` Cristian Dumitrescu
  2017-03-06 10:32   ` Thomas Monjalon
  2017-05-19 17:12   ` [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
  1 sibling, 2 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-03-04  1:10 UTC
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

The rte_flow feature breaks the current monolithic approach for ethdev and
introduces the new generic flow API to ethdev using a plugin-like approach.

Basically, the rte_flow API is still logically part of ethdev:
- it extends the ethdev functionality: rte_flow is a new feature/capability
  of ethdev;
- all its functions work on an Ethernet device: the first parameter of the
  rte_flow functions is the Ethernet device port ID.

At the same time, the rte_flow API is a sort of capability plugin for ethdev:
- the rte_flow API functions have their own name space: they are called
  rte_flow_operationXYZ() as opposed to rte_eth_dev_flow_operationXYZ();
- the rte_flow API functions are placed in separate files in the same
  librte_ether folder as opposed to rte_ethdev.[hc].

The way it works is by using the existing ethdev API function
rte_eth_dev_filter_ctrl() to query the current Ethernet device port ID for the
support of the rte_flow capability and return the pointer to the
rte_flow operations when supported and NULL otherwise:

struct rte_flow_ops *eth_flow_ops;
int ret = rte_eth_dev_filter_ctrl(eth_port_id,
	RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, &eth_flow_ops);

Unfortunately, rte_flow opportunistically reuses the rte_eth_dev_filter_ctrl()
API function, which applies only to RX-side filters, instead of introducing a
mechanism that any capability could use in a generic way.

This is the gap addressed by the current patch. The mechanism is intended to be
used to introduce new capabilities into ethdev in a modular, plugin-like way,
such as the hierarchical scheduler. Over time, if agreed, it can also be used
for exposing the existing Ethernet device capabilities in a modular way, such
as: xstats, filters, multicast, mirroring, tunnels, time stamping, eeprom,
bypass, etc.
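
To illustrate the lookup flow described above, here is a minimal, self-contained
sketch. It does not use the real DPDK headers: every mock_* and MOCK_* name is a
hypothetical stand-in invented for this example, and only the overall shape
(validate the port, check the driver callback, let the driver fill in an ops
pointer) follows the function added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the ethdev types; NOT the real DPDK definitions. */
enum mock_eth_capability {
	MOCK_ETH_CAPABILITY_FLOW = 0,
	MOCK_ETH_CAPABILITY_TM,
	MOCK_ETH_CAPABILITY_MAX
};

/* Hypothetical per-capability operations table exported by a driver. */
struct mock_tm_ops {
	int (*node_count)(void);
};

struct mock_eth_dev;

typedef int (*mock_cap_ops_get_t)(struct mock_eth_dev *dev,
	enum mock_eth_capability cap, void *arg);

struct mock_eth_dev {
	mock_cap_ops_get_t cap_ops_get; /* NULL: driver exposes no plugins */
};

#define MOCK_MAX_PORTS 2
struct mock_eth_dev mock_devices[MOCK_MAX_PORTS];

/* Generic entry point: same control flow as the function in this patch. */
int
mock_capability_ops_get(uint8_t port_id, enum mock_eth_capability cap,
	void *arg)
{
	struct mock_eth_dev *dev;

	/* Callers are expected to pass a known capability value. */
	assert(cap < MOCK_ETH_CAPABILITY_MAX);

	if (port_id >= MOCK_MAX_PORTS)
		return -ENODEV;

	dev = &mock_devices[port_id];
	if (dev->cap_ops_get == NULL)
		return -ENOTSUP;
	return dev->cap_ops_get(dev, cap, arg);
}

/* Hypothetical driver side: supports only the traffic manager. */
int
mock_tm_node_count(void)
{
	return 4;
}

const struct mock_tm_ops mock_driver_tm_ops = {
	.node_count = mock_tm_node_count,
};

int
mock_driver_cap_ops_get(struct mock_eth_dev *dev,
	enum mock_eth_capability cap, void *arg)
{
	(void)dev;
	if (arg == NULL)
		return -EINVAL;
	if (cap != MOCK_ETH_CAPABILITY_TM)
		return -ENOTSUP;
	/* arg is interpreted as a pointer to an ops-table pointer. */
	*(const struct mock_tm_ops **)arg = &mock_driver_tm_ops;
	return 0;
}
```

A caller would install mock_driver_cap_ops_get in mock_devices[0], then pass the
address of a "const struct mock_tm_ops *" variable as arg and check the return
code before using the table.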

Changes in v3:
-Followed up on suggestion from Jerin: renamed capability from Hierarchical
 Scheduler (sched) to Traffic Manager (tm)

Changes in v2:
-Followed up on suggestion from Jerin and Hemant: renamed capability_control()
 to capability_ops_get()
-Added ACK from Keith, Jerin and Hemant

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 lib/librte_ether/rte_ethdev.c          | 13 +++++++++++++
 lib/librte_ether/rte_ethdev.h          | 29 +++++++++++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  7 +++++++
 3 files changed, 49 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index eb0a94a..674bbae 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2802,6 +2802,19 @@ rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 	return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op, arg);
 }
 
+int
+rte_eth_dev_capability_ops_get(uint8_t port_id, enum rte_eth_capability cap,
+	void *arg)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->cap_ops_get, -ENOTSUP);
+	return (*dev->dev_ops->cap_ops_get)(dev, cap, arg);
+}
+
 void *
 rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
 		rte_rx_callback_fn fn, void *user_param)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 97f3e2d..706187c 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1073,6 +1073,12 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
  * structure associated with an Ethernet device.
  */
 
+enum rte_eth_capability {
+	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
+	RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
+	RTE_ETH_CAPABILITY_MAX
+};
+
 typedef int  (*eth_dev_configure_t)(struct rte_eth_dev *dev);
 /**< @internal Ethernet device configuration. */
 
@@ -1427,6 +1433,10 @@ typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 void *arg);
 /**< @internal Take operations to assigned filter type on an Ethernet device */
 
+typedef int (*eth_capability_ops_get_t)(struct rte_eth_dev *dev,
+	enum rte_eth_capability cap, void *arg);
+/**< @internal Take capability operations on an Ethernet device */
+
 typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
 				 struct rte_eth_dcb_info *dcb_info);
 /**< @internal Get dcb information on an Ethernet device */
@@ -1548,6 +1558,8 @@ struct eth_dev_ops {
 	eth_timesync_adjust_time   timesync_adjust_time; /** Adjust the device clock. */
 	eth_timesync_read_time     timesync_read_time; /** Get the device clock time. */
 	eth_timesync_write_time    timesync_write_time; /** Set the device clock time. */
+
+	eth_capability_ops_get_t   cap_ops_get; /**< capability control. */
 };
 
 /**
@@ -3890,6 +3902,23 @@ int rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 			enum rte_filter_op filter_op, void *arg);
 
 /**
+ * Take capability operations on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   The capability of the Ethernet device
+ * @param arg
+ *   A pointer to arguments defined specifically for the operation.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_capability_ops_get(uint8_t port_id,
+	enum rte_eth_capability cap, void *arg);
+
+/**
  * Get DCB information on an Ethernet device.
  *
  * @param port_id
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index c6c9d0d..637317c 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -154,3 +154,10 @@ DPDK_17.02 {
 	rte_flow_validate;
 
 } DPDK_16.11;
+
+DPDK_17.05 {
+	global:
+
+	rte_eth_dev_capability_ops_get;
+
+} DPDK_17.02;
-- 
2.5.0


* [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
  2017-03-04  1:10 ` [PATCH v3 1/2] ethdev: add capability control API Cristian Dumitrescu
@ 2017-03-04  1:10 ` Cristian Dumitrescu
  2017-03-06 10:38   ` Thomas Monjalon
                     ` (4 more replies)
  1 sibling, 5 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-03-04  1:10 UTC
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port, per hierarchy level and per hierarchy node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max -> wfq_wrr_n_children_per_group_max,
	  added wfq_wrr_n_groups_max, improved description of both, improved
	  description of wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent update
- Enforced/documented restrictions for root node (node_add() and update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0, PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues
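
As an illustration of the shaper profile restrictions listed above (PIR != 0,
PIR >= CIR), a driver-side check might look roughly like the sketch below. The
structure and function names are hypothetical and simplified for this example;
they are not the rte_tm.h definitions, and the rate unit is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified shaper profile; not the rte_tm.h layout. */
struct sketch_shaper_profile {
	uint64_t cir; /* committed information rate (unit assumed: bytes/s) */
	uint64_t pir; /* peak information rate (unit assumed: bytes/s) */
};

/* Return 0 when the profile respects both restrictions, -EINVAL if not. */
int
sketch_shaper_profile_check(const struct sketch_shaper_profile *p)
{
	if (p == NULL)
		return -EINVAL;
	if (p->pir == 0)      /* restriction 1: PIR != 0 */
		return -EINVAL;
	if (p->pir < p->cir)  /* restriction 2: PIR >= CIR */
		return -EINVAL;
	return 0;
}
```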

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, its parent, its role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API functions
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below, hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing a tree-like structure for storing
      driver-generated object IDs; the app can use its own convention for node
      IDs depending on the specific hierarchy that it needs. Trivial example:
      identify all level-2 nodes with IDs like 100, 200, 300, … and the level-3
      nodes based on their level-2 parents: 110, 120, 130, 140, …, 210, 220,
      230, 240, …, 310, 320, 330, … and level-4 nodes based on their level-3
      parents: 111, 112, 113, 114, …, 121, 122, 123, 124, …. Moreover, see the
      change log for the other related simplification that was implemented:
      leaf nodes now have predefined IDs that are the same as their Ethernet TX
      queue ID (therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)
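
The application-generated node ID convention from the trivial example above
(100, 200, … / 110, 120, … / 111, 112, …) can be sketched as follows. The
function names, the invalid-ID sentinel and the use of 0 as the base when
building level-2 IDs are assumptions made for this illustration only. The leaf
test assumes that leaf IDs occupy the range 0 .. (number of TX queues - 1),
which is one reading of the "leaf ID equals TX queue ID" rule; an application
would also have to make sure its non-leaf IDs stay outside that range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel for "no valid ID"; not a DPDK macro. */
#define SKETCH_NODE_ID_INVALID UINT32_MAX

/*
 * Derive a child node ID from its parent using the decimal convention:
 * level 2 uses the hundreds digit, level 3 the tens, level 4 the units.
 * For level 2, pass 0 as parent_id (a base value, not a real node).
 */
uint32_t
sketch_child_node_id(uint32_t parent_id, uint32_t child_level,
	uint32_t child_index)
{
	uint32_t step;

	/* The decimal scheme only has room for children 1..9 per parent. */
	assert(child_index >= 1 && child_index <= 9);

	switch (child_level) {
	case 2:
		step = 100;
		break;
	case 3:
		step = 10;
		break;
	case 4:
		step = 1;
		break;
	default:
		return SKETCH_NODE_ID_INVALID;
	}
	return parent_id + child_index * step;
}

/* Assumed leaf rule: IDs below the TX queue count are leaf nodes. */
int
sketch_node_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	return node_id < n_tx_queues;
}
```

With 8 TX queues, for instance, IDs 0..7 would be leaves under this assumption,
while 100, 120 and 123 would be non-leaf nodes chosen by the application.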

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  436 ++++++++++
 lib/librte_ether/rte_tm.h              | 1466 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  365 ++++++++
 6 files changed, 2305 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..7893ac6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Manager API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..82faa67 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 637317c..42ad3fb 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_ops_get;
+	rte_tm_get_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_set;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_scheduling_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..f8bd491
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,436 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ops_get == NULL) ||
+		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_TM,
+		&ops) != 0) || (ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({								\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)						\
+		return -rte_errno;				\
+								\
+	if (ops->func == NULL)					\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,					\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,					\
+			rte_strerror(ENOSYS));			\
+								\
+	ops->func;						\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node ID type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Set the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_set)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent  */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update scheduling mode */
+int rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_scheduling_mode_update)(dev,
+		node_id, scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..64ef5dd
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1466 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+ * is generated and added at the end of the Ethernet frame on TX side without
+ * any SW intervention.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Node ID for the parent of the root node */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/**< Dynamic parent node update. The new parent node is located on the
+	 * same hierarchy level as the former parent node. Consequently, the
+	 * node whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/**< Dynamic parent node update. The new parent node is located on a
+	 * different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/**< Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/**< Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
+	RTE_TM_UPDATE_NODE_SCHEDULING_MODE = 1 << 4,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+};
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Mask of supported statistics counter types. */
+	uint64_t stats_mask;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ/WRR sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ/WRR
+			 * algorithms are not supported. The maximum value is
+			 * *n_children_max*.
+			 */
+			uint32_t wfq_wrr_n_children_per_group_max;
+
+			/**< Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ/WRR sibling node groups that
+			 * have two or more members. The value of zero states
+			 * that WFQ/WRR algorithms are not supported. The value
+			 * of 1 indicates that (*sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so
+			 * there can be only one priority level with two or
+			 * more sibling nodes making up a WFQ/WRR group. The
+			 * maximum value is: min(floor(*n_children_max* / 2),
+			 * *sp_n_priorities_max*).
+			 */
+			uint32_t wfq_wrr_n_groups_max;
+
+			/**< WFQ algorithm support. */
+			int wfq_supported;
+
+			/**< WRR algorithm support. */
+			int wrr_supported;
+
+			/**< Maximum WFQ/WRR weight. The value of 1 indicates
+			 * that all sibling nodes with same priority have the
+			 * same WFQ/WRR weight, so WFQ/WRR is reduced to FQ/RR.
+			 */
+			uint32_t wfq_wrr_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts supported.
+			 * The value of zero indicates that shared WRED
+			 * contexts are not supported.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< Summary of node-level capabilities across all the non-leaf nodes
+	 * of the current hierarchy level. Valid only when
+	 * *n_nodes_nonleaf_max* is greater than 0.
+	 */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all the leaf nodes of
+	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
+	 * greater than 0.
+	 */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum of
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared shapers.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared shapers.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/**< Set of supported dynamic update operations
+	 * (see enum rte_tm_dynamic_update_type).
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Summary of node-level capabilities across all non-leaf nodes. */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all leaf nodes. */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to *head
+ * drop* algorithm, which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/**< Minimum queue threshold */
+	uint16_t min_th;
+
+	/**< Maximum queue threshold */
+	uint16_t max_th;
+
+	/**< Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be bigger than zero, as well as greater than
+ * or equal to the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/**< Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
+
+/**
+ * Node parameters
+ *
+ * Each hierarchy node has multiple inputs (children nodes of the current
+ * parent node) and a single output (which is input to its parent node). The
+ * current node arbitrates its inputs using Strict Priority (SP), Weighted Fair
+ * Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to schedule input
+ * packets on its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+ * approximations of ideal WFQ and are treated as WFQ, although an
+ * associated implementation-dependent trade-off on accuracy, performance and
+ * resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ or WRR algorithm,
+ * based on their weight, which is relative to the sum of the weights of all
+ * siblings with same priority, with one (1) as the lowest weight.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+ * where N is the number of TX queues configured for the current Ethernet port.
+ * The non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 */
+	uint64_t stats_mask;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priorities*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for the private WRED context,
+				 * or RTE_TM_WRED_PROFILE_ID_NONE when absent.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of shared WRED
+				 * context IDs.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID value. Needs to be valid.
+ * @param is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
+ * new child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that, even when the configured hierarchy is supported (so this function
+ * is successful), the Ethernet port start might still fail due to e.g. not
+ * enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the
+ *   WFQ/WRR algorithm running on the parent of the current node for scheduling
+ *   this child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for root node: its private shaper profile needs to be valid and
+ * single rate.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared rate shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid non-leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the children nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates
+ *   that WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value
+ *   element indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as an alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2,
+ * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..b3c9c15
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,365 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs. They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node type get */
+
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager capabilities get */
+
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager level capabilities get */
+
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node capabilities get */
+
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile add */
+
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile delete */
+
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context add/update */
+
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context delete */
+
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile add */
+
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile delete */
+
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper add/update */
+
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper delete */
+
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node add */
+
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node delete */
+
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node suspend */
+
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node resume */
+
+typedef int (*rte_tm_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager hierarchy set */
+
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node parent update */
+
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shaper update */
+
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared shaper update */
+
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node stats update */
+
+typedef int (*rte_tm_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node scheduling mode update */
+
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node congestion management mode update */
+
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node WRED context update */
+
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared WRED context update */
+
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager read stats counters for specific node */
+
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - VLAN DEI */
+
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+
+	/** Traffic manager capabilities get */
+	rte_tm_capabilities_get_t capabilities_get;
+	/** Traffic manager level capabilities get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy set */
+	rte_tm_hierarchy_set_t hierarchy_set;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node scheduling mode update */
+	rte_tm_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for current node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.5.0
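
As a reading aid for the ECN and DSCP packet marking functions documented in
rte_tm.h above, here is a minimal C sketch of the per-color rules as the
comments describe them. It is not part of the patch and does not claim to be
how any PMD implements marking. The names pkt_color, mark_cfg, mark_ecn and
mark_dscp are hypothetical. The sketch assumes the caller has already checked
that the packet is IPv4/IPv6 (and, for ECN, that it carries TCP or SCTP), and
it works on the 8-bit TOS / Traffic Class byte, with DSCP in the upper 6 bits
and ECN in the lower 2 bits.

```c
#include <stdint.h>

/* Hypothetical packet colors for this sketch (not taken from the patch). */
enum pkt_color { COLOR_GREEN = 0, COLOR_YELLOW, COLOR_RED, COLOR_MAX };

/* Per-color enable flags, mirroring mark_green / mark_yellow / mark_red. */
struct mark_cfg {
	int enabled[COLOR_MAX];
};

#define ECN_MASK     0x03 /* 2 least significant bits of the TOS/TC byte */
#define DSCP_DP_MASK 0x18 /* DSCP bits 3 and 4 (MSB first) within the byte */

/*
 * ECN rule: 2'b01 and 2'b10 (ECN-capable) become 2'b11 (congestion
 * experienced) when marking is enabled for the color. 2'b00 and 2'b11 are
 * left as is, as is every packet of a color with marking disabled.
 */
static uint8_t
mark_ecn(uint8_t tos, enum pkt_color color, const struct mark_cfg *cfg)
{
	uint8_t ecn = tos & ECN_MASK;

	if (!cfg->enabled[color])
		return tos;
	if (ecn == 0x1 || ecn == 0x2)
		return (uint8_t)(tos | ECN_MASK);
	return tos;
}

/*
 * DSCP rule: write the drop precedence into DSCP bits 3 and 4:
 * green -> 2'b01 (low), yellow -> 2'b10 (medium), red -> 2'b11 (high).
 * The class bits and the ECN bits are preserved.
 */
static uint8_t
mark_dscp(uint8_t tos, enum pkt_color color, const struct mark_cfg *cfg)
{
	uint8_t dp = (uint8_t)(color + 1);

	if (!cfg->enabled[color])
		return tos;
	return (uint8_t)((tos & ~DSCP_DP_MASK) | (dp << 3));
}
```

With only red enabled, mark_dscp() applied to a TOS byte of 0x28 (AF11, DSCP
001010) yields 0x38 (AF13, DSCP 001110), while a green packet with the same
byte is returned unchanged.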


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-04  1:10 ` [PATCH v3 1/2] ethdev: add capability control API Cristian Dumitrescu
@ 2017-03-06 10:32   ` Thomas Monjalon
  2017-03-06 16:35     ` Dumitrescu, Cristian
  2017-03-06 16:36     ` Dumitrescu, Cristian
  2017-05-19 17:12   ` [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  1 sibling, 2 replies; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-06 10:32 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

Hi Cristian,

2017-03-04 01:10, Cristian Dumitrescu:
> struct rte_flow_ops *eth_flow_ops;
> int rte = rte_eth_dev_filter_ctrl(eth_port_id,
> 	RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, &eth_flow_ops);
> 
> Unfortunately, the rte_flow opportunistically uses the rte_eth_dev_filter_ctrl()
> API function, which is applicable just to RX-side filters as opposed to
> introducing a mechanism that could be used by any capability in a generic way.
> 
> This is the gap addressed by the current patch. This mechanism is intended
> to be used to introduce new capabilities into ethdev in a modular plugin-like
> approach, such as hierarchical scheduler. Over time, if agreed, it can also be
> used for exposing the existing Ethernet device capabilities in a modular way,
> such as: xstats, filters, multicast, mirroring, tunnels, time stamping, eeprom,
> bypass, etc.
> 
> Changes in v3:
> -Followed up on suggestion from Jerin: renamed capability from Hierarchical
>  Scheduler (sched) to Traffic Manager (tm)
> 
> Changes in v2:
> -Followed up on suggestion from Jerin and Hemant: renamed capability_control()
>  to capability_ops_get()
> -Added ACK from Keith, Jerin and Hemant

It is difficult to follow previous discussions as you do not
keep threading with --in-reply-to.

> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> Acked-by: Keith Wiles <keith.wiles@intel.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
[...]
> +enum rte_eth_capability {
> +	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> +	RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> +	RTE_ETH_CAPABILITY_MAX
> +};
[...]
>  /**
> + * Take capability operations on an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   The capability of the Ethernet device
> + * @param arg
> + *   A pointer to arguments defined specifically for the operation.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-ENODEV) if *port_id* invalid.
> + */
> +int rte_eth_dev_capability_ops_get(uint8_t port_id,
> +	enum rte_eth_capability cap, void *arg);

What is the benefit of getting different kinds of capabilities with
the same function?
enum + void* = ioctl
A self-explanatory API should have a dedicated function for each kind
of feature, each with its own argument types.
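
To make the ioctl comparison concrete, here is a minimal sketch of the two API
shapes being contrasted. It is an illustration only: every demo_* name is
hypothetical and none of it is taken from ethdev or from the patch.

```c
#include <stddef.h>

/* Hypothetical ops tables, one per feature block. */
struct demo_flow_ops { int (*validate)(int port_id); };
struct demo_tm_ops   { int (*node_add)(int port_id, unsigned int node_id); };

enum demo_capability { DEMO_CAP_FLOW = 0, DEMO_CAP_TM, DEMO_CAP_MAX };

static int demo_validate(int port_id) { (void)port_id; return 0; }
static int demo_node_add(int port_id, unsigned int node_id)
{
	(void)port_id;
	return (int)node_id;
}

static const struct demo_flow_ops demo_flow = { .validate = demo_validate };
static const struct demo_tm_ops demo_tm = { .node_add = demo_node_add };

/* Shape A: one entry point; the real type of arg depends on cap. */
int demo_capability_ops_get(enum demo_capability cap, void *arg)
{
	switch (cap) {
	case DEMO_CAP_FLOW:
		*(const struct demo_flow_ops **)arg = &demo_flow;
		return 0;
	case DEMO_CAP_TM:
		*(const struct demo_tm_ops **)arg = &demo_tm;
		return 0;
	default:
		return -1;
	}
}

/* Shape B: one function per feature; the prototype states the type. */
int demo_tm_ops_get(const struct demo_tm_ops **ops)
{
	*ops = &demo_tm;
	return 0;
}
```

With shape A, passing the address of a struct demo_flow_ops pointer together
with DEMO_CAP_TM compiles without any diagnostic; with shape B the same
mistake is reported as an incompatible pointer type.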


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
@ 2017-03-06 10:38   ` Thomas Monjalon
  2017-03-06 16:59     ` Dumitrescu, Cristian
  2017-03-06 16:15   ` Stephen Hemminger
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-06 10:38 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

2017-03-04 01:10, Cristian Dumitrescu:
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.

We already have an API for QoS. Why integrate it into ethdev?
ethdev is an interface for networking drivers.
I think the QoS has nothing to do with drivers.
If there are some operations to offload in drivers, please identify them
and let's add the operations to ethdev.

> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow approach)

I do not know what you call an ethdev plugin.
rte_flow is a part of the driver interface.


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
  2017-03-06 10:38   ` Thomas Monjalon
@ 2017-03-06 16:15   ` Stephen Hemminger
  2017-03-06 18:17     ` Dumitrescu, Cristian
  2017-03-16 17:35   ` Thomas Monjalon
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 52+ messages in thread
From: Stephen Hemminger @ 2017-03-06 16:15 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

On Sat,  4 Mar 2017 01:10:20 +0000
Cristian Dumitrescu <cristian.dumitrescu@intel.com> wrote:

> +/* Get generic traffic manager operations structure from a port. */
> +const struct rte_tm_ops *
> +rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_tm_ops *ops;
> +
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		rte_tm_error_set(error,
> +			ENODEV,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(ENODEV));
> +		return NULL;
> +	}
> +
> +	if ((dev->dev_ops->cap_ops_get == NULL) ||
> +		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_TM,
> +		&ops) != 0) || (ops == NULL)) {
> +		rte_tm_error_set(error,
> +			ENOSYS,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(ENOSYS));
> +		return NULL;
> +	}
> +
> +	return ops;
> +}

Why are you introducing yet another version of errno? There already is
rte_errno for RTE specific errors.


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 10:32   ` Thomas Monjalon
@ 2017-03-06 16:35     ` Dumitrescu, Cristian
  2017-03-06 16:57       ` Thomas Monjalon
  2017-03-06 16:36     ` Dumitrescu, Cristian
  1 sibling, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-06 16:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce

Hi Thomas,

Thanks for reviewing this proposal.


> > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > Acked-by: Keith Wiles <keith.wiles@intel.com>
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> [...]
> > +enum rte_eth_capability {
> > +	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> > +	RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> > +	RTE_ETH_CAPABILITY_MAX
> > +};
> [...]
> >  /**
> > + * Take capability operations on an Ethernet device.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param cap
> > + *   The capability of the Ethernet device
> > + * @param arg
> > + *   A pointer to arguments defined specifically for the operation.
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support.
> > + *   - (-ENODEV) if *port_id* invalid.
> > + */
> > +int rte_eth_dev_capability_ops_get(uint8_t port_id,
> > +	enum rte_eth_capability cap, void *arg);
> 
> What is the benefit of getting different kind of capabilities with
> the same function?
> enum + void* = ioctl
> A self-explanatory API should have a dedicated function for each kind
> of features with different argument types.

The advantage is a standard interface for querying the capabilities of a device, rather than having each capability provide its own, slightly different mechanism.

IMO this mechanism guides the developers of future ethdev features towards a clean, modular way of adding them: each feature extends ethdev in its own name space and file (that's why I tend to call this a plugin-like mechanism). This is in contrast to the current monolithic approach, where ethdev has 100+ API functions in a single name space, split into functional groups only by blank lines in the header file. It is simply the generalization of the mechanism introduced by rte_flow in release 17.02 (so all the credit should go to Adrien and not me).

IMO, having a standard function as above is cleaner than having a separate and slightly different function per feature. People can quickly see the set of standard ethdev capabilities and which ones are supported by a specific device. Between A) and B) below, I definitely prefer A):
A) status = rte_eth_dev_capability_ops_get(port_id, RTE_ETH_CAPABILITY_TM, &tm_ops);
B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
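
As a rough illustration of the "quickly see which capabilities a device
supports" point, a single getter lets a caller probe every enum value in a
loop. This is only a sketch under assumptions: the demo_* names, the callback
type and the placeholder ops object are hypothetical and are not the patch's
definitions.

```c
#include <errno.h>
#include <stddef.h>

enum demo_capability { DEMO_CAP_FLOW = 0, DEMO_CAP_TM, DEMO_CAP_MAX };

/* Driver callback type: fills *ops for a supported capability. */
typedef int (*demo_cap_ops_get_t)(enum demo_capability cap, const void **ops);

/* Stands in for a real ops table. */
static const int demo_tm_ops_placeholder = 1;

/* A hypothetical driver that implements only the traffic manager. */
int demo_driver_cap_ops_get(enum demo_capability cap, const void **ops)
{
	if (cap == DEMO_CAP_TM) {
		*ops = &demo_tm_ops_placeholder;
		return 0;
	}
	return -ENOTSUP;
}

/* Build a bitmask of supported capabilities by probing each enum value. */
unsigned int demo_supported_caps(demo_cap_ops_get_t get)
{
	unsigned int mask = 0;
	int cap;

	for (cap = 0; cap < DEMO_CAP_MAX; cap++) {
		const void *ops = NULL;

		if (get != NULL &&
		    get((enum demo_capability)cap, &ops) == 0 &&
		    ops != NULL)
			mask |= 1u << cap;
	}
	return mask;
}
```

With per-feature getters as in B), the same probe would need one explicit call
per feature instead of a loop over the enum.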

Regards,
Cristian


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 10:32   ` Thomas Monjalon
  2017-03-06 16:35     ` Dumitrescu, Cristian
@ 2017-03-06 16:36     ` Dumitrescu, Cristian
  1 sibling, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-06 16:36 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

> 
> It is difficult to follow previous discussions as you do not
> keep threading with --in-reply-to.
> 

Apologies, will do in the future.


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 16:35     ` Dumitrescu, Cristian
@ 2017-03-06 16:57       ` Thomas Monjalon
  2017-03-06 18:28         ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-06 16:57 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce

2017-03-06 16:35, Dumitrescu, Cristian:
> Hi Thomas,
> 
> Thanks for reviewing this proposal.
> 
> 
> > > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > > Acked-by: Keith Wiles <keith.wiles@intel.com>
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > [...]
> > > +enum rte_eth_capability {
> > > +	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> > > +	RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> > > +	RTE_ETH_CAPABILITY_MAX
> > > +};
> > [...]
> > >  /**
> > > + * Take capability operations on an Ethernet device.
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the Ethernet device.
> > > + * @param cap
> > > + *   The capability of the Ethernet device
> > > + * @param arg
> > > + *   A pointer to arguments defined specifically for the operation.
> > > + * @return
> > > + *   - (0) if successful.
> > > + *   - (-ENOTSUP) if hardware doesn't support.
> > > + *   - (-ENODEV) if *port_id* invalid.
> > > + */
> > > +int rte_eth_dev_capability_ops_get(uint8_t port_id,
> > > +	enum rte_eth_capability cap, void *arg);
> > 
> > What is the benefit of getting different kind of capabilities with
> > the same function?
> > enum + void* = ioctl
> > A self-explanatory API should have a dedicated function for each kind
> > of features with different argument types.
> 
> The advantage is providing a standard interface to query the capabilities of the device rather than having each capability provide its own mechanism in a slightly different way.
> 
> IMO this mechanism is of great help to guide the developers of future ethdev features on the clean path to add new features in a modular way, extending the ethdev functionality while doing so in a separate name space and file (that's why I tend to call this a plugin-like mechanism), as opposed to the current monolithic approach for ethdev, where we have 100+ API functions in a single name space and that are split into functional groups just by blank lines in the header file. It is simply the generalization of the mechanism introduced by rte_flow in release 17.02 (so all the credit should go to Adrien and not me).
> 
> IMO, having a standard function as above it cleaner than having a separate and slightly different function per feature. People can quickly see the set of standard ethdev capabilities and which ones are supported by a specific device. Between A) and B) below, I definitely prefer A):
> A) status = rte_eth_dev_capability_ops_get(port_id, RTE_ETH_CAPABILITY_TM, &tm_ops);
> B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);

I prefer B because instead of tm_ops, you can use some specific tm arguments,
show their types and properly document each parameter.


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-06 10:38   ` Thomas Monjalon
@ 2017-03-06 16:59     ` Dumitrescu, Cristian
  2017-03-06 20:07       ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-06 16:59 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, March 6, 2017 10:39 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com
> Subject: Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> 2017-03-04 01:10, Cristian Dumitrescu:
> > This patch introduces the generic ethdev API for the traffic manager
> > capability, which includes: hierarchical scheduling, traffic shaping,
> > congestion management, packet marking.
> 
> We already have some API for QoS. Why integrating them in ethdev?
> ethdev is an interface for networking drivers.
> I think the QoS has nothing to do with drivers.
> If there are some operations to offload in drivers, please identify them
> and let's add the operations to ethdev.
> 

The reason to add this to ethdev is that QoS traffic management/hierarchical scheduling is just another TX offload for Ethernet devices. This TX offload is present in NICs, NPUs and SoCs from Broadcom, Cavium, Intel, Mellanox, Netronome, NXP and others.

The API we currently have in DPDK (librte_sched) is great, but it is tied to one implementation with a fixed set of features for a BRAS-like hierarchy. The current abstraction layer proposal is intended to support pretty much any hierarchy and any traffic management feature (hierarchical scheduling, traffic shaping, congestion management, marking) under the same API. It targets pretty much any implementation, whether HW, SW or hybrid; it supports the existing librte_sched feature set, but it is not limited to it.

> > Main features:
> > - Exposed as ethdev plugin capability (similar to rte_flow approach)
> 
> I do not know what you call an ethdev plugin.
> rte_flow is a part of the driver interface.

We extend the ethdev feature set using a feature-specific name space and separate files (module/plugin-like), similar to rte_flow, which is already part of DPDK. The alternative is to simply add new functions to structure eth_dev_ops in file rte_ethdev.h, which IMO is the monolithic approach.
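
A minimal sketch of what such a feature-specific name space could look like:
the feature owns its ops table and its public wrappers, and the wrappers only
dispatch to whatever the driver registered. All demo_* names, the port table
and the registration helper are assumptions made for illustration; they are
not the rte_tm definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Feature-specific ops table, defined in the feature's own header. */
struct demo_tm_ops {
	int (*node_add)(int port_id, unsigned int node_id);
};

/* Per-port ops lookup; a real ethdev would ask the driver instead. */
#define DEMO_MAX_PORTS 4
static const struct demo_tm_ops *demo_port_tm_ops[DEMO_MAX_PORTS];

/* Public feature API: lives in its own file and name space (demo_tm_*). */
int demo_tm_node_add(int port_id, unsigned int node_id)
{
	const struct demo_tm_ops *ops;

	if (port_id < 0 || port_id >= DEMO_MAX_PORTS)
		return -ENODEV;		/* no such port */
	ops = demo_port_tm_ops[port_id];
	if (ops == NULL || ops->node_add == NULL)
		return -ENOSYS;		/* driver lacks this feature */
	return ops->node_add(port_id, node_id);
}

/* Example driver hook. */
static int demo_drv_node_add(int port_id, unsigned int node_id)
{
	(void)port_id;
	(void)node_id;
	return 0;
}

static const struct demo_tm_ops demo_drv_ops = {
	.node_add = demo_drv_node_add,
};

/* Hypothetical registration step performed when the driver probes a port. */
void demo_port_register(int port_id)
{
	if (port_id >= 0 && port_id < DEMO_MAX_PORTS)
		demo_port_tm_ops[port_id] = &demo_drv_ops;
}
```

The point of the layout is that adding an operation touches only the feature's
ops table and wrappers, not the common device ops structure.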


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-06 16:15   ` Stephen Hemminger
@ 2017-03-06 18:17     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-06 18:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, March 6, 2017 4:15 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com;
> jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler
> API
> 
> On Sat,  4 Mar 2017 01:10:20 +0000
> Cristian Dumitrescu <cristian.dumitrescu@intel.com> wrote:
> 
> > +/* Get generic traffic manager operations structure from a port. */
> > +const struct rte_tm_ops *
> > +rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +	const struct rte_tm_ops *ops;
> > +
> > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > +		rte_tm_error_set(error,
> > +			ENODEV,
> > +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> > +			NULL,
> > +			rte_strerror(ENODEV));
> > +		return NULL;
> > +	}
> > +
> > +	if ((dev->dev_ops->cap_ops_get == NULL) ||
> > +		(dev->dev_ops->cap_ops_get(dev,
> RTE_ETH_CAPABILITY_TM,
> > +		&ops) != 0) || (ops == NULL)) {
> > +		rte_tm_error_set(error,
> > +			ENOSYS,
> > +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> > +			NULL,
> > +			rte_strerror(ENOSYS));
> > +		return NULL;
> > +	}
> > +
> > +	return ops;
> > +}
> 
> Why are you introducing yet another version of errno? There already is
> rte_errno for RTE specific errors.

Have you looked at rte_flow? It is already doing this, and people asked me to follow the same approach here for domain-specific error codes.

Look at Jerin's feedback on RFC here: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
"IMO, We need an explicit error number to differentiate the configuration error due do Ethernet port has been started.
The recent rte_flow spec has own error codes to get more visibility on the failure, so that application can choose better attributes for configuring."

I agreed it is a good idea, as it gives you a precise indication of what exactly went wrong. A generic error code such as EBUSY needs to be complemented by a second level of library-specific error details. Note that we are also setting rte_errno.
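
A rough sketch of the two-level idea, assuming a generic errno-style code plus
a library-specific detail structure. The demo_* names, the PORT_STARTED detail
value and the demo_errno variable (a stand-in for rte_errno so the sketch
builds without DPDK) are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Second level: library-specific cause of the failure. */
enum demo_tm_error_type {
	DEMO_TM_ERROR_TYPE_NONE = 0,
	DEMO_TM_ERROR_TYPE_UNSPECIFIED,
	DEMO_TM_ERROR_TYPE_PORT_STARTED, /* hypothetical detail value */
};

struct demo_tm_error {
	enum demo_tm_error_type type;	/* library-specific cause */
	const void *cause;		/* offending object, if any */
	const char *message;		/* human-readable text */
};

/* Stand-in for rte_errno. */
static int demo_errno;

/* Record both levels and return the negated generic code. */
int demo_tm_error_set(struct demo_tm_error *error, int code,
		      enum demo_tm_error_type type, const void *cause,
		      const char *message)
{
	if (error != NULL) {
		error->type = type;
		error->cause = cause;
		error->message = message;
	}
	demo_errno = code;
	return -code;
}

/* Example operation: configuration is rejected once the port is started. */
int demo_tm_configure(int port_started, struct demo_tm_error *error)
{
	if (port_started)
		return demo_tm_error_set(error, EBUSY,
					 DEMO_TM_ERROR_TYPE_PORT_STARTED,
					 NULL, "port already started");
	return 0;
}
```

The caller sees -EBUSY as the generic result and can inspect the type field to
learn why the resource was busy.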


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 16:57       ` Thomas Monjalon
@ 2017-03-06 18:28         ` Dumitrescu, Cristian
  2017-03-06 20:21           ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-06 18:28 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, March 6, 2017 4:57 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>; Richardson,
> Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v3 1/2] ethdev: add capability control API
> 
> 2017-03-06 16:35, Dumitrescu, Cristian:
> > Hi Thomas,
> >
> > Thanks for reviewing this proposal.
> >
> >
> > > > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > > > Acked-by: Keith Wiles <keith.wiles@intel.com>
> > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > > [...]
> > > > +enum rte_eth_capability {
> > > > +	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> > > > +	RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> > > > +	RTE_ETH_CAPABILITY_MAX
> > > > +};
> > > [...]
> > > >  /**
> > > > + * Take capability operations on an Ethernet device.
> > > > + *
> > > > + * @param port_id
> > > > + *   The port identifier of the Ethernet device.
> > > > + * @param cap
> > > > + *   The capability of the Ethernet device
> > > > + * @param arg
> > > > + *   A pointer to arguments defined specifically for the operation.
> > > > + * @return
> > > > + *   - (0) if successful.
> > > > + *   - (-ENOTSUP) if hardware doesn't support.
> > > > + *   - (-ENODEV) if *port_id* invalid.
> > > > + */
> > > > +int rte_eth_dev_capability_ops_get(uint8_t port_id,
> > > > +	enum rte_eth_capability cap, void *arg);
> > >
> > > What is the benefit of getting different kind of capabilities with
> > > the same function?
> > > enum + void* = ioctl
> > > A self-explanatory API should have a dedicated function for each kind
> > > of features with different argument types.
> >
> > The advantage is providing a standard interface to query the capabilities of
> the device rather than having each capability provide its own mechanism in a
> slightly different way.
> >
> > IMO this mechanism is of great help to guide the developers of future
> ethdev features on the clean path to add new features in a modular way,
> extending the ethdev functionality while doing so in a separate name space
> and file (that's why I tend to call this a plugin-like mechanism), as opposed to
> the current monolithic approach for ethdev, where we have 100+ API
> functions in a single name space and that are split into functional groups just
> by blank lines in the header file. It is simply the generalization of the
> mechanism introduced by rte_flow in release 17.02 (so all the credit should
> go to Adrien and not me).
> >
> > IMO, having a standard function as above it cleaner than having a separate
> and slightly different function per feature. People can quickly see the set of
> standard ethdev capabilities and which ones are supported by a specific
> device. Between A) and B) below, I definitely prefer A):
> > A) status = rte_eth_dev_capability_ops_get(port_id,
> RTE_ETH_CAPABILITY_TM, &tm_ops);
> > B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
> 
> I prefer B because instead of tm_ops, you can use some specific tm
> arguments,
> show their types and properly document each parameter.

Note that rte_flow already returns the flow ops as a void * with no strong argument type checking (approach A from above). Are you saying this is wrong?

	rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, void *eth_flow_ops);

Personally, I am in favour of allowing the standard interface at the expense of strong build-time type checking, especially since this API function sits between ethdev and the drivers, as opposed to between the app and ethdev.
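
One way to picture the ethdev/driver versus app/ethdev distinction is a typed
wrapper that is the only code touching the untyped pointer. The sketch below
is an assumption-laden illustration; the demo_* types and functions are
hypothetical and are not the patch's code.

```c
#include <stddef.h>

enum demo_capability { DEMO_CAP_FLOW = 0, DEMO_CAP_TM, DEMO_CAP_MAX };

struct demo_tm_ops { int (*node_count)(void); };

/* Driver-facing callback: untyped, one per device. */
struct demo_dev {
	int (*cap_ops_get)(struct demo_dev *dev, enum demo_capability cap,
			   void *arg);
};

/* Typed wrapper: the only place that handles the void pointer. */
const struct demo_tm_ops *demo_tm_ops_get(struct demo_dev *dev)
{
	const struct demo_tm_ops *ops = NULL;

	if (dev == NULL || dev->cap_ops_get == NULL)
		return NULL;
	if (dev->cap_ops_get(dev, DEMO_CAP_TM, &ops) != 0)
		return NULL;
	return ops;
}

/* Example driver side. */
static int demo_node_count(void) { return 3; }

static const struct demo_tm_ops demo_drv_tm_ops = {
	.node_count = demo_node_count,
};

int demo_drv_cap_ops_get(struct demo_dev *dev, enum demo_capability cap,
			 void *arg)
{
	(void)dev;
	if (cap != DEMO_CAP_TM)
		return -1;
	*(const struct demo_tm_ops **)arg = &demo_drv_tm_ops;
	return 0;
}
```

Code above the wrapper works only with const struct demo_tm_ops pointers, so
the loss of type checking is confined to the wrapper and the driver callback.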


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-06 16:59     ` Dumitrescu, Cristian
@ 2017-03-06 20:07       ` Thomas Monjalon
  2017-03-07 19:29         ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-06 20:07 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce

2017-03-06 16:59, Dumitrescu, Cristian:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2017-03-04 01:10, Cristian Dumitrescu:
> > > This patch introduces the generic ethdev API for the traffic manager
> > > capability, which includes: hierarchical scheduling, traffic shaping,
> > > congestion management, packet marking.
> > 
> > We already have some API for QoS. Why integrating them in ethdev?
> > ethdev is an interface for networking drivers.
> > I think the QoS has nothing to do with drivers.
> > If there are some operations to offload in drivers, please identify them
> > and let's add the operations to ethdev.
> > 
> 
> The reason to add to ethdev is because QoS traffic management/hierarchical scheduling is just another TX offload for Ethernet devices. This TX offload is present in NICs, NPUs and SoCs from Broadcom, Cavium, Intel, Mellanox, Netronome, NXP, others.
> 
> The API we currently have in DPDK (librte_sched) is great, but it refers to an implementation for a fixed set of features for a BRAS-like hierarchy. The current abstraction layer proposal is intended to support pretty much any hierarchy and traffic management features such as hierarchical scheduling, traffic shaping, congestion management, marking under the same API. It targets pretty much any implementation, either HW, SW or hybrid; it does support the existing librte_sched library feature set, but it is not limited to it.

OK, I understand better now.
You should add this level of explanation to your patch.

However I am reluctant to add an API if there is no user.
I think we should wait to have at least one existing driver implementing
this API before integrating it.
That was the approach taken for eventdev, which has a dedicated next- tree.


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 18:28         ` Dumitrescu, Cristian
@ 2017-03-06 20:21           ` Thomas Monjalon
  2017-03-06 20:41             ` Wiles, Keith
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-06 20:21 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce

> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2017-03-06 16:35, Dumitrescu, Cristian:
> > > > > +int rte_eth_dev_capability_ops_get(uint8_t port_id,
> > > > > +	enum rte_eth_capability cap, void *arg);
> > > >
> > > > What is the benefit of getting different kind of capabilities with
> > > > the same function?
> > > > enum + void* = ioctl
> > > > A self-explanatory API should have a dedicated function for each kind
> > > > of features with different argument types.
> > >
> > > The advantage is providing a standard interface to query the capabilities of
> > the device rather than having each capability provide its own mechanism in a
> > slightly different way.
> > >
> > > IMO this mechanism is of great help to guide the developers of future
> > ethdev features on the clean path to add new features in a modular way,
> > extending the ethdev functionality while doing so in a separate name space
> > and file (that's why I tend to call this a plugin-like mechanism), as opposed to
> > the current monolithic approach for ethdev, where we have 100+ API
> > functions in a single name space and that are split into functional groups just
> > by blank lines in the header file. It is simply the generalization of the
> > mechanism introduced by rte_flow in release 17.02 (so all the credit should
> > go to Adrien and not me).
> > >
> > > IMO, having a standard function as above it cleaner than having a separate
> > and slightly different function per feature. People can quickly see the set of
> > standard ethdev capabilities and which ones are supported by a specific
> > device. Between A) and B) below, I definitely prefer A):
> > > A) status = rte_eth_dev_capability_ops_get(port_id,
> > > RTE_ETH_CAPABILITY_TM, &tm_ops);
> > > B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
> > 
> > I prefer B because instead of tm_ops, you can use some specific tm
> > arguments,
> > show their types and properly document each parameter.
> 
> Note that rte_flow already returns the flow ops as a void * with no strong argument type checking (approach A from above). Are you saying this is wrong?
> 
> 	rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, void *eth_flow_ops);
> 
> Personally, I am in favour of allowing the standard interface at the expense of strong build-time type checking. Especially that this API function is between ethdev and the drivers, as opposed to between app and ethdev.

rte_eth_dev_filter_ctrl is going to be specialized for rte_flow operations.
I agree with you on having independent API blocks in ethdev like rte_flow.
But this function rte_eth_dev_capability_ops_get that you propose would
cut across blocks. I don't see the benefit.
I especially don't see any sense in the enum
	enum rte_eth_capability {
		RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
		RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
		RTE_ETH_CAPABILITY_MAX
	}

I won't debate more on this. We have to read opinions of other reviewers.


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 20:21           ` Thomas Monjalon
@ 2017-03-06 20:41             ` Wiles, Keith
  2017-03-06 20:54               ` Stephen Hemminger
  0 siblings, 1 reply; 52+ messages in thread
From: Wiles, Keith @ 2017-03-06 20:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Dumitrescu, Cristian, DPDK, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce


> On Mar 6, 2017, at 2:21 PM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
>> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>>> 2017-03-06 16:35, Dumitrescu, Cristian:
>>>>>> +int rte_eth_dev_capability_ops_get(uint8_t port_id,
>>>>>> +	enum rte_eth_capability cap, void *arg);
>>>>> 
>>>>> What is the benefit of getting different kind of capabilities with
>>>>> the same function?
>>>>> enum + void* = ioctl
>>>>> A self-explanatory API should have a dedicated function for each kind
>>>>> of features with different argument types.
>>>> 
>>>> The advantage is providing a standard interface to query the capabilities of
>>> the device rather than having each capability provide its own mechanism in a
>>> slightly different way.
>>>> 
>>>> IMO this mechanism is of great help to guide the developers of future
>>> ethdev features on the clean path to add new features in a modular way,
>>> extending the ethdev functionality while doing so in a separate name space
>>> and file (that's why I tend to call this a plugin-like mechanism), as opposed to
>>> the current monolithic approach for ethdev, where we have 100+ API
>>> functions in a single name space and that are split into functional groups just
>>> by blank lines in the header file. It is simply the generalization of the
>>> mechanism introduced by rte_flow in release 17.02 (so all the credit should
>>> go to Adrien and not me).
>>>> 
>>>> IMO, having a standard function as above it cleaner than having a separate
>>> and slightly different function per feature. People can quickly see the set of
>>> standard ethdev capabilities and which ones are supported by a specific
>>> device. Between A) and B) below, I definitely prefer A):
>>>> A) status = rte_eth_dev_capability_ops_get(port_id,
>>> RTE_ETH_CAPABILITY_TM, &tm_ops);
>>>> B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
>>> 
>>> I prefer B because instead of tm_ops, you can use some specific tm
>>> arguments,
>>> show their types and properly document each parameter.
>> 
>> Note that rte_flow already returns the flow ops as a void * with no strong argument type checking (approach A from above). Are you saying this is wrong?
>> 
>> 	rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, void *eth_flow_ops);
>> 
>> Personally, I am in favour of allowing the standard interface at the expense of strong build-time type checking. Especially that this API function is between ethdev and the drivers, as opposed to between app and ethdev.
> 
> rte_eth_dev_filter_ctrl is going to be specialized in rte_flow operations.
> I agree with you on having independent API blocks in ethdev like rte_flow.
> But this function rte_eth_dev_capability_ops_get that you propose would be
> cross-blocks. I don't see the benefit.
> I especially don't think there is a sense in the enum
> 	enum rte_eth_capability {
> 		RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> 		RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> 		RTE_ETH_CAPABILITY_MAX
> 	}
> 
> I won't debate more on this. We have to read opinions of other reviewers.

The benefit is a generic API that we do not need to alter in the future (which would cause ABI breakage). The PMD can add a capability to the list if it is not already present and then provide an API structure for the feature.

Being able to add features without having to change DPDK may be a strong feature for companies that have special needs for their applications. They just need to add a rte_eth_capability enum value in a range that they want to control (which does not mean they need to change the above structure) and they can provide private features to the application, especially if these features are very specific to some HW. I do not like private features, but I also do not want to stick just any old API in DPDK for any given special feature.

Today the structure is just APIs, but it could also provide some special or specific information to the application, either in that structure or via an API call.

Regards,
Keith


* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 20:41             ` Wiles, Keith
@ 2017-03-06 20:54               ` Stephen Hemminger
  2017-03-07 10:14                 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Stephen Hemminger @ 2017-03-06 20:54 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Thomas Monjalon, Dumitrescu, Cristian, DPDK, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce

On Mon, 6 Mar 2017 20:41:27 +0000
"Wiles, Keith" <keith.wiles@intel.com> wrote:

> Being able to add features without having to change DPDK maybe a strong feature for companies that have special needs for its application. They just need to add a rte_eth_capability enum in a range that they want to control (which does not mean they need to change the above structure) and they can provide private features to the application especially if they are very specific features to some HW. I do not like private features, but I also do not want to stick just any old API in DPDK for any given special feature.


I understand why you make that argument, but in practice it doesn't work that way.
When new features get added to DPDK, the application must request those features through configuration and other
APIs. Therefore building everything into eth_dev doesn't seem to be helpful.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-06 20:54               ` Stephen Hemminger
@ 2017-03-07 10:14                 ` Dumitrescu, Cristian
  2017-03-07 12:56                   ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-07 10:14 UTC (permalink / raw)
  To: Stephen Hemminger, Wiles, Keith
  Cc: Thomas Monjalon, DPDK, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Richardson, Bruce



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, March 6, 2017 8:54 PM
> To: Wiles, Keith <keith.wiles@intel.com>
> Cc: Thomas Monjalon <thomas.monjalon@6wind.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>; DPDK <dev@dpdk.org>;
> jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add capability control API
> 
> On Mon, 6 Mar 2017 20:41:27 +0000
> "Wiles, Keith" <keith.wiles@intel.com> wrote:
> 
> > Being able to add features without having to change DPDK maybe a strong
> feature for companies that have special needs for its application. They just
> need to add a rte_eth_capability enum in a range that they want to control
> (which does not mean they need to change the above structure) and they
> can provide private features to the application especially if they are very
> specific features to some HW. I do not like private features, but I also do not
> want to stick just any old API in DPDK for any given special feature.
> 
> 
> I understand why you make that argument, but in practice it doesn't work
> that way.
> When new features get added to DPDK, then the application must request
> those features through configration and other
> API's. Therefore building everything into eth_dev doesn't seem to be
> helpful.

Stephen, I think we are all aligned here. Question is: do you want the application to discover the supported capabilities through a standard API or do you want each capability to provide its own specific discovery mechanism (if any)? This patch proposes a standard API.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-07 10:14                 ` Dumitrescu, Cristian
@ 2017-03-07 12:56                   ` Thomas Monjalon
  2017-03-07 19:17                     ` Wiles, Keith
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-07 12:56 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: Stephen Hemminger, Wiles, Keith, dev, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce

2017-03-07 10:14, Dumitrescu, Cristian:
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > On Mon, 6 Mar 2017 20:41:27 +0000
> > "Wiles, Keith" <keith.wiles@intel.com> wrote:
> > 
> > > Being able to add features without having to change DPDK maybe a strong
> > feature for companies that have special needs for its application. They just
> > need to add a rte_eth_capability enum in a range that they want to control
> > (which does not mean they need to change the above structure) and they
> > can provide private features to the application especially if they are very
> > specific features to some HW. I do not like private features, but I also do not
> > want to stick just any old API in DPDK for any given special feature.
> > 
> > 
> > I understand why you make that argument, but in practice it doesn't work
> > that way.
> > When new features get added to DPDK, then the application must request
> > those features through configration and other
> > API's. Therefore building everything into eth_dev doesn't seem to be
> > helpful.
> 
> Stephen, I think we are all aligned here. Question is: do you want the application to discover the supported capabilities through a standard API or do you want each capability to provide its own specific discovery mechanism (if any)? This patch proposes a standard API.

Just a clarification: a function with a void* parameter is not a fully defined API.
We still need to know how to interpret the void* in each case.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 1/2] ethdev: add capability control API
  2017-03-07 12:56                   ` Thomas Monjalon
@ 2017-03-07 19:17                     ` Wiles, Keith
  0 siblings, 0 replies; 52+ messages in thread
From: Wiles, Keith @ 2017-03-07 19:17 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Dumitrescu, Cristian, Stephen Hemminger, DPDK, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce


> On Mar 7, 2017, at 6:56 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> 2017-03-07 10:14, Dumitrescu, Cristian:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> On Mon, 6 Mar 2017 20:41:27 +0000
>>> "Wiles, Keith" <keith.wiles@intel.com> wrote:
>>> 
>>>> Being able to add features without having to change DPDK maybe a strong
>>> feature for companies that have special needs for its application. They just
>>> need to add a rte_eth_capability enum in a range that they want to control
>>> (which does not mean they need to change the above structure) and they
>>> can provide private features to the application especially if they are very
>>> specific features to some HW. I do not like private features, but I also do not
>>> want to stick just any old API in DPDK for any given special feature.
>>> 
>>> 
>>> I understand why you make that argument, but in practice it doesn't work
>>> that way.
>>> When new features get added to DPDK, then the application must request
>>> those features through configration and other
>>> API's. Therefore building everything into eth_dev doesn't seem to be
>>> helpful.
>> 
>> Stephen, I think we are all aligned here. Question is: do you want the application to discover the supported capabilities through a standard API or do you want each capability to provide its own specific discovery mechanism (if any)? This patch proposes a standard API.
> 
> Just a precision: A function with a void* parameter is not a fully defined API.
> We still need to know how to interpret the void* in each case.

One simple solution is to create an inline function with the correct prototype and a reasonable name for each capability. The inline function just calls the generic API, passing the enum value and the structure pointer converted to a void *. With this simple method we get both solutions and add strong type checking.
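
A minimal sketch of that idea, assuming a generic entry point that takes a capability enum and a void * (every name below, including `demo_capability_ops_get`, `demo_tm_ops_get` and `struct demo_tm_ops`, is hypothetical and not taken from the patch):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical capability identifiers, for illustration only. */
enum demo_capability { DEMO_CAP_FLOW, DEMO_CAP_TM };

/* Hypothetical API structure for the traffic manager capability. */
struct demo_tm_ops {
	int (*node_add)(int port_id, unsigned int node_id);
};

/* Stand-in driver callback: simply echoes the node id. */
static int demo_node_add(int port_id, unsigned int node_id)
{
	(void)port_id;
	return (int)node_id;
}

static const struct demo_tm_ops demo_tm = { .node_add = demo_node_add };

/* Generic, untyped entry point: the caller must know what arg points to. */
static int demo_capability_ops_get(int port_id, enum demo_capability cap,
				   void *arg)
{
	(void)port_id;
	if (arg == NULL)
		return -EINVAL;
	switch (cap) {
	case DEMO_CAP_TM:
		*(const struct demo_tm_ops **)arg = &demo_tm;
		return 0;
	default:
		return -ENOTSUP;
	}
}

/*
 * Typed inline wrapper: fixes the enum value and the pointer type, so the
 * compiler rejects a call that passes the wrong structure pointer.
 */
static inline int demo_tm_ops_get(int port_id, const struct demo_tm_ops **ops)
{
	return demo_capability_ops_get(port_id, DEMO_CAP_TM, ops);
}

/* Small self-check showing how an application would use the wrapper. */
static void demo_wrapper_example(void)
{
	const struct demo_tm_ops *ops = NULL;

	assert(demo_tm_ops_get(0, &ops) == 0);
	assert(ops != NULL && ops->node_add(0, 7) == 7);
}
```

The generic function remains available for private capabilities, while standard capabilities get a documented, typed entry point. The interpretation of the void * is then fixed in exactly one place per capability.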

Regards,
Keith

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-06 20:07       ` Thomas Monjalon
@ 2017-03-07 19:29         ` Dumitrescu, Cristian
  2017-03-08  9:51           ` O'Driscoll, Tim
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-07 19:29 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce, O'Driscoll,
	Tim



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, March 6, 2017 8:07 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>; Richardson,
> Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> 2017-03-06 16:59, Dumitrescu, Cristian:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2017-03-04 01:10, Cristian Dumitrescu:
> > > > This patch introduces the generic ethdev API for the traffic manager
> > > > capability, which includes: hierarchical scheduling, traffic shaping,
> > > > congestion management, packet marking.
> > >
> > > We already have some API for QoS. Why integrating them in ethdev?
> > > ethdev is an interface for networking drivers.
> > > I think the QoS has nothing to do with drivers.
> > > If there are some operations to offload in drivers, please identify them
> > > and let's add the operations to ethdev.
> > >
> >
> > The reason to add to ethdev is because QoS traffic
> management/hierarchical scheduling is just another TX offload for Ethernet
> devices. This TX offload is present in NICs, NPUs and SoCs from Broadcom,
> Cavium, Intel, Mellanox, Netronome, NXP, others.
> >
> > The API we currently have in DPDK (librte_sched) is great, but it refers to
> an implementation for a fixed set of features for a BRAS-like hierarchy. The
> current abstraction layer proposal is intended to support pretty much any
> hierarchy and traffic management features such as hierarchical scheduling,
> traffic shaping, congestion management, marking under the same API. It
> targets pretty much any implementation, either HW, SW or hybrid; it does
> support the existing librte_sched library feature set, but it is not limited to it.
> 
> OK I better understand now.
> You should add this level of explanation in your patch.
> 
> However I am reluctant to add an API if there is no user.
> I think we should wait to have at least one existing driver implementing
> this API before integrating it.
> It was the approach of eventdev which has a dedicated next- tree.

The next-tree solution could work, but IMO is not the best for this case, as this is purely driver development. This is just a TX offload feature that is well understood, as opposed to a new library with a huge design effort required like eventdev.

I think we are reasonably close to getting agreement on the API from Cavium, Intel and NXP. When this is done, how about including it in DPDK with the experimental tag attached to it until several drivers implement it?

From the Intel side, there are solid plans to implement it for the ixgbe and i40e drivers in the next DPDK releases; I am CC-ing Tim to confirm this. On the Cavium and NXP side, Jerin and Hemant can comment on the plans to implement this API.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-07 19:29         ` Dumitrescu, Cristian
@ 2017-03-08  9:51           ` O'Driscoll, Tim
  2017-03-10 18:37             ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: O'Driscoll, Tim @ 2017-03-08  9:51 UTC (permalink / raw)
  To: Dumitrescu, Cristian, Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce

> From: Dumitrescu, Cristian
> 
> > -----Original Message-----
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > Sent: Monday, March 6, 2017 8:07 PM
> > To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> > Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> > balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> > shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>;
> Richardson,
> > Bruce <bruce.richardson@intel.com>
> > Subject: Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> >
> > 2017-03-06 16:59, Dumitrescu, Cristian:
> > > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > > 2017-03-04 01:10, Cristian Dumitrescu:
> > > > > This patch introduces the generic ethdev API for the traffic
> manager
> > > > > capability, which includes: hierarchical scheduling, traffic
> shaping,
> > > > > congestion management, packet marking.
> > > >
> > > > We already have some API for QoS. Why integrating them in ethdev?
> > > > ethdev is an interface for networking drivers.
> > > > I think the QoS has nothing to do with drivers.
> > > > If there are some operations to offload in drivers, please
> identify them
> > > > and let's add the operations to ethdev.
> > > >
> > >
> > > The reason to add to ethdev is because QoS traffic
> > management/hierarchical scheduling is just another TX offload for
> Ethernet
> > devices. This TX offload is present in NICs, NPUs and SoCs from
> Broadcom,
> > Cavium, Intel, Mellanox, Netronome, NXP, others.
> > >
> > > The API we currently have in DPDK (librte_sched) is great, but it
> refers to
> > an implementation for a fixed set of features for a BRAS-like
> hierarchy. The
> > current abstraction layer proposal is intended to support pretty much
> any
> > hierarchy and traffic management features such as hierarchical
> scheduling,
> > traffic shaping, congestion management, marking under the same API. It
> > targets pretty much any implementation, either HW, SW or hybrid; it
> does
> > support the existing librte_sched library feature set, but it is not
> limited to it.
> >
> > OK I better understand now.
> > You should add this level of explanation in your patch.
> >
> > However I am reluctant to add an API if there is no user.
> > I think we should wait to have at least one existing driver
> implementing
> > this API before integrating it.
> > It was the approach of eventdev which has a dedicated next- tree.
> 
> The next-tree solution could work, but IMO is not the best for this
> case, as this is purely driver development. This is just a TX offload
> feature that is well understood, as opposed to a new library with a huge
> design effort required like eventdev.
> 
> I think we are reasonably close to get agreement on the API from Cavium,
> Intel and NXP. When this is done, how about including it in DPDK with
> the experimental tag attached to it until several drivers implement it?
> 
> From Intel side, there are solid plans to implement it for ixgbe and
> i40e drivers in next DPDK releases, I am CC-ing Tim to confirm this.

That's correct. We plan to add support for this in the ixgbe and i40e drivers in 17.08.

> On
> Cavium and NXP side, Jerin and Hemant can comment on the plans to
> implement this API.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-08  9:51           ` O'Driscoll, Tim
@ 2017-03-10 18:37             ` Dumitrescu, Cristian
  2017-03-15 12:43               ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-10 18:37 UTC (permalink / raw)
  To: O'Driscoll, Tim, Thomas Monjalon
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Wiles, Keith, Richardson, Bruce



> -----Original Message-----
> From: O'Driscoll, Tim
> Sent: Wednesday, March 8, 2017 9:52 AM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>; Richardson,
> Bruce <bruce.richardson@intel.com>
> Subject: RE: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> > From: Dumitrescu, Cristian
> >


...<snip>

> > > OK I better understand now.
> > > You should add this level of explanation in your patch.
> > >
> > > However I am reluctant to add an API if there is no user.
> > > I think we should wait to have at least one existing driver
> > implementing
> > > this API before integrating it.
> > > It was the approach of eventdev which has a dedicated next- tree.
> >
> > The next-tree solution could work, but IMO is not the best for this
> > case, as this is purely driver development. This is just a TX offload
> > feature that is well understood, as opposed to a new library with a huge
> > design effort required like eventdev.
> >
> > I think we are reasonably close to get agreement on the API from Cavium,
> > Intel and NXP. When this is done, how about including it in DPDK with
> > the experimental tag attached to it until several drivers implement it?
> >
> > From Intel side, there are solid plans to implement it for ixgbe and
> > i40e drivers in next DPDK releases, I am CC-ing Tim to confirm this.
> 
> That's correct. We plan to add support for this in the ixgbe and i40e drivers in
> 17.08.

Thomas, given Tim's confirmation of Intel's plans to implement this API for the ixgbe and i40e drivers in DPDK release 17.08, are you in favour of including this API in 17.05 with the experimental tag (subject to full API agreement being reached)?

IMO this approach has the advantage of showing that API agreement has been reached and driver development is in progress. Having it in DPDK is also a better way to advertise this API to developers who would otherwise be unaware of this effort.

> 
> > On
> > Cavium and NXP side, Jerin and Hemant can comment on the plans to
> > implement this API.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-10 18:37             ` Dumitrescu, Cristian
@ 2017-03-15 12:43               ` Thomas Monjalon
  2017-03-16 16:23                 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-15 12:43 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Wiles, Keith, Richardson, Bruce

2017-03-10 18:37, Dumitrescu, Cristian:
> From: O'Driscoll, Tim

> > > > OK I better understand now.
> > > > You should add this level of explanation in your patch.
> > > >
> > > > However I am reluctant to add an API if there is no user.
> > > > I think we should wait to have at least one existing driver
> > > implementing
> > > > this API before integrating it.
> > > > It was the approach of eventdev which has a dedicated next- tree.
> > >
> > > The next-tree solution could work, but IMO is not the best for this
> > > case, as this is purely driver development. This is just a TX offload
> > > feature that is well understood, as opposed to a new library with a huge
> > > design effort required like eventdev.
> > >
> > > I think we are reasonably close to get agreement on the API from Cavium,
> > > Intel and NXP. When this is done, how about including it in DPDK with
> > > the experimental tag attached to it until several drivers implement it?
> > >
> > > From Intel side, there are solid plans to implement it for ixgbe and
> > > i40e drivers in next DPDK releases, I am CC-ing Tim to confirm this.
> > 
> > That's correct. We plan to add support for this in the ixgbe and i40e drivers in
> > 17.08.
> 
> Thomas, given Tim's confirmation of Intel's plans to implement this API for the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of including this API in 17.5 with experimental tag (subject to full API agreement being reached)?

I think starting a branch in a dedicated "next" repo is a better approach.
rte_flow and eventdev were (and will be) integrated only when at least one
hardware device is supported.
I suggest following the same workflow.

> IMO this approach has the advantage of showing that API agreement has been reached and driver development is in progress. Having it in DPDK is also a better way to advertise this API to the developers that would otherwise be unaware about this effort.

IMO we can advertise a work in progress in a side branch.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-15 12:43               ` Thomas Monjalon
@ 2017-03-16 16:23                 ` Dumitrescu, Cristian
  2017-03-16 17:29                   ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-16 16:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Wiles, Keith, Richardson, Bruce

... <snip>

> > Thomas, given Tim's confirmation of Intel's plans to implement this API for
> the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of including
> this API in 17.5 with experimental tag (subject to full API agreement being
> reached)?
> 
> I think starting a branch in a dedicated "next" repo is a better approach.
> rte_flow and eventdev were (and will be) integrated only when at least one
> hardware device is supported.
> I suggest to follow the same workflow.
> 

Thomas, if this is the only path forward you are willing to support, then let's go this way, but let's make sure we are all on the same page with the terms and conditions that apply.

Do you agree now to merge this next-tree to DPDK once this API is implemented for at least one PMD? We would like to avoid getting any last minute objections from you or anybody else on the fundamentals; if you have any, please let's discuss them now.

How do we manage the API freeze on the next-tree? Once the API is agreed, we would like to freeze it so the driver development can proceed; we can then do some reasonably small changes to the API based on the learnings we get during driver development. We would like to welcome any parties interested in contributing to join Cavium, Intel and NXP in this effort, but we would like to avoid any last minute major API change requests.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-16 16:23                 ` Dumitrescu, Cristian
@ 2017-03-16 17:29                   ` Thomas Monjalon
  2017-03-16 17:40                     ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-16 17:29 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, keith.wiles, bruce.richardson

2017-03-16 16:23, Dumitrescu, Cristian:
> ... <snip>
> 
> > > Thomas, given Tim's confirmation of Intel's plans to implement this API for
> > the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of including
> > this API in 17.5 with experimental tag (subject to full API agreement being
> > reached)?
> > 
> > I think starting a branch in a dedicated "next" repo is a better approach.
> > rte_flow and eventdev were (and will be) integrated only when at least one
> > hardware device is supported.
> > I suggest to follow the same workflow.
> > 
> 
> Thomas, if this is the only path forward you are willing to support, then let's go this way, but let's make sure we are all on the same page with the terms and conditions that apply.
> 
> Do you agree now to merge this next-tree to DPDK once this API is implemented for at least one PMD? We would like to avoid getting any last minute objections from you or anybody else on the fundamentals; if you have any, please let's discuss them now.

At least one "hardware" PMD, yes. It would prove the API can work for real.
As for accepting it definitively in a given release, that will be checked
with the technical board on Monday.

> How do we manage the API freeze on the next-tree? Once the API is agreed, we would like to freeze it so the driver development can proceed; we can then do some reasonably small changes to the API based on the learnings we get during driver development. We would like to welcome any parties interested in contributing to join Cavium, Intel and NXP in this effort, but we would like to avoid any last minute major API change requests.

You are taking it the wrong way. Your main concern is to not be disturbed
with change requests. It should be the contrary: you have a chance to
work with other vendors to test and improve the API.
You should embrace this chance and delay the API freeze as much as possible.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
  2017-03-06 10:38   ` Thomas Monjalon
  2017-03-06 16:15   ` Stephen Hemminger
@ 2017-03-16 17:35   ` Thomas Monjalon
  2017-03-30 10:32   ` Hemant Agrawal
  2017-04-07 13:20   ` Jerin Jacob
  4 siblings, 0 replies; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-16 17:35 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

2017-03-04 01:10, Cristian Dumitrescu:
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.
> 
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow approach)
> - Capability query API per port, per hierarchy level and per hierarchy node
> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
>   Weighted Round Robin (WRR)
> - Traffic shaping: single/dual rate, private (per node) and shared (by multiple
>   nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Could you please split some parts of this API into separate patches?
And it would be nice to add some text in the programmer's guide as part
of the API patches. It will help the review.
It could provide you the opportunity to explain the rationale in the
commit messages of each part and save it in the git history.

Last detail, please Cristian, try to be concise when writing such
explanations ;)

Thanks

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-16 17:29                   ` Thomas Monjalon
@ 2017-03-16 17:40                     ` Dumitrescu, Cristian
  2017-03-16 18:10                       ` Thomas Monjalon
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-16 17:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Wiles, Keith, Richardson, Bruce



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, March 16, 2017 5:30 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: O'Driscoll, Tim <tim.odriscoll@intel.com>; dev@dpdk.org;
> jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>; Richardson,
> Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> 2017-03-16 16:23, Dumitrescu, Cristian:
> > ... <snip>
> >
> > > > Thomas, given Tim's confirmation of Intel's plans to implement this API
> for
> > > the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of
> including
> > > this API in 17.5 with experimental tag (subject to full API agreement being
> > > reached)?
> > >
> > > I think starting a branch in a dedicated "next" repo is a better approach.
> > > rte_flow and eventdev were (and will be) integrated only when at least
> one
> > > hardware device is supported.
> > > I suggest to follow the same workflow.
> > >
> >
> > Thomas, if this is the only path forward you are willing to support, then let's
> go this way, but let's make sure we are all on the same page with the terms
> and conditions that apply.
> >
> > Do you agree now to merge this next-tree to DPDK once this API is
> implemented for at least one PMD? We would like to avoid getting any last
> minute objections from you or anybody else on the fundamentals; if you
> have any, please let's discuss them now.
> 
> At least one "hardware" PMD, yes. It would prove the API can work for real.
> About accepting it definitely in a given release, it will be checked
> with the technical board on Monday.
> 

OK, great, thank you. Is the agenda of the technical board meetings published in advance somewhere?


> > How do we manage the API freeze on the next-tree? Once the API is
> agreed, we would like to freeze it so the driver development can proceed;
> we can then do some reasonably small changes to the API based on the
> learnings we get during driver development. We would like to welcome any
> parties interested in contributing to join Cavium, Intel and NXP in this effort,
> but we would like to avoid any last minute major API change requests.
> 
> You are taking it the wrong way. Your main concern is to not be disturbed
> with change requests. It should be the contrary: you have a chance to
> work with other vendors to test and improve the API.
> You should embrace this chance and delay the API freeze as much as
> possible.

Not really. We definitely welcome change requests made in a timely manner. My concern is about last minute change requests, such as major API change requests just a few days before the release, when driver development is complete. Is there a policy in place to guard against such events for next-tree type of development?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-16 17:40                     ` Dumitrescu, Cristian
@ 2017-03-16 18:10                       ` Thomas Monjalon
  2017-03-16 19:06                         ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Monjalon @ 2017-03-16 18:10 UTC (permalink / raw)
  To: Dumitrescu, Cristian, konstantin.ananyev
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Wiles, Keith, Richardson, Bruce,
	techboard

2017-03-16 17:40, Dumitrescu, Cristian:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2017-03-16 16:23, Dumitrescu, Cristian:
> > > ... <snip>
> > >
> > > > > Thomas, given Tim's confirmation of Intel's plans to implement this API
> > for
> > > > the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of
> > including
> > > > this API in 17.5 with experimental tag (subject to full API agreement being
> > > > reached)?
> > > >
> > > > I think starting a branch in a dedicated "next" repo is a better approach.
> > > > rte_flow and eventdev were (and will be) integrated only when at least
> > one
> > > > hardware device is supported.
> > > > I suggest to follow the same workflow.
> > > >
> > >
> > > Thomas, if this is the only path forward you are willing to support, then let's
> > go this way, but let's make sure we are all on the same page with the terms
> > and conditions that apply.
> > >
> > > Do you agree now to merge this next-tree to DPDK once this API is
> > implemented for at least one PMD? We would like to avoid getting any last
> > minute objections from you or anybody else on the fundamentals; if you
> > have any, please let's discuss them now.
> > 
> > At least one "hardware" PMD, yes. It would prove the API can work for real.
> > About accepting it definitely in a given release, it will be checked
> > with the technical board on Monday.
> > 
> 
> OK, great, thank you. Is the agenda of the technical board meetings published in advance somewhere?

For the previous meeting, it was published:
	https://bimestriel.framapad.org/p/r.a5199d22813a5ac79d1d365b9cecb905
For the next one, Konstantin, could you please publish the agenda on a pad?

> > > How do we manage the API freeze on the next-tree? Once the API is
> > agreed, we would like to freeze it so the driver development can proceed;
> > we can then do some reasonably small changes to the API based on the
> > learnings we get during driver development. We would like to welcome any
> > parties interested in contributing to join Cavium, Intel and NXP in this effort,
> > but we would like to avoid any last minute major API change requests.
> > 
> > You are taking it the wrong way. Your main concern is to not be disturbed
> > with change requests. It should be the contrary: you have a chance to
> > work with other vendors to test and improve the API.
> > You should embrace this chance and delay the API freeze as much as
> > possible.
> 
> Not really. We definitely welcome change requests done in a timely manner. My concern is about last minute change requests, such as major API change requests just a few days before the release when driver development is complete. Is there a policy in place to prevent against such events for next-tree type of development?

No, there is no such policy on a next-tree.
It is up to the maintainer of the tree, I guess.


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-16 18:10                       ` Thomas Monjalon
@ 2017-03-16 19:06                         ` Dumitrescu, Cristian
  2017-03-24 19:55                           ` Dumitrescu, Cristian
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-16 19:06 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin
  Cc: O'Driscoll, Tim, dev, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain, Wiles, Keith, Richardson, Bruce,
	techboard



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, March 16, 2017 6:11 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>; Ananyev,
> Konstantin <konstantin.ananyev@intel.com>
> Cc: O'Driscoll, Tim <tim.odriscoll@intel.com>; dev@dpdk.org;
> jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com; Wiles, Keith <keith.wiles@intel.com>; Richardson,
> Bruce <bruce.richardson@intel.com>; techboard@dpdk.org
> Subject: Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> 2017-03-16 17:40, Dumitrescu, Cristian:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2017-03-16 16:23, Dumitrescu, Cristian:
> > > > ... <snip>
> > > >
> > > > > > Thomas, given Tim's confirmation of Intel's plans to implement this
> API
> > > for
> > > > > the ixgbe and i40e drivers in DPDK release 17.8, are you in favour of
> > > including
> > > > > this API in 17.5 with experimental tag (subject to full API agreement
> being
> > > > > reached)?
> > > > >
> > > > > I think starting a branch in a dedicated "next" repo is a better
> approach.
> > > > > rte_flow and eventdev were (and will be) integrated only when at
> least
> > > one
> > > > > hardware device is supported.
> > > > > I suggest to follow the same workflow.
> > > > >
> > > >
> > > > Thomas, if this is the only path forward you are willing to support, then
> let's
> > > go this way, but let's make sure we are all on the same page with the
> terms
> > > and conditions that apply.
> > > >
> > > > Do you agree now to merge this next-tree to DPDK once this API is
> > > implemented for at least one PMD? We would like to avoid getting any
> last
> > > minute objections from you or anybody else on the fundamentals; if you
> > > have any, please let's discuss them now.
> > >
> > > At least one "hardware" PMD, yes. It would prove the API can work for
> real.
> > > About accepting it definitely in a given release, it will be checked
> > > with the technical board on Monday.
> > >
> >
> > OK, great, thank you. Is the agenda of the technical board meetings
> published in advance somewhere?
> 
> For the previous meeting, it was published:
> 	https://bimestriel.framapad.org/p/r.a5199d22813a5ac79d1d365b9cecb905
> For the next one, please Konstantin, could you publish the agenda on a pad?
> 
> > > > How do we manage the API freeze on the next-tree? Once the API is
> > > agreed, we would like to freeze it so the driver development can
> proceed;
> > > we can then do some reasonably small changes to the API based on the
> > > learnings we get during driver development. We would like to welcome
> any
> > > parties interested in contributing to join Cavium, Intel and NXP in this
> effort,
> > > but we would like to avoid any last minute major API change requests.
> > >
> > > You are taking it the wrong way. Your main concern is to not be disturbed
> > > with change requests. It should be the contrary: you have a chance to
> > > work with other vendors to test and improve the API.
> > > You should embrace this chance and delay the API freeze as much as
> > > possible.
> >
> > Not really. We definitely welcome change requests done in a timely
> manner. My concern is about last minute change requests, such as major API
> change requests just a few days before the release when driver
> development is complete. Is there a policy in place to prevent against such
> events for next-tree type of development?
> 
> No, there is no such policy on a next-tree.
> It is up to the maintainer of the tree, I guess.

Thanks, Thomas. Can you please create a next-tree for QoS Traffic Management with the following details:
	Maintainer: Cristian
	Committers: Hemant, Jerin, Cristian


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-16 19:06                         ` Dumitrescu, Cristian
@ 2017-03-24 19:55                           ` Dumitrescu, Cristian
  0 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-03-24 19:55 UTC (permalink / raw)
  To: 'Thomas Monjalon', Ananyev, Konstantin
  Cc: O'Driscoll, Tim, 'dev@dpdk.org',
	'jerin.jacob@caviumnetworks.com',
	'balasubramanian.manoharan@cavium.com',
	'hemant.agrawal@nxp.com',
	'shreyansh.jain@nxp.com',
	Wiles, Keith, Richardson, Bruce, 'techboard@dpdk.org'

> > No, there is no such policy on a next-tree.
> > It is up to the maintainer of the tree, I guess.
> 
> Thanks, Thomas. Can you please create a next-tree for QoS Traffic
> Management with the following details:
> 	Maintainer: Cristian
> 	Committers: Hemant, Jerin, Cristian

Hi Thomas, any progress with creating this tree? Thanks, Cristian


* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
                     ` (2 preceding siblings ...)
  2017-03-16 17:35   ` Thomas Monjalon
@ 2017-03-30 10:32   ` Hemant Agrawal
  2017-04-07 16:51     ` Dumitrescu, Cristian
  2017-04-07 13:20   ` Jerin Jacob
  4 siblings, 1 reply; 52+ messages in thread
From: Hemant Agrawal @ 2017-03-30 10:32 UTC (permalink / raw)
  To: Cristian Dumitrescu, dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan, shreyansh.jain

Hi Cristian,
	
On 3/4/2017 6:40 AM, Cristian Dumitrescu wrote:
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.
>
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow approach)
> - Capability query API per port, per hierarchy level and per hierarchy node
> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
>   Weighted Round Robin (WRR)
> - Traffic shaping: single/dual rate, private (per node) and shared (by multiple
>   nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
>
> Changes in v3:
> - Implemented feedback from Jerin [5]
> - Changed naming convention: scheddev -> tm
> - Improvements on the capability API:
> 	- Specification of marking capabilities per color
> 	- WFQ/WRR groups: sp_n_children_max -> wfq_wrr_n_children_per_group_max,
> 	  added wfq_wrr_n_groups_max, improved description of both, improved
> 	  description of wfq_wrr_weight_max
> 	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent update
> - Enforced/documented restrictions for root node (node_add() and update())
> - Enforced/documented shaper profile restrictions on PIR: PIR != 0, PIR >= CIR
> - Turned repetitive code in rte_tm.c into macro
> - Removed dependency on rte_red.h file (added RED params to rte_tm.h)
> - Color: removed "e_" from color names enum
> - Fixed small Doxygen style issues
>
> Changes in v2:
> - Implemented feedback from Hemant [4]
> - Improvements on the capability API
> 	- Added capability API for hierarchy level
> 	- Merged stats capability into the capability API
> 	- Added dynamic updates
> 	- Added non-leaf/leaf union to the node capability structure
> 	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
> 	- Fixed description for sp_n_children_max
> - Clarified and enforced rule on node ID range for leaf and non-leaf nodes
> 	- Added API functions to get node type (i.e. leaf/non-leaf):
> 	  get_leaf_nodes(), node_type_get()
> - Added clarification for the root node: its creation, its parent, its role
> 	- Macro NODE_ID_NULL as root node's parent
> 	- Description of the node_add() and node_parent_update() API functions
> - Added clarification for the first time add vs. subsequent updates rule
> 	- Cleaned up the description for the node_add() function
> - Statistics API improvements
> 	- Merged stats capability into the capability API
> 	- Added API function node_stats_update()
> 	- Added more stats per packet color
> - Added more error types
> - Fixed small Doxygen style issues
>
> Changes in v1 (since RFC [1]):
> - Implemented as ethdev plugin (similar to rte_flow) as opposed to more
>   monolithic additions to ethdev itself
> - Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
>   suggested items with only one exception, see the long list below, hopefully
>   nothing was forgotten.
>     - The item not done (hopefully for a good reason): driver-generated object
>       IDs. IMO the choice to have application-generated object IDs adds marginal
>       complexity to the driver (search ID function required), but it provides
>       huge simplification for the application. The app does not need to worry
>       about building & managing tree-like structure for storing driver-generated
>       object IDs, the app can use its own convention for node IDs depending on
>       the specific hierarchy that it needs. Trivial example: identify all
>       level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
>       on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
>       310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
>       112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
>       the other related simplification that was implemented: leaf nodes now have
>       predefined IDs that are the same with their Ethernet TX queue ID (
>       therefore no translation is required for leaf nodes).
> - Capability API. Done per port and per node as well.
> - Dual rate shapers
> - Added configuration of private shaper (per node) directly from the shaper
>   profile as part of node API (no shaper ID needed for private shapers), while
>   the shared shapers are configured outside of the node API using shaper profile
>   and communicated to the node using shared shaper ID. So there is no
>   configuration overhead for shared shapers if the app does not use any of them.
> - Leaf nodes now have predefined IDs that are the same with their Ethernet TX
>   queue ID (therefore no translation is required for leaf nodes). This is also
>   used to differentiate between a leaf node and a non-leaf node.
> - Domain-specific errors to give a precise indication of the error cause (same
>   as done by rte_flow)
> - Packet marking API
> - Packet length optional adjustment for shapers, positive (e.g. for adding
>   Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
>   based on IP packet bytes)
>
> [1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
> [2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
> [3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
> [4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
> [5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
>
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>  MAINTAINERS                            |    4 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_tm.c              |  436 ++++++++++
>  lib/librte_ether/rte_tm.h              | 1466 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_tm_driver.h       |  365 ++++++++
>  6 files changed, 2305 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_tm.c
>  create mode 100644 lib/librte_ether/rte_tm.h
>  create mode 100644 lib/librte_ether/rte_tm_driver.h
>

<snip>...

> +
> +/**
> + * Traffic manager dynamic updates
> + */
> +enum rte_tm_dynamic_update_type {
> +	/**< Dynamic parent node update. The new parent node is located on same
> +	 * hierarchy level as the former parent node. Consequently, the node
> +	 * whose parent is changed preserves its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
> +
> +	/**< Dynamic parent node update. The new parent node is located on
> +	 * different hierarchy level than the former parent node. Consequently,
> +	 * the node whose parent is changed also changes its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
> +
> +	/**< Dynamic node add/delete. */
> +	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
> +
> +	/**< Suspend/resume nodes. */
> +	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,

What is the expected behaviour of suspend/resume?
1. The fan-out will stop.
2. Will the fan-in stop as well, with any enqueue request resulting in an
   error? Or is this implementation dependent?
During the suspend/hold state, buffers should not be held in the node.
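
Whatever the answer, an application would presumably check the capability bit before relying on suspend/resume at all. A minimal sketch, assuming the bit values of the enum quoted above; the shortened names and the helper tm_update_supported() are hypothetical and not part of the patch:

```c
#include <stdint.h>

/* Bit values copied from the quoted enum rte_tm_dynamic_update_type.
 * The shortened names are local to this sketch.
 */
enum {
	UPDATE_NODE_PARENT_KEEP_LEVEL   = 1 << 0,
	UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
	UPDATE_NODE_ADD_DELETE          = 1 << 2,
	UPDATE_NODE_SUSPEND_RESUME      = 1 << 3,
	UPDATE_NODE_SCHEDULING_MODE     = 1 << 4,
	UPDATE_NODE_STATS               = 1 << 5,
	UPDATE_NODE_CMAN                = 1 << 6,
};

/* Hypothetical helper (not in the patch): returns 1 when every bit in
 * 'required' is set in the port's dynamic_update_mask, 0 otherwise.
 */
static int
tm_update_supported(uint64_t dynamic_update_mask, uint64_t required)
{
	return (dynamic_update_mask & required) == required;
}
```

The mask itself would come from the dynamic_update_mask field of struct rte_tm_capabilities.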

> +
> +	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
> +	RTE_TM_UPDATE_NODE_SCHEDULING_MODE = 1 << 4,
> +
> +	/**< Dynamic update of the set of enabled stats counter types. */
> +	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
> +
> +	/**< Dynamic update of congestion management mode for leaf nodes. */
> +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> +};
> +
> +/**
> + * Traffic manager node capabilities
> + */
> +struct rte_tm_node_capabilities {
> +	/**< Private shaper support. */
> +	int shaper_private_supported;
> +
> +	/**< Dual rate shaping support for private shaper. Valid only when
> +	 * private shaper is supported.
> +	 */
> +	int shaper_private_dual_rate_supported;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_max;
> +
> +	/**< Maximum number of supported shared shapers. The value of zero
> +	 * indicates that shared shapers are not supported.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Mask of supported statistics counter types. */
> +	uint64_t stats_mask;
> +
> +	union {
> +		/**< Items valid only for non-leaf nodes. */
> +		struct {
> +			/**< Maximum number of children nodes. */
> +			uint32_t n_children_max;
> +
> +			/**< Maximum number of supported priority levels. The
> +			 * value of zero is invalid. The value of 1 indicates
> +			 * that only priority 0 is supported, which essentially
> +			 * means that Strict Priority (SP) algorithm is not
> +			 * supported.
> +			 */
> +			uint32_t sp_n_priorities_max;
> +
> +			/**< Maximum number of sibling nodes that can have the
> +			 * same priority at any given time, i.e. maximum size
> +			 * of the WFQ/WRR sibling node group. The value of zero
> +			 * is invalid. The value of 1 indicates that WFQ/WRR
> +			 * algorithms are not supported. The maximum value is
> +			 * *n_children_max*.
> +			 */
> +			uint32_t wfq_wrr_n_children_per_group_max;
> +
> +			/**< Maximum number of priority levels that can have
> +			 * more than one child node at any given time, i.e.
> +			 * maximum number of WFQ/WRR sibling node groups that
> +			 * have two or more members. The value of zero states
> +			 * that WFQ/WRR algorithms are not supported. The value
> +			 * of 1 indicates that (*sp_n_priorities_max* - 1)
> +			 * priority levels have at most one child node, so
> +			 * there can be only one priority level with two or
> +			 * more sibling nodes making up a WFQ/WRR group. The
> +			 * maximum value is: min(floor(*n_children_max* / 2),
> +			 * *sp_n_priorities_max*).
> +			 */
> +			uint32_t wfq_wrr_n_groups_max;
> +
> +			/**< WFQ algorithm support. */
> +			int wfq_supported;
> +
> +			/**< WRR algorithm support. */
> +			int wrr_supported;
> +
> +			/**< Maximum WFQ/WRR weight. The value of 1 indicates
> +			 * that all sibling nodes with same priority have the
> +			 * same WFQ/WRR weight, so WFQ/WRR is reduced to FQ/RR.
> +			 */
> +			uint32_t wfq_wrr_weight_max;
> +		} nonleaf;
> +
> +		/**< Items valid only for leaf nodes. */
> +		struct {
> +			/**< Head drop algorithm support. */
> +			int cman_head_drop_supported;
> +
> +			/**< Private WRED context support. */
> +			int cman_wred_context_private_supported;
> +
> +			/**< Maximum number of shared WRED contexts supported.
> +			 * The value of zero indicates that shared WRED
> +			 * contexts are not supported.
> +			 */
> +			uint32_t cman_wred_context_shared_n_max;
> +		} leaf;
> +	};
> +};
> +
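
The upper bound documented for wfq_wrr_n_groups_max follows from two constraints: every WFQ/WRR group needs at least two children, and every group occupies its own priority level. A small sketch of that bound, with a hypothetical helper name that is not part of the patch:

```c
#include <stdint.h>

/* Upper bound quoted in the comment for wfq_wrr_n_groups_max:
 * min(floor(n_children_max / 2), sp_n_priorities_max).
 * Hypothetical helper, for illustration only.
 */
static uint32_t
wfq_wrr_n_groups_bound(uint32_t n_children_max, uint32_t sp_n_priorities_max)
{
	/* Each group has two or more members; unsigned division floors. */
	uint32_t pairs = n_children_max / 2;

	/* Each group sits on a distinct priority level. */
	return pairs < sp_n_priorities_max ? pairs : sp_n_priorities_max;
}
```

A driver could use such a check to sanity-test the values it reports; an application could use it to reject a hierarchy that cannot fit.
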
> +/**
> + * Traffic manager level capabilities
> + */
> +struct rte_tm_level_capabilities {
> +	/**< Maximum number of nodes for the current hierarchy level. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of non-leaf nodes for the current hierarchy level.
> +	 * The value of 0 indicates that current level only supports leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_nonleaf_max;
> +
> +	/**< Maximum number of leaf nodes for the current hierarchy level. The
> +	 * value of 0 indicates that current level only supports non-leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_leaf_max;
> +
> +	/**< Summary of node-level capabilities across all the non-leaf nodes
> +	 * of the current hierarchy level. Valid only when
> +	 * *n_nodes_nonleaf_max* is greater than 0.
> +	 */
> +	struct rte_tm_node_capabilities nonleaf;
> +
> +	/**< Summary of node-level capabilities across all the leaf nodes of
> +	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
> +	 * greater than 0.
> +	 */
> +	struct rte_tm_node_capabilities leaf;

You also need to provide a flag indicating whether all the nodes in this
level have equal capabilities or not.

If all nodes in this level have equal capabilities, the implementation
need not inquire any further about node-level capabilities. Hardware
generally has equal capabilities at a particular level.

If the nodes do not all have equal capabilities, this information may not
be correct and may only provide a very rough overview of node
capabilities. Applications should not assume anything about a
particular node's capability within this level.
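
To make the suggestion concrete, here is a rough sketch of how such a flag could be consumed. The two *_identical fields and the helper are hypothetical; they are not in the v3 patch and only illustrate the idea:

```c
#include <stdint.h>

/* Trimmed-down, hypothetical view of the level capabilities. The two
 * *_identical fields are NOT in the v3 patch.
 */
struct level_caps_sketch {
	uint32_t n_nodes_nonleaf_max;
	uint32_t n_nodes_leaf_max;
	int nonleaf_nodes_identical; /* hypothetical flag */
	int leaf_nodes_identical;    /* hypothetical flag */
};

/* Returns 1 when the level summary is not enough and the application
 * still has to query the capabilities of individual nodes.
 */
static int
need_per_node_query(const struct level_caps_sketch *lc, int is_leaf)
{
	if (is_leaf)
		return lc->n_nodes_leaf_max > 0 && !lc->leaf_nodes_identical;
	return lc->n_nodes_nonleaf_max > 0 && !lc->nonleaf_nodes_identical;
}
```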


> +};
> +
> +/**
> + * Traffic manager capabilities
> + */
> +struct rte_tm_capabilities {
> +	/**< Maximum number of nodes. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of levels (i.e. number of nodes connecting the root
> +	 * node with any leaf node, including the root and the leaf).
> +	 */
> +	uint32_t n_levels_max;
> +
> +	/**< Maximum number of shapers, either private or shared. In case the
> +	 * implementation does not share any resource between private and
> +	 * shared shapers, it is typically equal to the sum between
> +	 * *shaper_private_n_max* and *shaper_shared_n_max*.
> +	 */
> +	uint32_t shaper_n_max;
> +
> +	/**< Maximum number of private shapers. Indicates the maximum number of
> +	 * nodes that can concurrently have the private shaper enabled.
> +	 */
> +	uint32_t shaper_private_n_max;
> +
> +	/**< Maximum number of shared shapers. The value of zero indicates that
> +	 * shared shapers are not supported.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Maximum number of nodes that can share the same shared shaper.
> +	 * Only valid when shared shapers are supported.
> +	 */
> +	uint32_t shaper_shared_n_nodes_max;
> +
> +	/**< Maximum number of shared shapers that can be configured with dual
> +	 * rate shaping. The value of zero indicates that dual rate shaping
> +	 * support is not available for shared shapers.
> +	 */
> +	uint32_t shaper_shared_dual_rate_n_max;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for shared shapers.
> +	 * Only valid when shared shapers are supported.
> +	 */
> +	uint64_t shaper_shared_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for shared shaper.
> +	 * Only valid when shared shapers are supported.
> +	 */
> +	uint64_t shaper_shared_rate_max;
> +
> +	/**< Minimum value allowed for packet length adjustment for
> +	 * private/shared shapers.
> +	 */
> +	int shaper_pkt_length_adjust_min;
> +
> +	/**< Maximum value allowed for packet length adjustment for
> +	 * private/shared shapers.
> +	 */
> +	int shaper_pkt_length_adjust_max;
> +
> +	/**< Maximum number of WRED contexts. */
> +	uint32_t cman_wred_context_n_max;
> +
> +	/**< Maximum number of private WRED contexts. Indicates the maximum
> +	 * number of leaf nodes that can concurrently have the private WRED
> +	 * context enabled.
> +	 */
> +	uint32_t cman_wred_context_private_n_max;
> +
> +	/**< Maximum number of shared WRED contexts. The value of zero
> +	 * indicates that shared WRED contexts are not supported.
> +	 */
> +	uint32_t cman_wred_context_shared_n_max;
> +
> +	/**< Maximum number of leaf nodes that can share the same WRED context.
> +	 * Only valid when shared WRED contexts are supported.
> +	 */
> +	uint32_t cman_wred_context_shared_n_nodes_max;
> +
> +	/**< Support for VLAN DEI packet marking (per color). */
> +	int mark_vlan_dei_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
> +	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
> +	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
> +	int mark_ip_dscp_supported[RTE_TM_COLORS];
> +
> +	/**< Set of supported dynamic update operations
> +	 * (see enum rte_tm_dynamic_update_type).
> +	 */
> +	uint64_t dynamic_update_mask;
> +
> +	/**< Summary of node-level capabilities across all non-leaf nodes. */
> +	struct rte_tm_node_capabilities nonleaf;
> +
> +	/**< Summary of node-level capabilities across all leaf nodes. */
> +	struct rte_tm_node_capabilities leaf;

This is not right. When you already have per-level capabilities, why
return node-level capabilities with the TM capabilities?

In software, it is easy to give all nodes the same capabilities. But
in hardware, the capabilities of different levels are going to be
different.

This will result in non-portable implementations, where the application
takes the easy way and builds the TM tree on the basis of these values.

You already have most of the capability indications in the TM-level
parameters. The only information missing is the WRR/WFQ and SP
capability of the TM; you can add some flags to expose that.


> +};
> +
> +/**
> + * Congestion management (CMAN) mode
> + *
> + * This is used for controlling the admission of packets into a packet queue or
> + * group of packet queues on congestion. On request of writing a new packet
> + * into the current queue while the queue is full, the *tail drop* algorithm
> + * drops the new packet while leaving the queue unmodified, as opposed to *head
> + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> + * packet waiting in the queue) and admits the new packet at the tail of the
> + * queue.
> + *
> + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> + * more and more input packets as the queue occupancy builds up. When the queue
> + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> + * RED* algorithm uses a separate set of RED thresholds for each packet color.
> + */
> +enum rte_tm_cman_mode {
> +	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
> +	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
> +	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +
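
The difference between the two drop modes described in the comment can be shown on a tiny fixed-size queue. This is only an illustrative sketch; the type names, the capacity and queue_admit() are made up here and say nothing about how a PMD would implement it:

```c
#include <stdint.h>

#define Q_CAP 4 /* arbitrary capacity for the illustration */

struct pkt_queue_sketch {
	uint32_t pkt[Q_CAP]; /* packet IDs, stored as a ring */
	uint32_t head;       /* index of the oldest packet */
	uint32_t count;      /* number of packets queued */
};

enum cman_sketch { CMAN_TAIL_DROP = 0, CMAN_HEAD_DROP };

/* Offers new_pkt to the queue. Returns the number of packets dropped
 * (0 or 1); when 1, *dropped holds the ID of the dropped packet.
 */
static int
queue_admit(struct pkt_queue_sketch *q, enum cman_sketch mode,
	uint32_t new_pkt, uint32_t *dropped)
{
	int n_drop = 0;

	if (q->count == Q_CAP) {
		if (mode == CMAN_TAIL_DROP) {
			/* Queue left unmodified, new packet is dropped. */
			*dropped = new_pkt;
			return 1;
		}
		/* Head drop: discard the oldest packet to make room. */
		*dropped = q->pkt[q->head];
		q->head = (q->head + 1) % Q_CAP;
		q->count--;
		n_drop = 1;
	}

	q->pkt[(q->head + q->count) % Q_CAP] = new_pkt;
	q->count++;
	return n_drop;
}
```
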
> +/**
> + * Random Early Detection (RED) profile
> + */
> +struct rte_tm_red_params {
> +	/**< Minimum queue threshold */
> +	uint16_t min_th;
> +
> +	/**< Maximum queue threshold */
> +	uint16_t max_th;
> +
> +	/**< Inverse of packet marking probability maximum value (maxp), i.e.
> +	 * maxp_inv = 1 / maxp
> +	 */
> +	uint16_t maxp_inv;
> +
> +	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
> +	uint16_t wq_log2;
> +};
> +
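
For readers less familiar with the encoding of maxp_inv and wq_log2, the sketch below shows how the four fields would feed the textbook RED formulas. It assumes classic RED behaviour (EWMA of the queue size, linear drop probability between the thresholds), which the patch does not spell out; the struct and function names are hypothetical, and wq_log2 < 32 and maxp_inv != 0 are assumed:

```c
#include <stdint.h>

/* Same four fields as the quoted struct rte_tm_red_params. */
struct red_params_sketch {
	uint16_t min_th;
	uint16_t max_th;
	uint16_t maxp_inv; /* maxp = 1 / maxp_inv */
	uint16_t wq_log2;  /* wq = 1 / (2 ^ wq_log2) */
};

/* EWMA of the queue size: avg += wq * (qlen - avg).
 * Assumes wq_log2 < 32 so that the shift is well defined.
 */
static double
red_avg_update(const struct red_params_sketch *p, double avg, double qlen)
{
	double wq = 1.0 / (double)(1u << p->wq_log2);

	return avg + wq * (qlen - avg);
}

/* Drop probability: 0 below min_th, 1 at or above max_th, and a linear
 * ramp up to maxp in between. Assumes maxp_inv != 0.
 */
static double
red_drop_prob(const struct red_params_sketch *p, double avg)
{
	if (avg < p->min_th)
		return 0.0;
	if (avg >= p->max_th)
		return 1.0;
	return (avg - p->min_th) /
		((double)(p->max_th - p->min_th) * p->maxp_inv);
}
```

For example, with min_th = 32, max_th = 128 and maxp_inv = 10, an average queue size of 80 sits halfway up the ramp and gives a drop probability of 0.05.
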
> +/**
> + * Weighted RED (WRED) profile
> + *
> + * Multiple WRED contexts can share the same WRED profile. Each leaf node with
> + * WRED enabled as its congestion management mode has zero or one private WRED
> + * context (only one leaf node using it) and/or zero, one or several shared
> + * WRED contexts (multiple leaf nodes use the same WRED context). A private
> + * WRED context is used to perform congestion management for a single leaf
> + * node, while a shared WRED context is used to perform congestion management
> + * for a group of leaf nodes.
> + */
> +struct rte_tm_wred_params {
> +	/**< One set of RED parameters per packet color */
> +	struct rte_tm_red_params red_params[RTE_TM_COLORS];
> +};
> +
> +/**
> + * Token bucket
> + */
> +struct rte_tm_token_bucket {
> +	/**< Token bucket rate (bytes per second) */
> +	uint64_t rate;
> +
> +	/**< Token bucket size (bytes), a.k.a. max burst size */
> +	uint64_t size;
> +};
> +
> +/**
> + * Shaper (rate limiter) profile
> + *
> + * Multiple shaper instances can share the same shaper profile. Each node has
> + * zero or one private shaper (only one node using it) and/or zero, one or
> + * several shared shapers (multiple nodes use the same shaper instance).
> + * A private shaper is used to perform traffic shaping for a single node, while
> + * a shared shaper is used to perform traffic shaping for a group of nodes.
> + *
> + * Single rate shapers use a single token bucket. A single rate shaper can be
> + * configured by setting the rate of the committed bucket to zero, which
> + * effectively disables this bucket. The peak bucket is used to limit the rate
> + * and the burst size for the current shaper.
> + *
> + * Dual rate shapers use both the committed and the peak token buckets. The
> + * rate of the peak bucket has to be bigger than zero, as well as greater than
> + * or equal to the rate of the committed bucket.
> + */
> +struct rte_tm_shaper_params {
> +	/**< Committed token bucket */
> +	struct rte_tm_token_bucket committed;
> +
> +	/**< Peak token bucket */
> +	struct rte_tm_token_bucket peak;
> +
> +	/**< Signed value to be added to the length of each packet for the
> +	 * purpose of shaping. Can be used to correct the packet length with
> +	 * the framing overhead bytes that are also consumed on the wire (e.g.
> +	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
> +	 */
> +	int32_t pkt_length_adjust;
> +};
> +
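
The shaper profile rules (PIR != 0, PIR >= CIR, committed rate of zero meaning single rate) and the packet length adjustment can be summarised in a few lines. This is a sketch with hypothetical names, not code from the patch; in particular, clamping the adjusted length at zero is an assumption made here:

```c
#include <stdint.h>

/* Local copies of the quoted structures, renamed for this sketch. */
struct token_bucket_sketch {
	uint64_t rate; /* bytes per second */
	uint64_t size; /* bytes */
};

struct shaper_params_sketch {
	struct token_bucket_sketch committed;
	struct token_bucket_sketch peak;
	int32_t pkt_length_adjust;
};

enum shaper_kind {
	SHAPER_INVALID = -1,
	SHAPER_SINGLE_RATE = 0,
	SHAPER_DUAL_RATE = 1,
};

/* Applies the documented restrictions: the peak rate must be non-zero
 * and not lower than the committed rate. A committed rate of zero
 * disables the committed bucket, i.e. single rate shaping.
 */
static enum shaper_kind
shaper_classify(const struct shaper_params_sketch *sp)
{
	if (sp->peak.rate == 0 || sp->peak.rate < sp->committed.rate)
		return SHAPER_INVALID;
	return sp->committed.rate == 0 ?
		SHAPER_SINGLE_RATE : SHAPER_DUAL_RATE;
}

/* Packet length charged to the token buckets. Clamping at zero is an
 * assumption of this sketch, not something the patch states.
 */
static uint32_t
shaper_pkt_len(const struct shaper_params_sketch *sp, uint32_t pkt_len)
{
	int64_t len = (int64_t)pkt_len + sp->pkt_length_adjust;

	return len < 0 ? 0 : (uint32_t)len;
}
```

With pkt_length_adjust = 20 (the Ethernet framing overhead mentioned in the change log), a 64-byte frame is charged as 84 bytes.
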
> +/**
> + * Node parameters
> + *
> + * Each hierarchy node has multiple inputs (children nodes of the current
> + * parent node) and a single output (which is input to its parent node). The
> + * current node arbitrates its inputs using Strict Priority (SP), Weighted Fair
> + * Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to schedule input
> + * packets on its output while observing its shaping (rate limiting)
> + * constraints.
> + *
> + * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc are considered
> + * approximations of the ideal of WFQ and are assimilated to WFQ, although an
> + * associated implementation-dependent trade-off on accuracy, performance and
> + * resource usage might exist.
> + *
> + * Children nodes with different priorities are scheduled using the SP
> + * algorithm, based on their priority, with zero (0) as the highest priority.
> + * Children with same priority are scheduled using the WFQ or WRR algorithm,
> + * based on their weight, which is relative to the sum of the weights of all
> + * siblings with same priority, with one (1) as the lowest weight.
> + *
> + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> + * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
> + * where N is the number of TX queues configured for the current Ethernet port.
> + * The non-leaf nodes have their IDs generated by the application.
> + */
> +struct rte_tm_node_params {
> +	/**< Shaper profile for the private shaper. The absence of the private
> +	 * shaper for the current node is indicated by setting this parameter
> +	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
> +	 */
> +	uint32_t shaper_profile_id;
> +
> +	/**< User allocated array of valid shared shaper IDs. */
> +	uint32_t *shared_shaper_id;
> +
> +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> +	uint32_t n_shared_shapers;
> +
> +	/**< Mask of statistics counter types to be enabled for this node. This
> +	 * needs to be a subset of the statistics counter types available for
> +	 * the current node. Any statistics counter type not included in this
> +	 * set is to be disabled for the current node.
> +	 */
> +	uint64_t stats_mask;
> +
> +	union {
> +		/**< Parameters only valid for non-leaf nodes. */
> +		struct {
> +			/**< For each priority, indicates whether the children
> +			 * nodes sharing the same priority are to be scheduled
> +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> +			 * is to be used for all priorities. When non-NULL, it
> +			 * points to a pre-allocated array of *n_priorities*
> +			 * elements, with a non-zero value element indicating
> +			 * WFQ and a zero value element for WRR.
> +			 */
> +			int *scheduling_mode_per_priority;
> +
> +			/**< Number of priorities. */
> +			uint32_t n_priorities;
> +		} nonleaf;
> +
> +		/**< Parameters only valid for leaf nodes. */
> +		struct {
> +			/**< Congestion management mode */
> +			enum rte_tm_cman_mode cman;
> +
> +			/**< WRED parameters (valid when *cman* is WRED). */
> +			struct {
> +				/**< WRED profile for private WRED context. */
> +				uint32_t wred_profile_id;
> +
> +				/**< User allocated array of shared WRED
> +				 * context IDs. The absence of a private WRED
> +				 * context for current leaf node is indicated
> +				 * by value RTE_TM_WRED_PROFILE_ID_NONE.
> +				 */
> +				uint32_t *shared_wred_context_id;
> +
> +				/**< Number of shared WRED context IDs in the
> +				 * *shared_wred_context_id* array.
> +				 */
> +				uint32_t n_shared_wred_contexts;
> +			} wred;
> +		} leaf;
> +	};
> +};
> +
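
Two rules from the node description lend themselves to a short illustration: leaf node IDs are the TX queue IDs 0 .. (N-1), and a child's WFQ/WRR weight is relative to the sum of the weights of its same-priority siblings. The helpers below are hypothetical and only sketch those two rules:

```c
#include <stdint.h>

/* Leaf nodes are predefined with IDs 0 .. (n_tx_queues - 1); any other
 * ID is an application-generated non-leaf node ID.
 */
static int
tm_node_is_leaf(uint32_t node_id, uint16_t n_tx_queues)
{
	return node_id < n_tx_queues;
}

/* Share of child 'idx' among 'n' siblings with the same priority, in
 * parts per thousand: weight[idx] / sum(weights). Returns 0 for an
 * out-of-range index or an all-zero weight array.
 */
static uint32_t
tm_weight_share_permille(const uint32_t *weights, uint32_t n, uint32_t idx)
{
	uint64_t sum = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		sum += weights[i];
	if (idx >= n || sum == 0)
		return 0;
	return (uint32_t)(((uint64_t)weights[idx] * 1000) / sum);
}
```

One consequence of the first rule is that application-chosen non-leaf IDs have to stay outside the 0 .. (N-1) range.
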
> +/**
> + * Verbose error types.
> + *
> + * Most of them provide the type of the object referenced by struct
> + * rte_tm_error::cause.
> + */
> +enum rte_tm_error_type {
> +	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
> +	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
> +	RTE_TM_ERROR_TYPE_CAPABILITIES,
> +	RTE_TM_ERROR_TYPE_LEVEL_ID,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
> +	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
> +	RTE_TM_ERROR_TYPE_NODE_ID,
> +};
> +
> +/**
> + * Verbose error structure definition.
> + *
> + * This object is normally allocated by applications and set by PMDs. The
> + * message points to a constant string which does not need to be freed by
> + * the application; however, its pointer can be considered valid only as long
> + * as its associated DPDK port remains configured. Closing the underlying
> + * device or unloading the PMD invalidates it.
> + *
> + * Both cause and message may be NULL regardless of the error type.
> + */
> +struct rte_tm_error {
> +	enum rte_tm_error_type type; /**< Cause field and error type. */
> +	const void *cause; /**< Object responsible for the error. */
> +	const char *message; /**< Human-readable error message. */
> +};
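As an illustration of the "allocated by applications, set by PMDs" contract, a driver-side helper could look roughly like the sketch below. All names prefixed with sketch_ or SKETCH_ are hypothetical local stand-ins that mirror the quoted definitions so the example builds without the DPDK headers; only a few error types are reproduced.

```c
#include <stddef.h>

/* Local stand-ins mirroring the quoted enum and struct (assumption: reduced
 * to a few values for brevity).
 */
enum sketch_tm_error_type {
	SKETCH_TM_ERROR_TYPE_NONE,
	SKETCH_TM_ERROR_TYPE_UNSPECIFIED,
	SKETCH_TM_ERROR_TYPE_NODE_ID,
};

struct sketch_tm_error {
	enum sketch_tm_error_type type;
	const void *cause;
	const char *message;
};

/* Hypothetical helper: fill *error only when the caller supplied one, then
 * hand the error code back so it can be returned directly. The message is
 * expected to be a constant string that the application never frees.
 */
int
sketch_tm_error_set(struct sketch_tm_error *error,
	int code,
	enum sketch_tm_error_type type,
	const void *cause,
	const char *message)
{
	if (error != NULL) {
		error->type = type;
		error->cause = cause;
		error->message = message;
	}

	return code;
}
```

Passing NULL for *error* is allowed, which matches the "Filled in only on error, when not NULL" wording used for every function below.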
> +
> +/**
> + * Traffic manager get number of leaf nodes
> + *
> + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> + * Therefore, the set of leaf nodes is predefined, their number is always equal
> + * to N (where N is the number of TX queues configured for the current port)
> + * and their IDs are 0 .. (N-1).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param n_leaf_nodes
> + *   Number of leaf nodes for the current port.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_get_leaf_nodes(uint8_t port_id,
> +	uint32_t *n_leaf_nodes,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node type (i.e. leaf or non-leaf) get
> + *
> + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
> + * the number of TX queues of the current Ethernet port. The non-leaf nodes
> + * have their IDs generated by the application outside of the above range,
> + * which is reserved for leaf nodes.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID value. Needs to be valid.
> + * @param is_leaf
> + *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_type_get(uint8_t port_id,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error);
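The ID convention described for the two functions above reduces to a range check. A minimal sketch, assuming the number of configured TX queues is already known to the caller; the helper name is hypothetical and a real driver would presumably also verify that the node exists:

```c
#include <stdint.h>

/* Hypothetical helper: leaf node IDs are 0 .. (N-1), where N is the number
 * of TX queues configured for the port. Every other ID is left to the
 * application for non-leaf nodes.
 */
int
sketch_node_id_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	return node_id < n_tx_queues;
}
```

With 4 TX queues, IDs 0 to 3 are leaves and the application is free to use 4 and above for non-leaf nodes.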
> +
> +/**
> + * Traffic manager capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   Traffic manager capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_capabilities_get(uint8_t port_id,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager level capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param level_id
> + *   The hierarchy level identifier. The value of 0 identifies the level of the
> + *   root node.
> + * @param cap
> + *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_level_capabilities_get(uint8_t port_id,
> +	uint32_t level_id,
> +	struct rte_tm_level_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param cap
> + *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_capabilities_get(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager WRED profile add
> + *
> + * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
> + * is used to create one or several WRED contexts.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   WRED profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_wred_profile_add(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_wred_params *profile,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager WRED profile delete
> + *
> + * Delete an existing WRED profile. This operation fails when there is
> + * currently at least one user (i.e. WRED context) of this WRED profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_wred_profile_delete(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared WRED context add or update
> + *
> + * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
> + * created by using the WRED profile identified by *wred_profile_id*.
> + *
> + * When *shared_wred_context_id* is valid, this WRED context is no longer using
> + * the profile previously assigned to it and is updated to use the profile
> + * identified by *wred_profile_id*.
> + *
> + * A valid shared WRED context can be assigned to several hierarchy leaf nodes
> + * configured to use WRED as the congestion management mode.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID.
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_wred_context_add_update(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared WRED context delete
> + *
> + * Delete an existing shared WRED context. This operation fails when there is
> + * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
> + * context.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_wred_context_delete(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shaper profile add
> + *
> + * Create a new shaper profile with ID set to *shaper_profile_id*. The new
> + * shaper profile is used to create one or several shapers.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   Shaper profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shaper_profile_add(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_shaper_params *profile,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shaper profile delete
> + *
> + * Delete an existing shaper profile. This operation fails when there is
> + * currently at least one user (i.e. shaper) of this shaper profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shaper_profile_delete(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared shaper add or update
> + *
> + * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
> + * with this ID is created using the shaper profile identified by
> + * *shaper_profile_id*.
> + *
> + * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
> + * no longer using the shaper profile previously assigned to it and is updated
> + * to use the shaper profile identified by *shaper_profile_id*.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID.
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_shaper_add_update(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared shaper delete
> + *
> + * Delete an existing shared shaper. This operation fails when there is
> + * currently at least one user (i.e. hierarchy node) of this shared shaper.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_shaper_delete(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node add
> + *
> + * Create new node and connect it as child of an existing node. The new node is
> + * further identified by *node_id*, which needs to be unused by any of the
> + * existing nodes. The parent node is identified by *parent_node_id*, which
> + * needs to be the valid ID of an existing non-leaf node. The parent node is
> + * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
> + * new child node.
> + *
> + * This function has to be called for both leaf and non-leaf nodes. In the case
> + * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
> + * the number of configured TX queues of the current port), the leaf node is
> + * configured rather than created (as the set of leaf nodes is predefined) and
> + * it is also connected as child of an existing node.
> + *
> + * The first node that is added becomes the root node and all the nodes that
> + * are subsequently added have to be added as descendants of the root node. The
> + * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
> + * can only be one node with this parent ID (i.e. the root node). Further
> + * restrictions for root node: needs to be non-leaf, its private shaper profile
> + * needs to be valid and single rate, cannot use any shared shapers.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be unused by any of the existing nodes.
> + * @param parent_node_id
> + *   Parent node ID. Needs to be valid.
> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param params
> + *   Node parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error);
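The *weight* description ("relative to the weight sum of all siblings that have the same priority") can be read as the following arithmetic. This is only a sketch of the nominal share when every same-priority sibling is backlogged; it ignores shapers and strict priority between levels, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a child as seen by its parent's scheduler. */
struct sketch_child {
	uint32_t priority;
	uint32_t weight;
};

/* Nominal fraction of the bandwidth available at the child's priority that
 * child idx receives: its weight divided by the weight sum of all siblings
 * sharing that priority (the child itself included).
 */
double
sketch_weight_share(const struct sketch_child *children,
	size_t n_children,
	size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n_children; i++)
		if (children[i].priority == children[idx].priority)
			sum += children[i].weight;

	if (sum == 0)
		return 0.0;

	return (double)children[idx].weight / (double)sum;
}
```

For two priority-0 children with weights 1 and 3, the nominal shares are 1/4 and 3/4; a lone child at another priority gets the whole share of that priority.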
> +
> +/**
> + * Traffic manager node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has
> + * at least one user (i.e. child node).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node suspend
> + *
> + * Suspend an existing node.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node resume
> + *
> + * Resume an existing node that was previously suspended.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager hierarchy set
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the start-up hierarchy.
> + *
> + * This function fails when the currently configured hierarchy is not supported
> + * by the Ethernet port, in which case the user can abort or try out another
> + * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
> + * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
> + * existing hierarchy configuration (when *clear_on_fail* is disabled).
> + *
> + * Note that, even when the configured hierarchy is supported (so this function
> + * is successful), the Ethernet port start might still fail due to e.g. not
> + * enough memory being available in the system, etc.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param clear_on_fail
> + *   On function call failure, hierarchy is cleared when this parameter is
> + *   non-zero and preserved when this parameter is equal to zero.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_hierarchy_set(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node parent update
> + *
> + * Restriction for root node: its parent cannot be changed.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is zero. Used by the
> + *   WFQ/WRR algorithm running on the parent of the current node for scheduling
> + *   this child node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private shaper update
> + *
> + * Restriction for root node: its private shaper profile needs to be valid and
> + * single rate.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the private shaper of the current node. Needs to be
> + *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
> + *   the latter disabling the private shaper of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared shapers update
> + *
> + * Restriction for root node: cannot use any shared rate shapers.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared shaper to current node or to zero
> + *   to delete this shared shaper from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node enabled statistics counters update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param stats_mask
> + *   Mask of statistics counter types to be enabled for the current node. This
> + *   needs to be a subset of the statistics counter types available for the
> + *   current node. Any statistics counter type not included in this set is to
> + *   be disabled for the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_stats_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node scheduling mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid non-leaf node ID.
> + * @param scheduling_mode_per_priority
> + *   For each priority, indicates whether the children nodes sharing the same
> + *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates
> + *   that WFQ is to be used for all priorities. When non-NULL, it points to a
> + *   pre-allocated array of *n_priorities* elements, with a non-zero value
> + *   element indicating WFQ and a zero value element for WRR.
> + * @param n_priorities
> + *   Number of priorities.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_scheduling_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node congestion management mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param cman
> + *   Congestion management mode.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param wred_profile_id
> + *   WRED profile ID for the private WRED context of the current node. Needs to
> + *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with
> + *   the latter disabling the private WRED context of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared WRED context to current node or
> + *   to zero to delete this shared WRED context from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shared_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node statistics counters read
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param stats
> + *   When non-NULL, it contains the current value for the statistics counters
> + *   enabled for the current node.
> + * @param stats_mask
> + *   When non-NULL, it contains the mask of statistics counter types that are
> + *   currently enabled for this node, indicating which of the counters
> + *   retrieved with the *stats* structure are valid.
> + * @param clear
> + *   When this parameter has a non-zero value, the statistics counters are
> + *   cleared (i.e. set to zero) immediately after they have been read,
> + *   otherwise the statistics counters are left untouched.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_stats_read(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_stats *stats,
> +	uint64_t *stats_mask,
> +	int clear,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
> + *
> + * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
> + * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
> + * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
> + * Format Indicator (CFI).
> + *
> + * All VLAN frames of a given color get their DEI bit set if marking is enabled
> + * for this color; otherwise, their DEI bit is left as is (either set or not).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
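For reference, the DEI marking rule amounts to a single bit operation on the VLAN TCI. A minimal sketch, assuming the TCI is handled in host byte order and that the caller has already selected the per-color enable flag; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* 802.1Q TCI layout, host byte order:
 * PCP (3 bits) | DEI (1 bit) | VID (12 bits).
 */
#define SKETCH_VLAN_DEI_MASK 0x1000u

/* Set the DEI bit when marking is enabled for the packet's color, leave the
 * TCI untouched otherwise (so an already set DEI bit is never cleared).
 */
uint16_t
sketch_mark_vlan_dei(uint16_t tci, int mark_enabled)
{
	if (mark_enabled)
		tci = (uint16_t)(tci | SKETCH_VLAN_DEI_MASK);

	return tci;
}
```

Note that disabling marking for a color does not clear DEI, which is what "left as is (either set or not)" implies.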
> +
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
> + *
> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
> + * Notification (ECN) field (2 bits). The DSCP field is typically used to
> + * encode the traffic class and/or drop priority (RFC 2597), while the ECN
> + * field is used by RFC 3168 to implement a congestion notification mechanism
> + * to be leveraged by transport layer protocols such as TCP and SCTP that have
> + * congestion control mechanisms.
> + *
> + * When congestion is experienced, as alternative to dropping the packet,
> + * routers can change the ECN field of input packets from 2'b01 or 2'b10
> + * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
> + * that congestion is experienced). The destination endpoint can use the
> + * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
> + * source endpoint, which acknowledges it back to the destination endpoint with
> + * the Congestion Window Reduced (CWR) TCP flag.
> + *
> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
> + * enabled for the current color, otherwise the ECN field is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
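The ECN rule above can be sketched on the IPv4 TOS / IPv6 Traffic Class byte as follows. This assumes the caller has already checked that the packet carries TCP or SCTP and has picked the enable flag for the packet's color; the function name is hypothetical.

```c
#include <stdint.h>

#define SKETCH_ECN_MASK 0x03u /* low 2 bits of the TOS / Traffic Class byte */
#define SKETCH_ECN_CE   0x03u /* 2'b11: congestion experienced */

/* Rewrite ECN to 2'b11 only for ECN-capable packets (2'b01 or 2'b10) and
 * only when marking is enabled. 2'b00 (not ECN-capable) and 2'b11 (already
 * marked) are returned unchanged.
 */
uint8_t
sketch_mark_ip_ecn(uint8_t tos, int mark_enabled)
{
	uint8_t ecn = (uint8_t)(tos & SKETCH_ECN_MASK);

	if (mark_enabled && (ecn == 0x01 || ecn == 0x02))
		tos = (uint8_t)(tos | SKETCH_ECN_CE);

	return tos;
}
```

The upper 6 bits (DSCP) are never modified by this operation. For IPv4 the header checksum would also need updating, which the sketch leaves out.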
> +
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
> + *
> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
> + * values proposed by this RFC:
> + *
> + *                       Class 1    Class 2    Class 3    Class 4
> + *                     +----------+----------+----------+----------+
> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
> + *                     +----------+----------+----------+----------+
> + *
> + * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 0 to 2,
> + * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
> + *
> + * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
> + * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
> + * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
> + * for each color; when not enabled for a given color, the DSCP field of all
> + * packets with that color is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_mark_ip_dscp(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
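The color-to-drop-precedence mapping can be checked against the RFC 2597 table with a few lines of C. In this sketch the DSCP is taken as a 6-bit value in the low bits of a byte, so the drop precedence occupies the two bits just above the least significant bit (e.g. 001010 for class 1, low drop precedence). The enum and function names are hypothetical.

```c
#include <stdint.h>

enum sketch_dscp_color {
	SKETCH_DSCP_GREEN,
	SKETCH_DSCP_YELLOW,
	SKETCH_DSCP_RED,
};

/* Drop precedence field inside the 6-bit DSCP value: cccdd0. */
#define SKETCH_DSCP_DP_SHIFT 1
#define SKETCH_DSCP_DP_MASK  0x06u

/* Rewrite the drop precedence bits from the packet color when marking is
 * enabled for that color: green -> 2'b01, yellow -> 2'b10, red -> 2'b11.
 * mark[] is indexed by color; the class bits are preserved.
 */
uint8_t
sketch_mark_ip_dscp(uint8_t dscp,
	enum sketch_dscp_color color,
	const int mark[3])
{
	static const uint8_t drop_prec[3] = {0x1, 0x2, 0x3};

	if (!mark[color])
		return dscp;

	return (uint8_t)((dscp & 0x3Fu & ~SKETCH_DSCP_DP_MASK) |
		((unsigned int)drop_prec[color] << SKETCH_DSCP_DP_SHIFT));
}
```

For example, a class 1 packet arriving with DSCP 001010 and colored red leaves with 001110, the class 1 high drop precedence value from the table.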
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_TM_H__ */
> diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
> new file mode 100644
> index 0000000..b3c9c15
> --- /dev/null
> +++ b/lib/librte_ether/rte_tm_driver.h
> @@ -0,0 +1,365 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_TM_DRIVER_H__
> +#define __INCLUDE_RTE_TM_DRIVER_H__
> +
> +/**
> + * @file
> + * RTE Generic Traffic Manager API (Driver Side)
> + *
> + * This file provides implementation helpers for internal use by PMDs, they
> + * are not intended to be exposed to applications and are not subject to ABI
> + * versioning.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_ethdev.h"
> +#include "rte_tm.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node type get */
> +
> +typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager capabilities get */
> +
> +typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t level_id,
> +	struct rte_tm_level_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager level capabilities get */
> +
> +typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_node_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node capabilities get */
> +
> +typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_wred_params *profile,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager WRED profile add */
> +
> +typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager WRED profile delete */
> +
> +typedef int (*rte_tm_shared_wred_context_add_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared WRED context add */
> +
> +typedef int (*rte_tm_shared_wred_context_delete_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared WRED context delete */
> +
> +typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_shaper_params *profile,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shaper profile add */
> +
> +typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shaper profile delete */
> +
> +typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared shaper add/update */
> +
> +typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared shaper delete */
> +
> +typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node add */
> +
> +typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node delete */
> +
> +typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node suspend */
> +
> +typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node resume */
> +
> +typedef int (*rte_tm_hierarchy_set_t)(struct rte_eth_dev *dev,
> +	int clear_on_fail,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager hierarchy set */
> +
> +typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node parent update */
> +
> +typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node shaper update */
> +
> +typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int32_t add,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node shaper update */
> +
> +typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node stats update */
> +
> +typedef int (*rte_tm_node_scheduling_mode_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node scheduling mode update */
> +
> +typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node congestion management mode update */
> +
> +typedef int (*rte_tm_node_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node WRED context update */
> +
> +typedef int (*rte_tm_node_shared_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node WRED context update */
> +
> +typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_node_stats *stats,
> +	uint64_t *stats_mask,
> +	int clear,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager read stats counters for specific node */
> +
> +typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - VLAN DEI */
> +
> +typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
> +
> +typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
> +
> +struct rte_tm_ops {
> +	/** Traffic manager node type get */
> +	rte_tm_node_type_get_t node_type_get;
> +
> +	/** Traffic manager capabilities_get */
> +	rte_tm_capabilities_get_t capabilities_get;
> +	/** Traffic manager level capabilities_get */
> +	rte_tm_level_capabilities_get_t level_capabilities_get;
> +	/** Traffic manager node capabilities get */
> +	rte_tm_node_capabilities_get_t node_capabilities_get;
> +
> +	/** Traffic manager WRED profile add */
> +	rte_tm_wred_profile_add_t wred_profile_add;
> +	/** Traffic manager WRED profile delete */
> +	rte_tm_wred_profile_delete_t wred_profile_delete;
> +	/** Traffic manager shared WRED context add/update */
> +	rte_tm_shared_wred_context_add_update_t
> +		shared_wred_context_add_update;
> +	/** Traffic manager shared WRED context delete */
> +	rte_tm_shared_wred_context_delete_t
> +		shared_wred_context_delete;
> +
> +	/** Traffic manager shaper profile add */
> +	rte_tm_shaper_profile_add_t shaper_profile_add;
> +	/** Traffic manager shaper profile delete */
> +	rte_tm_shaper_profile_delete_t shaper_profile_delete;
> +	/** Traffic manager shared shaper add/update */
> +	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
> +	/** Traffic manager shared shaper delete */
> +	rte_tm_shared_shaper_delete_t shared_shaper_delete;
> +
> +	/** Traffic manager node add */
> +	rte_tm_node_add_t node_add;
> +	/** Traffic manager node delete */
> +	rte_tm_node_delete_t node_delete;
> +	/** Traffic manager node suspend */
> +	rte_tm_node_suspend_t node_suspend;
> +	/** Traffic manager node resume */
> +	rte_tm_node_resume_t node_resume;
> +	/** Traffic manager hierarchy set */
> +	rte_tm_hierarchy_set_t hierarchy_set;
> +
> +	/** Traffic manager node parent update */
> +	rte_tm_node_parent_update_t node_parent_update;
> +	/** Traffic manager node shaper update */
> +	rte_tm_node_shaper_update_t node_shaper_update;
> +	/** Traffic manager node shared shaper update */
> +	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
> +	/** Traffic manager node stats update */
> +	rte_tm_node_stats_update_t node_stats_update;
> +	/** Traffic manager node scheduling mode update */
> +	rte_tm_node_scheduling_mode_update_t node_scheduling_mode_update;
> +	/** Traffic manager node congestion management mode update */
> +	rte_tm_node_cman_update_t node_cman_update;
> +	/** Traffic manager node WRED context update */
> +	rte_tm_node_wred_context_update_t node_wred_context_update;
> +	/** Traffic manager node shared WRED context update */
> +	rte_tm_node_shared_wred_context_update_t
> +		node_shared_wred_context_update;
> +	/** Traffic manager read statistics counters for current node */
> +	rte_tm_node_stats_read_t node_stats_read;
> +
> +	/** Traffic manager packet marking - VLAN DEI */
> +	rte_tm_mark_vlan_dei_t mark_vlan_dei;
> +	/** Traffic manager packet marking - IPv4/IPv6 ECN */
> +	rte_tm_mark_ip_ecn_t mark_ip_ecn;
> +	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
> +	rte_tm_mark_ip_dscp_t mark_ip_dscp;
> +};
> +
> +/**
> + * Initialize generic error structure.
> + *
> + * This function also sets rte_errno to a given value.
> + *
> + * @param error
> + *   Pointer to error structure (may be NULL).
> + * @param code
> + *   Related error code (rte_errno).
> + * @param type
> + *   Cause field and error type.
> + * @param cause
> + *   Object responsible for the error.
> + * @param message
> + *   Human-readable error message.
> + *
> + * @return
> + *   Error code.
> + */
> +static inline int
> +rte_tm_error_set(struct rte_tm_error *error,
> +		   int code,
> +		   enum rte_tm_error_type type,
> +		   const void *cause,
> +		   const char *message)
> +{
> +	if (error) {
> +		*error = (struct rte_tm_error){
> +			.type = type,
> +			.cause = cause,
> +			.message = message,
> +		};
> +	}
> +	rte_errno = code;
> +	return code;
> +}
> +
> +/**
> + * Get generic traffic manager operations structure from a port
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param error
> + *   Error details
> + *
> + * @return
> + *   The traffic manager operations structure associated with port_id on
> + *   success, NULL otherwise.
> + */
> +const struct rte_tm_ops *
> +rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
>

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
                     ` (3 preceding siblings ...)
  2017-03-30 10:32   ` Hemant Agrawal
@ 2017-04-07 13:20   ` Jerin Jacob
  2017-04-07 17:47     ` Dumitrescu, Cristian
  4 siblings, 1 reply; 52+ messages in thread
From: Jerin Jacob @ 2017-04-07 13:20 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, thomas.monjalon, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

-----Original Message-----
> Date: Sat, 4 Mar 2017 01:10:20 +0000
> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> To: dev@dpdk.org
> CC: thomas.monjalon@6wind.com, jerin.jacob@caviumnetworks.com,
>  balasubramanian.manoharan@cavium.com, hemant.agrawal@nxp.com,
>  shreyansh.jain@nxp.com
> Subject: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> X-Mailer: git-send-email 2.5.0
> 

Thanks Cristian for v3.

From a Cavium HW perspective, v3 is in relatively good shape for us to consume,
except for the two points mentioned below.

1) We strongly believe the application needs to give the level id explicitly
when creating the topology (i.e. rte_tm_node_add()). Reasons are:

- On the capability side we are exposing nr_levels etc.
- I think _all_ the HW implementations expect nodes to be connected from
level n to level n-1. IMO, it's better to express that in the API.
- SW implementations that don't care about the specific level id for the
  connection can ignore the level id passed by the application.
  Let the definition be "level" aware.
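
To illustrate point (1), here is a minimal sketch of what a level-aware add
could check. The level_id argument, the tm_nodes table, TM_NODE_ID_NULL and
tm_node_add_checked() are hypothetical names used only for illustration; they
are not part of the v3 patch. The root is assumed to sit on level 0.

```c
#include <stdint.h>
#include <errno.h>

#define TM_MAX_NODES 64            /* hypothetical table size */
#define TM_NODE_ID_NULL UINT32_MAX /* hypothetical "no parent" marker */

struct tm_node {
	int in_use;
	uint32_t level_id;
};

static struct tm_node tm_nodes[TM_MAX_NODES];

/* Add a node only if it sits exactly one level below its parent. */
static int
tm_node_add_checked(uint32_t node_id, uint32_t parent_node_id,
	uint32_t level_id, uint32_t n_levels_max)
{
	if (node_id >= TM_MAX_NODES || tm_nodes[node_id].in_use)
		return -EINVAL;
	if (level_id >= n_levels_max)
		return -EINVAL;

	if (parent_node_id == TM_NODE_ID_NULL) {
		/* Root node: must be on level 0. */
		if (level_id != 0)
			return -EINVAL;
	} else {
		if (parent_node_id >= TM_MAX_NODES ||
		    !tm_nodes[parent_node_id].in_use)
			return -EINVAL;
		/* A node on level n connects to a parent on level n-1. */
		if (level_id == 0 ||
		    tm_nodes[parent_node_id].level_id != level_id - 1)
			return -EINVAL;
	}

	tm_nodes[node_id].in_use = 1;
	tm_nodes[node_id].level_id = level_id;
	return 0;
}
```

A SW implementation that does not care about levels could skip the parent
level comparison and accept any level_id.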

2) There are a lot of capabilities in the TM definition. I don't have a strong
opinion here, as the TM stuff comes in the control path.

So, except for point (1), we are generally fine with v3.

Detailed comments below,

> +
> +#ifndef __INCLUDE_RTE_TM_H__
> +#define __INCLUDE_RTE_TM_H__
> +
> +/**
> + * @file
> + * RTE Generic Traffic Manager API
> + *
> + * This interface provides the ability to configure the traffic manager in a
> + * generic way. It includes features such as: hierarchical scheduling,
> + * traffic shaping, congestion management, packet marking, etc.
> + */

Fix missing API documentation doxygen hooks.

Files: doc/api/doxy-api-index.md and doc/api/doxy-api.conf.
Ref:
http://dpdk.org/browse/dpdk/commit/?id=71f2384328651dced05eceee87119a71f0cf16a7 


> +
> +#include <stdint.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Ethernet framing overhead
> + *
> + * Overhead fields per Ethernet frame:
> + * 1. Preamble:                                            7 bytes;
> + * 2. Start of Frame Delimiter (SFD):                      1 byte;
> + * 3. Inter-Frame Gap (IFG):                              12 bytes.
> + */
> +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20

This definition is not used anywhere

> +/**
> + * Color
> + */
> +enum rte_tm_color {
> +	RTE_TM_GREEN = 0, /**< Green */

Explicit zero assignment is not required.

> +	RTE_TM_YELLOW, /**< Yellow */
> +	RTE_TM_RED, /**< Red */
> +	RTE_TM_COLORS /**< Number of colors */
> +};
> +
> +/**
> + * Node statistics counter type
> + */
> +enum rte_tm_stats_type {
> +	/**< Number of packets scheduled from current node. */
> +	RTE_TM_STATS_N_PKTS = 1 << 0,
> +
> +	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
> +
> +	/**< Number of bytes currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,

I think that for a bitmask it is better to add ULL.
Example:
        RTE_TM_STATS_N_BYTES_QUEUED = 1ULL << 9,
> +};

I think the above definitions are used as "uint64_t stats_mask" in
the remaining sections. How about changing to "enum rte_tm_stats_type stats_mask"
instead of "uint64_t stats_mask"?
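
A small sketch of 64-bit flags used through a uint64_t mask. The names are
shortened, hypothetical stand-ins for the RTE_TM_STATS_* values, and the
high-bit counter is made up. One point worth checking before switching the
parameter type to the enum: before C23, enumeration constants must be
representable as int, so a flag above bit 30 cannot portably be an enumerator,
and the OR of two enumerators is generally not itself one of the named values.

```c
#include <stdint.h>

/* Hypothetical 64-bit flag definitions; the ULL suffix keeps the shift in
 * 64-bit arithmetic, so bits 31..63 stay usable.
 */
#define TM_STATS_N_PKTS          (1ULL << 0)
#define TM_STATS_N_PKTS_QUEUED   (1ULL << 8)
#define TM_STATS_N_BYTES_QUEUED  (1ULL << 9)
#define TM_STATS_FUTURE_COUNTER  (1ULL << 40) /* made-up high bit */

/* Return non-zero when every requested counter type is in the supported set. */
static int
tm_stats_mask_is_subset(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}
```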

> +
> +/**
> + * Traffic manager dynamic updates
> + */
> +enum rte_tm_dynamic_update_type {
> +	/**< Dynamic parent node update. The new parent node is located on same
> +	 * hierarchy level as the former parent node. Consequently, the node
> +	 * whose parent is changed preserves its hierarchy level.
> +	 */
> +	/**< Dynamic update of the set of enabled stats counter types. */
> +	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
> +
> +	/**< Dynamic update of congestion management mode for leaf nodes. */
> +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> +};

Same as above comment on enum.

> +struct rte_tm_level_capabilities {

IMO, a level can be either leaf or nonleaf. If so, the following struct makes
more sense to me:

        int is_leaf;
        uint32_t n_nodes_max;
        union {
                struct rte_tm_node_capabilities nonleaf;
                struct rte_tm_node_capabilities leaf;
        };
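
Spelled out as a compilable sketch, assuming C11 anonymous unions are
acceptable. The tm_node_capabilities stub and its single field are
placeholders, not the real structure from the patch, and the helper name is
hypothetical.

```c
#include <stdint.h>

/* Placeholder for struct rte_tm_node_capabilities. */
struct tm_node_capabilities {
	uint32_t shaper_private_supported; /* illustrative field only */
};

/* A level holds either leaf or non-leaf nodes, never both. */
struct tm_level_capabilities {
	int is_leaf;
	uint32_t n_nodes_max;
	union {
		struct tm_node_capabilities nonleaf; /* valid when !is_leaf */
		struct tm_node_capabilities leaf;    /* valid when is_leaf */
	};
};

/* Pick the union member that is valid for this level. */
static const struct tm_node_capabilities *
tm_level_node_caps(const struct tm_level_capabilities *cap)
{
	return cap->is_leaf ? &cap->leaf : &cap->nonleaf;
}
```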

> +	/**< Maximum number of nodes for the current hierarchy level. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of non-leaf nodes for the current hierarchy level.
> +	 * The value of 0 indicates that current level only supports leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_nonleaf_max;
> +
> +	/**< Maximum number of leaf nodes for the current hierarchy level. The
> +	 * value of 0 indicates that current level only supports non-leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_leaf_max;
> +
> +	/**< Summary of node-level capabilities across all the non-leaf nodes
> +	 * of the current hierarchy level. Valid only when
> +	 * *n_nodes_nonleaf_max* is greater than 0.
> +	 */
> +	struct rte_tm_node_capabilities nonleaf;
> +
> +	/**< Summary of node-level capabilities across all the leaf nodes of
> +	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
> +	 * greater than 0.
> +	 */
> +	struct rte_tm_node_capabilities leaf;
> +};
> +
> +/**
> + * Traffic manager capabilities
> + */
> +struct rte_tm_capabilities {
> +	/**< Maximum number of nodes. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Set of supported dynamic update operations
> +	 * (see enum rte_tm_dynamic_update_type).
> +	 */
> +	uint64_t dynamic_update_mask;

IMO, it is better to change this to
enum rte_tm_dynamic_update_type dynamic_update_mask
instead of
uint64_t dynamic_update_mask;

> +
> +	/**< Summary of node-level capabilities across all non-leaf nodes. */
> +	struct rte_tm_node_capabilities nonleaf;
> +
> +	/**< Summary of node-level capabilities across all leaf nodes. */
> +	struct rte_tm_node_capabilities leaf;
> +};
> +
> +/**
> + * Congestion management (CMAN) mode
> + *
> + * This is used for controlling the admission of packets into a packet queue or
> + * group of packet queues on congestion. On request of writing a new packet
> + * into the current queue while the queue is full, the *tail drop* algorithm
> + * drops the new packet while leaving the queue unmodified, as opposed to *head
> + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> + * packet waiting in the queue) and admits the new packet at the tail of the
> + * queue.
> + *
> + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> + * more and more input packets as the queue occupancy builds up. When the queue
> + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> + * RED* algorithm uses a separate set of RED thresholds for each packet color.
> + */
> +enum rte_tm_cman_mode {
> +	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */

Explicit zero assignment may not be required.

> +	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
> +	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +

> +struct rte_tm_node_params {
> +	/**< Shaper profile for the private shaper. The absence of the private
> +	 * shaper for the current node is indicated by setting this parameter
> +	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
> +	 */
> +	uint32_t shaper_profile_id;
> +
> +	/**< User allocated array of valid shared shaper IDs. */
> +	uint32_t *shared_shaper_id;
> +
> +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> +	uint32_t n_shared_shapers;
> +
> +	/**< Mask of statistics counter types to be enabled for this node. This
> +	 * needs to be a subset of the statistics counter types available for
> +	 * the current node. Any statistics counter type not included in this
> +	 * set is to be disabled for the current node.
> +	 */
> +	uint64_t stats_mask;

How about changing to "enum rte_tm_stats_type" instead of uint64_t ?

> +
> +	union {
> +		/**< Parameters only valid for non-leaf nodes. */
> +		struct {
> +			/**< For each priority, indicates whether the children
> +			 * nodes sharing the same priority are to be scheduled
> +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> +			 * is to be used for all priorities. When non-NULL, it
> +			 * points to a pre-allocated array of *n_priority*
> +			 * elements, with a non-zero value element indicating
> +			 * WFQ and a zero value element for WRR.
> +			 */
> +			int *scheduling_mode_per_priority;
> +
> +			/**< Number of priorities. */
> +			uint32_t n_priorities;
> +		} nonleaf;


Since we are adding all node "connecting" parameters in rte_tm_node_add(),
how about adding WFQ vs WRR as a boolean value in rte_tm_node_add() as well, to
maintain consistency?

How about a new error type in "enum rte_tm_error_type" to report a connection
error caused by the requested mode (WFQ or WRR) not being supported?

> +
> +/**
> +/**
> + * Traffic manager get number of leaf nodes
> + *
> + * Each leaf node sits on on top of a TX queue of the current Ethernet port.
> + * Therefore, the set of leaf nodes is predefined, their number is always equal
> + * to N (where N is the number of TX queues configured for the current port)
> + * and their IDs are 0 .. (N-1).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param n_leaf_nodes
> + *   Number of leaf nodes for the current port.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_get_leaf_nodes(uint8_t port_id,
> +	uint32_t *n_leaf_nodes,
> +	struct rte_tm_error *error);

To keep consistency with the rest of the API, IMO, the API
name can be changed to rte_tm_leaf_nodes_get().

> +
> +/**
> + * Traffic manager node type (i.e. leaf or non-leaf) get
> + *
> + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
> + * the number of TX queues of the current Ethernet port. The non-leaf nodes
> + * have their IDs generated by the application outside of the above range,
> + * which is reserved for leaf nodes.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID value. Needs to be valid.
> + * @param is_leaf

Change to "@param[out] is_leaf" to indicate that the parameter is an output.
I guess that scheme is missing across the whole header file. It would be good to have.
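
For illustration, this is roughly how the direction annotations could look on
a simplified stand-in for the node type query. The function name and the
n_tx_queues argument are hypothetical; the body only encodes the rule from the
patch that leaf node IDs are 0 .. (N-1).

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/**
 * Node type (leaf or non-leaf) get, simplified sketch
 *
 * @param[in] node_id
 *   Node ID value. Needs to be valid.
 * @param[in] n_tx_queues
 *   Number of TX queues configured for the port.
 * @param[out] is_leaf
 *   Set to non-zero value when node is leaf and to zero otherwise.
 * @return
 *   0 on success, non-zero error code otherwise.
 */
static int
tm_node_type_get_sketch(uint32_t node_id, uint32_t n_tx_queues, int *is_leaf)
{
	if (is_leaf == NULL)
		return -EINVAL;
	/* Leaf node IDs are 0 .. (N-1), N being the number of TX queues. */
	*is_leaf = node_id < n_tx_queues;
	return 0;
}
```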


> + *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_type_get(uint8_t port_id,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap

missing [out]. See above

> + *   Traffic manager capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_capabilities_get(uint8_t port_id,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error);
> +

> +int
> +rte_tm_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error);

See the first comment in the beginning of the file.

> +
> + * Traffic manager node parent update
> + *
> + * Restriction for root node: its parent cannot be changed.

IMO, it would be nice to mention here the corresponding
"enum rte_tm_dynamic_update_type" flag that this API depends on. Maybe the
common code itself can check that flag and return an error if the
implementation does not advertise the capability.

This applies to all update APIs.
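
A rough sketch of such a common-code check for the parent update case,
assuming the library can read the driver's dynamic_update_mask before
dispatching. The helper name, its level arguments and the choice of -ENOTSUP
are assumptions; the two flag values are the ones from the v3 enum.

```c
#include <stdint.h>
#include <errno.h>

/* Flag values as in enum rte_tm_dynamic_update_type (v3). */
#define TM_UPDATE_NODE_PARENT_KEEP_LEVEL   (1ULL << 0)
#define TM_UPDATE_NODE_PARENT_CHANGE_LEVEL (1ULL << 1)

/*
 * Hypothetical pre-check run by the common code before calling the driver's
 * node_parent_update op. Returns 0 when the update may proceed.
 */
static int
tm_parent_update_allowed(uint64_t dynamic_update_mask,
	uint32_t old_parent_level, uint32_t new_parent_level)
{
	uint64_t needed = (old_parent_level == new_parent_level) ?
		TM_UPDATE_NODE_PARENT_KEEP_LEVEL :
		TM_UPDATE_NODE_PARENT_CHANGE_LEVEL;

	return (dynamic_update_mask & needed) ? 0 : -ENOTSUP;
}
```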

> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is zero. Used by the
> + *   WFQ/WRR algorithm running on the parent of the current node for scheduling
> + *   this child node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-03-30 10:32   ` Hemant Agrawal
@ 2017-04-07 16:51     ` Dumitrescu, Cristian
  0 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-04-07 16:51 UTC (permalink / raw)
  To: Hemant Agrawal, dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan, shreyansh.jain

Hi Hemant,

Thanks for your input!

...<snip>

> > +/**
> > + * Traffic manager dynamic updates
> > + */
> > +enum rte_tm_dynamic_update_type {
> > +	/**< Dynamic parent node update. The new parent node is located
> on same
> > +	 * hierarchy level as the former parent node. Consequently, the
> node
> > +	 * whose parent is changed preserves its hierarchy level.
> > +	 */
> > +	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
> > +
> > +	/**< Dynamic parent node update. The new parent node is located
> on
> > +	 * different hierarchy level than the former parent node.
> Consequently,
> > +	 * the node whose parent is changed also changes its hierarchy level.
> > +	 */
> > +	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
> > +
> > +	/**< Dynamic node add/delete. */
> > +	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
> > +
> > +	/**< Suspend/resume nodes. */
> > +	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
> 
> what is the expectation from suspend/resumes?
> 1. the Fan-out will stop.
> 2. the Fan-In will stop, any enqueue request will result into error?  or
> it will be implementation dependent.
> During suspend/hold state the buffers should not hold in the node.

The requirement is that fan-out should stop while the node is in the suspended state. This operation is similar to inhibiting the dequeue operation from the associated queues, or to having the output rate of the node set to zero while suspended. I will take the action to document the suspend operation.

I agree that the following items should be implementation specific. They are related to the enqueue side and, in my mind, they are not related to the suspend operation, which only affects the dequeue side:
- Whether the fan-in should stop while the node is in suspended state
- What should happen with the packets stored in the queues (owned by this node and its descendants)
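
The intended semantics can be sketched with a toy single-queue node.
Everything here (structure, names, fixed queue depth) is illustrative only.
Enqueue is deliberately left unaffected by the suspended flag, which is just
one of the implementation-specific choices listed above.

```c
#include <stdint.h>

#define TOY_QUEUE_DEPTH 4

struct toy_node {
	int suspended;
	uint32_t head, count;
	uint32_t pkts[TOY_QUEUE_DEPTH];
};

/* Enqueue side: not changed by suspend in this sketch; tail drop when full. */
static int
toy_enqueue(struct toy_node *n, uint32_t pkt)
{
	if (n->count == TOY_QUEUE_DEPTH)
		return -1;
	n->pkts[(n->head + n->count) % TOY_QUEUE_DEPTH] = pkt;
	n->count++;
	return 0;
}

/* Dequeue side: a suspended node behaves as if its output rate were zero. */
static int
toy_dequeue(struct toy_node *n, uint32_t *pkt)
{
	if (n->suspended || n->count == 0)
		return -1;
	*pkt = n->pkts[n->head];
	n->head = (n->head + 1) % TOY_QUEUE_DEPTH;
	n->count--;
	return 0;
}
```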

...<snip>

> > +/**
> > + * Traffic manager level capabilities
> > + */
> > +struct rte_tm_level_capabilities {
> > +	/**< Maximum number of nodes for the current hierarchy level. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Maximum number of non-leaf nodes for the current hierarchy
> level.
> > +	 * The value of 0 indicates that current level only supports leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_nonleaf_max;
> > +
> > +	/**< Maximum number of leaf nodes for the current hierarchy level.
> The
> > +	 * value of 0 indicates that current level only supports non-leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_leaf_max;
> > +
> > +	/**< Summary of node-level capabilities across all the non-leaf
> nodes
> > +	 * of the current hierarchy level. Valid only when
> > +	 * *n_nodes_nonleaf_max* is greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all the leaf nodes of
> > +	 * the current hierarchy level. Valid only when *n_nodes_leaf_max*
> is
> > +	 * greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities leaf;
> 
> you also need to provide a flag indicating whether all the nodes in this
> level have equal capabilities or not.

Yes, good idea, will do.

> 
> In case all nodes in this level have equal priority, the implementation
> need not inquire any further for node level capability. Generally in
> case of HWs, they have equal capabilities at a particular level.
> 
> If all nodes are not having equal priority, this information may not be
> correct and it may only provide a very rough overview of node
> capabilities. the applications should not assume anything about a
> particular node capability within this level.
> 

Agree.
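
A sketch of how an application might use such a flag. The field name
nodes_identical, the reduced structure and the callback-based helper are all
hypothetical; the actual name and placement of the flag are not decided in
this thread.

```c
#include <stdint.h>

struct tm_level_caps {
	uint32_t n_nodes_max;
	int nodes_identical;       /* hypothetical flag: non-zero if all nodes match */
	uint32_t shaper_supported; /* stand-in for the level summary */
};

/* Hypothetical per-node query, standing in for a node capability get call. */
typedef uint32_t (*tm_node_query_t)(uint32_t node_id);

/* Example query used for illustration: only even node IDs have a shaper. */
static uint32_t
example_query(uint32_t node_id)
{
	return (node_id % 2) == 0;
}

/* Use the level summary when it is exact, otherwise ask about the node. */
static uint32_t
tm_node_shaper_supported(const struct tm_level_caps *lvl, uint32_t node_id,
	tm_node_query_t query)
{
	if (lvl->nodes_identical)
		return lvl->shaper_supported;
	return query(node_id);
}
```

When the flag is clear, the level summary is only a rough overview, so the
sketch falls back to the per-node query instead of trusting it.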

> 
> > +};
> > +
> > +/**
> > + * Traffic manager capabilities
> > + */
> > +struct rte_tm_capabilities {
> > +	/**< Maximum number of nodes. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Maximum number of levels (i.e. number of nodes connecting
> the root
> > +	 * node with any leaf node, including the root and the leaf).
> > +	 */
> > +	uint32_t n_levels_max;
> > +
> > +	/**< Maximum number of shapers, either private or shared. In case
> the
> > +	 * implementation does not share any resource between private and
> > +	 * shared shapers, it is typically equal to the sum between
> > +	 * *shaper_private_n_max* and *shaper_shared_n_max*.
> > +	 */
> > +	uint32_t shaper_n_max;
> > +
> > +	/**< Maximum number of private shapers. Indicates the maximum
> number of
> > +	 * nodes that can concurrently have the private shaper enabled.
> > +	 */
> > +	uint32_t shaper_private_n_max;
> > +
> > +	/**< Maximum number of shared shapers. The value of zero
> indicates that
> > +	 * shared shapers are not supported.
> > +	 */
> > +	uint32_t shaper_shared_n_max;
> > +
> > +	/**< Maximum number of nodes that can share the same shared
> shaper.
> > +	 * Only valid when shared shapers are supported.
> > +	 */
> > +	uint32_t shaper_shared_n_nodes_max;
> > +
> > +	/**< Maximum number of shared shapers that can be configured
> with dual
> > +	 * rate shaping. The value of zero indicates that dual rate shaping
> > +	 * support is not available for shared shapers.
> > +	 */
> > +	uint32_t shaper_shared_dual_rate_n_max;
> > +
> > +	/**< Minimum committed/peak rate (bytes per second) for shared
> shapers.
> > +	 * Only valid when shared shapers are supported.
> > +	 */
> > +	uint64_t shaper_shared_rate_min;
> > +
> > +	/**< Maximum committed/peak rate (bytes per second) for shared
> shaper.
> > +	 * Only valid when shared shapers are supported.
> > +	 */
> > +	uint64_t shaper_shared_rate_max;
> > +
> > +	/**< Minimum value allowed for packet length adjustment for
> > +	 * private/shared shapers.
> > +	 */
> > +	int shaper_pkt_length_adjust_min;
> > +
> > +	/**< Maximum value allowed for packet length adjustment for
> > +	 * private/shared shapers.
> > +	 */
> > +	int shaper_pkt_length_adjust_max;
> > +
> > +	/**< Maximum number of WRED contexts. */
> > +	uint32_t cman_wred_context_n_max;
> > +
> > +	/**< Maximum number of private WRED contexts. Indicates the
> maximum
> > +	 * number of leaf nodes that can concurrently have the private WRED
> > +	 * context enabled.
> > +	 */
> > +	uint32_t cman_wred_context_private_n_max;
> > +
> > +	/**< Maximum number of shared WRED contexts. The value of zero
> > +	 * indicates that shared WRED contexts are not supported.
> > +	 */
> > +	uint32_t cman_wred_context_shared_n_max;
> > +
> > +	/**< Maximum number of leaf nodes that can share the same WRED context.
> > +	 * Only valid when shared WRED contexts are supported.
> > +	 */
> > +	uint32_t cman_wred_context_shared_n_nodes_max;
> > +
> > +	/**< Support for VLAN DEI packet marking (per color). */
> > +	int mark_vlan_dei_supported[RTE_TM_COLORS];
> > +
> > +	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
> > +	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
> > +
> > +	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
> > +	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
> > +
> > +	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
> > +	int mark_ip_dscp_supported[RTE_TM_COLORS];
> > +
> > +	/**< Set of supported dynamic update operations
> > +	 * (see enum rte_tm_dynamic_update_type).
> > +	 */
> > +	uint64_t dynamic_update_mask;
> > +
> > +	/**< Summary of node-level capabilities across all non-leaf nodes. */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all leaf nodes. */
> > +	struct rte_tm_node_capabilities leaf;
> 
> This is not right, When you are having per level capabilities, why to
> return a node level capability with TM.
> 
> In software, it is easy to maintain all nodes of same capabilities. But
> in case of hardware, the capabilities of different levels is going to be
> different.
> 
> This will result into non-portable implementation, where the application
> will find the easy way to build the TM tree on this basis of these values.
> 
> You already have most of the capability indications in the tm level
> parameters. the only information missing is w.r.t WRR/WFQ and SP
> capability of tm. you can add some flags to get that.
> 
> 

Yes, we had a private conversation, and I agree with your concern that a "summary of capabilities across a set of nodes" is potentially confusing, as it is easy to mix up "at least one node supports X" with "all nodes support X". The only reason I went this way was to save on some big data structures by reusing them for a slightly different purpose than initially stated. I am taking the action to rework this part to eliminate any confusion.
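
To make the two readings concrete, here is a minimal sketch (not part of the patch). It assumes each node's capabilities are encoded as a bit mask; the flag names and both helpers are hypothetical. A summary built with OR answers "at least one node supports X", while a summary built with AND answers "all nodes support X".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability flags, for illustration only. */
#define CAP_SHAPER_PRIVATE (1u << 0)
#define CAP_SHAPER_SHARED  (1u << 1)

/* "At least one node supports X": bitwise OR over all nodes. */
static uint32_t
caps_any(const uint32_t *node_caps, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	assert(node_caps != NULL || n == 0);
	for (i = 0; i < n; i++)
		acc |= node_caps[i];
	return acc;
}

/* "All nodes support X": bitwise AND over all nodes (0 for an empty set). */
static uint32_t
caps_all(const uint32_t *node_caps, size_t n)
{
	uint32_t acc = n ? UINT32_MAX : 0;
	size_t i;

	assert(node_caps != NULL || n == 0);
	for (i = 0; i < n; i++)
		acc &= node_caps[i];
	return acc;
}
```

With one node advertising both flags and another advertising only the private shaper, the OR summary reports both flags and the AND summary reports only the private shaper, which is exactly the ambiguity discussed above.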

... <snip>

Regards,
Cristian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-04-07 13:20   ` Jerin Jacob
@ 2017-04-07 17:47     ` Dumitrescu, Cristian
  2017-04-10 14:00       ` Jerin Jacob
  0 siblings, 1 reply; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-04-07 17:47 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, thomas.monjalon, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

Hi Jerin,

Thanks for your review!

> 
> Thanks Cristian for v3.
> 
> From the Cavium HW perspective, v3 is in relatively good shape to consume,
> except for the two points mentioned below.
> 
> 1) We strongly believe the application needs to give the level id explicitly while
> creating the topology (i.e. rte_tm_node_add()). Reasons are,
> 
> - In the capability side we are exposing nr_levels etc
> - I think, _All_ the HW implementation expects to be connected from
> level-n to leveln-1. IMO, Its better to express that in API.
> - For the SW implementations, which don't care about the specific level id for the
>   connection can ignore the level id passed by the application.
>   Let the definition be "level" aware.
> 

The current API proposal creates a new node and connects it to its parent in a single step, so when a new node is added its level is completely known based on its parent level.

Therefore, specifying the level of the node when the node is added is redundant. My concern is that this requirement can introduce inconsistency into the API, as the user can specify a level ID for the new node that is different from (parent node level ID + 1).

But based on private conversation it looks to me that you guys have a strong opinion on this, so I am taking the action to identify a (nice) way to implement your requirement and do it.

> 2) There are a lot of capabilities in the TM definition. I don't have a strong opinion
> here as TM stuff comes in the control path.
> 
> So except for point (1), generally we are fine with V3.
> 

This is good news!

> Detailed comments below,
> 
> > +
> > +#ifndef __INCLUDE_RTE_TM_H__
> > +#define __INCLUDE_RTE_TM_H__
> > +
> > +/**
> > + * @file
> > + * RTE Generic Traffic Manager API
> > + *
> > + * This interface provides the ability to configure the traffic manager in a
> > + * generic way. It includes features such as: hierarchical scheduling,
> > + * traffic shaping, congestion management, packet marking, etc.
> > + */
> 
> Fix missing API documentation doxygen hooks.
> 
> Files: doc/api/doxy-api-index.md and doc/api/doxy-api.conf.
> Ref:
> http://dpdk.org/browse/dpdk/commit/?id=71f2384328651dced05eceee87119a71f0cf16a7
> 

Thanks, will do.

> 
> > +
> > +#include <stdint.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/** Ethernet framing overhead
> > + *
> > + * Overhead fields per Ethernet frame:
> > + * 1. Preamble:                                            7 bytes;
> > + * 2. Start of Frame Delimiter (SFD):                      1 byte;
> > + * 3. Inter-Frame Gap (IFG):                              12 bytes.
> > + */
> > +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
> 
> This definition is not used anywhere

This is the typical value to be passed as the packet length adjustment parameter of the shaper profile. I will take the action to document this.
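
As a minimal sketch of the intended use (an assumption about how the adjustment is applied, not code from the patch): the shaper charges each packet its length plus a signed adjustment, so passing the framing overhead makes the shaper rate account for bytes on the wire. The helper name shaper_charged_bytes() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Preamble (7) + SFD (1) + IFG (12), as listed in the quoted header. */
#define ETH_FRAMING_OVERHEAD 20

/*
 * Bytes charged to the shaper for one packet: the packet length plus a
 * signed per-packet adjustment (positive to add overhead, negative to
 * discount bytes such as a trailing checksum).
 */
static uint32_t
shaper_charged_bytes(uint32_t pkt_len, int32_t pkt_length_adjust)
{
	int64_t charged = (int64_t)pkt_len + pkt_length_adjust;

	assert(charged >= 0 && charged <= UINT32_MAX);
	return (uint32_t)charged;
}
```

For a 64-byte frame and an adjustment of 20, the shaper would be charged 84 bytes.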

> 
> > +/**
> > + * Color
> > + */
> > +enum rte_tm_color {
> > +	RTE_TM_GREEN = 0, /**< Green */
> 
> Explicit zero assignment is not required.

I think it is needed if we want RTE_TM_COLORS to be equal to the number of colors?
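
For reference on this question: in C the first enumerator defaults to 0 when no value is given, so the "last enumerator is the count" idiom works with or without the explicit assignment. A minimal sketch, using illustrative names that are not part of the patch:

```c
#include <assert.h>

/* Same shape as the quoted enum, without the explicit "= 0". */
enum color {
	COLOR_GREEN,
	COLOR_YELLOW,
	COLOR_RED,
	COLOR_COUNT /* number of colors */
};

/* Checked at compile time: numbering starts at 0, so the count is 3. */
static_assert(COLOR_GREEN == 0, "first enumerator defaults to zero");
static_assert(COLOR_COUNT == 3, "last enumerator equals the number of colors");

/* Per-color array sized by the count, as done for the marking flags. */
static int per_color_flag[COLOR_COUNT];
```

Keeping the explicit "= 0" is still harmless and arguably documents the intent.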

> 
> > +	RTE_TM_YELLOW, /**< Yellow */
> > +	RTE_TM_RED, /**< Red */
> > +	RTE_TM_COLORS /**< Number of colors */
> > +};
> > +
> > +/**
> > + * Node statistics counter type
> > + */
> > +enum rte_tm_stats_type {
> > +	/**< Number of packets scheduled from current node. */
> > +	RTE_TM_STATS_N_PKTS = 1 << 0,
> > +
> > +	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
> > +
> > +	/**< Number of bytes currently waiting in the packet queue of current
> > +	 * leaf node.
> > +	 */
> > +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> 
> I think, For bitmask, it is better to add ULL.
> example:
>         RTE_TM_STATS_N_BYTES_QUEUED = 1ULL << 9,

The enum fields are uint32_t, not uint64_t; therefore, we cannot use uint64_t constants. This is why all enums in DPDK are using uint32_t values. I am not sure; there might be a way to use uint64_t constants by relying on compiler extensions, but I wanted to keep ourselves out of trouble :).


> > +};
> 
> I think, The above definitions are used as "uint64_t stats_mask" in
> the remaining sections. How about changing to "enum rte_tm_stats_type
> stats_mask"
> instead of uint64_t stats_mask?
> 

The mask is a collection of flags (so multiple enum flags are typically enabled in this mask, e.g. stats_mask = RTE_TM_STATS_N_PKTS | RTE_TM_STATS_N_BYTES | ...), therefore it cannot be of the same type as the enum.

I am taking the action to document that stats_mask is built from these specific enum flags.
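
A minimal sketch of the distinction, with shortened illustrative names; only the bit positions (1 << 0, 1 << 8, 1 << 9) are taken from the quoted enum, and both helpers are hypothetical. The mask is a plain uint64_t that ORs several flags together, while each function argument of the enum type names exactly one flag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flags using the bit positions shown in the quoted enum. */
enum stats_type {
	STATS_N_PKTS = 1 << 0,
	STATS_N_PKTS_QUEUED = 1 << 8,
	STATS_N_BYTES_QUEUED = 1 << 9,
};

/* Build a mask enabling two counters; the result names no single enumerator. */
static uint64_t
stats_mask_build(void)
{
	return (uint64_t)STATS_N_PKTS | (uint64_t)STATS_N_BYTES_QUEUED;
}

/* Test whether one counter type is enabled in the mask. */
static int
stats_enabled(uint64_t stats_mask, enum stats_type type)
{
	uint64_t bit = (uint64_t)type;

	/* A single flag has exactly one bit set. */
	assert(bit != 0 && (bit & (bit - 1)) == 0);
	return (stats_mask & bit) != 0;
}
```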

> > +
> > +/**
> > + * Traffic manager dynamic updates
> > + */
> > +enum rte_tm_dynamic_update_type {
> > +	/**< Dynamic parent node update. The new parent node is located on same
> > +	 * hierarchy level as the former parent node. Consequently, the node
> > +	 * whose parent is changed preserves its hierarchy level.
> > +	 */
> > +	/**< Dynamic update of the set of enabled stats counter types. */
> > +	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
> > +
> > +	/**< Dynamic update of congestion management mode for leaf nodes. */
> > +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> > +};
> 
> Same as above comment on enum.

Same as above answer on enum :)

> 
> > +struct rte_tm_level_capabilities {
> 
> IMO, level can be either leaf or nonleaf. If so, following struct makes more
> sense to me
> 
>         int is_leaf;
>         uint32_t n_nodes_max;
>         union {
>                 struct rte_tm_node_capabilities nonleaf;
>                 struct rte_tm_node_capabilities leaf;
>         };
> 

This was the way it was done in previous versions, but Hemant rightly observed that leaf nodes typically have different capabilities than non-leaf nodes, hence the current solution.

> > +	/**< Maximum number of nodes for the current hierarchy level. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Maximum number of non-leaf nodes for the current hierarchy level.
> > +	 * The value of 0 indicates that current level only supports leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_nonleaf_max;
> > +
> > +	/**< Maximum number of leaf nodes for the current hierarchy level. The
> > +	 * value of 0 indicates that current level only supports non-leaf
> > +	 * nodes. The maximum value is *n_nodes_max*.
> > +	 */
> > +	uint32_t n_nodes_leaf_max;
> > +
> > +	/**< Summary of node-level capabilities across all the non-leaf nodes
> > +	 * of the current hierarchy level. Valid only when
> > +	 * *n_nodes_nonleaf_max* is greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all the leaf nodes of
> > +	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
> > +	 * greater than 0.
> > +	 */
> > +	struct rte_tm_node_capabilities leaf;
> > +};
> > +
> > +/**
> > + * Traffic manager capabilities
> > + */
> > +struct rte_tm_capabilities {
> > +	/**< Maximum number of nodes. */
> > +	uint32_t n_nodes_max;
> > +
> > +	/**< Set of supported dynamic update operations
> > +	 * (see enum rte_tm_dynamic_update_type).
> > +	 */
> > +	uint64_t dynamic_update_mask;
> 
> IMO, It is better to change as
> enum rte_tm_dynamic_update_type dynamic_update_mask
> instead of
> uint64_t dynamic_update_mask;
> 

Same answer as for the other enum above (mask is not equal to a single enum value, but a set of enum flags; basically each bit of the mask corresponds to a different enum value).

> > +
> > +	/**< Summary of node-level capabilities across all non-leaf nodes. */
> > +	struct rte_tm_node_capabilities nonleaf;
> > +
> > +	/**< Summary of node-level capabilities across all leaf nodes. */
> > +	struct rte_tm_node_capabilities leaf;
> > +};
> > +
> > +/**
> > + * Congestion management (CMAN) mode
> > + *
> > + * This is used for controlling the admission of packets into a packet queue or
> > + * group of packet queues on congestion. On request of writing a new packet
> > + * into the current queue while the queue is full, the *tail drop* algorithm
> > + * drops the new packet while leaving the queue unmodified, as opposed to *head
> > + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> > + * packet waiting in the queue) and admits the new packet at the tail of the
> > + * queue.
> > + *
> > + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> > + * more and more input packets as the queue occupancy builds up. When the queue
> > + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> > + * RED* algorithm uses a separate set of RED thresholds for each packet color.
> > + */
> > +enum rte_tm_cman_mode {
> > +	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
> 
> explicit zero assignment may not be required

Agree, I don't mind either way, but I see that most of the DPDK library code uses explicit initial assignment?

> 
> > +	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
> > +	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> > +};
> > +
> 
> > +struct rte_tm_node_params {
> > +	/**< Shaper profile for the private shaper. The absence of the private
> > +	 * shaper for the current node is indicated by setting this parameter
> > +	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
> > +	 */
> > +	uint32_t shaper_profile_id;
> > +
> > +	/**< User allocated array of valid shared shaper IDs. */
> > +	uint32_t *shared_shaper_id;
> > +
> > +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> > +	uint32_t n_shared_shapers;
> > +
> > +	/**< Mask of statistics counter types to be enabled for this node. This
> > +	 * needs to be a subset of the statistics counter types available for
> > +	 * the current node. Any statistics counter type not included in this
> > +	 * set is to be disabled for the current node.
> > +	 */
> > +	uint64_t stats_mask;
> 
> How about changing to "enum rte_tm_stats_type" instead of uint64_t ?
> 

Same answer as above ones on enums & masks.

> > +
> > +	union {
> > +		/**< Parameters only valid for non-leaf nodes. */
> > +		struct {
> > +			/**< For each priority, indicates whether the children
> > +			 * nodes sharing the same priority are to be scheduled
> > +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> > +			 * is to be used for all priorities. When non-NULL, it
> > +			 * points to a pre-allocated array of *n_priorities*
> > +			 * elements, with a non-zero value element indicating
> > +			 * WFQ and a zero value element for WRR.
> > +			 */
> > +			int *scheduling_mode_per_priority;
> > +
> > +			/**< Number of priorities. */
> > +			uint32_t n_priorities;
> > +		} nonleaf;
> 
> 
> Since we are adding all node "connecting" parameter in rte_tm_node_add().
> How about adding WFQ vs WRR as boolean value in rte_tm_node_add() to maintain
> the consistency

This is not about the parent node managing this new node as one of its children nodes, it is about how this new node will manage its future children nodes, hence the reason to put it here.
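
To illustrate how a non-leaf node would interpret this parameter for its own children, here is a minimal sketch based on the comment quoted above (NULL means WFQ for all priorities; otherwise non-zero means WFQ and zero means WRR). The trimmed-down structure and the helper children_use_wfq() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down stand-in for the non-leaf part of the node parameters. */
struct nonleaf_params {
	int *scheduling_mode_per_priority; /* NULL: WFQ for all priorities */
	uint32_t n_priorities;
};

/* Non-zero when children at *priority* are scheduled by WFQ, zero for WRR. */
static int
children_use_wfq(const struct nonleaf_params *p, uint32_t priority)
{
	assert(p != NULL && priority < p->n_priorities);
	if (p->scheduling_mode_per_priority == NULL)
		return 1;
	return p->scheduling_mode_per_priority[priority] != 0;
}
```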

> 
> How about new error type in "enum rte_tm_error_type" to specify the connection
> error due to requested mode WFQ or WRR not supported.
> 

I think we already have it, it is called: RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE.

> > +
> > +/**
> > +/**
> > + * Traffic manager get number of leaf nodes
> > + *
> > + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> > + * Therefore, the set of leaf nodes is predefined, their number is always equal
> > + * to N (where N is the number of TX queues configured for the current port)
> > + * and their IDs are 0 .. (N-1).
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param n_leaf_nodes
> > + *   Number of leaf nodes for the current port.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_get_leaf_nodes(uint8_t port_id,
> > +	uint32_t *n_leaf_nodes,
> > +	struct rte_tm_error *error);
> 
> In order to keep consistency with rest of the API, IMO, the API
> name can be changed to rte_tm_leaf_nodes_get()
> 

IMO this is not a node API, it is a port API, hence the attempt to avoid rte_tm_node_XYZ().

Maybe a longer but less confusing name is: rte_tm_get_number_of_leaf_nodes()?
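
Whatever the final name, the rule stated in the quoted comment (leaf node IDs are 0 .. (N-1), where N is the number of TX queues) can be sketched as below. Both helper names are hypothetical and only illustrate the ID convention.

```c
#include <assert.h>
#include <stdint.h>

/* Leaf nodes own the IDs 0 .. (N-1), N = number of configured TX queues. */
static int
node_id_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	return node_id < n_tx_queues;
}

/* Smallest ID an application may pick for a non-leaf node. */
static uint32_t
first_nonleaf_node_id(uint32_t n_tx_queues)
{
	return n_tx_queues;
}
```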

> > +
> > +/**
> > + * Traffic manager node type (i.e. leaf or non-leaf) get
> > + *
> > + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
> > + * the number of TX queues of the current Ethernet port. The non-leaf nodes
> > + * have their IDs generated by the application outside of the above range,
> > + * which is reserved for leaf nodes.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param node_id
> > + *   Node ID value. Needs to be valid.
> > + * @param is_leaf
> 
> Change to "@param[out] is_leaf" to indicate the parameter is output.
> I guess, That scheme is missing in overall header file. It is good to have.
> 

OK, will do for the entire file.

> 
> > + *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_node_type_get(uint8_t port_id,
> > +	uint32_t node_id,
> > +	int *is_leaf,
> > +	struct rte_tm_error *error);
> > +
> > +/**
> > + * Traffic manager capabilities get
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param cap
> 
> missing [out]. See above
> 

OK, will do for the entire file.

> > + *   Traffic manager capabilities. Needs to be pre-allocated and valid.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_capabilities_get(uint8_t port_id,
> > +	struct rte_tm_capabilities *cap,
> > +	struct rte_tm_error *error);
> > +
> 
> > +int
> > +rte_tm_node_add(uint8_t port_id,
> > +	uint32_t node_id,
> > +	uint32_t parent_node_id,
> > +	uint32_t priority,
> > +	uint32_t weight,
> > +	struct rte_tm_node_params *params,
> > +	struct rte_tm_error *error);
> 
> See the first comment in the beginning of the file.
> 

Noted.

> > +
> > + * Traffic manager node parent update
> > + *
> > + * Restriction for root node: its parent cannot be changed.
> 
> IMO, it is nice to mention the corresponding "enum rte_tm_dynamic_update_type"
> flag for this API here. Maybe the common code itself can check that and
> return an error if the implementation does not meet the capability.
> 
> Applicable to all update APIs
> 

OK, will do for the entire file.
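
The common-code check suggested above could look roughly like this. It is a sketch only: the shortened enum names and update_check() are hypothetical, and only the bit positions come from the quoted enum.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions taken from the quoted enum; names shortened for the sketch. */
enum dynamic_update_type {
	UPDATE_NODE_STATS = 1 << 5,
	UPDATE_NODE_CMAN = 1 << 6,
};

/* Common-code gate: reject an update the device did not advertise. */
static int
update_check(uint64_t dynamic_update_mask, enum dynamic_update_type type)
{
	if ((dynamic_update_mask & (uint64_t)type) == 0)
		return -ENOTSUP;
	return 0;
}
```

Each update API would call such a gate with its own flag before dispatching to the driver.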

> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param node_id
> > + *   Node ID. Needs to be valid.
> > + * @param parent_node_id
> > + *   Node ID for the new parent. Needs to be valid.
> > + * @param priority
> > + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> > + *   running on the parent of the current node for scheduling this child node.
> > + * @param weight
> > + *   Node weight. The node weight is relative to the weight sum of all siblings
> > + *   that have the same priority. The lowest weight is zero. Used by the
> > + *   WFQ/WRR algorithm running on the parent of the current node for scheduling
> > + *   this child node.
> > + * @param error
> > + *   Error details. Filled in only on error, when not NULL.
> > + * @return
> > + *   0 on success, non-zero error code otherwise.
> > + */
> > +int
> > +rte_tm_node_parent_update(uint8_t port_id,
> > +	uint32_t node_id,
> > +	uint32_t parent_node_id,
> > +	uint32_t priority,
> > +	uint32_t weight,
> > +	struct rte_tm_error *error);
> > +
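
As an illustration of "relative to the weight sum of all siblings that have the same priority", the nominal share of one child can be sketched as weight / sum. This is an idealized view (real WFQ/WRR implementations only approximate it), and weight_share_permille() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Nominal share of sibling *idx*, in parts per thousand, among the siblings
 * that have the same priority. Returns 0 when all weights are zero.
 */
static uint32_t
weight_share_permille(const uint32_t *weights, size_t n, size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	assert(weights != NULL && idx < n);
	for (i = 0; i < n; i++)
		sum += weights[i];
	if (sum == 0)
		return 0;
	return (uint32_t)(((uint64_t)weights[idx] * 1000) / sum);
}
```

For sibling weights 1, 1 and 2, the third child nominally gets half of the bandwidth left at that priority.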

Regards,
Cristian

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  2017-04-07 17:47     ` Dumitrescu, Cristian
@ 2017-04-10 14:00       ` Jerin Jacob
  0 siblings, 0 replies; 52+ messages in thread
From: Jerin Jacob @ 2017-04-10 14:00 UTC (permalink / raw)
  To: Dumitrescu, Cristian
  Cc: dev, thomas.monjalon, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

-----Original Message-----
> Date: Fri, 7 Apr 2017 17:47:40 +0000
> From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, "thomas.monjalon@6wind.com"
>  <thomas.monjalon@6wind.com>, "balasubramanian.manoharan@cavium.com"
>  <balasubramanian.manoharan@cavium.com>, "hemant.agrawal@nxp.com"
>  <hemant.agrawal@nxp.com>, "shreyansh.jain@nxp.com"
>  <shreyansh.jain@nxp.com>
> Subject: RE: [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> 
> Hi Jerin,
> 
> Thanks for your review!
> 
> > 
> > Thanks Cristian for v3.
> > 
> > From Cavium HW perceptive, v3 is in relatively good shape to consume it,
> > Except the below mentioned two pointers.
> > 
> > 1) We strongly believes, application explicitly need to give level id, while
> > creating topology(i.e rte_tm_node_add()). Reasons are,
> > 
> > - In the capability side we are exposing nr_levels etc
> > - I think, _All_ the HW implementation expects to be connected from
> > level-n to leveln-1. IMO, Its better to express that in API.
> > - For the SW implementations, which don't care about the specific level id for the
> >   connection can ignore the level id passed by the application.
> >   Let the definition be "level" aware.
> > 
> 
> The current API proposal creates a new node and connects it to its parent in a single step, so when a new node is added its level if completely known based on its parent level.
> 
> Therefore, specifying the level of the node when the node is added is redundant and therefore not needed. My concern is this requirement can introduce inconsistency into the API, as the user can specify a level ID for the new node that is different than (parent node level ID + 1).

Yes. I think it's better to return an error if the implementation does not
support such a case.

> 
> But based on private conversation it looks to me that you guys have a strong opinion on this, so I am taking the action to identify a (nice) way to implement your requirement and do it.

OK

> 
> > 2) There are lot of capability in the TM definition. I don't have strong option
> > here as TM stuff comes in control path.
> > 
> > So expect point (1), generally we are fine with V3.
> > 
> 
> This is good news!
> 
> > Detailed comments below,
> > 
> > > +
> > 
> > This definition is not used anywhere
> 
> This is the typical value to be passed to the shaper profile adjust parameter. Will take the action to document this.

OK

> 
> > 
> > > +/**
> > > + * Color
> > > + */
> > > +enum rte_tm_color {
> > > +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> > 
> > I think, For bitmask, it is better to add ULL.
> > example:
> >         RTE_TM_STATS_N_BYTES_QUEUED = 1ULL << 9,
> 
> The enum fields are uint32_t, not uint64_t; therefore, we cannot use uint64_t constants. This is why all enums in DPDK are using uint32_t values. I am not sure, there might be a way to use uint64_t constants by relying on compiler extensions, by I wanted to keep ourselves out of trouble :).

OK

> 
> 
> > > +};
> > 
> > I think, The above definitions are used as "uint64_t stats_mask" in
> > the remaining sections. How about changing to "enum rte_tm_stats_type
> > stats_mask"
> > instead of uint64_t stats_mask?
> > 
> 
> The mask is a collection of flags (so multiple enum flags are typically enabled in this mask, e.g. stats_mask = RTE_TM_STATS_N_PKT | RTE_TM_STATS_N_BYTES | ...), therefore it cannot be of the same type as the enum.
> 
> I am taking the action to document that stats_mask is built out by using this specific enum flags.

OK

> 
> > > +
> > > +/**
> > > + * Traffic manager dynamic updates
> > > +struct rte_tm_level_capabilities {
> > 
> > IMO, level can be either leaf or nonleaf. If so, following struct makes more
> > sense to me
> > 
> >         int is_leaf;
> >         uint32_t n_nodes_max;
> >         union {
> >                 struct rte_tm_node_capabilities nonleaf;
> >                 struct rte_tm_node_capabilities leaf;
> >         };
> > 
> 
> This was the way it was done in previous versions, but Hemant rightly observed that leaf nodes typically have different capabilities as non-leaf nodes, hence the current solution.

OK. But it still has to be a union, right? A level can be either leaf
or non-leaf (not both).


> 
> > > +	 * (see enum rte_tm_dynamic_update_type).
> > > +	 */
> > > +	uint64_t dynamic_update_mask;
> > 
> > IMO, It is better to change as
> > enum rte_tm_dynamic_update_type dynamic_update_mask
> > instead of
> > uint64_t dynamic_update_mask;
> > 
> 
> Same answer as for the other enum above (mask is not equal to a single enum value, but a set of enum flags; basically each bit of the mask corresponds to a different enum value).

OK

I think we can add the "@see enum rte_tm_dynamic_update_type" scheme in
the header file wherever there is a reference to another element in the
header file.

reference: http://dpdk.org/browse/dpdk/tree/lib/librte_eventdev/rte_eventdev.h#n257


> > > +
> > > +	union {
> > > +		/**< Parameters only valid for non-leaf nodes. */
> > > +		struct {
> > > +			/**< For each priority, indicates whether the children
> > > +			 * nodes sharing the same priority are to be scheduled
> > > +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> > > +			 * is to be used for all priorities. When non-NULL, it
> > > +			 * points to a pre-allocated array of *n_priority*
> > > +			 * elements, with a non-zero value element indicating
> > > +			 * WFQ and a zero value element for WRR.
> > > +			 */
> > > +			int *scheduling_mode_per_priority;
> > > +
> > > +			/**< Number of priorities. */
> > > +			uint32_t n_priorities;
> > > +		} nonleaf;
> > 
> > 
> > Since we are adding all node "connecting" parameter in rte_tm_node_add().
> > How about adding WFQ vs WRR as boolean value in rte_tm_node_add() to
> > maintain
> > the consistency
> 
> This is not about the parent node managing this new node as one of its children nodes, it is about how this new node will manage its future children nodes, hence the reason to put it here.

So, is it the same as rte_tm_node_scheduling_mode_update()? If so, then we
may not need this here. If not, what is the use case we are trying to achieve
here?

> 
> > 
> > How about new error type in "enum rte_tm_error_type" to specify the
> > connection
> > error due to requested mode WFQ or WRR not supported.
> > 
> 
> I think we already have it, it is called: RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE.

I think explicit error codes may help, something like
RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_WFQ
and
RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_WRR

> 
> > > +
> > > +/**
> > > +/**
> > > + * Traffic manager get number of leaf nodes
> > > + *
> > > + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> > > + * Therefore, the set of leaf nodes is predefined, their number is always equal
> > > + * to N (where N is the number of TX queues configured for the current port)
> > > + * and their IDs are 0 .. (N-1).
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the Ethernet device.
> > > + * @param n_leaf_nodes
> > > + *   Number of leaf nodes for the current port.
> > > + * @param error
> > > + *   Error details. Filled in only on error, when not NULL.
> > > + * @return
> > > + *   0 on success, non-zero error code otherwise.
> > > + */
> > > +int
> > > +rte_tm_get_leaf_nodes(uint8_t port_id,
> > > +	uint32_t *n_leaf_nodes,
> > > +	struct rte_tm_error *error);
> > 
> > In order to keep consistency with rest of the API, IMO, the API
> > name can be changed to rte_tm_leaf_nodes_get()
> > 
> 
> IMO this is not a node API, it is a port API, hence the attempt to avoid rte_tm_node_XYZ().
> 
> Maybe a longer but less confusing name is: rte_tm_get_number_of_leaf_nodes()?

No strong opinion here. A shorter version may be rte_tm_leaf_nodes_count()

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management
  2017-03-04  1:10 ` [PATCH v3 1/2] ethdev: add capability control API Cristian Dumitrescu
  2017-03-06 10:32   ` Thomas Monjalon
@ 2017-05-19 17:12   ` Cristian Dumitrescu
  2017-05-19 17:12     ` [PATCH v4 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
  2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
  1 sibling, 2 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-05-19 17:12 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch set introduces an ethdev-based abstraction layer for Quality of
Service (QoS) Traffic Management, which includes: hierarchical scheduling,
traffic shaping, congestion management, packet marking. The goal is to
provide a simple generic API that is agnostic of the underlying HW, SW or
mixed HW-SW implementation.

Patch 1 uses the approach introduced by rte_flow in DPDK to extend the
ethdev functionality in a modular way for traffic management.

Patch 2 introduces the generic ethdev API for traffic management.

Cristian Dumitrescu (2):
  ethdev: add traffic management ops get API
  ethdev: add traffic management API

 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ethdev.c          |   12 +
 lib/librte_ether/rte_ethdev.h          |   20 +
 lib/librte_ether/rte_ether_version.map |   36 +
 lib/librte_ether/rte_tm.c              |  448 ++++++++
 lib/librte_ether/rte_tm.h              | 1923 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  373 +++++++
 8 files changed, 2820 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v4 1/2] ethdev: add traffic management ops get API
  2017-05-19 17:12   ` [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
@ 2017-05-19 17:12     ` Cristian Dumitrescu
  2017-06-09 16:51       ` [PATCH v5 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
  1 sibling, 1 reply; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-05-19 17:12 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

The rte_flow feature breaks the monolithic approach for ethdev by
introducing the new rte_flow API to ethdev using a plugin-like approach.

Basically, the rte_flow API is still logically part of ethdev:
- it extends the ethdev functionality: rte_flow is a new feature/
  capability of ethdev;
- all its functions work on an Ethernet device: the first parameter of the
  rte_flow functions is Ethernet device port ID.

Also, the rte_flow API is a sort of capability plugin for ethdev:
- the rte_flow API functions have their own name space: they are called
  rte_flow_operationXYZ() as opposed to rte_eth_dev_flow_operationXYZ();
- the rte_flow API functions are placed in separate files in the same
  librte_ether folder as opposed to rte_ethdev.[hc].

The way it works is by using the existing ethdev API function
rte_eth_dev_filter_ctrl() to query the current Ethernet device port ID for
the support of the rte_flow capability and return the pointer to the
rte_flow operations when supported and NULL otherwise:

struct rte_flow_ops *eth_flow_ops;
int ret = rte_eth_dev_filter_ctrl(eth_port_id,
	RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, &eth_flow_ops);

This patch reuses the same approach for ethdev Traffic Management API.
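
The pattern can be sketched in isolation as follows. This is a self-contained mock, not the ethdev code: struct dev, struct dev_ops, struct tm_ops and both functions are hypothetical stand-ins that only mirror the shape of the dispatch (call the driver callback when present, return -ENOTSUP otherwise).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ethdev structures. */
struct tm_ops { int placeholder; };
struct dev;
typedef int (*tm_ops_get_t)(struct dev *dev, void *ops);
struct dev_ops { tm_ops_get_t tm_ops_get; };
struct dev { const struct dev_ops *dev_ops; };

static const struct tm_ops demo_tm_ops = { 0 };

/* Driver callback: hand back a pointer to the driver's TM ops table. */
static int
demo_tm_ops_get(struct dev *dev, void *ops)
{
	(void)dev;
	*(const struct tm_ops **)ops = &demo_tm_ops;
	return 0;
}

/* Generic layer: dispatch to the driver, or report "not supported". */
static int
dev_tm_ops_get(struct dev *dev, void *ops)
{
	assert(dev != NULL && dev->dev_ops != NULL);
	if (dev->dev_ops->tm_ops_get == NULL)
		return -ENOTSUP;
	return dev->dev_ops->tm_ops_get(dev, ops);
}
```

A driver without traffic management support simply leaves the callback NULL, and the caller sees -ENOTSUP with its ops pointer untouched.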

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in v4:
- Followed up on suggestion from Thomas: Replaced generic capability
  ethdev API function with traffic management specific function
  rte_eth_dev_tm_ops_get()

Changes in v3:
- Followed up on suggestion from Jerin: renamed capability from
  Hierarchical Scheduler (sched) to Traffic Manager (tm)

Changes in v2:
- Followed up on suggestion from Jerin and Hemant: renamed
  capability_control() to capability_ops_get()
- Added ACK from Keith, Jerin and Hemant

 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 20 ++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  6 ++++++
 3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 83898a8..f735f1e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3021,6 +3021,18 @@ rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 	return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op, arg);
 }
 
+int
+rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tm_ops_get, -ENOTSUP);
+	return (*dev->dev_ops->tm_ops_get)(dev, ops);
+}
+
 void *
 rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
 		rte_rx_callback_fn fn, void *user_param)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0f38b45..5cf3b80 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1441,6 +1441,9 @@ typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 void *arg);
 /**< @internal Take operations to assigned filter type on an Ethernet device */
 
+typedef int (*eth_tm_ops_get_t)(struct rte_eth_dev *dev, void *ops);
+/**< @internal Get Traffic Management (TM) operations on an Ethernet device */
+
 typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
 				 struct rte_eth_dcb_info *dcb_info);
 /**< @internal Get dcb information on an Ethernet device */
@@ -1573,6 +1576,9 @@ struct eth_dev_ops {
 	/**< Get extended device statistic values by ID. */
 	eth_xstats_get_names_by_id_t xstats_get_names_by_id;
 	/**< Get name of extended device statistics by ID. */
+
+	eth_tm_ops_get_t tm_ops_get;
+	/**< Get Traffic Management (TM) operations. */
 };
 
 /**
@@ -4105,6 +4111,20 @@ int rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 			enum rte_filter_op filter_op, void *arg);
 
 /**
+ * Get Traffic Management (TM) operations on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param ops
+ *   Pointer to TM operations.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops);
+
+/**
  * Get DCB information on an Ethernet device.
  *
  * @param port_id
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d6726bb..ff056e8 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -156,3 +156,9 @@ DPDK_17.05 {
 	rte_eth_xstats_get_names_by_id;
 
 } DPDK_17.02;
+
+DPDK_17.08 {
+    global:
+
+	rte_eth_dev_tm_ops_get;
+} DPDK_17.05
-- 
2.7.4

* [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-19 17:12   ` [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-05-19 17:12     ` [PATCH v4 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
@ 2017-05-19 17:12     ` Cristian Dumitrescu
  2017-05-19 17:34       ` Stephen Hemminger
                         ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-05-19 17:12 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow)
- Capability query API per port, per level and per node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
- Traffic shaping: single/dual rate, private (per node) and shared (by
  multiple nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
Changes in v4:
- Implemented feedback from Hemant [6]
	- Capability API: Reworked the port, level and node capability API
	  data structure to remove confusion due to "summary across all
	  nodes" approach, which made it unclear whether a particular
	  capability is supported by all nodes or by at least one node.
	- Capability API: Added flags for "all nodes have identical
	  capability set"
	- Suspended state: documented the required behavior in Doxygen
	  description
- Implemented feedback from Jerin [7]
	- Node add: added level parameter (see new API function:
	  rte_tm_node_add_check_level())
	- RTE_TM_ETH_FRAMING_OVERHEAD, RTE_TM_ETH_FRAMING_OVERHEAD_FCS:
	  documented their usage in their Doxygen description
	- Capability API: for each function, mention the related
	  capability field (Doxygen @see)
	- stats_mask, capability_mask: document the enum flags used to
	  build each mask (Doxygen @see)
	- Rename rte_tm_get_leaf_nodes() to
	  rte_tm_get_number_of_leaf_nodes()
	- Doxygen: add @param[in, out] to the description of all API funcs
	- Doxygen: fix hooks in doc/api/doxy-api-index.md
- Rename rte_tm_hierarchy_set() to rte_tm_hierarchy_commit(), improved
  Doxygen description
- Node add, node delete: improved Doxygen description
- Fixed incorrect design assumption that packet-based weight mode for WFQ
  is identical to WRR. As a result, removed all references to WRR support.
  Renamed the "scheduling mode" node parameters to "wfq_weight_mode".

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max ->
	  wfq_wrr_n_children_per_group_max, added wfq_wrr_n_groups_max,
	  improved description of both, improved description of
	  wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent
	  update
- Enforced/documented restrictions for root node (node_add() and
  update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0,
  PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues
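
The PIR restrictions listed above (PIR != 0, PIR >= CIR) can be sketched as a
small check. This is an illustration under stated assumptions: the structure
and function names are hypothetical and are not the ones defined in rte_tm.h,
and the rate unit (bytes per second) is assumed for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dual-rate shaper description; not the rte_tm.h structure. */
struct demo_shaper_rates {
	uint64_t cir; /* committed information rate, assumed bytes per second */
	uint64_t pir; /* peak information rate, assumed bytes per second */
};

/* Return 0 when the rates satisfy PIR != 0 and PIR >= CIR, -EINVAL if not. */
int
demo_shaper_rates_check(const struct demo_shaper_rates *r)
{
	assert(r != NULL);

	if (r->pir == 0)
		return -EINVAL;
	if (r->pir < r->cir)
		return -EINVAL;
	return 0;
}
```

A profile with CIR equal to PIR passes this check, since the restriction is
PIR >= CIR rather than a strict inequality.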

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added
	  clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, parent, role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API funcs
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below,
  hopefully nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated
      object IDs. IMO the choice to have application-generated object IDs
      adds marginal complexity to the driver (search ID function
      required), but it provides huge simplification for the application.
      The app does not need to worry about building & managing tree-like
      structure for storing driver-generated object IDs, the app can use
      its own convention for node IDs depending on the specific hierarchy
      that it needs. Trivial example: identify all level-2 nodes with IDs
      like 100, 200, 300, … and the level-3 nodes based on their level-2
      parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …, 310, 320,
      330, … and level-4 nodes based on their level-3 parents: 111, 112,
      113, 114, …, 121, 122, 123, 124, …. Moreover, see the change log
      for the other related simplification that was implemented: leaf
      nodes now have predefined IDs that are the same as their Ethernet
      TX queue ID (therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the
  shaper profile as part of node API (no shaper ID needed for private
  shapers), while the shared shapers are configured outside of the node
  API using shaper profile and communicated to the node using shared
  shaper ID. So there is no configuration overhead for shared shapers if
  the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet
  TX queue ID (therefore no translation is required for leaf nodes). This
  is also used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause
  (same as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate
  limiting based on IP packet bytes)
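
The packet length adjustment in the last item can be worked through with a
short sketch. The two overhead constants are copied from the rte_tm.h hunk of
this patch; the helper name and its signed 32-bit adjustment parameter are
hypothetical and only illustrate the arithmetic, they are not the driver
logic.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the rte_tm.h hunk in this patch. */
#define RTE_TM_ETH_FRAMING_OVERHEAD      20 /* preamble 7 + SFD 1 + IFG 12 */
#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS  24 /* as above, plus 4-byte FCS */

/* Hypothetical helper: number of bytes charged to a shaper for one packet,
 * given the packet length and a signed length adjustment.
 */
uint32_t
demo_shaper_pkt_bytes(uint32_t pkt_len, int32_t pkt_length_adjust)
{
	int64_t bytes = (int64_t)pkt_len + pkt_length_adjust;

	/* A negative adjustment must not exceed the packet length. */
	assert(bytes >= 0);
	return (uint32_t)bytes;
}
```

For a 64-byte frame that already includes the FCS, a positive adjustment of
RTE_TM_ETH_FRAMING_OVERHEAD charges 84 bytes. For the same frame, a negative
adjustment of 18 bytes (14-byte Ethernet header plus 4-byte FCS) charges 46
bytes, i.e. only the bytes carried above the Ethernet layer.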

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
[6] Hemant's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-March/062354.html
[7] Jerin's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-April/063429.html

 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  448 ++++++++
 lib/librte_ether/rte_tm.h              | 1923 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  373 +++++++
 6 files changed, 2782 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index afb4cab..cdaf2ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -240,6 +240,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Management API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 93fdde1..db692ae 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -56,5 +57,7 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index ff056e8..7f39904 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -161,4 +161,34 @@ DPDK_17.08 {
     global:
 
 	rte_eth_dev_tm_ops_get;
+	rte_tm_get_number_of_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_commit;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_wfq_weight_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 } DPDK_17.05
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..2617a1a
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,448 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->tm_ops_get == NULL) ||
+		(dev->dev_ops->tm_ops_get(dev, &ops) != 0) ||
+		(ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({							\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)					\
+		return -rte_errno;			\
+							\
+	if (ops->func == NULL)				\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,				\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,				\
+			rte_strerror(ENOSYS));		\
+							\
+	ops->func;					\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get node level */
+int
+rte_tm_node_level_get(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t *level_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_level_get)(dev,
+		node_id, level_id, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Commit the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_commit)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update WFQ weight mode */
+int rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wfq_weight_mode_update)(dev,
+		node_id, wfq_weight_mode, n_sp_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..22167c2
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1923 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Ethernet framing overhead.
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ *
+ * One of the typical values for the *pkt_length_adjust* field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ *
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
+ * Useful when FCS is generated and added at the end of the Ethernet frame on
+ * TX side without any SW intervention.
+ *
+ * One of the typical values for the pkt_length_adjust field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Node ID for the parent of the root node */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
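
The per-color drop counters above line up with the per-color bits of enum rte_tm_stats_type: green, yellow and red packet drops are bits 2, 3 and 4. A minimal sketch of how an application might sum only the enabled counters follows. The types and names below are local, hypothetical stand-ins for the definitions in this patch, not the header itself.

```c
#include <stdint.h>

/* Local stand-ins mirroring the values proposed in the patch (illustration only). */
enum { EXAMPLE_GREEN = 0, EXAMPLE_YELLOW, EXAMPLE_RED, EXAMPLE_COLORS };

/* Same bit position as RTE_TM_STATS_N_PKTS_GREEN_DROPPED in the patch. */
#define EXAMPLE_STATS_N_PKTS_GREEN_DROPPED (1u << 2)

struct example_leaf_stats {
	uint64_t n_pkts_dropped[EXAMPLE_COLORS];
};

/* Sum the dropped-packet counters whose stats type bit is set in stats_mask.
 * Relies on the green/yellow/red drop bits being consecutive (2, 3, 4). */
static uint64_t
example_pkts_dropped_total(const struct example_leaf_stats *s,
	uint64_t stats_mask)
{
	uint64_t total = 0;
	int color;

	for (color = EXAMPLE_GREEN; color < EXAMPLE_COLORS; color++) {
		uint64_t bit =
			(uint64_t)EXAMPLE_STATS_N_PKTS_GREEN_DROPPED << color;

		if (stats_mask & bit)
			total += s->n_pkts_dropped[color];
	}

	return total;
}
```

Counters whose bit is clear in the mask are skipped rather than read, since the patch does not say what a disabled counter contains.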
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/**< Dynamic parent node update. The new parent node is located on the
+	 * same hierarchy level as the former parent node. Consequently, the
+	 * node whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/**< Dynamic parent node update. The new parent node is located on a
+	 * different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/**< Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/**< Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/**< Dynamic switch between byte-based and packet-based WFQ weights. */
+	RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
+
+	/**< Dynamic update on number of SP priorities. */
+	RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 7,
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< When non-zero, this flag indicates that all the non-leaf nodes
+	 * (with the exception of the root node) have identical capability set.
+	 */
+	int non_leaf_nodes_identical;
+
+	/**< When non-zero, this flag indicates that all the leaf nodes have
+	 * identical capability set.
+	 */
+	int leaf_nodes_identical;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resources between private and
+	 * shared shapers, it is typically equal to the sum of
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have their private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of private shapers that support dual rate shaping.
+	 * Indicates the maximum number of nodes that can concurrently have
+	 * their private shaper enabled with dual rate support. Only valid when
+	 * private shapers are supported. The value of zero indicates that dual
+	 * rate shaping is not available for private shapers. The maximum value
+	 * is *shaper_private_n_max*.
+	 */
+	uint32_t shaper_private_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_per_shaper_max;
+
+	/**< Maximum number of shared shapers a node can be part of. This
+	 * parameter indicates that there is at least one node that can be
+	 * configured with this many shared shapers, which might not be true for
+	 * all the nodes. Only valid when shared shapers are supported, in which
+	 * case it ranges from 1 to *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_shared_n_shapers_per_node_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of children nodes. This parameter indicates that
+	 * there is at least one non-leaf node that can be configured with this
+	 * many children nodes, which might not be true for all the non-leaf
+	 * nodes.
+	 */
+	uint32_t sched_n_children_max;
+
+	/**< Maximum number of supported priority levels. This parameter
+	 * indicates that there is at least one non-leaf node that can be
+	 * configured with this many priority levels for managing its children
+	 * nodes, which might not be true for all the non-leaf nodes. The value
+	 * of zero is invalid. The value of 1 indicates that only priority 0 is
+	 * supported, which essentially means that Strict Priority (SP)
+	 * algorithm is not supported.
+	 */
+	uint32_t sched_sp_n_priorities_max;
+
+	/**< Maximum number of sibling nodes that can have the same priority at
+	 * any given time, i.e. maximum size of the WFQ sibling node group. This
+	 * parameter indicates there is at least one non-leaf node that meets
+	 * this condition, which might not be true for all the non-leaf nodes.
+	 * The value of zero is invalid. The value of 1 indicates that WFQ
+	 * algorithm is not supported. The maximum value is
+	 * *sched_n_children_max*.
+	 */
+	uint32_t sched_wfq_n_children_per_group_max;
+
+	/**< Maximum number of priority levels that can have more than one child
+	 * node at any given time, i.e. maximum number of WFQ sibling node
+	 * groups that have two or more members. This parameter indicates there
+	 * is at least one non-leaf node that meets this condition, which might
+	 * not be true for all the non-leaf nodes. The value of zero states that
+	 * WFQ algorithm is not supported. The value of 1 indicates that
+	 * (*sched_sp_n_priorities_max* - 1) priority levels have at most one
+	 * child node, so there can be only one priority level with two or
+	 * more sibling nodes making up a WFQ group. The maximum value is:
+	 * min(floor(*sched_n_children_max* / 2), *sched_sp_n_priorities_max*).
+	 */
+	uint32_t sched_wfq_n_groups_max;
+
+	/**< Maximum WFQ weight. The value of 1 indicates that all sibling nodes
+	 * with same priority have the same WFQ weight, so WFQ is reduced to FQ.
+	 */
+	uint32_t sched_wfq_weight_max;
+
+	/**< Head drop algorithm support. When non-zero, this parameter
+	 * indicates that there is at least one leaf node that supports the head
+	 * drop algorithm, which might not be true for all the leaf nodes.
+	 */
+	int cman_head_drop_supported;
+
+	/**< Maximum number of WRED contexts, either private or shared. In case
+	 * the implementation does not share any resources between private and
+	 * shared WRED contexts, it is typically equal to the sum of
+	 * *cman_wred_context_private_n_max* and
+	 * *cman_wred_context_shared_n_max*.
+	 */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have their private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_per_context_max;
+
+	/**< Maximum number of shared WRED contexts a leaf node can be part of.
+	 * This parameter indicates that there is at least one leaf node that
+	 * can be configured with this many shared WRED contexts, which might
+	 * not be true for all the leaf nodes. Only valid when shared WRED
+	 * contexts are supported, in which case it ranges from 1 to
+	 * *cman_wred_context_shared_n_max*.
+	 */
+	uint32_t cman_wred_context_shared_n_contexts_per_node_max;
+
+	/**< Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/**< Set of supported dynamic update operations.
+	 * @see enum rte_tm_dynamic_update_type
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Set of supported statistics counter types.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
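
The upper bound quoted for *sched_wfq_n_groups_max*, min(floor(*sched_n_children_max* / 2), *sched_sp_n_priorities_max*), follows from each WFQ group needing at least two children and its own priority level. A small sketch of that arithmetic, using a hypothetical helper that is not part of this patch:

```c
#include <stdint.h>

/* Hypothetical helper: largest possible number of WFQ sibling groups with
 * two or more members, per the sched_wfq_n_groups_max description.
 * Unsigned integer division already performs the floor(). */
static uint32_t
example_wfq_n_groups_bound(uint32_t n_children_max,
	uint32_t n_sp_priorities_max)
{
	uint32_t by_children = n_children_max / 2;

	return by_children < n_sp_priorities_max ?
		by_children : n_sp_priorities_max;
}
```

For example, 5 children and 4 priorities allow at most 2 such groups, while 64 children and 8 priorities are limited to 8 by the priority count.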
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< When non-zero, this flag indicates that all the non-leaf nodes on
+	 * this level have identical capability set. Valid only when
+	 * *n_nodes_nonleaf_max* is non-zero.
+	 */
+	int non_leaf_nodes_identical;
+
+	/**< When non-zero, this flag indicates that all the leaf nodes on this
+	 * level have identical capability set. Valid only when
+	 * *n_nodes_leaf_max* is non-zero.
+	 */
+	int leaf_nodes_identical;
+
+	union {
+		/**< Items valid only for the non-leaf nodes on this level. */
+		struct {
+			/**< Private shaper support. When non-zero, it indicates
+			 * there is at least one non-leaf node on this level
+			 * with private shaper support, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/**< Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the non-leaf
+			 * nodes on the current level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level with dual rate private shaper support, which
+			 * may not be the case for all the non-leaf nodes on
+			 * this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/**< Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/**< Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/**< Maximum number of shared shapers that any non-leaf
+			 * node on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the non-leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/**< Maximum number of children nodes. This parameter
+			 * indicates that there is at least one non-leaf node on
+			 * this level that can be configured with this many
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level.
+			 */
+			uint32_t sched_n_children_max;
+
+			/**< Maximum number of supported priority levels. This
+			 * parameter indicates that there is at least one
+			 * non-leaf node on this level that can be configured
+			 * with this many priority levels for managing its
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level. The value of zero is
+			 * invalid. The value of 1 indicates that only priority
+			 * 0 is supported, which essentially means that Strict
+			 * Priority (SP) algorithm is not supported on this
+			 * level.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size of
+			 * the WFQ sibling node group. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which may not be true for
+			 * all the non-leaf nodes on this level. The value of
+			 * zero is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported on this level. The maximum
+			 * value is *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/**< Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that
+			 * have two or more members. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which might not be true
+			 * for all the non-leaf nodes. The value of zero states
+			 * that WFQ algorithm is not supported on this level.
+			 * The value of 1 indicates that
+			 * (*sched_sp_n_priorities_max* - 1) priority levels on
+			 * this level have at most one child node, so there can
+			 * be only one priority level with two or more sibling
+			 * nodes making up a WFQ group on this level. The
+			 * maximum value is:
+			 * min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/**< Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes on this level with same priority
+			 * have the same WFQ weight, so on this level WFQ is
+			 * reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+
+			/**< Mask of statistics counter types supported by the
+			 * non-leaf nodes on this level. Every supported
+			 * statistics counter type is supported by at least one
+			 * non-leaf node on this level, which may not be true
+			 * for all the non-leaf nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} nonleaf;
+
+		/**< Items valid only for the leaf nodes on this level. */
+		struct {
+			/**< Private shaper support. When non-zero, it indicates
+			 * there is at least one leaf node on this level with
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/**< Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the leaf nodes
+			 * on this level. When non-zero, it indicates there is
+			 * at least one leaf node on this level with dual rate
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/**< Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/**< Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/**< Maximum number of shared shapers that any leaf node
+			 * on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/**< Head drop algorithm support. When non-zero, this
+			 * parameter indicates that there is at least one leaf
+			 * node on this level that supports the head drop
+			 * algorithm, which might not be true for all the leaf
+			 * nodes on this level.
+			 */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level with private WRED context support, which may
+			 * not be true for all the leaf nodes on this level.
+			 */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts that any
+			 * leaf node on this level can be part of. The value of
+			 * zero indicates that shared WRED contexts are not
+			 * supported by the leaf nodes on this level. When
+			 * non-zero, it indicates there is at least one leaf
+			 * node on this level that meets this condition, which
+			 * may not be the case for all the leaf nodes on this
+			 * level.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+
+			/**< Mask of statistics counter types supported by the
+			 * leaf nodes on this level. Every supported statistics
+			 * counter type is supported by at least one leaf node
+			 * on this level, which may not be true for all the leaf
+			 * nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/**< Private shaper support for the current node. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper of current node.
+	 * Valid only when private shaper is supported by the current node.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of shared shapers the current node can be part of.
+	 * The value of zero indicates that shared shapers are not supported by
+	 * the current node.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t sched_n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported. The maximum value is
+			 * *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/**< Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that have
+			 * two or more members. The value of zero states that
+			 * WFQ algorithm is not supported. The value of 1
+			 * indicates that (*sched_sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so there
+			 * can be only one priority level with two or more
+			 * sibling nodes making up a WFQ group. The maximum
+			 * value is: min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/**< Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes with same priority have the same
+			 * WFQ weight, so WFQ is reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support for current node. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support for current node. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts the current
+			 * node can be part of. The value of zero indicates that
+			 * shared WRED contexts are not supported by the current
+			 * node.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+
+	/**< Mask of statistics counter types supported by the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to *head
+ * drop* algorithm, which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/**< Minimum queue threshold */
+	uint16_t min_th;
+
+	/**< Maximum queue threshold */
+	uint16_t max_th;
+
+	/**< Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
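
The two encoded fields of the RED profile can be turned back into the usual RED quantities with the formulas given in the comments: maxp = 1 / maxp_inv and wq = 1 / (2 ^ wq_log2). The sketch below uses a local, hypothetical mirror of the struct. Returning 0.0 for a zero *maxp_inv* or an out-of-range *wq_log2* is an assumption made here, not something the patch specifies.

```c
#include <stdint.h>

/* Local stand-in for struct rte_tm_red_params (illustration only). */
struct example_red_params {
	uint16_t min_th;
	uint16_t max_th;
	uint16_t maxp_inv;
	uint16_t wq_log2;
};

/* maxp = 1 / maxp_inv; a zero maxp_inv is mapped to 0.0 (assumption). */
static double
example_red_maxp(const struct example_red_params *p)
{
	return p->maxp_inv ? 1.0 / p->maxp_inv : 0.0;
}

/* wq = 1 / (2 ^ wq_log2); shifts of 64 or more are mapped to 0.0 (assumption). */
static double
example_red_wq(const struct example_red_params *p)
{
	if (p->wq_log2 >= 64)
		return 0.0;

	return 1.0 / (double)(UINT64_C(1) << p->wq_log2);
}
```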
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be bigger than zero, as well as greater than
+ * or equal to the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/**< Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
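
The single rate and dual rate rules described above can be expressed as a small classification routine. This is only a sketch over local, hypothetical mirrors of the structs in this patch. Rejecting a zero peak rate in single rate mode is an assumption made here, since the comment only requires a non-zero peak rate for dual rate shapers.

```c
#include <stdint.h>

/* Local stand-ins for the structs proposed in the patch (illustration only). */
struct example_token_bucket {
	uint64_t rate; /* bytes per second */
	uint64_t size; /* bytes */
};

struct example_shaper_params {
	struct example_token_bucket committed;
	struct example_token_bucket peak;
	int32_t pkt_length_adjust;
};

/* Returns 0 for a single rate profile, 1 for a dual rate profile and
 * -1 when the rates break the documented constraints. */
static int
example_shaper_params_classify(const struct example_shaper_params *p)
{
	/* Assumption: a zero peak rate is rejected in both modes. */
	if (p->peak.rate == 0)
		return -1;

	/* Committed rate of zero disables the committed bucket. */
	if (p->committed.rate == 0)
		return 0;

	/* Dual rate: peak rate must be >= committed rate. */
	if (p->peak.rate < p->committed.rate)
		return -1;

	return 1;
}
```

A profile meant to account for Ethernet framing on the wire would additionally set *pkt_length_adjust* to a value such as RTE_TM_ETH_FRAMING_OVERHEAD_FCS (24).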
+
+/**
+ * Node parameters
+ *
+ * Each non-leaf node has multiple inputs (its children nodes) and a single
+ * output (which is input to its parent node). It arbitrates its inputs using
+ * Strict Priority (SP) and Weighted Fair Queuing (WFQ) algorithms to schedule
+ * input packets to its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as Weighted Round Robin (WRR), Byte-level WRR, Deficit WRR
+ * (DWRR), etc. are considered approximations of the WFQ ideal and are
+ * assimilated to WFQ, although an associated implementation-dependent trade-off
+ * on accuracy, performance and resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP algorithm
+ * based on their priority, with zero (0) as the highest priority. Children with
+ * the same priority are scheduled using the WFQ algorithm according to their
+ * weights. The WFQ weight of a given child node is relative to the sum of the
+ * weights of all its sibling nodes that have the same priority, with one (1) as
+ * the lowest weight. For each SP priority, the WFQ weight mode can be set as
+ * either byte-based or packet-based.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port. Hence,
+ * the leaf nodes are predefined, with their node IDs set to 0 .. (N-1), where N
+ * is the number of TX queues configured for the current Ethernet port. The
+ * non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< WFQ weight mode for each SP priority. When NULL, it
+			 * indicates that WFQ is to be used for all priorities.
+			 * When non-NULL, it points to a pre-allocated array of
+			 * *n_sp_priorities* values, with non-zero value for
+			 * byte-mode and zero for packet-mode.
+			 */
+			int *wfq_weight_mode;
+
+			/**< Number of SP priorities. */
+			uint32_t n_sp_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/**< WRED parameters (only valid when *cman* is set to
+			 * WRED).
+			 */
+			struct {
+				/**< WRED profile for private WRED context. The
+				 * absence of a private WRED context for the
+				 * current leaf node is indicated by value
+				 * RTE_TM_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of shared WRED context
+				 * IDs. When set to NULL, it indicates that the
+				 * current leaf node should not currently be
+				 * part of any shared WRED contexts.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of elements in the
+				 * *shared_wred_context_id* array. Only valid
+				 * when *shared_wred_context_id* is non-NULL,
+				 * in which case it should be non-zero.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SP_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID value. Needs to be valid.
+ * @param[out] is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node level get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID value. Needs to be valid.
+ * @param[out] level_id
+ *   Node level ID. Needs to be non-NULL.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_level_get(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t *level_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param[out] cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ *weight* to schedule its new
+ * child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param[in] parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[in] params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_hierarchy_commit()
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add with node level check
+ *
+ * Simple rte_tm_node_add() wrapper that also checks the node level.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param[in] parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[in] level_id
+ *   Level ID that should be met by this node.
+ * @param[in] params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+static inline int
+rte_tm_node_add_check_level(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	uint32_t lid;
+	int status;
+
+	status = rte_tm_node_add(port_id, node_id,
+		parent_node_id, priority, weight, params, error);
+	if (status)
+		return status;
+
+	status = rte_tm_node_level_get(port_id, node_id, &lid, error);
+	if (status)
+		return status;
+
+	if (lid != level_id) {
+		if (error) {
+			error->type = RTE_TM_ERROR_TYPE_LEVEL_ID;
+			error->cause = NULL;
+			error->message = rte_strerror(EINVAL);
+		}
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node. While the node is in suspended state, no packet is
+ * scheduled from this node and its descendants. The node exits the suspended
+ * state through the node resume operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_resume()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that is currently in suspended state. The node
+ * entered the suspended state as a result of a previous node suspend operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_suspend()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy commit
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function typically performs the following steps:
+ *    a) It validates the start-up hierarchy that was previously defined for the
+ *       current port through successive rte_tm_node_add() invocations;
+ *    b) Assuming successful validation, it performs all the necessary port
+ *       specific configuration operations to install the specified hierarchy on
+ *       the current port, with immediate effect once the port is started.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that this function can still fail due to other causes (e.g. not enough
+ * memory available in the system, etc.), even though the specified hierarchy is
+ * supported in principle by the current port.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_delete()
+ */
+int
+rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * This function can only be called after the rte_tm_hierarchy_commit()
+ * invocation. Its success depends on the port support for this operation, as
+ * advertised through the port capability set.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
+ * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for the root node: its private shaper profile needs to be valid
+ * and single rate.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared rate shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ * @see RTE_TM_UPDATE_NODE_STATS
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node WFQ weight mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be a valid non-leaf node ID.
+ * @param[in] wfq_weight_mode
+ *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
+ *   to be used for all priorities. When non-NULL, it points to a pre-allocated
+ *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
+ *   zero for packet-mode.
+ * @param[in] n_sp_priorities
+ *   Number of SP priorities.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
+ * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
+ */
+int
+rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] cman
+ *   Congestion management mode.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_CMAN
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
+ *   latter disabling the private WRED context of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param[out] stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param[in] clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_vlan_dei_supported
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as an alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color; otherwise, the ECN field is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
+ * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2,
+ * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_dscp_supported
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..c25f102
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,373 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node type get */
+
+typedef int (*rte_tm_node_level_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t *level_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node level get */
+
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager capabilities get */
+
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager level capabilities get */
+
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node capabilities get */
+
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile add */
+
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile delete */
+
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context add/update */
+
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context delete */
+
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile add */
+
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile delete */
+
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper add/update */
+
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper delete */
+
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node add */
+
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node delete */
+
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node suspend */
+
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node resume */
+
+typedef int (*rte_tm_hierarchy_commit_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager hierarchy commit */
+
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node parent update */
+
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shaper update */
+
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared shaper update */
+
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node stats update */
+
+typedef int (*rte_tm_node_wfq_weight_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node WFQ weight mode update */
+
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node congestion management mode update */
+
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node WRED context update */
+
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared WRED context update */
+
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager read stats counters for specific node */
+
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - VLAN DEI */
+
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+	/** Traffic manager node level get */
+	rte_tm_node_level_get_t node_level_get;
+
+	/** Traffic manager capabilities get */
+	rte_tm_capabilities_get_t capabilities_get;
+	/** Traffic manager level capabilities get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy commit */
+	rte_tm_hierarchy_commit_t hierarchy_commit;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node WFQ weight mode update */
+	rte_tm_node_wfq_weight_mode_update_t node_wfq_weight_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for current node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to error structure (may be NULL).
+ * @param[in] code
+ *   Related error code (rte_errno).
+ * @param[in] type
+ *   Cause field and error type.
+ * @param[in] cause
+ *   Object responsible for the error.
+ * @param[in] message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.7.4
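
Note on the DSCP marking documented for rte_tm_mark_ip_dscp() in the patch
above: the RFC 2597 table can be reproduced with a few bit operations. What
follows is only a minimal sketch, not code from the patch. The names color,
af_dscp() and dscp_remark() are hypothetical and exist only for this
illustration; the color enum is not the one defined in rte_tm.h. It assumes
the layout shown in the table: class in the three most significant DSCP bits,
drop precedence in the next two, last bit zero.

```c
#include <stdint.h>

/* Hypothetical color enum for this sketch only (not the rte_tm.h one). */
enum color { GREEN = 0, YELLOW = 1, RED = 2 };

/*
 * Build the RFC 2597 Assured Forwarding DSCP for class 1..4 and a color:
 * class in the three most significant bits, drop precedence (1..3) in the
 * next two bits, least significant bit always 0.
 */
static uint8_t
af_dscp(unsigned int af_class, enum color c)
{
	unsigned int drop_prec = (unsigned int)c + 1; /* green=1 .. red=3 */

	return (uint8_t)((af_class << 3) | (drop_prec << 1));
}

/*
 * Re-mark only the two drop precedence bits of an existing DSCP value,
 * leaving the class bits untouched. When marking is not enabled for the
 * packet color, the DSCP value is returned unchanged.
 */
static uint8_t
dscp_remark(uint8_t dscp, enum color c, int mark_enabled)
{
	if (!mark_enabled)
		return dscp;
	return (uint8_t)((dscp & ~0x06u) | (((unsigned int)c + 1) << 1));
}
```

For instance, af_dscp(1, GREEN) gives 001010 and af_dscp(4, RED) gives 100110,
matching the corners of the table. dscp_remark() mirrors the "left as is"
rule for colors whose marking is disabled.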


* Re: [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
@ 2017-05-19 17:34       ` Stephen Hemminger
  2017-05-22 14:25         ` Dumitrescu, Cristian
  2017-05-24 11:28       ` Hemant Agrawal
  2017-05-31 13:45       ` Jerin Jacob
  2 siblings, 1 reply; 52+ messages in thread
From: Stephen Hemminger @ 2017-05-19 17:34 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

On Fri, 19 May 2017 18:12:52 +0100
Cristian Dumitrescu <cristian.dumitrescu@intel.com> wrote:

> +
> +#define RTE_TM_FUNC(port_id, func)				\
> +({							\
> +	const struct rte_tm_ops *ops =			\
> +		rte_tm_ops_get(port_id, error);		\
> +	if (ops == NULL)					\
> +		return -rte_errno;			\
> +							\
> +	if (ops->func == NULL)				\
> +		return -rte_tm_error_set(error,		\
> +			ENOSYS,				\
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
> +			NULL,				\
> +			rte_strerror(ENOSYS));		\
> +							\
> +	ops->func;					\
> +})

If you are going to use a templating macro, why not go all the way
and generate the whole function? There are examples in the Linux kernel,
where macros are often used to generate show and set functions.


* Re: [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-19 17:34       ` Stephen Hemminger
@ 2017-05-22 14:25         ` Dumitrescu, Cristian
  0 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-05-22 14:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, May 19, 2017 6:34 PM
> To: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com;
> jerin.jacob@caviumnetworks.com;
> balasubramanian.manoharan@cavium.com; hemant.agrawal@nxp.com;
> shreyansh.jain@nxp.com
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] ethdev: add traffic management API
> 
> On Fri, 19 May 2017 18:12:52 +0100
> Cristian Dumitrescu <cristian.dumitrescu@intel.com> wrote:
> 
> > +
> > +#define RTE_TM_FUNC(port_id, func)				\
> > +({							\
> > +	const struct rte_tm_ops *ops =			\
> > +		rte_tm_ops_get(port_id, error);		\
> > +	if (ops == NULL)					\
> > +		return -rte_errno;			\
> > +							\
> > +	if (ops->func == NULL)				\
> > +		return -rte_tm_error_set(error,		\
> > +			ENOSYS,				\
> > +			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
> > +			NULL,				\
> > +			rte_strerror(ENOSYS));		\
> > +							\
> > +	ops->func;					\
> > +})
> 
> If you are going to use a templating macro, why not go all the way
> and generate the whole function? There are examples in the Linux kernel,
> where macros are often used to generate show and set functions.

After thinking long and hard, this is the best I was able to come up with. Could you pick any of the functions in this file and provide an example of your idea?
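
For illustration, one possible shape of the "generate the whole function"
idea is sketched below. This is an assumption about what was meant, not code
from the patch or from DPDK. All types and names (tm_error, eth_dev, tm_ops,
tm_error_set(), TM_NODE_OP, the demo driver) are hypothetical, reduced
stand-ins so that the example compiles on its own. The sketch only covers
wrappers whose sole extra argument is a node ID (suspend, resume); wrappers
with other parameter lists would each need their own macro variant, which is
arguably the main cost of this approach.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define N_PORTS 1

/* Hypothetical, reduced stand-ins for rte_tm_error, rte_eth_dev, rte_tm_ops. */
struct tm_error { int code; const char *message; };
struct eth_dev { uint32_t last_node_id; };

struct tm_ops {
	int (*node_suspend)(struct eth_dev *dev, uint32_t node_id,
		struct tm_error *error);
	int (*node_resume)(struct eth_dev *dev, uint32_t node_id,
		struct tm_error *error);
};

static struct eth_dev devices[N_PORTS];
static const struct tm_ops *port_ops[N_PORTS];

/* Record the error (when the caller asked for details), return the code. */
static int
tm_error_set(struct tm_error *error, int code, const char *message)
{
	if (error != NULL)
		*error = (struct tm_error){ .code = code, .message = message };
	return code;
}

/*
 * Generate a complete public wrapper for an op that takes only a node ID:
 * validate the port, reject a missing driver callback with -ENOSYS, then
 * forward the call to the driver.
 */
#define TM_NODE_OP(name)						\
int									\
tm_node_##name(uint8_t port_id, uint32_t node_id,			\
	struct tm_error *error)						\
{									\
	const struct tm_ops *ops;					\
									\
	if (port_id >= N_PORTS || port_ops[port_id] == NULL)		\
		return -tm_error_set(error, ENODEV, "no such port");	\
	ops = port_ops[port_id];					\
	if (ops->node_##name == NULL)					\
		return -tm_error_set(error, ENOSYS, "not supported");	\
	return ops->node_##name(&devices[port_id], node_id, error);	\
}

TM_NODE_OP(suspend)
TM_NODE_OP(resume)

/* Demo driver: implements suspend only, so resume reports -ENOSYS. */
static int
demo_node_suspend(struct eth_dev *dev, uint32_t node_id,
	struct tm_error *error)
{
	(void)error;
	dev->last_node_id = node_id;
	return 0;
}

static const struct tm_ops demo_ops = {
	.node_suspend = demo_node_suspend,
};
```

After registering the demo driver with port_ops[0] = &demo_ops, a call such
as tm_node_suspend(0, 7, &err) reaches demo_node_suspend(), while
tm_node_resume(0, 7, &err) fails with -ENOSYS because that callback is left
NULL. Unlike RTE_TM_FUNC, this form needs no GCC statement expression and no
hidden return from inside a macro.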


* Re: [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
  2017-05-19 17:34       ` Stephen Hemminger
@ 2017-05-24 11:28       ` Hemant Agrawal
  2017-05-31 13:45       ` Jerin Jacob
  2 siblings, 0 replies; 52+ messages in thread
From: Hemant Agrawal @ 2017-05-24 11:28 UTC (permalink / raw)
  To: Cristian Dumitrescu, dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan, shreyansh.jain

On 5/19/2017 10:42 PM, Cristian Dumitrescu wrote:
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.
>
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow)
> - Capability query API per port, per level and per node
> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
> - Traffic shaping: single/dual rate, private (per node) and shared (by
>   multiple nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
>
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
> Changes in v4:
> - Implemented feedback from Hemant [6]
> 	- Capability API: Reworked the port, level and node capability API
> 	  data structure to remove confusion due to "summary across all
> 	  nodes" approach, which made it unclear whether a particular
> 	  capability is supported by all nodes or by at least one node.
> 	- Capability API: Added flags for "all nodes have identical
> 	  capability set"
> 	- Suspended state: documented the required behavior in Doxygen
> 	  description
> - Implemented feedback from Jerin [7]
> 	- Node add: added level parameter (see new API function:
> 	  rte_tm_node_add_check_level())
> 	- RTE_TM_ETH_FRAMING_OVERHEAD, RTE_TM_ETH_FRAMING_OVERHEAD_FCS:
> 	  documented their usage in their Doxygen description
> 	- Capability API: for each function, mention the related
> 	  capability field (Doxygen @see)
> 	- stats_mask, capability_mask: document the enum flags used to
> 	  build each mask (Doxygen @see)
> 	- Rename rte_tm_get_leaf_nodes() to
> 	  rte_tm_get_number_of_leaf_nodes()
> 	- Doxygen: add @param[in, out] to the description of all API funcs
> 	- Doxygen: fix hooks in doc/api/doxy-api-index.md
> - Rename rte_tm_hierarchy_set() to rte_tm_hierarchy_commit(), improved
>   Doxygen description
> - Node add, node delete: improved Doxygen description
> - Fixed incorrect design assumption that packet-based weight mode for WFQ
>   is identical to WRR. As a result, removed all references to WRR support.
>   Renamed the "scheduling mode" node parameters to "wfq_weight_mode".
>
> Changes in v3:
> - Implemented feedback from Jerin [5]
> - Changed naming convention: scheddev -> tm
> - Improvements on the capability API:
> 	- Specification of marking capabilities per color
> 	- WFQ/WRR groups: sp_n_children_max ->
> 	  wfq_wrr_n_children_per_group_max, added wfq_wrr_n_groups_max,
> 	  improved description of both, improved description of
> 	  wfq_wrr_weight_max
> 	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent
> 	  update
> - Enforced/documented restrictions for root node (node_add() and
>   update())
> - Enforced/documented shaper profile restrictions on PIR: PIR != 0,
>   PIR >= CIR
> - Turned repetitive code in rte_tm.c into macro
> - Removed dependency on rte_red.h file (added RED params to rte_tm.h)
> - Color: removed "e_" from color names enum
> - Fixed small Doxygen style issues
>
> Changes in v2:
> - Implemented feedback from Hemant [4]
> - Improvements on the capability API
> 	- Added capability API for hierarchy level
> 	- Merged stats capability into the capability API
> 	- Added dynamic updates
> 	- Added non-leaf/leaf union to the node capability structure
> 	- Renamed sp_priority_min to sp_n_priorities_max, added
> 	  clarifications
> 	- Fixed description for sp_n_children_max
> - Clarified and enforced rule on node ID range for leaf and non-leaf nodes
> 	- Added API functions to get node type (i.e. leaf/non-leaf):
> 	  get_leaf_nodes(), node_type_get()
> - Added clarification for the root node: its creation, parent, role
> 	- Macro NODE_ID_NULL as root node's parent
> 	- Description of the node_add() and node_parent_update() API funcs
> - Added clarification for the first time add vs. subsequent updates rule
> 	- Cleaned up the description for the node_add() function
> - Statistics API improvements
> 	- Merged stats capability into the capability API
> 	- Added API function node_stats_update()
> 	- Added more stats per packet color
> - Added more error types
> - Fixed small Doxygen style issues
>
> Changes in v1 (since RFC [1]):
> - Implemented as ethdev plugin (similar to rte_flow) as opposed to more
>   monolithic additions to ethdev itself
> - Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
>   suggested items with only one exception, see the long list below,
>   hopefully nothing was forgotten.
>     - The item not done (hopefully for a good reason): driver-generated
>       object IDs. IMO the choice to have application-generated object IDs
>       adds marginal complexity to the driver (search ID function
>       required), but it provides huge simplification for the application.
>       The app does not need to worry about building & managing tree-like
>       structure for storing driver-generated object IDs, the app can use
>       its own convention for node IDs depending on the specific hierarchy
>       that it needs. Trivial example: identify all level-2 nodes with IDs
>       like 100, 200, 300, … and the level-3 nodes based on their level-2
>       parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …, 310, 320,
>       330, … and level-4 nodes based on their level-3 parents: 111, 112,
>       113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log
>       for the other related simplification that was implemented: leaf
>       nodes now have predefined IDs that are the same as their Ethernet
>       TX queue ID (therefore no translation is required for leaf nodes).
> - Capability API. Done per port and per node as well.
> - Dual rate shapers
> - Added configuration of private shaper (per node) directly from the
>   shaper profile as part of node API (no shaper ID needed for private
>   shapers), while the shared shapers are configured outside of the node
>   API using shaper profile and communicated to the node using shared
>   shaper ID. So there is no configuration overhead for shared shapers if
>   the app does not use any of them.
> - Leaf nodes now have predefined IDs that are the same as their Ethernet
>   TX queue ID (therefore no translation is required for leaf nodes). This
>   is also used to differentiate between a leaf node and a non-leaf node.
> - Domain-specific errors to give a precise indication of the error cause
>   (same as done by rte_flow)
> - Packet marking API
> - Packet length optional adjustment for shapers, positive (e.g. for adding
>   Ethernet framing overhead of 20 bytes) or negative (e.g. for rate
>   limiting based on IP packet bytes)
>
> [1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
> [2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
> [3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
> [4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
> [5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
> [6] Hemant's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-March/062354.html
> [7] Jerin's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-April/063429.html
>
>  MAINTAINERS                            |    4 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_tm.c              |  448 ++++++++
>  lib/librte_ether/rte_tm.h              | 1923 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_tm_driver.h       |  373 +++++++
>  6 files changed, 2782 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_tm.c
>  create mode 100644 lib/librte_ether/rte_tm.h
>  create mode 100644 lib/librte_ether/rte_tm_driver.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index afb4cab..cdaf2ac 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -240,6 +240,10 @@ Flow API
>  M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>  F: lib/librte_ether/rte_flow*
>
> +Traffic Management API
> +M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> +F: lib/librte_ether/rte_tm*
> +
>  Crypto API
>  M: Declan Doherty <declan.doherty@intel.com>
>  F: lib/librte_cryptodev/
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index 93fdde1..db692ae 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -1,6 +1,6 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> +#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
>  #   All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
> @@ -45,6 +45,7 @@ LIBABIVER := 6
>
>  SRCS-y += rte_ethdev.c
>  SRCS-y += rte_flow.c
> +SRCS-y += rte_tm.c
>
>  #
>  # Export include files
> @@ -56,5 +57,7 @@ SYMLINK-y-include += rte_eth_ctrl.h
>  SYMLINK-y-include += rte_dev_info.h
>  SYMLINK-y-include += rte_flow.h
>  SYMLINK-y-include += rte_flow_driver.h
> +SYMLINK-y-include += rte_tm.h
> +SYMLINK-y-include += rte_tm_driver.h
>
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
> index ff056e8..7f39904 100644
> --- a/lib/librte_ether/rte_ether_version.map
> +++ b/lib/librte_ether/rte_ether_version.map
> @@ -161,4 +161,34 @@ DPDK_17.08 {
>      global:
>
>  	rte_eth_dev_tm_ops_get;
> +	rte_tm_get_leaf_nodes;
> +	rte_tm_node_type_get;
> +	rte_tm_capabilities_get;
> +	rte_tm_level_capabilities_get;
> +	rte_tm_node_capabilities_get;
> +	rte_tm_wred_profile_add;
> +	rte_tm_wred_profile_delete;
> +	rte_tm_shared_wred_context_add_update;
> +	rte_tm_shared_wred_context_delete;
> +	rte_tm_shaper_profile_add;
> +	rte_tm_shaper_profile_delete;
> +	rte_tm_shared_shaper_add_update;
> +	rte_tm_shared_shaper_delete;
> +	rte_tm_node_add;
> +	rte_tm_node_delete;
> +	rte_tm_node_suspend;
> +	rte_tm_node_resume;
> +	rte_tm_hierarchy_commit;
> +	rte_tm_node_parent_update;
> +	rte_tm_node_shaper_update;
> +	rte_tm_node_shared_shaper_update;
> +	rte_tm_node_stats_update;
> +	rte_tm_node_wfq_weight_mode_update;
> +	rte_tm_node_cman_update;
> +	rte_tm_node_wred_context_update;
> +	rte_tm_node_shared_wred_context_update;
> +	rte_tm_node_stats_read;
> +	rte_tm_mark_vlan_dei;
> +	rte_tm_mark_ip_ecn;
> +	rte_tm_mark_ip_dscp;
>  } DPDK_17.05
> diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
> new file mode 100644
> index 0000000..2617a1a
> --- /dev/null
> +++ b/lib/librte_ether/rte_tm.c
> @@ -0,0 +1,448 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_ethdev.h"
> +#include "rte_tm_driver.h"
> +#include "rte_tm.h"
> +
> +/* Get generic traffic manager operations structure from a port. */
> +const struct rte_tm_ops *
> +rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_tm_ops *ops;
> +
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		rte_tm_error_set(error,
> +			ENODEV,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(ENODEV));
> +		return NULL;
> +	}
> +
> +	if ((dev->dev_ops->tm_ops_get == NULL) ||
> +		(dev->dev_ops->tm_ops_get(dev, &ops) != 0) ||
> +		(ops == NULL)) {
> +		rte_tm_error_set(error,
> +			ENOSYS,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(ENOSYS));
> +		return NULL;
> +	}
> +
> +	return ops;
> +}
> +
> +#define RTE_TM_FUNC(port_id, func)				\
> +({							\
> +	const struct rte_tm_ops *ops =			\
> +		rte_tm_ops_get(port_id, error);		\
> +	if (ops == NULL)					\
> +		return -rte_errno;			\
> +							\
> +	if (ops->func == NULL)				\
> +		return -rte_tm_error_set(error,		\
> +			ENOSYS,				\
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
> +			NULL,				\
> +			rte_strerror(ENOSYS));		\
> +							\
> +	ops->func;					\
> +})
> +
> +/* Get number of leaf nodes */
> +int
> +rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
> +	uint32_t *n_leaf_nodes,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_tm_ops *ops =
> +		rte_tm_ops_get(port_id, error);
> +
> +	if (ops == NULL)
> +		return -rte_errno;
> +
> +	if (n_leaf_nodes == NULL) {
> +		rte_tm_error_set(error,
> +			EINVAL,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(EINVAL));
> +		return -rte_errno;
> +	}
> +
> +	*n_leaf_nodes = dev->data->nb_tx_queues;
> +	return 0;
> +}
> +
> +/* Check node type (leaf or non-leaf) */
> +int
> +rte_tm_node_type_get(uint8_t port_id,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_type_get)(dev,
> +		node_id, is_leaf, error);
> +}
> +
> +/* Get node level */
> +int
> +rte_tm_node_level_get(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t *level_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_level_get)(dev,
> +		node_id, level_id, error);
> +}
> +
> +/* Get capabilities */
> +int rte_tm_capabilities_get(uint8_t port_id,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
> +		cap, error);
> +}
> +
> +/* Get level capabilities */
> +int rte_tm_level_capabilities_get(uint8_t port_id,
> +	uint32_t level_id,
> +	struct rte_tm_level_capabilities *cap,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
> +		level_id, cap, error);
> +}
> +
> +/* Get node capabilities */
> +int rte_tm_node_capabilities_get(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_capabilities *cap,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
> +		node_id, cap, error);
> +}
> +
> +/* Add WRED profile */
> +int rte_tm_wred_profile_add(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_wred_params *profile,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
> +		wred_profile_id, profile, error);
> +}
> +
> +/* Delete WRED profile */
> +int rte_tm_wred_profile_delete(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
> +		wred_profile_id, error);
> +}
> +
> +/* Add/update shared WRED context */
> +int rte_tm_shared_wred_context_add_update(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
> +		shared_wred_context_id, wred_profile_id, error);
> +}
> +
> +/* Delete shared WRED context */
> +int rte_tm_shared_wred_context_delete(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
> +		shared_wred_context_id, error);
> +}
> +
> +/* Add shaper profile */
> +int rte_tm_shaper_profile_add(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_shaper_params *profile,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
> +		shaper_profile_id, profile, error);
> +}
> +
> +/* Delete shaper profile */
> +int rte_tm_shaper_profile_delete(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
> +		shaper_profile_id, error);
> +}
> +
> +/* Add/update shared shaper */
> +int rte_tm_shared_shaper_add_update(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
> +		shared_shaper_id, shaper_profile_id, error);
> +}
> +
> +/* Delete shared shaper */
> +int rte_tm_shared_shaper_delete(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
> +		shared_shaper_id, error);
> +}
> +
> +/* Add node to port traffic manager hierarchy */
> +int rte_tm_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_add)(dev,
> +		node_id, parent_node_id, priority, weight, params, error);
> +}
> +
> +/* Delete node from traffic manager hierarchy */
> +int rte_tm_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_delete)(dev,
> +		node_id, error);
> +}
> +
> +/* Suspend node */
> +int rte_tm_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_suspend)(dev,
> +		node_id, error);
> +}
> +
> +/* Resume node */
> +int rte_tm_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_resume)(dev,
> +		node_id, error);
> +}
> +
> +/* Commit the initial port traffic manager hierarchy */
> +int rte_tm_hierarchy_commit(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, hierarchy_commit)(dev,
> +		clear_on_fail, error);
> +}
> +
> +/* Update node parent */
> +int rte_tm_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
> +		node_id, parent_node_id, priority, weight, error);
> +}
> +
> +/* Update node private shaper */
> +int rte_tm_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
> +		node_id, shaper_profile_id, error);
> +}
> +
> +/* Update node shared shapers */
> +int rte_tm_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
> +		node_id, shared_shaper_id, add, error);
> +}
> +
> +/* Update node stats */
> +int rte_tm_node_stats_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
> +		node_id, stats_mask, error);
> +}
> +
> +/* Update WFQ weight mode */
> +int rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *wfq_weight_mode,
> +	uint32_t n_sp_priorities,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_wfq_weight_mode_update)(dev,
> +		node_id, wfq_weight_mode, n_sp_priorities, error);
> +}
> +
> +/* Update node congestion management mode */
> +int rte_tm_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
> +		node_id, cman, error);
> +}
> +
> +/* Update node private WRED context */
> +int rte_tm_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
> +		node_id, wred_profile_id, error);
> +}
> +
> +/* Update node shared WRED context */
> +int rte_tm_node_shared_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
> +		node_id, shared_wred_context_id, add, error);
> +}
> +
> +/* Read and/or clear stats counters for specific node */
> +int rte_tm_node_stats_read(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_stats *stats,
> +	uint64_t *stats_mask,
> +	int clear,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
> +		node_id, stats, stats_mask, clear, error);
> +}
> +
> +/* Packet marking - VLAN DEI */
> +int rte_tm_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
> +		mark_green, mark_yellow, mark_red, error);
> +}
> +
> +/* Packet marking - IPv4/IPv6 ECN */
> +int rte_tm_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
> +		mark_green, mark_yellow, mark_red, error);
> +}
> +
> +/* Packet marking - IPv4/IPv6 DSCP */
> +int rte_tm_mark_ip_dscp(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
> +		mark_green, mark_yellow, mark_red, error);
> +}
> diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
> new file mode 100644
> index 0000000..22167c2
> --- /dev/null
> +++ b/lib/librte_ether/rte_tm.h
> @@ -0,0 +1,1923 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_TM_H__
> +#define __INCLUDE_RTE_TM_H__
> +
> +/**
> + * @file
> + * RTE Generic Traffic Manager API
> + *
> + * This interface provides the ability to configure the traffic manager in a
> + * generic way. It includes features such as: hierarchical scheduling,
> + * traffic shaping, congestion management, packet marking, etc.
> + */
> +
> +#include <stdint.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Ethernet framing overhead.
> + *
> + * Overhead fields per Ethernet frame:
> + * 1. Preamble:                                            7 bytes;
> + * 2. Start of Frame Delimiter (SFD):                      1 byte;
> + * 3. Inter-Frame Gap (IFG):                              12 bytes.
> + *
> + * One of the typical values for the *pkt_length_adjust* field of the shaper
> + * profile.
> + *
> + * @see struct rte_tm_shaper_params
> + *
> + */
> +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
> +
> +/**
> + * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
> + * Useful when FCS is generated and added at the end of the Ethernet frame on
> + * TX side without any SW intervention.
> + *
> + * One of the typical values for the pkt_length_adjust field of the shaper
> + * profile.
> + *
> + * @see struct rte_tm_shaper_params
> + */
> +#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
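
To make the two constants concrete, here is a small worked example. It assumes that pkt_length_adjust is a signed value added to each packet's length before the shaper charges for it; that field is not visible in this part of the patch, so treat that as an assumption. The demo_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ETH_FRAMING_OVERHEAD      20 /* preamble 7 + SFD 1 + IFG 12 */
#define DEMO_ETH_FRAMING_OVERHEAD_FCS  24 /* as above, plus the 4-byte FCS */

/* Bytes charged to the shaper for one packet (assumed semantics:
 * packet length plus the signed adjustment).
 */
static uint64_t demo_shaper_bytes(uint32_t pkt_len, int pkt_length_adjust)
{
	int64_t n = (int64_t)pkt_len + pkt_length_adjust;

	assert(n > 0);
	return (uint64_t)n;
}

/* Whole packets per second that fit in a rate given in bytes per second. */
static uint64_t demo_pkts_per_sec(uint64_t rate_bytes_per_sec,
	uint32_t pkt_len, int pkt_length_adjust)
{
	return rate_bytes_per_sec /
		demo_shaper_bytes(pkt_len, pkt_length_adjust);
}
```

A 64-byte frame (FCS included) with the 20-byte adjustment is charged 84 bytes, and so is a 60-byte frame (FCS not yet appended) with the 24-byte adjustment. At 1,250,000,000 bytes per second (10 Gbps) that gives the familiar figure of roughly 14.88 million packets per second.
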
> +
> +/** Invalid WRED profile ID */
> +#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
> +
> +/** Invalid shaper profile ID */
> +#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
> +
> +/** Node ID for the parent of the root node */
> +#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
> +
> +/**
> + * Color
> + */
> +enum rte_tm_color {
> +	RTE_TM_GREEN = 0, /**< Green */
> +	RTE_TM_YELLOW, /**< Yellow */
> +	RTE_TM_RED, /**< Red */
> +	RTE_TM_COLORS /**< Number of colors */
> +};
> +
> +/**
> + * Node statistics counter type
> + */
> +enum rte_tm_stats_type {
> +	/**< Number of packets scheduled from current node. */
> +	RTE_TM_STATS_N_PKTS = 1 << 0,
> +
> +	/**< Number of bytes scheduled from current node. */
> +	RTE_TM_STATS_N_BYTES = 1 << 1,
> +
> +	/**< Number of green packets dropped by current leaf node. */
> +	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
> +
> +	/**< Number of yellow packets dropped by current leaf node. */
> +	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
> +
> +	/**< Number of red packets dropped by current leaf node. */
> +	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
> +
> +	/**< Number of green bytes dropped by current leaf node. */
> +	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
> +
> +	/**< Number of yellow bytes dropped by current leaf node. */
> +	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
> +
> +	/**< Number of red bytes dropped by current leaf node. */
> +	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
> +
> +	/**< Number of packets currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
> +
> +	/**< Number of bytes currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> +};
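
Since these values are single bits, they are meant to be OR-ed into the uint64_t stats masks used elsewhere in this API. A short sketch of how an application might reconcile the counters it wants with the counters a node reports as supported follows. The demo_* names are hypothetical and only three of the counter types are mirrored here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of three of the counter type bits above. */
enum demo_stats_type {
	DEMO_STATS_N_PKTS = 1 << 0,
	DEMO_STATS_N_BYTES = 1 << 1,
	DEMO_STATS_N_PKTS_QUEUED = 1 << 8,
};

/* Subset of the requested counters that the capability mask allows. */
static uint64_t demo_stats_usable(uint64_t requested, uint64_t cap_mask)
{
	return requested & cap_mask;
}

/* Non-zero when every requested counter is present in the capability mask. */
static int demo_stats_all_supported(uint64_t requested, uint64_t cap_mask)
{
	return (requested & ~cap_mask) == 0;
}
```

Checking the request against the capability mask first lets the caller decide whether to fall back to a smaller counter set instead of discovering the mismatch from a driver error.
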
> +
> +/**
> + * Node statistics counters
> + */
> +struct rte_tm_node_stats {
> +	/**< Number of packets scheduled from current node. */
> +	uint64_t n_pkts;
> +
> +	/**< Number of bytes scheduled from current node. */
> +	uint64_t n_bytes;
> +
> +	/**< Statistics counters for leaf nodes only. */
> +	struct {
> +		/**< Number of packets dropped by current leaf node per each
> +		 * color.
> +		 */
> +		uint64_t n_pkts_dropped[RTE_TM_COLORS];
> +
> +		/**< Number of bytes dropped by current leaf node per each
> +		 * color.
> +		 */
> +		uint64_t n_bytes_dropped[RTE_TM_COLORS];
> +
> +		/**< Number of packets currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_pkts_queued;
> +
> +		/**< Number of bytes currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_bytes_queued;
> +	} leaf;
> +};
> +
> +/**
> + * Traffic manager dynamic updates
> + */
> +enum rte_tm_dynamic_update_type {
> +	/**< Dynamic parent node update. The new parent node is located on same
> +	 * hierarchy level as the former parent node. Consequently, the node
> +	 * whose parent is changed preserves its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
> +
> +	/**< Dynamic parent node update. The new parent node is located on
> +	 * different hierarchy level than the former parent node. Consequently,
> +	 * the node whose parent is changed also changes its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
> +
> +	/**< Dynamic node add/delete. */
> +	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
> +
> +	/**< Suspend/resume nodes. */
> +	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
> +
> +	/**< Dynamic switch between byte-based and packet-based WFQ weights. */
> +	RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
> +
> +	/**< Dynamic update on number of SP priorities. */
> +	RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
> +
> +	/**< Dynamic update of congestion management mode for leaf nodes. */
> +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> +
> +	/**< Dynamic update of the set of enabled stats counter types. */
> +	RTE_TM_UPDATE_NODE_STATS = 1 << 7,
> +};
> +
> +/**
> + * Traffic manager capabilities
> + */
> +struct rte_tm_capabilities {
> +	/**< Maximum number of nodes. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of levels (i.e. number of nodes connecting the root
> +	 * node with any leaf node, including the root and the leaf).
> +	 */
> +	uint32_t n_levels_max;
> +
> +	/**< When non-zero, this flag indicates that all the non-leaf nodes
> +	 * (with the exception of the root node) have identical capability set.
> +	 */
> +	int non_leaf_nodes_identical;
> +
> +	/**< When non-zero, this flag indicates that all the leaf nodes have
> +	 * identical capability set.
> +	 */
> +	int leaf_nodes_identical;
> +
> +	/**< Maximum number of shapers, either private or shared. In case the
> +	 * implementation does not share any resources between private and
> +	 * shared shapers, it is typically equal to the sum of
> +	 * *shaper_private_n_max* and *shaper_shared_n_max*.
> +	 */
> +	uint32_t shaper_n_max;
> +
> +	/**< Maximum number of private shapers. Indicates the maximum number of
> +	 * nodes that can concurrently have their private shaper enabled.
> +	 */
> +	uint32_t shaper_private_n_max;
> +
> +	/**< Maximum number of private shapers that support dual rate shaping.
> +	 * Indicates the maximum number of nodes that can concurrently have
> +	 * their private shaper enabled with dual rate support. Only valid when
> +	 * private shapers are supported. The value of zero indicates that dual
> +	 * rate shaping is not available for private shapers. The maximum value
> +	 * is *shaper_private_n_max*.
> +	 */
> +	uint32_t shaper_private_dual_rate_n_max;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for any private
> +	 * shaper. Valid only when private shapers are supported.
> +	 */
> +	uint64_t shaper_private_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for any private
> +	 * shaper. Valid only when private shapers are supported.
> +	 */
> +	uint64_t shaper_private_rate_max;
> +
> +	/**< Maximum number of shared shapers. The value of zero indicates that
> +	 * shared shapers are not supported.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Maximum number of nodes that can share the same shared shaper.
> +	 * Only valid when shared shapers are supported.
> +	 */
> +	uint32_t shaper_shared_n_nodes_per_shaper_max;
> +
> +	/**< Maximum number of shared shapers a node can be part of. This
> +	 * parameter indicates that there is at least one node that can be
> +	 * configured with this many shared shapers, which might not be true for
> +	 * all the nodes. Only valid when shared shapers are supported, in which
> +	 * case it ranges from 1 to *shaper_shared_n_max*.
> +	 */
> +	uint32_t shaper_shared_n_shapers_per_node_max;
> +
> +	/**< Maximum number of shared shapers that can be configured with dual
> +	 * rate shaping. The value of zero indicates that dual rate shaping
> +	 * support is not available for shared shapers.
> +	 */
> +	uint32_t shaper_shared_dual_rate_n_max;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for any shared
> +	 * shaper. Only valid when shared shapers are supported.
> +	 */
> +	uint64_t shaper_shared_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for any shared
> +	 * shaper. Only valid when shared shapers are supported.
> +	 */
> +	uint64_t shaper_shared_rate_max;
> +
> +	/**< Minimum value allowed for packet length adjustment for any private
> +	 * or shared shaper.
> +	 */
> +	int shaper_pkt_length_adjust_min;
> +
> +	/**< Maximum value allowed for packet length adjustment for any private
> +	 * or shared shaper.
> +	 */
> +	int shaper_pkt_length_adjust_max;
> +
> +	/**< Maximum number of children nodes. This parameter indicates that
> +	 * there is at least one non-leaf node that can be configured with this
> +	 * many children nodes, which might not be true for all the non-leaf
> +	 * nodes.
> +	 */
> +	uint32_t sched_n_children_max;
> +
> +	/**< Maximum number of supported priority levels. This parameter
> +	 * indicates that there is at least one non-leaf node that can be
> +	 * configured with this many priority levels for managing its children
> +	 * nodes, which might not be true for all the non-leaf nodes. The value
> +	 * of zero is invalid. The value of 1 indicates that only priority 0 is
> +	 * supported, which essentially means that Strict Priority (SP)
> +	 * algorithm is not supported.
> +	 */
> +	uint32_t sched_sp_n_priorities_max;
> +
> +	/**< Maximum number of sibling nodes that can have the same priority at
> +	 * any given time, i.e. maximum size of the WFQ sibling node group. This
> +	 * parameter indicates there is at least one non-leaf node that meets
> +	 * this condition, which might not be true for all the non-leaf nodes.
> +	 * The value of zero is invalid. The value of 1 indicates that WFQ
> +	 * algorithm is not supported. The maximum value is
> +	 * *sched_n_children_max*.
> +	 */
> +	uint32_t sched_wfq_n_children_per_group_max;
> +
> +	/**< Maximum number of priority levels that can have more than one child
> +	 * node at any given time, i.e. maximum number of WFQ sibling node
> +	 * groups that have two or more members. This parameter indicates there
> +	 * is at least one non-leaf node that meets this condition, which might
> +	 * not be true for all the non-leaf nodes. The value of zero states that
> +	 * WFQ algorithm is not supported. The value of 1 indicates that
> +	 * (*sched_sp_n_priorities_max* - 1) priority levels have at most one
> +	 * child node, so there can be only one priority level with two or
> +	 * more sibling nodes making up a WFQ group. The maximum value is:
> +	 * min(floor(*sched_n_children_max* / 2), *sched_sp_n_priorities_max*).
> +	 */
> +	uint32_t sched_wfq_n_groups_max;
> +
> +	/**< Maximum WFQ weight. The value of 1 indicates that all sibling nodes
> +	 * with same priority have the same WFQ weight, so WFQ is reduced to FQ.
> +	 */
> +	uint32_t sched_wfq_weight_max;
> +
> +	/**< Head drop algorithm support. When non-zero, this parameter
> +	 * indicates that there is at least one leaf node that supports the head
> +	 * drop algorithm, which might not be true for all the leaf nodes.
> +	 */
> +	int cman_head_drop_supported;
> +
> +	/**< Maximum number of WRED contexts, either private or shared. In case
> +	 * the implementation does not share any resources between private and
> +	 * shared WRED contexts, it is typically equal to the sum of
> +	 * *cman_wred_context_private_n_max* and
> +	 * *cman_wred_context_shared_n_max*.
> +	 */
> +	uint32_t cman_wred_context_n_max;
> +
> +	/**< Maximum number of private WRED contexts. Indicates the maximum
> +	 * number of leaf nodes that can concurrently have their private WRED
> +	 * context enabled.
> +	 */
> +	uint32_t cman_wred_context_private_n_max;
> +
> +	/**< Maximum number of shared WRED contexts. The value of zero
> +	 * indicates that shared WRED contexts are not supported.
> +	 */
> +	uint32_t cman_wred_context_shared_n_max;
> +
> +	/**< Maximum number of leaf nodes that can share the same WRED context.
> +	 * Only valid when shared WRED contexts are supported.
> +	 */
> +	uint32_t cman_wred_context_shared_n_nodes_per_context_max;
> +
> +	/**< Maximum number of shared WRED contexts a leaf node can be part of.
> +	 * This parameter indicates that there is at least one leaf node that
> +	 * can be configured with this many shared WRED contexts, which might
> +	 * not be true for all the leaf nodes. Only valid when shared WRED
> +	 * contexts are supported, in which case it ranges from 1 to
> +	 * *cman_wred_context_shared_n_max*.
> +	 */
> +	uint32_t cman_wred_context_shared_n_contexts_per_node_max;
> +
> +	/**< Support for VLAN DEI packet marking (per color). */
> +	int mark_vlan_dei_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
> +	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
> +	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
> +
> +	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
> +	int mark_ip_dscp_supported[RTE_TM_COLORS];
> +
> +	/**< Set of supported dynamic update operations.
> +	 * @see enum rte_tm_dynamic_update_type
> +	 */
> +	uint64_t dynamic_update_mask;
> +
> +	/**< Set of supported statistics counter types.
> +	 * @see enum rte_tm_stats_type
> +	 */
> +	uint64_t stats_mask;
> +};
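
The upper bound documented for sched_wfq_n_groups_max follows from two limits: every WFQ group needs at least two children, and every group occupies its own priority level. A small sketch of that bound is below; the helper name is hypothetical and the function only restates the formula from the comment above:

```c
#include <assert.h>
#include <stdint.h>

/* min(floor(n_children_max / 2), n_priorities_max), as documented for
 * sched_wfq_n_groups_max.
 */
static uint32_t demo_wfq_n_groups_bound(uint32_t sched_n_children_max,
	uint32_t sched_sp_n_priorities_max)
{
	/* Each WFQ group needs two or more children. */
	uint32_t by_children = sched_n_children_max / 2;

	/* Each WFQ group sits on a distinct priority level. */
	return by_children < sched_sp_n_priorities_max ?
		by_children : sched_sp_n_priorities_max;
}
```

For example, a node with up to 5 children and 4 priority levels can host at most 2 WFQ groups, while a node with up to 16 children and 3 priority levels is limited to 3 by its priority count.
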
> +
> +/**
> + * Traffic manager level capabilities
> + */
> +struct rte_tm_level_capabilities {
> +	/**< Maximum number of nodes for the current hierarchy level. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of non-leaf nodes for the current hierarchy level.
> +	 * The value of 0 indicates that current level only supports leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_nonleaf_max;
> +
> +	/**< Maximum number of leaf nodes for the current hierarchy level. The
> +	 * value of 0 indicates that current level only supports non-leaf
> +	 * nodes. The maximum value is *n_nodes_max*.
> +	 */
> +	uint32_t n_nodes_leaf_max;
> +
> +	/**< When non-zero, this flag indicates that all the non-leaf nodes on
> +	 * this level have identical capability set. Valid only when
> +	 * *n_nodes_nonleaf_max* is non-zero.
> +	 */
> +	int non_leaf_nodes_identical;
> +
> +	/**< When non-zero, this flag indicates that all the leaf nodes on this
> +	 * level have identical capability set. Valid only when
> +	 * *n_nodes_leaf_max* is non-zero.
> +	 */
> +	int leaf_nodes_identical;
> +
> +	union {
> +		/**< Items valid only for the non-leaf nodes on this level. */
> +		struct {
> +			/**< Private shaper support. When non-zero, it indicates
> +			 * there is at least one non-leaf node on this level
> +			 * with private shaper support, which may not be the
> +			 * case for all the non-leaf nodes on this level.
> +			 */
> +			int shaper_private_supported;
> +
> +			/**< Dual rate support for private shaper. Valid only
> +			 * when private shaper is supported for the non-leaf
> +			 * nodes on the current level. When non-zero, it
> +			 * indicates there is at least one non-leaf node on this
> +			 * level with dual rate private shaper support, which
> +			 * may not be the case for all the non-leaf nodes on
> +			 * this level.
> +			 */
> +			int shaper_private_dual_rate_supported;
> +
> +			/**< Minimum committed/peak rate (bytes per second) for
> +			 * private shapers of the non-leaf nodes on this level.
> +			 * Valid only when private shaper is supported on this
> +			 * level.
> +			 */
> +			uint64_t shaper_private_rate_min;
> +
> +			/**< Maximum committed/peak rate (bytes per second) for
> +			 * private shapers of the non-leaf nodes on this level.
> +			 * Valid only when private shaper is supported on this
> +			 * level.
> +			 */
> +			uint64_t shaper_private_rate_max;
> +
> +			/**< Maximum number of shared shapers that any non-leaf
> +			 * node on this level can be part of. The value of zero
> +			 * indicates that shared shapers are not supported by
> +			 * the non-leaf nodes on this level. When non-zero, it
> +			 * indicates there is at least one non-leaf node on this
> +			 * level that meets this condition, which may not be the
> +			 * case for all the non-leaf nodes on this level.
> +			 */
> +			uint32_t shaper_shared_n_max;
> +
> +			/**< Maximum number of children nodes. This parameter
> +			 * indicates that there is at least one non-leaf node on
> +			 * this level that can be configured with this many
> +			 * children nodes, which might not be true for all the
> +			 * non-leaf nodes on this level.
> +			 */
> +			uint32_t sched_n_children_max;
> +
> +			/**< Maximum number of supported priority levels. This
> +			 * parameter indicates that there is at least one
> +			 * non-leaf node on this level that can be configured
> +			 * with this many priority levels for managing its
> +			 * children nodes, which might not be true for all the
> +			 * non-leaf nodes on this level. The value of zero is
> +			 * invalid. The value of 1 indicates that only priority
> +			 * 0 is supported, which essentially means that Strict
> +			 * Priority (SP) algorithm is not supported on this
> +			 * level.
> +			 */
> +			uint32_t sched_sp_n_priorities_max;
> +
> +			/**< Maximum number of sibling nodes that can have the
> +			 * same priority at any given time, i.e. maximum size of
> +			 * the WFQ sibling node group. This parameter indicates
> +			 * there is at least one non-leaf node on this level
> +			 * that meets this condition, which may not be true for
> +			 * all the non-leaf nodes on this level. The value of
> +			 * zero is invalid. The value of 1 indicates that WFQ
> +			 * algorithm is not supported on this level. The maximum
> +			 * value is *sched_n_children_max*.
> +			 */
> +			uint32_t sched_wfq_n_children_per_group_max;
> +
> +			/**< Maximum number of priority levels that can have
> +			 * more than one child node at any given time, i.e.
> +			 * maximum number of WFQ sibling node groups that
> +			 * have two or more members. This parameter indicates
> +			 * there is at least one non-leaf node on this level
> +			 * that meets this condition, which might not be true
> +			 * for all the non-leaf nodes. The value of zero states
> +			 * that WFQ algorithm is not supported on this level.
> +			 * The value of 1 indicates that
> +			 * (*sched_sp_n_priorities_max* - 1) priority levels on
> +			 * this level have at most one child node, so there can
> +			 * be only one priority level with two or more sibling
> +			 * nodes making up a WFQ group on this level. The
> +			 * maximum value is:
> +			 * min(floor(*sched_n_children_max* / 2),
> +			 * *sched_sp_n_priorities_max*).
> +			 */
> +			uint32_t sched_wfq_n_groups_max;
> +
> +			/**< Maximum WFQ weight. The value of 1 indicates that
> +			 * all sibling nodes on this level with same priority
> +			 * have the same WFQ weight, so on this level WFQ is
> +			 * reduced to FQ.
> +			 */
> +			uint32_t sched_wfq_weight_max;
> +
> +			/**< Mask of statistics counter types supported by the
> +			 * non-leaf nodes on this level. Every supported
> +			 * statistics counter type is supported by at least one
> +			 * non-leaf node on this level, which may not be true
> +			 * for all the non-leaf nodes on this level.
> +			 * @see enum rte_tm_stats_type
> +			 */
> +			uint64_t stats_mask;
> +		} nonleaf;
> +
> +		/**< Items valid only for the leaf nodes on this level. */
> +		struct {
> +			/**< Private shaper support. When non-zero, it indicates
> +			 * there is at least one leaf node on this level with
> +			 * private shaper support, which may not be the case for
> +			 * all the leaf nodes on this level.
> +			 */
> +			int shaper_private_supported;
> +
> +			/**< Dual rate support for private shaper. Valid only
> +			 * when private shaper is supported for the leaf nodes
> +			 * on this level. When non-zero, it indicates there is
> +			 * at least one leaf node on this level with dual rate
> +			 * private shaper support, which may not be the case for
> +			 * all the leaf nodes on this level.
> +			 */
> +			int shaper_private_dual_rate_supported;
> +
> +			/**< Minimum committed/peak rate (bytes per second) for
> +			 * private shapers of the leaf nodes of this level.
> +			 * Valid only when private shaper is supported for the
> +			 * leaf nodes on this level.
> +			 */
> +			uint64_t shaper_private_rate_min;
> +
> +			/**< Maximum committed/peak rate (bytes per second) for
> +			 * private shapers of the leaf nodes on this level.
> +			 * Valid only when private shaper is supported for the
> +			 * leaf nodes on this level.
> +			 */
> +			uint64_t shaper_private_rate_max;
> +
> +			/**< Maximum number of shared shapers that any leaf node
> +			 * on this level can be part of. The value of zero
> +			 * indicates that shared shapers are not supported by
> +			 * the leaf nodes on this level. When non-zero, it
> +			 * indicates there is at least one leaf node on this
> +			 * level that meets this condition, which may not be the
> +			 * case for all the leaf nodes on this level.
> +			 */
> +			uint32_t shaper_shared_n_max;
> +
> +			/**< Head drop algorithm support. When non-zero, this
> +			 * parameter indicates that there is at least one leaf
> +			 * node on this level that supports the head drop
> +			 * algorithm, which might not be true for all the leaf
> +			 * nodes on this level.
> +			 */
> +			int cman_head_drop_supported;
> +
> +			/**< Private WRED context support. When non-zero, it
> +			 * indicates there is at least one leaf node on this
> +			 * level with private WRED context support, which may
> +			 * not be true for all the leaf nodes on this level.
> +			 */
> +			int cman_wred_context_private_supported;
> +
> +			/**< Maximum number of shared WRED contexts that any
> +			 * leaf node on this level can be part of. The value of
> +			 * zero indicates that shared WRED contexts are not
> +			 * supported by the leaf nodes on this level. When
> +			 * non-zero, it indicates there is at least one leaf
> +			 * node on this level that meets this condition, which
> +			 * may not be the case for all the leaf nodes on this
> +			 * level.
> +			 */
> +			uint32_t cman_wred_context_shared_n_max;
> +
> +			/**< Mask of statistics counter types supported by the
> +			 * leaf nodes on this level. Every supported statistics
> +			 * counter type is supported by at least one leaf node
> +			 * on this level, which may not be true for all the leaf
> +			 * nodes on this level.
> +			 * @see enum rte_tm_stats_type
> +			 */
> +			uint64_t stats_mask;
> +		} leaf;
> +	};
> +};
> +
> +/**
> + * Traffic manager node capabilities
> + */
> +struct rte_tm_node_capabilities {
> +	/**< Private shaper support for the current node. */
> +	int shaper_private_supported;
> +
> +	/**< Dual rate shaping support for private shaper of current node.
> +	 * Valid only when private shaper is supported by the current node.
> +	 */
> +	int shaper_private_dual_rate_supported;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for private
> +	 * shaper of current node. Valid only when private shaper is supported
> +	 * by the current node.
> +	 */
> +	uint64_t shaper_private_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for private
> +	 * shaper of current node. Valid only when private shaper is supported
> +	 * by the current node.
> +	 */
> +	uint64_t shaper_private_rate_max;
> +
> +	/**< Maximum number of shared shapers the current node can be part of.
> +	 * The value of zero indicates that shared shapers are not supported by
> +	 * the current node.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	union {
> +		/**< Items valid only for non-leaf nodes. */
> +		struct {
> +			/**< Maximum number of children nodes. */
> +			uint32_t sched_n_children_max;
> +
> +			/**< Maximum number of supported priority levels. The
> +			 * value of zero is invalid. The value of 1 indicates
> +			 * that only priority 0 is supported, which essentially
> +			 * means that Strict Priority (SP) algorithm is not
> +			 * supported.
> +			 */
> +			uint32_t sched_sp_n_priorities_max;
> +
> +			/**< Maximum number of sibling nodes that can have the
> +			 * same priority at any given time, i.e. maximum size
> +			 * of the WFQ sibling node group. The value of zero
> +			 * is invalid. The value of 1 indicates that WFQ
> +			 * algorithm is not supported. The maximum value is
> +			 * *sched_n_children_max*.
> +			 */
> +			uint32_t sched_wfq_n_children_per_group_max;
> +
> +			/**< Maximum number of priority levels that can have
> +			 * more than one child node at any given time, i.e.
> +			 * maximum number of WFQ sibling node groups that have
> +			 * two or more members. The value of zero states that
> +			 * WFQ algorithm is not supported. The value of 1
> +			 * indicates that (*sched_sp_n_priorities_max* - 1)
> +			 * priority levels have at most one child node, so there
> +			 * can be only one priority level with two or more
> +			 * sibling nodes making up a WFQ group. The maximum
> +			 * value is: min(floor(*sched_n_children_max* / 2),
> +			 * *sched_sp_n_priorities_max*).
> +			 */
> +			uint32_t sched_wfq_n_groups_max;
> +
> +			/**< Maximum WFQ weight. The value of 1 indicates that
> +			 * all sibling nodes with same priority have the same
> +			 * WFQ weight, so WFQ is reduced to FQ.
> +			 */
> +			uint32_t sched_wfq_weight_max;
> +		} nonleaf;
> +
> +		/**< Items valid only for leaf nodes. */
> +		struct {
> +			/**< Head drop algorithm support for current node. */
> +			int cman_head_drop_supported;
> +
> +			/**< Private WRED context support for current node. */
> +			int cman_wred_context_private_supported;
> +
> +			/**< Maximum number of shared WRED contexts the current
> +			 * node can be part of. The value of zero indicates that
> +			 * shared WRED contexts are not supported by the current
> +			 * node.
> +			 */
> +			uint32_t cman_wred_context_shared_n_max;
> +		} leaf;
> +	};
> +
> +	/**< Mask of statistics counter types supported by the current node.
> +	 * @see enum rte_tm_stats_type
> +	 */
> +	uint64_t stats_mask;
> +};
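
The documented upper bound for *sched_wfq_n_groups_max* follows from two facts: every WFQ group needs at least two children, and every group occupies its own priority level. A minimal sketch of that arithmetic, where the helper name wfq_n_groups_upper_bound is hypothetical and not part of this patch:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: upper bound for
 * sched_wfq_n_groups_max as documented in the quoted comment, i.e.
 * min(floor(n_children_max / 2), n_priorities_max).
 */
static uint32_t
wfq_n_groups_upper_bound(uint32_t n_children_max, uint32_t n_priorities_max)
{
	uint32_t by_children = n_children_max / 2; /* unsigned division floors */

	return by_children < n_priorities_max ? by_children : n_priorities_max;
}
```

With 8 children and 4 priority levels the bound is 4; with 5 children it drops to 2, because the fifth child cannot form a group on its own.
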
> +
> +/**
> + * Congestion management (CMAN) mode
> + *
> + * This is used for controlling the admission of packets into a packet queue or
> + * group of packet queues on congestion. On request of writing a new packet
> + * into the current queue while the queue is full, the *tail drop* algorithm
> + * drops the new packet while leaving the queue unmodified, as opposed to *head
> + * drop* algorithm, which drops the packet at the head of the queue (the oldest
> + * packet waiting in the queue) and admits the new packet at the tail of the
> + * queue.
> + *
> + * The *Random Early Detection (RED)* algorithm works by proactively dropping
> + * more and more input packets as the queue occupancy builds up. When the queue
> + * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> + * RED* algorithm uses a separate set of RED thresholds for each packet color.
> + */
> +enum rte_tm_cman_mode {
> +	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
> +	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
> +	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +
> +/**
> + * Random Early Detection (RED) profile
> + */
> +struct rte_tm_red_params {
> +	/**< Minimum queue threshold */
> +	uint16_t min_th;
> +
> +	/**< Maximum queue threshold */
> +	uint16_t max_th;
> +
> +	/**< Inverse of packet marking probability maximum value (maxp), i.e.
> +	 * maxp_inv = 1 / maxp
> +	 */
> +	uint16_t maxp_inv;
> +
> +	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
> +	uint16_t wq_log2;
> +};
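
To make the meaning of *maxp_inv* and *wq_log2* concrete, here is a sketch of the textbook RED calculation using these four fields. It is only an illustration: the patch does not say how a driver evaluates them, the struct below is a local mirror rather than the real type, and the unit of the thresholds is not stated in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the fields quoted above so the sketch builds on its own. */
struct red_params {
	uint16_t min_th;
	uint16_t max_th;
	uint16_t maxp_inv;
	uint16_t wq_log2;
};

/* EWMA of the queue size: avg += wq * (q - avg), wq = 1 / (2 ^ wq_log2). */
static double
red_avg_update(const struct red_params *p, double avg, uint32_t q)
{
	double wq = 1.0;
	uint16_t i;

	for (i = 0; i < p->wq_log2; i++)
		wq *= 0.5;

	return avg + wq * ((double)q - avg);
}

/* Drop probability for the current average, with maxp = 1 / maxp_inv. */
static double
red_drop_prob(const struct red_params *p, double avg)
{
	assert(p->maxp_inv != 0); /* zero has no meaning as 1 / maxp */

	if (avg < p->min_th)
		return 0.0;
	if (avg >= p->max_th)
		return 1.0; /* behaves like tail drop */

	/* Here min_th <= avg < max_th, so the divisor is non-zero. */
	return (avg - p->min_th) /
		((double)(p->max_th - p->min_th) * p->maxp_inv);
}
```

The probability ramps linearly from 0 at *min_th* to maxp at *max_th*, which matches the "more and more input packets" description in the CMAN comment.
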
> +
> +/**
> + * Weighted RED (WRED) profile
> + *
> + * Multiple WRED contexts can share the same WRED profile. Each leaf node with
> + * WRED enabled as its congestion management mode has zero or one private WRED
> + * context (only one leaf node using it) and/or zero, one or several shared
> + * WRED contexts (multiple leaf nodes use the same WRED context). A private
> + * WRED context is used to perform congestion management for a single leaf
> + * node, while a shared WRED context is used to perform congestion management
> + * for a group of leaf nodes.
> + */
> +struct rte_tm_wred_params {
> +	/**< One set of RED parameters per packet color */
> +	struct rte_tm_red_params red_params[RTE_TM_COLORS];
> +};
> +
> +/**
> + * Token bucket
> + */
> +struct rte_tm_token_bucket {
> +	/**< Token bucket rate (bytes per second) */
> +	uint64_t rate;
> +
> +	/**< Token bucket size (bytes), a.k.a. max burst size */
> +	uint64_t size;
> +};
> +
> +/**
> + * Shaper (rate limiter) profile
> + *
> + * Multiple shaper instances can share the same shaper profile. Each node has
> + * zero or one private shaper (only one node using it) and/or zero, one or
> + * several shared shapers (multiple nodes use the same shaper instance).
> + * A private shaper is used to perform traffic shaping for a single node, while
> + * a shared shaper is used to perform traffic shaping for a group of nodes.
> + *
> + * Single rate shapers use a single token bucket. A single rate shaper can be
> + * configured by setting the rate of the committed bucket to zero, which
> + * effectively disables this bucket. The peak bucket is used to limit the rate
> + * and the burst size for the current shaper.
> + *
> + * Dual rate shapers use both the committed and the peak token buckets. The
> + * rate of the peak bucket has to be bigger than zero, as well as greater than
> + * or equal to the rate of the committed bucket.
> + */
> +struct rte_tm_shaper_params {
> +	/**< Committed token bucket */
> +	struct rte_tm_token_bucket committed;
> +
> +	/**< Peak token bucket */
> +	struct rte_tm_token_bucket peak;
> +
> +	/**< Signed value to be added to the length of each packet for the
> +	 * purpose of shaping. Can be used to correct the packet length with
> +	 * the framing overhead bytes that are also consumed on the wire (e.g.
> +	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
> +	 */
> +	int32_t pkt_length_adjust;
> +};
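
The single rate / dual rate rules in the comment above can be written down as a small check. This is a sketch under assumptions: the structs are local mirrors of the quoted ones, the function names are hypothetical, requiring a non-zero peak rate for single rate shapers is my reading rather than something the text states, and clamping a negative adjusted length to zero is also an assumption.

```c
#include <stdint.h>

/* Local mirrors of the quoted structs so the sketch builds on its own. */
struct token_bucket {
	uint64_t rate; /* bytes per second */
	uint64_t size; /* bytes */
};

struct shaper_params {
	struct token_bucket committed;
	struct token_bucket peak;
	int32_t pkt_length_adjust;
};

/* A committed rate of zero disables that bucket: single rate shaper. */
static int
shaper_is_single_rate(const struct shaper_params *p)
{
	return p->committed.rate == 0;
}

/* Peak rate must be non-zero and not below the committed rate. */
static int
shaper_rates_valid(const struct shaper_params *p)
{
	return p->peak.rate > 0 && p->peak.rate >= p->committed.rate;
}

/* Packet length as seen by the shaper after the signed adjustment. */
static uint64_t
shaper_adjusted_len(uint32_t pkt_len, int32_t adjust)
{
	int64_t len = (int64_t)pkt_len + adjust;

	return len < 0 ? 0 : (uint64_t)len;
}
```
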
> +
> +/**
> + * Node parameters
> + *
> + * Each non-leaf node has multiple inputs (its children nodes) and a single
> + * output (which is input to its parent node). It arbitrates its inputs using
> + * Strict Priority (SP) and Weighted Fair Queuing (WFQ) algorithms to schedule
> + * input packets to its output while observing its shaping (rate limiting)
> + * constraints.
> + *
> + * Algorithms such as Weighted Round Robin (WRR), Byte-level WRR, Deficit WRR
> + * (DWRR), etc. are considered approximations of the WFQ ideal and are
> + * assimilated to WFQ, although an associated implementation-dependent trade-off
> + * on accuracy, performance and resource usage might exist.
> + *
> + * Children nodes with different priorities are scheduled using the SP algorithm
> + * based on their priority, with zero (0) as the highest priority. Children with
> + * the same priority are scheduled using the WFQ algorithm according to their
> + * weights. The WFQ weight of a given child node is relative to the sum of the
> + * weights of all its sibling nodes that have the same priority, with one (1) as
> + * the lowest weight. For each SP priority, the WFQ weight mode can be set as
> + * either byte-based or packet-based.
> + *
> + * Each leaf node sits on top of a TX queue of the current Ethernet port. Hence,
> + * the leaf nodes are predefined, with their node IDs set to 0 .. (N-1), where N
> + * is the number of TX queues configured for the current Ethernet port. The
> + * non-leaf nodes have their IDs generated by the application.
> + */
> +struct rte_tm_node_params {
> +	/**< Shaper profile for the private shaper. The absence of the private
> +	 * shaper for the current node is indicated by setting this parameter
> +	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
> +	 */
> +	uint32_t shaper_profile_id;
> +
> +	/**< User allocated array of valid shared shaper IDs. */
> +	uint32_t *shared_shaper_id;
> +
> +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> +	uint32_t n_shared_shapers;
> +
> +	union {
> +		/**< Parameters only valid for non-leaf nodes. */
> +		struct {
> +			/**< WFQ weight mode for each SP priority. When NULL, it
> +			 * indicates that WFQ is to be used for all priorities.
> +			 * When non-NULL, it points to a pre-allocated array of
> +			 * *n_sp_priorities* values, with non-zero value for
> +			 * byte-mode and zero for packet-mode.
> +			 */
> +			int *wfq_weight_mode;
> +
> +			/**< Number of SP priorities. */
> +			uint32_t n_sp_priorities;
> +		} nonleaf;
> +
> +		/**< Parameters only valid for leaf nodes. */
> +		struct {
> +			/**< Congestion management mode */
> +			enum rte_tm_cman_mode cman;
> +
> +			/**< WRED parameters (only valid when *cman* is set to
> +			 * WRED).
> +			 */
> +			struct {
> +				/**< WRED profile for private WRED context. The
> +				 * absence of a private WRED context for the
> +				 * current leaf node is indicated by value
> +				 * RTE_TM_WRED_PROFILE_ID_NONE.
> +				 */
> +				uint32_t wred_profile_id;
> +
> +				/**< User allocated array of shared WRED context
> +				 * IDs. When set to NULL, it indicates that the
> +				 * current leaf node should not currently be
> +				 * part of any shared WRED contexts.
> +				 */
> +				uint32_t *shared_wred_context_id;
> +
> +				/**< Number of elements in the
> +				 * *shared_wred_context_id* array. Only valid
> +				 * when *shared_wred_context_id* is non-NULL,
> +				 * in which case it should be non-zero.
> +				 */
> +				uint32_t n_shared_wred_contexts;
> +			} wred;
> +		} leaf;
> +	};
> +
> +	/**< Mask of statistics counter types to be enabled for this node. This
> +	 * needs to be a subset of the statistics counter types available for
> +	 * the current node. Any statistics counter type not included in this
> +	 * set is to be disabled for the current node.
> +	 * @see enum rte_tm_stats_type
> +	 */
> +	uint64_t stats_mask;
> +};
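
The SP plus WFQ arbitration described in the node parameters comment can be illustrated with a small model. It is a sketch only: struct child and both functions are hypothetical, and I assume the weight sum includes the node itself, which the quoted text leaves open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one child as seen by its parent's scheduler. */
struct child {
	uint32_t priority; /* 0 is the highest priority */
	uint32_t weight;   /* 1 is the lowest weight */
	int backlogged;    /* non-zero when the child has packets to send */
};

/* SP step: lowest priority value among backlogged children, or -1. */
static int64_t
sp_pick_priority(const struct child *c, size_t n)
{
	int64_t best = -1;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].backlogged && (best < 0 || c[i].priority < best))
			best = c[i].priority;

	return best;
}

/* WFQ step: share of child idx among siblings with the same priority. */
static double
wfq_share(const struct child *c, size_t n, size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].priority == c[idx].priority)
			sum += c[i].weight;

	return sum ? (double)c[idx].weight / (double)sum : 0.0;
}
```

Two siblings at priority 0 with weights 1 and 3 would get 1/4 and 3/4 of the bandwidth served at that priority; a lone child at priority 1 gets everything served there, but only when no priority 0 child is backlogged.
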
> +
> +/**
> + * Verbose error types.
> + *
> + * Most of them provide the type of the object referenced by struct
> + * rte_tm_error::cause.
> + */
> +enum rte_tm_error_type {
> +	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
> +	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
> +	RTE_TM_ERROR_TYPE_CAPABILITIES,
> +	RTE_TM_ERROR_TYPE_LEVEL_ID,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
> +	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
> +	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
> +	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SP_PRIORITIES,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
> +	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
> +	RTE_TM_ERROR_TYPE_NODE_ID,
> +};
> +
> +/**
> + * Verbose error structure definition.
> + *
> + * This object is normally allocated by applications and set by PMDs. The
> + * message points to a constant string which does not need to be freed by
> + * the application; however, its pointer can be considered valid only as long
> + * as its associated DPDK port remains configured. Closing the underlying
> + * device or unloading the PMD invalidates it.
> + *
> + * Both cause and message may be NULL regardless of the error type.
> + */
> +struct rte_tm_error {
> +	enum rte_tm_error_type type; /**< Cause field and error type. */
> +	const void *cause; /**< Object responsible for the error. */
> +	const char *message; /**< Human-readable error message. */
> +};
> +
> +/**
> + * Traffic manager get number of leaf nodes
> + *
> + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> + * Therefore, the set of leaf nodes is predefined, their number is always equal
> + * to N (where N is the number of TX queues configured for the current port)
> + * and their IDs are 0 .. (N-1).
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[out] n_leaf_nodes
> + *   Number of leaf nodes for the current port.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
> +	uint32_t *n_leaf_nodes,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node type (i.e. leaf or non-leaf) get
> + *
> + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
> + * the number of TX queues of the current Ethernet port. The non-leaf nodes
> + * have their IDs generated by the application outside of the above range,
> + * which is reserved for leaf nodes.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID value. Needs to be valid.
> + * @param[out] is_leaf
> + *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_type_get(uint8_t port_id,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error);
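
Given the ID convention stated above, the leaf test reduces to a range check on the node ID. A sketch of what I would expect behind rte_tm_node_type_get(), with a hypothetical function name and ignoring the validity check on *node_id*:

```c
#include <stdint.h>

/* Hypothetical helper: leaf IDs are 0 .. (N-1), with N the number of TX
 * queues configured for the port; every other valid ID is a non-leaf node.
 */
static int
node_id_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	return node_id < n_tx_queues;
}
```
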
> +
> +/**
> + * Traffic manager node level get
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID value. Needs to be valid.
> + * @param[out] level_id
> + *   Node level ID. Needs to be non-NULL.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_level_get(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t *level_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager capabilities get
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[out] cap
> + *   Traffic manager capabilities. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_capabilities_get(uint8_t port_id,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager level capabilities get
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] level_id
> + *   The hierarchy level identifier. The value of 0 identifies the level of the
> + *   root node.
> + * @param[out] cap
> + *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_level_capabilities_get(uint8_t port_id,
> +	uint32_t level_id,
> +	struct rte_tm_level_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node capabilities get
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] cap
> + *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_capabilities_get(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_capabilities *cap,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager WRED profile add
> + *
> + * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
> + * is used to create one or several WRED contexts.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] wred_profile_id
> + *   WRED profile ID for the new profile. Needs to be unused.
> + * @param[in] profile
> + *   WRED profile parameters. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_wred_profile_add(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_wred_params *profile,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager WRED profile delete
> + *
> + * Delete an existing WRED profile. This operation fails when there is
> + * currently at least one user (i.e. WRED context) of this WRED profile.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_wred_profile_delete(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared WRED context add or update
> + *
> + * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
> + * created by using the WRED profile identified by *wred_profile_id*.
> + *
> + * When *shared_wred_context_id* is valid, this WRED context is no longer using
> + * the profile previously assigned to it and is updated to use the profile
> + * identified by *wred_profile_id*.
> + *
> + * A valid shared WRED context can be assigned to several hierarchy leaf nodes
> + * configured to use WRED as the congestion management mode.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shared_wred_context_id
> + *   Shared WRED context ID
> + * @param[in] wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_wred_context_add_update(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared WRED context delete
> + *
> + * Delete an existing shared WRED context. This operation fails when there is
> + * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
> + * context.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_wred_context_delete(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shaper profile add
> + *
> + * Create a new shaper profile with ID set to *shaper_profile_id*. The new
> + * shaper profile is used to create one or several shapers.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shaper_profile_id
> + *   Shaper profile ID for the new profile. Needs to be unused.
> + * @param[in] profile
> + *   Shaper profile parameters. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shaper_profile_add(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_shaper_params *profile,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shaper profile delete
> + *
> + * Delete an existing shaper profile. This operation fails when there is
> + * currently at least one user (i.e. shaper) of this shaper profile.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shaper_profile_delete(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared shaper add or update
> + *
> + * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
> + * with this ID is created using the shaper profile identified by
> + * *shaper_profile_id*.
> + *
> + * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
> + * no longer using the shaper profile previously assigned to it and is updated
> + * to use the shaper profile identified by *shaper_profile_id*.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shared_shaper_id
> + *   Shared shaper ID
> + * @param[in] shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_shaper_add_update(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager shared shaper delete
> + *
> + * Delete an existing shared shaper. This operation fails when there is
> + * currently at least one user (i.e. hierarchy node) of this shared shaper.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_shared_shaper_delete(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node add
> + *
> + * Create new node and connect it as child of an existing node. The new node is
> + * further identified by *node_id*, which needs to be unused by any of the
> + * existing nodes. The parent node is identified by *parent_node_id*, which
> + * needs to be the valid ID of an existing non-leaf node. The parent node is
> + * going to use the provided SP *priority* and WFQ *weight* to schedule its new
> + * child node.
> + *
> + * This function has to be called for both leaf and non-leaf nodes. In the case
> + * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
> + * the number of configured TX queues of the current port), the leaf node is
> + * configured rather than created (as the set of leaf nodes is predefined) and
> + * it is also connected as child of an existing node.
> + *
> + * The first node that is added becomes the root node and all the nodes that
> + * are subsequently added have to be added as descendants of the root node. The
> + * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
> + * can only be one node with this parent ID (i.e. the root node). Further
> + * restrictions for root node: needs to be non-leaf, its private shaper profile
> + * needs to be valid and single rate, cannot use any shared shapers.
> + *
> + * When called before rte_tm_hierarchy_commit() invocation, this function is
> + * typically used to define the initial start-up hierarchy for the port.
> + * Provided that dynamic hierarchy updates are supported by the current port (as
> + * advertised in the port capability set), this function can be also called
> + * after the rte_tm_hierarchy_commit() invocation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be unused by any of the existing nodes.
> + * @param[in] parent_node_id
> + *   Parent node ID. Needs to be valid.
> + * @param[in] priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param[in] weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param[in] params
> + *   Node parameters. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_hierarchy_commit()
> + * @see RTE_TM_UPDATE_NODE_ADD_DELETE
> + */
> +int
> +rte_tm_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error);
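
The root node restrictions listed in the comment above are easy to miss, so here they are as a checklist in code. This is a sketch, not the patch's validation logic: struct node_req and root_request_ok are hypothetical, and the two macros are stand-ins for RTE_TM_NODE_ID_NULL and RTE_TM_SHAPER_PROFILE_ID_NONE with assumed values.

```c
#include <stdint.h>

/* Stand-ins for the real macros; the values here are assumptions. */
#define NODE_ID_NULL UINT32_MAX
#define SHAPER_PROFILE_ID_NONE UINT32_MAX

/* Hypothetical summary of a node add request. */
struct node_req {
	uint32_t node_id;
	uint32_t parent_node_id;
	uint32_t shaper_profile_id;
	uint32_t n_shared_shapers;
	int shaper_is_single_rate; /* looked up from the shaper profile */
};

/* Returns 1 when the request satisfies the documented root node rules. */
static int
root_request_ok(const struct node_req *r, uint32_t n_tx_queues)
{
	if (r->parent_node_id != NODE_ID_NULL)
		return 0; /* not a root request at all */
	if (r->node_id < n_tx_queues)
		return 0; /* IDs 0 .. (N-1) are leaf nodes */
	if (r->shaper_profile_id == SHAPER_PROFILE_ID_NONE)
		return 0; /* private shaper profile needs to be valid */
	if (!r->shaper_is_single_rate)
		return 0; /* root shaper needs to be single rate */
	if (r->n_shared_shapers != 0)
		return 0; /* root cannot use shared shapers */

	return 1;
}
```
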
> +
> +/**
> + * Traffic manager node add with node level check
> + *
> + * Simple rte_tm_node_add() wrapper that also checks the node level.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be unused by any of the existing nodes.
> + * @param[in] parent_node_id
> + *   Parent node ID. Needs to be valid.
> + * @param[in] priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param[in] weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param[in] level_id
> + *   Level ID that should be met by this node.
> + * @param[in] params
> + *   Node parameters. Needs to be pre-allocated and valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +static inline int
> +rte_tm_node_add_check_level(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	uint32_t level_id,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error)
> +{
> +	uint32_t lid;
> +	int status;
> +
> +	status = rte_tm_node_add(port_id, node_id,
> +		parent_node_id, priority, weight, params, error);
> +	if (status)
> +		return status;
> +
> +	status = rte_tm_node_level_get(port_id, node_id, &lid, error);
> +	if (status)
> +		return status;
> +
> +	if (lid != level_id) {
> +		if (error) {
> +			error->type = RTE_TM_ERROR_TYPE_LEVEL_ID;
> +			error->cause = NULL;
> +			error->message = rte_strerror(EINVAL);
> +		}
> +		rte_errno = EINVAL;
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * Traffic manager node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has
> + * at least one user (i.e. child node).
> + *
> + * When called before rte_tm_hierarchy_commit() invocation, this function is
> + * typically used to define the initial start-up hierarchy for the port.
> + * Provided that dynamic hierarchy updates are supported by the current port (as
> + * advertised in the port capability set), this function can also be called
> + * after the rte_tm_hierarchy_commit() invocation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_ADD_DELETE
> + */
> +int
> +rte_tm_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node suspend
> + *
> + * Suspend an existing node. While the node is in suspended state, no packet is
> + * scheduled from this node and its descendants. The node exits the suspended
> + * state through the node resume operation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_resume()
> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
> + */
> +int
> +rte_tm_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node resume
> + *
> + * Resume an existing node that is currently in suspended state. The node
> + * entered the suspended state as a result of a previous node suspend operation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_suspend()
> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
> + */
> +int
> +rte_tm_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager hierarchy commit
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the start-up hierarchy.
> + *
> + * This function typically performs the following steps:
> + *    a) It validates the start-up hierarchy that was previously defined for the
> + *       current port through successive rte_tm_node_add() invocations;
> + *    b) Assuming successful validation, it performs all the necessary port
> + *       specific configuration operations to install the specified hierarchy on
> + *       the current port, with immediate effect once the port is started.
> + *
> + * This function fails when the currently configured hierarchy is not supported
> + * by the Ethernet port, in which case the user can abort or try out another
> + * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can be
> + * built from scratch (when *clear_on_fail* is enabled) or by modifying the
> + * existing hierarchy configuration (when *clear_on_fail* is disabled).
> + *
> + * Note that this function can still fail due to other causes (e.g. not enough
> + * memory available in the system, etc), even though the specified hierarchy is
> + * supported in principle by the current port.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] clear_on_fail
> + *   On function call failure, hierarchy is cleared when this parameter is
> + *   non-zero and preserved when this parameter is equal to zero.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_add()
> + * @see rte_tm_node_delete()
> + */
> +int
> +rte_tm_hierarchy_commit(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_tm_error *error);
> +
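
The description above suggests a retry loop: build a hierarchy, try to
commit it, and fall back to a smaller one on failure. A minimal sketch of
that loop follows. It assumes a hypothetical port model that only limits
the number of leaf nodes; the demo_* names and the halving policy are
illustrative, not part of the proposed API.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical port model: commit succeeds for up to demo_max_leaves leaves. */
static uint32_t demo_max_leaves = 4;
static uint32_t demo_n_leaves; /* leaves in the currently defined hierarchy */

/* Stand-in for a sequence of node add calls defining n_leaves leaf nodes. */
static void
demo_build(uint32_t n_leaves)
{
	demo_n_leaves = n_leaves;
}

/* Stand-in for the commit: validate, and clear the hierarchy if requested. */
static int
demo_hierarchy_commit(int clear_on_fail)
{
	if (demo_n_leaves <= demo_max_leaves)
		return 0;
	if (clear_on_fail)
		demo_n_leaves = 0;
	return -EINVAL;
}

/*
 * Halve the leaf count until the commit succeeds.
 * Returns the number of leaves committed, or 0 if no attempt succeeded.
 */
static uint32_t
demo_commit_with_fallback(uint32_t n_leaves)
{
	for (; n_leaves > 0; n_leaves /= 2) {
		demo_build(n_leaves);
		if (demo_hierarchy_commit(1) == 0)
			return n_leaves;
	}
	return 0;
}
```

With clear_on_fail set, each attempt starts from an empty hierarchy; with
it cleared, the caller would instead delete nodes from the rejected
hierarchy before retrying.
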
> +/**
> + * Traffic manager node parent update
> + *
> + * Restriction for root node: its parent cannot be changed.
> + *
> + * This function can only be called after the rte_tm_hierarchy_commit()
> + * invocation. Its success depends on the port support for this operation, as
> + * advertised through the port capability set.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param[in] priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param[in] weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is zero. Used by the WFQ
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
> + * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
> + */
> +int
> +rte_tm_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private shaper update
> + *
> + * Restriction for the root node: its private shaper profile needs to be valid
> + * and single rate.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] shaper_profile_id
> + *   Shaper profile ID for the private shaper of the current node. Needs to be
> + *   either a valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
> + *   the latter disabling the private shaper of the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared shapers update
> + *
> + * Restriction for root node: cannot use any shared rate shapers.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param[in] add
> + *   Set to non-zero value to add this shared shaper to current node or to zero
> + *   to delete this shared shaper from current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node enabled statistics counters update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] stats_mask
> + *   Mask of statistics counter types to be enabled for the current node. This
> + *   needs to be a subset of the statistics counter types available for the
> + *   current node. Any statistics counter type not included in this set is to
> + *   be disabled for the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see enum rte_tm_stats_type
> + * @see RTE_TM_UPDATE_NODE_STATS
> + */
> +int
> +rte_tm_node_stats_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node WFQ weight mode update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] wfq_weight_mode
> + *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
> + *   to be used for all priorities. When non-NULL, it points to a pre-allocated
> + *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
> + *   zero for packet-mode.
> + * @param[in] n_sp_priorities
> + *   Number of SP priorities.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
> + * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
> + */
> +int
> +rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *wfq_weight_mode,
> +	uint32_t n_sp_priorities,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node congestion management mode update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] cman
> + *   Congestion management mode.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_CMAN
> + */
> +int
> +rte_tm_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private WRED context update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] wred_profile_id
> + *   WRED profile ID for the private WRED context of the current node. Needs to
> + *   be either a valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
> + *   latter disabling the private WRED context of the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared WRED context update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param[in] add
> + *   Set to non-zero value to add this shared WRED context to current node or
> + *   to zero to delete this shared WRED context from current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int
> +rte_tm_node_shared_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node statistics counters read
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] stats
> + *   When non-NULL, it contains the current value for the statistics counters
> + *   enabled for the current node.
> + * @param[out] stats_mask
> + *   When non-NULL, it contains the mask of statistics counter types that are
> + *   currently enabled for this node, indicating which of the counters
> + *   retrieved with the *stats* structure are valid.
> + * @param[in] clear
> + *   When this parameter has a non-zero value, the statistics counters are
> + *   cleared (i.e. set to zero) immediately after they have been read,
> + *   otherwise the statistics counters are left untouched.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see enum rte_tm_stats_type
> + */
> +int
> +rte_tm_node_stats_read(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_node_stats *stats,
> +	uint64_t *stats_mask,
> +	int clear,
> +	struct rte_tm_error *error);
> +
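
Since *stats_mask* tells the caller which fields of *stats* are valid, any
derived value should be gated on the mask. A minimal sketch, assuming
hypothetical counter bits and a cut-down stats structure (the real ones are
enum rte_tm_stats_type and struct rte_tm_node_stats, defined earlier in the
patch and not reproduced here):

```c
#include <stdint.h>

/* Hypothetical counter-type bits, for illustration only. */
#define DEMO_STATS_N_PKTS  (UINT64_C(1) << 0)
#define DEMO_STATS_N_BYTES (UINT64_C(1) << 1)

/* Hypothetical, cut-down stats structure. */
struct demo_node_stats {
	uint64_t n_pkts;
	uint64_t n_bytes;
};

/*
 * Average packet size in bytes. Returns 0 when either counter is not
 * enabled in the mask or when no packets have been counted.
 */
static uint64_t
demo_avg_pkt_size(const struct demo_node_stats *stats, uint64_t stats_mask)
{
	const uint64_t need = DEMO_STATS_N_PKTS | DEMO_STATS_N_BYTES;

	if ((stats_mask & need) != need || stats->n_pkts == 0)
		return 0;
	return stats->n_bytes / stats->n_pkts;
}
```
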
> +/**
> + * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
> + *
> + * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
> + * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
> + * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
> + * Format Indicator (CFI).
> + *
> + * All VLAN frames of a given color get their DEI bit set if marking is enabled
> + * for this color; otherwise, their DEI bit is left as is (either set or not).
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param[in] mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param[in] mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see struct rte_tm_capabilities::mark_vlan_dei_supported
> + */
> +int
> +rte_tm_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +
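
The per-color DEI rule described above can be sketched as a small software
model. This assumes the 802.1Q TCI layout (PCP in bits 15..13, DEI in bit
12, VID in bits 11..0) with the TCI held in host byte order; the demo_*
names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

enum demo_color { DEMO_GREEN = 0, DEMO_YELLOW, DEMO_RED };

/* DEI is bit 12 of the 16-bit VLAN TCI (host byte order assumed). */
#define DEMO_VLAN_DEI_BIT 0x1000u

/* Per-color enable flags, mirroring mark_green/mark_yellow/mark_red. */
struct demo_dei_cfg {
	int mark[3];
};

/* Set DEI when marking is enabled for the color; otherwise leave TCI as is. */
static uint16_t
demo_mark_vlan_dei(uint16_t tci, enum demo_color color,
	const struct demo_dei_cfg *cfg)
{
	if (cfg->mark[color])
		tci |= DEMO_VLAN_DEI_BIT;
	return tci;
}
```

Note that the rule never clears DEI: a frame that arrives with the bit set
keeps it regardless of the color configuration.
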
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
> + *
> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
> + * Notification (ECN) field (2 bits). The DSCP field is typically used to
> + * encode the traffic class and/or drop priority (RFC 2597), while the ECN
> + * field is used by RFC 3168 to implement a congestion notification mechanism
> + * to be leveraged by transport layer protocols such as TCP and SCTP that have
> + * congestion control mechanisms.
> + *
> + * When congestion is experienced, as an alternative to dropping the packet,
> + * routers can change the ECN field of input packets from 2'b01 or 2'b10
> + * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
> + * that congestion is experienced). The destination endpoint can use the
> + * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
> + * source endpoint, which acknowledges it back to the destination endpoint with
> + * the Congestion Window Reduced (CWR) TCP flag.
> + *
> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
> + * enabled for the current color, otherwise the ECN field is left as is.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param[in] mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param[in] mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
> + * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
> + */
> +int
> +rte_tm_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +
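
The ECN rewrite described above comes down to a two-bit test and set on the
low bits of the IPv4 TOS / IPv6 Traffic Class byte. A minimal sketch
follows; the function name is hypothetical, and the TCP/SCTP check and the
IPv4 header checksum update that a real data path would need are left out.

```c
#include <stdint.h>

#define DEMO_ECN_MASK 0x3u /* ECN is the two least significant bits */
#define DEMO_ECN_ECT1 0x1u /* 2'b01: ECN-capable transport */
#define DEMO_ECN_ECT0 0x2u /* 2'b10: ECN-capable transport */
#define DEMO_ECN_CE   0x3u /* 2'b11: congestion experienced */

/*
 * Return the TOS/Traffic Class byte after marking. Only ECN-capable
 * packets (2'b01 or 2'b10) are changed, and only when marking is enabled
 * for the packet's color; the DSCP bits are never touched.
 */
static uint8_t
demo_mark_ecn(uint8_t tos, int mark_enabled)
{
	uint8_t ecn = tos & DEMO_ECN_MASK;

	if (mark_enabled && (ecn == DEMO_ECN_ECT1 || ecn == DEMO_ECN_ECT0))
		tos |= DEMO_ECN_CE;
	return tos;
}
```
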
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
> + *
> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
> + * values proposed by this RFC:
> + *
> + *                       Class 1    Class 2    Class 3    Class 4
> + *                     +----------+----------+----------+----------+
> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
> + *                     +----------+----------+----------+----------+
> + *
> + * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 0 to 2,
> + * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
> + *
> + * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
> + * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
> + * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
> + * for each color; when not enabled for a given color, the DSCP field of all
> + * packets with that color is left as is.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param[in] mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param[in] mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see struct rte_tm_capabilities::mark_ip_dscp_supported
> + */
> +int
> +rte_tm_mark_ip_dscp(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +
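
To make the DSCP rewrite concrete: with the most significant DSCP bit
numbered 0, bits 3 and 4 correspond to mask 0x06 of the 6-bit DSCP value.
The sketch below applies the color-to-drop-precedence mapping from the
comment above to a DSCP value. The names are hypothetical, and the sketch
assumes the DSCP has already been extracted from the TOS / Traffic Class
byte.

```c
#include <stdint.h>

enum demo_color { DEMO_GREEN = 0, DEMO_YELLOW, DEMO_RED };

#define DEMO_DSCP_MASK      0x3Fu /* DSCP is 6 bits wide */
#define DEMO_DROP_PREC_MASK 0x06u /* DSCP bits 3 and 4 (MSB is bit 0) */

/*
 * Rewrite the drop precedence bits of a DSCP value according to color:
 * green -> 2'b01, yellow -> 2'b10, red -> 2'b11. mark[] holds the
 * per-color enable flags; a disabled color leaves the DSCP unchanged.
 */
static uint8_t
demo_mark_dscp(uint8_t dscp, enum demo_color color, const int mark[3])
{
	static const uint8_t drop_prec[3] = {0x1, 0x2, 0x3};

	if (!mark[color])
		return dscp;
	return (uint8_t)((dscp & DEMO_DSCP_MASK & ~DEMO_DROP_PREC_MASK) |
		((unsigned int)drop_prec[color] << 1));
}
```

For example, a Class 1 packet carrying 001010 that is colored yellow leaves
as 001100, matching the Medium Drop Prec row of the table.
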
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_TM_H__ */
> diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
> new file mode 100644
> index 0000000..c25f102
> --- /dev/null
> +++ b/lib/librte_ether/rte_tm_driver.h
> @@ -0,0 +1,373 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_TM_DRIVER_H__
> +#define __INCLUDE_RTE_TM_DRIVER_H__
> +
> +/**
> + * @file
> + * RTE Generic Traffic Manager API (Driver Side)
> + *
> + * This file provides implementation helpers for internal use by PMDs; they
> + * are not intended to be exposed to applications and are not subject to ABI
> + * versioning.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_ethdev.h"
> +#include "rte_tm.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node type get */
> +
> +typedef int (*rte_tm_node_level_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t *level_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node level get */
> +
> +typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
> +	struct rte_tm_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager capabilities get */
> +
> +typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t level_id,
> +	struct rte_tm_level_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager level capabilities get */
> +
> +typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_node_capabilities *cap,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node capabilities get */
> +
> +typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_wred_params *profile,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager WRED profile add */
> +
> +typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager WRED profile delete */
> +
> +typedef int (*rte_tm_shared_wred_context_add_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared WRED context add/update */
> +
> +typedef int (*rte_tm_shared_wred_context_delete_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared WRED context delete */
> +
> +typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_shaper_params *profile,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shaper profile add */
> +
> +typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shaper profile delete */
> +
> +typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared shaper add/update */
> +
> +typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager shared shaper delete */
> +
> +typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_node_params *params,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node add */
> +
> +typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node delete */
> +
> +typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node suspend */
> +
> +typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node resume */
> +
> +typedef int (*rte_tm_hierarchy_commit_t)(struct rte_eth_dev *dev,
> +	int clear_on_fail,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager hierarchy commit */
> +
> +typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node parent update */
> +
> +typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node shaper update */
> +
> +typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node shared shaper update */
> +
> +typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node stats update */
> +
> +typedef int (*rte_tm_node_wfq_weight_mode_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *wfq_weight_mode,
> +	uint32_t n_sp_priorities,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node WFQ weight mode update */
> +
> +typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node congestion management mode update */
> +
> +typedef int (*rte_tm_node_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node WRED context update */
> +
> +typedef int (*rte_tm_node_shared_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager node shared WRED context update */
> +
> +typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_tm_node_stats *stats,
> +	uint64_t *stats_mask,
> +	int clear,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager read stats counters for specific node */
> +
> +typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - VLAN DEI */
> +
> +typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
> +
> +typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
> +
> +struct rte_tm_ops {
> +	/** Traffic manager node type get */
> +	rte_tm_node_type_get_t node_type_get;
> +	/** Traffic manager node level get */
> +	rte_tm_node_level_get_t node_level_get;
> +
> +	/** Traffic manager capabilities get */
> +	rte_tm_capabilities_get_t capabilities_get;
> +	/** Traffic manager level capabilities get */
> +	rte_tm_level_capabilities_get_t level_capabilities_get;
> +	/** Traffic manager node capabilities get */
> +	rte_tm_node_capabilities_get_t node_capabilities_get;
> +
> +	/** Traffic manager WRED profile add */
> +	rte_tm_wred_profile_add_t wred_profile_add;
> +	/** Traffic manager WRED profile delete */
> +	rte_tm_wred_profile_delete_t wred_profile_delete;
> +	/** Traffic manager shared WRED context add/update */
> +	rte_tm_shared_wred_context_add_update_t
> +		shared_wred_context_add_update;
> +	/** Traffic manager shared WRED context delete */
> +	rte_tm_shared_wred_context_delete_t
> +		shared_wred_context_delete;
> +
> +	/** Traffic manager shaper profile add */
> +	rte_tm_shaper_profile_add_t shaper_profile_add;
> +	/** Traffic manager shaper profile delete */
> +	rte_tm_shaper_profile_delete_t shaper_profile_delete;
> +	/** Traffic manager shared shaper add/update */
> +	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
> +	/** Traffic manager shared shaper delete */
> +	rte_tm_shared_shaper_delete_t shared_shaper_delete;
> +
> +	/** Traffic manager node add */
> +	rte_tm_node_add_t node_add;
> +	/** Traffic manager node delete */
> +	rte_tm_node_delete_t node_delete;
> +	/** Traffic manager node suspend */
> +	rte_tm_node_suspend_t node_suspend;
> +	/** Traffic manager node resume */
> +	rte_tm_node_resume_t node_resume;
> +	/** Traffic manager hierarchy commit */
> +	rte_tm_hierarchy_commit_t hierarchy_commit;
> +
> +	/** Traffic manager node parent update */
> +	rte_tm_node_parent_update_t node_parent_update;
> +	/** Traffic manager node shaper update */
> +	rte_tm_node_shaper_update_t node_shaper_update;
> +	/** Traffic manager node shared shaper update */
> +	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
> +	/** Traffic manager node stats update */
> +	rte_tm_node_stats_update_t node_stats_update;
> +	/** Traffic manager node WFQ weight mode update */
> +	rte_tm_node_wfq_weight_mode_update_t node_wfq_weight_mode_update;
> +	/** Traffic manager node congestion management mode update */
> +	rte_tm_node_cman_update_t node_cman_update;
> +	/** Traffic manager node WRED context update */
> +	rte_tm_node_wred_context_update_t node_wred_context_update;
> +	/** Traffic manager node shared WRED context update */
> +	rte_tm_node_shared_wred_context_update_t
> +		node_shared_wred_context_update;
> +	/** Traffic manager read statistics counters for current node */
> +	rte_tm_node_stats_read_t node_stats_read;
> +
> +	/** Traffic manager packet marking - VLAN DEI */
> +	rte_tm_mark_vlan_dei_t mark_vlan_dei;
> +	/** Traffic manager packet marking - IPv4/IPv6 ECN */
> +	rte_tm_mark_ip_ecn_t mark_ip_ecn;
> +	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
> +	rte_tm_mark_ip_dscp_t mark_ip_dscp;
> +};
> +
> +/**
> + * Initialize generic error structure.
> + *
> + * This function also sets rte_errno to a given value.
> + *
> + * @param[out] error
> + *   Pointer to error structure (may be NULL).
> + * @param[in] code
> + *   Related error code (rte_errno).
> + * @param[in] type
> + *   Cause field and error type.
> + * @param[in] cause
> + *   Object responsible for the error.
> + * @param[in] message
> + *   Human-readable error message.
> + *
> + * @return
> + *   Error code.
> + */
> +static inline int
> +rte_tm_error_set(struct rte_tm_error *error,
> +		   int code,
> +		   enum rte_tm_error_type type,
> +		   const void *cause,
> +		   const char *message)
> +{
> +	if (error) {
> +		*error = (struct rte_tm_error){
> +			.type = type,
> +			.cause = cause,
> +			.message = message,
> +		};
> +	}
> +	rte_errno = code;
> +	return code;
> +}
> +
> +/**
> + * Get generic traffic manager operations structure from a port
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[out] error
> + *   Error details.
> + *
> + * @return
> + *   The traffic manager operations structure associated with port_id on
> + *   success, NULL otherwise.
> + */
> +const struct rte_tm_ops *
> +rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
  2017-05-19 17:34       ` Stephen Hemminger
  2017-05-24 11:28       ` Hemant Agrawal
@ 2017-05-31 13:45       ` Jerin Jacob
  2017-05-31 17:05         ` Manoharan, Balasubramanian
  2 siblings, 1 reply; 52+ messages in thread
From: Jerin Jacob @ 2017-05-31 13:45 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, thomas.monjalon, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain

-----Original Message-----
> Date: Fri, 19 May 2017 18:12:52 +0100
> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> To: dev@dpdk.org
> CC: thomas.monjalon@6wind.com, jerin.jacob@caviumnetworks.com,
>  balasubramanian.manoharan@cavium.com, hemant.agrawal@nxp.com,
>  shreyansh.jain@nxp.com
> Subject: [PATCH v4 2/2] ethdev: add traffic management API
> X-Mailer: git-send-email 2.7.4
> 
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.
> 
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow)
> - Capability query API per port, per level and per node
> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
> - Traffic shaping: single/dual rate, private (per node) and shared (by
>   multiple nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
> 
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

IMO, this version has reached a reasonable shape, and we can start using it
as a base for next-tm if there are no other reviewers for this feature.

Two major comments:
1) IMO, we don't need separate APIs for rte_tm_node_add_check_level()
and rte_tm_node_add(). We can keep just rte_tm_node_add() and move the
level check into common code.

2) There are a lot of doxygen rendering issues in this document. I will try
to enumerate them inline. Please cross check the header file with
"make doc-api-html" output.
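
For comment 1, a rough sketch of what a single entry point could look like.
Everything here is hypothetical and only for illustration: the
TM_NODE_LEVEL_ID_ANY sentinel, the tm_ops_sketch callback table and the toy
callbacks are not part of the patch, and the real signatures carry more
parameters (priority, weight, params, error).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: the caller does not want the level to be checked. */
#define TM_NODE_LEVEL_ID_ANY UINT32_MAX

/* Hypothetical stand-in for the driver callback table. */
struct tm_ops_sketch {
	/* Level a new node would land on when attached under parent_node_id. */
	uint32_t (*node_level_get)(uint32_t parent_node_id);
	/* Actual node creation in the driver. */
	int (*node_add)(uint32_t node_id, uint32_t parent_node_id);
};

/* Single add function: the optional level check lives in common code. */
static int
tm_node_add_sketch(const struct tm_ops_sketch *ops,
	uint32_t node_id,
	uint32_t parent_node_id,
	uint32_t level_id)
{
	if (ops == NULL || ops->node_add == NULL)
		return -ENOSYS;

	if (level_id != TM_NODE_LEVEL_ID_ANY) {
		if (ops->node_level_get == NULL)
			return -ENOSYS;
		if (ops->node_level_get(parent_node_id) != level_id)
			return -EINVAL;
	}

	return ops->node_add(node_id, parent_node_id);
}

/* Toy driver callbacks, present only so the sketch can be exercised. */
static uint32_t
toy_node_level_get(uint32_t parent_node_id)
{
	/* The root has no parent and sits on level 0; all others on level 1. */
	return parent_node_id == UINT32_MAX ? 0 : 1;
}

static int
toy_node_add(uint32_t node_id, uint32_t parent_node_id)
{
	(void)node_id;
	(void)parent_node_id;
	return 0;
}

static const struct tm_ops_sketch toy_ops = {
	.node_level_get = toy_node_level_get,
	.node_add = toy_node_add,
};
```

With something along these lines, callers that do not care about the level
pass the sentinel, and callers that do get -EINVAL on a mismatch without a
second API.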

With the above changes,
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>


> 
>  MAINTAINERS                            |    4 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_tm.c              |  448 ++++++++
>  lib/librte_ether/rte_tm.h              | 1923 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_tm_driver.h       |  373 +++++++

Missing doxygen hooks in doc/api/doxy-api-index.md

[rte_tm]             (@ref rte_tm.h),
[rte_tm_driver]      (@ref rte_tm_driver.h),

>  6 files changed, 2782 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_tm.c
>  create mode 100644 lib/librte_ether/rte_tm.h
>  create mode 100644 lib/librte_ether/rte_tm_driver.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index afb4cab..cdaf2ac 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -240,6 +240,10 @@ Flow API
>  M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>  F: lib/librte_ether/rte_flow*
>  
> +Traffic Management API
> +M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> +F: lib/librte_ether/rte_tm*

Add next-tm tree here.

> +
>  Crypto API
>  M: Declan Doherty <declan.doherty@intel.com>
>  F: lib/librte_cryptodev/
> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
> index 93fdde1..db692ae 100644
> --- a/lib/librte_ether/Makefile
> +++ b/lib/librte_ether/Makefile
> @@ -1,6 +1,6 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
> +#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.

It would be good to add the names of the other companies that contributed to
the specification.

>  #   All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
> @@ -45,6 +45,7 @@ LIBABIVER := 6
>  
>  SRCS-y += rte_ethdev.c
>  SRCS-y += rte_flow.c
> +SRCS-y += rte_tm.c
>  
> +			NULL,				\
> +			rte_strerror(ENOSYS));		\
> +							\
> +	ops->func;					\
> +})
> +
> +/* Get number of leaf nodes */
> +int
> +rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
> +	uint32_t *n_leaf_nodes,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	const struct rte_tm_ops *ops =
> +		rte_tm_ops_get(port_id, error);
> +
> +	if (ops == NULL)
> +		return -rte_errno;
> +
> +	if (n_leaf_nodes == NULL) {
> +		rte_tm_error_set(error,
> +			EINVAL,
> +			RTE_TM_ERROR_TYPE_UNSPECIFIED,
> +			NULL,
> +			rte_strerror(EINVAL));
> +		return -rte_errno;
> +	}
> +
> +	*n_leaf_nodes = dev->data->nb_tx_queues;
> +	return 0;
> +}
> +
> +/* Check node type (leaf or non-leaf) */
> +int
> +rte_tm_node_type_get(uint8_t port_id,
> +	uint32_t node_id,
> +	int *is_leaf,
> +	struct rte_tm_error *error)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];

A leaf node can be detected in the common code itself, since leaf node IDs
are 0 .. (dev->data->nb_tx_queues - 1).
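
A minimal sketch of that check, assuming leaf node IDs are exactly
0 .. (N-1) as the header documents. The helper name is hypothetical and
nb_tx_queues stands in for dev->data->nb_tx_queues:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical common-code helper: decide leaf vs. non-leaf from the node ID
 * alone. nb_tx_queues stands in for dev->data->nb_tx_queues.
 */
static int
tm_node_is_leaf_sketch(uint16_t nb_tx_queues, uint32_t node_id, int *is_leaf)
{
	if (is_leaf == NULL)
		return -EINVAL;

	/* Leaf node IDs are 0 .. (nb_tx_queues - 1). */
	*is_leaf = node_id < nb_tx_queues;
	return 0;
}
```

Note this only classifies the ID; it does not tell whether a non-leaf node
with that ID was actually added, so the driver may still be needed for that.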


> +	return RTE_TM_FUNC(port_id, node_type_get)(dev,
> +		node_id, is_leaf, error);
> +}
> +
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_TM_H__
> +#define __INCLUDE_RTE_TM_H__
> +
> +/**
> + * @file
> + * RTE Generic Traffic Manager API
> + *
> + * This interface provides the ability to configure the traffic manager in a
> + * generic way. It includes features such as: hierarchical scheduling,
> + * traffic shaping, congestion management, packet marking, etc.
> + */
> +
> +#include <stdint.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * Ethernet framing overhead.
> + *
> + * Overhead fields per Ethernet frame:
> + * 1. Preamble:                                            7 bytes;
> + * 2. Start of Frame Delimiter (SFD):                      1 byte;
> + * 3. Inter-Frame Gap (IFG):                              12 bytes.
> + *
> + * One of the typical values for the *pkt_length_adjust* field of the shaper
> + * profile.
> + *
> + * @see struct rte_tm_shaper_params
> + *
> + */
> +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
> +
> +/**
> + * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
> + * Useful when FCS is generated and added at the end of the Ethernet frame on
> + * TX side without any SW intervention.
> + *
> + * One of the typical values for the pkt_length_adjust field of the shaper
> + * profile.
> + *
> + * @see struct rte_tm_shaper_params
> + */
> +#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
> +
> +/**< Invalid WRED profile ID */
> +#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
> +
> +/**< Invalid shaper profile ID */
> +#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
> +
> +/**< Node ID for the parent of the root node */
> +#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
> +
> +/**
> + * Color
> + */
> +enum rte_tm_color {
> +	RTE_TM_GREEN = 0, /**< Green */
> +	RTE_TM_YELLOW, /**< Yellow */
> +	RTE_TM_RED, /**< Red */
> +	RTE_TM_COLORS /**< Number of colors */
> +};
> +
> +/**
> + * Node statistics counter type
> + */
> +enum rte_tm_stats_type {
> +	/**< Number of packets scheduled from current node. */
> +	RTE_TM_STATS_N_PKTS = 1 << 0,
> +
> +	/**< Number of bytes scheduled from current node. */
> +	RTE_TM_STATS_N_BYTES = 1 << 1,
> +
> +	/**< Number of green packets dropped by current leaf node.  */
> +	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
> +
> +	/**< Number of yellow packets dropped by current leaf node.  */
> +	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
> +
> +	/**< Number of red packets dropped by current leaf node.  */
> +	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
> +
> +	/**< Number of green bytes dropped by current leaf node.  */
> +	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
> +
> +	/**< Number of yellow bytes dropped by current leaf node.  */
> +	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
> +
> +	/**< Number of red bytes dropped by current leaf node.  */
> +	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
> +
> +	/**< Number of packets currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
> +
> +	/**< Number of bytes currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
> +};
> +

> + * Node statistics counters
> + */
> +struct rte_tm_node_stats {
> +	/**< Number of packets scheduled from current node. */
> +	uint64_t n_pkts;

Incorrect doxygen API HTML rendering. It is rendered as
"< Number of packets scheduled from current node."
 ^^^

Looks like a "/**<" comment has to come after the member it documents (here
"uint64_t n_pkts"), or a plain "/**" comment should be used before it. This
applies across the header file.
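
For illustration, the two placements that doxygen should accept, shown on a
trimmed-down struct (tm_node_stats_sketch is a made-up name, not the one in
the patch):

```c
#include <stdint.h>

/** Node statistics counters (trimmed-down illustration). */
struct tm_node_stats_sketch {
	/** Number of packets scheduled from current node. */
	uint64_t n_pkts;

	uint64_t n_bytes; /**< Number of bytes scheduled from current node. */
};
```

The first form documents the member that follows the comment; the "/**<"
form documents the member that precedes it.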


> +
> +	/**< Number of bytes scheduled from current node. */
> +	uint64_t n_bytes;
> +
> +	/**< Statistics counters for leaf nodes only. */
> +	struct {
> +		/**< Number of packets dropped by current leaf node per each
> +		 * color.
> +		 */
> +		uint64_t n_pkts_dropped[RTE_TM_COLORS];
> +
> +		/**< Number of bytes dropped by current leaf node per each
> +		 * color.
> +		 */
> +		uint64_t n_bytes_dropped[RTE_TM_COLORS];
> +
> +		/**< Number of packets currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_pkts_queued;
> +
> +		/**< Number of bytes currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_bytes_queued;
> +	} leaf;
> +};
> +
> +/**
> + * Traffic manager dynamic updates
> + */
> +enum rte_tm_dynamic_update_type {
> +	/**< Dynamic parent node update. The new parent node is located on same
> +	 * hierarchy level as the former parent node. Consequently, the node
> +	 * whose parent is changed preserves its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
> +
> +	/**< Dynamic parent node update. The new parent node is located on
> +	 * different hierarchy level than the former parent node. Consequently,
> +	 * the node whose parent is changed also changes its hierarchy level.
> +	 */
> +	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
> +
> +	/**< Dynamic node add/delete. */
> +	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
> +
> +	/**< Suspend/resume nodes. */
> +	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
> +
> +	/**< Dynamic switch between byte-based and packet-based WFQ weights. */
> +	RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
> +
> +	/**< Dynamic update on number of SP priorities. */
> +	RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
> +
> +	/**< Dynamic update of congestion management mode for leaf nodes. */
> +	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
> +
> +	/**< Dynamic update of the set of enabled stats counter types. */
> +	RTE_TM_UPDATE_NODE_STATS = 1 << 7,
> +};
> +
> +/**
> + * Traffic manager get number of leaf nodes
> + *
> + * Each leaf node sits on top of a TX queue of the current Ethernet port.
> + * Therefore, the set of leaf nodes is predefined, their number is always equal
> + * to N (where N is the number of TX queues configured for the current port)
> + * and their IDs are 0 .. (N-1).
> + *
> + * @param[in] port_id

[in] can be treated as the default, so there is no need to mention it
everywhere.

> + *   The port identifier of the Ethernet device.
> + * @param[out] n_leaf_nodes
> + *   Number of leaf nodes for the current port.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.

> +/**
> + * Traffic manager node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has
> + * at least one user (i.e. child node).
> + *
> + * When called before rte_tm_hierarchy_commit() invocation, this function is
> + * typically used to define the initial start-up hierarchy for the port.
> + * Provided that dynamic hierarchy updates are supported by the current port (as
> + * advertised in the port capability set), this function can be also called
> + * after the rte_tm_hierarchy_commit() invocation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_ADD_DELETE
> + */
> +int
> +rte_tm_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node suspend
> + *
> + * Suspend an existing node. While the node is in suspended state, no packet is
> + * scheduled from this node and its descendants. The node exits the suspended
> + * state through the node resume operation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_resume()
> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
> + */
> +int
> +rte_tm_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node resume
> + *
> + * Resume an existing node that is currently in suspended state. The node
> + * entered the suspended state as result of a previous node suspend operation.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_suspend()
> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
> + */
> +int
> +rte_tm_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager hierarchy commit
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the start-up hierarchy.
> + *
> + * This function typically performs the following steps:
> + *    a) It validates the start-up hierarchy that was previously defined for the
> + *       current port through successive rte_tm_node_add() invocations;
> + *    b) Assuming successful validation, it performs all the necessary port
> + *       specific configuration operations to install the specified hierarchy on
> + *       the current port, with immediate effect once the port is started.
> + *
> + * This function fails when the currently configured hierarchy is not supported
> + * by the Ethernet port, in which case the user can abort or try out another
> + * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can be
> + * built from scratch (when *clear_on_fail* is enabled) or by modifying the
> + * existing hierarchy configuration (when *clear_on_fail* is disabled).
> + *
> + * Note that this function can still fail due to other causes (e.g. not enough
> + * memory available in the system, etc), even though the specified hierarchy is
> + * supported in principle by the current port.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] clear_on_fail
> + *   On function call failure, hierarchy is cleared when this parameter is
> + *   non-zero and preserved when this parameter is equal to zero.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see rte_tm_node_add()
> + * @see rte_tm_node_delete()
> + */
> +int
> +rte_tm_hierarchy_commit(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node parent update
> + *
> + * Restriction for root node: its parent cannot be changed.
> + *
> + * This function can only be called after the rte_tm_hierarchy_commit()
> + * invocation. Its success depends on the port support for this operation, as
> + * advertised through the port capability set.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param[in] priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param[in] weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is zero. Used by the WFQ
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
> + * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
> + */
> +int
> +rte_tm_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private shaper update
> + *
> + * Restriction for the root node: its private shaper profile needs to be valid
> + * and single rate.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] shaper_profile_id
> + *   Shaper profile ID for the private shaper of the current node. Needs to be
> + *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
> + *   the latter disabling the private shaper of the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.

Missing the @see pointing to the relevant capability.


> + */
> +int
> +rte_tm_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared shapers update
> + *
> + * Restriction for root node: cannot use any shared rate shapers.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param[in] add
> + *   Set to non-zero value to add this shared shaper to current node or to zero
> + *   to delete this shared shaper from current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.

Missing the @see pointing to the relevant capability.

> + */
> +int
> +rte_tm_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node enabled statistics counters update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid.
> + * @param[in] stats_mask
> + *   Mask of statistics counter types to be enabled for the current node. This
> + *   needs to be a subset of the statistics counter types available for the
> + *   current node. Any statistics counter type not included in this set is to
> + *   be disabled for the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see enum rte_tm_stats_type
> + * @see RTE_TM_UPDATE_NODE_STATS

Incorrect doxygen API HTML rendering.

> + */
> +int
> +rte_tm_node_stats_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t stats_mask,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node WFQ weight mode update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] wfq_weight_mode
> + *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
> + *   to be used for all priorities. When non-NULL, it points to a pre-allocated
> + *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
> + *   zero for packet-mode.
> + * @param[in] n_sp_priorities
> + *   Number of SP priorities.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
> + * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
> + */
> +int
> +rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *wfq_weight_mode,
> +	uint32_t n_sp_priorities,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node congestion management mode update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] cman
> + *   Congestion management mode.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see RTE_TM_UPDATE_NODE_CMAN
> + */
> +int
> +rte_tm_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_tm_cman_mode cman,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node private WRED context update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] wred_profile_id
> + *   WRED profile ID for the private WRED context of the current node. Needs to
> + *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
> + *   latter disabling the private WRED context of the current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */

Missing the @see pointing to the relevant capability.

> +int
> +rte_tm_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager node shared WRED context update
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param[in] shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param[in] add
> + *   Set to non-zero value to add this shared WRED context to current node or
> + *   to zero to delete this shared WRED context from current node.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.

Missing the @see pointing to the relevant capability.

> + * @see struct rte_tm_capabilities::mark_vlan_dei_supported
> + */
> +int
> +rte_tm_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
> + *
> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
> + * Notification (ECN) field (2 bits). The DSCP field is typically used to
> + * encode the traffic class and/or drop priority (RFC 2597), while the ECN
> + * field is used by RFC 3168 to implement a congestion notification mechanism
> + * to be leveraged by transport layer protocols such as TCP and SCTP that have
> + * congestion control mechanisms.
> + *
> + * When congestion is experienced, as alternative to dropping the packet,
> + * routers can change the ECN field of input packets from 2'b01 or 2'b10
> + * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
> + * that congestion is experienced). The destination endpoint can use the
> + * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
> + * source endpoint, which acknowledges it back to the destination endpoint with
> + * the Congestion Window Reduced (CWR) TCP flag.
> + *
> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
> + * enabled for the current color, otherwise the ECN field is left as is.
> + *
> + * @param[in] port_id
> + *   The port identifier of the Ethernet device.
> + * @param[in] mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param[in] mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param[in] mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param[out] error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + *
> + * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
> + * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
> + */
> +int
> +rte_tm_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_tm_error *error);
> +
> +/**
> + * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
> + *
> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
> + * values proposed by this RFC:
> + *
> + *                       Class 1    Class 2    Class 3    Class 4
> + *                     +----------+----------+----------+----------+
> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
> + *                     +----------+----------+----------+----------+

Incorrect doxygen API HTML rendering.
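
One possible way to keep the table layout, assuming a @verbatim block renders
as preformatted text in the HTML output. The helper below is hypothetical and
only shows that the table follows the RFC 2597 pattern
DSCP = (class << 3) | (drop precedence << 1):

```c
#include <stdint.h>

/**
 * DSCP value for an Assured Forwarding class and drop precedence (RFC 2597).
 *
 * @verbatim
 *                       Class 1    Class 2    Class 3    Class 4
 *                     +----------+----------+----------+----------+
 *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
 *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
 *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
 *                     +----------+----------+----------+----------+
 * @endverbatim
 *
 * @param af_class
 *   Assured Forwarding class, 1 .. 4.
 * @param drop_prec
 *   Drop precedence: 1 (low), 2 (medium) or 3 (high).
 * @return
 *   The 6-bit DSCP value.
 */
static inline uint8_t
tm_dscp_af_sketch(unsigned int af_class, unsigned int drop_prec)
{
	/* Bits 5..3 carry the class, bits 2..1 the drop precedence. */
	return (uint8_t)((af_class << 3) | (drop_prec << 1));
}
```

For example, class 1 with low drop precedence gives 001010 (decimal 10) and
class 4 with high drop precedence gives 100110 (decimal 38), matching the
table.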

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v4 2/2] ethdev: add traffic management API
  2017-05-31 13:45       ` Jerin Jacob
@ 2017-05-31 17:05         ` Manoharan, Balasubramanian
  0 siblings, 0 replies; 52+ messages in thread
From: Manoharan, Balasubramanian @ 2017-05-31 17:05 UTC (permalink / raw)
  To: Jacob,  Jerin
  Cc: Cristian Dumitrescu, dev, thomas.monjalon, hemant.agrawal,
	shreyansh.jain

I am fine with this proposal.

Acked-by: Balasubramanian Manoharan
<balasubramanian.manoharan@caviumnetworks.com>

> On 31-May-2017, at 7:15 PM, Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com> wrote:
> 
> -----Original Message-----
>> Date: Fri, 19 May 2017 18:12:52 +0100
>> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
>> To: dev@dpdk.org
>> CC: thomas.monjalon@6wind.com, jerin.jacob@caviumnetworks.com,
>> balasubramanian.manoharan@cavium.com, hemant.agrawal@nxp.com,
>> shreyansh.jain@nxp.com
>> Subject: [PATCH v4 2/2] ethdev: add traffic management API
>> X-Mailer: git-send-email 2.7.4
>> 
>> This patch introduces the generic ethdev API for the traffic manager
>> capability, which includes: hierarchical scheduling, traffic shaping,
>> congestion management, packet marking.
>> 
>> Main features:
>> - Exposed as ethdev plugin capability (similar to rte_flow)
>> - Capability query API per port, per level and per node
>> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
>> - Traffic shaping: single/dual rate, private (per node) and shared (by
>>  multiple nodes) shapers
>> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>>  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>>  contexts
>> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>>  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
>> 
>> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> 
> IMO, this version has reached a reasonable shape, and we can start using it
> as a base for next-tm if there are no other reviewers for this feature.
> 
> Two major comments:
> 1) IMO, we don't need separate APIs for rte_tm_node_add_check_level()
> and rte_tm_node_add(). We can keep just rte_tm_node_add() and move the
> level check into common code.
> 
> 2) There are a lot of doxygen rendering issues in this document. I will try
> to enumerate them inline. Please cross check the header file with
> "make doc-api-html" output.
> 
> With the above changes,
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> 
> 
>> 
>> MAINTAINERS                            |    4 +
>> lib/librte_ether/Makefile              |    5 +-
>> lib/librte_ether/rte_ether_version.map |   30 +
>> lib/librte_ether/rte_tm.c              |  448 ++++++++
>> lib/librte_ether/rte_tm.h              | 1923 ++++++++++++++++++++++++++++++++
>> lib/librte_ether/rte_tm_driver.h       |  373 +++++++
> 
> Missing doxygen hooks in doc/api/doxy-api-index.md
> 
> [rte_tm]             (@ref rte_tm.h),
> [rte_tm_driver]      (@ref rte_tm_driver.h),
> 
>> 6 files changed, 2782 insertions(+), 1 deletion(-)
>> create mode 100644 lib/librte_ether/rte_tm.c
>> create mode 100644 lib/librte_ether/rte_tm.h
>> create mode 100644 lib/librte_ether/rte_tm_driver.h
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index afb4cab..cdaf2ac 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -240,6 +240,10 @@ Flow API
>> M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>> F: lib/librte_ether/rte_flow*
>> 
>> +Traffic Management API
>> +M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
>> +F: lib/librte_ether/rte_tm*
> 
> Add next-tm tree here.
> 
>> +
>> Crypto API
>> M: Declan Doherty <declan.doherty@intel.com>
>> F: lib/librte_cryptodev/
>> diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
>> index 93fdde1..db692ae 100644
>> --- a/lib/librte_ether/Makefile
>> +++ b/lib/librte_ether/Makefile
>> @@ -1,6 +1,6 @@
>> #   BSD LICENSE
>> #
>> -#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
>> +#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
> 
> It would be good to add the names of the other companies that contributed to
> the specification.
> 
>> #   All rights reserved.
>> #
>> #   Redistribution and use in source and binary forms, with or without
>> @@ -45,6 +45,7 @@ LIBABIVER := 6
>> 
>> SRCS-y += rte_ethdev.c
>> SRCS-y += rte_flow.c
>> +SRCS-y += rte_tm.c
>> 
>> +            NULL,                \
>> +            rte_strerror(ENOSYS));        \
>> +                            \
>> +    ops->func;                    \
>> +})
>> +
>> +/* Get number of leaf nodes */
>> +int
>> +rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
>> +    uint32_t *n_leaf_nodes,
>> +    struct rte_tm_error *error)
>> +{
>> +    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> +    const struct rte_tm_ops *ops =
>> +        rte_tm_ops_get(port_id, error);
>> +
>> +    if (ops == NULL)
>> +        return -rte_errno;
>> +
>> +    if (n_leaf_nodes == NULL) {
>> +        rte_tm_error_set(error,
>> +            EINVAL,
>> +            RTE_TM_ERROR_TYPE_UNSPECIFIED,
>> +            NULL,
>> +            rte_strerror(EINVAL));
>> +        return -rte_errno;
>> +    }
>> +
>> +    *n_leaf_nodes = dev->data->nb_tx_queues;
>> +    return 0;
>> +}
>> +
>> +/* Check node type (leaf or non-leaf) */
>> +int
>> +rte_tm_node_type_get(uint8_t port_id,
>> +    uint32_t node_id,
>> +    int *is_leaf,
>> +    struct rte_tm_error *error)
>> +{
>> +    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> 
> A leaf node can be detected in the common code itself, since leaf node IDs
> are 0 .. (dev->data->nb_tx_queues - 1).
> 
> 
>> +    return RTE_TM_FUNC(port_id, node_type_get)(dev,
>> +        node_id, is_leaf, error);
>> +}
>> +
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#ifndef __INCLUDE_RTE_TM_H__
>> +#define __INCLUDE_RTE_TM_H__
>> +
>> +/**
>> + * @file
>> + * RTE Generic Traffic Manager API
>> + *
>> + * This interface provides the ability to configure the traffic manager in a
>> + * generic way. It includes features such as: hierarchical scheduling,
>> + * traffic shaping, congestion management, packet marking, etc.
>> + */
>> +
>> +#include <stdint.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * Ethernet framing overhead.
>> + *
>> + * Overhead fields per Ethernet frame:
>> + * 1. Preamble:                                            7 bytes;
>> + * 2. Start of Frame Delimiter (SFD):                      1 byte;
>> + * 3. Inter-Frame Gap (IFG):                              12 bytes.
>> + *
>> + * One of the typical values for the *pkt_length_adjust* field of the shaper
>> + * profile.
>> + *
>> + * @see struct rte_tm_shaper_params
>> + *
>> + */
>> +#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
>> +
>> +/**
>> + * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
>> + * Useful when FCS is generated and added at the end of the Ethernet frame on
>> + * TX side without any SW intervention.
>> + *
>> + * One of the typical values for the pkt_length_adjust field of the shaper
>> + * profile.
>> + *
>> + * @see struct rte_tm_shaper_params
>> + */
>> +#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
>> +
>> +/**< Invalid WRED profile ID */
>> +#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
>> +
>> +/**< Invalid shaper profile ID */
>> +#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
>> +
>> +/**< Node ID for the parent of the root node */
>> +#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
>> +
>> +/**
>> + * Color
>> + */
>> +enum rte_tm_color {
>> +    RTE_TM_GREEN = 0, /**< Green */
>> +    RTE_TM_YELLOW, /**< Yellow */
>> +    RTE_TM_RED, /**< Red */
>> +    RTE_TM_COLORS /**< Number of colors */
>> +};
>> +
>> +/**
>> + * Node statistics counter type
>> + */
>> +enum rte_tm_stats_type {
>> +    /**< Number of packets scheduled from current node. */
>> +    RTE_TM_STATS_N_PKTS = 1 << 0,
>> +
>> +    /**< Number of bytes scheduled from current node. */
>> +    RTE_TM_STATS_N_BYTES = 1 << 1,
>> +
>> +    /**< Number of green packets dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
>> +
>> +    /**< Number of yellow packets dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
>> +
>> +    /**< Number of red packets dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
>> +
>> +    /**< Number of green bytes dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
>> +
>> +    /**< Number of yellow bytes dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
>> +
>> +    /**< Number of red bytes dropped by current leaf node.  */
>> +    RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
>> +
>> +    /**< Number of packets currently waiting in the packet queue of current
>> +     * leaf node.
>> +     */
>> +    RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
>> +
>> +    /**< Number of bytes currently waiting in the packet queue of current
>> +     * leaf node.
>> +     */
>> +    RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
>> +};
>> +
> 
>> + * Node statistics counters
>> + */
>> +struct rte_tm_node_stats {
>> +    /**< Number of packets scheduled from current node. */
>> +    uint64_t n_pkts;
> 
> Incorrect doxygen API HTML rendering. It is rendered as
> "< Number of packets scheduled from current node."
> ^^^
> 
> It looks like the comment has to come below "uint64_t n_pkts". This applies
> across the header file.
> 
> 
>> +
>> +    /**< Number of bytes scheduled from current node. */
>> +    uint64_t n_bytes;
>> +
>> +    /**< Statistics counters for leaf nodes only. */
>> +    struct {
>> +        /**< Number of packets dropped by current leaf node per each
>> +         * color.
>> +         */
>> +        uint64_t n_pkts_dropped[RTE_TM_COLORS];
>> +
>> +        /**< Number of bytes dropped by current leaf node per each
>> +         * color.
>> +         */
>> +        uint64_t n_bytes_dropped[RTE_TM_COLORS];
>> +
>> +        /**< Number of packets currently waiting in the packet queue of
>> +         * current leaf node.
>> +         */
>> +        uint64_t n_pkts_queued;
>> +
>> +        /**< Number of bytes currently waiting in the packet queue of
>> +         * current leaf node.
>> +         */
>> +        uint64_t n_bytes_queued;
>> +    } leaf;
>> +};
>> +
>> +/**
>> + * Traffic manager dynamic updates
>> + */
>> +enum rte_tm_dynamic_update_type {
>> +    /**< Dynamic parent node update. The new parent node is located on same
>> +     * hierarchy level as the former parent node. Consequently, the node
>> +     * whose parent is changed preserves its hierarchy level.
>> +     */
>> +    RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
>> +
>> +    /**< Dynamic parent node update. The new parent node is located on
>> +     * different hierarchy level than the former parent node. Consequently,
>> +     * the node whose parent is changed also changes its hierarchy level.
>> +     */
>> +    RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
>> +
>> +    /**< Dynamic node add/delete. */
>> +    RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
>> +
>> +    /**< Suspend/resume nodes. */
>> +    RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
>> +
>> +    /**< Dynamic switch between byte-based and packet-based WFQ weights. */
>> +    RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
>> +
>> +    /**< Dynamic update on number of SP priorities. */
>> +    RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
>> +
>> +    /**< Dynamic update of congestion management mode for leaf nodes. */
>> +    RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
>> +
>> +    /**< Dynamic update of the set of enabled stats counter types. */
>> +    RTE_TM_UPDATE_NODE_STATS = 1 << 7,
>> +};
>> +
>> +/**
>> + * Traffic manager get number of leaf nodes
>> + *
>> + * Each leaf node sits on top of a TX queue of the current Ethernet port.
>> + * Therefore, the set of leaf nodes is predefined, their number is always equal
>> + * to N (where N is the number of TX queues configured for the current port)
>> + * and their IDs are 0 .. (N-1).
>> + *
>> + * @param[in] port_id
> 
> [in] can be treated as default to avoid mentioning [in] everywhere.
> 
>> + *   The port identifier of the Ethernet device.
>> + * @param[out] n_leaf_nodes
>> + *   Number of leaf nodes for the current port.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
> 
>> +/**
>> + * Traffic manager node delete
>> + *
>> + * Delete an existing node. This operation fails when this node currently has
>> + * at least one user (i.e. child node).
>> + *
>> + * When called before rte_tm_hierarchy_commit() invocation, this function is
>> + * typically used to define the initial start-up hierarchy for the port.
>> + * Provided that dynamic hierarchy updates are supported by the current port (as
>> + * advertised in the port capability set), this function can be also called
>> + * after the rte_tm_hierarchy_commit() invocation.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see RTE_TM_UPDATE_NODE_ADD_DELETE
>> + */
>> +int
>> +rte_tm_node_delete(uint8_t port_id,
>> +    uint32_t node_id,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node suspend
>> + *
>> + * Suspend an existing node. While the node is in suspended state, no packet is
>> + * scheduled from this node and its descendants. The node exits the suspended
>> + * state through the node resume operation.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see rte_tm_node_resume()
>> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
>> + */
>> +int
>> +rte_tm_node_suspend(uint8_t port_id,
>> +    uint32_t node_id,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node resume
>> + *
>> + * Resume an existing node that is currently in suspended state. The node
>> + * entered the suspended state as a result of a previous node suspend operation.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see rte_tm_node_suspend()
>> + * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
>> + */
>> +int
>> +rte_tm_node_resume(uint8_t port_id,
>> +    uint32_t node_id,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager hierarchy commit
>> + *
>> + * This function is called during the port initialization phase (before the
>> + * Ethernet port is started) to freeze the start-up hierarchy.
>> + *
>> + * This function typically performs the following steps:
>> + *    a) It validates the start-up hierarchy that was previously defined for the
>> + *       current port through successive rte_tm_node_add() invocations;
>> + *    b) Assuming successful validation, it performs all the necessary port
>> + *       specific configuration operations to install the specified hierarchy on
>> + *       the current port, with immediate effect once the port is started.
>> + *
>> + * This function fails when the currently configured hierarchy is not supported
>> + * by the Ethernet port, in which case the user can abort or try out another
>> + * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can be
>> + * built from scratch (when *clear_on_fail* is enabled) or by modifying the
>> + * existing hierarchy configuration (when *clear_on_fail* is disabled).
>> + *
>> + * Note that this function can still fail due to other causes (e.g. not enough
>> + * memory available in the system, etc), even though the specified hierarchy is
>> + * supported in principle by the current port.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] clear_on_fail
>> + *   On function call failure, hierarchy is cleared when this parameter is
>> + *   non-zero and preserved when this parameter is equal to zero.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see rte_tm_node_add()
>> + * @see rte_tm_node_delete()
>> + */
>> +int
>> +rte_tm_hierarchy_commit(uint8_t port_id,
>> +    int clear_on_fail,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node parent update
>> + *
>> + * Restriction for root node: its parent cannot be changed.
>> + *
>> + * This function can only be called after the rte_tm_hierarchy_commit()
>> + * invocation. Its success depends on the port support for this operation, as
>> + * advertised through the port capability set.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[in] parent_node_id
>> + *   Node ID for the new parent. Needs to be valid.
>> + * @param[in] priority
>> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
>> + *   running on the parent of the current node for scheduling this child node.
>> + * @param[in] weight
>> + *   Node weight. The node weight is relative to the weight sum of all siblings
>> + *   that have the same priority. The lowest weight is zero. Used by the WFQ
>> + *   algorithm running on the parent of the current node for scheduling this
>> + *   child node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
>> + * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
>> + */
>> +int
>> +rte_tm_node_parent_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    uint32_t parent_node_id,
>> +    uint32_t priority,
>> +    uint32_t weight,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node private shaper update
>> + *
>> + * Restriction for the root node: its private shaper profile needs to be valid
>> + * and single rate.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[in] shaper_profile_id
>> + *   Shaper profile ID for the private shaper of the current node. Needs to be
>> + *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
>> + *   the latter disabling the private shaper of the current node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
> 
> Missing the @see pointing to the capability.
> 
> 
>> + */
>> +int
>> +rte_tm_node_shaper_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    uint32_t shaper_profile_id,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node shared shapers update
>> + *
>> + * Restriction for root node: cannot use any shared rate shapers.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[in] shared_shaper_id
>> + *   Shared shaper ID. Needs to be valid.
>> + * @param[in] add
>> + *   Set to non-zero value to add this shared shaper to current node or to zero
>> + *   to delete this shared shaper from current node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
> 
> Missing the @see pointing to the capability.
> 
>> + */
>> +int
>> +rte_tm_node_shared_shaper_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    uint32_t shared_shaper_id,
>> +    int add,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node enabled statistics counters update
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid.
>> + * @param[in] stats_mask
>> + *   Mask of statistics counter types to be enabled for the current node. This
>> + *   needs to be a subset of the statistics counter types available for the
>> + *   current node. Any statistics counter type not included in this set is to
>> + *   be disabled for the current node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see enum rte_tm_stats_type
>> + * @see RTE_TM_UPDATE_NODE_STATS
> 
> Incorrect doxygen API HTML rendering.
> 
>> + */
>> +int
>> +rte_tm_node_stats_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    uint64_t stats_mask,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node WFQ weight mode update
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid leaf node ID.
>> + * @param[in] wfq_weight_mode
>> + *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
>> + *   to be used for all priorities. When non-NULL, it points to a pre-allocated
>> + *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
>> + *   zero for packet-mode.
>> + * @param[in] n_sp_priorities
>> + *   Number of SP priorities.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
>> + * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
>> + */
>> +int
>> +rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    int *wfq_weight_mode,
>> +    uint32_t n_sp_priorities,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node congestion management mode update
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid leaf node ID.
>> + * @param[in] cman
>> + *   Congestion management mode.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see RTE_TM_UPDATE_NODE_CMAN
>> + */
>> +int
>> +rte_tm_node_cman_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    enum rte_tm_cman_mode cman,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node private WRED context update
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid leaf node ID.
>> + * @param[in] wred_profile_id
>> + *   WRED profile ID for the private WRED context of the current node. Needs to
>> + *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
>> + *   latter disabling the private WRED context of the current node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + */
> 
> Missing the @see pointing to the capability.
> 
>> +int
>> +rte_tm_node_wred_context_update(uint8_t port_id,
>> +    uint32_t node_id,
>> +    uint32_t wred_profile_id,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager node shared WRED context update
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] node_id
>> + *   Node ID. Needs to be valid leaf node ID.
>> + * @param[in] shared_wred_context_id
>> + *   Shared WRED context ID. Needs to be valid.
>> + * @param[in] add
>> + *   Set to non-zero value to add this shared WRED context to current node or
>> + *   to zero to delete this shared WRED context from current node.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
> 
> Missing the @see pointing to the capability.
> 
>> + * @see struct rte_tm_capabilities::mark_vlan_dei_supported
>> + */
>> +int
>> +rte_tm_mark_vlan_dei(uint8_t port_id,
>> +    int mark_green,
>> +    int mark_yellow,
>> +    int mark_red,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
>> + *
>> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
>> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
>> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
>> + * Notification (ECN) field (2 bits). The DSCP field is typically used to
>> + * encode the traffic class and/or drop priority (RFC 2597), while the ECN
>> + * field is used by RFC 3168 to implement a congestion notification mechanism
>> + * to be leveraged by transport layer protocols such as TCP and SCTP that have
>> + * congestion control mechanisms.
>> + *
>> + * When congestion is experienced, as an alternative to dropping the packet,
>> + * routers can change the ECN field of input packets from 2'b01 or 2'b10
>> + * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
>> + * that congestion is experienced). The destination endpoint can use the
>> + * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
>> + * source endpoint, which acknowledges it back to the destination endpoint with
>> + * the Congestion Window Reduced (CWR) TCP flag.
>> + *
>> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
>> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
>> + * enabled for the current color, otherwise the ECN field is left as is.
>> + *
>> + * @param[in] port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param[in] mark_green
>> + *   Set to non-zero value to enable marking of green packets and to zero to
>> + *   disable it.
>> + * @param[in] mark_yellow
>> + *   Set to non-zero value to enable marking of yellow packets and to zero to
>> + *   disable it.
>> + * @param[in] mark_red
>> + *   Set to non-zero value to enable marking of red packets and to zero to
>> + *   disable it.
>> + * @param[out] error
>> + *   Error details. Filled in only on error, when not NULL.
>> + * @return
>> + *   0 on success, non-zero error code otherwise.
>> + *
>> + * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
>> + * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
>> + */
>> +int
>> +rte_tm_mark_ip_ecn(uint8_t port_id,
>> +    int mark_green,
>> +    int mark_yellow,
>> +    int mark_red,
>> +    struct rte_tm_error *error);
>> +
>> +/**
>> + * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
>> + *
>> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
>> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
>> + * values proposed by this RFC:
>> + *
>> + *                       Class 1    Class 2    Class 3    Class 4
>> + *                     +----------+----------+----------+----------+
>> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
>> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
>> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
>> + *                     +----------+----------+----------+----------+
> 
> Incorrect doxygen API HTML rendering.
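
For reference, the ECN marking rule and the RFC 2597 DSCP table quoted above can be sketched in a few lines of C. This is only an illustrative sketch, not code from the patch: the names `sketch_ecn_mark()` and `sketch_af_dscp()` are hypothetical, the input is assumed to be the raw IPv4 TOS / IPv6 Traffic Class byte with ECN in the two least significant bits, and the TCP/SCTP check mentioned in the quoted description is left out.

```c
#include <stdint.h>

#define SKETCH_ECN_MASK 0x03u /* ECN occupies the 2 low bits of TOS/TC */
#define SKETCH_ECN_CE   0x03u /* 2'b11: congestion experienced */

/*
 * Hypothetical helper: return the TOS/TC byte after optional ECN marking.
 * Only ECN-capable packets (2'b01 or 2'b10) are changed to 2'b11, and only
 * when marking is enabled for the packet color; the DSCP bits are untouched.
 */
static uint8_t
sketch_ecn_mark(uint8_t tos, int mark_enabled)
{
	uint8_t ecn = tos & SKETCH_ECN_MASK;

	if (mark_enabled && (ecn == 0x01u || ecn == 0x02u))
		return (uint8_t)(tos | SKETCH_ECN_CE);

	return tos;
}

/*
 * Hypothetical helper: 6-bit DSCP for Assured Forwarding class 1..4 and
 * drop precedence 1 (low) .. 3 (high), matching the table above:
 * the class sits in the top 3 bits, the drop precedence in the next 2.
 */
static uint8_t
sketch_af_dscp(unsigned int af_class, unsigned int drop_prec)
{
	return (uint8_t)((af_class << 3) | (drop_prec << 1));
}
```

For example, class 1 with low drop precedence gives 001010 and class 4 with high drop precedence gives 100110, as in the table.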


* [PATCH v5 0/2] ethdev: abstraction layer for QoS traffic management
  2017-05-19 17:12     ` [PATCH v4 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
@ 2017-06-09 16:51       ` Cristian Dumitrescu
  2017-06-09 16:51         ` [PATCH v5 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
  2017-06-09 16:51         ` [PATCH v5 2/2] ethdev: add traffic management API Cristian Dumitrescu
  0 siblings, 2 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-09 16:51 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

This patch set introduces an ethdev-based abstraction layer for Quality of
Service (QoS) Traffic Management, which includes: hierarchical scheduling,
traffic shaping, congestion management, packet marking. The goal is to
provide a simple generic API that is agnostic of the underlying HW, SW or
mixed HW-SW implementation.

Patch 1 uses the approach introduced by rte_flow in DPDK to extend the
ethdev functionality in a modular way for traffic management.

Patch 2 introduces the generic ethdev API for traffic management.

Cristian Dumitrescu (2):
  ethdev: add traffic management ops get API
  ethdev: add traffic management API

 MAINTAINERS                            |    5 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ethdev.c          |   12 +
 lib/librte_ether/rte_ethdev.h          |   20 +
 lib/librte_ether/rte_ether_version.map |   36 +
 lib/librte_ether/rte_tm.c              |  438 ++++++++
 lib/librte_ether/rte_tm.h              | 1899 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  366 ++++++
 8 files changed, 2780 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

-- 
2.7.4


* [PATCH v5 1/2] ethdev: add traffic management ops get API
  2017-06-09 16:51       ` [PATCH v5 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
@ 2017-06-09 16:51         ` Cristian Dumitrescu
  2017-06-09 16:51         ` [PATCH v5 2/2] ethdev: add traffic management API Cristian Dumitrescu
  1 sibling, 0 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-09 16:51 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

The rte_flow feature breaks the monolithic approach for ethdev by
introducing the new rte_flow API to ethdev using a plugin-like approach.

Basically, the rte_flow API is still logically part of ethdev:
- it extends the ethdev functionality: rte_flow is a new feature/
  capability of ethdev;
- all its functions work on an Ethernet device: the first parameter of the
  rte_flow functions is Ethernet device port ID.

Also, the rte_flow API is a sort of capability plugin for ethdev:
- the rte_flow API functions have their own name space: they are called
  rte_flow_operationXYZ() as opposed to rte_eth_dev_flow_operationXYZ();
- the rte_flow API functions are placed in separate files in the same
  librte_ether folder as opposed to rte_ethdev.[hc].

The way it works is by using the existing ethdev API function
rte_eth_dev_filter_ctrl() to query the current Ethernet device port ID for
the support of the rte_flow capability and return the pointer to the
rte_flow operations when supported and NULL otherwise:

struct rte_flow_ops *eth_flow_ops;
int ret = rte_eth_dev_filter_ctrl(eth_port_id,
	RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, &eth_flow_ops);

This patch reuses the same approach for ethdev Traffic Management API.
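
As a rough illustration of this plugin-like lookup, here is a self-contained C sketch that mimics the pattern with mock types. Everything prefixed with `mock_` is hypothetical and exists only for this example; it is not the DPDK implementation, and the real ops table and device structures differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-feature ops table (not the DPDK struct). */
struct mock_tm_ops {
	int (*node_suspend)(uint32_t node_id);
};

/* Hypothetical stand-in for an Ethernet device with an optional TM hook. */
struct mock_eth_dev {
	int (*tm_ops_get)(struct mock_eth_dev *dev, void *ops);
};

static int
mock_node_suspend(uint32_t node_id)
{
	(void)node_id;
	return 0;
}

static const struct mock_tm_ops mock_ops = {
	.node_suspend = mock_node_suspend,
};

/* Driver side: return the ops table pointer through the opaque argument. */
static int
mock_driver_tm_ops_get(struct mock_eth_dev *dev, void *ops)
{
	(void)dev;
	*(const struct mock_tm_ops **)ops = &mock_ops;
	return 0;
}

#define MOCK_MAX_PORTS 2

static struct mock_eth_dev mock_devices[MOCK_MAX_PORTS] = {
	{ .tm_ops_get = mock_driver_tm_ops_get }, /* port 0: TM supported */
	{ .tm_ops_get = NULL },                   /* port 1: TM not supported */
};

/* Library side: validate the port, then delegate to the driver hook. */
static int
mock_tm_ops_get(uint8_t port_id, void *ops)
{
	struct mock_eth_dev *dev;

	if (port_id >= MOCK_MAX_PORTS)
		return -ENODEV;

	dev = &mock_devices[port_id];
	if (dev->tm_ops_get == NULL)
		return -ENOTSUP;

	return dev->tm_ops_get(dev, ops);
}
```

A caller would pass the address of a `const struct mock_tm_ops *` variable and use the returned table only when the call returns 0; ports without the hook report -ENOTSUP.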

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in v4:
- Followed up on suggestion from Thomas: Replaced generic capability
  ethdev API function with traffic management specific function
  rte_eth_dev_tm_ops_get()

Changes in v3:
- Followed up on suggestion from Jerin: renamed capability from
  Hierarchical Scheduler (sched) to Traffic Manager (tm)

Changes in v2:
- Followed up on suggestion from Jerin and Hemant: renamed
  capability_control() to capability_ops_get()
- Added ACK from Keith, Jerin and Hemant

 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 20 ++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  6 ++++++
 3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 83898a8..f735f1e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3021,6 +3021,18 @@ rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 	return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op, arg);
 }
 
+int
+rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tm_ops_get, -ENOTSUP);
+	return (*dev->dev_ops->tm_ops_get)(dev, ops);
+}
+
 void *
 rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
 		rte_rx_callback_fn fn, void *user_param)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 121058c..26b53f4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1441,6 +1441,9 @@ typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 void *arg);
 /**< @internal Take operations to assigned filter type on an Ethernet device */
 
+typedef int (*eth_tm_ops_get_t)(struct rte_eth_dev *dev, void *ops);
+/**< @internal Get Traffic Management (TM) operations on an Ethernet device */
+
 typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
 				 struct rte_eth_dcb_info *dcb_info);
 /**< @internal Get dcb information on an Ethernet device */
@@ -1573,6 +1576,9 @@ struct eth_dev_ops {
 	/**< Get extended device statistic values by ID. */
 	eth_xstats_get_names_by_id_t xstats_get_names_by_id;
 	/**< Get name of extended device statistics by ID. */
+
+	eth_tm_ops_get_t tm_ops_get;
+	/**< Get Traffic Management (TM) operations. */
 };
 
 /**
@@ -4105,6 +4111,20 @@ int rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 			enum rte_filter_op filter_op, void *arg);
 
 /**
+ * Get Traffic Management (TM) operations on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param ops
+ *   Pointer to TM operations.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops);
+
+/**
  * Get DCB information on an Ethernet device.
  *
  * @param port_id
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d6726bb..2788e7b 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -156,3 +156,9 @@ DPDK_17.05 {
 	rte_eth_xstats_get_names_by_id;
 
 } DPDK_17.02;
+
+DPDK_17.08 {
+    global:
+
+	rte_eth_dev_tm_ops_get;
+} DPDK_17.05;
-- 
2.7.4


* [PATCH v5 2/2] ethdev: add traffic management API
  2017-06-09 16:51       ` [PATCH v5 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-06-09 16:51         ` [PATCH v5 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
@ 2017-06-09 16:51         ` Cristian Dumitrescu
  2017-06-12  3:36           ` Jerin Jacob
  2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  1 sibling, 2 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-09 16:51 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow)
- Capability query API per port, per level and per node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
- Traffic shaping: single/dual rate, private (per node) and shared (by
  multiple nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Balasubramanian.Manoharan <balasubramanian.manoharan@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in v5:
- Implemented feedback from Jerin [8]
	- Add level parameter to node add API function
	- Doxygen: fixed comments applicable to field below/before
	- Doxygen: added missing @see
	- Doxygen: fixed hooks in doc/api/doxy-api-index.md
	- Doxygen: fixed table rendering
	- Added copyright on API header file from Cavium and NXP to
	  existing Intel copyright
	- MAINTAINERS: added next-tm tree
- Added V4 ACKs from Jerin, Bala and Hemant

Changes in v4:
- Implemented feedback from Hemant [6]
	- Capability API: Reworked the port, level and node capability API
	  data structure to remove confusion due to "summary across all
	  nodes" approach, which made it unclear whether a particular
	  capability is supported by all nodes or by at least one node.
	- Capability API: Added flags for "all nodes have identical
	  capability set"
	- Suspended state: documented the required behavior in Doxygen
	  description
- Implemented feedback from Jerin [7]
	- Node add: added level parameter (see new API function:
	  rte_tm_node_add_check_level())
	- RTE_TM_ETH_FRAMING_OVERHEAD, RTE_TM_ETH_FRAMING_OVERHEAD_FCS:
	  documented their usage in their Doxygen description
	- Capability API: for each function, mention the related
	  capability field (Doxygen @see)
	- stats_mask, capability_mask: document the enum flags used to
	  build each mask (Doxygen @see)
	- Rename rte_tm_get_leaf_nodes() to
	  rte_tm_get_number_of_leaf_nodes()
	- Doxygen: add @param[in, out] to the description of all API funcs
	- Doxygen: fix hooks in doc/api/doxy-api-index.md
- Rename rte_tm_hierarchy_set() to rte_tm_hierarchy_commit(), improved
  Doxygen description
- Node add, node delete: improved Doxygen description
- Fixed incorrect design assumption that packet-based weight mode for WFQ
  is identical to WRR. As a result, removed all references to WRR support.
  Renamed the "scheduling mode" node parameters to "wfq_weight_mode".
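
The two framing overhead constants mentioned above can be illustrated with a
small sketch. The constant values and the 7 + 1 + 12 byte breakdown come from
the rte_tm.h hunk of this patch; the helper name shaper_charged_bytes() and
the signed 32-bit type chosen for the adjustment are assumptions made for
illustration only, not part of the proposed API.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the rte_tm.h hunk of this patch. */
#define RTE_TM_ETH_FRAMING_OVERHEAD      20 /* preamble 7 + SFD 1 + IFG 12 */
#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS  24 /* the above plus 4 bytes of FCS */

/*
 * Hypothetical helper (not part of rte_tm.h): number of bytes a shaper
 * would charge for one packet once the packet length adjustment is applied.
 * The adjustment may be positive (add framing overhead) or negative
 * (e.g. discount the L2 header to rate limit on IP bytes).
 */
static inline uint32_t
shaper_charged_bytes(uint32_t pkt_len, int32_t pkt_length_adjust)
{
	int64_t len = (int64_t)pkt_len + pkt_length_adjust;

	/* A negative adjustment must not exceed the packet length. */
	assert(len >= 0);
	return (uint32_t)len;
}
```

For a 64-byte frame buffer without FCS, the second constant accounts for the
FCS that the NIC appends on TX, while a negative value such as -14 would
discount an untagged Ethernet header.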

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max ->
	  wfq_wrr_n_children_per_group_max, added wfq_wrr_n_groups_max,
	  improved description of both, improved description of
	  wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent
	  update
- Enforced/documented restrictions for root node (node_add() and
  update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0,
  PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues
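
The PIR restrictions listed above (PIR != 0, PIR >= CIR) can be sketched as a
standalone check. This is only an illustration: the structure
example_shaper_rates and the function example_shaper_rates_check() are
hypothetical names, and where such a check lives (library or driver) is not
stated in this changelog.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two rates of a dual rate shaper. */
struct example_shaper_rates {
	uint64_t cir; /* committed information rate, bytes per second */
	uint64_t pir; /* peak information rate, bytes per second */
};

/*
 * Return 0 when the rates satisfy the documented restrictions,
 * -EINVAL otherwise.
 */
static inline int
example_shaper_rates_check(const struct example_shaper_rates *r)
{
	if (r == NULL)
		return -EINVAL;
	if (r->pir == 0)        /* PIR != 0 */
		return -EINVAL;
	if (r->pir < r->cir)    /* PIR >= CIR */
		return -EINVAL;
	return 0;
}
```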

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added
	  clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, parent, role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API funcs
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below,
  hopefully nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated
      object IDs. IMO the choice to have application-generated object IDs
      adds marginal complexity to the driver (search ID function
      required), but it provides huge simplification for the application.
      The app does not need to worry about building & managing tree-like
      structure for storing driver-generated object IDs, the app can use
      its own convention for node IDs depending on the specific hierarchy
      that it needs. Trivial example: identify all level-2 nodes with IDs
      like 100, 200, 300, … and the level-3 nodes based on their level-2
      parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …, 310, 320,
      330, … and level-4 nodes based on their level-3 parents: 111, 112,
      113, 114, …, 121, 122, 123, 124, …. Moreover, see the change log
      for the other related simplification that was implemented: leaf
      nodes now have predefined IDs that are the same as their Ethernet
      TX queue ID (therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the
  shaper profile as part of node API (no shaper ID needed for private
  shapers), while the shared shapers are configured outside of the node
  API using shaper profile and communicated to the node using shared
  shaper ID. So there is no configuration overhead for shared shapers if
  the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet
  TX queue ID (therefore no translation is required for leaf nodes). This
  is also used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause
  (same as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate
  limiting based on IP packet bytes)
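
The application-generated node ID convention from the v1 notes above (100,
200, ... for level 2; 110, 120, ... for level 3; 111, 112, ... for level 4)
can be sketched as below. Both helpers are hypothetical and only illustrate
one possible application-side convention. The leaf test assumes that leaf
node IDs occupy the range 0 .. n_tx_queues - 1, which is how the "same as
their Ethernet TX queue ID" rule is read here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical application-side helper: build a non-leaf node ID from the
 * 1-based position of the node under each ancestor. A value of 0 for l3
 * or l4 means "this node sits above that level".
 *   example_node_id(2, 0, 0) -> 200 (level-2 node)
 *   example_node_id(1, 3, 0) -> 130 (level-3 node under node 100)
 *   example_node_id(1, 2, 4) -> 124 (level-4 node under node 120)
 */
static inline uint32_t
example_node_id(uint32_t l2, uint32_t l3, uint32_t l4)
{
	assert(l2 >= 1 && l2 <= 9);
	assert(l3 <= 9 && l4 <= 9);
	assert(l3 != 0 || l4 == 0); /* no level-4 index without a level-3 one */
	return 100 * l2 + 10 * l3 + l4;
}

/*
 * Assumption: leaf node IDs are the TX queue IDs, so any ID below the
 * number of TX queues identifies a leaf node.
 */
static inline int
example_node_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	return node_id < n_tx_queues;
}
```

With this particular scheme the smallest non-leaf ID is 100, so it only stays
unambiguous while the port has at most 100 TX queues; an application with
more queues would need a larger base.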

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
[6] Hemant's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-March/062354.html
[7] Jerin's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-April/063429.html
[8] Jerin's feedback on v4: http://www.dpdk.org/ml/archives/dev/2017-May/066932.html


 MAINTAINERS                            |    5 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  438 ++++++++
 lib/librte_ether/rte_tm.h              | 1899 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  366 ++++++
 6 files changed, 2742 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f6095ef..3c7414f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -240,6 +240,11 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Management API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+T: git://dpdk.org/next/dpdk-next-tm
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 93fdde1..db692ae 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -56,5 +57,7 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 2788e7b..5e8651d 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -161,4 +161,34 @@ DPDK_17.08 {
     global:
 
 	rte_eth_dev_tm_ops_get;
+	rte_tm_get_number_of_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_commit;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_wfq_weight_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 } DPDK_17.05;
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..7167965
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,438 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->tm_ops_get == NULL) ||
+		(dev->dev_ops->tm_ops_get(dev, &ops) != 0) ||
+		(ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({							\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)					\
+		return -rte_errno;			\
+							\
+	if (ops->func == NULL)				\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,				\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,				\
+			rte_strerror(ENOSYS));		\
+							\
+	ops->func;					\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, level_id,
+		params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Commit the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_commit)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update WFQ weight mode */
+int rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wfq_weight_mode_update)(dev,
+		node_id, wfq_weight_mode, n_sp_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..9513f13
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1899 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   Copyright(c) 2017 Cavium.
+ *   Copyright(c) 2017 NXP.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Ethernet framing overhead.
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ *
+ * One of the typical values for the *pkt_length_adjust* field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
+ * Useful when FCS is generated and added at the end of the Ethernet frame on
+ * TX side without any SW intervention.
+ *
+ * One of the typical values for the pkt_length_adjust field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**
+ * Invalid WRED profile ID.
+ *
+ * @see struct rte_tm_node_params
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_wred_context_update()
+ */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**
+ * Invalid shaper profile ID.
+ *
+ * @see struct rte_tm_node_params
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_shaper_update()
+ */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**
+ * Node ID for the parent of the root node.
+ *
+ * @see rte_tm_node_add()
+ */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Node level ID used to disable level ID checking.
+ *
+ * @see rte_tm_node_add()
+ */
+#define RTE_TM_NODE_LEVEL_ID_ANY                     UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/** Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/** Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/** Number of green packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/** Number of yellow packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/** Number of red packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/** Number of green bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/** Number of yellow bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/** Number of red bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/** Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/** Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/** Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/** Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/** Statistics counters for leaf nodes only. */
+	struct {
+		/** Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/** Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/** Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/** Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/** Dynamic parent node update. The new parent node is located on the
+	 * same hierarchy level as the former parent node. Consequently, the
+	 * node whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/** Dynamic parent node update. The new parent node is located on a
+	 * different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/** Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/** Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/** Dynamic switch between byte-based and packet-based WFQ weights. */
+	RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
+
+	/** Dynamic update on number of SP priorities. */
+	RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
+
+	/** Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+
+	/** Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 7,
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/** Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/** Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/** When non-zero, this flag indicates that all the non-leaf nodes
+	 * (with the exception of the root node) have identical capability set.
+	 */
+	int non_leaf_nodes_identical;
+
+	/** When non-zero, this flag indicates that all the leaf nodes have
+	 * identical capability set.
+	 */
+	int leaf_nodes_identical;
+
+	/** Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resources between private and
+	 * shared shapers, it is typically equal to the sum of
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/** Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have their private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/** Maximum number of private shapers that support dual rate shaping.
+	 * Indicates the maximum number of nodes that can concurrently have
+	 * their private shaper enabled with dual rate support. Only valid when
+	 * private shapers are supported. The value of zero indicates that dual
+	 * rate shaping is not available for private shapers. The maximum value
+	 * is *shaper_private_n_max*.
+	 */
+	int shaper_private_dual_rate_n_max;
+
+	/** Minimum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/** Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/** Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_per_shaper_max;
+
+	/** Maximum number of shared shapers a node can be part of. This
+	 * parameter indicates that there is at least one node that can be
+	 * configured with this many shared shapers, which might not be true for
+	 * all the nodes. Only valid when shared shapers are supported, in which
+	 * case it ranges from 1 to *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_shared_n_shapers_per_node_max;
+
+	/** Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/** Minimum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/** Minimum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/** Maximum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/** Maximum number of children nodes. This parameter indicates that
+	 * there is at least one non-leaf node that can be configured with this
+	 * many children nodes, which might not be true for all the non-leaf
+	 * nodes.
+	 */
+	uint32_t sched_n_children_max;
+
+	/** Maximum number of supported priority levels. This parameter
+	 * indicates that there is at least one non-leaf node that can be
+	 * configured with this many priority levels for managing its children
+	 * nodes, which might not be true for all the non-leaf nodes. The value
+	 * of zero is invalid. The value of 1 indicates that only priority 0 is
+	 * supported, which essentially means that Strict Priority (SP)
+	 * algorithm is not supported.
+	 */
+	uint32_t sched_sp_n_priorities_max;
+
+	/** Maximum number of sibling nodes that can have the same priority at
+	 * any given time, i.e. maximum size of the WFQ sibling node group. This
+	 * parameter indicates there is at least one non-leaf node that meets
+	 * this condition, which might not be true for all the non-leaf nodes.
+	 * The value of zero is invalid. The value of 1 indicates that WFQ
+	 * algorithm is not supported. The maximum value is
+	 * *sched_n_children_max*.
+	 */
+	uint32_t sched_wfq_n_children_per_group_max;
+
+	/** Maximum number of priority levels that can have more than one child
+	 * node at any given time, i.e. maximum number of WFQ sibling node
+	 * groups that have two or more members. This parameter indicates there
+	 * is at least one non-leaf node that meets this condition, which might
+	 * not be true for all the non-leaf nodes. The value of zero states that
+	 * WFQ algorithm is not supported. The value of 1 indicates that
+	 * (*sched_sp_n_priorities_max* - 1) priority levels have at most one
+	 * child node, so there can be only one priority level with two or
+	 * more sibling nodes making up a WFQ group. The maximum value is:
+	 * min(floor(*sched_n_children_max* / 2), *sched_sp_n_priorities_max*).
+	 */
+	uint32_t sched_wfq_n_groups_max;
+
+	/** Maximum WFQ weight. The value of 1 indicates that all sibling nodes
+	 * with same priority have the same WFQ weight, so WFQ is reduced to FQ.
+	 */
+	uint32_t sched_wfq_weight_max;
+
+	/** Head drop algorithm support. When non-zero, this parameter
+	 * indicates that there is at least one leaf node that supports the head
+	 * drop algorithm, which might not be true for all the leaf nodes.
+	 */
+	int cman_head_drop_supported;
+
+	/** Maximum number of WRED contexts, either private or shared. In case
+	 * the implementation does not share any resources between private and
+	 * shared WRED contexts, it is typically equal to the sum of
+	 * *cman_wred_context_private_n_max* and
+	 * *cman_wred_context_shared_n_max*.
+	 */
+	uint32_t cman_wred_context_n_max;
+
+	/** Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have their private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/** Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/** Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_per_context_max;
+
+	/** Maximum number of shared WRED contexts a leaf node can be part of.
+	 * This parameter indicates that there is at least one leaf node that
+	 * can be configured with this many shared WRED contexts, which might
+	 * not be true for all the leaf nodes. Only valid when shared WRED
+	 * contexts are supported, in which case it ranges from 1 to
+	 * *cman_wred_context_shared_n_max*.
+	 */
+	uint32_t cman_wred_context_shared_n_contexts_per_node_max;
+
+	/** Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/** Set of supported dynamic update operations.
+	 * @see enum rte_tm_dynamic_update_type
+	 */
+	uint64_t dynamic_update_mask;
+
+	/** Set of supported statistics counter types.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
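As a rough illustration of how an application might consume two of the capability fields above, the sketch below tests a bit in *dynamic_update_mask* and computes the documented upper bound for *sched_wfq_n_groups_max*. The helper names and the EXAMPLE_ constant are hypothetical and not part of this patch; the constant only mirrors the value of RTE_TM_UPDATE_NODE_SUSPEND_RESUME shown in the enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of RTE_TM_UPDATE_NODE_SUSPEND_RESUME (1 << 3). */
#define EXAMPLE_UPDATE_NODE_SUSPEND_RESUME (1 << 3)

/* Return 1 when every bit of update_type is set in dynamic_update_mask. */
static int
example_update_supported(uint64_t dynamic_update_mask, uint64_t update_type)
{
	assert(update_type != 0);
	return (dynamic_update_mask & update_type) == update_type;
}

/*
 * Upper bound documented for sched_wfq_n_groups_max:
 * min(floor(sched_n_children_max / 2), sched_sp_n_priorities_max).
 * Unsigned integer division already rounds down.
 */
static uint32_t
example_wfq_n_groups_bound(uint32_t n_children_max, uint32_t n_priorities_max)
{
	uint32_t by_children = n_children_max / 2;

	return by_children < n_priorities_max ? by_children : n_priorities_max;
}
```

In a real application the mask would come from the *cap* structure filled in by rte_tm_capabilities_get().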
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/** Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/** Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/** Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/** When non-zero, this flag indicates that all the non-leaf nodes on
+	 * this level have identical capability set. Valid only when
+	 * *n_nodes_nonleaf_max* is non-zero.
+	 */
+	int non_leaf_nodes_identical;
+
+	/** When non-zero, this flag indicates that all the leaf nodes on this
+	 * level have identical capability set. Valid only when
+	 * *n_nodes_leaf_max* is non-zero.
+	 */
+	int leaf_nodes_identical;
+
+	union {
+		/** Items valid only for the non-leaf nodes on this level. */
+		struct {
+			/** Private shaper support. When non-zero, it indicates
+			 * there is at least one non-leaf node on this level
+			 * with private shaper support, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/** Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the non-leaf
+			 * nodes on the current level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level with dual rate private shaper support, which
+			 * may not be the case for all the non-leaf nodes on
+			 * this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/** Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/** Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/** Maximum number of shared shapers that any non-leaf
+			 * node on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the non-leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/** Maximum number of children nodes. This parameter
+			 * indicates that there is at least one non-leaf node on
+			 * this level that can be configured with this many
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level.
+			 */
+			uint32_t sched_n_children_max;
+
+			/** Maximum number of supported priority levels. This
+			 * parameter indicates that there is at least one
+			 * non-leaf node on this level that can be configured
+			 * with this many priority levels for managing its
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level. The value of zero is
+			 * invalid. The value of 1 indicates that only priority
+			 * 0 is supported, which essentially means that Strict
+			 * Priority (SP) algorithm is not supported on this
+			 * level.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/** Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size of
+			 * the WFQ sibling node group. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which may not be true for
+			 * all the non-leaf nodes on this level. The value of
+			 * zero is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported on this level. The maximum
+			 * value is *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/** Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that
+			 * have two or more members. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which might not be true
+			 * for all the non-leaf nodes. The value of zero states
+			 * that WFQ algorithm is not supported on this level.
+			 * The value of 1 indicates that
+			 * (*sched_sp_n_priorities_max* - 1) priority levels on
+			 * this level have at most one child node, so there can
+			 * be only one priority level with two or more sibling
+			 * nodes making up a WFQ group on this level. The
+			 * maximum value is:
+			 * min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/** Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes on this level with same priority
+			 * have the same WFQ weight, so on this level WFQ is
+			 * reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+
+			/** Mask of statistics counter types supported by the
+			 * non-leaf nodes on this level. Every supported
+			 * statistics counter type is supported by at least one
+			 * non-leaf node on this level, which may not be true
+			 * for all the non-leaf nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} nonleaf;
+
+		/** Items valid only for the leaf nodes on this level. */
+		struct {
+			/** Private shaper support. When non-zero, it indicates
+			 * there is at least one leaf node on this level with
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/** Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the leaf nodes
+			 * on this level. When non-zero, it indicates there is
+			 * at least one leaf node on this level with dual rate
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/** Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/** Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/** Maximum number of shared shapers that any leaf node
+			 * on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/** Head drop algorithm support. When non-zero, this
+			 * parameter indicates that there is at least one leaf
+			 * node on this level that supports the head drop
+			 * algorithm, which might not be true for all the leaf
+			 * nodes on this level.
+			 */
+			int cman_head_drop_supported;
+
+			/** Private WRED context support. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level with private WRED context support, which may
+			 * not be true for all the leaf nodes on this level.
+			 */
+			int cman_wred_context_private_supported;
+
+			/** Maximum number of shared WRED contexts that any
+			 * leaf node on this level can be part of. The value of
+			 * zero indicates that shared WRED contexts are not
+			 * supported by the leaf nodes on this level. When
+			 * non-zero, it indicates there is at least one leaf
+			 * node on this level that meets this condition, which
+			 * may not be the case for all the leaf nodes on this
+			 * level.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+
+			/** Mask of statistics counter types supported by the
+			 * leaf nodes on this level. Every supported statistics
+			 * counter type is supported by at least one leaf node
+			 * on this level, which may not be true for all the leaf
+			 * nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/** Private shaper support for the current node. */
+	int shaper_private_supported;
+
+	/** Dual rate shaping support for private shaper of current node.
+	 * Valid only when private shaper is supported by the current node.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/** Minimum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/** Maximum number of shared shapers the current node can be part of.
+	 * The value of zero indicates that shared shapers are not supported by
+	 * the current node.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	union {
+		/** Items valid only for non-leaf nodes. */
+		struct {
+			/** Maximum number of children nodes. */
+			uint32_t sched_n_children_max;
+
+			/** Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/** Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported. The maximum value is
+			 * *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/** Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that have
+			 * two or more members. The value of zero states that
+			 * WFQ algorithm is not supported. The value of 1
+			 * indicates that (*sched_sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so there
+			 * can be only one priority level with two or more
+			 * sibling nodes making up a WFQ group. The maximum
+			 * value is: min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/** Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes with same priority have the same
+			 * WFQ weight, so WFQ is reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+		} nonleaf;
+
+		/** Items valid only for leaf nodes. */
+		struct {
+			/** Head drop algorithm support for current node. */
+			int cman_head_drop_supported;
+
+			/** Private WRED context support for current node. */
+			int cman_wred_context_private_supported;
+
+			/** Maximum number of shared WRED contexts the current
+			 * node can be part of. The value of zero indicates that
+			 * shared WRED contexts are not supported by the current
+			 * node.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+
+	/** Mask of statistics counter types supported by the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to *head
+ * drop* algorithm, which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/** Minimum queue threshold */
+	uint16_t min_th;
+
+	/** Maximum queue threshold */
+	uint16_t max_th;
+
+	/** Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/** Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
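To make the *maxp_inv* and *wq_log2* encodings concrete, here is a minimal sketch of the textbook RED computations they parameterize. The struct is a hypothetical local mirror of struct rte_tm_red_params, the function names are invented for illustration, and the formulas are the classic RED ones (without the packet-count correction); the patch does not state which exact variant a given driver implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of struct rte_tm_red_params. */
struct example_red_params {
	uint16_t min_th;
	uint16_t max_th;
	uint16_t maxp_inv;
	uint16_t wq_log2;
};

/* EWMA of the queue size: avg += wq * (q - avg), wq = 1 / (2 ^ wq_log2). */
static double
example_red_avg_update(const struct example_red_params *p, double avg,
	uint32_t q_len)
{
	double wq;

	assert(p->wq_log2 < 64);
	wq = 1.0 / (double)(1ULL << p->wq_log2);
	return avg + wq * ((double)q_len - avg);
}

/*
 * Drop probability for a given average queue size: zero below min_th,
 * one at or above max_th, and a linear ramp up to maxp = 1 / maxp_inv
 * in between.
 */
static double
example_red_drop_prob(const struct example_red_params *p, double avg)
{
	assert(p->maxp_inv != 0 && p->min_th < p->max_th);
	if (avg < p->min_th)
		return 0.0;
	if (avg >= p->max_th)
		return 1.0;
	return (avg - p->min_th) /
		((double)(p->max_th - p->min_th) * p->maxp_inv);
}
```

With min_th = 32, max_th = 64 and maxp_inv = 10, an average queue size of 48 sits halfway up the ramp and yields half of maxp.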
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/** One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/** Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/** Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be bigger than zero, as well as greater than
+ * or equal to the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/** Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/** Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/** Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
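The single rate and dual rate conventions described above can be sketched as a small validation routine. The structs below are hypothetical local mirrors of struct rte_tm_token_bucket and struct rte_tm_shaper_params, and the helper names are made up for this example; only the rules written in the comment block are encoded, so a real driver may well check more.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirrors of the token bucket and shaper structs. */
struct example_token_bucket {
	uint64_t rate; /* bytes per second */
	uint64_t size; /* bytes */
};

struct example_shaper_params {
	struct example_token_bucket committed;
	struct example_token_bucket peak;
	int32_t pkt_length_adjust;
};

/* A committed rate of zero disables that bucket, i.e. single rate. */
static int
example_shaper_is_dual_rate(const struct example_shaper_params *p)
{
	return p->committed.rate != 0;
}

/* Dual rate requires peak rate > 0 and peak rate >= committed rate. */
static int
example_shaper_check(const struct example_shaper_params *p)
{
	if (!example_shaper_is_dual_rate(p))
		return 0;
	if (p->peak.rate == 0 || p->peak.rate < p->committed.rate)
		return -1;
	return 0;
}

/* Packet length as seen by the shaper after the signed adjustment. */
static uint32_t
example_shaped_len(uint32_t pkt_len, int32_t pkt_length_adjust)
{
	int64_t len = (int64_t)pkt_len + pkt_length_adjust;

	assert(len >= 0 && len <= UINT32_MAX);
	return (uint32_t)len;
}
```

For instance, an adjustment of 24 bytes makes a 64-byte frame count as 88 bytes against the token buckets.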
+
+/**
+ * Node parameters
+ *
+ * Each non-leaf node has multiple inputs (its children nodes) and a single
+ * output (which is input to its parent node). It arbitrates its inputs using
+ * Strict Priority (SP) and Weighted Fair Queuing (WFQ) algorithms to schedule
+ * input packets to its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as Weighted Round Robin (WRR), Byte-level WRR, Deficit WRR
+ * (DWRR), etc. are considered approximations of the WFQ ideal and are
+ * assimilated to WFQ, although an associated implementation-dependent trade-off
+ * on accuracy, performance and resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP algorithm
+ * based on their priority, with zero (0) as the highest priority. Children with
+ * the same priority are scheduled using the WFQ algorithm according to their
+ * weights. The WFQ weight of a given child node is relative to the sum of the
+ * weights of all its sibling nodes that have the same priority, with one (1) as
+ * the lowest weight. For each SP priority, the WFQ weight mode can be set as
+ * either byte-based or packet-based.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port. Hence,
+ * the leaf nodes are predefined, with their node IDs set to 0 .. (N-1), where N
+ * is the number of TX queues configured for the current Ethernet port. The
+ * non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/** Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/** User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/** Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	union {
+		/** Parameters only valid for non-leaf nodes. */
+		struct {
+			/** WFQ weight mode for each SP priority. When NULL, it
+			 * indicates that WFQ is to be used for all priorities.
+			 * When non-NULL, it points to a pre-allocated array of
+			 * *n_sp_priorities* values, with non-zero value for
+			 * byte-mode and zero for packet-mode.
+			 */
+			int *wfq_weight_mode;
+
+			/** Number of SP priorities. */
+			uint32_t n_sp_priorities;
+		} nonleaf;
+
+		/** Parameters only valid for leaf nodes. */
+		struct {
+			/** Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/** WRED parameters (only valid when *cman* is set to
+			 * WRED).
+			 */
+			struct {
+				/** WRED profile for private WRED context. The
+				 * absence of a private WRED context for the
+				 * current leaf node is indicated by value
+				 * RTE_TM_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/** User allocated array of shared WRED context
+				 * IDs. When set to NULL, it indicates that the
+				 * current leaf node should not currently be
+				 * part of any shared WRED contexts.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/** Number of elements in the
+				 * *shared_wred_context_id* array. Only valid
+				 * when *shared_wred_context_id* is non-NULL,
+				 * in which case it should be non-zero.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+
+	/** Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
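The statement that a child's WFQ weight is relative to the sum of the weights of its same-priority siblings can be illustrated with a short calculation. The sketch assumes that all siblings of that priority are backlogged and that no higher priority (lower number) sibling has traffic; the struct and function names are hypothetical and not part of this API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one child of a non-leaf node. */
struct example_child {
	uint32_t priority; /* 0 is the highest SP priority */
	uint32_t weight;   /* WFQ weight, 1 is the lowest */
};

/*
 * Ideal share of the parent's output that child idx receives within its
 * own priority level: weight / sum of weights of same-priority siblings.
 */
static double
example_wfq_share(const struct example_child *c, uint32_t n, uint32_t idx)
{
	uint64_t sum = 0;
	uint32_t i;

	assert(idx < n && c[idx].weight >= 1);
	for (i = 0; i < n; i++)
		if (c[i].priority == c[idx].priority)
			sum += c[i].weight;

	return (double)c[idx].weight / (double)sum;
}
```

With children {priority 0, weight 1}, {priority 1, weight 1} and {priority 1, weight 3}, the two priority 1 children split their level 25% / 75%, while the lone priority 0 child owns its level entirely.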
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SP_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
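Since both *cause* and *message* may be NULL, callers need to guard against that when reporting errors. A minimal sketch of such a guard follows; the struct is a hypothetical local mirror of struct rte_tm_error (with the type held as a plain int) and the function name is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical local mirror of struct rte_tm_error. */
struct example_tm_error {
	int type;            /* stands in for enum rte_tm_error_type */
	const void *cause;   /* may be NULL */
	const char *message; /* may be NULL */
};

/*
 * Format error details into buf without dereferencing a NULL message.
 * Returns the value of snprintf(), i.e. the untruncated output length.
 */
static int
example_tm_error_format(const struct example_tm_error *e, char *buf,
	size_t len)
{
	assert(buf != NULL && len > 0);
	if (e == NULL)
		return snprintf(buf, len, "no error details");
	return snprintf(buf, len, "type %d: %s", e->type,
		e->message != NULL ? e->message : "(no message)");
}
```

The message string is never copied or freed here, which matches the rule that it is owned by the PMD.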
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node ID validate and type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID value. Needs to be valid.
+ * @param[out] is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param[out] cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ *weight* to schedule its new
+ * child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param[in] parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[in] level_id
+ *   Level ID that should be met by this node. The hierarchy level of the
+ *   current node is already fully specified through its parent node (i.e. the
+ *   level of this node is equal to the level of its parent node plus one),
+ *   therefore the reason for providing this parameter is to enable the
+ *   application to perform step-by-step checking of the node level during
+ *   successive invocations of this function. When not desired, this check can
+ *   be disabled by assigning value RTE_TM_NODE_LEVEL_ID_ANY to this parameter.
+ * @param[in] params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_hierarchy_commit()
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ * @see RTE_TM_NODE_LEVEL_ID_ANY
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node. While the node is in suspended state, no packet is
+ * scheduled from this node and its descendants. The node exits the suspended
+ * state through the node resume operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_resume()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that is currently in suspended state. The node
+ * entered the suspended state as result of a previous node suspend operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_suspend()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy commit
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function typically performs the following steps:
+ *    a) It validates the start-up hierarchy that was previously defined for the
+ *       current port through successive rte_tm_node_add() invocations;
+ *    b) Assuming successful validation, it performs all the necessary port
+ *       specific configuration operations to install the specified hierarchy on
+ *       the current port, with immediate effect once the port is started.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that this function can still fail due to other causes (e.g. not enough
+ * memory available in the system, etc), even though the specified hierarchy is
+ * supported in principle by the current port.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_delete()
+ */
+int
+rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * This function can only be called after the rte_tm_hierarchy_commit()
+ * invocation. Its success depends on the port support for this operation, as
+ * advertised through the port capability set.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is zero. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
+ * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for the root node: its private shaper profile needs to be valid
+ * and single rate.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared rate shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ * @see RTE_TM_UPDATE_NODE_STATS
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node WFQ weight mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid non-leaf node ID.
+ * @param[in] wfq_weight_mode
+ *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
+ *   to be used for all priorities. When non-NULL, it points to a pre-allocated
+ *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
+ *   zero for packet-mode.
+ * @param[in] n_sp_priorities
+ *   Number of SP priorities.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
+ * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
+ */
+int
+rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] cman
+ *   Congestion management mode.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_CMAN
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
+ *   latter disabling the private WRED context of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param[out] stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param[in] clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_vlan_dei_supported
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as an alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
+ * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ * <pre>                   Class 1    Class 2    Class 3    Class 4   </pre>
+ * <pre>                 +----------+----------+----------+----------+</pre>
+ * <pre>Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |</pre>
+ * <pre>Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |</pre>
+ * <pre>High Drop Prec   |  001110  |  010110  |  011110  |  100110  |</pre>
+ * <pre>                 +----------+----------+----------+----------+</pre>
+ *
+ * There are 4 traffic classes (1 .. 4) encoded by DSCP bits 0 .. 2 and 3 drop
+ * priorities (low/medium/high) encoded by DSCP bits 3 and 4 (bit 0 = MSB).
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_dscp_supported
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
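
Illustration only, not part of the patch: the ECN and DSCP marking rules
documented for rte_tm_mark_ip_ecn() and rte_tm_mark_ip_dscp() above can be
sketched as two small helpers. All names below (tm_color, tm_ecn_mark,
tm_dscp_remark) are hypothetical and do not appear in this patch set, and the
sketch says nothing about how a given PMD or HW block implements marking. It
assumes the 6-bit DSCP layout of the RFC 2597 table quoted in the header, with
the drop precedence in the bits selected by mask 0x06.

```c
#include <stdint.h>

/* Hypothetical packet color type, for illustration only. */
enum tm_color { TM_GREEN = 0, TM_YELLOW = 1, TM_RED = 2 };

/*
 * ECN marking: an ECN-capable codepoint (2'b01 or 2'b10) becomes 2'b11
 * (congestion experienced) when marking is enabled for the packet color;
 * any other value is returned unchanged.
 */
static inline uint8_t
tm_ecn_mark(uint8_t ecn, int mark_enabled)
{
	if (mark_enabled && (ecn == 0x1 || ecn == 0x2))
		return 0x3;
	return ecn;
}

/*
 * DSCP marking: rewrite the drop precedence bits of a 6-bit DSCP value from
 * the packet color (green: 2'b01, yellow: 2'b10, red: 2'b11), keeping the
 * traffic class bits. The DSCP is returned unchanged when marking is
 * disabled for the packet color.
 */
static inline uint8_t
tm_dscp_remark(uint8_t dscp, enum tm_color color, int mark_enabled)
{
	uint8_t drop_prec = (uint8_t)color + 1;

	if (!mark_enabled)
		return dscp;
	/* 0x39 = 6'b111001 keeps everything except the drop precedence. */
	return (uint8_t)((dscp & 0x39) | (drop_prec << 1));
}
```

For example, a red packet carrying class 1 with low drop precedence (6'b001010)
would leave with 6'b001110 when red marking is enabled, and unchanged
otherwise.
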
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..a5b698f
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,366 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** @internal Traffic manager node ID validate and type get */
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager capabilities get */
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager level capabilities get */
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node capabilities get */
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager WRED profile add */
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager WRED profile delete */
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared WRED context add/update */
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared WRED context delete */
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shaper profile add */
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shaper profile delete */
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared shaper add/update */
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared shaper delete */
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node add */
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node delete */
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node suspend */
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node resume */
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager hierarchy commit */
+typedef int (*rte_tm_hierarchy_commit_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node parent update */
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shaper update */
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shared shaper update */
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node stats update */
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node WFQ weight mode update */
+typedef int (*rte_tm_node_wfq_weight_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node congestion management mode update */
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node WRED context update */
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shared WRED context update */
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager read stats counters for specific node */
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - VLAN DEI */
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+
+	/** Traffic manager capabilities_get */
+	rte_tm_capabilities_get_t capabilities_get;
+	/** Traffic manager level capabilities_get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy commit */
+	rte_tm_hierarchy_commit_t hierarchy_commit;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node WFQ weight mode update */
+	rte_tm_node_wfq_weight_mode_update_t node_wfq_weight_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for current node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to error structure (may be NULL).
+ * @param[in] code
+ *   Related error code (rte_errno).
+ * @param[in] type
+ *   Cause field and error type.
+ * @param[in] cause
+ *   Object responsible for the error.
+ * @param[in] message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.7.4


* Re: [PATCH v5 2/2] ethdev: add traffic management API
  2017-06-09 16:51         ` [PATCH v5 2/2] ethdev: add traffic management API Cristian Dumitrescu
@ 2017-06-12  3:36           ` Jerin Jacob
  2017-06-12 10:24             ` Dumitrescu, Cristian
  2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  1 sibling, 1 reply; 52+ messages in thread
From: Jerin Jacob @ 2017-06-12  3:36 UTC (permalink / raw)
  To: Cristian Dumitrescu
  Cc: dev, thomas, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

-----Original Message-----
> Date: Fri, 9 Jun 2017 17:51:15 +0100
> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> To: dev@dpdk.org
> CC: thomas@monjalon.net, jerin.jacob@caviumnetworks.com,
>  balasubramanian.manoharan@cavium.com, hemant.agrawal@nxp.com,
>  shreyansh.jain@nxp.com, jasvinder.singh@intel.com, wenzhuo.lu@intel.com
> Subject: [PATCH v5 2/2] ethdev: add traffic management API
> X-Mailer: git-send-email 2.7.4
> 
> This patch introduces the generic ethdev API for the traffic manager
> capability, which includes: hierarchical scheduling, traffic shaping,
> congestion management, packet marking.
> 
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow)
> - Capability query API per port, per level and per node
> - Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
> - Traffic shaping: single/dual rate, private (per node) and shared (by
>   multiple nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
> 
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Acked-by: Balasubramanian.Manoharan <balasubramanian.manoharan@caviumnetworks.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
> Changes in v5:
> - Implemented feedback from Jerin [8]
> 	- Add level parameter to node add API function
> 	- Doxygen: fixed comments applicable to field below/before
> 	- Doxygen: added missing @see
> 	- Doxygen: fixed hooks in doc/api/doxy-api-index.md
> 	- Doxygen: fixed table rendering
> 	- Added copyright on API header file from Cavium and NXP to
> 	  existing Intel copyright
> 	- MAINTAINERS: added next-tm tree
> - Added V4 ACKs from Jerin, Bala and Hemant
> 
> Changes in v4:
> - Implemented feedback from Hemant [6]
> 	- Capability API: Reworked the port, level and node capability API
> 	  data structure to remove confusion due to "summary across all
> 	  nodes" approach, which made it unclear whether a particular
> 	  capability is supported by all nodes or by at least one node.
> 	- Capability API: Added flags for "all nodes have identical
> 	  capability set"
> 	- Suspended state: documented the required behavior in Doxygen
> 	  description
> - Implemented feedback from Jerin [7]
> 	- Node add: added level parameter (see new API function:
> 	  rte_tm_node_add_check_level())
> 	- RTE_TM_ETH_FRAMING_OVERHEAD, RTE_TM_ETH_FRAMING_OVERHEAD_FCS:
> 	  documented their usage in their Doxygen description
> 	- Capability API: for each function, mention the related
> 	  capability field (Doxygen @see)
> 	- stats_mask, capability_mask: document the enum flags used to
> 	  build each mask (Doxygen @see)
> 	- Rename rte_tm_get_leaf_nodes() to
> 	  rte_tm_get_number_of_leaf_nodes()
> 	- Doxygen: add @param[in, out] to the description of all API funcs
> 	- Doxygen: fix hooks in doc/api/doxy-api-index.md
> - Rename rte_tm_hierarchy_set() to rte_tm_hierarchy_commit(), improved
>   Doxygen description
> - Node add, node delete: improved Doxygen description
> - Fixed incorrect design assumption that packet-based weight mode for WFQ
>   is identical to WRR. As a result, removed all references to WRR support.
>   Renamed the "scheduling mode" node parameters to "wfq_weight_mode".
> 
> Changes in v3:
> - Implemented feedback from Jerin [5]
> - Changed naming convention: scheddev -> tm
> - Improvements on the capability API:
> 	- Specification of marking capabilities per color
> 	- WFQ/WRR groups: sp_n_children_max ->
> 	  wfq_wrr_n_children_per_group_max, added wfq_wrr_n_groups_max,
> 	  improved description of both, improved description of
> 	  wfq_wrr_weight_max
> 	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent
> 	  update
> - Enforced/documented restrictions for root node (node_add() and
>   update())
> - Enforced/documented shaper profile restrictions on PIR: PIR != 0,
>   PIR >= CIR
> - Turned repetitive code in rte_tm.c into macro
> - Removed dependency on rte_red.h file (added RED params to rte_tm.h)
> - Color: removed "e_" from color names enum
> - Fixed small Doxygen style issues
> 
> Changes in v2:
> - Implemented feedback from Hemant [4]
> - Improvements on the capability API
> 	- Added capability API for hierarchy level
> 	- Merged stats capability into the capability API
> 	- Added dynamic updates
> 	- Added non-leaf/leaf union to the node capability structure
> 	- Renamed sp_priority_min to sp_n_priorities_max, added
> 	  clarifications
> 	- Fixed description for sp_n_children_max
> - Clarified and enforced rule on node ID range for leaf and non-leaf nodes
> 	- Added API functions to get node type (i.e. leaf/non-leaf):
> 	  get_leaf_nodes(), node_type_get()
> - Added clarification for the root node: its creation, parent, role
> 	- Macro NODE_ID_NULL as root node's parent
> 	- Description of the node_add() and node_parent_update() API funcs
> - Added clarification for the first time add vs. subsequent updates rule
> 	- Cleaned up the description for the node_add() function
> - Statistics API improvements
> 	- Merged stats capability into the capability API
> 	- Added API function node_stats_update()
> 	- Added more stats per packet color
> - Added more error types
> - Fixed small Doxygen style issues
> 
> Changes in v1 (since RFC [1]):
> - Implemented as ethdev plugin (similar to rte_flow) as opposed to more
>   monolithic additions to ethdev itself
> - Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
>   suggested items with only one exception, see the long list below,
>   hopefully nothing was forgotten.
>     - The item not done (hopefully for a good reason): driver-generated
>       object IDs. IMO the choice to have application-generated object IDs
>       adds marginal complexity to the driver (search ID function
>       required), but it provides huge simplification for the application.
>       The app does not need to worry about building & managing tree-like
>       structure for storing driver-generated object IDs, the app can use
>       its own convention for node IDs depending on the specific hierarchy
>       that it needs. Trivial example: identify all level-2 nodes with IDs
>       like 100, 200, 300, … and the level-3 nodes based on their level-2
>       parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …, 310, 320,
>       330, … and level-4 nodes based on their level-3 parents: 111, 112,
>       113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log
>       for the other related simplification that was implemented: leaf
>       nodes now have predefined IDs that are the same as their Ethernet
>       TX queue ID (therefore no translation is required for leaf nodes).
> - Capability API. Done per port and per node as well.
> - Dual rate shapers
> - Added configuration of private shaper (per node) directly from the
>   shaper profile as part of node API (no shaper ID needed for private
>   shapers), while the shared shapers are configured outside of the node
>   API using shaper profile and communicated to the node using shared
>   shaper ID. So there is no configuration overhead for shared shapers if
>   the app does not use any of them.
> - Leaf nodes now have predefined IDs that are the same as their Ethernet
>   TX queue ID (therefore no translation is required for leaf nodes). This
>   is also used to differentiate between a leaf node and a non-leaf node.
> - Domain-specific errors to give a precise indication of the error cause
>   (same as done by rte_flow)
> - Packet marking API
> - Packet length optional adjustment for shapers, positive (e.g. for adding
>   Ethernet framing overhead of 20 bytes) or negative (e.g. for rate
>   limiting based on IP packet bytes)
> 
> [1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
> [2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
> [3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
> [4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
> [5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
> [6] Hemant's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-March/062354.html
> [7] Jerin's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-April/063429.html
> [8] Jerin's feedback on v4: http://www.dpdk.org/ml/archives/dev/2017-May/066932.html
> 
> 
>  MAINTAINERS                            |    5 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_tm.c              |  438 ++++++++
>  lib/librte_ether/rte_tm.h              | 1899 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_tm_driver.h       |  366 ++++++
>  6 files changed, 2742 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_tm.c
>  create mode 100644 lib/librte_ether/rte_tm.h
>  create mode 100644 lib/librte_ether/rte_tm_driver.h

Please update the missing doxygen hooks in doc/api/doxy-api-index.md.

If it makes sense, then add (@see) for the exact capability field for the
following functions.

rte_tm_node_wred_context_update
rte_tm_node_shared_wred_context_update
rte_tm_node_shaper_update
rte_tm_node_shared_shaper_update


* Re: [PATCH v5 2/2] ethdev: add traffic management API
  2017-06-12  3:36           ` Jerin Jacob
@ 2017-06-12 10:24             ` Dumitrescu, Cristian
  0 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-06-12 10:24 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, thomas, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Singh, Jasvinder, Lu, Wenzhuo

<snip>...
> 
> Please update the missing doxygen hooks in doc/api/doxy-api-index.md.
> 

Not sure why this file change did not make it into the patch; I will triple-check it for the V6 that I am about to send. Thanks for checking!

> If it makes sense, then add (@see) for the exact capability field for the
> following functions.
> 
> rte_tm_node_wred_context_update
> rte_tm_node_shared_wred_context_update
> rte_tm_node_shaper_update
> rte_tm_node_shared_shaper_update

OK, will do in V6.



* [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management
  2017-06-09 16:51         ` [PATCH v5 2/2] ethdev: add traffic management API Cristian Dumitrescu
  2017-06-12  3:36           ` Jerin Jacob
@ 2017-06-12 13:35           ` Cristian Dumitrescu
  2017-06-12 13:35             ` [PATCH v6 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
                               ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-12 13:35 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

This patch set introduces an ethdev-based abstraction layer for Quality of
Service (QoS) Traffic Management, which includes: hierarchical scheduling,
traffic shaping, congestion management, packet marking. The goal is to
provide a simple generic API that is agnostic of the underlying HW, SW or
mixed HW-SW implementation.

Patch 1 uses the approach introduced by rte_flow in DPDK to extend the
ethdev functionality in a modular way for traffic management.

Patch 2 introduces the generic ethdev API for traffic management.

Cristian Dumitrescu (2):
  ethdev: add traffic management ops get API
  ethdev: add traffic management API

 MAINTAINERS                            |    5 +
 doc/api/doxy-api-index.md              |    2 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ethdev.c          |   12 +
 lib/librte_ether/rte_ethdev.h          |   20 +
 lib/librte_ether/rte_ether_version.map |   36 +
 lib/librte_ether/rte_tm.c              |  438 ++++++++
 lib/librte_ether/rte_tm.h              | 1904 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  366 ++++++
 9 files changed, 2787 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

-- 
2.7.4


* [PATCH v6 1/2] ethdev: add traffic management ops get API
  2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
@ 2017-06-12 13:35             ` Cristian Dumitrescu
  2017-06-12 13:35             ` [PATCH v6 2/2] ethdev: add traffic management API Cristian Dumitrescu
  2017-06-27 13:24             ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Dumitrescu, Cristian
  2 siblings, 0 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-12 13:35 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

The rte_flow feature breaks the monolithic approach for ethdev by
introducing the new rte_flow API to ethdev using a plugin-like approach.

Basically, the rte_flow API is still logically part of ethdev:
- It extends the ethdev functionality: rte_flow is a new feature/
  capability of ethdev;
- all its functions work on an Ethernet device: the first parameter of the
  rte_flow functions is Ethernet device port ID.

Also, the rte_flow API is a sort of capability plugin for ethdev:
- the rte_flow API functions have their own name space: they are called
  rte_flow_operationXYZ() as opposed to rte_eth_dev_flow_operationXYZ();
- the rte_flow API functions are placed in separate files in the same
  librte_ether folder as opposed to rte_ethdev.[hc].

The way it works is by using the existing ethdev API function
rte_eth_dev_filter_ctrl() to query the current Ethernet device port ID for
the support of the rte_flow capability and return the pointer to the
rte_flow operations when supported and NULL otherwise:

struct rte_flow_ops *eth_flow_ops;
int ret = rte_eth_dev_filter_ctrl(eth_port_id,
	RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, &eth_flow_ops);

This patch reuses the same approach for ethdev Traffic Management API.
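
To make the pattern concrete, here is a minimal, self-contained sketch of the
"ops get" mechanism described above. Every demo_* name is a hypothetical
stand-in so that it builds without DPDK; the real code uses struct rte_eth_dev,
struct eth_dev_ops and rte_eth_dev_tm_ops_get(). The port table size and the
single-callback ops table are assumptions made for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TM callback table with a single operation. */
struct demo_tm_ops {
	int (*node_delete)(uint32_t node_id);
};

struct demo_eth_dev;

struct demo_dev_ops {
	/* Optional hook: left NULL by drivers without TM support. */
	int (*tm_ops_get)(struct demo_eth_dev *dev, void *ops);
};

struct demo_eth_dev {
	const struct demo_dev_ops *dev_ops;
};

#define DEMO_MAX_PORTS 2
static struct demo_eth_dev demo_devices[DEMO_MAX_PORTS];

/* Same shape as the new ethdev function: validate the port, then delegate. */
static int
demo_tm_ops_get(uint8_t port_id, void *ops)
{
	struct demo_eth_dev *dev;

	if (port_id >= DEMO_MAX_PORTS)
		return -ENODEV;

	dev = &demo_devices[port_id];
	if (dev->dev_ops == NULL || dev->dev_ops->tm_ops_get == NULL)
		return -ENOTSUP;

	return dev->dev_ops->tm_ops_get(dev, ops);
}

/* Driver side: a toy PMD that publishes its TM callback table. */
static int
demo_pmd_node_delete(uint32_t node_id)
{
	(void)node_id;
	return 0;
}

static const struct demo_tm_ops demo_pmd_tm_ops = {
	.node_delete = demo_pmd_node_delete,
};

static int
demo_pmd_tm_ops_get(struct demo_eth_dev *dev, void *ops)
{
	(void)dev;
	/* The caller passes the address of a "const struct demo_tm_ops *". */
	*(const struct demo_tm_ops **)ops = &demo_pmd_tm_ops;
	return 0;
}

static const struct demo_dev_ops demo_pmd_dev_ops = {
	.tm_ops_get = demo_pmd_tm_ops_get,
};
```

A port whose driver leaves the hook NULL simply reports -ENOTSUP, so the
generic layer needs no knowledge of which drivers implement the capability.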

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Keith Wiles <keith.wiles@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in v4:
- Followed up on suggestion from Thomas: Replaced generic capability
  ethdev API function with traffic management specific function
  rte_eth_dev_tm_ops_get()

Changes in v3:
- Followed up on suggestion from Jerin: renamed capability from
  Hierarchical Scheduler (sched) to Traffic Manager (tm)

Changes in v2:
- Followed up on suggestion from Jerin and Hemant: renamed
  capability_control() to capability_ops_get()
- Added ACK from Keith, Jerin and Hemant

 lib/librte_ether/rte_ethdev.c          | 12 ++++++++++++
 lib/librte_ether/rte_ethdev.h          | 20 ++++++++++++++++++++
 lib/librte_ether/rte_ether_version.map |  6 ++++++
 3 files changed, 38 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 64aefdd..7d2c7e1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3020,6 +3020,18 @@ rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 	return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op, arg);
 }
 
+int
+rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tm_ops_get, -ENOTSUP);
+	return (*dev->dev_ops->tm_ops_get)(dev, ops);
+}
+
 void *
 rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
 		rte_rx_callback_fn fn, void *user_param)
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2784ad1..78beb60 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1441,6 +1441,9 @@ typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 void *arg);
 /**< @internal Take operations to assigned filter type on an Ethernet device */
 
+typedef int (*eth_tm_ops_get_t)(struct rte_eth_dev *dev, void *ops);
+/**< @internal Get Traffic Management (TM) operations on an Ethernet device */
+
 typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
 				 struct rte_eth_dcb_info *dcb_info);
 /**< @internal Get dcb information on an Ethernet device */
@@ -1573,6 +1576,9 @@ struct eth_dev_ops {
 	/**< Get extended device statistic values by ID. */
 	eth_xstats_get_names_by_id_t xstats_get_names_by_id;
 	/**< Get name of extended device statistics by ID. */
+
+	eth_tm_ops_get_t tm_ops_get;
+	/**< Get Traffic Management (TM) operations. */
 };
 
 /**
@@ -4105,6 +4111,20 @@ int rte_eth_dev_filter_ctrl(uint8_t port_id, enum rte_filter_type filter_type,
 			enum rte_filter_op filter_op, void *arg);
 
 /**
+ * Get Traffic Management (TM) operations on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param ops
+ *   Pointer to TM operations.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+int rte_eth_dev_tm_ops_get(uint8_t port_id, void *ops);
+
+/**
  * Get DCB information on an Ethernet device.
  *
  * @param port_id
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d6726bb..2788e7b 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -156,3 +156,9 @@ DPDK_17.05 {
 	rte_eth_xstats_get_names_by_id;
 
 } DPDK_17.02;
+
+DPDK_17.08 {
+    global:
+
+	rte_eth_dev_tm_ops_get;
+} DPDK_17.05;
-- 
2.7.4


* [PATCH v6 2/2] ethdev: add traffic management API
  2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-06-12 13:35             ` [PATCH v6 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
@ 2017-06-12 13:35             ` Cristian Dumitrescu
  2017-06-27 13:24             ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Dumitrescu, Cristian
  2 siblings, 0 replies; 52+ messages in thread
From: Cristian Dumitrescu @ 2017-06-12 13:35 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, jasvinder.singh, wenzhuo.lu

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow)
- Capability query API per port, per level and per node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ)
- Traffic shaping: single/dual rate, private (per node) and shared (by
  multiple nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Acked-by: Balasubramanian.Manoharan <balasubramanian.manoharan@caviumnetworks.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in v6:
- Implemented feedback from Jerin [9]
	- Doxygen: improved @see to point to specific capability fields
	- Doxygen: fixed hooks in doc/api/doxy-api-index.md

Changes in v5:
- Implemented feedback from Jerin [8]
	- Add level parameter to node add API function
	- Doxygen: fixed comments applicable to field below/before
	- Doxygen: added missing @see
	- Doxygen: fixed hooks in doc/api/doxy-api-index.md
	- Doxygen: fixed table rendering
	- Added copyright on API header file from Cavium and NXP to
	  existing Intel copyright
	- MAINTAINERS: added next-tm tree
- Added V4 ACKs from Jerin, Bala and Hemant

Changes in v4:
- Implemented feedback from Hemant [6]
	- Capability API: Reworked the port, level and node capability API
	  data structure to remove confusion due to "summary across all
	  nodes" approach, which made it unclear whether a particular
	  capability is supported by all nodes or by at least one node.
	- Capability API: Added flags for "all nodes have identical
	  capability set"
	- Suspended state: documented the required behavior in Doxygen
	  description
- Implemented feedback from Jerin [7]
	- Node add: added level parameter (see new API function:
	  rte_tm_node_add_check_level())
	- RTE_TM_ETH_FRAMING_OVERHEAD, RTE_TM_ETH_FRAMING_OVERHEAD_FCS:
	  documented their usage in their Doxygen description
	- Capability API: for each function, mention the related
	  capability field (Doxygen @see)
	- stats_mask, capability_mask: document the enum flags used to
	  build each mask (Doxygen @see)
	- Rename rte_tm_get_leaf_nodes() to
	  rte_tm_get_number_of_leaf_nodes()
	- Doxygen: add @param[in, out] to the description of all API funcs
	- Doxygen: fix hooks in doc/api/doxy-api-index.md
- Rename rte_tm_hierarchy_set() to rte_tm_hierarchy_commit(), improved
  Doxygen description
- Node add, node delete: improved Doxygen description
- Fixed incorrect design assumption that packet-based weight mode for WFQ
  is identical to WRR. As a result, removed all references to WRR support.
  Renamed the "scheduling mode" node parameters to "wfq_weight_mode".

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max ->
	  wfq_wrr_n_children_per_group_max, added wfq_wrr_n_groups_max,
	  improved description of both, improved description of
	  wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent
	  update
- Enforced/documented restrictions for root node (node_add() and
  update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0,
  PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues
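
The shaper profile restriction listed above (PIR != 0, PIR >= CIR) can be
sketched as a small validation helper. This is only an illustration: the
struct and function names below are hypothetical and are not part of the
patch, which carries these rates inside its own shaper profile structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a shaper profile: only the two
 * rates (bytes per second) matter for this check.
 */
struct example_shaper_rates {
	uint64_t cir; /* committed information rate */
	uint64_t pir; /* peak information rate */
};

/* Return 0 when the rates satisfy PIR != 0 and PIR >= CIR, -1 otherwise. */
static int
example_shaper_rates_check(const struct example_shaper_rates *r)
{
	assert(r != NULL);

	if (r->pir == 0)
		return -1; /* PIR must be non-zero */

	if (r->pir < r->cir)
		return -1; /* PIR must not be lower than CIR */

	return 0;
}
```

Under this sketch a single rate shaper would set CIR equal to PIR, which
passes the check, while a profile with PIR of zero is rejected.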

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added
	  clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, parent, role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API funcs
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues
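
The node ID range rule mentioned above can be sketched as follows, assuming
(per the v1 notes) that leaf node IDs coincide with the Ethernet TX queue
IDs, so that they occupy the range 0 .. (number of TX queues - 1). The
helper names are hypothetical; in the patch the node type is reported by
the driver through rte_tm_node_type_get().

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors RTE_TM_NODE_ID_NULL from the patch: parent ID of the root node. */
#define EXAMPLE_NODE_ID_NULL UINT32_MAX

/* Assumption: leaf node IDs are the TX queue IDs, so any ID below the
 * number of TX queues is a leaf and any other valid ID is a non-leaf node.
 */
static int
example_node_is_leaf(uint32_t node_id, uint32_t n_tx_queues)
{
	assert(node_id != EXAMPLE_NODE_ID_NULL);

	return node_id < n_tx_queues;
}

/* The root node is the only node whose parent is the NULL node ID. */
static int
example_node_is_root(uint32_t parent_node_id)
{
	return parent_node_id == EXAMPLE_NODE_ID_NULL;
}
```

With four TX queues, IDs 0 to 3 are classified as leaf nodes and an ID such
as 100 as a non-leaf node.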

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. All the suggested
  items were implemented with only one exception (see the long list
  below); hopefully nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated
      object IDs. IMO the choice to have application-generated object IDs
      adds marginal complexity to the driver (search ID function
      required), but it provides a huge simplification for the
      application. The app does not need to worry about building and
      managing a tree-like structure for storing driver-generated object
      IDs; the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, …, the level-3 nodes
      based on their level-2 parents (110, 120, 130, 140, …, 210, 220,
      230, 240, …, 310, 320, 330, …) and the level-4 nodes based on their
      level-3 parents (111, 112, 113, 114, …, 121, 122, 123, 124, …).
      Moreover, see the change log for the other related simplification
      that was implemented: leaf nodes now have predefined IDs that are
      the same as their Ethernet TX queue ID (therefore no translation is
      required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the
  shaper profile as part of node API (no shaper ID needed for private
  shapers), while the shared shapers are configured outside of the node
  API using shaper profile and communicated to the node using shared
  shaper ID. So there is no configuration overhead for shared shapers if
  the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet
  TX queue ID (therefore no translation is required for leaf nodes). This
  is also used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause
  (same as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate
  limiting based on IP packet bytes)
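
The optional packet length adjustment described in the last item can be
sketched as below. The value 20 matches RTE_TM_ETH_FRAMING_OVERHEAD from
the patch (preamble 7 + SFD 1 + IFG 12). The helper name is hypothetical,
and the -14 used in the note that follows assumes an untagged 14-byte
Ethernet header on a packet length that excludes the FCS.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as RTE_TM_ETH_FRAMING_OVERHEAD: preamble 7 + SFD 1 + IFG 12. */
#define EXAMPLE_ETH_FRAMING_OVERHEAD 20

/* Length charged to the shaper for one packet: the packet length plus a
 * signed adjustment, which can be positive or negative.
 */
static uint32_t
example_shaper_pkt_length(uint32_t pkt_len, int32_t adjust)
{
	int64_t len = (int64_t)pkt_len + adjust;

	/* A negative adjustment must not exceed the packet length. */
	assert(len >= 0);

	return (uint32_t)len;
}
```

For a 1000-byte packet, an adjustment of +20 charges 1020 bytes to the
shaper (wire-level rate limiting), while an adjustment of -14 charges 986
bytes (rate limiting on IP packet bytes under the assumption above).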

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin's feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant's feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html
[6] Hemant's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-March/062354.html
[7] Jerin's feedback on v3: http://www.dpdk.org/ml/archives/dev/2017-April/063429.html
[8] Jerin's feedback on v4: http://www.dpdk.org/ml/archives/dev/2017-May/066932.html


 MAINTAINERS                            |    5 +
 doc/api/doxy-api-index.md              |    2 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  438 ++++++++
 lib/librte_ether/rte_tm.h              | 1904 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  366 ++++++
 7 files changed, 2749 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f6095ef..3c7414f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -240,6 +240,11 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Management API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+T: git://dpdk.org/next/dpdk-next-tm
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f5f1f19..bcd0fdd 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -41,6 +41,8 @@ There are many libraries, so their headers may be grouped by topics:
   [ethctrl]            (@ref rte_eth_ctrl.h),
   [rte_flow]           (@ref rte_flow.h),
   [rte_flow_driver]    (@ref rte_flow_driver.h),
+  [rte_tm]             (@ref rte_tm.h),
+  [rte_tm_driver]      (@ref rte_tm_driver.h),
   [cryptodev]          (@ref rte_cryptodev.h),
   [eventdev]           (@ref rte_eventdev.h),
   [devargs]            (@ref rte_devargs.h),
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 93fdde1..db692ae 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -56,5 +57,7 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 2788e7b..5e8651d 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -161,4 +161,34 @@ DPDK_17.08 {
     global:
 
 	rte_eth_dev_tm_ops_get;
+	rte_tm_get_number_of_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_commit;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_wfq_weight_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 } DPDK_17.05;
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..7167965
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,438 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->tm_ops_get == NULL) ||
+		(dev->dev_ops->tm_ops_get(dev, &ops) != 0) ||
+		(ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({							\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)					\
+		return -rte_errno;			\
+							\
+	if (ops->func == NULL)				\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,				\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,				\
+			rte_strerror(ENOSYS));		\
+							\
+	ops->func;					\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, level_id,
+		params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Commit the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_commit)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update WFQ weight mode */
+int rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wfq_weight_mode_update)(dev,
+		node_id, wfq_weight_mode, n_sp_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..c8ef2e1
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1904 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   Copyright(c) 2017 Cavium.
+ *   Copyright(c) 2017 NXP.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Ethernet framing overhead.
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ *
+ * One of the typical values for the *pkt_length_adjust* field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead including the Frame Check Sequence (FCS) field.
+ * Useful when FCS is generated and added at the end of the Ethernet frame on
+ * TX side without any SW intervention.
+ *
+ * One of the typical values for the *pkt_length_adjust* field of the shaper
+ * profile.
+ *
+ * @see struct rte_tm_shaper_params
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**
+ * Invalid WRED profile ID.
+ *
+ * @see struct rte_tm_node_params
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_wred_context_update()
+ */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**
+ * Invalid shaper profile ID.
+ *
+ * @see struct rte_tm_node_params
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_shaper_update()
+ */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**
+ * Node ID for the parent of the root node.
+ *
+ * @see rte_tm_node_add()
+ */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Node level ID used to disable level ID checking.
+ *
+ * @see rte_tm_node_add()
+ */
+#define RTE_TM_NODE_LEVEL_ID_ANY                     UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/** Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/** Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/** Number of green packets dropped by current leaf node. */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/** Number of yellow packets dropped by current leaf node. */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/** Number of red packets dropped by current leaf node. */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/** Number of green bytes dropped by current leaf node. */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/** Number of yellow bytes dropped by current leaf node. */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/** Number of red bytes dropped by current leaf node. */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/** Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/** Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/** Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/** Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/** Statistics counters for leaf nodes only. */
+	struct {
+		/** Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/** Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/** Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/** Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/** Dynamic parent node update. The new parent node is located on the
+	 * same hierarchy level as the former parent node. Consequently, the
+	 * node whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/** Dynamic parent node update. The new parent node is located on a
+	 * different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/** Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/** Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/** Dynamic switch between byte-based and packet-based WFQ weights. */
+	RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE = 1 << 4,
+
+	/** Dynamic update on number of SP priorities. */
+	RTE_TM_UPDATE_NODE_N_SP_PRIORITIES = 1 << 5,
+
+	/** Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+
+	/** Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 7,
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/** Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/** Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/** When non-zero, this flag indicates that all the non-leaf nodes
+	 * (with the exception of the root node) have the same capability set.
+	 */
+	int non_leaf_nodes_identical;
+
+	/** When non-zero, this flag indicates that all the leaf nodes have
+	 * the same capability set.
+	 */
+	int leaf_nodes_identical;
+
+	/** Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resources between private and
+	 * shared shapers, it is typically equal to the sum of
+	 * *shaper_private_n_max* and *shaper_shared_n_max*. The
+	 * value of zero indicates that traffic shaping is not supported.
+	 */
+	uint32_t shaper_n_max;
+
+	/** Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have their private shaper enabled. The
+	 * value of zero indicates that private shapers are not supported.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/** Maximum number of private shapers that support dual rate shaping.
+	 * Indicates the maximum number of nodes that can concurrently have
+	 * their private shaper enabled with dual rate support. Only valid when
+	 * private shapers are supported. The value of zero indicates that dual
+	 * rate shaping is not available for private shapers. The maximum value
+	 * is *shaper_private_n_max*.
+	 */
+	uint32_t shaper_private_dual_rate_n_max;
+
+	/** Minimum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for any private
+	 * shaper. Valid only when private shapers are supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/** Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/** Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_per_shaper_max;
+
+	/** Maximum number of shared shapers a node can be part of. This
+	 * parameter indicates that there is at least one node that can be
+	 * configured with this many shared shapers, which might not be true for
+	 * all the nodes. Only valid when shared shapers are supported, in which
+	 * case it ranges from 1 to *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_shared_n_shapers_per_node_max;
+
+	/** Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/** Minimum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for any shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/** Minimum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/** Maximum value allowed for packet length adjustment for any private
+	 * or shared shaper.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/** Maximum number of children nodes. This parameter indicates that
+	 * there is at least one non-leaf node that can be configured with this
+	 * many children nodes, which might not be true for all the non-leaf
+	 * nodes.
+	 */
+	uint32_t sched_n_children_max;
+
+	/** Maximum number of supported priority levels. This parameter
+	 * indicates that there is at least one non-leaf node that can be
+	 * configured with this many priority levels for managing its children
+	 * nodes, which might not be true for all the non-leaf nodes. The value
+	 * of zero is invalid. The value of 1 indicates that only priority 0 is
+	 * supported, which essentially means that Strict Priority (SP)
+	 * algorithm is not supported.
+	 */
+	uint32_t sched_sp_n_priorities_max;
+
+	/** Maximum number of sibling nodes that can have the same priority at
+	 * any given time, i.e. maximum size of the WFQ sibling node group. This
+	 * parameter indicates there is at least one non-leaf node that meets
+	 * this condition, which might not be true for all the non-leaf nodes.
+	 * The value of zero is invalid. The value of 1 indicates that WFQ
+	 * algorithm is not supported. The maximum value is
+	 * *sched_n_children_max*.
+	 */
+	uint32_t sched_wfq_n_children_per_group_max;
+
+	/** Maximum number of priority levels that can have more than one child
+	 * node at any given time, i.e. maximum number of WFQ sibling node
+	 * groups that have two or more members. This parameter indicates there
+	 * is at least one non-leaf node that meets this condition, which might
+	 * not be true for all the non-leaf nodes. The value of zero states that
+	 * WFQ algorithm is not supported. The value of 1 indicates that
+	 * (*sched_sp_n_priorities_max* - 1) priority levels have at most one
+	 * child node, so there can be only one priority level with two or
+	 * more sibling nodes making up a WFQ group. The maximum value is:
+	 * min(floor(*sched_n_children_max* / 2), *sched_sp_n_priorities_max*).
+	 */
+	uint32_t sched_wfq_n_groups_max;
+
+	/** Maximum WFQ weight. The value of 1 indicates that all sibling nodes
+	 * with same priority have the same WFQ weight, so WFQ is reduced to FQ.
+	 */
+	uint32_t sched_wfq_weight_max;
+
+	/** Head drop algorithm support. When non-zero, this parameter
+	 * indicates that there is at least one leaf node that supports the head
+	 * drop algorithm, which might not be true for all the leaf nodes.
+	 */
+	int cman_head_drop_supported;
+
+	/** Maximum number of WRED contexts, either private or shared. In case
+	 * the implementation does not share any resources between private and
+	 * shared WRED contexts, it is typically equal to the sum of
+	 * *cman_wred_context_private_n_max* and
+	 * *cman_wred_context_shared_n_max*. The value of zero indicates that
+	 * WRED is not supported.
+	 */
+	uint32_t cman_wred_context_n_max;
+
+	/** Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have their private WRED
+	 * context enabled. The value of zero indicates that private WRED
+	 * contexts are not supported.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/** Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/** Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_per_context_max;
+
+	/** Maximum number of shared WRED contexts a leaf node can be part of.
+	 * This parameter indicates that there is at least one leaf node that
+	 * can be configured with this many shared WRED contexts, which might
+	 * not be true for all the leaf nodes. Only valid when shared WRED
+	 * contexts are supported, in which case it ranges from 1 to
+	 * *cman_wred_context_shared_n_max*.
+	 */
+	uint32_t cman_wred_context_shared_n_contexts_per_node_max;
+
+	/** Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/** Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/** Set of supported dynamic update operations.
+	 * @see enum rte_tm_dynamic_update_type
+	 */
+	uint64_t dynamic_update_mask;
+
+	/** Set of supported statistics counter types.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/** Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/** Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/** Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/** When non-zero, this flag indicates that all the non-leaf nodes on
+	 * this level have an identical capability set. Valid only when
+	 * *n_nodes_nonleaf_max* is non-zero.
+	 */
+	int non_leaf_nodes_identical;
+
+	/** When non-zero, this flag indicates that all the leaf nodes on this
+	 * level have an identical capability set. Valid only when
+	 * *n_nodes_leaf_max* is non-zero.
+	 */
+	int leaf_nodes_identical;
+
+	union {
+		/** Items valid only for the non-leaf nodes on this level. */
+		struct {
+			/** Private shaper support. When non-zero, it indicates
+			 * there is at least one non-leaf node on this level
+			 * with private shaper support, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/** Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the non-leaf
+			 * nodes on the current level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level with dual rate private shaper support, which
+			 * may not be the case for all the non-leaf nodes on
+			 * this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/** Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/** Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the non-leaf nodes on this level.
+			 * Valid only when private shaper is supported on this
+			 * level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/** Maximum number of shared shapers that any non-leaf
+			 * node on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the non-leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one non-leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the non-leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/** Maximum number of children nodes. This parameter
+			 * indicates that there is at least one non-leaf node on
+			 * this level that can be configured with this many
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level.
+			 */
+			uint32_t sched_n_children_max;
+
+			/** Maximum number of supported priority levels. This
+			 * parameter indicates that there is at least one
+			 * non-leaf node on this level that can be configured
+			 * with this many priority levels for managing its
+			 * children nodes, which might not be true for all the
+			 * non-leaf nodes on this level. The value of zero is
+			 * invalid. The value of 1 indicates that only priority
+			 * 0 is supported, which essentially means that Strict
+			 * Priority (SP) algorithm is not supported on this
+			 * level.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/** Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size of
+			 * the WFQ sibling node group. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which may not be true for
+			 * all the non-leaf nodes on this level. The value of
+			 * zero is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported on this level. The maximum
+			 * value is *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/** Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that
+			 * have two or more members. This parameter indicates
+			 * there is at least one non-leaf node on this level
+			 * that meets this condition, which might not be true
+			 * for all the non-leaf nodes. The value of zero states
+			 * that WFQ algorithm is not supported on this level.
+			 * The value of 1 indicates that
+			 * (*sched_sp_n_priorities_max* - 1) priority levels on
+			 * this level have at most one child node, so there can
+			 * be only one priority level with two or more sibling
+			 * nodes making up a WFQ group on this level. The
+			 * maximum value is:
+			 * min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/** Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes on this level with same priority
+			 * have the same WFQ weight, so on this level WFQ is
+			 * reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+
+			/** Mask of statistics counter types supported by the
+			 * non-leaf nodes on this level. Every supported
+			 * statistics counter type is supported by at least one
+			 * non-leaf node on this level, which may not be true
+			 * for all the non-leaf nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} nonleaf;
+
+		/** Items valid only for the leaf nodes on this level. */
+		struct {
+			/** Private shaper support. When non-zero, it indicates
+			 * there is at least one leaf node on this level with
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_supported;
+
+			/** Dual rate support for private shaper. Valid only
+			 * when private shaper is supported for the leaf nodes
+			 * on this level. When non-zero, it indicates there is
+			 * at least one leaf node on this level with dual rate
+			 * private shaper support, which may not be the case for
+			 * all the leaf nodes on this level.
+			 */
+			int shaper_private_dual_rate_supported;
+
+			/** Minimum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_min;
+
+			/** Maximum committed/peak rate (bytes per second) for
+			 * private shapers of the leaf nodes on this level.
+			 * Valid only when private shaper is supported for the
+			 * leaf nodes on this level.
+			 */
+			uint64_t shaper_private_rate_max;
+
+			/** Maximum number of shared shapers that any leaf node
+			 * on this level can be part of. The value of zero
+			 * indicates that shared shapers are not supported by
+			 * the leaf nodes on this level. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level that meets this condition, which may not be the
+			 * case for all the leaf nodes on this level.
+			 */
+			uint32_t shaper_shared_n_max;
+
+			/** Head drop algorithm support. When non-zero, this
+			 * parameter indicates that there is at least one leaf
+			 * node on this level that supports the head drop
+			 * algorithm, which might not be true for all the leaf
+			 * nodes on this level.
+			 */
+			int cman_head_drop_supported;
+
+			/** Private WRED context support. When non-zero, it
+			 * indicates there is at least one leaf node on this
+			 * level with private WRED context support, which may
+			 * not be true for all the leaf nodes on this level.
+			 */
+			int cman_wred_context_private_supported;
+
+			/** Maximum number of shared WRED contexts that any
+			 * leaf node on this level can be part of. The value of
+			 * zero indicates that shared WRED contexts are not
+			 * supported by the leaf nodes on this level. When
+			 * non-zero, it indicates there is at least one leaf
+			 * node on this level that meets this condition, which
+			 * may not be the case for all the leaf nodes on this
+			 * level.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+
+			/** Mask of statistics counter types supported by the
+			 * leaf nodes on this level. Every supported statistics
+			 * counter type is supported by at least one leaf node
+			 * on this level, which may not be true for all the leaf
+			 * nodes on this level.
+			 * @see enum rte_tm_stats_type
+			 */
+			uint64_t stats_mask;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/** Private shaper support for the current node. */
+	int shaper_private_supported;
+
+	/** Dual rate shaping support for private shaper of current node.
+	 * Valid only when private shaper is supported by the current node.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/** Minimum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/** Maximum committed/peak rate (bytes per second) for private
+	 * shaper of current node. Valid only when private shaper is supported
+	 * by the current node.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/** Maximum number of shared shapers the current node can be part of.
+	 * The value of zero indicates that shared shapers are not supported by
+	 * the current node.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	union {
+		/** Items valid only for non-leaf nodes. */
+		struct {
+			/** Maximum number of children nodes. */
+			uint32_t sched_n_children_max;
+
+			/** Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sched_sp_n_priorities_max;
+
+			/** Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ
+			 * algorithm is not supported. The maximum value is
+			 * *sched_n_children_max*.
+			 */
+			uint32_t sched_wfq_n_children_per_group_max;
+
+			/** Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ sibling node groups that have
+			 * two or more members. The value of zero states that
+			 * WFQ algorithm is not supported. The value of 1
+			 * indicates that (*sched_sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so there
+			 * can be only one priority level with two or more
+			 * sibling nodes making up a WFQ group. The maximum
+			 * value is: min(floor(*sched_n_children_max* / 2),
+			 * *sched_sp_n_priorities_max*).
+			 */
+			uint32_t sched_wfq_n_groups_max;
+
+			/** Maximum WFQ weight. The value of 1 indicates that
+			 * all sibling nodes with same priority have the same
+			 * WFQ weight, so WFQ is reduced to FQ.
+			 */
+			uint32_t sched_wfq_weight_max;
+		} nonleaf;
+
+		/** Items valid only for leaf nodes. */
+		struct {
+			/** Head drop algorithm support for current node. */
+			int cman_head_drop_supported;
+
+			/** Private WRED context support for current node. */
+			int cman_wred_context_private_supported;
+
+			/** Maximum number of shared WRED contexts the current
+			 * node can be part of. The value of zero indicates that
+			 * shared WRED contexts are not supported by the current
+			 * node.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+
+	/** Mask of statistics counter types supported by the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to *head
+ * drop* algorithm, which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/** Minimum queue threshold */
+	uint16_t min_th;
+
+	/** Maximum queue threshold */
+	uint16_t max_th;
+
+	/** Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/** Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
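+
+/*
+ * Illustrative sketch, not part of the API: one way to read the RED fields
+ * above, assuming the classic RED averaging scheme. The variables q (current
+ * queue size) and avg (averaged queue size) are hypothetical, the threshold
+ * values are arbitrary, and a given driver may compute things differently.
+ *
+ *	struct rte_tm_red_params p = {
+ *		.min_th = 32,
+ *		.max_th = 128,
+ *		.maxp_inv = 10,	// maxp = 1 / 10 = 0.1
+ *		.wq_log2 = 9,	// wq = 1 / (2 ^ 9) = 1 / 512
+ *	};
+ *	double q = 64.0, avg = 0.0;
+ *
+ *	// Exponentially weighted moving average: avg += wq * (q - avg)
+ *	avg += (q - avg) / (double)(1u << p.wq_log2);
+ */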
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/** One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/** Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/** Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be bigger than zero, as well as greater than
+ * or equal to the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/** Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/** Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/** Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
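+
+/*
+ * Illustrative sketch, not part of the API: how the single rate and dual rate
+ * rules above might translate into profile values. Rates and bucket sizes are
+ * arbitrary example numbers; setting the size of a disabled committed bucket
+ * to zero is an assumption, since only its rate is described above.
+ *
+ *	// Single rate: committed bucket disabled (rate 0); the peak bucket
+ *	// limits the rate to 125,000,000 bytes per second (1 Gbps).
+ *	struct rte_tm_shaper_params single_rate = {
+ *		.committed = { .rate = 0, .size = 0 },
+ *		.peak = { .rate = 125000000, .size = 4096 },
+ *		.pkt_length_adjust = RTE_TM_ETH_FRAMING_OVERHEAD_FCS,
+ *	};
+ *
+ *	// Dual rate: peak rate is non-zero and >= committed rate.
+ *	struct rte_tm_shaper_params dual_rate = {
+ *		.committed = { .rate = 62500000, .size = 4096 },
+ *		.peak = { .rate = 125000000, .size = 4096 },
+ *		.pkt_length_adjust = RTE_TM_ETH_FRAMING_OVERHEAD_FCS,
+ *	};
+ */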
+
+/**
+ * Node parameters
+ *
+ * Each non-leaf node has multiple inputs (its children nodes) and a single
+ * output (which is input to its parent node). It arbitrates its inputs using
+ * Strict Priority (SP) and Weighted Fair Queuing (WFQ) algorithms to schedule
+ * input packets to its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as Weighted Round Robin (WRR), Byte-level WRR, Deficit WRR
+ * (DWRR), etc. are considered approximations of the WFQ ideal and are
+ * assimilated to WFQ, although an associated implementation-dependent trade-off
+ * on accuracy, performance and resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP algorithm
+ * based on their priority, with zero (0) as the highest priority. Children with
+ * the same priority are scheduled using the WFQ algorithm according to their
+ * weights. The WFQ weight of a given child node is relative to the sum of the
+ * weights of all its sibling nodes that have the same priority, with one (1) as
+ * the lowest weight. For each SP priority, the WFQ weight mode can be set as
+ * either byte-based or packet-based.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port. Hence,
+ * the leaf nodes are predefined, with their node IDs set to 0 .. (N-1), where N
+ * is the number of TX queues configured for the current Ethernet port. The
+ * non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/** Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/** User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/** Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	union {
+		/** Parameters only valid for non-leaf nodes. */
+		struct {
+			/** WFQ weight mode for each SP priority. When NULL, it
+			 * indicates that WFQ is to be used for all priorities.
+			 * When non-NULL, it points to a pre-allocated array of
+			 * *n_sp_priorities* values, with non-zero value for
+			 * byte-mode and zero for packet-mode.
+			 */
+			int *wfq_weight_mode;
+
+			/** Number of SP priorities. */
+			uint32_t n_sp_priorities;
+		} nonleaf;
+
+		/** Parameters only valid for leaf nodes. */
+		struct {
+			/** Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/** WRED parameters (only valid when *cman* is set to
+			 * WRED).
+			 */
+			struct {
+				/** WRED profile for private WRED context. The
+				 * absence of a private WRED context for the
+				 * current leaf node is indicated by value
+				 * RTE_TM_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/** User allocated array of shared WRED context
+				 * IDs. When set to NULL, it indicates that the
+				 * current leaf node should not currently be
+				 * part of any shared WRED contexts.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/** Number of elements in the
+				 * *shared_wred_context_id* array. Only valid
+				 * when *shared_wred_context_id* is non-NULL,
+				 * in which case it should be non-zero.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+
+	/** Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 * @see enum rte_tm_stats_type
+	 */
+	uint64_t stats_mask;
+};
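+
+/*
+ * Illustrative sketch, not part of the API: parameters for a non-leaf node
+ * with no private shaper, no shared shapers, four SP priorities and byte-mode
+ * WFQ on each of them. The values are examples only; the array name is
+ * hypothetical.
+ *
+ *	int wfq_weight_mode[4] = { 1, 1, 1, 1 };	// non-zero: byte-mode
+ *
+ *	struct rte_tm_node_params np = {
+ *		.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE,
+ *		.shared_shaper_id = NULL,
+ *		.n_shared_shapers = 0,
+ *		.nonleaf = {
+ *			.wfq_weight_mode = wfq_weight_mode,
+ *			.n_sp_priorities = 4,
+ *		},
+ *		.stats_mask = 0,	// all statistics counters disabled
+ *	};
+ */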
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SP_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_number_of_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node ID validate and type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID value. Needs to be valid.
+ * @param[out] is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
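+
+/*
+ * Illustrative sketch, not part of the API: picking an ID for the first
+ * non-leaf node so that it falls outside the range reserved for leaf nodes.
+ * port_id is assumed to identify a configured port; first_nonleaf_id is a
+ * hypothetical name.
+ *
+ *	struct rte_tm_error error;
+ *	uint32_t n_leaf_nodes, first_nonleaf_id;
+ *
+ *	if (rte_tm_get_number_of_leaf_nodes(port_id, &n_leaf_nodes, &error))
+ *		return -1;
+ *
+ *	// Leaf node IDs are 0 .. (n_leaf_nodes - 1), so any ID starting at
+ *	// n_leaf_nodes is available for application-defined non-leaf nodes.
+ *	first_nonleaf_id = n_leaf_nodes;
+ */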
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
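+
+/*
+ * Illustrative sketch, not part of the API: querying the port capabilities
+ * and reporting a failure through struct rte_tm_error. port_id is assumed to
+ * identify a configured port. The message pointer is checked because it may
+ * be NULL regardless of the error type.
+ *
+ *	struct rte_tm_capabilities cap;
+ *	struct rte_tm_error error;
+ *
+ *	if (rte_tm_capabilities_get(port_id, &cap, &error)) {
+ *		printf("TM capabilities: error type %d (%s)\n",
+ *			(int)error.type,
+ *			error.message ? error.message : "no message");
+ *		return -1;
+ *	}
+ *
+ *	// A value of zero marks the feature as unsupported.
+ *	if (cap.shaper_shared_n_max == 0)
+ *		printf("shared shapers not supported\n");
+ *	if (cap.cman_wred_context_n_max == 0)
+ *		printf("WRED not supported\n");
+ */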
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param[out] cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_n_max
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_n_max
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_shared_n_max
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_shared_n_max
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param[in] profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_n_max
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_n_max
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_shared_n_max
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_shared_n_max
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ *weight* to schedule its new
+ * child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param[in] parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[in] level_id
+ *   Level ID that should be met by this node. The hierarchy level of the
+ *   current node is already fully specified through its parent node (i.e. the
+ *   level of this node is equal to the level of its parent node plus one),
+ *   therefore the reason for providing this parameter is to enable the
+ *   application to perform step-by-step checking of the node level during
+ *   successive invocations of this function. When not desired, this check can
+ *   be disabled by assigning value RTE_TM_NODE_LEVEL_ID_ANY to this parameter.
+ * @param[in] params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_hierarchy_commit()
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ * @see RTE_TM_NODE_LEVEL_ID_ANY
+ * @see struct rte_tm_capabilities
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * When called before rte_tm_hierarchy_commit() invocation, this function is
+ * typically used to define the initial start-up hierarchy for the port.
+ * Provided that dynamic hierarchy updates are supported by the current port (as
+ * advertised in the port capability set), this function can also be called
+ * after the rte_tm_hierarchy_commit() invocation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_ADD_DELETE
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node. While the node is in suspended state, no packet is
+ * scheduled from this node and its descendants. The node exits the suspended
+ * state through the node resume operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_resume()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that is currently in suspended state. The node
+ * entered the suspended state as a result of a previous node suspend operation.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_suspend()
+ * @see RTE_TM_UPDATE_NODE_SUSPEND_RESUME
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy commit
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function typically performs the following steps:
+ *    a) It validates the start-up hierarchy that was previously defined for the
+ *       current port through successive rte_tm_node_add() invocations;
+ *    b) Assuming successful validation, it performs all the necessary port
+ *       specific configuration operations to install the specified hierarchy on
+ *       the current port, with immediate effect once the port is started.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that this function can still fail due to other causes (e.g. not enough
+ * memory available in the system, etc), even though the specified hierarchy is
+ * supported in principle by the current port.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see rte_tm_node_add()
+ * @see rte_tm_node_delete()
+ */
+int
+rte_tm_hierarchy_commit(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * This function can only be called after the rte_tm_hierarchy_commit()
+ * invocation. Its success depends on the port support for this operation, as
+ * advertised through the port capability set.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param[in] priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param[in] weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL
+ * @see RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for the root node: its private shaper profile needs to be valid
+ * and single rate.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_private_n_max
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared rate shapers.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::shaper_shared_n_max
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[in] stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ * @see RTE_TM_UPDATE_NODE_STATS
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node WFQ weight mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] wfq_weight_mode
+ *   WFQ weight mode for each SP priority. When NULL, it indicates that WFQ is
+ *   to be used for all priorities. When non-NULL, it points to a pre-allocated
+ *   array of *n_sp_priorities* values, with non-zero value for byte-mode and
+ *   zero for packet-mode.
+ * @param[in] n_sp_priorities
+ *   Number of SP priorities.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_WFQ_WEIGHT_MODE
+ * @see RTE_TM_UPDATE_NODE_N_SP_PRIORITIES
+ */
+int
+rte_tm_node_wfq_weight_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] cman
+ *   Congestion management mode.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see RTE_TM_UPDATE_NODE_CMAN
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with the
+ *   latter disabling the private WRED context of the current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_private_n_max
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param[in] shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param[in] add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::cman_wred_context_shared_n_max
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] node_id
+ *   Node ID. Needs to be valid.
+ * @param[out] stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param[out] stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param[in] clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see enum rte_tm_stats_type
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_vlan_dei_supported
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as an alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_ecn_tcp_supported
+ * @see struct rte_tm_capabilities::mark_ip_ecn_sctp_supported
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ * <pre>                   Class 1    Class 2    Class 3    Class 4   </pre>
+ * <pre>                 +----------+----------+----------+----------+</pre>
+ * <pre>Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |</pre>
+ * <pre>Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |</pre>
+ * <pre>High Drop Prec   |  001110  |  010110  |  011110  |  100110  |</pre>
+ * <pre>                 +----------+----------+----------+----------+</pre>
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 0 .. 2,
+ * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[in] mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param[in] mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param[in] mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param[out] error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ *
+ * @see struct rte_tm_capabilities::mark_ip_dscp_supported
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
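
The ECN and DSCP marking rules documented above for rte_tm_mark_ip_ecn() and
rte_tm_mark_ip_dscp() can be sketched as two pure functions. This is only an
illustration of the documented bit manipulation, not code from this patch: the
enum and the names ecn_mark() and dscp_mark() are hypothetical, the per-color
enable flags are passed as a plain array, and the TCP/SCTP protocol check
required for ECN marking is assumed to be done by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet colors for this sketch (not taken from rte_tm.h). */
enum color { GREEN = 0, YELLOW = 1, RED = 2 };

/*
 * ECN marking: the ECN field is the 2 least significant bits of the IPv4 TOS /
 * IPv6 TC byte. ECN-capable values (2'b01, 2'b10) become 2'b11 (congestion
 * experienced) when marking is enabled for the packet color. Any other value
 * is left as is.
 */
static uint8_t
ecn_mark(uint8_t tos, enum color c, const int enabled[3])
{
	uint8_t ecn = tos & 0x3;

	assert(c <= RED);
	if (enabled[c] && (ecn == 0x1 || ecn == 0x2))
		tos |= 0x3;
	return tos;
}

/*
 * DSCP marking: the 6-bit DSCP value is laid out as cccdd0 (class, drop
 * precedence, zero). The color is written into the two drop precedence bits:
 * green 2'b01, yellow 2'b10, red 2'b11. The class bits are preserved.
 */
static uint8_t
dscp_mark(uint8_t dscp, enum color c, const int enabled[3])
{
	static const uint8_t drop_prec[3] = { 0x1, 0x2, 0x3 };

	assert(c <= RED);
	if (!enabled[c])
		return dscp;
	return (uint8_t)((dscp & ~0x06u) | (uint8_t)(drop_prec[c] << 1));
}
```

With marking enabled for all colors, a red packet carrying DSCP 001010
(class 1, low drop precedence) is rewritten to 001110 (class 1, high drop
precedence), which matches the table in the rte_tm_mark_ip_dscp() comment.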
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..a5b698f
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,366 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs. They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** @internal Traffic manager node ID validate and type get */
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager capabilities get */
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager level capabilities get */
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node capabilities get */
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager WRED profile add */
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager WRED profile delete */
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared WRED context add or update */
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared WRED context delete */
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shaper profile add */
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shaper profile delete */
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared shaper add/update */
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager shared shaper delete */
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node add */
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	uint32_t level_id,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node delete */
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node suspend */
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node resume */
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager hierarchy commit */
+typedef int (*rte_tm_hierarchy_commit_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node parent update */
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shaper update */
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shared shaper update */
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node stats update */
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node WFQ weight mode update */
+typedef int (*rte_tm_node_wfq_weight_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *wfq_weight_mode,
+	uint32_t n_sp_priorities,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node congestion management mode update */
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node WRED context update */
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager node shared WRED context update */
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager read stats counters for specific node */
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - VLAN DEI */
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/** @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+
+	/** Traffic manager capabilities_get */
+	rte_tm_capabilities_get_t capabilities_get;
+	/** Traffic manager level capabilities_get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy commit */
+	rte_tm_hierarchy_commit_t hierarchy_commit;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node WFQ weight mode update */
+	rte_tm_node_wfq_weight_mode_update_t node_wfq_weight_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for specific node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param[out] error
+ *   Pointer to error structure (may be NULL).
+ * @param[in] code
+ *   Related error code (rte_errno).
+ * @param[in] type
+ *   Cause field and error type.
+ * @param[in] cause
+ *   Object responsible for the error.
+ * @param[in] message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param[in] port_id
+ *   The port identifier of the Ethernet device.
+ * @param[out] error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.7.4
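
The driver header above pairs a table of optional callbacks (struct rte_tm_ops) with an error helper (rte_tm_error_set). The sketch below shows one plausible way the two fit together. It is a standalone illustration under stated assumptions, not the code in rte_tm.c: every type and function here (tm_error, tm_ops, eth_dev, toy_node_delete, tm_node_delete, tm_node_suspend) is a hypothetical stand-in, plain errno replaces rte_errno, and the choice of -ENOSYS for a missing callback and of negating the helper's positive return value are assumed conventions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK types; NOT the real definitions. */
enum tm_error_type {
	TM_ERROR_TYPE_NONE = 0,
	TM_ERROR_TYPE_UNSPECIFIED,
	TM_ERROR_TYPE_NODE_ID,
};

struct tm_error {
	enum tm_error_type type;
	const void *cause;
	const char *message;
};

/* Toy device: valid node IDs are 0 .. n_nodes - 1. */
struct eth_dev {
	uint32_t n_nodes;
};

/* Same shape as rte_tm_error_set() in the patch: fill the optional error
 * structure, record the code (errno stands in for rte_errno), return it. */
static int
tm_error_set(struct tm_error *error, int code, enum tm_error_type type,
	     const void *cause, const char *message)
{
	if (error) {
		*error = (struct tm_error){
			.type = type,
			.cause = cause,
			.message = message,
		};
	}
	errno = code;
	return code;
}

typedef int (*tm_node_op_t)(struct eth_dev *dev, uint32_t node_id,
			    struct tm_error *error);

/* A two-entry ops table; a driver fills in only what it supports. */
struct tm_ops {
	tm_node_op_t node_delete;
	tm_node_op_t node_suspend;
};

/* Toy driver callback: rejects node IDs that are out of range. */
static int
toy_node_delete(struct eth_dev *dev, uint32_t node_id, struct tm_error *error)
{
	if (node_id >= dev->n_nodes)
		return -tm_error_set(error, EINVAL, TM_ERROR_TYPE_NODE_ID,
				     NULL, "node id out of range");
	return 0;
}

/* node_suspend is deliberately left NULL (not supported by the toy driver). */
static const struct tm_ops toy_ops = {
	.node_delete = toy_node_delete,
};

/* Generic wrappers: dispatch through the table, or report -ENOSYS when the
 * table or the callback is missing (assumed convention). */
static int
tm_node_delete(const struct tm_ops *ops, struct eth_dev *dev,
	       uint32_t node_id, struct tm_error *error)
{
	assert(dev != NULL);
	if (ops == NULL || ops->node_delete == NULL)
		return -tm_error_set(error, ENOSYS, TM_ERROR_TYPE_UNSPECIFIED,
				     NULL, "node delete not implemented");
	return ops->node_delete(dev, node_id, error);
}

static int
tm_node_suspend(const struct tm_ops *ops, struct eth_dev *dev,
		uint32_t node_id, struct tm_error *error)
{
	assert(dev != NULL);
	if (ops == NULL || ops->node_suspend == NULL)
		return -tm_error_set(error, ENOSYS, TM_ERROR_TYPE_UNSPECIFIED,
				     NULL, "node suspend not implemented");
	return ops->node_suspend(dev, node_id, error);
}
```

With a toy device of four nodes, tm_node_delete(&toy_ops, &dev, 2, &err) returns 0, the same call with node ID 9 returns -EINVAL and fills err, and tm_node_suspend() returns -ENOSYS because the toy driver leaves that callback unset. Passing NULL for the error pointer is also safe, since the helper only writes through it when it is non-NULL.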


* Re: [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management
  2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
  2017-06-12 13:35             ` [PATCH v6 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
  2017-06-12 13:35             ` [PATCH v6 2/2] ethdev: add traffic management API Cristian Dumitrescu
@ 2017-06-27 13:24             ` Dumitrescu, Cristian
  2 siblings, 0 replies; 52+ messages in thread
From: Dumitrescu, Cristian @ 2017-06-27 13:24 UTC (permalink / raw)
  To: dev
  Cc: thomas, jerin.jacob, balasubramanian.manoharan, hemant.agrawal,
	shreyansh.jain, Singh, Jasvinder, Lu, Wenzhuo

> This patch set introduces an ethdev-based abstraction layer for Quality of
> Service (QoS) Traffic Management, which includes: hierarchical scheduling,
> traffic shaping, congestion management, packet marking. The goal is to
> provide a simple generic API that is agnostic of the underlying HW, SW or
> mixed HW-SW implementation.
> 
> Patch 1 uses the approach introduced by rte_flow in DPDK to extend the
> ethdev functionality in a modular way for traffic management.
> 
> Patch 2 introduces the generic ethdev API for traffic management.
> 
> Cristian Dumitrescu (2):
>   ethdev: add traffic management ops get API
>   ethdev: add traffic management API

Series applied to dpdk-next-tm/master.

Many thanks to all the people involved in creating the DPDK Traffic Management API!

Thread overview: 52+ messages
2017-03-04  1:10 [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-03-04  1:10 ` [PATCH v3 1/2] ethdev: add capability control API Cristian Dumitrescu
2017-03-06 10:32   ` Thomas Monjalon
2017-03-06 16:35     ` Dumitrescu, Cristian
2017-03-06 16:57       ` Thomas Monjalon
2017-03-06 18:28         ` Dumitrescu, Cristian
2017-03-06 20:21           ` Thomas Monjalon
2017-03-06 20:41             ` Wiles, Keith
2017-03-06 20:54               ` Stephen Hemminger
2017-03-07 10:14                 ` Dumitrescu, Cristian
2017-03-07 12:56                   ` Thomas Monjalon
2017-03-07 19:17                     ` Wiles, Keith
2017-03-06 16:36     ` Dumitrescu, Cristian
2017-05-19 17:12   ` [PATCH v4 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
2017-05-19 17:12     ` [PATCH v4 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
2017-06-09 16:51       ` [PATCH v5 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
2017-06-09 16:51         ` [PATCH v5 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
2017-06-09 16:51         ` [PATCH v5 2/2] ethdev: add traffic management API Cristian Dumitrescu
2017-06-12  3:36           ` Jerin Jacob
2017-06-12 10:24             ` Dumitrescu, Cristian
2017-06-12 13:35           ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Cristian Dumitrescu
2017-06-12 13:35             ` [PATCH v6 1/2] ethdev: add traffic management ops get API Cristian Dumitrescu
2017-06-12 13:35             ` [PATCH v6 2/2] ethdev: add traffic management API Cristian Dumitrescu
2017-06-27 13:24             ` [PATCH v6 0/2] ethdev: abstraction layer for QoS traffic management Dumitrescu, Cristian
2017-05-19 17:12     ` [PATCH v4 2/2] ethdev: add traffic management API Cristian Dumitrescu
2017-05-19 17:34       ` Stephen Hemminger
2017-05-22 14:25         ` Dumitrescu, Cristian
2017-05-24 11:28       ` Hemant Agrawal
2017-05-31 13:45       ` Jerin Jacob
2017-05-31 17:05         ` Manoharan, Balasubramanian
2017-03-04  1:10 ` [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-03-06 10:38   ` Thomas Monjalon
2017-03-06 16:59     ` Dumitrescu, Cristian
2017-03-06 20:07       ` Thomas Monjalon
2017-03-07 19:29         ` Dumitrescu, Cristian
2017-03-08  9:51           ` O'Driscoll, Tim
2017-03-10 18:37             ` Dumitrescu, Cristian
2017-03-15 12:43               ` Thomas Monjalon
2017-03-16 16:23                 ` Dumitrescu, Cristian
2017-03-16 17:29                   ` Thomas Monjalon
2017-03-16 17:40                     ` Dumitrescu, Cristian
2017-03-16 18:10                       ` Thomas Monjalon
2017-03-16 19:06                         ` Dumitrescu, Cristian
2017-03-24 19:55                           ` Dumitrescu, Cristian
2017-03-06 16:15   ` Stephen Hemminger
2017-03-06 18:17     ` Dumitrescu, Cristian
2017-03-16 17:35   ` Thomas Monjalon
2017-03-30 10:32   ` Hemant Agrawal
2017-04-07 16:51     ` Dumitrescu, Cristian
2017-04-07 13:20   ` Jerin Jacob
2017-04-07 17:47     ` Dumitrescu, Cristian
2017-04-10 14:00       ` Jerin Jacob
