All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next Patch v2 0/5] octeontx2-pf: HTB offload support
@ 2023-01-18 10:51 Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 1/5] sch_htb: Allow HTB priority parameter in offload mode Hariprasad Kelam
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

octeontx2 silicon and CN10K transmit interface consists of five
transmit levels starting from MDQ, TL4 to TL1. Once packets are
submitted to MDQ, hardware picks all active MDQs using strict
priority, and MDQs having the same priority level are chosen using
round robin. Each packet will traverse MDQ, TL4 to TL1 levels.
Each level contains an array of queues to support scheduling and
shaping.

As HTB supports classful queuing mechanism by supporting rate and
ceil and allow the user to control the absolute bandwidth to
particular classes of traffic the same can be achieved by
configuring shapers and schedulers on different transmit levels.

This series of patches adds support for HTB offload,

Patch1: Allow strict priority parameter in HTB offload mode.

Patch2: defines APIs such that the driver can dynamically initialize/
        deinitialize the send queues.

Patch3: Refactors transmit alloc/free calls as preparation for QOS
        offload code.

Patch4: Adds devlink support for the user to configure round-robin
        priority at TL1

Hariprasad Kelam (2):
  octeontx2-pf: Refactor schedular queue alloc/free calls
  octeontx2-pf: Add devlink support to configure TL1 RR_PRIO

Naveen Mamindlapalli (2):
  sch_htb: Allow HTB priority parameter in offload mode
  octeontx2-pf: Add support for HTB offload

Subbaraya Sundeep (1):
  octeontx2-pf: qos send queues management

V2 * ensure other drivers won't effect by allowing 'prio'
     parameter in htb offload mode.

 .../ethernet/marvell/octeontx2/af/common.h    |    2 +-
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |    9 +-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   15 +
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |    1 +
 .../marvell/octeontx2/af/rvu_debugfs.c        |    5 +
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   |   84 +-
 .../ethernet/marvell/octeontx2/nic/Makefile   |    2 +-
 .../marvell/octeontx2/nic/otx2_common.c       |  115 +-
 .../marvell/octeontx2/nic/otx2_common.h       |   30 +-
 .../marvell/octeontx2/nic/otx2_devlink.c      |   84 +
 .../marvell/octeontx2/nic/otx2_ethtool.c      |   31 +-
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |   93 +-
 .../ethernet/marvell/octeontx2/nic/otx2_reg.h |   13 +
 .../ethernet/marvell/octeontx2/nic/otx2_tc.c  |    7 +-
 .../marvell/octeontx2/nic/otx2_txrx.c         |   27 +-
 .../marvell/octeontx2/nic/otx2_txrx.h         |    3 +-
 .../ethernet/marvell/octeontx2/nic/otx2_vf.c  |    8 +-
 .../net/ethernet/marvell/octeontx2/nic/qos.c  | 1547 +++++++++++++++++
 .../net/ethernet/marvell/octeontx2/nic/qos.h  |   71 +
 .../ethernet/marvell/octeontx2/nic/qos_sq.c   |  304 ++++
 .../net/ethernet/mellanox/mlx5/core/en/qos.c  |    6 +
 include/net/pkt_cls.h                         |    1 +
 net/sched/sch_htb.c                           |    7 +-
 23 files changed, 2378 insertions(+), 87 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos.c
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c

--
2.17.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net-next Patch v2 1/5] sch_htb: Allow HTB priority parameter in offload mode
  2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
@ 2023-01-18 10:51 ` Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 2/5] octeontx2-pf: qos send queues management Hariprasad Kelam
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

From: Naveen Mamindlapalli <naveenm@marvell.com>

The current implementation of HTB offload returns the EINVAL error
for unsupported parameters like prio and quantum. This patch removes
the error returning checks for 'prio' parameter and populates its
value to tc_htb_qopt_offload structure such that driver can use the
same.

Add prio parameter check in mlx5 driver, as mlx5 devices are not capable
of supporting the prio parameter when htb offload is used. Report error
if prio parameter is set to a non-default value.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Co-developed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
---
v2*  ensure other drivers won't effect by allowing 'prio'
     parameter in htb offload mode


 drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 6 ++++++
 include/net/pkt_cls.h                            | 1 +
 net/sched/sch_htb.c                              | 7 +++----
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
index 2842195ee548..e16b3d6ea471 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
@@ -379,6 +379,12 @@ int mlx5e_htb_setup_tc(struct mlx5e_priv *priv, struct tc_htb_qopt_offload *htb_
 	if (!htb && htb_qopt->command != TC_HTB_CREATE)
 		return -EINVAL;

+	if (htb_qopt->prio) {
+		NL_SET_ERR_MSG(htb_qopt->extack,
+			       "prio parameter is not supported by device with HTB offload enabled.");
+		return -EINVAL;
+	}
+
 	switch (htb_qopt->command) {
 	case TC_HTB_CREATE:
 		if (!mlx5_qos_is_supported(priv->mdev)) {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 4cabb32a2ad9..02afb1baf39d 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -864,6 +864,7 @@ struct tc_htb_qopt_offload {
 	u16 qid;
 	u64 rate;
 	u64 ceil;
+	u8 prio;
 };

 #define TC_HTB_CLASSID_ROOT U32_MAX
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 2238edece1a4..f2d034cdd7bd 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1806,10 +1806,6 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			NL_SET_ERR_MSG(extack, "HTB offload doesn't support the quantum parameter");
 			goto failure;
 		}
-		if (hopt->prio) {
-			NL_SET_ERR_MSG(extack, "HTB offload doesn't support the prio parameter");
-			goto failure;
-		}
 	}

 	/* Keeping backward compatible with rate_table based iproute2 tc */
@@ -1905,6 +1901,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 					TC_HTB_CLASSID_ROOT,
 				.rate = max_t(u64, hopt->rate.rate, rate64),
 				.ceil = max_t(u64, hopt->ceil.rate, ceil64),
+				.prio = hopt->prio,
 				.extack = extack,
 			};
 			err = htb_offload(dev, &offload_opt);
@@ -1925,6 +1922,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 					TC_H_MIN(parent->common.classid),
 				.rate = max_t(u64, hopt->rate.rate, rate64),
 				.ceil = max_t(u64, hopt->ceil.rate, ceil64),
+				.prio = hopt->prio,
 				.extack = extack,
 			};
 			err = htb_offload(dev, &offload_opt);
@@ -2010,6 +2008,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 				.classid = cl->common.classid,
 				.rate = max_t(u64, hopt->rate.rate, rate64),
 				.ceil = max_t(u64, hopt->ceil.rate, ceil64),
+				.prio = hopt->prio,
 				.extack = extack,
 			};
 			err = htb_offload(dev, &offload_opt);
--
2.17.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next Patch v2 2/5] octeontx2-pf: qos send queues management
  2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 1/5] sch_htb: Allow HTB priority parameter in offload mode Hariprasad Kelam
@ 2023-01-18 10:51 ` Hariprasad Kelam
  2023-01-22 16:46   ` Simon Horman
  2023-01-18 10:51 ` [net-next Patch v2 3/5] octeontx2-pf: Refactor schedular queue alloc/free calls Hariprasad Kelam
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

From: Subbaraya Sundeep <sbhatta@marvell.com>

Current implementation is such that the number of Send queues (SQs)
are decided on the device probe which is equal to the number of online
cpus. These SQs are allocated and deallocated in interface open and c
lose calls respectively.

This patch defines new APIs for initializing and deinitializing Send
queues dynamically and allocates more number of transmit queues for
QOS feature.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
---
 .../marvell/octeontx2/af/rvu_debugfs.c        |   5 +
 .../ethernet/marvell/octeontx2/nic/Makefile   |   2 +-
 .../marvell/octeontx2/nic/otx2_common.c       |  35 ++-
 .../marvell/octeontx2/nic/otx2_common.h       |  22 +-
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  42 ++-
 .../marvell/octeontx2/nic/otx2_txrx.c         |  27 +-
 .../marvell/octeontx2/nic/otx2_txrx.h         |   3 +-
 .../ethernet/marvell/octeontx2/nic/otx2_vf.c  |   4 +-
 .../net/ethernet/marvell/octeontx2/nic/qos.h  |  19 ++
 .../ethernet/marvell/octeontx2/nic/qos_sq.c   | 290 ++++++++++++++++++
 10 files changed, 416 insertions(+), 33 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos.h
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
index fa280ebd3052..53ad8faf1969 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
@@ -1224,6 +1224,11 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 
 	for (aura = id; aura < max_id; aura++) {
 		aq_req.aura_id = aura;
+
+		/* Skip if queue is uninitialized */
+		if (ctype == NPA_AQ_CTYPE_POOL && !test_bit(aura, pfvf->pool_bmap))
+			continue;
+
 		seq_printf(m, "======%s : %d=======\n",
 			   (ctype == NPA_AQ_CTYPE_AURA) ? "AURA" : "POOL",
 			aq_req.aura_id);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/Makefile b/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
index 73fdb8798614..3d31ddf7c652 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_OCTEONTX2_VF) += rvu_nicvf.o otx2_ptp.o
 
 rvu_nicpf-y := otx2_pf.o otx2_common.o otx2_txrx.o otx2_ethtool.o \
                otx2_flows.o otx2_tc.o cn10k.o otx2_dmac_flt.o \
-               otx2_devlink.o
+               otx2_devlink.o qos_sq.o
 rvu_nicvf-y := otx2_vf.o otx2_devlink.o
 
 rvu_nicpf-$(CONFIG_DCB) += otx2_dcbnl.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 88f8772a61cd..0868ae825736 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -758,11 +758,16 @@ int otx2_txschq_stop(struct otx2_nic *pfvf)
 void otx2_sqb_flush(struct otx2_nic *pfvf)
 {
 	int qidx, sqe_tail, sqe_head;
+	struct otx2_snd_queue *sq;
 	u64 incr, *ptr, val;
 	int timeout = 1000;
 
 	ptr = (u64 *)otx2_get_regaddr(pfvf, NIX_LF_SQ_OP_STATUS);
-	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues; qidx++) {
+	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues;
+	     qidx++) {
+		sq = &pfvf->qset.sq[qidx];
+		if (!sq->sqb_ptrs)
+			continue;
 		incr = (u64)qidx << 32;
 		while (timeout) {
 			val = otx2_atomic64_add(incr, ptr);
@@ -862,7 +867,7 @@ int otx2_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura)
 	return otx2_sync_mbox_msg(&pfvf->mbox);
 }
 
-static int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
+int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
 {
 	struct otx2_qset *qset = &pfvf->qset;
 	struct otx2_snd_queue *sq;
@@ -935,9 +940,17 @@ static int otx2_cq_init(struct otx2_nic *pfvf, u16 qidx)
 		cq->cint_idx = qidx - pfvf->hw.rx_queues;
 		cq->cqe_cnt = qset->sqe_cnt;
 	} else {
-		cq->cq_type = CQ_XDP;
-		cq->cint_idx = qidx - non_xdp_queues;
-		cq->cqe_cnt = qset->sqe_cnt;
+		if (pfvf->hw.xdp_queues &&
+		    qidx < non_xdp_queues + pfvf->hw.xdp_queues) {
+			cq->cq_type = CQ_XDP;
+			cq->cint_idx = qidx - non_xdp_queues;
+			cq->cqe_cnt = qset->sqe_cnt;
+		} else {
+			cq->cq_type = CQ_QOS;
+			cq->cint_idx = qidx - non_xdp_queues -
+				       pfvf->hw.xdp_queues;
+			cq->cqe_cnt = qset->sqe_cnt;
+		}
 	}
 	cq->cqe_size = pfvf->qset.xqe_size;
 
@@ -1097,7 +1110,7 @@ int otx2_config_nix(struct otx2_nic *pfvf)
 
 	/* Set RQ/SQ/CQ counts */
 	nixlf->rq_cnt = pfvf->hw.rx_queues;
-	nixlf->sq_cnt = pfvf->hw.tot_tx_queues;
+	nixlf->sq_cnt = pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues;
 	nixlf->cq_cnt = pfvf->qset.cq_cnt;
 	nixlf->rss_sz = MAX_RSS_INDIR_TBL_SIZE;
 	nixlf->rss_grps = MAX_RSS_GROUPS;
@@ -1135,7 +1148,7 @@ void otx2_sq_free_sqbs(struct otx2_nic *pfvf)
 	int sqb, qidx;
 	u64 iova, pa;
 
-	for (qidx = 0; qidx < hw->tot_tx_queues; qidx++) {
+	for (qidx = 0; qidx < hw->tot_tx_queues + hw->tc_tx_queues; qidx++) {
 		sq = &qset->sq[qidx];
 		if (!sq->sqb_ptrs)
 			continue;
@@ -1203,8 +1216,8 @@ void otx2_aura_pool_free(struct otx2_nic *pfvf)
 	pfvf->qset.pool = NULL;
 }
 
-static int otx2_aura_init(struct otx2_nic *pfvf, int aura_id,
-			  int pool_id, int numptrs)
+int otx2_aura_init(struct otx2_nic *pfvf, int aura_id,
+		   int pool_id, int numptrs)
 {
 	struct npa_aq_enq_req *aq;
 	struct otx2_pool *pool;
@@ -1280,8 +1293,8 @@ static int otx2_aura_init(struct otx2_nic *pfvf, int aura_id,
 	return 0;
 }
 
-static int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
-			  int stack_pages, int numptrs, int buf_size)
+int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
+		   int stack_pages, int numptrs, int buf_size)
 {
 	struct npa_aq_enq_req *aq;
 	struct otx2_pool *pool;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index 5bee3c3a7ce4..b142cc685a28 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -27,6 +27,7 @@
 #include "otx2_txrx.h"
 #include "otx2_devlink.h"
 #include <rvu_trace.h>
+#include "qos.h"
 
 /* IPv4 flag more fragment bit */
 #define IPV4_FLAG_MORE				0x20
@@ -189,6 +190,7 @@ struct otx2_hw {
 	u16                     rx_queues;
 	u16                     tx_queues;
 	u16                     xdp_queues;
+	u16			tc_tx_queues;
 	u16                     tot_tx_queues;
 	u16			max_queues;
 	u16			pool_cnt;
@@ -501,6 +503,8 @@ struct otx2_nic {
 	u16			pfc_schq_list[NIX_TXSCH_LVL_CNT][MAX_TXSCHQ_PER_FUNC];
 	bool			pfc_alloc_status[NIX_PF_PFC_PRIO_MAX];
 #endif
+	/* qos */
+	struct otx2_qos		qos;
 
 	/* napi event count. It is needed for adaptive irq coalescing. */
 	u32 napi_events;
@@ -886,12 +890,19 @@ static inline void otx2_dma_unmap_page(struct otx2_nic *pfvf,
 
 static inline u16 otx2_get_smq_idx(struct otx2_nic *pfvf, u16 qidx)
 {
+	u16 smq;
+
 #ifdef CONFIG_DCB
 	if (qidx < NIX_PF_PFC_PRIO_MAX && pfvf->pfc_alloc_status[qidx])
 		return pfvf->pfc_schq_list[NIX_TXSCH_LVL_SMQ][qidx];
 #endif
+	/* check if qidx falls under QOS queues */
+	if (qidx >= pfvf->hw.tot_tx_queues)
+		smq = pfvf->qos.qid_to_sqmap[qidx - pfvf->hw.tot_tx_queues];
+	else
+		smq = pfvf->hw.txschq_list[NIX_TXSCH_LVL_SMQ][0];
 
-	return pfvf->hw.txschq_list[NIX_TXSCH_LVL_SMQ][0];
+	return smq;
 }
 
 /* MSI-X APIs */
@@ -926,11 +937,16 @@ int otx2_rxtx_enable(struct otx2_nic *pfvf, bool enable);
 void otx2_ctx_disable(struct mbox *mbox, int type, bool npa);
 int otx2_nix_config_bp(struct otx2_nic *pfvf, bool enable);
 void otx2_cleanup_rx_cqes(struct otx2_nic *pfvf, struct otx2_cq_queue *cq);
+int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura);
 void otx2_cleanup_tx_cqes(struct otx2_nic *pfvf, struct otx2_cq_queue *cq);
 int otx2_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura);
 int cn10k_sq_aq_init(void *dev, u16 qidx, u16 sqb_aura);
 int otx2_alloc_buffer(struct otx2_nic *pfvf, struct otx2_cq_queue *cq,
 		      dma_addr_t *dma);
+int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
+		   int stack_pages, int numptrs, int buf_size);
+int otx2_aura_init(struct otx2_nic *pfvf, int aura_id,
+		   int pool_id, int numptrs);
 
 /* RSS configuration APIs*/
 int otx2_rss_init(struct otx2_nic *pfvf);
@@ -1038,4 +1054,8 @@ static inline void cn10k_handle_mcs_event(struct otx2_nic *pfvf,
 {}
 #endif /* CONFIG_MACSEC */
 
+/* qos support */
+void otx2_qos_sq_setup(struct otx2_nic *pfvf);
+u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
+		      struct net_device *sb_dev);
 #endif /* OTX2_COMMON_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index c1ea60bc2630..3acda6d289d3 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -23,6 +23,7 @@
 #include "otx2_struct.h"
 #include "otx2_ptp.h"
 #include "cn10k.h"
+#include "qos.h"
 #include <rvu_trace.h>
 
 #define DRV_NAME	"rvu_nicpf"
@@ -1228,6 +1229,7 @@ static char *nix_snd_status_e_str[NIX_SND_STATUS_MAX] =  {
 static irqreturn_t otx2_q_intr_handler(int irq, void *data)
 {
 	struct otx2_nic *pf = data;
+	struct otx2_snd_queue *sq;
 	u64 val, *ptr;
 	u64 qidx = 0;
 
@@ -1257,15 +1259,18 @@ static irqreturn_t otx2_q_intr_handler(int irq, void *data)
 	}
 
 	/* SQ */
-	for (qidx = 0; qidx < pf->hw.tot_tx_queues; qidx++) {
+	for (qidx = 0; qidx < pf->hw.tot_tx_queues + pf->hw.tc_tx_queues; qidx++) {
 		u64 sq_op_err_dbg, mnq_err_dbg, snd_err_dbg;
 		u8 sq_op_err_code, mnq_err_code, snd_err_code;
 
+		sq = &pf->qset.sq[qidx];
+		if (!sq->sqb_ptrs)
+			continue;
+
 		/* Below debug registers captures first errors corresponding to
 		 * those registers. We don't have to check against SQ qid as
 		 * these are fatal errors.
 		 */
-
 		ptr = otx2_get_regaddr(pf, NIX_LF_SQ_OP_INT);
 		val = otx2_atomic64_add((qidx << 44), ptr);
 		otx2_write64(pf, NIX_LF_SQ_OP_INT, (qidx << 44) |
@@ -1383,7 +1388,7 @@ static void otx2_free_sq_res(struct otx2_nic *pf)
 	otx2_ctx_disable(&pf->mbox, NIX_AQ_CTYPE_SQ, false);
 	/* Free SQB pointers */
 	otx2_sq_free_sqbs(pf);
-	for (qidx = 0; qidx < pf->hw.tot_tx_queues; qidx++) {
+	for (qidx = 0; qidx < pf->hw.tot_tx_queues + pf->hw.tc_tx_queues; qidx++) {
 		sq = &qset->sq[qidx];
 		qmem_free(pf->dev, sq->sqe);
 		qmem_free(pf->dev, sq->tso_hdrs);
@@ -1433,7 +1438,7 @@ static int otx2_init_hw_resources(struct otx2_nic *pf)
 	 * so, aura count = pool count.
 	 */
 	hw->rqpool_cnt = hw->rx_queues;
-	hw->sqpool_cnt = hw->tot_tx_queues;
+	hw->sqpool_cnt = hw->tot_tx_queues + hw->tc_tx_queues;
 	hw->pool_cnt = hw->rqpool_cnt + hw->sqpool_cnt;
 
 	/* Maximum hardware supported transmit length */
@@ -1688,11 +1693,13 @@ int otx2_open(struct net_device *netdev)
 
 	netif_carrier_off(netdev);
 
-	pf->qset.cq_cnt = pf->hw.rx_queues + pf->hw.tot_tx_queues;
 	/* RQ and SQs are mapped to different CQs,
 	 * so find out max CQ IRQs (i.e CINTs) needed.
 	 */
 	pf->hw.cint_cnt = max(pf->hw.rx_queues, pf->hw.tx_queues);
+	pf->hw.cint_cnt = max_t(u8, pf->hw.cint_cnt, pf->hw.tc_tx_queues);
+	pf->qset.cq_cnt = pf->hw.rx_queues + pf->hw.tot_tx_queues + pf->hw.tc_tx_queues;
+
 	qset->napi = kcalloc(pf->hw.cint_cnt, sizeof(*cq_poll), GFP_KERNEL);
 	if (!qset->napi)
 		return -ENOMEM;
@@ -1708,7 +1715,7 @@ int otx2_open(struct net_device *netdev)
 	if (!qset->cq)
 		goto err_free_mem;
 
-	qset->sq = kcalloc(pf->hw.tot_tx_queues,
+	qset->sq = kcalloc(pf->hw.tot_tx_queues + pf->hw.tc_tx_queues,
 			   sizeof(struct otx2_snd_queue), GFP_KERNEL);
 	if (!qset->sq)
 		goto err_free_mem;
@@ -1743,6 +1750,11 @@ int otx2_open(struct net_device *netdev)
 		else
 			cq_poll->cq_ids[CQ_XDP] = CINT_INVALID_CQ;
 
+		cq_poll->cq_ids[CQ_QOS] = (qidx < pf->hw.tc_tx_queues) ?
+					  (qidx + pf->hw.rx_queues +
+					   pf->hw.tot_tx_queues) :
+					  CINT_INVALID_CQ;
+
 		cq_poll->dev = (void *)pf;
 		cq_poll->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_CQE;
 		INIT_WORK(&cq_poll->dim.work, otx2_dim_work);
@@ -1938,6 +1950,13 @@ static netdev_tx_t otx2_xmit(struct sk_buff *skb, struct net_device *netdev)
 	int qidx = skb_get_queue_mapping(skb);
 	struct otx2_snd_queue *sq;
 	struct netdev_queue *txq;
+	int sq_idx = qidx;
+
+	/* XDP SQs are not mapped with TXQs
+	 * advance qid to derive correct sq maped with QOS
+	 */
+	if (qidx == pf->hw.tx_queues)
+		sq_idx = qidx + pf->hw.xdp_queues;
 
 	/* Check for minimum and maximum packet length */
 	if (skb->len <= ETH_HLEN ||
@@ -1946,7 +1965,7 @@ static netdev_tx_t otx2_xmit(struct sk_buff *skb, struct net_device *netdev)
 		return NETDEV_TX_OK;
 	}
 
-	sq = &pf->qset.sq[qidx];
+	sq = &pf->qset.sq[sq_idx];
 	txq = netdev_get_tx_queue(netdev, qidx);
 
 	if (!otx2_sq_append_skb(netdev, sq, skb, qidx)) {
@@ -1964,8 +1983,8 @@ static netdev_tx_t otx2_xmit(struct sk_buff *skb, struct net_device *netdev)
 	return NETDEV_TX_OK;
 }
 
-static u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
-			     struct net_device *sb_dev)
+u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
+		      struct net_device *sb_dev)
 {
 #ifdef CONFIG_DCB
 	struct otx2_nic *pf = netdev_priv(netdev);
@@ -1987,6 +2006,7 @@ static u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
 #endif
 	return netdev_pick_tx(netdev, skb, NULL);
 }
+EXPORT_SYMBOL(otx2_select_queue);
 
 static netdev_features_t otx2_fix_features(struct net_device *dev,
 					   netdev_features_t features)
@@ -2729,7 +2749,7 @@ static int otx2_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	/* Set number of queues */
 	qcount = min_t(int, num_online_cpus(), OTX2_MAX_CQ_CNT);
 
-	netdev = alloc_etherdev_mqs(sizeof(*pf), qcount, qcount);
+	netdev = alloc_etherdev_mqs(sizeof(*pf), qcount + OTX2_QOS_MAX_LEAF_NODES, qcount);
 	if (!netdev) {
 		err = -ENOMEM;
 		goto err_release_regions;
@@ -2916,6 +2936,8 @@ static int otx2_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_pf_sriov_init;
 #endif
 
+	otx2_qos_sq_setup(pf);
+
 	return 0;
 
 err_pf_sriov_init:
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
index ef10aef3cda0..816b37c258e0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
@@ -463,12 +463,14 @@ static int otx2_tx_napi_handler(struct otx2_nic *pfvf,
 			break;
 		}
 
-		if (cq->cq_type == CQ_XDP) {
+		qidx = cq->cq_idx - pfvf->hw.rx_queues;
+
+		if (cq->cq_type == CQ_XDP)
 			otx2_xdp_snd_pkt_handler(pfvf, sq, cqe);
-		} else {
-			otx2_snd_pkt_handler(pfvf, cq, sq, cqe, budget,
-					     &tx_pkts, &tx_bytes);
-		}
+		else
+			otx2_snd_pkt_handler(pfvf, cq, &pfvf->qset.sq[qidx],
+					     cqe, budget, &tx_pkts, &tx_bytes);
+
 
 		cqe->hdr.cqe_type = NIX_XQE_TYPE_INVALID;
 		processed_cqe++;
@@ -485,7 +487,11 @@ static int otx2_tx_napi_handler(struct otx2_nic *pfvf,
 	if (likely(tx_pkts)) {
 		struct netdev_queue *txq;
 
-		txq = netdev_get_tx_queue(pfvf->netdev, cq->cint_idx);
+		qidx = cq->cq_idx - pfvf->hw.rx_queues;
+
+		if (qidx >= pfvf->hw.tx_queues)
+			qidx -= pfvf->hw.xdp_queues;
+		txq = netdev_get_tx_queue(pfvf->netdev, qidx);
 		netdev_tx_completed_queue(txq, tx_pkts, tx_bytes);
 		/* Check if queue was stopped earlier due to ring full */
 		smp_mb();
@@ -735,7 +741,10 @@ static void otx2_sqe_add_hdr(struct otx2_nic *pfvf, struct otx2_snd_queue *sq,
 		sqe_hdr->aura = sq->aura_id;
 		/* Post a CQE Tx after pkt transmission */
 		sqe_hdr->pnc = 1;
-		sqe_hdr->sq = qidx;
+		if (pfvf->hw.tx_queues == qidx)
+			sqe_hdr->sq = qidx + pfvf->hw.xdp_queues;
+		else
+			sqe_hdr->sq = qidx;
 	}
 	sqe_hdr->total = skb->len;
 	/* Set SQE identifier which will be used later for freeing SKB */
@@ -1183,8 +1192,10 @@ void otx2_cleanup_tx_cqes(struct otx2_nic *pfvf, struct otx2_cq_queue *cq)
 	struct nix_cqe_tx_s *cqe;
 	int processed_cqe = 0;
 	struct sg_list *sg;
+	int qidx;
 
-	sq = &pfvf->qset.sq[cq->cint_idx];
+	qidx = cq->cq_idx - pfvf->hw.rx_queues;
+	sq = &pfvf->qset.sq[qidx];
 
 	if (otx2_nix_cq_op_status(pfvf, cq) || !cq->pend_cqe)
 		return;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
index 93cac2c2664c..7ab6db9a986f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
@@ -102,7 +102,8 @@ enum cq_type {
 	CQ_RX,
 	CQ_TX,
 	CQ_XDP,
-	CQS_PER_CINT = 3, /* RQ + SQ + XDP */
+	CQ_QOS,
+	CQS_PER_CINT = 4, /* RQ + SQ + XDP + QOS_SQ */
 };
 
 struct otx2_cq_poll {
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
index 7f8ffbf79cf7..459f559a3516 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
@@ -479,6 +479,7 @@ static const struct net_device_ops otx2vf_netdev_ops = {
 	.ndo_open = otx2vf_open,
 	.ndo_stop = otx2vf_stop,
 	.ndo_start_xmit = otx2vf_xmit,
+	.ndo_select_queue = otx2_select_queue,
 	.ndo_set_rx_mode = otx2vf_set_rx_mode,
 	.ndo_set_mac_address = otx2_set_mac_address,
 	.ndo_change_mtu = otx2vf_change_mtu,
@@ -550,7 +551,7 @@ static int otx2vf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	pci_set_master(pdev);
 
 	qcount = num_online_cpus();
-	netdev = alloc_etherdev_mqs(sizeof(*vf), qcount, qcount);
+	netdev = alloc_etherdev_mqs(sizeof(*vf), qcount + OTX2_QOS_MAX_LEAF_NODES, qcount);
 	if (!netdev) {
 		err = -ENOMEM;
 		goto err_release_regions;
@@ -699,6 +700,7 @@ static int otx2vf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (err)
 		goto err_shutdown_tc;
 #endif
+	otx2_qos_sq_setup(vf);
 
 	return 0;
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.h b/drivers/net/ethernet/marvell/octeontx2/nic/qos.h
new file mode 100644
index 000000000000..ef8c99a6b2d0
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Marvell RVU Ethernet driver
+ *
+ * Copyright (C) 2022 Marvell.
+ *
+ */
+#ifndef OTX2_QOS_H
+#define OTX2_QOS_H
+
+#define OTX2_QOS_MAX_LEAF_NODES		16
+
+int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq);
+void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx, u16 mdq);
+
+struct otx2_qos {
+	u16 qid_to_sqmap[OTX2_QOS_MAX_LEAF_NODES];
+};
+
+#endif
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c
new file mode 100644
index 000000000000..96e53d5d1778
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell RVU Physical Function ethernet driver
+ *
+ * Copyright (C) 2022 Marvell.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <net/tso.h>
+
+#include "cn10k.h"
+#include "otx2_reg.h"
+#include "otx2_common.h"
+#include "otx2_txrx.h"
+#include "otx2_struct.h"
+
+#define OTX2_QOS_MAX_LEAF_NODES 16
+
+void otx2_qos_sq_setup(struct otx2_nic *pfvf)
+{
+	struct otx2_hw *hw = &pfvf->hw;
+
+	hw->tc_tx_queues = OTX2_QOS_MAX_LEAF_NODES;
+}
+EXPORT_SYMBOL(otx2_qos_sq_setup);
+
+static void otx2_qos_aura_pool_free(struct otx2_nic *pfvf, int pool_id)
+{
+	struct otx2_pool *pool;
+
+	if (!pfvf->qset.pool)
+		return;
+
+	pool = &pfvf->qset.pool[pool_id];
+	qmem_free(pfvf->dev, pool->stack);
+	qmem_free(pfvf->dev, pool->fc_addr);
+	pool->stack = NULL;
+	pool->fc_addr = NULL;
+}
+
+static int otx2_qos_sq_aura_pool_init(struct otx2_nic *pfvf, int qidx)
+{
+	struct otx2_qset *qset = &pfvf->qset;
+	int pool_id, stack_pages, num_sqbs;
+	struct otx2_hw *hw = &pfvf->hw;
+	struct otx2_snd_queue *sq;
+	struct otx2_pool *pool;
+	dma_addr_t bufptr;
+	int err, ptr;
+	u64 iova, pa;
+
+	/* Calculate number of SQBs needed.
+	 *
+	 * For a 128byte SQE, and 4K size SQB, 31 SQEs will fit in one SQB.
+	 * Last SQE is used for pointing to next SQB.
+	 */
+	num_sqbs = (hw->sqb_size / 128) - 1;
+	num_sqbs = (qset->sqe_cnt + num_sqbs) / num_sqbs;
+
+	/* Get no of stack pages needed */
+	stack_pages =
+		(num_sqbs + hw->stack_pg_ptrs - 1) / hw->stack_pg_ptrs;
+
+	pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, qidx);
+	pool = &pfvf->qset.pool[pool_id];
+
+	/* Initialize aura context */
+	err = otx2_aura_init(pfvf, pool_id, pool_id, num_sqbs);
+	if (err)
+		return err;
+
+	/* Initialize pool context */
+	err = otx2_pool_init(pfvf, pool_id, stack_pages,
+			     num_sqbs, hw->sqb_size);
+	if (err)
+		goto aura_free;
+
+	/* Flush accumulated messages */
+	err = otx2_sync_mbox_msg(&pfvf->mbox);
+	if (err)
+		goto pool_free;
+
+	/* Allocate pointers and free them to aura/pool */
+	sq = &qset->sq[qidx];
+	sq->sqb_count = 0;
+	sq->sqb_ptrs = kcalloc(num_sqbs, sizeof(*sq->sqb_ptrs), GFP_KERNEL);
+	if (!sq->sqb_ptrs) {
+		err = -ENOMEM;
+		goto pool_free;
+	}
+
+	for (ptr = 0; ptr < num_sqbs; ptr++) {
+		err = __otx2_alloc_rbuf(pfvf, pool, &bufptr);
+		if (err)
+			goto sqb_free;
+		pfvf->hw_ops->aura_freeptr(pfvf, pool_id, bufptr);
+		sq->sqb_ptrs[sq->sqb_count++] = (u64)bufptr;
+	}
+
+	return 0;
+
+sqb_free:
+	while (ptr--) {
+		if (!sq->sqb_ptrs[ptr])
+			continue;
+		iova = sq->sqb_ptrs[ptr];
+		pa = otx2_iova_to_phys(pfvf->iommu_domain, iova);
+		dma_unmap_page_attrs(pfvf->dev, iova, hw->sqb_size,
+				     DMA_FROM_DEVICE,
+				     DMA_ATTR_SKIP_CPU_SYNC);
+		put_page(virt_to_page(phys_to_virt(pa)));
+		otx2_aura_allocptr(pfvf, pool_id);
+	}
+	sq->sqb_count = 0;
+	kfree(sq->sqb_ptrs);
+pool_free:
+	qmem_free(pfvf->dev, pool->stack);
+aura_free:
+	qmem_free(pfvf->dev, pool->fc_addr);
+	otx2_mbox_reset(&pfvf->mbox.mbox, 0);
+	return err;
+}
+
+static void otx2_qos_sq_free_sqbs(struct otx2_nic *pfvf, int qidx)
+{
+	struct otx2_qset *qset = &pfvf->qset;
+	struct otx2_hw *hw = &pfvf->hw;
+	struct otx2_snd_queue *sq;
+	u64 iova, pa;
+	int sqb;
+
+	sq = &qset->sq[qidx];
+	if (!sq->sqb_ptrs)
+		return;
+	for (sqb = 0; sqb < sq->sqb_count; sqb++) {
+		if (!sq->sqb_ptrs[sqb])
+			continue;
+		iova = sq->sqb_ptrs[sqb];
+		pa = otx2_iova_to_phys(pfvf->iommu_domain, iova);
+		dma_unmap_page_attrs(pfvf->dev, iova, hw->sqb_size,
+				     DMA_FROM_DEVICE,
+				     DMA_ATTR_SKIP_CPU_SYNC);
+		put_page(virt_to_page(phys_to_virt(pa)));
+	}
+
+	sq->sqb_count = 0;
+
+	sq = &qset->sq[qidx];
+	qmem_free(pfvf->dev, sq->sqe);
+	qmem_free(pfvf->dev, sq->tso_hdrs);
+	kfree(sq->sg);
+	kfree(sq->sqb_ptrs);
+	qmem_free(pfvf->dev, sq->timestamps);
+
+	memset((void *)sq, 0, sizeof(*sq));
+}
+
+/* send queue id */
+static void otx2_qos_sqb_flush(struct otx2_nic *pfvf, int qidx)
+{
+	int sqe_tail, sqe_head;
+	u64 incr, *ptr, val;
+
+	ptr = (u64 *)otx2_get_regaddr(pfvf, NIX_LF_SQ_OP_STATUS);
+	incr = (u64)qidx << 32;
+	val = otx2_atomic64_add(incr, ptr);
+	sqe_head = (val >> 20) & 0x3F;
+	sqe_tail = (val >> 28) & 0x3F;
+	if (sqe_head != sqe_tail)
+		usleep_range(50, 60);
+}
+
+static int otx2_qos_ctx_disable(struct otx2_nic *pfvf, u16 qidx, int aura_id)
+{
+	struct nix_cn10k_aq_enq_req *cn10k_sq_aq;
+	struct npa_aq_enq_req *aura_aq;
+	struct npa_aq_enq_req *pool_aq;
+	struct nix_aq_enq_req *sq_aq;
+
+	if (test_bit(CN10K_LMTST, &pfvf->hw.cap_flag)) {
+		cn10k_sq_aq = otx2_mbox_alloc_msg_nix_cn10k_aq_enq(&pfvf->mbox);
+		if (!cn10k_sq_aq)
+			return -ENOMEM;
+		cn10k_sq_aq->qidx = qidx;
+		cn10k_sq_aq->sq.ena = 0;
+		cn10k_sq_aq->sq_mask.ena = 1;
+		cn10k_sq_aq->ctype = NIX_AQ_CTYPE_SQ;
+		cn10k_sq_aq->op = NIX_AQ_INSTOP_WRITE;
+	} else {
+		sq_aq = otx2_mbox_alloc_msg_nix_aq_enq(&pfvf->mbox);
+		if (!sq_aq)
+			return -ENOMEM;
+		sq_aq->qidx = qidx;
+		sq_aq->sq.ena = 0;
+		sq_aq->sq_mask.ena = 1;
+		sq_aq->ctype = NIX_AQ_CTYPE_SQ;
+		sq_aq->op = NIX_AQ_INSTOP_WRITE;
+	}
+
+	aura_aq = otx2_mbox_alloc_msg_npa_aq_enq(&pfvf->mbox);
+	if (!aura_aq) {
+		otx2_mbox_reset(&pfvf->mbox.mbox, 0);
+		return -ENOMEM;
+	}
+
+	aura_aq->aura_id = aura_id;
+	aura_aq->aura.ena = 0;
+	aura_aq->aura_mask.ena = 1;
+	aura_aq->ctype = NPA_AQ_CTYPE_AURA;
+	aura_aq->op = NPA_AQ_INSTOP_WRITE;
+
+	pool_aq = otx2_mbox_alloc_msg_npa_aq_enq(&pfvf->mbox);
+	if (!pool_aq) {
+		otx2_mbox_reset(&pfvf->mbox.mbox, 0);
+		return -ENOMEM;
+	}
+
+	pool_aq->aura_id = aura_id;
+	pool_aq->pool.ena = 0;
+	pool_aq->pool_mask.ena = 1;
+
+	pool_aq->ctype = NPA_AQ_CTYPE_POOL;
+	pool_aq->op = NPA_AQ_INSTOP_WRITE;
+
+	return otx2_sync_mbox_msg(&pfvf->mbox);
+}
+
+int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq)
+{
+	struct otx2_hw *hw = &pfvf->hw;
+	int pool_id, sq_idx, err;
+
+	if (pfvf->flags & OTX2_FLAG_INTF_DOWN)
+		return -EPERM;
+
+	sq_idx = hw->tot_tx_queues + qidx;
+
+	mutex_lock(&pfvf->mbox.lock);
+	err = otx2_qos_sq_aura_pool_init(pfvf, sq_idx);
+	if (err)
+		goto out;
+
+	pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, sq_idx);
+	pfvf->qos.qid_to_sqmap[qidx] = smq;
+	err = otx2_sq_init(pfvf, sq_idx, pool_id);
+	if (err)
+		goto out;
+out:
+	mutex_unlock(&pfvf->mbox.lock);
+	return err;
+}
+
+void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx, u16 mdq)
+{
+	struct otx2_qset *qset = &pfvf->qset;
+	struct otx2_hw *hw = &pfvf->hw;
+	struct otx2_snd_queue *sq;
+	struct otx2_cq_queue *cq;
+	int pool_id, sq_idx;
+
+	sq_idx = hw->tot_tx_queues + qidx;
+
+	/* If the DOWN flag is set SQs are already freed */
+	if (pfvf->flags & OTX2_FLAG_INTF_DOWN)
+		return;
+
+	sq = &pfvf->qset.sq[sq_idx];
+	if (!sq->sqb_ptrs)
+		return;
+
+	if (sq_idx < hw->tot_tx_queues ||
+	    sq_idx >= hw->tot_tx_queues + hw->tc_tx_queues) {
+		netdev_err(pfvf->netdev, "Send Queue is not a QoS queue\n");
+		return;
+	}
+
+	cq = &qset->cq[pfvf->hw.rx_queues + sq_idx];
+	pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, sq_idx);
+
+	otx2_qos_sqb_flush(pfvf, sq_idx);
+	otx2_smq_flush(pfvf, otx2_get_smq_idx(pfvf, sq_idx));
+	otx2_cleanup_tx_cqes(pfvf, cq);
+
+	mutex_lock(&pfvf->mbox.lock);
+	otx2_qos_ctx_disable(pfvf, sq_idx, pool_id);
+	mutex_unlock(&pfvf->mbox.lock);
+
+	otx2_qos_sq_free_sqbs(pfvf, sq_idx);
+	otx2_qos_aura_pool_free(pfvf, pool_id);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next Patch v2 3/5] octeontx2-pf: Refactor schedular queue alloc/free calls
  2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 1/5] sch_htb: Allow HTB priority parameter in offload mode Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 2/5] octeontx2-pf: qos send queues management Hariprasad Kelam
@ 2023-01-18 10:51 ` Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO Hariprasad Kelam
  2023-01-18 10:51 ` [net-next Patch v2 5/5] octeontx2-pf: Add support for HTB offload Hariprasad Kelam
  4 siblings, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

Multiple transmit scheduler queues can be configured at different
levels to support traffic shaping and scheduling. But on txschq free
requests, the transmit schedular config in hardware is not getting
reset. This patch adds support to reset the stale config.

The txschq alloc response handler updates the default txschq
array which is used to configure the transmit packet path from
SMQ to TL2 levels. However, for new features such as QoS offload
that requires it's own txschq queues, this handler is still
invoked and results in undefined behavior. The code now handles
txschq response in the mbox caller function.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   | 45 +++++++++++++++++++
 .../marvell/octeontx2/nic/otx2_common.c       | 34 +++++++-------
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  4 --
 .../ethernet/marvell/octeontx2/nic/otx2_vf.c  |  4 --
 4 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 89e94569e74c..c11859999074 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -1677,6 +1677,42 @@ handle_txschq_shaper_update(struct rvu *rvu, int blkaddr, int nixlf,
 	return true;
 }
 
+static void nix_reset_tx_schedule(struct rvu *rvu, int blkaddr,
+				  int lvl, int schq)
+{
+	u64 tlx_parent = 0, tlx_schedule = 0;
+
+	switch (lvl) {
+	case NIX_TXSCH_LVL_TL2:
+		tlx_parent   = NIX_AF_TL2X_PARENT(schq);
+		tlx_schedule = NIX_AF_TL2X_SCHEDULE(schq);
+		break;
+	case NIX_TXSCH_LVL_TL3:
+		tlx_parent   = NIX_AF_TL3X_PARENT(schq);
+		tlx_schedule = NIX_AF_TL3X_SCHEDULE(schq);
+		break;
+	case NIX_TXSCH_LVL_TL4:
+		tlx_parent   = NIX_AF_TL4X_PARENT(schq);
+		tlx_schedule = NIX_AF_TL4X_SCHEDULE(schq);
+		break;
+	case NIX_TXSCH_LVL_MDQ:
+		/* no need to reset SMQ_CFG as HW clears this CSR
+		 * on SMQ flush
+		 */
+		tlx_parent   = NIX_AF_MDQX_PARENT(schq);
+		tlx_schedule = NIX_AF_MDQX_SCHEDULE(schq);
+		break;
+	default:
+		return;
+	}
+
+	if (tlx_parent)
+		rvu_write64(rvu, blkaddr, tlx_parent, 0x0);
+
+	if (tlx_schedule)
+		rvu_write64(rvu, blkaddr, tlx_schedule, 0x0);
+}
+
 /* Disable shaping of pkts by a scheduler queue
  * at a given scheduler level.
  */
@@ -2025,6 +2061,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 				pfvf_map[schq] = TXSCH_MAP(pcifunc, 0);
 			nix_reset_tx_linkcfg(rvu, blkaddr, lvl, schq);
 			nix_reset_tx_shaping(rvu, blkaddr, nixlf, lvl, schq);
+			nix_reset_tx_schedule(rvu, blkaddr, lvl, schq);
 		}
 
 		for (idx = 0; idx < req->schq[lvl]; idx++) {
@@ -2034,6 +2071,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 				pfvf_map[schq] = TXSCH_MAP(pcifunc, 0);
 			nix_reset_tx_linkcfg(rvu, blkaddr, lvl, schq);
 			nix_reset_tx_shaping(rvu, blkaddr, nixlf, lvl, schq);
+			nix_reset_tx_schedule(rvu, blkaddr, lvl, schq);
 		}
 	}
 
@@ -2122,6 +2160,7 @@ static int nix_txschq_free(struct rvu *rvu, u16 pcifunc)
 				continue;
 			nix_reset_tx_linkcfg(rvu, blkaddr, lvl, schq);
 			nix_clear_tx_xoff(rvu, blkaddr, lvl, schq);
+			nix_reset_tx_shaping(rvu, blkaddr, nixlf, lvl, schq);
 		}
 	}
 	nix_clear_tx_xoff(rvu, blkaddr, NIX_TXSCH_LVL_TL1,
@@ -2160,6 +2199,7 @@ static int nix_txschq_free(struct rvu *rvu, u16 pcifunc)
 		for (schq = 0; schq < txsch->schq.max; schq++) {
 			if (TXSCH_MAP_FUNC(txsch->pfvf_map[schq]) != pcifunc)
 				continue;
+			nix_reset_tx_schedule(rvu, blkaddr, lvl, schq);
 			rvu_free_rsrc(&txsch->schq, schq);
 			txsch->pfvf_map[schq] = TXSCH_MAP(0, NIX_TXSCHQ_FREE);
 		}
@@ -2219,6 +2259,9 @@ static int nix_txschq_free_one(struct rvu *rvu,
 	 */
 	nix_clear_tx_xoff(rvu, blkaddr, lvl, schq);
 
+	nix_reset_tx_linkcfg(rvu, blkaddr, lvl, schq);
+	nix_reset_tx_shaping(rvu, blkaddr, nixlf, lvl, schq);
+
 	/* Flush if it is a SMQ. Onus of disabling
 	 * TL2/3 queue links before SMQ flush is on user
 	 */
@@ -2228,6 +2271,8 @@ static int nix_txschq_free_one(struct rvu *rvu,
 		goto err;
 	}
 
+	nix_reset_tx_schedule(rvu, blkaddr, lvl, schq);
+
 	/* Free the resource */
 	rvu_free_rsrc(&txsch->schq, schq);
 	txsch->pfvf_map[schq] = TXSCH_MAP(0, NIX_TXSCHQ_FREE);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 0868ae825736..a4c18565f9f4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -716,7 +716,8 @@ EXPORT_SYMBOL(otx2_smq_flush);
 int otx2_txsch_alloc(struct otx2_nic *pfvf)
 {
 	struct nix_txsch_alloc_req *req;
-	int lvl;
+	struct nix_txsch_alloc_rsp *rsp;
+	int lvl, schq, rc;
 
 	/* Get memory to put this msg */
 	req = otx2_mbox_alloc_msg_nix_txsch_alloc(&pfvf->mbox);
@@ -726,8 +727,22 @@ int otx2_txsch_alloc(struct otx2_nic *pfvf)
 	/* Request one schq per level */
 	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++)
 		req->schq[lvl] = 1;
+	rc = otx2_sync_mbox_msg(&pfvf->mbox);
+	if (rc)
+		return rc;
 
-	return otx2_sync_mbox_msg(&pfvf->mbox);
+	rsp = (struct nix_txsch_alloc_rsp *)
+	      otx2_mbox_get_rsp(&pfvf->mbox.mbox, 0, &req->hdr);
+	if (IS_ERR(rsp))
+		return PTR_ERR(rsp);
+
+	/* Setup transmit scheduler list */
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++)
+		for (schq = 0; schq < rsp->schq[lvl]; schq++)
+			pfvf->hw.txschq_list[lvl][schq] =
+				rsp->schq_list[lvl][schq];
+
+	return 0;
 }
 
 int otx2_txschq_stop(struct otx2_nic *pfvf)
@@ -1649,21 +1664,6 @@ void mbox_handler_cgx_fec_stats(struct otx2_nic *pfvf,
 	pfvf->hw.cgx_fec_uncorr_blks += rsp->fec_uncorr_blks;
 }
 
-void mbox_handler_nix_txsch_alloc(struct otx2_nic *pf,
-				  struct nix_txsch_alloc_rsp *rsp)
-{
-	int lvl, schq;
-
-	/* Setup transmit scheduler list */
-	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++)
-		for (schq = 0; schq < rsp->schq[lvl]; schq++)
-			pf->hw.txschq_list[lvl][schq] =
-				rsp->schq_list[lvl][schq];
-
-	pf->hw.txschq_link_cfg_lvl = rsp->link_cfg_lvl;
-}
-EXPORT_SYMBOL(mbox_handler_nix_txsch_alloc);
-
 void mbox_handler_npa_lf_alloc(struct otx2_nic *pfvf,
 			       struct npa_lf_alloc_rsp *rsp)
 {
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 3acda6d289d3..92e41d2d5398 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -792,10 +792,6 @@ static void otx2_process_pfaf_mbox_msg(struct otx2_nic *pf,
 	case MBOX_MSG_NIX_LF_ALLOC:
 		mbox_handler_nix_lf_alloc(pf, (struct nix_lf_alloc_rsp *)msg);
 		break;
-	case MBOX_MSG_NIX_TXSCH_ALLOC:
-		mbox_handler_nix_txsch_alloc(pf,
-					     (struct nix_txsch_alloc_rsp *)msg);
-		break;
 	case MBOX_MSG_NIX_BP_ENABLE:
 		mbox_handler_nix_bp_enable(pf, (struct nix_bp_cfg_rsp *)msg);
 		break;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
index 459f559a3516..ee9f3fca6333 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_vf.c
@@ -70,10 +70,6 @@ static void otx2vf_process_vfaf_mbox_msg(struct otx2_nic *vf,
 	case MBOX_MSG_NIX_LF_ALLOC:
 		mbox_handler_nix_lf_alloc(vf, (struct nix_lf_alloc_rsp *)msg);
 		break;
-	case MBOX_MSG_NIX_TXSCH_ALLOC:
-		mbox_handler_nix_txsch_alloc(vf,
-					     (struct nix_txsch_alloc_rsp *)msg);
-		break;
 	case MBOX_MSG_NIX_BP_ENABLE:
 		mbox_handler_nix_bp_enable(vf, (struct nix_bp_cfg_rsp *)msg);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
                   ` (2 preceding siblings ...)
  2023-01-18 10:51 ` [net-next Patch v2 3/5] octeontx2-pf: Refactor schedular queue alloc/free calls Hariprasad Kelam
@ 2023-01-18 10:51 ` Hariprasad Kelam
  2023-01-18 20:37   ` Maxim Mikityanskiy
  2023-01-18 10:51 ` [net-next Patch v2 5/5] octeontx2-pf: Add support for HTB offload Hariprasad Kelam
  4 siblings, 1 reply; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

All VFs and PF netdev shares same TL1 schedular, each interface
PF or VF will have different TL2 schedulars having same parent
TL1. The TL1 RR_PRIO value is static and PF/VFs use the same value
to configure its TL2 node priority in case of DWRR children.

This patch adds support to configure TL1 RR_PRIO value using devlink.
The TL1 RR_PRIO can be configured for each PF. The VFs are not allowed
to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO value from
the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/common.h    |  2 +-
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   | 15 ++++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  1 +
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   | 39 +++++++++-
 .../marvell/octeontx2/nic/otx2_common.c       |  9 ++-
 .../marvell/octeontx2/nic/otx2_common.h       |  1 +
 .../marvell/octeontx2/nic/otx2_devlink.c      | 78 +++++++++++++++++++
 8 files changed, 147 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/common.h b/drivers/net/ethernet/marvell/octeontx2/af/common.h
index 8931864ee110..f5bf719a6ccf 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/common.h
@@ -142,7 +142,7 @@ enum nix_scheduler {
 
 #define TXSCH_RR_QTM_MAX		((1 << 24) - 1)
 #define TXSCH_TL1_DFLT_RR_QTM		TXSCH_RR_QTM_MAX
-#define TXSCH_TL1_DFLT_RR_PRIO		(0x1ull)
+#define TXSCH_TL1_DFLT_RR_PRIO		(0x7ull)
 #define CN10K_MAX_DWRR_WEIGHT          16384 /* Weight is 14bit on CN10K */
 
 /* Min/Max packet sizes, excluding FCS */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 94e743b5b545..6f0199766d5a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -84,7 +84,7 @@ struct mbox_msghdr {
 #define OTX2_MBOX_REQ_SIG (0xdead)
 #define OTX2_MBOX_RSP_SIG (0xbeef)
 	u16 sig;         /* Signature, for validating corrupted msgs */
-#define OTX2_MBOX_VERSION (0x000a)
+#define OTX2_MBOX_VERSION (0x000b)
 	u16 ver;         /* Version of msg's structure for this ID */
 	u16 next_msgoff; /* Offset of next msg within mailbox region */
 	int rc;          /* Msg process'ed response code */
@@ -299,6 +299,7 @@ M(NIX_BANDPROF_GET_HWINFO, 0x801f, nix_bandprof_get_hwinfo, msg_req,		\
 				nix_bandprof_get_hwinfo_rsp)		    \
 M(NIX_READ_INLINE_IPSEC_CFG, 0x8023, nix_read_inline_ipsec_cfg,		\
 				msg_req, nix_inline_ipsec_cfg)		\
+M(NIX_TL1_RR_PRIO,	0x8025, nix_tl1_rr_prio, nix_tl1_rr_prio_req, msg_rsp) \
 /* MCS mbox IDs (range 0xA000 - 0xBFFF) */					\
 M(MCS_ALLOC_RESOURCES,	0xa000, mcs_alloc_resources, mcs_alloc_rsrc_req,	\
 				mcs_alloc_rsrc_rsp)				\
@@ -825,6 +826,7 @@ enum nix_af_status {
 	NIX_AF_ERR_CQ_CTX_WRITE_ERR  = -429,
 	NIX_AF_ERR_AQ_CTX_RETRY_WRITE  = -430,
 	NIX_AF_ERR_LINK_CREDITS  = -431,
+	NIX_AF_ERR_TL1_RR_PRIO_PERM_DENIED  = -433,
 };
 
 /* For NIX RX vtag action  */
@@ -1269,6 +1271,11 @@ struct nix_bandprof_get_hwinfo_rsp {
 	u32 policer_timeunit;
 };
 
+struct nix_tl1_rr_prio_req {
+	struct mbox_msghdr hdr;
+	u8 tl1_rr_prio;
+};
+
 /* NPC mbox message structs */
 
 #define NPC_MCAM_ENTRY_INVALID	0xFFFF
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 3f5e09b77d4b..7d2230359bf0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -808,6 +808,18 @@ static void rvu_setup_pfvf_macaddress(struct rvu *rvu)
 	}
 }
 
+static void rvu_setup_pfvf_aggr_lvl_rr_prio(struct rvu *rvu)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	struct rvu_pfvf *pfvf;
+	int pf;
+
+	for (pf = 0; pf < hw->total_pfs; pf++) {
+		pfvf = &rvu->pf[pf];
+		pfvf->tl1_rr_prio = TXSCH_TL1_DFLT_RR_PRIO;
+	}
+}
+
 static int rvu_fwdata_init(struct rvu *rvu)
 {
 	u64 fwdbase;
@@ -1136,6 +1148,9 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
 	/* Assign MACs for CGX mapped functions */
 	rvu_setup_pfvf_macaddress(rvu);
 
+	/* Assign aggr level RR Priority */
+	rvu_setup_pfvf_aggr_lvl_rr_prio(rvu);
+
 	err = rvu_npa_init(rvu);
 	if (err) {
 		dev_err(rvu->dev, "%s: Failed to initialize npa\n", __func__);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 7f0a64731c67..eec890f98803 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -254,6 +254,7 @@ struct rvu_pfvf {
 	u64     lmt_map_ent_w1; /* Preseving the word1 of lmtst map table entry*/
 	unsigned long flags;
 	struct  sdp_node_info *sdp_info;
+	u8	tl1_rr_prio; /* RR PRIORITY set by PF */
 };
 
 enum rvu_pfvf_flags {
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index c11859999074..ef9f62aa6e2b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2001,6 +2001,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 {
 	struct rvu_hwinfo *hw = rvu->hw;
 	u16 pcifunc = req->hdr.pcifunc;
+	struct rvu_pfvf *parent_pf;
 	int link, blkaddr, rc = 0;
 	int lvl, idx, start, end;
 	struct nix_txsch *txsch;
@@ -2017,6 +2018,8 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 	if (!nix_hw)
 		return NIX_AF_ERR_INVALID_NIXBLK;
 
+	parent_pf = &rvu->pf[rvu_get_pf(pcifunc)];
+
 	mutex_lock(&rvu->rsrc_lock);
 
 	/* Check if request is valid as per HW capabilities
@@ -2076,7 +2079,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 	}
 
 	rsp->aggr_level = hw->cap.nix_tx_aggr_lvl;
-	rsp->aggr_lvl_rr_prio = TXSCH_TL1_DFLT_RR_PRIO;
+	rsp->aggr_lvl_rr_prio = parent_pf->tl1_rr_prio;
 	rsp->link_cfg_lvl = rvu_read64(rvu, blkaddr,
 				       NIX_AF_PSE_CHANNEL_LEVEL) & 0x01 ?
 				       NIX_TXSCH_LVL_TL3 : NIX_TXSCH_LVL_TL2;
@@ -2375,7 +2378,9 @@ static bool is_txschq_shaping_valid(struct rvu_hwinfo *hw, int lvl, u64 reg)
 static void nix_tl1_default_cfg(struct rvu *rvu, struct nix_hw *nix_hw,
 				u16 pcifunc, int blkaddr)
 {
+	struct rvu_pfvf *parent_pf = &rvu->pf[rvu_get_pf(pcifunc)];
 	u32 *pfvf_map;
+
 	int schq;
 
 	schq = nix_get_tx_link(rvu, pcifunc);
@@ -2384,7 +2389,7 @@ static void nix_tl1_default_cfg(struct rvu *rvu, struct nix_hw *nix_hw,
 	if (TXSCH_MAP_FLAGS(pfvf_map[schq]) & NIX_TXSCHQ_CFG_DONE)
 		return;
 	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq),
-		    (TXSCH_TL1_DFLT_RR_PRIO << 1));
+		    (parent_pf->tl1_rr_prio << 1));
 
 	/* On OcteonTx2 the config was in bytes and newer silcons
 	 * it's changed to weight.
@@ -5548,3 +5553,33 @@ int rvu_mbox_handler_nix_bandprof_get_hwinfo(struct rvu *rvu, struct msg_req *re
 
 	return 0;
 }
+
+int rvu_mbox_handler_nix_tl1_rr_prio(struct rvu *rvu,
+				     struct nix_tl1_rr_prio_req *req,
+				     struct msg_rsp *rsp)
+{
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, nixlf, schq, err;
+	struct rvu_pfvf *pfvf;
+	u16 regval;
+
+	err = nix_get_nixlf(rvu, pcifunc, &nixlf, &blkaddr);
+	if (err)
+		return err;
+
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+	/* Only PF is allowed */
+	if (is_vf(pcifunc))
+		return NIX_AF_ERR_TL1_RR_PRIO_PERM_DENIED;
+
+	pfvf->tl1_rr_prio = req->tl1_rr_prio;
+
+	/* update TL1 topology */
+	schq = nix_get_tx_link(rvu, pcifunc);
+	regval = rvu_read64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq));
+	regval &= ~GENMASK_ULL(4, 1);
+	regval |= pfvf->tl1_rr_prio << 1;
+	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq), regval);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a4c18565f9f4..96132cfe2141 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -591,7 +591,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 	u16 (*schq_list)[MAX_TXSCHQ_PER_FUNC];
 	struct otx2_hw *hw = &pfvf->hw;
 	struct nix_txschq_config *req;
-	u64 schq, parent;
+	u16 schq, parent;
 	u64 dwrr_val;
 
 	dwrr_val = mtu_to_dwrr_weight(pfvf, pfvf->tx_max_pktlen);
@@ -654,7 +654,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 
 		req->num_regs++;
 		req->reg[1] = NIX_AF_TL2X_SCHEDULE(schq);
-		req->regval[1] = TXSCH_TL1_DFLT_RR_PRIO << 24 | dwrr_val;
+		req->regval[1] = hw->txschq_aggr_lvl_rr_prio << 24 | dwrr_val;
 
 		if (lvl == hw->txschq_link_cfg_lvl) {
 			req->num_regs++;
@@ -678,7 +678,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 
 		req->num_regs++;
 		req->reg[1] = NIX_AF_TL1X_TOPOLOGY(schq);
-		req->regval[1] = (TXSCH_TL1_DFLT_RR_PRIO << 1);
+		req->regval[1] = hw->txschq_aggr_lvl_rr_prio << 1;
 
 		req->num_regs++;
 		req->reg[2] = NIX_AF_TL1X_CIR(schq);
@@ -742,6 +742,9 @@ int otx2_txsch_alloc(struct otx2_nic *pfvf)
 			pfvf->hw.txschq_list[lvl][schq] =
 				rsp->schq_list[lvl][schq];
 
+	pfvf->hw.txschq_link_cfg_lvl     = rsp->link_cfg_lvl;
+	pfvf->hw.txschq_aggr_lvl_rr_prio = rsp->aggr_lvl_rr_prio;
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index b142cc685a28..3e09f52f2dc6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -208,6 +208,7 @@ struct otx2_hw {
 
 	/* NIX */
 	u8			txschq_link_cfg_lvl;
+	u8			txschq_aggr_lvl_rr_prio;
 	u16			txschq_list[NIX_TXSCH_LVL_CNT][MAX_TXSCHQ_PER_FUNC];
 	u16			matchall_ipolicer;
 	u32			dwrr_mtu;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
index 63ef7c41d18d..b6bb6945f08a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
@@ -64,9 +64,82 @@ static int otx2_dl_mcam_count_get(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int otx2_dl_tl1_rr_prio_validate(struct devlink *devlink, u32 id,
+					union devlink_param_value val,
+					struct netlink_ext_ack *extack)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pf = otx2_dl->pfvf;
+
+	if (is_otx2_vf(pf->pcifunc)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TL1RR PRIORITY setting not allowed on VFs");
+		return -EOPNOTSUPP;
+	}
+
+	if (pci_num_vf(pf->pdev)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TL1RR PRIORITY setting not allowed as VFs are already attached");
+		return -EOPNOTSUPP;
+	}
+
+	if (val.vu8 > 7) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Valid priority range 0 - 7");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int otx2_dl_tl1_rr_prio_get(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+
+	ctx->val.vu8 = pfvf->hw.txschq_aggr_lvl_rr_prio;
+
+	return 0;
+}
+
+static int otx2_dl_tl1_rr_prio_set(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+	struct nix_tl1_rr_prio_req *req;
+	bool if_up;
+	int err;
+
+	if_up = netif_running(pfvf->netdev);
+
+	/* send mailbox to AF */
+	mutex_lock(&pfvf->mbox.lock);
+
+	req = otx2_mbox_alloc_msg_nix_tl1_rr_prio(&pfvf->mbox);
+	if (!req) {
+		mutex_unlock(&pfvf->mbox.lock);
+		return -ENOMEM;
+	}
+
+	req->tl1_rr_prio = ctx->val.vu8;
+	err = otx2_sync_mbox_msg(&pfvf->mbox);
+	mutex_unlock(&pfvf->mbox.lock);
+
+	/* Reconfigure TL1/TL2 DWRR PRIORITY */
+	if (!err && if_up) {
+		otx2_stop(pfvf->netdev);
+		otx2_open(pfvf->netdev);
+	}
+
+	return err;
+}
+
 enum otx2_dl_param_id {
 	OTX2_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	OTX2_DEVLINK_PARAM_ID_MCAM_COUNT,
+	OTX2_DEVLINK_PARAM_ID_TL1_RR_PRIO,
 };
 
 static const struct devlink_param otx2_dl_params[] = {
@@ -75,6 +148,11 @@ static const struct devlink_param otx2_dl_params[] = {
 			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			     otx2_dl_mcam_count_get, otx2_dl_mcam_count_set,
 			     otx2_dl_mcam_count_validate),
+	DEVLINK_PARAM_DRIVER(OTX2_DEVLINK_PARAM_ID_TL1_RR_PRIO,
+			     "tl1_rr_prio", DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     otx2_dl_tl1_rr_prio_get, otx2_dl_tl1_rr_prio_set,
+			     otx2_dl_tl1_rr_prio_validate),
 };
 
 static const struct devlink_ops otx2_devlink_ops = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next Patch v2 5/5] octeontx2-pf: Add support for HTB offload
  2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
                   ` (3 preceding siblings ...)
  2023-01-18 10:51 ` [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO Hariprasad Kelam
@ 2023-01-18 10:51 ` Hariprasad Kelam
  4 siblings, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:51 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, maxtram95

From: Naveen Mamindlapalli <naveenm@marvell.com>

This patch registers callbacks to support HTB offload.

Below are features supported,

- supports traffic shaping on the given class by honoring rate and ceil
configuration.

- supports traffic scheduling,  which prioritizes different types of
traffic based on strict priority values.

- supports the creation of leaf to inner classes such that parent node
rate limits apply to all child nodes.

Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
---
 .../ethernet/marvell/octeontx2/nic/Makefile   |    2 +-
 .../marvell/octeontx2/nic/otx2_common.c       |   37 +-
 .../marvell/octeontx2/nic/otx2_common.h       |    7 +
 .../marvell/octeontx2/nic/otx2_devlink.c      |    6 +
 .../marvell/octeontx2/nic/otx2_ethtool.c      |   31 +-
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |   47 +-
 .../ethernet/marvell/octeontx2/nic/otx2_reg.h |   13 +
 .../ethernet/marvell/octeontx2/nic/otx2_tc.c  |    7 +-
 .../net/ethernet/marvell/octeontx2/nic/qos.c  | 1547 +++++++++++++++++
 .../net/ethernet/marvell/octeontx2/nic/qos.h  |   56 +-
 .../ethernet/marvell/octeontx2/nic/qos_sq.c   |   20 +-
 11 files changed, 1749 insertions(+), 24 deletions(-)
 create mode 100644 drivers/net/ethernet/marvell/octeontx2/nic/qos.c

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/Makefile b/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
index 3d31ddf7c652..5664f768cb0c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_OCTEONTX2_VF) += rvu_nicvf.o otx2_ptp.o
 
 rvu_nicpf-y := otx2_pf.o otx2_common.o otx2_txrx.o otx2_ethtool.o \
                otx2_flows.o otx2_tc.o cn10k.o otx2_dmac_flt.o \
-               otx2_devlink.o qos_sq.o
+               otx2_devlink.o qos_sq.o qos.o
 rvu_nicvf-y := otx2_vf.o otx2_devlink.o
 
 rvu_nicpf-$(CONFIG_DCB) += otx2_dcbnl.o
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 96132cfe2141..322b65c9e202 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -89,6 +89,11 @@ int otx2_update_sq_stats(struct otx2_nic *pfvf, int qidx)
 	if (!pfvf->qset.sq)
 		return 0;
 
+	if (qidx >= pfvf->hw.tot_tx_queues) {
+		if (!test_bit(qidx - pfvf->hw.tot_tx_queues, pfvf->qos.qos_sq_bmap))
+			return 0;
+	}
+
 	otx2_nix_sq_op_stats(&sq->stats, pfvf, qidx);
 	return 1;
 }
@@ -748,29 +753,49 @@ int otx2_txsch_alloc(struct otx2_nic *pfvf)
 	return 0;
 }
 
-int otx2_txschq_stop(struct otx2_nic *pfvf)
+void otx2_txschq_free_one(struct otx2_nic *pfvf, u16 lvl, u16 schq)
 {
 	struct nix_txsch_free_req *free_req;
-	int lvl, schq, err;
+	int err;
 
 	mutex_lock(&pfvf->mbox.lock);
-	/* Free the transmit schedulers */
+
 	free_req = otx2_mbox_alloc_msg_nix_txsch_free(&pfvf->mbox);
 	if (!free_req) {
 		mutex_unlock(&pfvf->mbox.lock);
-		return -ENOMEM;
+		netdev_err(pfvf->netdev,
+			   "Failed alloc txschq free req\n");
+		return;
 	}
 
-	free_req->flags = TXSCHQ_FREE_ALL;
+	free_req->schq_lvl = lvl;
+	free_req->schq = schq;
+
 	err = otx2_sync_mbox_msg(&pfvf->mbox);
+	if (err) {
+		netdev_err(pfvf->netdev,
+			   "Failed stop txschq %d at level %d\n", lvl, schq);
+	}
+
 	mutex_unlock(&pfvf->mbox.lock);
+}
+
+int otx2_txschq_stop(struct otx2_nic *pfvf)
+{
+	int lvl, schq;
+
+	/* free non QOS TLx nodes */
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++)
+		otx2_txschq_free_one(pfvf, lvl,
+				     pfvf->hw.txschq_list[lvl][0]);
 
 	/* Clear the txschq list */
 	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
 		for (schq = 0; schq < MAX_TXSCHQ_PER_FUNC; schq++)
 			pfvf->hw.txschq_list[lvl][schq] = 0;
 	}
-	return err;
+
+	return 0;
 }
 
 void otx2_sqb_flush(struct otx2_nic *pfvf)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index 3e09f52f2dc6..c5e125e8d00b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -253,6 +253,7 @@ struct otx2_hw {
 #define CN10K_RPM		3
 #define CN10K_PTP_ONESTEP	4
 #define CN10K_HW_MACSEC		5
+#define QOS_CIR_PIR_SUPPORT	6
 	unsigned long		cap_flag;
 
 #define LMT_LINE_SIZE		128
@@ -587,6 +588,7 @@ static inline void otx2_setup_dev_hw_settings(struct otx2_nic *pfvf)
 		__set_bit(CN10K_LMTST, &hw->cap_flag);
 		__set_bit(CN10K_RPM, &hw->cap_flag);
 		__set_bit(CN10K_PTP_ONESTEP, &hw->cap_flag);
+		__set_bit(QOS_CIR_PIR_SUPPORT, &hw->cap_flag);
 	}
 
 	if (is_dev_cn10kb(pfvf->pdev))
@@ -948,6 +950,7 @@ int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
 		   int stack_pages, int numptrs, int buf_size);
 int otx2_aura_init(struct otx2_nic *pfvf, int aura_id,
 		   int pool_id, int numptrs);
+void otx2_txschq_free_one(struct otx2_nic *pfvf, u16 lvl, u16 schq);
 
 /* RSS configuration APIs*/
 int otx2_rss_init(struct otx2_nic *pfvf);
@@ -1059,4 +1062,8 @@ static inline void cn10k_handle_mcs_event(struct otx2_nic *pfvf,
 void otx2_qos_sq_setup(struct otx2_nic *pfvf);
 u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
 		      struct net_device *sb_dev);
+int otx2_get_txq_by_classid(struct otx2_nic *pfvf, u16 classid);
+void otx2_qos_config_txschq(struct otx2_nic *pfvf);
+int otx2_clean_qos_queues(struct otx2_nic *pfvf);
+bool otx2_is_qos_configured(struct otx2_nic *pfvf);
 #endif /* OTX2_COMMON_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
index b6bb6945f08a..8b26232f7a8b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
@@ -77,6 +77,12 @@ static int otx2_dl_tl1_rr_prio_validate(struct devlink *devlink, u32 id,
 		return -EOPNOTSUPP;
 	}
 
+	if (otx2_is_qos_configured(pf)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TL1RR PRIORITY setting not allowed after QOS config");
+		return -EOPNOTSUPP;
+	}
+
 	if (pci_num_vf(pf->pdev)) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "TL1RR PRIORITY setting not allowed as VFs are already attached");
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index 0f8d1a69139f..0816fe19a2c3 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -80,6 +80,11 @@ static void otx2_get_drvinfo(struct net_device *netdev,
 	strscpy(info->bus_info, pci_name(pfvf->pdev), sizeof(info->bus_info));
 }
 
+static int otx2_get_total_txq_count(struct otx2_nic *pfvf)
+{
+	return pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues;
+}
+
 static void otx2_get_qset_strings(struct otx2_nic *pfvf, u8 **data, int qset)
 {
 	int start_qidx = qset * pfvf->hw.rx_queues;
@@ -92,10 +97,15 @@ static void otx2_get_qset_strings(struct otx2_nic *pfvf, u8 **data, int qset)
 			*data += ETH_GSTRING_LEN;
 		}
 	}
-	for (qidx = 0; qidx < pfvf->hw.tx_queues; qidx++) {
+	for (qidx = 0; qidx < otx2_get_total_txq_count(pfvf); qidx++) {
 		for (stats = 0; stats < otx2_n_queue_stats; stats++) {
-			sprintf(*data, "txq%d: %s", qidx + start_qidx,
-				otx2_queue_stats[stats].name);
+			if (qidx >= pfvf->hw.tot_tx_queues)
+				sprintf(*data, "txq_qos%d: %s",
+					qidx + start_qidx - pfvf->hw.tot_tx_queues,
+					otx2_queue_stats[stats].name);
+			else
+				sprintf(*data, "txq%d: %s", qidx + start_qidx,
+					otx2_queue_stats[stats].name);
 			*data += ETH_GSTRING_LEN;
 		}
 	}
@@ -159,7 +169,7 @@ static void otx2_get_qset_stats(struct otx2_nic *pfvf,
 				[otx2_queue_stats[stat].index];
 	}
 
-	for (qidx = 0; qidx < pfvf->hw.tx_queues; qidx++) {
+	for (qidx = 0; qidx <  otx2_get_total_txq_count(pfvf); qidx++) {
 		if (!otx2_update_sq_stats(pfvf, qidx)) {
 			for (stat = 0; stat < otx2_n_queue_stats; stat++)
 				*((*data)++) = 0;
@@ -254,7 +264,8 @@ static int otx2_get_sset_count(struct net_device *netdev, int sset)
 		return -EINVAL;
 
 	qstats_count = otx2_n_queue_stats *
-		       (pfvf->hw.rx_queues + pfvf->hw.tx_queues);
+		       (pfvf->hw.rx_queues + pfvf->hw.tot_tx_queues +
+			pfvf->hw.tc_tx_queues);
 	if (!test_bit(CN10K_RPM, &pfvf->hw.cap_flag))
 		mac_stats = CGX_RX_STATS_COUNT + CGX_TX_STATS_COUNT;
 	otx2_update_lmac_fec_stats(pfvf);
@@ -282,7 +293,7 @@ static int otx2_set_channels(struct net_device *dev,
 {
 	struct otx2_nic *pfvf = netdev_priv(dev);
 	bool if_up = netif_running(dev);
-	int err = 0;
+	int err = 0, qos_txqs;
 
 	if (!channel->rx_count || !channel->tx_count)
 		return -EINVAL;
@@ -296,7 +307,10 @@ static int otx2_set_channels(struct net_device *dev,
 	if (if_up)
 		dev->netdev_ops->ndo_stop(dev);
 
-	err = otx2_set_real_num_queues(dev, channel->tx_count,
+	qos_txqs = bitmap_weight(pfvf->qos.qos_sq_bmap,
+				 OTX2_QOS_MAX_LEAF_NODES);
+
+	err = otx2_set_real_num_queues(dev, channel->tx_count + qos_txqs,
 				       channel->rx_count);
 	if (err)
 		return err;
@@ -1405,7 +1419,8 @@ static int otx2vf_get_sset_count(struct net_device *netdev, int sset)
 		return -EINVAL;
 
 	qstats_count = otx2_n_queue_stats *
-		       (vf->hw.rx_queues + vf->hw.tx_queues);
+		       (vf->hw.rx_queues + vf->hw.tx_queues +
+			vf->hw.tc_tx_queues);
 
 	return otx2_n_dev_stats + otx2_n_drv_stats + qstats_count + 1;
 }
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 92e41d2d5398..331b2b554d75 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1386,6 +1386,9 @@ static void otx2_free_sq_res(struct otx2_nic *pf)
 	otx2_sq_free_sqbs(pf);
 	for (qidx = 0; qidx < pf->hw.tot_tx_queues + pf->hw.tc_tx_queues; qidx++) {
 		sq = &qset->sq[qidx];
+		/* Skip freeing Qos queues if they are not initialized */
+		if (!sq->sqe)
+			continue;
 		qmem_free(pf->dev, sq->sqe);
 		qmem_free(pf->dev, sq->tso_hdrs);
 		kfree(sq->sg);
@@ -1568,6 +1571,8 @@ static void otx2_free_hw_resources(struct otx2_nic *pf)
 		otx2_pfc_txschq_stop(pf);
 #endif
 
+	otx2_clean_qos_queues(pf);
+
 	mutex_lock(&mbox->lock);
 	/* Disable backpressure */
 	if (!(pf->pcifunc & RVU_PFVF_FUNC_MASK))
@@ -1834,6 +1839,9 @@ int otx2_open(struct net_device *netdev)
 	/* 'intf_down' may be checked on any cpu */
 	smp_wmb();
 
+	/* Enable QoS configuration before starting tx queues */
+	otx2_qos_config_txschq(pf);
+
 	/* we have already received link status notification */
 	if (pf->linfo.link_up && !(pf->pcifunc & RVU_PFVF_FUNC_MASK))
 		otx2_handle_link_event(pf);
@@ -1979,14 +1987,45 @@ static netdev_tx_t otx2_xmit(struct sk_buff *skb, struct net_device *netdev)
 	return NETDEV_TX_OK;
 }
 
+static int otx2_qos_select_htb_queue(struct otx2_nic *pf, struct sk_buff *skb,
+				     u16 htb_maj_id)
+{
+	u16 classid;
+
+	if ((TC_H_MAJ(skb->priority) >> 16) == htb_maj_id)
+		classid = TC_H_MIN(skb->priority);
+	else
+		classid = READ_ONCE(pf->qos.defcls);
+
+	if (!classid)
+		return 0;
+
+	return otx2_get_txq_by_classid(pf, classid);
+}
+
 u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
 		      struct net_device *sb_dev)
 {
-#ifdef CONFIG_DCB
 	struct otx2_nic *pf = netdev_priv(netdev);
+	bool qos_enabled;
+#ifdef CONFIG_DCB
 	u8 vlan_prio;
 #endif
+	int txq;
+
+	qos_enabled = (netdev->real_num_tx_queues > pf->hw.tx_queues) ? true : false;
+	if (unlikely(qos_enabled)) {
+		u16 htb_maj_id = smp_load_acquire(&pf->qos.maj_id); /* barrier */
 
+		if (unlikely(htb_maj_id)) {
+			txq = otx2_qos_select_htb_queue(pf, skb, htb_maj_id);
+			if (txq > 0)
+				return txq;
+			goto process_pfc;
+		}
+	}
+
+process_pfc:
 #ifdef CONFIG_DCB
 	if (!skb_vlan_tag_present(skb))
 		goto pick_tx;
@@ -2000,7 +2039,11 @@ u16 otx2_select_queue(struct net_device *netdev, struct sk_buff *skb,
 
 pick_tx:
 #endif
-	return netdev_pick_tx(netdev, skb, NULL);
+	txq = netdev_pick_tx(netdev, skb, NULL);
+	if (unlikely(qos_enabled))
+		return txq % pf->hw.tx_queues;
+
+	return txq;
 }
 EXPORT_SYMBOL(otx2_select_queue);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h
index 1b967eaf948b..45a32e4b49d1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_reg.h
@@ -145,12 +145,25 @@
 #define NIX_AF_TL1X_TOPOLOGY(a)		(0xC80 | (a) << 16)
 #define NIX_AF_TL2X_PARENT(a)		(0xE88 | (a) << 16)
 #define NIX_AF_TL2X_SCHEDULE(a)		(0xE00 | (a) << 16)
+#define NIX_AF_TL2X_TOPOLOGY(a)		(0xE80 | (a) << 16)
+#define NIX_AF_TL2X_CIR(a)              (0xE20 | (a) << 16)
+#define NIX_AF_TL2X_PIR(a)              (0xE30 | (a) << 16)
 #define NIX_AF_TL3X_PARENT(a)		(0x1088 | (a) << 16)
 #define NIX_AF_TL3X_SCHEDULE(a)		(0x1000 | (a) << 16)
+#define NIX_AF_TL3X_SHAPE(a)		(0x1010 | (a) << 16)
+#define NIX_AF_TL3X_CIR(a)		(0x1020 | (a) << 16)
+#define NIX_AF_TL3X_PIR(a)		(0x1030 | (a) << 16)
+#define NIX_AF_TL3X_TOPOLOGY(a)		(0x1080 | (a) << 16)
 #define NIX_AF_TL4X_PARENT(a)		(0x1288 | (a) << 16)
 #define NIX_AF_TL4X_SCHEDULE(a)		(0x1200 | (a) << 16)
+#define NIX_AF_TL4X_SHAPE(a)		(0x1210 | (a) << 16)
+#define NIX_AF_TL4X_CIR(a)		(0x1220 | (a) << 16)
 #define NIX_AF_TL4X_PIR(a)		(0x1230 | (a) << 16)
+#define NIX_AF_TL4X_TOPOLOGY(a)		(0x1280 | (a) << 16)
 #define NIX_AF_MDQX_SCHEDULE(a)		(0x1400 | (a) << 16)
+#define NIX_AF_MDQX_SHAPE(a)		(0x1410 | (a) << 16)
+#define NIX_AF_MDQX_CIR(a)		(0x1420 | (a) << 16)
+#define NIX_AF_MDQX_PIR(a)		(0x1430 | (a) << 16)
 #define NIX_AF_MDQX_PARENT(a)		(0x1480 | (a) << 16)
 #define NIX_AF_TL3_TL2X_LINKX_CFG(a, b)	(0x1700 | (a) << 16 | (b) << 3)
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c
index 044cc211424e..42c49249f4e7 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_tc.c
@@ -19,6 +19,7 @@
 
 #include "cn10k.h"
 #include "otx2_common.h"
+#include "qos.h"
 
 /* Egress rate limiting definitions */
 #define MAX_BURST_EXPONENT		0x0FULL
@@ -147,8 +148,8 @@ static void otx2_get_egress_rate_cfg(u64 maxrate, u32 *exp,
 	}
 }
 
-static u64 otx2_get_txschq_rate_regval(struct otx2_nic *nic,
-				       u64 maxrate, u32 burst)
+u64 otx2_get_txschq_rate_regval(struct otx2_nic *nic,
+				u64 maxrate, u32 burst)
 {
 	u32 burst_exp, burst_mantissa;
 	u32 exp, mantissa, div_exp;
@@ -1127,6 +1128,8 @@ int otx2_setup_tc(struct net_device *netdev, enum tc_setup_type type,
 	switch (type) {
 	case TC_SETUP_BLOCK:
 		return otx2_setup_tc_block(netdev, type_data);
+	case TC_SETUP_QDISC_HTB:
+		return otx2_setup_tc_htb(netdev, type_data);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c
new file mode 100644
index 000000000000..dc829c210845
--- /dev/null
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.c
@@ -0,0 +1,1547 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Marvell RVU Ethernet driver
+ *
+ * Copyright (C) 2021 Marvell.
+ *
+ */
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/bitfield.h>
+
+#include "otx2_common.h"
+#include "cn10k.h"
+#include "qos.h"
+
+#define OTX2_QOS_QID_INNER		0xFFFFU
+#define OTX2_QOS_QID_NONE		0xFFFEU
+#define OTX2_QOS_ROOT_CLASSID		0xFFFFFFFF
+#define OTX2_QOS_CLASS_NONE		0
+#define OTX2_QOS_DEFAULT_PRIO		0xF
+#define OTX2_QOS_INVALID_SQ		0xFFFF
+
+/* Egress rate limiting definitions */
+#define MAX_BURST_EXPONENT		0x0FULL
+#define MAX_BURST_MANTISSA		0xFFULL
+#define MAX_BURST_SIZE			130816ULL
+#define MAX_RATE_DIVIDER_EXPONENT	12ULL
+#define MAX_RATE_EXPONENT		0x0FULL
+#define MAX_RATE_MANTISSA		0xFFULL
+
+/* Bitfields in NIX_TLX_PIR register */
+#define TLX_RATE_MANTISSA		GENMASK_ULL(8, 1)
+#define TLX_RATE_EXPONENT		GENMASK_ULL(12, 9)
+#define TLX_RATE_DIVIDER_EXPONENT	GENMASK_ULL(16, 13)
+#define TLX_BURST_MANTISSA		GENMASK_ULL(36, 29)
+#define TLX_BURST_EXPONENT		GENMASK_ULL(40, 37)
+
+static int otx2_qos_update_tx_netdev_queues(struct otx2_nic *pfvf)
+{
+	int tx_queues, err, qos_txqs = 0;
+	struct otx2_hw *hw = &pfvf->hw;
+
+	qos_txqs = bitmap_weight(pfvf->qos.qos_sq_bmap,
+				 OTX2_QOS_MAX_LEAF_NODES);
+
+	tx_queues = hw->tx_queues + qos_txqs;
+
+	err = netif_set_real_num_tx_queues(pfvf->netdev, tx_queues);
+	if (err) {
+		netdev_err(pfvf->netdev,
+			   "Failed to set no of Tx queues: %d\n", tx_queues);
+		return err;
+	}
+
+	return 0;
+}
+
+static u64 otx2_qos_convert_rate(u64 rate)
+{
+	u64 converted_rate;
+
+	/* convert bytes per second to Mbps */
+	converted_rate = rate * 8;
+	converted_rate = max_t(u64, converted_rate / 1000000, 1);
+
+	return converted_rate;
+}
+
+static void __otx2_qos_txschq_cfg(struct otx2_nic *pfvf,
+				  struct otx2_qos_node *node,
+				  struct nix_txschq_config *cfg)
+{
+	struct otx2_hw *hw = &pfvf->hw;
+	int num_regs = 0;
+	u64 maxrate;
+	u8 level;
+
+	level = node->level;
+
+	/* program txschq registers */
+	if (level == NIX_TXSCH_LVL_SMQ) {
+		cfg->reg[num_regs] = NIX_AF_SMQX_CFG(node->schq);
+		cfg->regval[num_regs] = ((u64)pfvf->tx_max_pktlen << 8) |
+					OTX2_MIN_MTU;
+		cfg->regval[num_regs] |= (0x20ULL << 51) | (0x80ULL << 39) |
+					 (0x2ULL << 36);
+		num_regs++;
+
+		/* configure parent txschq */
+		cfg->reg[num_regs] = NIX_AF_MDQX_PARENT(node->schq);
+		cfg->regval[num_regs] = node->parent->schq << 16;
+		num_regs++;
+
+		/* configure prio/quantum */
+		if (node->qid == OTX2_QOS_QID_NONE) {
+			cfg->reg[num_regs] = NIX_AF_MDQX_SCHEDULE(node->schq);
+			cfg->regval[num_regs] =  node->prio << 24 |
+						 mtu_to_dwrr_weight(pfvf,
+								    pfvf->tx_max_pktlen);
+			num_regs++;
+			goto txschq_cfg_out;
+		}
+
+		/* configure prio */
+		cfg->reg[num_regs] = NIX_AF_MDQX_SCHEDULE(node->schq);
+		cfg->regval[num_regs] = (node->schq -
+					 node->parent->prio_anchor) << 24;
+		num_regs++;
+
+		/* configure PIR */
+		maxrate = (node->rate > node->ceil) ? node->rate : node->ceil;
+
+		cfg->reg[num_regs] = NIX_AF_MDQX_PIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, maxrate, 65536);
+		num_regs++;
+
+		/* configure CIR */
+		if (!test_bit(QOS_CIR_PIR_SUPPORT, &pfvf->hw.cap_flag)) {
+			/* Don't configure CIR when both CIR+PIR not supported
+			 * On 96xx, CIR + PIR + RED_ALGO=STALL causes deadlock
+			 */
+			goto txschq_cfg_out;
+		}
+
+		cfg->reg[num_regs] = NIX_AF_MDQX_CIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, node->rate, 65536);
+		num_regs++;
+	} else if (level == NIX_TXSCH_LVL_TL4) {
+		/* configure parent txschq */
+		cfg->reg[num_regs] = NIX_AF_TL4X_PARENT(node->schq);
+		cfg->regval[num_regs] = node->parent->schq << 16;
+		num_regs++;
+
+		/* return if not htb node */
+		if (node->qid == OTX2_QOS_QID_NONE) {
+			cfg->reg[num_regs] = NIX_AF_TL4X_SCHEDULE(node->schq);
+			cfg->regval[num_regs] =  node->prio << 24 |
+						 mtu_to_dwrr_weight(pfvf,
+								    pfvf->tx_max_pktlen);
+			num_regs++;
+			goto txschq_cfg_out;
+		}
+
+		/* configure priority */
+		cfg->reg[num_regs] = NIX_AF_TL4X_SCHEDULE(node->schq);
+		cfg->regval[num_regs] = (node->schq -
+					 node->parent->prio_anchor) << 24;
+		num_regs++;
+
+		/* configure PIR */
+		maxrate = (node->rate > node->ceil) ? node->rate : node->ceil;
+		cfg->reg[num_regs] = NIX_AF_TL4X_PIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, maxrate, 65536);
+		num_regs++;
+
+		/* configure CIR */
+		if (!test_bit(QOS_CIR_PIR_SUPPORT, &pfvf->hw.cap_flag)) {
+			/* Don't configure CIR when both CIR+PIR not supported
+			 * On 96xx, CIR + PIR + RED_ALGO=STALL causes deadlock
+			 */
+			goto txschq_cfg_out;
+		}
+
+		cfg->reg[num_regs] = NIX_AF_TL4X_CIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, node->rate, 65536);
+		num_regs++;
+	} else if (level == NIX_TXSCH_LVL_TL3) {
+		/* configure parent txschq */
+		cfg->reg[num_regs] = NIX_AF_TL3X_PARENT(node->schq);
+		cfg->regval[num_regs] = node->parent->schq << 16;
+		num_regs++;
+
+		/* configure link cfg */
+		if (level == pfvf->qos.link_cfg_lvl) {
+			cfg->reg[num_regs] = NIX_AF_TL3_TL2X_LINKX_CFG(node->schq, hw->tx_link);
+			cfg->regval[num_regs] = BIT_ULL(13) | BIT_ULL(12);
+			num_regs++;
+		}
+
+		/* return if not htb node */
+		if (node->qid == OTX2_QOS_QID_NONE) {
+			cfg->reg[num_regs] = NIX_AF_TL3X_SCHEDULE(node->schq);
+			cfg->regval[num_regs] =  node->prio << 24 |
+						 mtu_to_dwrr_weight(pfvf,
+								    pfvf->tx_max_pktlen);
+			num_regs++;
+			goto txschq_cfg_out;
+		}
+
+		/* configure priority */
+		cfg->reg[num_regs] = NIX_AF_TL3X_SCHEDULE(node->schq);
+		cfg->regval[num_regs] = (node->schq -
+					 node->parent->prio_anchor) << 24;
+		num_regs++;
+
+		/* configure PIR */
+		maxrate = (node->rate > node->ceil) ? node->rate : node->ceil;
+		cfg->reg[num_regs] = NIX_AF_TL3X_PIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, maxrate, 65536);
+		num_regs++;
+
+		/* configure CIR */
+		if (!test_bit(QOS_CIR_PIR_SUPPORT, &pfvf->hw.cap_flag)) {
+			/* Don't configure CIR when both CIR+PIR not supported
+			 * On 96xx, CIR + PIR + RED_ALGO=STALL causes deadlock
+			 */
+			goto txschq_cfg_out;
+		}
+
+		cfg->reg[num_regs] = NIX_AF_TL3X_CIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, node->rate, 65536);
+		num_regs++;
+	} else if (level == NIX_TXSCH_LVL_TL2) {
+		/* configure parent txschq */
+		cfg->reg[num_regs] = NIX_AF_TL2X_PARENT(node->schq);
+		cfg->regval[num_regs] = hw->tx_link << 16;
+		num_regs++;
+
+		/* configure link cfg */
+		if (level == pfvf->qos.link_cfg_lvl) {
+			cfg->reg[num_regs] = NIX_AF_TL3_TL2X_LINKX_CFG(node->schq, hw->tx_link);
+			cfg->regval[num_regs] = BIT_ULL(13) | BIT_ULL(12);
+			num_regs++;
+		}
+
+		/* return if not htb node */
+		if (node->qid == OTX2_QOS_QID_NONE) {
+			cfg->reg[num_regs] = NIX_AF_TL2X_SCHEDULE(node->schq);
+			cfg->regval[num_regs] =  node->prio << 24 |
+						 mtu_to_dwrr_weight(pfvf,
+								    pfvf->tx_max_pktlen);
+			num_regs++;
+			goto txschq_cfg_out;
+		}
+
+		/* check if node is root */
+		if (node->qid == OTX2_QOS_QID_INNER && !node->parent) {
+			cfg->reg[num_regs] = NIX_AF_TL2X_SCHEDULE(node->schq);
+			cfg->regval[num_regs] =  hw->txschq_aggr_lvl_rr_prio << 24 |
+						 mtu_to_dwrr_weight(pfvf,
+								    pfvf->tx_max_pktlen);
+			num_regs++;
+			goto txschq_cfg_out;
+		}
+
+		/* configure priority/quantum */
+		cfg->reg[num_regs] = NIX_AF_TL2X_SCHEDULE(node->schq);
+		cfg->regval[num_regs] = (node->schq -
+					 node->parent->prio_anchor) << 24;
+		num_regs++;
+
+		/* configure PIR */
+		maxrate = (node->rate > node->ceil) ? node->rate : node->ceil;
+		cfg->reg[num_regs] = NIX_AF_TL2X_PIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, maxrate, 65536);
+		num_regs++;
+
+		/* configure CIR */
+		if (!test_bit(QOS_CIR_PIR_SUPPORT, &pfvf->hw.cap_flag)) {
+			/* Don't configure CIR when both CIR+PIR not supported
+			 * On 96xx, CIR + PIR + RED_ALGO=STALL causes deadlock
+			 */
+			goto txschq_cfg_out;
+		}
+
+		cfg->reg[num_regs] = NIX_AF_TL2X_CIR(node->schq);
+		cfg->regval[num_regs] =
+			otx2_get_txschq_rate_regval(pfvf, node->rate, 65536);
+		num_regs++;
+	}
+
+txschq_cfg_out:
+	cfg->num_regs = num_regs;
+}
+
+static int otx2_qos_txschq_set_parent_topology(struct otx2_nic *pfvf,
+					       struct otx2_qos_node *parent)
+{
+	struct mbox *mbox = &pfvf->mbox;
+	struct nix_txschq_config *cfg;
+	int rc;
+
+	if (parent->level == NIX_TXSCH_LVL_MDQ)
+		return 0;
+
+	mutex_lock(&mbox->lock);
+
+	cfg = otx2_mbox_alloc_msg_nix_txschq_cfg(&pfvf->mbox);
+	if (!cfg)
+		return -ENOMEM;
+
+	cfg->lvl = parent->level;
+
+	if (parent->level == NIX_TXSCH_LVL_TL4)
+		cfg->reg[0] = NIX_AF_TL4X_TOPOLOGY(parent->schq);
+	else if (parent->level == NIX_TXSCH_LVL_TL3)
+		cfg->reg[0] = NIX_AF_TL3X_TOPOLOGY(parent->schq);
+	else if (parent->level == NIX_TXSCH_LVL_TL2)
+		cfg->reg[0] = NIX_AF_TL2X_TOPOLOGY(parent->schq);
+	else if (parent->level == NIX_TXSCH_LVL_TL1)
+		cfg->reg[0] = NIX_AF_TL1X_TOPOLOGY(parent->schq);
+
+	cfg->regval[0] = (u64)parent->prio_anchor << 32;
+	cfg->num_regs++;
+
+	rc = otx2_sync_mbox_msg(&pfvf->mbox);
+
+	mutex_unlock(&mbox->lock);
+
+	return rc;
+}
+
+static void otx2_qos_free_hw_node_schq(struct otx2_nic *pfvf,
+				       struct otx2_qos_node *parent)
+{
+	struct otx2_qos_node *node, *tmp;
+
+	list_for_each_entry_safe(node, tmp, &parent->child_schq_list, list)
+		otx2_txschq_free_one(pfvf, node->level, node->schq);
+}
+
+static void otx2_qos_free_hw_node(struct otx2_nic *pfvf,
+				  struct otx2_qos_node *parent)
+{
+	struct otx2_qos_node *node, *tmp;
+
+	list_for_each_entry_safe(node, tmp, &parent->child_list, list) {
+		otx2_qos_free_hw_node(pfvf, node);
+		otx2_txschq_free_one(pfvf, node->level, node->schq);
+		otx2_qos_free_hw_node_schq(pfvf, node);
+	}
+}
+
+static void otx2_qos_free_hw_cfg(struct otx2_nic *pfvf,
+				 struct otx2_qos_node *node)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+
+	/* free child node hw mappings */
+	otx2_qos_free_hw_node(pfvf, node);
+	otx2_qos_free_hw_node_schq(pfvf, node);
+
+	/* free node hw mappings */
+	otx2_txschq_free_one(pfvf, node->level, node->schq);
+
+	mutex_unlock(&pfvf->qos.qos_lock);
+}
+
+static void otx2_qos_sw_node_delete(struct otx2_nic *pfvf,
+				    struct otx2_qos_node *node)
+{
+	hash_del(&node->hlist);
+
+	if (node->qid != OTX2_QOS_QID_INNER && node->qid != OTX2_QOS_QID_NONE) {
+		__clear_bit(node->qid, pfvf->qos.qos_sq_bmap);
+		otx2_qos_update_tx_netdev_queues(pfvf);
+	}
+
+	list_del(&node->list);
+	kfree(node);
+}
+
+static void otx2_qos_free_sw_node_schq(struct otx2_nic *pfvf,
+				       struct otx2_qos_node *parent)
+{
+	struct otx2_qos_node *node, *tmp;
+
+	list_for_each_entry_safe(node, tmp, &parent->child_schq_list, list) {
+		list_del(&node->list);
+		kfree(node);
+	}
+}
+
+static void __otx2_qos_free_sw_node(struct otx2_nic *pfvf,
+				    struct otx2_qos_node *parent)
+{
+	struct otx2_qos_node *node, *tmp;
+
+	list_for_each_entry_safe(node, tmp, &parent->child_list, list) {
+		__otx2_qos_free_sw_node(pfvf, node);
+		otx2_qos_free_sw_node_schq(pfvf, node);
+		otx2_qos_sw_node_delete(pfvf, node);
+	}
+}
+
+static void otx2_qos_free_sw_node(struct otx2_nic *pfvf,
+				  struct otx2_qos_node *node)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+
+	__otx2_qos_free_sw_node(pfvf, node);
+	otx2_qos_free_sw_node_schq(pfvf, node);
+	otx2_qos_sw_node_delete(pfvf, node);
+
+	mutex_unlock(&pfvf->qos.qos_lock);
+}
+
+static void otx2_qos_destroy_node(struct otx2_nic *pfvf,
+				  struct otx2_qos_node *node)
+{
+	otx2_qos_free_hw_cfg(pfvf, node);
+	otx2_qos_free_sw_node(pfvf, node);
+}
+
+static void otx2_qos_fill_cfg_schq(struct otx2_qos_node *parent,
+				   struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *node;
+
+	list_for_each_entry(node, &parent->child_schq_list, list)
+		cfg->schq[node->level]++;
+}
+
+static void otx2_qos_fill_cfg_tl(struct otx2_qos_node *parent,
+				 struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *node;
+
+	list_for_each_entry(node, &parent->child_list, list) {
+		otx2_qos_fill_cfg_tl(node, cfg);
+		cfg->schq_contig[node->level]++;
+		otx2_qos_fill_cfg_schq(node, cfg);
+	}
+}
+
+static void otx2_qos_prepare_txschq_cfg(struct otx2_nic *pfvf,
+					struct otx2_qos_node *parent,
+					struct otx2_qos_cfg *cfg)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+	otx2_qos_fill_cfg_tl(parent, cfg);
+	mutex_unlock(&pfvf->qos.qos_lock);
+}
+
+static void otx2_qos_read_txschq_cfg_schq(struct otx2_qos_node *parent,
+					  struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *node;
+	int cnt;
+
+	list_for_each_entry(node, &parent->child_schq_list, list) {
+		cnt = cfg->dwrr_node_pos[node->level];
+		cfg->schq_list[node->level][cnt] = node->schq;
+		cfg->schq[node->level]++;
+		cfg->dwrr_node_pos[node->level]++;
+	}
+}
+
+static void otx2_qos_read_txschq_cfg_tl(struct otx2_qos_node *parent,
+					struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *node;
+	int cnt;
+
+	list_for_each_entry(node, &parent->child_list, list) {
+		otx2_qos_read_txschq_cfg_tl(node, cfg);
+		cnt = cfg->static_node_pos[node->level];
+		cfg->schq_contig_list[node->level][cnt] = node->schq;
+		cfg->schq_contig[node->level]++;
+		cfg->static_node_pos[node->level]++;
+		otx2_qos_read_txschq_cfg_schq(node, cfg);
+	}
+}
+
+static void otx2_qos_read_txschq_cfg(struct otx2_nic *pfvf,
+				     struct otx2_qos_node *node,
+				     struct otx2_qos_cfg *cfg)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+	otx2_qos_read_txschq_cfg_tl(node, cfg);
+	mutex_unlock(&pfvf->qos.qos_lock);
+}
+
+static struct otx2_qos_node *
+otx2_qos_alloc_root(struct otx2_nic *pfvf)
+{
+	struct otx2_qos_node *node;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	node->parent = NULL;
+	if (!is_otx2_vf(pfvf->pcifunc))
+		node->level = NIX_TXSCH_LVL_TL1;
+	else
+		node->level = NIX_TXSCH_LVL_TL2;
+
+	node->qid = OTX2_QOS_QID_INNER;
+	node->classid = OTX2_QOS_ROOT_CLASSID;
+
+	hash_add(pfvf->qos.qos_hlist, &node->hlist, node->classid);
+	list_add_tail(&node->list, &pfvf->qos.qos_tree);
+	INIT_LIST_HEAD(&node->child_list);
+	INIT_LIST_HEAD(&node->child_schq_list);
+
+	return node;
+}
+
+static int otx2_qos_add_child_node(struct otx2_qos_node *parent,
+				   struct otx2_qos_node *node)
+{
+	struct list_head *head = &parent->child_list;
+	struct otx2_qos_node *tmp_node;
+	struct list_head *tmp;
+
+	for (tmp = head->next; tmp != head; tmp = tmp->next) {
+		tmp_node = list_entry(tmp, struct otx2_qos_node, list);
+		if (tmp_node->prio == node->prio)
+			return -EEXIST;
+		if (tmp_node->prio > node->prio) {
+			list_add_tail(&node->list, tmp);
+			return 0;
+		}
+	}
+
+	list_add_tail(&node->list, head);
+	return 0;
+}
+
+static int otx2_qos_alloc_txschq_node(struct otx2_nic *pfvf,
+				      struct otx2_qos_node *node)
+{
+	struct otx2_qos_node *txschq_node, *parent, *tmp;
+	int lvl;
+
+	parent = node;
+	for (lvl = node->level - 1; lvl >= NIX_TXSCH_LVL_MDQ; lvl--) {
+		txschq_node = kzalloc(sizeof(*txschq_node), GFP_KERNEL);
+		if (!txschq_node)
+			goto err_out;
+
+		txschq_node->parent = parent;
+		txschq_node->level = lvl;
+		txschq_node->classid = OTX2_QOS_CLASS_NONE;
+		txschq_node->qid = OTX2_QOS_QID_NONE;
+		txschq_node->rate = 0;
+		txschq_node->ceil = 0;
+		txschq_node->prio = 0;
+
+		mutex_lock(&pfvf->qos.qos_lock);
+		list_add_tail(&txschq_node->list, &node->child_schq_list);
+		mutex_unlock(&pfvf->qos.qos_lock);
+
+		INIT_LIST_HEAD(&txschq_node->child_list);
+		INIT_LIST_HEAD(&txschq_node->child_schq_list);
+		parent = txschq_node;
+	}
+
+	return 0;
+
+err_out:
+	list_for_each_entry_safe(txschq_node, tmp, &node->child_schq_list,
+				 list) {
+		list_del(&txschq_node->list);
+		kfree(txschq_node);
+	}
+	return -ENOMEM;
+}
+
+static struct otx2_qos_node *
+otx2_qos_sw_create_leaf_node(struct otx2_nic *pfvf,
+			     struct otx2_qos_node *parent,
+			     u16 classid, u32 prio, u64 rate, u64 ceil,
+			     u16 qid)
+{
+	struct otx2_qos_node *node;
+	int err;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	node->parent = parent;
+	node->level = parent->level - 1;
+	node->classid = classid;
+	node->qid = qid;
+	node->rate = otx2_qos_convert_rate(rate);
+	node->ceil = otx2_qos_convert_rate(ceil);
+	node->prio = prio;
+
+	__set_bit(qid, pfvf->qos.qos_sq_bmap);
+
+	hash_add(pfvf->qos.qos_hlist, &node->hlist, classid);
+
+	mutex_lock(&pfvf->qos.qos_lock);
+	err = otx2_qos_add_child_node(parent, node);
+	if (err) {
+		mutex_unlock(&pfvf->qos.qos_lock);
+		return ERR_PTR(err);
+	}
+	mutex_unlock(&pfvf->qos.qos_lock);
+
+	INIT_LIST_HEAD(&node->child_list);
+	INIT_LIST_HEAD(&node->child_schq_list);
+
+	err = otx2_qos_alloc_txschq_node(pfvf, node);
+	if (err) {
+		otx2_qos_sw_node_delete(pfvf, node);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return node;
+}
+
+static struct otx2_qos_node
+*otx2_sw_node_find_by_qid(struct otx2_nic *pfvf, u16 qid)
+{
+	struct otx2_qos_node *node = NULL;
+	int bkt;
+
+	hash_for_each(pfvf->qos.qos_hlist, bkt, node, hlist) {
+		if (node->qid == qid)
+			break;
+	}
+
+	return node;
+}
+
+static struct otx2_qos_node *
+otx2_sw_node_find(struct otx2_nic *pfvf, u32 classid)
+{
+	struct otx2_qos_node *node = NULL;
+
+	hash_for_each_possible(pfvf->qos.qos_hlist, node, hlist, classid) {
+		if (node->classid == classid)
+			break;
+	}
+
+	return node;
+}
+
+int otx2_get_txq_by_classid(struct otx2_nic *pfvf, u16 classid)
+{
+	struct otx2_qos_node *node;
+	u16 qid;
+	int res;
+
+	node = otx2_sw_node_find(pfvf, classid);
+	if (!node) {
+		res = -ENOENT;
+		goto out;
+	}
+	qid = READ_ONCE(node->qid);
+	if (qid == OTX2_QOS_QID_INNER) {
+		res = -EINVAL;
+		goto out;
+	}
+	res = pfvf->hw.tx_queues + qid;
+out:
+	return res;
+}
+
+static int
+otx2_qos_txschq_config(struct otx2_nic *pfvf, struct otx2_qos_node *node)
+{
+	struct mbox *mbox = &pfvf->mbox;
+	struct nix_txschq_config *req;
+	int rc;
+
+	mutex_lock(&mbox->lock);
+
+	req = otx2_mbox_alloc_msg_nix_txschq_cfg(&pfvf->mbox);
+	if (!req)
+		return -ENOMEM;
+
+	req->lvl = node->level;
+	__otx2_qos_txschq_cfg(pfvf, node, req);
+
+	rc = otx2_sync_mbox_msg(&pfvf->mbox);
+
+	mutex_unlock(&mbox->lock);
+
+	return rc;
+}
+
+static int otx2_qos_txschq_alloc(struct otx2_nic *pfvf,
+				 struct otx2_qos_cfg *cfg)
+{
+	struct nix_txsch_alloc_req *req;
+	struct nix_txsch_alloc_rsp *rsp;
+	struct mbox *mbox = &pfvf->mbox;
+	int lvl, rc, schq;
+
+	mutex_lock(&mbox->lock);
+	req = otx2_mbox_alloc_msg_nix_txsch_alloc(&pfvf->mbox);
+	if (!req)
+		return -ENOMEM;
+
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
+		req->schq[lvl] = cfg->schq[lvl];
+		req->schq_contig[lvl] = cfg->schq_contig[lvl];
+	}
+
+	rc = otx2_sync_mbox_msg(&pfvf->mbox);
+	if (rc)
+		return rc;
+
+	rsp = (struct nix_txsch_alloc_rsp *)
+	      otx2_mbox_get_rsp(&pfvf->mbox.mbox, 0, &req->hdr);
+
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
+		for (schq = 0; schq < rsp->schq_contig[lvl]; schq++) {
+			cfg->schq_contig_list[lvl][schq] =
+				rsp->schq_contig_list[lvl][schq];
+		}
+	}
+
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
+		for (schq = 0; schq < rsp->schq[lvl]; schq++) {
+			cfg->schq_list[lvl][schq] =
+				rsp->schq_list[lvl][schq];
+		}
+	}
+
+	pfvf->qos.link_cfg_lvl = rsp->link_cfg_lvl;
+	pfvf->hw.txschq_aggr_lvl_rr_prio = rsp->aggr_lvl_rr_prio;
+
+	mutex_unlock(&mbox->lock);
+
+	return rc;
+}
+
+static void otx2_qos_txschq_fill_cfg_schq(struct otx2_nic *pfvf,
+					  struct otx2_qos_node *node,
+					  struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *tmp;
+	int cnt;
+
+	list_for_each_entry(tmp, &node->child_schq_list, list) {
+		cnt = cfg->dwrr_node_pos[tmp->level];
+		tmp->schq = cfg->schq_list[tmp->level][cnt];
+		cfg->dwrr_node_pos[tmp->level]++;
+	}
+}
+
+static void otx2_qos_txschq_fill_cfg_tl(struct otx2_nic *pfvf,
+					struct otx2_qos_node *node,
+					struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *tmp;
+	int cnt;
+
+	list_for_each_entry(tmp, &node->child_list, list) {
+		otx2_qos_txschq_fill_cfg_tl(pfvf, tmp, cfg);
+		cnt = cfg->static_node_pos[tmp->level];
+		tmp->schq = cfg->schq_contig_list[tmp->level][cnt];
+		if (cnt == 0)
+			node->prio_anchor = tmp->schq;
+		cfg->static_node_pos[tmp->level]++;
+		otx2_qos_txschq_fill_cfg_schq(pfvf, tmp, cfg);
+	}
+}
+
+static void otx2_qos_txschq_fill_cfg(struct otx2_nic *pfvf,
+				     struct otx2_qos_node *node,
+				     struct otx2_qos_cfg *cfg)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+	otx2_qos_txschq_fill_cfg_tl(pfvf, node, cfg);
+	otx2_qos_txschq_fill_cfg_schq(pfvf, node, cfg);
+	mutex_unlock(&pfvf->qos.qos_lock);
+}
+
+static int otx2_qos_txschq_push_cfg_schq(struct otx2_nic *pfvf,
+					 struct otx2_qos_node *node,
+					 struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *tmp;
+	int ret = 0;
+
+	list_for_each_entry(tmp, &node->child_schq_list, list) {
+		ret = otx2_qos_txschq_config(pfvf, tmp);
+		if (ret)
+			return -EIO;
+		ret = otx2_qos_txschq_set_parent_topology(pfvf, tmp->parent);
+		if (ret)
+			return -EIO;
+	}
+
+	return 0;
+}
+
+static int otx2_qos_txschq_push_cfg_tl(struct otx2_nic *pfvf,
+				       struct otx2_qos_node *node,
+				       struct otx2_qos_cfg *cfg)
+{
+	struct otx2_qos_node *tmp;
+	int ret = 0;
+
+	list_for_each_entry(tmp, &node->child_list, list) {
+		ret = otx2_qos_txschq_push_cfg_tl(pfvf, tmp, cfg);
+		if (ret)
+			return -EIO;
+		ret = otx2_qos_txschq_config(pfvf, tmp);
+		if (ret)
+			return -EIO;
+		ret = otx2_qos_txschq_push_cfg_schq(pfvf, tmp, cfg);
+		if (ret)
+			return -EIO;
+	}
+
+	ret = otx2_qos_txschq_set_parent_topology(pfvf, node);
+	if (ret)
+		return -EIO;
+
+	return 0;
+}
+
+static int otx2_qos_txschq_push_cfg(struct otx2_nic *pfvf,
+				    struct otx2_qos_node *node,
+				    struct otx2_qos_cfg *cfg)
+{
+	int ret = 0;
+
+	mutex_lock(&pfvf->qos.qos_lock);
+	ret = otx2_qos_txschq_push_cfg_tl(pfvf, node, cfg);
+	if (ret)
+		goto out;
+	ret = otx2_qos_txschq_push_cfg_schq(pfvf, node, cfg);
+out:
+	mutex_unlock(&pfvf->qos.qos_lock);
+	return ret;
+}
+
+static int otx2_qos_txschq_update_config(struct otx2_nic *pfvf,
+					 struct otx2_qos_node *node,
+					 struct otx2_qos_cfg *cfg)
+{
+	int ret = 0;
+
+	otx2_qos_txschq_fill_cfg(pfvf, node, cfg);
+	ret = otx2_qos_txschq_push_cfg(pfvf, node, cfg);
+
+	return ret;
+}
+
+static int otx2_qos_txschq_update_root_cfg(struct otx2_nic *pfvf,
+					   struct otx2_qos_node *root,
+					   struct otx2_qos_cfg *cfg)
+{
+	int ret = 0;
+
+	root->schq = cfg->schq_list[root->level][0];
+	ret = otx2_qos_txschq_config(pfvf, root);
+
+	return ret;
+}
+
+static void otx2_qos_free_cfg(struct otx2_nic *pfvf, struct otx2_qos_cfg *cfg)
+{
+	int lvl, idx, schq;
+
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
+		for (idx = 0; idx < cfg->schq_contig[lvl]; idx++) {
+			schq = cfg->schq_contig_list[lvl][idx];
+			otx2_txschq_free_one(pfvf, lvl, schq);
+		}
+	}
+
+	for (lvl = 0; lvl < NIX_TXSCH_LVL_CNT; lvl++) {
+		for (idx = 0; idx < cfg->schq[lvl]; idx++) {
+			schq = cfg->schq_list[lvl][idx];
+			otx2_txschq_free_one(pfvf, lvl, schq);
+		}
+	}
+}
+
+static void otx2_qos_enadis_sq(struct otx2_nic *pfvf,
+			       struct otx2_qos_node *node,
+			       u16 qid)
+{
+	if (pfvf->qos.qid_to_sqmap[qid] != OTX2_QOS_INVALID_SQ)
+		otx2_qos_disable_sq(pfvf, qid);
+
+	pfvf->qos.qid_to_sqmap[qid] = node->schq;
+	otx2_qos_enable_sq(pfvf, qid);
+}
+
+static void otx2_qos_update_smq_schq(struct otx2_nic *pfvf,
+				     struct otx2_qos_node *node,
+				     bool action)
+{
+	struct otx2_qos_node *tmp;
+
+	if (node->qid == OTX2_QOS_QID_INNER)
+		return;
+
+	list_for_each_entry(tmp, &node->child_schq_list, list) {
+		if (tmp->level == NIX_TXSCH_LVL_MDQ) {
+			if (action == QOS_SMQ_FLUSH)
+				otx2_smq_flush(pfvf, tmp->schq);
+			else
+				otx2_qos_enadis_sq(pfvf, tmp, node->qid);
+		}
+	}
+}
+
+static void __otx2_qos_update_smq(struct otx2_nic *pfvf,
+				  struct otx2_qos_node *node,
+				  bool action)
+{
+	struct otx2_qos_node *tmp;
+
+	list_for_each_entry(tmp, &node->child_list, list) {
+		__otx2_qos_update_smq(pfvf, tmp, action);
+		if (tmp->qid == OTX2_QOS_QID_INNER)
+			continue;
+		if (tmp->level == NIX_TXSCH_LVL_MDQ) {
+			if (action == QOS_SMQ_FLUSH)
+				otx2_smq_flush(pfvf, tmp->schq);
+			else
+				otx2_qos_enadis_sq(pfvf, tmp, tmp->qid);
+		} else {
+			otx2_qos_update_smq_schq(pfvf, tmp, action);
+		}
+	}
+}
+
+static int otx2_qos_update_smq(struct otx2_nic *pfvf,
+			       struct otx2_qos_node *node,
+			       bool action)
+{
+	mutex_lock(&pfvf->qos.qos_lock);
+	__otx2_qos_update_smq(pfvf, node, action);
+	otx2_qos_update_smq_schq(pfvf, node, action);
+	mutex_unlock(&pfvf->qos.qos_lock);
+
+	return 0;
+}
+
+static int otx2_qos_push_txschq_cfg(struct otx2_nic *pfvf,
+				    struct otx2_qos_node *node,
+				    struct otx2_qos_cfg *cfg)
+{
+	int ret = 0;
+
+	ret = otx2_qos_txschq_alloc(pfvf, cfg);
+	if (ret)
+		return -ENOSPC;
+
+	if (!(pfvf->netdev->flags & IFF_UP)) {
+		otx2_qos_txschq_fill_cfg(pfvf, node, cfg);
+		return 0;
+	}
+
+	ret = otx2_qos_txschq_update_config(pfvf, node, cfg);
+	if (ret) {
+		otx2_qos_free_cfg(pfvf, cfg);
+		return -EIO;
+	}
+
+	ret = otx2_qos_update_smq(pfvf, node, QOS_CFG_SQ);
+	if (ret) {
+		otx2_qos_free_cfg(pfvf, cfg);
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static int otx2_qos_update_tree(struct otx2_nic *pfvf,
+				struct otx2_qos_node *node,
+				struct otx2_qos_cfg *cfg)
+{
+	int ret = 0;
+
+	otx2_qos_prepare_txschq_cfg(pfvf, node->parent, cfg);
+	ret = otx2_qos_push_txschq_cfg(pfvf, node->parent, cfg);
+
+	return ret;
+}
+
+static int otx2_qos_root_add(struct otx2_nic *pfvf, u16 htb_maj_id, u16 htb_defcls,
+			     struct netlink_ext_ack *extack)
+{
+	struct otx2_qos_cfg *new_cfg;
+	struct otx2_qos_node *root;
+	int err;
+
+	netdev_dbg(pfvf->netdev,
+		   "TC_HTB_CREATE: handle=0x%x defcls=0x%x\n",
+		   htb_maj_id, htb_defcls);
+
+	INIT_LIST_HEAD(&pfvf->qos.qos_tree);
+	mutex_init(&pfvf->qos.qos_lock);
+
+	root = otx2_qos_alloc_root(pfvf);
+	if (IS_ERR(root)) {
+		mutex_destroy(&pfvf->qos.qos_lock);
+		err = PTR_ERR(root);
+		return err;
+	}
+
+	/* allocate txschq queue */
+	new_cfg = kzalloc(sizeof(*new_cfg), GFP_KERNEL);
+	if (!new_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		mutex_destroy(&pfvf->qos.qos_lock);
+		return -ENOMEM;
+	}
+	/* allocate htb root node */
+	new_cfg->schq[root->level] = 1;
+	err = otx2_qos_txschq_alloc(pfvf, new_cfg);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Error allocating txschq");
+		goto free_root_node;
+	}
+
+	if (!(pfvf->netdev->flags & IFF_UP) ||
+	    root->level == NIX_TXSCH_LVL_TL1) {
+		root->schq = new_cfg->schq_list[root->level][0];
+		goto out;
+	}
+
+	/* update the txschq configuration in hw */
+	err = otx2_qos_txschq_update_root_cfg(pfvf, root, new_cfg);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Error updating txschq configuration");
+		goto txschq_free;
+	}
+
+out:
+	WRITE_ONCE(pfvf->qos.defcls, htb_defcls);
+	smp_store_release(&pfvf->qos.maj_id, htb_maj_id); /* barrier */
+	kfree(new_cfg);
+	return 0;
+
+txschq_free:
+	otx2_qos_free_cfg(pfvf, new_cfg);
+free_root_node:
+	kfree(new_cfg);
+	otx2_qos_sw_node_delete(pfvf, root);
+	mutex_destroy(&pfvf->qos.qos_lock);
+	return err;
+}
+
+static int otx2_qos_root_destroy(struct otx2_nic *pfvf)
+{
+	struct otx2_qos_node *root;
+
+	netdev_dbg(pfvf->netdev, "TC_HTB_DESTROY\n");
+
+	/* find root node */
+	root = otx2_sw_node_find(pfvf, OTX2_QOS_ROOT_CLASSID);
+	if (!root)
+		return -ENOENT;
+
+	/* free the hw mappings */
+	otx2_qos_destroy_node(pfvf, root);
+	mutex_destroy(&pfvf->qos.qos_lock);
+
+	return 0;
+}
+
+static int otx2_qos_validate_configuration(struct otx2_qos_node *parent,
+					   struct netlink_ext_ack *extack,
+					   struct otx2_nic *pfvf,
+					   u64 prio)
+{
+	if (test_bit(prio, parent->prio_bmap)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Static priority child with same priority exists");
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
+static int otx2_qos_leaf_alloc_queue(struct otx2_nic *pfvf, u16 classid,
+				     u32 parent_classid, u64 rate, u64 ceil,
+				     u64 prio, struct netlink_ext_ack *extack)
+{
+	struct otx2_qos_cfg *old_cfg, *new_cfg;
+	struct otx2_qos_node *node, *parent;
+	int qid, ret, err;
+
+	netdev_dbg(pfvf->netdev,
+		   "TC_HTB_LEAF_ALLOC_QUEUE: classid=0x%x parent_classid=0x%x rate=%lld ceil=%lld prio=%lld\n",
+		   classid, parent_classid, rate, ceil, prio);
+
+	if (prio > OTX2_QOS_MAX_PRIO) {
+		NL_SET_ERR_MSG_MOD(extack, "Valid priority range 0 to 7");
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	/* get parent node */
+	parent = otx2_sw_node_find(pfvf, parent_classid);
+	if (!parent) {
+		NL_SET_ERR_MSG_MOD(extack, "parent node not found");
+		ret = -ENOENT;
+		goto out;
+	}
+	if (parent->level == NIX_TXSCH_LVL_MDQ) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB qos max levels reached");
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	ret = otx2_qos_validate_configuration(parent, extack, pfvf, prio);
+	if (ret)
+		goto out;
+
+	set_bit(prio, parent->prio_bmap);
+
+	/* read current txschq configuration */
+	old_cfg = kzalloc(sizeof(*old_cfg), GFP_KERNEL);
+	if (!old_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		ret = -ENOMEM;
+		goto out;
+	}
+	otx2_qos_read_txschq_cfg(pfvf, parent, old_cfg);
+
+	/* allocate a new sq */
+	qid = otx2_qos_get_qid(pfvf);
+	if (qid < 0) {
+		NL_SET_ERR_MSG_MOD(extack, "Reached max supported QOS SQ's");
+		ret = -ENOMEM;
+		goto free_old_cfg;
+	}
+
+	/* Actual SQ mapping will be updated after SMQ alloc */
+	pfvf->qos.qid_to_sqmap[qid] = OTX2_QOS_INVALID_SQ;
+
+	/* allocate and initialize a new child node */
+	node = otx2_qos_sw_create_leaf_node(pfvf, parent, classid, prio, rate,
+					    ceil, qid);
+	if (IS_ERR(node)) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to allocate leaf node");
+		ret = PTR_ERR(node);
+		goto free_old_cfg;
+	}
+
+	/* push new txschq config to hw */
+	new_cfg = kzalloc(sizeof(*new_cfg), GFP_KERNEL);
+	if (!new_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		ret = -ENOMEM;
+		goto free_node;
+	}
+	ret = otx2_qos_update_tree(pfvf, node, new_cfg);
+	if (ret) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB HW configuration error");
+		kfree(new_cfg);
+		otx2_qos_sw_node_delete(pfvf, node);
+		/* restore the old qos tree */
+		err = otx2_qos_txschq_update_config(pfvf, parent, old_cfg);
+		if (err) {
+			netdev_err(pfvf->netdev,
+				   "Failed to restore txcshq configuration");
+			goto free_old_cfg;
+		}
+		err = otx2_qos_update_smq(pfvf, parent, QOS_CFG_SQ);
+		if (err)
+			netdev_err(pfvf->netdev,
+				   "Failed to restore smq configuration");
+		goto free_old_cfg;
+	}
+
+	/* update tx_real_queues */
+	otx2_qos_update_tx_netdev_queues(pfvf);
+
+	/* free new txschq config */
+	kfree(new_cfg);
+
+	/* free old txschq config */
+	otx2_qos_free_cfg(pfvf, old_cfg);
+	kfree(old_cfg);
+
+	return pfvf->hw.tx_queues + qid;
+
+free_node:
+	otx2_qos_sw_node_delete(pfvf, node);
+free_old_cfg:
+	kfree(old_cfg);
+out:
+	return ret;
+}
+
+static int otx2_qos_leaf_to_inner(struct otx2_nic *pfvf, u16 classid,
+				  u16 child_classid, u64 rate, u64 ceil, u64 prio,
+				  struct netlink_ext_ack *extack)
+{
+	struct otx2_qos_cfg *old_cfg, *new_cfg;
+	struct otx2_qos_node *node, *child;
+	int ret, err;
+	u16 qid;
+
+	netdev_dbg(pfvf->netdev,
+		   "TC_HTB_LEAF_TO_INNER classid %04x, child %04x, rate %llu, ceil %llu\n",
+		   classid, child_classid, rate, ceil);
+
+	if (prio > OTX2_QOS_MAX_PRIO) {
+		NL_SET_ERR_MSG_MOD(extack, "Valid priority range 0 to 7");
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	/* find node related to classid */
+	node = otx2_sw_node_find(pfvf, classid);
+	if (!node) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB node not found");
+		ret = -ENOENT;
+		goto out;
+	}
+	/* check max qos txschq level */
+	if (node->level == NIX_TXSCH_LVL_MDQ) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB qos level not supported");
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	set_bit(prio, node->prio_bmap);
+
+	/* store the qid to assign to leaf node */
+	qid = node->qid;
+
+	/* read current txschq configuration */
+	old_cfg = kzalloc(sizeof(*old_cfg), GFP_KERNEL);
+	if (!old_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		ret = -ENOMEM;
+		goto out;
+	}
+	otx2_qos_read_txschq_cfg(pfvf, node, old_cfg);
+
+	/* delete the txschq nodes allocated for this node */
+	otx2_qos_free_sw_node_schq(pfvf, node);
+
+	/* mark this node as htb inner node */
+	node->qid = OTX2_QOS_QID_INNER;
+
+	/* allocate and initialize a new child node */
+	child = otx2_qos_sw_create_leaf_node(pfvf, node, child_classid,
+					     prio, rate, ceil, qid);
+	if (IS_ERR(child)) {
+		NL_SET_ERR_MSG_MOD(extack, "Unable to allocate leaf node");
+		ret = PTR_ERR(child);
+		goto free_old_cfg;
+	}
+
+	/* push new txschq config to hw */
+	new_cfg = kzalloc(sizeof(*new_cfg), GFP_KERNEL);
+	if (!new_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		ret = -ENOMEM;
+		goto free_node;
+	}
+	ret = otx2_qos_update_tree(pfvf, child, new_cfg);
+	if (ret) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB HW configuration error");
+		kfree(new_cfg);
+		otx2_qos_sw_node_delete(pfvf, child);
+		/* restore the old qos tree */
+		node->qid = qid;
+		err = otx2_qos_alloc_txschq_node(pfvf, node);
+		if (err) {
+			netdev_err(pfvf->netdev,
+				   "Failed to restore old leaf node");
+			goto free_old_cfg;
+		}
+		err = otx2_qos_txschq_update_config(pfvf, node, old_cfg);
+		if (err) {
+			netdev_err(pfvf->netdev,
+				   "Failed to restore txcshq configuration");
+			goto free_old_cfg;
+		}
+		err = otx2_qos_update_smq(pfvf, node, QOS_CFG_SQ);
+		if (err)
+			netdev_err(pfvf->netdev,
+				   "Failed to restore smq configuration");
+		goto free_old_cfg;
+	}
+
+	/* free new txschq config */
+	kfree(new_cfg);
+
+	/* free old txschq config */
+	otx2_qos_free_cfg(pfvf, old_cfg);
+	kfree(old_cfg);
+
+	return 0;
+
+free_node:
+	otx2_qos_sw_node_delete(pfvf, child);
+free_old_cfg:
+	kfree(old_cfg);
+out:
+	return ret;
+}
+
+static int otx2_qos_cur_leaf_nodes(struct otx2_nic *pfvf)
+{
+	int last = find_last_bit(pfvf->qos.qos_sq_bmap, pfvf->hw.tc_tx_queues);
+
+	return last ==  pfvf->hw.tc_tx_queues ? 0 : last + 1;
+}
+
+static void otx2_reset_qdisc(struct net_device *dev, u16 qid)
+{
+	struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, qid);
+	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+	if (!qdisc)
+		return;
+
+	spin_lock_bh(qdisc_lock(qdisc));
+	qdisc_reset(qdisc);
+	spin_unlock_bh(qdisc_lock(qdisc));
+}
+
+static void otx2_cfg_smq(struct otx2_nic *pfvf, struct otx2_qos_node *node, int qid)
+{
+	struct otx2_qos_node *tmp;
+
+	list_for_each_entry(tmp, &node->child_schq_list, list)
+		if (tmp->level == NIX_TXSCH_LVL_MDQ) {
+			otx2_qos_txschq_config(pfvf, tmp);
+			pfvf->qos.qid_to_sqmap[qid] = tmp->schq;
+		}
+}
+
+static int otx2_qos_leaf_del(struct otx2_nic *pfvf, u16 *classid,
+			     struct netlink_ext_ack *extack)
+{
+	struct otx2_qos_node *node, *parent;
+	u16 qid, moved_qid;
+	u64 prio;
+
+	netdev_dbg(pfvf->netdev, "TC_HTB_LEAF_DEL classid %04x\n", *classid);
+
+	/* find node related to classid */
+	node = otx2_sw_node_find(pfvf, *classid);
+	if (!node) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB node not found");
+		return -ENOENT;
+	}
+	parent = node->parent;
+	prio   = node->prio;
+	qid    = node->qid;
+
+	otx2_qos_disable_sq(pfvf, node->qid);
+
+	otx2_qos_destroy_node(pfvf, node);
+	pfvf->qos.qid_to_sqmap[qid] = OTX2_QOS_INVALID_SQ;
+
+	clear_bit(prio, parent->prio_bmap);
+
+	moved_qid = otx2_qos_cur_leaf_nodes(pfvf);
+
+	if (moved_qid == 0 || moved_qid == qid)
+		return 0;
+
+	moved_qid--;
+
+	node = otx2_sw_node_find_by_qid(pfvf, moved_qid);
+	if (!node)
+		return 0;
+
+	node->qid =  OTX2_QOS_QID_INNER;
+	__clear_bit(moved_qid, pfvf->qos.qos_sq_bmap);
+	otx2_qos_disable_sq(pfvf, moved_qid);
+
+	otx2_reset_qdisc(pfvf->netdev, pfvf->hw.tx_queues + moved_qid);
+
+	otx2_cfg_smq(pfvf, node, qid);
+
+	otx2_qos_enable_sq(pfvf, qid);
+	__set_bit(qid, pfvf->qos.qos_sq_bmap);
+	node->qid = qid;
+
+	*classid = node->classid;
+
+	return 0;
+}
+
+static int otx2_qos_leaf_del_last(struct otx2_nic *pfvf, u16 classid, bool force,
+				  struct netlink_ext_ack *extack)
+{
+	struct otx2_qos_node *node, *parent;
+	struct otx2_qos_cfg *new_cfg;
+	u64 prio;
+	int err;
+	u16 qid;
+
+	netdev_dbg(pfvf->netdev,
+		   "TC_HTB_LEAF_DEL_LAST classid %04x\n", classid);
+
+	/* find node related to classid */
+	node = otx2_sw_node_find(pfvf, classid);
+	if (!node) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB node not found");
+		return -ENOENT;
+	}
+
+	/* save qid for use by parent */
+	qid = node->qid;
+	prio = node->prio;
+
+	parent = otx2_sw_node_find(pfvf, node->parent->classid);
+	if (!parent) {
+		NL_SET_ERR_MSG_MOD(extack, "parent node not found");
+		return -ENOENT;
+	}
+
+	/* destroy the leaf node */
+	otx2_qos_destroy_node(pfvf, node);
+	pfvf->qos.qid_to_sqmap[qid] = OTX2_QOS_INVALID_SQ;
+
+	clear_bit(prio, parent->prio_bmap);
+
+	/* create downstream txschq entries to parent */
+	otx2_qos_alloc_txschq_node(pfvf, parent);
+	parent->qid = qid;
+	__set_bit(qid, pfvf->qos.qos_sq_bmap);
+
+	/* push new txschq config to hw */
+	new_cfg = kzalloc(sizeof(*new_cfg), GFP_KERNEL);
+	if (!new_cfg) {
+		NL_SET_ERR_MSG_MOD(extack, "Memory allocation error");
+		return -ENOMEM;
+	}
+	/* fill txschq cfg and push txschq cfg to hw */
+	otx2_qos_fill_cfg_schq(parent, new_cfg);
+	err = otx2_qos_push_txschq_cfg(pfvf, parent, new_cfg);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "HTB HW configuration error");
+		kfree(new_cfg);
+		return err;
+	}
+	kfree(new_cfg);
+
+	/* update tx_real_queues */
+	otx2_qos_update_tx_netdev_queues(pfvf);
+
+	return 0;
+}
+
+int otx2_clean_qos_queues(struct otx2_nic *pfvf)
+{
+	struct otx2_qos_node *root;
+
+	root = otx2_sw_node_find(pfvf, OTX2_QOS_ROOT_CLASSID);
+	if (!root)
+		return 0;
+
+	return otx2_qos_update_smq(pfvf, root, QOS_SMQ_FLUSH);
+}
+
+void otx2_qos_config_txschq(struct otx2_nic *pfvf)
+{
+	struct otx2_qos_node *root;
+	int err;
+
+	root = otx2_sw_node_find(pfvf, OTX2_QOS_ROOT_CLASSID);
+	if (!root)
+		return;
+
+	err = otx2_qos_txschq_config(pfvf, root);
+	if (err) {
+		netdev_err(pfvf->netdev, "Error update txschq configuration\n");
+		goto root_destroy;
+	}
+
+	err = otx2_qos_txschq_push_cfg_tl(pfvf, root, NULL);
+	if (err) {
+		netdev_err(pfvf->netdev, "Error update txschq configuration\n");
+		goto root_destroy;
+	}
+
+	err = otx2_qos_update_smq(pfvf, root, QOS_CFG_SQ);
+	if (err) {
+		netdev_err(pfvf->netdev, "Error update smq configuration\n");
+		goto root_destroy;
+	}
+
+	return;
+
+root_destroy:
+	otx2_qos_root_destroy(pfvf);
+}
+
+bool otx2_is_qos_configured(struct otx2_nic *pfvf)
+{
+	struct otx2_qos_node *root;
+
+	root = otx2_sw_node_find(pfvf, OTX2_QOS_ROOT_CLASSID);
+	if (!root)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL(otx2_is_qos_configured);
+
+int otx2_setup_tc_htb(struct net_device *ndev, struct tc_htb_qopt_offload *htb)
+{
+	struct otx2_nic *pfvf = netdev_priv(ndev);
+	int res;
+
+	switch (htb->command) {
+	case TC_HTB_CREATE:
+		return otx2_qos_root_add(pfvf, htb->parent_classid,
+					 htb->classid, htb->extack);
+	case TC_HTB_DESTROY:
+		return otx2_qos_root_destroy(pfvf);
+	case TC_HTB_LEAF_ALLOC_QUEUE:
+		res = otx2_qos_leaf_alloc_queue(pfvf, htb->classid,
+						htb->parent_classid,
+						htb->rate, htb->ceil,
+						htb->prio, htb->extack);
+		if (res < 0)
+			return res;
+		htb->qid = res;
+		return 0;
+	case TC_HTB_LEAF_TO_INNER:
+		return otx2_qos_leaf_to_inner(pfvf, htb->parent_classid,
+					      htb->classid, htb->rate,
+					      htb->ceil, htb->prio,
+					      htb->extack);
+	case TC_HTB_LEAF_DEL:
+		return otx2_qos_leaf_del(pfvf, &htb->classid, htb->extack);
+	case TC_HTB_LEAF_DEL_LAST:
+	case TC_HTB_LEAF_DEL_LAST_FORCE:
+		return otx2_qos_leaf_del_last(pfvf, htb->classid,
+				htb->command == TC_HTB_LEAF_DEL_LAST_FORCE,
+					      htb->extack);
+	case TC_HTB_LEAF_QUERY_QUEUE:
+		res = otx2_get_txq_by_classid(pfvf, htb->classid);
+		htb->qid = res;
+		return 0;
+	case TC_HTB_NODE_MODIFY:
+		fallthrough;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos.h b/drivers/net/ethernet/marvell/octeontx2/nic/qos.h
index ef8c99a6b2d0..d8e32a6e541d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/qos.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos.h
@@ -7,13 +7,65 @@
 #ifndef OTX2_QOS_H
 #define OTX2_QOS_H
 
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/rhashtable.h>
+
+#define OTX2_QOS_MAX_LVL		4
+#define OTX2_QOS_MAX_PRIO		7
 #define OTX2_QOS_MAX_LEAF_NODES		16
 
-int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq);
-void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx, u16 mdq);
+enum qos_smq_operations {
+	QOS_CFG_SQ,
+	QOS_SMQ_FLUSH,
+};
+
+u64 otx2_get_txschq_rate_regval(struct otx2_nic *nic, u64 maxrate, u32 burst);
+
+int otx2_setup_tc_htb(struct net_device *ndev, struct tc_htb_qopt_offload *htb);
+int otx2_qos_get_qid(struct otx2_nic *pfvf);
+void otx2_qos_free_qid(struct otx2_nic *pfvf, int qidx);
+int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx);
+void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx);
+
+struct otx2_qos_cfg {
+	u16 schq[NIX_TXSCH_LVL_CNT];
+	u16 schq_contig[NIX_TXSCH_LVL_CNT];
+	u16 schq_list[NIX_TXSCH_LVL_CNT][MAX_TXSCHQ_PER_FUNC];
+	u16 schq_contig_list[NIX_TXSCH_LVL_CNT][MAX_TXSCHQ_PER_FUNC];
+	int static_node_pos[NIX_TXSCH_LVL_CNT];
+	int dwrr_node_pos[NIX_TXSCH_LVL_CNT];
+};
 
 struct otx2_qos {
+	DECLARE_HASHTABLE(qos_hlist, order_base_2(OTX2_QOS_MAX_LEAF_NODES));
+	DECLARE_BITMAP(qos_sq_bmap, OTX2_QOS_MAX_LEAF_NODES);
 	u16 qid_to_sqmap[OTX2_QOS_MAX_LEAF_NODES];
+	u16 maj_id;
+	u16 defcls;
+	struct list_head qos_tree;
+	struct mutex qos_lock; /* child list lock */
+	u8  link_cfg_lvl; /* LINKX_CFG CSRs mapped to TL3 or TL2's index ? */
+};
+
+struct otx2_qos_node {
+	/* htb params */
+	u32 classid;
+	u64 rate;
+	u64 ceil;
+	u32 prio;
+	/* hw txschq */
+	u8 level;
+	u16 schq;
+	u16 qid;
+	u16 prio_anchor;
+	DECLARE_BITMAP(prio_bmap, OTX2_QOS_MAX_PRIO + 1);
+	/* list management */
+	struct hlist_node hlist;
+	struct otx2_qos_node *parent;	/* parent qos node */
+	struct list_head list;
+	struct list_head child_list;
+	struct list_head child_schq_list;
 };
 
 #endif
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c b/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c
index 96e53d5d1778..66d2de8e7544 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/qos_sq.c
@@ -225,7 +225,22 @@ static int otx2_qos_ctx_disable(struct otx2_nic *pfvf, u16 qidx, int aura_id)
 	return otx2_sync_mbox_msg(&pfvf->mbox);
 }
 
-int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq)
+int otx2_qos_get_qid(struct otx2_nic *pfvf)
+{
+	int qidx;
+
+	qidx = find_first_zero_bit(pfvf->qos.qos_sq_bmap,
+				   pfvf->hw.tc_tx_queues);
+
+	return qidx == pfvf->hw.tc_tx_queues ? -ENOSPC : qidx;
+}
+
+void otx2_qos_free_qid(struct otx2_nic *pfvf, int qidx)
+{
+	clear_bit(qidx, pfvf->qos.qos_sq_bmap);
+}
+
+int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx)
 {
 	struct otx2_hw *hw = &pfvf->hw;
 	int pool_id, sq_idx, err;
@@ -241,7 +256,6 @@ int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq)
 		goto out;
 
 	pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, sq_idx);
-	pfvf->qos.qid_to_sqmap[qidx] = smq;
 	err = otx2_sq_init(pfvf, sq_idx, pool_id);
 	if (err)
 		goto out;
@@ -250,7 +264,7 @@ int otx2_qos_enable_sq(struct otx2_nic *pfvf, int qidx, u16 smq)
 	return err;
 }
 
-void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx, u16 mdq)
+void otx2_qos_disable_sq(struct otx2_nic *pfvf, int qidx)
 {
 	struct otx2_qset *qset = &pfvf->qset;
 	struct otx2_hw *hw = &pfvf->hw;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-18 10:51 ` [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO Hariprasad Kelam
@ 2023-01-18 20:37   ` Maxim Mikityanskiy
  2023-01-20  8:50     ` Hariprasad Kelam
  0 siblings, 1 reply; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-18 20:37 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet, sgoutham,
	lcherian, gakula, jerinj, sbhatta, jhs, xiyou.wangcong, jiri,
	saeedm, richardcochran, tariqt, linux-rdma

On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> All VFs and PF netdev shares same TL1 schedular, each interface
> PF or VF will have different TL2 schedulars having same parent
> TL1. The TL1 RR_PRIO value is static and PF/VFs use the same value
> to configure its TL2 node priority in case of DWRR children.
> 
> This patch adds support to configure TL1 RR_PRIO value using devlink.
> The TL1 RR_PRIO can be configured for each PF. The VFs are not allowed
> to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO value from
> the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.

I asked this question under v1, but didn't get an answer, could you shed
some light?

"Could you please elaborate how these priorities of Transmit Levels are
related to HTB priorities? I don't seem to understand why something has
to be configured with devlink in addition to HTB.

Also, what do MDQ and SMQ abbreviations mean?"

BTW, why did you remove the paragraphs with an example and a limitation?
I think they are pretty useful.

Another question unanswered under v1 was:

"Is there any technical difficulty or hardware limitation preventing from
implementing modifications?" (TC_HTB_NODE_MODIFY)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-18 20:37   ` Maxim Mikityanskiy
@ 2023-01-20  8:50     ` Hariprasad Kelam
  2023-01-20 13:37       ` Maxim Mikityanskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-20  8:50 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev


On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> All VFs and PF netdev shares same TL1 schedular, each interface PF or 
> VF will have different TL2 schedulars having same parent TL1. The TL1 
> RR_PRIO value is static and PF/VFs use the same value to configure its 
> TL2 node priority in case of DWRR children.
> 
> This patch adds support to configure TL1 RR_PRIO value using devlink.
> The TL1 RR_PRIO can be configured for each PF. The VFs are not allowed 
> to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO value from 
> the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.

I asked this question under v1, but didn't get an answer, could you shed some light?

"Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.

SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
Each send queue is mapped with SMQ.
 
As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.
This applies to non-QOS Send queues as well.
 
                       SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1

By default non QOS queues use a default hierarchy  with round robin priority. 
To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.


BTW, why did you remove the paragraphs with an example and a limitation?
I think they are pretty useful.

Ok , removed them accidentally will correct in the next version.

Another question unanswered under v1 was:

"Is there any technical difficulty or hardware limitation preventing from implementing modifications?" (TC_HTB_NODE_MODIFY)

There is no hardware limitation, we are currently implementing it.  once it's implemented we will submit for review.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-20  8:50     ` Hariprasad Kelam
@ 2023-01-20 13:37       ` Maxim Mikityanskiy
  2023-01-20 17:03         ` Hariprasad Kelam
  0 siblings, 1 reply; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-20 13:37 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev

On Fri, Jan 20, 2023 at 08:50:16AM +0000, Hariprasad Kelam wrote:
> 
> On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> > All VFs and PF netdev shares same TL1 schedular, each interface PF or 
> > VF will have different TL2 schedulars having same parent TL1. The TL1 
> > RR_PRIO value is static and PF/VFs use the same value to configure its 
> > TL2 node priority in case of DWRR children.
> > 
> > This patch adds support to configure TL1 RR_PRIO value using devlink.
> > The TL1 RR_PRIO can be configured for each PF. The VFs are not allowed 
> > to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO value from 
> > the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.
> 
> I asked this question under v1, but didn't get an answer, could you shed some light?
> 
> "Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.
> 
> SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
> Each send queue is mapped with SMQ.
>  
> As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.

Yeah, I saw that, just some details about your hardware which might be
obvious to you aren't so clear to me...

Do these transmit levels map to "layers" of HTB hierarchy? Does it look
like this, or is my understanding completely wrong?

TL1                 [HTB root node]
                   /               \
TL2          [HTB node]         [HTB node]
            /          \             |
TL3    [HTB node]  [HTB node]   [HTB node]
...                       ...


> This applies to non-QOS Send queues as well.
>  
>                        SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1
> 
> By default non QOS queues use a default hierarchy  with round robin priority. 
> To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.

So, this priority that you set with devlink is basically a weight of
unclassified (default) traffic for round robin between unclassified and
classified traffic, right? I.e. you have two hierarchies (one for HTB,
another for non-QoS queue), and you do DWRR between them, according to
this priority?

> BTW, why did you remove the paragraphs with an example and a limitation?
> I think they are pretty useful.
> 
> Ok , removed them accidentally will correct in the next version.
> 
> Another question unanswered under v1 was:
> 
> "Is there any technical difficulty or hardware limitation preventing from implementing modifications?" (TC_HTB_NODE_MODIFY)
> 
> There is no hardware limitation, we are currently implementing it.  once it's implemented we will submit for review.

Great, that's nice to hear, looking forward to it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-20 13:37       ` Maxim Mikityanskiy
@ 2023-01-20 17:03         ` Hariprasad Kelam
  2023-01-22 13:18           ` Maxim Mikityanskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-20 17:03 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev



On Fri, Jan 20, 2023 at 08:50:16AM +0000, Hariprasad Kelam wrote:
> 
> On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> > All VFs and PF netdev shares same TL1 schedular, each interface PF 
> > or VF will have different TL2 schedulars having same parent TL1. The 
> > TL1 RR_PRIO value is static and PF/VFs use the same value to 
> > configure its
> > TL2 node priority in case of DWRR children.
> > 
> > This patch adds support to configure TL1 RR_PRIO value using devlink.
> > The TL1 RR_PRIO can be configured for each PF. The VFs are not 
> > allowed to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO 
> > value from the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.
> 
> I asked this question under v1, but didn't get an answer, could you shed some light?
> 
> "Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.
> 
> SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
> Each send queue is mapped with SMQ.
>  
> As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.

Yeah, I saw that, just some details about your hardware which might be obvious to you aren't so clear to me...

Do these transmit levels map to "layers" of HTB hierarchy? Does it look like this, or is my understanding completely wrong?

TL1                 [HTB root node]
                   /               \
TL2          [HTB node]         [HTB node]
            /          \             |
TL3    [HTB node]  [HTB node]   [HTB node]
...                       ...

Transmit levels to HTB mapping is correct.



> This applies to non-QOS Send queues as well.
>  
>                        SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1
> 
> By default non QOS queues use a default hierarchy  with round robin priority. 
> To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.

So, this priority that you set with devlink is basically a weight of unclassified (default) traffic for round robin between unclassified and classified traffic, right? I.e. you have two hierarchies (one for HTB, another for non-QoS queue), and you do DWRR between them, according to this priority?


 Not exactly, In the given scenario where  multiple vfs are attached to PF netdev.
 each VF unclassified traffic forms a hierarchy and PF also forms a hierarchy for unclassified traffic.
 
Now traffic from these all tress(multiple vfs and PFs) are aggregated at TL1. HW performs DWRR among them since these TL2 queues (belonging to each pf and vf netdevs) will be configured with the same priority by the driver.

Currently, this priority is hard coded. Now we are providing this as a configurable value to the user.

Now if a user adds a HTB node, this will have strict priority at TL2 level since DWRR priority is different this traffic won't be affected by DWRR unclassified traffic.

Thanks,
Hariprasad k







> BTW, why did you remove the paragraphs with an example and a limitation?
> I think they are pretty useful.
> 
> Ok , removed them accidentally will correct in the next version.
> 
> Another question unanswered under v1 was:
> 
> "Is there any technical difficulty or hardware limitation preventing 
> from implementing modifications?" (TC_HTB_NODE_MODIFY)
> 
> There is no hardware limitation, we are currently implementing it.  once it's implemented we will submit for review.

Great, that's nice to hear, looking forward to it.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-20 17:03         ` Hariprasad Kelam
@ 2023-01-22 13:18           ` Maxim Mikityanskiy
  2023-01-23 17:03             ` Hariprasad Kelam
  0 siblings, 1 reply; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-22 13:18 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev

On Fri, Jan 20, 2023 at 05:03:20PM +0000, Hariprasad Kelam wrote:
> 
> 
> On Fri, Jan 20, 2023 at 08:50:16AM +0000, Hariprasad Kelam wrote:
> > 
> > On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> > > All VFs and PF netdev shares same TL1 schedular, each interface PF 
> > > or VF will have different TL2 schedulars having same parent TL1. The 
> > > TL1 RR_PRIO value is static and PF/VFs use the same value to 
> > > configure its
> > > TL2 node priority in case of DWRR children.
> > > 
> > > This patch adds support to configure TL1 RR_PRIO value using devlink.
> > > The TL1 RR_PRIO can be configured for each PF. The VFs are not 
> > > allowed to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO 
> > > value from the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.
> > 
> > I asked this question under v1, but didn't get an answer, could you shed some light?
> > 
> > "Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.
> > 
> > SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
> > Each send queue is mapped with SMQ.
> >  
> > As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.
> 
> Yeah, I saw that, just some details about your hardware which might be obvious to you aren't so clear to me...
> 
> Do these transmit levels map to "layers" of HTB hierarchy? Does it look like this, or is my understanding completely wrong?
> 
> TL1                 [HTB root node]
>                    /               \
> TL2          [HTB node]         [HTB node]
>             /          \             |
> TL3    [HTB node]  [HTB node]   [HTB node]
> ...                       ...
> 
> Transmit levels to HTB mapping is correct.
> 
> 
> 
> > This applies to non-QOS Send queues as well.
> >  
> >                        SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1
> > 
> > By default non QOS queues use a default hierarchy  with round robin priority. 
> > To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.
> 
> So, this priority that you set with devlink is basically a weight of unclassified (default) traffic for round robin between unclassified and classified traffic, right? I.e. you have two hierarchies (one for HTB, another for non-QoS queue), and you do DWRR between them, according to this priority?
> 
> 
>  Not exactly, In the given scenario where  multiple vfs are attached to PF netdev.
>  each VF unclassified traffic forms a hierarchy and PF also forms a hierarchy for unclassified traffic.
>  
> Now traffic from these all tress(multiple vfs and PFs) are aggregated at TL1. HW performs DWRR among them since these TL2 queues (belonging to each pf and vf netdevs) will be configured with the same priority by the driver.
> 
> Currently, this priority is hard coded. Now we are providing this as a configurable value to the user.
> 
> Now if a user adds a HTB node, this will have strict priority at TL2 level since DWRR priority is different this traffic won't be affected by DWRR unclassified traffic.

So, did I get it right now?

                                     [strict priority**]
                           /---------/                 \-----\
                           |                                 |
                        [DWRR*]                              |
        /---------------/  |   \---------------\             |
        |                  |                   |             |
[ Hierarchy for ]  [ Hierarchy for  ]  [ Hierarchy for  ]    |
[ unclassified  ]  [  unclassified  ]  [  unclassified  ]    |
[traffic from PF]  [traffic from VF1]  [traffic from VF2]    |
[      ***      ]  [      ***       ]  [      ***       ]    |
                                                             |
                                                [HTB hierarchy using]
                                                [  strict priority  ]
                                                [   between nodes   ]

As far as I understand, you set priorities at ***, which affect DWRR
balancing at *, but it's not clear to me how the selection at ** works.
Does the HTB hierarchy have some fixed priority, i.e. the user can set
priority for unclassified traffic to be higher or lower than HTB
traffic?

Please also point me at any inaccuracies in my picture, I really want to
understand the algorithm here, because configuring additional priorities
outside of HTB looks unusual to me.

> 
> Thanks,
> Hariprasad k

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 2/5] octeontx2-pf: qos send queues management
  2023-01-18 10:51 ` [net-next Patch v2 2/5] octeontx2-pf: qos send queues management Hariprasad Kelam
@ 2023-01-22 16:46   ` Simon Horman
  2023-01-23 17:26     ` Hariprasad Kelam
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2023-01-22 16:46 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet, sgoutham,
	lcherian, gakula, jerinj, sbhatta, jhs, xiyou.wangcong, jiri,
	saeedm, richardcochran, tariqt, linux-rdma, maxtram95

On Wed, Jan 18, 2023 at 04:21:04PM +0530, Hariprasad Kelam wrote:
> From: Subbaraya Sundeep <sbhatta@marvell.com>
> 
> Current implementation is such that the number of Send queues (SQs)
> are decided on the device probe which is equal to the number of online
> cpus. These SQs are allocated and deallocated in interface open and c
> lose calls respectively.
> 
> This patch defines new APIs for initializing and deinitializing Send
> queues dynamically and allocates more number of transmit queues for
> QOS feature.
> 
> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>

...

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 88f8772a61cd..0868ae825736 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -758,11 +758,16 @@ int otx2_txschq_stop(struct otx2_nic *pfvf)
>  void otx2_sqb_flush(struct otx2_nic *pfvf)
>  {
>  	int qidx, sqe_tail, sqe_head;
> +	struct otx2_snd_queue *sq;
>  	u64 incr, *ptr, val;
>  	int timeout = 1000;
>  
>  	ptr = (u64 *)otx2_get_regaddr(pfvf, NIX_LF_SQ_OP_STATUS);
> -	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues; qidx++) {
> +	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues;

nit:

It seems awkward that essentially this is saying that the
total tx queues is 'tot_tx_queues' + 'tc_tx_queues'.
As I read 'tot' as being short for 'total'.

Also, the pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues pattern
is rather verbose and repeated often. Perhaps a helper would... help.

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index c1ea60bc2630..3acda6d289d3 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c

...

> @@ -1688,11 +1693,13 @@ int otx2_open(struct net_device *netdev)
>  
>  	netif_carrier_off(netdev);
>  
> -	pf->qset.cq_cnt = pf->hw.rx_queues + pf->hw.tot_tx_queues;
>  	/* RQ and SQs are mapped to different CQs,
>  	 * so find out max CQ IRQs (i.e CINTs) needed.
>  	 */
>  	pf->hw.cint_cnt = max(pf->hw.rx_queues, pf->hw.tx_queues);
> +	pf->hw.cint_cnt = max_t(u8, pf->hw.cint_cnt, pf->hw.tc_tx_queues);

nit: maybe this is nicer? *completely untested!*

	pf->hw.cint_cnt = max3(pf->hw.rx_queues, pf->hw.tx_queues),
			       pf->hw.tc_tx_queues);

...

> @@ -735,7 +741,10 @@ static void otx2_sqe_add_hdr(struct otx2_nic *pfvf, struct otx2_snd_queue *sq,
>  		sqe_hdr->aura = sq->aura_id;
>  		/* Post a CQE Tx after pkt transmission */
>  		sqe_hdr->pnc = 1;
> -		sqe_hdr->sq = qidx;
> +		if (pfvf->hw.tx_queues == qidx)
> +			sqe_hdr->sq = qidx + pfvf->hw.xdp_queues;
> +		else
> +			sqe_hdr->sq = qidx;

nit: maybe this is nicer? *completely untested!*

		sqe_hdr = pfvf->hw.tx_queues != qidx ?
			  qidx + pfvf->hw.xdp_queues : qidx;

...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-22 13:18           ` Maxim Mikityanskiy
@ 2023-01-23 17:03             ` Hariprasad Kelam
  2023-01-23 20:05               ` Maxim Mikityanskiy
  2023-01-23 22:44               ` Jakub Kicinski
  0 siblings, 2 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-23 17:03 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli



> 
> On Fri, Jan 20, 2023 at 08:50:16AM +0000, Hariprasad Kelam wrote:
> > 
> > On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> > > All VFs and PF netdev shares same TL1 schedular, each interface PF 
> > > or VF will have different TL2 schedulars having same parent TL1. 
> > > The
> > > TL1 RR_PRIO value is static and PF/VFs use the same value to 
> > > configure its
> > > TL2 node priority in case of DWRR children.
> > > 
> > > This patch adds support to configure TL1 RR_PRIO value using devlink.
> > > The TL1 RR_PRIO can be configured for each PF. The VFs are not 
> > > allowed to configure TL1 RR_PRIO value. The VFs can get the 
> > > RR_PRIO value from the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.
> > 
> > I asked this question under v1, but didn't get an answer, could you shed some light?
> > 
> > "Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.
> > 
> > SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
> > Each send queue is mapped with SMQ.
> >  
> > As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.
> 
> Yeah, I saw that, just some details about your hardware which might be obvious to you aren't so clear to me...
> 
> Do these transmit levels map to "layers" of HTB hierarchy? Does it look like this, or is my understanding completely wrong?
> 
> TL1                 [HTB root node]
>                    /               \
> TL2          [HTB node]         [HTB node]
>             /          \             |
> TL3    [HTB node]  [HTB node]   [HTB node]
> ...                       ...
> 
> Transmit levels to HTB mapping is correct.
> 
> 
> 
> > This applies to non-QOS Send queues as well.
> >  
> >                        SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1
> > 
> > By default non QOS queues use a default hierarchy  with round robin priority. 
> > To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.
> 
> So, this priority that you set with devlink is basically a weight of unclassified (default) traffic for round robin between unclassified and classified traffic, right? I.e. you have two hierarchies (one for HTB, another for non-QoS queue), and you do DWRR between them, according to this priority?
> 
> 
>  Not exactly, In the given scenario where  multiple vfs are attached to PF netdev.
>  each VF unclassified traffic forms a hierarchy and PF also forms a hierarchy for unclassified traffic.
>  
> Now traffic from these all tress(multiple vfs and PFs) are aggregated at TL1. HW performs DWRR among them since these TL2 queues (belonging to each pf and vf netdevs) will be configured with the same priority by the driver.
> 
> Currently, this priority is hard coded. Now we are providing this as a configurable value to the user.
> 
> Now if a user adds a HTB node, this will have strict priority at TL2 level since DWRR priority is different this traffic won't be affected by DWRR unclassified traffic.

So, did I get it right now?

                                     [strict priority**]
                           /---------/                 \-----\
                           |                                 |
                        [DWRR*]                              |
        /---------------/  |   \---------------\             |
        |                  |                   |             |
[ Hierarchy for ]  [ Hierarchy for  ]  [ Hierarchy for  ]    |
[ unclassified  ]  [  unclassified  ]  [  unclassified  ]    |
[traffic from PF]  [traffic from VF1]  [traffic from VF2]    |
[      ***      ]  [      ***       ]  [      ***       ]    |
                                                             |
                                                [HTB hierarchy using]
                                                [  strict priority  ]
                                                [   between nodes   ]



        Adjusted picture

                        /--------------------------------------------------------------------------------/       Transmit level 1
                            |                                                                                      | 
                        [DWRR*] [ priority 6 ]                                      [strict priority** ]      [ priority 0 ]  Transmit level 2
        /---------------/  |   \-----------------------------------\                           |
             |                   |                                   |                                           |
[ Hierarchy for ]  [ Hierarchy for  ]  [ Hierarchy for  ]                 [ Hierarchy for  ] 
[ unclassified  ]  [  unclassified  ]  [  unclassified  ]                      [ strict priority  ]
[traffic from PF]  [traffic from VF1]  [traffic from VF2]   
[      ***      ]  [      ***       ]  [      ***       ]    
                                                


As far as I understand, you set priorities at ***, which affect DWRR balancing at *, but it's not clear to me how the selection at ** works.
Does the HTB hierarchy have some fixed priority,  ?

Hardware supports priorities from 0 to 7. lower value has high priority.
nodes having the same priority are treated as DWRR childs.

i.e. the user can set priority for unclassified traffic to be higher or lower than HTB traffic?

Yes its user configurable, unclassified traffic priority can be higher or lower than HTB traffic if a user wishes to configure it.

Please also point me at any inaccuracies in my picture, I really want to understand the algorithm here, because configuring additional priorities outside of HTB looks unusual to me.

  Please check the adjusted picture. Let us assume a user has set the priority as 6 for DWRR (unclassified traffic)  and  HTB strict priority as 0.
Once all traffic reaches  TL2,  Now hardware algorithm first pics HTB strict priority and processes DWRR later according to their priorities.

> 
> Thanks,
> Hariprasad k

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 2/5] octeontx2-pf: qos send queues management
  2023-01-22 16:46   ` Simon Horman
@ 2023-01-23 17:26     ` Hariprasad Kelam
  0 siblings, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-23 17:26 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	maxtram95, Naveen Mamindlapalli

Thanks for the inputs, will address the review comments in the next version.

Thanks,
Hariprasad k

On Wed, Jan 18, 2023 at 04:21:04PM +0530, Hariprasad Kelam wrote:
> From: Subbaraya Sundeep <sbhatta@marvell.com>
> 
> Current implementation is such that the number of Send queues (SQs) 
> are decided on the device probe which is equal to the number of online 
> cpus. These SQs are allocated and deallocated in interface open and c 
> lose calls respectively.
> 
> This patch defines new APIs for initializing and deinitializing Send 
> queues dynamically and allocates more number of transmit queues for 
> QOS feature.
> 
> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>

...

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c 
> b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 88f8772a61cd..0868ae825736 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -758,11 +758,16 @@ int otx2_txschq_stop(struct otx2_nic *pfvf)  
> void otx2_sqb_flush(struct otx2_nic *pfvf)  {
>  	int qidx, sqe_tail, sqe_head;
> +	struct otx2_snd_queue *sq;
>  	u64 incr, *ptr, val;
>  	int timeout = 1000;
>  
>  	ptr = (u64 *)otx2_get_regaddr(pfvf, NIX_LF_SQ_OP_STATUS);
> -	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues; qidx++) {
> +	for (qidx = 0; qidx < pfvf->hw.tot_tx_queues + 
> +pfvf->hw.tc_tx_queues;

nit:

It seems awkward that essentially this is saying that the total tx queues is 'tot_tx_queues' + 'tc_tx_queues'.
As I read 'tot' as being short for 'total'.

Also, the pfvf->hw.tot_tx_queues + pfvf->hw.tc_tx_queues pattern is rather verbose and repeated often. Perhaps a helper would... help.

Will add these changes in   next version.

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c 
> b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index c1ea60bc2630..3acda6d289d3 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c

...

> @@ -1688,11 +1693,13 @@ int otx2_open(struct net_device *netdev)
>  
>  	netif_carrier_off(netdev);
>  
> -	pf->qset.cq_cnt = pf->hw.rx_queues + pf->hw.tot_tx_queues;
>  	/* RQ and SQs are mapped to different CQs,
>  	 * so find out max CQ IRQs (i.e CINTs) needed.
>  	 */
>  	pf->hw.cint_cnt = max(pf->hw.rx_queues, pf->hw.tx_queues);
> +	pf->hw.cint_cnt = max_t(u8, pf->hw.cint_cnt, pf->hw.tc_tx_queues);

nit: maybe this is nicer? *completely untested!*

	pf->hw.cint_cnt = max3(pf->hw.rx_queues, pf->hw.tx_queues),
			       pf->hw.tc_tx_queues);

Will add these changes in   next version.

...

> @@ -735,7 +741,10 @@ static void otx2_sqe_add_hdr(struct otx2_nic *pfvf, struct otx2_snd_queue *sq,
>  		sqe_hdr->aura = sq->aura_id;
>  		/* Post a CQE Tx after pkt transmission */
>  		sqe_hdr->pnc = 1;
> -		sqe_hdr->sq = qidx;
> +		if (pfvf->hw.tx_queues == qidx)
> +			sqe_hdr->sq = qidx + pfvf->hw.xdp_queues;
> +		else
> +			sqe_hdr->sq = qidx;

nit: maybe this is nicer? *completely untested!*

		sqe_hdr = pfvf->hw.tx_queues != qidx ?
			  qidx + pfvf->hw.xdp_queues : qidx;
Will add these changes in   next version.
...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-23 17:03             ` Hariprasad Kelam
@ 2023-01-23 20:05               ` Maxim Mikityanskiy
  2023-01-23 22:45                 ` Jakub Kicinski
  2023-01-23 22:44               ` Jakub Kicinski
  1 sibling, 1 reply; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-23 20:05 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: netdev, linux-kernel, kuba, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli

On Mon, Jan 23, 2023 at 05:03:01PM +0000, Hariprasad Kelam wrote:
> 
> 
> > 
> > On Fri, Jan 20, 2023 at 08:50:16AM +0000, Hariprasad Kelam wrote:
> > > 
> > > On Wed, Jan 18, 2023 at 04:21:06PM +0530, Hariprasad Kelam wrote:
> > > > All VFs and PF netdev shares same TL1 schedular, each interface PF 
> > > > or VF will have different TL2 schedulars having same parent TL1. 
> > > > The
> > > > TL1 RR_PRIO value is static and PF/VFs use the same value to 
> > > > configure its
> > > > TL2 node priority in case of DWRR children.
> > > > 
> > > > This patch adds support to configure TL1 RR_PRIO value using devlink.
> > > > The TL1 RR_PRIO can be configured for each PF. The VFs are not 
> > > > allowed to configure TL1 RR_PRIO value. The VFs can get the 
> > > > RR_PRIO value from the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.
> > > 
> > > I asked this question under v1, but didn't get an answer, could you shed some light?
> > > 
> > > "Could you please elaborate how these priorities of Transmit Levels are related to HTB priorities? I don't seem to understand why something has to be configured with devlink in addition to HTB.
> > > 
> > > SMQ (send meta-descriptor queue) and MDQ (meta-descriptor queue) are the first transmit levels.
> > > Each send queue is mapped with SMQ.
> > >  
> > > As mentioned in cover letter, each egress packet needs to traverse all transmit levels starting from TL5 to TL1.
> > 
> > Yeah, I saw that, just some details about your hardware which might be obvious to you aren't so clear to me...
> > 
> > Do these transmit levels map to "layers" of HTB hierarchy? Does it look like this, or is my understanding completely wrong?
> > 
> > TL1                 [HTB root node]
> >                    /               \
> > TL2          [HTB node]         [HTB node]
> >             /          \             |
> > TL3    [HTB node]  [HTB node]   [HTB node]
> > ...                       ...
> > 
> > Transmit levels to HTB mapping is correct.
> > 
> > 
> > 
> > > This applies to non-QOS Send queues as well.
> > >  
> > >                        SMQ/MDQ --> TL4 -->TL3 -->TL2 -->TL1
> > > 
> > > By default non QOS queues use a default hierarchy  with round robin priority. 
> > > To avoid conflict with QOS tree priorities, with devlink user can choose round-robin priority before Qos tree formation.
> > 
> > So, this priority that you set with devlink is basically a weight of unclassified (default) traffic for round robin between unclassified and classified traffic, right? I.e. you have two hierarchies (one for HTB, another for non-QoS queue), and you do DWRR between them, according to this priority?
> > 
> > 
> >  Not exactly, In the given scenario where  multiple vfs are attached to PF netdev.
> >  each VF unclassified traffic forms a hierarchy and PF also forms a hierarchy for unclassified traffic.
> >  
> > Now traffic from these all tress(multiple vfs and PFs) are aggregated at TL1. HW performs DWRR among them since these TL2 queues (belonging to each pf and vf netdevs) will be configured with the same priority by the driver.
> > 
> > Currently, this priority is hard coded. Now we are providing this as a configurable value to the user.
> > 
> > Now if a user adds a HTB node, this will have strict priority at TL2 level since DWRR priority is different this traffic won't be affected by DWRR unclassified traffic.
> 
> So, did I get it right now?
> 
>                                      [strict priority**]
>                            /---------/                 \-----\
>                            |                                 |
>                         [DWRR*]                              |
>         /---------------/  |   \---------------\             |
>         |                  |                   |             |
> [ Hierarchy for ]  [ Hierarchy for  ]  [ Hierarchy for  ]    |
> [ unclassified  ]  [  unclassified  ]  [  unclassified  ]    |
> [traffic from PF]  [traffic from VF1]  [traffic from VF2]    |
> [      ***      ]  [      ***       ]  [      ***       ]    |
>                                                              |
>                                                 [HTB hierarchy using]
>                                                 [  strict priority  ]
>                                                 [   between nodes   ]
> 
> 
> 
>         Adjusted picture
> 
>                         /--------------------------------------------------------------------------------/       Transmit level 1
>                             |                                                                                      | 
>                         [DWRR*] [ priority 6 ]                                      [strict priority** ]      [ priority 0 ]  Transmit level 2
>         /---------------/  |   \-----------------------------------\                           |
>              |                   |                                   |                                           |
> [ Hierarchy for ]  [ Hierarchy for  ]  [ Hierarchy for  ]                 [ Hierarchy for  ] 
> [ unclassified  ]  [  unclassified  ]  [  unclassified  ]                      [ strict priority  ]
> [traffic from PF]  [traffic from VF1]  [traffic from VF2]   
> [      ***      ]  [      ***       ]  [      ***       ]    
>                                                 
> 
> 
> As far as I understand, you set priorities at ***, which affect DWRR balancing at *, but it's not clear to me how the selection at ** works.
> Does the HTB hierarchy have some fixed priority,  ?
> 
> Hardware supports priorities from 0 to 7. lower value has high priority.
> nodes having the same priority are treated as DWRR childs.
> 
> i.e. the user can set priority for unclassified traffic to be higher or lower than HTB traffic?
> 
> Yes its user configurable, unclassified traffic priority can be higher or lower than HTB traffic if a user wishes to configure it.
> 
> Please also point me at any inaccuracies in my picture, I really want to understand the algorithm here, because configuring additional priorities outside of HTB looks unusual to me.
> 
>   Please check the adjusted picture. Let us assume a user has set the priority as 6 for DWRR (unclassified traffic)  and  HTB strict priority as 0.
> Once all traffic reaches  TL2,  Now hardware algorithm first pics HTB strict priority and processes DWRR later according to their priorities.

OK, I seem to get it now, thanks for the explanation!

How do you set the priority for HTB, though? You mentioned this command
to set priority of unclassified traffic:

devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
cmode runtime

But what is the command to change priority for HTB?

What bothers me about using devlink to configure HTB priority is:

1. Software HTB implementation doesn't have this functionality, and it
always prioritizes unclassified traffic. As far as I understand, the
rule for tc stuff is that all features must have a reference
implementation in software.

2. Adding a flag (prefer unclassified vs prefer classified) to HTB
itself may be not straightforward, because your devlink command has a
second purpose of setting priorities between PFs/VFs, and it may
conflict with the HTB flag.

> 
> > 
> > Thanks,
> > Hariprasad k

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-23 17:03             ` Hariprasad Kelam
  2023-01-23 20:05               ` Maxim Mikityanskiy
@ 2023-01-23 22:44               ` Jakub Kicinski
  1 sibling, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2023-01-23 22:44 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: Maxim Mikityanskiy, netdev, linux-kernel, davem, pabeni,
	edumazet, Sunil Kovvuri Goutham, Linu Cherian,
	Geethasowjanya Akula, Jerin Jacob Kollanukkaran,
	Subbaraya Sundeep Bhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, tariqt, linux-rdma, hariprasad.netdev,
	Naveen Mamindlapalli

On Mon, 23 Jan 2023 17:03:01 +0000 Hariprasad Kelam wrote:
> So, did I get it right now?

Hariprasad, please fix your email setup.
It's impossible to read your messages :|
There is no marking of quotes and the responses are misaligned.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-23 20:05               ` Maxim Mikityanskiy
@ 2023-01-23 22:45                 ` Jakub Kicinski
  2023-01-23 23:01                   ` Maxim Mikityanskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2023-01-23 22:45 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Hariprasad Kelam, netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli

On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> OK, I seem to get it now, thanks for the explanation!
> 
> How do you set the priority for HTB, though? You mentioned this command
> to set priority of unclassified traffic:
> 
> devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
> cmode runtime
> 
> But what is the command to change priority for HTB?
> 
> What bothers me about using devlink to configure HTB priority is:
> 
> 1. Software HTB implementation doesn't have this functionality, and it
> always prioritizes unclassified traffic. As far as I understand, the
> rule for tc stuff is that all features must have a reference
> implementation in software.
> 
> 2. Adding a flag (prefer unclassified vs prefer classified) to HTB
> itself may be not straightforward, because your devlink command has a
> second purpose of setting priorities between PFs/VFs, and it may
> conflict with the HTB flag.

If there is a two-stage hierarchy the lower level should be controlled
by devlink-rate, no?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-23 22:45                 ` Jakub Kicinski
@ 2023-01-23 23:01                   ` Maxim Mikityanskiy
  2023-01-24 11:49                     ` Hariprasad Kelam
  0 siblings, 1 reply; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-23 23:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Hariprasad Kelam, netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli

On Mon, Jan 23, 2023 at 02:45:48PM -0800, Jakub Kicinski wrote:
> On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> > OK, I seem to get it now, thanks for the explanation!
> > 
> > How do you set the priority for HTB, though? You mentioned this command
> > to set priority of unclassified traffic:
> > 
> > devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
> > cmode runtime
> > 
> > But what is the command to change priority for HTB?
> > 
> > What bothers me about using devlink to configure HTB priority is:
> > 
> > 1. Software HTB implementation doesn't have this functionality, and it
> > always prioritizes unclassified traffic. As far as I understand, the
> > rule for tc stuff is that all features must have a reference
> > implementation in software.
> > 
> > 2. Adding a flag (prefer unclassified vs prefer classified) to HTB
> > itself may be not straightforward, because your devlink command has a
> > second purpose of setting priorities between PFs/VFs, and it may
> > conflict with the HTB flag.
> 
> If there is a two-stage hierarchy the lower level should be controlled
> by devlink-rate, no?

From the last picture by Hariprasad, I understood that the user sets all
priorities (for unclassified traffic per PF and VF, and for HTB traffic)
on the same TL2 level, i.e. it's not two-stage. (Maybe I got it all
wrong again?)

I asked about the command to change the HTB priority, cause the
parameters aren't easily guessed, but I assume it's also devlink (i.e.
driver-specific).

If there were two levels (the upper level chooses who goes first: HTB or
unclassified, and the lower level sets priorities per PF and VF for
unclassified traffic), that would be more straightforward to solve: the
upper level should be controlled by a new HTB parameter, and the lower
level is device-specific, thus it goes to devlink.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-23 23:01                   ` Maxim Mikityanskiy
@ 2023-01-24 11:49                     ` Hariprasad Kelam
  2023-01-24 13:02                       ` Maxim Mikityanskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-24 11:49 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Jakub Kicinski
  Cc: netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli



> On Mon, Jan 23, 2023 at 02:45:48PM -0800, Jakub Kicinski wrote:
> > On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> > > OK, I seem to get it now, thanks for the explanation!
> > >
> > > How do you set the priority for HTB, though? You mentioned this
> > > command to set priority of unclassified traffic:
> > >
> > > devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
> > > cmode runtime
> > >
> > > But what is the command to change priority for HTB?
> > >
> > > What bothers me about using devlink to configure HTB priority is:
> > >
> > > 1. Software HTB implementation doesn't have this functionality, and
> > > it always prioritizes unclassified traffic. As far as I understand,
> > > the rule for tc stuff is that all features must have a reference
> > > implementation in software.
> > >
> > > 2. Adding a flag (prefer unclassified vs prefer classified) to HTB
> > > itself may be not straightforward, because your devlink command has
> > > a second purpose of setting priorities between PFs/VFs, and it may
> > > conflict with the HTB flag.
> >
> > If there is a two-stage hierarchy the lower level should be controlled
> > by devlink-rate, no?
> 
> From the last picture by Hariprasad, I understood that the user sets all
> priorities (for unclassified traffic per PF and VF, and for HTB traffic) on the
> same TL2 level, i.e. it's not two-stage. (Maybe I got it all wrong again?)
> 
> I asked about the command to change the HTB priority, cause the
> parameters aren't easily guessed, but I assume it's also devlink (i.e.
> driver-specific).
> 
Currently, we don't support changing HTB priority since TC_HTB_MODIFY is not yet supported.
The driver implementation is inline with htb tc framework, below are commands we use for setting htb priority

ethtool -K eth0 hw-tc-offload on
tc qdisc replace dev eth0 root handle 1: htb offload
tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 2
tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 3


> If there were two levels (the upper level chooses who goes first: HTB or
> unclassified, and the lower level sets priorities per PF and VF for unclassified
> traffic), that would be more straightforward to solve: the upper level should
> be controlled by a new HTB parameter, and the lower level is device-specific,
> thus it goes to devlink.

The PF netdev and VFs share the same physical link and share the same TL1 node.
Hardware supports one DWRR group and the rest are strict priority nodes. Driver configures
the priority set by devlink to PF and VF traffic TL2 nodes such that traffic is forwarded
to TL1 using DWRR algorithm.

Now that if we add htb command for unclassified traffic, at any given point in time HTB
rule only applies to a single interface, since we require to set DWRR priority at TL1, 
which applies to both PF/VFs, we feel it's a different use case altogether.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-24 11:49                     ` Hariprasad Kelam
@ 2023-01-24 13:02                       ` Maxim Mikityanskiy
  2023-01-25  1:26                         ` Jakub Kicinski
  2023-01-26 17:16                         ` Hariprasad Kelam
  0 siblings, 2 replies; 23+ messages in thread
From: Maxim Mikityanskiy @ 2023-01-24 13:02 UTC (permalink / raw)
  To: Hariprasad Kelam
  Cc: Jakub Kicinski, netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli

On Tue, Jan 24, 2023 at 11:49:48AM +0000, Hariprasad Kelam wrote:
> 
> 
> > On Mon, Jan 23, 2023 at 02:45:48PM -0800, Jakub Kicinski wrote:
> > > On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> > > > OK, I seem to get it now, thanks for the explanation!
> > > >
> > > > How do you set the priority for HTB, though? You mentioned this
> > > > command to set priority of unclassified traffic:
> > > >
> > > > devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 6 \
> > > > cmode runtime
> > > >
> > > > But what is the command to change priority for HTB?
> > > >
> > > > What bothers me about using devlink to configure HTB priority is:
> > > >
> > > > 1. Software HTB implementation doesn't have this functionality, and
> > > > it always prioritizes unclassified traffic. As far as I understand,
> > > > the rule for tc stuff is that all features must have a reference
> > > > implementation in software.
> > > >
> > > > 2. Adding a flag (prefer unclassified vs prefer classified) to HTB
> > > > itself may be not straightforward, because your devlink command has
> > > > a second purpose of setting priorities between PFs/VFs, and it may
> > > > conflict with the HTB flag.
> > >
> > > If there is a two-stage hierarchy the lower level should be controlled
> > > by devlink-rate, no?
> > 
> > From the last picture by Hariprasad, I understood that the user sets all
> > priorities (for unclassified traffic per PF and VF, and for HTB traffic) on the
> > same TL2 level, i.e. it's not two-stage. (Maybe I got it all wrong again?)
> > 
> > I asked about the command to change the HTB priority, cause the
> > parameters aren't easily guessed, but I assume it's also devlink (i.e.
> > driver-specific).
> > 
> Currently, we don't support changing HTB priority since TC_HTB_MODIFY is not yet supported.
> The driver implementation is inline with htb tc framework, below are commands we use for setting htb priority
> 
> ethtool -K eth0 hw-tc-offload on
> tc qdisc replace dev eth0 root handle 1: htb offload
> tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 2
> tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 3

I thought there was a concept of a priority of the whole HTB tree in
your implementation...

So, if I run these commands:

devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 2 cmode runtime
tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 1
tc class add dev eth0 parent 1: classid 1:2 htb rate 10Gbit prio 3

Will it prioritize class 1:1 over unclassified traffic, and unclassified
traffic over class 1:2?

> > If there were two levels (the upper level chooses who goes first: HTB or
> > unclassified, and the lower level sets priorities per PF and VF for unclassified
> > traffic), that would be more straightforward to solve: the upper level should
> > be controlled by a new HTB parameter, and the lower level is device-specific,
> > thus it goes to devlink.
> 
> The PF netdev and VFs share the same physical link and share the same TL1 node.
> Hardware supports one DWRR group and the rest are strict priority nodes. Driver configures
> the priority set by devlink to PF and VF traffic TL2 nodes such that traffic is forwarded
> to TL1 using DWRR algorithm.
> 
> Now that if we add htb command for unclassified traffic, at any given point in time HTB
> rule only applies to a single interface, since we require to set DWRR priority at TL1, 
> which applies to both PF/VFs, we feel it's a different use case altogether.

Thanks, with the example above your explanation makes sense to me now.

So, basically, in your implementation, entities prioritized by hardware
are: each HTB class, each VF and PF; you want to keep user-assigned
priorities for HTB classes, you want to let the user assign a priority
for unclassified traffic, but the latter must be equal for all VFs and
PF (for DWRR to work), correct? And that devlink command is only useful
in the HTB scenario, i.e. it doesn't matter what tl1_rr_prio you set if
HTB is not used, right?

What I don't like in the current implementation is that it adds a
feature to HTB, bypassing HTB (also not providing a software equivalent
of the feature). I would expect the priority of unclassified traffic to
be configured with tc commands that set up HTB (by the way, HTB has a
"default" option to choose a class for unclassified traffic, and a
priority can be set for this class - this functionality can be leveraged
for this purpose, or a new option can be added, whatever looks more
appropriate). On the other hand, I understand your hardware limitation
requiring to have the same priority for all VFs and PF on the same port.

It's hard to suggest something good here, actually... An obvious thought
is to let the first netdev that configures HTB choose the priority for
unclassified traffic, and block attempts from other netdevs to set a
different one, but this approach also has obvious drawbacks: PF has no
special powers here, and it can't set the desired priority if PF itself
doesn't use HTB (or doesn't configure it first, before VFs).

Another idea of mine is to keep the devlink command for enforcement
purpose, and make the behavior as follows:

1. The user will pick a priority for unclassified traffic when attaching
HTB.

2. If tl1_rr_prio was set with devlink, block attempts to attach HTB
with a different default priority.

3. If tl1_rr_prio wasn't set or was reset, allow attaching HTB to PF
with any default priority. 

This way, VFs can't attach HTB with arbitrary priorities, only with the
one that the admin has enforced using devlink. At the same time, if VFs
aren't used, no extra steps are needed to just use HTB on a PF. On the
other hand, it adds some complexity and may sound controversial to
someone. Thoughts?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-24 13:02                       ` Maxim Mikityanskiy
@ 2023-01-25  1:26                         ` Jakub Kicinski
  2023-01-26 17:16                         ` Hariprasad Kelam
  1 sibling, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2023-01-25  1:26 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Hariprasad Kelam, netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli

On Tue, 24 Jan 2023 15:02:29 +0200 Maxim Mikityanskiy wrote:
> So, basically, in your implementation, entities prioritized by hardware
> are: each HTB class, each VF and PF; you want to keep user-assigned
> priorities for HTB classes, you want to let the user assign a priority
> for unclassified traffic, but the latter must be equal for all VFs and
> PF (for DWRR to work), correct? And that devlink command is only useful
> in the HTB scenario, i.e. it doesn't matter what tl1_rr_prio you set if
> HTB is not used, right?

To me HW this limited does not sound like a match for HTB offload.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-24 13:02                       ` Maxim Mikityanskiy
  2023-01-25  1:26                         ` Jakub Kicinski
@ 2023-01-26 17:16                         ` Hariprasad Kelam
  1 sibling, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-26 17:16 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Jakub Kicinski, netdev, linux-kernel, davem, pabeni, edumazet,
	Sunil Kovvuri Goutham, Linu Cherian, Geethasowjanya Akula,
	Jerin Jacob Kollanukkaran, Subbaraya Sundeep Bhatta, jhs,
	xiyou.wangcong, jiri, saeedm, richardcochran, tariqt, linux-rdma,
	hariprasad.netdev, Naveen Mamindlapalli



> On Tue, Jan 24, 2023 at 11:49:48AM +0000, Hariprasad Kelam wrote:
> >
> >
> > > On Mon, Jan 23, 2023 at 02:45:48PM -0800, Jakub Kicinski wrote:
> > > > On Mon, 23 Jan 2023 22:05:58 +0200 Maxim Mikityanskiy wrote:
> > > > > OK, I seem to get it now, thanks for the explanation!
> > > > >
> > > > > How do you set the priority for HTB, though? You mentioned this
> > > > > command to set priority of unclassified traffic:
> > > > >
> > > > > devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value
> > > > > 6 \ cmode runtime
> > > > >
> > > > > But what is the command to change priority for HTB?
> > > > >
> > > > > What bothers me about using devlink to configure HTB priority is:
> > > > >
> > > > > 1. Software HTB implementation doesn't have this functionality,
> > > > > and it always prioritizes unclassified traffic. As far as I
> > > > > understand, the rule for tc stuff is that all features must have
> > > > > a reference implementation in software.
> > > > >
> > > > > 2. Adding a flag (prefer unclassified vs prefer classified) to
> > > > > HTB itself may be not straightforward, because your devlink
> > > > > command has a second purpose of setting priorities between
> > > > > PFs/VFs, and it may conflict with the HTB flag.
> > > >
> > > > If there is a two-stage hierarchy the lower level should be
> > > > controlled by devlink-rate, no?
> > >
> > > From the last picture by Hariprasad, I understood that the user sets
> > > all priorities (for unclassified traffic per PF and VF, and for HTB
> > > traffic) on the same TL2 level, i.e. it's not two-stage. (Maybe I
> > > got it all wrong again?)
> > >
> > > I asked about the command to change the HTB priority, cause the
> > > parameters aren't easily guessed, but I assume it's also devlink (i.e.
> > > driver-specific).
> > >
> > Currently, we don't support changing HTB priority since TC_HTB_MODIFY is
> not yet supported.
> > The driver implementation is inline with htb tc framework, below are
> > commands we use for setting htb priority
> >
> > ethtool -K eth0 hw-tc-offload on
> > tc qdisc replace dev eth0 root handle 1: htb offload tc class add dev
> > eth0 parent 1: classid 1:1 htb rate 10Gbit prio 2 tc class add dev
> > eth0 parent 1: classid 1:1 htb rate 10Gbit prio 3
> 
> I thought there was a concept of a priority of the whole HTB tree in your
> implementation...
> 
> So, if I run these commands:
> 
> devlink -p dev param set pci/0002:04:00.0 name tl1_rr_prio value 2 cmode
> runtime tc class add dev eth0 parent 1: classid 1:1 htb rate 10Gbit prio 1 tc
> class add dev eth0 parent 1: classid 1:2 htb rate 10Gbit prio 3
> 
> Will it prioritize class 1:1 over unclassified traffic, and unclassified traffic over
> class 1:2?
> 
> > > If there were two levels (the upper level chooses who goes first:
> > > HTB or unclassified, and the lower level sets priorities per PF and
> > > VF for unclassified traffic), that would be more straightforward to
> > > solve: the upper level should be controlled by a new HTB parameter,
> > > and the lower level is device-specific, thus it goes to devlink.
> >
> > The PF netdev and VFs share the same physical link and share the same TL1
> node.
> > Hardware supports one DWRR group and the rest are strict priority
> > nodes. Driver configures the priority set by devlink to PF and VF
> > traffic TL2 nodes such that traffic is forwarded to TL1 using DWRR algorithm.
> >
> > Now that if we add htb command for unclassified traffic, at any given
> > point in time HTB rule only applies to a single interface, since we
> > require to set DWRR priority at TL1, which applies to both PF/VFs, we feel
> it's a different use case altogether.
> 
> Thanks, with the example above your explanation makes sense to me now.
> 
> So, basically, in your implementation, entities prioritized by hardware
> are: each HTB class, each VF and PF; you want to keep user-assigned
> priorities for HTB classes, you want to let the user assign a priority for
> unclassified traffic, but the latter must be equal for all VFs and PF (for DWRR
> to work), correct? And that devlink command is only useful in the HTB
> scenario, i.e. it doesn't matter what tl1_rr_prio you set if HTB is not used,
> right?
> 
> What I don't like in the current implementation is that it adds a feature to
> HTB, bypassing HTB (also not providing a software equivalent of the feature).
> I would expect the priority of unclassified traffic to be configured with tc
> commands that set up HTB (by the way, HTB has a "default" option to choose
> a class for unclassified traffic, and a priority can be set for this class - this
> functionality can be leveraged for this purpose, or a new option can be
> added, whatever looks more appropriate). On the other hand, I understand
> your hardware limitation requiring to have the same priority for all VFs and PF
> on the same port.

Thanks for pointing out the "default" option. we will explore this option if we can fit this into our current implementation or not and get back.
Meanwhile,   there are comments on other patches in this series and as we are in line with htb offload without the devlink option we will submit
 a new version of patches addressing the comments excluding this patch.

Thanks,
Hariprasad k




> 
> It's hard to suggest something good here, actually... An obvious thought is to
> let the first netdev that configures HTB choose the priority for unclassified
> traffic, and block attempts from other netdevs to set a different one, but this
> approach also has obvious drawbacks: PF has no special powers here, and it
> can't set the desired priority if PF itself doesn't use HTB (or doesn't configure
> it first, before VFs).
> 
> Another idea of mine is to keep the devlink command for enforcement
> purpose, and make the behavior as follows:
> 
> 1. The user will pick a priority for unclassified traffic when attaching HTB.
> 
> 2. If tl1_rr_prio was set with devlink, block attempts to attach HTB with a
> different default priority.
> 
> 3. If tl1_rr_prio wasn't set or was reset, allow attaching HTB to PF with any
> default priority.
> 
> This way, VFs can't attach HTB with arbitrary priorities, only with the one that
> the admin has enforced using devlink. At the same time, if VFs aren't used,
> no extra steps are needed to just use HTB on a PF. On the other hand, it adds
> some complexity and may sound controversial to someone. Thoughts?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO
  2023-01-18 10:04 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
@ 2023-01-18 10:04 ` Hariprasad Kelam
  0 siblings, 0 replies; 23+ messages in thread
From: Hariprasad Kelam @ 2023-01-18 10:04 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: kuba, davem, pabeni, edumazet, sgoutham, lcherian, gakula,
	jerinj, sbhatta, jhs, xiyou.wangcong, jiri, saeedm,
	richardcochran, maximmi, tariqt, moshet, linux-rdma, maxtram95

All VFs and PF netdev shares same TL1 schedular, each interface
PF or VF will have different TL2 schedulars having same parent
TL1. The TL1 RR_PRIO value is static and PF/VFs use the same value
to configure its TL2 node priority in case of DWRR children.

This patch adds support to configure TL1 RR_PRIO value using devlink.
The TL1 RR_PRIO can be configured for each PF. The VFs are not allowed
to configure TL1 RR_PRIO value. The VFs can get the RR_PRIO value from
the mailbox NIX_TXSCH_ALLOC response parameter aggr_lvl_rr_prio.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/common.h    |  2 +-
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  9 ++-
 .../net/ethernet/marvell/octeontx2/af/rvu.c   | 15 ++++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  1 +
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   | 39 +++++++++-
 .../marvell/octeontx2/nic/otx2_common.c       |  9 ++-
 .../marvell/octeontx2/nic/otx2_common.h       |  1 +
 .../marvell/octeontx2/nic/otx2_devlink.c      | 78 +++++++++++++++++++
 8 files changed, 147 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/common.h b/drivers/net/ethernet/marvell/octeontx2/af/common.h
index 8931864ee110..f5bf719a6ccf 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/common.h
@@ -142,7 +142,7 @@ enum nix_scheduler {
 
 #define TXSCH_RR_QTM_MAX		((1 << 24) - 1)
 #define TXSCH_TL1_DFLT_RR_QTM		TXSCH_RR_QTM_MAX
-#define TXSCH_TL1_DFLT_RR_PRIO		(0x1ull)
+#define TXSCH_TL1_DFLT_RR_PRIO		(0x7ull)
 #define CN10K_MAX_DWRR_WEIGHT          16384 /* Weight is 14bit on CN10K */
 
 /* Min/Max packet sizes, excluding FCS */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 94e743b5b545..6f0199766d5a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -84,7 +84,7 @@ struct mbox_msghdr {
 #define OTX2_MBOX_REQ_SIG (0xdead)
 #define OTX2_MBOX_RSP_SIG (0xbeef)
 	u16 sig;         /* Signature, for validating corrupted msgs */
-#define OTX2_MBOX_VERSION (0x000a)
+#define OTX2_MBOX_VERSION (0x000b)
 	u16 ver;         /* Version of msg's structure for this ID */
 	u16 next_msgoff; /* Offset of next msg within mailbox region */
 	int rc;          /* Msg process'ed response code */
@@ -299,6 +299,7 @@ M(NIX_BANDPROF_GET_HWINFO, 0x801f, nix_bandprof_get_hwinfo, msg_req,		\
 				nix_bandprof_get_hwinfo_rsp)		    \
 M(NIX_READ_INLINE_IPSEC_CFG, 0x8023, nix_read_inline_ipsec_cfg,		\
 				msg_req, nix_inline_ipsec_cfg)		\
+M(NIX_TL1_RR_PRIO,	0x8025, nix_tl1_rr_prio, nix_tl1_rr_prio_req, msg_rsp) \
 /* MCS mbox IDs (range 0xA000 - 0xBFFF) */					\
 M(MCS_ALLOC_RESOURCES,	0xa000, mcs_alloc_resources, mcs_alloc_rsrc_req,	\
 				mcs_alloc_rsrc_rsp)				\
@@ -825,6 +826,7 @@ enum nix_af_status {
 	NIX_AF_ERR_CQ_CTX_WRITE_ERR  = -429,
 	NIX_AF_ERR_AQ_CTX_RETRY_WRITE  = -430,
 	NIX_AF_ERR_LINK_CREDITS  = -431,
+	NIX_AF_ERR_TL1_RR_PRIO_PERM_DENIED  = -433,
 };
 
 /* For NIX RX vtag action  */
@@ -1269,6 +1271,11 @@ struct nix_bandprof_get_hwinfo_rsp {
 	u32 policer_timeunit;
 };
 
+struct nix_tl1_rr_prio_req {
+	struct mbox_msghdr hdr;
+	u8 tl1_rr_prio;
+};
+
 /* NPC mbox message structs */
 
 #define NPC_MCAM_ENTRY_INVALID	0xFFFF
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 3f5e09b77d4b..7d2230359bf0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -808,6 +808,18 @@ static void rvu_setup_pfvf_macaddress(struct rvu *rvu)
 	}
 }
 
+static void rvu_setup_pfvf_aggr_lvl_rr_prio(struct rvu *rvu)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	struct rvu_pfvf *pfvf;
+	int pf;
+
+	for (pf = 0; pf < hw->total_pfs; pf++) {
+		pfvf = &rvu->pf[pf];
+		pfvf->tl1_rr_prio = TXSCH_TL1_DFLT_RR_PRIO;
+	}
+}
+
 static int rvu_fwdata_init(struct rvu *rvu)
 {
 	u64 fwdbase;
@@ -1136,6 +1148,9 @@ static int rvu_setup_hw_resources(struct rvu *rvu)
 	/* Assign MACs for CGX mapped functions */
 	rvu_setup_pfvf_macaddress(rvu);
 
+	/* Assign aggr level RR Priority */
+	rvu_setup_pfvf_aggr_lvl_rr_prio(rvu);
+
 	err = rvu_npa_init(rvu);
 	if (err) {
 		dev_err(rvu->dev, "%s: Failed to initialize npa\n", __func__);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 7f0a64731c67..eec890f98803 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -254,6 +254,7 @@ struct rvu_pfvf {
 	u64     lmt_map_ent_w1; /* Preseving the word1 of lmtst map table entry*/
 	unsigned long flags;
 	struct  sdp_node_info *sdp_info;
+	u8	tl1_rr_prio; /* RR PRIORITY set by PF */
 };
 
 enum rvu_pfvf_flags {
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index c11859999074..ef9f62aa6e2b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -2001,6 +2001,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 {
 	struct rvu_hwinfo *hw = rvu->hw;
 	u16 pcifunc = req->hdr.pcifunc;
+	struct rvu_pfvf *parent_pf;
 	int link, blkaddr, rc = 0;
 	int lvl, idx, start, end;
 	struct nix_txsch *txsch;
@@ -2017,6 +2018,8 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 	if (!nix_hw)
 		return NIX_AF_ERR_INVALID_NIXBLK;
 
+	parent_pf = &rvu->pf[rvu_get_pf(pcifunc)];
+
 	mutex_lock(&rvu->rsrc_lock);
 
 	/* Check if request is valid as per HW capabilities
@@ -2076,7 +2079,7 @@ int rvu_mbox_handler_nix_txsch_alloc(struct rvu *rvu,
 	}
 
 	rsp->aggr_level = hw->cap.nix_tx_aggr_lvl;
-	rsp->aggr_lvl_rr_prio = TXSCH_TL1_DFLT_RR_PRIO;
+	rsp->aggr_lvl_rr_prio = parent_pf->tl1_rr_prio;
 	rsp->link_cfg_lvl = rvu_read64(rvu, blkaddr,
 				       NIX_AF_PSE_CHANNEL_LEVEL) & 0x01 ?
 				       NIX_TXSCH_LVL_TL3 : NIX_TXSCH_LVL_TL2;
@@ -2375,7 +2378,9 @@ static bool is_txschq_shaping_valid(struct rvu_hwinfo *hw, int lvl, u64 reg)
 static void nix_tl1_default_cfg(struct rvu *rvu, struct nix_hw *nix_hw,
 				u16 pcifunc, int blkaddr)
 {
+	struct rvu_pfvf *parent_pf = &rvu->pf[rvu_get_pf(pcifunc)];
 	u32 *pfvf_map;
+
 	int schq;
 
 	schq = nix_get_tx_link(rvu, pcifunc);
@@ -2384,7 +2389,7 @@ static void nix_tl1_default_cfg(struct rvu *rvu, struct nix_hw *nix_hw,
 	if (TXSCH_MAP_FLAGS(pfvf_map[schq]) & NIX_TXSCHQ_CFG_DONE)
 		return;
 	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq),
-		    (TXSCH_TL1_DFLT_RR_PRIO << 1));
+		    (parent_pf->tl1_rr_prio << 1));
 
 	/* On OcteonTx2 the config was in bytes and newer silcons
 	 * it's changed to weight.
@@ -5548,3 +5553,33 @@ int rvu_mbox_handler_nix_bandprof_get_hwinfo(struct rvu *rvu, struct msg_req *re
 
 	return 0;
 }
+
+int rvu_mbox_handler_nix_tl1_rr_prio(struct rvu *rvu,
+				     struct nix_tl1_rr_prio_req *req,
+				     struct msg_rsp *rsp)
+{
+	u16 pcifunc = req->hdr.pcifunc;
+	int blkaddr, nixlf, schq, err;
+	struct rvu_pfvf *pfvf;
+	u16 regval;
+
+	err = nix_get_nixlf(rvu, pcifunc, &nixlf, &blkaddr);
+	if (err)
+		return err;
+
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+	/* Only PF is allowed */
+	if (is_vf(pcifunc))
+		return NIX_AF_ERR_TL1_RR_PRIO_PERM_DENIED;
+
+	pfvf->tl1_rr_prio = req->tl1_rr_prio;
+
+	/* update TL1 topology */
+	schq = nix_get_tx_link(rvu, pcifunc);
+	regval = rvu_read64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq));
+	regval &= ~GENMASK_ULL(4, 1);
+	regval |= pfvf->tl1_rr_prio << 1;
+	rvu_write64(rvu, blkaddr, NIX_AF_TL1X_TOPOLOGY(schq), regval);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a4c18565f9f4..96132cfe2141 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -591,7 +591,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 	u16 (*schq_list)[MAX_TXSCHQ_PER_FUNC];
 	struct otx2_hw *hw = &pfvf->hw;
 	struct nix_txschq_config *req;
-	u64 schq, parent;
+	u16 schq, parent;
 	u64 dwrr_val;
 
 	dwrr_val = mtu_to_dwrr_weight(pfvf, pfvf->tx_max_pktlen);
@@ -654,7 +654,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 
 		req->num_regs++;
 		req->reg[1] = NIX_AF_TL2X_SCHEDULE(schq);
-		req->regval[1] = TXSCH_TL1_DFLT_RR_PRIO << 24 | dwrr_val;
+		req->regval[1] = hw->txschq_aggr_lvl_rr_prio << 24 | dwrr_val;
 
 		if (lvl == hw->txschq_link_cfg_lvl) {
 			req->num_regs++;
@@ -678,7 +678,7 @@ int otx2_txschq_config(struct otx2_nic *pfvf, int lvl, int prio, bool txschq_for
 
 		req->num_regs++;
 		req->reg[1] = NIX_AF_TL1X_TOPOLOGY(schq);
-		req->regval[1] = (TXSCH_TL1_DFLT_RR_PRIO << 1);
+		req->regval[1] = hw->txschq_aggr_lvl_rr_prio << 1;
 
 		req->num_regs++;
 		req->reg[2] = NIX_AF_TL1X_CIR(schq);
@@ -742,6 +742,9 @@ int otx2_txsch_alloc(struct otx2_nic *pfvf)
 			pfvf->hw.txschq_list[lvl][schq] =
 				rsp->schq_list[lvl][schq];
 
+	pfvf->hw.txschq_link_cfg_lvl     = rsp->link_cfg_lvl;
+	pfvf->hw.txschq_aggr_lvl_rr_prio = rsp->aggr_lvl_rr_prio;
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index b142cc685a28..3e09f52f2dc6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -208,6 +208,7 @@ struct otx2_hw {
 
 	/* NIX */
 	u8			txschq_link_cfg_lvl;
+	u8			txschq_aggr_lvl_rr_prio;
 	u16			txschq_list[NIX_TXSCH_LVL_CNT][MAX_TXSCHQ_PER_FUNC];
 	u16			matchall_ipolicer;
 	u32			dwrr_mtu;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
index 63ef7c41d18d..b6bb6945f08a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_devlink.c
@@ -64,9 +64,82 @@ static int otx2_dl_mcam_count_get(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int otx2_dl_tl1_rr_prio_validate(struct devlink *devlink, u32 id,
+					union devlink_param_value val,
+					struct netlink_ext_ack *extack)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pf = otx2_dl->pfvf;
+
+	if (is_otx2_vf(pf->pcifunc)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TL1RR PRIORITY setting not allowed on VFs");
+		return -EOPNOTSUPP;
+	}
+
+	if (pci_num_vf(pf->pdev)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "TL1RR PRIORITY setting not allowed as VFs are already attached");
+		return -EOPNOTSUPP;
+	}
+
+	if (val.vu8 > 7) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Valid priority range 0 - 7");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int otx2_dl_tl1_rr_prio_get(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+
+	ctx->val.vu8 = pfvf->hw.txschq_aggr_lvl_rr_prio;
+
+	return 0;
+}
+
+static int otx2_dl_tl1_rr_prio_set(struct devlink *devlink, u32 id,
+				   struct devlink_param_gset_ctx *ctx)
+{
+	struct otx2_devlink *otx2_dl = devlink_priv(devlink);
+	struct otx2_nic *pfvf = otx2_dl->pfvf;
+	struct nix_tl1_rr_prio_req *req;
+	bool if_up;
+	int err;
+
+	if_up = netif_running(pfvf->netdev);
+
+	/* send mailbox to AF */
+	mutex_lock(&pfvf->mbox.lock);
+
+	req = otx2_mbox_alloc_msg_nix_tl1_rr_prio(&pfvf->mbox);
+	if (!req) {
+		mutex_unlock(&pfvf->mbox.lock);
+		return -ENOMEM;
+	}
+
+	req->tl1_rr_prio = ctx->val.vu8;
+	err = otx2_sync_mbox_msg(&pfvf->mbox);
+	mutex_unlock(&pfvf->mbox.lock);
+
+	/* Reconfigure TL1/TL2 DWRR PRIORITY */
+	if (!err && if_up) {
+		otx2_stop(pfvf->netdev);
+		otx2_open(pfvf->netdev);
+	}
+
+	return err;
+}
+
 enum otx2_dl_param_id {
 	OTX2_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	OTX2_DEVLINK_PARAM_ID_MCAM_COUNT,
+	OTX2_DEVLINK_PARAM_ID_TL1_RR_PRIO,
 };
 
 static const struct devlink_param otx2_dl_params[] = {
@@ -75,6 +148,11 @@ static const struct devlink_param otx2_dl_params[] = {
 			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
 			     otx2_dl_mcam_count_get, otx2_dl_mcam_count_set,
 			     otx2_dl_mcam_count_validate),
+	DEVLINK_PARAM_DRIVER(OTX2_DEVLINK_PARAM_ID_TL1_RR_PRIO,
+			     "tl1_rr_prio", DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     otx2_dl_tl1_rr_prio_get, otx2_dl_tl1_rr_prio_set,
+			     otx2_dl_tl1_rr_prio_validate),
 };
 
 static const struct devlink_ops otx2_devlink_ops = {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-01-26 17:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 10:51 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
2023-01-18 10:51 ` [net-next Patch v2 1/5] sch_htb: Allow HTB priority parameter in offload mode Hariprasad Kelam
2023-01-18 10:51 ` [net-next Patch v2 2/5] octeontx2-pf: qos send queues management Hariprasad Kelam
2023-01-22 16:46   ` Simon Horman
2023-01-23 17:26     ` Hariprasad Kelam
2023-01-18 10:51 ` [net-next Patch v2 3/5] octeontx2-pf: Refactor schedular queue alloc/free calls Hariprasad Kelam
2023-01-18 10:51 ` [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO Hariprasad Kelam
2023-01-18 20:37   ` Maxim Mikityanskiy
2023-01-20  8:50     ` Hariprasad Kelam
2023-01-20 13:37       ` Maxim Mikityanskiy
2023-01-20 17:03         ` Hariprasad Kelam
2023-01-22 13:18           ` Maxim Mikityanskiy
2023-01-23 17:03             ` Hariprasad Kelam
2023-01-23 20:05               ` Maxim Mikityanskiy
2023-01-23 22:45                 ` Jakub Kicinski
2023-01-23 23:01                   ` Maxim Mikityanskiy
2023-01-24 11:49                     ` Hariprasad Kelam
2023-01-24 13:02                       ` Maxim Mikityanskiy
2023-01-25  1:26                         ` Jakub Kicinski
2023-01-26 17:16                         ` Hariprasad Kelam
2023-01-23 22:44               ` Jakub Kicinski
2023-01-18 10:51 ` [net-next Patch v2 5/5] octeontx2-pf: Add support for HTB offload Hariprasad Kelam
  -- strict thread matches above, loose matches on Subject: below --
2023-01-18 10:04 [net-next Patch v2 0/5] octeontx2-pf: HTB offload support Hariprasad Kelam
2023-01-18 10:04 ` [net-next Patch v2 4/5] octeontx2-pf: Add devlink support to configure TL1 RR_PRIO Hariprasad Kelam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.