All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-11 10:18 ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

The following series introduces a new hardware offload mode in
tc/mqprio where the TCs, the queue configurations and
bandwidth rate limits are offloaded to the hardware. The existing
mqprio framework is extended to configure the queue counts and
layout and also added support for rate limiting. This is achieved
by setting the value 2 for the "hw" mode. Legacy devices can fall
back to the existing setup supporting hw mode 1 where only the
TCs are offloaded, however if hw mode 2 is specified, then this
type of offload fails to initialize if the requested mode is not
supported. The i40e driver enables the new mqprio hardware offload
mechanism factoring the TCs, queue configuration and bandwidth
rates by creating HW channel VSIs.

In this new mode, the priority to traffic class mapping and the
user specified queue ranges are used to configure the traffic
class when the 'hw' option is set to 2. This is achieved by
creating HW channels(VSI). A new channel is created for each
of the traffic class configuration offloaded via mqprio
framework except for the first TC (TC0) which is for the main
VSI. TC0 for the main VSI is also reconfigured as per user
provided queue parameters. Finally, bandwidth rate limits are
set on these traffic classes through the mqprio offload
framework by sending these rates in addition to the number of
TCs and the queue configurations.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
  qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
               queues:(0:3) (4:7)
               min rates:0bit 0bit
               max rates:55Mbit 60Mbit
---

Amritha Nambiar (6):
      [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
      [next-queue]net: i40e: Add macro for PF reset bit
      [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
      [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
      [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
      [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio


 drivers/net/ethernet/intel/i40e/i40e.h             |   45 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |    3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    8 
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 1478 +++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    2 
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   50 -
 include/linux/netdevice.h                          |    2 
 include/net/pkt_cls.h                              |    7 
 include/uapi/linux/pkt_sched.h                     |   13 
 net/sched/sch_mqprio.c                             |  170 ++
 10 files changed, 1526 insertions(+), 252 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-11 10:18 ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

The following series introduces a new hardware offload mode in
tc/mqprio where the TCs, the queue configurations and
bandwidth rate limits are offloaded to the hardware. The existing
mqprio framework is extended to configure the queue counts and
layout and also added support for rate limiting. This is achieved
by setting the value 2 for the "hw" mode. Legacy devices can fall
back to the existing setup supporting hw mode 1 where only the
TCs are offloaded, however if hw mode 2 is specified, then this
type of offload fails to initialize if the requested mode is not
supported. The i40e driver enables the new mqprio hardware offload
mechanism factoring the TCs, queue configuration and bandwidth
rates by creating HW channel VSIs.

In this new mode, the priority to traffic class mapping and the
user specified queue ranges are used to configure the traffic
class when the 'hw' option is set to 2. This is achieved by
creating HW channels(VSI). A new channel is created for each
of the traffic class configuration offloaded via mqprio
framework except for the first TC (TC0) which is for the main
VSI. TC0 for the main VSI is also reconfigured as per user
provided queue parameters. Finally, bandwidth rate limits are
set on these traffic classes through the mqprio offload
framework by sending these rates in addition to the number of
TCs and the queue configurations.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
  qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
               queues:(0:3) (4:7)
               min rates:0bit 0bit
               max rates:55Mbit 60Mbit
---

Amritha Nambiar (6):
      [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
      [next-queue]net: i40e: Add macro for PF reset bit
      [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
      [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
      [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
      [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio


 drivers/net/ethernet/intel/i40e/i40e.h             |   45 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |    3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    8 
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 1478 +++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    2 
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   50 -
 include/linux/netdevice.h                          |    2 
 include/net/pkt_cls.h                              |    7 
 include/uapi/linux/pkt_sched.h                     |   13 
 net/sched/sch_mqprio.c                             |  170 ++
 10 files changed, 1526 insertions(+), 252 deletions(-)

--

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:18   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

This patch introduces a new hardware offload mode in mqprio
which makes full use of the mqprio options, the TCs, the
queue configurations and the bandwidth rates for the TCs.
This is achieved by setting the value 2 for the "hw" option.
This new offload mode supports new attributes for traffic
class such as minimum and maximum values for bandwidth rate limits.

Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
mqprio queue options and use this to be shared between the kernel and
device driver. This contains a copy of the exisiting datastructure
for mqprio queue options. This new datastructure can be extended when
adding new attributes for traffic class such as bandwidth rate limits. The
existing datastructure for mqprio queue options will be shared between the
kernel and userspace.

This patch enables configuring additional attributes associated
with a traffic class such as minimum and maximum bandwidth
rates and can be offloaded to the hardware in the new offload mode.
The min and max limits for bandwidth rates are provided
by the user along with the the TCs and the queue configurations
when creating the mqprio qdisc.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h      |    2 
 include/net/pkt_cls.h          |    7 ++
 include/uapi/linux/pkt_sched.h |   13 +++
 net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
 4 files changed, 181 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e48ee2e..12c6c3f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,7 @@ enum {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_MATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_MQPRIO_EXT,
 };
 
 struct tc_cls_u32_offload;
@@ -791,6 +792,7 @@ struct tc_to_netdev {
 		struct tc_cls_matchall_offload *cls_mall;
 		struct tc_cls_bpf_offload *cls_bpf;
 		struct tc_mqprio_qopt *mqprio;
+		struct tc_mqprio_qopt_offload *mqprio_qopt;
 	};
 	bool egress_dev;
 };
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 537d0a0..9facda2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
 	u32 gen_flags;
 };
 
+struct tc_mqprio_qopt_offload {
+	/* struct tc_mqprio_qopt must always be the first element */
+	struct tc_mqprio_qopt qopt;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE];
+	u64 max_rate[TC_QOPT_MAX_QUEUE];
+};
 
 /* This structure holds cookie structure that is passed from user
  * to the kernel for actions and classifiers
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..cf2a146 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -620,6 +620,7 @@ struct tc_drr_stats {
 enum {
 	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
 	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
+	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
 	__TC_MQPRIO_HW_OFFLOAD_MAX
 };
 
@@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+#define TC_MQPRIO_F_MIN_RATE  0x1
+#define TC_MQPRIO_F_MAX_RATE  0x2
+
+enum {
+	TCA_MQPRIO_UNSPEC,
+	TCA_MQPRIO_MIN_RATE64,
+	TCA_MQPRIO_MAX_RATE64,
+	__TCA_MQPRIO_MAX,
+};
+
+#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
+
 /* SFB */
 
 enum {
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index e0c0272..6524d12 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -18,10 +18,13 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/sch_generic.h>
+#include <net/pkt_cls.h>
 
 struct mqprio_sched {
 	struct Qdisc		**qdiscs;
 	int hw_offload;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
 };
 
 static void mqprio_destroy(struct Qdisc *sch)
@@ -39,9 +42,21 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
-		struct tc_mqprio_qopt offload = { 0 };
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = { { 0 } };
+		struct tc_to_netdev tc = { 0 };
+
+		switch (priv->hw_offload) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+			break;
+		default:
+			return;
+		}
 
 		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, 0, &tc);
 	} else {
@@ -99,6 +114,24 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 	return 0;
 }
 
+static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
+	[TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED },
+	[TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED },
+};
+
+static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
+		      const struct nla_policy *policy, int len)
+{
+	int nested_len = nla_len(nla) - NLA_ALIGN(len);
+
+	if (nested_len >= nla_attr_size(0))
+		return nla_parse(tb, maxtype, nla_data(nla) + NLA_ALIGN(len),
+				 nested_len, policy, NULL);
+
+	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
+	return 0;
+}
+
 static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct net_device *dev = qdisc_dev(sch);
@@ -107,6 +140,10 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	struct Qdisc *qdisc;
 	int i, err = -EOPNOTSUPP;
 	struct tc_mqprio_qopt *qopt = NULL;
+	struct nlattr *tb[TCA_MQPRIO_MAX + 1];
+	struct nlattr *attr;
+	int rem;
+	int len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
 
 	BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE);
 	BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK);
@@ -124,6 +161,51 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	if (mqprio_parse_opt(dev, qopt))
 		return -EINVAL;
 
+	if (len > 0) {
+		err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
+				 sizeof(*qopt));
+		if (err < 0)
+			return err;
+
+		if (tb[TCA_MQPRIO_MIN_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->min_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MIN_RATE;
+		}
+
+		if (tb[TCA_MQPRIO_MAX_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->max_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MAX_RATE;
+		}
+	}
+
 	/* pre-allocate qdisc, attachment can't fail */
 	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
 			       GFP_KERNEL);
@@ -148,16 +230,37 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
-		struct tc_mqprio_qopt offload = *qopt;
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = {.qopt = *qopt};
+		struct tc_to_netdev tc = { 0 };
+
+		switch (qopt->hw) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+
+			offload.flags = priv->flags;
+			if (priv->flags & TC_MQPRIO_F_MIN_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.min_rate[i] = priv->min_rate[i];
+
+			if (priv->flags & TC_MQPRIO_F_MAX_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.max_rate[i] = priv->max_rate[i];
+			break;
+		default:
+			return -EINVAL;
+		}
 
 		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
 						    0, 0, &tc);
 		if (err)
 			return err;
 
-		priv->hw_offload = offload.hw;
+		priv->hw_offload = offload.qopt.hw;
 	} else {
 		netdev_set_num_tc(dev, qopt->num_tc);
 		for (i = 0; i < qopt->num_tc; i++)
@@ -227,11 +330,51 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 	return 0;
 }
 
+static int dump_rates(struct mqprio_sched *priv,
+		      struct tc_mqprio_qopt *opt, struct sk_buff *skb)
+{
+	struct nlattr *nest;
+	int i;
+
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MIN_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MIN_RATE64,
+				    sizeof(priv->min_rate[i]),
+				    &priv->min_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+
+	if (priv->flags & TC_MQPRIO_F_MAX_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MAX_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MAX_RATE64,
+				    sizeof(priv->max_rate[i]),
+				    &priv->max_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
-	unsigned char *b = skb_tail_pointer(skb);
+	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
 	struct Qdisc *qdisc;
 	unsigned int i;
@@ -262,12 +405,17 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		opt.offset[i] = dev->tc_to_txq[i].offset;
 	}
 
-	if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
+	if (nla_put(skb, TCA_OPTIONS, NLA_ALIGN(sizeof(opt)), &opt))
 		goto nla_put_failure;
 
-	return skb->len;
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE ||
+	    priv->flags & TC_MQPRIO_F_MAX_RATE)
+		if (dump_rates(priv, &opt, skb) != 0)
+			goto nla_put_failure;
+
+	return nla_nest_end(skb, nla);
 nla_put_failure:
-	nlmsg_trim(skb, b);
+	nlmsg_trim(skb, nla);
 	return -1;
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
@ 2017-07-11 10:18   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

This patch introduces a new hardware offload mode in mqprio
which makes full use of the mqprio options, the TCs, the
queue configurations and the bandwidth rates for the TCs.
This is achieved by setting the value 2 for the "hw" option.
This new offload mode supports new attributes for traffic
class such as minimum and maximum values for bandwidth rate limits.

Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
mqprio queue options and use this to be shared between the kernel and
device driver. This contains a copy of the exisiting datastructure
for mqprio queue options. This new datastructure can be extended when
adding new attributes for traffic class such as bandwidth rate limits. The
existing datastructure for mqprio queue options will be shared between the
kernel and userspace.

This patch enables configuring additional attributes associated
with a traffic class such as minimum and maximum bandwidth
rates and can be offloaded to the hardware in the new offload mode.
The min and max limits for bandwidth rates are provided
by the user along with the the TCs and the queue configurations
when creating the mqprio qdisc.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
  queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h      |    2 
 include/net/pkt_cls.h          |    7 ++
 include/uapi/linux/pkt_sched.h |   13 +++
 net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
 4 files changed, 181 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e48ee2e..12c6c3f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,7 @@ enum {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_MATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_MQPRIO_EXT,
 };
 
 struct tc_cls_u32_offload;
@@ -791,6 +792,7 @@ struct tc_to_netdev {
 		struct tc_cls_matchall_offload *cls_mall;
 		struct tc_cls_bpf_offload *cls_bpf;
 		struct tc_mqprio_qopt *mqprio;
+		struct tc_mqprio_qopt_offload *mqprio_qopt;
 	};
 	bool egress_dev;
 };
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 537d0a0..9facda2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
 	u32 gen_flags;
 };
 
+struct tc_mqprio_qopt_offload {
+	/* struct tc_mqprio_qopt must always be the first element */
+	struct tc_mqprio_qopt qopt;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE];
+	u64 max_rate[TC_QOPT_MAX_QUEUE];
+};
 
 /* This structure holds cookie structure that is passed from user
  * to the kernel for actions and classifiers
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..cf2a146 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -620,6 +620,7 @@ struct tc_drr_stats {
 enum {
 	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
 	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
+	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
 	__TC_MQPRIO_HW_OFFLOAD_MAX
 };
 
@@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+#define TC_MQPRIO_F_MIN_RATE  0x1
+#define TC_MQPRIO_F_MAX_RATE  0x2
+
+enum {
+	TCA_MQPRIO_UNSPEC,
+	TCA_MQPRIO_MIN_RATE64,
+	TCA_MQPRIO_MAX_RATE64,
+	__TCA_MQPRIO_MAX,
+};
+
+#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
+
 /* SFB */
 
 enum {
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index e0c0272..6524d12 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -18,10 +18,13 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/sch_generic.h>
+#include <net/pkt_cls.h>
 
 struct mqprio_sched {
 	struct Qdisc		**qdiscs;
 	int hw_offload;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
 };
 
 static void mqprio_destroy(struct Qdisc *sch)
@@ -39,9 +42,21 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
-		struct tc_mqprio_qopt offload = { 0 };
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = { { 0 } };
+		struct tc_to_netdev tc = { 0 };
+
+		switch (priv->hw_offload) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+			break;
+		default:
+			return;
+		}
 
 		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, 0, &tc);
 	} else {
@@ -99,6 +114,24 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 	return 0;
 }
 
+static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
+	[TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED },
+	[TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED },
+};
+
+static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
+		      const struct nla_policy *policy, int len)
+{
+	int nested_len = nla_len(nla) - NLA_ALIGN(len);
+
+	if (nested_len >= nla_attr_size(0))
+		return nla_parse(tb, maxtype, nla_data(nla) + NLA_ALIGN(len),
+				 nested_len, policy, NULL);
+
+	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
+	return 0;
+}
+
 static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct net_device *dev = qdisc_dev(sch);
@@ -107,6 +140,10 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	struct Qdisc *qdisc;
 	int i, err = -EOPNOTSUPP;
 	struct tc_mqprio_qopt *qopt = NULL;
+	struct nlattr *tb[TCA_MQPRIO_MAX + 1];
+	struct nlattr *attr;
+	int rem;
+	int len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
 
 	BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE);
 	BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK);
@@ -124,6 +161,51 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	if (mqprio_parse_opt(dev, qopt))
 		return -EINVAL;
 
+	if (len > 0) {
+		err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
+				 sizeof(*qopt));
+		if (err < 0)
+			return err;
+
+		if (tb[TCA_MQPRIO_MIN_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->min_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MIN_RATE;
+		}
+
+		if (tb[TCA_MQPRIO_MAX_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->max_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MAX_RATE;
+		}
+	}
+
 	/* pre-allocate qdisc, attachment can't fail */
 	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
 			       GFP_KERNEL);
@@ -148,16 +230,37 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
-		struct tc_mqprio_qopt offload = *qopt;
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = {.qopt = *qopt};
+		struct tc_to_netdev tc = { 0 };
+
+		switch (qopt->hw) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+
+			offload.flags = priv->flags;
+			if (priv->flags & TC_MQPRIO_F_MIN_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.min_rate[i] = priv->min_rate[i];
+
+			if (priv->flags & TC_MQPRIO_F_MAX_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.max_rate[i] = priv->max_rate[i];
+			break;
+		default:
+			return -EINVAL;
+		}
 
 		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle,
 						    0, 0, &tc);
 		if (err)
 			return err;
 
-		priv->hw_offload = offload.hw;
+		priv->hw_offload = offload.qopt.hw;
 	} else {
 		netdev_set_num_tc(dev, qopt->num_tc);
 		for (i = 0; i < qopt->num_tc; i++)
@@ -227,11 +330,51 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 	return 0;
 }
 
+static int dump_rates(struct mqprio_sched *priv,
+		      struct tc_mqprio_qopt *opt, struct sk_buff *skb)
+{
+	struct nlattr *nest;
+	int i;
+
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MIN_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MIN_RATE64,
+				    sizeof(priv->min_rate[i]),
+				    &priv->min_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+
+	if (priv->flags & TC_MQPRIO_F_MAX_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MAX_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MAX_RATE64,
+				    sizeof(priv->max_rate[i]),
+				    &priv->max_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
-	unsigned char *b = skb_tail_pointer(skb);
+	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
 	struct Qdisc *qdisc;
 	unsigned int i;
@@ -262,12 +405,17 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		opt.offset[i] = dev->tc_to_txq[i].offset;
 	}
 
-	if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
+	if (nla_put(skb, TCA_OPTIONS, NLA_ALIGN(sizeof(opt)), &opt))
 		goto nla_put_failure;
 
-	return skb->len;
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE ||
+	    priv->flags & TC_MQPRIO_F_MAX_RATE)
+		if (dump_rates(priv, &opt, skb) != 0)
+			goto nla_put_failure;
+
+	return nla_nest_end(skb, nla);
 nla_put_failure:
-	nlmsg_trim(skb, b);
+	nlmsg_trim(skb, nla);
 	return -1;
 }
 


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/6] [next-queue]net: i40e: Add macro for PF reset bit
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:18   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

Introduce a macro for the bit setting the PF reset flag and
update its usages. This makes it easier to use this flag
in functions to be introduced in future without encountering
checkpatch issues related to alignment and line over 80
characters.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |    2 ++
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |    3 +--
 drivers/net/ethernet/intel/i40e/i40e_main.c        |    9 ++++-----
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |    5 ++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 5f75e67..bd9bf46 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -155,6 +155,8 @@ enum i40e_state_t {
 	__I40E_STATE_SIZE__,
 };
 
+#define I40E_PF_RESET_FLAG	BIT_ULL(__I40E_PF_RESET_REQUESTED)
+
 /* VSI state flags */
 enum i40e_vsi_state_t {
 	__I40E_VSI_DOWN,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8f326f8..b46117e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -798,8 +798,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 		 */
 		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
 			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset_safe(pf,
-					   BIT_ULL(__I40E_PF_RESET_REQUESTED));
+			i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 		}
 
 		vsi = i40e_vsi_setup(pf, I40E_VSI_VMDQ2, vsi_seid, 0);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 998ad96..03412a8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5738,7 +5738,7 @@ int i40e_vsi_open(struct i40e_vsi *vsi)
 err_setup_tx:
 	i40e_vsi_free_tx_resources(vsi);
 	if (vsi == pf->vsi[pf->lan_vsi])
-		i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+		i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
 	return err;
 }
@@ -5866,7 +5866,7 @@ void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags, bool lock_acquired)
 		wr32(&pf->hw, I40E_GLGEN_RTRIG, val);
 		i40e_flush(&pf->hw);
 
-	} else if (reset_flags & BIT_ULL(__I40E_PF_RESET_REQUESTED)) {
+	} else if (reset_flags & I40E_PF_RESET_FLAG) {
 
 		/* Request a PF Reset
 		 *
@@ -9145,7 +9145,7 @@ static int i40e_set_features(struct net_device *netdev,
 	need_reset = i40e_set_ntuple(pf, features);
 
 	if (need_reset)
-		i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+		i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
 	return 0;
 }
@@ -9397,8 +9397,7 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev,
 				pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
 			else
 				pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED),
-				      true);
+			i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 			break;
 		}
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3ef67dc..be172f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1377,8 +1377,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	if (num_vfs) {
 		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
 			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset_safe(pf,
-					   BIT_ULL(__I40E_PF_RESET_REQUESTED));
+			i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 		}
 		return i40e_pci_sriov_enable(pdev, num_vfs);
 	}
@@ -1386,7 +1385,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	if (!pci_vfs_assigned(pf->pdev)) {
 		i40e_free_vfs(pf);
 		pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-		i40e_do_reset_safe(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED));
+		i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 	} else {
 		dev_warn(&pdev->dev, "Unable to free VFs because some are assigned to VMs.\n");
 		return -EINVAL;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 2/6] [next-queue]net: i40e: Add macro for PF reset bit
@ 2017-07-11 10:18   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

Introduce a macro for the bit setting the PF reset flag and
update its usages. This makes it easier to use this flag
in functions to be introduced in future without encountering
checkpatch issues related to alignment and line over 80
characters.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |    2 ++
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |    3 +--
 drivers/net/ethernet/intel/i40e/i40e_main.c        |    9 ++++-----
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |    5 ++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 5f75e67..bd9bf46 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -155,6 +155,8 @@ enum i40e_state_t {
 	__I40E_STATE_SIZE__,
 };
 
+#define I40E_PF_RESET_FLAG	BIT_ULL(__I40E_PF_RESET_REQUESTED)
+
 /* VSI state flags */
 enum i40e_vsi_state_t {
 	__I40E_VSI_DOWN,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8f326f8..b46117e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -798,8 +798,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 		 */
 		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
 			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset_safe(pf,
-					   BIT_ULL(__I40E_PF_RESET_REQUESTED));
+			i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 		}
 
 		vsi = i40e_vsi_setup(pf, I40E_VSI_VMDQ2, vsi_seid, 0);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 998ad96..03412a8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5738,7 +5738,7 @@ int i40e_vsi_open(struct i40e_vsi *vsi)
 err_setup_tx:
 	i40e_vsi_free_tx_resources(vsi);
 	if (vsi == pf->vsi[pf->lan_vsi])
-		i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+		i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
 	return err;
 }
@@ -5866,7 +5866,7 @@ void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags, bool lock_acquired)
 		wr32(&pf->hw, I40E_GLGEN_RTRIG, val);
 		i40e_flush(&pf->hw);
 
-	} else if (reset_flags & BIT_ULL(__I40E_PF_RESET_REQUESTED)) {
+	} else if (reset_flags & I40E_PF_RESET_FLAG) {
 
 		/* Request a PF Reset
 		 *
@@ -9145,7 +9145,7 @@ static int i40e_set_features(struct net_device *netdev,
 	need_reset = i40e_set_ntuple(pf, features);
 
 	if (need_reset)
-		i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+		i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
 	return 0;
 }
@@ -9397,8 +9397,7 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev,
 				pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
 			else
 				pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED),
-				      true);
+			i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 			break;
 		}
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3ef67dc..be172f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1377,8 +1377,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	if (num_vfs) {
 		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
 			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-			i40e_do_reset_safe(pf,
-					   BIT_ULL(__I40E_PF_RESET_REQUESTED));
+			i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 		}
 		return i40e_pci_sriov_enable(pdev, num_vfs);
 	}
@@ -1386,7 +1385,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	if (!pci_vfs_assigned(pf->pdev)) {
 		i40e_free_vfs(pf);
 		pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-		i40e_do_reset_safe(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED));
+		i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
 	} else {
 		dev_warn(&pdev->dev, "Unable to free VFs because some are assigned to VMs.\n");
 		return -EINVAL;


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/6] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:18   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels(VSI).
A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC
(TC0). TC0 for the main VSI is also reconfigured as per user provided
queue parameters. Queue counts that are not power-of-2 are handled by
reconfiguring RSS by reprogramming LUTs using the queue count value.
This patch also handles configuring the TX rings for the channels,
setting up the RX queue map for channel.

Also, the channels so created are removed and all the queue
configuration is set to default when the qdisc is detached from the
root of the device.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |   33 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |  747 +++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |    2 
 3 files changed, 773 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index bd9bf46..93bb527 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -86,6 +86,7 @@
 #define I40E_AQ_LEN			256
 #define I40E_AQ_WORK_LIMIT		66 /* max number of VFs + a little */
 #define I40E_MAX_USER_PRIORITY		8
+#define I40E_MAX_QUEUES_PER_CH		64
 #define I40E_DEFAULT_TRAFFIC_CLASS	BIT(0)
 #define I40E_DEFAULT_MSG_ENABLE		4
 #define I40E_QUEUE_WAIT_RETRY_LIMIT	10
@@ -338,6 +339,23 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+struct i40e_channel {
+	struct list_head list;
+	bool initialized;
+	u8 type;
+	u16 vsi_number; /* Assigned VSI number from AQ 'Add VSI' response */
+	u16 stat_counter_idx;
+	u16 base_queue;
+	u16 num_queue_pairs; /* Requested by user */
+	u16 seid;
+
+	u8 enabled_tc;
+	struct i40e_aqc_vsi_properties_data info;
+
+	/* track this channel belongs to which VSI */
+	struct i40e_vsi *parent_vsi;
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -453,6 +471,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(25)
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(26)
 #define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT_ULL(27)
+#define I40E_FLAG_TC_MQPRIO			BIT_ULL(28)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
@@ -533,6 +552,8 @@ struct i40e_pf {
 	u32 ioremap_len;
 	u32 fd_inv;
 	u16 phy_led_val;
+
+	u16 override_q_count;
 };
 
 /**
@@ -697,6 +718,16 @@ struct i40e_vsi {
 	bool current_isup;	/* Sync 'link up' logging */
 	enum i40e_aq_link_speed current_speed;	/* Sync link speed logging */
 
+	/* channel specific fields */
+	u16 cnt_q_avail; /* num of queues available for channel usage */
+	u16 orig_rss_size;
+	u16 current_rss_size;
+
+	/* keeps track of next_base_queue to be used for channel setup */
+	atomic_t next_base_queue;
+
+	struct list_head ch_list;
+
 	void *priv;	/* client driver data reference. */
 
 	/* VSI specific handlers */
@@ -1004,4 +1035,6 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
 {
 	return !!vsi->xdp_prog;
 }
+
+int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 03412a8..54a6275 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2877,7 +2877,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 	struct i40e_vsi *vsi = ring->vsi;
 	cpumask_var_t mask;
 
-	if (!ring->q_vector || !ring->netdev)
+	if (!ring->q_vector || !ring->netdev || ring->ch)
 		return;
 
 	/* Single TC mode enable XPS */
@@ -2950,7 +2950,14 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	 * initialization. This has to be done regardless of
 	 * DCB as by default everything is mapped to TC0.
 	 */
-	tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
+	if (ring->ch)
+		tx_ctx.rdylist =
+			le16_to_cpu(ring->ch->info.qs_handle[ring->dcb_tc]);
+
+	else
+		tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
 	tx_ctx.rdylist_act = 0;
 
 	/* clear the context in the HMC */
@@ -2972,12 +2979,23 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	}
 
 	/* Now associate this queue with this PCI function */
-	if (vsi->type == I40E_VSI_VMDQ2) {
-		qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
-		qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
-			   I40E_QTX_CTL_VFVM_INDX_MASK;
+	if (ring->ch) {
+		if (ring->ch->type == I40E_VSI_VMDQ2)
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+		else
+			return -EINVAL;
+
+		qtx_ctl |= (ring->ch->vsi_number <<
+			    I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+			    I40E_QTX_CTL_VFVM_INDX_MASK;
 	} else {
-		qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		if (vsi->type == I40E_VSI_VMDQ2) {
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+			qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+				    I40E_QTX_CTL_VFVM_INDX_MASK;
+		} else {
+			qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		}
 	}
 
 	qtx_ctl |= ((hw->pf_id << I40E_QTX_CTL_PF_INDX_SHIFT) &
@@ -5160,6 +5178,672 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 }
 
 /**
+ * i40e_remove_queue_channels - Remove queue channels for the TCs
+ * @vsi: VSI to be configured
+ *
+ * Remove queue channels for the TCs
+ **/
+static void i40e_remove_queue_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int ret, i;
+
+	/* Reset rss size that was stored when reconfiguring rss for
+	 * channel VSIs with non-power-of-2 queue count.
+	 */
+	vsi->current_rss_size = 0;
+
+	/* perform cleanup for channels if they exist */
+	if (list_empty(&vsi->ch_list))
+		return;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		struct i40e_vsi *p_vsi;
+
+		list_del(&ch->list);
+		p_vsi = ch->parent_vsi;
+		if (!p_vsi || !ch->initialized) {
+			kfree(ch);
+			continue;
+		}
+		/* Reset queue contexts */
+		for (i = 0; i < ch->num_queue_pairs; i++) {
+			struct i40e_ring *tx_ring, *rx_ring;
+			u16 pf_q;
+
+			pf_q = ch->base_queue + i;
+			tx_ring = vsi->tx_rings[pf_q];
+			tx_ring->ch = NULL;
+
+			rx_ring = vsi->rx_rings[pf_q];
+			rx_ring->ch = NULL;
+		}
+
+		/* delete VSI from FW */
+		ret = i40e_aq_delete_element(&vsi->back->hw, ch->seid,
+					     NULL);
+		if (ret)
+			dev_err(&vsi->back->pdev->dev,
+				"unable to remove channel (%d) for parent VSI(%d)\n",
+				ch->seid, p_vsi->seid);
+		kfree(ch);
+	}
+	INIT_LIST_HEAD(&vsi->ch_list);
+}
+
+/**
+ * i40e_is_any_channel - channel exist or not
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Returns true or false if channel(s) exist for associated VSI or not
+ **/
+static bool i40e_is_any_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (ch->initialized)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40e_get_max_queues_for_channel
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Helper function which returns max value among the queue counts set on the
+ * channels/TCs created.
+ **/
+static int i40e_get_max_queues_for_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int max = 0;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (!ch->initialized)
+			continue;
+		if (ch->num_queue_pairs > max)
+			max = ch->num_queue_pairs;
+	}
+
+	return max;
+}
+
+/**
+ * i40e_validate_num_queues - validate num_queues w.r.t channel
+ * @pf: ptr to PF device
+ * @num_queues: number of queues
+ * @vsi: the parent VSI
+ * @reconfig_rss: indicates should the RSS be reconfigured or not
+ *
+ * This function validates number of queues in the context of new channel
+ * which is being established and determines if RSS should be reconfigured
+ * or not for parent VSI.
+ **/
+static int i40e_validate_num_queues(struct i40e_pf *pf, int num_queues,
+				    struct i40e_vsi *vsi, bool *reconfig_rss)
+{
+	int max_ch_queues;
+
+	if (!reconfig_rss)
+		return -EINVAL;
+
+	*reconfig_rss = false;
+
+	if (num_queues > I40E_MAX_QUEUES_PER_CH) {
+		dev_err(&pf->pdev->dev,
+			"Failed to create VMDq VSI. User requested num_queues (%d) > I40E_MAX_QUEUES_PER_VSI (%u)\n",
+			num_queues, I40E_MAX_QUEUES_PER_CH);
+		return -EINVAL;
+	}
+
+	if (vsi->current_rss_size) {
+		if (num_queues > vsi->current_rss_size) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) > vsi's current_size(%d)\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		} else if ((num_queues < vsi->current_rss_size) &&
+			   (!is_power_of_2(num_queues))) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) < vsi's current_size(%d), but not power of 2\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		}
+	}
+
+	if (!is_power_of_2(num_queues)) {
+		/* Find the max num_queues configures for channel if channel
+		 * exist.
+		 * if channel exist, then enforce 'num_queues' to be more than
+		 * max ever num_queues configured for channel.
+		 */
+		max_ch_queues = i40e_get_max_queues_for_channel(vsi);
+		if (num_queues < max_ch_queues) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) > main_vsi's current_size(%d)\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		}
+		*reconfig_rss = true;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_vsi_reconfig_rss - reconfig RSS based on specified rss_size
+ * @vsi: the VSI being setup
+ * @rss_size: size of RSS, accordingly LUT gets reprogrammed
+ *
+ * This function reconfigures RSS by reprogramming LUTs using 'rss_size'
+ **/
+static int i40e_vsi_reconfig_rss(struct i40e_vsi *vsi, u16 rss_size)
+{
+	struct i40e_pf *pf = vsi->back;
+	u8 seed[I40E_HKEY_ARRAY_SIZE];
+	struct i40e_hw *hw = &pf->hw;
+	int local_rss_size;
+	u8 *lut;
+	int ret;
+
+	if (!vsi->rss_size)
+		return -EINVAL;
+
+	if (rss_size > vsi->rss_size)
+		return -EINVAL;
+
+	local_rss_size = min_t(int, vsi->rss_size, rss_size);
+	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
+	if (!lut)
+		return -ENOMEM;
+
+	/* Ignoring user configured lut if there is one */
+	i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, local_rss_size);
+
+	/* Use user configured hash key if there is one, otherwise
+	 * use default.
+	 */
+	if (vsi->rss_hkey_user)
+		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
+	else
+		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+
+	ret = i40e_config_rss(vsi, seed, lut, vsi->rss_table_size);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "Cannot set RSS lut, err %s aq_err %s\n",
+			 i40e_stat_str(hw, ret),
+			 i40e_aq_str(hw, hw->aq.asq_last_status));
+		kfree(lut);
+		return ret;
+	}
+	kfree(lut);
+
+	/* Do the update w.r.t. storing rss_size */
+	if (!vsi->orig_rss_size)
+		vsi->orig_rss_size = vsi->rss_size;
+	vsi->current_rss_size = local_rss_size;
+
+	return ret;
+}
+
+/**
+ * i40e_channel_setup_queue_map - Setup a channel queue map based on enabled_tc
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ctxt: VSI context structure
+ * @enabled_tc: Enabled TCs bitmap
+ * @ch: ptr to channel structure
+ *
+ * Setup queue map based on enabled_tc for specific channel
+ **/
+static void i40e_channel_setup_queue_map(struct i40e_pf *pf,
+					 struct i40e_vsi_context *ctxt,
+					 u8 enabled_tc, struct i40e_channel *ch)
+{
+	u16 qcount, num_tc_qps, qmap;
+	int pow, num_qps;
+	u16 sections = 0;
+	/* At least TC0 is enabled in case of non-DCB case, non-MQPRIO */
+	u16 numtc = 1;
+	u8 offset = 0;
+
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+
+	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
+		qcount = min_t(int, ch->num_queue_pairs,
+			       pf->num_lan_msix);
+		ch->num_queue_pairs = qcount;
+	} else {
+		qcount = 1;
+	}
+
+	/* find num of qps per traffic class */
+	num_tc_qps = qcount / numtc;
+	num_tc_qps = min_t(int, num_tc_qps, i40e_pf_get_max_q_per_tc(pf));
+	num_qps = qcount;
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+
+	ctxt->info.up_enable_bits = enabled_tc;
+	ctxt->info.mapping_flags |= cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+	ctxt->info.queue_mapping[0] = cpu_to_le16(ch->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+}
+
+/**
+ * i40e_add_channel - add a channel by adding VSI
+ * @pf: ptr to PF device
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @ch: ptr to channel structure
+ *
+ * Add a channel (VSI) using add_vsi and queue_map
+ **/
+static int i40e_add_channel(struct i40e_pf *pf, u16 uplink_seid,
+			    struct i40e_channel *ch)
+{
+	struct i40e_hw *hw = &pf->hw;
+	struct i40e_vsi_context ctxt;
+	u8 enabled_tc = 0x1; /* TC0 enabled */
+	int ret;
+
+	if (ch->type != I40E_VSI_VMDQ2) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, ch->type %d\n", ch->type);
+		return -EINVAL;
+	}
+
+	memset(&ctxt, 0, sizeof(ctxt));
+	ctxt.pf_num = hw->pf_id;
+	ctxt.vf_num = 0;
+	ctxt.uplink_seid = uplink_seid;
+	ctxt.connection_type = I40E_AQ_VSI_CONN_TYPE_NORMAL;
+	if (ch->type == I40E_VSI_VMDQ2)
+		ctxt.flags = I40E_AQ_VSI_TYPE_VMDQ2;
+
+	if (pf->flags & I40E_FLAG_VEB_MODE_ENABLED) {
+		ctxt.info.valid_sections |=
+		     cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+		ctxt.info.switch_id =
+		   cpu_to_le16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
+	}
+
+	/* Set queue map for a given VSI context */
+	i40e_channel_setup_queue_map(pf, &ctxt, enabled_tc, ch);
+
+	/* Now time to create VSI */
+	ret = i40e_aq_add_vsi(hw, &ctxt, NULL);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, err %s aq_err %s\n",
+			 i40e_stat_str(&pf->hw, ret),
+			 i40e_aq_str(&pf->hw,
+				     pf->hw.aq.asq_last_status));
+		return -ENOENT;
+	}
+
+	/* Success, update channel */
+	ch->enabled_tc = enabled_tc;
+	ch->seid = ctxt.seid;
+	ch->vsi_number = ctxt.vsi_number;
+	ch->stat_counter_idx = cpu_to_le16(ctxt.info.stat_counter_idx);
+
+	/* copy just the sections touched not the entire info
+	 * since not all sections are valid as returned by
+	 * update vsi params
+	 */
+	ch->info.mapping_flags = ctxt.info.mapping_flags;
+	memcpy(&ch->info.queue_mapping,
+	       &ctxt.info.queue_mapping, sizeof(ctxt.info.queue_mapping));
+	memcpy(&ch->info.tc_mapping, ctxt.info.tc_mapping,
+	       sizeof(ctxt.info.tc_mapping));
+
+	return 0;
+}
+
+static int i40e_channel_config_bw(struct i40e_vsi *vsi, struct i40e_channel *ch,
+				  u8 *bw_share)
+{
+	struct i40e_aqc_configure_vsi_tc_bw_data bw_data;
+	i40e_status ret;
+	int i;
+
+	bw_data.tc_valid_bits = ch->enabled_tc;
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		bw_data.tc_bw_credits[i] = bw_share[i];
+
+	ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, ch->seid,
+				       &bw_data, NULL);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Config VSI BW allocation per TC failed, aq_err: %d for new_vsi->seid %u\n",
+			 vsi->back->hw.aq.asq_last_status, ch->seid);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		ch->info.qs_handle[i] = bw_data.qs_handles[i];
+
+	return 0;
+}
+
+/**
+ * i40e_channel_config_tx_ring - config TX ring associated with new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ *
+ * Configure TX rings associated with channel (VSI) since queues are being
+ * from parent VSI.
+ **/
+static int i40e_channel_config_tx_ring(struct i40e_pf *pf,
+				       struct i40e_vsi *vsi,
+				       struct i40e_channel *ch)
+{
+	i40e_status ret;
+	int i;
+	u8 bw_share[I40E_MAX_TRAFFIC_CLASS] = {0};
+
+	/* Enable ETS TCs with equal BW Share for now across all VSIs */
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		if (ch->enabled_tc & BIT(i))
+			bw_share[i] = 1;
+	}
+
+	/* configure BW for new VSI */
+	ret = i40e_channel_config_bw(vsi, ch, bw_share);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed configuring TC map %d for channel (seid %u)\n",
+			 ch->enabled_tc, ch->seid);
+		return ret;
+	}
+
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		struct i40e_ring *tx_ring, *rx_ring;
+		u16 pf_q;
+
+		pf_q = ch->base_queue + i;
+
+		/* Get to TX ring ptr of main VSI, for re-setup TX queue
+		 * context
+		 */
+		tx_ring = vsi->tx_rings[pf_q];
+		tx_ring->ch = ch;
+
+		/* Get the RX ring ptr */
+		rx_ring = vsi->rx_rings[pf_q];
+		rx_ring->ch = ch;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_setup_hw_channel - setup new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @type: type of channel to be created (VMDq2/VF)
+ *
+ * Setup new channel (VSI) based on specified type (VMDq2/VF)
+ * and configures TX rings accordingly
+ **/
+static inline int i40e_setup_hw_channel(struct i40e_pf *pf,
+					struct i40e_vsi *vsi,
+					struct i40e_channel *ch,
+					u16 uplink_seid, u8 type)
+{
+	struct i40e_q_vector *q_vector;
+	int base_queue = 0;
+	int ret, i;
+
+	ch->initialized = false;
+	ch->base_queue = atomic_read(&vsi->next_base_queue);
+	ch->type = type;
+
+	/* Proceed with creation of channel (VMDq2) VSI */
+	ret = i40e_add_channel(pf, uplink_seid, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to add_channel using uplink_seid %u\n",
+			 uplink_seid);
+		return ret;
+	}
+
+	/* Mark the successful creation of channel */
+	ch->initialized = true;
+
+	/* Mark q_vectors indicating that they are part of newly created
+	 * channel (VSI)
+	 */
+	base_queue = ch->base_queue;
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		q_vector = vsi->tx_rings[base_queue + i]->q_vector;
+
+		if (!q_vector)
+			continue;
+	}
+
+	/* Reconfigure TX queues using QTX_CTL register */
+	ret = i40e_channel_config_tx_ring(pf, vsi, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to configure TX rings for channel %u\n",
+			 ch->seid);
+		return ret;
+	}
+
+	/* update 'next_base_queue' */
+	atomic_set(&vsi->next_base_queue,
+		   atomic_read(&vsi->next_base_queue) + ch->num_queue_pairs);
+
+	dev_dbg(&pf->pdev->dev,
+		"Added channel: vsi_seid %u, vsi_number %u, stat_counter_idx %u, num_queue_pairs %u, pf->next_base_queue %d\n",
+		ch->seid, ch->vsi_number, ch->stat_counter_idx,
+		ch->num_queue_pairs,
+		atomic_read(&vsi->next_base_queue));
+
+	return ret;
+}
+
+/**
+ * i40e_setup_channel - setup new channel using uplink element
+ * @pf: ptr to PF device
+ * @type: type of channel to be created (VMDq2/VF)
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @ch: ptr to channel structure
+ *
+ * Setup new channel (VSI) based on specified type (VMDq2/VF)
+ * and uplink switching element (uplink_seid)
+ **/
+static bool i40e_setup_channel(struct i40e_pf *pf, struct i40e_vsi *vsi,
+			       struct i40e_channel *ch)
+{
+	u16 seid = vsi->seid;
+	u8 vsi_type;
+	int ret;
+
+	if (vsi->type == I40E_VSI_MAIN) {
+		vsi_type = I40E_VSI_VMDQ2;
+	} else {
+		dev_err(&pf->pdev->dev, "unsupported vsi type(%d) of parent vsi\n",
+			vsi->type);
+		return false;
+	}
+
+	/* underlying switching element */
+	seid = pf->vsi[pf->lan_vsi]->uplink_seid;
+
+	/* create channel (VSI), configure TX rings */
+	ret = i40e_setup_hw_channel(pf, vsi, ch, seid, vsi_type);
+	if (ret) {
+		dev_err(&pf->pdev->dev, "failed to setup hw_channel\n");
+		return false;
+	}
+
+	return ch->initialized ? true : false;
+}
+
+/**
+ * i40e_create_queue_channel - function to create channel
+ * @vsi: VSI to be configured
+ * @ch: ptr to channel (it contains channel specific params)
+ *
+ * This function creates channel (VSI) using num_queues specified by user,
+ * reconfigs RSS if needed.
+ **/
+int i40e_create_queue_channel(struct i40e_vsi *vsi,
+			      struct i40e_channel *ch)
+{
+	struct i40e_pf *pf = vsi->back;
+	bool reconfig_rss;
+	int err;
+
+	if (!ch)
+		return -EINVAL;
+
+	if (!ch->num_queue_pairs) {
+		dev_err(&pf->pdev->dev, "Invalid num_queues requested: %d\n",
+			ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* validate user requested num_queues for channel */
+	err = i40e_validate_num_queues(pf, ch->num_queue_pairs, vsi,
+				       &reconfig_rss);
+	if (err) {
+		dev_info(&pf->pdev->dev, "Failed to validate num_queues (%d)\n",
+			 ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* By default we are in VEPA mode, if this is the first VF/VMDq
+	 * VSI to be added switch to VEB mode.
+	 */
+	if ((!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) ||
+	    (!i40e_is_any_channel(vsi))) {
+		if (!is_power_of_2(vsi->tc_config.tc_info[0].qcount)) {
+			dev_dbg(&pf->pdev->dev,
+				"Failed to create channel. Override queues (%u) not power of 2\n",
+				vsi->tc_config.tc_info[0].qcount);
+			return -EINVAL;
+		}
+
+		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
+			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
+
+			if (vsi->type == I40E_VSI_MAIN) {
+				if (pf->flags & I40E_FLAG_TC_MQPRIO)
+					i40e_do_reset(pf, I40E_PF_RESET_FLAG,
+						      true);
+				else
+					i40e_do_reset_safe(pf,
+							   I40E_PF_RESET_FLAG);
+			}
+		}
+		/* now onwards for main VSI, number of queues will be value
+		 * of TC0's queue count
+		 */
+	}
+
+	/* By this time, vsi->cnt_q_avail shall be set to non-zero and
+	 * it should be more than num_queues
+	 */
+	if (!vsi->cnt_q_avail || (vsi->cnt_q_avail < ch->num_queue_pairs)) {
+		dev_dbg(&pf->pdev->dev,
+			"Error: cnt_q_avail (%u) less than num_queues %d\n",
+			vsi->cnt_q_avail, ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* reconfig_rss only if vsi type is MAIN_VSI */
+	if (reconfig_rss && (vsi->type == I40E_VSI_MAIN)) {
+		err = i40e_vsi_reconfig_rss(vsi, ch->num_queue_pairs);
+		if (err) {
+			dev_info(&pf->pdev->dev,
+				 "Error: unable to reconfig rss for num_queues (%u)\n",
+				 ch->num_queue_pairs);
+			return -EINVAL;
+		}
+	}
+
+	if (!i40e_setup_channel(pf, vsi, ch)) {
+		dev_info(&pf->pdev->dev, "Failed to setup channel\n");
+		return -EINVAL;
+	}
+
+	dev_info(&pf->pdev->dev,
+		 "Setup channel (id:%u) utilizing num_queues %d\n",
+		 ch->seid, ch->num_queue_pairs);
+
+	/* in case of VF, this will be main SRIOV VSI */
+	ch->parent_vsi = vsi;
+
+	/* and update main_vsi's count for queue_available to use */
+	vsi->cnt_q_avail -= ch->num_queue_pairs;
+
+	return 0;
+}
+
+/**
+ * i40e_configure_queue_channels - Add queue channel for the given TCs
+ * @vsi: VSI to be configured
+ *
+ * Configures queue channel mapping to the given TCs
+ **/
+static int i40e_configure_queue_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch;
+	int ret = 0, i;
+
+	/* Create app vsi with the TCs. Main VSI with TC0 is already set up */
+	for (i = 1; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+			if (!ch) {
+				ret = -ENOMEM;
+				goto err_free;
+			}
+
+			INIT_LIST_HEAD(&ch->list);
+			ch->num_queue_pairs =
+				vsi->tc_config.tc_info[i].qcount;
+			ch->base_queue =
+				vsi->tc_config.tc_info[i].qoffset;
+
+			list_add_tail(&ch->list, &vsi->ch_list);
+
+			ret = i40e_create_queue_channel(vsi, ch);
+			if (ret) {
+				dev_err(&vsi->back->pdev->dev,
+					"Failed creating queue channel with TC%d: queues %d\n",
+					i, ch->num_queue_pairs);
+				goto err_free;
+			}
+		}
+	return ret;
+
+err_free:
+	i40e_remove_queue_channels(vsi);
+	return ret;
+}
+
+/**
  * i40e_veb_config_tc - Configure TCs for given VEB
  * @veb: given VEB
  * @enabled_tc: TC bitmap
@@ -5604,10 +6288,18 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 		goto exit;
 	}
 
-	/* Unquiesce VSI */
-	i40e_unquiesce_vsi(vsi);
+	if (pf->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_configure_queue_channels(vsi);
+		if (ret) {
+			netdev_info(netdev,
+				    "Failed configuring queue channels\n");
+			goto exit;
+		}
+	}
 
 exit:
+	/* Unquiesce VSI */
+	i40e_unquiesce_vsi(vsi);
 	return ret;
 }
 
@@ -7008,6 +7700,35 @@ static void i40e_fdir_teardown(struct i40e_pf *pf)
 }
 
 /**
+ * i40e_rebuild_channels - Rebuilds channel VSIs if they existed before reset
+ * @vsi: PF main vsi
+ *
+ * Rebuilds channel VSIs if they existed before reset
+ **/
+static int i40e_rebuild_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	i40e_status ret;
+
+	if (list_empty(&vsi->ch_list))
+		return 0;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (!ch->initialized)
+			break;
+		/* Proceed with creation of channel (VMDq2) VSI */
+		ret = i40e_add_channel(vsi->back, vsi->uplink_seid, ch);
+		if (ret) {
+			dev_info(&vsi->back->pdev->dev,
+				 "failed to rebuild channels using uplink_seid %u\n",
+				 vsi->uplink_seid);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+/**
  * i40e_prep_for_reset - prep for the core to reset
  * @pf: board private structure
  * @lock_acquired: indicates whether or not the lock has been acquired
@@ -7272,6 +7993,13 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
+	/* PF Main VSI is rebuild by now, go ahead and rebuild channel VSIs
+	 * for this main VSI if they exist
+	 */
+	ret = i40e_rebuild_channels(pf->vsi[pf->lan_vsi]);
+	if (ret)
+		goto end_unlock;
+
 	/* Reconfigure hardware for allowing smaller MSS in the case
 	 * of TSO, so that we avoid the MDD being fired and causing
 	 * a reset in the case of small MSS+TSO.
@@ -11489,6 +12217,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev_info(&pdev->dev, "setup_pf_switch failed: %d\n", err);
 		goto err_vsis;
 	}
+	INIT_LIST_HEAD(&pf->vsi[pf->lan_vsi]->ch_list);
 
 	/* Make sure flow control is set according to current settings */
 	err = i40e_set_fc(hw, &set_fc_aq_fail, true);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f0a0eab..374a0a9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -423,6 +423,8 @@ struct i40e_ring {
 					 * i40e_clean_rx_ring_irq() is called
 					 * for this ring.
 					 */
+
+	struct i40e_channel *ch;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 3/6] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
@ 2017-07-11 10:18   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels(VSI).
A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC
(TC0). TC0 for the main VSI is also reconfigured as per user provided
queue parameters. Queue counts that are not power-of-2 are handled by
reconfiguring RSS by reprogramming LUTs using the queue count value.
This patch also handles configuring the TX rings for the channels,
setting up the RX queue map for channel.

Also, the channels so created are removed and all the queue
configuration is set to default when the qdisc is detached from the
root of the device.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |   33 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |  747 +++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |    2 
 3 files changed, 773 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index bd9bf46..93bb527 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -86,6 +86,7 @@
 #define I40E_AQ_LEN			256
 #define I40E_AQ_WORK_LIMIT		66 /* max number of VFs + a little */
 #define I40E_MAX_USER_PRIORITY		8
+#define I40E_MAX_QUEUES_PER_CH		64
 #define I40E_DEFAULT_TRAFFIC_CLASS	BIT(0)
 #define I40E_DEFAULT_MSG_ENABLE		4
 #define I40E_QUEUE_WAIT_RETRY_LIMIT	10
@@ -338,6 +339,23 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+struct i40e_channel {
+	struct list_head list;
+	bool initialized;
+	u8 type;
+	u16 vsi_number; /* Assigned VSI number from AQ 'Add VSI' response */
+	u16 stat_counter_idx;
+	u16 base_queue;
+	u16 num_queue_pairs; /* Requested by user */
+	u16 seid;
+
+	u8 enabled_tc;
+	struct i40e_aqc_vsi_properties_data info;
+
+	/* track this channel belongs to which VSI */
+	struct i40e_vsi *parent_vsi;
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -453,6 +471,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(25)
 #define I40E_FLAG_CLIENT_RESET			BIT_ULL(26)
 #define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED	BIT_ULL(27)
+#define I40E_FLAG_TC_MQPRIO			BIT_ULL(28)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
@@ -533,6 +552,8 @@ struct i40e_pf {
 	u32 ioremap_len;
 	u32 fd_inv;
 	u16 phy_led_val;
+
+	u16 override_q_count;
 };
 
 /**
@@ -697,6 +718,16 @@ struct i40e_vsi {
 	bool current_isup;	/* Sync 'link up' logging */
 	enum i40e_aq_link_speed current_speed;	/* Sync link speed logging */
 
+	/* channel specific fields */
+	u16 cnt_q_avail; /* num of queues available for channel usage */
+	u16 orig_rss_size;
+	u16 current_rss_size;
+
+	/* keeps track of next_base_queue to be used for channel setup */
+	atomic_t next_base_queue;
+
+	struct list_head ch_list;
+
 	void *priv;	/* client driver data reference. */
 
 	/* VSI specific handlers */
@@ -1004,4 +1035,6 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
 {
 	return !!vsi->xdp_prog;
 }
+
+int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 03412a8..54a6275 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2877,7 +2877,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 	struct i40e_vsi *vsi = ring->vsi;
 	cpumask_var_t mask;
 
-	if (!ring->q_vector || !ring->netdev)
+	if (!ring->q_vector || !ring->netdev || ring->ch)
 		return;
 
 	/* Single TC mode enable XPS */
@@ -2950,7 +2950,14 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	 * initialization. This has to be done regardless of
 	 * DCB as by default everything is mapped to TC0.
 	 */
-	tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
+	if (ring->ch)
+		tx_ctx.rdylist =
+			le16_to_cpu(ring->ch->info.qs_handle[ring->dcb_tc]);
+
+	else
+		tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
 	tx_ctx.rdylist_act = 0;
 
 	/* clear the context in the HMC */
@@ -2972,12 +2979,23 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	}
 
 	/* Now associate this queue with this PCI function */
-	if (vsi->type == I40E_VSI_VMDQ2) {
-		qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
-		qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
-			   I40E_QTX_CTL_VFVM_INDX_MASK;
+	if (ring->ch) {
+		if (ring->ch->type == I40E_VSI_VMDQ2)
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+		else
+			return -EINVAL;
+
+		qtx_ctl |= (ring->ch->vsi_number <<
+			    I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+			    I40E_QTX_CTL_VFVM_INDX_MASK;
 	} else {
-		qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		if (vsi->type == I40E_VSI_VMDQ2) {
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+			qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+				    I40E_QTX_CTL_VFVM_INDX_MASK;
+		} else {
+			qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		}
 	}
 
 	qtx_ctl |= ((hw->pf_id << I40E_QTX_CTL_PF_INDX_SHIFT) &
@@ -5160,6 +5178,672 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 }
 
 /**
+ * i40e_remove_queue_channels - Remove queue channels for the TCs
+ * @vsi: VSI to be configured
+ *
+ * Remove queue channels for the TCs
+ **/
+static void i40e_remove_queue_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int ret, i;
+
+	/* Reset rss size that was stored when reconfiguring rss for
+	 * channel VSIs with non-power-of-2 queue count.
+	 */
+	vsi->current_rss_size = 0;
+
+	/* perform cleanup for channels if they exist */
+	if (list_empty(&vsi->ch_list))
+		return;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		struct i40e_vsi *p_vsi;
+
+		list_del(&ch->list);
+		p_vsi = ch->parent_vsi;
+		if (!p_vsi || !ch->initialized) {
+			kfree(ch);
+			continue;
+		}
+		/* Reset queue contexts */
+		for (i = 0; i < ch->num_queue_pairs; i++) {
+			struct i40e_ring *tx_ring, *rx_ring;
+			u16 pf_q;
+
+			pf_q = ch->base_queue + i;
+			tx_ring = vsi->tx_rings[pf_q];
+			tx_ring->ch = NULL;
+
+			rx_ring = vsi->rx_rings[pf_q];
+			rx_ring->ch = NULL;
+		}
+
+		/* delete VSI from FW */
+		ret = i40e_aq_delete_element(&vsi->back->hw, ch->seid,
+					     NULL);
+		if (ret)
+			dev_err(&vsi->back->pdev->dev,
+				"unable to remove channel (%d) for parent VSI(%d)\n",
+				ch->seid, p_vsi->seid);
+		kfree(ch);
+	}
+	INIT_LIST_HEAD(&vsi->ch_list);
+}
+
+/**
+ * i40e_is_any_channel - channel exist or not
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Returns true or false if channel(s) exist for associated VSI or not
+ **/
+static bool i40e_is_any_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (ch->initialized)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40e_get_max_queues_for_channel
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Helper function which returns max value among the queue counts set on the
+ * channels/TCs created.
+ **/
+static int i40e_get_max_queues_for_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int max = 0;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (!ch->initialized)
+			continue;
+		if (ch->num_queue_pairs > max)
+			max = ch->num_queue_pairs;
+	}
+
+	return max;
+}
+
+/**
+ * i40e_validate_num_queues - validate num_queues w.r.t channel
+ * @pf: ptr to PF device
+ * @num_queues: number of queues
+ * @vsi: the parent VSI
+ * @reconfig_rss: indicates should the RSS be reconfigured or not
+ *
+ * This function validates number of queues in the context of new channel
+ * which is being established and determines if RSS should be reconfigured
+ * or not for parent VSI.
+ **/
+static int i40e_validate_num_queues(struct i40e_pf *pf, int num_queues,
+				    struct i40e_vsi *vsi, bool *reconfig_rss)
+{
+	int max_ch_queues;
+
+	if (!reconfig_rss)
+		return -EINVAL;
+
+	*reconfig_rss = false;
+
+	if (num_queues > I40E_MAX_QUEUES_PER_CH) {
+		dev_err(&pf->pdev->dev,
+			"Failed to create VMDq VSI. User requested num_queues (%d) > I40E_MAX_QUEUES_PER_VSI (%u)\n",
+			num_queues, I40E_MAX_QUEUES_PER_CH);
+		return -EINVAL;
+	}
+
+	if (vsi->current_rss_size) {
+		if (num_queues > vsi->current_rss_size) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) > vsi's current_size(%d)\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		} else if ((num_queues < vsi->current_rss_size) &&
+			   (!is_power_of_2(num_queues))) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) < vsi's current_size(%d), but not power of 2\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		}
+	}
+
+	if (!is_power_of_2(num_queues)) {
+		/* Find the max num_queues configures for channel if channel
+		 * exist.
+		 * if channel exist, then enforce 'num_queues' to be more than
+		 * max ever num_queues configured for channel.
+		 */
+		max_ch_queues = i40e_get_max_queues_for_channel(vsi);
+		if (num_queues < max_ch_queues) {
+			dev_dbg(&pf->pdev->dev,
+				"Error: num_queues (%d) > main_vsi's current_size(%d)\n",
+				num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		}
+		*reconfig_rss = true;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_vsi_reconfig_rss - reconfig RSS based on specified rss_size
+ * @vsi: the VSI being setup
+ * @rss_size: size of RSS, accordingly LUT gets reprogrammed
+ *
+ * This function reconfigures RSS by reprogramming LUTs using 'rss_size'
+ **/
+static int i40e_vsi_reconfig_rss(struct i40e_vsi *vsi, u16 rss_size)
+{
+	struct i40e_pf *pf = vsi->back;
+	u8 seed[I40E_HKEY_ARRAY_SIZE];
+	struct i40e_hw *hw = &pf->hw;
+	int local_rss_size;
+	u8 *lut;
+	int ret;
+
+	if (!vsi->rss_size)
+		return -EINVAL;
+
+	if (rss_size > vsi->rss_size)
+		return -EINVAL;
+
+	local_rss_size = min_t(int, vsi->rss_size, rss_size);
+	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
+	if (!lut)
+		return -ENOMEM;
+
+	/* Ignoring user configured lut if there is one */
+	i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, local_rss_size);
+
+	/* Use user configured hash key if there is one, otherwise
+	 * use default.
+	 */
+	if (vsi->rss_hkey_user)
+		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
+	else
+		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+
+	ret = i40e_config_rss(vsi, seed, lut, vsi->rss_table_size);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "Cannot set RSS lut, err %s aq_err %s\n",
+			 i40e_stat_str(hw, ret),
+			 i40e_aq_str(hw, hw->aq.asq_last_status));
+		kfree(lut);
+		return ret;
+	}
+	kfree(lut);
+
+	/* Do the update w.r.t. storing rss_size */
+	if (!vsi->orig_rss_size)
+		vsi->orig_rss_size = vsi->rss_size;
+	vsi->current_rss_size = local_rss_size;
+
+	return ret;
+}
+
+/**
+ * i40e_channel_setup_queue_map - Setup a channel queue map based on enabled_tc
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ctxt: VSI context structure
+ * @enabled_tc: Enabled TCs bitmap
+ * @ch: ptr to channel structure
+ *
+ * Setup queue map based on enabled_tc for specific channel
+ **/
+static void i40e_channel_setup_queue_map(struct i40e_pf *pf,
+					 struct i40e_vsi_context *ctxt,
+					 u8 enabled_tc, struct i40e_channel *ch)
+{
+	u16 qcount, num_tc_qps, qmap;
+	int pow, num_qps;
+	u16 sections = 0;
+	/* At least TC0 is enabled in case of non-DCB case, non-MQPRIO */
+	u16 numtc = 1;
+	u8 offset = 0;
+
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+
+	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
+		qcount = min_t(int, ch->num_queue_pairs,
+			       pf->num_lan_msix);
+		ch->num_queue_pairs = qcount;
+	} else {
+		qcount = 1;
+	}
+
+	/* find num of qps per traffic class */
+	num_tc_qps = qcount / numtc;
+	num_tc_qps = min_t(int, num_tc_qps, i40e_pf_get_max_q_per_tc(pf));
+	num_qps = qcount;
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+
+	ctxt->info.up_enable_bits = enabled_tc;
+	ctxt->info.mapping_flags |= cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+	ctxt->info.queue_mapping[0] = cpu_to_le16(ch->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+}
+
+/**
+ * i40e_add_channel - add a channel by adding VSI
+ * @pf: ptr to PF device
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @ch: ptr to channel structure
+ *
+ * Add a channel (VSI) using add_vsi and queue_map
+ **/
+static int i40e_add_channel(struct i40e_pf *pf, u16 uplink_seid,
+			    struct i40e_channel *ch)
+{
+	struct i40e_hw *hw = &pf->hw;
+	struct i40e_vsi_context ctxt;
+	u8 enabled_tc = 0x1; /* TC0 enabled */
+	int ret;
+
+	if (ch->type != I40E_VSI_VMDQ2) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, ch->type %d\n", ch->type);
+		return -EINVAL;
+	}
+
+	memset(&ctxt, 0, sizeof(ctxt));
+	ctxt.pf_num = hw->pf_id;
+	ctxt.vf_num = 0;
+	ctxt.uplink_seid = uplink_seid;
+	ctxt.connection_type = I40E_AQ_VSI_CONN_TYPE_NORMAL;
+	if (ch->type == I40E_VSI_VMDQ2)
+		ctxt.flags = I40E_AQ_VSI_TYPE_VMDQ2;
+
+	if (pf->flags & I40E_FLAG_VEB_MODE_ENABLED) {
+		ctxt.info.valid_sections |=
+		     cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+		ctxt.info.switch_id =
+		   cpu_to_le16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
+	}
+
+	/* Set queue map for a given VSI context */
+	i40e_channel_setup_queue_map(pf, &ctxt, enabled_tc, ch);
+
+	/* Now time to create VSI */
+	ret = i40e_aq_add_vsi(hw, &ctxt, NULL);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, err %s aq_err %s\n",
+			 i40e_stat_str(&pf->hw, ret),
+			 i40e_aq_str(&pf->hw,
+				     pf->hw.aq.asq_last_status));
+		return -ENOENT;
+	}
+
+	/* Success, update channel */
+	ch->enabled_tc = enabled_tc;
+	ch->seid = ctxt.seid;
+	ch->vsi_number = ctxt.vsi_number;
+	ch->stat_counter_idx = cpu_to_le16(ctxt.info.stat_counter_idx);
+
+	/* copy just the sections touched not the entire info
+	 * since not all sections are valid as returned by
+	 * update vsi params
+	 */
+	ch->info.mapping_flags = ctxt.info.mapping_flags;
+	memcpy(&ch->info.queue_mapping,
+	       &ctxt.info.queue_mapping, sizeof(ctxt.info.queue_mapping));
+	memcpy(&ch->info.tc_mapping, ctxt.info.tc_mapping,
+	       sizeof(ctxt.info.tc_mapping));
+
+	return 0;
+}
+
+static int i40e_channel_config_bw(struct i40e_vsi *vsi, struct i40e_channel *ch,
+				  u8 *bw_share)
+{
+	struct i40e_aqc_configure_vsi_tc_bw_data bw_data;
+	i40e_status ret;
+	int i;
+
+	bw_data.tc_valid_bits = ch->enabled_tc;
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		bw_data.tc_bw_credits[i] = bw_share[i];
+
+	ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, ch->seid,
+				       &bw_data, NULL);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Config VSI BW allocation per TC failed, aq_err: %d for new_vsi->seid %u\n",
+			 vsi->back->hw.aq.asq_last_status, ch->seid);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		ch->info.qs_handle[i] = bw_data.qs_handles[i];
+
+	return 0;
+}
+
+/**
+ * i40e_channel_config_tx_ring - config TX ring associated with new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ *
+ * Configure TX rings associated with channel (VSI) since queues are being
+ * from parent VSI.
+ **/
+static int i40e_channel_config_tx_ring(struct i40e_pf *pf,
+				       struct i40e_vsi *vsi,
+				       struct i40e_channel *ch)
+{
+	i40e_status ret;
+	int i;
+	u8 bw_share[I40E_MAX_TRAFFIC_CLASS] = {0};
+
+	/* Enable ETS TCs with equal BW Share for now across all VSIs */
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		if (ch->enabled_tc & BIT(i))
+			bw_share[i] = 1;
+	}
+
+	/* configure BW for new VSI */
+	ret = i40e_channel_config_bw(vsi, ch, bw_share);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed configuring TC map %d for channel (seid %u)\n",
+			 ch->enabled_tc, ch->seid);
+		return ret;
+	}
+
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		struct i40e_ring *tx_ring, *rx_ring;
+		u16 pf_q;
+
+		pf_q = ch->base_queue + i;
+
+		/* Get to TX ring ptr of main VSI, for re-setup TX queue
+		 * context
+		 */
+		tx_ring = vsi->tx_rings[pf_q];
+		tx_ring->ch = ch;
+
+		/* Get the RX ring ptr */
+		rx_ring = vsi->rx_rings[pf_q];
+		rx_ring->ch = ch;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_setup_hw_channel - setup new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @type: type of channel to be created (VMDq2/VF)
+ *
+ * Setup new channel (VSI) based on specified type (VMDq2/VF)
+ * and configures TX rings accordingly
+ **/
+static inline int i40e_setup_hw_channel(struct i40e_pf *pf,
+					struct i40e_vsi *vsi,
+					struct i40e_channel *ch,
+					u16 uplink_seid, u8 type)
+{
+	struct i40e_q_vector *q_vector;
+	int base_queue = 0;
+	int ret, i;
+
+	ch->initialized = false;
+	ch->base_queue = atomic_read(&vsi->next_base_queue);
+	ch->type = type;
+
+	/* Proceed with creation of channel (VMDq2) VSI */
+	ret = i40e_add_channel(pf, uplink_seid, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to add_channel using uplink_seid %u\n",
+			 uplink_seid);
+		return ret;
+	}
+
+	/* Mark the successful creation of channel */
+	ch->initialized = true;
+
+	/* Mark q_vectors indicating that they are part of newly created
+	 * channel (VSI)
+	 */
+	base_queue = ch->base_queue;
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		q_vector = vsi->tx_rings[base_queue + i]->q_vector;
+
+		if (!q_vector)
+			continue;
+	}
+
+	/* Reconfigure TX queues using QTX_CTL register */
+	ret = i40e_channel_config_tx_ring(pf, vsi, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to configure TX rings for channel %u\n",
+			 ch->seid);
+		return ret;
+	}
+
+	/* update 'next_base_queue' */
+	atomic_set(&vsi->next_base_queue,
+		   atomic_read(&vsi->next_base_queue) + ch->num_queue_pairs);
+
+	dev_dbg(&pf->pdev->dev,
+		"Added channel: vsi_seid %u, vsi_number %u, stat_counter_idx %u, num_queue_pairs %u, pf->next_base_queue %d\n",
+		ch->seid, ch->vsi_number, ch->stat_counter_idx,
+		ch->num_queue_pairs,
+		atomic_read(&vsi->next_base_queue));
+
+	return ret;
+}
+
+/**
+ * i40e_setup_channel - setup new channel using uplink element
+ * @pf: ptr to PF device
+ * @type: type of channel to be created (VMDq2/VF)
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @ch: ptr to channel structure
+ *
+ * Setup new channel (VSI) based on specified type (VMDq2/VF)
+ * and uplink switching element (uplink_seid)
+ **/
+static bool i40e_setup_channel(struct i40e_pf *pf, struct i40e_vsi *vsi,
+			       struct i40e_channel *ch)
+{
+	u16 seid = vsi->seid;
+	u8 vsi_type;
+	int ret;
+
+	if (vsi->type == I40E_VSI_MAIN) {
+		vsi_type = I40E_VSI_VMDQ2;
+	} else {
+		dev_err(&pf->pdev->dev, "unsupported vsi type(%d) of parent vsi\n",
+			vsi->type);
+		return false;
+	}
+
+	/* underlying switching element */
+	seid = pf->vsi[pf->lan_vsi]->uplink_seid;
+
+	/* create channel (VSI), configure TX rings */
+	ret = i40e_setup_hw_channel(pf, vsi, ch, seid, vsi_type);
+	if (ret) {
+		dev_err(&pf->pdev->dev, "failed to setup hw_channel\n");
+		return false;
+	}
+
+	return ch->initialized ? true : false;
+}
+
+/**
+ * i40e_create_queue_channel - function to create channel
+ * @vsi: VSI to be configured
+ * @ch: ptr to channel (it contains channel specific params)
+ *
+ * This function creates channel (VSI) using num_queues specified by user,
+ * reconfigs RSS if needed.
+ **/
+int i40e_create_queue_channel(struct i40e_vsi *vsi,
+			      struct i40e_channel *ch)
+{
+	struct i40e_pf *pf = vsi->back;
+	bool reconfig_rss;
+	int err;
+
+	if (!ch)
+		return -EINVAL;
+
+	if (!ch->num_queue_pairs) {
+		dev_err(&pf->pdev->dev, "Invalid num_queues requested: %d\n",
+			ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* validate user requested num_queues for channel */
+	err = i40e_validate_num_queues(pf, ch->num_queue_pairs, vsi,
+				       &reconfig_rss);
+	if (err) {
+		dev_info(&pf->pdev->dev, "Failed to validate num_queues (%d)\n",
+			 ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* By default we are in VEPA mode, if this is the first VF/VMDq
+	 * VSI to be added switch to VEB mode.
+	 */
+	if ((!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) ||
+	    (!i40e_is_any_channel(vsi))) {
+		if (!is_power_of_2(vsi->tc_config.tc_info[0].qcount)) {
+			dev_dbg(&pf->pdev->dev,
+				"Failed to create channel. Override queues (%u) not power of 2\n",
+				vsi->tc_config.tc_info[0].qcount);
+			return -EINVAL;
+		}
+
+		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
+			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
+
+			if (vsi->type == I40E_VSI_MAIN) {
+				if (pf->flags & I40E_FLAG_TC_MQPRIO)
+					i40e_do_reset(pf, I40E_PF_RESET_FLAG,
+						      true);
+				else
+					i40e_do_reset_safe(pf,
+							   I40E_PF_RESET_FLAG);
+			}
+		}
+		/* now onwards for main VSI, number of queues will be value
+		 * of TC0's queue count
+		 */
+	}
+
+	/* By this time, vsi->cnt_q_avail shall be set to non-zero and
+	 * it should be more than num_queues
+	 */
+	if (!vsi->cnt_q_avail || (vsi->cnt_q_avail < ch->num_queue_pairs)) {
+		dev_dbg(&pf->pdev->dev,
+			"Error: cnt_q_avail (%u) less than num_queues %d\n",
+			vsi->cnt_q_avail, ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* reconfig_rss only if vsi type is MAIN_VSI */
+	if (reconfig_rss && (vsi->type == I40E_VSI_MAIN)) {
+		err = i40e_vsi_reconfig_rss(vsi, ch->num_queue_pairs);
+		if (err) {
+			dev_info(&pf->pdev->dev,
+				 "Error: unable to reconfig rss for num_queues (%u)\n",
+				 ch->num_queue_pairs);
+			return -EINVAL;
+		}
+	}
+
+	if (!i40e_setup_channel(pf, vsi, ch)) {
+		dev_info(&pf->pdev->dev, "Failed to setup channel\n");
+		return -EINVAL;
+	}
+
+	dev_info(&pf->pdev->dev,
+		 "Setup channel (id:%u) utilizing num_queues %d\n",
+		 ch->seid, ch->num_queue_pairs);
+
+	/* in case of VF, this will be main SRIOV VSI */
+	ch->parent_vsi = vsi;
+
+	/* and update main_vsi's count for queue_available to use */
+	vsi->cnt_q_avail -= ch->num_queue_pairs;
+
+	return 0;
+}
+
+/**
+ * i40e_configure_queue_channels - Add queue channel for the given TCs
+ * @vsi: VSI to be configured
+ *
+ * Configures queue channel mapping to the given TCs
+ **/
+static int i40e_configure_queue_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch;
+	int ret = 0, i;
+
+	/* Create app vsi with the TCs. Main VSI with TC0 is already set up */
+	for (i = 1; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+			if (!ch) {
+				ret = -ENOMEM;
+				goto err_free;
+			}
+
+			INIT_LIST_HEAD(&ch->list);
+			ch->num_queue_pairs =
+				vsi->tc_config.tc_info[i].qcount;
+			ch->base_queue =
+				vsi->tc_config.tc_info[i].qoffset;
+
+			list_add_tail(&ch->list, &vsi->ch_list);
+
+			ret = i40e_create_queue_channel(vsi, ch);
+			if (ret) {
+				dev_err(&vsi->back->pdev->dev,
+					"Failed creating queue channel with TC%d: queues %d\n",
+					i, ch->num_queue_pairs);
+				goto err_free;
+			}
+		}
+	return ret;
+
+err_free:
+	i40e_remove_queue_channels(vsi);
+	return ret;
+}
+
+/**
  * i40e_veb_config_tc - Configure TCs for given VEB
  * @veb: given VEB
  * @enabled_tc: TC bitmap
@@ -5604,10 +6288,18 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 		goto exit;
 	}
 
-	/* Unquiesce VSI */
-	i40e_unquiesce_vsi(vsi);
+	if (pf->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_configure_queue_channels(vsi);
+		if (ret) {
+			netdev_info(netdev,
+				    "Failed configuring queue channels\n");
+			goto exit;
+		}
+	}
 
 exit:
+	/* Unquiesce VSI */
+	i40e_unquiesce_vsi(vsi);
 	return ret;
 }
 
@@ -7008,6 +7700,35 @@ static void i40e_fdir_teardown(struct i40e_pf *pf)
 }
 
 /**
+ * i40e_rebuild_channels - Rebuilds channel VSIs if they existed before reset
+ * @vsi: PF main vsi
+ *
+ * Rebuilds channel VSIs if they existed before reset
+ **/
+static int i40e_rebuild_channels(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	i40e_status ret;
+
+	if (list_empty(&vsi->ch_list))
+		return 0;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (!ch->initialized)
+			break;
+		/* Proceed with creation of channel (VMDq2) VSI */
+		ret = i40e_add_channel(vsi->back, vsi->uplink_seid, ch);
+		if (ret) {
+			dev_info(&vsi->back->pdev->dev,
+				 "failed to rebuild channels using uplink_seid %u\n",
+				 vsi->uplink_seid);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+/**
  * i40e_prep_for_reset - prep for the core to reset
  * @pf: board private structure
  * @lock_acquired: indicates whether or not the lock has been acquired
@@ -7272,6 +7993,13 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
+	/* PF Main VSI is rebuild by now, go ahead and rebuild channel VSIs
+	 * for this main VSI if they exist
+	 */
+	ret = i40e_rebuild_channels(pf->vsi[pf->lan_vsi]);
+	if (ret)
+		goto end_unlock;
+
 	/* Reconfigure hardware for allowing smaller MSS in the case
 	 * of TSO, so that we avoid the MDD being fired and causing
 	 * a reset in the case of small MSS+TSO.
@@ -11489,6 +12217,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev_info(&pdev->dev, "setup_pf_switch failed: %d\n", err);
 		goto err_vsis;
 	}
+	INIT_LIST_HEAD(&pf->vsi[pf->lan_vsi]->ch_list);
 
 	/* Make sure flow control is set according to current settings */
 	err = i40e_set_fc(hw, &set_fc_aq_fail, true);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f0a0eab..374a0a9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -423,6 +423,8 @@ struct i40e_ring {
 					 * i40e_clean_rx_ring_irq() is called
 					 * for this ring.
 					 */
+
+	struct i40e_channel *ch;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/6] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:18   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

The i40e driver is modified to enable the new mqprio hardware
offload mode and factor the TCs and queue configuration by
creating channel VSIs. In this mode, the priority to traffic
class mapping and the user specified queue ranges are used
to configure the traffic classes when the 'hw' option is set
to 2.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5 hw 2

# tc qdisc show dev eth0

qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
             queues:(0:1) (2:3) (4:4) (5:5)

The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.

# tc qdisc del dev eth0 root

This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using mqprio scheduler.

The patch also limits setting ethtool Rx flow hash indirection
(ethtool -X eth0 equal N) to max queues configured via mqprio.
The Rx flow hash indirection input through ethtool should be
validated so that it is within in the queue range configured via
tc/mqprio. The bound checking is achieved by reporting the current
rss size to the kernel when queues are configured via mqprio.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 1 0 2 3 0 queues 2@0 4@2 8@6 11@14 hw 2

# ethtool -X eth0 equal 12
Cannot set RX flow hash configuration: Invalid argument

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |    3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    8 
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  442 ++++++++++++++++++------
 3 files changed, 342 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 93bb527..501d62e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,7 @@
 #include <linux/clocksource.h>
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/pkt_cls.h>
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #include "i40e_client.h"
@@ -697,6 +698,7 @@ struct i40e_vsi {
 	enum i40e_vsi_type type;  /* VSI type, e.g., LAN, FCoE, etc */
 	s16 vf_id;		/* Virtual function ID for SRIOV VSIs */
 
+	struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
 	struct i40e_tc_configuration tc_config;
 	struct i40e_aqc_vsi_properties_data info;
 
@@ -722,6 +724,7 @@ struct i40e_vsi {
 	u16 cnt_q_avail; /* num of queues available for channel usage */
 	u16 orig_rss_size;
 	u16 current_rss_size;
+	bool reconfig_rss;
 
 	/* keeps track of next_base_queue to be used for channel setup */
 	atomic_t next_base_queue;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a868c8d..d61af22 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2626,7 +2626,7 @@ static int i40e_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 
 	switch (cmd->cmd) {
 	case ETHTOOL_GRXRINGS:
-		cmd->data = vsi->num_queue_pairs;
+		cmd->data = vsi->rss_size;
 		ret = 0;
 		break;
 	case ETHTOOL_GRXFH:
@@ -3871,6 +3871,12 @@ static int i40e_set_channels(struct net_device *dev,
 	if (vsi->type != I40E_VSI_MAIN)
 		return -EINVAL;
 
+	/* We do not support setting channels via ethtool when TCs are
+	 * configured through mqprio
+	 */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return -EINVAL;
+
 	/* verify they are not requesting separate vectors */
 	if (!count || ch->rx_count || ch->tx_count)
 		return -EINVAL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 54a6275..40084b7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1578,6 +1578,169 @@ static int i40e_set_mac(struct net_device *netdev, void *p)
 }
 
 /**
+ * i40e_config_rss_aq - Prepare for RSS using AQ commands
+ * @vsi: vsi structure
+ * @seed: RSS hash seed
+ **/
+static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
+			      u8 *lut, u16 lut_size)
+{
+	struct i40e_pf *pf = vsi->back;
+	struct i40e_hw *hw = &pf->hw;
+	int ret = 0;
+
+	if (seed) {
+		struct i40e_aqc_get_set_rss_key_data *seed_dw =
+			(struct i40e_aqc_get_set_rss_key_data *)seed;
+		ret = i40e_aq_set_rss_key(hw, vsi->id, seed_dw);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "Cannot set RSS key, err %s aq_err %s\n",
+				 i40e_stat_str(hw, ret),
+				 i40e_aq_str(hw, hw->aq.asq_last_status));
+			return ret;
+		}
+	}
+	if (lut) {
+		bool pf_lut = vsi->type == I40E_VSI_MAIN ? true : false;
+
+		ret = i40e_aq_set_rss_lut(hw, vsi->id, pf_lut, lut, lut_size);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "Cannot set RSS lut, err %s aq_err %s\n",
+				 i40e_stat_str(hw, ret),
+				 i40e_aq_str(hw, hw->aq.asq_last_status));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * i40e_vsi_config_rss - Prepare for VSI(VMDq) RSS if used
+ * @vsi: VSI structure
+ **/
+static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
+{
+	u8 seed[I40E_HKEY_ARRAY_SIZE];
+	struct i40e_pf *pf = vsi->back;
+	u8 *lut;
+	int ret;
+
+	if (!(pf->hw_features & I40E_HW_RSS_AQ_CAPABLE))
+		return 0;
+	if (!vsi->rss_size)
+		vsi->rss_size = min_t(int, pf->alloc_rss_size,
+				      vsi->num_queue_pairs);
+	if (!vsi->rss_size)
+		return -EINVAL;
+	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
+	if (!lut)
+		return -ENOMEM;
+
+	/* Use the user configured hash keys and lookup table if there is one,
+	 * otherwise use default
+	 */
+	if (vsi->rss_lut_user)
+		memcpy(lut, vsi->rss_lut_user, vsi->rss_table_size);
+	else
+		i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, vsi->rss_size);
+	if (vsi->rss_hkey_user)
+		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
+	else
+		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+	ret = i40e_config_rss_aq(vsi, seed, lut, vsi->rss_table_size);
+	kfree(lut);
+	return ret;
+}
+
+/**
+ * i40e_vsi_setup_queue_map_mqprio - Prepares VSI tc_config to have queue
+ * configurations based on MQPRIO options.
+ * @vsi: the VSI being configured,
+ * @ctxt: VSI context structure
+ * @enabled_tc: number of traffic classes to enable
+ **/
+static int i40e_vsi_setup_queue_map_mqprio(struct i40e_vsi *vsi,
+					   struct i40e_vsi_context *ctxt,
+					   u8 enabled_tc)
+{
+	u16 qcount = 0, max_qcount, qmap, sections = 0;
+	int i, override_q, pow, num_qps, ret;
+	u8 netdev_tc = 0, offset = 0;
+
+	if (vsi->type != I40E_VSI_MAIN)
+		return -EINVAL;
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+	vsi->tc_config.numtc = vsi->mqprio_qopt.qopt.num_tc;
+	vsi->tc_config.enabled_tc = enabled_tc ? enabled_tc : 1;
+	num_qps = vsi->mqprio_qopt.qopt.count[0];
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue offset/count for all TCs for given VSI */
+	max_qcount = vsi->mqprio_qopt.qopt.count[0];
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		/* See if the given TC is enabled for the given VSI */
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			offset = vsi->mqprio_qopt.qopt.offset[i];
+			qcount = vsi->mqprio_qopt.qopt.count[i];
+			if (qcount > max_qcount)
+				max_qcount = qcount;
+			vsi->tc_config.tc_info[i].qoffset = offset;
+			vsi->tc_config.tc_info[i].qcount = qcount;
+			vsi->tc_config.tc_info[i].netdev_tc = netdev_tc++;
+		} else {
+			/* TC is not enabled so set the offset to
+			 * default queue and allocate one queue
+			 * for the given TC.
+			 */
+			vsi->tc_config.tc_info[i].qoffset = 0;
+			vsi->tc_config.tc_info[i].qcount = 1;
+			vsi->tc_config.tc_info[i].netdev_tc = 0;
+		}
+	}
+
+	/* Set actual Tx/Rx queue pairs */
+	vsi->num_queue_pairs = offset + qcount;
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+	ctxt->info.mapping_flags |= cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+	ctxt->info.queue_mapping[0] = cpu_to_le16(vsi->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+
+	/* Reconfigure RSS for main VSI with max queue count */
+	vsi->rss_size = max_qcount;
+	ret = i40e_vsi_config_rss(vsi);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed to reconfig rss for num_queues (%u)\n",
+			 max_qcount);
+		return ret;
+	}
+	vsi->reconfig_rss = true;
+	dev_dbg(&vsi->back->pdev->dev,
+		"Reconfigured rss with num_queues (%u)\n", max_qcount);
+
+	/* Find queue count available for channel VSIs and starting offset
+	 * for channel VSIs
+	 */
+	override_q = vsi->mqprio_qopt.qopt.count[0];
+	if (override_q && (override_q < vsi->num_queue_pairs)) {
+		vsi->cnt_q_avail = vsi->num_queue_pairs - override_q;
+		atomic_set(&vsi->next_base_queue, override_q);
+	}
+	return 0;
+}
+
+/**
  * i40e_vsi_setup_queue_map - Setup a VSI queue map based on enabled_tc
  * @vsi: the VSI being setup
  * @ctxt: VSI context structure
@@ -1615,7 +1778,7 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
 			numtc = 1;
 		}
 	} else {
-		/* At least TC0 is enabled in case of non-DCB case */
+		/* At least TC0 is enabled in non-DCB, non-MQPRIO case */
 		numtc = 1;
 	}
 
@@ -3164,6 +3327,7 @@ static void i40e_vsi_config_dcb_rings(struct i40e_vsi *vsi)
 			rx_ring->dcb_tc = 0;
 			tx_ring->dcb_tc = 0;
 		}
+		return;
 	}
 
 	for (n = 0; n < I40E_MAX_TRAFFIC_CLASS; n++) {
@@ -4872,6 +5036,24 @@ static u8 i40e_dcb_get_enabled_tc(struct i40e_dcbx_config *dcbcfg)
 }
 
 /**
+ * i40e_mqprio_get_enabled_tc - Get enabled traffic classes
+ * @pf: PF being queried
+ *
+ * Query the current MQPRIO configuration and return the number of
+ * traffic classes enabled.
+ **/
+static u8 i40e_mqprio_get_enabled_tc(struct i40e_pf *pf)
+{
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+	u8 num_tc = vsi->mqprio_qopt.qopt.num_tc;
+	u8 enabled_tc = 1, i;
+
+	for (i = 1; i < num_tc; i++)
+		enabled_tc |= BIT(i);
+	return enabled_tc;
+}
+
+/**
  * i40e_pf_get_num_tc - Get enabled traffic classes for PF
  * @pf: PF being queried
  *
@@ -4884,7 +5066,10 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
 	u8 num_tc = 0;
 	struct i40e_dcbx_config *dcbcfg = &hw->local_dcbx_config;
 
-	/* If DCB is not enabled then always in single TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return pf->vsi[pf->lan_vsi]->mqprio_qopt.qopt.num_tc;
+
+	/* If neither MQPRIO nor DCB is enabled, then always use single TC */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return 1;
 
@@ -4913,7 +5098,12 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
  **/
 static u8 i40e_pf_get_tc_map(struct i40e_pf *pf)
 {
-	/* If DCB is not enabled for this PF then just return default TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return i40e_mqprio_get_enabled_tc(pf);
+
+	/* If neither MQPRIO nor DCB is enabled for this PF then just return
+	 * default TC
+	 */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return I40E_DEFAULT_TRAFFIC_CLASS;
 
@@ -5003,6 +5193,9 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	i40e_status ret;
 	int i;
 
+	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
+	    !vsi->mqprio_qopt.qopt.hw)
+		return 0;
 	bw_data.tc_valid_bits = enabled_tc;
 	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
 		bw_data.tc_bw_credits[i] = bw_share[i];
@@ -5065,6 +5258,9 @@ static void i40e_vsi_config_netdev_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 					vsi->tc_config.tc_info[i].qoffset);
 	}
 
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return;
+
 	/* Assign UP2TC map for the VSI */
 	for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
 		/* Get the actual TC# for the UP */
@@ -5115,7 +5311,8 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	int i;
 
 	/* Check if enabled_tc is same as existing or new TCs */
-	if (vsi->tc_config.enabled_tc == enabled_tc)
+	if (vsi->tc_config.enabled_tc == enabled_tc &&
+	    vsi->mqprio_qopt.qopt.hw != TC_MQPRIO_HW_OFFLOAD)
 		return ret;
 
 	/* Enable ETS TCs with equal BW Share for now across all VSIs */
@@ -5138,15 +5335,37 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	ctxt.vf_num = 0;
 	ctxt.uplink_seid = vsi->uplink_seid;
 	ctxt.info = vsi->info;
-	i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+	if (vsi->back->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_vsi_setup_queue_map_mqprio(vsi, &ctxt, enabled_tc);
+		if (ret)
+			goto out;
+	} else {
+		i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+	}
 
+	/* On destroying the qdisc, reset vsi->rss_size, as number of enabled
+	 * queues changed.
+	 */
+	if (!vsi->mqprio_qopt.qopt.hw && vsi->reconfig_rss) {
+		vsi->rss_size = min_t(int, vsi->back->alloc_rss_size,
+				      vsi->num_queue_pairs);
+		ret = i40e_vsi_config_rss(vsi);
+		if (ret) {
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reconfig rss for num_queues\n");
+			return ret;
+		}
+		vsi->reconfig_rss = false;
+	}
 	if (vsi->back->flags & I40E_FLAG_IWARP_ENABLED) {
 		ctxt.info.valid_sections |=
 				cpu_to_le16(I40E_AQ_VSI_PROP_QUEUE_OPT_VALID);
 		ctxt.info.queueing_opt_flags |= I40E_AQ_VSI_QUE_OPT_TCP_ENA;
 	}
 
-	/* Update the VSI after updating the VSI queue-mapping information */
+	/* Update the VSI after updating the VSI queue-mapping
+	 * information
+	 */
 	ret = i40e_aq_update_vsi_params(&vsi->back->hw, &ctxt, NULL);
 	if (ret) {
 		dev_info(&vsi->back->pdev->dev,
@@ -6238,48 +6457,131 @@ void i40e_down(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_validate_mqprio_qopt- validate queue mapping info
+ * @vsi: the VSI being configured
+ * @mqprio_qopt: queue parametrs
+ **/
+static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
+				     struct tc_mqprio_qopt_offload *mqprio_qopt)
+{
+	int i;
+
+	if ((mqprio_qopt->qopt.offset[0] != 0) ||
+	    (mqprio_qopt->qopt.num_tc < 1) ||
+	    (mqprio_qopt->qopt.num_tc > I40E_MAX_TRAFFIC_CLASS))
+		return -EINVAL;
+	for (i = 0; ; i++) {
+		if (!mqprio_qopt->qopt.count[i])
+			return -EINVAL;
+		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+			return -EINVAL;
+		if (i >= mqprio_qopt->qopt.num_tc - 1)
+			break;
+		if (mqprio_qopt->qopt.offset[i + 1] !=
+		    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i]))
+			return -EINVAL;
+	}
+	if (vsi->num_queue_pairs <
+	    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i])) {
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
  * i40e_setup_tc - configure multiple traffic classes
  * @netdev: net device to configure
- * @tc: number of traffic classes to enable
+ * @tc: pointer to struct tc_to_netdev
  **/
-static int i40e_setup_tc(struct net_device *netdev, u8 tc)
+static int i40e_setup_tc(struct net_device *netdev, struct tc_to_netdev *tc)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u8 enabled_tc = 0;
+	u8 enabled_tc = 0, num_tc = 0, hw = 0;
 	int ret = -EINVAL;
 	int i;
 
-	/* Check if DCB enabled to continue */
-	if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
-		netdev_info(netdev, "DCB is not enabled for adapter\n");
-		goto exit;
-	}
+	if (tc->type == TC_SETUP_MQPRIO) {
+		hw = tc->mqprio->hw;
+		num_tc = tc->mqprio->num_tc;
+	} else if (tc->type == TC_SETUP_MQPRIO_EXT) {
+		hw = tc->mqprio_qopt->qopt.hw;
+		num_tc = tc->mqprio_qopt->qopt.num_tc;
+	}
+	if (!hw) {
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+		if (tc->type == TC_SETUP_MQPRIO_EXT)
+			memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+			       sizeof(*tc->mqprio_qopt));
+		goto config_tc;
+	}
+	switch (hw) {
+	case TC_MQPRIO_HW_OFFLOAD_TCS:
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+
+		/* Check if DCB enabled to continue */
+		if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+			netdev_info(netdev,
+				    "DCB is not enabled for adapter\n");
+			goto exit;
+		}
 
-	/* Check if MFP enabled */
-	if (pf->flags & I40E_FLAG_MFP_ENABLED) {
-		netdev_info(netdev, "Configuring TC not supported in MFP mode\n");
-		goto exit;
-	}
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
 
-	/* Check whether tc count is within enabled limit */
-	if (tc > i40e_pf_get_num_tc(pf)) {
-		netdev_info(netdev, "TC count greater than enabled on link for adapter\n");
-		goto exit;
+		/* Check whether tc count is within enabled limit */
+		if (num_tc > i40e_pf_get_num_tc(pf)) {
+			netdev_info(netdev,
+				    "TC count greater than enabled on link for adapter\n");
+			goto exit;
+		}
+		break;
+	case TC_MQPRIO_HW_OFFLOAD:
+		if (pf->flags & I40E_FLAG_DCB_ENABLED) {
+			netdev_info(netdev,
+				    "Full offload of TC Mqprio options is not supported when DCB is enabled\n");
+			goto exit;
+		}
+
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
+		ret = i40e_validate_mqprio_qopt(vsi, tc->mqprio_qopt);
+		if (ret)
+			goto exit;
+		memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+		       sizeof(*tc->mqprio_qopt));
+		pf->flags |= I40E_FLAG_TC_MQPRIO;
+		pf->flags &= ~I40E_FLAG_DCB_ENABLED;
+		break;
+	default:
+		return -EINVAL;
 	}
 
+config_tc:
 	/* Generate TC map for number of tc requested */
-	for (i = 0; i < tc; i++)
+	for (i = 0; i < num_tc; i++)
 		enabled_tc |= BIT(i);
 
 	/* Requesting same TC configuration as already enabled */
-	if (enabled_tc == vsi->tc_config.enabled_tc)
+	if (enabled_tc == vsi->tc_config.enabled_tc &&
+	    hw != TC_MQPRIO_HW_OFFLOAD)
 		return 0;
 
 	/* Quiesce VSI queues */
 	i40e_quiesce_vsi(vsi);
 
+	if (!hw && !(pf->flags & I40E_FLAG_TC_MQPRIO))
+		i40e_remove_queue_channels(vsi);
+
 	/* Configure VSI for enabled TCs */
 	ret = i40e_vsi_config_tc(vsi, enabled_tc);
 	if (ret) {
@@ -6297,7 +6599,11 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 		}
 	}
 
+	goto out;
 exit:
+	/* Reset the configuration data */
+	memset(&vsi->tc_config, 0, sizeof(vsi->tc_config));
+out:
 	/* Unquiesce VSI */
 	i40e_unquiesce_vsi(vsi);
 	return ret;
@@ -6307,12 +6613,7 @@ static int __i40e_setup_tc(struct net_device *netdev, u32 handle,
 			   u32 chain_index, __be16 proto,
 			   struct tc_to_netdev *tc)
 {
-	if (tc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc->mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
-
-	return i40e_setup_tc(netdev, tc->mqprio->num_tc);
+	return i40e_setup_tc(netdev, tc);
 }
 
 /**
@@ -9110,45 +9411,6 @@ static int i40e_setup_misc_vector(struct i40e_pf *pf)
 }
 
 /**
- * i40e_config_rss_aq - Prepare for RSS using AQ commands
- * @vsi: vsi structure
- * @seed: RSS hash seed
- **/
-static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
-			      u8 *lut, u16 lut_size)
-{
-	struct i40e_pf *pf = vsi->back;
-	struct i40e_hw *hw = &pf->hw;
-	int ret = 0;
-
-	if (seed) {
-		struct i40e_aqc_get_set_rss_key_data *seed_dw =
-			(struct i40e_aqc_get_set_rss_key_data *)seed;
-		ret = i40e_aq_set_rss_key(hw, vsi->id, seed_dw);
-		if (ret) {
-			dev_info(&pf->pdev->dev,
-				 "Cannot set RSS key, err %s aq_err %s\n",
-				 i40e_stat_str(hw, ret),
-				 i40e_aq_str(hw, hw->aq.asq_last_status));
-			return ret;
-		}
-	}
-	if (lut) {
-		bool pf_lut = vsi->type == I40E_VSI_MAIN ? true : false;
-
-		ret = i40e_aq_set_rss_lut(hw, vsi->id, pf_lut, lut, lut_size);
-		if (ret) {
-			dev_info(&pf->pdev->dev,
-				 "Cannot set RSS lut, err %s aq_err %s\n",
-				 i40e_stat_str(hw, ret),
-				 i40e_aq_str(hw, hw->aq.asq_last_status));
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
  * i40e_get_rss_aq - Get RSS keys and lut by using AQ commands
  * @vsi: Pointer to vsi structure
  * @seed: Buffter to store the hash keys
@@ -9195,46 +9457,6 @@ static int i40e_get_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
 }
 
 /**
- * i40e_vsi_config_rss - Prepare for VSI(VMDq) RSS if used
- * @vsi: VSI structure
- **/
-static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
-{
-	u8 seed[I40E_HKEY_ARRAY_SIZE];
-	struct i40e_pf *pf = vsi->back;
-	u8 *lut;
-	int ret;
-
-	if (!(pf->hw_features & I40E_HW_RSS_AQ_CAPABLE))
-		return 0;
-
-	if (!vsi->rss_size)
-		vsi->rss_size = min_t(int, pf->alloc_rss_size,
-				      vsi->num_queue_pairs);
-	if (!vsi->rss_size)
-		return -EINVAL;
-
-	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
-	if (!lut)
-		return -ENOMEM;
-	/* Use the user configured hash keys and lookup table if there is one,
-	 * otherwise use default
-	 */
-	if (vsi->rss_lut_user)
-		memcpy(lut, vsi->rss_lut_user, vsi->rss_table_size);
-	else
-		i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, vsi->rss_size);
-	if (vsi->rss_hkey_user)
-		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
-	else
-		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
-	ret = i40e_config_rss_aq(vsi, seed, lut, vsi->rss_table_size);
-	kfree(lut);
-
-	return ret;
-}
-
-/**
  * i40e_config_rss_reg - Configure RSS keys and lut by writing registers
  * @vsi: Pointer to vsi structure
  * @seed: RSS hash seed

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 4/6] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
@ 2017-07-11 10:18   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

The i40e driver is modified to enable the new mqprio hardware
offload mode and factor the TCs and queue configuration by
creating channel VSIs. In this mode, the priority to traffic
class mapping and the user specified queue ranges are used
to configure the traffic classes when the 'hw' option is set
to 2.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 0 1 2 2 3 queues 2 at 0 2 at 2 1 at 4 1 at 5 hw 2

# tc qdisc show dev eth0

qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
             queues:(0:1) (2:3) (4:4) (5:5)

The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.

# tc qdisc del dev eth0 root

This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using mqprio scheduler.

The patch also limits setting ethtool Rx flow hash indirection
(ethtool -X eth0 equal N) to max queues configured via mqprio.
The Rx flow hash indirection input through ethtool should be
validated so that it is within in the queue range configured via
tc/mqprio. The bound checking is achieved by reporting the current
rss size to the kernel when queues are configured via mqprio.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 1 0 2 3 0 queues 2 at 0 4 at 2 8 at 6 11 at 14 hw 2

# ethtool -X eth0 equal 12
Cannot set RX flow hash configuration: Invalid argument

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |    3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    8 
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  442 ++++++++++++++++++------
 3 files changed, 342 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 93bb527..501d62e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,7 @@
 #include <linux/clocksource.h>
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/pkt_cls.h>
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #include "i40e_client.h"
@@ -697,6 +698,7 @@ struct i40e_vsi {
 	enum i40e_vsi_type type;  /* VSI type, e.g., LAN, FCoE, etc */
 	s16 vf_id;		/* Virtual function ID for SRIOV VSIs */
 
+	struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
 	struct i40e_tc_configuration tc_config;
 	struct i40e_aqc_vsi_properties_data info;
 
@@ -722,6 +724,7 @@ struct i40e_vsi {
 	u16 cnt_q_avail; /* num of queues available for channel usage */
 	u16 orig_rss_size;
 	u16 current_rss_size;
+	bool reconfig_rss;
 
 	/* keeps track of next_base_queue to be used for channel setup */
 	atomic_t next_base_queue;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a868c8d..d61af22 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2626,7 +2626,7 @@ static int i40e_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 
 	switch (cmd->cmd) {
 	case ETHTOOL_GRXRINGS:
-		cmd->data = vsi->num_queue_pairs;
+		cmd->data = vsi->rss_size;
 		ret = 0;
 		break;
 	case ETHTOOL_GRXFH:
@@ -3871,6 +3871,12 @@ static int i40e_set_channels(struct net_device *dev,
 	if (vsi->type != I40E_VSI_MAIN)
 		return -EINVAL;
 
+	/* We do not support setting channels via ethtool when TCs are
+	 * configured through mqprio
+	 */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return -EINVAL;
+
 	/* verify they are not requesting separate vectors */
 	if (!count || ch->rx_count || ch->tx_count)
 		return -EINVAL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 54a6275..40084b7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1578,6 +1578,169 @@ static int i40e_set_mac(struct net_device *netdev, void *p)
 }
 
 /**
+ * i40e_config_rss_aq - Prepare for RSS using AQ commands
+ * @vsi: vsi structure
+ * @seed: RSS hash seed
+ **/
+static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
+			      u8 *lut, u16 lut_size)
+{
+	struct i40e_pf *pf = vsi->back;
+	struct i40e_hw *hw = &pf->hw;
+	int ret = 0;
+
+	if (seed) {
+		struct i40e_aqc_get_set_rss_key_data *seed_dw =
+			(struct i40e_aqc_get_set_rss_key_data *)seed;
+		ret = i40e_aq_set_rss_key(hw, vsi->id, seed_dw);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "Cannot set RSS key, err %s aq_err %s\n",
+				 i40e_stat_str(hw, ret),
+				 i40e_aq_str(hw, hw->aq.asq_last_status));
+			return ret;
+		}
+	}
+	if (lut) {
+		bool pf_lut = vsi->type == I40E_VSI_MAIN ? true : false;
+
+		ret = i40e_aq_set_rss_lut(hw, vsi->id, pf_lut, lut, lut_size);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "Cannot set RSS lut, err %s aq_err %s\n",
+				 i40e_stat_str(hw, ret),
+				 i40e_aq_str(hw, hw->aq.asq_last_status));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * i40e_vsi_config_rss - Prepare for VSI(VMDq) RSS if used
+ * @vsi: VSI structure
+ **/
+static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
+{
+	u8 seed[I40E_HKEY_ARRAY_SIZE];
+	struct i40e_pf *pf = vsi->back;
+	u8 *lut;
+	int ret;
+
+	if (!(pf->hw_features & I40E_HW_RSS_AQ_CAPABLE))
+		return 0;
+	if (!vsi->rss_size)
+		vsi->rss_size = min_t(int, pf->alloc_rss_size,
+				      vsi->num_queue_pairs);
+	if (!vsi->rss_size)
+		return -EINVAL;
+	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
+	if (!lut)
+		return -ENOMEM;
+
+	/* Use the user configured hash keys and lookup table if there is one,
+	 * otherwise use default
+	 */
+	if (vsi->rss_lut_user)
+		memcpy(lut, vsi->rss_lut_user, vsi->rss_table_size);
+	else
+		i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, vsi->rss_size);
+	if (vsi->rss_hkey_user)
+		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
+	else
+		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+	ret = i40e_config_rss_aq(vsi, seed, lut, vsi->rss_table_size);
+	kfree(lut);
+	return ret;
+}
+
+/**
+ * i40e_vsi_setup_queue_map_mqprio - Prepares VSI tc_config to have queue
+ * configurations based on MQPRIO options.
+ * @vsi: the VSI being configured,
+ * @ctxt: VSI context structure
+ * @enabled_tc: number of traffic classes to enable
+ **/
+static int i40e_vsi_setup_queue_map_mqprio(struct i40e_vsi *vsi,
+					   struct i40e_vsi_context *ctxt,
+					   u8 enabled_tc)
+{
+	u16 qcount = 0, max_qcount, qmap, sections = 0;
+	int i, override_q, pow, num_qps, ret;
+	u8 netdev_tc = 0, offset = 0;
+
+	if (vsi->type != I40E_VSI_MAIN)
+		return -EINVAL;
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+	vsi->tc_config.numtc = vsi->mqprio_qopt.qopt.num_tc;
+	vsi->tc_config.enabled_tc = enabled_tc ? enabled_tc : 1;
+	num_qps = vsi->mqprio_qopt.qopt.count[0];
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue offset/count for all TCs for given VSI */
+	max_qcount = vsi->mqprio_qopt.qopt.count[0];
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		/* See if the given TC is enabled for the given VSI */
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			offset = vsi->mqprio_qopt.qopt.offset[i];
+			qcount = vsi->mqprio_qopt.qopt.count[i];
+			if (qcount > max_qcount)
+				max_qcount = qcount;
+			vsi->tc_config.tc_info[i].qoffset = offset;
+			vsi->tc_config.tc_info[i].qcount = qcount;
+			vsi->tc_config.tc_info[i].netdev_tc = netdev_tc++;
+		} else {
+			/* TC is not enabled so set the offset to
+			 * default queue and allocate one queue
+			 * for the given TC.
+			 */
+			vsi->tc_config.tc_info[i].qoffset = 0;
+			vsi->tc_config.tc_info[i].qcount = 1;
+			vsi->tc_config.tc_info[i].netdev_tc = 0;
+		}
+	}
+
+	/* Set actual Tx/Rx queue pairs */
+	vsi->num_queue_pairs = offset + qcount;
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+	ctxt->info.mapping_flags |= cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+	ctxt->info.queue_mapping[0] = cpu_to_le16(vsi->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+
+	/* Reconfigure RSS for main VSI with max queue count */
+	vsi->rss_size = max_qcount;
+	ret = i40e_vsi_config_rss(vsi);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed to reconfig rss for num_queues (%u)\n",
+			 max_qcount);
+		return ret;
+	}
+	vsi->reconfig_rss = true;
+	dev_dbg(&vsi->back->pdev->dev,
+		"Reconfigured rss with num_queues (%u)\n", max_qcount);
+
+	/* Find queue count available for channel VSIs and starting offset
+	 * for channel VSIs
+	 */
+	override_q = vsi->mqprio_qopt.qopt.count[0];
+	if (override_q && (override_q < vsi->num_queue_pairs)) {
+		vsi->cnt_q_avail = vsi->num_queue_pairs - override_q;
+		atomic_set(&vsi->next_base_queue, override_q);
+	}
+	return 0;
+}
+
+/**
  * i40e_vsi_setup_queue_map - Setup a VSI queue map based on enabled_tc
  * @vsi: the VSI being setup
  * @ctxt: VSI context structure
@@ -1615,7 +1778,7 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
 			numtc = 1;
 		}
 	} else {
-		/* At least TC0 is enabled in case of non-DCB case */
+		/* At least TC0 is enabled in non-DCB, non-MQPRIO case */
 		numtc = 1;
 	}
 
@@ -3164,6 +3327,7 @@ static void i40e_vsi_config_dcb_rings(struct i40e_vsi *vsi)
 			rx_ring->dcb_tc = 0;
 			tx_ring->dcb_tc = 0;
 		}
+		return;
 	}
 
 	for (n = 0; n < I40E_MAX_TRAFFIC_CLASS; n++) {
@@ -4872,6 +5036,24 @@ static u8 i40e_dcb_get_enabled_tc(struct i40e_dcbx_config *dcbcfg)
 }
 
 /**
+ * i40e_mqprio_get_enabled_tc - Get enabled traffic classes
+ * @pf: PF being queried
+ *
+ * Query the current MQPRIO configuration and return the number of
+ * traffic classes enabled.
+ **/
+static u8 i40e_mqprio_get_enabled_tc(struct i40e_pf *pf)
+{
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+	u8 num_tc = vsi->mqprio_qopt.qopt.num_tc;
+	u8 enabled_tc = 1, i;
+
+	for (i = 1; i < num_tc; i++)
+		enabled_tc |= BIT(i);
+	return enabled_tc;
+}
+
+/**
  * i40e_pf_get_num_tc - Get enabled traffic classes for PF
  * @pf: PF being queried
  *
@@ -4884,7 +5066,10 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
 	u8 num_tc = 0;
 	struct i40e_dcbx_config *dcbcfg = &hw->local_dcbx_config;
 
-	/* If DCB is not enabled then always in single TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return pf->vsi[pf->lan_vsi]->mqprio_qopt.qopt.num_tc;
+
+	/* If neither MQPRIO nor DCB is enabled, then always use single TC */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return 1;
 
@@ -4913,7 +5098,12 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
  **/
 static u8 i40e_pf_get_tc_map(struct i40e_pf *pf)
 {
-	/* If DCB is not enabled for this PF then just return default TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return i40e_mqprio_get_enabled_tc(pf);
+
+	/* If neither MQPRIO nor DCB is enabled for this PF then just return
+	 * default TC
+	 */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return I40E_DEFAULT_TRAFFIC_CLASS;
 
@@ -5003,6 +5193,9 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	i40e_status ret;
 	int i;
 
+	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
+	    !vsi->mqprio_qopt.qopt.hw)
+		return 0;
 	bw_data.tc_valid_bits = enabled_tc;
 	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
 		bw_data.tc_bw_credits[i] = bw_share[i];
@@ -5065,6 +5258,9 @@ static void i40e_vsi_config_netdev_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 					vsi->tc_config.tc_info[i].qoffset);
 	}
 
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return;
+
 	/* Assign UP2TC map for the VSI */
 	for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
 		/* Get the actual TC# for the UP */
@@ -5115,7 +5311,8 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	int i;
 
 	/* Check if enabled_tc is same as existing or new TCs */
-	if (vsi->tc_config.enabled_tc == enabled_tc)
+	if (vsi->tc_config.enabled_tc == enabled_tc &&
+	    vsi->mqprio_qopt.qopt.hw != TC_MQPRIO_HW_OFFLOAD)
 		return ret;
 
 	/* Enable ETS TCs with equal BW Share for now across all VSIs */
@@ -5138,15 +5335,37 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	ctxt.vf_num = 0;
 	ctxt.uplink_seid = vsi->uplink_seid;
 	ctxt.info = vsi->info;
-	i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+	if (vsi->back->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_vsi_setup_queue_map_mqprio(vsi, &ctxt, enabled_tc);
+		if (ret)
+			goto out;
+	} else {
+		i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+	}
 
+	/* On destroying the qdisc, reset vsi->rss_size, as number of enabled
+	 * queues changed.
+	 */
+	if (!vsi->mqprio_qopt.qopt.hw && vsi->reconfig_rss) {
+		vsi->rss_size = min_t(int, vsi->back->alloc_rss_size,
+				      vsi->num_queue_pairs);
+		ret = i40e_vsi_config_rss(vsi);
+		if (ret) {
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reconfig rss for num_queues\n");
+			return ret;
+		}
+		vsi->reconfig_rss = false;
+	}
 	if (vsi->back->flags & I40E_FLAG_IWARP_ENABLED) {
 		ctxt.info.valid_sections |=
 				cpu_to_le16(I40E_AQ_VSI_PROP_QUEUE_OPT_VALID);
 		ctxt.info.queueing_opt_flags |= I40E_AQ_VSI_QUE_OPT_TCP_ENA;
 	}
 
-	/* Update the VSI after updating the VSI queue-mapping information */
+	/* Update the VSI after updating the VSI queue-mapping
+	 * information
+	 */
 	ret = i40e_aq_update_vsi_params(&vsi->back->hw, &ctxt, NULL);
 	if (ret) {
 		dev_info(&vsi->back->pdev->dev,
@@ -6238,48 +6457,131 @@ void i40e_down(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_validate_mqprio_qopt- validate queue mapping info
+ * @vsi: the VSI being configured
+ * @mqprio_qopt: queue parametrs
+ **/
+static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
+				     struct tc_mqprio_qopt_offload *mqprio_qopt)
+{
+	int i;
+
+	if ((mqprio_qopt->qopt.offset[0] != 0) ||
+	    (mqprio_qopt->qopt.num_tc < 1) ||
+	    (mqprio_qopt->qopt.num_tc > I40E_MAX_TRAFFIC_CLASS))
+		return -EINVAL;
+	for (i = 0; ; i++) {
+		if (!mqprio_qopt->qopt.count[i])
+			return -EINVAL;
+		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+			return -EINVAL;
+		if (i >= mqprio_qopt->qopt.num_tc - 1)
+			break;
+		if (mqprio_qopt->qopt.offset[i + 1] !=
+		    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i]))
+			return -EINVAL;
+	}
+	if (vsi->num_queue_pairs <
+	    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i])) {
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
  * i40e_setup_tc - configure multiple traffic classes
  * @netdev: net device to configure
- * @tc: number of traffic classes to enable
+ * @tc: pointer to struct tc_to_netdev
  **/
-static int i40e_setup_tc(struct net_device *netdev, u8 tc)
+static int i40e_setup_tc(struct net_device *netdev, struct tc_to_netdev *tc)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u8 enabled_tc = 0;
+	u8 enabled_tc = 0, num_tc = 0, hw = 0;
 	int ret = -EINVAL;
 	int i;
 
-	/* Check if DCB enabled to continue */
-	if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
-		netdev_info(netdev, "DCB is not enabled for adapter\n");
-		goto exit;
-	}
+	if (tc->type == TC_SETUP_MQPRIO) {
+		hw = tc->mqprio->hw;
+		num_tc = tc->mqprio->num_tc;
+	} else if (tc->type == TC_SETUP_MQPRIO_EXT) {
+		hw = tc->mqprio_qopt->qopt.hw;
+		num_tc = tc->mqprio_qopt->qopt.num_tc;
+	}
+	if (!hw) {
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+		if (tc->type == TC_SETUP_MQPRIO_EXT)
+			memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+			       sizeof(*tc->mqprio_qopt));
+		goto config_tc;
+	}
+	switch (hw) {
+	case TC_MQPRIO_HW_OFFLOAD_TCS:
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+
+		/* Check if DCB enabled to continue */
+		if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+			netdev_info(netdev,
+				    "DCB is not enabled for adapter\n");
+			goto exit;
+		}
 
-	/* Check if MFP enabled */
-	if (pf->flags & I40E_FLAG_MFP_ENABLED) {
-		netdev_info(netdev, "Configuring TC not supported in MFP mode\n");
-		goto exit;
-	}
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
 
-	/* Check whether tc count is within enabled limit */
-	if (tc > i40e_pf_get_num_tc(pf)) {
-		netdev_info(netdev, "TC count greater than enabled on link for adapter\n");
-		goto exit;
+		/* Check whether tc count is within enabled limit */
+		if (num_tc > i40e_pf_get_num_tc(pf)) {
+			netdev_info(netdev,
+				    "TC count greater than enabled on link for adapter\n");
+			goto exit;
+		}
+		break;
+	case TC_MQPRIO_HW_OFFLOAD:
+		if (pf->flags & I40E_FLAG_DCB_ENABLED) {
+			netdev_info(netdev,
+				    "Full offload of TC Mqprio options is not supported when DCB is enabled\n");
+			goto exit;
+		}
+
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
+		ret = i40e_validate_mqprio_qopt(vsi, tc->mqprio_qopt);
+		if (ret)
+			goto exit;
+		memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+		       sizeof(*tc->mqprio_qopt));
+		pf->flags |= I40E_FLAG_TC_MQPRIO;
+		pf->flags &= ~I40E_FLAG_DCB_ENABLED;
+		break;
+	default:
+		return -EINVAL;
 	}
 
+config_tc:
 	/* Generate TC map for number of tc requested */
-	for (i = 0; i < tc; i++)
+	for (i = 0; i < num_tc; i++)
 		enabled_tc |= BIT(i);
 
 	/* Requesting same TC configuration as already enabled */
-	if (enabled_tc == vsi->tc_config.enabled_tc)
+	if (enabled_tc == vsi->tc_config.enabled_tc &&
+	    hw != TC_MQPRIO_HW_OFFLOAD)
 		return 0;
 
 	/* Quiesce VSI queues */
 	i40e_quiesce_vsi(vsi);
 
+	if (!hw && !(pf->flags & I40E_FLAG_TC_MQPRIO))
+		i40e_remove_queue_channels(vsi);
+
 	/* Configure VSI for enabled TCs */
 	ret = i40e_vsi_config_tc(vsi, enabled_tc);
 	if (ret) {
@@ -6297,7 +6599,11 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 		}
 	}
 
+	goto out;
 exit:
+	/* Reset the configuration data */
+	memset(&vsi->tc_config, 0, sizeof(vsi->tc_config));
+out:
 	/* Unquiesce VSI */
 	i40e_unquiesce_vsi(vsi);
 	return ret;
@@ -6307,12 +6613,7 @@ static int __i40e_setup_tc(struct net_device *netdev, u32 handle,
 			   u32 chain_index, __be16 proto,
 			   struct tc_to_netdev *tc)
 {
-	if (tc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc->mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
-
-	return i40e_setup_tc(netdev, tc->mqprio->num_tc);
+	return i40e_setup_tc(netdev, tc);
 }
 
 /**
@@ -9110,45 +9411,6 @@ static int i40e_setup_misc_vector(struct i40e_pf *pf)
 }
 
 /**
- * i40e_config_rss_aq - Prepare for RSS using AQ commands
- * @vsi: vsi structure
- * @seed: RSS hash seed
- **/
-static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
-			      u8 *lut, u16 lut_size)
-{
-	struct i40e_pf *pf = vsi->back;
-	struct i40e_hw *hw = &pf->hw;
-	int ret = 0;
-
-	if (seed) {
-		struct i40e_aqc_get_set_rss_key_data *seed_dw =
-			(struct i40e_aqc_get_set_rss_key_data *)seed;
-		ret = i40e_aq_set_rss_key(hw, vsi->id, seed_dw);
-		if (ret) {
-			dev_info(&pf->pdev->dev,
-				 "Cannot set RSS key, err %s aq_err %s\n",
-				 i40e_stat_str(hw, ret),
-				 i40e_aq_str(hw, hw->aq.asq_last_status));
-			return ret;
-		}
-	}
-	if (lut) {
-		bool pf_lut = vsi->type == I40E_VSI_MAIN ? true : false;
-
-		ret = i40e_aq_set_rss_lut(hw, vsi->id, pf_lut, lut, lut_size);
-		if (ret) {
-			dev_info(&pf->pdev->dev,
-				 "Cannot set RSS lut, err %s aq_err %s\n",
-				 i40e_stat_str(hw, ret),
-				 i40e_aq_str(hw, hw->aq.asq_last_status));
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
  * i40e_get_rss_aq - Get RSS keys and lut by using AQ commands
  * @vsi: Pointer to vsi structure
  * @seed: Buffter to store the hash keys
@@ -9195,46 +9457,6 @@ static int i40e_get_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
 }
 
 /**
- * i40e_vsi_config_rss - Prepare for VSI(VMDq) RSS if used
- * @vsi: VSI structure
- **/
-static int i40e_vsi_config_rss(struct i40e_vsi *vsi)
-{
-	u8 seed[I40E_HKEY_ARRAY_SIZE];
-	struct i40e_pf *pf = vsi->back;
-	u8 *lut;
-	int ret;
-
-	if (!(pf->hw_features & I40E_HW_RSS_AQ_CAPABLE))
-		return 0;
-
-	if (!vsi->rss_size)
-		vsi->rss_size = min_t(int, pf->alloc_rss_size,
-				      vsi->num_queue_pairs);
-	if (!vsi->rss_size)
-		return -EINVAL;
-
-	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
-	if (!lut)
-		return -ENOMEM;
-	/* Use the user configured hash keys and lookup table if there is one,
-	 * otherwise use default
-	 */
-	if (vsi->rss_lut_user)
-		memcpy(lut, vsi->rss_lut_user, vsi->rss_table_size);
-	else
-		i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, vsi->rss_size);
-	if (vsi->rss_hkey_user)
-		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
-	else
-		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
-	ret = i40e_config_rss_aq(vsi, seed, lut, vsi->rss_table_size);
-	kfree(lut);
-
-	return ret;
-}
-
-/**
  * i40e_config_rss_reg - Configure RSS keys and lut by writing registers
  * @vsi: Pointer to vsi structure
  * @seed: RSS hash seed


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:18   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

This patch refactors the BW rate limiting for Tx traffic
on the VF to be reused in the next patch for rate limiting Tx
traffic for the VSIs on the PF as well.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |    5 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c        |   63 ++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   45 +-------------
 3 files changed, 70 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 501d62e..c545be6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -127,6 +127,10 @@
 /* default to trying for four seconds */
 #define I40E_TRY_LINK_TIMEOUT	(4 * HZ)
 
+/* BW rate limiting */
+#define I40E_BW_CREDIT_DIVISOR		50 /* 50Mbps per BW credit */
+#define I40E_MAX_BW_INACTIVE_ACCUM	4  /* accumulate 4 credits max */
+
 /* driver state flags */
 enum i40e_state_t {
 	__I40E_TESTING,
@@ -1040,4 +1044,5 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
 }
 
 int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 40084b7..409ef09 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5397,6 +5397,69 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 }
 
 /**
+ * i40e_get_link_speed - Returns link speed for the interface
+ * @vsi: VSI to be configured
+ *
+ **/
+int i40e_get_link_speed(struct i40e_vsi *vsi)
+{
+	struct i40e_pf *pf = vsi->back;
+
+	switch (pf->hw.phy.link_info.link_speed) {
+	case I40E_LINK_SPEED_40GB:
+		return 40000;
+	case I40E_LINK_SPEED_25GB:
+		return 25000;
+	case I40E_LINK_SPEED_20GB:
+		return 20000;
+	case I40E_LINK_SPEED_10GB:
+		return 10000;
+	case I40E_LINK_SPEED_1GB:
+		return 1000;
+	default:
+		return -EINVAL;
+	}
+}
+
+/**
+ * i40e_set_bw_limit - setup BW limit for Tx traffic based on max_tx_rate
+ * @vsi: VSI to be configured
+ * @seid: seid of the channel/VSI
+ * @max_tx_rate: max TX rate to be configured as BW limit
+ *
+ * Helper function to set BW limit for a given VSI
+ **/
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
+{
+	struct i40e_pf *pf = vsi->back;
+	int speed = 0;
+	int ret = 0;
+
+	speed = i40e_get_link_speed(vsi);
+	if (max_tx_rate > speed) {
+		dev_err(&pf->pdev->dev,
+			"Invalid max tx rate %llu specified for VSI seid %d.",
+			max_tx_rate, seid);
+		return -EINVAL;
+	}
+	if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
+		dev_warn(&pf->pdev->dev,
+			 "Setting max tx rate to minimum usable value of 50Mbps.\n");
+		max_tx_rate = 50;
+	}
+
+       /* Tx rate credits are in values of 50Mbps, 0 is disabled*/
+	ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid,
+					  max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+					  I40E_MAX_BW_INACTIVE_ACCUM, NULL);
+	if (ret)
+		dev_err(&pf->pdev->dev,
+			"Failed set tx rate (%llu Mbps) for vsi->seid %u, error code %d.\n",
+			max_tx_rate, seid, ret);
+	return ret;
+}
+
+/**
  * i40e_remove_queue_channels - Remove queue channels for the TCs
  * @vsi: VSI to be configured
  *
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index be172f8..7a0f953 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2926,8 +2926,6 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
 	return ret;
 }
 
-#define I40E_BW_CREDIT_DIVISOR 50     /* 50Mbps per BW credit */
-#define I40E_MAX_BW_INACTIVE_ACCUM 4  /* device can accumulate 4 credits max */
 /**
  * i40e_ndo_set_vf_bw
  * @netdev: network interface device structure
@@ -2943,7 +2941,6 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 	struct i40e_pf *pf = np->vsi->back;
 	struct i40e_vsi *vsi;
 	struct i40e_vf *vf;
-	int speed = 0;
 	int ret = 0;
 
 	/* validate the request */
@@ -2968,48 +2965,10 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 		goto error;
 	}
 
-	switch (pf->hw.phy.link_info.link_speed) {
-	case I40E_LINK_SPEED_40GB:
-		speed = 40000;
-		break;
-	case I40E_LINK_SPEED_25GB:
-		speed = 25000;
-		break;
-	case I40E_LINK_SPEED_20GB:
-		speed = 20000;
-		break;
-	case I40E_LINK_SPEED_10GB:
-		speed = 10000;
-		break;
-	case I40E_LINK_SPEED_1GB:
-		speed = 1000;
-		break;
-	default:
-		break;
-	}
-
-	if (max_tx_rate > speed) {
-		dev_err(&pf->pdev->dev, "Invalid max tx rate %d specified for VF %d.\n",
-			max_tx_rate, vf->vf_id);
-		ret = -EINVAL;
+	ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+	if (ret)
 		goto error;
-	}
 
-	if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
-		dev_warn(&pf->pdev->dev, "Setting max Tx rate to minimum usable value of 50Mbps.\n");
-		max_tx_rate = 50;
-	}
-
-	/* Tx rate credits are in values of 50Mbps, 0 is disabled*/
-	ret = i40e_aq_config_vsi_bw_limit(&pf->hw, vsi->seid,
-					  max_tx_rate / I40E_BW_CREDIT_DIVISOR,
-					  I40E_MAX_BW_INACTIVE_ACCUM, NULL);
-	if (ret) {
-		dev_err(&pf->pdev->dev, "Unable to set max tx rate, error code %d.\n",
-			ret);
-		ret = -EIO;
-		goto error;
-	}
 	vf->tx_rate = max_tx_rate;
 error:
 	return ret;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
@ 2017-07-11 10:18   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:18 UTC (permalink / raw)
  To: intel-wired-lan

This patch refactors the BW rate limiting for Tx traffic
on the VF to be reused in the next patch for rate limiting Tx
traffic for the VSIs on the PF as well.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h             |    5 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c        |   63 ++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   45 +-------------
 3 files changed, 70 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 501d62e..c545be6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -127,6 +127,10 @@
 /* default to trying for four seconds */
 #define I40E_TRY_LINK_TIMEOUT	(4 * HZ)
 
+/* BW rate limiting */
+#define I40E_BW_CREDIT_DIVISOR		50 /* 50Mbps per BW credit */
+#define I40E_MAX_BW_INACTIVE_ACCUM	4  /* accumulate 4 credits max */
+
 /* driver state flags */
 enum i40e_state_t {
 	__I40E_TESTING,
@@ -1040,4 +1044,5 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
 }
 
 int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 40084b7..409ef09 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5397,6 +5397,69 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 }
 
 /**
+ * i40e_get_link_speed - Returns link speed for the interface
+ * @vsi: VSI to be configured
+ *
+ **/
+int i40e_get_link_speed(struct i40e_vsi *vsi)
+{
+	struct i40e_pf *pf = vsi->back;
+
+	switch (pf->hw.phy.link_info.link_speed) {
+	case I40E_LINK_SPEED_40GB:
+		return 40000;
+	case I40E_LINK_SPEED_25GB:
+		return 25000;
+	case I40E_LINK_SPEED_20GB:
+		return 20000;
+	case I40E_LINK_SPEED_10GB:
+		return 10000;
+	case I40E_LINK_SPEED_1GB:
+		return 1000;
+	default:
+		return -EINVAL;
+	}
+}
+
+/**
+ * i40e_set_bw_limit - setup BW limit for Tx traffic based on max_tx_rate
+ * @vsi: VSI to be configured
+ * @seid: seid of the channel/VSI
+ * @max_tx_rate: max TX rate to be configured as BW limit
+ *
+ * Helper function to set BW limit for a given VSI
+ **/
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
+{
+	struct i40e_pf *pf = vsi->back;
+	int speed = 0;
+	int ret = 0;
+
+	speed = i40e_get_link_speed(vsi);
+	if (max_tx_rate > speed) {
+		dev_err(&pf->pdev->dev,
+			"Invalid max tx rate %llu specified for VSI seid %d.",
+			max_tx_rate, seid);
+		return -EINVAL;
+	}
+	if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
+		dev_warn(&pf->pdev->dev,
+			 "Setting max tx rate to minimum usable value of 50Mbps.\n");
+		max_tx_rate = 50;
+	}
+
+       /* Tx rate credits are in values of 50Mbps, 0 is disabled*/
+	ret = i40e_aq_config_vsi_bw_limit(&pf->hw, seid,
+					  max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+					  I40E_MAX_BW_INACTIVE_ACCUM, NULL);
+	if (ret)
+		dev_err(&pf->pdev->dev,
+			"Failed set tx rate (%llu Mbps) for vsi->seid %u, error code %d.\n",
+			max_tx_rate, seid, ret);
+	return ret;
+}
+
+/**
  * i40e_remove_queue_channels - Remove queue channels for the TCs
  * @vsi: VSI to be configured
  *
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index be172f8..7a0f953 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2926,8 +2926,6 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
 	return ret;
 }
 
-#define I40E_BW_CREDIT_DIVISOR 50     /* 50Mbps per BW credit */
-#define I40E_MAX_BW_INACTIVE_ACCUM 4  /* device can accumulate 4 credits max */
 /**
  * i40e_ndo_set_vf_bw
  * @netdev: network interface device structure
@@ -2943,7 +2941,6 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 	struct i40e_pf *pf = np->vsi->back;
 	struct i40e_vsi *vsi;
 	struct i40e_vf *vf;
-	int speed = 0;
 	int ret = 0;
 
 	/* validate the request */
@@ -2968,48 +2965,10 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 		goto error;
 	}
 
-	switch (pf->hw.phy.link_info.link_speed) {
-	case I40E_LINK_SPEED_40GB:
-		speed = 40000;
-		break;
-	case I40E_LINK_SPEED_25GB:
-		speed = 25000;
-		break;
-	case I40E_LINK_SPEED_20GB:
-		speed = 20000;
-		break;
-	case I40E_LINK_SPEED_10GB:
-		speed = 10000;
-		break;
-	case I40E_LINK_SPEED_1GB:
-		speed = 1000;
-		break;
-	default:
-		break;
-	}
-
-	if (max_tx_rate > speed) {
-		dev_err(&pf->pdev->dev, "Invalid max tx rate %d specified for VF %d.\n",
-			max_tx_rate, vf->vf_id);
-		ret = -EINVAL;
+	ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+	if (ret)
 		goto error;
-	}
 
-	if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
-		dev_warn(&pf->pdev->dev, "Setting max Tx rate to minimum usable value of 50Mbps.\n");
-		max_tx_rate = 50;
-	}
-
-	/* Tx rate credits are in values of 50Mbps, 0 is disabled*/
-	ret = i40e_aq_config_vsi_bw_limit(&pf->hw, vsi->seid,
-					  max_tx_rate / I40E_BW_CREDIT_DIVISOR,
-					  I40E_MAX_BW_INACTIVE_ACCUM, NULL);
-	if (ret) {
-		dev_err(&pf->pdev->dev, "Unable to set max tx rate, error code %d.\n",
-			ret);
-		ret = -EIO;
-		goto error;
-	}
 	vf->tx_rate = max_tx_rate;
 error:
 	return ret;


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 10:19   ` Amritha Nambiar
  -1 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:19 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

This patch enables setting up maximum Tx rates for the traffic
classes in i40e. The maximum rate offloaded to the hardware through
the mqprio framework is configured for the VSI. Configuring
minimum Tx rate limit is not supported in the device. The minimum
usable value for Tx rate is 50Mbps.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:
# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |    2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |   99 +++++++++++++++++++++++++--
 2 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c545be6..7a4a338 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -357,6 +357,8 @@ struct i40e_channel {
 	u8 enabled_tc;
 	struct i40e_aqc_vsi_properties_data info;
 
+	u64 max_tx_rate;
+
 	/* track this channel belongs to which VSI */
 	struct i40e_vsi *parent_vsi;
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 409ef09..c5a625c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5193,9 +5193,16 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	i40e_status ret;
 	int i;
 
-	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
-	    !vsi->mqprio_qopt.qopt.hw)
+	if (vsi->back->flags & I40E_FLAG_TC_MQPRIO)
 		return 0;
+	if (!vsi->mqprio_qopt.qopt.hw) {
+		ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
+		if (ret)
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reset tx rate for vsi->seid %u\n",
+				 vsi->seid);
+		return ret;
+	}
 	bw_data.tc_valid_bits = enabled_tc;
 	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
 		bw_data.tc_bw_credits[i] = bw_share[i];
@@ -5501,6 +5508,13 @@ static void i40e_remove_queue_channels(struct i40e_vsi *vsi)
 			rx_ring->ch = NULL;
 		}
 
+		/* Reset BW configured for this VSI via mqprio */
+		ret = i40e_set_bw_limit(vsi, ch->seid, 0);
+		if (ret)
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reset tx rate for ch->seid %u\n",
+				 ch->seid);
+
 		/* delete VSI from FW */
 		ret = i40e_aq_delete_element(&vsi->back->hw, ch->seid,
 					     NULL);
@@ -6073,6 +6087,17 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 		 "Setup channel (id:%u) utilizing num_queues %d\n",
 		 ch->seid, ch->num_queue_pairs);
 
+	/* configure VSI for BW limit */
+	if (ch->max_tx_rate) {
+		if (i40e_set_bw_limit(vsi, ch->seid, ch->max_tx_rate))
+			return -EINVAL;
+
+		dev_dbg(&pf->pdev->dev,
+			"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+			ch->max_tx_rate,
+			ch->max_tx_rate / I40E_BW_CREDIT_DIVISOR, ch->seid);
+	}
+
 	/* in case of VF, this will be main SRIOV VSI */
 	ch->parent_vsi = vsi;
 
@@ -6108,6 +6133,12 @@ static int i40e_configure_queue_channels(struct i40e_vsi *vsi)
 			ch->base_queue =
 				vsi->tc_config.tc_info[i].qoffset;
 
+			/* Bandwidth limit through tc interface is in bytes/s,
+			 * change to Mbit/s
+			 */
+			ch->max_tx_rate =
+				(vsi->mqprio_qopt.max_rate[i] * 8) / 1000000;
+
 			list_add_tail(&ch->list, &vsi->ch_list);
 
 			ret = i40e_create_queue_channel(vsi, ch);
@@ -6527,6 +6558,7 @@ void i40e_down(struct i40e_vsi *vsi)
 static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 				     struct tc_mqprio_qopt_offload *mqprio_qopt)
 {
+	u64 sum_max_rate = 0;
 	int i;
 
 	if ((mqprio_qopt->qopt.offset[0] != 0) ||
@@ -6536,8 +6568,13 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 	for (i = 0; ; i++) {
 		if (!mqprio_qopt->qopt.count[i])
 			return -EINVAL;
-		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+		if (mqprio_qopt->min_rate[i]) {
+			dev_err(&vsi->back->pdev->dev,
+				"Invalid min tx rate (greater than 0) specified\n");
 			return -EINVAL;
+		}
+		sum_max_rate += ((mqprio_qopt->max_rate[i] * 8) / 1000000);
+
 		if (i >= mqprio_qopt->qopt.num_tc - 1)
 			break;
 		if (mqprio_qopt->qopt.offset[i + 1] !=
@@ -6548,6 +6585,11 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 	    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i])) {
 		return -EINVAL;
 	}
+	if (sum_max_rate > i40e_get_link_speed(vsi)) {
+		dev_err(&vsi->back->pdev->dev,
+			"Invalid max tx rate specified\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
@@ -6654,6 +6696,19 @@ static int i40e_setup_tc(struct net_device *netdev, struct tc_to_netdev *tc)
 	}
 
 	if (pf->flags & I40E_FLAG_TC_MQPRIO) {
+		if (vsi->mqprio_qopt.max_rate[0]) {
+			u64 max_tx_rate = (vsi->mqprio_qopt.max_rate[0] * 8) /
+								1000000;
+			ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+			if (!ret)
+				dev_dbg(&vsi->back->pdev->dev,
+					"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+					max_tx_rate,
+					max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+					vsi->seid);
+			else
+				goto exit;
+		}
 		ret = i40e_configure_queue_channels(vsi);
 		if (ret) {
 			netdev_info(netdev,
@@ -8088,6 +8143,18 @@ static int i40e_rebuild_channels(struct i40e_vsi *vsi)
 				 vsi->uplink_seid);
 			return ret;
 		}
+		if (ch->max_tx_rate) {
+			if (i40e_set_bw_limit(vsi, ch->seid,
+					      ch->max_tx_rate))
+				return -EINVAL;
+
+			dev_dbg(&vsi->back->pdev->dev,
+				"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+				ch->max_tx_rate,
+				ch->max_tx_rate /
+						I40E_BW_CREDIT_DIVISOR,
+				ch->seid);
+		}
 	}
 	return 0;
 }
@@ -8228,6 +8295,7 @@ static int i40e_reset(struct i40e_pf *pf)
  **/
 static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 {
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
 	struct i40e_hw *hw = &pf->hw;
 	u8 set_fc_aq_fail = 0;
 	i40e_status ret;
@@ -8310,7 +8378,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 	 * If there were VEBs but the reconstitution failed, we'll try
 	 * try to recover minimal use by getting the basic PF VSI working.
 	 */
-	if (pf->vsi[pf->lan_vsi]->uplink_seid != pf->mac_seid) {
+	if (vsi->uplink_seid != pf->mac_seid) {
 		dev_dbg(&pf->pdev->dev, "attempting to rebuild switch\n");
 		/* find the one VEB connected to the MAC, and find orphans */
 		for (v = 0; v < I40E_MAX_VEB; v++) {
@@ -8334,8 +8402,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 					dev_info(&pf->pdev->dev,
 						 "rebuild of switch failed: %d, will try to set up simple PF connection\n",
 						 ret);
-					pf->vsi[pf->lan_vsi]->uplink_seid
-								= pf->mac_seid;
+					vsi->uplink_seid = pf->mac_seid;
 					break;
 				} else if (pf->veb[v]->uplink_seid == 0) {
 					dev_info(&pf->pdev->dev,
@@ -8346,10 +8413,10 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
-	if (pf->vsi[pf->lan_vsi]->uplink_seid == pf->mac_seid) {
+	if (vsi->uplink_seid == pf->mac_seid) {
 		dev_dbg(&pf->pdev->dev, "attempting to rebuild PF VSI\n");
 		/* no VEB, so rebuild only the Main VSI */
-		ret = i40e_add_vsi(pf->vsi[pf->lan_vsi]);
+		ret = i40e_add_vsi(vsi);
 		if (ret) {
 			dev_info(&pf->pdev->dev,
 				 "rebuild of Main VSI failed: %d\n", ret);
@@ -8357,10 +8424,24 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
+	if (vsi->mqprio_qopt.max_rate[0]) {
+		u64 max_tx_rate = (vsi->mqprio_qopt.max_rate[0] * 8) / 1000000;
+
+		ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+		if (!ret)
+			dev_dbg(&vsi->back->pdev->dev,
+				"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+				max_tx_rate,
+				max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+				vsi->seid);
+		else
+			goto end_unlock;
+	}
+
 	/* PF Main VSI is rebuild by now, go ahead and rebuild channel VSIs
 	 * for this main VSI if they exist
 	 */
-	ret = i40e_rebuild_channels(pf->vsi[pf->lan_vsi]);
+	ret = i40e_rebuild_channels(vsi);
 	if (ret)
 		goto end_unlock;
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio
@ 2017-07-11 10:19   ` Amritha Nambiar
  0 siblings, 0 replies; 32+ messages in thread
From: Amritha Nambiar @ 2017-07-11 10:19 UTC (permalink / raw)
  To: intel-wired-lan

This patch enables setting up maximum Tx rates for the traffic
classes in i40e. The maximum rate offloaded to the hardware through
the mqprio framework is configured for the VSI. Configuring
minimum Tx rate limit is not supported in the device. The minimum
usable value for Tx rate is 50Mbps.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:
# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |    2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |   99 +++++++++++++++++++++++++--
 2 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index c545be6..7a4a338 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -357,6 +357,8 @@ struct i40e_channel {
 	u8 enabled_tc;
 	struct i40e_aqc_vsi_properties_data info;
 
+	u64 max_tx_rate;
+
 	/* track this channel belongs to which VSI */
 	struct i40e_vsi *parent_vsi;
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 409ef09..c5a625c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5193,9 +5193,16 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	i40e_status ret;
 	int i;
 
-	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
-	    !vsi->mqprio_qopt.qopt.hw)
+	if (vsi->back->flags & I40E_FLAG_TC_MQPRIO)
 		return 0;
+	if (!vsi->mqprio_qopt.qopt.hw) {
+		ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
+		if (ret)
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reset tx rate for vsi->seid %u\n",
+				 vsi->seid);
+		return ret;
+	}
 	bw_data.tc_valid_bits = enabled_tc;
 	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
 		bw_data.tc_bw_credits[i] = bw_share[i];
@@ -5501,6 +5508,13 @@ static void i40e_remove_queue_channels(struct i40e_vsi *vsi)
 			rx_ring->ch = NULL;
 		}
 
+		/* Reset BW configured for this VSI via mqprio */
+		ret = i40e_set_bw_limit(vsi, ch->seid, 0);
+		if (ret)
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reset tx rate for ch->seid %u\n",
+				 ch->seid);
+
 		/* delete VSI from FW */
 		ret = i40e_aq_delete_element(&vsi->back->hw, ch->seid,
 					     NULL);
@@ -6073,6 +6087,17 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 		 "Setup channel (id:%u) utilizing num_queues %d\n",
 		 ch->seid, ch->num_queue_pairs);
 
+	/* configure VSI for BW limit */
+	if (ch->max_tx_rate) {
+		if (i40e_set_bw_limit(vsi, ch->seid, ch->max_tx_rate))
+			return -EINVAL;
+
+		dev_dbg(&pf->pdev->dev,
+			"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+			ch->max_tx_rate,
+			ch->max_tx_rate / I40E_BW_CREDIT_DIVISOR, ch->seid);
+	}
+
 	/* in case of VF, this will be main SRIOV VSI */
 	ch->parent_vsi = vsi;
 
@@ -6108,6 +6133,12 @@ static int i40e_configure_queue_channels(struct i40e_vsi *vsi)
 			ch->base_queue =
 				vsi->tc_config.tc_info[i].qoffset;
 
+			/* Bandwidth limit through tc interface is in bytes/s,
+			 * change to Mbit/s
+			 */
+			ch->max_tx_rate =
+				(vsi->mqprio_qopt.max_rate[i] * 8) / 1000000;
+
 			list_add_tail(&ch->list, &vsi->ch_list);
 
 			ret = i40e_create_queue_channel(vsi, ch);
@@ -6527,6 +6558,7 @@ void i40e_down(struct i40e_vsi *vsi)
 static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 				     struct tc_mqprio_qopt_offload *mqprio_qopt)
 {
+	u64 sum_max_rate = 0;
 	int i;
 
 	if ((mqprio_qopt->qopt.offset[0] != 0) ||
@@ -6536,8 +6568,13 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 	for (i = 0; ; i++) {
 		if (!mqprio_qopt->qopt.count[i])
 			return -EINVAL;
-		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+		if (mqprio_qopt->min_rate[i]) {
+			dev_err(&vsi->back->pdev->dev,
+				"Invalid min tx rate (greater than 0) specified\n");
 			return -EINVAL;
+		}
+		sum_max_rate += ((mqprio_qopt->max_rate[i] * 8) / 1000000);
+
 		if (i >= mqprio_qopt->qopt.num_tc - 1)
 			break;
 		if (mqprio_qopt->qopt.offset[i + 1] !=
@@ -6548,6 +6585,11 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 	    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i])) {
 		return -EINVAL;
 	}
+	if (sum_max_rate > i40e_get_link_speed(vsi)) {
+		dev_err(&vsi->back->pdev->dev,
+			"Invalid max tx rate specified\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
@@ -6654,6 +6696,19 @@ static int i40e_setup_tc(struct net_device *netdev, struct tc_to_netdev *tc)
 	}
 
 	if (pf->flags & I40E_FLAG_TC_MQPRIO) {
+		if (vsi->mqprio_qopt.max_rate[0]) {
+			u64 max_tx_rate = (vsi->mqprio_qopt.max_rate[0] * 8) /
+								1000000;
+			ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+			if (!ret)
+				dev_dbg(&vsi->back->pdev->dev,
+					"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+					max_tx_rate,
+					max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+					vsi->seid);
+			else
+				goto exit;
+		}
 		ret = i40e_configure_queue_channels(vsi);
 		if (ret) {
 			netdev_info(netdev,
@@ -8088,6 +8143,18 @@ static int i40e_rebuild_channels(struct i40e_vsi *vsi)
 				 vsi->uplink_seid);
 			return ret;
 		}
+		if (ch->max_tx_rate) {
+			if (i40e_set_bw_limit(vsi, ch->seid,
+					      ch->max_tx_rate))
+				return -EINVAL;
+
+			dev_dbg(&vsi->back->pdev->dev,
+				"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+				ch->max_tx_rate,
+				ch->max_tx_rate /
+						I40E_BW_CREDIT_DIVISOR,
+				ch->seid);
+		}
 	}
 	return 0;
 }
@@ -8228,6 +8295,7 @@ static int i40e_reset(struct i40e_pf *pf)
  **/
 static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 {
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
 	struct i40e_hw *hw = &pf->hw;
 	u8 set_fc_aq_fail = 0;
 	i40e_status ret;
@@ -8310,7 +8378,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 	 * If there were VEBs but the reconstitution failed, we'll try
 	 * try to recover minimal use by getting the basic PF VSI working.
 	 */
-	if (pf->vsi[pf->lan_vsi]->uplink_seid != pf->mac_seid) {
+	if (vsi->uplink_seid != pf->mac_seid) {
 		dev_dbg(&pf->pdev->dev, "attempting to rebuild switch\n");
 		/* find the one VEB connected to the MAC, and find orphans */
 		for (v = 0; v < I40E_MAX_VEB; v++) {
@@ -8334,8 +8402,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 					dev_info(&pf->pdev->dev,
 						 "rebuild of switch failed: %d, will try to set up simple PF connection\n",
 						 ret);
-					pf->vsi[pf->lan_vsi]->uplink_seid
-								= pf->mac_seid;
+					vsi->uplink_seid = pf->mac_seid;
 					break;
 				} else if (pf->veb[v]->uplink_seid == 0) {
 					dev_info(&pf->pdev->dev,
@@ -8346,10 +8413,10 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
-	if (pf->vsi[pf->lan_vsi]->uplink_seid == pf->mac_seid) {
+	if (vsi->uplink_seid == pf->mac_seid) {
 		dev_dbg(&pf->pdev->dev, "attempting to rebuild PF VSI\n");
 		/* no VEB, so rebuild only the Main VSI */
-		ret = i40e_add_vsi(pf->vsi[pf->lan_vsi]);
+		ret = i40e_add_vsi(vsi);
 		if (ret) {
 			dev_info(&pf->pdev->dev,
 				 "rebuild of Main VSI failed: %d\n", ret);
@@ -8357,10 +8424,24 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
 		}
 	}
 
+	if (vsi->mqprio_qopt.max_rate[0]) {
+		u64 max_tx_rate = (vsi->mqprio_qopt.max_rate[0] * 8) / 1000000;
+
+		ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+		if (!ret)
+			dev_dbg(&vsi->back->pdev->dev,
+				"Set tx rate of %llu Mbps (count of 50Mbps %llu) for vsi->seid %u\n",
+				max_tx_rate,
+				max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+				vsi->seid);
+		else
+			goto end_unlock;
+	}
+
 	/* PF Main VSI is rebuild by now, go ahead and rebuild channel VSIs
 	 * for this main VSI if they exist
 	 */
-	ret = i40e_rebuild_channels(pf->vsi[pf->lan_vsi]);
+	ret = i40e_rebuild_channels(vsi);
 	if (ret)
 		goto end_unlock;
 


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 14:26   ` David Miller
  -1 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2017-07-11 14:26 UTC (permalink / raw)
  To: amritha.nambiar
  Cc: intel-wired-lan, jeffrey.t.kirsher, alexander.h.duyck,
	kiran.patil, netdev, mitch.a.williams, alexander.duyck,
	neerav.parikh, sridhar.samudrala, carolyn.wyborny

From: Amritha Nambiar <amritha.nambiar@intel.com>
Date: Tue, 11 Jul 2017 03:18:30 -0700

> The following series introduces a new hardware offload mode in
> tc/mqprio where the TCs, the queue configurations and
> bandwidth rate limits are offloaded to the hardware.

net-next is closed, please resubmit this when net-next opens
again:

	http://vger.kernel.org/~davem/net-next.html

Thank you.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-11 14:26   ` David Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2017-07-11 14:26 UTC (permalink / raw)
  To: intel-wired-lan

From: Amritha Nambiar <amritha.nambiar@intel.com>
Date: Tue, 11 Jul 2017 03:18:30 -0700

> The following series introduces a new hardware offload mode in
> tc/mqprio where the TCs, the queue configurations and
> bandwidth rate limits are offloaded to the hardware.

net-next is closed, please resubmit this when net-next opens
again:

	http://vger.kernel.org/~davem/net-next.html

Thank you.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-wired-lan] [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
  2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-11 22:32     ` kbuild test robot
  -1 siblings, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2017-07-11 22:32 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: kbuild-all, intel-wired-lan, jeffrey.t.kirsher, netdev, mitch.a.williams

[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]

Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12 next-20170711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
>> ERROR: "__udivdi3" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 60446 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well
@ 2017-07-11 22:32     ` kbuild test robot
  0 siblings, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2017-07-11 22:32 UTC (permalink / raw)
  To: intel-wired-lan

Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12 next-20170711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!
>> ERROR: "__udivdi3" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 60446 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20170712/704bf57b/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
  2017-07-11 14:26   ` [Intel-wired-lan] " David Miller
@ 2017-07-12  0:18     ` Jeff Kirsher
  -1 siblings, 0 replies; 32+ messages in thread
From: Jeff Kirsher @ 2017-07-12  0:18 UTC (permalink / raw)
  To: David Miller, amritha.nambiar
  Cc: intel-wired-lan, alexander.h.duyck, kiran.patil, netdev,
	mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
> From: Amritha Nambiar <amritha.nambiar@intel.com>
> Date: Tue, 11 Jul 2017 03:18:30 -0700
> 
> > The following series introduces a new hardware offload mode in
> > tc/mqprio where the TCs, the queue configurations and
> > bandwidth rate limits are offloaded to the hardware.
> 
> net-next is closed, please resubmit this when net-next opens
> again:
> 
> 	http://vger.kernel.org/~davem/net-next.html
> 
> Thank you.

Dave, this is gonna come through my tree.  Amritha is looking for
community feedback, while we have this under validation.  Sorry for the
confusion.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-12  0:18     ` Jeff Kirsher
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Kirsher @ 2017-07-12  0:18 UTC (permalink / raw)
  To: intel-wired-lan

On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
> From: Amritha Nambiar <amritha.nambiar@intel.com>
> Date: Tue, 11 Jul 2017 03:18:30 -0700
> 
> > The following series introduces a new hardware offload mode in
> > tc/mqprio where the TCs, the queue configurations and
> > bandwidth rate limits are offloaded to the hardware.
> 
> net-next is closed, please resubmit this when net-next opens
> again:
> 
> 	http://vger.kernel.org/~davem/net-next.html
> 
> Thank you.

Dave, this is gonna come through my tree.  Amritha is looking for
community feedback, while we have this under validation.  Sorry for the
confusion.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20170711/c600841b/attachment.asc>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
  2017-07-12  0:18     ` [Intel-wired-lan] " Jeff Kirsher
@ 2017-07-12  0:36       ` David Miller
  -1 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2017-07-12  0:36 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: amritha.nambiar, intel-wired-lan, alexander.h.duyck, kiran.patil,
	netdev, mitch.a.williams, alexander.duyck, neerav.parikh,
	sridhar.samudrala, carolyn.wyborny

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 11 Jul 2017 17:18:17 -0700

> On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
>> From: Amritha Nambiar <amritha.nambiar@intel.com>
>> Date: Tue, 11 Jul 2017 03:18:30 -0700
>> 
>> > The following series introduces a new hardware offload mode in
>> > tc/mqprio where the TCs, the queue configurations and
>> > bandwidth rate limits are offloaded to the hardware.
>> 
>> net-next is closed, please resubmit this when net-next opens
>> again:
>> 
>> 	http://vger.kernel.org/~davem/net-next.html
>> 
>> Thank you.
> 
> Dave, this is gonna come through my tree.  Amritha is looking for
> community feedback, while we have this under validation.  Sorry for the
> confusion.

One way to deal with this is to say something like:

	[PATCH net-next RFC]

in the subject lines instead of just:

	[PATCH net-next]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-12  0:36       ` David Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2017-07-12  0:36 UTC (permalink / raw)
  To: intel-wired-lan

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 11 Jul 2017 17:18:17 -0700

> On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
>> From: Amritha Nambiar <amritha.nambiar@intel.com>
>> Date: Tue, 11 Jul 2017 03:18:30 -0700
>> 
>> > The following series introduces a new hardware offload mode in
>> > tc/mqprio where the TCs, the queue configurations and
>> > bandwidth rate limits are offloaded to the hardware.
>> 
>> net-next is closed, please resubmit this when net-next opens
>> again:
>> 
>> 	http://vger.kernel.org/~davem/net-next.html
>> 
>> Thank you.
> 
> Dave, this is gonna come through my tree.  Amritha is looking for
> community feedback, while we have this under validation.  Sorry for the
> confusion.

One way to deal with this is to say something like:

	[PATCH net-next RFC]

in the subject lines instead of just:

	[PATCH net-next]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-wired-lan] [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio
  2017-07-11 10:19   ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-12 14:12     ` kbuild test robot
  -1 siblings, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2017-07-12 14:12 UTC (permalink / raw)
  To: Amritha Nambiar
  Cc: kbuild-all, intel-wired-lan, jeffrey.t.kirsher, netdev, mitch.a.williams

[-- Attachment #1: Type: text/plain, Size: 1412 bytes --]

Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `i40e_set_bw_limit':
   (.text+0x11aabd9): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `i40e_rebuild':
>> i40e_main.c:(.text+0x11af884): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `__i40e_setup_tc':
   i40e_main.c:(.text+0x11b0cb5): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0e99): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0f25): undefined reference to `__udivdi3'
   drivers/built-in.o:(.text+0x1297fb8): more undefined references to `__udivdi3' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59805 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio
@ 2017-07-12 14:12     ` kbuild test robot
  0 siblings, 0 replies; 32+ messages in thread
From: kbuild test robot @ 2017-07-12 14:12 UTC (permalink / raw)
  To: intel-wired-lan

Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `i40e_set_bw_limit':
   (.text+0x11aabd9): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `i40e_rebuild':
>> i40e_main.c:(.text+0x11af884): undefined reference to `__udivdi3'
   drivers/built-in.o: In function `__i40e_setup_tc':
   i40e_main.c:(.text+0x11b0cb5): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0e99): undefined reference to `__udivdi3'
   i40e_main.c:(.text+0x11b0f25): undefined reference to `__udivdi3'
   drivers/built-in.o:(.text+0x1297fb8): more undefined references to `__udivdi3' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 59805 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20170712/c97065e2/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
  2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-14  7:56   ` Jamal Hadi Salim
  -1 siblings, 0 replies; 32+ messages in thread
From: Jamal Hadi Salim @ 2017-07-14  7:56 UTC (permalink / raw)
  To: Amritha Nambiar, intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, netdev, mitch.a.williams,
	alexander.duyck, neerav.parikh, sridhar.samudrala,
	carolyn.wyborny, Roman Mashak

Hi Amritha,

On 17-07-11 06:18 AM, Amritha Nambiar wrote:
> The following series introduces a new hardware offload mode in
> tc/mqprio where the TCs, the queue configurations and
> bandwidth rate limits are offloaded to the hardware. The existing
> mqprio framework is extended to configure the queue counts and
> layout and also added support for rate limiting. This is achieved
> by setting the value 2 for the "hw" mode. Legacy devices can fall
> back to the existing setup supporting hw mode 1 where only the
> TCs are offloaded, however if hw mode 2 is specified, then this
> type of offload fails to initialize if the requested mode is not
> supported. The i40e driver enables the new mqprio hardware offload
> mechanism factoring the TCs, queue configuration and bandwidth
> rates by creating HW channel VSIs.
> 
> In this new mode, the priority to traffic class mapping and the
> user specified queue ranges are used to configure the traffic
> class when the 'hw' option is set to 2. This is achieved by
> creating HW channels(VSI). A new channel is created for each
> of the traffic class configuration offloaded via mqprio
> framework except for the first TC (TC0) which is for the main
> VSI. TC0 for the main VSI is also reconfigured as per user
> provided queue parameters. Finally, bandwidth rate limits are
> set on these traffic classes through the mqprio offload
> framework by sending these rates in addition to the number of
> TCs and the queue configurations.
> 
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
>    queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
> 
> To dump the bandwidth rates:
> 
> # tc qdisc show dev eth0
>    qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>                 queues:(0:3) (4:7)
>                 min rates:0bit 0bit
>                 max rates:55Mbit 60Mbit

My only concern is usability. Was already cryptic with hw "1" and
now we introduce another magic number "2". Could we not have two
have some human friendly nouns? I know 1 means offload qos and two
mean offload rate (+qos?).

I have some questions:
BTW, what would tc class show dev eth0 display in this case?
Above seems to select queue groupings for a specific rates which
is a short-cut to say assigning each queue via something like TBF
via "tc class" semantics (but way more cryptic than tc class;->)

cheers,
jamal

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-14  7:56   ` Jamal Hadi Salim
  0 siblings, 0 replies; 32+ messages in thread
From: Jamal Hadi Salim @ 2017-07-14  7:56 UTC (permalink / raw)
  To: intel-wired-lan

Hi Amritha,

On 17-07-11 06:18 AM, Amritha Nambiar wrote:
> The following series introduces a new hardware offload mode in
> tc/mqprio where the TCs, the queue configurations and
> bandwidth rate limits are offloaded to the hardware. The existing
> mqprio framework is extended to configure the queue counts and
> layout and also added support for rate limiting. This is achieved
> by setting the value 2 for the "hw" mode. Legacy devices can fall
> back to the existing setup supporting hw mode 1 where only the
> TCs are offloaded, however if hw mode 2 is specified, then this
> type of offload fails to initialize if the requested mode is not
> supported. The i40e driver enables the new mqprio hardware offload
> mechanism factoring the TCs, queue configuration and bandwidth
> rates by creating HW channel VSIs.
> 
> In this new mode, the priority to traffic class mapping and the
> user specified queue ranges are used to configure the traffic
> class when the 'hw' option is set to 2. This is achieved by
> creating HW channels(VSI). A new channel is created for each
> of the traffic class configuration offloaded via mqprio
> framework except for the first TC (TC0) which is for the main
> VSI. TC0 for the main VSI is also reconfigured as per user
> provided queue parameters. Finally, bandwidth rate limits are
> set on these traffic classes through the mqprio offload
> framework by sending these rates in addition to the number of
> TCs and the queue configurations.
> 
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
>    queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
> 
> To dump the bandwidth rates:
> 
> # tc qdisc show dev eth0
>    qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>                 queues:(0:3) (4:7)
>                 min rates:0bit 0bit
>                 max rates:55Mbit 60Mbit

My only concern is usability. Was already cryptic with hw "1" and
now we introduce another magic number "2". Could we not have two
have some human friendly nouns? I know 1 means offload qos and two
mean offload rate (+qos?).

I have some questions:
BTW, what would tc class show dev eth0 display in this case?
Above seems to select queue groupings for a specific rates which
is a short-cut to say assigning each queue via something like TBF
via "tc class" semantics (but way more cryptic than tc class;->)

cheers,
jamal



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
  2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
@ 2017-07-14  8:36     ` Jamal Hadi Salim
  -1 siblings, 0 replies; 32+ messages in thread
From: Jamal Hadi Salim @ 2017-07-14  8:36 UTC (permalink / raw)
  To: Amritha Nambiar, intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, netdev, mitch.a.williams,
	alexander.duyck, neerav.parikh, sridhar.samudrala,
	carolyn.wyborny

On 17-07-11 06:18 AM, Amritha Nambiar wrote:
> This patch introduces a new hardware offload mode in mqprio
> which makes full use of the mqprio options, the TCs, the
> queue configurations and the bandwidth rates for the TCs.
> This is achieved by setting the value 2 for the "hw" option.
> This new offload mode supports new attributes for traffic
> class such as minimum and maximum values for bandwidth rate limits.
> 
> Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
> mqprio queue options and use this to be shared between the kernel and
> device driver. This contains a copy of the exisiting datastructure
> for mqprio queue options. This new datastructure can be extended when
> adding new attributes for traffic class such as bandwidth rate limits. The
> existing datastructure for mqprio queue options will be shared between the
> kernel and userspace.
> 
> This patch enables configuring additional attributes associated
> with a traffic class such as minimum and maximum bandwidth
> rates and can be offloaded to the hardware in the new offload mode.
> The min and max limits for bandwidth rates are provided
> by the user along with the the TCs and the queue configurations
> when creating the mqprio qdisc.
> 
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
>    queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
> 

I know this has nothing to do with your patches - but that is very
unfriendly ;-> Most mortals will have a problem with the map (but you
can argue it has been there since prio qdisc was introduced) - leave
alone the 4@4 syntax and now min_rate where i have to type in obvious
defaults like "0Mbit".
You have some great features that not many people can use as a result.
Note:
This is just a comment maybe someone can be kind enough to fix (or
it would get annoying enough I will fix it); i.e should not be
holding your good work.

> To dump the bandwidth rates:
> 
> # tc qdisc show dev eth0
> 
> qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>               queues:(0:3) (4:7)
>               min rates:0bit 0bit
>               max rates:55Mbit 60Mbit
> 
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>   include/linux/netdevice.h      |    2
>   include/net/pkt_cls.h          |    7 ++
>   include/uapi/linux/pkt_sched.h |   13 +++
>   net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
>   4 files changed, 181 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e48ee2e..12c6c3f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -779,6 +779,7 @@ enum {
>   	TC_SETUP_CLSFLOWER,
>   	TC_SETUP_MATCHALL,
>   	TC_SETUP_CLSBPF,
> +	TC_SETUP_MQPRIO_EXT,
>   };
> 
>   struct tc_cls_u32_offload;
> @@ -791,6 +792,7 @@ struct tc_to_netdev {
>   		struct tc_cls_matchall_offload *cls_mall;
>   		struct tc_cls_bpf_offload *cls_bpf;
>   		struct tc_mqprio_qopt *mqprio;
> +		struct tc_mqprio_qopt_offload *mqprio_qopt;
>   	};
>   	bool egress_dev;
>   };


> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 537d0a0..9facda2 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
>   	u32 gen_flags;
>   };
>   
> +struct tc_mqprio_qopt_offload {
> +	/* struct tc_mqprio_qopt must always be the first element */
> +	struct tc_mqprio_qopt qopt;
> +	u32 flags;
> +	u64 min_rate[TC_QOPT_MAX_QUEUE];
> +	u64 max_rate[TC_QOPT_MAX_QUEUE];
> +};
>

Quickly scanned code.
My opinion is: struct tc_mqprio_qopt is messed up in terms of
alignments. And you just made it worse. Why not create a new struct
call it "tc_mqprio_qopt_hw" or something indicating it is for hw
offload. You can then fixup stuff. I think it will depend on whether
you can have both hw priority and rate in all network cards.
If some hw cannot support rate offload then I would suggest it becomes
optional via TLVs etc.
If you are willing to do that clean up I can say more.

>   /* This structure holds cookie structure that is passed from user
>    * to the kernel for actions and classifiers
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 099bf55..cf2a146 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -620,6 +620,7 @@ struct tc_drr_stats {
>   enum {
>   	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
>   	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
> +	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
>   	__TC_MQPRIO_HW_OFFLOAD_MAX
>   };
>   
> @@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
>   	__u16	offset[TC_QOPT_MAX_QUEUE];
>   };
>   
> +#define TC_MQPRIO_F_MIN_RATE  0x1
> +#define TC_MQPRIO_F_MAX_RATE  0x2
> +
> +enum {
> +	TCA_MQPRIO_UNSPEC,
> +	TCA_MQPRIO_MIN_RATE64,
> +	TCA_MQPRIO_MAX_RATE64,
> +	__TCA_MQPRIO_MAX,
> +};
> +
> +#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
> +
>   /* SFB */
>   
>   enum {
> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index e0c0272..6524d12 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -18,10 +18,13 @@
>   #include <net/netlink.h>
>   #include <net/pkt_sched.h>
>   #include <net/sch_generic.h>
> +#include <net/pkt_cls.h>
>   
>   struct mqprio_sched {
>   	struct Qdisc		**qdiscs;
>   	int hw_offload;
> +	u32 flags;
> +	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
>   };

You have to change this before Jiri sees it ;->


I will review more later.

cheers,
jamal

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
@ 2017-07-14  8:36     ` Jamal Hadi Salim
  0 siblings, 0 replies; 32+ messages in thread
From: Jamal Hadi Salim @ 2017-07-14  8:36 UTC (permalink / raw)
  To: intel-wired-lan

On 17-07-11 06:18 AM, Amritha Nambiar wrote:
> This patch introduces a new hardware offload mode in mqprio
> which makes full use of the mqprio options, the TCs, the
> queue configurations and the bandwidth rates for the TCs.
> This is achieved by setting the value 2 for the "hw" option.
> This new offload mode supports new attributes for traffic
> class such as minimum and maximum values for bandwidth rate limits.
> 
> Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
> mqprio queue options and use this to be shared between the kernel and
> device driver. This contains a copy of the exisiting datastructure
> for mqprio queue options. This new datastructure can be extended when
> adding new attributes for traffic class such as bandwidth rate limits. The
> existing datastructure for mqprio queue options will be shared between the
> kernel and userspace.
> 
> This patch enables configuring additional attributes associated
> with a traffic class such as minimum and maximum bandwidth
> rates and can be offloaded to the hardware in the new offload mode.
> The min and max limits for bandwidth rates are provided
> by the user along with the the TCs and the queue configurations
> when creating the mqprio qdisc.
> 
> Example:
> # tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
>    queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
> 

I know this has nothing to do with your patches - but that is very
unfriendly ;-> Most mortals will have a problem with the map (but you
can argue it has been there since prio qdisc was introduced) - leave
alone the 4 at 4 syntax and now min_rate where i have to type in obvious
defaults like "0Mbit".
You have some great features that not many people can use as a result.
Note:
This is just a comment maybe someone can be kind enough to fix (or
it would get annoying enough I will fix it); i.e should not be
holding your good work.

> To dump the bandwidth rates:
> 
> # tc qdisc show dev eth0
> 
> qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>               queues:(0:3) (4:7)
>               min rates:0bit 0bit
>               max rates:55Mbit 60Mbit
> 
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> ---
>   include/linux/netdevice.h      |    2
>   include/net/pkt_cls.h          |    7 ++
>   include/uapi/linux/pkt_sched.h |   13 +++
>   net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
>   4 files changed, 181 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e48ee2e..12c6c3f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -779,6 +779,7 @@ enum {
>   	TC_SETUP_CLSFLOWER,
>   	TC_SETUP_MATCHALL,
>   	TC_SETUP_CLSBPF,
> +	TC_SETUP_MQPRIO_EXT,
>   };
> 
>   struct tc_cls_u32_offload;
> @@ -791,6 +792,7 @@ struct tc_to_netdev {
>   		struct tc_cls_matchall_offload *cls_mall;
>   		struct tc_cls_bpf_offload *cls_bpf;
>   		struct tc_mqprio_qopt *mqprio;
> +		struct tc_mqprio_qopt_offload *mqprio_qopt;
>   	};
>   	bool egress_dev;
>   };


> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 537d0a0..9facda2 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
>   	u32 gen_flags;
>   };
>   
> +struct tc_mqprio_qopt_offload {
> +	/* struct tc_mqprio_qopt must always be the first element */
> +	struct tc_mqprio_qopt qopt;
> +	u32 flags;
> +	u64 min_rate[TC_QOPT_MAX_QUEUE];
> +	u64 max_rate[TC_QOPT_MAX_QUEUE];
> +};
>

Quickly scanned code.
My opinion is: struct tc_mqprio_qopt is messed up in terms of
alignments. And you just made it worse. Why not create a new struct
call it "tc_mqprio_qopt_hw" or something indicating it is for hw
offload. You can then fixup stuff. I think it will depend on whether
you can have both hw priority and rate in all network cards.
If some hw cannot support rate offload then I would suggest it becomes
optional via TLVs etc.
If you are willing to do that clean up I can say more.

>   /* This structure holds cookie structure that is passed from user
>    * to the kernel for actions and classifiers
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 099bf55..cf2a146 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -620,6 +620,7 @@ struct tc_drr_stats {
>   enum {
>   	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
>   	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
> +	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
>   	__TC_MQPRIO_HW_OFFLOAD_MAX
>   };
>   
> @@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
>   	__u16	offset[TC_QOPT_MAX_QUEUE];
>   };
>   
> +#define TC_MQPRIO_F_MIN_RATE  0x1
> +#define TC_MQPRIO_F_MAX_RATE  0x2
> +
> +enum {
> +	TCA_MQPRIO_UNSPEC,
> +	TCA_MQPRIO_MIN_RATE64,
> +	TCA_MQPRIO_MAX_RATE64,
> +	__TCA_MQPRIO_MAX,
> +};
> +
> +#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
> +
>   /* SFB */
>   
>   enum {
> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index e0c0272..6524d12 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -18,10 +18,13 @@
>   #include <net/netlink.h>
>   #include <net/pkt_sched.h>
>   #include <net/sch_generic.h>
> +#include <net/pkt_cls.h>
>   
>   struct mqprio_sched {
>   	struct Qdisc		**qdiscs;
>   	int hw_offload;
> +	u32 flags;
> +	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
>   };

You have to change this before Jiri sees it ;->


I will review more later.

cheers,
jamal

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
  2017-07-14  7:56   ` [Intel-wired-lan] " Jamal Hadi Salim
@ 2017-07-14 10:31     ` Nambiar, Amritha
  -1 siblings, 0 replies; 32+ messages in thread
From: Nambiar, Amritha @ 2017-07-14 10:31 UTC (permalink / raw)
  To: Jamal Hadi Salim, intel-wired-lan, jeffrey.t.kirsher
  Cc: alexander.h.duyck, kiran.patil, netdev, mitch.a.williams,
	alexander.duyck, neerav.parikh, sridhar.samudrala,
	carolyn.wyborny, Roman Mashak

On 7/14/2017 12:56 AM, Jamal Hadi Salim wrote:
> Hi Amritha,
>
> On 17-07-11 06:18 AM, Amritha Nambiar wrote:
>> The following series introduces a new hardware offload mode in
>> tc/mqprio where the TCs, the queue configurations and
>> bandwidth rate limits are offloaded to the hardware. The existing
>> mqprio framework is extended to configure the queue counts and
>> layout and also added support for rate limiting. This is achieved
>> by setting the value 2 for the "hw" mode. Legacy devices can fall
>> back to the existing setup supporting hw mode 1 where only the
>> TCs are offloaded, however if hw mode 2 is specified, then this
>> type of offload fails to initialize if the requested mode is not
>> supported. The i40e driver enables the new mqprio hardware offload
>> mechanism factoring the TCs, queue configuration and bandwidth
>> rates by creating HW channel VSIs.
>>
>> In this new mode, the priority to traffic class mapping and the
>> user specified queue ranges are used to configure the traffic
>> class when the 'hw' option is set to 2. This is achieved by
>> creating HW channels(VSI). A new channel is created for each
>> of the traffic class configuration offloaded via mqprio
>> framework except for the first TC (TC0) which is for the main
>> VSI. TC0 for the main VSI is also reconfigured as per user
>> provided queue parameters. Finally, bandwidth rate limits are
>> set on these traffic classes through the mqprio offload
>> framework by sending these rates in addition to the number of
>> TCs and the queue configurations.
>>
>> Example:
>> # tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
>>    queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
>>
>> To dump the bandwidth rates:
>>
>> # tc qdisc show dev eth0
>>    qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>>                 queues:(0:3) (4:7)
>>                 min rates:0bit 0bit
>>                 max rates:55Mbit 60Mbit
>
> My only concern is usability. Was already cryptic with hw "1" and
> now we introduce another magic number "2". Could we not have two
> have some human friendly nouns? I know 1 means offload qos and two
> mean offload rate (+qos?).

I agree this could be less friendly, I was trying to retain the existing
conventions. Yes, hw "2" offloads qos and rates.

>
> I have some questions:
> BTW, what would tc class show dev eth0 display in this case?
> Above seems to select queue groupings for a specific rates which
> is a short-cut to say assigning each queue via something like TBF
> via "tc class" semantics (but way more cryptic than tc class;->)

Dumping the classes under mqprio qdisc will not display any of the
queue configurations or rates. MQPRIO is classless and does not support
change_class operation like HTB does to add a class to the qdisc and set
a rate for the class.

>
> cheers,
> jamal
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
@ 2017-07-14 10:31     ` Nambiar, Amritha
  0 siblings, 0 replies; 32+ messages in thread
From: Nambiar, Amritha @ 2017-07-14 10:31 UTC (permalink / raw)
  To: intel-wired-lan

On 7/14/2017 12:56 AM, Jamal Hadi Salim wrote:
> Hi Amritha,
>
> On 17-07-11 06:18 AM, Amritha Nambiar wrote:
>> The following series introduces a new hardware offload mode in
>> tc/mqprio where the TCs, the queue configurations and
>> bandwidth rate limits are offloaded to the hardware. The existing
>> mqprio framework is extended to configure the queue counts and
>> layout and also added support for rate limiting. This is achieved
>> by setting the value 2 for the "hw" mode. Legacy devices can fall
>> back to the existing setup supporting hw mode 1 where only the
>> TCs are offloaded, however if hw mode 2 is specified, then this
>> type of offload fails to initialize if the requested mode is not
>> supported. The i40e driver enables the new mqprio hardware offload
>> mechanism factoring the TCs, queue configuration and bandwidth
>> rates by creating HW channel VSIs.
>>
>> In this new mode, the priority to traffic class mapping and the
>> user specified queue ranges are used to configure the traffic
>> class when the 'hw' option is set to 2. This is achieved by
>> creating HW channels(VSI). A new channel is created for each
>> of the traffic class configuration offloaded via mqprio
>> framework except for the first TC (TC0) which is for the main
>> VSI. TC0 for the main VSI is also reconfigured as per user
>> provided queue parameters. Finally, bandwidth rate limits are
>> set on these traffic classes through the mqprio offload
>> framework by sending these rates in addition to the number of
>> TCs and the queue configurations.
>>
>> Example:
>> # tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
>>    queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
>>
>> To dump the bandwidth rates:
>>
>> # tc qdisc show dev eth0
>>    qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>>                 queues:(0:3) (4:7)
>>                 min rates:0bit 0bit
>>                 max rates:55Mbit 60Mbit
>
> My only concern is usability. Was already cryptic with hw "1" and
> now we introduce another magic number "2". Could we not have two
> have some human friendly nouns? I know 1 means offload qos and two
> mean offload rate (+qos?).

I agree this could be less friendly, I was trying to retain the existing
conventions. Yes, hw "2" offloads qos and rates.

>
> I have some questions:
> BTW, what would tc class show dev eth0 display in this case?
> Above seems to select queue groupings for a specific rates which
> is a short-cut to say assigning each queue via something like TBF
> via "tc class" semantics (but way more cryptic than tc class;->)

Dumping the classes under mqprio qdisc will not display any of the
queue configurations or rates. MQPRIO is classless and does not support
change_class operation like HTB does to add a class to the qdisc and set
a rate for the class.

>
> cheers,
> jamal
>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
  2017-07-14  8:36     ` [Intel-wired-lan] " Jamal Hadi Salim
@ 2017-07-15  2:00       ` Nambiar, Amritha
  -1 siblings, 0 replies; 32+ messages in thread
From: Nambiar, Amritha @ 2017-07-15  2:00 UTC (permalink / raw)
  To: Jamal Hadi Salim, intel-wired-lan, Kirsher, Jeffrey T
  Cc: Duyck, Alexander H, Patil, Kiran, netdev, Williams, Mitch A,
	alexander.duyck, Parikh, Neerav, Samudrala, Sridhar, Wyborny,
	Carolyn

On 7/14/2017 1:36 AM, Jamal Hadi Salim wrote:
> On 17-07-11 06:18 AM, Amritha Nambiar wrote:
>> This patch introduces a new hardware offload mode in mqprio
>> which makes full use of the mqprio options, the TCs, the
>> queue configurations and the bandwidth rates for the TCs.
>> This is achieved by setting the value 2 for the "hw" option.
>> This new offload mode supports new attributes for traffic
>> class such as minimum and maximum values for bandwidth rate limits.
>>
>> Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
>> mqprio queue options and use this to be shared between the kernel and
>> device driver. This contains a copy of the exisiting datastructure
>> for mqprio queue options. This new datastructure can be extended when
>> adding new attributes for traffic class such as bandwidth rate limits. The
>> existing datastructure for mqprio queue options will be shared between the
>> kernel and userspace.
>>
>> This patch enables configuring additional attributes associated
>> with a traffic class such as minimum and maximum bandwidth
>> rates and can be offloaded to the hardware in the new offload mode.
>> The min and max limits for bandwidth rates are provided
>> by the user along with the the TCs and the queue configurations
>> when creating the mqprio qdisc.
>>
>> Example:
>> # tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
>>     queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
>>
> I know this has nothing to do with your patches - but that is very
> unfriendly ;-> Most mortals will have a problem with the map (but you
> can argue it has been there since prio qdisc was introduced) - leave
> alone the 4@4 syntax and now min_rate where i have to type in obvious
> defaults like "0Mbit".

The min_rate and max_rate are optional attributes for the traffic class 
and it is
not mandatory to have both. It is also possible to have either one of 
them, say,
devices that do not support setting min rate need to specify only
the max rate and need not type in the default 0Mbit. My bad, I should 
probably
have given a better example.

# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
    queues 4@0 4@4 max_rate 55Mbit 60Mbit hw 2


> You have some great features that not many people can use as a result.
> Note:
> This is just a comment maybe someone can be kind enough to fix (or
> it would get annoying enough I will fix it); i.e should not be
> holding your good work.
>
>> To dump the bandwidth rates:
>>
>> # tc qdisc show dev eth0
>>
>> qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>>                queues:(0:3) (4:7)
>>                min rates:0bit 0bit
>>                max rates:55Mbit 60Mbit
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>>    include/linux/netdevice.h      |    2
>>    include/net/pkt_cls.h          |    7 ++
>>    include/uapi/linux/pkt_sched.h |   13 +++
>>    net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
>>    4 files changed, 181 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index e48ee2e..12c6c3f 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -779,6 +779,7 @@ enum {
>>    	TC_SETUP_CLSFLOWER,
>>    	TC_SETUP_MATCHALL,
>>    	TC_SETUP_CLSBPF,
>> +	TC_SETUP_MQPRIO_EXT,
>>    };
>>
>>    struct tc_cls_u32_offload;
>> @@ -791,6 +792,7 @@ struct tc_to_netdev {
>>    		struct tc_cls_matchall_offload *cls_mall;
>>    		struct tc_cls_bpf_offload *cls_bpf;
>>    		struct tc_mqprio_qopt *mqprio;
>> +		struct tc_mqprio_qopt_offload *mqprio_qopt;
>>    	};
>>    	bool egress_dev;
>>    };
>
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index 537d0a0..9facda2 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
>>    	u32 gen_flags;
>>    };
>>    
>> +struct tc_mqprio_qopt_offload {
>> +	/* struct tc_mqprio_qopt must always be the first element */
>> +	struct tc_mqprio_qopt qopt;
>> +	u32 flags;
>> +	u64 min_rate[TC_QOPT_MAX_QUEUE];
>> +	u64 max_rate[TC_QOPT_MAX_QUEUE];
>> +};
>>
> Quickly scanned code.
> My opinion is: struct tc_mqprio_qopt is messed up in terms of
> alignments. And you just made it worse. Why not create a new struct
> call it "tc_mqprio_qopt_hw" or something indicating it is for hw
> offload. You can then fixup stuff. I think it will depend on whether
> you can have both hw priority and rate in all network cards.
> If some hw cannot support rate offload then I would suggest it becomes
> optional via TLVs etc.
> If you are willing to do that clean up I can say more.

The existing struct tc_mqprio_qopt does have alignment issues, but is 
shared with
the userspace. The new struct tc_mqprio_qopt_offload is shared with the 
device
driver. This contains a copy of the existing tc_mqprio_qopt for mqprio queue
options for legacy users. The rates are optional attributes obtained as 
TLVs from
the userspace via additional netlink attributes. This would be clear 
from the
corresponding iproute2 RFC patch I submitted.
([PATCH RFC, iproute2] tc/mqprio: Add support to configure bandwidth 
rate limit through mqprio).

>
>>    /* This structure holds cookie structure that is passed from user
>>     * to the kernel for actions and classifiers
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 099bf55..cf2a146 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -620,6 +620,7 @@ struct tc_drr_stats {
>>    enum {
>>    	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
>>    	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
>> +	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
>>    	__TC_MQPRIO_HW_OFFLOAD_MAX
>>    };
>>    
>> @@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
>>    	__u16	offset[TC_QOPT_MAX_QUEUE];
>>    };
>>    
>> +#define TC_MQPRIO_F_MIN_RATE  0x1
>> +#define TC_MQPRIO_F_MAX_RATE  0x2
>> +
>> +enum {
>> +	TCA_MQPRIO_UNSPEC,
>> +	TCA_MQPRIO_MIN_RATE64,
>> +	TCA_MQPRIO_MAX_RATE64,
>> +	__TCA_MQPRIO_MAX,
>> +};
>> +
>> +#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
>> +
>>    /* SFB */
>>    
>>    enum {
>> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
>> index e0c0272..6524d12 100644
>> --- a/net/sched/sch_mqprio.c
>> +++ b/net/sched/sch_mqprio.c
>> @@ -18,10 +18,13 @@
>>    #include <net/netlink.h>
>>    #include <net/pkt_sched.h>
>>    #include <net/sch_generic.h>
>> +#include <net/pkt_cls.h>
>>    
>>    struct mqprio_sched {
>>    	struct Qdisc		**qdiscs;
>>    	int hw_offload;
>> +	u32 flags;
>> +	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
>>    };
> You have to change this before Jiri sees it ;->

I had to store the rates here since tc_mqprio_qopt_offload:offload is only a
temporary variable and I need to retrieve these rates so they can be 
reported
when someone attempts to display the current qdisc configuration using the
dump command. The rates here are obtained from the optional netlink 
attributes.
Did you mean I should have retrieved these rates by querying the device 
instead
of storing them here? I think that would require extending struct 
net_device for
the rates.

>
>
> I will review more later.
>
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Intel-wired-lan] [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
@ 2017-07-15  2:00       ` Nambiar, Amritha
  0 siblings, 0 replies; 32+ messages in thread
From: Nambiar, Amritha @ 2017-07-15  2:00 UTC (permalink / raw)
  To: intel-wired-lan

On 7/14/2017 1:36 AM, Jamal Hadi Salim wrote:
> On 17-07-11 06:18 AM, Amritha Nambiar wrote:
>> This patch introduces a new hardware offload mode in mqprio
>> which makes full use of the mqprio options, the TCs, the
>> queue configurations and the bandwidth rates for the TCs.
>> This is achieved by setting the value 2 for the "hw" option.
>> This new offload mode supports new attributes for traffic
>> class such as minimum and maximum values for bandwidth rate limits.
>>
>> Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
>> mqprio queue options and use this to be shared between the kernel and
>> device driver. This contains a copy of the exisiting datastructure
>> for mqprio queue options. This new datastructure can be extended when
>> adding new attributes for traffic class such as bandwidth rate limits. The
>> existing datastructure for mqprio queue options will be shared between the
>> kernel and userspace.
>>
>> This patch enables configuring additional attributes associated
>> with a traffic class such as minimum and maximum bandwidth
>> rates and can be offloaded to the hardware in the new offload mode.
>> The min and max limits for bandwidth rates are provided
>> by the user along with the the TCs and the queue configurations
>> when creating the mqprio qdisc.
>>
>> Example:
>> # tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
>>     queues 4 at 0 4 at 4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2
>>
> I know this has nothing to do with your patches - but that is very
> unfriendly ;-> Most mortals will have a problem with the map (but you
> can argue it has been there since prio qdisc was introduced) - leave
> alone the 4 at 4 syntax and now min_rate where i have to type in obvious
> defaults like "0Mbit".

The min_rate and max_rate are optional attributes for the traffic class 
and it is
not mandatory to have both. It is also possible to have either one of 
them, say,
devices that do not support setting min rate need to specify only
the max rate and need not type in the default 0Mbit. My bad, I should 
probably
have given a better example.

# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
    queues 4 at 0 4 at 4 max_rate 55Mbit 60Mbit hw 2


> You have some great features that not many people can use as a result.
> Note:
> This is just a comment maybe someone can be kind enough to fix (or
> it would get annoying enough I will fix it); i.e should not be
> holding your good work.
>
>> To dump the bandwidth rates:
>>
>> # tc qdisc show dev eth0
>>
>> qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
>>                queues:(0:3) (4:7)
>>                min rates:0bit 0bit
>>                max rates:55Mbit 60Mbit
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>>    include/linux/netdevice.h      |    2
>>    include/net/pkt_cls.h          |    7 ++
>>    include/uapi/linux/pkt_sched.h |   13 +++
>>    net/sched/sch_mqprio.c         |  170 +++++++++++++++++++++++++++++++++++++---
>>    4 files changed, 181 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index e48ee2e..12c6c3f 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -779,6 +779,7 @@ enum {
>>    	TC_SETUP_CLSFLOWER,
>>    	TC_SETUP_MATCHALL,
>>    	TC_SETUP_CLSBPF,
>> +	TC_SETUP_MQPRIO_EXT,
>>    };
>>
>>    struct tc_cls_u32_offload;
>> @@ -791,6 +792,7 @@ struct tc_to_netdev {
>>    		struct tc_cls_matchall_offload *cls_mall;
>>    		struct tc_cls_bpf_offload *cls_bpf;
>>    		struct tc_mqprio_qopt *mqprio;
>> +		struct tc_mqprio_qopt_offload *mqprio_qopt;
>>    	};
>>    	bool egress_dev;
>>    };
>
>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index 537d0a0..9facda2 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
>>    	u32 gen_flags;
>>    };
>>    
>> +struct tc_mqprio_qopt_offload {
>> +	/* struct tc_mqprio_qopt must always be the first element */
>> +	struct tc_mqprio_qopt qopt;
>> +	u32 flags;
>> +	u64 min_rate[TC_QOPT_MAX_QUEUE];
>> +	u64 max_rate[TC_QOPT_MAX_QUEUE];
>> +};
>>
> Quickly scanned code.
> My opinion is: struct tc_mqprio_qopt is messed up in terms of
> alignments. And you just made it worse. Why not create a new struct
> call it "tc_mqprio_qopt_hw" or something indicating it is for hw
> offload. You can then fixup stuff. I think it will depend on whether
> you can have both hw priority and rate in all network cards.
> If some hw cannot support rate offload then I would suggest it becomes
> optional via TLVs etc.
> If you are willing to do that clean up I can say more.

The existing struct tc_mqprio_qopt does have alignment issues, but is 
shared with
the userspace. The new struct tc_mqprio_qopt_offload is shared with the 
device
driver. This contains a copy of the existing tc_mqprio_qopt for mqprio queue
options for legacy users. The rates are optional attributes obtained as 
TLVs from
the userspace via additional netlink attributes. This would be clear 
from the
corresponding iproute2 RFC patch I submitted.
([PATCH RFC, iproute2] tc/mqprio: Add support to configure bandwidth 
rate limit through mqprio).

>
>>    /* This structure holds cookie structure that is passed from user
>>     * to the kernel for actions and classifiers
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 099bf55..cf2a146 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -620,6 +620,7 @@ struct tc_drr_stats {
>>    enum {
>>    	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
>>    	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
>> +	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
>>    	__TC_MQPRIO_HW_OFFLOAD_MAX
>>    };
>>    
>> @@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
>>    	__u16	offset[TC_QOPT_MAX_QUEUE];
>>    };
>>    
>> +#define TC_MQPRIO_F_MIN_RATE  0x1
>> +#define TC_MQPRIO_F_MAX_RATE  0x2
>> +
>> +enum {
>> +	TCA_MQPRIO_UNSPEC,
>> +	TCA_MQPRIO_MIN_RATE64,
>> +	TCA_MQPRIO_MAX_RATE64,
>> +	__TCA_MQPRIO_MAX,
>> +};
>> +
>> +#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
>> +
>>    /* SFB */
>>    
>>    enum {
>> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
>> index e0c0272..6524d12 100644
>> --- a/net/sched/sch_mqprio.c
>> +++ b/net/sched/sch_mqprio.c
>> @@ -18,10 +18,13 @@
>>    #include <net/netlink.h>
>>    #include <net/pkt_sched.h>
>>    #include <net/sch_generic.h>
>> +#include <net/pkt_cls.h>
>>    
>>    struct mqprio_sched {
>>    	struct Qdisc		**qdiscs;
>>    	int hw_offload;
>> +	u32 flags;
>> +	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
>>    };
> You have to change this before Jiri sees it ;->

I had to store the rates here since tc_mqprio_qopt_offload:offload is only a
temporary variable and I need to retrieve these rates so they can be 
reported
when someone attempts to display the current qdisc configuration using the
dump command. The rates here are obtained from the optional netlink 
attributes.
Did you mean I should have retrieved these rates by querying the device 
instead
of storing them here? I think that would require extending struct 
net_device for
the rates.

>
>
> I will review more later.
>
> cheers,
> jamal


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-07-15  2:00 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 10:18 [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio Amritha Nambiar
2017-07-11 10:18 ` [Intel-wired-lan] " Amritha Nambiar
2017-07-11 10:18 ` [PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations Amritha Nambiar
2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-14  8:36   ` Jamal Hadi Salim
2017-07-14  8:36     ` [Intel-wired-lan] " Jamal Hadi Salim
2017-07-15  2:00     ` Nambiar, Amritha
2017-07-15  2:00       ` [Intel-wired-lan] " Nambiar, Amritha
2017-07-11 10:18 ` [PATCH 2/6] [next-queue]net: i40e: Add macro for PF reset bit Amritha Nambiar
2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-11 10:18 ` [PATCH 3/6] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler Amritha Nambiar
2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-11 10:18 ` [PATCH 4/6] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping Amritha Nambiar
2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-11 10:18 ` [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well Amritha Nambiar
2017-07-11 10:18   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-11 22:32   ` kbuild test robot
2017-07-11 22:32     ` kbuild test robot
2017-07-11 10:19 ` [PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio Amritha Nambiar
2017-07-11 10:19   ` [Intel-wired-lan] " Amritha Nambiar
2017-07-12 14:12   ` kbuild test robot
2017-07-12 14:12     ` kbuild test robot
2017-07-11 14:26 ` [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio David Miller
2017-07-11 14:26   ` [Intel-wired-lan] " David Miller
2017-07-12  0:18   ` Jeff Kirsher
2017-07-12  0:18     ` [Intel-wired-lan] " Jeff Kirsher
2017-07-12  0:36     ` David Miller
2017-07-12  0:36       ` [Intel-wired-lan] " David Miller
2017-07-14  7:56 ` Jamal Hadi Salim
2017-07-14  7:56   ` [Intel-wired-lan] " Jamal Hadi Salim
2017-07-14 10:31   ` Nambiar, Amritha
2017-07-14 10:31     ` [Intel-wired-lan] " Nambiar, Amritha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.