* [RFC net-next v1 0/6] RFC: taprio change schedules + offload
@ 2019-04-10  0:32 Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 1/6] taprio: Fix potential use of invalid memory during dequeue() Vinicius Costa Gomes
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:32 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

Hi,


Overview
--------

This RFC has two objectives: it adds support for changing the running
schedule during "runtime", explained in more detail later, and it
proposes an interface between taprio and the drivers for hardware
offloading.

These two features are presented together so it's clear what the
"final state" would look like. After the RFC stage, they can be
proposed (and reviewed) separately.

Changing the schedules without disrupting traffic is important for
handling dynamic use cases, for example, when streams are
added/removed and when the network configuration changes.

Hardware offloading support allows schedules to be more precise and
have lower resource usage.


Changing schedules
------------------

As with the other interfaces we have proposed, we try to follow the
concepts of the IEEE 802.1Q-2018 specification. So, for changing
schedules, there are an "oper" (operational) and an "admin" schedule.
The "admin" schedule is mutable and not in use; the "oper" schedule is
immutable and in use.

That is, when the user first adds a schedule it is in the "admin"
state, and it becomes "oper" when its base-time (basically, when it
starts) is reached.

What this means is that it's now possible to create a taprio instance with a schedule:

$ tc qdisc add dev IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time 10000000 \
      sched-entry S 03 300000 \
      sched-entry S 02 300000 \
      sched-entry S 06 400000 \
      clockid CLOCK_TAI
      
And then, after the previous schedule is "promoted" to "oper", add a
new ("admin") schedule to take effect some time later:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time 1553121866000000000 \
      sched-entry S 02 500000 \
      sched-entry S 0f 400000 \
      clockid CLOCK_TAI

When enabling the ability to change schedules, it makes sense to add
two more knobs to schedules: "cycle-time" allows a cycle to be
truncated to some value, so it repeats after a well-defined interval;
"cycle-time-extension" controls how much the last entry before a
schedule change can be extended, the idea being to avoid a very short
cycle when transitioning from one schedule to another.
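
As a sketch of how those knobs could look on the command line (the
values here are arbitrary, and the "cycle-time"/"cycle-time-extension"
spellings follow the iproute2 patches linked below), a schedule whose
entries sum to 900us could be truncated to an 800us cycle, with its
last entry allowed to stretch by up to 100us during a schedule change:

```shell
$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time 1553121866000000000 \
      sched-entry S 02 500000 \
      sched-entry S 0f 400000 \
      cycle-time 800000 \
      cycle-time-extension 100000 \
      clockid CLOCK_TAI
```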

With these, taprio in the software mode should provide a fairly
complete implementation of what's defined in the Enhancements for
Scheduled Traffic parts of the specification.


Hardware offload
----------------

Some workloads require better guarantees from their schedules than
what's provided by the software implementation. This series proposes
an interface for configuring schedules into compatible network
controllers.

This part is proposed together with the support for changing
schedules because it raises questions like: should the "qdisc" side
be responsible for providing visibility into the schedules, or should
that be the driver's job?

In this proposal, the driver is called with the new schedule as soon
as it is validated, and the "core" qdisc takes care of displaying
(".dump()") the correct schedules at all times. This means that some
logic would need to be duplicated in the driver if the hardware
doesn't have support for multiple schedules. But as taprio doesn't
have enough information about the underlying controller to know how
far in advance a schedule needs to be handed to the hardware, this
feels like a fair compromise.

The hardware offloading part of this proposal also tries to define an
interface for frame-preemption and how it interacts with the
scheduling of traffic, see Section 8.6.8.4 of IEEE 802.1Q-2018 for
more information.

One important difference between the qdisc interface and the
qdisc-driver interface is the meaning of the "gate mask": on the
qdisc side it references traffic classes (bit 0 of the gate mask
means Traffic Class 0), while in the driver interface it references
queues (bit 0 means queue 0). That is, taprio converts references to
traffic classes into references to queues before sending the
offloading request to the driver.


Request for help
----------------

I would like interested driver maintainers to take a look at the
proposed interface and see whether it would be too awkward for any
particular device. Pointers to available documentation would also be
appreciated. The idea here is to start a discussion so we can arrive
at an interface that works for multiple vendors.


Links
-----

kernel patches:
https://github.com/vcgomes/net-next/tree/taprio-add-support-for-change-v2

iproute2 patches:
https://github.com/vcgomes/iproute2/tree/taprio-add-support-for-change-v2


Thank you,
--
Vinicius



Vinicius Costa Gomes (6):
  taprio: Fix potential use of invalid memory during dequeue()
  taprio: Add support adding an admin schedule
  taprio: Add support for setting the cycle-time manually
  taprio: Add support for cycle-time-extension
  taprio: Add support for frame-preemption
  taprio: Add support for hardware offloading

 include/linux/netdevice.h      |   1 +
 include/net/pkt_sched.h        |  20 +
 include/uapi/linux/pkt_sched.h |  18 +
 net/sched/sch_taprio.c         | 883 +++++++++++++++++++++++++--------
 4 files changed, 711 insertions(+), 211 deletions(-)

-- 
2.21.0



* [RFC net-next v1 1/6] taprio: Fix potential use of invalid memory during dequeue()
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 2/6] taprio: Add support adding an admin schedule Vinicius Costa Gomes
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

Right now, this isn't a problem, but the next commit allows schedules
to be added during runtime. When a new schedule transitions from the
inactive to the active state ("admin" -> "oper"), the previous one is
freed. If it's freed just after the RCU read lock is released, we may
access an invalid 'entry'.

So, we should take care to protect the dequeue() flow, so that all
the places that access the entries are protected by the RCU read
lock.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 net/sched/sch_taprio.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index c7041999eb5d..1fb5eae46bc6 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -124,8 +124,8 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb = NULL;
 	struct sched_entry *entry;
-	struct sk_buff *skb;
 	u32 gate_mask;
 	int i;
 
@@ -137,10 +137,9 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 	 * "AdminGateSates"
 	 */
 	gate_mask = entry ? entry->gate_mask : TAPRIO_ALL_GATES_OPEN;
-	rcu_read_unlock();
 
 	if (!gate_mask)
-		return NULL;
+		goto done;
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
@@ -170,26 +169,35 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 		 * guard band ...
 		 */
 		if (gate_mask != TAPRIO_ALL_GATES_OPEN &&
-		    ktime_after(guard, entry->close_time))
-			return NULL;
+		    ktime_after(guard, entry->close_time)) {
+			skb = NULL;
+			goto done;
+		}
 
 		/* ... and no budget. */
 		if (gate_mask != TAPRIO_ALL_GATES_OPEN &&
-		    atomic_sub_return(len, &entry->budget) < 0)
-			return NULL;
+		    atomic_sub_return(len, &entry->budget) < 0) {
+			skb = NULL;
+			goto done;
+		}
 
 		skb = child->ops->dequeue(child);
 		if (unlikely(!skb))
-			return NULL;
+			goto done;
+
 
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
 
-		return skb;
+		goto done;
 	}
 
-	return NULL;
+done:
+	rcu_read_unlock();
+
+	return skb;
+
 }
 
 static bool should_restart_cycle(const struct taprio_sched *q,
-- 
2.21.0



* [RFC net-next v1 2/6] taprio: Add support adding an admin schedule
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 1/6] taprio: Fix potential use of invalid memory during dequeue() Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 3/6] taprio: Add support for setting the cycle-time manually Vinicius Costa Gomes
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

IEEE 802.1Q-2018 defines two "types" of schedules, the "Oper" (from
operational?) and "Admin" ones. Up until now, 'taprio' only had
support for the "Oper" one, added when the qdisc is created. This adds
support for the "Admin" one, which allows the .change() operation to
be supported.

Just for clarification, some quick (and dirty) definitions: the "Oper"
schedule is the currently (as in this instant) running one, and it's
read-only. The "Admin" one is the one that the system configurator has
installed; it can be changed, and it will be "promoted" to "Oper" when
its 'base-time' is reached.

The idea behind this patch is that calling something like the below
(after taprio is already configured with an initial schedule):

$ tc qdisc change dev IFACE parent root taprio \
      base-time X \
      sched-entry <CMD> <GATES> <INTERVAL> \
      ...

Will cause a new admin schedule to be created and programmed to be
"promoted" to "Oper" at instant X. If an "Admin" schedule already
exists, it will be overwritten with the new parameters.

Up until now, there was some code that had been added to ease the
support of changing a single entry of a schedule, but it was
ultimately unused. Now that we have support for "change" with better
thought-out semantics, updating a single entry seems less useful.

So we remove what is in practice dead code, and return a "not
supported" error if the user tries to use it. If changing a single
entry would make users' lives easier we may resurrect this idea, but
at this point, removing it simplifies the code.

For now, only the schedule specific bits are allowed to be added for a
new schedule, that means that 'clockid', 'num_tc', 'map' and 'queues'
cannot be modified.

Example:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time $BASE_TIME \
      sched-entry S 00 500000 \
      sched-entry S 0f 500000 \
      clockid CLOCK_TAI

The only change in the netlink API introduced by this patch is the
introduction of an "admin" type in the response to a dump request;
that type allows userspace to separate the "oper" schedule from the
"admin" schedule. If userspace doesn't support the "admin" type, it
will only display the "oper" schedule.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h |  11 +
 net/sched/sch_taprio.c         | 517 ++++++++++++++++++++-------------
 2 files changed, 333 insertions(+), 195 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 7ee74c3474bf..d59770d0eb84 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1148,6 +1148,16 @@ enum {
 
 #define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
 
+/* The format for the admin sched (dump only):
+ * [TCA_TAPRIO_SCHED_ADMIN_SCHED]
+ *   [TCA_TAPRIO_ATTR_SCHED_BASE_TIME]
+ *   [TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]
+ *     [TCA_TAPRIO_ATTR_SCHED_ENTRY]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_CMD]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_GATES]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
+ */
+
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
 	TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
@@ -1156,6 +1166,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
 	TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
 	TCA_TAPRIO_PAD,
+	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 1fb5eae46bc6..af12353232a5 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -15,6 +15,7 @@
 #include <linux/skbuff.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
+#include <linux/rcupdate.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
@@ -37,24 +38,68 @@ struct sched_entry {
 	u8 command;
 };
 
+struct sched_gate_list {
+	struct rcu_head rcu;
+	struct list_head entries;
+	size_t num_entries;
+	s64 base_time;
+};
+
 struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
-	s64 base_time;
 	int clockid;
 	int picos_per_byte; /* Using picoseconds because for 10Gbps+
 			     * speeds it's sub-nanoseconds per byte
 			     */
-	size_t num_entries;
 
 	/* Protects the update side of the RCU protected current_entry */
 	spinlock_t current_entry_lock;
 	struct sched_entry __rcu *current_entry;
-	struct list_head entries;
+	struct sched_gate_list __rcu *oper_sched;
+	struct sched_gate_list __rcu *admin_sched;
 	ktime_t (*get_time)(void);
 	struct hrtimer advance_timer;
 };
 
+static ktime_t sched_base_time(const struct sched_gate_list *sched)
+{
+	if (!sched)
+		return KTIME_MAX;
+
+	return sched->base_time;
+}
+
+static void taprio_free_sched_cb(struct rcu_head *head)
+{
+	struct sched_gate_list *sched = container_of(head, struct sched_gate_list, rcu);
+	struct sched_entry *entry, *n;
+
+	if (!sched)
+		return;
+
+	list_for_each_entry_safe(entry, n, &sched->entries, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+
+	kfree(sched);
+}
+
+static void switch_schedules(struct taprio_sched *q,
+			     struct sched_gate_list **admin,
+			     struct sched_gate_list **oper)
+{
+	rcu_assign_pointer(q->oper_sched, *admin);
+	rcu_assign_pointer(q->admin_sched, NULL);
+
+	if (*oper)
+		call_rcu(&(*oper)->rcu, taprio_free_sched_cb);
+
+	*oper = *admin;
+	*admin = NULL;
+}
+
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
@@ -200,18 +245,39 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 
 }
 
-static bool should_restart_cycle(const struct taprio_sched *q,
+static bool should_restart_cycle(const struct sched_gate_list *oper,
 				 const struct sched_entry *entry)
 {
 	WARN_ON(!entry);
 
-	return list_is_last(&entry->list, &q->entries);
+	return list_is_last(&entry->list, &oper->entries);
+}
+
+static bool should_change_schedules(const struct sched_gate_list *admin,
+				    const struct sched_gate_list *oper,
+				    ktime_t close_time)
+{
+	ktime_t next_base_time;
+
+	if (!admin)
+		return false;
+
+	next_base_time = sched_base_time(admin);
+
+	/* This is the simple case, the close_time would fall after
+	 * the next schedule base_time.
+	 */
+	if (ktime_compare(next_base_time, close_time) <= 0)
+		return true;
+
+	return false;
 }
 
 static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 {
 	struct taprio_sched *q = container_of(timer, struct taprio_sched,
 					      advance_timer);
+	struct sched_gate_list *oper, *admin;
 	struct sched_entry *entry, *next;
 	struct Qdisc *sch = q->root;
 	ktime_t close_time;
@@ -219,26 +285,43 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 	spin_lock(&q->current_entry_lock);
 	entry = rcu_dereference_protected(q->current_entry,
 					  lockdep_is_held(&q->current_entry_lock));
+	oper = rcu_dereference_protected(q->oper_sched,
+					 lockdep_is_held(&q->current_entry_lock));
+	admin = rcu_dereference_protected(q->admin_sched,
+					  lockdep_is_held(&q->current_entry_lock));
 
-	/* This is the case that it's the first time that the schedule
-	 * runs, so it only happens once per schedule. The first entry
-	 * is pre-calculated during the schedule initialization.
+	if (!oper)
+		switch_schedules(q, &admin, &oper);
+
+	/* This can happen in two cases: 1. this is the very first run
+	 * of this function (i.e. we weren't running any schedule
+	 * previously); 2. the previous schedule just ended. The first
+	 * entry of each schedule is pre-calculated during the
+	 * schedule initialization.
 	 */
-	if (unlikely(!entry)) {
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (unlikely(!entry || entry->close_time == oper->base_time)) {
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
 		close_time = next->close_time;
 		goto first_run;
 	}
 
-	if (should_restart_cycle(q, entry))
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (should_restart_cycle(oper, entry))
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
 	else
 		next = list_next_entry(entry, list);
 
 	close_time = ktime_add_ns(entry->close_time, next->interval);
 
+	if (should_change_schedules(admin, oper, close_time)) {
+		/* Set things so the next time this runs, the new
+		 * schedule runs.
+		 */
+		close_time = sched_base_time(admin);
+		switch_schedules(q, &admin, &oper);
+	}
+
 	next->close_time = close_time;
 	atomic_set(&next->budget,
 		   (next->interval * 1000) / q->picos_per_byte);
@@ -322,70 +405,8 @@ static int parse_sched_entry(struct nlattr *n, struct sched_entry *entry,
 	return fill_sched_entry(tb, entry, extack);
 }
 
-/* Returns the number of entries in case of success */
-static int parse_sched_single_entry(struct nlattr *n,
-				    struct taprio_sched *q,
-				    struct netlink_ext_ack *extack)
-{
-	struct nlattr *tb_entry[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = { };
-	struct nlattr *tb_list[TCA_TAPRIO_SCHED_MAX + 1] = { };
-	struct sched_entry *entry;
-	bool found = false;
-	u32 index;
-	int err;
-
-	err = nla_parse_nested(tb_list, TCA_TAPRIO_SCHED_MAX,
-			       n, entry_list_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_list[TCA_TAPRIO_SCHED_ENTRY]) {
-		NL_SET_ERR_MSG(extack, "Single-entry must include an entry");
-		return -EINVAL;
-	}
-
-	err = nla_parse_nested(tb_entry, TCA_TAPRIO_SCHED_ENTRY_MAX,
-			       tb_list[TCA_TAPRIO_SCHED_ENTRY],
-			       entry_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]) {
-		NL_SET_ERR_MSG(extack, "Entry must specify an index\n");
-		return -EINVAL;
-	}
-
-	index = nla_get_u32(tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]);
-	if (index >= q->num_entries) {
-		NL_SET_ERR_MSG(extack, "Index for single entry exceeds number of entries in schedule");
-		return -EINVAL;
-	}
-
-	list_for_each_entry(entry, &q->entries, list) {
-		if (entry->index == index) {
-			found = true;
-			break;
-		}
-	}
-
-	if (!found) {
-		NL_SET_ERR_MSG(extack, "Could not find entry");
-		return -ENOENT;
-	}
-
-	err = fill_sched_entry(tb_entry, entry, extack);
-	if (err < 0)
-		return err;
-
-	return q->num_entries;
-}
-
 static int parse_sched_list(struct nlattr *list,
-			    struct taprio_sched *q,
+			    struct sched_gate_list *sched,
 			    struct netlink_ext_ack *extack)
 {
 	struct nlattr *n;
@@ -415,64 +436,36 @@ static int parse_sched_list(struct nlattr *list,
 			return err;
 		}
 
-		list_add_tail(&entry->list, &q->entries);
+		list_add_tail(&entry->list, &sched->entries);
 		i++;
 	}
 
-	q->num_entries = i;
+	sched->num_entries = i;
 
 	return i;
 }
 
-/* Returns the number of entries in case of success */
-static int parse_taprio_opt(struct nlattr **tb, struct taprio_sched *q,
-			    struct netlink_ext_ack *extack)
+static int parse_taprio_schedule(struct nlattr **tb,
+				 struct sched_gate_list *new,
+				 struct netlink_ext_ack *extack)
 {
 	int err = 0;
-	int clockid;
-
-	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST] &&
-	    tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		return -EINVAL;
-
-	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] && q->num_entries == 0)
-		return -EINVAL;
 
-	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID])
-		return -EINVAL;
+	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]) {
+		NL_SET_ERR_MSG(extack, "Adding a single entry is not supported");
+		return -ENOTSUPP;
+	}
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
-		q->base_time = nla_get_s64(
-			tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
-
-	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
-		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
-
-		/* We only support static clockids and we don't allow
-		 * for it to be modified after the first init.
-		 */
-		if (clockid < 0 || (q->clockid != -1 && q->clockid != clockid))
-			return -EINVAL;
-
-		q->clockid = clockid;
-	}
+		new->base_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST])
 		err = parse_sched_list(
-			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], q, extack);
-	else if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		err = parse_sched_single_entry(
-			tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY], q, extack);
-
-	/* parse_sched_* return the number of entries in the schedule,
-	 * a schedule with zero entries is an error.
-	 */
-	if (err == 0) {
-		NL_SET_ERR_MSG(extack, "The schedule should contain at least one entry");
-		return -EINVAL;
-	}
+			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], new, extack);
+	if (err < 0)
+		return err;
 
-	return err;
+	return 0;
 }
 
 static int taprio_parse_mqprio_opt(struct net_device *dev,
@@ -481,11 +474,17 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 {
 	int i, j;
 
-	if (!qopt) {
+	if (!qopt && !dev->num_tc) {
 		NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary");
 		return -EINVAL;
 	}
 
+	/* If num_tc is already set, it means that the user already
+	 * configured the mqprio part
+	 */
+	if (dev->num_tc)
+		return 0;
+
 	/* Verify num_tc is not out of max range */
 	if (qopt->num_tc > TC_MAX_QUEUE) {
 		NL_SET_ERR_MSG(extack, "Number of traffic classes is outside valid range");
@@ -531,19 +530,20 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 	return 0;
 }
 
-static ktime_t taprio_get_start_time(struct Qdisc *sch)
+static ktime_t taprio_get_start_time(struct Qdisc *sch,
+				     struct sched_gate_list *sched)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct sched_entry *entry;
 	ktime_t now, base, cycle;
 	s64 n;
 
-	base = ns_to_ktime(q->base_time);
+	base = ns_to_ktime(sched->base_time);
 	cycle = 0;
 
 	/* Calculate the cycle_time, by summing all the intervals.
 	 */
-	list_for_each_entry(entry, &q->entries, list)
+	list_for_each_entry(entry, &sched->entries, list)
 		cycle = ktime_add_ns(cycle, entry->interval);
 
 	if (!cycle)
@@ -562,23 +562,34 @@ static ktime_t taprio_get_start_time(struct Qdisc *sch)
 	return ktime_add_ns(base, (n + 1) * cycle);
 }
 
-static void taprio_start_sched(struct Qdisc *sch, ktime_t start)
+static void setup_first_close_time(struct taprio_sched *q,
+				   struct sched_gate_list *sched, ktime_t base)
 {
-	struct taprio_sched *q = qdisc_priv(sch);
 	struct sched_entry *first;
-	unsigned long flags;
 
-	spin_lock_irqsave(&q->current_entry_lock, flags);
-
-	first = list_first_entry(&q->entries, struct sched_entry,
-				 list);
+	first = list_first_entry(&sched->entries,
+				 struct sched_entry, list);
 
-	first->close_time = ktime_add_ns(start, first->interval);
+	first->close_time = ktime_add_ns(base, first->interval);
 	atomic_set(&first->budget,
 		   (first->interval * 1000) / q->picos_per_byte);
-	rcu_assign_pointer(q->current_entry, NULL);
+}
 
-	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+static void taprio_start_sched(struct Qdisc *sch,
+			       ktime_t start, struct sched_gate_list *new)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	ktime_t expires;
+
+	expires = hrtimer_get_expires(&q->advance_timer);
+	if (expires == KTIME_MAX)
+		rcu_assign_pointer(q->current_entry, NULL);
+
+	/* If the new schedule starts before the next expiration, we
+	 * reprogram it to the earliest one, so we change the admin
+	 * schedule to the operational one at the right time.
+	 */
+	start = min_t(ktime_t, start, expires);
 
 	hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
 }
@@ -587,11 +598,13 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			 struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_TAPRIO_ATTR_MAX + 1] = { };
+	struct sched_gate_list *oper, *admin, *new_admin;
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
 	struct tc_mqprio_qopt *mqprio = NULL;
 	struct ethtool_link_ksettings ecmd;
-	int i, err, size;
+	int i, err, clockid;
+	unsigned long flags;
 	s64 link_speed;
 	ktime_t start;
 
@@ -600,7 +613,6 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (err < 0)
 		return err;
 
-	err = -EINVAL;
 	if (tb[TCA_TAPRIO_ATTR_PRIOMAP])
 		mqprio = nla_data(tb[TCA_TAPRIO_ATTR_PRIOMAP]);
 
@@ -608,48 +620,64 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (err < 0)
 		return err;
 
-	/* A schedule with less than one entry is an error */
-	size = parse_taprio_opt(tb, q, extack);
-	if (size < 0)
-		return size;
 
-	hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
-	q->advance_timer.function = advance_sched;
 
-	switch (q->clockid) {
-	case CLOCK_REALTIME:
-		q->get_time = ktime_get_real;
-		break;
-	case CLOCK_MONOTONIC:
-		q->get_time = ktime_get;
-		break;
-	case CLOCK_BOOTTIME:
-		q->get_time = ktime_get_boottime;
-		break;
-	case CLOCK_TAI:
-		q->get_time = ktime_get_clocktai;
-		break;
-	default:
-		return -ENOTSUPP;
+	new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL);
+	if (!new_admin) {
+		NL_SET_ERR_MSG(extack, "Not enough memory for a new schedule");
+		return -ENOMEM;
 	}
+	INIT_LIST_HEAD(&new_admin->entries);
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *dev_queue;
-		struct Qdisc *qdisc;
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+	rcu_read_unlock();
 
-		dev_queue = netdev_get_tx_queue(dev, i);
-		qdisc = qdisc_create_dflt(dev_queue,
-					  &pfifo_qdisc_ops,
-					  TC_H_MAKE(TC_H_MAJ(sch->handle),
-						    TC_H_MIN(i + 1)),
-					  extack);
-		if (!qdisc)
-			return -ENOMEM;
+	if (mqprio && (oper || admin)) {
+		NL_SET_ERR_MSG(extack, "Changing the traffic mapping of a running schedule is not supported");
+		err = -ENOTSUPP;
+		goto free_sched;
+	}
 
-		if (i < dev->real_num_tx_queues)
-			qdisc_hash_add(qdisc, false);
+	err = parse_taprio_schedule(tb, new_admin, extack);
+	if (err < 0)
+		goto free_sched;
 
-		q->qdiscs[i] = qdisc;
+	if (new_admin->num_entries == 0) {
+		NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule");
+		err = -EINVAL;
+		goto free_sched;
+	}
+
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
+
+		/* We only support static clockids and we don't allow
+		 * for it to be modified after the first init.
+		 */
+		if (clockid < 0 ||
+		    (q->clockid != -1 && q->clockid != clockid)) {
+			NL_SET_ERR_MSG(extack, "Changing the 'clockid' of a running schedule is not supported");
+			err = -ENOTSUPP;
+			goto free_sched;
+		}
+
+		q->clockid = clockid;
+	}
+
+	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		NL_SET_ERR_MSG(extack, "Specifying a 'clockid' is mandatory");
+		err = -EINVAL;
+		goto free_sched;
+	}
+
+	/* Protects against enqueue()/dequeue() */
+	spin_lock_bh(qdisc_lock(sch));
+
+	if (!hrtimer_active(&q->advance_timer)) {
+		hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
+		q->advance_timer.function = advance_sched;
 	}
 
 	if (mqprio) {
@@ -665,6 +693,25 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 					       mqprio->prio_tc_map[i]);
 	}
 
+	switch (q->clockid) {
+	case CLOCK_REALTIME:
+		q->get_time = ktime_get_real;
+		break;
+	case CLOCK_MONOTONIC:
+		q->get_time = ktime_get;
+		break;
+	case CLOCK_BOOTTIME:
+		q->get_time = ktime_get_boottime;
+		break;
+	case CLOCK_TAI:
+		q->get_time = ktime_get_clocktai;
+		break;
+	default:
+		NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+		err = -EINVAL;
+		goto unlock;
+	}
+
 	if (!__ethtool_get_link_ksettings(dev, &ecmd))
 		link_speed = ecmd.base.speed;
 	else
@@ -673,20 +720,41 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	q->picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
 				      link_speed * 1000 * 1000);
 
-	start = taprio_get_start_time(sch);
-	if (!start)
-		return 0;
+	start = taprio_get_start_time(sch, new_admin);
+	if (!start) {
+		err = 0;
+		goto unlock;
+	}
 
-	taprio_start_sched(sch, start);
+	setup_first_close_time(q, new_admin, start);
 
-	return 0;
+	/* Protects against advance_sched() */
+	spin_lock_irqsave(&q->current_entry_lock, flags);
+
+	taprio_start_sched(sch, start, new_admin);
+
+	rcu_assign_pointer(q->admin_sched, new_admin);
+	if (admin)
+		call_rcu(&admin->rcu, taprio_free_sched_cb);
+	new_admin = NULL;
+
+	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+
+	err = 0;
+
+unlock:
+	spin_unlock_bh(qdisc_lock(sch));
+
+free_sched:
+	kfree(new_admin);
+
+	return err;
 }
 
 static void taprio_destroy(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
-	struct sched_entry *entry, *n;
 	unsigned int i;
 
 	hrtimer_cancel(&q->advance_timer);
@@ -701,10 +769,11 @@ static void taprio_destroy(struct Qdisc *sch)
 
 	netdev_set_num_tc(dev, 0);
 
-	list_for_each_entry_safe(entry, n, &q->entries, list) {
-		list_del(&entry->list);
-		kfree(entry);
-	}
+	if (q->oper_sched)
+		call_rcu(&q->oper_sched->rcu, taprio_free_sched_cb);
+
+	if (q->admin_sched)
+		call_rcu(&q->admin_sched->rcu, taprio_free_sched_cb);
 }
 
 static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
@@ -712,12 +781,12 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	int i;
 
-	INIT_LIST_HEAD(&q->entries);
 	spin_lock_init(&q->current_entry_lock);
 
-	/* We may overwrite the configuration later */
 	hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
+	q->advance_timer.function = advance_sched;
 
 	q->root = sch;
 
@@ -743,6 +812,25 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		return -EINVAL;
 
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *dev_queue;
+		struct Qdisc *qdisc;
+
+		dev_queue = netdev_get_tx_queue(dev, i);
+		qdisc = qdisc_create_dflt(dev_queue,
+					  &pfifo_qdisc_ops,
+					  TC_H_MAKE(TC_H_MAJ(sch->handle),
+						    TC_H_MIN(i + 1)),
+					  extack);
+		if (!qdisc)
+			return -ENOMEM;
+
+		if (i < dev->real_num_tx_queues)
+			qdisc_hash_add(qdisc, false);
+
+		q->qdiscs[i] = qdisc;
+	}
+
 	return taprio_change(sch, opt, extack);
 }
 
@@ -814,15 +902,46 @@ static int dump_entry(struct sk_buff *msg,
 	return -1;
 }
 
+static int dump_schedule(struct sk_buff *msg,
+			 const struct sched_gate_list *root)
+{
+	struct nlattr *entry_list;
+	struct sched_entry *entry;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
+			root->base_time, TCA_TAPRIO_PAD))
+		return -1;
+
+	entry_list = nla_nest_start(msg, TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
+	if (!entry_list)
+		goto error_nest;
+
+	list_for_each_entry(entry, &root->entries, list) {
+		if (dump_entry(msg, entry) < 0)
+			goto error_nest;
+	}
+
+	nla_nest_end(msg, entry_list);
+	return 0;
+
+error_nest:
+	nla_nest_cancel(msg, entry_list);
+	return -1;
+}
+
 static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sched_gate_list *oper, *admin;
 	struct tc_mqprio_qopt opt = { 0 };
-	struct nlattr *nest, *entry_list;
-	struct sched_entry *entry;
+	struct nlattr *nest, *sched_nest;
 	unsigned int i;
 
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+
 	opt.num_tc = netdev_get_num_tc(dev);
 	memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
 
@@ -833,34 +952,41 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (!nest)
-		return -ENOSPC;
+		goto start_error;
 
 	if (nla_put(skb, TCA_TAPRIO_ATTR_PRIOMAP, sizeof(opt), &opt))
 		goto options_error;
 
-	if (nla_put_s64(skb, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
-			q->base_time, TCA_TAPRIO_PAD))
-		goto options_error;
-
 	if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
 		goto options_error;
 
-	entry_list = nla_nest_start(skb, TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
-	if (!entry_list)
+	if (oper && dump_schedule(skb, oper))
 		goto options_error;
 
-	list_for_each_entry(entry, &q->entries, list) {
-		if (dump_entry(skb, entry) < 0)
-			goto options_error;
-	}
+	if (!admin)
+		goto done;
+
+	sched_nest = nla_nest_start(skb, TCA_TAPRIO_ATTR_ADMIN_SCHED);
 
-	nla_nest_end(skb, entry_list);
+	if (dump_schedule(skb, admin))
+		goto admin_error;
+
+	nla_nest_end(skb, sched_nest);
+
+done:
+	rcu_read_unlock();
 
 	return nla_nest_end(skb, nest);
 
+admin_error:
+	nla_nest_cancel(skb, sched_nest);
+
 options_error:
 	nla_nest_cancel(skb, nest);
-	return -1;
+
+start_error:
+	rcu_read_unlock();
+	return -ENOSPC;
 }
 
 static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
@@ -947,6 +1073,7 @@ static struct Qdisc_ops taprio_qdisc_ops __read_mostly = {
 	.id		= "taprio",
 	.priv_size	= sizeof(struct taprio_sched),
 	.init		= taprio_init,
+	.change		= taprio_change,
 	.destroy	= taprio_destroy,
 	.peek		= taprio_peek,
 	.dequeue	= taprio_dequeue,
-- 
2.21.0



* [RFC net-next v1 3/6] taprio: Add support for setting the cycle-time manually
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 1/6] taprio: Fix potencial use of invalid memory during dequeue() Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 2/6] taprio: Add support adding an admin schedule Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 4/6] taprio: Add support for cycle-time-extension Vinicius Costa Gomes
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

IEEE 802.1Q-2018 defines that the cycle-time of a schedule may be
overridden, so the schedule is truncated to a determined "width".
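
As a sketch of the semantics described above (the helper name here is
illustrative; the kernel implementation is get_cycle_time() in
net/sched/sch_taprio.c): the effective cycle is the sum of the entry
intervals unless an explicit cycle-time is given, in which case the
schedule is truncated to that width.

```python
# Illustrative sketch of the cycle-time semantics; names are
# hypothetical, not the kernel's.

def effective_cycle_time(intervals, cycle_time=0):
    """Return the schedule width in nanoseconds.

    If cycle_time is non-zero it overrides (truncates) the schedule;
    otherwise the cycle is the sum of all entry intervals.
    """
    if cycle_time != 0:
        return cycle_time
    return sum(intervals)

# A 300us + 300us + 400us schedule has a natural 1ms cycle...
assert effective_cycle_time([300000, 300000, 400000]) == 1000000
# ...but an explicit cycle-time of 800us truncates it.
assert effective_cycle_time([300000, 300000, 400000], 800000) == 800000
```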

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_taprio.c         | 56 ++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index d59770d0eb84..7a32276838e1 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1167,6 +1167,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
 	TCA_TAPRIO_PAD,
 	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index af12353232a5..70bba9782449 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -42,6 +42,8 @@ struct sched_gate_list {
 	struct rcu_head rcu;
 	struct list_head entries;
 	size_t num_entries;
+	ktime_t cycle_close_time;
+	s64 cycle_time;
 	s64 base_time;
 };
 
@@ -250,7 +252,13 @@ static bool should_restart_cycle(const struct sched_gate_list *oper,
 {
 	WARN_ON(!entry);
 
-	return list_is_last(&entry->list, &oper->entries);
+	if (list_is_last(&entry->list, &oper->entries))
+		return true;
+
+	if (ktime_compare(entry->close_time, oper->cycle_close_time) == 0)
+		return true;
+
+	return false;
 }
 
 static bool should_change_schedules(const struct sched_gate_list *admin,
@@ -306,13 +314,17 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 		goto first_run;
 	}
 
-	if (should_restart_cycle(oper, entry))
+	if (should_restart_cycle(oper, entry)) {
 		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
-	else
+		oper->cycle_close_time = ktime_add_ns(oper->cycle_close_time,
+						      oper->cycle_time);
+	} else {
 		next = list_next_entry(entry, list);
+	}
 
 	close_time = ktime_add_ns(entry->close_time, next->interval);
+	close_time = min_t(ktime_t, close_time, oper->cycle_close_time);
 
 	if (should_change_schedules(admin, oper, close_time)) {
 		/* Set things so the next time this runs, the new
@@ -358,6 +370,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]      = { .type = NLA_S64 },
 	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]   = { .type = NLA_NESTED },
 	[TCA_TAPRIO_ATTR_SCHED_CLOCKID]        = { .type = NLA_S32 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]     = { .type = NLA_S64 },
 };
 
 static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
@@ -459,6 +472,9 @@ static int parse_taprio_schedule(struct nlattr **tb,
 	if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
 		new->base_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
 
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
+		new->cycle_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
+
 	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST])
 		err = parse_sched_list(
 			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], new, extack);
@@ -530,22 +546,32 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 	return 0;
 }
 
+static ktime_t get_cycle_time(struct sched_gate_list *sched)
+{
+	struct sched_entry *entry;
+	ktime_t cycle = 0;
+
+	if (sched->cycle_time != 0)
+		return sched->cycle_time;
+
+	list_for_each_entry(entry, &sched->entries, list)
+		cycle = ktime_add_ns(cycle, entry->interval);
+
+	sched->cycle_time = cycle;
+
+	return cycle;
+}
+
 static ktime_t taprio_get_start_time(struct Qdisc *sch,
 				     struct sched_gate_list *sched)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
-	struct sched_entry *entry;
 	ktime_t now, base, cycle;
 	s64 n;
 
 	base = ns_to_ktime(sched->base_time);
-	cycle = 0;
-
-	/* Calculate the cycle_time, by summing all the intervals.
-	 */
-	list_for_each_entry(entry, &sched->entries, list)
-		cycle = ktime_add_ns(cycle, entry->interval);
 
+	cycle = get_cycle_time(sched);
 	if (!cycle)
 		return base;
 
@@ -566,10 +592,16 @@ static void setup_first_close_time(struct taprio_sched *q,
 				   struct sched_gate_list *sched, ktime_t base)
 {
 	struct sched_entry *first;
+	ktime_t cycle;
 
 	first = list_first_entry(&sched->entries,
 				 struct sched_entry, list);
 
+	cycle = get_cycle_time(sched);
+
+	/* FIXME: find a better place to do this */
+	sched->cycle_close_time = ktime_add_ns(base, cycle);
+
 	first->close_time = ktime_add_ns(base, first->interval);
 	atomic_set(&first->budget,
 		   (first->interval * 1000) / q->picos_per_byte);
@@ -912,6 +944,10 @@ static int dump_schedule(struct sk_buff *msg,
 			root->base_time, TCA_TAPRIO_PAD))
 		return -1;
 
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME,
+			root->cycle_time, TCA_TAPRIO_PAD))
+		return -1;
+
 	entry_list = nla_nest_start(msg, TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
 	if (!entry_list)
 		goto error_nest;
-- 
2.21.0



* [RFC net-next v1 4/6] taprio: Add support for cycle-time-extension
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
                   ` (2 preceding siblings ...)
  2019-04-10  0:33 ` [RFC net-next v1 3/6] taprio: Add support for setting the cycle-time manually Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 5/6] taprio: Add support for frame-preemption Vinicius Costa Gomes
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

IEEE 802.1Q-2018 defines the concept of a cycle-time-extension: the
last entry of a schedule before the start of a new schedule can be
extended, so that "too-short" entries are avoided.
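
A sketch of the extension decision (function and parameter names are
hypothetical; the kernel performs this check in
should_change_schedules()): the current schedule is changed either when
the new schedule's base-time has been reached, or when the allowed
extension would reach past that base-time.

```python
# Illustrative sketch of the cycle-time-extension check; all times are
# in the same unit (nanoseconds in the kernel).

def should_change(next_base_time, close_time, cycle_time_extension):
    # New schedule's base-time already reached at this entry's close.
    if next_base_time <= close_time:
        return True
    # Extension case: the close time plus the allowed extension would
    # fall after the next schedule's base-time, so the current entry
    # may be extended up to the base-time instead of running short.
    return next_base_time <= close_time + cycle_time_extension

assert should_change(100, 90, 0) is False   # no extension, not yet due
assert should_change(100, 90, 20) is True   # extension covers the gap
assert should_change(100, 100, 0) is True   # base-time already reached
```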

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_taprio.c         | 35 ++++++++++++++++++++++++++++------
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 7a32276838e1..8b2f993cbb77 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1168,6 +1168,7 @@ enum {
 	TCA_TAPRIO_PAD,
 	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 70bba9782449..3807aacde26b 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -44,6 +44,7 @@ struct sched_gate_list {
 	size_t num_entries;
 	ktime_t cycle_close_time;
 	s64 cycle_time;
+	s64 cycle_time_extension;
 	s64 base_time;
 };
 
@@ -265,7 +266,7 @@ static bool should_change_schedules(const struct sched_gate_list *admin,
 				    const struct sched_gate_list *oper,
 				    ktime_t close_time)
 {
-	ktime_t next_base_time;
+	ktime_t next_base_time, extension_time;
 
 	if (!admin)
 		return false;
@@ -278,6 +279,20 @@ static bool should_change_schedules(const struct sched_gate_list *admin,
 	if (ktime_compare(next_base_time, close_time) <= 0)
 		return true;
 
+	/* This is the cycle_time_extension case, if the close_time
+	 * plus the amount that can be extended would fall after the
+	 * next schedule base_time, we can extend the current schedule
+	 * for that amount.
+	 */
+	extension_time = ktime_add_ns(close_time, oper->cycle_time_extension);
+
+	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
+	 * how precisely the extension should be made. So after
+	 * conformance testing, this logic may change.
+	 */
+	if (ktime_compare(next_base_time, extension_time) <= 0)
+		return true;
+
 	return false;
 }
 
@@ -366,11 +381,12 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_PRIOMAP]	       = {
 		.len = sizeof(struct tc_mqprio_qopt)
 	},
-	[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]     = { .type = NLA_NESTED },
-	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]      = { .type = NLA_S64 },
-	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]   = { .type = NLA_NESTED },
-	[TCA_TAPRIO_ATTR_SCHED_CLOCKID]        = { .type = NLA_S32 },
-	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]     = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]           = { .type = NLA_NESTED },
+	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]            = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]         = { .type = NLA_NESTED },
+	[TCA_TAPRIO_ATTR_SCHED_CLOCKID]              = { .type = NLA_S32 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]           = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
 };
 
 static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
@@ -472,6 +488,9 @@ static int parse_taprio_schedule(struct nlattr **tb,
 	if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
 		new->base_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
 
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION])
+		new->cycle_time_extension = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION]);
+
 	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
 		new->cycle_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
 
@@ -948,6 +967,10 @@ static int dump_schedule(struct sk_buff *msg,
 			root->cycle_time, TCA_TAPRIO_PAD))
 		return -1;
 
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION,
+			root->cycle_time_extension, TCA_TAPRIO_PAD))
+		return -1;
+
 	entry_list = nla_nest_start(msg, TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
 	if (!entry_list)
 		goto error_nest;
-- 
2.21.0



* [RFC net-next v1 5/6] taprio: Add support for frame-preemption
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
                   ` (3 preceding siblings ...)
  2019-04-10  0:33 ` [RFC net-next v1 4/6] taprio: Add support for cycle-time-extension Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-04-10  0:33 ` [RFC net-next v1 6/6] taprio: Add support for hardware offloading Vinicius Costa Gomes
  2019-05-13 19:40 ` [RFC net-next v1 0/6] RFC: taprio change schedules + offload Murali Karicheri
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

Frame preemption can be used to further reduce the latency of network
communications by allowing some kinds of traffic to be preempted by
higher-priority ones. This is a hardware-only feature.

Frame preemption is defined in relation to transmission queues: if the
nth bit of the frame-preemption mask is set, traffic going through the
nth TX queue can be preempted by higher-priority queues.

This only takes effect when offloading is enabled.
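
The mask semantics can be sketched as follows (the helper is
hypothetical, for illustration only): bit n set in the mask means TX
queue n is preemptible.

```python
# Illustrative sketch of the frame-preemption mask semantics.

def is_queue_preemptible(frame_preemption_mask, queue):
    """True if traffic on 'queue' may be preempted by higher priorities."""
    return bool(frame_preemption_mask & (1 << queue))

# Mask 0b0110: queues 1 and 2 are preemptible, queues 0 and 3 are not.
assert is_queue_preemptible(0b0110, 1)
assert is_queue_preemptible(0b0110, 2)
assert not is_queue_preemptible(0b0110, 0)
assert not is_queue_preemptible(0b0110, 3)
```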

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/uapi/linux/pkt_sched.h | 1 +
 net/sched/sch_taprio.c         | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 8b2f993cbb77..a04df9a76864 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1169,6 +1169,7 @@ enum {
 	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
+	TCA_TAPRIO_ATTR_FRAME_PREEMPTION, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 3807aacde26b..0a815700c9cc 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -46,6 +46,7 @@ struct sched_gate_list {
 	s64 cycle_time;
 	s64 cycle_time_extension;
 	s64 base_time;
+	u32 frame_preemption;
 };
 
 struct taprio_sched {
@@ -387,6 +388,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_SCHED_CLOCKID]              = { .type = NLA_S32 },
 	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]           = { .type = NLA_S64 },
 	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_FRAME_PREEMPTION]           = { .type = NLA_U32 },
 };
 
 static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
@@ -494,6 +496,9 @@ static int parse_taprio_schedule(struct nlattr **tb,
 	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
 		new->cycle_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
 
+	if (tb[TCA_TAPRIO_ATTR_FRAME_PREEMPTION])
+		new->frame_preemption = nla_get_u32(tb[TCA_TAPRIO_ATTR_FRAME_PREEMPTION]);
+
 	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST])
 		err = parse_sched_list(
 			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], new, extack);
@@ -971,6 +976,10 @@ static int dump_schedule(struct sk_buff *msg,
 			root->cycle_time_extension, TCA_TAPRIO_PAD))
 		return -1;
 
+	if (nla_put_u32(msg, TCA_TAPRIO_ATTR_FRAME_PREEMPTION,
+			root->frame_preemption))
+		return -1;
+
 	entry_list = nla_nest_start(msg, TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
 	if (!entry_list)
 		goto error_nest;
-- 
2.21.0



* [RFC net-next v1 6/6] taprio: Add support for hardware offloading
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
                   ` (4 preceding siblings ...)
  2019-04-10  0:33 ` [RFC net-next v1 5/6] taprio: Add support for frame-preemption Vinicius Costa Gomes
@ 2019-04-10  0:33 ` Vinicius Costa Gomes
  2019-05-13 19:40 ` [RFC net-next v1 0/6] RFC: taprio change schedules + offload Murali Karicheri
  6 siblings, 0 replies; 8+ messages in thread
From: Vinicius Costa Gomes @ 2019-04-10  0:33 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, olteanv,
	timo.koskiahde, m-karicheri2

This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.

The important detail here is the difference between the gate_mask in
taprio (which refers to traffic classes) and the gate_mask passed to
the network driver. For the driver, each bit in gate_mask references a
transmission queue: bit 0 for queue 0, bit 1 for queue 1, and so on.
This is done so the driver doesn't need to know about traffic classes.
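
The translation can be sketched like this, mirroring
tc_mask_to_queue_mask() in the patch (the Python names are
illustrative; offsets[i] and counts[i] correspond to the mqprio
"queues count@offset" mapping):

```python
# Illustrative sketch of the traffic-class-to-queue mask translation.

def tc_mask_to_queue_mask(tc_mask, offsets, counts):
    queue_mask = 0
    for i, (offset, count) in enumerate(zip(offsets, counts)):
        if not (tc_mask & (1 << i)):
            continue
        # Equivalent of GENMASK(offset + count - 1, offset):
        # 'count' contiguous bits starting at 'offset'.
        queue_mask |= ((1 << count) - 1) << offset
    return queue_mask

# With "queues 1@0 1@1 2@2" (as in the cover letter example), traffic
# class 2 maps to queues 2 and 3:
assert tc_mask_to_queue_mask(0b100, [0, 1, 2], [1, 1, 2]) == 0b1100
# A gate mask covering classes 0 and 1 maps to queues 0 and 1:
assert tc_mask_to_queue_mask(0b011, [0, 1, 2], [1, 1, 2]) == 0b0011
```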

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h      |   1 +
 include/net/pkt_sched.h        |  20 +++
 include/uapi/linux/pkt_sched.h |   4 +
 net/sched/sch_taprio.c         | 264 ++++++++++++++++++++++++++++++++-
 4 files changed, 286 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 521eb869555e..cf7804f75b58 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -849,6 +849,7 @@ enum tc_setup_type {
 	TC_SETUP_QDISC_ETF,
 	TC_SETUP_ROOT_QDISC,
 	TC_SETUP_QDISC_GRED,
+	TC_SETUP_QDISC_TAPRIO,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index a16fbe9a2a67..6f6eab09df08 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -161,4 +161,24 @@ struct tc_etf_qopt_offload {
 	s32 queue;
 };
 
+struct tc_taprio_sched_entry {
+	u8 command; /* TC_TAPRIO_CMD_* */
+
+	/* The gate_mask in the offloading side refers to HW queues */
+	u32 gate_mask;
+	u32 interval;
+};
+
+struct tc_taprio_qopt_offload {
+	u8 enable;
+	ktime_t base_time;
+	u64 cycle_time;
+	u64 cycle_time_extension;
+
+	/* bit nth being set means that the nth queue is preemptible */
+	u32 frame_preemption_queue_mask;
+	size_t num_entries;
+	struct tc_taprio_sched_entry entries[0];
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index a04df9a76864..ac9f7eab5aaa 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1158,6 +1158,9 @@ enum {
  *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
  */
 
+#define TCA_TAPRIO_ATTR_OFFLOAD_FLAG_FULL_OFFLOAD 0x1
+#define TCA_TAPRIO_ATTR_OFFLOAD_FLAG_TXTIME_OFFLOAD 0x2
+
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
 	TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
@@ -1170,6 +1173,7 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	TCA_TAPRIO_ATTR_FRAME_PREEMPTION, /* u32 */
+	TCA_TAPRIO_ATTR_OFFLOAD_FLAGS, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 0a815700c9cc..672641eb24d4 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -22,6 +22,9 @@
 #include <net/sch_generic.h>
 
 #define TAPRIO_ALL_GATES_OPEN -1
+#define FULL_OFFLOAD_IS_ON(flags) ((flags) & TCA_TAPRIO_ATTR_OFFLOAD_FLAG_FULL_OFFLOAD)
+#define TXTIME_OFFLOAD_IS_ON(flags) ((flags) & TCA_TAPRIO_ATTR_OFFLOAD_FLAG_TXTIME_OFFLOAD)
+#define VALID_OFFLOAD(flags) ((flags) != U32_MAX)
 
 struct sched_entry {
 	struct list_head list;
@@ -52,6 +55,8 @@ struct sched_gate_list {
 struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
+	struct tc_mqprio_qopt mqprio;
+	u32 offload_flags;
 	int clockid;
 	int picos_per_byte; /* Using picoseconds because for 10Gbps+
 			     * speeds it's sub-nanoseconds per byte
@@ -63,6 +68,8 @@ struct taprio_sched {
 	struct sched_gate_list __rcu *oper_sched;
 	struct sched_gate_list __rcu *admin_sched;
 	ktime_t (*get_time)(void);
+	struct sk_buff *(*dequeue)(struct Qdisc *sch);
+	struct sk_buff *(*peek)(struct Qdisc *sch);
 	struct hrtimer advance_timer;
 };
 
@@ -123,7 +130,30 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return qdisc_enqueue(skb, child, to_free);
 }
 
-static struct sk_buff *taprio_peek(struct Qdisc *sch)
+static struct sk_buff *taprio_peek_offload(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct Qdisc *child = q->qdiscs[i];
+
+		if (unlikely(!child))
+			continue;
+
+		skb = child->ops->peek(child);
+		if (!skb)
+			continue;
+
+		return skb;
+	}
+
+	return NULL;
+}
+
+static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
@@ -164,12 +194,19 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
 	return NULL;
 }
 
+static struct sk_buff *taprio_peek(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+
+	return q->peek(sch);
+}
+
 static inline int length_to_duration(struct taprio_sched *q, int len)
 {
 	return (len * q->picos_per_byte) / 1000;
 }
 
-static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
@@ -249,6 +286,40 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 
 }
 
+static struct sk_buff *taprio_dequeue_offload(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct Qdisc *child = q->qdiscs[i];
+
+		if (unlikely(!child))
+			continue;
+
+		skb = child->ops->dequeue(child);
+		if (unlikely(!skb))
+			continue;
+
+		qdisc_bstats_update(sch, skb);
+		qdisc_qstats_backlog_dec(sch, skb);
+		sch->q.qlen--;
+
+		return skb;
+	}
+
+	return NULL;
+}
+
+static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+
+	return q->dequeue(sch);
+}
+
 static bool should_restart_cycle(const struct sched_gate_list *oper,
 				 const struct sched_entry *entry)
 {
@@ -650,6 +721,166 @@ static void taprio_start_sched(struct Qdisc *sch,
 	hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
 }
 
+static u32 tc_mask_to_queue_mask(const struct tc_mqprio_qopt *mqprio,
+				 u32 tc_mask)
+{
+	u32 i, queue_mask = 0;
+
+	for (i = 0; i < mqprio->num_tc; i++) {
+		u32 offset, count;
+
+		if (!(tc_mask & BIT(i)))
+			continue;
+
+		offset = mqprio->offset[i];
+		count = mqprio->count[i];
+
+		queue_mask |= GENMASK(offset + count - 1, offset);
+	}
+
+	return queue_mask;
+}
+
+static void taprio_sched_to_offload(struct taprio_sched *q,
+				    struct sched_gate_list *sched,
+				    struct tc_taprio_qopt_offload *taprio)
+{
+	struct sched_entry *entry;
+	int i = 0;
+
+	taprio->base_time = sched->base_time;
+	taprio->frame_preemption_queue_mask = sched->frame_preemption;
+
+	list_for_each_entry(entry, &sched->entries, list) {
+		struct tc_taprio_sched_entry *e = &taprio->entries[i];
+
+		e->command = entry->command;
+		e->interval = entry->interval;
+
+		/* We do this transformation because the NIC
+		 * has no knowledge of traffic classes, but it
+		 * knows about queues.
+		 */
+		e->gate_mask = tc_mask_to_queue_mask(&q->mqprio,
+						     entry->gate_mask);
+		i++;
+	}
+
+	taprio->num_entries = i;
+}
+
+static void taprio_disable_offload(struct net_device *dev,
+				   struct taprio_sched *q)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_taprio_qopt_offload taprio = { };
+	int err;
+
+	if (!q->offload_flags)
+		return;
+
+	if (!ops->ndo_setup_tc)
+		return;
+
+	taprio.enable = 0;
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, &taprio);
+	if (err < 0)
+		return;
+
+	/* Just to be sure to keep the function pointers in a
+	 * consistent state always.
+	 */
+	q->dequeue = taprio_dequeue_soft;
+	q->peek = taprio_peek_soft;
+
+	q->advance_timer.function = advance_sched;
+
+	q->offload_flags = 0;
+}
+
+static enum hrtimer_restart next_sched(struct hrtimer *timer)
+{
+	struct taprio_sched *q = container_of(timer, struct taprio_sched,
+					      advance_timer);
+	struct sched_gate_list *oper, *admin;
+
+	spin_lock(&q->current_entry_lock);
+	oper = rcu_dereference_protected(q->oper_sched,
+					 lockdep_is_held(&q->current_entry_lock));
+	admin = rcu_dereference_protected(q->admin_sched,
+					  lockdep_is_held(&q->current_entry_lock));
+
+	rcu_assign_pointer(q->oper_sched, admin);
+	rcu_assign_pointer(q->admin_sched, NULL);
+
+	if (oper)
+		call_rcu(&oper->rcu, taprio_free_sched_cb);
+
+	spin_unlock(&q->current_entry_lock);
+
+	return HRTIMER_NORESTART;
+}
+
+static int taprio_enable_offload(struct net_device *dev,
+				 struct tc_mqprio_qopt *mqprio,
+				 struct taprio_sched *q,
+				 struct sched_gate_list *sched,
+				 struct netlink_ext_ack *extack,
+				 u32 offload_flags)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_taprio_qopt_offload *taprio;
+	size_t size;
+	int err = 0;
+
+	if (!FULL_OFFLOAD_IS_ON(offload_flags)) {
+		NL_SET_ERR_MSG(extack, "Offload mode is not supported");
+		return -EOPNOTSUPP;
+	}
+
+	if (!ops->ndo_setup_tc) {
+		NL_SET_ERR_MSG(extack, "Specified device does not support taprio offload");
+		return -EOPNOTSUPP;
+	}
+
+	size = sizeof(*taprio) +
+		sched->num_entries * sizeof(struct tc_taprio_sched_entry);
+
+	taprio = kzalloc(size, GFP_ATOMIC);
+	if (!taprio) {
+		NL_SET_ERR_MSG(extack, "Not enough memory for enabling offload mode");
+		return -ENOMEM;
+	}
+
+	taprio->enable = 1;
+	taprio_sched_to_offload(q, sched, taprio);
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, taprio);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack, "Specified device failed to setup taprio hardware offload");
+		goto done;
+	}
+
+	q->dequeue = taprio_dequeue_offload;
+	q->peek = taprio_peek_offload;
+
+	/* This function will only serve to keep the pointers to the
+	 * "oper" and "admin" schedules valid in relation to their
+	 * base times, so when calling dump() the users looks at the
+	 * right schedules.
+	 */
+	q->advance_timer.function = next_sched;
+
+done:
+	kfree(taprio);
+
+	if (err == 0)
+		q->offload_flags = offload_flags;
+
+	return err;
+}
+
 static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			 struct netlink_ext_ack *extack)
 {
@@ -659,6 +890,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	struct net_device *dev = qdisc_dev(sch);
 	struct tc_mqprio_qopt *mqprio = NULL;
 	struct ethtool_link_ksettings ecmd;
+	u32 offload_flags = U32_MAX;
 	int i, err, clockid;
 	unsigned long flags;
 	s64 link_speed;
@@ -676,7 +908,8 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (err < 0)
 		return err;
 
-
+	if (tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS])
+		offload_flags = nla_get_u32(tb[TCA_TAPRIO_ATTR_OFFLOAD_FLAGS]);
 
 	new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL);
 	if (!new_admin) {
@@ -696,6 +929,12 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		goto free_sched;
 	}
 
+	if (offload_flags != U32_MAX && (oper || admin)) {
+		NL_SET_ERR_MSG(extack, "Changing 'offload' of a running schedule is not supported");
+		err = -ENOTSUPP;
+		goto free_sched;
+	}
+
 	err = parse_taprio_schedule(tb, new_admin, extack);
 	if (err < 0)
 		goto free_sched;
@@ -747,6 +986,8 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		for (i = 0; i < TC_BITMASK + 1; i++)
 			netdev_set_prio_tc_map(dev, i,
 					       mqprio->prio_tc_map[i]);
+
+		memcpy(&q->mqprio, mqprio, sizeof(q->mqprio));
 	}
 
 	switch (q->clockid) {
@@ -776,6 +1017,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	q->picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
 				      link_speed * 1000 * 1000);
 
+	if (!offload_flags) {
+		taprio_disable_offload(dev, q);
+	} else if (VALID_OFFLOAD(offload_flags) || q->offload_flags) {
+		err = taprio_enable_offload(dev, mqprio, q,
+					    new_admin, extack, offload_flags);
+		if (err)
+			goto unlock;
+	}
+
 	start = taprio_get_start_time(sch, new_admin);
 	if (!start) {
 		err = 0;
@@ -815,6 +1065,8 @@ static void taprio_destroy(struct Qdisc *sch)
 
 	hrtimer_cancel(&q->advance_timer);
 
+	taprio_disable_offload(dev, q);
+
 	if (q->qdiscs) {
 		for (i = 0; i < dev->num_tx_queues && q->qdiscs[i]; i++)
 			qdisc_put(q->qdiscs[i]);
@@ -844,6 +1096,9 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
 	q->advance_timer.function = advance_sched;
 
+	q->dequeue = taprio_dequeue_soft;
+	q->peek = taprio_peek_soft;
+
 	q->root = sch;
 
 	/* We only support static clockids. Use an invalid value as default
@@ -1028,6 +1283,9 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
 		goto options_error;
 
+	if (nla_put_u32(skb, TCA_TAPRIO_ATTR_OFFLOAD_FLAGS, q->offload_flags))
+		goto options_error;
+
 	if (oper && dump_schedule(skb, oper))
 		goto options_error;
 
-- 
2.21.0



* Re: [RFC net-next v1 0/6] RFC: taprio change schedules + offload
  2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
                   ` (5 preceding siblings ...)
  2019-04-10  0:33 ` [RFC net-next v1 6/6] taprio: Add support for hardware offloading Vinicius Costa Gomes
@ 2019-05-13 19:40 ` Murali Karicheri
  6 siblings, 0 replies; 8+ messages in thread
From: Murali Karicheri @ 2019-05-13 19:40 UTC (permalink / raw)
  To: Vinicius Costa Gomes, netdev
  Cc: jhs, xiyou.wangcong, jiri, olteanv, timo.koskiahde

Hi Vinicius,

On 04/09/2019 08:32 PM, Vinicius Costa Gomes wrote:
> Hi,
> 
> 
> Overview
> --------
> 
> This RFC has two objectives, it adds support for changing the running
> schedules during "runtime", explained in more detail later, and
> proposes an interface between taprio and the drivers for hardware
> offloading.
> 
> These two different features are presented together so it's clear what
> the "final state" would look like. But after the RFC stage, they can
> be proposed (and reviewed) separately.
> 
> Changing the schedules without disrupting traffic is important for
> handling dynamic use cases, for example, when streams are
> added/removed and when the network configuration changes.
> 
> Hardware offloading support allows schedules to be more precise and
> have lower resource usage.
> 
> 
> Changing schedules
> ------------------
> 
> As with the other interfaces we have proposed, we try to follow the
> concepts of the IEEE 802.1Q-2018 specification. So, for changing
> schedules, there are an "oper" (operational) and an "admin" schedule.
> The "admin" schedule is mutable and not yet in use; the "oper"
> schedule is immutable and currently in use.
> 
> That is, when the user first adds a schedule it is in the "admin"
> state, and it becomes "oper" when its base-time (basically, the time
> it starts) is reached.
> 
> What this means is that now it's possible to create taprio with a schedule:
> 
> $ tc qdisc add dev IFACE parent root handle 100 taprio \
>        num_tc 3 \
>        map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>        queues 1@0 1@1 2@2 \
>        base-time 10000000 \
>        sched-entry S 03 300000 \
>        sched-entry S 02 300000 \
>        sched-entry S 06 400000 \
>        clockid CLOCK_TAI
>        
> And then, later, after the previous schedule is "promoted" to "oper",
> add a new ("admin") schedule to be used some time later:
> 
> $ tc qdisc change dev IFACE parent root handle 100 taprio \
>        base-time 1553121866000000000 \
>        sched-entry S 02 500000 \
>        sched-entry S 0f 400000 \
>        clockid CLOCK_TAI
> 
> When enabling the ability to change schedules, it makes sense to add
> two more knobs to schedules: "cycle-time" allows truncating a cycle
> to some value, so it repeats after a well-defined interval;
> "cycle-time-extension" controls how much the last entry of a schedule
> may be extended before a schedule change, to avoid a very short cycle
> when transitioning from one schedule to another.
> 
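For illustration, a later "change" command could set both knobs explicitly (this is a hypothetical example based on the knobs described above; IFACE and the time values are placeholders):

```shell
# Hypothetical example: truncate the cycle to 1ms and allow the last
# entry to be stretched by up to 100us when a new schedule takes over.
$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time 1553121866000000000 \
      cycle-time 1000000 \
      cycle-time-extension 100000 \
      sched-entry S 02 500000 \
      sched-entry S 0f 500000 \
      clockid CLOCK_TAI
```
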
> With these, taprio in the software mode should provide a fairly
> complete implementation of what's defined in the Enhancements for
> Scheduled Traffic parts of the specification.
> 
> 
> Hardware offload
> ----------------
> 
> Some workloads require better guarantees from their schedules than
> what's provided by the software implementation. This series proposes
> an interface for configuring schedules into compatible network
> controllers.
> 
> This part is proposed together with the support for changing
> schedules because it raises questions like: should the "qdisc" side
> be responsible for providing visibility into the schedules, or should
> it be the driver?
> 
> In this proposal, the driver is called with the new schedule as soon
> as it is validated, and the "core" qdisc takes care of displaying
> (".dump()") the correct schedules at all times. This means that some
> logic would need to be duplicated in the driver if the hardware
> doesn't support multiple schedules. But as taprio doesn't have enough
> information about the underlying controller to know how far in
> advance a schedule needs to be handed to the hardware, it feels like
> a fair compromise.
> 
> The hardware offloading part of this proposal also tries to define an
> interface for frame-preemption and how it interacts with the
> scheduling of traffic, see Section 8.6.8.4 of IEEE 802.1Q-2018 for
> more information.
> 
> One important difference between the qdisc interface and the
> qdisc-driver interface is that the "gate mask" on the qdisc side
> references traffic classes (bit 0 of the gate mask means Traffic
> Class 0), while in the driver interface it references queues (bit 0
> means queue 0). That is, taprio converts the references to traffic
> classes into references to queues before sending the offloading
> request to the driver.
> 
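As a rough sketch of that conversion (not the actual kernel code; the queue layout is hardcoded from the "queues 1@0 1@1 2@2" example above):

```shell
# Sketch of the traffic-class -> queue gate mask conversion taprio
# would perform before calling the driver. The offsets/counts arrays
# mirror the "queues 1@0 1@1 2@2" example: TC0 -> queue 0,
# TC1 -> queue 1, TC2 -> queues 2 and 3.
tc_to_queue_mask() {
    local tc_mask=$(( $1 )) queue_mask=0 tc
    local offsets=(0 1 2) counts=(1 1 2)
    for tc in 0 1 2; do
        # If this traffic class's gate bit is set, open all of its
        # mapped queues in the queue-based mask.
        if (( (tc_mask >> tc) & 1 )); then
            (( queue_mask |= ((1 << counts[tc]) - 1) << offsets[tc] ))
        fi
    done
    printf '0x%02x\n' "$queue_mask"
}

tc_to_queue_mask 0x06   # TC1 + TC2 open -> queues 1, 2, 3 -> 0x0e
```
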
> 
> Request for help
> ----------------
> 
> I would like interested driver maintainers to take a look at the
> proposed interface and see whether it would be too awkward for any
> particular device. Pointers to available documentation would also be
> appreciated. The idea here is to start a discussion so we can arrive
> at an interface that works for multiple vendors.
> 
> 
> Links
> -----
> 
> kernel patches:
> https://github.com/vcgomes/net-next/tree/taprio-add-support-for-change-v2
> 
> iproute2 patches:
> https://github.com/vcgomes/iproute2/tree/taprio-add-support-for-change-v2
> 
> 
> Thank you,
> --
> Vinicius
> 
> 
> 
> Vinicius Costa Gomes (6):
>    taprio: Fix potencial use of invalid memory during dequeue()
>    taprio: Add support adding an admin schedule
>    taprio: Add support for setting the cycle-time manually
>    taprio: Add support for cycle-time-extension
>    taprio: Add support for frame-preemption
>    taprio: Add support for hardware offloading
> 
>   include/linux/netdevice.h      |   1 +
>   include/net/pkt_sched.h        |  20 +
>   include/uapi/linux/pkt_sched.h |  18 +
>   net/sched/sch_taprio.c         | 883 +++++++++++++++++++++++++--------
>   4 files changed, 711 insertions(+), 211 deletions(-)
> 
Thanks for the RFC! I was a bit tied up with my release work and
couldn't comment immediately. Let me take a look at this and also
work with my colleagues, but please bear with me as it might take a
while to get back.

Regards,

Murali


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-05-13 19:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-10  0:32 [RFC net-next v1 0/6] RFC: taprio change schedules + offload Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 1/6] taprio: Fix potencial use of invalid memory during dequeue() Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 2/6] taprio: Add support adding an admin schedule Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 3/6] taprio: Add support for setting the cycle-time manually Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 4/6] taprio: Add support for cycle-time-extension Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 5/6] taprio: Add support for frame-preemption Vinicius Costa Gomes
2019-04-10  0:33 ` [RFC net-next v1 6/6] taprio: Add support for hardware offloading Vinicius Costa Gomes
2019-05-13 19:40 ` [RFC net-next v1 0/6] RFC: taprio change schedules + offload Murali Karicheri
