[PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler

* [PATCH net-next v1 0/1] net/sched: Introduce the taprio scheduler
@ 2018-09-29  0:59 Vinicius Costa Gomes
  2018-09-29  0:59 ` [PATCH net-next v1 1/1] tc: Add support for configuring " Vinicius Costa Gomes
  2018-10-01 18:12 ` [PATCH net-next v1 0/1] net/sched: Introduce " Vinicius Costa Gomes
  0 siblings, 2 replies; 4+ messages in thread
From: Vinicius Costa Gomes @ 2018-09-29  0:59 UTC (permalink / raw)
  To: netdev
  Cc: Vinicius Costa Gomes, jesus.sanchez-palencia, henrik,
	richardcochran, jhs, xiyou.wangcong, jiri, ilias.apalodimas,
	simon.fok

Hi,

Changes from the RFC:
  - Moved some fields from the per-qdisc data structure to the per
    schedule entry one, mainly "expires" (now called "close_time",
    when an entry ends) and "budget" (how many bytes can be sent
    during an entry);

  - Removed support for the schedule file, in favour of using iproute2
    batch mode (only affects the iproute2 patches) (Jiri Pirko,
    Stephen Hemminger);

  - Removed support for manually setting a cycle-time (it will be
    added in a later series);

Original cover letter
=====================
(lightly edited, updated references and usage)

This series provides a set of interfaces that can be used by
applications that require (time-based) Scheduled Transmission of
packets. It is comprised by 3 new components to the kernel:

  - etf: the per-queue TxTime-Based scheduling qdisc;
  - taprio: the per-port Time-Aware scheduler qdisc;
  - SO_TXTIME: a socket option + cmsg APIs.

ETF and SO_TXTIME are already applied[1] into the net-next tree. This
is the remaining piece.

Overview
========

The CBS qdisc proposal RFC [2] included some rough ideas about the
design and API of a "taprio" (Time Aware Priority) qdisc. The idea of
presenting the taprio ideas at that point (almost one year ago!) was
to show our vision of how things would fit together going forward.
>From that concept stage to this (almost) realised stage the main
differences are:

  - As of now, taprio is a software only implementation of a schedule
    executor;
  - Instead of taprio centralising all the time based decisions, taprio
    can work together with ETF (the Earliest TxTime First), a qdisc
    meant to use the LaunchTime (or similar) feature of various network
    controllers;

In a nutshell, taprio is a root qdisc that can execute a pre-defined
schedule, etf is a qdisc that provides time based admission control
and "earliest deadline first" dequeue mode, and SO_TXTIME is a socket
option that is used for enabling a socket to be used for time-based
Tx and configuring its parameters.

taprio
======

This scheduler allows the network administrator to configure schedules
for classes of traffic, the configuration interface is similar to what
IEEE 802.1Q-2018 defines.

Example configuration:

$ tc qdisc add dev enp2s0 parent root handle 100 taprio \
	    num_tc 3 \
	    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
	    queues 1@0 1@1 2@2 \
	    sched-entry S 01 300000 \
	    sched-entry S 02 300000 \
	    sched-entry S 04 300000 \
	    base-time 1528743495910289987 \
	    clockid CLOCK_TAI

This qdisc borrows a few concepts from mqprio and so, most the
parameters are similar to mqprio. The main difference is the sequence of 
'sched-entry' parameters, that constitute one schedule:

	      sched-entry S 01 300000
	      sched-entry S 02 300000
	      sched-entry S 04 300000

The format of each entry is:
sched-entry <CMD> <GATE MASK> <INTERVAL>

The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Q-2018 definition (Table 8-7). <GATE MASK>
is a bit-mask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.

This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.

The other parameters can be defined as follows:
  - base-time: allows that multiple systems can have synchronised
    schedules, it specifies the instant when the schedule starts;
  - clockid: specifies the reference clock to be used;

A more complete example can be found here, with instructions of how to
test it:

https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [3]

The basic design of the scheduler is simple, after we calculate the
first expiration of the hrtimer, we set the next expiration to be the
previous plus the current entry's interval. At each time the function
runs, we set the current_entry, which has a gate_mask (that controls
which traffic classes are allowed to "go out" during each interval),
and we reuse this callback to "kick" the qdisc (this is the reason
that the usual qdisc watchdog isn't used).

Future work
===========

  - Add support for multiple schedules, so something like the Admin
    and Oper schedules from IEEE 802.1Q-2018 can be implemented,
    probably "cycle-time" will be re-implemented at this time;

  - Add support for HW offloading;

  - Add support for Frame Preemption related commands (formerly
    802.1Qbu, now part of 802.1Q);

Known Issues
============

  - As taprio is a software only implementation, and there's another
    layer of queuing in the network controller, packets can still
    leave the controller outside their "correct" windows. This happens
    mostly for low-priority classes, and only if they are 'starved' by
    the higher priority ones;

This series is also hosted on github and can be found at [4].
The companion iproute2 patches can be found at [5].

Cheers,

^ permalink raw reply	[flat|nested] 4+ messages in thread