On Wed, Mar 28, 2018 at 09:48:05AM +0200, Thomas Gleixner wrote:
> Jesus,

Thomas, Jesus,

> On Tue, 27 Mar 2018, Jesus Sanchez-Palencia wrote:
> > On 03/25/2018 04:46 AM, Thomas Gleixner wrote:
> > > This is missing right now and you want to get that right from the very
> > > beginning. Duct taping it on the interface later on is a bad idea.
> >
> > Agreed that this is needed. On the SO_TXTIME + tbs proposal, I believe
> > it's been covered by the (per-packet) SCM_DROP_IF_LATE. Do you think we
> > need a different mechanism for expressing that?
>
> Uuurgh. No. DROP_IF_LATE is just crap to be honest.
>
> There are two modes:
>
>   1) Send at the given TX time (Explicit mode)
>
>   2) Send before given TX time (Deadline mode)
>
> There is no need to specify 'drop if late' simply because if the message
> is handed in past the given TX time, it's too late by definition. What
> you are trying to implement is a hybrid of TSN and general purpose (not
> time aware) networking in one go. And you do that because your overall
> design is not looking at the big picture. You designed from a given use
> case assumption and tried to fit other things into it with duct tape.

Yes, +1 to this. The whole point of bandwidth reservation is to not drop
frames. You should never, ever miss a deadline; if you do, then your
admission tests are inadequate.

> > > So you really want a way for the application to query the timing
> > > constraints and perhaps other properties of the channel it connects
> > > to. And you want that now before the first application starts to use
> > > the new ABI. If the application developer does not use it, you still
> > > have to fix the application, but you have to fix it because the
> > > developer was a lazy bastard and not because the design was bad.
> > > That's a major difference.
> >
> > Ok, this is something that we have considered in the past, but then the
> > feedback here drove us onto a different direction.
> > The overall input we got here was that applications would have to be
> > adjusted or that userspace would have to handle the coordination between
> > applications somehow (e.g.: a daemon could be developed separately to
> > accommodate the fully dynamic use-cases, etc).
>
> The only thing which will happen is that you get applications which
> require to control the full interface themselves because they are so
> important and the only ones which get it right. Good luck with fixing
> them up.
>
> That extra daemon if it ever surfaces will be just a PITA. Think about
> 20kHz control loops. Do you really want queueing, locking, several
> context switches and priority configuration nightmares in such a
> scenario? Definitely not! You want a fast channel directly to the root
> qdisc which takes care of getting it out at the right point, which might
> be immediate handover if the adapter supports hw scheduling.
>
> > This is a new requirement for the entire discussion. If I'm not missing
> > anything, however, underutilization of the time slots is only a problem:
> >
> >   1) for the fully dynamic use-cases and;
> >   2) because now you are designing applications in terms of time slices,
> >      right?
>
> No. It's a general problem. I'm not designing applications in terms of
> time slices. Time slices are a fundamental property of TSN. Whether you
> use them for explicit scheduling or bandwidth reservation or make them
> flat does not matter.
>
> The application does not necessarily need to know about the time
> constraints at all. But if it wants to use timed scheduling then it
> better does know about them.

Yep, +1. In a lot of A/V cases here, the application will have to know
about presentation_time, and the delay through the network stack should be
"low and deterministic", but apart from that, the application shouldn't
have to care about SO_TXTIME and what other applications may or may not
do.
> > We have not thought of making any of the proposed qdiscs capable of
> > (optionally) adjusting the "time slices", but mainly because this is not
> > a problem we had here before. Our assumption was that per-port Tx
> > schedules would only be used for static systems. In other words, no, we
> > didn't think that re-balancing the slots was a requirement, not even for
> > 'taprio'.
>
> Sigh. Utilization is not something entirely new in the network space. I'm
> not saying that this needs to be implemented right away, but designing it
> in a way which forces underutilization is just wrong.
>
> > > Coming back to the overall scheme. If you start upfront with a time
> > > slice manager which is designed to:
> > >
> > >   - Handle multiple channels
> > >
> > >   - Expose the time constraints, properties per channel
> > >
> > > then you can fit all kind of use cases, whether designed by committee
> > > or not. You can configure that thing per node or network wide. It does
> > > not make a difference. The only difference are the resulting
> > > constraints.
> >
> > Ok, and I believe the above was covered by what we had proposed before,
> > unless what you meant by time constraints is beyond the configured port
> > schedule.
> >
> > Are you suggesting that we'll need to have a kernel entity that is not
> > only aware of the current traffic classes 'schedule', but also of the
> > resources that are still available for new streams to be accommodated
> > into the classes? Putting it differently, is the TAS you envision just
> > an entity that runs a schedule, or is it a time-aware 'orchestrator'?
>
> In the first place it's something which runs a defined schedule.
>
> The accommodation for new streams is required, but not necessarily at the
> root qdisc level. That might be a qdisc feeding into it.
>
> Assume you have a bandwidth reservation, aka time slot, for audio.
> If your audio related qdisc does deadline scheduling then you can add new
> streams to it up to the point where it's no longer able to fit.
>
> The only thing which might be needed at the root qdisc is the ability to
> utilize unused time slots for other purposes, but that's not required to
> be there in the first place as long as it's designed in a way that it can
> be added later on.
>
> > > So lets look once more at the picture in an abstract way:
> > >
> > >              [ NIC ]
> > >                 |
> > >      [ Time slice manager ]
> > >         |             |
> > >      [ Ch 0 ]  ...  [ Ch N ]
> > >
> > > So you have a bunch of properties here:
> > >
> > >   1) Number of Channels ranging from 1 to N
> > >
> > >   2) Start point, slice period and slice length per channel
> >
> > Ok, so we agree that a TAS entity is needed. Assuming that channels are
> > traffic classes, do you have something else in mind other than a new
> > root qdisc?
>
> Whatever you call it, the important point is that it is the gate keeper
> to the network adapter and there is no way around it. It fully controls
> the timed schedule, however simple or complex it may be.
>
> > > 3) Queueing modes assigned per channel. Again that might be anything
> > >    from 'feed through' over FIFO, PRIO to more complex things like
> > >    EDF.
> > >
> > >    The queueing mode can also influence properties like the meaning of
> > >    the TX time, i.e. strict or deadline.
> >
> > Ok, but how are the queueing modes assigned / configured per channel?
> >
> > Just to make sure we re-visit some ideas from the past:
> >
> > * TAS:
> >
> >   The idea we are currently exploring is to add a "time-aware", priority
> >   based qdisc, that also exposes the Tx queues available and provides a
> >   mechanism for mapping priority <-> traffic class <-> Tx queues in a
> >   similar fashion as mqprio.
> > We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
> >
> >   $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4 \
> >        map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
> >        queues 0 1 2 3 \
> >        sched-file gates.sched [base-time <interval>] \
> >        [cycle-time <interval>] [extension-time <interval>]
> >
> > The sched file is multi-line, with each line being of the following
> > format: <cmd> <gate mask> <interval in nanoseconds>
> >
> > Qbv only defines one <cmd>: "S" for 'SetGates'
> >
> > For example:
> >
> >   S 0x01 300
> >   S 0x03 500
> >
> > This means that there are two intervals, the first will have the gate
> > for traffic class 0 open for 300 nanoseconds, the second will have
> > both traffic classes open for 500 nanoseconds.
>
> To accommodate stuff like control systems you also need a base line,
> which is not expressed as interval. Otherwise you can't schedule network
> wide explicit plans. That's either an absolute network-time (TAI) time
> stamp or an offset to a well defined network-time (TAI) time stamp, e.g.
> start of epoch or something else which is agreed on. The actual schedule
> then fast forwards past now (TAI) and sets up the slots from there. That
> makes node hotplug possible as well.

Ok, so this is perhaps a bit of a sidetrack, but based on other
discussions in this patch-series, does it really make sense to discuss
anything *but* TAI? If you have a TSN-stream (or any other time-sensitive
way of prioritizing frames based on time), then the network is going to be
PTP-synched anyway, and all the rest of the network is going to operate on
PTP-time.

Why even bother adding CLOCK_REALTIME and CLOCK_MONOTONIC to the
discussion? Sure, use CLOCK_REALTIME locally and sync that to TAI, but the
kernel should worry about ptp-time _for_that_adapter_, and we should make
it pretty obvious to userspace that if you want to specify tx-time, then
there's this thing called 'PTP' and it rules this domain.

My $0.02 etc.

> Btw, it's not only control systems. Think about complex multi source A/V
> streams.
> They are reality in recording and live mixing, and looking at the timing
> constraints of such scenarios, collision avoidance is key there. So you
> want to be able to do network wide traffic orchestration.

Yep, and if you are too bursty, the network is free to drop your frames,
which is not desired.

> > It would handle multiple channels and expose their constraints /
> > properties. Each channel also becomes a traffic class, so other qdiscs
> > can be attached to them separately.
>
> Right.

I don't think you need a separate qdisc for each channel; if you describe
a channel with

  - period (what AVB calls observation interval)
  - max data
  - deadline

you should be able to keep a sorted rb-tree and handle that pretty
efficiently. Or perhaps I'm completely missing the mark here. If so, my
apologies.

> > So, in summary, because our entire design is based on qdisc interfaces,
> > what we had proposed was a root qdisc (the time slice manager, as you
> > put it) that allows for other qdiscs to be attached to each channel. The
> > inner qdiscs define the queueing modes for each channel, and tbs is just
> > one of those modes. I understand now that you want to allow for fully
> > dynamic use-cases to be supported as well, which we hadn't covered with
> > our TAS proposal before because we hadn't envisioned it being used for
> > these systems' design.
>
> Yes, you have the root qdisc, which is in charge of the overall
> scheduling plan; how complex or not it is defined does not matter. It
> exposes traffic classes which have properties defined by the
> configuration.
>
> The qdiscs which are attached to those traffic classes can be anything
> including:
>
>  - Simple feed through (Applications are time constraints aware and set
>    the exact schedule). qdisc has admission control.
>
>  - Deadline aware qdisc to handle e.g. A/V streams. Applications are
>    aware of time constraints and provide the packet deadline. qdisc has
>    admission control.
>    This can be a simple first come, first served scheduler or something
>    like EDF which allows optimized utilization. The qdisc sets the TX
>    time depending on the deadline and feeds into the root.

As a small nitpick, it would make more sense to do a laxity-approach here,
both for explicit mode and deadline mode. We know the size of the frame to
send and we know the outgoing rate, so keep a ready-queue sorted based on
laxity:

  laxity = absolute_deadline - (size / outgoing_rate)

Also, given that we use a *single* tx-queue for time-triggered
transmission, this boils down to a uniprocessor equivalent and we have a
lot of fun real-time scheduling academia to draw from. This could then
probably handle both of the above (Direct + deadline), but that's
implementation specific I guess.

>  - FIFO/PRIO/XXX for general traffic. Applications do not know anything
>    about timing constraints. These qdiscs obviously have neither
>    admission control nor do they set a TX time. The root qdisc just pulls
>    from there when the assigned time slot is due or if it (optionally)
>    decides to use underutilized time slots from other classes.
>
>  - .... Add your favourite scheduling mode(s).

Just give it sub-qdiscs and offload enqueue/dequeue to those, I suppose.

-- 
Henrik Austad