Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
From: Jesus Sanchez-Palencia
To: Thomas Gleixner
Cc: netdev@vger.kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com,
 jiri@resnulli.us, vinicius.gomes@intel.com, richardcochran@gmail.com,
 anna-maria@linutronix.de, henrik@austad.us, John Stultz,
 levi.pearson@harman.com, edumazet@google.com, willemb@google.com,
 mlichvar@redhat.com
References: <20180307011230.24001-1-jesus.sanchez-palencia@intel.com>
 <20180307011230.24001-14-jesus.sanchez-palencia@intel.com>
 <65da0648-b835-a171-3986-2d1ddcb8ea10@intel.com>
Message-ID: <2897b562-06e0-0fcc-4fb1-e8c4469c0faa@intel.com>
Date: Fri, 23 Mar 2018 17:34:44 -0700

Hi,

On 03/22/2018 03:52 PM, Thomas Gleixner wrote:
> On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
>> Our plan was to work directly with the Qbv-like scheduling (per-port)
>> just after the cbs qdisc (Qav), but the feedback here and offline was
>> that there were use cases for a more simplistic launchtime approach
>> (per-queue) as well. We decided to invest in it first (and postpone
>> the 'taprio' qdisc until a NIC with HW support for it was available,
>> basically).
>
> I missed that discussion due to other urgent stuff on my plate. Just
> skimmed through it. More below.
>
>> You are right, and we agree, that using tbs for a per-port schedule of
>> any sort will require a SW scheduler to be developed on top of it, but
>> we've never said the contrary either. Our vision has always been that
>> these are separate mechanisms with different use cases, so we do see
>> the value of the kernel providing both.
>>
>> In other words, tbs is not the final solution for Qbv, and we agree
>> that a 'TAS' qdisc is still necessary. And due to the wide range of
>> applications and hw being used for those out there, we need both,
>> especially given that one does not block the other.
>
> So what's the plan for this? Having TAS as a separate entity or TAS
> feeding into the proposed 'basic' time transmission thing?

The second one, I guess.

Elaborating, the plan is to eventually have TAS as a separate entity,
but one which can use tbs for one of its classes (and cbs for another,
and strict priority for everything else, etc). Basically, the design
would be something along the lines of 'taprio': a root qdisc that is
both time and priority aware, and capable of running a schedule for the
port. That schedule can run inside the kernel with hrtimers, or just be
offloaded into the controller if Qbv is supported in HW.

Because it would expose the inner traffic classes in a mq / mqprio /
prio style, it would allow other per-queue qdiscs to be attached to it.
On a system using the i210, for instance, we could then have tbs
installed on traffic class 0 with hw offload enabled. The Qbv schedule
would be running in SW on the TAS entity (i.e. 'taprio'), which would
be setting the packets' txtime before dequeueing packets on a fast
path -> tbs -> NIC.
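For the application-facing side of that flow, here is a minimal talker
sketch, assuming the SO_TXTIME / SCM_TXTIME interface proposed earlier
in this series. The setsockopt payload is illustrative (it has changed
between revisions), send_at() is just a helper name I made up, and
error handling is omitted:

/*
 * Minimal talker sketch, assuming the SO_TXTIME / SCM_TXTIME
 * interface proposed earlier in this series. The setsockopt payload
 * below is illustrative; see the sockopt patch for the real layout.
 */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

static int enable_txtime(int fd)
{
	int on = 1; /* illustrative payload, not the final ABI */

	return setsockopt(fd, SOL_SOCKET, SO_TXTIME, &on, sizeof(on));
}

/* Queue one packet with an explicit launch time, in nanoseconds. */
static ssize_t send_at(int fd, const void *buf, size_t len, uint64_t txtime)
{
	char control[CMSG_SPACE(sizeof(txtime))] = {};
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_TXTIME; /* carries the per-packet txtime */
	cmsg->cmsg_len = CMSG_LEN(sizeof(txtime));
	memcpy(CMSG_DATA(cmsg), &txtime, sizeof(txtime));

	return sendmsg(fd, &msg, 0);
}

From tbs's (and later taprio's) point of view it makes no difference
whether txtime was computed by the application, as above, or stamped on
the packet by a SW Qbv schedule running in the kernel.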
Similarly, other qdiscs, like cbs, could be installed if all that a
traffic class requires is traffic shaping once its 'gate' is allowed to
run the tx algorithm attached to it.


> The general objection I have with the current approach is that it
> creates the playground for all flavours of misdesigned user space
> implementations and just replaces the home-brewed and ugly user mode
> network adapter drivers.
>
> But that's not helping the cause at all. There is enough crappy stuff
> out there already and I'd rather see a properly designed slice
> management which can be utilized and improved by all involved parties.
>
> All variants which utilize the basic time driven packet transmission
> are based on periodic explicit plan scheduling with (local) network
> wide time slice assignment.
>
> It does not matter whether you feed VLAN traffic into a time slice,
> where the VLAN itself does not even have to know about it, or if you
> have aware applications feeding packets to a designated timeslot. The
> basic principle of this is always the same.
>
> So coming back to last year's discussion. It totally went into the
> wrong direction because it turned from an approach (the patches) which
> came from the big picture to a single use case and application centric
> view. That's just wrong and I regret that I didn't have the time to
> pay attention back then.
>
> You always need to look at the big picture first and design from
> there, not the other way round. There will always be the argument:
>
>     But my application is special and needs X
>
> It's easy to fall for that. From long experience I know that none of
> these claims ever held. These arguments are made because the people
> making them have either never looked at the big picture or are simply
> refusing to do so because it would cause them work.
>
> If you start from the use case and application centric view and ignore
> the big picture then you end up with a gazillion extra magic features
> over time which could have been completely avoided if you had put your
> foot down and made everyone agree on a proper and versatile design in
> the first place.
>
> The more low level access you hand out in the beginning, the less
> commonly used, improved and maintained infrastructure you will get in
> the end. That has happened before in other areas and it will happen
> here as well. You create a user space ABI which you can't get rid of,
> and before you come out with the proper interface a large number of
> involved parties will have gone off and implemented on top of the low
> level ABI, and they will never look back.
>
> In the (not so) long run this will create a lot more issues than it
> solves. A simple example is that you cannot run two applications which
> could easily share the network in parallel without major surgery,
> because both need to be the management authority.
>
> I've not yet seen a convincing argument why this low level stuff with
> all of its weird flavours is superior to something which reflects the
> basic operating principle of TSN.

As you know, not all TSN systems are designed the same. Take AVB
systems, for example. These are not always running on networks that are
aware of any time schedule, or at least not quite like what is
described by Qbv. On those systems there is usually a certain number of
streams with different priorities that mostly care about having their
bandwidth reserved across the network.
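Just to illustrate the bandwidth-reservation side with numbers of my
own (not from the patches): this is roughly how an SRP-style
reservation maps onto the parameters the existing cbs qdisc takes,
following the formulas from its documentation:

/*
 * Back-of-the-envelope derivation of cbs parameters for one reserved
 * stream, following the formulas from the cbs qdisc documentation.
 * All numbers are illustrative.
 */
#include <stdio.h>

int main(void)
{
	long port_rate = 1000000;     /* port transmit rate, kbit/s (1G) */
	long idleslope = 98688;       /* reserved bandwidth, kbit/s */
	long max_interference = 1522; /* worst-case interfering frame, bytes */
	long max_frame = 1226;        /* largest frame of this stream, bytes */

	long sendslope = idleslope - port_rate;
	/* credits scale with the fraction of the link each slope uses */
	long hicredit = max_interference * idleslope / port_rate;
	long locredit = max_frame * sendslope / port_rate;

	printf("idleslope %ld sendslope %ld hicredit %ld locredit %ld\n",
	       idleslope, sendslope, hicredit, locredit);
	return 0;
}

With the reservation expressed like that, cbs can shape each stream
without any reference to a time schedule, which is exactly the
situation these AVB networks are in.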
The applications running on such systems are usually based on AVTP,
thus they already have to calculate and set the "avtp presentation
time" per packet themselves. A Qbv scheduler would probably provide
very little benefit to this domain, IMHO.

For "talkers" of these AVB systems, shaping traffic using txtime (i.e.
tbs) can provide a low-jitter alternative to cbs, for instance.

Thanks,
Jesus
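P.S.: to put that last point in concrete terms, here is a rough sketch
of the per-packet timekeeping a class A talker already performs. The
125us observation interval and 2ms max transit time are the usual AVB
class A values; the pacing scheme itself is just my illustration, not
something from the patches:

/*
 * Rough illustration of per-packet timing for an AVTP class A talker.
 * Constants are the usual class A values; the pacing is illustrative.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC     1000000000ULL
#define CLASS_A_INTERVAL 125000ULL  /* ns between class A frames */
#define MAX_TRANSIT_TIME 2000000ULL /* ns, class A worst-case latency */

static uint64_t now_ns(clockid_t clkid)
{
	struct timespec ts;

	clock_gettime(clkid, &ts);
	return (uint64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

int main(void)
{
	/* Launch times advance in lockstep with the class interval. */
	uint64_t txtime = now_ns(CLOCK_TAI) + CLASS_A_INTERVAL;

	for (int i = 0; i < 8; i++) {
		/*
		 * The AVTP presentation time the application computes
		 * anyway is derived from the same timeline: launch time
		 * plus worst-case transit, truncated to the 32-bit
		 * timestamp field of the AVTPDU header.
		 */
		uint32_t avtp_ts = (uint32_t)(txtime + MAX_TRANSIT_TIME);

		printf("pkt %d: txtime %llu avtp_ts %u\n",
		       i, (unsigned long long)txtime, avtp_ts);
		/* the frame would then go out via something like the
		 * send_at() helper sketched earlier */
		txtime += CLASS_A_INTERVAL;
	}
	return 0;
}

The point being: the timestamps tbs consumes fall out of bookkeeping
these talkers must do anyway, so no Qbv schedule is needed on the
talker side for this to work.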