From: Jesus Sanchez-Palencia
Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
To: Thomas Gleixner
Cc: netdev@vger.kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, vinicius.gomes@intel.com, richardcochran@gmail.com, anna-maria@linutronix.de, henrik@austad.us, John Stultz, levi.pearson@harman.com, edumazet@google.com, willemb@google.com, mlichvar@redhat.com
Date: Thu, 22 Mar 2018 13:25:44 -0700
Message-ID: <65da0648-b835-a171-3986-2d1ddcb8ea10@intel.com>
References: <20180307011230.24001-1-jesus.sanchez-palencia@intel.com> <20180307011230.24001-14-jesus.sanchez-palencia@intel.com>

Hi Thomas,

On 03/21/2018 03:29 PM, Thomas Gleixner wrote:
> On Wed, 21 Mar 2018, Thomas Gleixner wrote:
>> If you look at the use cases of TDM in various fields then FIFO mode is
>> pretty much useless. In industrial/automotive fieldbus applications the
>> various time slices are filled by different threads or even processes.
>
> That brings me to a related question. The TDM cases I'm familiar with which
> aim to use this utilize multiple periodic time slices, aka 802.1Qbv
> time-aware scheduling.
>
> Simple example:
>
>     [1a][1b][1c][1d]          [1a][1b][1c][1d]          [.....
>     [2a][2b]                  [2c][2d]
>     [3a]                      [3b]
>     [4a]                      [4b]
> ----------------------------------------------------------------------> t
>
> where 1-4 is the slice level and a-d are network nodes.
>
> In most cases the slice levels on a node are handled by different
> applications or threads. Some of the protocols utilize dedicated time slice
> levels - let's assume '4' in the above example - to run general network
> traffic which might even be allowed to have collisions, i.e. [4a-d] would
> become [4] and any node can send; the involved components like switches are
> supposed to handle that.
>
> I'm not seeing how TBS is going to assist with any of that. It requires
> everything to be handled at the application level. Not really useful,
> especially not for general traffic which does not know about the scheduling
> bands at all.
>
> If you look at an industrial control node, it basically does:
>
>     queue_first_packet(tx, slice1);
>     while (!stop) {
>         if (wait_for_packet(rx) == ERROR)
>             goto errorhandling;
>         tx = do_computation(rx);
>         queue_next_tx(tx, slice1);
>     }
>
> That's a pretty common pattern for these kinds of applications. For audio
> sources, queue_next() might be triggered by the input sampler, which needs
> to be synchronized to the network slices anyway in order to work properly.
>
> TBS per the current implementation is nice as a proof of concept, but it
> solves just a small portion of the complete problem space. I have the
> suspicion that this was 'designed' to replace the user space hack in the
> AVNU stack with something close to it. Not really a good plan, to be honest.
>
> I think what we really want is a strict periodic scheduler which supports
> multiple slices as shown above, because that's what all relevant TDM use
> cases need: A/V, industrial fieldbusses .....
>
>    |---------------------------------------------------------|
>    |                                                         |
>    |                           TAS                           |<- Config
>    |       1              2              3             4     |
>    |---------------------------------------------------------|
>        |              |              |              |
>        |              |              |              |
>        |              |              |              |
>        |              |              |              |
>  [DirectSocket]  [Qdisc FIFO]   [Qdisc Prio]   [Qdisc FIFO]
>                       |              |              |
>                       |              |              |
>                   [Socket]       [Socket]    [General traffic]
>
> The interesting thing here is that it does not require any time stamp
> information brought in from the application. That's especially good for
> general network traffic which is routed through a dedicated time slot. If
> we don't have that then we need a user space scheduler which does exactly
> the same thing and we have to route the general traffic out to user space
> and back into the kernel, which is obviously a pointless exercise.
>
> There are all kinds of TDM schemes out there which are not directly driven
> by applications, but rather route categorized traffic like VLANs through
> dedicated time slices. That works pretty well with the above scheme because
> in that case the applications might be completely oblivious about the tx
> time schedule.
>
> Surely there are protocols which do not utilize every time slice they could
> use, so we need a way to tell the number of empty slices between two
> consecutive packets. There are also different policies vs. the unused time
> slices, like sending dummy frames or just nothing, which wants to be
> addressed, but I don't think that changes the general approach.
>
> There might be some special cases for setup or node hotplug, but the
> protocols I'm familiar with handle these in dedicated time slices or
> through general traffic, so it should just fit in.
>
> I'm surely missing some details, but from my knowledge about the protocols
> which want to utilize this, the general direction should be fine.
>
> Feel free to tell me that I'm missing the point completely though :)
>
> Thoughts?

We agree with most of the above. :)

Actually, last year Vinicius shared our ideas for a "time-aware priority"
root qdisc, dubbed 'taprio', as part of the cbs RFC cover letter:

https://patchwork.ozlabs.org/cover/808504/

Our plan was to work directly on the Qbv-like (per-port) scheduling right
after the cbs qdisc (Qav), but the feedback here and offline was that there
were also use cases for a simpler, per-queue launchtime approach, so we
decided to invest in that first (and to postpone the 'taprio' qdisc until a
NIC with HW support for it was available, basically).

You are right, and we agree, that using tbs for a per-port schedule of any
sort will require a SW scheduler to be developed on top of it, but we have
never claimed otherwise. Our vision has always been that these are separate
mechanisms with different use cases, so we do see the value in the kernel
providing both.

In other words, tbs is not the final solution for Qbv, and we agree that a
'TAS' qdisc is still necessary. Given the wide range of applications and hw
out there, we need both, especially since one does not block the other.

What do you think?

Thanks,
Jesus
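
ps: just to make the per-queue launchtime usage a bit more concrete, here is
a rough userspace sketch of how an application hands a per-packet transmit
time to tbs. It only illustrates the idea: the SO_TXTIME / SCM_TXTIME names
(and the value 61) follow what we have been proposing, but the setsockopt()
setup and the clockid/flags plumbing have been changing between revisions,
so please treat everything below as a sketch rather than the final ABI.

/*
 * Sketch only: attach an absolute launch time (in nanoseconds) to a
 * packet as ancillary data, so a tbs-style qdisc can hold it back
 * until that time.  Enabling SO_TXTIME on the socket beforehand is
 * omitted here because that part of the interface is still in flux.
 */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SO_TXTIME
#define SO_TXTIME	61		/* value from the proposal; an assumption */
#define SCM_TXTIME	SO_TXTIME
#endif

static ssize_t send_at(int fd, const struct sockaddr *dst, socklen_t dstlen,
		       const void *buf, size_t len, uint64_t txtime_ns)
{
	union {				/* keeps the cmsg buffer aligned */
		char buf[CMSG_SPACE(sizeof(txtime_ns))];
		struct cmsghdr align;
	} control = {};
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_name	= (void *)dst,
		.msg_namelen	= dstlen,
		.msg_iov	= &iov,
		.msg_iovlen	= 1,
		.msg_control	= control.buf,
		.msg_controllen	= sizeof(control.buf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	/* Absolute transmit time, carried as ancillary data with the packet. */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_TXTIME;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cmsg), &txtime_ns, sizeof(txtime_ns));

	return sendmsg(fd, &msg, 0);
}

In your control-loop example above, send_at() would play the role of
queue_next_tx(), with txtime_ns computed from the slice schedule by the
application (or by a scheduler sitting on top of tbs).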
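
And on the 'taprio' / TAS side, what we have in mind for the per-port
schedule is conceptually just a cyclic gate list, i.e. a direct mapping of
your picture above. The structures and names below are made up purely for
illustration (nothing here is from a posted patch); they only show how a
static config plus the current time resolves to "which traffic classes may
transmit right now", without any timestamp coming from the application:

#include <stddef.h>
#include <stdint.h>

/*
 * Illustration only, hypothetical names: a Qbv-style cyclic schedule.
 * Each entry opens a set of traffic classes (gate_mask) for interval_ns;
 * the list repeats every cycle_time_ns starting at base_time_ns.
 */
struct tas_entry {
	uint32_t gate_mask;	/* bit i set: traffic class i may transmit */
	uint64_t interval_ns;	/* how long these gates stay open */
};

struct tas_schedule {
	int64_t base_time_ns;	/* absolute start of the first cycle */
	uint64_t cycle_time_ns;	/* sum of all interval_ns */
	size_t num_entries;
	const struct tas_entry *entries;
};

/*
 * Map an absolute time to the entry active at that time.
 * Assumes now_ns >= base_time_ns and a consistent cycle_time_ns.
 */
static const struct tas_entry *
tas_entry_at(const struct tas_schedule *s, int64_t now_ns)
{
	uint64_t offset = (uint64_t)(now_ns - s->base_time_ns) % s->cycle_time_ns;
	size_t i;

	for (i = 0; i < s->num_entries; i++) {
		if (offset < s->entries[i].interval_ns)
			return &s->entries[i];
		offset -= s->entries[i].interval_ns;
	}
	return &s->entries[0];	/* not reached if the schedule is consistent */
}

The qdisc (or, while we only have tbs in HW, a small scheduler on top of it)
can derive the next launch time for each traffic class purely from such a
table, which is exactly the "no timestamp from the application" property you
are describing.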