From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <netdev-owner@vger.kernel.org>
Received: from Galois.linutronix.de ([146.0.238.70]:40302 "EHLO
        Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751548AbeCVWLS (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 22 Mar 2018 18:11:18 -0400
Date: Thu, 22 Mar 2018 23:11:10 +0100 (CET)
From: Thomas Gleixner <tglx@linutronix.de>
To: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
cc: netdev@vger.kernel.org, jhs@mojatatu.com, xiyou.wangcong@gmail.com,
        jiri@resnulli.us, vinicius.gomes@intel.com,
        richardcochran@gmail.com, anna-maria@linutronix.de,
        henrik@austad.us, john.stultz@linaro.org, levi.pearson@harman.com,
        edumazet@google.com, willemb@google.com, mlichvar@redhat.com
Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
In-Reply-To: <7c3f5a9f-cc16-8483-cb77-b5548d46cd5b@intel.com>
Message-ID: <alpine.DEB.2.21.1803222227060.1489@nanos.tec.linutronix.de>
References: <20180307011230.24001-1-jesus.sanchez-palencia@intel.com> <20180307011230.24001-14-jesus.sanchez-palencia@intel.com> <alpine.DEB.2.21.1803211407520.3754@nanos.tec.linutronix.de> <7c3f5a9f-cc16-8483-cb77-b5548d46cd5b@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Thu, 22 Mar 2018, Jesus Sanchez-Palencia wrote:
> On 03/21/2018 06:46 AM, Thomas Gleixner wrote:
> > If you look at the use cases of TDM in various fields then FIFO mode is
> > pretty much useless. In industrial/automotive fieldbus applications the
> > various time slices are filled by different threads or even processes.
> > 
> > Sure, the rbtree queue/dequeue has overhead compared to a simple linked
> > list, but you pay for that with more indirections and lots of mostly
> > duplicated code. And in the worst case one of these code pathes is going to
> > be rarely used and prone to bitrot.
> 
> 
> Our initial version (on RFC v2) was performing the sorting for all modes. After
> all the feedback we got we decided to make it optional and provide FIFO modes as
> well. For the SW fallback we need the scheduled FIFO, and for "pure" hw offload
> we need the "raw" FIFO.

I don't see how FIFO ever works without the issue that a newly qeueud
packet which has an earlier time stamp than the head of the FIFO list will
lose. Why would you even want to have that mode? Just because some weird
existing application misdesign thinks its required? That doesn't make it a
good idea.

With pure hardware offload the packets are immediately handed off to the
network card and that one is responsible for sending it on time. So there
is no FIFO at all. It's actually a bypass mode.

> This was a way to accommodate all the use cases without imposing too much of a
> burden onto anyone, regardless of their application's segment (i.e. industrial,
> pro a/v, automotive, etc).

I'm not buying that argument at all. That's all handwaving.

The whole approach is a burden on every application segment because it
pushes the whole schedule and time slice management out to user space,
which also requires that you route general traffic down to that user space
scheduling entity and then queue it back into the proper time slice. And
FIFO makes that even worse.

> Having the sorting always enabled requires that a valid static clockid is passed
> to the qdisc. For the hw offload mode, that means that the PHC and one of the
> system clocks must be synchronized since hrtimers do not support dynamic clocks.
> Not all systems do that or want to, and given that we do not want to perform
> crosstimestamping between the packets' clock reference and the qdisc's one, the
> only solution for these systems would be using the raw hw offload mode.

There are two variants of hardware offload:

1) Full hardware offload

   That bypasses the queue completely. You just stick the thing into the
   scatter gather buffers. Except when there is no room anymore, then you
   have to queue, but it does not make any difference if you queue in FIFO
   or in time order. The packets go out in time order anyway.

2) Single packet hardware offload

   What you do here is to schedule a hrtimer a bit earlier than the first
   packet tx time and when it fires stick the packet into the hardware and
   rearm the timer for the next one.

   The whole point of TSN with hardware support is that you have:

       - Global network time

       and

       - Frequency adjustment of the system time base

    PTP is TAI based and the kernel exposes clock TAI directly through
    hrtimers. You don't need dynamic clocks for that.

    You can even use clock MONOTONIC as it basically is just

       TAI - offset

If the network card uses anything else than TAI or a time stamp with a
strict correlation to TAI for actual TX scheduling then the whole thing is
broken to begin with.

Thanks,

	tglx