All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us,
	vinicius.gomes@intel.com, richardcochran@gmail.com,
	anna-maria@linutronix.de, henrik@austad.us,
	john.stultz@linaro.org, levi.pearson@harman.com,
	edumazet@google.com, willemb@google.com, mlichvar@redhat.com
Subject: Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
Date: Thu, 22 Mar 2018 13:29:00 -0700	[thread overview]
Message-ID: <7c3f5a9f-cc16-8483-cb77-b5548d46cd5b@intel.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1803211407520.3754@nanos.tec.linutronix.de>

Hi Thomas,


On 03/21/2018 06:46 AM, Thomas Gleixner wrote:
> On Tue, 6 Mar 2018, Jesus Sanchez-Palencia wrote:
>> +struct tbs_sched_data {
>> +	bool sorting;
>> +	int clockid;
>> +	int queue;
>> +	s32 delta; /* in ns */
>> +	ktime_t last; /* The txtime of the last skb sent to the netdevice. */
>> +	struct rb_root head;
> 
> Hmm. You are reimplementing timerqueue open coded. Have you checked whether
> you could reuse the timerqueue implementation?
> 
> That requires to add a timerqueue node to struct skbuff
> 
> @@ -671,7 +671,8 @@ struct sk_buff {
>  				unsigned long		dev_scratch;
>  			};
>  		};
> -		struct rb_node	rbnode; /* used in netem & tcp stack */
> +		struct rb_node		rbnode; /* used in netem & tcp stack */
> +		struct timerqueue_node	tqnode;
>  	};
>  	struct sock		*sk;
> 
> Then you can use timerqueue_head in your scheduler data and all the open
> coded rbtree handling goes away.


Yes, you are right. We actually looked into that for the first prototype of this
qdisc but we weren't so sure about adding the timerqueue node to the sk_buff's
union and whether it would impact the other usages here, but looking again now
and it looks fine.

We'll fix for the next version, thanks.


> 
>> +static bool is_packet_valid(struct Qdisc *sch, struct sk_buff *nskb)
>> +{
>> +	struct tbs_sched_data *q = qdisc_priv(sch);
>> +	ktime_t txtime = nskb->tstamp;
>> +	struct sock *sk = nskb->sk;
>> +	ktime_t now;
>> +
>> +	if (sk && !sock_flag(sk, SOCK_TXTIME))
>> +		return false;
>> +
>> +	/* We don't perform crosstimestamping.
>> +	 * Drop if packet's clockid differs from qdisc's.
>> +	 */
>> +	if (nskb->txtime_clockid != q->clockid)
>> +		return false;
>> +
>> +	now = get_time_by_clockid(q->clockid);
> 
> If you store the time getter function pointer in tbs_sched_data then you
> avoid the lookup and just can do
> 
>        now = q->get_time();
> 
> That applies to lots of other places.


Good idea, thanks. Will fix.



>> +
>> +static struct sk_buff *tbs_peek_timesortedlist(struct Qdisc *sch)
>> +{
>> +	struct tbs_sched_data *q = qdisc_priv(sch);
>> +	struct rb_node *p;
>> +
>> +	p = rb_first(&q->head);
> 
> timerqueue gives you direct access to the first expiring entry w/o walking
> the rbtree. So that would become:
> 
> 	p = timerqueue_getnext(&q->tqhead);
> 	return p ? rb_to_skb(p) : NULL;

OK.

(...)

>> +static struct sk_buff *tbs_dequeue_scheduledfifo(struct Qdisc *sch)
>> +{
>> +	struct tbs_sched_data *q = qdisc_priv(sch);
>> +	struct sk_buff *skb = tbs_peek(sch);
>> +	ktime_t now, next;
>> +
>> +	if (!skb)
>> +		return NULL;
>> +
>> +	now = get_time_by_clockid(q->clockid);
>> +
>> +	/* Drop if packet has expired while in queue and the drop_if_late
>> +	 * flag is set.
>> +	 */
>> +	if (skb->tc_drop_if_late && ktime_before(skb->tstamp, now)) {
>> +		struct sk_buff *to_free = NULL;
>> +
>> +		qdisc_queue_drop_head(sch, &to_free);
>> +		kfree_skb_list(to_free);
>> +		qdisc_qstats_overlimit(sch);
>> +
>> +		skb = NULL;
>> +		goto out;
> 
> Instead of going out immediately you should check the next skb whether its
> due for sending already.

We wanted to have a baseline before starting with the optimizations, so we left
this for a later patchset. It was one of the opens we had listed on the v2 cover
letter IIRC, but we'll look into it.


(...)


>> +	}
>> +
>> +	next = ktime_sub_ns(skb->tstamp, q->delta);
>> +
>> +	/* Dequeue only if now is within the [txtime - delta, txtime] range. */
>> +	if (ktime_after(now, next))
>> +		timesortedlist_erase(sch, skb, false);
>> +	else
>> +		skb = NULL;
>> +
>> +out:
>> +	/* Now we may need to re-arm the qdisc watchdog for the next packet. */
>> +	reset_watchdog(sch);
>> +
>> +	return skb;
>> +}
>> +
>> +static inline void setup_queueing_mode(struct tbs_sched_data *q)
>> +{
>> +	if (q->sorting) {
>> +		q->enqueue = tbs_enqueue_timesortedlist;
>> +		q->dequeue = tbs_dequeue_timesortedlist;
>> +		q->peek = tbs_peek_timesortedlist;
>> +	} else {
>> +		q->enqueue = tbs_enqueue_scheduledfifo;
>> +		q->dequeue = tbs_dequeue_scheduledfifo;
>> +		q->peek = qdisc_peek_head;
> 
> I don't see the point of these two modes and all the duplicated code it
> involves.
> 
> FIFO mode limits usage to a single thread which has to guarantee that the
> packets are queued in time order.
> 
> If you look at the use cases of TDM in various fields then FIFO mode is
> pretty much useless. In industrial/automotive fieldbus applications the
> various time slices are filled by different threads or even processes.
> 
> Sure, the rbtree queue/dequeue has overhead compared to a simple linked
> list, but you pay for that with more indirections and lots of mostly
> duplicated code. And in the worst case one of these code pathes is going to
> be rarely used and prone to bitrot.


Our initial version (on RFC v2) was performing the sorting for all modes. After
all the feedback we got we decided to make it optional and provide FIFO modes as
well. For the SW fallback we need the scheduled FIFO, and for "pure" hw offload
we need the "raw" FIFO.

This was a way to accommodate all the use cases without imposing too much of a
burden onto anyone, regardless of their application's segment (i.e. industrial,
pro a/v, automotive, etc).

Having the sorting always enabled requires that a valid static clockid is passed
to the qdisc. For the hw offload mode, that means that the PHC and one of the
system clocks must be synchronized since hrtimers do not support dynamic clocks.
Not all systems do that or want to, and given that we do not want to perform
crosstimestamping between the packets' clock reference and the qdisc's one, the
only solution for these systems would be using the raw hw offload mode.


Thanks,
Jesus

  parent reply	other threads:[~2018-03-22 20:31 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-07  1:12 [RFC v3 net-next 00/18] Time based packet transmission Jesus Sanchez-Palencia
2018-03-07  1:12 ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 01/18] sock: Fix SO_ZEROCOPY switch case Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07 16:58   ` Willem de Bruijn
2018-03-07 16:58     ` [Intel-wired-lan] " Willem de Bruijn
2018-03-07  1:12 ` [RFC v3 net-next 02/18] net: Clear skb->tstamp only on the forwarding path Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07 16:59   ` Willem de Bruijn
2018-03-07 16:59     ` [Intel-wired-lan] " Willem de Bruijn
2018-03-07 22:03     ` Jesus Sanchez-Palencia
2018-03-07 22:03       ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 03/18] posix-timers: Add CLOCKID_INVALID mask Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 04/18] net: Add a new socket option for a future transmit time Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 05/18] net: ipv4: raw: Hook into time based transmission Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07 17:00   ` Willem de Bruijn
2018-03-07 17:00     ` [Intel-wired-lan] " Willem de Bruijn
2018-03-07  1:12 ` [RFC v3 net-next 06/18] net: ipv4: udp: " Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07 17:00   ` Willem de Bruijn
2018-03-07 17:00     ` [Intel-wired-lan] " Willem de Bruijn
2018-03-07  1:12 ` [RFC v3 net-next 07/18] net: packet: " Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 08/18] net: SO_TXTIME: Add clockid and drop_if_late params Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  2:53   ` Eric Dumazet
2018-03-07  2:53     ` [Intel-wired-lan] " Eric Dumazet
2018-03-07  5:24     ` Richard Cochran
2018-03-07  5:24       ` [Intel-wired-lan] " Richard Cochran
2018-03-07 17:01       ` Willem de Bruijn
2018-03-07 17:01         ` [Intel-wired-lan] " Willem de Bruijn
2018-03-07 17:35         ` Richard Cochran
2018-03-07 17:35           ` [Intel-wired-lan] " Richard Cochran
2018-03-07 17:37           ` Richard Cochran
2018-03-07 17:37             ` [Intel-wired-lan] " Richard Cochran
2018-03-07 17:47             ` Eric Dumazet
2018-03-07 17:47               ` [Intel-wired-lan] " Eric Dumazet
2018-03-08 16:44               ` Richard Cochran
2018-03-08 16:44                 ` [Intel-wired-lan] " Richard Cochran
2018-03-08 17:56                 ` Jesus Sanchez-Palencia
2018-03-08 17:56                   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-21 12:58       ` Thomas Gleixner
2018-03-21 12:58         ` [Intel-wired-lan] " Thomas Gleixner
2018-03-21 14:59         ` Richard Cochran
2018-03-21 14:59           ` [Intel-wired-lan] " Richard Cochran
2018-03-21 15:11           ` Thomas Gleixner
2018-03-07 21:52     ` Jesus Sanchez-Palencia
2018-03-07 21:52       ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07 22:45       ` Eric Dumazet
2018-03-07 22:45         ` [Intel-wired-lan] " Eric Dumazet
2018-03-07 23:03         ` David Miller
2018-03-07 23:03           ` [Intel-wired-lan] " David Miller
2018-03-08 11:37         ` Miroslav Lichvar
2018-03-08 11:37           ` [Intel-wired-lan] " Miroslav Lichvar
2018-03-08 16:25           ` David Miller
2018-03-08 16:25             ` [Intel-wired-lan] " David Miller
2018-03-07  1:12 ` [RFC v3 net-next 09/18] net: ipv4: raw: Handle remaining txtime parameters Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 10/18] net: ipv4: udp: " Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 11/18] net: packet: " Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 12/18] net/sched: Allow creating a Qdisc watchdog with other clocks Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-21 13:46   ` Thomas Gleixner
2018-03-21 13:46     ` [Intel-wired-lan] " Thomas Gleixner
2018-03-21 22:29     ` Thomas Gleixner
2018-03-22 20:25       ` Jesus Sanchez-Palencia
2018-03-22 22:52         ` Thomas Gleixner
2018-03-24  0:34           ` Jesus Sanchez-Palencia
2018-03-25 11:46             ` Thomas Gleixner
2018-03-27 23:26               ` Jesus Sanchez-Palencia
2018-03-28  7:48                 ` Thomas Gleixner
2018-03-28 13:07                   ` Henrik Austad
2018-04-09 16:36                   ` Jesus Sanchez-Palencia
2018-04-10 12:37                     ` Thomas Gleixner
2018-04-10 21:24                       ` Jesus Sanchez-Palencia
2018-04-11 20:16                         ` Thomas Gleixner
2018-04-11 20:31                           ` Ivan Briano
2018-04-11 23:38                           ` Jesus Sanchez-Palencia
2018-04-12 15:03                             ` Richard Cochran
2018-04-12 15:19                               ` Miroslav Lichvar
2018-04-19 10:03                             ` Thomas Gleixner
2018-03-22 20:29     ` Jesus Sanchez-Palencia [this message]
2018-03-22 22:11       ` Thomas Gleixner
2018-03-22 23:26         ` Jesus Sanchez-Palencia
2018-03-23  8:49           ` Thomas Gleixner
2018-03-23 23:34             ` Jesus Sanchez-Palencia
2018-04-23 18:21     ` Jesus Sanchez-Palencia
2018-04-23 18:21       ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-04-24  8:50       ` Thomas Gleixner
2018-04-24  8:50         ` [Intel-wired-lan] " Thomas Gleixner
2018-04-24 13:50         ` David Miller
2018-04-24 13:50           ` [Intel-wired-lan] " David Miller
2018-03-07  1:12 ` [RFC v3 net-next 14/18] net/sched: Add HW offloading capability to TBS Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-21 14:22   ` Thomas Gleixner
2018-03-21 14:22     ` [Intel-wired-lan] " Thomas Gleixner
2018-03-21 15:03     ` Richard Cochran
2018-03-21 15:03       ` [Intel-wired-lan] " Richard Cochran
2018-03-21 16:18       ` Thomas Gleixner
2018-03-22 22:01         ` Jesus Sanchez-Palencia
2018-03-22 23:15     ` Jesus Sanchez-Palencia
2018-03-22 23:15       ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-23  8:51       ` Thomas Gleixner
2018-03-23  8:51         ` [Intel-wired-lan] " Thomas Gleixner
2018-03-07  1:12 ` [RFC v3 net-next 15/18] igb: Refactor igb_configure_cbs() Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 16/18] igb: Only change Tx arbitration when CBS is on Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 17/18] igb: Refactor igb_offload_cbs() Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  1:12 ` [RFC v3 net-next 18/18] igb: Add support for TBS offload Jesus Sanchez-Palencia
2018-03-07  1:12   ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-07  5:28 ` [RFC v3 net-next 00/18] Time based packet transmission Richard Cochran
2018-03-07  5:28   ` [Intel-wired-lan] " Richard Cochran
2018-03-08 14:09 ` Henrik Austad
2018-03-08 14:09   ` [Intel-wired-lan] " Henrik Austad
2018-03-08 18:06   ` Jesus Sanchez-Palencia
2018-03-08 18:06     ` [Intel-wired-lan] " Jesus Sanchez-Palencia
2018-03-08 22:54     ` Henrik Austad
2018-03-08 22:54       ` [Intel-wired-lan] " Henrik Austad
2018-03-08 23:58       ` Jesus Sanchez-Palencia
2018-03-08 23:58         ` [Intel-wired-lan] " Jesus Sanchez-Palencia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c3f5a9f-cc16-8483-cb77-b5548d46cd5b@intel.com \
    --to=jesus.sanchez-palencia@intel.com \
    --cc=anna-maria@linutronix.de \
    --cc=edumazet@google.com \
    --cc=henrik@austad.us \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.stultz@linaro.org \
    --cc=levi.pearson@harman.com \
    --cc=mlichvar@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=vinicius.gomes@intel.com \
    --cc=willemb@google.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.