From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Vladimir Oltean <olteanv@gmail.com>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
netdev@vger.kernel.org, Kurt Kanzenbach <kurt@linutronix.de>
Subject: Re: [PATCH RFC net-next] taprio: Handle short intervals and large packets
Date: Fri, 12 Mar 2021 12:18:54 -0800 [thread overview]
Message-ID: <87wnucktw1.fsf@vcostago-mobl2.amr.corp.intel.com> (raw)
In-Reply-To: <20210312092823.1429-1-kurt@linutronix.de>
Kurt Kanzenbach <kurt@linutronix.de> writes:
> When using short intervals e.g. below one millisecond, large packets won't be
> transmitted at all. The software implementations checks whether the packet can
> be fit into the remaining interval. Therefore, it takes the packet length and
> the transmission speed into account. That is correct.
>
> However, for large packets it may be that the transmission time will be larger
> than the interval resulting in no packet transmission. The same situation works
> fine with hardware offloading applied.
>
> The problem has been observerd with the following schedule and iperf3:
>
> |tc qdisc replace dev lan1 parent root handle 100 taprio \
> | num_tc 8 \
> | map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \
> | queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> | base-time $base \
> | sched-entry S 0x40 500000 \
> | sched-entry S 0xbf 500000 \
> | clockid CLOCK_TAI \
> | flags 0x00
>
> [...]
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |[ 5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201
> |[ ID] Interval Transfer Bitrate Retr Cwnd
> |[ 5] 0.00-1.00 sec 45.2 KBytes 370 Kbits/sec 0 1.41 KBytes
> |[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
>
> After debugging, it seems that the packet length stored in the SKB is about
> 7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us
> which larger than the interval of 500us.
>
> Therefore, segment the SKB into smaller chunks if the packet is too big. This
> yields similar results than the hardware offload:
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |- - - - - - - - - - - - - - - - - - - - - - - - -
> |[ ID] Interval Transfer Bitrate Retr
> |[ 5] 0.00-10.00 sec 48.9 MBytes 41.0 Mbits/sec 0 sender
> |[ 5] 0.00-10.02 sec 48.7 MBytes 40.7 Mbits/sec receiver
>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
> net/sched/sch_taprio.c | 39 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 8287894541e3..8434e87f79f7 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -411,6 +411,42 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
> return txtime;
> }
>
> +/* Similar to net/sched/sch_tbf.c::tbf_segment */
> +static int taprio_segment(struct sk_buff *skb, struct Qdisc *sch,
> + struct Qdisc *child, struct sk_buff **to_free)
> +{
> + netdev_features_t features = netif_skb_features(skb);
> + unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
> + struct sk_buff *segs, *nskb;
> + int ret, nb;
> +
> + segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> +
> + if (IS_ERR_OR_NULL(segs))
> + return qdisc_drop(skb, sch, to_free);
> +
> + nb = 0;
> + skb_list_walk_safe(segs, segs, nskb) {
> + skb_mark_not_on_list(segs);
> + qdisc_skb_cb(segs)->pkt_len = segs->len;
> + len += segs->len;
> + ret = qdisc_enqueue(segs, child, to_free);
> + if (ret != NET_XMIT_SUCCESS) {
> + if (net_xmit_drop_count(ret))
> + qdisc_qstats_drop(sch);
> + } else {
> + nb++;
> + }
> + }
> +
> + sch->q.qlen += nb;
> + if (nb > 1)
> + qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
> + consume_skb(skb);
> +
> + return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
> +}
> +
> static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> struct sk_buff **to_free)
> {
> @@ -433,6 +469,9 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> return qdisc_drop(skb, sch, to_free);
> }
>
> + if (skb_is_gso(skb))
> + return taprio_segment(skb, sch, child, to_free);
> +
My first worry was whether the segments had the same tstamp as their
parent, and it seems that they do, so everything should just work with
etf or the txtime-assisted mode.
I just want to play with this patch a bit and see how it works in
practice. But it looks good.
Cheers,
--
Vinicius
next prev parent reply other threads:[~2021-03-12 20:19 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-12 9:28 [PATCH RFC net-next] taprio: Handle short intervals and large packets Kurt Kanzenbach
2021-03-12 20:18 ` Vinicius Costa Gomes [this message]
2021-03-15 13:55 ` Kurt Kanzenbach
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wnucktw1.fsf@vcostago-mobl2.amr.corp.intel.com \
--to=vinicius.gomes@intel.com \
--cc=davem@davemloft.net \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=netdev@vger.kernel.org \
--cc=olteanv@gmail.com \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).