netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC net-next] taprio: Handle short intervals and large packets
@ 2021-03-12  9:28 Kurt Kanzenbach
  2021-03-12 20:18 ` Vinicius Costa Gomes
  0 siblings, 1 reply; 3+ messages in thread
From: Kurt Kanzenbach @ 2021-03-12  9:28 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Vladimir Oltean, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, netdev, Kurt Kanzenbach

When using short intervals e.g. below one millisecond, large packets won't be
transmitted at all. The software implementations checks whether the packet can
be fit into the remaining interval. Therefore, it takes the packet length and
the transmission speed into account. That is correct.

However, for large packets it may be that the transmission time will be larger
than the interval resulting in no packet transmission. The same situation works
fine with hardware offloading applied.

The problem has been observerd with the following schedule and iperf3:

|tc qdisc replace dev lan1 parent root handle 100 taprio \
|   num_tc 8 \
|   map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \
|   queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
|   base-time $base \
|   sched-entry S 0x40 500000 \
|   sched-entry S 0xbf 500000 \
|   clockid CLOCK_TAI \
|   flags 0x00

[...]

|root@tsn:~# iperf3 -c 192.168.2.105
|Connecting to host 192.168.2.105, port 5201
|[  5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201
|[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
|[  5]   0.00-1.00   sec  45.2 KBytes   370 Kbits/sec    0   1.41 KBytes
|[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes

After debugging, it seems that the packet length stored in the SKB is about
7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us
which larger than the interval of 500us.

Therefore, segment the SKB into smaller chunks if the packet is too big. This
yields similar results than the hardware offload:

|root@tsn:~# iperf3 -c 192.168.2.105
|Connecting to host 192.168.2.105, port 5201
|- - - - - - - - - - - - - - - - - - - - - - - - -
|[ ID] Interval           Transfer     Bitrate         Retr
|[  5]   0.00-10.00  sec  48.9 MBytes  41.0 Mbits/sec    0             sender
|[  5]   0.00-10.02  sec  48.7 MBytes  40.7 Mbits/sec                  receiver

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
 net/sched/sch_taprio.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8287894541e3..8434e87f79f7 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -411,6 +411,42 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
 	return txtime;
 }
 
+/* Similar to net/sched/sch_tbf.c::tbf_segment */
+static int taprio_segment(struct sk_buff *skb, struct Qdisc *sch,
+			  struct Qdisc *child, struct sk_buff **to_free)
+{
+	netdev_features_t features = netif_skb_features(skb);
+	unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
+	struct sk_buff *segs, *nskb;
+	int ret, nb;
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs))
+		return qdisc_drop(skb, sch, to_free);
+
+	nb = 0;
+	skb_list_walk_safe(segs, segs, nskb) {
+		skb_mark_not_on_list(segs);
+		qdisc_skb_cb(segs)->pkt_len = segs->len;
+		len += segs->len;
+		ret = qdisc_enqueue(segs, child, to_free);
+		if (ret != NET_XMIT_SUCCESS) {
+			if (net_xmit_drop_count(ret))
+				qdisc_qstats_drop(sch);
+		} else {
+			nb++;
+		}
+	}
+
+	sch->q.qlen += nb;
+	if (nb > 1)
+		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	consume_skb(skb);
+
+	return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
+}
+
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
@@ -433,6 +469,9 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			return qdisc_drop(skb, sch, to_free);
 	}
 
+	if (skb_is_gso(skb))
+		return taprio_segment(skb, sch, child, to_free);
+
 	qdisc_qstats_backlog_inc(sch, skb);
 	sch->q.qlen++;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC net-next] taprio: Handle short intervals and large packets
  2021-03-12  9:28 [PATCH RFC net-next] taprio: Handle short intervals and large packets Kurt Kanzenbach
@ 2021-03-12 20:18 ` Vinicius Costa Gomes
  2021-03-15 13:55   ` Kurt Kanzenbach
  0 siblings, 1 reply; 3+ messages in thread
From: Vinicius Costa Gomes @ 2021-03-12 20:18 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: Vladimir Oltean, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, netdev, Kurt Kanzenbach

Kurt Kanzenbach <kurt@linutronix.de> writes:

> When using short intervals e.g. below one millisecond, large packets won't be
> transmitted at all. The software implementations checks whether the packet can
> be fit into the remaining interval. Therefore, it takes the packet length and
> the transmission speed into account. That is correct.
>
> However, for large packets it may be that the transmission time will be larger
> than the interval resulting in no packet transmission. The same situation works
> fine with hardware offloading applied.
>
> The problem has been observerd with the following schedule and iperf3:
>
> |tc qdisc replace dev lan1 parent root handle 100 taprio \
> |   num_tc 8 \
> |   map 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 \
> |   queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> |   base-time $base \
> |   sched-entry S 0x40 500000 \
> |   sched-entry S 0xbf 500000 \
> |   clockid CLOCK_TAI \
> |   flags 0x00
>
> [...]
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |[  5] local 192.168.2.121 port 52610 connected to 192.168.2.105 port 5201
> |[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> |[  5]   0.00-1.00   sec  45.2 KBytes   370 Kbits/sec    0   1.41 KBytes
> |[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
>
> After debugging, it seems that the packet length stored in the SKB is about
> 7000-8000 bytes. Using a 100 Mbit/s link the transmission time is about 600us
> which larger than the interval of 500us.
>
> Therefore, segment the SKB into smaller chunks if the packet is too big. This
> yields similar results than the hardware offload:
>
> |root@tsn:~# iperf3 -c 192.168.2.105
> |Connecting to host 192.168.2.105, port 5201
> |- - - - - - - - - - - - - - - - - - - - - - - - -
> |[ ID] Interval           Transfer     Bitrate         Retr
> |[  5]   0.00-10.00  sec  48.9 MBytes  41.0 Mbits/sec    0             sender
> |[  5]   0.00-10.02  sec  48.7 MBytes  40.7 Mbits/sec                  receiver
>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
>  net/sched/sch_taprio.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 8287894541e3..8434e87f79f7 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -411,6 +411,42 @@ static long get_packet_txtime(struct sk_buff *skb, struct Qdisc *sch)
>  	return txtime;
>  }
>  
> +/* Similar to net/sched/sch_tbf.c::tbf_segment */
> +static int taprio_segment(struct sk_buff *skb, struct Qdisc *sch,
> +			  struct Qdisc *child, struct sk_buff **to_free)
> +{
> +	netdev_features_t features = netif_skb_features(skb);
> +	unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
> +	struct sk_buff *segs, *nskb;
> +	int ret, nb;
> +
> +	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> +
> +	if (IS_ERR_OR_NULL(segs))
> +		return qdisc_drop(skb, sch, to_free);
> +
> +	nb = 0;
> +	skb_list_walk_safe(segs, segs, nskb) {
> +		skb_mark_not_on_list(segs);
> +		qdisc_skb_cb(segs)->pkt_len = segs->len;
> +		len += segs->len;
> +		ret = qdisc_enqueue(segs, child, to_free);
> +		if (ret != NET_XMIT_SUCCESS) {
> +			if (net_xmit_drop_count(ret))
> +				qdisc_qstats_drop(sch);
> +		} else {
> +			nb++;
> +		}
> +	}
> +
> +	sch->q.qlen += nb;
> +	if (nb > 1)
> +		qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
> +	consume_skb(skb);
> +
> +	return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
> +}
> +
>  static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			  struct sk_buff **to_free)
>  {
> @@ -433,6 +469,9 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			return qdisc_drop(skb, sch, to_free);
>  	}
>  
> +	if (skb_is_gso(skb))
> +		return taprio_segment(skb, sch, child, to_free);
> +

My first worry was whether the segments had the same tstamp as their
parent, and it seems that they do, so everything should just work with
etf or the txtime-assisted mode.

I just want to play with this patch a bit and see how it works in
practice. But it looks good.


Cheers,
-- 
Vinicius

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC net-next] taprio: Handle short intervals and large packets
  2021-03-12 20:18 ` Vinicius Costa Gomes
@ 2021-03-15 13:55   ` Kurt Kanzenbach
  0 siblings, 0 replies; 3+ messages in thread
From: Kurt Kanzenbach @ 2021-03-15 13:55 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Vladimir Oltean, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	David S. Miller, Jakub Kicinski, netdev

[-- Attachment #1: Type: text/plain, Size: 503 bytes --]

On Fri Mar 12 2021, Vinicius Costa Gomes wrote:
> My first worry was whether the segments had the same tstamp as their
> parent, and it seems that they do, so everything should just work with
> etf or the txtime-assisted mode.
>
> I just want to play with this patch a bit and see how it works in
> practice. But it looks good.

OK, thanks for testing. Most likely the segmentation can be skipped with
full offloading applied.

I'll also test a bit more, before sending a non-RFC version.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-15 13:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12  9:28 [PATCH RFC net-next] taprio: Handle short intervals and large packets Kurt Kanzenbach
2021-03-12 20:18 ` Vinicius Costa Gomes
2021-03-15 13:55   ` Kurt Kanzenbach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).