* Poor UDP throughput with virtual devices and UFO
@ 2014-10-27 10:29 Toshiaki Makita
2014-10-27 15:01 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Toshiaki Makita @ 2014-10-27 10:29 UTC (permalink / raw)
To: netdev; +Cc: Herbert Xu, Eric Dumazet
Hi,
I recently noticed that sending UDP packets ends up with very poor throughput
when using UFO together with virtual devices.
Example configurations are:
- macvlan on vlan
- gre on bridge
With these configurations, the upper virtual devices (macvlan, gre) have the
UFO feature and the lower devices (vlan, bridge) don't. UFO packets
are sent from the upper devices and fragmented on the lower devices,
so they are fragmented before entering the qdisc.
Since skb_segment() doesn't increase sk_wmem_alloc, the send buffer of a UDP
socket almost always looks empty, and user space can send packets without limit,
which causes massive drops at the qdisc.
I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
whether we can make this change, since it has been this way for years and only
TCP handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").
Here are performance test results (macvlan on vlan):
- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:  all   0.23   0.00    25.26     0.08    0.00  74.43
Average:    0   0.29   0.00     0.76     0.29    0.00  98.66
Average:    1   0.21   0.00     0.33     0.00    0.00  99.45
Average:    2   0.05   0.00     0.12     0.07    0.00  99.76
Average:    3   0.36   0.00    99.64     0.00    0.00   0.00
- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:  all   0.18   0.00     8.38     0.02    0.00  91.43
Average:    0   0.17   0.00     3.60     0.00    0.00  96.23
Average:    1   0.13   0.00     6.60     0.00    0.00  93.27
Average:    2   0.23   0.00     5.76     0.07    0.00  93.94
Average:    3   0.17   0.00    17.57     0.00    0.00  82.26
The patch (based on net tree) for the test above:
----
Subject: [PATCH net] gso: Inherit sk_wmem_alloc
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
net/core/skbuff.c | 6 +++++-
net/ipv4/tcp_offload.c | 13 ++++---------
2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615b..29dc763 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
len, 0);
SKB_GSO_CB(nskb)->csum_start =
skb_headroom(nskb) + doffset;
- continue;
+ goto set_owner;
}
nskb_frag = skb_shinfo(nskb)->frags;
@@ -3092,6 +3092,10 @@ perform_csum_check:
SKB_GSO_CB(nskb)->csum_start =
skb_headroom(nskb) + doffset;
}
+
+set_owner:
+ if (head_skb->sk)
+ skb_set_owner_w(nskb, head_skb->sk);
} while ((offset += len) < head_skb->len);
/* Some callers want to get the end of the list.
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..93758a8 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
th->check = gso_make_checksum(skb, ~th->check);
seq += mss;
- if (copy_destructor) {
+ if (copy_destructor)
skb->destructor = gso_skb->destructor;
- skb->sk = gso_skb->sk;
- sum_truesize += skb->truesize;
- }
skb = skb->next;
th = tcp_hdr(skb);
@@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
* is freed by GSO engine
*/
if (copy_destructor) {
- swap(gso_skb->sk, skb->sk);
- swap(gso_skb->destructor, skb->destructor);
- sum_truesize += skb->truesize;
- atomic_add(sum_truesize - gso_skb->truesize,
- &skb->sk->sk_wmem_alloc);
+ skb->destructor = gso_skb->destructor;
+ gso_skb->destructor = NULL;
+ atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
}
delta = htonl(oldlen + (skb_tail_pointer(skb) -
--
1.8.1.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Poor UDP throughput with virtual devices and UFO
2014-10-27 10:29 Poor UDP throughput with virtual devices and UFO Toshiaki Makita
@ 2014-10-27 15:01 ` Eric Dumazet
2014-10-27 15:39 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 15:01 UTC (permalink / raw)
To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet
On Mon, 2014-10-27 at 19:29 +0900, Toshiaki Makita wrote:
> Hi,
...
> I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
> whether we can make this change, since it has been this way for years and only
> TCP handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").
That's probably because UFO is kind of strange: no NIC actually does UDP
segmentation.
> ----
> Subject: [PATCH net] gso: Inherit sk_wmem_alloc
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
> net/core/skbuff.c | 6 +++++-
> net/ipv4/tcp_offload.c | 13 ++++---------
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c16615b..29dc763 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> len, 0);
> SKB_GSO_CB(nskb)->csum_start =
> skb_headroom(nskb) + doffset;
> - continue;
> + goto set_owner;
> }
>
> nskb_frag = skb_shinfo(nskb)->frags;
> @@ -3092,6 +3092,10 @@ perform_csum_check:
> SKB_GSO_CB(nskb)->csum_start =
> skb_headroom(nskb) + doffset;
> }
> +
> +set_owner:
> + if (head_skb->sk)
> + skb_set_owner_w(nskb, head_skb->sk);
> } while ((offset += len) < head_skb->len);
>
> /* Some callers want to get the end of the list.
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 5b90f2f..93758a8 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> th->check = gso_make_checksum(skb, ~th->check);
>
> seq += mss;
> - if (copy_destructor) {
> + if (copy_destructor)
> skb->destructor = gso_skb->destructor;
> - skb->sk = gso_skb->sk;
> - sum_truesize += skb->truesize;
> - }
> skb = skb->next;
> th = tcp_hdr(skb);
>
> @@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> * is freed by GSO engine
> */
> if (copy_destructor) {
> - swap(gso_skb->sk, skb->sk);
> - swap(gso_skb->destructor, skb->destructor);
> - sum_truesize += skb->truesize;
> - atomic_add(sum_truesize - gso_skb->truesize,
> - &skb->sk->sk_wmem_alloc);
> + skb->destructor = gso_skb->destructor;
> + gso_skb->destructor = NULL;
> + atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
> }
>
> delta = htonl(oldlen + (skb_tail_pointer(skb) -
Please rewrite your patch to move the code out of tcp_gso_segment() into
skb_segment()
Look how I carefully avoided many atomic operations on
sk->sk_wmem_alloc, and how your patch removed them. :(
An alternative would be to use a single skb_set_owner_w() on the last
segment, and tweak its truesize so as not to corrupt sk->sk_wmem_alloc.
d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
device, while best performance is reached with TSO=on so that
segmentation is performed later on the slave device.
Thanks
* Re: Poor UDP throughput with virtual devices and UFO
2014-10-27 15:01 ` Eric Dumazet
@ 2014-10-27 15:39 ` Eric Dumazet
2014-10-27 17:30 ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 15:39 UTC (permalink / raw)
To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet
On Mon, 2014-10-27 at 08:01 -0700, Eric Dumazet wrote:
> Please rewrite your patch to move the code out of tcp_gso_segment() into
> skb_segment()
>
> Look how I carefully avoided many atomic operations on
> sk->sk_wmem_alloc, and how your patch removed them. :(
>
> An alternative would be to use a single skb_set_owner_w() on the last
> segment, and tweak its truesize so as not to corrupt sk->sk_wmem_alloc.
>
> d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
> device, while best performance is reached with TSO=on so that
> segmentation is performed later on the slave device.
Hmm.. I meant 6ff50cd55545d92 ("tcp: gso: do not generate out of order
packets")
I think I will test an alternative patch and send it, keeping you as the
author if you do not mind.
Thanks.
* [PATCH net-next] net: skb_segment() should preserve backpressure
2014-10-27 15:39 ` Eric Dumazet
@ 2014-10-27 17:30 ` Eric Dumazet
2014-10-28 12:15 ` Toshiaki Makita
2014-10-29 18:47 ` David Miller
0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 17:30 UTC (permalink / raw)
To: Toshiaki Makita, David Miller; +Cc: netdev, Herbert Xu
From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()
TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
order packets")
This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.
Here are performance test results (macvlan on vlan):
- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:  all   0.23   0.00    25.26     0.08    0.00  74.43
- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:  CPU  %user  %nice  %system  %iowait  %steal  %idle
Average:  all   0.18   0.00     8.38     0.02    0.00  91.43
[edumazet] Rewrote patch and changelog.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/skbuff.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615bfb61edd2a1dae9ef7935a3153d78dc4df..e48e5c02e877d9a9389ea54f0e015ba041d3f2a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3099,6 +3099,16 @@ perform_csum_check:
* (see validate_xmit_skb_list() for example)
*/
segs->prev = tail;
+
+ /* Following permits correct backpressure, for protocols
+ * using skb_set_owner_w().
+ * Idea is to transfer ownership from head_skb to the last segment.
+ */
+ if (head_skb->destructor == sock_wfree) {
+ swap(tail->truesize, head_skb->truesize);
+ swap(tail->destructor, head_skb->destructor);
+ swap(tail->sk, head_skb->sk);
+ }
return segs;
err:
* Re: [PATCH net-next] net: skb_segment() should preserve backpressure
2014-10-27 17:30 ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
@ 2014-10-28 12:15 ` Toshiaki Makita
2014-10-29 18:47 ` David Miller
1 sibling, 0 replies; 6+ messages in thread
From: Toshiaki Makita @ 2014-10-28 12:15 UTC (permalink / raw)
To: Eric Dumazet, David Miller; +Cc: netdev, Herbert Xu
On 2014/10/28 2:30, Eric Dumazet wrote:
> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>
> This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
> friendly") to protocols using skb_set_owner_w()
>
> TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
> as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
> order packets")
>
> This allows UDP sockets using UFO to get proper backpressure,
> thus avoiding qdisc drops and excessive cpu usage.
>
...
> [edumazet] Rewrote patch and changelog.
This gives better performance.
I tested this patch and couldn't find any problems.
Thank you for your review and suggestions,
Toshiaki Makita
* Re: [PATCH net-next] net: skb_segment() should preserve backpressure
2014-10-27 17:30 ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
2014-10-28 12:15 ` Toshiaki Makita
@ 2014-10-29 18:47 ` David Miller
1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2014-10-29 18:47 UTC (permalink / raw)
To: eric.dumazet; +Cc: makita.toshiaki, netdev, herbert
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 27 Oct 2014 10:30:51 -0700
> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>
> This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
> friendly") to protocols using skb_set_owner_w()
>
> TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
> as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
> order packets")
>
> This allows UDP sockets using UFO to get proper backpressure,
> thus avoiding qdisc drops and excessive cpu usage.
>
> Here are performance test results (macvlan on vlan):
...
> [edumazet] Rewrote patch and changelog.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks everyone.