* Poor UDP throughput with virtual devices and UFO
@ 2014-10-27 10:29 Toshiaki Makita
  2014-10-27 15:01 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Toshiaki Makita @ 2014-10-27 10:29 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Eric Dumazet

Hi,

I recently noticed that sending UDP packets results in very poor throughput
when UFO is used together with virtual devices.

Example configurations are:
- macvlan on vlan
- gre on bridge

With these configurations, the upper virtual device (macvlan, gre) has the
UFO feature while the lower device (vlan, bridge) doesn't. UFO packets are
sent from the upper device and fragmented on the lower device, i.e. they are
fragmented before entering the qdisc.

Since skb_segment() doesn't increase sk_wmem_alloc, the send buffer of a UDP
socket almost always looks empty, and user space can send packets without
limit, which causes massive drops at the qdisc.

I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
whether we can make this change, since it has been this way for years and only
TCP handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43
Average:          0      0.29      0.00      0.76      0.29      0.00     98.66
Average:          1      0.21      0.00      0.33      0.00      0.00     99.45
Average:          2      0.05      0.00      0.12      0.07      0.00     99.76
Average:          3      0.36      0.00     99.64      0.00      0.00      0.00

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43
Average:          0      0.17      0.00      3.60      0.00      0.00     96.23
Average:          1      0.13      0.00      6.60      0.00      0.00     93.27
Average:          2      0.23      0.00      5.76      0.07      0.00     93.94
Average:          3      0.17      0.00     17.57      0.00      0.00     82.26


The patch (based on net tree) for the test above:

----
Subject: [PATCH net] gso: Inherit sk_wmem_alloc

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 net/core/skbuff.c      |  6 +++++-
 net/ipv4/tcp_offload.c | 13 ++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615b..29dc763 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 							    len, 0);
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
-			continue;
+			goto set_owner;
 		}
 
 		nskb_frag = skb_shinfo(nskb)->frags;
@@ -3092,6 +3092,10 @@ perform_csum_check:
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
 		}
+
+set_owner:
+		if (head_skb->sk)
+			skb_set_owner_w(nskb, head_skb->sk);
 	} while ((offset += len) < head_skb->len);
 
 	/* Some callers want to get the end of the list.
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..93758a8 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			th->check = gso_make_checksum(skb, ~th->check);
 
 		seq += mss;
-		if (copy_destructor) {
+		if (copy_destructor)
 			skb->destructor = gso_skb->destructor;
-			skb->sk = gso_skb->sk;
-			sum_truesize += skb->truesize;
-		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
 
@@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	 * is freed by GSO engine
 	 */
 	if (copy_destructor) {
-		swap(gso_skb->sk, skb->sk);
-		swap(gso_skb->destructor, skb->destructor);
-		sum_truesize += skb->truesize;
-		atomic_add(sum_truesize - gso_skb->truesize,
-			   &skb->sk->sk_wmem_alloc);
+		skb->destructor = gso_skb->destructor;
+		gso_skb->destructor = NULL;
+		atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
 	}
 
 	delta = htonl(oldlen + (skb_tail_pointer(skb) -
-- 
1.8.1.2


* Re: Poor UDP throughput with virtual devices and UFO
  2014-10-27 10:29 Poor UDP throughput with virtual devices and UFO Toshiaki Makita
@ 2014-10-27 15:01 ` Eric Dumazet
  2014-10-27 15:39   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 15:01 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet

On Mon, 2014-10-27 at 19:29 +0900, Toshiaki Makita wrote:
> Hi,

...

> I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
> whether we can make this change, since it has been this way for years and only
> TCP handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").

That's probably because UFO is kind of strange: no NIC actually does UDP
segmentation.

> ----
> Subject: [PATCH net] gso: Inherit sk_wmem_alloc

...


Please rewrite your patch to move the code out of tcp_gso_segment() into
skb_segment()

Look how I carefully avoided many atomic operations on
sk->sk_wmem_alloc, and how your patch removed that. :(

An alternative would be to use a single skb_set_owner_w() on the last
segment and tweak its truesize so as not to corrupt sk->sk_wmem_alloc.

d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
device, while best performance is reached with TSO=on so that
segmentation is performed later on the slave device.

Thanks


* Re: Poor UDP throughput with virtual devices and UFO
  2014-10-27 15:01 ` Eric Dumazet
@ 2014-10-27 15:39   ` Eric Dumazet
  2014-10-27 17:30     ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 15:39 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet

On Mon, 2014-10-27 at 08:01 -0700, Eric Dumazet wrote:

> Please rewrite your patch to move the code out of tcp_gso_segment() into
> skb_segment()
> 
> Look how I carefully avoided many atomic operations on
> sk->sk_wmem_alloc, and how your patch removed that. :(
> 
> An alternative would be to use a single skb_set_owner_w() on the last
> segment and tweak its truesize so as not to corrupt sk->sk_wmem_alloc.
> 
> d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
> device, while best performance is reached with TSO=on so that
> segmentation is performed later on the slave device.

Hmm.. I meant 6ff50cd55545d92 ("tcp: gso: do not generate out of order
packets")

I think I will test an alternative patch and send it, keeping you as the
author if you do not mind.

Thanks.


* [PATCH net-next] net: skb_segment() should preserve backpressure
  2014-10-27 15:39   ` Eric Dumazet
@ 2014-10-27 17:30     ` Eric Dumazet
  2014-10-28 12:15       ` Toshiaki Makita
  2014-10-29 18:47       ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2014-10-27 17:30 UTC (permalink / raw)
  To: Toshiaki Makita, David Miller; +Cc: netdev, Herbert Xu

From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()

TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
order packets")

This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43

[edumazet] Rewrote patch and changelog.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615bfb61edd2a1dae9ef7935a3153d78dc4df..e48e5c02e877d9a9389ea54f0e015ba041d3f2a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3099,6 +3099,16 @@ perform_csum_check:
 	 * (see validate_xmit_skb_list() for example)
 	 */
 	segs->prev = tail;
+
+	/* The following permits correct backpressure for protocols
+	 * using skb_set_owner_w().
+	 * The idea is to transfer ownership from head_skb to the last segment.
+	 */
+	if (head_skb->destructor == sock_wfree) {
+		swap(tail->truesize, head_skb->truesize);
+		swap(tail->destructor, head_skb->destructor);
+		swap(tail->sk, head_skb->sk);
+	}
 	return segs;
 
 err:


* Re: [PATCH net-next] net: skb_segment() should preserve backpressure
  2014-10-27 17:30     ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
@ 2014-10-28 12:15       ` Toshiaki Makita
  2014-10-29 18:47       ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Toshiaki Makita @ 2014-10-28 12:15 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, Herbert Xu

On 2014/10/28 2:30, Eric Dumazet wrote:
> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> 
> This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
> friendly") to protocols using skb_set_owner_w()
> 
> TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
> as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
> order packets")
> 
> This allows UDP sockets using UFO to get proper backpressure,
> thus avoiding qdisc drops and excessive cpu usage.
> 
...
> [edumazet] Rewrote patch and changelog.

This version performs better.
I tested this patch and couldn't find any problems.

Thank you for your review and suggestions,
Toshiaki Makita


* Re: [PATCH net-next] net: skb_segment() should preserve backpressure
  2014-10-27 17:30     ` [PATCH net-next] net: skb_segment() should preserve backpressure Eric Dumazet
  2014-10-28 12:15       ` Toshiaki Makita
@ 2014-10-29 18:47       ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David Miller @ 2014-10-29 18:47 UTC (permalink / raw)
  To: eric.dumazet; +Cc: makita.toshiaki, netdev, herbert

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 27 Oct 2014 10:30:51 -0700

> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> 
> This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
> friendly") to protocols using skb_set_owner_w()
> 
> TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
> as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
> order packets")
> 
> This allows UDP sockets using UFO to get proper backpressure,
> thus avoiding qdisc drops and excessive cpu usage.
> 
> Here are performance test results (macvlan on vlan):
 ...
> [edumazet] Rewrote patch and changelog.
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks everyone.

