All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: Fix gro aggregation for udp encaps with zero csum
@ 2021-02-26 21:22 Daniel Borkmann
  2021-02-27 16:00 ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Borkmann @ 2021-02-26 21:22 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, Daniel Borkmann, Eric Dumazet, Willem de Bruijn,
	John Fastabend, Jesse Brandeburg, Tom Herbert

We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
csum for the UDP header itself is 0. In that case, GRO aggregation does
not take place on the phys dev, but instead is deferred to the vxlan/geneve
driver (see trace below).

The reason is essentially that GRO aggregation bails out in udp_gro_receive()
for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
and napi context has csum_valid set. This is however not the case for zero
UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
so it certainly is expected to handle zero csums in udp_gro_receive(). The
purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
levels of checksum unnecessary") seems to catch bad csum and stop aggregation
right away.

One way to fix aggregation in the zero case is to only perform the !csum_valid
check in udp_gro_receive() if uh->check is infact non-zero.

Before:

  [...]
  swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
  swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
  swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    13129.24

After:

  [...]
  swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
  swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    24576.53

Fixes: 57c67ff4bd92 ("udp: additional GRO support")
Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/udp_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index ff39e94781bf..a7f714c4b068 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -466,7 +466,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	}
 
 	if (!sk || NAPI_GRO_CB(skb)->encap_mark ||
-	    (skb->ip_summed != CHECKSUM_PARTIAL &&
+	    (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
 	     !NAPI_GRO_CB(skb)->csum_valid) ||
 	    !udp_sk(sk)->gro_receive)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net: Fix gro aggregation for udp encaps with zero csum
  2021-02-26 21:22 [PATCH net] net: Fix gro aggregation for udp encaps with zero csum Daniel Borkmann
@ 2021-02-27 16:00 ` Willem de Bruijn
  2021-02-27 17:16   ` John Fastabend
  0 siblings, 1 reply; 4+ messages in thread
From: Willem de Bruijn @ 2021-02-27 16:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Jakub Kicinski, Network Development, Eric Dumazet,
	John Fastabend, Jesse Brandeburg, Tom Herbert

On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
> csum for the UDP header itself is 0. In that case, GRO aggregation does
> not take place on the phys dev, but instead is deferred to the vxlan/geneve
> driver (see trace below).
>
> The reason is essentially that GRO aggregation bails out in udp_gro_receive()
> for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
> others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
> checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
> and napi context has csum_valid set. This is however not the case for zero
> UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
>
> At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
> on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
> so it certainly is expected to handle zero csums in udp_gro_receive(). The
> purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
> levels of checksum unnecessary") seems to catch bad csum and stop aggregation
> right away.
>
> One way to fix aggregation in the zero case is to only perform the !csum_valid
> check in udp_gro_receive() if uh->check is infact non-zero.
>
> Before:
>
>   [...]
>   swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
>   swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
>   swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
>   [...]
>
>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
>
>    87380  16384  16384    20.01    13129.24
>
> After:
>
>   [...]
>   swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
>   swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
>   [...]
>
>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
>
>    87380  16384  16384    20.01    24576.53
>
> Fixes: 57c67ff4bd92 ("udp: additional GRO support")
> Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: Tom Herbert <tom@herbertland.com>

Makes sense to me.

We cannot do checksum conversion with zero field, but that does not
have to limit coalescing.

CHECKSUM_COMPLETE with a checksum validated by
skb_gro_checksum_validate_zero_check implies csum_valid.

So the test

>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>              !NAPI_GRO_CB(skb)->csum_valid) ||

Basically matches

- CHECKSUM_NONE
- CHECKSUM_UNNECESSARY which has already used up its valid state on a
prior header
- CHECKSUM_COMPLETE with bad checksum.

This change just refines to not drop for in the first two cases on a
zero checksum field.

Making this explicit in case anyone sees holes in the logic. Else,

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net: Fix gro aggregation for udp encaps with zero csum
  2021-02-27 16:00 ` Willem de Bruijn
@ 2021-02-27 17:16   ` John Fastabend
  2021-02-28 20:02     ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: John Fastabend @ 2021-02-27 17:16 UTC (permalink / raw)
  To: Willem de Bruijn, Daniel Borkmann
  Cc: David Miller, Jakub Kicinski, Network Development, Eric Dumazet,
	John Fastabend, Jesse Brandeburg, Tom Herbert

Willem de Bruijn wrote:
> On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
> > csum for the UDP header itself is 0. In that case, GRO aggregation does
> > not take place on the phys dev, but instead is deferred to the vxlan/geneve
> > driver (see trace below).
> >
> > The reason is essentially that GRO aggregation bails out in udp_gro_receive()
> > for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
> > others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
> > checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
> > and napi context has csum_valid set. This is however not the case for zero
> > UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
> >
> > At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
> > on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
> > so it certainly is expected to handle zero csums in udp_gro_receive(). The
> > purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
> > levels of checksum unnecessary") seems to catch bad csum and stop aggregation
> > right away.
> >
> > One way to fix aggregation in the zero case is to only perform the !csum_valid
> > check in udp_gro_receive() if uh->check is infact non-zero.
> >

[...]

> We cannot do checksum conversion with zero field, but that does not
> have to limit coalescing.
> 
> CHECKSUM_COMPLETE with a checksum validated by
> skb_gro_checksum_validate_zero_check implies csum_valid.
> 
> So the test
> 
> >             (skb->ip_summed != CHECKSUM_PARTIAL &&
> >              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
> >              !NAPI_GRO_CB(skb)->csum_valid) ||
> 
> Basically matches
> 
> - CHECKSUM_NONE
> - CHECKSUM_UNNECESSARY which has already used up its valid state on a
> prior header
> - CHECKSUM_COMPLETE with bad checksum.
> 
> This change just refines to not drop for in the first two cases on a
> zero checksum field.

+1

> 
> Making this explicit in case anyone sees holes in the logic. Else,
> 
> Acked-by: Willem de Bruijn <willemb@google.com>

LGTM,

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net: Fix gro aggregation for udp encaps with zero csum
  2021-02-27 17:16   ` John Fastabend
@ 2021-02-28 20:02     ` Jakub Kicinski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2021-02-28 20:02 UTC (permalink / raw)
  To: John Fastabend, Willem de Bruijn, Daniel Borkmann
  Cc: David Miller, Network Development, Eric Dumazet,
	Jesse Brandeburg, Tom Herbert

On Sat, 27 Feb 2021 09:16:38 -0800 John Fastabend wrote:
> Willem de Bruijn wrote:
> > On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <daniel@iogearbox.net> wrote:  
> > >
> > > We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
> > > csum for the UDP header itself is 0. In that case, GRO aggregation does
> > > not take place on the phys dev, but instead is deferred to the vxlan/geneve
> > > driver (see trace below).
> > >
> > > The reason is essentially that GRO aggregation bails out in udp_gro_receive()
> > > for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
> > > others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
> > > checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
> > > and napi context has csum_valid set. This is however not the case for zero
> > > UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
> > >
> > > At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
> > > on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
> > > so it certainly is expected to handle zero csums in udp_gro_receive(). The
> > > purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
> > > levels of checksum unnecessary") seems to catch bad csum and stop aggregation
> > > right away.
> > >
> > > One way to fix aggregation in the zero case is to only perform the !csum_valid
> > > check in udp_gro_receive() if uh->check is infact non-zero.
> >
> > Acked-by: Willem de Bruijn <willemb@google.com>  
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-02-28 20:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26 21:22 [PATCH net] net: Fix gro aggregation for udp encaps with zero csum Daniel Borkmann
2021-02-27 16:00 ` Willem de Bruijn
2021-02-27 17:16   ` John Fastabend
2021-02-28 20:02     ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.