All of lore.kernel.org
 help / color / mirror / Atom feed
* Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path
@ 2023-07-12 14:00 Marek Majkowski
  2023-07-12 14:58 ` Yan Zhai
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Majkowski @ 2023-07-12 14:00 UTC (permalink / raw)
  To: network dev, Yan Zhai, kernel-team

Dear netdev,

I encountered a puzzling problem, please help.

Rootless repro:
   https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18

To run:

```
$ unshare -Urn python3 udp-gro-forwarding-bug.py
tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
'''

The code is really quite simple. First I create and open a tap device.
Then I send a large (>MTU) packet with vnethdr over tap0. The
gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
from tap0, be forwarded somewhere, where it will eventually be
segmented by software or hardware.

The egress tap0 packet looks perfectly fine:

tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000

To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
where I move egress tap0 packets to ingress lo, like this:

> tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo

On ingress lo I see something really weird:

lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200

This looks like IPv4 fragments without the IP fragmentation bits set.

I think there are two independent problems here:

(1) The packet is *fragmented* and that is plain wrong here. I'm
asking for USO not UFO in vnethdr.

(2) The fragmentation attempt is broken, the IPv4 fragmentation bits are clear.

Please advise. I would assume transmitting UDP_GSO packets off tap is
a typical thing to do.

I was able to repro this on a 6.4 kernel.

For a moment I thought it's a `ethtool -K X rx-udp-gro-forwarding on`
bug, but I'm not sure anymore.

Marek

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path
  2023-07-12 14:00 Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path Marek Majkowski
@ 2023-07-12 14:58 ` Yan Zhai
  2023-07-12 15:19   ` Paolo Abeni
  0 siblings, 1 reply; 5+ messages in thread
From: Yan Zhai @ 2023-07-12 14:58 UTC (permalink / raw)
  To: Marek Majkowski; +Cc: network dev, kernel-team, Andrew Melnychenko

On Wed, Jul 12, 2023 at 9:00 AM Marek Majkowski <marek@cloudflare.com> wrote:
>
> Dear netdev,
>
> I encountered a puzzling problem, please help.
>
> Rootless repro:
>    https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18
>
> To run:
>
> ```
> $ unshare -Urn python3 udp-gro-forwarding-bug.py
> tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> '''
>
> The code is really quite simple. First I create and open a tap device.
> Then I send a large (>MTU) packet with vnethdr over tap0. The
> gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
> from tap0, be forwarded somewhere, where it will eventually be
> segmented by software or hardware.
>
> The egress tap0 packet looks perfectly fine:
>
> tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
>
> To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
> where I move egress tap0 packets to ingress lo, like this:
>
> > tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo
>
> On ingress lo I see something really weird:
>
> lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
>
> This looks like IPv4 fragments without the IP fragmentation bits set.
>
> I think there are two independent problems here:
>
> (1) The packet is *fragmented* and that is plain wrong here. I'm
> asking for USO not UFO in vnethdr.
>

To add some context our virtio header in hex format (12 bytes) is
01052a007805220006000000.

Some digging shows that the issue seems to come from this patch:
https://lore.kernel.org/netdev/20220907125048.396126-2-andrew@daynix.com/
At this point, skb_shared_info->gso_type is SKB_GSO_UDP_L4 |
SKB_GSO_DODGY, here the DODGY bit is set inside tun_get_user. So the
skb_gso_ok check will return true here, then the skb will fall to the
fragment code. Simple tracing confirms that __udp_gso_segment is never
called in this scenario.

So the question is really how to handle the DODGY bit. IMHO it is not
right to fall to the fragment path when the actual packet request is
segmentation. Will it be sufficient to just recompute the gso_segs
here and return the head skb instead?  The only missing bit in the
device feature is the DODGY bit here.

> (2) The fragmentation attempt is broken, the IPv4 fragmentation bits are clear.
>
> Please advise. I would assume transmitting UDP_GSO packets off tap is
> a typical thing to do.
>
> I was able to repro this on a 6.4 kernel.
>
> For a moment I thought it's a `ethtool -K X rx-udp-gro-forwarding on`
> bug, but I'm not sure anymore.
>
> Marek



-- 

Yan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path
  2023-07-12 14:58 ` Yan Zhai
@ 2023-07-12 15:19   ` Paolo Abeni
  2023-07-13  0:50     ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2023-07-12 15:19 UTC (permalink / raw)
  To: Yan Zhai, Marek Majkowski
  Cc: network dev, kernel-team, Andrew Melnychenko, Jason Wang

Adding Jason,
On Wed, 2023-07-12 at 09:58 -0500, Yan Zhai wrote:
> On Wed, Jul 12, 2023 at 9:00 AM Marek Majkowski <marek@cloudflare.com> wrote:
> > 
> > Dear netdev,
> > 
> > I encountered a puzzling problem, please help.
> > 
> > Rootless repro:
> >    https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18
> > 
> > To run:
> > 
> > ```
> > $ unshare -Urn python3 udp-gro-forwarding-bug.py
> > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > '''
> > 
> > The code is really quite simple. First I create and open a tap device.
> > Then I send a large (>MTU) packet with vnethdr over tap0. The
> > gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
> > from tap0, be forwarded somewhere, where it will eventually be
> > segmented by software or hardware.
> > 
> > The egress tap0 packet looks perfectly fine:
> > 
> > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > 
> > To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
> > where I move egress tap0 packets to ingress lo, like this:
> > 
> > > tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo
> > 
> > On ingress lo I see something really weird:
> > 
> > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > 
> > This looks like IPv4 fragments without the IP fragmentation bits set.
> > 
> > I think there are two independent problems here:
> > 
> > (1) The packet is *fragmented* and that is plain wrong here. I'm
> > asking for USO not UFO in vnethdr.
> > 
> 
> To add some context our virtio header in hex format (12 bytes) is
> 01052a007805220006000000.
> 
> Some digging shows that the issue seems to come from this patch:
> https://lore.kernel.org/netdev/20220907125048.396126-2-andrew@daynix.com/
> At this point, skb_shared_info->gso_type is SKB_GSO_UDP_L4 |
> SKB_GSO_DODGY, here the DODGY bit is set inside tun_get_user. So the
> skb_gso_ok check will return true here, then the skb will fall to the
> fragment code. Simple tracing confirms that __udp_gso_segment is never
> called in this scenario.
> 
> So the question is really how to handle the DODGY bit. IMHO it is not
> right to fall to the fragment path when the actual packet request is
> segmentation. 

I do agree with the above.

> Will it be sufficient to just recompute the gso_segs
> here and return the head skb instead?

Something alike what TCP is currently doing:

https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_offload.c#L85

should be fine - constrained to the skb_shinfo(skb)->gso_type &
SKB_GSO_UDP_L4 case.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path
  2023-07-12 15:19   ` Paolo Abeni
@ 2023-07-13  0:50     ` Willem de Bruijn
  2023-07-13  2:01       ` Yan Zhai
  0 siblings, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2023-07-13  0:50 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Yan Zhai, Marek Majkowski, network dev, kernel-team,
	Andrew Melnychenko, Jason Wang

On Wed, Jul 12, 2023 at 11:20 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Adding Jason,
> On Wed, 2023-07-12 at 09:58 -0500, Yan Zhai wrote:
> > On Wed, Jul 12, 2023 at 9:00 AM Marek Majkowski <marek@cloudflare.com> wrote:
> > >
> > > Dear netdev,
> > >
> > > I encountered a puzzling problem, please help.
> > >
> > > Rootless repro:
> > >    https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18
> > >
> > > To run:
> > >
> > > ```
> > > $ unshare -Urn python3 udp-gro-forwarding-bug.py
> > > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > > '''
> > >
> > > The code is really quite simple. First I create and open a tap device.
> > > Then I send a large (>MTU) packet with vnethdr over tap0. The
> > > gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
> > > from tap0, be forwarded somewhere, where it will eventually be
> > > segmented by software or hardware.
> > >
> > > The egress tap0 packet looks perfectly fine:
> > >
> > > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > >
> > > To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
> > > where I move egress tap0 packets to ingress lo, like this:
> > >
> > > > tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo

This is the opposite of stated, attach to tap0 at ingress and send to
lo on egress?

It might matter only in the sense that tc_mirred on egress acts in
dev_queue_xmit before any segmentation would occur.

Probably irrelevant, as your example clearly hits the segmentation
logic, and it sounds like the root cause there is already well
understood.

> > >
> > > On ingress lo I see something really weird:
> > >
> > > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > >
> > > This looks like IPv4 fragments without the IP fragmentation bits set.
> > >
> > > I think there are two independent problems here:
> > >
> > > (1) The packet is *fragmented* and that is plain wrong here. I'm
> > > asking for USO not UFO in vnethdr.
> > >
> >
> > To add some context our virtio header in hex format (12 bytes) is
> > 01052a007805220006000000.
> >
> > Some digging shows that the issue seems to come from this patch:
> > https://lore.kernel.org/netdev/20220907125048.396126-2-andrew@daynix.com/
> > At this point, skb_shared_info->gso_type is SKB_GSO_UDP_L4 |
> > SKB_GSO_DODGY, here the DODGY bit is set inside tun_get_user. So the
> > skb_gso_ok check will return true here, then the skb will fall to the
> > fragment code. Simple tracing confirms that __udp_gso_segment is never
> > called in this scenario.
> >
> > So the question is really how to handle the DODGY bit. IMHO it is not
> > right to fall to the fragment path when the actual packet request is
> > segmentation.
>
> I do agree with the above.
>
> > Will it be sufficient to just recompute the gso_segs
> > here and return the head skb instead?
>
> Something alike what TCP is currently doing:
>
> https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_offload.c#L85
>
> should be fine - constrained to the skb_shinfo(skb)->gso_type &
> SKB_GSO_UDP_L4 case.

Agreed. That's the canonical way to check dodgy segmentation offload
packets. All USO packets should enter __udp_gso_segment.

After validation and DODGY removal, a packet may fall through to
device segmentation if the devices advertises the feature (by returning
segs == NULL).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path
  2023-07-13  0:50     ` Willem de Bruijn
@ 2023-07-13  2:01       ` Yan Zhai
  0 siblings, 0 replies; 5+ messages in thread
From: Yan Zhai @ 2023-07-13  2:01 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Paolo Abeni, Marek Majkowski, network dev, kernel-team,
	Andrew Melnychenko, Jason Wang

On Wed, Jul 12, 2023 at 7:51 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Jul 12, 2023 at 11:20 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > Adding Jason,
> > On Wed, 2023-07-12 at 09:58 -0500, Yan Zhai wrote:
> > > On Wed, Jul 12, 2023 at 9:00 AM Marek Majkowski <marek@cloudflare.com> wrote:
> > > >
> > > > Dear netdev,
> > > >
> > > > I encountered a puzzling problem, please help.
> > > >
> > > > Rootless repro:
> > > >    https://gist.github.com/majek/5e8fd12e7a1663cea63877920fe86f18
> > > >
> > > > To run:
> > > >
> > > > ```
> > > > $ unshare -Urn python3 udp-gro-forwarding-bug.py
> > > > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > > > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > > > '''
> > > >
> > > > The code is really quite simple. First I create and open a tap device.
> > > > Then I send a large (>MTU) packet with vnethdr over tap0. The
> > > > gso_type=GSO_UDP_L4, and gso_size=1400. I expect the packet to egress
> > > > from tap0, be forwarded somewhere, where it will eventually be
> > > > segmented by software or hardware.
> > > >
> > > > The egress tap0 packet looks perfectly fine:
> > > >
> > > > tap0  In  IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, length 4000
> > > >
> > > > To simplify routing I'm doing 'tc mirred' aka `bpf_redirect()` magic,
> > > > where I move egress tap0 packets to ingress lo, like this:
> > > >
> > > > > tc filter add dev tap0 ingress protocol ip u32 match ip src 10.0.0.2 action mirred egress redirect dev lo
>
> This is the opposite of stated, attach to tap0 at ingress and send to
> lo on egress?
>
> It might matter only in the sense that tc_mirred on egress acts in
> dev_queue_xmit before any segmentation would occur.
>
> Probably irrelevant, as your example clearly hits the segmentation
> logic, and it sounds like the root cause there is already well
> understood.
>
> > > >
> > > > On ingress lo I see something really weird:
> > > >
> > > > lo    P   IP 10.0.0.2.55892 > 1.1.1.1.5021: UDP, bad length 4000 > 1392
> > > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1392
> > > > lo    P   IP 10.0.0.2.43690 > 1.1.1.1.43690: UDP, bad length 43682 > 1200
> > > >
> > > > This looks like IPv4 fragments without the IP fragmentation bits set.
> > > >
> > > > I think there are two independent problems here:
> > > >
> > > > (1) The packet is *fragmented* and that is plain wrong here. I'm
> > > > asking for USO not UFO in vnethdr.
> > > >
> > >
> > > To add some context our virtio header in hex format (12 bytes) is
> > > 01052a007805220006000000.
> > >
> > > Some digging shows that the issue seems to come from this patch:
> > > https://lore.kernel.org/netdev/20220907125048.396126-2-andrew@daynix.com/
> > > At this point, skb_shared_info->gso_type is SKB_GSO_UDP_L4 |
> > > SKB_GSO_DODGY, here the DODGY bit is set inside tun_get_user. So the
> > > skb_gso_ok check will return true here, then the skb will fall to the
> > > fragment code. Simple tracing confirms that __udp_gso_segment is never
> > > called in this scenario.
> > >
> > > So the question is really how to handle the DODGY bit. IMHO it is not
> > > right to fall to the fragment path when the actual packet request is
> > > segmentation.
> >
> > I do agree with the above.
> >
> > > Will it be sufficient to just recompute the gso_segs
> > > here and return the head skb instead?
> >
> > Something alike what TCP is currently doing:
> >
> > https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_offload.c#L85
> >
> > should be fine - constrained to the skb_shinfo(skb)->gso_type &
> > SKB_GSO_UDP_L4 case.
>
> Agreed. That's the canonical way to check dodgy segmentation offload
> packets. All USO packets should enter __udp_gso_segment.
>
> After validation and DODGY removal, a packet may fall through to
> device segmentation if the devices advertises the feature (by returning
> segs == NULL).

I sent a patch at
https://lore.kernel.org/netdev/ZK9ZiNMsJX8+1F3N@debian.debian/T/#u.
Tested on my local machine it is working correctly now.

-- 

Yan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-13  2:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-12 14:00 Broken segmentation on UDP_GSO / VIRTIO_NET_HDR_GSO_UDP_L4 forwarding path Marek Majkowski
2023-07-12 14:58 ` Yan Zhai
2023-07-12 15:19   ` Paolo Abeni
2023-07-13  0:50     ` Willem de Bruijn
2023-07-13  2:01       ` Yan Zhai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.