linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
@ 2022-02-13 15:02 Tao Liu
  2022-02-14  1:28 ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Tao Liu @ 2022-02-13 15:02 UTC (permalink / raw)
  To: davem, yoshfuji, dsahern, kuba, edumazet, sridhar.samudrala
  Cc: netdev, linux-kernel, Tao Liu

We encouter a tcp drop issue in our cloud environment. Packet GROed in host
forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
IPVS LB with ipip encapsulation. The full path like:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
 -> ipip encap -> net_failover tx -> virtio_net tx

When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
performed because it supports TSO and GSO_IPXIP4. But network_header has
been pointing to inner ip header.

Call Trace:
 tcp4_gso_segment        ------> return NULL
 inet_gso_segment        ------> inner iph, network_header points to
 ipip_gso_segment
 inet_gso_segment        ------> outer iph
 skb_mac_gso_segment

Afterwards virtio_net transmits the pkt, only inner ip header is modified.
And the outer one just keeps untouched. The pkt will be dropped in remote
host. So we need to reset network header if there is no gso performed in
net_failover.

Call Trace:
 inet_gso_segment        ------> inner iph, outer iph is skipped
 skb_mac_gso_segment
 __skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 __qdisc_run
 __dev_queue_xmit        ------> virtio_net
 dev_hard_start_xmit
 __dev_queue_xmit        ------> net_failover
 ip_finish_output2
 ip_output
 iptunnel_xmit
 ip_tunnel_xmit
 ipip_tunnel_xmit        ------> ipip
 dev_hard_start_xmit
 __dev_queue_xmit
 ip_finish_output2
 ip_output
 ip_forward
 ip_rcv
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 napi_gro_receive
 receive_buf
 virtnet_poll
 net_rx_action

Fixes: cb32f511a70b ("ipip: add GSO/TSO support")
Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
---
 net/ipv4/af_inet.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9c465ba..f8b3f8a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
 					netdev_features_t features)
 {
+	struct sk_buff *segs;
+	int nhoff;
+
 	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
 		return ERR_PTR(-EINVAL);
 
-	return inet_gso_segment(skb, features);
+	nhoff = skb_network_header(skb) - skb_mac_header(skb);
+	segs = inet_gso_segment(skb, features);
+	if (!segs)
+		skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
+
+	return segs;
 }
 
 struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-13 15:02 [PATCH] gso: do not skip outer ip header in case of ipip and net_failover Tao Liu
@ 2022-02-14  1:28 ` Willem de Bruijn
  2022-02-14  4:03   ` Tao Liu
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2022-02-14  1:28 UTC (permalink / raw)
  To: Tao Liu
  Cc: davem, yoshfuji, dsahern, kuba, edumazet, sridhar.samudrala,
	netdev, linux-kernel

On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
> We encouter a tcp drop issue in our cloud environment. Packet GROed in host
> forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
> IPVS LB with ipip encapsulation. The full path like:
> host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
>  -> ipip encap -> net_failover tx -> virtio_net tx
>
> When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
> performed because it supports TSO and GSO_IPXIP4. But network_header has
> been pointing to inner ip header.

If the packet is configured correctly, and net_failover advertises
that it can handle TSO packets with IPIP encap, then still virtio_net
should not advertise it and software GSO be applied on its
dev_queue_xmit call.

This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly
set, but also tunneling fields like skb->encapsulated and
skb_inner_network_header.

> ---
>  net/ipv4/af_inet.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 9c465ba..f8b3f8a 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
>                                         netdev_features_t features)
>  {
> +       struct sk_buff *segs;
> +       int nhoff;
> +
>         if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
>                 return ERR_PTR(-EINVAL);
>
> -       return inet_gso_segment(skb, features);
> +       nhoff = skb_network_header(skb) - skb_mac_header(skb);
> +       segs = inet_gso_segment(skb, features);
> +       if (!segs)
> +               skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
> +
> +       return segs;
>  }

If this would be needed for IPIP, then the same would be needed for SIT, etc.

Is the skb_network_header

1. correctly pointing to the outer header of the TSO packet before the
call to inet_gso_segment
2. incorrectly pointing to the inner header of the (still) TSO packet
after the call to inet_gso_segment

inet_gso_segment already does the same operation: save nhoff, pull
network header, call callbacks.gso_segment (which can be
ipip_gso_segment->inet_gso_segment), then place the network header
back at nhoff.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-14  1:28 ` Willem de Bruijn
@ 2022-02-14  4:03   ` Tao Liu
  2022-02-14  4:27     ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Tao Liu @ 2022-02-14  4:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, yoshfuji, dsahern, kuba, edumazet, sridhar.samudrala,
	netdev, linux-kernel

Sorry for bothering, just repost it.

> 2022年2月14日 09:28,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
> 
> On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
>> 
>> We encouter a tcp drop issue in our cloud environment. Packet GROed in host
>> forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
>> IPVS LB with ipip encapsulation. The full path like:
>> host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
>> -> ipip encap -> net_failover tx -> virtio_net tx
>> 
>> When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
>> performed because it supports TSO and GSO_IPXIP4. But network_header has
>> been pointing to inner ip header.
> 
> If the packet is configured correctly, and net_failover advertises
> that it can handle TSO packets with IPIP encap, then still virtio_net
> should not advertise it and software GSO be applied on its
> dev_queue_xmit call.
> 
> This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly
> set, but also tunneling fields like skb->encapsulated and
> skb_inner_network_header.
Thanks very much for your comment!

Yes, the packet is correct. Another thing i have not pointed directly is
that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBUST
but virtio_net do.

>> ---
>> net/ipv4/af_inet.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 9c465ba..f8b3f8a 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
>>                                        netdev_features_t features)
>> {
>> +       struct sk_buff *segs;
>> +       int nhoff;
>> +
>>        if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
>>                return ERR_PTR(-EINVAL);
>> 
>> -       return inet_gso_segment(skb, features);
>> +       nhoff = skb_network_header(skb) - skb_mac_header(skb);
>> +       segs = inet_gso_segment(skb, features);
>> +       if (!segs)
>> +               skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
>> +
>> +       return segs;
>> }
> 
> If this would be needed for IPIP, then the same would be needed for SIT, etc.
> 
> Is the skb_network_header
> 
> 1. correctly pointing to the outer header of the TSO packet before the
> call to inet_gso_segment
> 2. incorrectly pointing to the inner header of the (still) TSO packet
> after the call to inet_gso_segment
> 
> inet_gso_segment already does the same operation: save nhoff, pull
> network header, call callbacks.gso_segment (which can be
> ipip_gso_segment->inet_gso_segment), then place the network header
> back at nhoff.
> 
values print in skb_mac_gso_segment() before callbacks.gso_segment:
ipip:               vlan_depth=0 mac_len=0 skb->network_header=206
net_failover:  vlan_depth=14 mac_len=14 skb->network_header=186
virtio_net:      vlan_depth=34 mac_len=34 skb->network_header=206

agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as:

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9c465ba..72fde28 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1376,8 +1376,11 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
        }

        ops = rcu_dereference(inet_offloads[proto]);
-       if (likely(ops && ops->callbacks.gso_segment))
+       if (likely(ops && ops->callbacks.gso_segment)) {
                segs = ops->callbacks.gso_segment(skb, features);
+               if (!segs)
+                       skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
+       }

        if (IS_ERR_OR_NULL(segs))
                goto out;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b29e9ba..5f577e2 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -114,6 +114,8 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
        if (likely(ops && ops->callbacks.gso_segment)) {
                skb_reset_transport_header(skb);
                segs = ops->callbacks.gso_segment(skb, features);
+               if (!segs)
+                       skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
        }

        if (IS_ERR_OR_NULL(segs))



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-14  4:03   ` Tao Liu
@ 2022-02-14  4:27     ` Willem de Bruijn
       [not found]       ` <42554FCB-9180-4B32-B5CF-6D3236237D99@ucloud.cn>
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2022-02-14  4:27 UTC (permalink / raw)
  To: Tao Liu
  Cc: Willem de Bruijn, davem, yoshfuji, dsahern, kuba, edumazet,
	sridhar.samudrala, netdev, linux-kernel

On Sun, Feb 13, 2022 at 11:03 PM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
> Sorry for bothering, just repost it.
>
> > 2022年2月14日 09:28,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
> >
> > On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
> >>
> >> We encouter a tcp drop issue in our cloud environment. Packet GROed in host
> >> forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
> >> IPVS LB with ipip encapsulation. The full path like:
> >> host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
> >> -> ipip encap -> net_failover tx -> virtio_net tx
> >>
> >> When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
> >> performed because it supports TSO and GSO_IPXIP4. But network_header has
> >> been pointing to inner ip header.
> >
> > If the packet is configured correctly, and net_failover advertises
> > that it can handle TSO packets with IPIP encap, then still virtio_net
> > should not advertise it and software GSO be applied on its
> > dev_queue_xmit call.
> >
> > This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly
> > set, but also tunneling fields like skb->encapsulated and
> > skb_inner_network_header.
> Thanks very much for your comment!
>
> Yes, the packet is correct. Another thing i have not pointed directly is
> that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBUST
> but virtio_net do.

If net_failover does not advertise NETIF_F_GSO_ROBUST, then
tcp_gso_segment will pass a packet with SKB_GSO_DODGY to the
software gso stack, not taking the branch

        if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {

> >> ---
> >> net/ipv4/af_inet.c | 10 +++++++++-
> >> 1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> >> index 9c465ba..f8b3f8a 100644
> >> --- a/net/ipv4/af_inet.c
> >> +++ b/net/ipv4/af_inet.c
> >> @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> >> static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
> >>                                        netdev_features_t features)
> >> {
> >> +       struct sk_buff *segs;
> >> +       int nhoff;
> >> +
> >>        if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
> >>                return ERR_PTR(-EINVAL);
> >>
> >> -       return inet_gso_segment(skb, features);
> >> +       nhoff = skb_network_header(skb) - skb_mac_header(skb);
> >> +       segs = inet_gso_segment(skb, features);
> >> +       if (!segs)
> >> +               skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
> >> +
> >> +       return segs;
> >> }
> >
> > If this would be needed for IPIP, then the same would be needed for SIT, etc.
> >
> > Is the skb_network_header
> >
> > 1. correctly pointing to the outer header of the TSO packet before the
> > call to inet_gso_segment
> > 2. incorrectly pointing to the inner header of the (still) TSO packet
> > after the call to inet_gso_segment
> >
> > inet_gso_segment already does the same operation: save nhoff, pull
> > network header, call callbacks.gso_segment (which can be
> > ipip_gso_segment->inet_gso_segment), then place the network header
> > back at nhoff.
> >
> values print in skb_mac_gso_segment() before callbacks.gso_segment:
> ipip:               vlan_depth=0 mac_len=0 skb->network_header=206
> net_failover:  vlan_depth=14 mac_len=14 skb->network_header=186
> virtio_net:      vlan_depth=34 mac_len=34 skb->network_header=206
>
> agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as:

If IPIP GSO was so broken, I think we would have found it long before.

As said, inet_gso_segment should already do the right thing for ipip:
it will be called twice.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
       [not found]       ` <42554FCB-9180-4B32-B5CF-6D3236237D99@ucloud.cn>
@ 2022-02-15 15:01         ` Willem de Bruijn
  2022-02-15 15:35           ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2022-02-15 15:01 UTC (permalink / raw)
  To: Tao Liu
  Cc: David Miller, Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski,
	Eric Dumazet, Samudrala, Sridhar, Network Development, LKML

On Mon, Feb 14, 2022 at 8:38 PM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
> Sorry to resend it.
>
> 2022年2月14日 12:27,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
>
> On Sun, Feb 13, 2022 at 11:03 PM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
>
> Sorry for bothering, just repost it.
>
> 2022年2月14日 09:28,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
>
> On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
>
> We encouter a tcp drop issue in our cloud environment. Packet GROed in host
> forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
> IPVS LB with ipip encapsulation. The full path like:
> host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
> -> ipip encap -> net_failover tx -> virtio_net tx
>
> When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
> performed because it supports TSO and GSO_IPXIP4. But network_header has
> been pointing to inner ip header.
>
>
> If the packet is configured correctly, and net_failover advertises
> that it can handle TSO packets with IPIP encap, then still virtio_net
> should not advertise it and software GSO be applied on its
> dev_queue_xmit call.
>
> This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly
> set, but also tunneling fields like skb->encapsulated and
> skb_inner_network_header.
>
> Thanks very much for your comment!
>
> Yes, the packet is correct. Another thing i have not pointed directly is
> that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBUST
> but virtio_net do.
>
>
> If net_failover does not advertise NETIF_F_GSO_ROBUST, then
> tcp_gso_segment will pass a packet with SKB_GSO_DODGY to the
> software gso stack, not taking the branch
>
>        if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>
> As i tested, packet with SKB_GSO_DODGY hits this branch. packet's gso_type=0x0103, which
> means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4. net_failover matches
> the condition.
>
> Consequently, tcp_gso_segment returns NULL, there is no software gso did here. And
> network_header points to inner iph.
>
> Software gso is did by virtio_net which not advertises NETIF_F_GSO_IPXIP4. It skips the outer
> iph, and keeps it unchanged.
>
> ---
> net/ipv4/af_inet.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 9c465ba..f8b3f8a 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
>                                       netdev_features_t features)
> {
> +       struct sk_buff *segs;
> +       int nhoff;
> +
>       if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
>               return ERR_PTR(-EINVAL);
>
> -       return inet_gso_segment(skb, features);
> +       nhoff = skb_network_header(skb) - skb_mac_header(skb);
> +       segs = inet_gso_segment(skb, features);
> +       if (!segs)
> +               skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
> +
> +       return segs;
> }
>
>
> If this would be needed for IPIP, then the same would be needed for SIT, etc.
>
> Is the skb_network_header
>
> 1. correctly pointing to the outer header of the TSO packet before the
> call to inet_gso_segment
> 2. incorrectly pointing to the inner header of the (still) TSO packet
> after the call to inet_gso_segment
>
> inet_gso_segment already does the same operation: save nhoff, pull
> network header, call callbacks.gso_segment (which can be
> ipip_gso_segment->inet_gso_segment), then place the network header
> back at nhoff.
>
> values print in skb_mac_gso_segment() before callbacks.gso_segment:
> ipip:               vlan_depth=0 mac_len=0 skb->network_header=206
> net_failover:  vlan_depth=14 mac_len=14 skb->network_header=186
> virtio_net:      vlan_depth=34 mac_len=34 skb->network_header=206
>
> agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as:
>
>
> If IPIP GSO was so broken, I think we would have found it long before.
>
> As said, inet_gso_segment should already do the right thing for ipip:
> it will be called twice.
>
>
> SKB_GSO_DODGY flag and net_failover conduct this issue. local traffic just works fine.

Got it. That is an uncommon combination. SKB_GSO_DODGY is set from
external virtio_net, which does not support tunnels. But a path with
an added tunnel might cause this combination.

And inet_gso_segment resets the network header, both times, before
calling callbacks.gso_segment()

        skb_reset_network_header(skb);
        nhoff = skb_network_header(skb) - skb_mac_header(skb);

        [...]

        if (likely(ops && ops->callbacks.gso_segment))
                segs = ops->callbacks.gso_segment(skb, features);

And resets that after for each skb in segs.

        skb = segs;
        do {
                [...]
                skb->network_header = (u8 *)iph - skb->head;

But does not do this if segs == NULL.

The packet has to be restored before it is passed to the device. I
think we have to handle this case correctly in inet_gso_segment,
instead of patching it up in all the various tunnel devices.

The same holds for ipv6_gso_segment.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-15 15:01         ` Willem de Bruijn
@ 2022-02-15 15:35           ` Eric Dumazet
  2022-02-15 15:45             ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2022-02-15 15:35 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Tao Liu, David Miller, Hideaki YOSHIFUJI, David Ahern,
	Jakub Kicinski, Samudrala, Sridhar, Network Development, LKML

On Tue, Feb 15, 2022 at 7:01 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Feb 14, 2022 at 8:38 PM Tao Liu <thomas.liu@ucloud.cn> wrote:
> >
> > Sorry to resend it.
> >
> > 2022年2月14日 12:27,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
> >
> > On Sun, Feb 13, 2022 at 11:03 PM Tao Liu <thomas.liu@ucloud.cn> wrote:
> >
> >
> > Sorry for bothering, just repost it.
> >
> > 2022年2月14日 09:28,Willem de Bruijn <willemdebruijn.kernel@gmail.com> 写道:
> >
> > On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
> >
> >
> > We encouter a tcp drop issue in our cloud environment. Packet GROed in host
> > forwards to a VM virtio_net nic with net_failover enabled. VM acts as a
> > IPVS LB with ipip encapsulation. The full path like:
> > host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
> > -> ipip encap -> net_failover tx -> virtio_net tx
> >
> > When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
> > performed because it supports TSO and GSO_IPXIP4. But network_header has
> > been pointing to inner ip header.
> >
> >
> > If the packet is configured correctly, and net_failover advertises
> > that it can handle TSO packets with IPIP encap, then still virtio_net
> > should not advertise it and software GSO be applied on its
> > dev_queue_xmit call.
> >
> > This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly
> > set, but also tunneling fields like skb->encapsulated and
> > skb_inner_network_header.
> >
> > Thanks very much for your comment!
> >
> > Yes, the packet is correct. Another thing i have not pointed directly is
> > that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBUST
> > but virtio_net do.
> >
> >
> > If net_failover does not advertise NETIF_F_GSO_ROBUST, then
> > tcp_gso_segment will pass a packet with SKB_GSO_DODGY to the
> > software gso stack, not taking the branch
> >
> >        if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
> >
> > As i tested, packet with SKB_GSO_DODGY hits this branch. packet's gso_type=0x0103, which
> > means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4. net_failover matches
> > the condition.
> >
> > Consequently, tcp_gso_segment returns NULL, there is no software gso did here. And
> > network_header points to inner iph.
> >
> > Software gso is did by virtio_net which not advertises NETIF_F_GSO_IPXIP4. It skips the outer
> > iph, and keeps it unchanged.
> >
> > ---
> > net/ipv4/af_inet.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 9c465ba..f8b3f8a 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> > static struct sk_buff *ipip_gso_segment(struct sk_buff *skb,
> >                                       netdev_features_t features)
> > {
> > +       struct sk_buff *segs;
> > +       int nhoff;
> > +
> >       if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4))
> >               return ERR_PTR(-EINVAL);
> >
> > -       return inet_gso_segment(skb, features);
> > +       nhoff = skb_network_header(skb) - skb_mac_header(skb);
> > +       segs = inet_gso_segment(skb, features);
> > +       if (!segs)
> > +               skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
> > +
> > +       return segs;
> > }
> >
> >
> > If this would be needed for IPIP, then the same would be needed for SIT, etc.
> >
> > Is the skb_network_header
> >
> > 1. correctly pointing to the outer header of the TSO packet before the
> > call to inet_gso_segment
> > 2. incorrectly pointing to the inner header of the (still) TSO packet
> > after the call to inet_gso_segment
> >
> > inet_gso_segment already does the same operation: save nhoff, pull
> > network header, call callbacks.gso_segment (which can be
> > ipip_gso_segment->inet_gso_segment), then place the network header
> > back at nhoff.
> >
> > values print in skb_mac_gso_segment() before callbacks.gso_segment:
> > ipip:               vlan_depth=0 mac_len=0 skb->network_header=206
> > net_failover:  vlan_depth=14 mac_len=14 skb->network_header=186
> > virtio_net:      vlan_depth=34 mac_len=34 skb->network_header=206
> >
> > agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as:
> >
> >
> > If IPIP GSO was so broken, I think we would have found it long before.
> >
> > As said, inet_gso_segment should already do the right thing for ipip:
> > it will be called twice.
> >
> >
> > SKB_GSO_DODGY flag and net_failover conduct this issue. local traffic just works fine.
>
> Got it. That is an uncommon combination. SKB_GSO_DODGY is set from
> external virtio_net, which does not support tunnels. But a path with
> an added tunnel might cause this combination.
>
> And inet_gso_segment resets the network header, both times, before
> calling callbacks.gso_segment()
>
>         skb_reset_network_header(skb);
>         nhoff = skb_network_header(skb) - skb_mac_header(skb);
>
>         [...]
>
>         if (likely(ops && ops->callbacks.gso_segment))
>                 segs = ops->callbacks.gso_segment(skb, features);
>
> And resets that after for each skb in segs.
>
>         skb = segs;
>         do {
>                 [...]
>                 skb->network_header = (u8 *)iph - skb->head;
>
> But does not do this if segs == NULL.
>
> The packet has to be restored before it is passed to the device. I
> think we have to handle this case correctly in inet_gso_segment,
> instead of patching it up in all the various tunnel devices.
>
> The same holds for ipv6_gso_segment.

Back in the days, GRO was modified so that we passed a context (nhoff)
in called functions,
instead of changing skb offsets. The concept of outer/inner header
only works with 1 encap.

Perhaps it is time to do the same in GSO, to allow arbitrary levels of
encapsulation.
Then we no longer mess with these limited
'network_header/inner_network_header' fields
in the skb.

Stuffing state in the skb has been a mistake I think.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-15 15:35           ` Eric Dumazet
@ 2022-02-15 15:45             ` Willem de Bruijn
  2022-02-15 17:06               ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2022-02-15 15:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Willem de Bruijn, Tao Liu, David Miller, Hideaki YOSHIFUJI,
	David Ahern, Jakub Kicinski, Samudrala, Sridhar,
	Network Development, LKML

> > Got it. That is an uncommon combination. SKB_GSO_DODGY is set from
> > external virtio_net, which does not support tunnels. But a path with
> > an added tunnel might cause this combination.
> >
> > And inet_gso_segment resets the network header, both times, before
> > calling callbacks.gso_segment()
> >
> >         skb_reset_network_header(skb);
> >         nhoff = skb_network_header(skb) - skb_mac_header(skb);
> >
> >         [...]
> >
> >         if (likely(ops && ops->callbacks.gso_segment))
> >                 segs = ops->callbacks.gso_segment(skb, features);
> >
> > And resets that after for each skb in segs.
> >
> >         skb = segs;
> >         do {
> >                 [...]
> >                 skb->network_header = (u8 *)iph - skb->head;
> >
> > But does not do this if segs == NULL.
> >
> > The packet has to be restored before it is passed to the device. I
> > think we have to handle this case correctly in inet_gso_segment,
> > instead of patching it up in all the various tunnel devices.
> >
> > The same holds for ipv6_gso_segment.
>
> Back in the days, GRO was modified so that we passed a context (nhoff)
> in called functions,
> instead of changing skb offsets. The concept of outer/inner header
> only works with 1 encap.
>
> Perhaps it is time to do the same in GSO, to allow arbitrary levels of
> encapsulation.
> Then we no longer mess with these limited
> 'network_header/inner_network_header' fields
> in the skb.
>
> Stuffing state in the skb has been a mistake I think.

If we could unwind those skb inner_* fields (and reclaim the skbuff
space!) that would be fantastic.

Immediately for this bug: perhaps it can be fixed by resetting the
network_header on the gso skb if segs == NULL. As the offset is stored
on the stack.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover
  2022-02-15 15:45             ` Willem de Bruijn
@ 2022-02-15 17:06               ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2022-02-15 17:06 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Tao Liu, David Miller, Hideaki YOSHIFUJI, David Ahern,
	Jakub Kicinski, Samudrala, Sridhar, Network Development, LKML

On Tue, Feb 15, 2022 at 7:46 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:

> > Stuffing state in the skb has been a mistake I think.
>
> If we could unwind those skb inner_* fields (and reclaim the skbuff
> space!) that would be fantastic.

Not sure we can easily remove the space, many networking drivers need them,
we probably do not want to dissect packets in their ndo_start_xmit()

>
> Immediately for this bug: perhaps it can be fixed by resetting the
> network_header on the gso skb if segs == NULL. As the offset is stored
> on the stack.

It seems correct. Any other fields we need to take care of ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-02-15 17:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-13 15:02 [PATCH] gso: do not skip outer ip header in case of ipip and net_failover Tao Liu
2022-02-14  1:28 ` Willem de Bruijn
2022-02-14  4:03   ` Tao Liu
2022-02-14  4:27     ` Willem de Bruijn
     [not found]       ` <42554FCB-9180-4B32-B5CF-6D3236237D99@ucloud.cn>
2022-02-15 15:01         ` Willem de Bruijn
2022-02-15 15:35           ` Eric Dumazet
2022-02-15 15:45             ` Willem de Bruijn
2022-02-15 17:06               ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).