All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
@ 2020-10-21  9:21 wenxu
  2020-10-23 21:12 ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-21  9:21 UTC (permalink / raw)
  To: netdev

From: wenxu <wenxu@ucloud.cn>

The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
should always do fragment. So it should not care the frag_off in
inner ip.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 net/ipv4/ip_tunnel.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 8b04d1d..ee65c92 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			ttl = ip4_dst_hoplimit(&rt->dst);
 	}
 
-	if (!df && skb->protocol == htons(ETH_P_IP))
-		df = inner_iph->frag_off & htons(IP_DF);
-
 	headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
 	if (headroom > dev->needed_headroom)
 		dev->needed_headroom = headroom;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-21  9:21 [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags wenxu
@ 2020-10-23 21:12 ` Jakub Kicinski
  2020-10-26  8:23   ` wenxu
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-23 21:12 UTC (permalink / raw)
  To: wenxu; +Cc: netdev

On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
> should always do fragment. So it should not care the frag_off in
> inner ip.

Can you describe the use case better? My understanding is that we
should propagate DF in normally functioning networks, and let PMTU 
do its job.

> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
>  net/ipv4/ip_tunnel.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 8b04d1d..ee65c92 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
>  			ttl = ip4_dst_hoplimit(&rt->dst);
>  	}
>  
> -	if (!df && skb->protocol == htons(ETH_P_IP))
> -		df = inner_iph->frag_off & htons(IP_DF);
> -
>  	headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
>  	if (headroom > dev->needed_headroom)
>  		dev->needed_headroom = headroom;


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-23 21:12 ` Jakub Kicinski
@ 2020-10-26  8:23   ` wenxu
  2020-10-26 20:56     ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-26  8:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev


On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
> On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
>> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
>> should always do fragment. So it should not care the frag_off in
>> inner ip.
> Can you describe the use case better? My understanding is that we
> should propagate DF in normally functioning networks, and let PMTU 
> do its job.

Sorry for relying so late.  ip_md_tunnel_xmit send packet in the collect_md mode.

For OpenVswitch example, ovs set the gre port with flags df_default=false which will not

set TUNNEL_DONT_FRAGMENT for tun_flags.

And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu

is 1500 default too. So if the size of packet send from vm +  underlay length > underlay device mtu.

The packet always be dropped if the ip header of  packet set flags with DF.

In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner

ip header like vxlan device did.

>> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  net/ipv4/ip_tunnel.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
>> index 8b04d1d..ee65c92 100644
>> --- a/net/ipv4/ip_tunnel.c
>> +++ b/net/ipv4/ip_tunnel.c
>> @@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
>>  			ttl = ip4_dst_hoplimit(&rt->dst);
>>  	}
>>  
>> -	if (!df && skb->protocol == htons(ETH_P_IP))
>> -		df = inner_iph->frag_off & htons(IP_DF);
>> -
>>  	headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
>>  	if (headroom > dev->needed_headroom)
>>  		dev->needed_headroom = headroom;
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-26  8:23   ` wenxu
@ 2020-10-26 20:56     ` Jakub Kicinski
  2020-10-27 14:51       ` David Ahern
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-26 20:56 UTC (permalink / raw)
  To: wenxu; +Cc: netdev, Stefano Brivio, David Ahern

On Mon, 26 Oct 2020 16:23:29 +0800 wenxu wrote:
> On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
> > On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:  
> >> From: wenxu <wenxu@ucloud.cn>
> >>
> >> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
> >> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
> >> should always do fragment. So it should not care the frag_off in
> >> inner ip.  
> > Can you describe the use case better? My understanding is that we
> > should propagate DF in normally functioning networks, and let PMTU 
> > do its job.  
> 
> Sorry for relying so late.  ip_md_tunnel_xmit send packet in the collect_md mode.
> 
> For OpenVswitch example, ovs set the gre port with flags df_default=false which will not
> 
> set TUNNEL_DONT_FRAGMENT for tun_flags.
> 
> And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu
> 
> is 1500 default too. So if the size of packet send from vm +  underlay length > underlay device mtu.
> 
> The packet always be dropped if the ip header of  packet set flags with DF.
> 
> In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner
> 
> ip header like vxlan device did.

Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
support for directly bridged IP packets")? Sounds like non-UDP tunnels
need the same treatment to make PMTUD work.

RFC2003 seems to clearly forbid ignoring the inner DF:

      Identification, Flags, Fragment Offset

         These three fields are set as specified in [10].  However, if
         the "Don't Fragment" bit is set in the inner IP header, it MUST
         be set in the outer IP header; if the "Don't Fragment" bit is
         not set in the inner IP header, it MAY be set in the outer IP
         header, as described in Section 5.1.

and:

   [..] In
   particular, use of IP options is allowed, and use of fragmentation is
   allowed unless the "Don't Fragment" bit is set in the inner IP
   header.  This restriction on fragmentation is required so that nodes
   employing Path MTU Discovery [7] can obtain the information they
   seek.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-26 20:56     ` Jakub Kicinski
@ 2020-10-27 14:51       ` David Ahern
  2020-10-27 15:55         ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2020-10-27 14:51 UTC (permalink / raw)
  To: Jakub Kicinski, wenxu; +Cc: netdev, Stefano Brivio, David Ahern

On 10/26/20 2:56 PM, Jakub Kicinski wrote:
> On Mon, 26 Oct 2020 16:23:29 +0800 wenxu wrote:
>> On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
>>> On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:  
>>>> From: wenxu <wenxu@ucloud.cn>
>>>>
>>>> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
>>>> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
>>>> should always do fragment. So it should not care the frag_off in
>>>> inner ip.  
>>> Can you describe the use case better? My understanding is that we
>>> should propagate DF in normally functioning networks, and let PMTU 
>>> do its job.  
>>
>> Sorry for relying so late.  ip_md_tunnel_xmit send packet in the collect_md mode.
>>
>> For OpenVswitch example, ovs set the gre port with flags df_default=false which will not
>>
>> set TUNNEL_DONT_FRAGMENT for tun_flags.
>>
>> And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu
>>
>> is 1500 default too. So if the size of packet send from vm +  underlay length > underlay device mtu.
>>
>> The packet always be dropped if the ip header of  packet set flags with DF.
>>
>> In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner
>>
>> ip header like vxlan device did.
> 
> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> support for directly bridged IP packets")? Sounds like non-UDP tunnels
> need the same treatment to make PMTUD work.
> 
> RFC2003 seems to clearly forbid ignoring the inner DF:

I was looking at this patch Sunday night. To me it seems odd that
packets flowing through the overlay affect decisions in the underlay
which meant I agree with the proposed change.

ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
based on the inner header? Or, if you agree with RFC 2003 and the DF
should be propagated outer to inner, then it seems like the df reset
needs to be moved up before the call to tnl_update_pmtu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-27 14:51       ` David Ahern
@ 2020-10-27 15:55         ` Jakub Kicinski
  2020-10-29  2:30           ` wenxu
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-27 15:55 UTC (permalink / raw)
  To: David Ahern; +Cc: wenxu, netdev, Stefano Brivio, David Ahern

On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
> > Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> > support for directly bridged IP packets")? Sounds like non-UDP tunnels
> > need the same treatment to make PMTUD work.
> > 
> > RFC2003 seems to clearly forbid ignoring the inner DF:  
> 
> I was looking at this patch Sunday night. To me it seems odd that
> packets flowing through the overlay affect decisions in the underlay
> which meant I agree with the proposed change.

The RFC was probably written before we invented terms like underlay 
and overlay, and still considered tunneling to be an inefficient hack ;)

> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
> based on the inner header? Or, if you agree with RFC 2003 and the DF
> should be propagated outer to inner, then it seems like the df reset
> needs to be moved up before the call to tnl_update_pmtu

Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
generated based on the inner headers.

We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt, 
but IMHO the default should not be violating the RFC.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-27 15:55         ` Jakub Kicinski
@ 2020-10-29  2:30           ` wenxu
  2020-10-30  1:14             ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-29  2:30 UTC (permalink / raw)
  To: Jakub Kicinski, David Ahern; +Cc: netdev, Stefano Brivio, David Ahern


On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
>>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
>>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
>>> need the same treatment to make PMTUD work.
>>>
>>> RFC2003 seems to clearly forbid ignoring the inner DF:  
>> I was looking at this patch Sunday night. To me it seems odd that
>> packets flowing through the overlay affect decisions in the underlay
>> which meant I agree with the proposed change.
> The RFC was probably written before we invented terms like underlay 
> and overlay, and still considered tunneling to be an inefficient hack ;)
>
>> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
>> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
>> based on the inner header? Or, if you agree with RFC 2003 and the DF
>> should be propagated outer to inner, then it seems like the df reset
>> needs to be moved up before the call to tnl_update_pmtu
> Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> generated based on the inner headers.
>
> We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt, 
> but IMHO the default should not be violating the RFC.

If we add  TUNNEL_IGNORE_DF to lwt,  the two IGNORE_DF and DONT_FRAGMENT

flags should not coexist ?   Or DONT_FRAGMENT is prior to the IGNORE_DF?


Also there is inconsistent in the kernel for the tunnel device. For geneve and

vxlan tunnel (don't send tunnel with ip_md_tunnel_xmit) in the lwt mode set

the outer df only based  TUNNEL_DONT_FRAGMENT .

And this is also the some behavior for gre device before switching to use 
ip_md_tunnel_xmit as the following patch.

962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
  2020-10-29  2:30           ` wenxu
@ 2020-10-30  1:14             ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-30  1:14 UTC (permalink / raw)
  To: wenxu; +Cc: David Ahern, netdev, Stefano Brivio, David Ahern

On Thu, 29 Oct 2020 10:30:50 +0800 wenxu wrote:
> On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> > On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:  
> >>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> >>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
> >>> need the same treatment to make PMTUD work.
> >>>
> >>> RFC2003 seems to clearly forbid ignoring the inner DF:    
> >> I was looking at this patch Sunday night. To me it seems odd that
> >> packets flowing through the overlay affect decisions in the underlay
> >> which meant I agree with the proposed change.  
> > The RFC was probably written before we invented terms like underlay 
> > and overlay, and still considered tunneling to be an inefficient hack ;)
> >  
> >> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
> >> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
> >> based on the inner header? Or, if you agree with RFC 2003 and the DF
> >> should be propagated outer to inner, then it seems like the df reset
> >> needs to be moved up before the call to tnl_update_pmtu  
> > Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> > PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> > generated based on the inner headers.
> >
> > We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt, 
> > but IMHO the default should not be violating the RFC.  
> 
> If we add  TUNNEL_IGNORE_DF to lwt,  the two IGNORE_DF and DONT_FRAGMENT
> 
> flags should not coexist ?   Or DONT_FRAGMENT is prior to the IGNORE_DF?
> 
> 
> Also there is inconsistent in the kernel for the tunnel device. For geneve and
> 
> vxlan tunnel (don't send tunnel with ip_md_tunnel_xmit) in the lwt mode set
> 
> the outer df only based  TUNNEL_DONT_FRAGMENT .
> 
> And this is also the some behavior for gre device before switching to use 
> ip_md_tunnel_xmit as the following patch.
> 
> 962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit

Ah, that's a lot more convincing, I was looking at the Fixes tag you
provided, but it seems like Fixes should really point at the commit you
mention here.

Please mention the change in GRE behavior and the discrepancy between
handling of DF by different tunnels in the commit message and repost.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-30  1:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-21  9:21 [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags wenxu
2020-10-23 21:12 ` Jakub Kicinski
2020-10-26  8:23   ` wenxu
2020-10-26 20:56     ` Jakub Kicinski
2020-10-27 14:51       ` David Ahern
2020-10-27 15:55         ` Jakub Kicinski
2020-10-29  2:30           ` wenxu
2020-10-30  1:14             ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.