* [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
@ 2020-10-21 9:21 wenxu
2020-10-23 21:12 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-21 9:21 UTC (permalink / raw)
To: netdev
From: wenxu <wenxu@ucloud.cn>
The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
should always do fragment. So it should not care the frag_off in
inner ip.
Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
net/ipv4/ip_tunnel.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 8b04d1d..ee65c92 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
ttl = ip4_dst_hoplimit(&rt->dst);
}
- if (!df && skb->protocol == htons(ETH_P_IP))
- df = inner_iph->frag_off & htons(IP_DF);
-
headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
if (headroom > dev->needed_headroom)
dev->needed_headroom = headroom;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-21 9:21 [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags wenxu
@ 2020-10-23 21:12 ` Jakub Kicinski
2020-10-26 8:23 ` wenxu
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-23 21:12 UTC (permalink / raw)
To: wenxu; +Cc: netdev
On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
>
> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
> should always do fragment. So it should not care the frag_off in
> inner ip.
Can you describe the use case better? My understanding is that we
should propagate DF in normally functioning networks, and let PMTU
do its job.
> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
> net/ipv4/ip_tunnel.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 8b04d1d..ee65c92 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> ttl = ip4_dst_hoplimit(&rt->dst);
> }
>
> - if (!df && skb->protocol == htons(ETH_P_IP))
> - df = inner_iph->frag_off & htons(IP_DF);
> -
> headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
> if (headroom > dev->needed_headroom)
> dev->needed_headroom = headroom;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-23 21:12 ` Jakub Kicinski
@ 2020-10-26 8:23 ` wenxu
2020-10-26 20:56 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-26 8:23 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev
On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
> On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
>> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
>> should always do fragment. So it should not care the frag_off in
>> inner ip.
> Can you describe the use case better? My understanding is that we
> should propagate DF in normally functioning networks, and let PMTU
> do its job.
Sorry for relying so late. ip_md_tunnel_xmit send packet in the collect_md mode.
For OpenVswitch example, ovs set the gre port with flags df_default=false which will not
set TUNNEL_DONT_FRAGMENT for tun_flags.
And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu
is 1500 default too. So if the size of packet send from vm + underlay length > underlay device mtu.
The packet always be dropped if the ip header of packet set flags with DF.
In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner
ip header like vxlan device did.
>> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>> net/ipv4/ip_tunnel.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
>> index 8b04d1d..ee65c92 100644
>> --- a/net/ipv4/ip_tunnel.c
>> +++ b/net/ipv4/ip_tunnel.c
>> @@ -608,9 +608,6 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
>> ttl = ip4_dst_hoplimit(&rt->dst);
>> }
>>
>> - if (!df && skb->protocol == htons(ETH_P_IP))
>> - df = inner_iph->frag_off & htons(IP_DF);
>> -
>> headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
>> if (headroom > dev->needed_headroom)
>> dev->needed_headroom = headroom;
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-26 8:23 ` wenxu
@ 2020-10-26 20:56 ` Jakub Kicinski
2020-10-27 14:51 ` David Ahern
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-26 20:56 UTC (permalink / raw)
To: wenxu; +Cc: netdev, Stefano Brivio, David Ahern
On Mon, 26 Oct 2020 16:23:29 +0800 wenxu wrote:
> On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
> > On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
> >> From: wenxu <wenxu@ucloud.cn>
> >>
> >> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
> >> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
> >> should always do fragment. So it should not care the frag_off in
> >> inner ip.
> > Can you describe the use case better? My understanding is that we
> > should propagate DF in normally functioning networks, and let PMTU
> > do its job.
>
> Sorry for relying so late. ip_md_tunnel_xmit send packet in the collect_md mode.
>
> For OpenVswitch example, ovs set the gre port with flags df_default=false which will not
>
> set TUNNEL_DONT_FRAGMENT for tun_flags.
>
> And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu
>
> is 1500 default too. So if the size of packet send from vm + underlay length > underlay device mtu.
>
> The packet always be dropped if the ip header of packet set flags with DF.
>
> In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner
>
> ip header like vxlan device did.
Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
support for directly bridged IP packets")? Sounds like non-UDP tunnels
need the same treatment to make PMTUD work.
RFC2003 seems to clearly forbid ignoring the inner DF:
Identification, Flags, Fragment Offset
These three fields are set as specified in [10]. However, if
the "Don't Fragment" bit is set in the inner IP header, it MUST
be set in the outer IP header; if the "Don't Fragment" bit is
not set in the inner IP header, it MAY be set in the outer IP
header, as described in Section 5.1.
and:
[..] In
particular, use of IP options is allowed, and use of fragmentation is
allowed unless the "Don't Fragment" bit is set in the inner IP
header. This restriction on fragmentation is required so that nodes
employing Path MTU Discovery [7] can obtain the information they
seek.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-26 20:56 ` Jakub Kicinski
@ 2020-10-27 14:51 ` David Ahern
2020-10-27 15:55 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2020-10-27 14:51 UTC (permalink / raw)
To: Jakub Kicinski, wenxu; +Cc: netdev, Stefano Brivio, David Ahern
On 10/26/20 2:56 PM, Jakub Kicinski wrote:
> On Mon, 26 Oct 2020 16:23:29 +0800 wenxu wrote:
>> On 10/24/2020 5:12 AM, Jakub Kicinski wrote:
>>> On Wed, 21 Oct 2020 17:21:55 +0800 wenxu@ucloud.cn wrote:
>>>> From: wenxu <wenxu@ucloud.cn>
>>>>
>>>> The TUNNEL_DONT_FRAGMENT flags specific the tunnel outer ip can do
>>>> fragment or not in the md mode. Without the TUNNEL_DONT_FRAGMENT
>>>> should always do fragment. So it should not care the frag_off in
>>>> inner ip.
>>> Can you describe the use case better? My understanding is that we
>>> should propagate DF in normally functioning networks, and let PMTU
>>> do its job.
>>
>> Sorry for relying so late. ip_md_tunnel_xmit send packet in the collect_md mode.
>>
>> For OpenVswitch example, ovs set the gre port with flags df_default=false which will not
>>
>> set TUNNEL_DONT_FRAGMENT for tun_flags.
>>
>> And the mtu of virtual machine is 1500 with default. And the tunnel underlay device mtu
>>
>> is 1500 default too. So if the size of packet send from vm + underlay length > underlay device mtu.
>>
>> The packet always be dropped if the ip header of packet set flags with DF.
>>
>> In the collect_md the outer packet can fragment or not should depends on the tun_flags but not inner
>>
>> ip header like vxlan device did.
>
> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> support for directly bridged IP packets")? Sounds like non-UDP tunnels
> need the same treatment to make PMTUD work.
>
> RFC2003 seems to clearly forbid ignoring the inner DF:
I was looking at this patch Sunday night. To me it seems odd that
packets flowing through the overlay affect decisions in the underlay
which meant I agree with the proposed change.
ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
based on the inner header? Or, if you agree with RFC 2003 and the DF
should be propagated outer to inner, then it seems like the df reset
needs to be moved up before the call to tnl_update_pmtu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-27 14:51 ` David Ahern
@ 2020-10-27 15:55 ` Jakub Kicinski
2020-10-29 2:30 ` wenxu
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-27 15:55 UTC (permalink / raw)
To: David Ahern; +Cc: wenxu, netdev, Stefano Brivio, David Ahern
On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
> > Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> > support for directly bridged IP packets")? Sounds like non-UDP tunnels
> > need the same treatment to make PMTUD work.
> >
> > RFC2003 seems to clearly forbid ignoring the inner DF:
>
> I was looking at this patch Sunday night. To me it seems odd that
> packets flowing through the overlay affect decisions in the underlay
> which meant I agree with the proposed change.
The RFC was probably written before we invented terms like underlay
and overlay, and still considered tunneling to be an inefficient hack ;)
> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
> based on the inner header? Or, if you agree with RFC 2003 and the DF
> should be propagated outer to inner, then it seems like the df reset
> needs to be moved up before the call to tnl_update_pmtu
Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
generated based on the inner headers.
We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
but IMHO the default should not be violating the RFC.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-27 15:55 ` Jakub Kicinski
@ 2020-10-29 2:30 ` wenxu
2020-10-30 1:14 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: wenxu @ 2020-10-29 2:30 UTC (permalink / raw)
To: Jakub Kicinski, David Ahern; +Cc: netdev, Stefano Brivio, David Ahern
On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
>>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
>>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
>>> need the same treatment to make PMTUD work.
>>>
>>> RFC2003 seems to clearly forbid ignoring the inner DF:
>> I was looking at this patch Sunday night. To me it seems odd that
>> packets flowing through the overlay affect decisions in the underlay
>> which meant I agree with the proposed change.
> The RFC was probably written before we invented terms like underlay
> and overlay, and still considered tunneling to be an inefficient hack ;)
>
>> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
>> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
>> based on the inner header? Or, if you agree with RFC 2003 and the DF
>> should be propagated outer to inner, then it seems like the df reset
>> needs to be moved up before the call to tnl_update_pmtu
> Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> generated based on the inner headers.
>
> We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
> but IMHO the default should not be violating the RFC.
If we add TUNNEL_IGNORE_DF to lwt, the two IGNORE_DF and DONT_FRAGMENT
flags should not coexist ? Or DONT_FRAGMENT is prior to the IGNORE_DF?
Also there is inconsistent in the kernel for the tunnel device. For geneve and
vxlan tunnel (don't send tunnel with ip_md_tunnel_xmit) in the lwt mode set
the outer df only based TUNNEL_DONT_FRAGMENT .
And this is also the some behavior for gre device before switching to use
ip_md_tunnel_xmit as the following patch.
962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags
2020-10-29 2:30 ` wenxu
@ 2020-10-30 1:14 ` Jakub Kicinski
0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2020-10-30 1:14 UTC (permalink / raw)
To: wenxu; +Cc: David Ahern, netdev, Stefano Brivio, David Ahern
On Thu, 29 Oct 2020 10:30:50 +0800 wenxu wrote:
> On 10/27/2020 11:55 PM, Jakub Kicinski wrote:
> > On Tue, 27 Oct 2020 08:51:07 -0600 David Ahern wrote:
> >>> Is this another incarnation of 4cb47a8644cc ("tunnels: PMTU discovery
> >>> support for directly bridged IP packets")? Sounds like non-UDP tunnels
> >>> need the same treatment to make PMTUD work.
> >>>
> >>> RFC2003 seems to clearly forbid ignoring the inner DF:
> >> I was looking at this patch Sunday night. To me it seems odd that
> >> packets flowing through the overlay affect decisions in the underlay
> >> which meant I agree with the proposed change.
> > The RFC was probably written before we invented terms like underlay
> > and overlay, and still considered tunneling to be an inefficient hack ;)
> >
> >> ip_md_tunnel_xmit is inconsistent right now. tnl_update_pmtu is called
> >> based on the TUNNEL_DONT_FRAGMENT flag, so why let it be changed later
> >> based on the inner header? Or, if you agree with RFC 2003 and the DF
> >> should be propagated outer to inner, then it seems like the df reset
> >> needs to be moved up before the call to tnl_update_pmtu
> > Looks like TUNNEL_DONT_FRAGMENT is intended to switch between using
> > PMTU inside the tunnel or just the tunnel dev MTU. ICMP PTB is still
> > generated based on the inner headers.
> >
> > We should be okay to add something like IFLA_GRE_IGNORE_DF to lwt,
> > but IMHO the default should not be violating the RFC.
>
> If we add TUNNEL_IGNORE_DF to lwt, the two IGNORE_DF and DONT_FRAGMENT
>
> flags should not coexist ? Or DONT_FRAGMENT is prior to the IGNORE_DF?
>
>
> Also there is inconsistent in the kernel for the tunnel device. For geneve and
>
> vxlan tunnel (don't send tunnel with ip_md_tunnel_xmit) in the lwt mode set
>
> the outer df only based TUNNEL_DONT_FRAGMENT .
>
> And this is also the some behavior for gre device before switching to use
> ip_md_tunnel_xmit as the following patch.
>
> 962924f ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit
Ah, that's a lot more convincing, I was looking at the Fixes tag you
provided, but it seems like Fixes should really point at the commit you
mention here.
Please mention the change in GRE behavior and the discrepancy between
handling of DF by different tunnels in the commit message and repost.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-10-30 1:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-21 9:21 [PATCH net] ip_tunnel: fix over-mtu packet send fail without TUNNEL_DONT_FRAGMENT flags wenxu
2020-10-23 21:12 ` Jakub Kicinski
2020-10-26 8:23 ` wenxu
2020-10-26 20:56 ` Jakub Kicinski
2020-10-27 14:51 ` David Ahern
2020-10-27 15:55 ` Jakub Kicinski
2020-10-29 2:30 ` wenxu
2020-10-30 1:14 ` Jakub Kicinski
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.