All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
@ 2021-06-25 16:21 Vadim Fedorenko
  2021-06-26 17:18 ` David Ahern
  2021-06-28 20:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 6+ messages in thread
From: Vadim Fedorenko @ 2021-06-25 16:21 UTC (permalink / raw)
  To: David Ahern, netdev, Roopa Prabhu
  Cc: Jakub Kicinski, Hideaki YOSHIFUJI, David S. Miller, Vadim Fedorenko

Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
fragmentation logic away from lwtunnel by carry encap headroom and
use it in output MTU calculation. But the forwarding part was not
covered and created difference in MTU for output and forwarding and
further to silent drops on ipv4 forwarding path. Fix it by taking
into account lwtunnel encap headroom.

The same commit also introduced difference in how to treat RTAX_MTU
in IPv4 and IPv6 where latter explicitly removes lwtunnel encap
headroom from route MTU. Make IPv4 version do the same.

Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
 include/net/ip.h        | 12 ++++++++----
 include/net/ip6_route.h | 16 ++++++++++++----
 net/ipv4/route.c        |  3 ++-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index e20874059f82..d9683bef8684 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -31,6 +31,7 @@
 #include <net/flow.h>
 #include <net/flow_dissector.h>
 #include <net/netns/hash.h>
+#include <net/lwtunnel.h>
 
 #define IPV4_MAX_PMTU		65535U		/* RFC 2675, Section 5.1 */
 #define IPV4_MIN_MTU		68			/* RFC 791 */
@@ -445,22 +446,25 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 
 	/* 'forwarding = true' case should always honour route mtu */
 	mtu = dst_metric_raw(dst, RTAX_MTU);
-	if (mtu)
-		return mtu;
+	if (!mtu)
+		mtu = min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
 
-	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
+	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
 					  const struct sk_buff *skb)
 {
+	unsigned int mtu;
+
 	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
 		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
 
 		return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
 	}
 
-	return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
+	mtu = min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
+	return mtu - lwtunnel_headroom(skb_dst(skb)->lwtstate, mtu);
 }
 
 struct dst_metrics *ip_fib_metrics_init(struct net *net, struct nlattr *fc_mx,
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index f51a118bfce8..f14149df5a65 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -265,11 +265,18 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 
 static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
 {
+	int mtu;
+
 	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
 				inet6_sk(skb->sk) : NULL;
 
-	return (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) ?
-	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
+	if (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) {
+		mtu = READ_ONCE(skb_dst(skb)->dev->mtu);
+		mtu -= lwtunnel_headroom(skb_dst(skb)->lwtstate, mtu);
+	} else
+		mtu = dst_mtu(skb_dst(skb));
+
+	return mtu;
 }
 
 static inline bool ip6_sk_accept_pmtu(const struct sock *sk)
@@ -317,7 +324,7 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 	if (dst_metric_locked(dst, RTAX_MTU)) {
 		mtu = dst_metric_raw(dst, RTAX_MTU);
 		if (mtu)
-			return mtu;
+			goto out;
 	}
 
 	mtu = IPV6_MIN_MTU;
@@ -327,7 +334,8 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 		mtu = idev->cnf.mtu6;
 	rcu_read_unlock();
 
-	return mtu;
+out:
+	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
 }
 
 u32 ip6_mtu_from_fib6(const struct fib6_result *res,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6a36ac98476f..78d1e5afc452 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1306,7 +1306,7 @@ INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst)
 		mtu = dst_metric_raw(dst, RTAX_MTU);
 
 	if (mtu)
-		return mtu;
+		goto out;
 
 	mtu = READ_ONCE(dst->dev->mtu);
 
@@ -1315,6 +1315,7 @@ INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst)
 			mtu = 576;
 	}
 
+out:
 	mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
 
 	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
  2021-06-25 16:21 [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading Vadim Fedorenko
@ 2021-06-26 17:18 ` David Ahern
  2021-06-26 21:41   ` Vadim Fedorenko
  2021-06-26 23:21   ` Vadim Fedorenko
  2021-06-28 20:10 ` patchwork-bot+netdevbpf
  1 sibling, 2 replies; 6+ messages in thread
From: David Ahern @ 2021-06-26 17:18 UTC (permalink / raw)
  To: Vadim Fedorenko, David Ahern, netdev, Roopa Prabhu
  Cc: Jakub Kicinski, Hideaki YOSHIFUJI, David S. Miller

[ added Roopa's nvidia address ]

On 6/25/21 10:21 AM, Vadim Fedorenko wrote:
> Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
> fragmentation logic away from lwtunnel by carry encap headroom and
> use it in output MTU calculation. But the forwarding part was not
> covered and created difference in MTU for output and forwarding and
> further to silent drops on ipv4 forwarding path. Fix it by taking
> into account lwtunnel encap headroom.
> 
> The same commit also introduced difference in how to treat RTAX_MTU
> in IPv4 and IPv6 where latter explicitly removes lwtunnel encap
> headroom from route MTU. Make IPv4 version do the same.
> 
> Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
> Suggested-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
> ---
>  include/net/ip.h        | 12 ++++++++----
>  include/net/ip6_route.h | 16 ++++++++++++----
>  net/ipv4/route.c        |  3 ++-
>  3 files changed, 22 insertions(+), 9 deletions(-)
> 


I think this is the right approach - tunnel overhead should always be
considered for the mtu. Did you run the pmtu.sh selftests to make sure
those still work?

Reviewed-by: David Ahern <dsahern@kernel.org>


> diff --git a/include/net/ip.h b/include/net/ip.h
> index e20874059f82..d9683bef8684 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -31,6 +31,7 @@
>  #include <net/flow.h>
>  #include <net/flow_dissector.h>
>  #include <net/netns/hash.h>
> +#include <net/lwtunnel.h>
>  
>  #define IPV4_MAX_PMTU		65535U		/* RFC 2675, Section 5.1 */
>  #define IPV4_MIN_MTU		68			/* RFC 791 */
> @@ -445,22 +446,25 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>  
>  	/* 'forwarding = true' case should always honour route mtu */
>  	mtu = dst_metric_raw(dst, RTAX_MTU);
> -	if (mtu)
> -		return mtu;
> +	if (!mtu)
> +		mtu = min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
>  
> -	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
> +	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
>  }
>  
>  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>  					  const struct sk_buff *skb)
>  {
> +	unsigned int mtu;
> +
>  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>  
>  		return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
>  	}
>  
> -	return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
> +	mtu = min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
> +	return mtu - lwtunnel_headroom(skb_dst(skb)->lwtstate, mtu);
>  }
>  
>  struct dst_metrics *ip_fib_metrics_init(struct net *net, struct nlattr *fc_mx,
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index f51a118bfce8..f14149df5a65 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -265,11 +265,18 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>  
>  static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
>  {
> +	int mtu;
> +
>  	struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
>  				inet6_sk(skb->sk) : NULL;
>  
> -	return (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) ?
> -	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
> +	if (np && np->pmtudisc >= IPV6_PMTUDISC_PROBE) {
> +		mtu = READ_ONCE(skb_dst(skb)->dev->mtu);
> +		mtu -= lwtunnel_headroom(skb_dst(skb)->lwtstate, mtu);
> +	} else
> +		mtu = dst_mtu(skb_dst(skb));
> +
> +	return mtu;
>  }
>  
>  static inline bool ip6_sk_accept_pmtu(const struct sock *sk)
> @@ -317,7 +324,7 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
>  	if (dst_metric_locked(dst, RTAX_MTU)) {
>  		mtu = dst_metric_raw(dst, RTAX_MTU);
>  		if (mtu)
> -			return mtu;
> +			goto out;
>  	}
>  
>  	mtu = IPV6_MIN_MTU;
> @@ -327,7 +334,8 @@ static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
>  		mtu = idev->cnf.mtu6;
>  	rcu_read_unlock();
>  
> -	return mtu;
> +out:
> +	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
>  }
>  
>  u32 ip6_mtu_from_fib6(const struct fib6_result *res,
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6a36ac98476f..78d1e5afc452 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1306,7 +1306,7 @@ INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst)
>  		mtu = dst_metric_raw(dst, RTAX_MTU);
>  
>  	if (mtu)
> -		return mtu;
> +		goto out;
>  
>  	mtu = READ_ONCE(dst->dev->mtu);
>  
> @@ -1315,6 +1315,7 @@ INDIRECT_CALLABLE_SCOPE unsigned int ipv4_mtu(const struct dst_entry *dst)
>  			mtu = 576;
>  	}
>  
> +out:
>  	mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
>  
>  	return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
  2021-06-26 17:18 ` David Ahern
@ 2021-06-26 21:41   ` Vadim Fedorenko
  2021-06-28 18:45     ` David Ahern
  2021-06-26 23:21   ` Vadim Fedorenko
  1 sibling, 1 reply; 6+ messages in thread
From: Vadim Fedorenko @ 2021-06-26 21:41 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev, Roopa Prabhu
  Cc: Jakub Kicinski, Hideaki YOSHIFUJI, David S. Miller

On 26.06.2021 18:18, David Ahern wrote:
> [ added Roopa's nvidia address ]
> 
> On 6/25/21 10:21 AM, Vadim Fedorenko wrote:
>> Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
>> fragmentation logic away from lwtunnel by carry encap headroom and
>> use it in output MTU calculation. But the forwarding part was not
>> covered and created difference in MTU for output and forwarding and
>> further to silent drops on ipv4 forwarding path. Fix it by taking
>> into account lwtunnel encap headroom.
>>
>> The same commit also introduced difference in how to treat RTAX_MTU
>> in IPv4 and IPv6 where latter explicitly removes lwtunnel encap
>> headroom from route MTU. Make IPv4 version do the same.
>>
>> Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
>> Suggested-by: David Ahern <dsahern@gmail.com>
>> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
>> ---
>>   include/net/ip.h        | 12 ++++++++----
>>   include/net/ip6_route.h | 16 ++++++++++++----
>>   net/ipv4/route.c        |  3 ++-
>>   3 files changed, 22 insertions(+), 9 deletions(-)
>>
> 
> 
> I think this is the right approach - tunnel overhead should always be
> considered for the mtu. Did you run the pmtu.sh selftests to make sure
> those still work?
> 

Actually not, I was running my own tests of routing configurations with 
different types of tunnels like GRE, GUE and FOU with mpls lwtunnels to check 
consistency of calculated mtus.

Will re-run pmtu.sh but I my installation doesn't support OVS right now.

Also, I was thinking about this RTAX_MTU and I'm really in doubt. Do we actually 
want the situation when
   ip route A.B.C.D/32 encap mpls 100 dev ip6tnl1 mtu 1400
will actually require mtu=1396? Because this looks like not clear for users I 
suppose.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
  2021-06-26 17:18 ` David Ahern
  2021-06-26 21:41   ` Vadim Fedorenko
@ 2021-06-26 23:21   ` Vadim Fedorenko
  1 sibling, 0 replies; 6+ messages in thread
From: Vadim Fedorenko @ 2021-06-26 23:21 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev, Roopa Prabhu
  Cc: Jakub Kicinski, Hideaki YOSHIFUJI, David S. Miller

On 26.06.2021 18:18, David Ahern wrote:
> [ added Roopa's nvidia address ]
> 
> On 6/25/21 10:21 AM, Vadim Fedorenko wrote:
>> Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
>> fragmentation logic away from lwtunnel by carry encap headroom and
>> use it in output MTU calculation. But the forwarding part was not
>> covered and created difference in MTU for output and forwarding and
>> further to silent drops on ipv4 forwarding path. Fix it by taking
>> into account lwtunnel encap headroom.
>>
>> The same commit also introduced difference in how to treat RTAX_MTU
>> in IPv4 and IPv6 where latter explicitly removes lwtunnel encap
>> headroom from route MTU. Make IPv4 version do the same.
>>
>> Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
>> Suggested-by: David Ahern <dsahern@gmail.com>
>> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
>> ---
>>   include/net/ip.h        | 12 ++++++++----
>>   include/net/ip6_route.h | 16 ++++++++++++----
>>   net/ipv4/route.c        |  3 ++-
>>   3 files changed, 22 insertions(+), 9 deletions(-)
>>
> 
> 
> I think this is the right approach - tunnel overhead should always be
> considered for the mtu. Did you run the pmtu.sh selftests to make sure
> those still work?
> 
pmtu.sh tests are ok for everything except OVS which are skipped on my test system.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
  2021-06-26 21:41   ` Vadim Fedorenko
@ 2021-06-28 18:45     ` David Ahern
  0 siblings, 0 replies; 6+ messages in thread
From: David Ahern @ 2021-06-28 18:45 UTC (permalink / raw)
  To: Vadim Fedorenko, David Ahern, netdev, Roopa Prabhu
  Cc: Jakub Kicinski, Hideaki YOSHIFUJI, David S. Miller

On 6/26/21 3:41 PM, Vadim Fedorenko wrote:
>>
>> I think this is the right approach - tunnel overhead should always be
>> considered for the mtu. Did you run the pmtu.sh selftests to make sure
>> those still work?
>>
> 
> Actually not, I was running my own tests of routing configurations with
> different types of tunnels like GRE, GUE and FOU with mpls lwtunnels to
> check consistency of calculated mtus.
> 
> Will re-run pmtu.sh but I my installation doesn't support OVS right now.
> 
> Also, I was thinking about this RTAX_MTU and I'm really in doubt. Do we
> actually want the situation when
>   ip route A.B.C.D/32 encap mpls 100 dev ip6tnl1 mtu 1400
> will actually require mtu=1396? Because this looks like not clear for
> users I suppose.

It is simpler and cleaner to me for the stack to always subtract known
tunnel overhead from the MTU and not expect users to do the math which
is why I responded as this is the right approach.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading
  2021-06-25 16:21 [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading Vadim Fedorenko
  2021-06-26 17:18 ` David Ahern
@ 2021-06-28 20:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-06-28 20:10 UTC (permalink / raw)
  To: Vadim Fedorenko; +Cc: dsahern, netdev, roopa, kuba, yoshfuji, davem

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Fri, 25 Jun 2021 19:21:39 +0300 you wrote:
> Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved
> fragmentation logic away from lwtunnel by carry encap headroom and
> use it in output MTU calculation. But the forwarding part was not
> covered and created difference in MTU for output and forwarding and
> further to silent drops on ipv4 forwarding path. Fix it by taking
> into account lwtunnel encap headroom.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: lwtunnel: handle MTU calculation in forwading
    https://git.kernel.org/netdev/net/c/fade56410c22

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-28 20:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-25 16:21 [PATCH net v2] net: lwtunnel: handle MTU calculation in forwading Vadim Fedorenko
2021-06-26 17:18 ` David Ahern
2021-06-26 21:41   ` Vadim Fedorenko
2021-06-28 18:45     ` David Ahern
2021-06-26 23:21   ` Vadim Fedorenko
2021-06-28 20:10 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.