[PATCH 1/2] net/ipv6: always honour route mtu during forwarding

From: "Maciej Żenczykowski" <zenczykowski@gmail.com>
To: "Maciej Żenczykowski" <maze@google.com>,
	"David S . Miller" <davem@davemloft.net>
Cc: Linux Network Development Mailing List <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Willem de Bruijn <willemb@google.com>,
	Lorenzo Colitti <lorenzo@google.com>,
	Sunmeet Gill <sgill@quicinc.com>,
	Vinay Paradkar <vparadka@qti.qualcomm.com>,
	Tyler Wear <twear@quicinc.com>, David Ahern <dsahern@kernel.org>
Subject: [PATCH 1/2] net/ipv6: always honour route mtu during forwarding
Date: Wed,  7 Oct 2020 20:31:01 -0700
Message-ID: <20201008033102.623894-1-zenczykowski@gmail.com>

From: Maciej Żenczykowski <maze@google.com>

This matches the new ipv4 behaviour as of commit:
  commit 02a1b175b0e92d9e0fa5df3957ade8d733ceb6a0
  Author: Maciej Żenczykowski <maze@google.com>
  Date:   Wed Sep 23 13:18:15 2020 -0700

  net/ipv4: always honour route mtu during forwarding

The reasoning is similar: There doesn't seem to be any reason
why you would want to ignore route mtu.

There are two potential sources of ipv6 route mtu:
  - manually configured by NET_ADMIN, since you configured
    a route mtu explicitly you probably know best...
  - derived from mtu information from RA messages,
    but this is the network telling you what will work,
    again presumably whatever network admin configured
    the RA content knows best what the network conditions are.

One could argue that RAs can be spoofed, but if we get spoofed
RAs we're *already* screwed, and erroneous mtu information is
less dangerous than the erroneous routes themselves...
(The proper place to do RA filtering is in the switch/router)

Additionally, a reduction from 1500 to 1280 (min ipv6 mtu) is
not very noticeable on performance (especially with gro/gso/tso),
while packets getting lost (due to rx buffer overruns) or
generating icmpv6 packet too big errors and needing to be
retransmitted is very noticeable (guaranteed impact of full rtt).

It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.

There might also be use cases around xfrm/ipsec/tunnels,
especially for something like sit/6to4/6rd, where you may have one
sit device, but traffic through it will flow over different
underlying paths, and thus the usable mtu is per subnet and not
per device.

(Note that this function does not honour pmtu, which can be spoofed
via icmpv6 messages, but see also ip6_mtu_from_fib6() which honours
pmtu for ipv6 'locked mtu' routes)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
Cc: Tyler Wear <twear@quicinc.com>
Cc: David Ahern <dsahern@kernel.org>
---
 include/net/ip6_route.h | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)
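
Not part of the patch: a small userspace sketch of the decision logic
before and after this change, for the jumbo-frame case described above.
The names model_dst, model_mtu_forward_old(), model_mtu_forward_new()
and MODEL_IPV6_MIN_MTU are hypothetical stand-ins, not kernel symbols,
and the struct only approximates what dst_metric_raw(),
dst_metric_locked() and idev->cnf.mtu6 would report.

```c
#include <stdbool.h>

/* Stand-in for IPV6_MIN_MTU (1280 bytes); name is hypothetical. */
#define MODEL_IPV6_MIN_MTU 1280u

/* Hypothetical flattened view of the inputs the helper looks at. */
struct model_dst {
	unsigned int route_mtu;	/* raw RTAX_MTU metric, 0 if unset */
	bool mtu_locked;	/* whether the RTAX_MTU metric is locked */
	bool has_idev;		/* whether the device has an inet6_dev */
	unsigned int dev_mtu6;	/* device ipv6 mtu, valid if has_idev */
};

/* Old behaviour: the route mtu is used only when it is locked. */
static unsigned int model_mtu_forward_old(const struct model_dst *dst)
{
	if (dst->mtu_locked && dst->route_mtu)
		return dst->route_mtu;

	return dst->has_idev ? dst->dev_mtu6 : MODEL_IPV6_MIN_MTU;
}

/* New behaviour: any non-zero route mtu wins, locked or not. */
static unsigned int model_mtu_forward_new(const struct model_dst *dst)
{
	if (dst->route_mtu)
		return dst->route_mtu;

	return dst->has_idev ? dst->dev_mtu6 : MODEL_IPV6_MIN_MTU;
}
```

With a 9000 byte device mtu and an unlocked 1400 byte route mtu, the
old model returns the device mtu while the new one returns the route
mtu. The two agree whenever the route mtu is locked or unset.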

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 2a5277758379..598415743f46 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -311,19 +311,13 @@ static inline bool rt6_duplicate_nexthop(struct fib6_info *a, struct fib6_info *
 static inline unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst)
 {
 	struct inet6_dev *idev;
-	unsigned int mtu;
+	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
+	if (mtu)
+		return mtu;
 
-	if (dst_metric_locked(dst, RTAX_MTU)) {
-		mtu = dst_metric_raw(dst, RTAX_MTU);
-		if (mtu)
-			return mtu;
-	}
-
-	mtu = IPV6_MIN_MTU;
 	rcu_read_lock();
 	idev = __in6_dev_get(dst->dev);
-	if (idev)
-		mtu = idev->cnf.mtu6;
+	mtu = idev ? idev->cnf.mtu6 : IPV6_MIN_MTU;
 	rcu_read_unlock();
 
 	return mtu;
-- 
2.28.0.806.g8561365e88-goog