netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: <netdev@vger.kernel.org>, aconole@redhat.com
Subject: Re: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets
Date: Mon, 13 Jul 2020 00:38:13 +0200	[thread overview]
Message-ID: <20200713003813.01f2d5d3@elisabeth> (raw)
In-Reply-To: <20200712200705.9796-2-fw@strlen.de>

Hi,

On Sun, 12 Jul 2020 22:07:03 +0200
Florian Westphal <fw@strlen.de> wrote:

> vxlan and geneve take the to-be-transmitted skb, prepend the
> encapsulation header and send the result.
> 
> Neither vxlan nor geneve can do anything about a lowered path mtu
> except notifying the peer/upper dst entry.

It could, and I think it should, update its MTU, though. I didn't
include this in the original implementation of PMTU discovery for UDP
tunnels as it worked just fine for locally generated and routed
traffic, but here we go.

As PMTU discovery happens, we have a route exception on the lower
layer for the given path, and we know that VXLAN will use that path,
so we also know there's no point in having a higher MTU on the VXLAN
device, it's really the maximum packet size we can use.

See the change to vxlan_xmit_one() in the sketch patch below.

> In routed setups, vxlan takes the updated pmtu from the encap sockets'
> dst entry and will notify/update the dst entry of the current skb.
> 
> Some setups, however, will use vxlan as a bridge port (or openvs vport).

And, on top of that, I think what we're missing on the bridge is to
update the MTU when a port lowers its MTU. The MTU is changed only as
interfaces are added, which feels like a bug. We could use the lower
layer notifier to fix this.

In the sketch below, I'm changing a few lines to adjust the MTU to the
lowest MTU value between all ports, for testing purposes.

I tried to represent the issue you're hitting with a new test case in
the pmtu.sh selftest, also included in the diff. Would that work for
Open vSwitch?

If OVS queries the MTU of VXLAN devices, I guess that should be enough.
I'm not sure it does that though. The changes to the bridge wouldn't
even be needed, but I think it's something we should also fix
eventually.

---
 drivers/net/vxlan.c                 |   13 +++++++++++++
 net/bridge/br_if.c                  |    8 +++++---
 net/bridge/br_input.c               |    6 ++++++
 tools/testing/selftests/net/pmtu.sh |   17 ++++++++++++++++-
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 5bb448ae6c9c..2e051b7366bf 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2580,6 +2580,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
 	return 0;
 }
 
+static int vxlan_change_mtu(struct net_device *dev, int new_mtu);
 static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 			   __be32 default_vni, struct vxlan_rdst *rdst,
 			   bool did_rsc)
@@ -2714,6 +2715,18 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		ndst = &rt->dst;
 		skb_tunnel_check_pmtu(skb, ndst, VXLAN_HEADROOM);
 
+		/* TODO: doesn't conflict with RFC 7348, RFC 1191, but not ideal
+		 * as we can't track PMTU increases:
+		 * - use a notifier on route cache and add a configuration field
+		 *   to track user changes
+		 * - embed logic from skb_tunnel_check_pmtu() and get this fixed
+		 *   for free for all the tunnels
+		 */
+		if (skb->len > dst_mtu(ndst) - VXLAN_HEADROOM) {
+			vxlan_change_mtu(vxlan->dev,
+					 dst_mtu(ndst) - VXLAN_HEADROOM);
+		}
+
 		tos = ip_tunnel_ecn_encap(RT_TOS(tos), old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
 		err = vxlan_build_skb(skb, ndst, sizeof(struct iphdr),
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index a0e9a7937412..6253b6d40d43 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -505,9 +505,11 @@ void br_mtu_auto_adjust(struct net_bridge *br)
 {
 	ASSERT_RTNL();
 
-	/* if the bridge MTU was manually configured don't mess with it */
-	if (br_opt_get(br, BROPT_MTU_SET_BY_USER))
-		return;
+	/* TODO: if (br_opt_get(br, BROPT_MTU_SET_BY_USER)), we should not
+	 * increase the MTU, but skipping decreases breaks functionality:
+	 * - add an 'opt' to track the set value and allow the user to decrease
+	 *   the MTU arbitrarily
+	 */
 
 	/* change to the minimum MTU and clear the flag which was set by
 	 * the bridge ndo_change_mtu callback
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 59a318b9f646..2429f70ce4ee 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -283,6 +283,12 @@ static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 			goto drop;
 	}
 
+	/* TODO: use lower layer notifier instead. Some tunnels implement this
+	 * properly (see e.g. vti6 and pmtu_vti6_link_change_mtu selftest in
+	 * net/pmtu.sh)
+	 */
+	br_mtu_auto_adjust(p->br);
+
 	if (unlikely(is_link_local_ether_addr(dest))) {
 		u16 fwd_mask = p->br->group_fwd_mask_required;
 
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 77c09cd339c3..09731d9ea11a 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -169,7 +169,8 @@ tests="
 	cleanup_ipv4_exception		ipv4: cleanup of cached exceptions	1
 	cleanup_ipv6_exception		ipv6: cleanup of cached exceptions	1
 	list_flush_ipv4_exception	ipv4: list and flush cached exceptions	1
-	list_flush_ipv6_exception	ipv6: list and flush cached exceptions	1"
+	list_flush_ipv6_exception	ipv6: list and flush cached exceptions	1
+	pmtu_ipv4_vxlan4_exception_bridge	IPv4 over vxlan4 with bridge		1"
 
 NS_A="ns-A"
 NS_B="ns-B"
@@ -864,6 +865,20 @@ test_pmtu_ipv4_vxlan4_exception() {
 	test_pmtu_ipvX_over_vxlanY_or_geneveY_exception vxlan  4 4
 }
 
+test_pmtu_ipv4_vxlan4_exception_bridge() {
+	test_pmtu_ipvX_over_vxlanY_or_geneveY_exception vxlan  4 4
+
+	ip -n ns-A link add br0 type bridge
+	ip -n ns-A link set br0 up
+	ip -n ns-A link set dev br0 mtu 5000
+	ip -n ns-A link set vxlan_a master br0
+
+	ip -n ns-A addr del 192.168.2.1/24 dev vxlan_a
+	ip -n ns-A addr add 192.168.2.1/24 dev br0
+
+	ping -c 1 -w 2 -M want -s 5000 192.168.2.2
+}
+
 test_pmtu_ipv6_vxlan4_exception() {
 	test_pmtu_ipvX_over_vxlanY_or_geneveY_exception vxlan  6 4
 }


-- 
Stefano


  reply	other threads:[~2020-07-12 22:38 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-12 20:07 [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets Florian Westphal
2020-07-12 22:38   ` Stefano Brivio [this message]
2020-07-13  8:04     ` Florian Westphal
2020-07-13 10:04       ` Stefano Brivio
2020-07-13 10:51         ` Numan Siddique
2020-07-14 20:38           ` Aaron Conole
2020-07-15 11:58             ` Stefano Brivio
2020-07-13 13:25       ` David Ahern
2020-07-13 14:02         ` Florian Westphal
2020-07-13 14:41           ` David Ahern
2020-07-13 14:59             ` Florian Westphal
2020-07-13 15:57               ` Stefano Brivio
2020-07-13 16:22                 ` Florian Westphal
2020-07-14 12:33                   ` Stefano Brivio
2020-07-14 12:33           ` Stefano Brivio
2020-07-15 12:42             ` Florian Westphal
2020-07-15 13:35               ` Stefano Brivio
2020-07-15 14:33                 ` Florian Westphal
2020-07-17 12:27                   ` Stefano Brivio
2020-07-17 15:04                     ` David Ahern
2020-07-17 18:43                       ` Florian Westphal
2020-07-18  6:56                       ` Stefano Brivio
2020-07-18 17:02                         ` David Ahern
2020-07-18 17:58                           ` Stefano Brivio
2020-07-18 18:04                             ` Stefano Brivio
2020-07-19 18:43                             ` David Ahern
2020-07-19 21:49                               ` Stefano Brivio
2020-07-20  3:19                                 ` David Ahern
2020-07-26 17:01                                   ` Stefano Brivio
2020-07-12 20:07 ` [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket Florian Westphal
2020-07-16 19:33   ` Jakub Kicinski
2020-07-17 10:13     ` Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 3/3] geneve: allow disabling of pmtu detection on encap sk Florian Westphal
2020-07-12 22:39 ` [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200713003813.01f2d5d3@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=aconole@redhat.com \
    --cc=fw@strlen.de \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).