From: Florian Westphal <fw@strlen.de>
To: David Ahern <dsahern@gmail.com>
Cc: Florian Westphal <fw@strlen.de>,
	Stefano Brivio <sbrivio@redhat.com>,
	netdev@vger.kernel.org, aconole@redhat.com
Subject: Re: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets
Date: Mon, 13 Jul 2020 16:59:11 +0200
Message-ID: <20200713145911.GN32005@breakpoint.cc>
In-Reply-To: <a6821eac-82f8-0d9e-6388-ea6c9f5535d1@gmail.com>

David Ahern <dsahern@gmail.com> wrote:
> On 7/13/20 8:02 AM, Florian Westphal wrote:
> > David Ahern <dsahern@gmail.com> wrote:
> >> On 7/13/20 2:04 AM, Florian Westphal wrote:
> >>>> As PMTU discovery happens, we have a route exception on the lower
> >>>> layer for the given path, and we know that VXLAN will use that path,
> >>>> so we also know there's no point in having a higher MTU on the VXLAN
> >>>> device, it's really the maximum packet size we can use.
> >>> No, in the setup that prompted this series the route exception is wrong.
> >>
> >> Why is the exception wrong and why can't the exception code be fixed to
> >> include tunnel headers?
> > 
> > I don't know.  This occurs in a 3rd party (read: "cloud") environment.
> > After some days, tcp connections on the overlay network hang.
> > 
> > Flushing the route exception in the namespace of the vxlan interface makes
> > the traffic flow again, i.e. if the vxlan tunnel would just use the
> > physical device's MTU things would be fine.
> > 
> > I don't know what you mean by 'fix exception code to include tunnel
> > headers'.  Can you elaborate?
> 
> lwtunnel has lwtunnel_headroom which allows ipv4_mtu to accommodate the
> space needed for the encap header. Can something similar be adapted for
> the device based tunnels?

I don't see how it would help for this particular problem.

> > AFAICS everything functions as designed, except:
> > 1. The route exception should not exist in first place in this case
> > 2. The route exception never times out (gets refreshed every time
> >    tunnel tries to send a mtu-sized packet).
> > 3. The original sender never learns about the pmtu event
> 
> meaning the VM / container? ie., this is a VPC using VxLAN in the host
> to send packets to another hypervisor. If that is the case why isn't the
> underlay MTU bumped to handle the encap header, or the VMs MTU lowered
> to handle the encap header? seems like a config problem.

It's configured properly:

ovs bridge mtu: 1450
vxlan device mtu: 1450
physical link: 1500

So packets coming in on the bridge (local tx or from a remote bridge port)
can have the encap header (50 bytes) prepended without exceeding the
physical link MTU.

When the vxlan driver calls the ip output path, this line:

        mtu = ip_skb_dst_mtu(sk, skb);

in __ip_finish_output() will fetch the MTU based on the encap socket,
which will now be 1450 due to that route exception.

So this will behave as if someone had lowered the physical link mtu to 1450:
IP stack drops the packet and sends an icmp error (fragmentation needed,
MTU 1450).  The MTU of the VXLAN port is already at 1450.
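
To put numbers on this, here is a minimal sketch of the size check. It is
not kernel code: the helper names are made up for illustration, and the
breakdown of the 50 bytes in the comment assumes VXLAN over IPv4.

```c
#include <stdbool.h>

/* Values from the setup described above. */
enum {
	LINK_MTU       = 1500, /* physical link */
	VXLAN_MTU      = 1450, /* vxlan device and ovs bridge */
	/* Assumed IPv4 breakdown: 20 (IP) + 8 (UDP) + 8 (VXLAN) + 14 (inner Ethernet). */
	ENCAP_OVERHEAD = 50,
};

/* Hypothetical helper: size of the outer IP packet after encapsulation. */
static int encap_len(int inner_len)
{
	return inner_len + ENCAP_OVERHEAD;
}

/* Hypothetical helper: does the encapsulated packet fit the MTU seen by the output path? */
static bool fits(int inner_len, int path_mtu)
{
	return encap_len(inner_len) <= path_mtu;
}
```

fits(VXLAN_MTU, LINK_MTU) is true: 1450 + 50 = 1500.  With the route
exception in place the check becomes fits(VXLAN_MTU, 1450), which is false,
and the resulting ICMP error announces an MTU the sender is already using.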

I could make a patch that automatically lowers the vxlan port MTU to
1450 - 50 (encap overhead), but I don't think making such a change
automatically is a good idea.

With this proposed patch, the MTU retrieved would always be the link
MTU.
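
As a rough model of that difference (a sketch only; the struct, its fields
and the function name are hypothetical and do not reflect how the series
implements the option):

```c
#include <stdbool.h>

/* Hypothetical model, not the kernel's data structures. */
struct encap_path {
	int link_mtu;      /* MTU of the physical device */
	int exception_mtu; /* PMTU from a route exception, 0 if none */
	bool ignore_pmtu;  /* models the option proposed in this series */
};

/* MTU the output path would use for packets sent via the encap socket. */
static int effective_mtu(const struct encap_path *p)
{
	/* With the option set, or without an exception, use the link MTU. */
	if (p->ignore_pmtu || p->exception_mtu == 0)
		return p->link_mtu;

	/* Otherwise a route exception can only lower the MTU. */
	return p->exception_mtu < p->link_mtu ? p->exception_mtu : p->link_mtu;
}
```

For link_mtu = 1500 and exception_mtu = 1450 this yields 1450 with the
option off and 1500 with it on.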

I don't think this patch is enough to resolve PMTU in general, of course;
after all, the VXLAN peer might be unable to receive packets larger than
what the ICMP error announces.  But I do not know how to resolve this
in the general case, as everyone has a different opinion on how (and where)
this needs to be handled.
