From: Florian Westphal <fw@strlen.de>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Florian Westphal <fw@strlen.de>, David Ahern <dsahern@gmail.com>,
	netdev@vger.kernel.org, aconole@redhat.com
Subject: Re: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets
Date: Mon, 13 Jul 2020 18:22:55 +0200
Message-ID: <20200713162255.GO32005@breakpoint.cc>
In-Reply-To: <20200713175709.2a547d7c@redhat.com>

Stefano Brivio <sbrivio@redhat.com> wrote:
> > so, packets coming in on the bridge (local tx or from remote bridge port)
> > can have the encap header (50 bytes) prepended without exceeding the
> > physical link mtu.
> > 
> > When the vxlan driver calls the ip output path, this line:
> > 
> >         mtu = ip_skb_dst_mtu(sk, skb);
> > 
> > in __ip_finish_output() will fetch the MTU based on the encap socket,
> > which will now be 1450 due to that route exception.
> > 
> > So this will behave as if someone had lowered the physical link mtu to 1450:
> > the IP stack drops the packet and sends an ICMP error (fragmentation needed,
> > MTU 1450).  The MTU of the VXLAN port is already at 1450.
> 
> It's not clear to me why the behaviour on this path is different from
> routed traffic. I understand the impact of bridged traffic on error
> reporting, but not here.

In routing case:
1. pmtu notification is received
2. route exception is added
3. next MTU-sized packet in vxlan triggers the if () condition in
   skb_tunnel_check_pmtu()
4. skb_dst_update_pmtu() gets called, a new nexthop exception is added
5. the packet is dropped in ip_output (too large)
6. the next MTU-sized packet to be forwarded triggers the PMTU check in
   ip_forward()
7. ip_forward() drops the packet and sends an ICMP error with the new MTU
   (1400 in the example)
8. the sender receives the error and updates its path MTU
9. the next packet will be small enough

In the bridge case, step 4 is a no-op, and even if we had dst entries
here, bridged traffic never enters the ip_forward() path.
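To make the difference between the two cases concrete, here is a minimal
userspace sketch in C. It is a toy model under stated assumptions, not
kernel code: struct tunnel_state, xmit() and enum outcome are hypothetical
names, the 50-byte overhead and the 1450 route-exception MTU come from the
example above, and the real checks live in skb_tunnel_check_pmtu(),
ip_forward() and the IP output path.

```c
#include <stdbool.h>

/* Hypothetical toy model; names are illustrative, not kernel APIs. */
#define VXLAN_OVERHEAD 50 /* encap overhead used in the example (bytes) */

enum outcome {
	XMIT_OK,   /* encapsulated packet fits the encap route's MTU */
	XMIT_DROP, /* dropped on IP output; original sender learns nothing */
	FWD_ICMP,  /* dropped by the ip_forward() check; sender gets ICMP */
};

struct tunnel_state {
	int encap_mtu;  /* MTU of the route exception for the remote VTEP */
	int inner_pmtu; /* MTU learned on the inner dst, 0 = none learned */
};

/* Model one inner packet of len bytes; routed == false means bridged. */
static enum outcome xmit(struct tunnel_state *st, int len, bool routed)
{
	/* Steps 6-7: only routed traffic passes the ip_forward() MTU check. */
	if (routed && st->inner_pmtu && len > st->inner_pmtu)
		return FWD_ICMP;

	if (len + VXLAN_OVERHEAD <= st->encap_mtu)
		return XMIT_OK;

	/* Steps 3-4: the tunnel PMTU check updates the inner dst. For
	 * bridged traffic there is no dst, so nothing is recorded. */
	if (routed)
		st->inner_pmtu = st->encap_mtu - VXLAN_OVERHEAD;

	/* Step 5: the oversized encapsulated packet is dropped. */
	return XMIT_DROP;
}
```

Feeding 1450-byte packets into the routed model gives one drop, then a
rejection that would carry 1400 back to the sender, after which 1400-byte
packets go through. The bridged model keeps dropping 1450-byte packets,
because nothing ever records the reduced MTU.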

> Does it have something to do with metadata-based tunnels?

No.

> Should we omit
> the call to skb_tunnel_check_pmtu() in vxlan_xmit_one() in that
> case (if (info)) because the dst is not the same dst?

skb_dst_update_pmtu is already omitted in this scenario since dst is NULL.

> > I don't think this patch is enough to resolve PMTU in general of course,
> > after all the VXLAN peer might be unable to receive packets larger than
> > what the ICMP error announces.  But I do not know how to resolve this
> > in the general case as everyone has a different opinion on how (and where)
> > this needs to be handled.
> 
> The sender here is sending packets matching the MTU, interface MTUs are
> correct, so we wouldn't benefit from "extending" PMTU discovery for
> this specific problem and we can leave that topic aside for now, correct?

Yes and no.  What the hack patches (not this series, but the ICMP error
injection series for the bridge...) do is inject a new ICMP error from
the vxlan ICMP error processing callback that reports an MTU of
'received mtu - vxlan_overhead' to the sender.

So, the sender receives a PMTU update for 1400 in the given scenario.
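The MTU announced by the injected error is plain subtraction. A minimal C
sketch, assuming an IPv4 underlay without VLAN tags (hence 50 bytes of
overhead); injected_pmtu() is a hypothetical name, not a function from
those patches:

```c
#include <assert.h>

/* Hypothetical helper; only the arithmetic comes from the description. */
#define VXLAN_OVERHEAD 50 /* outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 */

/* MTU to announce to the bridged sender in the injected ICMP error. */
static int injected_pmtu(int received_mtu, int overhead)
{
	assert(received_mtu > overhead); /* a smaller value would be nonsensical */
	return received_mtu - overhead;
}
```

With the numbers from this thread, injected_pmtu(1450, VXLAN_OVERHEAD)
yields 1400, and a 1400-byte inner packet plus 50 bytes of encapsulation
exactly fits the 1450-byte path MTU.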

It's not nice of course, as the sender emitted an MTU-sized packet (1450)
to an on-link destination, only to be told by that *alleged* on-link
destination (address spoofed by the bridge) that it needs to use 1400.

I don't see any better solution, since netdev police failed to make
such setups illegal 8)
