netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Numan Siddique <nusiddiq@redhat.com>
To: Stefano Brivio <sbrivio@redhat.com>
Cc: Florian Westphal <fw@strlen.de>, netdev <netdev@vger.kernel.org>,
	Aaron Conole <aconole@redhat.com>
Subject: Re: [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets
Date: Mon, 13 Jul 2020 16:21:15 +0530	[thread overview]
Message-ID: <CAH=CPzopMgQ=RU2jCSqDxM3ghtTMGZLBiPoh+3k4wXnGEeC+fw@mail.gmail.com> (raw)
In-Reply-To: <20200713120158.665a6677@elisabeth>

On Mon, Jul 13, 2020 at 3:34 PM Stefano Brivio <sbrivio@redhat.com> wrote:
>
> On Mon, 13 Jul 2020 10:04:13 +0200
> Florian Westphal <fw@strlen.de> wrote:
>
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> > > Hi,
> > >
> > > On Sun, 12 Jul 2020 22:07:03 +0200
> > > Florian Westphal <fw@strlen.de> wrote:
> > >
> > > > vxlan and geneve take the to-be-transmitted skb, prepend the
> > > > encapsulation header and send the result.
> > > >
> > > > Neither vxlan nor geneve can do anything about a lowered path mtu
> > > > except notifying the peer/upper dst entry.
> > >
> > > It could, and I think it should, update its MTU, though. I didn't
> > > include this in the original implementation of PMTU discovery for UDP
> > > tunnels as it worked just fine for locally generated and routed
> > > traffic, but here we go.
> >
> > I don't think its a good idea to muck with network config in response
> > to untrusted entity.
>
> I agree that this (changing MTU on VXLAN) looks like a further step,
> but the practical effect is zero: we can't send those packets already
> today.
>
> PMTU discovery has security impacts, and they are mentioned in the
> RFCs. Also here, we wouldn't increase the MTU as a result, and if the
> entity is considered untrusted, considerations from RFC 8201 and RFC
> 4890 cover that.
>
> In practice, we might have broken networks, but at a practical level, I
> guess it's enough to not make the situation any worse.
>
> > > As PMTU discovery happens, we have a route exception on the lower
> > > layer for the given path, and we know that VXLAN will use that path,
> > > so we also know there's no point in having a higher MTU on the VXLAN
> > > device, it's really the maximum packet size we can use.
> >
> > No, in the setup that prompted this series the route exception is wrong.
> > The current "fix" is a shell script that flushes the exception as soon
> > as its added to keep the tunnel working...
>
> Oh, oops.
>
> Well, as I mentioned, if this is breaking setups and this series is the
> only way to fix things, I have nothing against it, we can still work on
> a more comprehensive solution (including the bridge) once we have it.
>
> > > > Some setups, however, will use vxlan as a bridge port (or openvs vport).
> > >
> > > And, on top of that, I think what we're missing on the bridge is to
> > > update the MTU when a port lowers its MTU. The MTU is changed only as
> > > interfaces are added, which feels like a bug. We could use the lower
> > > layer notifier to fix this.
> >
> > I will defer to someone who knows bridges better but I think that
> > in bridge case we 100% depend on a human to set everything.
>
> Not entirely, MTU is auto-adjusted when interfaces are added (unless
> the user set it explicitly), however:
>
> > bridge might be forwarding frames of non-ip protocol and I worry that
> > this is a self-induced DoS when we start to alter configuration behind
> > sysadmins back.
>
> ...yes, I agree that the matter with the bridge is different. And we
> don't know if that fixes anything else than the selftest I showed, so
> let's forget about the bridge for a moment.
>
> > > I tried to represent the issue you're hitting with a new test case in
> > > the pmtu.sh selftest, also included in the diff. Would that work for
> > > Open vSwitch?
> >
> > No idea, I don't understand how it can work at all, we can't 'chop
> > up'/mangle l2 frame in arbitrary fashion to somehow make them pass to
> > the output port.  We also can't influence MTU config of the links peer.
>
> Sorry I didn't expand right away.
>
> In the test case I showed, it works because at that point sending
> packets to the bridge will result in an error, and the (local) sender
> fragments. Let's set this aside together with the bridge affair, though.
>
> Back to VXLAN and OVS: OVS implements a "check_pkt_len" action, cf.
> commit 4d5ec89fc8d1 ("net: openvswitch: Add a new action
> check_pkt_len"), that should be used when packets exceed link MTUs:
>
>   With the help of this action, OVN will check the packet length
>   and if it is greater than the MTU size, it will generate an
>   ICMP packet (type 3, code 4) and includes the next hop mtu in it
>   so that the sender can fragment the packets.
>
> and my understanding is that this can only work if we reflect the
> effective MTU on the device itself (including VXLAN).
>

check_pkt_len is OVS datapath action and the corresponding OVS action
is  check_pkt_larger.

Logically It is expected to use this way in the OVS flows- >
    reg0[0] = check_pkt_larger(1500);
    if reg0[0[ == 1; then take some action.

In the case of OVN, if the register reg0[0] bit is set, then we
generate ICMP error packet (type 3, code 4).

I don't know the requirements or the issue this patch is trying to
address. But I think for OVS, there has to be
a controller (like OVN) which makes use of the check_pkt_larger action
and takes necessary action by adding
appropriate OF flows based on the result of check_pkt_larger.

Thanks
Numan

>
> Side note: I'm not fond of the idea behind that OVS action because I
> think it competes with the kernel (and with ICMP itself, or PLPMTUD if
> ICMP is not an option) to do PMTU discovery.
>
> However, if that already works for OVS (I really don't know. Aaron,
> Numan?), perhaps you could simply consider going with that solution...
>
> --
> Stefano
>


  reply	other threads:[~2020-07-13 10:51 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-12 20:07 [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets Florian Westphal
2020-07-12 22:38   ` Stefano Brivio
2020-07-13  8:04     ` Florian Westphal
2020-07-13 10:04       ` Stefano Brivio
2020-07-13 10:51         ` Numan Siddique [this message]
2020-07-14 20:38           ` Aaron Conole
2020-07-15 11:58             ` Stefano Brivio
2020-07-13 13:25       ` David Ahern
2020-07-13 14:02         ` Florian Westphal
2020-07-13 14:41           ` David Ahern
2020-07-13 14:59             ` Florian Westphal
2020-07-13 15:57               ` Stefano Brivio
2020-07-13 16:22                 ` Florian Westphal
2020-07-14 12:33                   ` Stefano Brivio
2020-07-14 12:33           ` Stefano Brivio
2020-07-15 12:42             ` Florian Westphal
2020-07-15 13:35               ` Stefano Brivio
2020-07-15 14:33                 ` Florian Westphal
2020-07-17 12:27                   ` Stefano Brivio
2020-07-17 15:04                     ` David Ahern
2020-07-17 18:43                       ` Florian Westphal
2020-07-18  6:56                       ` Stefano Brivio
2020-07-18 17:02                         ` David Ahern
2020-07-18 17:58                           ` Stefano Brivio
2020-07-18 18:04                             ` Stefano Brivio
2020-07-19 18:43                             ` David Ahern
2020-07-19 21:49                               ` Stefano Brivio
2020-07-20  3:19                                 ` David Ahern
2020-07-26 17:01                                   ` Stefano Brivio
2020-07-12 20:07 ` [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket Florian Westphal
2020-07-16 19:33   ` Jakub Kicinski
2020-07-17 10:13     ` Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 3/3] geneve: allow disabling of pmtu detection on encap sk Florian Westphal
2020-07-12 22:39 ` [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAH=CPzopMgQ=RU2jCSqDxM3ghtTMGZLBiPoh+3k4wXnGEeC+fw@mail.gmail.com' \
    --to=nusiddiq@redhat.com \
    --cc=aconole@redhat.com \
    --cc=fw@strlen.de \
    --cc=netdev@vger.kernel.org \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).