From: Florian Westphal <fw@strlen.de>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Florian Westphal <fw@strlen.de>,
netdev@vger.kernel.org, aconole@redhat.com, sbrivio@redhat.com
Subject: Re: [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket
Date: Fri, 17 Jul 2020 12:13:48 +0200 [thread overview]
Message-ID: <20200717101348.GS32005@breakpoint.cc> (raw)
In-Reply-To: <20200716123301.07a1c723@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Jakub Kicinski <kuba@kernel.org> wrote:
> On Sun, 12 Jul 2020 22:07:04 +0200 Florian Westphal wrote:
> > While its already possible to configure VXLAN to never set the DF bit
> > on packets that it sends, it was not yet possible to tell kernel to
> > not update the encapsulation sockets path MTU.
> >
> > This can be used to tell ip stack to always use the interface MTU
> > when VXLAN wants to send a packet.
> >
> > When packets are routed, VXLAN use skbs existing dst entries to
> > propagate the MTU update to the overlay, but on a bridge this doesn't
> > work (no routing, no dst entry, and no ip forwarding takes place, so
> > nothing emits icmp packet w. mtu update to sender).
> >
> > This is only useful when VXLAN is used as a bridge port and the
> > network is known to accept packets up to the link MTU to avoid bogus
> > pmtu icmp packets from stopping tunneled traffic.
> >
> > Signed-off-by: Florian Westphal <fw@strlen.de>
>
> Seems like a general problem for L2 tunnels, could you broaden the CC
> list a little, perhaps? Maybe there is a best practice here we can
> follow?
Yes, this is a general problem.
I will send a v2 with an expanded cover letter/cc list.
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index a43c97b13924..ceb2940a2a62 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -3316,6 +3316,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
> > [IFLA_VXLAN_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG },
> > [IFLA_VXLAN_TTL_INHERIT] = { .type = NLA_FLAG },
> > [IFLA_VXLAN_DF] = { .type = NLA_U8 },
> > + [IFLA_VXLAN_PMTUDISC] = { .type = NLA_U8 },
>
> NLA_POLICY_RANGE?
Done.
> Also I'm not seeing .strict_start_type here.
Did not know this was established practice to set it unconditionally
for new attributes. Will make this change.
> > + if (data[IFLA_VXLAN_PMTUDISC]) {
> > + int pmtuv = nla_get_u8(data[IFLA_VXLAN_PMTUDISC]);
> > +
> > + if (pmtuv < IP_PMTUDISC_DONT) {
> > + NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value < 0");
> > + return -EOPNOTSUPP;
> > + }
> > + if (pmtuv > IP_PMTUDISC_OMIT) {
> > + NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value > IP_PMTUDISC_OMIT");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + conf->pmtudisc = 1;
> > + conf->pmtudiscv = pmtuv;
> > + }
>
> Can these conflict with DF settings?
Not really. DF setting only controls wheter outer header has
DF set or not.
So, for DF-clear mode, this patch isn't needed from a functional
point of view, as endpoint will fragment packets.
For inherit/set mode, traffic will get blocked forever.
There are two cases to consider:
1. Setup is known to handle MTU-sized packets, but something
generates a bogus route exception (perhaps during some
reconfiguration in the datacenter..?).
2. The route exception is genuine, i.e. ignoring the exception
doesn't solve anything and tunneled traffic will be tossed/dropped
by middlebox.
For 2) we will need something else entirely, but what/where/who
is reponsible to handle this is unknown at the moment.
So, this is just about 1). I will rewrite the cover letter to make
this more clear.
> > diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> > index 3a41627cbdfe..1414cfa2005f 100644
> > --- a/include/net/vxlan.h
> > +++ b/include/net/vxlan.h
> > @@ -220,6 +220,8 @@ struct vxlan_config {
> > unsigned long age_interval;
> > unsigned int addrmax;
> > bool no_share;
> > + u8 pmtudisc:1;
> > + u8 pmtudiscv:3;
>
> I must say for my myopic eyes discerning pmtudisc vs pmtudiscv took an
> effort. Could you find better names?
Ugh, I'm not good at that. I went with 'set_pmtudisc' and
'pmtudisc_value' for now.
Thanks for reviewing.
next prev parent reply other threads:[~2020-07-17 10:13 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-12 20:07 [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets Florian Westphal
2020-07-12 22:38 ` Stefano Brivio
2020-07-13 8:04 ` Florian Westphal
2020-07-13 10:04 ` Stefano Brivio
2020-07-13 10:51 ` Numan Siddique
2020-07-14 20:38 ` Aaron Conole
2020-07-15 11:58 ` Stefano Brivio
2020-07-13 13:25 ` David Ahern
2020-07-13 14:02 ` Florian Westphal
2020-07-13 14:41 ` David Ahern
2020-07-13 14:59 ` Florian Westphal
2020-07-13 15:57 ` Stefano Brivio
2020-07-13 16:22 ` Florian Westphal
2020-07-14 12:33 ` Stefano Brivio
2020-07-14 12:33 ` Stefano Brivio
2020-07-15 12:42 ` Florian Westphal
2020-07-15 13:35 ` Stefano Brivio
2020-07-15 14:33 ` Florian Westphal
2020-07-17 12:27 ` Stefano Brivio
2020-07-17 15:04 ` David Ahern
2020-07-17 18:43 ` Florian Westphal
2020-07-18 6:56 ` Stefano Brivio
2020-07-18 17:02 ` David Ahern
2020-07-18 17:58 ` Stefano Brivio
2020-07-18 18:04 ` Stefano Brivio
2020-07-19 18:43 ` David Ahern
2020-07-19 21:49 ` Stefano Brivio
2020-07-20 3:19 ` David Ahern
2020-07-26 17:01 ` Stefano Brivio
2020-07-12 20:07 ` [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket Florian Westphal
2020-07-16 19:33 ` Jakub Kicinski
2020-07-17 10:13 ` Florian Westphal [this message]
2020-07-12 20:07 ` [PATCH net-next 3/3] geneve: allow disabling of pmtu detection on encap sk Florian Westphal
2020-07-12 22:39 ` [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200717101348.GS32005@breakpoint.cc \
--to=fw@strlen.de \
--cc=aconole@redhat.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).