All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Florian Westphal <fw@strlen.de>,
	netdev@vger.kernel.org, aconole@redhat.com, sbrivio@redhat.com
Subject: Re: [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket
Date: Fri, 17 Jul 2020 12:13:48 +0200	[thread overview]
Message-ID: <20200717101348.GS32005@breakpoint.cc> (raw)
In-Reply-To: <20200716123301.07a1c723@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

Jakub Kicinski <kuba@kernel.org> wrote:
> On Sun, 12 Jul 2020 22:07:04 +0200 Florian Westphal wrote:
> > While its already possible to configure VXLAN to never set the DF bit
> > on packets that it sends, it was not yet possible to tell kernel to
> > not update the encapsulation sockets path MTU.
> > 
> > This can be used to tell ip stack to always use the interface MTU
> > when VXLAN wants to send a packet.
> > 
> > When packets are routed, VXLAN use skbs existing dst entries to
> > propagate the MTU update to the overlay, but on a bridge this doesn't
> > work (no routing, no dst entry, and no ip forwarding takes place, so
> > nothing emits icmp packet w. mtu update to sender).
> > 
> > This is only useful when VXLAN is used as a bridge port and the
> > network is known to accept packets up to the link MTU to avoid bogus
> > pmtu icmp packets from stopping tunneled traffic.
> > 
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> Seems like a general problem for L2 tunnels, could you broaden the CC
> list a little, perhaps? Maybe there is a best practice here we can
> follow?

Yes, this is a general problem.
I will send a v2 with an expanded cover letter/cc list.

> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index a43c97b13924..ceb2940a2a62 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -3316,6 +3316,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
> >  	[IFLA_VXLAN_REMCSUM_NOPARTIAL]	= { .type = NLA_FLAG },
> >  	[IFLA_VXLAN_TTL_INHERIT]	= { .type = NLA_FLAG },
> >  	[IFLA_VXLAN_DF]		= { .type = NLA_U8 },
> > +	[IFLA_VXLAN_PMTUDISC]	= { .type = NLA_U8 },
> 
> NLA_POLICY_RANGE?

Done.

> Also I'm not seeing .strict_start_type here.

Did not know this was established practice to set it unconditionally
for new attributes.  Will make this change.

> > +	if (data[IFLA_VXLAN_PMTUDISC]) {
> > +		int pmtuv = nla_get_u8(data[IFLA_VXLAN_PMTUDISC]);
> > +
> > +		if (pmtuv < IP_PMTUDISC_DONT) {
> > +			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value < 0");
> > +			return -EOPNOTSUPP;
> > +		}
> > +		if (pmtuv > IP_PMTUDISC_OMIT) {
> > +			NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_PMTUDISC], "PMTUDISC Value > IP_PMTUDISC_OMIT");
> > +			return -EOPNOTSUPP;
> > +		}
> > +
> > +		conf->pmtudisc = 1;
> > +		conf->pmtudiscv = pmtuv;
> > +	}
> 
> Can these conflict with DF settings?

Not really.  DF setting only controls wheter outer header has
DF set or not.

So, for DF-clear mode, this patch isn't needed from a functional
point of view, as endpoint will fragment packets.

For inherit/set mode, traffic will get blocked forever.

There are two cases to consider:
1. Setup is known to handle MTU-sized packets, but something
   generates a bogus route exception (perhaps during some
   reconfiguration in the datacenter..?).
2. The route exception is genuine, i.e. ignoring the exception
   doesn't solve anything and tunneled traffic will be tossed/dropped
   by middlebox.

For 2) we will need something else entirely, but what/where/who
is reponsible to handle this is unknown at the moment.

So, this is just about 1).  I will rewrite the cover letter to make
this more clear.

> > diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> > index 3a41627cbdfe..1414cfa2005f 100644
> > --- a/include/net/vxlan.h
> > +++ b/include/net/vxlan.h
> > @@ -220,6 +220,8 @@ struct vxlan_config {
> >  	unsigned long		age_interval;
> >  	unsigned int		addrmax;
> >  	bool			no_share;
> > +	u8			pmtudisc:1;
> > +	u8			pmtudiscv:3;
> 
> I must say for my myopic eyes discerning pmtudisc vs pmtudiscv took an
> effort. Could you find better names?

Ugh, I'm not good at that.  I went with 'set_pmtudisc' and
'pmtudisc_value' for now.

Thanks for reviewing.

  reply	other threads:[~2020-07-17 10:13 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-12 20:07 [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Florian Westphal
2020-07-12 20:07 ` [PATCH net-next 1/3] udp_tunnel: allow to turn off path mtu discovery on encap sockets Florian Westphal
2020-07-12 22:38   ` Stefano Brivio
2020-07-13  8:04     ` Florian Westphal
2020-07-13 10:04       ` Stefano Brivio
2020-07-13 10:51         ` Numan Siddique
2020-07-14 20:38           ` Aaron Conole
2020-07-15 11:58             ` Stefano Brivio
2020-07-13 13:25       ` David Ahern
2020-07-13 14:02         ` Florian Westphal
2020-07-13 14:41           ` David Ahern
2020-07-13 14:59             ` Florian Westphal
2020-07-13 15:57               ` Stefano Brivio
2020-07-13 16:22                 ` Florian Westphal
2020-07-14 12:33                   ` Stefano Brivio
2020-07-14 12:33           ` Stefano Brivio
2020-07-15 12:42             ` Florian Westphal
2020-07-15 13:35               ` Stefano Brivio
2020-07-15 14:33                 ` Florian Westphal
2020-07-17 12:27                   ` Stefano Brivio
2020-07-17 15:04                     ` David Ahern
2020-07-17 18:43                       ` Florian Westphal
2020-07-18  6:56                       ` Stefano Brivio
2020-07-18 17:02                         ` David Ahern
2020-07-18 17:58                           ` Stefano Brivio
2020-07-18 18:04                             ` Stefano Brivio
2020-07-19 18:43                             ` David Ahern
2020-07-19 21:49                               ` Stefano Brivio
2020-07-20  3:19                                 ` David Ahern
2020-07-26 17:01                                   ` Stefano Brivio
2020-07-12 20:07 ` [PATCH net-next 2/3] vxlan: allow to disable path mtu learning on encap socket Florian Westphal
2020-07-16 19:33   ` Jakub Kicinski
2020-07-17 10:13     ` Florian Westphal [this message]
2020-07-12 20:07 ` [PATCH net-next 3/3] geneve: allow disabling of pmtu detection on encap sk Florian Westphal
2020-07-12 22:39 ` [PATCH net-next 0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200717101348.GS32005@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=aconole@redhat.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sbrivio@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.