netdev.vger.kernel.org archive mirror
* [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: R. Parameswaran @ 2016-09-22 20:52 UTC
  To: kleptog, jchapman, netdev
  Cc: davem, linux-kernel, nprachan, rshearma, dfawcus, stephen, acme,
	lboccass, parameswaran.r7

From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
From: "R. Parameswaran" <rparames@brocade.com>
Date: Thu, 22 Sep 2016 13:19:25 -0700
Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

Take into account all of the tunnel encapsulation headers when setting
up the MTU on the L2TP logical interface device. Otherwise, packets
created by the applications on top of the L2TP layer are larger
than they ought to be, relative to the underlay MTU, leading to
needless fragmentation once the outer IP encap is added.

Specifically, take into account the (outer, underlay) IP header
imposed on the encapsulated L2TP packet, and the Layer 2 header
imposed on the inner IP packet prior to L2TP encapsulation.

Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
and the dst entry in the L2TP tunnel socket to directly pull up
the underlay MTU (as the baseline number on top of which the
encapsulation headers are factored in).  Fall back to Ethernet MTU
if this fails.

Signed-off-by: R. Parameswaran <rparames@brocade.com>
Reviewed-by: "N. Prachanda" <nprachan@brocade.com>
Reviewed-by: "R. Shearman" <rshearma@brocade.com>
Reviewed-by: "D. Fawcus" <dfawcus@brocade.com>
---
 net/l2tp/l2tp_eth.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 57fc5a4..dbcd6bd 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -30,6 +30,9 @@
 #include <net/xfrm.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/udp.h>
 
 #include "l2tp_core.h"
 
@@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
 }
 #endif
 
+static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
+				struct l2tp_session *session,
+				struct net_device *dev)
+{
+	unsigned int overhead = 0;
+	struct dst_entry *dst;
+
+	if (session->mtu != 0) {
+		dev->mtu = session->mtu;
+		dev->needed_headroom += session->hdr_len;
+		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+			dev->needed_headroom += sizeof(struct udphdr);
+		return;
+	}
+	overhead = session->hdr_len;
+	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
+	if (tunnel->sock->sk_family == AF_INET)
+		overhead += (ETH_HLEN + sizeof(struct iphdr));
+	else if (tunnel->sock->sk_family == AF_INET6)
+		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
+	/* Additionally, if the encap is UDP, account for UDP header size */
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		overhead += sizeof(struct udphdr);
+	/* If PMTU discovery was enabled, use discovered MTU on L2TP device */
+	dst = sk_dst_get(tunnel->sock);
+	if (dst) {
+		u32 pmtu = dst_mtu(dst);
+
+		if (pmtu != 0)
+			dev->mtu = pmtu;
+		dst_release(dst);
+	}
+	/* else (no PMTUD) L2TP dev MTU defaulted to Ethernet MTU in caller */
+	session->mtu = dev->mtu - overhead;
+	dev->mtu = session->mtu;
+	dev->needed_headroom += session->hdr_len;
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		dev->needed_headroom += sizeof(struct udphdr);
+}
+
 static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
 {
 	struct net_device *dev;
@@ -255,11 +298,8 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
 	}
 
 	dev_net_set(dev, net);
-	if (session->mtu == 0)
-		session->mtu = dev->mtu - session->hdr_len;
-	dev->mtu = session->mtu;
-	dev->needed_headroom += session->hdr_len;
 
+	l2tp_eth_adjust_mtu(tunnel, session, dev);
 	priv = netdev_priv(dev);
 	priv->dev = dev;
 	priv->session = session;
-- 
2.1.4
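A minimal userspace sketch of the overhead arithmetic in l2tp_eth_adjust_mtu() above, for illustration only (it is not kernel code, and the model_* names are hypothetical). The MODEL_* constants mirror ETH_HLEN, sizeof(struct iphdr), sizeof(struct ipv6hdr) and sizeof(struct udphdr). The session header length is passed in because it depends on the session's cookie and sublayer settings; the values used in the checks are assumptions. Unlike the patch, the sketch returns 0 instead of wrapping around when the underlay MTU is smaller than the overhead.

```c
#include <assert.h>
#include <stdbool.h>

/* Header sizes mirroring the kernel constants used by the patch. */
#define MODEL_ETH_HLEN  14u /* ETH_HLEN: overlay (inner) Ethernet header */
#define MODEL_IPV4_HLEN 20u /* sizeof(struct iphdr), no options */
#define MODEL_IPV6_HLEN 40u /* sizeof(struct ipv6hdr), no extension headers */
#define MODEL_UDP_HLEN   8u /* sizeof(struct udphdr) */

enum model_family { MODEL_AF_INET, MODEL_AF_INET6 };

/* Total encapsulation bytes the patch subtracts: session->hdr_len plus
 * the overlay L2 header, the underlay L3 header and (optionally) UDP. */
static unsigned int model_l2tp_overhead(enum model_family family, bool udp_encap,
					unsigned int session_hdr_len)
{
	unsigned int overhead = session_hdr_len + MODEL_ETH_HLEN;

	overhead += (family == MODEL_AF_INET) ? MODEL_IPV4_HLEN : MODEL_IPV6_HLEN;
	if (udp_encap)
		overhead += MODEL_UDP_HLEN;
	return overhead;
}

/* Device MTU from the underlay MTU (dst_mtu() result, or 1500 as fallback).
 * Returns 0 rather than wrapping around if the headers do not fit. */
static unsigned int model_l2tp_dev_mtu(unsigned int underlay_mtu,
				       enum model_family family, bool udp_encap,
				       unsigned int session_hdr_len)
{
	unsigned int overhead =
		model_l2tp_overhead(family, udp_encap, session_hdr_len);

	assert(underlay_mtu != 0); /* the caller always supplies a baseline */
	return underlay_mtu > overhead ? underlay_mtu - overhead : 0;
}
```

For example, with an assumed 12-byte session header over UDP/IPv4 and a 1500-byte underlay, the overhead is 12 + 14 + 20 + 8 = 54 bytes, giving a device MTU of 1446.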


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: David Miller @ 2016-09-27  7:31 UTC
  To: parameswaran.r7
  Cc: kleptog, jchapman, netdev, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass

From: "R. Parameswaran" <parameswaran.r7@gmail.com>
Date: Thu, 22 Sep 2016 13:52:43 -0700 (PDT)

> From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> From: "R. Parameswaran" <rparames@brocade.com>
> Date: Thu, 22 Sep 2016 13:19:25 -0700
> Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
> 
> Take into account all of the tunnel encapsulation headers when setting
> up the MTU on the L2TP logical interface device. Otherwise, packets
> created by the applications on top of the L2TP layer are larger
> than they ought to be, relative to the underlay MTU, leading to
> needless fragmentation once the outer IP encap is added.
> 
> Specifically, take into account the (outer, underlay) IP header
> imposed on the encapsulated L2TP packet, and the Layer 2 header
> imposed on the inner IP packet prior to L2TP encapsulation.
> 
> Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> and the dst entry in the L2TP tunnel socket to directly pull up
> the underlay MTU (as the baseline number on top of which the
> encapsulation headers are factored in).  Fall back to Ethernet MTU
> if this fails.
> 
> Signed-off-by: R. Parameswaran <rparames@brocade.com>
> 
> Reviewed-by: "N. Prachanda" <nprachan@brocade.com>,
> Reviewed-by: "R. Shearman" <rshearma@brocade.com>,
> Reviewed-by: "D. Fawcus" <dfawcus@brocade.com>

I have to ask, how do other tunnels over UDP such as VXLAN handle
this problem?


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: R. Parameswaran @ 2016-09-27 19:17 UTC
  To: David Miller
  Cc: parameswaran.r7, kleptog, jchapman, netdev, linux-kernel,
	nprachan, rshearma, dfawcus, stephen, acme, lboccass


Hi David,

Thanks for the reply, please see inline:

On Tue, 27 Sep 2016, David Miller wrote:

> From: "R. Parameswaran" <parameswaran.r7@gmail.com>
> Date: Thu, 22 Sep 2016 13:52:43 -0700 (PDT)
> 
> > From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> > From: "R. Parameswaran" <rparames@brocade.com>
> > Date: Thu, 22 Sep 2016 13:19:25 -0700
> > Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
> > 
> > Take into account all of the tunnel encapsulation headers when setting
> > up the MTU on the L2TP logical interface device. Otherwise, packets
> > created by the applications on top of the L2TP layer are larger
> > than they ought to be, relative to the underlay MTU, leading to
> > needless fragmentation once the outer IP encap is added.
> > 
> > Specifically, take into account the (outer, underlay) IP header
> > imposed on the encapsulated L2TP packet, and the Layer 2 header
> > imposed on the inner IP packet prior to L2TP encapsulation.
> > 
> > Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> > and the dst entry in the L2TP tunnel socket to directly pull up
> > the underlay MTU (as the baseline number on top of which the
> > encapsulation headers are factored in).  Fall back to Ethernet MTU
> > if this fails.
> > 
> > Signed-off-by: R. Parameswaran <rparames@brocade.com>
> > 
> > Reviewed-by: "N. Prachanda" <nprachan@brocade.com>,
> > Reviewed-by: "R. Shearman" <rshearma@brocade.com>,
> > Reviewed-by: "D. Fawcus" <dfawcus@brocade.com>
> 
> I have to ask, how do other tunnels over UDP such as VXLAN handle
> this problem?
> 

As for VXLAN specifically, it appears to behave similarly. I haven't functionally
tested fragmentation on VXLAN interfaces, but looking at the
code, it seems to account for the headers involved:


When the vxlan interface is created, from vxlan_dev_create(), in 
vxlan_setup(), it initially starts off with an ethernet MTU:

vxlan_setup(struct net_device *dev)
{
...
...
        ether_setup(dev);       /* <<< will set device MTU to 1500 */


Later, in vxlan_dev_configure(), called from vxlan_dev_create(), it gets 
adjusted to account for the headers:

vxlan_dev_configure():
...
 if (!conf->mtu)
         dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);


where VXLAN_HEADROOM is defined as follows: 

/* IP header + UDP + VXLAN + Ethernet header */
#define VXLAN_HEADROOM (20 + 8 + 8 + 14)
/* IPv6 header + UDP + VXLAN + Ethernet header */
#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)


This seems to match what I see with hand config:

sudo ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789   <<<< (eth0 has an MTU of 1500)

sudo ip -d link show vxlan0
36: vxlan0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default   <<<< (1450 = 1500 - 50)
    link/ether e2:b8:2d:f4:f7:ae brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 42 group 239.1.1.1 dev eth0 srcport 32768 61000 dstport 4789 ageing 300
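The VXLAN arithmetic above can be checked with a small userspace model of the quoted vxlan_dev_configure() branch. This is illustrative only: vxlan_default_mtu() is a hypothetical name, and the precondition on the lower device's MTU is a simplification added for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Copied from the VXLAN definitions quoted above. */
/* IP header + UDP + VXLAN + Ethernet header */
#define VXLAN_HEADROOM (20 + 8 + 8 + 14)
/* IPv6 header + UDP + VXLAN + Ethernet header */
#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)

/* Models the quoted branch: an explicitly configured MTU wins; otherwise
 * the MTU is derived from the lower device's MTU minus the headroom. */
static int vxlan_default_mtu(int lowerdev_mtu, bool use_ipv6, int conf_mtu)
{
	if (conf_mtu)
		return conf_mtu;
	assert(lowerdev_mtu > VXLAN6_HEADROOM); /* sketch-only precondition */
	return lowerdev_mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
}
```

With eth0 at 1500, the IPv4 case gives 1500 - 50 = 1450, matching the `ip -d link show` output above; over IPv6 the same lower device would give 1430.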

thanks,

Ramkumar


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: David Miller @ 2016-09-28  7:48 UTC
  To: parameswaran.r7
  Cc: kleptog, jchapman, netdev, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass

From: "R. Parameswaran" <parameswaran.r7@gmail.com>
Date: Tue, 27 Sep 2016 12:17:21 -0700 (PDT)

> Later, in vxlan_dev_configure(), called from vxlan_dev_create(), it gets 
> adjusted to account for the headers:
> 
> vxlan_dev_configure():
> ...
>  if (!conf->mtu)
>                         dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
> 
> 
> where VXLAN_HEADROOM is defined as follows: 
> 
> /* IP header + UDP + VXLAN + Ethernet header */
> #define VXLAN_HEADROOM (20 + 8 + 8 + 14)
> /* IPv6 header + UDP + VXLAN + Ethernet header */
> #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)

Right but I don't see it going through the effort to make use of the
PMTU like you are.

I have another strong concern related to this.  There seems to be no
mechanism used to propagate any PMTU events into the device's MTU.

Because if there is a limiting nexthop in the route to the other end
of the UDP tunnel, you won't learn the PMTU until you (or some other
entity on the machine) actually starts sending traffic to the tunnel's
endpoint.

If the PMTU events aren't propagated into the tunnel's MTU or similar
I think this is an ad-hoc solution.

I would suggest that you either:

1) Do what VXLAN appears to do and ignore the PMTU

2) Add code to handle PMTU events that land on the UDP tunnel
   socket.

Thanks.


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: R. Parameswaran @ 2016-09-29  2:36 UTC
  To: David Miller
  Cc: parameswaran.r7, kleptog, jchapman, netdev, linux-kernel,
	nprachan, rshearma, dfawcus, stephen, acme, lboccass



Hi David,

Please see inline:

On Wed, 28 Sep 2016, David Miller wrote:

> From: "R. Parameswaran" <parameswaran.r7@gmail.com>
> Date: Tue, 27 Sep 2016 12:17:21 -0700 (PDT)
> 
> > Later, in vxlan_dev_configure(), called from vxlan_dev_create(), it gets 
> > adjusted to account for the headers:
> > 
> > vxlan_dev_configure():
> > ...
> >  if (!conf->mtu)
> >                         dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
> > 
> > 
> > where VXLAN_HEADROOM is defined as follows: 
> > 
> > /* IP header + UDP + VXLAN + Ethernet header */
> > #define VXLAN_HEADROOM (20 + 8 + 8 + 14)
> > /* IPv6 header + UDP + VXLAN + Ethernet header */
> > #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
> 
> Right but I don't see it going through the effort to make use of the
> PMTU like you are.
> 
> I have another strong concern related to this.  There seems to be no
> mechanism used to propagate any PMTU events into the device's MTU.
> 
> Because if there is a limiting nexthop in the route to the other end
> of the UDP tunnel, you won't learn the PMTU until you (or some other
> entity on the machine) actually starts sending traffic to the tunnel's
> endpoint.
> 
> If the PMTU events aren't propagated into the tunnel's MTU or similar
> I think this is an ad-hoc solution.
> 
> I would suggest that you either:
> 
> 1) Do what VXLAN appears to do and ignore the PMTU
> 

I'd like to point out one difference with VXLAN - in VXLAN, the 
local physical interface is directly specified at the time of 
creation of the tunnel, and the data structure seems to have the ifindex 
of the local interface with which it is able to directly pull up the 
underlay interface device. Whereas in L2TP, we only have the IP
address of the remote tunnel end-point and thus only the socket and the 
dst from which we need to derive this. 

Also, dst_mtu() calls dst->ops->mtu, which, if I followed the pointer
chain correctly, resolves to ipv4_mtu() (for the IPv4 case, as
an example). The code in ipv4_mtu() looks like the following:

ipv4_mtu():

        unsigned int mtu = rt->rt_pmtu;

        if (!mtu || time_after_eq(jiffies, rt->dst.expires))
                mtu = dst_metric_raw(dst, RTAX_MTU);

        if (mtu)
                return mtu;

        mtu = dst->dev->mtu;

        if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
                if (rt->rt_uses_gateway && mtu > 576)
                        mtu = 576;
        }

        return min_t(unsigned int, mtu, IP_MAX_MTU);

The code above does not depend on PMTU discovery being enabled. If no
discovered PMTU exists (and no MTU metric is set on the route), it falls
back to the MTU of the local underlay device. This is the mode in which I
tested the fix: PMTU discovery was off in my testbed, and the local device
MTU was picked up correctly.
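To make the fallback order concrete, here is a userspace model of the ipv4_mtu() logic quoted above. struct route_model and model_ipv4_mtu() are hypothetical; the boolean fields stand in for the jiffies/expiry check, the dst_metric_locked() test and rt_uses_gateway, and MODEL_IP_MAX_MTU assumes the kernel's IP_MAX_MTU value of 0xFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IP_MAX_MTU 0xFFFFu /* IP_MAX_MTU in the kernel */

/* Hypothetical flattened view of the inputs ipv4_mtu() consults. */
struct route_model {
	unsigned int rt_pmtu;    /* learned PMTU, 0 if none */
	bool pmtu_expired;       /* stands in for time_after_eq(jiffies, expires) */
	unsigned int metric_mtu; /* dst_metric_raw(dst, RTAX_MTU), 0 if unset */
	bool metric_locked;      /* dst_metric_locked(dst, RTAX_MTU) */
	bool uses_gateway;       /* rt->rt_uses_gateway */
	unsigned int dev_mtu;    /* dst->dev->mtu: the local egress device */
};

static unsigned int model_ipv4_mtu(const struct route_model *rt)
{
	unsigned int mtu;

	assert(rt != NULL);
	mtu = rt->rt_pmtu;
	/* 1. A learned, unexpired PMTU wins; otherwise try the route metric. */
	if (!mtu || rt->pmtu_expired)
		mtu = rt->metric_mtu;
	if (mtu)
		return mtu;

	/* 2. Fall back to the egress device MTU (the case tested above). */
	mtu = rt->dev_mtu;
	if (rt->metric_locked && rt->uses_gateway && mtu > 576)
		mtu = 576;

	/* 3. Never report more than the IP maximum. */
	return mtu < MODEL_IP_MAX_MTU ? mtu : MODEL_IP_MAX_MTU;
}
```

With no PMTU and no route metric, the model returns the device MTU unchanged, which is the behavior the L2TP patch relies on when PMTU discovery is off.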

Basically, this looks better than the VXLAN handling as far as I can 
tell - at least it will pick up the existing discovered PMTU on a best 
effort basis, while falling back to the underlay device if all else fails. 

I agree that something like option 2 below would be needed in the long run (it
will need some effort and redesign, e.g. how to look up the parent tunnel
from the socket when receiving a PMTU update, since the existing pointer chain
runs from tunnel to socket).

But since the existing (Ethernet over L2TP) MTU derivation is incorrect, I am 
hoping this may be acceptable as an interim solution. 

thanks,

Ramkumar


> 2) Add code to handle PMTU events that land on the UDP tunnel
>    socket.
> 
> Thanks.
> 


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: Jiri Benc @ 2016-09-29 12:21 UTC
  To: R. Parameswaran
  Cc: David Miller, kleptog, jchapman, netdev, linux-kernel, nprachan,
	rshearma, dfawcus, stephen, acme, lboccass

On Wed, 28 Sep 2016 19:36:45 -0700 (PDT), R. Parameswaran wrote:
> I'd like to point out one difference with VXLAN - in VXLAN, the 
> local physical interface is directly specified at the time of 
> creation of the tunnel, and the data structure seems to have the ifindex 
> of the local interface with which it is able to directly pull up the 
> underlay interface device. Whereas in L2TP, we only have the IP
> address of the remote tunnel end-point and thus only the socket and the 
> dst from which we need to derive this. 

Strictly speaking, VXLAN *may* know the underlying interface. It can
also be set up with just local and remote IP address, or even worse, in
metadata mode where we don't know the address nor the interface until
we get a packet (and each packet may have those different).

MTU wise, those cases are not accommodated for in the kernel. The vxlan
interface gets MTU of 1500 and it's up to the administrator to set it
correctly.

Btw, PMTU events won't help with the metadata mode. And even in
"normal" mode, it's not clear what should be done - the tunnel
interface may be in a bridge, thus there may be other interfaces that
depend on the same MTU, up to inside VMs.

 Jiri


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: James Chapman @ 2016-09-29 15:18 UTC
  To: R. Parameswaran
  Cc: kleptog, netdev, davem, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass

On 22/09/16 21:52, R. Parameswaran wrote:
> From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> From: "R. Parameswaran" <rparames@brocade.com>
> Date: Thu, 22 Sep 2016 13:19:25 -0700
> Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
>
> Take into account all of the tunnel encapsulation headers when setting
> up the MTU on the L2TP logical interface device. Otherwise, packets
> created by the applications on top of the L2TP layer are larger
> than they ought to be, relative to the underlay MTU, leading to
> needless fragmentation once the outer IP encap is added.
>
> Specifically, take into account the (outer, underlay) IP header
> imposed on the encapsulated L2TP packet, and the Layer 2 header
> imposed on the inner IP packet prior to L2TP encapsulation.
>
> Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> and the dst entry in the L2TP tunnel socket to directly pull up
> the underlay MTU (as the baseline number on top of which the
> encapsulation headers are factored in).  Fall back to Ethernet MTU
> if this fails.
>
> Signed-off-by: R. Parameswaran <rparames@brocade.com>
>
> Reviewed-by: "N. Prachanda" <nprachan@brocade.com>,
> Reviewed-by: "R. Shearman" <rshearma@brocade.com>,
> Reviewed-by: "D. Fawcus" <dfawcus@brocade.com>
> ---
>  net/l2tp/l2tp_eth.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> index 57fc5a4..dbcd6bd 100644
> --- a/net/l2tp/l2tp_eth.c
> +++ b/net/l2tp/l2tp_eth.c
> @@ -30,6 +30,9 @@
>  #include <net/xfrm.h>
>  #include <net/net_namespace.h>
>  #include <net/netns/generic.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <linux/udp.h>
>  
>  #include "l2tp_core.h"
>  
> @@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
>  }
>  #endif
>  
> +static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
> +				struct l2tp_session *session,
> +				struct net_device *dev)
> +{
> +	unsigned int overhead = 0;
> +	struct dst_entry *dst;
> +
> +	if (session->mtu != 0) {
> +		dev->mtu = session->mtu;
> +		dev->needed_headroom += session->hdr_len;
> +		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> +			dev->needed_headroom += sizeof(struct udphdr);
> +		return;
> +	}
> +	overhead = session->hdr_len;
> +	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> +	if (tunnel->sock->sk_family == AF_INET)
> +		overhead += (ETH_HLEN + sizeof(struct iphdr));
> +	else if (tunnel->sock->sk_family == AF_INET6)
> +		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
What about options in the IP header? If certain options are set on the
socket, the IP header may be larger.

> +	/* Additionally, if the encap is UDP, account for UDP header size */
> +	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> +		overhead += sizeof(struct udphdr);
> +	/* If PMTU discovery was enabled, use discovered MTU on L2TP device */
> +	dst = sk_dst_get(tunnel->sock);
> +	if (dst) {
> +		u32 pmtu = dst_mtu(dst);
> +
> +		if (pmtu != 0)
> +			dev->mtu = pmtu;
> +		dst_release(dst);
> +	}
> +	/* else (no PMTUD) L2TP dev MTU defaulted to Ethernet MTU in caller */
> +	session->mtu = dev->mtu - overhead;
> +	dev->mtu = session->mtu;
> +	dev->needed_headroom += session->hdr_len;
> +	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> +		dev->needed_headroom += sizeof(struct udphdr);
> +}
> +
>  static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
>  {
>  	struct net_device *dev;
> @@ -255,11 +298,8 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
>  	}
>  
>  	dev_net_set(dev, net);
> -	if (session->mtu == 0)
> -		session->mtu = dev->mtu - session->hdr_len;
> -	dev->mtu = session->mtu;
> -	dev->needed_headroom += session->hdr_len;
>  
> +	l2tp_eth_adjust_mtu(tunnel, session, dev);
>  	priv = netdev_priv(dev);
>  	priv->dev = dev;
>  	priv->session = session;


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: James Chapman @ 2016-09-29 15:39 UTC
  To: R. Parameswaran
  Cc: David Miller, kleptog, netdev, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass

On 29/09/16 03:36, R. Parameswaran wrote:
> I agree that something like 2. below would be needed in the long run (it 
> will need some effort and redesign -e.g. how do I lookup the parent tunnel 
> from the socket when receiving a PMTU update, existing pointer chain runs 
> from tunnel to socket).
>> 2) Add code to handle PMTU events that land on the UDP tunnel
>>    socket.

Another function pointer could be added to struct udp_sock, similar to
encap_rcv, so that the PMTU event could be handled by the UDP encap
protocol implementation.
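A rough userspace sketch of this suggestion, under stated assumptions: struct model_udp_sock, its encap_pmtu hook and the other model_* names are hypothetical stand-ins, not the kernel's struct udp_sock. The sketch assumes a back-pointer from the socket to its tunnel (in the spirit of the kernel's sk_user_data field), which is one way to address the tunnel-lookup question raised above. The handler only ever lowers the device MTU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a UDP socket carrying encapsulation hooks. */
struct model_udp_sock {
	void *user_data; /* back-pointer to the owning tunnel */
	/* hypothetical hook, analogous in spirit to encap_rcv */
	void (*encap_pmtu)(struct model_udp_sock *sk, unsigned int pmtu);
};

struct model_tunnel {
	struct model_udp_sock sock;
	unsigned int overhead; /* encapsulation bytes, computed at setup */
	unsigned int dev_mtu;  /* MTU of the L2TP Ethernet device */
};

/* Encap protocol's handler: find the tunnel via the socket and shrink the
 * device MTU if the new path MTU no longer fits the current setting. */
static void model_l2tp_pmtu_event(struct model_udp_sock *sk, unsigned int pmtu)
{
	struct model_tunnel *tunnel = sk->user_data;

	assert(tunnel != NULL);
	if (pmtu > tunnel->overhead && pmtu - tunnel->overhead < tunnel->dev_mtu)
		tunnel->dev_mtu = pmtu - tunnel->overhead;
}

/* What the UDP layer would do on an ICMP "fragmentation needed" (IPv4) or
 * "packet too big" (IPv6) for this socket: hand it to the encap hook. */
static void model_udp_deliver_pmtu(struct model_udp_sock *sk, unsigned int pmtu)
{
	if (sk->encap_pmtu)
		sk->encap_pmtu(sk, pmtu);
}

static void model_tunnel_init(struct model_tunnel *t, unsigned int overhead,
			      unsigned int dev_mtu)
{
	t->sock.user_data = t;
	t->sock.encap_pmtu = model_l2tp_pmtu_event;
	t->overhead = overhead;
	t->dev_mtu = dev_mtu;
}
```

Ignoring increases keeps the sketch simple; a real design would also have to decide how (or whether) to grow the MTU again once a PMTU entry expires, and, per Jiri's point, what to do when the device sits in a bridge.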

James


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: R. Parameswaran @ 2016-09-30  2:39 UTC
  To: James Chapman
  Cc: R. Parameswaran, kleptog, netdev, davem, linux-kernel, nprachan,
	rshearma, dfawcus, stephen, acme, lboccass, bhong


Hi James,

On Thu, 29 Sep 2016, James Chapman wrote:

> On 22/09/16 21:52, R. Parameswaran wrote:
> > From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> > From: "R. Parameswaran" <rparames@brocade.com>
> > Date: Thu, 22 Sep 2016 13:19:25 -0700
> > Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
> >
> > Take into account all of the tunnel encapsulation headers when setting
> > up the MTU on the L2TP logical interface device. Otherwise, packets
> > created by the applications on top of the L2TP layer are larger
> > than they ought to be, relative to the underlay MTU, leading to
> > needless fragmentation once the outer IP encap is added.
> >
> > Specifically, take into account the (outer, underlay) IP header
> > imposed on the encapsulated L2TP packet, and the Layer 2 header
> > imposed on the inner IP packet prior to L2TP encapsulation.
> >
> > Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> > and the dst entry in the L2TP tunnel socket to directly pull up
> > the underlay MTU (as the baseline number on top of which the
> > encapsulation headers are factored in).  Fall back to Ethernet MTU
> > if this fails.
> >
> > Signed-off-by: R. Parameswaran <rparames@brocade.com>
> >
> > Reviewed-by: "N. Prachanda" <nprachan@brocade.com>,
> > Reviewed-by: "R. Shearman" <rshearma@brocade.com>,
> > Reviewed-by: "D. Fawcus" <dfawcus@brocade.com>
> > ---
> >  net/l2tp/l2tp_eth.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 44 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
> > index 57fc5a4..dbcd6bd 100644
> > --- a/net/l2tp/l2tp_eth.c
> > +++ b/net/l2tp/l2tp_eth.c
> > @@ -30,6 +30,9 @@
> >  #include <net/xfrm.h>
> >  #include <net/net_namespace.h>
> >  #include <net/netns/generic.h>
> > +#include <linux/ip.h>
> > +#include <linux/ipv6.h>
> > +#include <linux/udp.h>
> >  
> >  #include "l2tp_core.h"
> >  
> > @@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
> >  }
> >  #endif
> >  
> > +static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
> > +				struct l2tp_session *session,
> > +				struct net_device *dev)
> > +{
> > +	unsigned int overhead = 0;
> > +	struct dst_entry *dst;
> > +
> > +	if (session->mtu != 0) {
> > +		dev->mtu = session->mtu;
> > +		dev->needed_headroom += session->hdr_len;
> > +		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> > +			dev->needed_headroom += sizeof(struct udphdr);
> > +		return;
> > +	}
> > +	overhead = session->hdr_len;
> > +	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> > +	if (tunnel->sock->sk_family == AF_INET)
> > +		overhead += (ETH_HLEN + sizeof(struct iphdr));
> > +	else if (tunnel->sock->sk_family == AF_INET6)
> > +		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
> What about options in the IP header? If certain options are set on the
> socket, the IP header may be larger.
> 

Thanks for the reply. It looks like IP options can only be
enabled through setsockopt on an application's socket (if there's any
other way to turn on IP options, please let me know; I didn't see any
sysctl setting for transmit). This scenario would come
into the picture when an application opens a raw IP or UDP socket such that it
routes into the L2TP logical interface.

If you take the case of a plain IP (Ethernet) interface, even if an
application opened a socket with IP options turned on, it would not change
the MTU of the underlying interface, and it would not affect other
applications transacting packets on the same interface. I know it's not an
exact parallel to this case, but since IP option control is per
application, we probably should not factor it into the L2TP logical interface?
We cannot affect other applications/processes running on the same L2TP
tunnel. Also, since the application using IP options knows that it has turned
them on, maybe we can count on it to factor the size of the options
into the size of the payload it sends into the socket, or to set the MTU on the
L2TP interface through config?

Other than this, I don't see keepalives or anything else in which the 
kernel will source its own packet into the L2TP interface, outside of 
an application injected packet - if there is something like that, please
let me know. The user space L2TP daemon would probably fall in the 
category of applications.

thanks,

Ramkumar 


> > +	/* Additionally, if the encap is UDP, account for UDP header size */
> > +	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> > +		overhead += sizeof(struct udphdr);
> > +	/* If PMTU discovery was enabled, use discovered MTU on L2TP device */
> > +	dst = sk_dst_get(tunnel->sock);
> > +	if (dst) {
> > +		u32 pmtu = dst_mtu(dst);
> > +
> > +		if (pmtu != 0)
> > +			dev->mtu = pmtu;
> > +		dst_release(dst);
> > +	}
> > +	/* else (no PMTUD) L2TP dev MTU defaulted to Ethernet MTU in caller */
> > +	session->mtu = dev->mtu - overhead;
> > +	dev->mtu = session->mtu;
> > +	dev->needed_headroom += session->hdr_len;
> > +	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
> > +		dev->needed_headroom += sizeof(struct udphdr);
> > +}
> > +
> >  static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
> >  {
> >  	struct net_device *dev;
> > @@ -255,11 +298,8 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
> >  	}
> >  
> >  	dev_net_set(dev, net);
> > -	if (session->mtu == 0)
> > -		session->mtu = dev->mtu - session->hdr_len;
> > -	dev->mtu = session->mtu;
> > -	dev->needed_headroom += session->hdr_len;
> >  
> > +	l2tp_eth_adjust_mtu(tunnel, session, dev);
> >  	priv = netdev_priv(dev);
> >  	priv->dev = dev;
> >  	priv->session = session;
> 
> 
> 
> 


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
From: James Chapman @ 2016-10-01 16:50 UTC
  To: R. Parameswaran
  Cc: kleptog, netdev, davem, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass, bhong

On 30/09/16 03:39, R. Parameswaran wrote:
>
>>> +	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
>>> +	if (tunnel->sock->sk_family == AF_INET)
>>> +		overhead += (ETH_HLEN + sizeof(struct iphdr));
>>> +	else if (tunnel->sock->sk_family == AF_INET6)
>>> +		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
>> What about options in the IP header? If certain options are set on the
>> socket, the IP header may be larger.
>>
> Thanks for the reply - It looks like IP options can only be 
> enabled through setsockopt on an application's socket (if there's any 
> other way to turn on IP options, please let me know - didn't see any 
> sysctl setting for transmit). This scenario would come 
> into picture when an application opens a raw IP or UDP socket such that it 
> routes into the L2TP logical interface.

No. An L2TP daemon (userspace) will open a socket for each tunnel that
it creates. Control and data packets use the same socket, which is the
socket used by this code. It may set any options on its sockets. L2TP
tunnel sockets can be created either by an L2TP daemon (managed tunnels)
or by ip l2tp commands (unmanaged tunnels).
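James's point can be illustrated with a sketch of how the outer IPv4 header grows with options set on the tunnel socket itself. This is a userspace model with hypothetical names; it assumes option bytes are already padded to a multiple of 4 (the 4-bit IHL field caps the header at 60 bytes, i.e. at most 40 bytes of options). In the kernel, the per-socket option length is kept in the socket's inet_opt (opt.optlen); the IPv6 case (extension headers) is not modeled.

```c
#include <assert.h>

#define MODEL_IPV4_MIN_HLEN   20u /* sizeof(struct iphdr) */
#define MODEL_IPV4_MAX_OPTLEN 40u /* IHL max 15 words = 60-byte header */

/* Outer IPv4 header length when the tunnel socket carries optlen bytes
 * of IP options. */
static unsigned int model_ipv4_hdr_len(unsigned int optlen)
{
	/* Options are padded to 32-bit words and bounded by IHL. */
	assert(optlen % 4 == 0 && optlen <= MODEL_IPV4_MAX_OPTLEN);
	return MODEL_IPV4_MIN_HLEN + optlen;
}

/* Device MTU when the overhead also counts the socket's IP options.
 * fixed_overhead covers everything except the outer IP header
 * (session header + overlay Ethernet + UDP, if any). */
static unsigned int model_dev_mtu_with_opts(unsigned int underlay_mtu,
					    unsigned int fixed_overhead,
					    unsigned int optlen)
{
	unsigned int overhead = fixed_overhead + model_ipv4_hdr_len(optlen);

	return underlay_mtu > overhead ? underlay_mtu - overhead : 0;
}
```

With an assumed 34 bytes of fixed overhead (12-byte session header, 14-byte Ethernet, 8-byte UDP) on a 1500-byte underlay, the device MTU drops from 1446 with no options to 1406 with the maximum 40 bytes of options.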

> If you take the case of a plain IP (ethernet) interface, even if an
> application opened a socket turning on IP options, it would not change
> the MTU of the underlying interface, and it would not affect other 
> applications transacting packets on the same interface. I know its not an 
> exact parallel to this case, but since the IP option control is per 
> application, we probably should not factor it into the L2TP logical interface?
> We cannot affect other applications/processes running on the same L2TP 
> tunnel. Also, since the application  using IP options knows that it has turned 
> on IP options, maybe we can count on it to factor the size of the options 
> into the size of the payload it sends into the socket, or set the mtu on the 
> L2TP interface through config? 

No. See above.

>
> Other than this, I don't see keepalives or anything else in which the 
> kernel will source its own packet into the L2TP interface, outside of 
> an application injected packet - if there is something like that, please
> let me know. The user space L2TP daemon would probably fall in the 
> category of applications.
>
> thanks,
>
> Ramkumar 


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
  2016-10-01 16:50     ` James Chapman
@ 2016-10-04  3:12       ` R. Parameswaran
  2016-10-04  7:53         ` James Chapman
  0 siblings, 1 reply; 15+ messages in thread
From: R. Parameswaran @ 2016-10-04  3:12 UTC (permalink / raw)
  To: James Chapman
  Cc: R. Parameswaran, kleptog, netdev, davem, linux-kernel, nprachan,
	rshearma, dfawcus, stephen, acme, lboccass, bhong



Hi James, 

Please see inline, thanks for the reply:

On Sat, 1 Oct 2016, James Chapman wrote:

> On 30/09/16 03:39, R. Parameswaran wrote:
> >
> >>> +	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> >>> +	if (tunnel->sock->sk_family == AF_INET)
> >>> +		overhead += (ETH_HLEN + sizeof(struct iphdr));
> >>> +	else if (tunnel->sock->sk_family == AF_INET6)
> >>> +		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
> >> What about options in the IP header? If certain options are set on the
> >> socket, the IP header may be larger.
> >>
> > Thanks for the reply. It looks like IP options can only be
> > enabled through setsockopt on an application's socket (if there's any
> > other way to turn on IP options, please let me know; I didn't see any
> > sysctl setting for transmit). This scenario would come
> > into the picture when an application opens a raw IP or UDP socket such that it
> > routes into the L2TP logical interface.
> 
> No. An L2TP daemon (userspace) will open a socket for each tunnel that
> it creates. Control and data packets use the same socket, which is the
> socket used by this code. It may set any options on its sockets. L2TP
> tunnel sockets can be created either by an L2TP daemon (managed tunnels)
> or by ip l2tp commands (unmanaged tunnels).
> 

One question I have is whether it would be sufficient to solve this for the
common case (i.e. no IP options) and have an expectation that the
administrator will explicitly provision the MTU using the 'ip link ...
mtu' command when dealing with infrequent occurrences like IP options?

But looking at the code, it appears to be possible to pick up whether
options are enabled, and how long the options are, from the ip_options
struct embedded in the tunnel socket. If you want me to, I can repost the
patch with this change (will need a few days) - please let me know if this
is what you had in mind.
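The cost of handling only the common case can be sketched as follows: if the device MTU is derived without the options and the tunnel socket later carries some, every full-sized frame overshoots the underlay MTU by the option length. This sketch assumes L2TP over UDP over IPv4, uses hypothetical helper names, and takes 8 bytes as an arbitrary stand-in for session->hdr_len (the real value depends on the L2TP version and session options such as cookies):

```c
#include <stdbool.h>

#define ETH_HLEN_BYTES 14u /* overlay Ethernet header (ETH_HLEN) */
#define IPV4_HDR_BYTES 20u /* sizeof(struct iphdr), no options */
#define UDP_HDR_BYTES   8u /* sizeof(struct udphdr) */

/* Hypothetical model: size of the outer IPv4 packet carrying one inner IP
 * packet of inner_len bytes over L2TP/UDP. l2tp_hdr_len stands in for
 * session->hdr_len; optlen is the (already 4-byte padded) IP option length
 * set on the tunnel socket. */
static unsigned int outer_len(unsigned int inner_len, unsigned int l2tp_hdr_len,
			      unsigned int optlen)
{
	return inner_len + ETH_HLEN_BYTES + l2tp_hdr_len + UDP_HDR_BYTES +
	       IPV4_HDR_BYTES + optlen;
}

/* Device MTU computed for the common case, i.e. ignoring IP options. */
static unsigned int mtu_ignoring_options(unsigned int underlay_mtu,
					 unsigned int l2tp_hdr_len)
{
	return underlay_mtu - (ETH_HLEN_BYTES + l2tp_hdr_len + UDP_HDR_BYTES +
			       IPV4_HDR_BYTES);
}

/* True if the encapsulated packet still fits within the underlay MTU. */
static bool fits(unsigned int inner_len, unsigned int l2tp_hdr_len,
		 unsigned int optlen, unsigned int underlay_mtu)
{
	return outer_len(inner_len, l2tp_hdr_len, optlen) <= underlay_mtu;
}
```

With a 1500-byte underlay, mtu_ignoring_options(1500, 8) gives 1450; a 1450-byte inner packet fits exactly with no options, but with 8 bytes of options the outer packet grows to 1508 bytes and would have to be fragmented.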

thanks,

Ramkumar



> > If you take the case of a plain IP (Ethernet) interface, even if an
> > application opened a socket turning on IP options, it would not change
> > the MTU of the underlying interface, and it would not affect other
> > applications transacting packets on the same interface. I know it's not an
> > exact parallel to this case, but since the IP option control is per
> > application, we probably should not factor it into the L2TP logical interface?
> > We cannot affect other applications/processes running on the same L2TP
> > tunnel. Also, since the application using IP options knows that it has turned
> > on IP options, maybe we can count on it to factor the size of the options
> > into the size of the payload it sends into the socket, or set the MTU on the
> > L2TP interface through config?
> 
> No. See above.
> 
> >
> > Other than this, I don't see keepalives or anything else in which the
> > kernel will source its own packets into the L2TP interface, outside of
> > application-injected packets - if there is something like that, please
> > let me know. The user space L2TP daemon would probably fall in the
> > category of applications.
> >
> > thanks,
> >
> > Ramkumar 


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
  2016-10-04  3:12       ` R. Parameswaran
@ 2016-10-04  7:53         ` James Chapman
       [not found]           ` <CAGeBGG7AS1JZYHC6T5_H6vY4wfptUtPzO=+kdCcUzJGXA0m6_A@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: James Chapman @ 2016-10-04  7:53 UTC (permalink / raw)
  To: R. Parameswaran
  Cc: kleptog, netdev, davem, linux-kernel, nprachan, rshearma,
	dfawcus, stephen, acme, lboccass, bhong

On 04/10/16 04:12, R. Parameswaran wrote:
>
> Hi James, 
>
> Please see inline, thanks for the reply:
>
> On Sat, 1 Oct 2016, James Chapman wrote:
>
>> On 30/09/16 03:39, R. Parameswaran wrote:
>>>>> +	/* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
>>>>> +	if (tunnel->sock->sk_family == AF_INET)
>>>>> +		overhead += (ETH_HLEN + sizeof(struct iphdr));
>>>>> +	else if (tunnel->sock->sk_family == AF_INET6)
>>>>> +		overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
>>>> What about options in the IP header? If certain options are set on the
>>>> socket, the IP header may be larger.
>>>>
>>> Thanks for the reply. It looks like IP options can only be
>>> enabled through setsockopt on an application's socket (if there's any
>>> other way to turn on IP options, please let me know; I didn't see any
>>> sysctl setting for transmit). This scenario would come
>>> into the picture when an application opens a raw IP or UDP socket such that it
>>> routes into the L2TP logical interface.
>> No. An L2TP daemon (userspace) will open a socket for each tunnel that
>> it creates. Control and data packets use the same socket, which is the
>> socket used by this code. It may set any options on its sockets. L2TP
>> tunnel sockets can be created either by an L2TP daemon (managed tunnels)
>> or by ip l2tp commands (unmanaged tunnels).
>>
> One question I have is whether it would be sufficient to solve this for the
> common case (i.e. no IP options) and have an expectation that the
> administrator will explicitly provision the MTU using the 'ip link ...
> mtu' command when dealing with infrequent occurrences like IP options?
>
> But looking at the code, it appears to be possible to pick up whether
> options are enabled, and how long the options are, from the ip_options
> struct embedded in the tunnel socket. If you want me to, I can repost the
> patch with this change (will need a few days) - please let me know if this
> is what you had in mind.
>
>
Yes, that's what I had in mind. But my preference would be that this
would be a new function in the ip core, for use by any encap protocol,
where appropriate.
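A user-space model of what such a protocol-agnostic helper could compute, assuming it only needs the address family and the option lengths already cached on the socket. struct sock_model and sock_ip_overhead() are hypothetical; the comments name the kernel fields they stand in for:

```c
#include <stddef.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical stand-in for the per-socket state such a helper would read:
 * ip_optlen models inet->inet_opt->opt.optlen (IPv4), and
 * ext_flen/ext_nflen model ipv6_txoptions' opt_flen/opt_nflen (IPv6). */
struct sock_model {
	int family;
	unsigned int ip_optlen;
	unsigned int ext_flen;
	unsigned int ext_nflen;
};

/* Protocol-agnostic: returns only the IP-layer overhead of the socket,
 * or 0 for a NULL socket or a non-IP family, leaving UDP/L2TP/L2 headers
 * to the encapsulation-specific caller. */
static unsigned int sock_ip_overhead(const struct sock_model *sk)
{
	if (!sk)
		return 0;
	switch (sk->family) {
	case AF_INET:
		return 20 + sk->ip_optlen;                /* iphdr + options */
	case AF_INET6:
		return 40 + sk->ext_flen + sk->ext_nflen; /* ipv6hdr + ext hdrs */
	default:
		return 0;
	}
}
```

An IPv4 socket with 8 bytes of options yields 28, an IPv6 socket with 24 bytes of extension headers yields 64, and anything else yields 0, which a caller can treat as "unknown".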


* Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
       [not found]           ` <CAGeBGG7AS1JZYHC6T5_H6vY4wfptUtPzO=+kdCcUzJGXA0m6_A@mail.gmail.com>
@ 2016-10-11  7:47             ` James Chapman
  2016-10-17  4:05               ` [RFC PATCH v3 1/2] " R. Parameswaran
  2016-10-17  5:20               ` [RFC PATCH v3 2/2] " R. Parameswaran
  0 siblings, 2 replies; 15+ messages in thread
From: James Chapman @ 2016-10-11  7:47 UTC (permalink / raw)
  To: R Parameswaran
  Cc: kleptog, netdev, davem, linux-kernel, nprachan, Robert Shearman,
	dfawcus, stephen, acme, lboccass, bhong

On 11/10/16 02:54, R Parameswaran wrote:
>
>
> Hi James,
>
> Please see inline:
>
> On Tue, Oct 4, 2016 at 12:53 AM, James Chapman <jchapman@katalix.com> wrote:
>
>     On 04/10/16 04:12, R. Parameswaran wrote:
>     >
>     > Hi James,
>     >
>     > Please see inline, thanks for the reply:
>     >
>     > On Sat, 1 Oct 2016, James Chapman wrote:
>     >
>     >> On 30/09/16 03:39, R. Parameswaran wrote:
>     >>>>> + /* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
>     >>>>> + if (tunnel->sock->sk_family == AF_INET)
>     >>>>> +         overhead += (ETH_HLEN + sizeof(struct iphdr));
>     >>>>> + else if (tunnel->sock->sk_family == AF_INET6)
>     >>>>> +         overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
>     >>>> What about options in the IP header? If certain options are set on the
>     >>>> socket, the IP header may be larger.
>     >>>>
>     >>> Thanks for the reply. It looks like IP options can only be
>     >>> enabled through setsockopt on an application's socket (if there's any
>     >>> other way to turn on IP options, please let me know; I didn't see any
>     >>> sysctl setting for transmit). This scenario would come
>     >>> into the picture when an application opens a raw IP or UDP socket such that it
>     >>> routes into the L2TP logical interface.
>     >> No. An L2TP daemon (userspace) will open a socket for each tunnel that
>     >> it creates. Control and data packets use the same socket, which is the
>     >> socket used by this code. It may set any options on its sockets. L2TP
>     >> tunnel sockets can be created either by an L2TP daemon (managed tunnels)
>     >> or by ip l2tp commands (unmanaged tunnels).
>     >>
>     > One question I have is whether it would be sufficient to solve this for the
>     > common case (i.e. no IP options) and have an expectation that the
>     > administrator will explicitly provision the MTU using the 'ip link ...
>     > mtu' command when dealing with infrequent occurrences like IP options?
>     >
>     > But looking at the code, it appears to be possible to pick up whether
>     > options are enabled, and how long the options are, from the ip_options
>     > struct embedded in the tunnel socket. If you want me to, I can repost the
>     > patch with this change (will need a few days) - please let me know if this
>     > is what you had in mind.
>     >
>     >
>     Yes, that's what I had in mind. But my preference would be that this
>     would be a new function in the ip core, for use by any encap protocol,
>     where appropriate.
>
> I discussed this with Nachi (nprachan); we were thinking of a new
> function in ip_sockglue.c which would take the tunnel socket as a
> parameter, derive the underlay device MTU and compute the underlay L3
> overhead (IPv4/IPv6 header, UDP header if it is a UDP socket, and IP
> option length if the ip_options struct exists in the socket). The
> function would be agnostic to the tunnel type (although we could
> pass the tunnel type and encap type as parameters). Callers would
> call it to figure out the cumulative underlay L3 overhead and the
> underlay MTU, and then use these numbers in the MTU calculation for
> their specific tunnel type. Let me know if that is different from what
> you had in mind, and/or if you have any suggestions on which file to
> place this in. I'll try to have this re-posted by the end of this
> week or by early next week.
>

I think keep it simple. A function to return the size of the IP header
associated with any IP socket, not necessarily a tunnel socket. Don't
mix in any MTU derivation logic or UDP header size etc.
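A sketch of that split, with hypothetical names: the generic helper reports only the IP header plus options, and the L2TP Ethernet code adds the headers it owns itself (the UDP header for UDP encapsulation only, the L2TP session header and the inner Ethernet header):

```c
#include <stdbool.h>

#define ETH_HLEN_BYTES 14u /* overlay Ethernet header (ETH_HLEN) */
#define UDP_HDR_BYTES   8u /* sizeof(struct udphdr) */

/* Hypothetical caller-side composition for an L2TP Ethernet session.
 * ip_overhead is whatever the generic IP helper returned (IP header plus
 * options only); the tunnel code adds the headers it knows about:
 * the UDP header (UDP encapsulation only), the L2TP session header
 * (session->hdr_len in l2tp_eth.c) and the inner Ethernet header. */
static unsigned int l2tp_eth_overhead(unsigned int ip_overhead, bool udp_encap,
				      unsigned int l2tp_hdr_len)
{
	unsigned int overhead = ip_overhead + l2tp_hdr_len + ETH_HLEN_BYTES;

	if (udp_encap)
		overhead += UDP_HDR_BYTES;
	return overhead;
}
```

With a 20-byte IPv4 header and an assumed 8-byte session header, this gives 50 bytes for UDP encapsulation and 42 bytes for IP encapsulation; the IP helper never needs to know which encapsulation is in use.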

Post code early as an RFC. You're more likely to get review feedback
from others.


* [RFC PATCH v3 1/2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
  2016-10-11  7:47             ` James Chapman
@ 2016-10-17  4:05               ` R. Parameswaran
  2016-10-17  5:20               ` [RFC PATCH v3 2/2] " R. Parameswaran
  1 sibling, 0 replies; 15+ messages in thread
From: R. Parameswaran @ 2016-10-17  4:05 UTC (permalink / raw)
  To: James Chapman
  Cc: R Parameswaran, kleptog, netdev, davem, linux-kernel, nprachan,
	Robert Shearman, dfawcus, stephen, acme, lboccass, bhong


[v3: Picked up review comments from James Chapman: added a function to
 compute the IP header + IP option overhead on a socket, and factored it
 into the L2TP change-set. RFC: would like early feedback on the name,
 placement and logic of the new function while I test this.]

>From 30c4b3900d09deb912fc6ce4af3c19e870f84e14 Mon Sep 17 00:00:00 2001
From: "R. Parameswaran" <rparames@brocade.com>
Date: Sun, 16 Oct 2016 20:19:38 -0700

In existing kernel code, not all of the tunnel encapsulation headers are
taken into account when setting up the MTU on the L2TP logical interface
device. As a result, the packets created by the applications on top of
the L2TP layer are larger than they ought to be, relative to the underlay
MTU, which leads to needless fragmentation once the L2TP packet is
encapsulated in an outer IP packet.

Specifically, the MTU calculation does not take into account the (outer)
IP header imposed on the encapsulated L2TP packet, or the Layer 2 header
imposed on the inner IP packet prior to L2TP encapsulation. The patch
posted here takes care of both.

Existing code also seems to assume an Ethernet (non-jumbo) underlay. The
patch uses the PMTU mechanism and the dst entry in the L2TP tunnel socket
to directly pull up the underlay MTU (as the baseline number on top of
which the encapsulation headers are factored in).  Ethernet MTU is
assumed as a fallback only if this fails.

Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket, and factored it
into the L2TP change-set.

Signed-off-by: nprachan@brocade.com,
Signed-off-by: bhong@brocade.com,
Signed-off-by: rshearma@brocade.com,
Signed-off-by: dfawcus@brocade.com
---
 include/linux/net.h |  3 +++
 net/socket.c        | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index cd0c8bd..2c8b092 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -298,6 +298,9 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset,
 int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg);
 int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how);
 
+/* Following routine returns the IP overhead imposed by a socket.  */
+u32 kernel_sock_ip_overhead(struct sock *sk);
+
 #define MODULE_ALIAS_NETPROTO(proto) \
 	MODULE_ALIAS("net-pf-" __stringify(proto))
 
diff --git a/net/socket.c b/net/socket.c
index 5a9bf5e..d5e79c2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3293,3 +3293,40 @@ int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
 	return sock->ops->shutdown(sock, how);
 }
 EXPORT_SYMBOL(kernel_sock_shutdown);
+
+/*
+ *	This routine returns the IP overhead imposed by a socket, i.e.
+ *	the length of the underlying IP header, depending on whether
+ *	this is an IPv4 or IPv6 socket, plus the length of any IP options
+ *	turned on at the socket.
+ */
+u32 kernel_sock_ip_overhead(struct sock *sk)
+{
+	u32 overhead = 0;
+	if (!sk)
+		goto done;
+	if (sk->sk_family == AF_INET) {
+		struct ip_options_rcu *opt = NULL;
+		struct inet_sock *inet = inet_sk(sk);
+		overhead += sizeof(struct iphdr);
+		if (inet)
+			opt = rcu_dereference_protected(inet->inet_opt,
+							sock_owned_by_user(sk));
+		if (opt)
+			overhead += opt->opt.optlen;
+	}
+	else if (sk->sk_family == AF_INET6) {
+		struct ipv6_pinfo *np = inet6_sk(sk);
+		struct ipv6_txoptions *opt = NULL;
+		overhead += sizeof(struct ipv6hdr);
+		if (np)
+			opt = rcu_dereference_protected(np->opt,
+							sock_owned_by_user(sk));
+		if (opt)
+			overhead += (opt->opt_flen + opt->opt_nflen);
+	}
+
+done:
+	return overhead;
+}
+EXPORT_SYMBOL_GPL(kernel_sock_ip_overhead);
-- 
2.1.4


* [RFC PATCH v3 2/2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
  2016-10-11  7:47             ` James Chapman
  2016-10-17  4:05               ` [RFC PATCH v3 1/2] " R. Parameswaran
@ 2016-10-17  5:20               ` R. Parameswaran
  1 sibling, 0 replies; 15+ messages in thread
From: R. Parameswaran @ 2016-10-17  5:20 UTC (permalink / raw)
  To: James Chapman
  Cc: R Parameswaran, kleptog, netdev, davem, linux-kernel, nprachan,
	Robert Shearman, dfawcus, stephen, acme, lboccass, bhong



[v3: Picked up review comments from James Chapman: added a function to
 compute the IP header + IP option overhead on a socket, and factored it
 into the L2TP change-set. RFC: would like early feedback on the name and
 placement of the new function while I test this.

 Part 2/2: Changes in l2tp_eth.c, using the new API from part 1]

>From f4066da53e781ef167055c1e89ca1a7819215a40 Mon Sep 17 00:00:00 2001
From: "R. Parameswaran" <rparames@brocade.com>
Date: Sun, 16 Oct 2016 20:27:20 -0700

In existing kernel code, not all of the tunnel encapsulation headers are
taken into account when setting up the MTU on the L2TP logical interface
device. As a result, the packets created by the applications on top of
the L2TP layer are larger than they ought to be, relative to the underlay
MTU, which leads to needless fragmentation once the L2TP packet is
encapsulated in an outer IP packet.

Specifically, the MTU calculation does not take into account the (outer)
IP header imposed on the encapsulated L2TP packet, or the Layer 2 header
imposed on the inner IP packet prior to L2TP encapsulation. The patch
posted here takes care of both.

Existing code also seems to assume an Ethernet (non-jumbo) underlay. The
patch uses the PMTU mechanism and the dst entry in the L2TP tunnel socket
to directly pull up the underlay MTU (as the baseline number on top of
which the encapsulation headers are factored in).  Ethernet MTU is
assumed as a fallback only if this fails.

Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket, and factored it
into the L2TP change-set.

Signed-off-by: nprachan@brocade.com,
Signed-off-by: bhong@brocade.com,
Signed-off-by: rshearma@brocade.com,
Signed-off-by: dfawcus@brocade.com
---
 net/l2tp/l2tp_eth.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 965f7e3..75eb5d3 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -30,6 +30,9 @@
 #include <net/xfrm.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/udp.h>
 
 #include "l2tp_core.h"
 
@@ -206,6 +209,49 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
 }
 #endif
 
+static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
+				struct l2tp_session *session,
+				struct net_device *dev)
+{
+	unsigned int overhead = 0;
+	struct dst_entry *dst;
+	u32 l3_overhead = 0;
+
+	if (session->mtu != 0) {
+		dev->mtu = session->mtu;
+		dev->needed_headroom += session->hdr_len;
+		if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+			dev->needed_headroom += sizeof(struct udphdr);
+		return;
+	}
+	overhead = session->hdr_len;
+	l3_overhead = kernel_sock_ip_overhead(tunnel->sock);
+	if (!tunnel->sock || (l3_overhead == 0)) {
+		/* L3 Overhead couldn't be identified, dev mtu stays at 1500 */
+		return;
+	}
+	/* Adjust MTU, factor overhead - underlay L3, overlay L2 hdr*/
+	overhead += ETH_HLEN + l3_overhead;
+	/* Additionally, if the encap is UDP, account for UDP header size */
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		overhead += sizeof(struct udphdr);
+	/* If PMTU discovery was enabled, use discovered MTU on L2TP device */
+	dst = sk_dst_get(tunnel->sock);
+	if (dst) {
+		/* dst_mtu will use PMTU if found, else fallback to intf MTU */
+		u32 pmtu = dst_mtu(dst);
+
+		if (pmtu != 0)
+			dev->mtu = pmtu;
+		dst_release(dst);
+	}
+	session->mtu = dev->mtu - overhead;
+	dev->mtu = session->mtu;
+	dev->needed_headroom += session->hdr_len;
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		dev->needed_headroom += sizeof(struct udphdr);
+}
+
 static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
 {
 	struct net_device *dev;
@@ -255,11 +301,8 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
 	}
 
 	dev_net_set(dev, net);
-	if (session->mtu == 0)
-		session->mtu = dev->mtu - session->hdr_len;
-	dev->mtu = session->mtu;
-	dev->needed_headroom += session->hdr_len;
 
+	l2tp_eth_adjust_mtu(tunnel, session, dev);
 	priv = netdev_priv(dev);
 	priv->dev = dev;
 	priv->session = session;
-- 
2.1.4
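The MTU selection in l2tp_eth_adjust_mtu() above can be modelled in user space roughly as follows. This is a sketch only: select_dev_mtu() and its parameters are hypothetical stand-ins for session->mtu, dst_mtu(), the kernel_sock_ip_overhead() result and the remaining per-session headers, and the 1500-byte default assumes the device still has the Ethernet MTU it was created with:

```c
#define DEFAULT_DEV_MTU 1500u /* assumed initial device MTU (ETH_DATA_LEN) */

/* Hypothetical model of the MTU selection in l2tp_eth_adjust_mtu().
 * configured_mtu: session->mtu (0 = not configured)
 * pmtu:           dst_mtu() of the tunnel socket's route, 0 if no dst
 * ip_overhead:    kernel_sock_ip_overhead() result, 0 if unknown
 * other_overhead: session->hdr_len + ETH_HLEN (+ UDP header for UDP encap) */
static unsigned int select_dev_mtu(unsigned int configured_mtu,
				   unsigned int pmtu,
				   unsigned int ip_overhead,
				   unsigned int other_overhead)
{
	unsigned int base = DEFAULT_DEV_MTU;

	if (configured_mtu != 0)
		return configured_mtu;   /* explicit configuration wins */
	if (ip_overhead == 0)
		return DEFAULT_DEV_MTU;  /* unknown L3 overhead: leave as is */
	if (pmtu != 0)
		base = pmtu;             /* prefer the route/PMTU value */
	return base - (ip_overhead + other_overhead);
}
```

For example, with a 1400-byte path MTU, 28 bytes of IPv4 header plus options and 30 bytes of other overhead (UDP 8 + Ethernet 14 + an assumed 8-byte session header), the device MTU comes out at 1342. One point worth checking: patch 1/2 reads the option pointers with rcu_dereference_protected(..., sock_owned_by_user(sk)), which appears to assume the caller holds the socket lock, whereas the hunk above calls kernel_sock_ip_overhead() without lock_sock()/release_sock() around it.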


end of thread

Thread overview: 15+ messages
2016-09-22 20:52 [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2 R. Parameswaran
2016-09-27  7:31 ` David Miller
2016-09-27 19:17   ` R. Parameswaran
2016-09-28  7:48     ` David Miller
2016-09-29  2:36       ` R. Parameswaran
2016-09-29 12:21         ` Jiri Benc
2016-09-29 15:39         ` James Chapman
2016-09-29 15:18 ` James Chapman
2016-09-30  2:39   ` R. Parameswaran
2016-10-01 16:50     ` James Chapman
2016-10-04  3:12       ` R. Parameswaran
2016-10-04  7:53         ` James Chapman
     [not found]           ` <CAGeBGG7AS1JZYHC6T5_H6vY4wfptUtPzO=+kdCcUzJGXA0m6_A@mail.gmail.com>
2016-10-11  7:47             ` James Chapman
2016-10-17  4:05               ` [RFC PATCH v3 1/2] " R. Parameswaran
2016-10-17  5:20               ` [RFC PATCH v3 2/2] " R. Parameswaran
