All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
To: Shmulik Ladkani <shmulik.ladkani@gmail.com>,
	David Miller <davem@davemloft.net>
Cc: wenxu@ucloud.cn, kuznet@ms2.inr.ac.ru, jmorris@namei.org,
	kaber@trash.net, yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
	wenx05124561@163.com
Subject: Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs
Date: Fri, 12 Aug 2016 13:11:50 +0200	[thread overview]
Message-ID: <dfa12e1b-54dc-4e20-8819-cb2400a693d1@stressinduktion.org> (raw)
In-Reply-To: <20160811224132.7a17f0c3@halley>

On 11.08.2016 21:41, Shmulik Ladkani wrote:
> On Wed, 10 Aug 2016 17:35:11 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>>> commit b8247f095edd ("net: ip_finish_output_gso: If skb_gso_network_seglen
>>> exceeds MTU, allow segmentation for local udp tunneled skbs")
>>>
>>> Given:
>>>      - tap0 and ovs-gre
>>>      - ovs-gre stacked on eth0, eth0 having the small mtu
>>>
>>> After encapsulation these skbs have skb_gso_network_seglen that exceed
>>> eth0's ip_skb_dst_mtu. So the finnal each segment would be larger than
>>> eth0 mtu. These packets maybe dropped.
>>>
>>> It has the same problem if tap0 bridge with ipgre or gretap device. So
>>> the IPSKB_FRAG_SEGS flags should also be set in gre tunneled skbs.
>>>
>>> Signed-off-by: wenxu <wenxu@ucloud.cn>  
>>
>> I am rather certain that this test is intentionally restricted to
>> UDP tunnel endpoints, because GRE and other tunnel types are PMTU safe.
>>
>> Hannes and Shmulik?
> 
> It was restricted to UDP tun encaps per Hannes' suggestion, in order to
> scope the change, as Hannes hinted gre and ipip are pmtu-safe, see [1].
> 
> Glancing at ip_tunnel_xmit() --> tnl_update_pmtu(), they do seem to
> handle PMTU, but - if I understand correctly - only in the !skb_is_gso
> case.
> (since 68c3316311 "v4 GRE: Add TCP segmentation offload for GRE")
> 
> This is probably due the same hidden assumption that 'gso_size' will be
> suitable for the egress mtu even after encapsulation, which does not
> hold true in some usecases as described in [2].
> 
> Therefore it might be that b8247f095edd needs to be augmented for
> non-udp tunnels as well.
> 
> Hannes?

I really would not like to see this expanded to gre and other protocols.
All switches drop packets where the packets are exceeding the MTU,
bridges and also openvswitch should behave the same.

Unfortunately we already had this loophole in the kernel that vxlan udp
output path could fragment the packet again, even in case of switches.
But this stopped working for GSO packets, which violates another rule in
the kernel, GSO should always be transparent and user space should never
have to care if a packet is GSO or not.

Because we couldn't a) roll back the change that we fragment packets in
UDP output paths and b) should not violate GSO transparency rule, I
strongly believed it would be better too only change the kernel in a way
that it transparently works with GSO, too. If we argue that a VTEP is
its own UDP endpoint which is set up after the bridge, I still can sleep
well. :)

My understanding was that GRE failed consistently, GSO as well as
non-GSO packets are dropped, which would be the correct behavior for me.
I don't want to change this. A good argument against this would be if we
violate the GSO transparency rule again. But when I looked into the code
I couldn't see that.

Bye,
Hannes

  parent reply	other threads:[~2016-08-12 11:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-09  7:04 [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for gre tunneled skbs wenxu
2016-08-11  0:35 ` David Miller
2016-08-11 19:41   ` Shmulik Ladkani
2016-08-12  4:29     ` wenxu
     [not found]     ` <80d116d7-e61c-1bbd-64bf-e3b1f809419b@ucloud.cn>
2016-08-12  5:18       ` Shmulik Ladkani
2016-08-12 11:11     ` Hannes Frederic Sowa [this message]
2016-08-15 11:16       ` Shmulik Ladkani
2016-08-16  7:12         ` wenxu
2016-08-19  7:26         ` Shmulik Ladkani
2016-08-19  9:20           ` Hannes Frederic Sowa
2016-08-19 13:40             ` Shmulik Ladkani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dfa12e1b-54dc-4e20-8819-cb2400a693d1@stressinduktion.org \
    --to=hannes@stressinduktion.org \
    --cc=davem@davemloft.net \
    --cc=jmorris@namei.org \
    --cc=kaber@trash.net \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=netdev@vger.kernel.org \
    --cc=shmulik.ladkani@gmail.com \
    --cc=wenx05124561@163.com \
    --cc=wenxu@ucloud.cn \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.