From: Timo Teras <timo.teras@iki.fi>
To: netdev@vger.kernel.org
Subject: Re: linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken
Date: Fri, 15 Mar 2013 11:25:16 +0200
Message-ID: <20130315112516.4b1651ca@vostro>
In-Reply-To: <20130313171453.0297f179@vostro>

On Wed, 13 Mar 2013 17:14:53 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> In the typical DMVPN setup with IPv4-ESP-GRE-IPv4 stack, it seems that
> IPv4 fragmentation broke around 3.6 for forwarded packets.
> 
> It would seem that fragmentation works for locally generated packets.
> Also PMTU (DF set) seems to work for both forwarded and locally
> generated packets. But packets forwarded to a gre device that then get
> IPsec encrypted are not fragmented properly.
> 
> 3.4.x kernels work, 3.6 and 3.8 series tested and fail similarly.

Actually, vanilla 3.4.x does not work. It works only with 38d523e "ipv4:
Remove output route check in ipv4_mtu" applied, which I've been
cherry-picking into my builds.

> I was going through the changelog and it seems that MTU is now handled
> in nexthop exceptions and one needs to produce the full flow info to
> update it. I'm wondering if this does not hold true in my code path as
> ip_gre rewraps the forwarded packet and creates new IP header - when
> it next goes to the xfrm code (which sends the ICMP error) the inner
> iphdr is no longer accessible. Would this cause the breakage that I'm
> seeing? Or is the forward flow's mtu still updated somehow?

I now have a theory on what goes wrong.

My gre tunnel is configured with 'ttl 64', so the tunnel IP header
always gets the DF bit set to do proper path-mtu discovery. The locally
generated ICMP messages I get imply that re-fragmentation happens only
at the level of the tunnel's IPv4 header - but by then it is too late:
the large packet is queued and IPsec'ed, and it is the IPsec'ed packet
that the stack then tries to fragment (it has DF set, so that fails and
the packet is dropped).
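
The failure path described above can be modelled roughly as follows.
This is only an illustrative sketch, not kernel code: the struct and
function names are hypothetical, the GRE header is assumed to be the
4-byte base header without key/checksum/sequence, ESP is assumed to run
in transport mode, and the ESP overhead of 36 bytes is a made-up
placeholder (the real value depends on cipher, ICV and padding).

```c
#include <stdbool.h>
#include <stddef.h>

#define OUTER_IPV4_HDR       20  /* outer IPv4 header, no options */
#define GRE_BASE_HDR          4  /* assumption: no key/csum/seq fields */
#define ESP_OVERHEAD_ASSUMED 36  /* assumption: placeholder, cipher dependent */

enum verdict { VERDICT_SENT, VERDICT_FRAGMENTED, VERDICT_DROPPED };

/* Hypothetical minimal packet model: total length and DF bit only. */
struct pkt {
	size_t len;
	bool df;
};

/* GRE wraps the inner packet in a new IPv4 header; with a fixed tunnel
 * ttl that new header carries DF regardless of the inner packet's DF. */
static struct pkt gre_encap(struct pkt inner, bool tunnel_df)
{
	struct pkt outer = { inner.len + OUTER_IPV4_HDR + GRE_BASE_HDR, tunnel_df };
	return outer;
}

/* Transport-mode ESP grows the packet but keeps the same IP header,
 * so the DF bit set by the tunnel is preserved. */
static struct pkt esp_transport(struct pkt p)
{
	p.len += ESP_OVERHEAD_ASSUMED;
	return p;
}

/* Size check against the link MTU happens only on the final packet. */
static enum verdict output_verdict(struct pkt p, size_t link_mtu)
{
	if (p.len <= link_mtu)
		return VERDICT_SENT;
	return p.df ? VERDICT_DROPPED : VERDICT_FRAGMENTED;
}

/* Forwarded inner packet without DF, tunnel configured with fixed ttl. */
static enum verdict forward_through_tunnel(size_t inner_len, size_t link_mtu)
{
	struct pkt inner = { inner_len, false };
	return output_verdict(esp_transport(gre_encap(inner, true)), link_mtu);
}
```

Under these assumptions a forwarded 1500-byte packet on a 1500-byte
link grows to 1560 bytes with DF set and is dropped, even though the
inner packet itself permitted fragmentation.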

I believe ip_gre should explicitly fragment the inner IPv4 and IPv6
packets if the tunnel's ttl is not inherited (resulting in DF bit set
in the tunnel's IPv4 header).
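
As a rough illustration of what fragmenting the inner packet before
encapsulation would mean, the sketch below plans the fragment sizes for
an inner IPv4 packet so that each fragment still fits the link MTU
after GRE and ESP are added. It is not the ip_gre implementation: the
function names are hypothetical, the overhead constants are assumptions
(24 bytes for outer IPv4 plus base GRE header, 36 bytes as a placeholder
for ESP), and only the IPv4 case is covered.

```c
#include <stddef.h>

#define GRE_ENCAP_OVERHEAD   24  /* assumption: 20 (outer IPv4) + 4 (base GRE) */
#define ESP_OVERHEAD_ASSUMED 36  /* assumption: placeholder, cipher dependent */

/* Largest inner packet that still fits the link after encapsulation. */
static size_t inner_mtu(size_t link_mtu)
{
	size_t overhead = GRE_ENCAP_OVERHEAD + ESP_OVERHEAD_ASSUMED;
	return link_mtu > overhead ? link_mtu - overhead : 0;
}

/*
 * Plan IPv4 fragments for an inner packet of pkt_len bytes (including
 * its hdr_len-byte header). Writes each fragment's total length to out
 * and returns the fragment count, or 0 if the plan is impossible.
 */
static size_t plan_inner_fragments(size_t pkt_len, size_t hdr_len, size_t mtu,
				   size_t *out, size_t max_out)
{
	size_t chunk, left, n = 0;

	if (pkt_len < hdr_len || mtu < hdr_len + 8 || max_out == 0)
		return 0;
	if (pkt_len <= mtu) {
		out[0] = pkt_len;        /* fits as is, no fragmentation */
		return 1;
	}

	/* IPv4 fragment offsets are counted in units of 8 bytes, so every
	 * fragment payload except the last must be a multiple of 8. */
	chunk = (mtu - hdr_len) & ~(size_t)7;
	left = pkt_len - hdr_len;
	while (left > 0) {
		size_t take = left > chunk ? chunk : left;

		if (n == max_out)
			return 0;        /* caller's array is too small */
		out[n++] = hdr_len + take;
		left -= take;
	}
	return n;
}
```

With a 1500-byte link these assumptions give an inner MTU of 1440, and
a forwarded 1500-byte packet with a 20-byte header would be split into
fragments of 1436 and 84 bytes before it reaches the GRE and IPsec
layers.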

So basically ip_gre has been wrong all along - things just happened to
work because GRO/GSO was not implemented in ip_gre, and because of the
way the (now deleted) routing cache exposed pmtu.

Does this make sense?

- Timo

Thread overview: 6+ messages
2013-03-13 15:14 linux-3.6+, gre+ipsec+forwarding = IP fragmentation broken Timo Teras
2013-03-15  9:25 ` Timo Teras [this message]
2013-03-15 11:38   ` Timo Teras
2013-03-15 13:03     ` Timo Teras
     [not found]       ` <20130320101318.4196d93a@vostro>
2013-03-20 17:46         ` [regression] [analyzed] fragmentation broken for tunnel devices David Miller
2013-05-01  6:46           ` Timo Teras
