* [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: Edward Allcutt @ 2014-06-30 15:16 UTC
  To: davem, kuznet, jmorris, yoshfuji, kaber, netdev
  Cc: linux-kernel, Edward Allcutt

Some older router implementations still send Fragmentation Needed
errors with the Next-Hop MTU field set to zero. This is explicitly
described as an eventuality that hosts must deal with by the
standard (RFC 1191) since older standards specified that those
bits must be zero.

Linux had a generic (for all of IPv4) implementation of the algorithm
described in the RFC for searching a list of MTU plateaus for a good
value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
removed this as part of the changes to remove the routing cache.
Subsequently any Fragmentation Needed packet with a zero Next-Hop
MTU has been discarded without being passed to the per-protocol
handlers or notifying userspace for raw sockets.

When there is a router which does not implement RFC 1191 on an
MTU limited path then this results in stalled connections since
large packets are discarded and the local protocols are not
notified so they never attempt to lower the pMTU.

One example I have seen is an OpenBSD router terminating IPSec
tunnels. It's worth pointing out that this case is distinct from
the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
since the commit in question dismissed that as a valid concern.

All of the per-protocol handlers implement the simple approach from
RFC 1191 of immediately falling back to the minimum value. Although
this is sub-optimal it is vastly preferable to connections hanging
indefinitely.

Remove the Next-Hop MTU != 0 check and allow such packets
to follow the normal path.

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Edward Allcutt <edward.allcutt@openmarket.com>

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 79c3d94..42b7bcf 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -739,8 +739,6 @@ static void icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
-				if (!info)
-					goto out;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.0.0
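
For readers following along, here is a minimal userspace sketch of the
behaviour the patch description argues for: read the Next-Hop MTU field of a
Fragmentation Needed message and, when it is zero, fall back to a small safe
value instead of ignoring the error. This is not the kernel's code. The names
`struct frag_needed`, `next_hop_mtu` and `FALLBACK_MTU` are hypothetical, and
the 576-byte fallback is an assumption based on the simple strategy in
RFC 1191; the real per-protocol handlers may clamp to a different minimum.

```c
#include <arpa/inet.h>  /* ntohs(), htons() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fallback when the router reports no MTU (hypothetical value). */
#define FALLBACK_MTU 576u

/*
 * Second 32-bit word of an ICMP "Fragmentation Needed" message as laid
 * out in RFC 1191: 16 unused bits followed by the Next-Hop MTU.
 * Pre-RFC 1191 routers leave the whole word zero.
 */
struct frag_needed {
    uint16_t unused;
    uint16_t mtu;       /* network byte order */
};

/* Return the MTU to act on; a zero field selects the fallback. */
static unsigned int next_hop_mtu(const struct frag_needed *fn)
{
    assert(fn != NULL);

    unsigned int info = ntohs(fn->mtu);

    return info ? info : FALLBACK_MTU;
}
```

The point of the sketch is only that a zero field still yields a usable value,
so the upper-layer protocol is told to shrink its packets rather than being
left to stall.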


* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: David Miller @ 2014-07-01  6:48 UTC
  To: edward.allcutt; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Edward Allcutt <edward.allcutt@openmarket.com>
Date: Mon, 30 Jun 2014 16:16:02 +0100

> This is explicitly described as an eventuality that hosts must deal
> with by the standard (RFC 1191) since older standards specified that
> those bits must be zero.
 ...
> One example I have seen is an OpenBSD router terminating IPSec
> tunnels.

Why doesn't OpenBSD implement RFC 1191?

That's a nearly 24 year old standard.

I really don't want to allow for zero values.

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: Edward Allcutt @ 2014-07-01  8:42 UTC
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Tue, 1 Jul 2014, David Miller wrote:
> From: Edward Allcutt <edward.allcutt@openmarket.com>
> Date: Mon, 30 Jun 2014 16:16:02 +0100
>
>> This is explicitly described as an eventuality that hosts must deal
>> with by the standard (RFC 1191) since older standards specified that
>> those bits must be zero.
> ...
>> One example I have seen is an OpenBSD router terminating IPSec
>> tunnels.
>
> Why doesn't OpenBSD implement RFC 1191?

Why do you think I know? :)

However the standard says that you should interoperate with older 
implementations, and I can't see any downside to doing so.

> I really don't want to allow for zero values.

Why? I have had a look through all the higher level protocols and they 
seem to handle this fine, if they are allowed to see the signal at all. 
Most of them fall back to the minimum packet size, which isn't ideal but 
it's much better than just stalling indefinitely.

If it helps any, I've been running several production machines with this 
patch for just about a year now (mostly running 3.10 stable series).

-- 
Edward Allcutt

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: David Miller @ 2014-07-01 18:05 UTC
  To: edward.allcutt; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Edward Allcutt <edward.allcutt@openmarket.com>
Date: Tue, 1 Jul 2014 09:42:14 +0100 (BST)

> If it helps any, I've been running several production machines with
> this patch for just about a year now (mostly running 3.10 stable
> series).

I guess if OpenBSD can wait more than 2 decades to implement proper
path mtu handling, you can wait a year to post this bug fix :-)

I still think the OpenBSD thing can't be intentional, and it's some
bug they probably want to fix and it should therefore be investigated.
It means performance of connections going through such machines is
going to be crap even if I install your patch.


* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: Edward Allcutt @ 2014-07-01 18:50 UTC
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Tue, 1 Jul 2014, David Miller wrote:
>> If it helps any, I've been running several production machines with
>> this patch for just about a year now (mostly running 3.10 stable
>> series).
>
> I guess if OpenBSD can wait more than 2 decades to implement proper
> path mtu handling, you can wait a year to post this bug fix :-)

Sorry about that. :-/ Some of the systems on the other end of tunnels are 
now starting to run distro kernels with this bug. I can't always convince 
them to rebuild with a patch.. I should have gotten around to this sooner.

> I still think the OpenBSD thing can't be intentional, and it's some
> bug they probably want to fix and it should therefore be investigated.
> It means performance of connections going through such machines is
> going to be crap even if I install your patch.

Making newer versions behave better won't help all the existing 
deployments. Linux should be able to cope even if it's not optimal.

The performance impact of using minimum packet size vs. optimal sizing is 
that you end up sending a bit under three times the number of packets. At 
worst. It's not ideal but it's far less bad than stalling indefinitely.
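
As a rough check of the "a bit under three times" figure, the sketch below
counts the packets needed to carry a fixed amount of payload at two MTUs. The
numbers are assumptions chosen for illustration, not taken from the thread: a
1500-byte path, a 576-byte fallback, and 40 bytes of IPv4 plus TCP headers
with no options. `packets_needed` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Number of packets needed to carry `bytes` of payload when every
 * packet is `mtu` bytes long in total and `hdr` of those bytes are
 * protocol headers.
 */
static unsigned long packets_needed(unsigned long bytes,
                                    unsigned int mtu, unsigned int hdr)
{
    assert(mtu > hdr);

    unsigned long payload = mtu - hdr;

    return (bytes + payload - 1) / payload;  /* ceiling division */
}
```

Under those assumptions, 1,000,000 bytes take 685 packets at an MTU of 1500
and 1866 packets at an MTU of 576, a ratio of roughly 2.7.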

The patch won't affect performance of the normal case where more useful 
ICMP errors are received.

I did look at trying to reintroduce the original function which guessed 
the next smallest value. However since the routing cache has gone away I 
don't think there's anywhere to find out the size of previously sent 
packets without interrogating the upper protocol.. so handing the upper 
protocol the 0 value and letting it handle it seems the best approach.

Now perhaps the plateau-finding function should be reintroduced for tcp 
and other stream-oriented protocols. Is "it could be done better" enough 
reason to reject a patch that mitigates a real regression?

Is there another reason you dislike this approach?

-- 
Edward Allcutt
Senior Systems Engineer
OpenMarket | http://www.openmarket.com/
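
The plateau search mentioned above can be sketched as follows. This is an
illustration of the RFC 1191 idea, not the removed ip_rt_frag_needed() code:
the function name `next_lower_plateau` is hypothetical, and the table values
are the plateaus as listed in RFC 1191 and should be checked against the RFC
before being relied on. The caller is assumed to know the total length of the
datagram that was dropped, which is the piece of information that is hard to
obtain without asking the upper protocol.

```c
#include <assert.h>
#include <stddef.h>

/* MTU plateaus in the style of RFC 1191, largest first. */
static const unsigned int mtu_plateaus[] = {
    65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68
};

/*
 * Return the largest plateau strictly below the total length of the
 * datagram that triggered the error. 68, the IPv4 minimum MTU, is the
 * floor when nothing smaller is available.
 */
static unsigned int next_lower_plateau(unsigned int dropped_len)
{
    /* The IPv4 Total Length field is 16 bits wide. */
    assert(dropped_len <= 65535);

    size_t n = sizeof(mtu_plateaus) / sizeof(mtu_plateaus[0]);

    for (size_t i = 0; i < n; i++) {
        if (mtu_plateaus[i] < dropped_len)
            return mtu_plateaus[i];
    }
    return 68;
}
```

With this table a dropped 1500-byte datagram leads to a next guess of 1492,
and a dropped 1492-byte datagram to 1006, so the search converges in a few
steps instead of dropping straight to the minimum.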

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: David Miller @ 2014-07-01 19:07 UTC
  To: edward.allcutt; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Edward Allcutt <edward.allcutt@openmarket.com>
Date: Tue, 1 Jul 2014 19:50:15 +0100 (BST)

> Is there another reason you dislike this approach?

I didn't say I won't apply the patch, actually the email you are
replying to is strictly about figuring out what's happening on
the OpenBSD systems for their sake.

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: Edward Allcutt @ 2014-07-01 19:38 UTC
  To: David Miller
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel, Edward Allcutt

On Tue, 1 Jul 2014, David Miller wrote:
> I didn't say I won't apply the patch, actually the email you are
> replying to is strictly about figuring out what's happening on
> the OpenBSD systems for their sake.

Ah, sorry for the misunderstanding.

I'll see if I can find the place in the BSD code. Although I think in 
principle this could come up in other "technically correct" router 
implementations.

-- 
Edward Allcutt

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: Edward Allcutt @ 2014-07-02  9:05 UTC
  To: David Miller; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

On Tue, 1 Jul 2014, David Miller wrote:
> I still think the OpenBSD thing can't be intentional, and it's some
> bug they probably want to fix and it should therefore be investigated.

Shout at me if this is getting too off-topic..

Looking at the 5.5 kernel, the plumbing to set up icp->icmp_nextmtu is all 
there in icmp_do_error() when it is passed a non-zero destmtu. 
ip_forward() seems to be the relevant caller, however this initializes 
destmtu to 0 and only sets it in the ICMP_UNREACH_NEEDFRAG case if both
IPSEC is defined (#ifdef IPSEC) and some condition on the current route,
which I don't understand, holds.

Either way it doesn't look intentional since the code to set it is 
present.

-- 
Edward Allcutt

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: David Miller @ 2014-07-02 19:14 UTC
  To: edward.allcutt; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Edward Allcutt <edward.allcutt@openmarket.com>
Date: Wed, 2 Jul 2014 10:05:13 +0100 (BST)

> On Tue, 1 Jul 2014, David Miller wrote:
>> I still think the OpenBSD thing can't be intentional, and it's some
>> bug they probably want to fix and it should therefore be investigated.
> 
> Shout at me if this is getting too off-topic..

It's something you should tell the OpenBSD folks about, more importantly
than us :)

* Re: [PATCH] ipv4: icmp: Fix pMTU handling for rare case
From: David Miller @ 2014-07-08  0:23 UTC
  To: edward.allcutt; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Edward Allcutt <edward.allcutt@openmarket.com>
Date: Mon, 30 Jun 2014 16:16:02 +0100

> Some older router implementations still send Fragmentation Needed
> errors with the Next-Hop MTU field set to zero. This is explicitly
> described as an eventuality that hosts must deal with by the
> standard (RFC 1191) since older standards specified that those
> bits must be zero.
> 
> Linux had a generic (for all of IPv4) implementation of the algorithm
> described in the RFC for searching a list of MTU plateaus for a good
> value. Commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> removed this as part of the changes to remove the routing cache.
> Subsequently any Fragmentation Needed packet with a zero Next-Hop
> MTU has been discarded without being passed to the per-protocol
> handlers or notifying userspace for raw sockets.
> 
> When there is a router which does not implement RFC 1191 on an
> MTU limited path then this results in stalled connections since
> large packets are discarded and the local protocols are not
> notified so they never attempt to lower the pMTU.
> 
> One example I have seen is an OpenBSD router terminating IPSec
> tunnels. It's worth pointing out that this case is distinct from
> the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
> since the commit in question dismissed that as a valid concern.
> 
> All of the per-protocol handlers implement the simple approach from
> RFC 1191 of immediately falling back to the minimum value. Although
> this is sub-optimal it is vastly preferable to connections hanging
> indefinitely.
> 
> Remove the Next-Hop MTU != 0 check and allow such packets
> to follow the normal path.
> 
> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> Signed-off-by: Edward Allcutt <edward.allcutt@openmarket.com>

Applied and queued up for -stable, thanks.
