netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7
@ 2015-04-27 12:33 Gerd v. Egidy
  2015-04-28  3:43 ` Herbert Xu
  0 siblings, 1 reply; 4+ messages in thread
From: Gerd v. Egidy @ 2015-04-27 12:33 UTC (permalink / raw)
  To: netdev; +Cc: Li Wei, David S. Miller

Hi,

my colleagues recently reported that they had spurious problems connecting
to a specific server via ssh. Investigation showed that the kernel completely 
ignored the ICMP dest. unreachable / fragmentation needed packets in this case:

client.45662 > server.22: Flags [S], seq 3738194662, win 29200, options [mss 1460,sackOK,TS val 668602 ecr 0,nop,wscale 7], length 0
server.22 > client.45662: Flags [S.], seq 2215869953, ack 3738194663, win 5792, options [mss 1460,sackOK,TS val 2974105033 ecr 668602,nop,wscale 6], length 0
client.45662 > server.22: Flags [.], ack 1, win 229, options [nop,nop,TS val 668621 ecr 2974105033], length 0
client.45662 > server.22: Flags [P.], seq 1:22, ack 1, win 229, options [nop,nop,TS val 668626 ecr 2974105033], length 21
server.22 > client.45662: Flags [.], ack 22, win 91, options [nop,nop,TS val 2974105057 ecr 668626], length 0
server.22 > client.45662: Flags [P.], seq 1:21, ack 22, win 91, options [nop,nop,TS val 2974105069 ecr 668626], length 20
client.45662 > server.22: Flags [.], ack 21, win 229, options [nop,nop,TS val 668657 ecr 2974105069], length 0
client.45662 > server.22: Flags [.], seq 22:1470, ack 21, win 229, options [nop,nop,TS val 668658 ecr 2974105069], length 1448
router > client: ICMP server unreachable - need to frag (mtu 1456), length 556
client.45662 > server.22: Flags [P.], seq 1470:1854, ack 21, win 229, options [nop,nop,TS val 668658 ecr 2974105069], length 384
server.22 > client.45662: Flags [P.], seq 21:725, ack 22, win 91, options [nop,nop,TS val 2974105088 ecr 668657], length 704
server.22 > client.45662: Flags [.], ack 22, win 91, options [nop,nop,TS val 2974105091 ecr 668657,nop,nop,sack 1 {1470:1854}], length 0
client.45662 > server.22: Flags [.], seq 22:1470, ack 725, win 240, options [nop,nop,TS val 668684 ecr 2974105088], length 1448
router > client: ICMP server unreachable - need to frag (mtu 1456), length 556
server.22 > client.45662: Flags [P.], seq 21:725, ack 22, win 91, options [nop,nop,TS val 2974105307 ecr 668657,nop,nop,sack 1 {1470:1854}], length 704
client.45662 > server.22: Flags [.], ack 725, win 240, options [nop,nop,TS val 668897 ecr 2974105307,nop,nop,sack 1 {21:725}], length 0
client.45662 > server.22: Flags [.], seq 22:1470, ack 725, win 240, options [nop,nop,TS val 668904 ecr 2974105307], length 1448
router > client: ICMP server unreachable - need to frag (mtu 1456), length 556
client.45662 > server.22: Flags [.], seq 22:1470, ack 725, win 240, options [nop,nop,TS val 669345 ecr 2974105307], length 1448

(tcpdump was done with tso, gso and gro off to show the real packet sizes on the line)

It took me a while till I found out how to reliably reproduce this:

$ ssh server
$ exit
$ ip route get server
server via router dev eth0  src client
    cache      expires 597sec mtu 1390
$ sleep 597
# really make sure the pmtu cache is empty
$ ip route get server
server via router dev eth0  src client
    cache 
$ ssh server

Now you can observe the behavior shown in the tcpdump above and the connection
will stall. So it only occurs after a pmtu entry expired from the routing/fib cache.
Calling "ip route flush cache" instead of waiting will not show the problem.

I bisected the problem and found the patch 3cdaa5be9e8 (also attached below) to be
the point where the problem was introduced. It was applied between 3.19-rc6 and 3.19-rc7
and is still in current mainline. I have tried a 4.0.0 kernel with this patch reverted
and it does not show the problem, while I can reproduce it with a vanilla 4.0.0.

I'm not familiar enough with the routing implementation in Linux to really tell why
this patch causes the problems I'm seeing, but from cursory glance seems to me that 
the patch only checks the rt->rt_pmtu value and not the pmtu value within a fnhe entry
which might also exist.

Please look into this.

Thank you.

Kind regards,

Gerd

>From 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 Mon Sep 17 00:00:00 2001
From: Li Wei <lw@cn.fujitsu.com>
Date: Thu, 29 Jan 2015 16:09:03 +0800
Subject: [PATCH] ipv4: Don't increase PMTU with Datagram Too Big message.

RFC 1191 said, "a host MUST not increase its estimate of the Path
MTU in response to the contents of a Datagram Too Big message."

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d58dd0e..52e1f2b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -966,6 +966,9 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
    if (dst->dev->mtu < mtu)
        return;
 
+   if (rt->rt_pmtu && rt->rt_pmtu < mtu)
+       return;
+
    if (mtu < ip_rt_min_pmtu)
        mtu = ip_rt_min_pmtu;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7
  2015-04-27 12:33 [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7 Gerd v. Egidy
@ 2015-04-28  3:43 ` Herbert Xu
  2015-04-28  8:59   ` Gerd v. Egidy
  2015-04-29 18:43   ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Herbert Xu @ 2015-04-28  3:43 UTC (permalink / raw)
  To: Gerd v. Egidy; +Cc: netdev, lw, davem

Gerd v. Egidy <gerd.von.egidy@intra2net.com> wrote:
> 
> my colleagues recently reported that they had spurious problems connecting
> to a specific server via ssh. Investigation showed that the kernel completely 
> ignored the ICMP dest. unreachable / fragmentation needed packets in this case:

This patch might help:

---8<---
Subject: route: Use ipv4_mtu instead of raw rt_pmtu

The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't
increase PMTU with Datagram Too Big message") broke PMTU in cases
where the rt_pmtu value has expired but is smaller than the new
PMTU value.

This obsolete rt_pmtu then prevents the new PMTU value from being
installed.

Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message")
Reported-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a78540f..bff62fc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -962,10 +962,7 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;
 
-	if (dst->dev->mtu < mtu)
-		return;
-
-	if (rt->rt_pmtu && rt->rt_pmtu < mtu)
+	if (ipv4_mtu(dst) < mtu)
 		return;
 
 	if (mtu < ip_rt_min_pmtu)
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7
  2015-04-28  3:43 ` Herbert Xu
@ 2015-04-28  8:59   ` Gerd v. Egidy
  2015-04-29 18:43   ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: Gerd v. Egidy @ 2015-04-28  8:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, lw, davem

Hello Herbert,

> This patch might help:
> 
> ---8<---
> Subject: route: Use ipv4_mtu instead of raw rt_pmtu

thank you very much for looking into this. With your patch applied I was not 
able to reproduce the problem anymore.

Could you take care of getting this into mainline and stable?

Thanks.

Kind regards,

Gerd

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7
  2015-04-28  3:43 ` Herbert Xu
  2015-04-28  8:59   ` Gerd v. Egidy
@ 2015-04-29 18:43   ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2015-04-29 18:43 UTC (permalink / raw)
  To: herbert; +Cc: gerd.von.egidy, netdev, lw

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 28 Apr 2015 11:43:15 +0800

> Subject: route: Use ipv4_mtu instead of raw rt_pmtu
> 
> The commit 3cdaa5be9e81a914e633a6be7b7d2ef75b528562 ("ipv4: Don't
> increase PMTU with Datagram Too Big message") broke PMTU in cases
> where the rt_pmtu value has expired but is smaller than the new
> PMTU value.
> 
> This obsolete rt_pmtu then prevents the new PMTU value from being
> installed.
> 
> Fixes: 3cdaa5be9e81 ("ipv4: Don't increase PMTU with Datagram Too Big message")
> Reported-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-04-29 18:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-27 12:33 [bisected] ICMP fragmentation needed ignored / PMTU discovery broken since 3.19-rc7 Gerd v. Egidy
2015-04-28  3:43 ` Herbert Xu
2015-04-28  8:59   ` Gerd v. Egidy
2015-04-29 18:43   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).