netdev.vger.kernel.org archive mirror
* per route MTU settings
@ 2013-01-11 16:29 pupilla
  2013-01-11 18:35 ` Lukas Tribus
  0 siblings, 1 reply; 5+ messages in thread
From: pupilla @ 2013-01-11 16:29 UTC (permalink / raw)
  To: netdev

Hello everybody.

I have done some tests with per route
mtu settings.

Here are the results on the 10.81.104.254
Linux box running 3.6.9 on Slackware 14
32-bit:

ip route add 10.81.105.109/32 via 10.81.104.1 mtu lock 1450
ip route flush cache

ping -M do 10.81.105.109 -c 5 -s 1450
PING 10.81.105.109 (10.81.105.109) 1450(1478) bytes of data.
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 1450)
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 1450)
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 1450)
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 1450)
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 1450)

Here are the results on my Linux box with
IP address 10.81.104.126 (the default
gateway is 10.81.104.254) running Linux
3.7.0 on Slackware 14 64-bit:

ping -M do 10.81.105.109 -c 5 -s 560
PING 10.81.105.109 (10.81.105.109) 560(588) bytes of data.
From 10.81.104.254 icmp_seq=1 Frag needed and DF set (mtu = 576)
From 10.81.104.126 icmp_seq=2 Frag needed and DF set (mtu = 576)
From 10.81.104.126 icmp_seq=2 Frag needed and DF set (mtu = 576)
From 10.81.104.126 icmp_seq=2 Frag needed and DF set (mtu = 576)
From 10.81.104.126 icmp_seq=2 Frag needed and DF set (mtu = 576)

When packets are generated locally (on
the 10.81.104.254 box), the Linux ICMP
'need to frag' message reports the correct
next-hop MTU. For forwarded packets
(those not originated on the 10.81.104.254
box), however, I always get 576 as the
next-hop MTU.
Is this the expected behaviour?

Any response is welcome.

TIA


* RE: per route MTU settings
  2013-01-11 16:29 per route MTU settings pupilla
@ 2013-01-11 18:35 ` Lukas Tribus
  0 siblings, 0 replies; 5+ messages in thread
From: Lukas Tribus @ 2013-01-11 18:35 UTC (permalink / raw)
  To: pupilla, netdev


Hi,

10.81.104.254 will never transmit anything with
> ping -M do 10.81.105.109 -c 5 -s 1450
because the host already knows a 1478 Byte packet won't fit the
1450 Byte route you made towards 10.81.104.1.

Note that 1450 is only your ICMP payload; add the ICMP header (8B)
and the IP header (20B) and you are at 1478B, which exceeds your route's
MTU.
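
As a quick check of that arithmetic, here is a minimal shell sketch. It assumes an IPv4 header without options (20 bytes) and the 8-byte ICMP echo header; the function name ip_total_len is made up for illustration.

```shell
# Total IPv4 packet size produced by "ping -s <payload>".
# Assumption: 20-byte IPv4 header (no options) plus 8-byte ICMP header.
ip_total_len() {
    echo $(( $1 + 8 + 20 ))
}

ip_total_len 1450   # prints 1478, larger than the 1450-byte route MTU
ip_total_len 560    # prints 588, larger than a 576-byte MTU
```

Both values agree with the sizes ping itself prints in parentheses, 1450(1478) and 560(588).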


I guess 10.81.104.1 has a 576B MTU route/interface towards
10.81.105.109, and you did the first test on 10.81.104.254,
and only afterwards you tried "-s 560" from .126, so only
then the .254 host realized the transport to 10.81.104.1
is actually a 576B MTU path.

You cannot do pings with 2 different packet sizes on 2 different hosts,
and expect them to behave exactly the same.

Running pings from both hosts with:
-s 548
-s 549
-s 1422
-s 1423

and then analyzing the results will probably give us a better
idea of what actually happens.
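
These four sizes sit right at the boundary of the two MTUs in question (576 and 1450): the largest payload that still fits, and one byte more. A rough sketch of how they can be derived, assuming the 8-byte ICMP and 20-byte IPv4 headers mentioned above (max_payload is a hypothetical helper name):

```shell
# Largest "ping -s" payload that fits a given MTU.
# Assumption: 28 bytes of headers (20 IPv4 without options + 8 ICMP).
max_payload() {
    echo $(( $1 - 28 ))
}

for mtu in 576 1450; do
    fit=$(max_payload "$mtu")
    echo "mtu $mtu: -s $fit fits, -s $((fit + 1)) does not"
done
# mtu 576: -s 548 fits, -s 549 does not
# mtu 1450: -s 1422 fits, -s 1423 does not
```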



Regards,

Lukas







* RE: per route MTU settings
  2013-01-15 17:07 Lukas Tribus
  2013-01-16  6:31 ` Steffen Klassert
@ 2013-02-03 11:22 ` Lukas Tribus
  1 sibling, 0 replies; 5+ messages in thread
From: Lukas Tribus @ 2013-02-03 11:22 UTC (permalink / raw)
  To: pupilla; +Cc: netdev


FYI, Steffen's fix is in 3.8-rc6 now and it works for me.


root@ubuntuvm:~# cat /proc/version
Linux version 3.8.0-030800rc6-generic (root@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201301312135 SMP Fri Feb 1 02:36:25 UTC 2013
root@ubuntuvm:~# echo 1 > /proc/sys/net/ipv4/ip_forward
root@ubuntuvm:~# echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
root@ubuntuvm:~# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
root@ubuntuvm:~# ip route add 8.8.8.0/24 via 10.0.0.254 mtu lock 1200
root@ubuntuvm:~# ip route add 8.8.4.0/24 via 10.0.0.254 mtu lock 1200
root@ubuntuvm:~#
root@ubuntuvm:~# tcpdump -nvvv icmp -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:17:19.085868 IP (tos 0x0, ttl 128, id 3161, offset 0, flags [DF], proto ICMP (1), length 1428)
    10.0.0.3 > 8.8.4.4: ICMP echo request, id 1, seq 38744, length 1408
12:17:19.085910 IP (tos 0xc0, ttl 64, id 53508, offset 0, flags [none], proto ICMP (1), length 576)
    10.0.0.55 > 10.0.0.3: ICMP 8.8.4.4 unreachable - need to frag (mtu 1200), length 556
        IP (tos 0x0, ttl 128, id 3161, offset 0, flags [DF], proto ICMP (1), length 1428)
    10.0.0.3 > 8.8.4.4: ICMP echo request, id 1, seq 38744, length 1408
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
root@ubuntuvm:~#



 		 	   		  


* Re: per route MTU settings
  2013-01-15 17:07 Lukas Tribus
@ 2013-01-16  6:31 ` Steffen Klassert
  2013-02-03 11:22 ` Lukas Tribus
  1 sibling, 0 replies; 5+ messages in thread
From: Steffen Klassert @ 2013-01-16  6:31 UTC (permalink / raw)
  To: Lukas Tribus; +Cc: pupilla, netdev

On Tue, Jan 15, 2013 at 06:07:59PM +0100, Lukas Tribus wrote:
> 
> Hi pupilla,
> 
> looks like the behavior changed with 3.2-rc5 and "[PATCH 5/5] ipv4:
> Don't use the cached pmtu informations for input routes" ([1], [2]).
> 
> Actually, a "mtu lock XYZ" applied to a route is a bit of a corner case.
> 
> 
> Steffen, you already made this statement once and I can only agree with you:
> 
> > The router that can't send the packet to the next hop network has to
> > send the ICMP Destination Unreachable message. We never propagated
> > learned PMTU informations and I would not like to change this
> 
> 
> But here is our issue:
> - the linux "ip_forwarder" has an MTU of 1500 Byte on relevant interfaces
> - there is a route with a "static" mtu lock at 1200 Byte
> - the box is supposed to forward a packet heading the 1200B MTU route
> 
> What happens is:
> - the packet is dropped (because it exceeds the 1200 Byte)
> - an ICMP Type 3 Code 4 message is generated with 576 Byte next-hop MTU
> 
> Notice that the 576 Byte indicated as next-hop MTU in the ICMP packet
> matches neither the outgoing interface MTU nor the static route's MTU.
> 
> Prior to your patch (for example in 3.2-rc4), 1200 Byte was indicated as
> MTU in the ICMP packet.

This patch was needed during the time we cached the PMTU information
on the inetpeer. Now the PMTU information is back in the routes,
so this check is obsolete. We can simply revert it; I'll send a patch
to do that.


* RE: per route MTU settings
@ 2013-01-15 17:07 Lukas Tribus
  2013-01-16  6:31 ` Steffen Klassert
  2013-02-03 11:22 ` Lukas Tribus
  0 siblings, 2 replies; 5+ messages in thread
From: Lukas Tribus @ 2013-01-15 17:07 UTC (permalink / raw)
  To: pupilla, steffen.klassert; +Cc: netdev


Hi pupilla,

looks like the behavior changed with 3.2-rc5 and "[PATCH 5/5] ipv4:
Don't use the cached pmtu informations for input routes" ([1], [2]).

Actually, a "mtu lock XYZ" applied to a route is a bit of a corner case.


Steffen, you already made this statement once and I can only agree with you:

> The router that can't send the packet to the next hop network has to
> send the ICMP Destination Unreachable message. We never propagated
> learned PMTU informations and I would not like to change this


But here is our issue:
- the linux "ip_forwarder" has an MTU of 1500 Byte on relevant interfaces
- there is a route with a "static" mtu lock at 1200 Byte
- the box is supposed to forward a packet heading the 1200B MTU route

What happens is:
- the packet is dropped (because it exceeds the 1200 Byte)
- an ICMP Type 3 Code 4 message is generated with 576 Byte next-hop MTU

Notice that the 576 Byte indicated as next-hop MTU in the ICMP packet
matches neither the outgoing interface MTU nor the static route's MTU.

Prior to your patch (for example in 3.2-rc4), 1200 Byte was indicated as
MTU in the ICMP packet.


root@ubuntuvm:~# cat /proc/version
Linux version 3.2.0-030200rc5-generic (root@gomeisa) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #201112091935 SMP Sat Dec 10 00:36:07 UTC 2011
root@ubuntuvm:~# echo 1 > /proc/sys/net/ipv4/ip_forward
root@ubuntuvm:~# echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
root@ubuntuvm:~# ip route add 8.8.8.8/32 via 10.0.0.254 mtu lock 1200
root@ubuntuvm:~# ip r
default via 10.0.0.254 dev eth0 metric 100
8.8.8.8 via 10.0.0.254 dev eth0 mtu lock 1200
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.55
root@ubuntuvm:~# tcpdump -nvvv icmp -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
01:49:20.193798 IP (tos 0x0, ttl 128, id 12090, offset 0, flags [DF], proto ICMP (1), length 1428)
    10.0.0.3 > 8.8.8.8: ICMP echo request, id 1, seq 4150, length 1408
01:49:20.193847 IP (tos 0xc0, ttl 64, id 15646, offset 0, flags [none], proto ICMP (1), length 576)
    10.0.0.55 > 10.0.0.3: ICMP 8.8.8.8 unreachable - need to frag (mtu 576), length 556
        IP (tos 0x0, ttl 128, id 12090, offset 0, flags [DF], proto ICMP (1), length 1428)
    10.0.0.3 > 8.8.8.8: ICMP echo request, id 1, seq 4150, length 1408
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
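
To compare the reported value against the route setting without reading the capture by eye, something along these lines could be used. It is only a sketch: frag_mtu is a hypothetical helper, and it assumes the "need to frag (mtu N)" wording that tcpdump prints in the capture above.

```shell
# Extract the next-hop MTU from tcpdump "need to frag" lines read on stdin.
frag_mtu() {
    sed -n 's/.*need to frag (mtu \([0-9][0-9]*\)).*/\1/p'
}

# Sample line taken from the capture above.
line='10.0.0.55 > 10.0.0.3: ICMP 8.8.8.8 unreachable - need to frag (mtu 576), length 556'
reported=$(printf '%s\n' "$line" | frag_mtu)
route_mtu=1200   # value from the "mtu lock 1200" route above

if [ "$reported" -ne "$route_mtu" ]; then
    echo "mismatch: ICMP reports $reported, route MTU is $route_mtu"
fi
```

With the sample line this reports a mismatch between 576 and 1200, which is the discrepancy described here.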


Now, what is the kernel supposed to do in this case?

In my opinion, either act like kernels up to 3.2-rc4 and return 1200 Byte in the ICMP
error message, or forward the packet anyway (we have the necessary interface
MTU to do it), ignoring the route with "mtu lock".


Steffen, could you share your opinion about this?


It's probably a good idea to avoid "mtu lock" on routes completely though.



Regards,

Lukas

[1] http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=261663b0ee2ee8e3947f4c11c1a08be18cd2cea1
[2] http://patchwork.ozlabs.org/patch/127288/

 		 	   		  

