* [PATCH] icmp: Restore resistence to abnormal messages @ 2016-11-11 20:20 Vicente Jimenez Aguilar 2016-11-14 18:36 ` David Miller 0 siblings, 1 reply; 8+ messages in thread From: Vicente Jimenez Aguilar @ 2016-11-11 20:20 UTC (permalink / raw) To: David S . Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy Cc: netdev, linux-kernel, Vicente Jimenez Aguilar Restore network resistance to abnormal ICMP fragmentation needed messages with next hop MTU equal to (or exceeding) dropped packet size Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com> --- net/ipv4/icmp.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 38abe70..4c90d76 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -773,6 +773,7 @@ static bool icmp_tag_validation(int proto) static bool icmp_unreach(struct sk_buff *skb) { const struct iphdr *iph; + unsigned short old_mtu; struct icmphdr *icmph; struct net *net; u32 info = 0; @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) /* fall through */ case 0: info = ntohs(icmph->un.frag.mtu); + /* Handle weird case where next hop MTU is + * equal to or exceeding dropped packet size + */ + old_mtu = ntohs(iph->tot_len); + if (info >= old_mtu) + info = old_mtu - 2; } break; case ICMP_SR_FAILED: -- 2.9.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-11 20:20 [PATCH] icmp: Restore resistence to abnormal messages Vicente Jimenez Aguilar @ 2016-11-14 18:36 ` David Miller 2016-11-15 16:49 ` Vicente Jiménez 0 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2016-11-14 18:36 UTC (permalink / raw) To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel From: Vicente Jimenez Aguilar <googuy@gmail.com> Date: Fri, 11 Nov 2016 21:20:18 +0100 > @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) > /* fall through */ > case 0: > info = ntohs(icmph->un.frag.mtu); > + /* Handle weird case where next hop MTU is > + * equal to or exceeding dropped packet size > + */ > + old_mtu = ntohs(iph->tot_len); > + if (info >= old_mtu) > + info = old_mtu - 2; This isn't something the old code did. The old code behaved much differently. In the case where the new mtu was smaller than 68 or larger than the iph->tot_len value, it would do several things: 1) First it would check for a BSD 4.2 anomaly and subtract old_mtu by the IP header length. 2) Second, it would try to guess the intended MTU using the mtu_plateau table. I don't see any code where a subtraction by a fixed constant of 2 occurred. Nor can I figure out what that might accomplish. If you really want to do this, you have to docuement what this 2 means, what it is accomplishing, and why you have choosen to accomplish it this way. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-14 18:36 ` David Miller @ 2016-11-15 16:49 ` Vicente Jiménez 2016-11-15 16:56 ` David Miller 0 siblings, 1 reply; 8+ messages in thread From: Vicente Jiménez @ 2016-11-15 16:49 UTC (permalink / raw) To: David Miller Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel [-- Attachment #1: Type: text/plain, Size: 6602 bytes --] On Mon, Nov 14, 2016 at 7:36 PM, David Miller <davem@davemloft.net> wrote: > From: Vicente Jimenez Aguilar <googuy@gmail.com> > Date: Fri, 11 Nov 2016 21:20:18 +0100 > >> @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) >> /* fall through */ >> case 0: >> info = ntohs(icmph->un.frag.mtu); >> + /* Handle weird case where next hop MTU is >> + * equal to or exceeding dropped packet size >> + */ >> + old_mtu = ntohs(iph->tot_len); >> + if (info >= old_mtu) >> + info = old_mtu - 2; > > This isn't something the old code did. > > The old code behaved much differently. > I don't wanted to restore old behavior just fix a strange case that was handle by this code where the next hop MTU reported by the router is equal or greater than the actual path MTU. Because router information is wrong, we need a way to guess a good packet size ignoring router data. The simplest strategy that avoid odd numbers is reducing dropped packet size by 2. > In the case where the new mtu was smaller than 68 or larger than > the iph->tot_len value, it would do several things: Larger OR EQUAL (our case) than the iph->tot_len. > > 1) First it would check for a BSD 4.2 anomaly and subtract old_mtu > by the IP header length. This only happens when next hop MTU is zero. I don't care about this case. It is handled at protocol level as the commit that removed that function says. > > 2) Second, it would try to guess the intended MTU using the > mtu_plateau table. > Old kernel work because they reduced the current MTU by a small amount using the mtu_plateau. Not because we have some trouble with headers and we need the common 1492 value. > I don't see any code where a subtraction by a fixed constant of 2 > occurred. > This is the simplest solution I came about. Because packets of size old_mtu get dropped and suggested next hop MTU is equal or larger, we just ignore wrong router information and try to get the correct MTU by reducing current MTU by just a bit. I discarded subtracting by 1 to avoid odd numbers. For our case: - Old kernel sets MTU to 1492 from previous 1500 bytes and worked. - Newer kernel enters infinite loop sending dropped size packets. - Patched kernel sets MTU to 1498 from 1500 and go trough this abnormal router. - Microsoft Windows also recovers from those strange ICMP messages. By the way, setting MTU to 1499 also failed to communicate for us. > Nor can I figure out what that might accomplish. If you really > want to do this, you have to docuement what this 2 means, what > it is accomplishing, and why you have choosen to accomplish it > this way. > > Thanks. I just wanted to restore the "new_mtu >= old_mtu" check and instead of reducing the current packet size using a plateau function, decrement current size by 2 (to avoid odd numbers). This solution solved our problem setting MTU to 1498 and I think this must converge to any good MTU value because it reduce packet size until we don't receive any more ICMP fragmentation needed messages. We can criticize that this solution could be slow to reach small MTU values. But I suspect that those abnormal ICMP messages was caused by a router bug and must be very close to a size that get through them. If suggested MTU is smaller than current packet size, we set the MTU to next hop MTU. If those packets that are equal in size to the suggested one still get dropped, then this code try to overcome that. It had the strength of being really simple and it must be ideally never be executed (we are still trying to fix the abnormal router, but we don't manage it). I suspect that for those weird case, a working size must be very close to the suggested and it is far better than current kernel that just take the value in next hop MTU without any check and enter in a infinite loop sending dropped size packets. TCP connections stall and finally time out. I suspect that a router bug get size comparison wrong and the real path MTU is 1500 (as it is in the other direction). We need to get around this and currently we just lowered interface MTU at startup for those devices. Where do I need to put more explanation? - Code comment - Commit message Attached capture file that show the problem: Abnormal ICMP messages that drops and suggest 1500 bytes. This is the long explanation that I emailed earlier: " In a large corporate network, we spotted this weird ICMP message after a long troubleshooting. See attached capture file. Those ICMP "network unreachable - fragmentation needed and don't fragment bit set" messages are sent by a router that drop 1500 bytes IP packets and fill the next hop MTU ICMP field with 1500. Those messages cause the TCP connection to stall but only on newer kernels. Older kernels set path MTU to 1492 and communicates successfully. After checking code and commit history, I spotted how commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012 changed ICMP messages handling by removing ip_rt_frag_needed function. The relevant part of the ip_rt_frag_needed function that was removed is: if (new_mtu < 68 || new_mtu >= old_mtu) { /* BSD 4.2 derived systems incorrectly adjust * tot_len by the IP header length, and report * a zero MTU in the ICMP message. */ if (mtu == 0 && old_mtu >= 68 + (iph->ihl << 2)) old_mtu -= iph->ihl << 2; mtu = guess_mtu(old_mtu); } This condition handled the cases when next hop MTU where zero (less than 68). Now this is handled by the protocol and fixed by commit 68b7107b6298 "ipv4: icmp: Fix pMTU handling for rare case". But the rarest case when (next hop MTU) new_mtu >= old_mtu (dropped packet length) was also removed. This commit restores this check. Instead of using a table lookup like function guess_mtu uses, it just try to set the path MTU decrementing by 2 bytes the dropped packet size. In our case, setting the path MTU to just 1498 (one iteration) worked. This solution should converge in any case to a good value by small steps. I don't think there's a need to a more complex solution. The patched kernel worked perfectly setting the path MTU to 1498 from the initial default interface value of 1500. This time I don't have a capture file from inside the affected center, but all received packed had a maximum size of 1498. " -- thanks vicente [-- Attachment #2: ICMP dropping and suggesting 1500.pcapng --] [-- Type: application/x-pcapng, Size: 89664 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-15 16:49 ` Vicente Jiménez @ 2016-11-15 16:56 ` David Miller 2016-11-15 17:30 ` Florian Westphal 0 siblings, 1 reply; 8+ messages in thread From: David Miller @ 2016-11-15 16:56 UTC (permalink / raw) To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel From: Vicente Jiménez <googuy@gmail.com> Date: Tue, 15 Nov 2016 17:49:43 +0100 > On Mon, Nov 14, 2016 at 7:36 PM, David Miller <davem@davemloft.net> wrote: >> From: Vicente Jimenez Aguilar <googuy@gmail.com> >> Date: Fri, 11 Nov 2016 21:20:18 +0100 >> >>> @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) >>> /* fall through */ >>> case 0: >>> info = ntohs(icmph->un.frag.mtu); >>> + /* Handle weird case where next hop MTU is >>> + * equal to or exceeding dropped packet size >>> + */ >>> + old_mtu = ntohs(iph->tot_len); >>> + if (info >= old_mtu) >>> + info = old_mtu - 2; >> >> This isn't something the old code did. >> >> The old code behaved much differently. >> > I don't wanted to restore old behavior just fix a strange case that > was handle by this code where the next hop MTU reported by the router > is equal or greater than the actual path MTU. Because router > information is wrong, we need a way to guess a good packet size > ignoring router data. The simplest strategy that avoid odd numbers is > reducing dropped packet size by 2. This whole approach seems arbitrary. You haven't discussed in any way, what causes this in the first place. And what about that cause makes simply subtracting by 2 work well or not. You have a very locallized, specific, situation on your end you want to fix. But we must accept changes that handle things generically and in a way that would help more than just your specific case. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-15 16:56 ` David Miller @ 2016-11-15 17:30 ` Florian Westphal 2016-11-15 19:32 ` Vicente Jiménez 0 siblings, 1 reply; 8+ messages in thread From: Florian Westphal @ 2016-11-15 17:30 UTC (permalink / raw) To: David Miller Cc: googuy, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel David Miller <davem@redhat.com> wrote: > From: Vicente Jiménez <googuy@gmail.com> > Date: Tue, 15 Nov 2016 17:49:43 +0100 > > > On Mon, Nov 14, 2016 at 7:36 PM, David Miller <davem@davemloft.net> wrote: > >> From: Vicente Jimenez Aguilar <googuy@gmail.com> > >> Date: Fri, 11 Nov 2016 21:20:18 +0100 > >> > >>> @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) > >>> /* fall through */ > >>> case 0: > >>> info = ntohs(icmph->un.frag.mtu); > >>> + /* Handle weird case where next hop MTU is > >>> + * equal to or exceeding dropped packet size > >>> + */ > >>> + old_mtu = ntohs(iph->tot_len); > >>> + if (info >= old_mtu) > >>> + info = old_mtu - 2; > >> > >> This isn't something the old code did. > >> > >> The old code behaved much differently. > >> > > I don't wanted to restore old behavior just fix a strange case that > > was handle by this code where the next hop MTU reported by the router > > is equal or greater than the actual path MTU. Because router > > information is wrong, we need a way to guess a good packet size > > ignoring router data. The simplest strategy that avoid odd numbers is > > reducing dropped packet size by 2. > > This whole approach seems arbitrary. > > You haven't discussed in any way, what causes this in the first place. > And what about that cause makes simply subtracting by 2 work well or > not. > > You have a very locallized, specific, situation on your end you want > to fix. But we must accept changes that handle things generically and > in a way that would help more than just your specific case. FWIW this is similar to the patch I sent a while ago: https://patchwork.ozlabs.org/patch/493997/ I think in interest of robustness principle ("eat shit and don't die") one of these changes should go in :-| ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-15 17:30 ` Florian Westphal @ 2016-11-15 19:32 ` Vicente Jiménez 2016-11-16 1:14 ` Florian Westphal 0 siblings, 1 reply; 8+ messages in thread From: Vicente Jiménez @ 2016-11-15 19:32 UTC (permalink / raw) To: Florian Westphal Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel I agree that both patches try to solve the same problem in a very similar way. Florian Westphal's patch do two more things: 1- add warning with pr_warn_ratelimited. I like this idea. I also though about adding some message but I have no kernel experience and I preferred to have just a working solution. 2- Check if the packet size is lower than (536 + 8). I think this is not necessary because low values (even the zero case) is already handled by the protocol. Also I don't understand why you choose this value, it seems to be related to TCP MSS and the compared value is IP packet size. Finally, both patches decrement current packet by a value: Mine by 2 and Florian's by 8 bytes. Both arbitrary values. Personally I prefer to go by small steps. If the small step fails, it just iterate again and with 4 iterations, my patch also decrement the original value by 8 bytes (4x2). Basically they are the same but my patch take smaller steps and miss the warning message. If David Miller thinks this could be a good addition, I'll add the warning message to my patch. We can also discuss the amount to subtract. On Tue, Nov 15, 2016 at 6:30 PM, Florian Westphal <fw@strlen.de> wrote: > David Miller <davem@redhat.com> wrote: >> From: Vicente Jiménez <googuy@gmail.com> >> Date: Tue, 15 Nov 2016 17:49:43 +0100 >> >> > On Mon, Nov 14, 2016 at 7:36 PM, David Miller <davem@davemloft.net> wrote: >> >> From: Vicente Jimenez Aguilar <googuy@gmail.com> >> >> Date: Fri, 11 Nov 2016 21:20:18 +0100 >> >> >> >>> @@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb) >> >>> /* fall through */ >> >>> case 0: >> >>> info = ntohs(icmph->un.frag.mtu); >> >>> + /* Handle weird case where next hop MTU is >> >>> + * equal to or exceeding dropped packet size >> >>> + */ >> >>> + old_mtu = ntohs(iph->tot_len); >> >>> + if (info >= old_mtu) >> >>> + info = old_mtu - 2; >> >> >> >> This isn't something the old code did. >> >> >> >> The old code behaved much differently. >> >> >> > I don't wanted to restore old behavior just fix a strange case that >> > was handle by this code where the next hop MTU reported by the router >> > is equal or greater than the actual path MTU. Because router >> > information is wrong, we need a way to guess a good packet size >> > ignoring router data. The simplest strategy that avoid odd numbers is >> > reducing dropped packet size by 2. >> >> This whole approach seems arbitrary. >> >> You haven't discussed in any way, what causes this in the first place. >> And what about that cause makes simply subtracting by 2 work well or >> not. >> >> You have a very locallized, specific, situation on your end you want >> to fix. But we must accept changes that handle things generically and >> in a way that would help more than just your specific case. > > FWIW this is similar to the patch I sent a while ago: > > https://patchwork.ozlabs.org/patch/493997/ > > I think in interest of robustness principle ("eat shit and don't die") > one of these changes should go in :-| -- saludos vicente ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-15 19:32 ` Vicente Jiménez @ 2016-11-16 1:14 ` Florian Westphal 2016-11-17 1:17 ` Vicente Jiménez 0 siblings, 1 reply; 8+ messages in thread From: Florian Westphal @ 2016-11-16 1:14 UTC (permalink / raw) To: Vicente Jiménez Cc: Florian Westphal, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel Vicente Jiménez <googuy@gmail.com> wrote: > I agree that both patches try to solve the same problem in a very similar way. > Florian Westphal's patch do two more things: > 1- add warning with pr_warn_ratelimited. I like this idea. I also > though about adding some message but I have no kernel experience and I > preferred to have just a working solution. I added this only to show whats happening. I don't like such printks because end users can't do anything about it. > 2- Check if the packet size is lower than (536 + 8). I think this is > not necessary because low values (even the zero case) is already > handled by the protocol. Also I don't understand why you choose this > value, it seems to be related to TCP MSS and the compared value is IP > packet size. Right, no need for this check. > Finally, both patches decrement current packet by a value: Mine by 2 > and Florian's by 8 bytes. Both arbitrary values. Personally I prefer > to go by small steps. If the small step fails, it just iterate again > and with 4 iterations, my patch also decrement the original value by 8 > bytes (4x2). > Basically they are the same but my patch take smaller steps and miss > the warning message. IIRC I chose 8 because connection recovered faster in my case. I have not experienced this issue again (I dropped the patch from my kernel at some point and the connection stalls did not reappear so this got fixed elsewhere). I'd just apply your patch, possibly with an additional comment that says that we're grasping at straws because some middlebox is evidently feeding bogus pmtu information. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] icmp: Restore resistence to abnormal messages 2016-11-16 1:14 ` Florian Westphal @ 2016-11-17 1:17 ` Vicente Jiménez 0 siblings, 0 replies; 8+ messages in thread From: Vicente Jiménez @ 2016-11-17 1:17 UTC (permalink / raw) To: Florian Westphal Cc: David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel On Wed, Nov 16, 2016 at 2:14 AM, Florian Westphal <fw@strlen.de> wrote: > Vicente Jiménez <googuy@gmail.com> wrote: >> 1- add warning with pr_warn_ratelimited. I like this idea. I also >> though about adding some message but I have no kernel experience and I >> preferred to have just a working solution. > > I added this only to show whats happening. > > I don't like such printks because end users can't do anything about it. > What about using net_dbg_ratelimited macro? it only adds messages if debug is enabled. > >> Finally, both patches decrement current packet by a value: Mine by 2 >> and Florian's by 8 bytes. Both arbitrary values. Personally I prefer >> to go by small steps. If the small step fails, it just iterate again >> and with 4 iterations, my patch also decrement the original value by 8 >> bytes (4x2). >> Basically they are the same but my patch take smaller steps and miss >> the warning message. > > IIRC I chose 8 because connection recovered faster in my case. > > I have not experienced this issue again (I dropped the patch from > my kernel at some point and the connection stalls did not reappear so > this got fixed elsewhere). My issue is permanent for now in various locations. We have to decrease MTU manually on all devices with newer kernels. We don't have direct access to those abnormal routers because they are managed by a communication provider that think the network had no problem because all their Windows machines apparently work perfectly. -- cheers vicente ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-11-17 1:17 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-11-11 20:20 [PATCH] icmp: Restore resistence to abnormal messages Vicente Jimenez Aguilar 2016-11-14 18:36 ` David Miller 2016-11-15 16:49 ` Vicente Jiménez 2016-11-15 16:56 ` David Miller 2016-11-15 17:30 ` Florian Westphal 2016-11-15 19:32 ` Vicente Jiménez 2016-11-16 1:14 ` Florian Westphal 2016-11-17 1:17 ` Vicente Jiménez
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).