All of lore.kernel.org
 help / color / mirror / Atom feed
* [Regression w/ patch] Restore network resistance to weird ICMP messages
@ 2016-11-10 18:47 Vicente Jiménez
  2016-11-10 19:07 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Vicente Jiménez @ 2016-11-10 18:47 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]

Handle weird ICMP fragmentation needed messages with next hop MTU
equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")

In a large corporate network, we spotted this weird ICMP message after
a long troubleshooting. See attached capture file. Those ICMP "network
unreachable - fragmentation needed and don't fragment bit set"
messages are sent by a router that drop 1500 bytes IP packets and fill
the next hop MTU ICMP field with 1500.

Those messages cause the TCP connection to stall but only on newer
kernels. Older kernels set path MTU to 1492 and communicates
successfully.

After checking code and commit history, I spotted how commit
46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012
changed ICMP messages handling by removing ip_rt_frag_needed function.

The relevant part of the ip_rt_frag_needed function that was removed is:

if (new_mtu < 68 || new_mtu >= old_mtu) {
        /* BSD 4.2 derived systems incorrectly adjust
         * tot_len by the IP header length, and report
         * a zero MTU in the ICMP message.
         */
        if (mtu == 0 &&
            old_mtu >= 68 + (iph->ihl << 2))
                old_mtu -= iph->ihl << 2;
        mtu = guess_mtu(old_mtu);
}


This condition handled the cases when next hop MTU where zero (less
than 68). Now this is handled by the protocol and fixed by commit
68b7107b6298 "ipv4: icmp: Fix pMTU handling for rare case".

But the rarest case when (next hop MTU) new_mtu >= old_mtu (dropped
packet length) was also removed. This commit restores this check.
Instead of using a table lookup like function guess_mtu uses, it just
try to set the path MTU decrementing by 2 bytes the dropped packet
size.

In our case, setting the path MTU to just 1498 (one iteration) worked.
This solution should converge in any case to a good value by small
steps. I don't think there's a need to a more complex solution.

The patched kernel worked perfectly setting the path MTU to 1498 from
the initial default interface value of 1500. This time I don't have a
capture file from inside the affected center, but all received packed
had a maximum size of 1498.

-- 
cheers
vicente

[-- Attachment #2: 0001-ipv4-icmp-Fix-pMTU-handling-for-rarest-case.patch --]
[-- Type: application/octet-stream, Size: 1273 bytes --]

From 88ac49fa287e2e0d3d3bc805dea1fec301aad1df Mon Sep 17 00:00:00 2001
From: Vicente Jimenez Aguilar <googuy@gmail.com>
Date: Mon, 31 Oct 2016 13:10:29 +0100
Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case

Restore network resistance to weird ICMP fragmentation needed messages
with next hop MTU equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
---
 net/ipv4/icmp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 38abe70..4c90d76 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -773,6 +773,7 @@ static bool icmp_tag_validation(int proto)
 static bool icmp_unreach(struct sk_buff *skb)
 {
 	const struct iphdr *iph;
+	unsigned short old_mtu;
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
@@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
+				/* Handle weird case where next hop MTU is
+				 * equal to or exceeding dropped packet size
+				 */
+				old_mtu = ntohs(iph->tot_len);
+				if (info >= old_mtu)
+					info = old_mtu - 2;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.7.3


[-- Attachment #3: ICMP discarting and sugesting 1500 2.pcapng --]
[-- Type: application/octet-stream, Size: 89664 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Regression w/ patch] Restore network resistance to weird ICMP messages
  2016-11-10 18:47 [Regression w/ patch] Restore network resistance to weird ICMP messages Vicente Jiménez
@ 2016-11-10 19:07 ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2016-11-10 19:07 UTC (permalink / raw)
  To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, linux-kernel, netdev


This is still not formed properly.

Your Subject line should be of the form:

	Subject: $Subsystem: $Description

Something like "icmp: Restore resistence to abnormal messages."

Also, your patch must be inline, right after the commit message,
rather than included as an attachment.

Finally your Fixes: line much be exactly right before the first
Signoff of your commit message.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression w/ patch] Restore network resistance to weird ICMP messages
  2016-11-10 10:52   ` Vicente Jiménez
@ 2016-11-10 14:48     ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2016-11-10 14:48 UTC (permalink / raw)
  To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Vicente Jiménez <googuy@gmail.com>
Date: Thu, 10 Nov 2016 11:52:01 +0100

> Corrected patch attached.
> Thanks for the advices.
> I was unaware of those style policies.

This is not how to submit a fixed patch.

You must make a new, fresh, list posting fully formed and with
a clean Subject line and commit message.

Not as a reply to the discussion.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Regression w/ patch] Restore network resistance to weird ICMP messages
  2016-11-10  1:22 ` David Miller
@ 2016-11-10 10:52   ` Vicente Jiménez
  2016-11-10 14:48     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Vicente Jiménez @ 2016-11-10 10:52 UTC (permalink / raw)
  To: David Miller
  Cc: Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

Corrected patch attached.
Thanks for the advices.
I was unaware of those style policies.

On Thu, Nov 10, 2016 at 2:22 AM, David Miller <davem@davemloft.net> wrote:
> From: Vicente Jiménez <googuy@gmail.com>
> Date: Mon, 7 Nov 2016 12:11:59 +0100
>
>> From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
>> From: Vicente Jimenez Aguilar <googuy@gmail.com>
>> Date: Mon, 31 Oct 2016 13:10:29 +0100
>> Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case
>>
>> Restore network resistance to weird ICMP fragmentation needed messages
>> with next hop MTU equal to (or exceeding) dropped packet size
>>
>> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
>> Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
>> ---
>>  net/ipv4/icmp.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>> index 38abe70..c0af1d2 100644
>> --- a/net/ipv4/icmp.c
>> +++ b/net/ipv4/icmp.c
>> @@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
>>       struct icmphdr *icmph;
>>       struct net *net;
>>       u32 info = 0;
>> +     unsigned short old_mtu;
>>
>>       net = dev_net(skb_dst(skb)->dev);
>>
>
> Order local variable declarations from longest to shortest line
> please.
>
>> +                             if ( info >= old_mtu )
>
> There should be no space after the '(' and before the ')' in this
> conditional.



-- 
saludos
vicente

[-- Attachment #2: 0001-ipv4-icmp-Fix-pMTU-handling-for-rarest-case.patch --]
[-- Type: text/x-patch, Size: 1273 bytes --]

From 88ac49fa287e2e0d3d3bc805dea1fec301aad1df Mon Sep 17 00:00:00 2001
From: Vicente Jimenez Aguilar <googuy@gmail.com>
Date: Mon, 31 Oct 2016 13:10:29 +0100
Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case

Restore network resistance to weird ICMP fragmentation needed messages
with next hop MTU equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
---
 net/ipv4/icmp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 38abe70..4c90d76 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -773,6 +773,7 @@ static bool icmp_tag_validation(int proto)
 static bool icmp_unreach(struct sk_buff *skb)
 {
 	const struct iphdr *iph;
+	unsigned short old_mtu;
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
@@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
+				/* Handle weird case where next hop MTU is
+				 * equal to or exceeding dropped packet size
+				 */
+				old_mtu = ntohs(iph->tot_len);
+				if (info >= old_mtu)
+					info = old_mtu - 2;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Regression w/ patch] Restore network resistance to weird ICMP messages
  2016-11-07 11:11 Vicente Jiménez
@ 2016-11-10  1:22 ` David Miller
  2016-11-10 10:52   ` Vicente Jiménez
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2016-11-10  1:22 UTC (permalink / raw)
  To: googuy; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel

From: Vicente Jiménez <googuy@gmail.com>
Date: Mon, 7 Nov 2016 12:11:59 +0100

> From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
> From: Vicente Jimenez Aguilar <googuy@gmail.com>
> Date: Mon, 31 Oct 2016 13:10:29 +0100
> Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case
> 
> Restore network resistance to weird ICMP fragmentation needed messages
> with next hop MTU equal to (or exceeding) dropped packet size
> 
> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
> ---
>  net/ipv4/icmp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 38abe70..c0af1d2 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
>  	struct icmphdr *icmph;
>  	struct net *net;
>  	u32 info = 0;
> +	unsigned short old_mtu;
>  
>  	net = dev_net(skb_dst(skb)->dev);
>  

Order local variable declarations from longest to shortest line
please.

> +				if ( info >= old_mtu )

There should be no space after the '(' and before the ')' in this
conditional.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Regression w/ patch] Restore network resistance to weird ICMP messages
@ 2016-11-07 11:11 Vicente Jiménez
  2016-11-10  1:22 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Vicente Jiménez @ 2016-11-07 11:11 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]

Handle weird ICMP fragmentation needed messages with next hop MTU
equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")

In a large corporate network, we spotted this weird ICMP message after
a long troubleshooting. See attached capture file. Those ICMP "network
unreachable - fragmentation needed and don't fragment bit set"
messages are sent by a router that drop 1500 bytes IP packets and fill
the next hop MTU ICMP field with 1500.

Those messages cause the TCP connection to stall but only on newer
kernels. Older kernels set path MTU to 1492 and communicates
successfully.

After checking code and commit history, I spotted how commit
46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012
changed ICMP messages handling by removing ip_rt_frag_needed function.

The relevant part of the ip_rt_frag_needed function that was removed is:

if (new_mtu < 68 || new_mtu >= old_mtu) {
        /* BSD 4.2 derived systems incorrectly adjust
         * tot_len by the IP header length, and report
         * a zero MTU in the ICMP message.
         */
        if (mtu == 0 &&
            old_mtu >= 68 + (iph->ihl << 2))
                old_mtu -= iph->ihl << 2;
        mtu = guess_mtu(old_mtu);
}


This condition handled the cases when next hop MTU where zero (less
than 68). Now this is handled by the protocol and fixed by commit
68b7107b6298 "ipv4: icmp: Fix pMTU handling for rare case".

But the rarest case when (next hop MTU) new_mtu >= old_mtu (dropped
packet length) was also removed. This commit restores this check.
Instead of using a table lookup like function guess_mtu uses, it just
try to set the path MTU decrementing by 2 bytes the dropped packet
size.

In our case, setting the path MTU to just 1498 (one iteration) worked.
This solution should converge in any case to a good value by small
steps. I don't think there's a need to a more complex solution.

The patched kernel worked perfectly setting the path MTU to 1498 from
the initial default interface value of 1500. This time I don't have a
capture file from inside the affected center, but all received packed
had a maximum size of 1498.

-- 
cheers
vicente

[-- Attachment #2: ICMP discarting and sugesting 1500 2.pcapng --]
[-- Type: application/x-pcapng, Size: 89664 bytes --]

[-- Attachment #3: 0001-ipv4-icmp-Fix-pMTU-handling-for-rarest-case.patch --]
[-- Type: text/x-patch, Size: 1241 bytes --]

From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
From: Vicente Jimenez Aguilar <googuy@gmail.com>
Date: Mon, 31 Oct 2016 13:10:29 +0100
Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case

Restore network resistance to weird ICMP fragmentation needed messages
with next hop MTU equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
---
 net/ipv4/icmp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 38abe70..c0af1d2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
+	unsigned short old_mtu;
 
 	net = dev_net(skb_dst(skb)->dev);
 
@@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
+				/* Handle weird case where next hop MTU is
+				 * equal to or exceeding dropped packet size
+				 */
+				old_mtu = ntohs(iph->tot_len);
+				if ( info >= old_mtu )
+					info = old_mtu - 2;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-11-10 19:07 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-10 18:47 [Regression w/ patch] Restore network resistance to weird ICMP messages Vicente Jiménez
2016-11-10 19:07 ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2016-11-07 11:11 Vicente Jiménez
2016-11-10  1:22 ` David Miller
2016-11-10 10:52   ` Vicente Jiménez
2016-11-10 14:48     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.