[PATCH 1/3] xfrm: Fix off by one in the replay advance functions

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/3] xfrm: Fix off by one in the replay advance functions
@ 2011-06-06  6:46 Steffen Klassert
  2011-06-06  6:48 ` [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data Steffen Klassert
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Steffen Klassert @ 2011-06-06  6:46 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev

We may write 4 byte too much when we reinitialize the anti replay
window in the replay advance functions. This patch fixes this by
adjusting the last index of the initialization loop.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_replay.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index 47f1b86..b11ea69 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -265,7 +265,7 @@ static void xfrm_replay_advance_bmp(struct xfrm_state *x, __be32 net_seq)
 			bitnr = bitnr & 0x1F;
 			replay_esn->bmp[nr] |= (1U << bitnr);
 		} else {
-			nr = replay_esn->replay_window >> 5;
+			nr = (replay_esn->replay_window - 1) >> 5;
 			for (i = 0; i <= nr; i++)
 				replay_esn->bmp[i] = 0;
 
@@ -471,7 +471,7 @@ static void xfrm_replay_advance_esn(struct xfrm_state *x, __be32 net_seq)
 			bitnr = bitnr & 0x1F;
 			replay_esn->bmp[nr] |= (1U << bitnr);
 		} else {
-			nr = replay_esn->replay_window >> 5;
+			nr = (replay_esn->replay_window - 1) >> 5;
 			for (i = 0; i <= nr; i++)
 				replay_esn->bmp[i] = 0;
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-06  6:46 [PATCH 1/3] xfrm: Fix off by one in the replay advance functions Steffen Klassert
@ 2011-06-06  6:48 ` Steffen Klassert
  2011-06-06  7:38   ` Eric Dumazet
  2011-06-06  6:48 ` [PATCH 3/3] ipv4: Fix packet size calculation for raw " Steffen Klassert
  2011-06-08  4:15 ` [PATCH 1/3] xfrm: Fix off by one in the replay advance functions David Miller
  2 siblings, 1 reply; 13+ messages in thread
From: Steffen Klassert @ 2011-06-06  6:48 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev, Eric Dumazet

Git commit 59104f06 (ip: take care of last fragment in ip_append_data)
added a check to see if we exceed the mtu when we add trailer_len.
However, the mtu is already subtracted by trailer_len when the
xfrm transfomation bundles are set up. So IPsec packets with mtu
size get fragmented, or if the DF bit is set the packets will not
be send even though they match the mtu perfectly fine. This patch
actually reverts commit 59104f06.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_output.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 98af369..a16859b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -888,12 +888,9 @@ alloc_new_skb:
 			 * because we have no idea what fragment will be
 			 * the last.
 			 */
-			if (datalen == length + fraggap) {
+			if (datalen == length + fraggap)
 				alloclen += rt->dst.trailer_len;
-				/* make sure mtu is not reached */
-				if (datalen > mtu - fragheaderlen - rt->dst.trailer_len)
-					datalen -= ALIGN(rt->dst.trailer_len, 8);
-			}
+
 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len + 15,
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] ipv4: Fix packet size calculation for raw IPsec packets in __ip_append_data
  2011-06-06  6:46 [PATCH 1/3] xfrm: Fix off by one in the replay advance functions Steffen Klassert
  2011-06-06  6:48 ` [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data Steffen Klassert
@ 2011-06-06  6:48 ` Steffen Klassert
  2011-06-09 21:50   ` David Miller
  2011-06-08  4:15 ` [PATCH 1/3] xfrm: Fix off by one in the replay advance functions David Miller
  2 siblings, 1 reply; 13+ messages in thread
From: Steffen Klassert @ 2011-06-06  6:48 UTC (permalink / raw)
  To: David Miller, Herbert Xu; +Cc: netdev

We assume that transhdrlen is positive on the first fragment
which is wrong for raw packets. So we don't add exthdrlen to the
packet size for raw packets. This leads to a reallocation on IPsec
because we have not enough headroom on the skb to place the IPsec
headers. This patch fixes this by adding exthdrlen to the packet
size whenever the send queue of the socket is empty. This issue was
introduced with git commit 1470ddf7 (inet: Remove explicit write
references to sk/inet in ip_append_data)

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_output.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a16859b..6b894d4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -799,7 +799,9 @@ static int __ip_append_data(struct sock *sk,
 	int csummode = CHECKSUM_NONE;
 	struct rtable *rt = (struct rtable *)cork->dst;
 
-	exthdrlen = transhdrlen ? rt->dst.header_len : 0;
+	skb = skb_peek_tail(queue);
+
+	exthdrlen = !skb ? rt->dst.header_len : 0;
 	length += exthdrlen;
 	transhdrlen += exthdrlen;
 	mtu = cork->fragsize;
@@ -825,8 +827,6 @@ static int __ip_append_data(struct sock *sk,
 	    !exthdrlen)
 		csummode = CHECKSUM_PARTIAL;
 
-	skb = skb_peek_tail(queue);
-
 	cork->length += length;
 	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-06  6:48 ` [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data Steffen Klassert
@ 2011-06-06  7:38   ` Eric Dumazet
  2011-06-06  8:52     ` Steffen Klassert
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2011-06-06  7:38 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, Herbert Xu, netdev

Le lundi 06 juin 2011 à 08:48 +0200, Steffen Klassert a écrit :
> Git commit 59104f06 (ip: take care of last fragment in ip_append_data)
> added a check to see if we exceed the mtu when we add trailer_len.
> However, the mtu is already subtracted by trailer_len when the
> xfrm transfomation bundles are set up. So IPsec packets with mtu
> size get fragmented, or if the DF bit is set the packets will not
> be send even though they match the mtu perfectly fine. This patch
> actually reverts commit 59104f06.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---

Woh, I am afraid I wont have time in following days to check your
assertion.

What about original problem then, how should we fix it ?

We do have some cases where at least one fragment (the last one) is
oversized.

I remember I used Nick Bowler scripts at that time, I might find them
again...

[PATCH] ip : take care of last fragment in ip_append_data

While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :

- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:

        if (datalen == length + fraggap)
                alloclen += rt->dst.trailer_len;

Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)

In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.

In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-06  7:38   ` Eric Dumazet
@ 2011-06-06  8:52     ` Steffen Klassert
  2011-06-07  5:06       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Steffen Klassert @ 2011-06-06  8:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Herbert Xu, netdev

On Mon, Jun 06, 2011 at 09:38:19AM +0200, Eric Dumazet wrote:
> 
> Woh, I am afraid I wont have time in following days to check your
> assertion.

My test setup was the following:

I use an IPsec tunnel with tunnel endpoints 192.168.1.1 and 192.168.1.2

Then I do at 192.168.1.2

ping -c1 -M do -s 1410 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
>From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1438)

--- 192.168.1.1 ping statistics ---
0 packets transmitted, 0 received, +1 errors

So the packet matches the mtu but it is not send.
I used a kernel with your patch as head commit.

Reverting your patch (going one commit deeper in the history):

ping -c1 -M do -s 1410 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
1418 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=3.01 ms

--- 192.168.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.014/3.014/3.014/0.000 ms

> 
> What about original problem then, how should we fix it ?
> 

Hm, I don't know. I'll try to reproduce it here.

> We do have some cases where at least one fragment (the last one) is
> oversized.

trailer_len is used only on IPsec so the poroblem exists only when
using IPsec, right?

> 
> I remember I used Nick Bowler scripts at that time, I might find them
> again...

Would be nice if you could provide these scripts and some informations
on how to reproduce the problem.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-06  8:52     ` Steffen Klassert
@ 2011-06-07  5:06       ` Eric Dumazet
  2011-06-08  5:30         ` Steffen Klassert
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2011-06-07  5:06 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, Herbert Xu, netdev

Le lundi 06 juin 2011 à 10:52 +0200, Steffen Klassert a écrit :
> On Mon, Jun 06, 2011 at 09:38:19AM +0200, Eric Dumazet wrote:
> > 
> > Woh, I am afraid I wont have time in following days to check your
> > assertion.
> 
> My test setup was the following:
> 
> I use an IPsec tunnel with tunnel endpoints 192.168.1.1 and 192.168.1.2
> 
> Then I do at 192.168.1.2
> 
> ping -c1 -M do -s 1410 192.168.1.1
> 
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1438)
> 
> --- 192.168.1.1 ping statistics ---
> 0 packets transmitted, 0 received, +1 errors
> 
> So the packet matches the mtu but it is not send.
> I used a kernel with your patch as head commit.
> 
> Reverting your patch (going one commit deeper in the history):
> 
> ping -c1 -M do -s 1410 192.168.1.1
> 
> PING 192.168.1.1 (192.168.1.1) 1410(1438) bytes of data.
> 1418 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=3.01 ms
> 
> --- 192.168.1.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 3.014/3.014/3.014/0.000 ms
> 
> > 
> > What about original problem then, how should we fix it ?
> > 
> 
> Hm, I don't know. I'll try to reproduce it here.
> 
> > We do have some cases where at least one fragment (the last one) is
> > oversized.
> 
> trailer_len is used only on IPsec so the poroblem exists only when
> using IPsec, right?
> 
> > 
> > I remember I used Nick Bowler scripts at that time, I might find them
> > again...
> 
> Would be nice if you could provide these scripts and some informations
> on how to reproduce the problem.
> 

Nick mail was :

http://www.spinics.net/lists/netdev/msg141308.html

Unfortunatly I could not find on my machines where I put my own
scripts...

Not a big deal, I suspect we can revert my commit if you say it added a
regression :)

Thanks



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] xfrm: Fix off by one in the replay advance functions
  2011-06-06  6:46 [PATCH 1/3] xfrm: Fix off by one in the replay advance functions Steffen Klassert
  2011-06-06  6:48 ` [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data Steffen Klassert
  2011-06-06  6:48 ` [PATCH 3/3] ipv4: Fix packet size calculation for raw " Steffen Klassert
@ 2011-06-08  4:15 ` David Miller
  2 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2011-06-08  4:15 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 6 Jun 2011 08:46:03 +0200

> We may write 4 byte too much when we reinitialize the anti replay
> window in the replay advance functions. This patch fixes this by
> adjusting the last index of the initialization loop.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-07  5:06       ` Eric Dumazet
@ 2011-06-08  5:30         ` Steffen Klassert
  2011-06-09 21:47           ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Steffen Klassert @ 2011-06-08  5:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Herbert Xu, netdev

On Tue, Jun 07, 2011 at 07:06:57AM +0200, Eric Dumazet wrote:
> 
> Nick mail was :
> 
> http://www.spinics.net/lists/netdev/msg141308.html
> 

Thanks for providing these informations.

> Unfortunatly I could not find on my machines where I put my own
> scripts...
> 
> Not a big deal, I suspect we can revert my commit if you say it added a
> regression :)
> 

In between I can confirm that we get the slowpath problem back with my
patch, so we still have a bug somewhere. Reverting your commit would
be just a band aid. I think it is better to find the bug and do a real
fix instead. Unfortunatly I fear I'm not able to track it down before
my vacation that starts tomorrow. I'll continue to work at it once I'm
back...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-08  5:30         ` Steffen Klassert
@ 2011-06-09 21:47           ` David Miller
  2011-06-22 11:02             ` Steffen Klassert
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2011-06-09 21:47 UTC (permalink / raw)
  To: steffen.klassert; +Cc: eric.dumazet, herbert, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 8 Jun 2011 07:30:20 +0200

> In between I can confirm that we get the slowpath problem back with my
> patch, so we still have a bug somewhere. Reverting your commit would
> be just a band aid. I think it is better to find the bug and do a real
> fix instead. Unfortunatly I fear I'm not able to track it down before
> my vacation that starts tomorrow. I'll continue to work at it once I'm
> back...

Thanks Steffen, looking forward to this.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] ipv4: Fix packet size calculation for raw IPsec packets in __ip_append_data
  2011-06-06  6:48 ` [PATCH 3/3] ipv4: Fix packet size calculation for raw " Steffen Klassert
@ 2011-06-09 21:50   ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2011-06-09 21:50 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 6 Jun 2011 08:48:47 +0200

> We assume that transhdrlen is positive on the first fragment
> which is wrong for raw packets. So we don't add exthdrlen to the
> packet size for raw packets. This leads to a reallocation on IPsec
> because we have not enough headroom on the skb to place the IPsec
> headers. This patch fixes this by adding exthdrlen to the packet
> size whenever the send queue of the socket is empty. This issue was
> introduced with git commit 1470ddf7 (inet: Remove explicit write
> references to sk/inet in ip_append_data)
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-09 21:47           ` David Miller
@ 2011-06-22 11:02             ` Steffen Klassert
  2011-06-28  3:39               ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Steffen Klassert @ 2011-06-22 11:02 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, herbert, netdev

On Thu, Jun 09, 2011 at 02:47:04PM -0700, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Wed, 8 Jun 2011 07:30:20 +0200
> 
> > In between I can confirm that we get the slowpath problem back with my
> > patch, so we still have a bug somewhere. Reverting your commit would
> > be just a band aid. I think it is better to find the bug and do a real
> > fix instead. Unfortunatly I fear I'm not able to track it down before
> > my vacation that starts tomorrow. I'll continue to work at it once I'm
> > back...
> 
> Thanks Steffen, looking forward to this.

In between I found the problem. In ip_setup_cork() we take the mtu on the
base of dst_mtu(rt->dst.path) and assign it to cork->fragsize which is
used as the mtu in __ip_append_data(). The path dst_entry is a routing
dst_entry that does not take the IPsec header and trailer overhead into
account. So if we build an IPsec packet based on this mtu it might be to
big after the IPsec transformations are applied. If we take the actual
IPsec mtu from dst_mtu(&rt->dst) and adapt the exthdr handling to this,
it works as expected. So I'll send two patches, one that reverts Eric's
patch and one that fixes the slowpath issue.

While reading through the code of __ip_append_data() I noticed that we
might use ip_ufo_append_data() for packets that will be IPsec transformed
later, is this ok? I don't know how ufo handling works, but I would guess
that it expects an udp header and not an IPsec header as the packets
transport header.

I found another funny thing when I tested my patches:

I use an IPsec tunnel with tunnel endpoints 192.168.1.1 and 192.168.1.2

Then I do at 192.168.1.2

ping -c20 -M do -s 1500 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 1500(1528) bytes of data.
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1438)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1438)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1374)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1310)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1246)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1182)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1118)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 1054)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 990)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 926)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 862)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 798)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 734)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 670)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 606)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 552)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 494)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 494)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 494)
>From 192.168.1.1 icmp_seq=1 Frag needed and DF set (mtu = 494)

--- 192.168.1.1 ping statistics ---
0 packets transmitted, 0 received, +20 errors

These packets are locally generated and not from 192.168.1.1 as they
claim to be. I think the problem is the following:

The IPsec mtu is 1438 here, so the first packet is too big.
xfrm4_tunnel_check_size() notices this and sends a ICMP_FRAG_NEEDED
packet that announces a mtu of 1438 to the original sender of the ping
packet. Unfortunately the sender is a local address, it's the IPsec
tunnel entry point. So we update the mtu for this connection to 1438.
Now, with the next packet xfrm_bundle_ok() notices that the path mtu has
changed, so it subtracts the IPsec overhead from the mtu a second time
and we end up with a mtu of 1374. This game goes until we reach a minimal
mtu of 494.

Unfortunately I don't know how to fix this. Any ideas?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-22 11:02             ` Steffen Klassert
@ 2011-06-28  3:39               ` David Miller
  2011-06-30  9:06                 ` Steffen Klassert
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2011-06-28  3:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: eric.dumazet, herbert, netdev

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 22 Jun 2011 13:02:19 +0200

> In between I found the problem. In ip_setup_cork() we take the mtu on the
> base of dst_mtu(rt->dst.path) and assign it to cork->fragsize which is
> used as the mtu in __ip_append_data(). The path dst_entry is a routing
> dst_entry that does not take the IPsec header and trailer overhead into
> account. So if we build an IPsec packet based on this mtu it might be to
> big after the IPsec transformations are applied. If we take the actual
> IPsec mtu from dst_mtu(&rt->dst) and adapt the exthdr handling to this,
> it works as expected. So I'll send two patches, one that reverts Eric's
> patch and one that fixes the slowpath issue.

Thanks for doing this work Steffen.

> While reading through the code of __ip_append_data() I noticed that we
> might use ip_ufo_append_data() for packets that will be IPsec transformed
> later, is this ok? I don't know how ufo handling works, but I would guess
> that it expects an udp header and not an IPsec header as the packets
> transport header.

Indeed, it could be a real problem.

> The IPsec mtu is 1438 here, so the first packet is too big.
> xfrm4_tunnel_check_size() notices this and sends a ICMP_FRAG_NEEDED
> packet that announces a mtu of 1438 to the original sender of the ping
> packet. Unfortunately the sender is a local address, it's the IPsec
> tunnel entry point. So we update the mtu for this connection to 1438.
> Now, with the next packet xfrm_bundle_ok() notices that the path mtu has
> changed, so it subtracts the IPsec overhead from the mtu a second time
> and we end up with a mtu of 1374. This game goes until we reach a minimal
> mtu of 494.
> 
> Unfortunately I don't know how to fix this. Any ideas?

If the generic PMTU handling in net/ipv4/route.c is adjusting the MTU
for the IPSEC path's route, that would be the problem.

In the case of IPSEC encapsulation, tunnels, and similar, only the
tunnel or the XFRM layer should be processing the PMTU notification.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data
  2011-06-28  3:39               ` David Miller
@ 2011-06-30  9:06                 ` Steffen Klassert
  0 siblings, 0 replies; 13+ messages in thread
From: Steffen Klassert @ 2011-06-30  9:06 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, herbert, netdev

On Mon, Jun 27, 2011 at 08:39:38PM -0700, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Wed, 22 Jun 2011 13:02:19 +0200
> 
> > While reading through the code of __ip_append_data() I noticed that we
> > might use ip_ufo_append_data() for packets that will be IPsec transformed
> > later, is this ok? I don't know how ufo handling works, but I would guess
> > that it expects an udp header and not an IPsec header as the packets
> > transport header.
> 
> Indeed, it could be a real problem.

Ok, so I'll send a patch to fix it up.

> 
> > The IPsec mtu is 1438 here, so the first packet is too big.
> > xfrm4_tunnel_check_size() notices this and sends a ICMP_FRAG_NEEDED
> > packet that announces a mtu of 1438 to the original sender of the ping
> > packet. Unfortunately the sender is a local address, it's the IPsec
> > tunnel entry point. So we update the mtu for this connection to 1438.
> > Now, with the next packet xfrm_bundle_ok() notices that the path mtu has
> > changed, so it subtracts the IPsec overhead from the mtu a second time
> > and we end up with a mtu of 1374. This game goes until we reach a minimal
> > mtu of 494.
> > 
> > Unfortunately I don't know how to fix this. Any ideas?
> 
> If the generic PMTU handling in net/ipv4/route.c is adjusting the MTU
> for the IPSEC path's route, that would be the problem.
> 

Yes, this is exactly what happens. We use icmp_send() to notify about
message size errors even for locally generated packets, this leads to
an incorrect pmtu update. Changing this to use ip_local_error() if we
have socket context fixes the problem. I'll send a patch for this too.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-06-30  9:05 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-06  6:46 [PATCH 1/3] xfrm: Fix off by one in the replay advance functions Steffen Klassert
2011-06-06  6:48 ` [PATCH 2/3] ipv4: Fix packet size calculation for IPsec packets in __ip_append_data Steffen Klassert
2011-06-06  7:38   ` Eric Dumazet
2011-06-06  8:52     ` Steffen Klassert
2011-06-07  5:06       ` Eric Dumazet
2011-06-08  5:30         ` Steffen Klassert
2011-06-09 21:47           ` David Miller
2011-06-22 11:02             ` Steffen Klassert
2011-06-28  3:39               ` David Miller
2011-06-30  9:06                 ` Steffen Klassert
2011-06-06  6:48 ` [PATCH 3/3] ipv4: Fix packet size calculation for raw " Steffen Klassert
2011-06-09 21:50   ` David Miller
2011-06-08  4:15 ` [PATCH 1/3] xfrm: Fix off by one in the replay advance functions David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.