netdev.vger.kernel.org archive mirror
* [ipsec/xfrm] IPv6 fragmentation/path-mtu
@ 2019-03-18  0:03 Bram Yvahk
  2019-03-19 23:52 ` Bram Yvahk
  0 siblings, 1 reply; 3+ messages in thread
From: Bram Yvahk @ 2019-03-18  0:03 UTC
  To: netdev; +Cc: steffen.klassert, herbert, davem

When playing a bit with IPv6 and XFRM I ran into a possible
issue/edge case.

In my testing I used linux 4.14.95. I was planning on testing this
with the latest kernel and investigating it a bit more deeply, but so
far I've not been able to do so. The only reason I'm already
submitting this message is that there is a 'Linux IPsec workshop'
next week.

When path-mtu between the two ipsec gateways is 1280 (i.e. minimum
IPv6 mtu) and a client in the network attempts to send a larger
message, it receives an ICMPv6 PKT_TOOBIG message.
The problem: the mtu field in the message is set to 1198... This is
lower than the minimum IPv6 mtu and the client seems to ignore it.

(What I think should happen in this particular case: do not send a
 PKT_TOOBIG to the client but instead transmit fragmented IPv6 ESP
 packets to accommodate the path-mtu)



* Re: [ipsec/xfrm] IPv6 fragmentation/path-mtu
  2019-03-18  0:03 [ipsec/xfrm] IPv6 fragmentation/path-mtu Bram Yvahk
@ 2019-03-19 23:52 ` Bram Yvahk
  2019-03-20 21:06   ` Bram Yvahk
  0 siblings, 1 reply; 3+ messages in thread
From: Bram Yvahk @ 2019-03-19 23:52 UTC
  To: netdev; +Cc: steffen.klassert, herbert, davem

Bram Yvahk wrote:
> (What I think should happen in this particular case: do not send a
>  PKT_TOOBIG to the client but instead transmit fragmented IPv6 ESP
>  packets to accommodate the path-mtu)
A follow-up to clarify my thinking (since my original mail might not
be clear enough).

Let me first start by stating some of the (imo) obvious things:
- IPv4 can be fragmented by hops on the route
- IPv6 can only be fragmented by the originating source
- Minimum mtu for IPv4 is 576
- Minimum mtu for IPv6 is 1280
- IPsec has some overhead

Setup:
======

|----------|   |-----------|   |-------|   |-----------|   |----------|
| client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
|----------|   |-----------|   |-------|   |-----------|   |----------|

- testing with linux 4.14.95 (setup with more recent kernel is WIP)
- link mtu between client A and Gateway A: 1500
- link mtu between Gateway A and Hop H: 1500
- link mtu between Hop H and Gateway B: 1280
- link mtu between Gateway B and client B: 1500
- path-mtu between Gateway A and Gateway B: 1280
- IPsec tunnel over IPv6 between Gateway A and Gateway B
- tunneling IPv4 over the IPsec tunnel
- tunneling IPv6 over the IPsec tunnel
- testing with XFRM (not with VTI since this has issues)
- (ip_vti module not loaded)
- (ip6_vti module not loaded)


Example with IPv4:
==================

Let's first take a look and see what happens with IPv4.
(I know IPv4 can be fragmented by all hops but that's not relevant)

- path-mtu between 'Gateway A' and 'Gateway B' is unknown
- 'client A' sends an ICMP to 'client B': size 1300, DF bit *not* set
  * 'gateway A' encrypts this and transmits one IPv6 ESP packet
    (size of outgoing packet: 1380 bytes)
  * 'gateway A' receives PKT_TOOBIG ICMPv6 from 'Hop H' (max mtu: 1280)
  * 'gateway A' now knows the path-mtu
 
  (truncated) output from tcpdump:
    IP6: ESP(spi=0xeff48047,seq=0xa), length 1380
    IP6: ICMP6, packet too big, mtu 1280, length 1240

   
- path-mtu between 'Gateway A' and 'Gateway B' is known
- 'client A' sends an ICMP to 'client B': size 1300, DF bit *not* set
  * 'gateway A' encrypts this and transmits two fragmented IPv6 packets

  (truncated) output from tcpdump:
    IP6: frag (0|1232) ESP(spi=0xeff48047,seq=0xb), length 1232
    IP6: frag (1232|148)

==> the IPv4 packet was *not* fragmented, the encrypted data [which is
    the IPv4 packet] was transmitted as two fragmented packets by
    'Gateway A'. ('Gateway A' is the originator of the ESP packet)
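
The fragment sizes in the tcpdump output above follow from the 1280
path-mtu. A minimal sketch of that arithmetic in plain userspace C (this
is not the kernel's code; 'ipv6_fragment_layout' and 'struct frag' are
names made up for illustration), assuming the only headers in front of
the ESP data are the 40-byte IPv6 header and the 8-byte Fragment header:

```c
#include <assert.h>
#include <stddef.h>

#define IPV6_HDR_LEN   40u  /* fixed IPv6 header */
#define IPV6_FRAG_HLEN  8u  /* IPv6 Fragment extension header */

struct frag {
	unsigned int offset; /* byte offset into the fragmentable part */
	unsigned int len;    /* bytes of the fragmentable part carried */
};

/*
 * Split payload_len bytes (here: ESP header plus encrypted data) into
 * fragments that fit path_mtu. Returns the number of fragments written
 * to out, or 0 if out is too small.
 */
size_t ipv6_fragment_layout(unsigned int payload_len, unsigned int path_mtu,
			    struct frag *out, size_t max_frags)
{
	unsigned int room, offset = 0;
	size_t n = 0;

	assert(path_mtu >= 1280); /* IPv6 minimum link mtu */

	/* Fits in a single packet: no Fragment header needed. */
	if (IPV6_HDR_LEN + payload_len <= path_mtu) {
		if (max_frags < 1)
			return 0;
		out[0].offset = 0;
		out[0].len = payload_len;
		return 1;
	}

	/* Every fragment except the last carries a multiple of 8 bytes. */
	room = (path_mtu - IPV6_HDR_LEN - IPV6_FRAG_HLEN) & ~7u;

	while (offset < payload_len) {
		unsigned int len = payload_len - offset;

		if (len > room)
			len = room;
		if (n == max_frags)
			return 0;
		out[n].offset = offset;
		out[n].len = len;
		n++;
		offset += len;
	}
	return n;
}
```

For payload_len 1380 and path_mtu 1280 this gives two fragments,
(0|1232) and (1232|148), which is what the capture shows.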


Example with IPv6:
==================

Now let's compare this with IPv6.
Only the originating source can fragment the packets.

- path-mtu between 'Gateway A' and 'Gateway B' is unknown
- 'client A' sends an ICMPv6 to 'client B': size 1300
  * 'gateway A' encrypts this and transmits one IPv6 ESP packet
    (size of outgoing packet: 1396 bytes)
  * 'gateway A' receives PKT_TOOBIG ICMPv6 from 'Hop H' (max mtu: 1280)
  * 'gateway A' now knows the path-mtu
 
  (truncated) output from tcpdump:
    IP6: ESP(spi=0xeff48048,seq=0x5), length 1396
    IP6: ICMP6, packet too big, mtu 1280, length 1240

- 'client A' sends an ICMPv6 to 'client B': size 1300
  * 'client A' receives PKT_TOOBIG ICMPv6 from 'Gateway A': max 1198
    IP6: ICMP6, echo request, seq 1, length 1300
    IP6: ICMP6, packet too big, mtu 1198, length 1240
   
- 'gateway A' sending an ICMPv6 to 'client B': this now fails regardless
  of the size (even with -s 1)... (sendto call returns EINVAL); a ping
  from 'client A' to 'client B' still results in the PKT_TOOBIG; the only
  way to fix this appears to be to make the kernel forget the path-mtu
  [this might be another bug? I could understand large packets not
   getting through but small ones? -- I'll verify this on a more recent
   kernel]


What I would've expected to happen is that 'Gateway A' would send out
two fragmented IPv6 packets containing the encrypted data. 'Gateway A'
is the originator of the IPv6 ESP packet so it can fragment these.
This is similar to how it's done for IPv4. When the ESP is fragmented
then the IPv6 packet from 'client A' is left intact/not fragmented.

With my - limited - understanding of the IPv6 RFC I think this would
be allowed.

And just for the sake of argument: let's say the IPsec tunnel was
not using IPv6 but IPv4: would it then be OK to fragment the IPv4
ESP packets when the encrypted data is an IPv6 packet?


A very quick-and-dirty patch for which I do *not* know what impact
it has:

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index f112fef..066c311 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -684,14 +684,20 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
        struct crypto_aead *aead = x->data;
        u32 blksize = ALIGN(crypto_aead_blocksize(aead), 4);
        unsigned int net_adj;
+       int mtu2;
 
        if (x->props.mode != XFRM_MODE_TUNNEL)
                net_adj = sizeof(struct ipv6hdr);
        else
                net_adj = 0;
 
-       return ((mtu - x->props.header_len - crypto_aead_authsize(aead) -
+       mtu2 = ((mtu - x->props.header_len - crypto_aead_authsize(aead) -
                 net_adj) & ~(blksize - 1)) + net_adj - 2;
+
+       if (mtu2 < IPV6_MIN_MTU) {
+               return IPV6_MIN_MTU;
+       }
+       return mtu2;
 }
 
 static int esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,

=> with this patch: the IPv6 ESP packet is now fragmented.

i.e. a ping from 'client A' to 'client B': shows
    IP6: frag (0|1232) ESP(spi=0x410e6a38,seq=0x1a), length 1232
    IP6: frag (1232|68)

=> same as IPv4
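
The effect of the clamp can be checked outside the kernel. A minimal
userspace sketch of the formula from the diff, before and after the
patch; 'struct esp_params' and both function names are made up for
illustration, and the fields merely stand in for the values the kernel
reads from the xfrm state:

```c
#include <assert.h>

#define IPV6_MIN_MTU 1280

struct esp_params {
	int tunnel_mode;         /* nonzero for XFRM_MODE_TUNNEL */
	unsigned int header_len; /* stands in for x->props.header_len */
	unsigned int authsize;   /* stands in for crypto_aead_authsize() */
	unsigned int blksize;    /* block size, already aligned to 4 */
};

/* Same arithmetic as the unpatched return statement in the diff. */
unsigned int esp6_mtu_unpatched(const struct esp_params *p, unsigned int mtu)
{
	unsigned int net_adj = p->tunnel_mode ? 0 : 40; /* sizeof(struct ipv6hdr) */

	/* The mask below needs a power-of-two block size. */
	assert(p->blksize >= 4 && (p->blksize & (p->blksize - 1)) == 0);
	/* Guard against unsigned underflow in this userspace model. */
	assert(mtu >= p->header_len + p->authsize + net_adj + p->blksize);

	return ((mtu - p->header_len - p->authsize - net_adj) &
		~(p->blksize - 1)) + net_adj - 2;
}

/* Patched variant: never report less than the IPv6 minimum mtu. */
unsigned int esp6_mtu_patched(const struct esp_params *p, unsigned int mtu)
{
	unsigned int mtu2 = esp6_mtu_unpatched(p, mtu);

	return mtu2 < IPV6_MIN_MTU ? IPV6_MIN_MTU : mtu2;
}
```

With tunnel mode, header_len 64, authsize 12 and blksize 16 (assumed
values, e.g. what AES-CBC with HMAC-SHA1-96 would give; the algorithms
are not stated in this mail) the unpatched formula returns 1198 for
mtu 1280, the value seen in the PKT_TOOBIG, and the patched one
returns 1280.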



* Re: [ipsec/xfrm] IPv6 fragmentation/path-mtu
  2019-03-19 23:52 ` Bram Yvahk
@ 2019-03-20 21:06   ` Bram Yvahk
  0 siblings, 0 replies; 3+ messages in thread
From: Bram Yvahk @ 2019-03-20 21:06 UTC
  To: netdev; +Cc: steffen.klassert, herbert, davem

>
> What I would've expected to happen is that 'Gateway A' would send out
> two fragmented IPv6 packets containing the encrypted data. 'Gateway A'
> is the originator of the IPv6 ESP packet so it can fragment these.
> This is similar to how it's done for IPv4. When the ESP is fragmented
> then the IPv6 packet from 'client A' is left intact/not fragmented.
>
> With my - limited - understanding of the IPv6 RFC I think this would
> be allowed.

Parts from the IPv6 RFC that I think are relevant:

5. Packet Size Issues

   IPv6 requires that every link in the internet have an MTU of 1280
   octets or greater.  On any link that cannot convey a 1280-octet
   packet in one piece, link-specific fragmentation and reassembly must
   be provided at a layer below IPv6.
   
2. Terminology

   link      - a communication facility or medium over which nodes can
               communicate at the link layer, i.e., the layer
               immediately below IPv6.  Examples are Ethernets (simple
               or bridged); PPP links; X.25, Frame Relay, or ATM
               networks; and internet (or higher) layer "tunnels",
               such as tunnels over IPv4 or IPv6 itself.

*My* interpretation from this: an IPv6 IPsec tunnel is considered a "link"
in the IPv6 RFC.

This means that the mtu inside an IPsec tunnel - which tunnels IPv6
traffic - must be at least 1280 octets.

What this technically means for IPsec: when the path-mtu between the
two IPsec Gateways is 1280 then the IPsec tunnel should still provide
a 1280 octets mtu inside the IPsec tunnel.

The way it can do this is by transmitting fragmented IPv6 ESP packets.
[Data inside the tunnel is *not* fragmented]

Before continuing: is my understanding correct? and/or does everyone
agree with the above?

Is it reasonable to expect tunneling to work when path-mtu between
the IPsec gateways is 1280?


(In case it is reasonable then one can discuss what the mtu inside
 the tunnel should be when path-mtu is 1280 (or better put when
 path-mtu - ESP overhead < 1280) - but let's leave that discussion
 for later)
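
The condition "path-mtu - ESP overhead < 1280" can be made concrete
with a small sketch. It estimates the ESP length for a tunnel-mode
inner packet and checks whether the resulting outer IPv6 packet still
fits the path-mtu. This is a simplified userspace model, not kernel
code; 'struct esp_alg' and both function names are made up for
illustration, and the algorithm sizes are assumptions:

```c
#include <assert.h>

struct esp_alg {
	unsigned int ivlen;   /* IV bytes sent in front of the ciphertext */
	unsigned int blksize; /* cipher block size (padding alignment) */
	unsigned int icvlen;  /* integrity check value (auth tag) bytes */
};

/* ESP length (SPI through ICV) for an inner packet in tunnel mode. */
unsigned int esp_tunnel_len(const struct esp_alg *a, unsigned int inner_len)
{
	unsigned int clen;

	assert(a->blksize >= 4 && (a->blksize & (a->blksize - 1)) == 0);

	/* inner packet + pad-length byte + next-header byte, padded up */
	clen = (inner_len + 2 + a->blksize - 1) & ~(a->blksize - 1);

	return 8 /* SPI + sequence number */ + a->ivlen + clen + a->icvlen;
}

/* Nonzero if the outer IPv6 ESP packet does not fit path_mtu. */
int outer_needs_fragmentation(const struct esp_alg *a,
			      unsigned int inner_len, unsigned int path_mtu)
{
	return 40 /* outer IPv6 header */ + esp_tunnel_len(a, inner_len) > path_mtu;
}
```

Assuming ivlen 16, blksize 16 and icvlen 12, a 1280-byte inner packet
becomes 1332 bytes of ESP, i.e. a 1372-byte outer packet, so with a
path-mtu of 1280 'Gateway A' would have to fragment the outer packet
to offer a 1280 mtu inside the tunnel. Under the same assumptions the
largest inner packet that fits unfragmented is 1198 bytes.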

