netdev.vger.kernel.org archive mirror
From: Bram Yvahk <bram-yvahk@mail.wizbit.be>
To: Steffen Klassert <steffen.klassert@secunet.com>
Cc: herbert@gondor.apana.org.au, davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH ipsec/vti 0/2] Fragmentation of IPv4 in VTI
Date: Fri, 22 Mar 2019 21:46:44 +0100
Message-ID: <5C9549B4.8020503@mail.wizbit.be>
In-Reply-To: <5C93D910.1080008@mail.wizbit.be>

Bram Yvahk wrote:
> Steffen Klassert wrote:
>> On Sun, Mar 17, 2019 at 11:37:55PM +0000, Bram Yvahk wrote:
>>> We've experienced an issue with VTI when the path-mtu is smaller than the size
>>> of the "client" packet.
>>>
>>> What happens: IPv4 packet from the client (i.e. another system in the LAN)
>>> attempts to transmit some data; IPv4 header shows that 'DF' bit is not set but
>>> still the client receives ICMPv4 "need-to-frag" message [which the client does
>>> not expect and ignores].
>>>
>>> Example: $ ping -s 1300 -M dont -c5 192.168.235.2
>>>     PING 192.168.235.3 (192.168.235.3) 1300(1328) bytes of data.
>>>     From 192.168.236.254 icmp_seq=1 Frag needed and DF set (mtu = 1214)
>>>     From 192.168.236.254 icmp_seq=2 Frag needed and DF set (mtu = 1214)
>>>     From 192.168.236.254 icmp_seq=3 Frag needed and DF set (mtu = 1214)
>>>     From 192.168.236.254 icmp_seq=4 Frag needed and DF set (mtu = 1214)
>>>     From 192.168.236.254 icmp_seq=5 Frag needed and DF set (mtu = 1214)
>>>
>>>     --- 192.168.235.3 ping statistics ---
>>>     5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 3999ms
>> Hm, this works here. Can you show how you setup the vti device?
>> Some tunnel configuration options (set ttl etc.) force to have
>> the DF bit set.
>
> I will provide these details tomorrow.
> What I can say is that ttl was set to inherit.
>

vti device is created (on Gateway A) using:
$ ip tun add name vti0 mode vti ikey 1 okey 1 local <ip gateway A>
$ ip link show dev vti0
46: vti0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip <ip gateway A> brd 0.0.0.0
$ ip tun show name vti0            
vti0: ip/ip remote any local <ip gateway A> ttl inherit key 1
   
[I've also tried the setup with mtu 1400; the behaviour is the same]

xfrm state:
src <ip gateway B> dst <ip gateway A>
        proto esp spi 0xcd76a4a9 reqid 16389 mode tunnel
        replay-window 32 flag nopmtudisc af-unspec
        auth-trunc hmac(sha1) 0x08e1ce16b1f7f9039f9cc7421cf61010c029efc3 96
        enc cbc(aes) 0x22c7aacd9680a10a52b0c5670b7d850c35ba17f7c7dc6c963252cdc311b1f4d5
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
src <ip gateway A> dst <ip gateway B>
        proto esp spi 0x8f2988c7 reqid 16389 mode tunnel
        replay-window 32 flag nopmtudisc af-unspec
        auth-trunc hmac(sha1) 0x229bbe490606ddcc6a68332babd498001591c6bf 96
        enc cbc(aes) 0xd598dba419bfc45232580e54d517aae6a77c3328a51ebb3321802b89cc51ae43
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(same behaviour with/without nopmtudisc; nopmtudisc only makes a
 difference for packets from 'client A' that *do* have the DF bit set)

>
> When testing this there is one important bit - which in hindsight I
> should've included in the previous message - the (IPsec) Gateway A
> needs to know the path-mtu to (IPsec) Gateway B.
>
> Some ways to accomplish this:
> - transmit an ICMP packet with the DF bit set and a larger packet size
>   from Gateway A to Gateway B
> - ensure the "nopmtudisc" option is *not* set in the xfrm state
>   and then let client A transmit an ICMP packet *with* the DF bit set
>   to client B. [when "nopmtudisc" is set then all outgoing IPv4 ESP
>   packets have the DF bit cleared, when "nopmtudisc" is not set then
>   the DF bit is copied from the client packet]
> 
> For testing purposes I recommend doing the ping from Gateway A to
> Gateway B. (Otherwise tcpdumps/traffic get a bit more confusing.)
>
> A more in-depth description of what happens:
>
> Setup:
> ======
>
> |----------|   |-----------|   |-------|   |-----------|   |----------|
> | client A |---| Gateway A |---| Hop H |---| Gateway B |---| client B |
> |----------|   |-----------|   |-------|   |-----------|   |----------|
>
> - testing with linux 4.14.95 (setup with more recent kernel is WIP)
> - link mtu between client A and Gateway A: 1500
> - link mtu between Gateway A and Hop H: 1500
> - link mtu between Hop H and Gateway B: 1280
> - link mtu between Gateway B and client B: 1500
> - path-mtu between Gateway A and Gateway B: 1280
> - IPsec tunnel over *IPv4* between Gateway A and Gateway B
> - tunneling IPv4 over the IPsec tunnel
> - testing with VTI
>
> Scenario:
> =========
>
> Before starting it's important to ensure that:
> - Gateway A does *not* know the path-mtu to Gateway B
> - Client A does *not* know the path-mtu to Gateway B

On Gateway A:

$ ip route get <ip of gateway B>
<ip gateway B> via <hop H> dev eth1 src <ip gateway A> uid 0
    cache
=> no mtu shown --> path-mtu not yet known
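
The "is the path-mtu cached yet?" check above can be automated when repeating the test. A minimal sketch, assuming the `ip route get` output has the shape shown in this mail; the helper name `cached_pmtu` and the addresses are placeholders, and forms such as `mtu lock N` are not handled:

```python
import re
from typing import Optional


def cached_pmtu(route_get_output: str) -> Optional[int]:
    """Return the cached path MTU from `ip route get` output, or None if absent."""
    # Only the "mtu <number>" form seen in this thread is recognised.
    match = re.search(r"\bmtu\s+(\d+)\b", route_get_output)
    return int(match.group(1)) if match else None


# Placeholder addresses (documentation range); only the "cache" line matters.
before = "192.0.2.2 via 192.0.2.1 dev eth1 src 192.0.2.10 uid 0\n    cache\n"
after = ("192.0.2.2 via 192.0.2.1 dev eth1 src 192.0.2.10 uid 0\n"
         "    cache expires 17sec mtu 1280\n")

print(cached_pmtu(before))  # None
print(cached_pmtu(after))   # 1280
```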

>
> * Step 1: client A: $ ping -M dont -s 1300 ip_of_client_B
>   - IPv4 ICMP packet of client A does not have DF bit set
>   - IPv4 ESP packet of Gateway A does not have DF bit set
>   - Hop H receives an IPv4 ESP packet that is too large for the link-mtu
>     between Hop H and Gateway B: it fragments the IPv4 ESP packet.
>   - Gateway B receives 2 IPv4 fragmented packets
>   - (Client B receives one IPv4 ICMP packet from client A)

tcpdump on Gateway A:
- from client A it receives:
    IP (tos 0x0, ttl 64, id 46797, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 6855, seq 1, length 1308

- it transmits (to Gateway B):
    IP (tos 0x0, ttl 64, id 10932, offset 0, flags [none], proto ESP (50), length 1400)
        gateway_A > gateway_B: ESP(spi=0x8f2988c7,seq=0x3), length 1380
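
The jump from 1328 to 1400 bytes is the ESP tunnel-mode overhead. A rough sketch of the arithmetic, assuming what the xfrm state above suggests: cbc(aes) with a 16-byte IV and 16-byte block, hmac(sha1) truncated to 96 bits (12-byte ICV), no UDP encapsulation and no IP options. The function name is made up for illustration:

```python
def esp_tunnel_len(inner_len: int, iv_len: int = 16, block: int = 16,
                   icv_len: int = 12, outer_ip: int = 20, esp_hdr: int = 8) -> int:
    """Total length of the outer IPv4 packet for an inner packet of inner_len bytes."""
    # ESP trailer adds a pad-length byte and a next-header byte; the result
    # is padded up to a multiple of the cipher block size.
    padded = -(-(inner_len + 2) // block) * block
    return outer_ip + esp_hdr + iv_len + padded + icv_len


print(esp_tunnel_len(1328))  # 1400
```

The ESP length of 1380 reported by tcpdump is this total minus the 20-byte outer IPv4 header.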

tcpdump on Gateway B:
- it receives (from Gateway A):
    IP (tos 0x0, ttl 63, id 10932, offset 0, flags [+], proto ESP (50), length 1276)
        gateway_A > gateway_B: ESP(spi=0x8f2988c7,seq=0x3), length 1256
    IP (tos 0x0, ttl 63, id 10932, offset 1256, flags [none], proto ESP (50), length 144)
        gateway_A > gateway_B: ip-proto-50
- it transmits (to client B):
    IP (tos 0x0, ttl 62, id 46797, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 6855, seq 1, length 1308

=> Hop H fragmented the IPv4 packets. This is expected: DF bit is not
   set on ESP packets and Gateway A does not know path-mtu to Gateway B

>
> * Step 2: Gateway A: $ ping -M do -s 1300 ip_of_gateway_B
>   - IPv4 ICMP packet of Gateway A does have DF bit set
>   - Gateway A receives a 'need to frag' ICMP from Hop H

tcpdump on Gateway A:

- it transmits (local packet - to Gateway B):
    IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1328)
        gateway_A > gateway_B: ICMP echo request, id 28176, seq 1, length 1308
- it receives (from Hop H):
    IP (tos 0xc0, ttl 64, id 52788, offset 0, flags [none], proto ICMP (1), length 576)
        hop_H > gateway_A: ICMP 1.1.235.254 unreachable - need to frag (mtu 1280), length 556
            IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 1328)
                gateway_A > gateway_B: ICMP echo request, id 28176, seq 1, length 1308

=> Hop H sends a need-to-frag ICMP (mtu 1280). This is expected: the DF
   bit is set on the ICMP packet, so Hop H must not fragment it.
  
on Gateway A:
$ ip route get <ip of gateway B>
<ip gateway B> via <hop H> dev eth1 src <ip gateway A> uid 0
    cache expires 17sec mtu 1280
=> path-mtu known to be 1280


> * Step 3: client A: $ ping -M dont -s 1300 ip_of_client_B
>   - IPv4 ICMP packet of client A does not have DF bit set
>   - Gateway A: it processes this packet in the VTI module, detects that
>     packet size > path-mtu and then sends a 'need to frag' ICMP to
>     client A. [this is the code I patched]

tcpdump on Gateway A:
- from client A it receives:
    IP (tos 0x0, ttl 64, id 46798, offset 0, flags [none], proto ICMP (1), length 1328)
        client_A > client_B: ICMP echo request, id 7063, seq 1, length 1308
       
- it transmits to client A:
    IP (tos 0xc0, ttl 64, id 59290, offset 0, flags [none], proto ICMP (1), length 576)
        gateway_A > client_A: ICMP client_B unreachable - need to frag (mtu 1214), length 556
            IP (tos 0x0, ttl 63, id 46798, offset 0, flags [none], proto ICMP (1), length 1328)
                client_A > client_B: ICMP echo request, id 7063, seq 1, length 1308
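
The advertised mtu of 1214 is consistent with the 1280-byte path-mtu minus the ESP overhead. A back-of-the-envelope sketch under the same assumptions as the xfrm state above (16-byte IV and block for cbc(aes), 12-byte ICV for hmac(sha1)-96, tunnel mode, no UDP encapsulation); the function name is hypothetical and this is an approximation of the calculation, not the kernel source:

```python
def esp_tunnel_payload_mtu(path_mtu: int, iv_len: int = 16, block: int = 16,
                           icv_len: int = 12, outer_ip: int = 20, esp_hdr: int = 8) -> int:
    """Largest inner packet that still fits in path_mtu after ESP encapsulation."""
    # Space left for the encrypted part once the fixed overhead is removed.
    room = path_mtu - outer_ip - esp_hdr - iv_len - icv_len
    # Round down to the cipher block size, then reserve the two trailer bytes
    # (pad length and next header).
    return room // block * block - 2


print(esp_tunnel_payload_mtu(1280))  # 1214
```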

>     
> => the critical bit in the above is that Gateway A learns
>    the path-mtu to Gateway B. If it doesn't then it keeps
>    assuming path-mtu is 1500 and the check in VTI will not
>    trigger (since path-mtu of 1500 > packet size)
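
To make the difference explicit, here is a toy model of the two behaviours. It is only an illustration of the decision described in this thread, not the VTI code itself; the names `Action`, `observed` and `expected` are invented, and `mtu` stands for whatever effective MTU the gateway compares the packet against:

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    FRAGMENT = "fragment"
    ICMP_FRAG_NEEDED = "icmp frag needed"


def observed(pkt_len: int, df: bool, mtu: int) -> Action:
    """Behaviour seen in step 3: the DF bit is not consulted."""
    return Action.FORWARD if pkt_len <= mtu else Action.ICMP_FRAG_NEEDED


def expected(pkt_len: int, df: bool, mtu: int) -> Action:
    """Behaviour the client expects: only DF packets trigger the ICMP error."""
    if pkt_len <= mtu:
        return Action.FORWARD
    return Action.ICMP_FRAG_NEEDED if df else Action.FRAGMENT


# Step 1: path-mtu unknown, gateway assumes 1500, so the check does not trigger.
print(observed(1328, False, 1500))  # Action.FORWARD
# Step 3: path-mtu learned, DF not set.
print(observed(1328, False, 1214))  # Action.ICMP_FRAG_NEEDED
print(expected(1328, False, 1214))  # Action.FRAGMENT
```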



