From: Wenxin Wang
Date: Wed, 21 Nov 2018 10:01:38 +0800
Subject: question: frag_max_size not checked in ip_finish_output
To: kernelnewbies@kernelnewbies.org
Cc: Hideaki YOSHIFUJI, Alexey Kuznetsov, "David S. Miller"
List-Id: Learn about the Linux kernel
Message-ID: <20181121020138.rM7Q9FpKfn7YcL_JDnBbPX4D2eYn2xVCUTEa78NdT0Y@z>

Dear developers,

It seems that with defragmentation enabled, `ip_finish_output` doesn't honor `IPCB(skb)->frag_max_size`, while `ip6_finish_output` checks `IP6CB(skb)->frag_max_size`.

(Sorry for the repost: I found that I need to subscribe to the mailing list first, and I have also added the results of my experiment.)

The relevant code is here:

https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, `frag_max_size` prevents the forwarding routine from sending packets longer than the largest fragment received. I'm wondering why `ip_finish_output` doesn't do the same as `ip6_finish_output`, especially since `ip_fragment` and `ip_do_fragment`, called (indirectly) by `ip_finish_output` itself, both cap the output MTU at this `frag_max_size`.

I did an experiment with two connected machines: one acting as a router with IPv4/IPv6 NAT and an MTU of 1500, the other as a client behind the NAT with an MTU of 1280.
Since NAT was enabled on the router, it performs defragmentation. Using `traceroute`, the client sent UDP packets of length 1500, which the client itself fragmented. I captured packets on the ingress and egress ports of the NAT router; here is the output:

---
IPv4 with NAT (and defrag)

ingress:
09:26:39.030436 IP 192.168.1.2.44654 > 223.5.5.5.33434: UDP, bad length 1472 > 1248
09:26:39.030451 IP 192.168.1.2 > 223.5.5.5: ip-proto-17
egress:
09:26:39.030543 IP 202.38.101.2.58599 > 223.5.5.5.33437: UDP, length 1472
---
IPv6 with NAT (and defrag)

ingress:
09:18:26.947246 IP6 fdff::2 > 2001:250:3::1: frag (0|1232) 57242 > 33434: UDP, bad length 1452 > 1224
09:18:26.947262 IP6 fdff::2 > 2001:250:3::1: frag (1232|228)
egress:
09:18:26.947362 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (0|1232) 49616 > 33437: UDP, bad length 1452 > 1224
09:18:26.947365 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (1232|228)
---

It can be seen that after defragmentation, IPv6 keeps the outgoing fragments within `frag_max_size`, while IPv4 sends the reassembled packet out unfragmented.

I understand that IPv6 routers are not allowed to meddle with fragmentation, while IPv4 routers can at least further fragment packets. But judging from the behavior of `ip_fragment`, I think the IPv4 code is also trying to honor `frag_max_size`; it just doesn't check it when deciding whether to fragment at all.

Many thanks in advance! If I'm sending this to the wrong person or the wrong mailing list, please let me know. It's my first time asking Linux developers a question, and I'm sorry for the disturbance.

Thank you for making Linux great ;)

Sincerely,
Wenxin Wang

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
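
To make the difference described above concrete, here is a minimal user-space sketch of the two decisions as I understand them. It is not the kernel code: `struct pkt`, `v4_needs_frag` and `v6_needs_frag` are hypothetical names, GSO and the other conditions in the real functions are left out, and the numbers mentioned afterwards are illustrative values loosely based on the experiment (a 1500-byte reassembled packet, a 1500-byte egress MTU, and 1280 assumed as the largest fragment received).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few per-packet values involved;
 * this is not the kernel's struct sk_buff. */
struct pkt {
    unsigned int len;           /* length of the reassembled packet */
    unsigned int frag_max_size; /* largest fragment seen on input, 0 if none */
};

/* Simplified model of the IPv4 decision discussed above:
 * only the packet length is compared against the MTU. */
static bool v4_needs_frag(const struct pkt *p, unsigned int mtu)
{
    return p->len > mtu;
}

/* Simplified model of the IPv6 decision discussed above:
 * the packet is also fragmented when it exceeds frag_max_size. */
static bool v6_needs_frag(const struct pkt *p, unsigned int mtu)
{
    return p->len > mtu ||
           (p->frag_max_size && p->len > p->frag_max_size);
}
```

With `len = 1500`, `frag_max_size = 1280` and an MTU of 1500, `v4_needs_frag` returns false and `v6_needs_frag` returns true, which matches what the captures show: the IPv4 packet leaves the router whole, while the IPv6 packet is fragmented again.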