From: Wenxin Wang <wenxin.wang94@gmail.com>
To: kernelnewbies@kernelnewbies.org
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	"David S. Miller" <davem@davemloft.net>
Subject: question: frag_max_size not checked in ip_finish_output
Date: Wed, 21 Nov 2018 10:01:38 +0800
Message-ID: <CAA3R8U-+i9mpWTyPNz5wgXCFf2Z8wQ=gT-NxfjRGKjkjFD62NA@mail.gmail.com>
Message-ID: <20181121020138.rM7Q9FpKfn7YcL_JDnBbPX4D2eYn2xVCUTEa78NdT0Y@z>

Dear developers,

It seems that with defragmentation enabled,
`ip_finish_output` doesn't honor `IPCB(skb)->frag_max_size`,
while `ip6_finish_output` checks `IP6CB(skb)->frag_max_size`.
(Sorry for the repost: I found that I needed to subscribe
to the mailing list, and I have also added the results of my experiment.)

The relevant code is here:
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, `frag_max_size` prevents the forwarding routine
from sending packets longer than the maximum fragment received.
I'm wondering why `ip_finish_output` doesn't do the same as
`ip6_finish_output`, especially when `ip_fragment` and `ip_do_fragment`,
called (indirectly) by `ip_finish_output` itself, both cap the output MTU
at this `frag_max_size`.
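
The difference described above can be modeled in user space. This is only a
simplified sketch of the two decisions as they are described in this message,
not the kernel code: `struct pkt_model` and both function names are
hypothetical, and GSO and the other flags the real functions test are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few per-packet fields that matter here. */
struct pkt_model {
	unsigned int len;           /* total length after defragmentation */
	unsigned int frag_max_size; /* largest fragment seen on input, 0 if none */
};

/* IPv4 decision as described above: only the output MTU is compared. */
static bool v4_needs_fragment(const struct pkt_model *p, unsigned int mtu)
{
	assert(mtu > 0);
	return p->len > mtu;
}

/* IPv6 decision as described above: frag_max_size is checked as well. */
static bool v6_needs_fragment(const struct pkt_model *p, unsigned int mtu)
{
	assert(mtu > 0);
	return p->len > mtu ||
	       (p->frag_max_size != 0 && p->len > p->frag_max_size);
}
```

With a reassembled 1500-byte packet, an output MTU of 1500 and a nonzero
`frag_max_size` below 1500, the IPv4 model returns false (the packet leaves
whole) while the IPv6 model returns true (the packet is fragmented again).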

I did an experiment with two connected machines: one acting as a router
with IPv4/IPv6 NAT and an MTU of 1500, the other as a client behind the NAT
with an MTU of 1280. Since NAT is enabled on the router, it performs
defragmentation. Using `traceroute`, the client sent UDP packets of length
1500, which it fragmented itself. I captured packets on the ingress and
egress ports of the NAT router, and here is the output:

--- IPv4 with NAT (and defrag)
ingress:
09:26:39.030436 IP 192.168.1.2.44654 > 223.5.5.5.33434: UDP, bad length 1472 > 1248
09:26:39.030451 IP 192.168.1.2 > 223.5.5.5: ip-proto-17

egress:
09:26:39.030543 IP 202.38.101.2.58599 > 223.5.5.5.33437: UDP, length 1472

--- IPv6 with NAT (and defrag)
ingress:
09:18:26.947246 IP6 fdff::2 > 2001:250:3::1: frag (0|1232) 57242 > 33434: UDP, bad length 1452 > 1224
09:18:26.947262 IP6 fdff::2 > 2001:250:3::1: frag (1232|228)

egress:
09:18:26.947362 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (0|1232) 49616 > 33437: UDP, bad length 1452 > 1224
09:18:26.947365 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (1232|228)
-
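
The fragment sizes in the captures can be reproduced with a little
arithmetic. The sketch below assumes the standard header sizes (20-byte IPv4
header without options, 40-byte IPv6 header, 8-byte IPv6 fragment header,
8-byte UDP header) and the rule that every non-final fragment carries a
multiple of 8 bytes. The helper names are hypothetical.

```c
#include <assert.h>

enum {
	IPV4_HDR = 20,     /* IPv4 header without options */
	IPV6_HDR = 40,     /* fixed IPv6 header */
	IPV6_FRAG_HDR = 8, /* IPv6 fragment extension header */
	UDP_HDR = 8
};

/* Payload of a non-final fragment: the largest multiple of 8 that
 * fits in the MTU after the per-fragment headers. */
static unsigned int frag_payload(unsigned int mtu, unsigned int hdr_len)
{
	assert(mtu > hdr_len);
	return ((mtu - hdr_len) / 8) * 8;
}

/* Bytes left for the final fragment once the first one is full. */
static unsigned int last_payload(unsigned int total, unsigned int first)
{
	assert(total >= first);
	return total - first;
}
```

For the client MTU of 1280, `frag_payload(1280, IPV4_HDR)` gives 1256, so the
first IPv4 fragment holds 1256 - 8 = 1248 bytes of UDP data, matching
"bad length 1472 > 1248". `frag_payload(1280, IPV6_HDR + IPV6_FRAG_HDR)`
gives 1232, matching "frag (0|1232)" and 1224 bytes of UDP data, and
`last_payload(1452 + UDP_HDR, 1232)` gives 228, matching "frag (1232|228)".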

It can be seen that with defragmentation, IPv6 keeps the fragments
at or below `frag_max_size`, while IPv4 doesn't. I understand that IPv6
routers are not allowed to meddle with fragmentation, while IPv4 routers can
at least further fragment packets; but judging from the behavior of
`ip_fragment`, I think the IPv4 code is also trying to honor `frag_max_size`,
yet it doesn't check it when deciding whether to fragment.

Many thanks in advance!
If I'm sending this to the wrong person or the wrong mailing list, please
let me know. This is my first time asking Linux developers a question, and
I'm sorry for the disturbance.

Thank you for making Linux great ;)
Sincerely,
Wenxin Wang

