Kernel Newbies archive on lore.kernel.org
* question: frag_max_size not checked in ip_finish_output
@ 2018-11-21  2:01 wenxin.wang94
  2018-11-21  2:01 ` Wenxin Wang
  0 siblings, 1 reply; 4+ messages in thread
From: wenxin.wang94 @ 2018-11-21  2:01 UTC (permalink / raw)
  To: kernelnewbies

Dear developers,

It seems that, with defragmentation enabled,
`ip_finish_output` doesn't honor `IPCB(skb)->frag_max_size`,
while `ip6_finish_output` does check `IP6CB(skb)->frag_max_size`.
(Sorry for the repost: I found that I need to subscribe
to the mailing list, and I have also added the results of my experiment.)

The relevant code is here
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, `frag_max_size` prevents the forwarding routine
from sending packets larger than the largest fragment received.
I'm wondering why `ip_finish_output` doesn't do the same as
`ip6_finish_output`, especially since `ip_fragment` and `ip_do_fragment`,
which `ip_finish_output` itself calls (indirectly), both cap the output
MTU at this `frag_max_size`.
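
A minimal Python sketch of the capping described above, to show what it would produce if fragmentation were triggered. This is a simplified model, not the kernel code: `effective_mtu` and `ipv4_fragment_lengths` are hypothetical helpers, a 20-byte IPv4 header without options is assumed, and a `frag_max_size` of 0 is assumed to mean "not set".

```python
IPV4_HEADER_LEN = 20  # assumption: IPv4 header without options


def effective_mtu(mtu, frag_max_size):
    """Cap the output MTU by frag_max_size when it is set (non-zero)."""
    if frag_max_size and frag_max_size < mtu:
        return frag_max_size
    return mtu


def ipv4_fragment_lengths(total_len, mtu):
    """Return the total length (header included) of each IPv4 fragment.

    Every fragment except the last carries a payload that is a multiple
    of 8 bytes, because fragment offsets are counted in 8-byte units.
    """
    if total_len <= mtu:
        return [total_len]
    payload = total_len - IPV4_HEADER_LEN
    per_fragment = (mtu - IPV4_HEADER_LEN) // 8 * 8
    lengths = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        lengths.append(IPV4_HEADER_LEN + chunk)
        payload -= chunk
    return lengths
```

With an output MTU of 1500 and a `frag_max_size` of 1276, `effective_mtu(1500, 1276)` is 1276, and `ipv4_fragment_lengths(1500, 1276)` gives `[1276, 244]`. The cap only takes effect once the fragmentation path is entered at all, which is the point of the question.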

I ran an experiment with two connected machines: one acting as a router
with IPv4/IPv6 NAT and an MTU of 1500, the other as a client behind the
NAT with an MTU of 1280. Because NAT is enabled on the router, the router
performs defragmentation. Using `traceroute`, the client sent UDP packets
of length 1500, which it fragmented itself. I captured packets
on the ingress and egress ports of the NAT router, and here is the output:

--- IPv4 with NAT (and defrag)
ingress:
09:26:39.030436 IP 192.168.1.2.44654 > 223.5.5.5.33434: UDP, bad length 1472 > 1248
09:26:39.030451 IP 192.168.1.2 > 223.5.5.5: ip-proto-17

egress:
09:26:39.030543 IP 202.38.101.2.58599 > 223.5.5.5.33437: UDP, length 1472

--- IPv6 with NAT (and defrag)
ingress:
09:18:26.947246 IP6 fdff::2 > 2001:250:3::1: frag (0|1232) 57242 > 33434: UDP, bad length 1452 > 1224
09:18:26.947262 IP6 fdff::2 > 2001:250:3::1: frag (1232|228)

egress:
09:18:26.947362 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (0|1232) 49616 > 33437: UDP, bad length 1452 > 1224
09:18:26.947365 IP6 2001:da8:c4c6:2::2 > 2001:250:3::1: frag (1232|228)
-

As the captures show, with defragmentation enabled IPv6 keeps the
fragments within `frag_max_size`, while IPv4 doesn't. I understand that
IPv6 routers are not allowed to meddle with fragmentation, while IPv4
routers can at least fragment packets further; but judging from the
behavior of `ip_fragment`, I think the IPv4 code is also trying to honor
`frag_max_size` and simply doesn't check it when deciding whether to
fragment.
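
The sizes in the captures can be checked with a little arithmetic. The sketch below rebuilds the on-wire packet sizes from the tcpdump fields. It rests on several assumptions: `frag_max_size` is taken to be the on-wire size of the largest received fragment including the IP header, the IPv4 header is assumed to carry no options, and the IPv6 fragments are assumed to carry only the 8-byte fragment extension header. The helper `exceeds` is hypothetical.

```python
IPV4_HEADER = 20        # assumption: no IP options
IPV6_HEADER = 40
IPV6_FRAG_HEADER = 8    # fragment extension header
UDP_HEADER = 8

# IPv4 ingress: "bad length 1472 > 1248" means the first fragment holds
# the UDP header plus 1248 of the 1472 payload bytes.
ipv4_frag_max = IPV4_HEADER + UDP_HEADER + 1248
# IPv4 egress: one unfragmented datagram with 1472 bytes of UDP payload.
ipv4_egress = [IPV4_HEADER + UDP_HEADER + 1472]

# IPv6 ingress: "frag (0|1232)" and "frag (1232|228)" give the number of
# bytes carried after the fragment header.
ipv6_ingress = [IPV6_HEADER + IPV6_FRAG_HEADER + n for n in (1232, 228)]
ipv6_frag_max = max(ipv6_ingress)
# IPv6 egress: the capture shows the same offsets and sizes as ingress.
ipv6_egress = list(ipv6_ingress)


def exceeds(packets, frag_max_size):
    """Return the packets that are larger than frag_max_size."""
    return [size for size in packets if size > frag_max_size]


print("IPv4:", ipv4_frag_max, exceeds(ipv4_egress, ipv4_frag_max))
print("IPv6:", ipv6_frag_max, exceeds(ipv6_egress, ipv6_frag_max))
```

Under these assumptions the largest IPv4 fragment received is 1276 bytes, yet the router emits a single 1500-byte packet, whereas both IPv6 egress fragments (1280 and 276 bytes) stay within the 1280-byte maximum.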

Many thanks in advance!
If I'm writing to the wrong person or the wrong mailing list, please let me
know. This is my first time asking Linux developers a question, and I'm
sorry for the disturbance.

Thank you for making Linux great ;)
Sincerely,
Wenxin Wang

* question: frag_max_size not checked in ip_finish_output
@ 2018-11-20 17:02 wenxin.wang94
  2018-11-20 17:02 ` Wenxin Wang
  0 siblings, 1 reply; 4+ messages in thread
From: wenxin.wang94 @ 2018-11-20 17:02 UTC (permalink / raw)
  To: kernelnewbies

Dear developers,
I'm trying to understand the difference in behavior between
`ip_finish_output` and `ip6_finish_output` when deciding whether to
fragment a packet.

`ip_finish_output` calls `ip_fragment` when `skb->len` exceeds the
destination MTU. In addition to this MTU check, `ip6_finish_output`
also checks whether `skb->len > IP6CB(skb)->frag_max_size`.
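
The two decisions just described can be modelled in a few lines of Python. This is a deliberately reduced sketch: it keeps only the length checks mentioned above and ignores every other condition the real functions evaluate. The function names are hypothetical, and treating a `frag_max_size` of 0 as "not set" is an assumption.

```python
def ipv4_should_fragment(skb_len, mtu):
    """Model of the IPv4 check: only the destination MTU is compared."""
    return skb_len > mtu


def ipv6_should_fragment(skb_len, mtu, frag_max_size):
    """Model of the IPv6 check: the MTU and also frag_max_size, if set."""
    over_mtu = skb_len > mtu
    over_frag_max = frag_max_size != 0 and skb_len > frag_max_size
    return over_mtu or over_frag_max


# Illustrative numbers only: a reassembled 1500-byte packet leaving on a
# link with MTU 1500, where the largest received fragment was 1280 bytes.
print(ipv4_should_fragment(1500, 1500))        # False
print(ipv6_should_fragment(1500, 1500, 1280))  # True
```

In this model the IPv4 path forwards the reassembled packet whole, while the IPv6 path re-fragments it, which is the difference being asked about.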

The relevant code is here
https://elixir.bootlin.com/linux/latest/source/net/ipv4/ip_output.c#L310
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_output.c#L151

As far as I know, after defragmentation `frag_max_size` prevents the
forwarding routine from sending packets larger than the largest fragment
received.
I'm wondering why `ip_finish_output` doesn't similarly check
`IPCB(skb)->frag_max_size`, especially since `ip_fragment` and
`ip_do_fragment`, which `ip_finish_output` calls (indirectly), both cap
the output MTU at this `frag_max_size`.

Many thanks in advance!
If I'm writing to the wrong person or the wrong mailing list, please let
me know. This is my first time asking Linux developers a question, and
I'm sorry for the disturbance. I'm not currently subscribed to any
mailing list, but I will subscribe if necessary.

Thank you for making Linux great ;)
Sincerely,
Wenxin Wang
