netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/2] net: ipv6: Update route MTU behavior
@ 2022-06-14  5:01 Subash Abhinov Kasiviswanathan
  2022-06-14  5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan
  2022-06-14  5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan
  0 siblings, 2 replies; 12+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2022-06-14  5:01 UTC (permalink / raw)
  To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio
  Cc: Subash Abhinov Kasiviswanathan

This is a follow up to the patch posted by Kaustubh
https://lore.kernel.org/netdev/1614011555-21951-1-git-send-email-kapandey@codeaurora.org/T/

This series addresses the comments from David Ahern to
update the exception route logic in the original patch and
add a new patch to update the IPv6 MTU tests.

Kaustubh Pandey (1):
  ipv6: Honor route mtu if it is within limit of dev mtu

Subash Abhinov Kasiviswanathan (1):
  tools: selftests: Update tests for new IPv6 route MTU behavior

 net/ipv6/route.c                    | 11 +----------
 tools/testing/selftests/net/pmtu.sh | 12 ++++++++----
 2 files changed, 9 insertions(+), 14 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-14  5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan
@ 2022-06-14  5:01 ` Subash Abhinov Kasiviswanathan
  2022-06-14 12:27   ` Stefano Brivio
  2022-06-16  0:35   ` Jakub Kicinski
  2022-06-14  5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan
  1 sibling, 2 replies; 12+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2022-06-14  5:01 UTC (permalink / raw)
  To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio
  Cc: Kaustubh Pandey, Sean Tranchetti, Subash Abhinov Kasiviswanathan

From: Kaustubh Pandey <quic_kapandey@quicinc.com>

When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised.

addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route ->
fib6_nh_mtu_change

As part of handling NETDEV_CHANGEMTU notification we land up on a
condition where if route mtu is less than dev mtu and route mtu equals
ipv6_devconf mtu, route mtu gets updated.

Due to this v6 traffic end up using wrong MTU then configured earlier.
This commit fixes this by removing comparison with ipv6_devconf
and updating route mtu only when it is greater than incoming dev mtu.

This can be easily reproduced with below script:
pre-condition:
device up(mtu = 1500) and route mtu for both v4 and v6 is 1500

test-script:
ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400
ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400
echo 1400 > /sys/class/net/eth0/mtu
ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500
echo 1500 > /sys/class/net/eth0/mtu

Fixes: e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes")
Signed-off-by: Kaustubh Pandey <quic_kapandey@quicinc.com>
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
v1 -> v2: Update the exception route logic as mentioned by David Ahern.
Also add fixes tag.

 net/ipv6/route.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d25dc83..6f7e8c5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1991,19 +1991,11 @@ static bool rt6_mtu_change_route_allowed(struct inet6_dev *idev,
 	/* If the new MTU is lower than the route PMTU, this new MTU will be the
 	 * lowest MTU in the path: always allow updating the route PMTU to
 	 * reflect PMTU decreases.
-	 *
-	 * If the new MTU is higher, and the route PMTU is equal to the local
-	 * MTU, this means the old MTU is the lowest in the path, so allow
-	 * updating it: if other nodes now have lower MTUs, PMTU discovery will
-	 * handle this.
 	 */
 
 	if (dst_mtu(&rt->dst) >= mtu)
 		return true;
 
-	if (dst_mtu(&rt->dst) == idev->cnf.mtu6)
-		return true;
-
 	return false;
 }
 
@@ -4914,8 +4906,7 @@ static int fib6_nh_mtu_change(struct fib6_nh *nh, void *_arg)
 		struct inet6_dev *idev = __in6_dev_get(arg->dev);
 		u32 mtu = f6i->fib6_pmtu;
 
-		if (mtu >= arg->mtu ||
-		    (mtu < arg->mtu && mtu == idev->cnf.mtu6))
+		if (mtu >= arg->mtu)
 			fib6_metric_set(f6i, RTAX_MTU, arg->mtu);
 
 		spin_lock_bh(&rt6_exception_lock);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior
  2022-06-14  5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan
  2022-06-14  5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan
@ 2022-06-14  5:01 ` Subash Abhinov Kasiviswanathan
  1 sibling, 0 replies; 12+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2022-06-14  5:01 UTC (permalink / raw)
  To: davem, dsahern, yoshfuji, kuba, netdev, sbrivio
  Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti

The IPv6 route MTU no longer increases in case the interface MTU is
increased. Update the tests pmtu_ipv6_exception and  pmtu_vti6_exception
to account for this behavior.

Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
---
v2: New patch added to the series

 tools/testing/selftests/net/pmtu.sh | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 736e358..dac2101 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -1067,11 +1067,15 @@ test_pmtu_ipvX() {
 	pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})"
 	check_pmtu_value "1500" "${pmtu_2}" "changing local MTU on a link not on this path" || return 1
 
-	# Increase MTU, check for PMTU increase in route exception
+	# Increase MTU, check for PMTU increase in route exception for IPv4 only
 	mtu "${ns_a}"  veth_A-R1 1700
 	mtu "${ns_r1}" veth_R1-A 1700
 	pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst1})"
-	check_pmtu_value "1700" "${pmtu_1}" "increasing local MTU" || return 1
+	if [ ${family} -eq 4 ]; then
+		check_pmtu_value "1700" "${pmtu_1}" "increasing local MTU" || return 1
+	else
+		check_pmtu_value "1300" "${pmtu_1}" "no change in local MTU" || return 1
+	fi
 	# Second exception shouldn't be modified
 	pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst2})"
 	check_pmtu_value "1500" "${pmtu_2}" "changing local MTU on a link not on this path" || return 1
@@ -1637,10 +1641,10 @@ test_pmtu_vti6_exception() {
 	pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel6_b_addr})"
 	check_pmtu_value "3000" "${pmtu}" "decreasing tunnel MTU" || fail=1
 
-	# Increase tunnel MTU, check for PMTU increase in route exception
+	# Increase tunnel MTU, confirm no PMTU increase in route exception
 	mtu "${ns_a}" vti6_a 9000
 	pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${tunnel6_b_addr})"
-	check_pmtu_value "9000" "${pmtu}" "increasing tunnel MTU" || fail=1
+	check_pmtu_value "3000" "${pmtu}" "no change in tunnel MTU" || fail=1
 
 	return ${fail}
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-14  5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan
@ 2022-06-14 12:27   ` Stefano Brivio
  2022-06-14 18:34     ` Subash Abhinov Kasiviswanathan (KS)
  2022-06-16  0:35   ` Jakub Kicinski
  1 sibling, 1 reply; 12+ messages in thread
From: Stefano Brivio @ 2022-06-14 12:27 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan, Kaustubh Pandey
  Cc: davem, dsahern, yoshfuji, kuba, netdev, Sean Tranchetti

Hi Kaustubh,

On Mon, 13 Jun 2022 23:01:54 -0600
Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> wrote:

> From: Kaustubh Pandey <quic_kapandey@quicinc.com>
> 
> When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised.
> 
> addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route ->
> fib6_nh_mtu_change
> 
> As part of handling NETDEV_CHANGEMTU notification we land up on a
> condition where if route mtu is less than dev mtu and route mtu equals
> ipv6_devconf mtu, route mtu gets updated.
> 
> Due to this v6 traffic end up using wrong MTU then configured earlier.

I read this a few times but I still fail to understand what issue
you're actually fixing -- what makes this new MTU "wrong"?

The idea behind the original implementation is that, when an interface
MTU is administratively updated, we should allow PMTU updates, if the
old PMTU was matching the interface MTU, because the old MTU setting
might have been the one limiting the MTU on the whole path.

That is, if you lower the MTU on an interface, and then increase it
back, a permanently lower PMTU is somewhat unexpected. As far as I can
see, this behaviour persists with this patch, but:

> This commit fixes this by removing comparison with ipv6_devconf
> and updating route mtu only when it is greater than incoming dev mtu.

...I'm not sure what you really mean by "incoming dev mtu". Is it the
newly configured one?

-- 
Stefano


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-14 12:27   ` Stefano Brivio
@ 2022-06-14 18:34     ` Subash Abhinov Kasiviswanathan (KS)
  0 siblings, 0 replies; 12+ messages in thread
From: Subash Abhinov Kasiviswanathan (KS) @ 2022-06-14 18:34 UTC (permalink / raw)
  To: Stefano Brivio, Kaustubh Pandey
  Cc: davem, dsahern, yoshfuji, kuba, netdev, Sean Tranchetti

> I read this a few times but I still fail to understand what issue
> you're actually fixing -- what makes this new MTU "wrong"?
> 
> The idea behind the original implementation is that, when an interface
> MTU is administratively updated, we should allow PMTU updates, if the
> old PMTU was matching the interface MTU, because the old MTU setting
> might have been the one limiting the MTU on the whole path.

Hi Stefano

Here is some additional background on the observations - consider a case
where an interface is brought up with IPv6 connection first with PMTU 
1280 (set via the MTU option of a router advertisement) followed by an 
IPv4 connection with PMTU 1700. An userspace management entity sets the 
link MTU to the maximum of IPv4 and IPv6 PMTUs.

When the IPv4 connection is brought up, the link MTU changes to 1700 
(max of IPv4 & IPv6 PMTUs in this case) which unnecessarily causes the 
PMTUD to occur for the IPv6 case.


> That is, if you lower the MTU on an interface, and then increase it
> back, a permanently lower PMTU is somewhat unexpected. As far as I can
> see, this behaviour persists with this patch, but:

I agree that this would indeed occur after the patch. The assumption 
here is that there would be an update from a router via a new router
advertisement if the IPv6 PMTU has to be increased / updated.

> ...I'm not sure what you really mean by "incoming dev mtu". Is it the
> newly configured one?

Yes, this is the new MTU configured by userspace.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-14  5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan
  2022-06-14 12:27   ` Stefano Brivio
@ 2022-06-16  0:35   ` Jakub Kicinski
  2022-06-16  1:21     ` Maciej Żenczykowski
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-06-16  0:35 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: davem, dsahern, yoshfuji, netdev, sbrivio, Kaustubh Pandey,
	Sean Tranchetti, maze

On Mon, 13 Jun 2022 23:01:54 -0600 Subash Abhinov Kasiviswanathan wrote:
> When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised.
> 
> addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route ->
> fib6_nh_mtu_change
> 
> As part of handling NETDEV_CHANGEMTU notification we land up on a
> condition where if route mtu is less than dev mtu and route mtu equals
> ipv6_devconf mtu, route mtu gets updated.
> 
> Due to this v6 traffic end up using wrong MTU then configured earlier.
> This commit fixes this by removing comparison with ipv6_devconf
> and updating route mtu only when it is greater than incoming dev mtu.
> 
> This can be easily reproduced with below script:
> pre-condition:
> device up(mtu = 1500) and route mtu for both v4 and v6 is 1500
> 
> test-script:
> ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400
> ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400
> echo 1400 > /sys/class/net/eth0/mtu
> ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500
> echo 1500 > /sys/class/net/eth0/mtu

CC maze, please add him if there is v3

I feel like the problem is with the fact that link mtu resets protocol
MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
whatever other quantification of infinity there is) so you don't have 
to touch it as you discover the MTU for v4 and v6? 

My worry is that the tweaking of the route MTU update heuristic will
have no end. 

Stefano, does that makes sense or you think the change is good?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16  0:35   ` Jakub Kicinski
@ 2022-06-16  1:21     ` Maciej Żenczykowski
  2022-06-16  5:36       ` Subash Abhinov Kasiviswanathan (KS)
  0 siblings, 1 reply; 12+ messages in thread
From: Maciej Żenczykowski @ 2022-06-16  1:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Subash Abhinov Kasiviswanathan, David S. Miller, David Ahern,
	Hideaki YOSHIFUJI, Linux NetDev, Stefano Brivio, Kaustubh Pandey,
	Sean Tranchetti

On Wed, Jun 15, 2022 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 13 Jun 2022 23:01:54 -0600 Subash Abhinov Kasiviswanathan wrote:
> > When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised.
> >
> > addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route ->
> > fib6_nh_mtu_change
> >
> > As part of handling NETDEV_CHANGEMTU notification we land up on a
> > condition where if route mtu is less than dev mtu and route mtu equals
> > ipv6_devconf mtu, route mtu gets updated.
> >
> > Due to this v6 traffic end up using wrong MTU then configured earlier.
> > This commit fixes this by removing comparison with ipv6_devconf
> > and updating route mtu only when it is greater than incoming dev mtu.
> >
> > This can be easily reproduced with below script:
> > pre-condition:
> > device up(mtu = 1500) and route mtu for both v4 and v6 is 1500
> >
> > test-script:
> > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400
> > ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400
> > echo 1400 > /sys/class/net/eth0/mtu
> > ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500
> > echo 1500 > /sys/class/net/eth0/mtu
>
> CC maze, please add him if there is v3
>
> I feel like the problem is with the fact that link mtu resets protocol
> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
> whatever other quantification of infinity there is) so you don't have
> to touch it as you discover the MTU for v4 and v6?
>
> My worry is that the tweaking of the route MTU update heuristic will
> have no end.
>
> Stefano, does that makes sense or you think the change is good?

I vaguely recall that if you don't want device mtu changes to affect
ipv6 route mtu, then you should set 'mtu lock' on the routes.
(this meaning of 'lock' for v6 is different than for ipv4, where
'lock' means transmit IPv4/TCP with Don't Frag bit unset)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16  1:21     ` Maciej Żenczykowski
@ 2022-06-16  5:36       ` Subash Abhinov Kasiviswanathan (KS)
  2022-06-16  7:33         ` Maciej Żenczykowski
  2022-06-16 13:39         ` Stefano Brivio
  0 siblings, 2 replies; 12+ messages in thread
From: Subash Abhinov Kasiviswanathan (KS) @ 2022-06-16  5:36 UTC (permalink / raw)
  To: Maciej Żenczykowski, Jakub Kicinski
  Cc: David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev,
	Stefano Brivio, Kaustubh Pandey, Sean Tranchetti

>> CC maze, please add him if there is v3
>>
>> I feel like the problem is with the fact that link mtu resets protocol
>> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
>> whatever other quantification of infinity there is) so you don't have
>> to touch it as you discover the MTU for v4 and v6?

That's a good point.

>>
>> My worry is that the tweaking of the route MTU update heuristic will
>> have no end.
>>
>> Stefano, does that makes sense or you think the change is good?

The only concern is that current behavior causes the initial packets 
after interface MTU increase to get dropped as part of PMTUD if the IPv6 
PMTU itself didn't increase. I am not sure if that was the intended 
behavior as part of the original change. Stefano, could you please confirm?

> I vaguely recall that if you don't want device mtu changes to affect
> ipv6 route mtu, then you should set 'mtu lock' on the routes.
> (this meaning of 'lock' for v6 is different than for ipv4, where
> 'lock' means transmit IPv4/TCP with Don't Frag bit unset)

I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes 
statically. The issue with that approach is that router advertisements 
can no longer update PMTU once a static route is configured.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16  5:36       ` Subash Abhinov Kasiviswanathan (KS)
@ 2022-06-16  7:33         ` Maciej Żenczykowski
  2022-06-16 16:42           ` Jakub Kicinski
  2022-06-16 13:39         ` Stefano Brivio
  1 sibling, 1 reply; 12+ messages in thread
From: Maciej Żenczykowski @ 2022-06-16  7:33 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan (KS)
  Cc: Jakub Kicinski, David S. Miller, David Ahern, Hideaki YOSHIFUJI,
	Linux NetDev, Stefano Brivio, Kaustubh Pandey, Sean Tranchetti

On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS)
<quic_subashab@quicinc.com> wrote:
>
> >> CC maze, please add him if there is v3
> >>
> >> I feel like the problem is with the fact that link mtu resets protocol
> >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
> >> whatever other quantification of infinity there is) so you don't have
> >> to touch it as you discover the MTU for v4 and v6?
>
> That's a good point.

Because link mtu affects rx mtu which affects nic buffer allocations.
Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting
in 2kB of memory and need 4kB (or more)

> >> My worry is that the tweaking of the route MTU update heuristic will
> >> have no end.
> >>
> >> Stefano, does that makes sense or you think the change is good?
>
> The only concern is that current behavior causes the initial packets
> after interface MTU increase to get dropped as part of PMTUD if the IPv6
> PMTU itself didn't increase. I am not sure if that was the intended
> behavior as part of the original change. Stefano, could you please confirm?
>
> > I vaguely recall that if you don't want device mtu changes to affect
> > ipv6 route mtu, then you should set 'mtu lock' on the routes.
> > (this meaning of 'lock' for v6 is different than for ipv4, where
> > 'lock' means transmit IPv4/TCP with Don't Frag bit unset)
>
> I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes
> statically. The issue with that approach is that router advertisements
> can no longer update PMTU once a static route is configured.

yeah.   Hmm should RA generated routes use locked mtu too?
I think the only reason an RA generated route would have mtu information
is for it to stick...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16  5:36       ` Subash Abhinov Kasiviswanathan (KS)
  2022-06-16  7:33         ` Maciej Żenczykowski
@ 2022-06-16 13:39         ` Stefano Brivio
  1 sibling, 0 replies; 12+ messages in thread
From: Stefano Brivio @ 2022-06-16 13:39 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan (KS),
	Maciej Żenczykowski, Jakub Kicinski
  Cc: David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev,
	Kaustubh Pandey, Sean Tranchetti

[Subash, please fix quoting of replies in your client, it's stripping
email authors. I rebuilt the chain here but it's kind of painful]

On Wed, 15 Jun 2022 23:36:10 -0600
"Subash Abhinov Kasiviswanathan (KS)" <quic_subashab@quicinc.com> wrote:

> On Wed, 15 Jun 2022 18:21:07 -0700
> Maciej Żenczykowski <maze@google.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > CC maze, please add him if there is v3
> > >
> > > I feel like the problem is with the fact that link mtu resets protocol
> > > MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
> > > whatever other quantification of infinity there is)

2^16 - 1, works for both IPv4 and IPv6.

> > > so you don't have
> > > to touch it as you discover the MTU for v4 and v6?  
> 
> That's a good point.
> 
> > > My worry is that the tweaking of the route MTU update heuristic will
> > > have no end.
> > >
> > > Stefano, does that makes sense or you think the change is good?  

It makes sense -- I'm also worried that we're introducing another small
issue to fix what, I think, is the smallest possible inconvenience.

> The only concern is that current behavior causes the initial packets 
> after interface MTU increase to get dropped as part of PMTUD if the IPv6 
> PMTU itself didn't increase. I am not sure if that was the intended 
> behavior as part of the original change. Stefano, could you please confirm?

Correct, that was the intended behaviour, because I think one dropped
packet is the smallest possible price we can pay for, knowingly, not
having anymore a PMTU estimate that's accurate in terms of RFC 1191.

> > I vaguely recall that if you don't want device mtu changes to affect
> > ipv6 route mtu, then you should set 'mtu lock' on the routes.
> > (this meaning of 'lock' for v6 is different than for ipv4, where
> > 'lock' means transmit IPv4/TCP with Don't Frag bit unset)  

"Locked" exceptions are rather what's created as a result of ICMP and
ICMPv6 messages -- I guess you can have a look or run the basic
pmtu_ipv4() and pmtu_ipv6() to get a sense of it.

With the existing implementation, if you increase the link MTU to a
value that's bigger than the locked value from PMTU discovery, it will
not increase in general: the exception is locking it. That's what's
described in the comment that this patch is removing.

It will increase only under that specific condition, namely, if the
current PMTU estimate is the same as the old link MTU, because then we
can take the reasonable assumption that our link was the limiting
factor, and not some other link on the path. It might be wrong, but I
still maintain it's a reasonable assumption, and, most importantly, we
have no way to prove it wrong without PMTU discovery.

-- 
Stefano


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16  7:33         ` Maciej Żenczykowski
@ 2022-06-16 16:42           ` Jakub Kicinski
  2022-06-16 17:08             ` Maciej Żenczykowski
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-06-16 16:42 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Subash Abhinov Kasiviswanathan (KS),
	David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev,
	Stefano Brivio, Kaustubh Pandey, Sean Tranchetti

On Thu, 16 Jun 2022 00:33:02 -0700 Maciej Żenczykowski wrote:
> On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS) <quic_subashab@quicinc.com> wrote:
> > >> CC maze, please add him if there is v3
> > >>
> > >> I feel like the problem is with the fact that link mtu resets protocol
> > >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
> > >> whatever other quantification of infinity there is) so you don't have
> > >> to touch it as you discover the MTU for v4 and v6?  
> >
> > That's a good point.  
> 
> Because link mtu affects rx mtu which affects nic buffer allocations.
> Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting
> in 2kB of memory and need 4kB (or more)

I was afraid someone would point that out :) Luckily the values Subash
mentioned were both under 2k, and hope fully the device can do scatter? 
🤞😟 (Don't modems do LRO or some other form of aggregation usually?)

> > >> My worry is that the tweaking of the route MTU update heuristic will
> > >> have no end.
> > >>
> > >> Stefano, does that makes sense or you think the change is good?  
> >
> > The only concern is that current behavior causes the initial packets
> > after interface MTU increase to get dropped as part of PMTUD if the IPv6
> > PMTU itself didn't increase. I am not sure if that was the intended
> > behavior as part of the original change. Stefano, could you please confirm?
> >  
> > > I vaguely recall that if you don't want device mtu changes to affect
> > > ipv6 route mtu, then you should set 'mtu lock' on the routes.
> > > (this meaning of 'lock' for v6 is different than for ipv4, where
> > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset)  
> >
> > I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes
> > statically. The issue with that approach is that router advertisements
> > can no longer update PMTU once a static route is configured.  
> 
> yeah.   Hmm should RA generated routes use locked mtu too?
> I think the only reason an RA generated route would have mtu information
> is for it to stick...

If link MTU is lower than RA MTU do we do min() or ignore the RA MTU?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu
  2022-06-16 16:42           ` Jakub Kicinski
@ 2022-06-16 17:08             ` Maciej Żenczykowski
  0 siblings, 0 replies; 12+ messages in thread
From: Maciej Żenczykowski @ 2022-06-16 17:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Subash Abhinov Kasiviswanathan (KS),
	David S. Miller, David Ahern, Hideaki YOSHIFUJI, Linux NetDev,
	Stefano Brivio, Kaustubh Pandey, Sean Tranchetti

On Thu, Jun 16, 2022 at 9:42 AM Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 16 Jun 2022 00:33:02 -0700 Maciej Żenczykowski wrote:
> > On Wed, Jun 15, 2022 at 10:36 PM Subash Abhinov Kasiviswanathan (KS) <quic_subashab@quicinc.com> wrote:
> > > >> CC maze, please add him if there is v3
> > > >>
> > > >> I feel like the problem is with the fact that link mtu resets protocol
> > > >> MTUs. Nothing we can do about that, so why not set link MTU to 9k (or
> > > >> whatever other quantification of infinity there is) so you don't have
> > > >> to touch it as you discover the MTU for v4 and v6?
> > >
> > > That's a good point.
> >
> > Because link mtu affects rx mtu which affects nic buffer allocations.
> > Somewhere in the vicinity of mtu 1500..2048 your packets stop fitting
> > in 2kB of memory and need 4kB (or more)
>
> I was afraid someone would point that out :) Luckily the values Subash
> mentioned were both under 2k, and hope fully the device can do scatter?
> 🤞😟 (Don't modems do LRO or some other form of aggregation usually?)

You'd be amazed at how ...minimal... these (cellphone modem/wifi) devices are.

I've long given up on expecting these devices to do fundamental things
like scatter gather, or transmit or receive checksum offload.

Sure *some* newer ones are better and can even do TSO or some form of
HWGRO, maybe even some limited multiqueue, but it's rare.

note: > ~3.5kB mtu also breaks (or at least used to???) xdp, because
of that requiring a single page.

Additionally, a severe lack of trust in cell/wifi firmware's ability
to withstand remote compromise (due to inability to audit the source
code), means sometimes the nics even lack DMA access to system ram,
and instead either rx, or both rx and tx are bounce buffered, either
by vritue of driver doing memcpy or some separate hw dma engine.

> > > >> My worry is that the tweaking of the route MTU update heuristic will
> > > >> have no end.
> > > >>
> > > >> Stefano, does that makes sense or you think the change is good?
> > >
> > > The only concern is that current behavior causes the initial packets
> > > after interface MTU increase to get dropped as part of PMTUD if the IPv6
> > > PMTU itself didn't increase. I am not sure if that was the intended
> > > behavior as part of the original change. Stefano, could you please confirm?
> > >
> > > > I vaguely recall that if you don't want device mtu changes to affect
> > > > ipv6 route mtu, then you should set 'mtu lock' on the routes.
> > > > (this meaning of 'lock' for v6 is different than for ipv4, where
> > > > 'lock' means transmit IPv4/TCP with Don't Frag bit unset)
> > >
> > > I assume 'mtu lock' here refers to setting the PMTU on the IPv6 routes
> > > statically. The issue with that approach is that router advertisements
> > > can no longer update PMTU once a static route is configured.
> >
> > yeah.   Hmm should RA generated routes use locked mtu too?
> > I think the only reason an RA generated route would have mtu information
> > is for it to stick...
>
> If link MTU is lower than RA MTU do we do min() or ignore the RA MTU?

I think we simply ignore it - if link mtu is changed, we'll update the
routes on the next RA we receive (which will presumably have the mtu
information again).
Perhaps link mtu change should result in immediate RS to get an RA soon??

Behaviour for mtu > link mtu heavily depends on the driver.
For RX many drivers will fail to receive packets larger than link mtu
(rx buffer overrun), but often there's some wiggle room - this is due
to how rx buffers are allocated.
ie. 1500 mtu means can receive up to 1536 byte packets...

For tx it again depends on the driver, some reject packets > mtu,
others can actually send arbitrary sized packets (up to some limit
like 64KB or even higher), because tx allocation does not require any
statically sized buffers like receive does.

All this means that ultimately route MTU > link mtu is unlikely to
work no matter what we do - on at least some nics.

Anyway... lots of words to say ignore 'RA MTU > link/device mtu' seems
like the right call,
while if it is <= then set it as locked?

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-06-16 17:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14  5:01 [PATCH net v2 0/2] net: ipv6: Update route MTU behavior Subash Abhinov Kasiviswanathan
2022-06-14  5:01 ` [PATCH net v2 1/2] ipv6: Honor route mtu if it is within limit of dev mtu Subash Abhinov Kasiviswanathan
2022-06-14 12:27   ` Stefano Brivio
2022-06-14 18:34     ` Subash Abhinov Kasiviswanathan (KS)
2022-06-16  0:35   ` Jakub Kicinski
2022-06-16  1:21     ` Maciej Żenczykowski
2022-06-16  5:36       ` Subash Abhinov Kasiviswanathan (KS)
2022-06-16  7:33         ` Maciej Żenczykowski
2022-06-16 16:42           ` Jakub Kicinski
2022-06-16 17:08             ` Maciej Żenczykowski
2022-06-16 13:39         ` Stefano Brivio
2022-06-14  5:01 ` [PATCH net v2 2/2] tools: selftests: Update tests for new IPv6 route MTU behavior Subash Abhinov Kasiviswanathan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).