* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-05 22:47 [PATCH net v2] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes Stefano Brivio
@ 2018-03-02 18:54 ` Maciej Żenczykowski
  2018-03-03 11:21   ` Stefano Brivio
  2018-03-02 22:39 ` David Ahern
  1 sibling, 1 reply; 10+ messages in thread
From: Maciej Żenczykowski @ 2018-03-02 18:54 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI, Linux NetDev

Conceptually this is right.

And I'm 100% fine with dev mtu change triggering pmtu decrease.

I'm not so sold on the pmtu increase.

PMTUD is one of those things that never ever works right in practice.
There are too many icmp blackholes, rate limits, overloaded management
cpus in switches, misconfigurations, missing tcp mss clamps, icmps
routed differently than the flows due to ecmp hashing, middle boxes
that don't affect the icmp but change the tcp stream, etc.

In particular there's a lot of routing hardware that can handle
gigabits or terabits of traffic, but can generate only 10s-100s of
packet too big messages per second (ie. a tiny fraction of line rate
pps).  Worse yet, under overload it often falls back to simply
dropping and generating no icmp errors.
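
To put rough numbers on that gap: assuming a 10 Gbit/s link carrying
1500-byte packets (both figures are illustrative assumptions, not taken
from any particular device), line rate is about 833,000 packets per
second, so 100 packet too big messages per second would cover roughly
0.012% of it. A minimal C sketch of the arithmetic, with hypothetical
helper names and ignoring layer 2 framing overhead:

```c
#include <assert.h>

/* Hypothetical helper: packets per second a link can carry at a given
 * packet size, ignoring preamble, inter-frame gap and other framing.
 */
double line_rate_pps(double link_bits_per_sec, double packet_bytes)
{
	assert(link_bits_per_sec > 0 && packet_bytes > 0);
	return link_bits_per_sec / (packet_bytes * 8.0);
}

/* Hypothetical helper: fraction of line-rate packets that could be
 * answered with a packet too big message, given an ICMP generation
 * limit expressed in messages per second.
 */
double ptb_fraction(double ptb_per_sec, double link_bits_per_sec,
		    double packet_bytes)
{
	return ptb_per_sec / line_rate_pps(link_bits_per_sec, packet_bytes);
}
```

For example, ptb_fraction(100.0, 10e9, 1500.0) evaluates to about
0.00012, i.e. the vast majority of oversized packets would be dropped
without any error being generated.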

I spend a significant fraction of my time making sure we never rely on PMTUD.

Debugging MTU related blackholes is a constant bane of my existence.

[btw. we're considering adding a hack to always fragment UDP to
min(1280, dev/route/path mtu)...]

Basically: lower is always better because it's more likely to work...


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-05 22:47 [PATCH net v2] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes Stefano Brivio
  2018-03-02 18:54 ` [PATCH net] " Maciej Żenczykowski
@ 2018-03-02 22:39 ` David Ahern
  2018-03-03 11:22   ` Stefano Brivio
  1 sibling, 1 reply; 10+ messages in thread
From: David Ahern @ 2018-03-02 22:39 UTC (permalink / raw)
  To: Stefano Brivio, David S . Miller
  Cc: Wei Wang, Hideaki YOSHIFUJI, Maciej Żenczykowski, netdev

On 3/2/18 8:36 AM, Stefano Brivio wrote:
> Currently, administrative MTU changes on a given netdevice are
> not reflected on route exceptions for MTU-less routes, with a
> set PMTU value, for that device:
> 
>  # ip -6 route get 3000::b
>  3000::b from :: dev vti_a proto kernel src 3000::a metric 256 pref medium
>  # ping6 -c 1 -q -s10000 3000::b > /dev/null
>  # ip netns exec a ip -6 route get 3000::b
>  3000::b from :: dev vti_a src 3000::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 3000
>  # ip -6 route get 3000::b
>  3000::b from :: dev vti_a src 3000::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 9000
>  # ip -6 route get 3000::b
>  3000::b from :: dev vti_a src 3000::a metric 0
>      cache expires 571sec mtu 4926 pref medium

Addresses in the 2001:db8: range should be used for commit messages.

And please codify the above expectation as a test under
tools/testing/selftests/net


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-02 18:54 ` [PATCH net] " Maciej Żenczykowski
@ 2018-03-03 11:21   ` Stefano Brivio
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Brivio @ 2018-03-03 11:21 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI, Linux NetDev

Hi Maciej,

On Fri, 2 Mar 2018 10:54:36 -0800
Maciej Żenczykowski <zenczykowski@gmail.com> wrote:

> I spend a significant fraction of my time making sure we never rely on PMTUD.

Thanks for your comments.

I see your point, but here we are not blindly relying on PMTUD,
rather reflecting an MTU administrative change on the PMTU, and making
the behaviour consistent between regular routes and exceptions, which
is nothing else than a bug fix.

This behaviour reflects RFC 8201, par. 3:

	The basic idea is that a source node initially assumes that
	the PMTU of a path is the (known) MTU of the first hop in the path.

and the need for it is clearly explained by the existing comment in
rt6_mtu_change_route():

	/* For administrative MTU increase, there is no way to discover
	   IPv6 PMTU increase, so PMTU increase should be updated here.
	   Since RFC 1981 doesn't include administrative MTU increase
	   update PMTU increase is a MUST. (i.e. jumbo frame)
	 */

Leaving that aside for a moment, a PMTU increase due to my fix is only
possible if the old local MTU (administratively set) was the lowest in
the path, no PMTUD happened meanwhile (but we have an exception route
in place e.g. due to a tunnel calling skb_dst_update_mtu()), and we get
a subsequent administrative change of the local MTU.

Relying on some old value set by the user is simply a bug, and breaks
the natural user assumption that increasing the MTU will have an
effect, if PMTU is not otherwise constrained.

If PMTUD is not working, we will rely on the MTU values set by the
user. This looks like the only sane thing to do.
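
The conditions described above can be sketched as a small userspace
model. This is only an illustration, not kernel code: the function and
parameter names are made up, and old_local_mtu stands for the device
MTU as it was before the administrative change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code: decide whether an
 * administrative MTU change may overwrite a stored PMTU value.
 */
bool pmtu_update_allowed(int stored_pmtu, int old_local_mtu, int new_mtu)
{
	assert(stored_pmtu > 0 && old_local_mtu > 0 && new_mtu > 0);

	/* Decrease (or no change): the new local MTU becomes the lowest
	 * MTU in the path, so the stored PMTU has to follow it.
	 */
	if (stored_pmtu >= new_mtu)
		return true;

	/* Increase: only allowed if the stored PMTU was equal to the old
	 * local MTU, meaning the local interface was the bottleneck. If
	 * another hop is now the bottleneck, PMTU discovery is expected
	 * to lower the value again.
	 */
	return stored_pmtu == old_local_mtu;
}
```

With this model, a stored PMTU of 3000 on a device whose MTU goes from
3000 to 9000 is updated, while a stored PMTU of 4926 on a device whose
MTU goes from 9000 to 9500 is left alone, because a value lower than
the local MTU must have come from somewhere else in the path.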

> Debugging MTU related blackholes is a constant bane of my existence.
> 
> [btw. we're considering adding a hack to always fragment UDP to
> min(1280, dev/route/path mtu)...]
> 
> Basically: lower is always better because it's more likely to work...

This is not directly related to my fix, but I wonder if we shouldn't,
in general, simply comply with RFCs, and provide ways out in case the
network is broken, instead of breaking expected behaviours by default,
or making things work "by mistake". The way out, here, is as simple as
setting 1280 as MTU for the local interface.

Somebody might say higher is better because you avoid fragmentation. So
I would just keep the implementation compliant (and, perhaps more
importantly, consistent).

-- 
Stefano


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-02 22:39 ` David Ahern
@ 2018-03-03 11:22   ` Stefano Brivio
  2018-03-04 23:12     ` Stefano Brivio
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Brivio @ 2018-03-03 11:22 UTC (permalink / raw)
  To: David Ahern
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, netdev

On Fri, 2 Mar 2018 15:39:03 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 3/2/18 8:36 AM, Stefano Brivio wrote:
> > Currently, administrative MTU changes on a given netdevice are
> > not reflected on route exceptions for MTU-less routes, with a
> > set PMTU value, for that device:
> > 
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a proto kernel src 3000::a metric 256 pref medium
> >  # ping6 -c 1 -q -s10000 3000::b > /dev/null
> >  # ip netns exec a ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >      cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 3000
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >      cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 9000
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >      cache expires 571sec mtu 4926 pref medium  
> 
> Addresses in the 2001:db8: range should be used for commit messages.

Thanks for pointing this out. I never related the "documentation
purposes" from RFC3849 to commit messages so far, but in the end this
is nothing else than documentation. I will post a v2 with updated
commit message.

> And please codify the above expectation as a test under
> tools/testing/selftests/net

And this, along with v2.

-- 
Stefano


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-03 11:22   ` Stefano Brivio
@ 2018-03-04 23:12     ` Stefano Brivio
  2018-03-05  1:11       ` David Ahern
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Brivio @ 2018-03-04 23:12 UTC (permalink / raw)
  To: David Ahern
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, netdev

On Sat, 3 Mar 2018 12:22:36 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> > And please codify the above expectation as a test under
> > tools/testing/selftests/net  
> 
> And this, along with v2.

On second thought: I'm starting to think it doesn't make much sense,
especially given the current context of self-tests, to explicitly test
this, because it's a rather particular corner case.

I think it would make more sense to introduce generic tests first.
About, say, PMTU, or route exceptions, but not "tunnel causes route
exception and administrative change doesn't affect PMTU".

-- 
Stefano


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-04 23:12     ` Stefano Brivio
@ 2018-03-05  1:11       ` David Ahern
  2018-03-05 12:29         ` Stefano Brivio
  0 siblings, 1 reply; 10+ messages in thread
From: David Ahern @ 2018-03-05  1:11 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, netdev

On 3/4/18 4:12 PM, Stefano Brivio wrote:
> On Sat, 3 Mar 2018 12:22:36 +0100
> Stefano Brivio <sbrivio@redhat.com> wrote:
> 
>>> And please codify the above expectation as a test under
>>> tools/testing/selftests/net  
>>
>> And this, along with v2.
> 
> On second thought: I'm starting to think it doesn't make much sense,
> especially given the current context of self-tests, to explicitly test
> this, because it's a rather particular corner case.
> 
> I think it would make more sense to introduce generic tests first.
> About, say, PMTU, or route exceptions, but not "tunnel causes route
> exception and administrative change doesn't affect PMTU".
> 

I would argue corner cases in particular should be documented.

From the commit message it seems like you took the time to create a test
setup using network namespaces. Throw those commands into a shell script
-- tools/testing/selftests/net/mtu.sh. It can evolve from there.


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-05  1:11       ` David Ahern
@ 2018-03-05 12:29         ` Stefano Brivio
  2018-03-05 14:14           ` David Miller
  2018-03-05 15:27           ` David Ahern
  0 siblings, 2 replies; 10+ messages in thread
From: Stefano Brivio @ 2018-03-05 12:29 UTC (permalink / raw)
  To: David Ahern
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, netdev

On Sun, 4 Mar 2018 18:11:41 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 3/4/18 4:12 PM, Stefano Brivio wrote:
> > On Sat, 3 Mar 2018 12:22:36 +0100
> > Stefano Brivio <sbrivio@redhat.com> wrote:
> >   
> >>> And please codify the above expectation as a test under
> >>> tools/testing/selftests/net    
> >>
> >> And this, along with v2.  
> > 
> > On second thought: I'm starting to think it doesn't make much sense,
> > especially given the current context of self-tests, to explicitly test
> > this, because it's a rather particular corner case.
> > 
> > I think it would make more sense to introduce generic tests first.
> > About, say, PMTU, or route exceptions, but not "tunnel causes route
> > exception and administrative change doesn't affect PMTU".
> >   
> 
> I would argue corner cases in particular should be documented.

Sure, but self-tests are not meant for documentation. I think commit
messages are.

And about corner cases, from Documentation/dev-tools/kselftest.rst:

	These are intended to be small tests to exercise individual code
	paths in the kernel. Tests are intended to be run after building, installing
	and booting a kernel.

and:

	In general, the rules for selftests are
	[...]
	 * Don't take too long;

if you plan to request a self-test for every fix in the networking area,
you need to substantially change the scope of these self-tests. This stuff
would instead fit in a comprehensive networking test suite.

> From the commit message it seems like you took the time to create a test
> setup using network namespaces. Throw those commands into a shell script
> -- tools/testing/selftests/net/mtu.sh. It can evolve from there.

My script sets up namespaces, veth and vti6 interfaces, xfrm states and
policies (could be replaced by vxlan, but that's what I have now). Then
it pings, waits, prints exception routes, changes MTU, etc. In the
commit message, I reported only the relevant parts that are enough to
clearly show the issue.

This script is some ugly monster I don't want to have on my conscience,
or wish for anybody to run as "small test to exercise individual code
paths".

I don't think sensible self-tests can evolve from it. They could
instead evolve from some generic, basic PMTU (or route exceptions) test,
rather than from my very particular fix that needs to involve so many
steps to be checked.

-- 
Stefano


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-05 12:29         ` Stefano Brivio
@ 2018-03-05 14:14           ` David Miller
  2018-03-05 15:27           ` David Ahern
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2018-03-05 14:14 UTC (permalink / raw)
  To: sbrivio; +Cc: dsahern, weiwan, yoshfuji, maze, netdev

From: Stefano Brivio <sbrivio@redhat.com>
Date: Mon, 5 Mar 2018 13:29:56 +0100

> And about corner cases, from Documentation/dev-tools/kselftest.rst:
> 
> 	These are intended to be small tests to exercise individual code
> 	paths in the kernel. Tests are intended to be run after building, installing
> 	and booting a kernel.
> 
> and:
> 
> 	In general, the rules for selftests are
> 	[...]
> 	 * Don't take too long;
> 
> if you plan to request a self-test for every fix in the networking area,
> you need to substantially change the scope of these self-tests. This stuff
> would instead fit in a comprehensive networking test suite.

Nice try, but this logic doesn't hold.

It says don't make any "_INDIVIDUAL_" test take too long to run.
This allows handling timeouts on individual tests more sanely.

It absolutely does not say that we shouldn't have a lot of tests.

Why are you working so hard to avoid adding a nice test case for the
bug you are fixing?  This makes absolutely no sense at all.

I want as many tests as possible for the networking code, so please
write the test case you are being requested to add.

Thank you.


* Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
  2018-03-05 12:29         ` Stefano Brivio
  2018-03-05 14:14           ` David Miller
@ 2018-03-05 15:27           ` David Ahern
  1 sibling, 0 replies; 10+ messages in thread
From: David Ahern @ 2018-03-05 15:27 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David S . Miller, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, netdev

On 3/5/18 5:29 AM, Stefano Brivio wrote:
> On Sun, 4 Mar 2018 18:11:41 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
>> On 3/4/18 4:12 PM, Stefano Brivio wrote:
>>> On Sat, 3 Mar 2018 12:22:36 +0100
>>> Stefano Brivio <sbrivio@redhat.com> wrote:
>>>   
>>>>> And please codify the above expectation as a test under
>>>>> tools/testing/selftests/net    
>>>>
>>>> And this, along with v2.  
>>>
>>> On second thought: I'm starting to think it doesn't make much sense,
>>> especially given the current context of self-tests, to explicitly test
>>> this, because it's a rather particular corner case.
>>>
>>> I think it would make more sense to introduce generic tests first.
>>> About, say, PMTU, or route exceptions, but not "tunnel causes route
>>> exception and administrative change doesn't affect PMTU".
>>>   
>>
>> I would argue corner cases in particular should be documented.
> 
> Sure, but self-tests are not meant for documentation. I think commit
> messages are.
> 
> And about corner cases, from Documentation/dev-tools/kselftest.rst:
> 
> 	These are intended to be small tests to exercise individual code
> 	paths in the kernel. Tests are intended to be run after building, installing
> 	and booting a kernel.
> 
> and:
> 
> 	In general, the rules for selftests are
> 	[...]
> 	 * Don't take too long;
> 
> if you plan to request a self-test for every fix in the networking area,
> you need to substantially change the scope of these self-tests. This stuff
> would instead fit in a comprehensive networking test suite.

The Linux networking stack is long overdue for a comprehensive
functional test. There is very little about Layer 3 that cannot be
tested with network namespaces, vrf, veth and a recent iproute2 package.

No one company is going to pay someone to write this test suite. It
takes commitment from contributors to submit tests as we go, and test
cases for bug fixes are one of the easiest and best ways to get this moving.

> 
>> From the commit message it seems like you took the time to create a test
>> setup using network namespaces. Throw those commands into a shell script
>> -- tools/testing/selftests/net/mtu.sh. It can evolve from there.
> 
> My script sets up namespaces, veth and vti6 interfaces, xfrm states and
> policies (could be replaced by vxlan, but that's what I have now). Then
> it pings, waits, prints exception routes, changes MTU, etc. In the
> commit message, I reported only the relevant parts that are enough to
> clearly show the issue.
> 
> This script is some ugly monster I don't want to have on my conscience,
> or wish for anybody to run as "small test to exercise individual code
> paths".

Understood. I have a lot of those for MPLS, for example. Each time I
write one it evolves into something cleaner and now I have a few worth
submitting to selftests (and that will happen in time - e.g., when I come
back to MPLS).

> 
> I don't think sensible self-tests can evolve from it. They could
> instead evolve from some generic, basic PMTU (or route exceptions) test,
> rather than from my very particular fix that needs to involve so many
> steps to be checked.
> 

Sure it can. I have a basic pmtu script that I wrote to test IPv6 for my
FIB change patch set. I will be submitting it in time as well.

We have to start somewhere, and it takes a commitment from multiple
people to make this happen.


* [PATCH net v2] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
@ 2018-03-05 22:47 Stefano Brivio
  2018-03-02 18:54 ` [PATCH net] " Maciej Żenczykowski
  2018-03-02 22:39 ` David Ahern
  0 siblings, 2 replies; 10+ messages in thread
From: Stefano Brivio @ 2018-03-05 22:47 UTC (permalink / raw)
  To: David S . Miller
  Cc: David Ahern, Wei Wang, Hideaki YOSHIFUJI,
	Maciej Żenczykowski, Xiumei Mu, netdev

Currently, administrative MTU changes on a given netdevice are
not reflected on route exceptions for MTU-less routes, with a
set PMTU value, for that device:

 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
 # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
 # ip netns exec a ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium
 # ip link set dev vti_a mtu 3000
 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium
 # ip link set dev vti_a mtu 9000
 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium

The first issue is that since commit fb56be83e43d ("net-ipv6: on
device mtu change do not add mtu to mtu-less routes") we don't
call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
which handles administrative MTU changes, if the regular route
is MTU-less.

However, PMTU exceptions should always be updated, as long as
RTAX_MTU is not locked. Keep the check for MTU-less main route,
as introduced by that commit, but, for exceptions,
call rt6_exceptions_update_pmtu() regardless of that check.

Once that is fixed, one problem remains: MTU changes are not
reflected if the new MTU is higher than the previous one,
because rt6_exceptions_update_pmtu() doesn't allow that. We
should instead allow PMTU increase if the old PMTU matches the
local MTU, as that implies that the old MTU was the lowest in the
path, and PMTU discovery might lead to different results.

The existing check in rt6_mtu_change_route() correctly took that
case into account (for regular routes only), so factor it out
and re-use it also in rt6_exceptions_update_pmtu().

While at it, fix comment style and grammar, and try to be a bit
more descriptive.

Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
Fixes: f5bbe7ee79c2 ("ipv6: prepare rt6_mtu_change() for exception table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
v2: Use 2001:db8::/32 addresses in commit message as assigned by
    RFC 3849 for documentation purposes [David Ahern], rephrase
    paragraph about check on MTU-less route

This patch introduces some visual code churn as I'm factoring out
the existing MTU checks from rt6_mtu_change_route() and
updating comment style and syntax. Real code changes are rather
small.

Let me know if I should rather submit an "ugly" fix for net, and
a separate, small refactoring for net-next.
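
For readers who want the intended behaviour without reading the diff,
here is a minimal userspace model of the two changes described in the
commit message. It is a sketch under stated assumptions, not the
kernel implementation: struct model_route and both function names are
hypothetical, exceptions are a flat array instead of hash buckets, and
old_local_mtu plays the role of the local MTU before the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a route and its PMTU exceptions. */
struct model_route {
	int mtu_metric;		/* 0: the main route is MTU-less */
	bool mtu_locked;	/* models a locked RTAX_MTU */
	int *exception_pmtu;	/* 0: no PMTU stored for that exception */
	size_t n_exceptions;
};

/* Allow decreases always, increases only if the current value matches
 * the old local MTU (the local interface was the bottleneck).
 */
bool model_change_allowed(int current_mtu, int old_local_mtu, int new_mtu)
{
	return current_mtu >= new_mtu || current_mtu == old_local_mtu;
}

void model_mtu_change(struct model_route *rt, int old_local_mtu, int new_mtu)
{
	size_t i;

	assert(rt && new_mtu > 0);

	if (rt->mtu_locked)
		return;

	/* Main route: still only updated if it carries an explicit MTU. */
	if (rt->mtu_metric &&
	    model_change_allowed(rt->mtu_metric, old_local_mtu, new_mtu))
		rt->mtu_metric = new_mtu;

	/* Exceptions: walked regardless of the MTU-less check above. */
	for (i = 0; i < rt->n_exceptions; i++) {
		int *pmtu = &rt->exception_pmtu[i];

		if (*pmtu &&
		    model_change_allowed(*pmtu, old_local_mtu, new_mtu))
			*pmtu = new_mtu;
	}
}
```

Replaying the sequence from the commit message on an MTU-less route
with one exception at 4926 (and assuming, for illustration only, an
initial device MTU of 9000): changing the MTU to 3000 lowers the
exception to 3000, and changing it back to 9000 raises the exception to
9000, since 3000 matched the old local MTU. The main route stays
MTU-less throughout.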

 net/ipv6/route.c | 71 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9dcfadddd800..0db4218c9186 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1509,7 +1509,30 @@ static void rt6_exceptions_remove_prefsrc(struct rt6_info *rt)
 	}
 }
 
-static void rt6_exceptions_update_pmtu(struct rt6_info *rt, int mtu)
+static bool rt6_mtu_change_route_allowed(struct inet6_dev *idev,
+					 struct rt6_info *rt, int mtu)
+{
+	/* If the new MTU is lower than the route PMTU, this new MTU will be the
+	 * lowest MTU in the path: always allow updating the route PMTU to
+	 * reflect PMTU decreases.
+	 *
+	 * If the new MTU is higher, and the route PMTU is equal to the local
+	 * MTU, this means the old MTU is the lowest in the path, so allow
+	 * updating it: if other nodes now have lower MTUs, PMTU discovery will
+	 * handle this.
+	 */
+
+	if (dst_mtu(&rt->dst) >= mtu)
+		return true;
+
+	if (dst_mtu(&rt->dst) == idev->cnf.mtu6)
+		return true;
+
+	return false;
+}
+
+static void rt6_exceptions_update_pmtu(struct inet6_dev *idev,
+				       struct rt6_info *rt, int mtu)
 {
 	struct rt6_exception_bucket *bucket;
 	struct rt6_exception *rt6_ex;
@@ -1518,20 +1541,22 @@ static void rt6_exceptions_update_pmtu(struct rt6_info *rt, int mtu)
 	bucket = rcu_dereference_protected(rt->rt6i_exception_bucket,
 					lockdep_is_held(&rt6_exception_lock));
 
-	if (bucket) {
-		for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
-			hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
-				struct rt6_info *entry = rt6_ex->rt6i;
-				/* For RTF_CACHE with rt6i_pmtu == 0
-				 * (i.e. a redirected route),
-				 * the metrics of its rt->dst.from has already
-				 * been updated.
-				 */
-				if (entry->rt6i_pmtu && entry->rt6i_pmtu > mtu)
-					entry->rt6i_pmtu = mtu;
-			}
-			bucket++;
+	if (!bucket)
+		return;
+
+	for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
+		hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
+			struct rt6_info *entry = rt6_ex->rt6i;
+
+			/* For RTF_CACHE with rt6i_pmtu == 0 (i.e. a redirected
+			 * route), the metrics of its rt->dst.from have already
+			 * been updated.
+			 */
+			if (entry->rt6i_pmtu &&
+			    rt6_mtu_change_route_allowed(idev, entry, mtu))
+				entry->rt6i_pmtu = mtu;
 		}
+		bucket++;
 	}
 }
 
@@ -3809,25 +3834,13 @@ static int rt6_mtu_change_route(struct rt6_info *rt, void *p_arg)
 	   Since RFC 1981 doesn't include administrative MTU increase
 	   update PMTU increase is a MUST. (i.e. jumbo frame)
 	 */
-	/*
-	   If new MTU is less than route PMTU, this new MTU will be the
-	   lowest MTU in the path, update the route PMTU to reflect PMTU
-	   decreases; if new MTU is greater than route PMTU, and the
-	   old MTU is the lowest MTU in the path, update the route PMTU
-	   to reflect the increase. In this case if the other nodes' MTU
-	   also have the lowest MTU, TOO BIG MESSAGE will be lead to
-	   PMTU discovery.
-	 */
 	if (rt->dst.dev == arg->dev &&
-	    dst_metric_raw(&rt->dst, RTAX_MTU) &&
 	    !dst_metric_locked(&rt->dst, RTAX_MTU)) {
 		spin_lock_bh(&rt6_exception_lock);
-		if (dst_mtu(&rt->dst) >= arg->mtu ||
-		    (dst_mtu(&rt->dst) < arg->mtu &&
-		     dst_mtu(&rt->dst) == idev->cnf.mtu6)) {
+		if (dst_metric_raw(&rt->dst, RTAX_MTU) &&
+		    rt6_mtu_change_route_allowed(idev, rt, arg->mtu))
 			dst_metric_set(&rt->dst, RTAX_MTU, arg->mtu);
-		}
-		rt6_exceptions_update_pmtu(rt, arg->mtu);
+		rt6_exceptions_update_pmtu(idev, rt, arg->mtu);
 		spin_unlock_bh(&rt6_exception_lock);
 	}
 	return 0;
-- 
2.15.1
