netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed
@ 2019-11-22  6:19 Hangbin Liu
  2019-11-22 18:04 ` David Miller
  2019-12-03  2:11 ` [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update Hangbin Liu
  0 siblings, 2 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-11-22  6:19 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, David S . Miller, Marcelo Ricardo Leitner,
	David Ahern, Eric Dumazet, Hangbin Liu

When we setup a pair of gretap, ping each other and create neighbour cache.
Then delete and recreate one side. We will never be able to ping6 to the new
created gretap.

The reason is when we ping6 remote via gretap, we will call like

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

As the confirmed time updated, in neigh_timer_handler() the check for
NUD_DELAY confirm time will pass and the neigh state will back to
NUD_REACHABLE. So the old/wrong mac address will be used again.

If we do not update the confirmed time, the neigh state will go to
neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
neigh later, which is what IPv4 does.

Fix it by reordering the dst_confirm_neigh() and only update it when
pmtu changed.

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3f83ea851ebf..6fbef61b8f64 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2713,11 +2713,12 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 		daddr = NULL;
 		saddr = NULL;
 	}
-	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
 
+	dst_confirm_neigh(dst, daddr);
+
 	if (!rt6_cache_allowed_for_pmtu(rt6)) {
 		rt6_do_update_pmtu(rt6, mtu);
 		/* update rt6_ex->stamp for cache */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed
  2019-11-22  6:19 [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed Hangbin Liu
@ 2019-11-22 18:04 ` David Miller
  2019-11-26  9:17   ` Hangbin Liu
  2019-12-03  2:11 ` [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update Hangbin Liu
  1 sibling, 1 reply; 51+ messages in thread
From: David Miller @ 2019-11-22 18:04 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Fri, 22 Nov 2019 14:19:19 +0800

> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now

This whole callchain violates the assumptions of the MTU update
machinery.

In this case it's just the tunneling code accounting for the
encapsulation it is creating, and checking the MTU just in case.

But the MTU update code is supposed to be invoked in response to real
networking events that update the PMTU.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

We have to stop papering over the tunneling code's abuse of the PMTU
update framework and do this properly.

Sorry, I'm not applying this.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed
  2019-11-22 18:04 ` David Miller
@ 2019-11-26  9:17   ` Hangbin Liu
  0 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-11-26  9:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet

Hi David,

Sorry for the late reply. I'm not sure why your reply went to spam list and
I didn't receive it timely.
On Fri, Nov 22, 2019 at 10:04:38AM -0800, David Miller wrote:
> From: Hangbin Liu <liuhangbin@gmail.com>
> Date: Fri, 22 Nov 2019 14:19:19 +0800
> 
> > The reason is when we ping6 remote via gretap, we will call like
> > 
> > gre_tap_xmit()
> >  - ip_tunnel_xmit()
> >    - tnl_update_pmtu()
> >      - skb_dst_update_pmtu()
> >        - ip6_rt_update_pmtu()
> >          - __ip6_rt_update_pmtu()
> >            - dst_confirm_neigh()
> >              - ip6_confirm_neigh()
> >                - __ipv6_confirm_neigh()
> >                  - n->confirmed = now
> 
> This whole callchain violates the assumptions of the MTU update
> machinery.
> 
> In this case it's just the tunneling code accounting for the
> encapsulation it is creating, and checking the MTU just in case.
> 
> But the MTU update code is supposed to be invoked in response to real
> networking events that update the PMTU.
> 
> So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
> should not be invoking dst_confirm_neigh() as we have no evidence
> of successful two-way communication at this point.

Thanks for the explanation. When I fixed the code, I was also wondering
if we need this neighbor confirmation. So I just moved the dst_confirm_neigh()
a little down to make sure pmtu changed. Your explanation make me clear that
we should not have this neighbor confirmation as PMTU is not a two-way
communication.
> 
> We have to stop papering over the tunneling code's abuse of the PMTU
> update framework and do this properly.

Should I do other works than just remove dst_confirm_neigh() in
__ip6_rt_update_pmtu().

Thanks
Hangbin
> 
> Sorry, I'm not applying this.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-11-22  6:19 [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed Hangbin Liu
  2019-11-22 18:04 ` David Miller
@ 2019-12-03  2:11 ` Hangbin Liu
  2019-12-03  2:47   ` David Miller
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
  1 sibling, 2 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-03  2:11 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Hangbin Liu

When we setup a pair of gretap, ping each other and create neighbour cache.
Then delete and recreate one side. We will never be able to ping6 to the new
created gretap.

The reason is when we ping6 remote via gretap, we will call like

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

As the confirmed time updated, in neigh_timer_handler() the check for
NUD_DELAY confirm time will pass and the neigh state will back to
NUD_REACHABLE. So the old/wrong mac address will be used again.

If we do not update the confirmed time, the neigh state will go to
neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
neigh later, which is what IPv4 does.

Fix it by removing the dst_confirm_neigh() in __ip6_rt_update_pmtu() as
there is no two-way communication during PMTU update.

v2: remove dst_confirm_neigh directly as David Miller pointed out.

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/route.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b59940416cb5..335ed36a5c78 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2710,7 +2710,6 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 		daddr = NULL;
 		saddr = NULL;
 	}
-	dst_confirm_neigh(dst, daddr);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-03  2:11 ` [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update Hangbin Liu
@ 2019-12-03  2:47   ` David Miller
  2019-12-03 10:15     ` Hangbin Liu
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
  1 sibling, 1 reply; 51+ messages in thread
From: David Miller @ 2019-12-03  2:47 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Tue,  3 Dec 2019 10:11:37 +0800

> Fix it by removing the dst_confirm_neigh() in __ip6_rt_update_pmtu() as
> there is no two-way communication during PMTU update.
> 
> v2: remove dst_confirm_neigh directly as David Miller pointed out.

That's not what I said.

I said that this interface is designed for situations where the neigh
update is appropriate, and that's what happens for most callers _except_
these tunnel cases.

The tunnel use is the exception and invoking the interface
inappropriately.

It is important to keep the neigh reachability fresh for TCP flows so
you cannot remove this dst_confirm_neigh() call.

Instead, make a new interface that the tunnel use cases can call into
to elide the neigh update.

Yes, this means you will have too update all of the tunnel callers
into these calls chains but that's the price we have to pay in this
situation unfortunately.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-03  2:47   ` David Miller
@ 2019-12-03 10:15     ` Hangbin Liu
  2019-12-03 10:25       ` Hangbin Liu
  0 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-03 10:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet

On Mon, Dec 02, 2019 at 06:47:04PM -0800, David Miller wrote:
> From: Hangbin Liu <liuhangbin@gmail.com>
> Date: Tue,  3 Dec 2019 10:11:37 +0800
> 
> > Fix it by removing the dst_confirm_neigh() in __ip6_rt_update_pmtu() as
> > there is no two-way communication during PMTU update.
> > 
> > v2: remove dst_confirm_neigh directly as David Miller pointed out.
> 
> That's not what I said.
> 
> I said that this interface is designed for situations where the neigh
> update is appropriate, and that's what happens for most callers _except_
> these tunnel cases.
> 
> The tunnel use is the exception and invoking the interface
> inappropriately.
> 
> It is important to keep the neigh reachability fresh for TCP flows so
> you cannot remove this dst_confirm_neigh() call.
> 
> Instead, make a new interface that the tunnel use cases can call into
> to elide the neigh update.
> 
> Yes, this means you will have too update all of the tunnel callers
> into these calls chains but that's the price we have to pay in this
> situation unfortunately.

Oh, I got what you mean now. thanks for the explain. I will see how
to do this.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-03 10:15     ` Hangbin Liu
@ 2019-12-03 10:25       ` Hangbin Liu
  2019-12-03 19:58         ` David Miller
  0 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-03 10:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet


Hi David,
On Tue, Dec 03, 2019 at 06:15:36PM +0800, Hangbin Liu wrote:
> On Mon, Dec 02, 2019 at 06:47:04PM -0800, David Miller wrote:
> > From: Hangbin Liu <liuhangbin@gmail.com>
> > Date: Tue,  3 Dec 2019 10:11:37 +0800
> > 
> > > Fix it by removing the dst_confirm_neigh() in __ip6_rt_update_pmtu() as
> > > there is no two-way communication during PMTU update.
> > > 
> > > v2: remove dst_confirm_neigh directly as David Miller pointed out.
> > 
> > That's not what I said.
> > 
> > I said that this interface is designed for situations where the neigh
> > update is appropriate, and that's what happens for most callers _except_
> > these tunnel cases.
> > 
> > The tunnel use is the exception and invoking the interface
> > inappropriately.
> > 
> > It is important to keep the neigh reachability fresh for TCP flows so
> > you cannot remove this dst_confirm_neigh() call.

I have one question here. Since we have the .confirm_neigh fuction in
struct dst_ops. How about do a dst->ops->confirm_neigh() separately after
dst->ops->update_pmtu()? Why should we mix the confirm_neigh() in
update_pmtu(), like ip6_rt_update_pmtu()?

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-03 10:25       ` Hangbin Liu
@ 2019-12-03 19:58         ` David Miller
  2019-12-10  3:36           ` Hangbin Liu
  0 siblings, 1 reply; 51+ messages in thread
From: David Miller @ 2019-12-03 19:58 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, ja, marcelo.leitner, dsahern, edumazet

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Tue, 3 Dec 2019 18:25:35 +0800

> 
> Hi David,
> On Tue, Dec 03, 2019 at 06:15:36PM +0800, Hangbin Liu wrote:
>> On Mon, Dec 02, 2019 at 06:47:04PM -0800, David Miller wrote:
>> > From: Hangbin Liu <liuhangbin@gmail.com>
>> > Date: Tue,  3 Dec 2019 10:11:37 +0800
>> > 
>> > > Fix it by removing the dst_confirm_neigh() in __ip6_rt_update_pmtu() as
>> > > there is no two-way communication during PMTU update.
>> > > 
>> > > v2: remove dst_confirm_neigh directly as David Miller pointed out.
>> > 
>> > That's not what I said.
>> > 
>> > I said that this interface is designed for situations where the neigh
>> > update is appropriate, and that's what happens for most callers _except_
>> > these tunnel cases.
>> > 
>> > The tunnel use is the exception and invoking the interface
>> > inappropriately.
>> > 
>> > It is important to keep the neigh reachability fresh for TCP flows so
>> > you cannot remove this dst_confirm_neigh() call.
> 
> I have one question here. Since we have the .confirm_neigh fuction in
> struct dst_ops. How about do a dst->ops->confirm_neigh() separately after
> dst->ops->update_pmtu()? Why should we mix the confirm_neigh() in
> update_pmtu(), like ip6_rt_update_pmtu()?

Two indirect calls which have high cost due to spectre mitigation?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-03 19:58         ` David Miller
@ 2019-12-10  3:36           ` Hangbin Liu
  2019-12-10 17:00             ` Guillaume Nault
  0 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-10  3:36 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, ja, marcelo.leitner, dsahern, edumazet, Guillaume Nault

Hi David,

Sorry for the late reply. Hope you still have impression for this discussion.
I discussed this issue with my colleagues offline and I still have some questions.
Please see comments below.

On Tue, Dec 03, 2019 at 11:58:18AM -0800, David Miller wrote:
> >> > That's not what I said.
> >> > 
> >> > I said that this interface is designed for situations where the neigh
> >> > update is appropriate, and that's what happens for most callers _except_
> >> > these tunnel cases.
> >> > 
> >> > The tunnel use is the exception and invoking the interface
> >> > inappropriately.
> >> > 
> >> > It is important to keep the neigh reachability fresh for TCP flows so
> >> > you cannot remove this dst_confirm_neigh() call.

The first is why IPv4 don't need this neigh update. I didn't
find dst_confirm_neigh() or ipv4_confirm_neigh() in ip_rt_update_pmtu()

> > 
> > I have one question here. Since we have the .confirm_neigh fuction in
> > struct dst_ops. How about do a dst->ops->confirm_neigh() separately after
> > dst->ops->update_pmtu()? Why should we mix the confirm_neigh() in
> > update_pmtu(), like ip6_rt_update_pmtu()?
> 
> Two indirect calls which have high cost due to spectre mitigation?

Guillaume pointed me that dst_confirm_neigh() is also a indriect call.
So it should take same cost to call dst_confirm_neigh() in or before
__ip6_rt_update_pmtu(). If they are the same cose, I think there would
have two fixes.

1. Add a new parameter 'bool confirm_neigh' to __ip6_rt_update_pmtu(),
update struct dst_ops.update_mtu and all functions who called it.

2. Move dst_confirm_neigh() out of __ip6_rt_update_pmtu() and only call it
   in fuctions who need it, like inet6_csk_update_pmtu().

What do you think? Please tell me if I missed something.

Regards
Hangbin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update
  2019-12-10  3:36           ` Hangbin Liu
@ 2019-12-10 17:00             ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-10 17:00 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: David Miller, netdev, ja, marcelo.leitner, dsahern, edumazet

On Tue, Dec 10, 2019 at 11:36:56AM +0800, Hangbin Liu wrote:
> Hi David,
> 
> Sorry for the late reply. Hope you still have impression for this discussion.
> I discussed this issue with my colleagues offline and I still have some questions.
> Please see comments below.
> 
> On Tue, Dec 03, 2019 at 11:58:18AM -0800, David Miller wrote:
> > >> > That's not what I said.
> > >> > 
> > >> > I said that this interface is designed for situations where the neigh
> > >> > update is appropriate, and that's what happens for most callers _except_
> > >> > these tunnel cases.
> > >> > 
> > >> > The tunnel use is the exception and invoking the interface
> > >> > inappropriately.
> > >> > 
> > >> > It is important to keep the neigh reachability fresh for TCP flows so
> > >> > you cannot remove this dst_confirm_neigh() call.
> 
> The first is why IPv4 don't need this neigh update. I didn't
> find dst_confirm_neigh() or ipv4_confirm_neigh() in ip_rt_update_pmtu()
> 
> > > 
> > > I have one question here. Since we have the .confirm_neigh fuction in
> > > struct dst_ops. How about do a dst->ops->confirm_neigh() separately after
> > > dst->ops->update_pmtu()? Why should we mix the confirm_neigh() in
> > > update_pmtu(), like ip6_rt_update_pmtu()?
> > 
> > Two indirect calls which have high cost due to spectre mitigation?
> 
> Guillaume pointed me that dst_confirm_neigh() is also a indriect call.
> So it should take same cost to call dst_confirm_neigh() in or before
> __ip6_rt_update_pmtu(). If they are the same cose, I think there would
> have two fixes.
> 
OTOH, the dst_confirm_neigh() call could easily be replaced by a direct
ip6_confirm_neigh() call in the current code (maybe using an
INDIRECT_CALL wrapper if necessary).
I'm not sure where dst_confirm_neigh() would go if it was moved outside
of __ip6_rt_update_pmtu(), but that might make such optimisation
harder.

> 1. Add a new parameter 'bool confirm_neigh' to __ip6_rt_update_pmtu(),
> update struct dst_ops.update_mtu and all functions who called it.
> 
> 2. Move dst_confirm_neigh() out of __ip6_rt_update_pmtu() and only call it
>    in fuctions who need it, like inet6_csk_update_pmtu().
> 
> What do you think? Please tell me if I missed something.
> 
> Regards
> Hangbin
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update
  2019-12-03  2:11 ` [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update Hangbin Liu
  2019-12-03  2:47   ` David Miller
@ 2019-12-18 11:53   ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
                       ` (10 more replies)
  1 sibling, 11 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we setup a pair of gretap, ping each other and create neighbour cache.
Then delete and recreate one side. We will never be able to ping6 to the new
created gretap.

The reason is when we ping6 remote via gretap, we will call like

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

As the confirmed time updated, in neigh_timer_handler() the check for
NUD_DELAY confirm time will pass and the neigh state will back to
NUD_REACHABLE. So the old/wrong mac address will be used again.

If we do not update the confirmed time, the neigh state will go to
neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
neigh later, which is what IPv4 does.

We couldn't remove the ip6_confirm_neigh() directly as we still need it
for TCP flows. To fix it, we have to pass a bool parameter to
dst_ops.update_pmtu() and only disable neighbor update for tunnels.

v2: remove dst_confirm_neigh directly
v3: add bool confirm_neigh parameter for all dst_ops.update_pmtu and
    disable neigh update on each tunnel device.

---
Reproducer:

#!/bin/bash
set -x
ip -a netns del
modprobe -r veth
modprobe -r bridge

ip netns add ha
ip netns add hb
ip link add br0 type bridge
ip link set br0 up
ip link add br_ha type veth peer name veth0 netns ha
ip link add br_hb type veth peer name veth0 netns hb
ip link set br_ha up
ip link set br_hb up
ip link set br_ha master br0
ip link set br_hb master br0
ip netns exec ha ip link set veth0 up
ip netns exec hb ip link set veth0 up
ip netns exec ha ip addr add 192.168.0.1/24 dev veth0
ip netns exec hb ip addr add 192.168.0.2/24 dev veth0

ip netns exec ha ip link add gretap1 type gretap local 192.168.0.1 remote 192.168.0.2
ip netns exec ha ip link set gretap1 up
ip netns exec ha ip addr add 1.1.1.1/24 dev gretap1
ip netns exec ha ip addr add 1111::1/64 dev gretap1

ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1

ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
sleep 30

# recreate gretap
ip netns exec hb ip link del gretap1
ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1
ip netns exec hb ip link show dev gretap1

ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
sleep 10
ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
---

Hangbin Liu (8):
  net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  ip6_gre: do not confirm neighbor when do pmtu update
  gtp: do not confirm neighbor when do pmtu update
  net/dst: add new function skb_dst_update_pmtu_no_confirm
  tunnel: do not confirm neighbor when do pmtu update
  vti: do not confirm neighbor when do pmtu update
  sit: do not confirm neighbor when do pmtu update
  net/dst: do not confirm neighbor for vxlan and geneve pmtu update

 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                | 13 +++++++++++--
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/ip_tunnel.c             |  2 +-
 net/ipv4/ip_vti.c                |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/ip6_tunnel.c            |  4 ++--
 net/ipv6/ip6_vti.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/sit.c                   |  2 +-
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 19 files changed, 58 insertions(+), 32 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH net-next 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
  - tnl_update_pmtu()
    - skb_dst_update_pmtu()
      - ip6_rt_update_pmtu()
        - __ip6_rt_update_pmtu()
          - dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v2: remove dst_confirm_neigh directly
v3: add bool confirm_neigh parameter for all dst_ops.update_pmtu

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                |  2 +-
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 14 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index ecfe26215935..9cac0accba7a 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -541,7 +541,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
diff --git a/include/net/dst.h b/include/net/dst.h
index fe62fe2eb781..0739e84152e4 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -516,7 +516,7 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 	struct dst_entry *dst = skb_dst(skb);
 
 	if (dst && dst->ops->update_pmtu)
-		dst->ops->update_pmtu(dst, NULL, skb, mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 5ec645f27ee3..443863c7b8da 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -27,7 +27,8 @@ struct dst_ops {
 	struct dst_entry *	(*negative_advice)(struct dst_entry *);
 	void			(*link_failure)(struct sk_buff *);
 	void			(*update_pmtu)(struct dst_entry *dst, struct sock *sk,
-					       struct sk_buff *skb, u32 mtu);
+					       struct sk_buff *skb, u32 mtu,
+					       bool confirm_neigh);
 	void			(*redirect)(struct dst_entry *dst, struct sock *sk,
 					    struct sk_buff *skb);
 	int			(*local_out)(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 2cdfc5d6c25d..8c69f0c95a8e 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -22,7 +22,8 @@
 #endif
 
 static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			     struct sk_buff *skb, u32 mtu)
+			     struct sk_buff *skb, u32 mtu,
+			     bool confirm_neigh)
 {
 }
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index aea918135ec3..08c3dc45f1a4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -110,7 +110,8 @@ static void dn_dst_ifdown(struct dst_entry *, struct net_device *dev, int how);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb , u32 mtu);
+			       struct sk_buff *skb , u32 mtu,
+			       bool confirm_neigh);
 static void dn_dst_redirect(struct dst_entry *dst, struct sock *sk,
 			    struct sk_buff *skb);
 static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
@@ -251,7 +252,8 @@ static int dn_dst_gc(struct dst_ops *ops)
  * advertise to the other end).
  */
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
 	struct dn_route *rt = (struct dn_route *) dst;
 	struct neighbour *n = rt->n;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e4c6e8b40490..18c0d5bffe12 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1086,7 +1086,7 @@ struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu)
 		if (!dst)
 			goto out;
 	}
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = __sk_dst_check(sk, 0);
 	if (!dst)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f88c93c38f11..87e979f2b74a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -139,7 +139,8 @@ static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static void		ipv4_dst_destroy(struct dst_entry *dst);
@@ -1043,7 +1044,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 }
 
 static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct rtable *rt = (struct rtable *) dst;
 	struct flowi4 fl4;
@@ -2687,7 +2689,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					  struct sk_buff *skb, u32 mtu)
+					  struct sk_buff *skb, u32 mtu,
+					  bool confirm_neigh)
 {
 }
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 35b84b52b702..9ebd54752e03 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -100,12 +100,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm4_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm4_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index fe9cb8d1adca..e315526fa244 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -146,7 +146,7 @@ struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu)
 
 	if (IS_ERR(dst))
 		return NULL;
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = inet6_csk_route_socket(sk, &fl6);
 	return IS_ERR(dst) ? NULL : dst;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 923034c52ce4..071cb237f00b 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b59940416cb5..affb51c11a25 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -95,7 +95,8 @@ static int		ip6_pkt_prohibit(struct sk_buff *skb);
 static int		ip6_pkt_prohibit_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 static void		ip6_link_failure(struct sk_buff *skb);
 static void		ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
@@ -264,7 +265,8 @@ static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					 struct sk_buff *skb, u32 mtu)
+					 struct sk_buff *skb, u32 mtu,
+					 bool confirm_neigh)
 {
 }
 
@@ -2692,7 +2694,8 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 }
 
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
-				 const struct ipv6hdr *iph, u32 mtu)
+				 const struct ipv6hdr *iph, u32 mtu,
+				 bool confirm_neigh)
 {
 	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
@@ -2710,7 +2713,10 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 		daddr = NULL;
 		saddr = NULL;
 	}
-	dst_confirm_neigh(dst, daddr);
+
+	if (confirm_neigh)
+		dst_confirm_neigh(dst, daddr);
+
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
@@ -2764,9 +2770,11 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 }
 
 static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
-	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu);
+	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu,
+			     confirm_neigh);
 }
 
 void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
@@ -2785,7 +2793,7 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (!dst->error)
-		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu));
+		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu), true);
 	dst_release(dst);
 }
 EXPORT_SYMBOL_GPL(ip6_update_pmtu);
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 699e0730ce8e..af7a4b8b1e9c 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -98,12 +98,13 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm6_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm6_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index b1e300f8881b..b00866d777fe 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -208,7 +208,7 @@ static inline void maybe_update_pmtu(int skb_af, struct sk_buff *skb, int mtu)
 	struct rtable *ort = skb_rtable(skb);
 
 	if (!skb->dev && sk && sk_fullsock(sk))
-		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu, true);
 }
 
 static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index f4de064598f8..806af58f4375 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -263,7 +263,7 @@ bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
 
 		pf->af->from_sk(&addr, sk);
 		pf->to_sk_daddr(&t->ipaddr, sk);
-		dst->ops->update_pmtu(dst, sk, NULL, pmtu);
+		dst->ops->update_pmtu(dst, sk, NULL, pmtu, true);
 		pf->to_sk_daddr(&addr, sk);
 
 		dst = sctp_transport_dst_check(t);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 2/8] ip6_gre: do not confirm neighbor when do pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 3/8] gtp: " Hangbin Liu
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
This will cause the neigh cache be refreshed and set to REACHABLE before
xmit.

But if the remote mac address changed, e.g. device is deleted and recreated,
we will not able to notice this and still use the old mac address as the neigh
cache is REACHABLE.

Fix this by disable neigh confirm when do pmtu update

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 071cb237f00b..189de56f5e36 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, false);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 3/8] gtp: do not confirm neighbor when do pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although gtp only support ipv4 right now, and __ip_rt_update_pmtu() does not
call dst_confirm_neigh(), we still set it to false to keep consistency with
IPv6 code.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 9cac0accba7a..71b34ff8e7eb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -541,7 +541,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, false);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (2 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 3/8] gtp: " Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
update pmtu but should not do neighbor confirm.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 0739e84152e4..208e7c0c89d8 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -519,6 +519,15 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
+/* update dst pmtu but not do neighbor confirm */
+static inline void skb_dst_update_pmtu_no_confirm(struct sk_buff *skb, u32 mtu)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (dst && dst->ops->update_pmtu)
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, false);
+}
+
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 					 struct dst_entry *encap_dst,
 					 int headroom)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (3 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-19 17:47       ` Guillaume Nault
  2019-12-18 11:53     ` [PATCH net-next 6/8] vti: " Hangbin Liu
                       ` (5 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although ipv4 tunnel is not affected as __ip_rt_update_pmtu() does not call
dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
IPv6 code.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_tunnel.c  | 2 +-
 net/ipv6/ip6_tunnel.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 38c02bb62e2c..0fe2a5d3e258 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -505,7 +505,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
 	if (skb_valid_dst(skb))
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		if (!skb_is_gso(skb) &&
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 754a484d35df..2f376dbc37d5 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -640,7 +640,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (rel_info > dst_mtu(skb_dst(skb2)))
 			goto out;
 
-		skb_dst_update_pmtu(skb2, rel_info);
+		skb_dst_update_pmtu_no_confirm(skb2, rel_info);
 	}
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
@@ -1132,7 +1132,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	mtu = max(mtu, skb->protocol == htons(ETH_P_IPV6) ?
 		       IPV6_MIN_MTU : IPV4_MIN_MTU);
 
-	skb_dst_update_pmtu(skb, mtu);
+	skb_dst_update_pmtu_no_confirm(skb, mtu);
 	if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
 		*pmtu = mtu;
 		err = -EMSGSIZE;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 6/8] vti: do not confirm neighbor when do pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (4 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 7/8] sit: " Hangbin Liu
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although ip vti is not affected as __ip_rt_update_pmtu() does not call
dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
IPv6 code.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_vti.c  | 2 +-
 net/ipv6/ip6_vti.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index cfb025606793..fb9f6d60c27c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -214,7 +214,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 		if (skb->protocol == htons(ETH_P_IP)) {
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 				  htonl(mtu));
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 024db17386d2..6f08b760c2a7 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -479,7 +479,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			if (mtu < IPV6_MIN_MTU)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 7/8] sit: do not confirm neighbor when do pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (5 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 6/8] vti: " Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-18 11:53     ` [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/sit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index b2ccbc473127..98954830c40b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -944,7 +944,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		}
 
 		if (tunnel->parms.iph.daddr)
-			skb_dst_update_pmtu(skb, mtu);
+			skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->len > mtu && !skb_is_gso(skb)) {
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (6 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 7/8] sit: " Hangbin Liu
@ 2019-12-18 11:53     ` Hangbin Liu
  2019-12-19 17:49       ` Guillaume Nault
  2019-12-18 12:01     ` [PATCH net-next 0/8] disable neigh update for tunnels during " Hangbin Liu
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 11:53 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

So disable the neigh confirm for vxlan and geneve pmtu update.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 208e7c0c89d8..626cf614ad86 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -535,7 +535,7 @@ static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 	u32 encap_mtu = dst_mtu(encap_dst);
 
 	if (skb->len > encap_mtu - headroom)
-		skb_dst_update_pmtu(skb, encap_mtu - headroom);
+		skb_dst_update_pmtu_no_confirm(skb, encap_mtu - headroom);
 }
 
 #endif /* _NET_DST_H */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (7 preceding siblings ...)
  2019-12-18 11:53     ` [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
@ 2019-12-18 12:01     ` Hangbin Liu
  2019-12-19 17:57       ` Guillaume Nault
  2019-12-19 17:53     ` Guillaume Nault
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
  10 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-18 12:01 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev

On Wed, Dec 18, 2019 at 07:53:05PM +0800, Hangbin Liu wrote:
> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 

Oh... Sorry I forgot to add PATCHv3 in the subject...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-18 11:53     ` [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-19 17:47       ` Guillaume Nault
  2019-12-20  2:36         ` Hangbin Liu
  0 siblings, 1 reply; 51+ messages in thread
From: Guillaume Nault @ 2019-12-19 17:47 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Wed, Dec 18, 2019 at 07:53:10PM +0800, Hangbin Liu wrote:
> When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
> we should not call dst_confirm_neigh() as there is no two-way communication.
> 
> Although ipv4 tunnel is not affected as __ip_rt_update_pmtu() does not call
> dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
> IPv6 code.
> 
This is a bit confusing. IPv4 tunnels (e.g. gretap) are affected, as show
by your reproducer. Quoting patch 0:
"""
  The reason is when we ping6 remote via gretap, we will call like

  gre_tap_xmit()
   - ip_tunnel_xmit()
     - tnl_update_pmtu()
       - skb_dst_update_pmtu()
         - ip6_rt_update_pmtu()
           - __ip6_rt_update_pmtu()
             - dst_confirm_neigh()
               - ip6_confirm_neigh()
                 - __ipv6_confirm_neigh()
                   - n->confirmed = now
"""

IPv4 is not affected if it's on the overlay, but that's not what is
this patch cares about. Unless I've misunderstood this paragraph, I
think that you can just delete it.

> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  net/ipv4/ip_tunnel.c  | 2 +-
>  net/ipv6/ip6_tunnel.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 38c02bb62e2c..0fe2a5d3e258 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -505,7 +505,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
>  		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
>  
>  	if (skb_valid_dst(skb))
> -		skb_dst_update_pmtu(skb, mtu);
> +		skb_dst_update_pmtu_no_confirm(skb, mtu);
>  

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve pmtu update
  2019-12-18 11:53     ` [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
@ 2019-12-19 17:49       ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-19 17:49 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Wed, Dec 18, 2019 at 07:53:13PM +0800, Hangbin Liu wrote:
> When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
> we should not call dst_confirm_neigh() as there is no two-way communication.
> 
> So disable the neigh confirm for vxlan and geneve pmtu update.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  include/net/dst.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 208e7c0c89d8..626cf614ad86 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -535,7 +535,7 @@ static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
>  	u32 encap_mtu = dst_mtu(encap_dst);
>  
>  	if (skb->len > encap_mtu - headroom)
> -		skb_dst_update_pmtu(skb, encap_mtu - headroom);
> +		skb_dst_update_pmtu_no_confirm(skb, encap_mtu - headroom);
>  }
>  

Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (8 preceding siblings ...)
  2019-12-18 12:01     ` [PATCH net-next 0/8] disable neigh update for tunnels during " Hangbin Liu
@ 2019-12-19 17:53     ` Guillaume Nault
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
  10 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-19 17:53 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Wed, Dec 18, 2019 at 07:53:05PM +0800, Hangbin Liu wrote:
> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
> 

Apart from the missing Fixes tags, code looks good to me.

Reviewed-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update
  2019-12-18 12:01     ` [PATCH net-next 0/8] disable neigh update for tunnels during " Hangbin Liu
@ 2019-12-19 17:57       ` Guillaume Nault
  2019-12-20  2:48         ` Hangbin Liu
  0 siblings, 1 reply; 51+ messages in thread
From: Guillaume Nault @ 2019-12-19 17:57 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Wed, Dec 18, 2019 at 08:01:48PM +0800, Hangbin Liu wrote:
> On Wed, Dec 18, 2019 at 07:53:05PM +0800, Hangbin Liu wrote:
> > When we setup a pair of gretap, ping each other and create neighbour cache.
> > Then delete and recreate one side. We will never be able to ping6 to the new
> > created gretap.
> > 
> 
> Oh... Sorry I forgot to add PATCHv3 in the subject...
> 
Also, it looks like -net material to me.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-19 17:47       ` Guillaume Nault
@ 2019-12-20  2:36         ` Hangbin Liu
  0 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  2:36 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

Hi Guillaume,

Thanks a lot for your review.
On Thu, Dec 19, 2019 at 06:47:20PM +0100, Guillaume Nault wrote:
> On Wed, Dec 18, 2019 at 07:53:10PM +0800, Hangbin Liu wrote:
> > When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
> > we should not call dst_confirm_neigh() as there is no two-way communication.
> > 
> > Although ipv4 tunnel is not affected as __ip_rt_update_pmtu() does not call
> > dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
> > IPv6 code.
> > 
> This is a bit confusing. IPv4 tunnels (e.g. gretap) are affected, as show
> by your reproducer. Quoting patch 0:

Ah, yes, I forgot that IPv6 over GRE4 also affects. I will update the
commit description.

> 
> Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
> 
I will add it


Thanks
Hangbin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update
  2019-12-19 17:57       ` Guillaume Nault
@ 2019-12-20  2:48         ` Hangbin Liu
  0 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  2:48 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Thu, Dec 19, 2019 at 06:57:28PM +0100, Guillaume Nault wrote:
> On Wed, Dec 18, 2019 at 08:01:48PM +0800, Hangbin Liu wrote:
> > On Wed, Dec 18, 2019 at 07:53:05PM +0800, Hangbin Liu wrote:
> > > When we setup a pair of gretap, ping each other and create neighbour cache.
> > > Then delete and recreate one side. We will never be able to ping6 to the new
> > > created gretap.
> > > 
> > 
> > Oh... Sorry I forgot to add PATCHv3 in the subject...
> > 
> Also, it looks like -net material to me.
> 

hmm.. I set it to net-next as we add new feature/parameter in
dst_ops.update_pmtu. But it looks make more sense to set to -net after
adding the Fixes tags.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv4 net 0/8] disable neigh update for tunnels during pmtu update
  2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
                       ` (9 preceding siblings ...)
  2019-12-19 17:53     ` Guillaume Nault
@ 2019-12-20  3:25     ` Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
                         ` (9 more replies)
  10 siblings, 10 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we setup a pair of gretap, ping each other and create neighbour cache.
Then delete and recreate one side. We will never be able to ping6 to the new
created gretap.

The reason is when we ping6 remote via gretap, we will call like

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

As the confirmed time updated, in neigh_timer_handler() the check for
NUD_DELAY confirm time will pass and the neigh state will back to
NUD_REACHABLE. So the old/wrong mac address will be used again.

If we do not update the confirmed time, the neigh state will go to
neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
neigh later, which is what IPv4 does.

We couldn't remove the ip6_confirm_neigh() directly as we still need it
for TCP flows. To fix it, we have to pass a bool parameter to
dst_ops.update_pmtu() and only disable neighbor update for tunnels.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

---
Reproducer:

#!/bin/bash
set -x
ip -a netns del
modprobe -r veth
modprobe -r bridge

ip netns add ha
ip netns add hb
ip link add br0 type bridge
ip link set br0 up
ip link add br_ha type veth peer name veth0 netns ha
ip link add br_hb type veth peer name veth0 netns hb
ip link set br_ha up
ip link set br_hb up
ip link set br_ha master br0
ip link set br_hb master br0
ip netns exec ha ip link set veth0 up
ip netns exec hb ip link set veth0 up
ip netns exec ha ip addr add 192.168.0.1/24 dev veth0
ip netns exec hb ip addr add 192.168.0.2/24 dev veth0

ip netns exec ha ip link add gretap1 type gretap local 192.168.0.1 remote 192.168.0.2
ip netns exec ha ip link set gretap1 up
ip netns exec ha ip addr add 1.1.1.1/24 dev gretap1
ip netns exec ha ip addr add 1111::1/64 dev gretap1

ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1

ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
sleep 30

# recreate gretap
ip netns exec hb ip link del gretap1
ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1
ip netns exec hb ip link show dev gretap1

ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
sleep 10
ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
---

Hangbin Liu (8):
  net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  ip6_gre: do not confirm neighbor when do pmtu update
  gtp: do not confirm neighbor when do pmtu update
  net/dst: add new function skb_dst_update_pmtu_no_confirm
  tunnel: do not confirm neighbor when do pmtu update
  vti: do not confirm neighbor when do pmtu update
  sit: do not confirm neighbor when do pmtu update
  net/dst: do not confirm neighbor for vxlan and geneve pmtu update

 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                | 13 +++++++++++--
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/ip_tunnel.c             |  2 +-
 net/ipv4/ip_vti.c                |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/ip6_tunnel.c            |  4 ++--
 net/ipv6/ip6_vti.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/sit.c                   |  2 +-
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 19 files changed, 58 insertions(+), 32 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv4 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
                         ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
  - tnl_update_pmtu()
    - skb_dst_update_pmtu()
      - ip6_rt_update_pmtu()
        - __ip6_rt_update_pmtu()
          - dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                |  2 +-
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 14 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index ecfe26215935..9cac0accba7a 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -541,7 +541,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
diff --git a/include/net/dst.h b/include/net/dst.h
index fe62fe2eb781..0739e84152e4 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -516,7 +516,7 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 	struct dst_entry *dst = skb_dst(skb);
 
 	if (dst && dst->ops->update_pmtu)
-		dst->ops->update_pmtu(dst, NULL, skb, mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 5ec645f27ee3..443863c7b8da 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -27,7 +27,8 @@ struct dst_ops {
 	struct dst_entry *	(*negative_advice)(struct dst_entry *);
 	void			(*link_failure)(struct sk_buff *);
 	void			(*update_pmtu)(struct dst_entry *dst, struct sock *sk,
-					       struct sk_buff *skb, u32 mtu);
+					       struct sk_buff *skb, u32 mtu,
+					       bool confirm_neigh);
 	void			(*redirect)(struct dst_entry *dst, struct sock *sk,
 					    struct sk_buff *skb);
 	int			(*local_out)(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 2cdfc5d6c25d..8c69f0c95a8e 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -22,7 +22,8 @@
 #endif
 
 static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			     struct sk_buff *skb, u32 mtu)
+			     struct sk_buff *skb, u32 mtu,
+			     bool confirm_neigh)
 {
 }
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index aea918135ec3..08c3dc45f1a4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -110,7 +110,8 @@ static void dn_dst_ifdown(struct dst_entry *, struct net_device *dev, int how);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb , u32 mtu);
+			       struct sk_buff *skb , u32 mtu,
+			       bool confirm_neigh);
 static void dn_dst_redirect(struct dst_entry *dst, struct sock *sk,
 			    struct sk_buff *skb);
 static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
@@ -251,7 +252,8 @@ static int dn_dst_gc(struct dst_ops *ops)
  * advertise to the other end).
  */
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
 	struct dn_route *rt = (struct dn_route *) dst;
 	struct neighbour *n = rt->n;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e4c6e8b40490..18c0d5bffe12 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1086,7 +1086,7 @@ struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu)
 		if (!dst)
 			goto out;
 	}
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = __sk_dst_check(sk, 0);
 	if (!dst)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f88c93c38f11..87e979f2b74a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -139,7 +139,8 @@ static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static void		ipv4_dst_destroy(struct dst_entry *dst);
@@ -1043,7 +1044,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 }
 
 static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct rtable *rt = (struct rtable *) dst;
 	struct flowi4 fl4;
@@ -2687,7 +2689,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					  struct sk_buff *skb, u32 mtu)
+					  struct sk_buff *skb, u32 mtu,
+					  bool confirm_neigh)
 {
 }
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 35b84b52b702..9ebd54752e03 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -100,12 +100,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm4_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm4_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index fe9cb8d1adca..e315526fa244 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -146,7 +146,7 @@ struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu)
 
 	if (IS_ERR(dst))
 		return NULL;
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = inet6_csk_route_socket(sk, &fl6);
 	return IS_ERR(dst) ? NULL : dst;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 923034c52ce4..071cb237f00b 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b59940416cb5..affb51c11a25 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -95,7 +95,8 @@ static int		ip6_pkt_prohibit(struct sk_buff *skb);
 static int		ip6_pkt_prohibit_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 static void		ip6_link_failure(struct sk_buff *skb);
 static void		ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
@@ -264,7 +265,8 @@ static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					 struct sk_buff *skb, u32 mtu)
+					 struct sk_buff *skb, u32 mtu,
+					 bool confirm_neigh)
 {
 }
 
@@ -2692,7 +2694,8 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 }
 
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
-				 const struct ipv6hdr *iph, u32 mtu)
+				 const struct ipv6hdr *iph, u32 mtu,
+				 bool confirm_neigh)
 {
 	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
@@ -2710,7 +2713,10 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 		daddr = NULL;
 		saddr = NULL;
 	}
-	dst_confirm_neigh(dst, daddr);
+
+	if (confirm_neigh)
+		dst_confirm_neigh(dst, daddr);
+
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
@@ -2764,9 +2770,11 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 }
 
 static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
-	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu);
+	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu,
+			     confirm_neigh);
 }
 
 void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
@@ -2785,7 +2793,7 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (!dst->error)
-		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu));
+		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu), true);
 	dst_release(dst);
 }
 EXPORT_SYMBOL_GPL(ip6_update_pmtu);
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 699e0730ce8e..af7a4b8b1e9c 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -98,12 +98,13 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm6_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm6_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index b1e300f8881b..b00866d777fe 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -208,7 +208,7 @@ static inline void maybe_update_pmtu(int skb_af, struct sk_buff *skb, int mtu)
 	struct rtable *ort = skb_rtable(skb);
 
 	if (!skb->dev && sk && sk_fullsock(sk))
-		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu, true);
 }
 
 static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index f4de064598f8..806af58f4375 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -263,7 +263,7 @@ bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
 
 		pf->af->from_sk(&addr, sk);
 		pf->to_sk_daddr(&t->ipaddr, sk);
-		dst->ops->update_pmtu(dst, sk, NULL, pmtu);
+		dst->ops->update_pmtu(dst, sk, NULL, pmtu, true);
 		pf->to_sk_daddr(&addr, sk);
 
 		dst = sctp_transport_dst_check(t);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 3/8] gtp: " Hangbin Liu
                         ` (7 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
This will cause the neigh cache be refreshed and set to REACHABLE before
xmit.

But if the remote mac address changed, e.g. device is deleted and recreated,
we will not able to notice this and still use the old mac address as the neigh
cache is REACHABLE.

Fix this by disable neigh confirm when do pmtu update

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 071cb237f00b..189de56f5e36 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, false);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 3/8] gtp: do not confirm neighbor when do pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-21 18:35         ` Guillaume Nault
  2019-12-20  3:25       ` [PATCHv4 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
                         ` (6 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although gtp only support ipv4 right now, and __ip_rt_update_pmtu() does not
call dst_confirm_neigh(), we still set it to false to keep consistency with
IPv6 code.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 9cac0accba7a..71b34ff8e7eb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -541,7 +541,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, false);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (2 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 3/8] gtp: " Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
                         ` (5 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
update pmtu but should not do neighbor confirm.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 0739e84152e4..208e7c0c89d8 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -519,6 +519,15 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
+/* update dst pmtu but not do neighbor confirm */
+static inline void skb_dst_update_pmtu_no_confirm(struct sk_buff *skb, u32 mtu)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (dst && dst->ops->update_pmtu)
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, false);
+}
+
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 					 struct dst_entry *encap_dst,
 					 int headroom)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (3 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-21 18:43         ` Guillaume Nault
  2019-12-20  3:25       ` [PATCHv4 net 6/8] vti: " Hangbin Liu
                         ` (4 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v4: Update commit description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_tunnel.c  | 2 +-
 net/ipv6/ip6_tunnel.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 38c02bb62e2c..0fe2a5d3e258 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -505,7 +505,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
 	if (skb_valid_dst(skb))
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		if (!skb_is_gso(skb) &&
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 754a484d35df..2f376dbc37d5 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -640,7 +640,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (rel_info > dst_mtu(skb_dst(skb2)))
 			goto out;
 
-		skb_dst_update_pmtu(skb2, rel_info);
+		skb_dst_update_pmtu_no_confirm(skb2, rel_info);
 	}
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
@@ -1132,7 +1132,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	mtu = max(mtu, skb->protocol == htons(ETH_P_IPV6) ?
 		       IPV6_MIN_MTU : IPV4_MIN_MTU);
 
-	skb_dst_update_pmtu(skb, mtu);
+	skb_dst_update_pmtu_no_confirm(skb, mtu);
 	if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
 		*pmtu = mtu;
 		err = -EMSGSIZE;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 6/8] vti: do not confirm neighbor when do pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (4 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-21 18:30         ` Guillaume Nault
  2019-12-20  3:25       ` [PATCHv4 net 7/8] sit: " Hangbin Liu
                         ` (3 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although ip vti is not affected as __ip_rt_update_pmtu() does not call
dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
IPv6 code.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_vti.c  | 2 +-
 net/ipv6/ip6_vti.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index cfb025606793..fb9f6d60c27c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -214,7 +214,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 		if (skb->protocol == htons(ETH_P_IP)) {
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 				  htonl(mtu));
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 024db17386d2..6f08b760c2a7 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -479,7 +479,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			if (mtu < IPV6_MIN_MTU)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 7/8] sit: do not confirm neighbor when do pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (5 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 6/8] vti: " Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-20  3:25       ` [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
                         ` (2 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/sit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index b2ccbc473127..98954830c40b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -944,7 +944,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		}
 
 		if (tunnel->parms.iph.daddr)
-			skb_dst_update_pmtu(skb, mtu);
+			skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->len > mtu && !skb_is_gso(skb)) {
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (6 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 7/8] sit: " Hangbin Liu
@ 2019-12-20  3:25       ` Hangbin Liu
  2019-12-21 18:38         ` Guillaume Nault
  2019-12-20 16:14       ` [PATCHv4 net 0/8] disable neigh update for tunnels during " David Ahern
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
  9 siblings, 1 reply; 51+ messages in thread
From: Hangbin Liu @ 2019-12-20  3:25 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

So disable the neigh confirm for vxlan and geneve pmtu update.

v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 208e7c0c89d8..626cf614ad86 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -535,7 +535,7 @@ static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 	u32 encap_mtu = dst_mtu(encap_dst);
 
 	if (skb->len > encap_mtu - headroom)
-		skb_dst_update_pmtu(skb, encap_mtu - headroom);
+		skb_dst_update_pmtu_no_confirm(skb, encap_mtu - headroom);
 }
 
 #endif /* _NET_DST_H */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv4 net 0/8] disable neigh update for tunnels during pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (7 preceding siblings ...)
  2019-12-20  3:25       ` [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
@ 2019-12-20 16:14       ` David Ahern
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
  9 siblings, 0 replies; 51+ messages in thread
From: David Ahern @ 2019-12-20 16:14 UTC (permalink / raw)
  To: Hangbin Liu, netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, Eric Dumazet,
	Guillaume Nault, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On 12/19/19 8:25 PM, Hangbin Liu wrote:
> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
> 

seems reasonable to me.

Acked-by: David Ahern <dsahern@gmail.com>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv4 net 6/8] vti: do not confirm neighbor when do pmtu update
  2019-12-20  3:25       ` [PATCHv4 net 6/8] vti: " Hangbin Liu
@ 2019-12-21 18:30         ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-21 18:30 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Fri, Dec 20, 2019 at 11:25:23AM +0800, Hangbin Liu wrote:
> Although ip vti is not affected as __ip_rt_update_pmtu() does not call
> dst_confirm_neigh(), we still not do neigh confirm to keep consistency with
> IPv6 code.
> 
As with the GRE case, the problem is not the IP version used on the
underlay. The problem happens when the tunnel transports IPv6 packets
(which vti can do).

However, it's true that vti is immune to this problem, but that's for
another reason: it's an IFF_NOARP interface (which means vti6 is immune
too).

Patch is good (we really have no reason to confirm neighbour here), but
let's not have a misleading commit message.

> v4: No change.
> v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
>     dst_ops.update_pmtu to control whether we should do neighbor confirm.
>     Also split the big patch to small ones for each area.
> v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
> 
> Reviewed-by: Guillaume Nault <gnault@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  net/ipv4/ip_vti.c  | 2 +-
>  net/ipv6/ip6_vti.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index cfb025606793..fb9f6d60c27c 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -214,7 +214,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
>  
>  	mtu = dst_mtu(dst);
>  	if (skb->len > mtu) {
> -		skb_dst_update_pmtu(skb, mtu);
> +		skb_dst_update_pmtu_no_confirm(skb, mtu);
>  		if (skb->protocol == htons(ETH_P_IP)) {
>  			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>  				  htonl(mtu));
> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
> index 024db17386d2..6f08b760c2a7 100644
> --- a/net/ipv6/ip6_vti.c
> +++ b/net/ipv6/ip6_vti.c
> @@ -479,7 +479,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
>  
>  	mtu = dst_mtu(dst);
>  	if (skb->len > mtu) {
> -		skb_dst_update_pmtu(skb, mtu);
> +		skb_dst_update_pmtu_no_confirm(skb, mtu);
>  
>  		if (skb->protocol == htons(ETH_P_IPV6)) {
>  			if (mtu < IPV6_MIN_MTU)
> -- 
> 2.19.2
> 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv4 net 3/8] gtp: do not confirm neighbor when do pmtu update
  2019-12-20  3:25       ` [PATCHv4 net 3/8] gtp: " Hangbin Liu
@ 2019-12-21 18:35         ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-21 18:35 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Fri, Dec 20, 2019 at 11:25:20AM +0800, Hangbin Liu wrote:
> Although gtp only support ipv4 right now, and __ip_rt_update_pmtu() does not
> call dst_confirm_neigh(), we still set it to false to keep consistency with
> IPv6 code.
> 
I guess this explanation is ok for GTP: it seems that the current
implementation can only transport IPv4 packets.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve pmtu update
  2019-12-20  3:25       ` [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
@ 2019-12-21 18:38         ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-21 18:38 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Fri, Dec 20, 2019 at 11:25:25AM +0800, Hangbin Liu wrote:
> When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
> we should not call dst_confirm_neigh() as there is no two-way communication.
> 
> So disable the neigh confirm for vxlan and geneve pmtu update.
> 
> v4: No change.
> v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
>     dst_ops.update_pmtu to control whether we should do neighbor confirm.
>     Also split the big patch to small ones for each area.
> v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
> 
> Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
> Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
> Reviewed-by: Guillaume Nault <gnault@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  include/net/dst.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 208e7c0c89d8..626cf614ad86 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -535,7 +535,7 @@ static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
>  	u32 encap_mtu = dst_mtu(encap_dst);
>  
>  	if (skb->len > encap_mtu - headroom)
> -		skb_dst_update_pmtu(skb, encap_mtu - headroom);
> +		skb_dst_update_pmtu_no_confirm(skb, encap_mtu - headroom);
>  }
>  
Tested-by: Guillaume Nault <gnault@redhat.com>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-20  3:25       ` [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-21 18:43         ` Guillaume Nault
  0 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-21 18:43 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Fri, Dec 20, 2019 at 11:25:22AM +0800, Hangbin Liu wrote:
> When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
> we should not call dst_confirm_neigh() as there is no two-way communication.
> 
> v4: Update commit description
> v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
>     dst_ops.update_pmtu to control whether we should do neighbor confirm.
>     Also split the big patch to small ones for each area.
> v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
> 
> Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
> Reviewed-by: Guillaume Nault <gnault@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  net/ipv4/ip_tunnel.c  | 2 +-
>  net/ipv6/ip6_tunnel.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 38c02bb62e2c..0fe2a5d3e258 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -505,7 +505,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
>  		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
>  
>  	if (skb_valid_dst(skb))
> -		skb_dst_update_pmtu(skb, mtu);
> +		skb_dst_update_pmtu_no_confirm(skb, mtu);
>  
>  	if (skb->protocol == htons(ETH_P_IP)) {
>  		if (!skb_is_gso(skb) &&
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 754a484d35df..2f376dbc37d5 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -640,7 +640,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>  		if (rel_info > dst_mtu(skb_dst(skb2)))
>  			goto out;
>  
> -		skb_dst_update_pmtu(skb2, rel_info);
> +		skb_dst_update_pmtu_no_confirm(skb2, rel_info);
>  	}
>  
>  	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
> @@ -1132,7 +1132,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
>  	mtu = max(mtu, skb->protocol == htons(ETH_P_IPV6) ?
>  		       IPV6_MIN_MTU : IPV4_MIN_MTU);
>  
> -	skb_dst_update_pmtu(skb, mtu);
> +	skb_dst_update_pmtu_no_confirm(skb, mtu);
>  	if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
>  		*pmtu = mtu;
>  		err = -EMSGSIZE;

Tested-by: Guillaume Nault <gnault@redhat.com>

Note that ip6gretap devices seem to accept frames regardless of their
destination MAC address. That's wrong, but makes this bug invisible
in practice.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv5 net 0/8] disable neigh update for tunnels during pmtu update
  2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
                         ` (8 preceding siblings ...)
  2019-12-20 16:14       ` [PATCHv4 net 0/8] disable neigh update for tunnels during " David Ahern
@ 2019-12-22  2:51       ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
                           ` (9 more replies)
  9 siblings, 10 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we setup a pair of gretap, ping each other and create neighbour cache.
Then delete and recreate one side. We will never be able to ping6 to the new
created gretap.

The reason is when we ping6 remote via gretap, we will call like

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

As the confirmed time updated, in neigh_timer_handler() the check for
NUD_DELAY confirm time will pass and the neigh state will back to
NUD_REACHABLE. So the old/wrong mac address will be used again.

If we do not update the confirmed time, the neigh state will go to
neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
neigh later, which is what IPv4 does.

We couldn't remove the ip6_confirm_neigh() directly as we still need it
for TCP flows. To fix it, we have to pass a bool parameter to
dst_ops.update_pmtu() and only disable neighbor update for tunnels.

v5: No code change, upate some commits description
v4: No code change, upate some commits description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

---
Reproducer:

#!/bin/bash
set -x
ip -a netns del
modprobe -r veth
modprobe -r bridge

ip netns add ha
ip netns add hb
ip link add br0 type bridge
ip link set br0 up
ip link add br_ha type veth peer name veth0 netns ha
ip link add br_hb type veth peer name veth0 netns hb
ip link set br_ha up
ip link set br_hb up
ip link set br_ha master br0
ip link set br_hb master br0
ip netns exec ha ip link set veth0 up
ip netns exec hb ip link set veth0 up
ip netns exec ha ip addr add 192.168.0.1/24 dev veth0
ip netns exec hb ip addr add 192.168.0.2/24 dev veth0

ip netns exec ha ip link add gretap1 type gretap local 192.168.0.1 remote 192.168.0.2
ip netns exec ha ip link set gretap1 up
ip netns exec ha ip addr add 1.1.1.1/24 dev gretap1
ip netns exec ha ip addr add 1111::1/64 dev gretap1

ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1

ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
sleep 30

# recreate gretap
ip netns exec hb ip link del gretap1
ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1
ip netns exec hb ip link show dev gretap1

ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
sleep 10
ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
---

Hangbin Liu (8):
  net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  ip6_gre: do not confirm neighbor when do pmtu update
  gtp: do not confirm neighbor when do pmtu update
  net/dst: add new function skb_dst_update_pmtu_no_confirm
  tunnel: do not confirm neighbor when do pmtu update
  vti: do not confirm neighbor when do pmtu update
  sit: do not confirm neighbor when do pmtu update
  net/dst: do not confirm neighbor for vxlan and geneve pmtu update

 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                | 13 +++++++++++--
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/ip_tunnel.c             |  2 +-
 net/ipv4/ip_vti.c                |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/ip6_tunnel.c            |  4 ++--
 net/ipv6/ip6_vti.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/sit.c                   |  2 +-
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 19 files changed, 58 insertions(+), 32 deletions(-)

-- 
2.19.2


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCHv5 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.

But for tunnel code, it will call pmtu before xmit, like:
  - tnl_update_pmtu()
    - skb_dst_update_pmtu()
      - ip6_rt_update_pmtu()
        - __ip6_rt_update_pmtu()
          - dst_confirm_neigh()

If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.

So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.

On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.

To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                |  2 +-
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 14 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index e5b7d6d2286e..913062017be9 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -540,7 +540,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
diff --git a/include/net/dst.h b/include/net/dst.h
index 8224dad2ae94..593630e0e076 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -516,7 +516,7 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 	struct dst_entry *dst = skb_dst(skb);
 
 	if (dst && dst->ops->update_pmtu)
-		dst->ops->update_pmtu(dst, NULL, skb, mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 5ec645f27ee3..443863c7b8da 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -27,7 +27,8 @@ struct dst_ops {
 	struct dst_entry *	(*negative_advice)(struct dst_entry *);
 	void			(*link_failure)(struct sk_buff *);
 	void			(*update_pmtu)(struct dst_entry *dst, struct sock *sk,
-					       struct sk_buff *skb, u32 mtu);
+					       struct sk_buff *skb, u32 mtu,
+					       bool confirm_neigh);
 	void			(*redirect)(struct dst_entry *dst, struct sock *sk,
 					    struct sk_buff *skb);
 	int			(*local_out)(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_nf_core.c b/net/bridge/br_nf_core.c
index 2cdfc5d6c25d..8c69f0c95a8e 100644
--- a/net/bridge/br_nf_core.c
+++ b/net/bridge/br_nf_core.c
@@ -22,7 +22,8 @@
 #endif
 
 static void fake_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			     struct sk_buff *skb, u32 mtu)
+			     struct sk_buff *skb, u32 mtu,
+			     bool confirm_neigh)
 {
 }
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index aea918135ec3..08c3dc45f1a4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -110,7 +110,8 @@ static void dn_dst_ifdown(struct dst_entry *, struct net_device *dev, int how);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb , u32 mtu);
+			       struct sk_buff *skb , u32 mtu,
+			       bool confirm_neigh);
 static void dn_dst_redirect(struct dst_entry *dst, struct sock *sk,
 			    struct sk_buff *skb);
 static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
@@ -251,7 +252,8 @@ static int dn_dst_gc(struct dst_ops *ops)
  * advertise to the other end).
  */
 static void dn_dst_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
 	struct dn_route *rt = (struct dn_route *) dst;
 	struct neighbour *n = rt->n;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index e4c6e8b40490..18c0d5bffe12 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1086,7 +1086,7 @@ struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu)
 		if (!dst)
 			goto out;
 	}
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = __sk_dst_check(sk, 0);
 	if (!dst)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f88c93c38f11..87e979f2b74a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -139,7 +139,8 @@ static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static void		ipv4_dst_destroy(struct dst_entry *dst);
@@ -1043,7 +1044,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 }
 
 static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct rtable *rt = (struct rtable *) dst;
 	struct flowi4 fl4;
@@ -2687,7 +2689,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					  struct sk_buff *skb, u32 mtu)
+					  struct sk_buff *skb, u32 mtu,
+					  bool confirm_neigh)
 {
 }
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 35b84b52b702..9ebd54752e03 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -100,12 +100,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm4_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm4_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index fe9cb8d1adca..e315526fa244 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -146,7 +146,7 @@ struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu)
 
 	if (IS_ERR(dst))
 		return NULL;
-	dst->ops->update_pmtu(dst, sk, NULL, mtu);
+	dst->ops->update_pmtu(dst, sk, NULL, mtu, true);
 
 	dst = inet6_csk_route_socket(sk, &fl6);
 	return IS_ERR(dst) ? NULL : dst;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 923034c52ce4..071cb237f00b 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b59940416cb5..affb51c11a25 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -95,7 +95,8 @@ static int		ip6_pkt_prohibit(struct sk_buff *skb);
 static int		ip6_pkt_prohibit_out(struct net *net, struct sock *sk, struct sk_buff *skb);
 static void		ip6_link_failure(struct sk_buff *skb);
 static void		ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					   struct sk_buff *skb, u32 mtu);
+					   struct sk_buff *skb, u32 mtu,
+					   bool confirm_neigh);
 static void		rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
@@ -264,7 +265,8 @@ static unsigned int ip6_blackhole_mtu(const struct dst_entry *dst)
 }
 
 static void ip6_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
-					 struct sk_buff *skb, u32 mtu)
+					 struct sk_buff *skb, u32 mtu,
+					 bool confirm_neigh)
 {
 }
 
@@ -2692,7 +2694,8 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 }
 
 static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
-				 const struct ipv6hdr *iph, u32 mtu)
+				 const struct ipv6hdr *iph, u32 mtu,
+				 bool confirm_neigh)
 {
 	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
@@ -2710,7 +2713,10 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 		daddr = NULL;
 		saddr = NULL;
 	}
-	dst_confirm_neigh(dst, daddr);
+
+	if (confirm_neigh)
+		dst_confirm_neigh(dst, daddr);
+
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
@@ -2764,9 +2770,11 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 }
 
 static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			       struct sk_buff *skb, u32 mtu)
+			       struct sk_buff *skb, u32 mtu,
+			       bool confirm_neigh)
 {
-	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu);
+	__ip6_rt_update_pmtu(dst, sk, skb ? ipv6_hdr(skb) : NULL, mtu,
+			     confirm_neigh);
 }
 
 void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
@@ -2785,7 +2793,7 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu,
 
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (!dst->error)
-		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu));
+		__ip6_rt_update_pmtu(dst, NULL, iph, ntohl(mtu), true);
 	dst_release(dst);
 }
 EXPORT_SYMBOL_GPL(ip6_update_pmtu);
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 699e0730ce8e..af7a4b8b1e9c 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -98,12 +98,13 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 }
 
 static void xfrm6_update_pmtu(struct dst_entry *dst, struct sock *sk,
-			      struct sk_buff *skb, u32 mtu)
+			      struct sk_buff *skb, u32 mtu,
+			      bool confirm_neigh)
 {
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 	struct dst_entry *path = xdst->route;
 
-	path->ops->update_pmtu(path, sk, skb, mtu);
+	path->ops->update_pmtu(path, sk, skb, mtu, confirm_neigh);
 }
 
 static void xfrm6_redirect(struct dst_entry *dst, struct sock *sk,
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index b1e300f8881b..b00866d777fe 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -208,7 +208,7 @@ static inline void maybe_update_pmtu(int skb_af, struct sk_buff *skb, int mtu)
 	struct rtable *ort = skb_rtable(skb);
 
 	if (!skb->dev && sk && sk_fullsock(sk))
-		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu);
+		ort->dst.ops->update_pmtu(&ort->dst, sk, NULL, mtu, true);
 }
 
 static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 7235a6032671..3bbe1a58ec87 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -263,7 +263,7 @@ bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu)
 
 		pf->af->from_sk(&addr, sk);
 		pf->to_sk_daddr(&t->ipaddr, sk);
-		dst->ops->update_pmtu(dst, sk, NULL, pmtu);
+		dst->ops->update_pmtu(dst, sk, NULL, pmtu, true);
 		pf->to_sk_daddr(&addr, sk);
 
 		dst = sctp_transport_dst_check(t);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 3/8] gtp: " Hangbin Liu
                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When we do ipv6 gre pmtu update, we will also do neigh confirm currently.
This will cause the neigh cache be refreshed and set to REACHABLE before
xmit.

But if the remote mac address changed, e.g. device is deleted and recreated,
we will not able to notice this and still use the old mac address as the neigh
cache is REACHABLE.

Fix this by disable neigh confirm when do pmtu update

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 071cb237f00b..189de56f5e36 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1040,7 +1040,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
 
 	/* TooBig packet may have updated dst->dev's mtu */
 	if (!t->parms.collect_md && dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, true);
+		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu, false);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
 			   NEXTHDR_GRE);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 3/8] gtp: do not confirm neighbor when do pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although GTP only support ipv4 right now, and __ip_rt_update_pmtu() does not
call dst_confirm_neigh(), we still set it to false to keep consistency with
IPv6 code.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/gtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 913062017be9..fca471e27f39 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -540,7 +540,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 		mtu = dst_mtu(&rt->dst);
 	}
 
-	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, true);
+	rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu, false);
 
 	if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
 	    mtu < ntohs(iph->tot_len)) {
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (2 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 3/8] gtp: " Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
                           ` (5 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

Add a new function skb_dst_update_pmtu_no_confirm() for callers who need
update pmtu but should not do neighbor confirm.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 593630e0e076..dc7cc1f1051c 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -519,6 +519,15 @@ static inline void skb_dst_update_pmtu(struct sk_buff *skb, u32 mtu)
 		dst->ops->update_pmtu(dst, NULL, skb, mtu, true);
 }
 
+/* update dst pmtu but not do neighbor confirm */
+static inline void skb_dst_update_pmtu_no_confirm(struct sk_buff *skb, u32 mtu)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	if (dst && dst->ops->update_pmtu)
+		dst->ops->update_pmtu(dst, NULL, skb, mtu, false);
+}
+
 static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 					 struct dst_entry *encap_dst,
 					 int headroom)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 5/8] tunnel: do not confirm neighbor when do pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (3 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 6/8] vti: " Hangbin Liu
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v5: No Change.
v4: Update commit description
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: 0dec879f636f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_tunnel.c  | 2 +-
 net/ipv6/ip6_tunnel.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 38c02bb62e2c..0fe2a5d3e258 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -505,7 +505,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
 	if (skb_valid_dst(skb))
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		if (!skb_is_gso(skb) &&
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 754a484d35df..2f376dbc37d5 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -640,7 +640,7 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (rel_info > dst_mtu(skb_dst(skb2)))
 			goto out;
 
-		skb_dst_update_pmtu(skb2, rel_info);
+		skb_dst_update_pmtu_no_confirm(skb2, rel_info);
 	}
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
@@ -1132,7 +1132,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	mtu = max(mtu, skb->protocol == htons(ETH_P_IPV6) ?
 		       IPV6_MIN_MTU : IPV4_MIN_MTU);
 
-	skb_dst_update_pmtu(skb, mtu);
+	skb_dst_update_pmtu_no_confirm(skb, mtu);
 	if (skb->len - t->tun_hlen - eth_hlen > mtu && !skb_is_gso(skb)) {
 		*pmtu = mtu;
 		err = -EMSGSIZE;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 6/8] vti: do not confirm neighbor when do pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (4 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 7/8] sit: " Hangbin Liu
                           ` (3 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although vti and vti6 are immune to this problem because they are IFF_NOARP
interfaces, as Guillaume pointed. There is still no sense to confirm neighbour
here.

v5: Update commit description.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv4/ip_vti.c  | 2 +-
 net/ipv6/ip6_vti.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index cfb025606793..fb9f6d60c27c 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -214,7 +214,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 		if (skb->protocol == htons(ETH_P_IP)) {
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 				  htonl(mtu));
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 024db17386d2..6f08b760c2a7 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -479,7 +479,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		skb_dst_update_pmtu(skb, mtu);
+		skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->protocol == htons(ETH_P_IPV6)) {
 			if (mtu < IPV6_MIN_MTU)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 7/8] sit: do not confirm neighbor when do pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (5 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 6/8] vti: " Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22  2:51         ` [PATCHv5 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
                           ` (2 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/sit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index b2ccbc473127..98954830c40b 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -944,7 +944,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		}
 
 		if (tunnel->parms.iph.daddr)
-			skb_dst_update_pmtu(skb, mtu);
+			skb_dst_update_pmtu_no_confirm(skb, mtu);
 
 		if (skb->len > mtu && !skb_is_gso(skb)) {
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCHv5 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (6 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 7/8] sit: " Hangbin Liu
@ 2019-12-22  2:51         ` Hangbin Liu
  2019-12-22 22:10         ` [PATCHv5 net 0/8] disable neigh update for tunnels during " Guillaume Nault
  2019-12-25  6:30         ` David Miller
  9 siblings, 0 replies; 51+ messages in thread
From: Hangbin Liu @ 2019-12-22  2:51 UTC (permalink / raw)
  To: netdev
  Cc: Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, Guillaume Nault, David Miller, Pablo Neira,
	Stephen Hemminger, Alexey Kodanev, Hangbin Liu

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

So disable the neigh confirm for vxlan and geneve pmtu update.

v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Tested-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 include/net/dst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index dc7cc1f1051c..3448cf865ede 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -535,7 +535,7 @@ static inline void skb_tunnel_check_pmtu(struct sk_buff *skb,
 	u32 encap_mtu = dst_mtu(encap_dst);
 
 	if (skb->len > encap_mtu - headroom)
-		skb_dst_update_pmtu(skb, encap_mtu - headroom);
+		skb_dst_update_pmtu_no_confirm(skb, encap_mtu - headroom);
 }
 
 #endif /* _NET_DST_H */
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 net 0/8] disable neigh update for tunnels during pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (7 preceding siblings ...)
  2019-12-22  2:51         ` [PATCHv5 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
@ 2019-12-22 22:10         ` Guillaume Nault
  2019-12-25  6:30         ` David Miller
  9 siblings, 0 replies; 51+ messages in thread
From: Guillaume Nault @ 2019-12-22 22:10 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Julian Anastasov, Marcelo Ricardo Leitner, David Ahern,
	Eric Dumazet, David Miller, Pablo Neira, Stephen Hemminger,
	Alexey Kodanev

On Sun, Dec 22, 2019 at 10:51:08AM +0800, Hangbin Liu wrote:
> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
> 
No more objection from me (and you already have my Reviewed-by tag).
Thanks for your work Hangbin.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 net 0/8] disable neigh update for tunnels during pmtu update
  2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
                           ` (8 preceding siblings ...)
  2019-12-22 22:10         ` [PATCHv5 net 0/8] disable neigh update for tunnels during " Guillaume Nault
@ 2019-12-25  6:30         ` David Miller
  9 siblings, 0 replies; 51+ messages in thread
From: David Miller @ 2019-12-25  6:30 UTC (permalink / raw)
  To: liuhangbin
  Cc: netdev, ja, marcelo.leitner, dsahern, edumazet, gnault, pablo,
	stephen, alexey.kodanev

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Sun, 22 Dec 2019 10:51:08 +0800

> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
 ...

Series applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2019-12-25  6:30 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22  6:19 [PATCH net] ipv6/route: only update neigh confirm time if pmtu changed Hangbin Liu
2019-11-22 18:04 ` David Miller
2019-11-26  9:17   ` Hangbin Liu
2019-12-03  2:11 ` [PATCHv2 net] ipv6/route: should not update neigh confirm time during PMTU update Hangbin Liu
2019-12-03  2:47   ` David Miller
2019-12-03 10:15     ` Hangbin Liu
2019-12-03 10:25       ` Hangbin Liu
2019-12-03 19:58         ` David Miller
2019-12-10  3:36           ` Hangbin Liu
2019-12-10 17:00             ` Guillaume Nault
2019-12-18 11:53   ` [PATCH net-next 0/8] disable neigh update for tunnels during pmtu update Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 3/8] gtp: " Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-19 17:47       ` Guillaume Nault
2019-12-20  2:36         ` Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 6/8] vti: " Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 7/8] sit: " Hangbin Liu
2019-12-18 11:53     ` [PATCH net-next 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
2019-12-19 17:49       ` Guillaume Nault
2019-12-18 12:01     ` [PATCH net-next 0/8] disable neigh update for tunnels during " Hangbin Liu
2019-12-19 17:57       ` Guillaume Nault
2019-12-20  2:48         ` Hangbin Liu
2019-12-19 17:53     ` Guillaume Nault
2019-12-20  3:25     ` [PATCHv4 net " Hangbin Liu
2019-12-20  3:25       ` [PATCHv4 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
2019-12-20  3:25       ` [PATCHv4 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-20  3:25       ` [PATCHv4 net 3/8] gtp: " Hangbin Liu
2019-12-21 18:35         ` Guillaume Nault
2019-12-20  3:25       ` [PATCHv4 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
2019-12-20  3:25       ` [PATCHv4 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-21 18:43         ` Guillaume Nault
2019-12-20  3:25       ` [PATCHv4 net 6/8] vti: " Hangbin Liu
2019-12-21 18:30         ` Guillaume Nault
2019-12-20  3:25       ` [PATCHv4 net 7/8] sit: " Hangbin Liu
2019-12-20  3:25       ` [PATCHv4 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
2019-12-21 18:38         ` Guillaume Nault
2019-12-20 16:14       ` [PATCHv4 net 0/8] disable neigh update for tunnels during " David Ahern
2019-12-22  2:51       ` [PATCHv5 " Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 1/8] net: add bool confirm_neigh parameter for dst_ops.update_pmtu Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 2/8] ip6_gre: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 3/8] gtp: " Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 4/8] net/dst: add new function skb_dst_update_pmtu_no_confirm Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 5/8] tunnel: do not confirm neighbor when do pmtu update Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 6/8] vti: " Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 7/8] sit: " Hangbin Liu
2019-12-22  2:51         ` [PATCHv5 net 8/8] net/dst: do not confirm neighbor for vxlan and geneve " Hangbin Liu
2019-12-22 22:10         ` [PATCHv5 net 0/8] disable neigh update for tunnels during " Guillaume Nault
2019-12-25  6:30         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).