netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
@ 2014-01-22 15:35 Sabrina Dubroca
  2014-01-22 16:20 ` Eric Dumazet
  2014-01-22 21:34 ` Hannes Frederic Sowa
  0 siblings, 2 replies; 7+ messages in thread
From: Sabrina Dubroca @ 2014-01-22 15:35 UTC (permalink / raw)
  To: netdev; +Cc: hannes, gaofeng, Sabrina Dubroca

This patch solves bug 67951 on bugzilla
https://bugzilla.kernel.org/show_bug.cgi?id=67951

> Bug: No reply after loopback down-up
> 
> kernel:3.13.0-rc4
> 
> I did a test like this:
> On host1:
> 1)ifconfig eth0 add fd00::3
> On host2:
> 1)ifconfig eth0 add fd00::4
> 2)ping6 fd00::3
> 
> Everything went OK.
> 
> Then:
> On host1:
> 1)ifconfig lo down;ifconfig lo up
> On hsot2:
> 1)ping6 fd00::3
> 
> There was no reply from host1!


Reply from Hannes:

> This was first fixed with 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f
> ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
> and broken again in a881ae1f625c599b460cc8f8a7fcb1c438f699ad ("ipv6:
> don't call addrconf_dst_alloc again when enable lo")

This patch first reverts commit a881ae1f625, since from what I can
see, it actually stops all local routes from being added back, instead
of just the sit ones. Which is actually good, since even with a normal
interface (tried with e1000 in qemu), when I reverted a881ae1f625, I
could get the "unregister_netdevice" problem when doing rmmod e1000:

unregister_netdevice: waiting for ens3 to become free. Usage count = 1

(repeated until I removed the local routes manually)

The patch then adds a function that detects if the local route for an
IP address has been re-created before init_loopback is called. That
way, we avoid the usage count problem, but if we do a down-up on
loopback, the necessary routes are created.


Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---

I'm sending this as a RFC, because I'm not sure the way I detect if
the route has been added before is good.

It works in my tests (below). Is the call to rt6_lookup ok? Or is
there a better way to do that? The flags checked are the ones set by
addrconf_dst_alloc, so that if the route returned by rt6_lookup isn't
the local route for this IP address, it should be ignored and a new
local route created.

I've put addrconf_dst_alloc_once right above init_loopback in
addrconf.c, but it could go somewhere else, or be included directly in
init_loopback.



All suggestions and comments are welcome.

Thanks,
Sabrina

---

Here are the tests. They all fail with the unregister_netdevice
problem if I simply revert the a881ae1f625, but everything looks fine
with the patch below: no unregister_netdevice problem, and all the
local routes are back, ping works.

---

#############
# sit tests #
#############

# setup sit
ip tunnel add sit1 mode sit remote 10.0.1.10 local 10.0.1.24 ttl 10
ip link set sit1 up
ip a a fd00::1/64 dev sit1
ip r a ::/0 dev sit1

# test 1
ifconfig lo down ; ifconfig lo up
ip tunnel del sit1

# test 2
# before: setup sit tunnel again
ifconfig lo down ; ifconfig ens3 down ; ifconfig sit1 down ; ifconfig sit1 up
ip a a fd00::1/64 dev sit1
./ens3.sh
ifconfig lo up
ip tunnel del sit1


###############
# e1000 tests #
###############

#########
# ens3.sh
ifconfig ens3 up
ip a a fec0::3456/64 dev ens3
#########

./ens3.sh

# test 1
ifconfig lo down ; ifconfig lo up
rmmod e1000

# test 2
# before: setup things again
ifconfig lo down ; ifconfig ens3 down
./ens3.sh
ifconfig lo up
rmmod e1000


---
 net/ipv6/addrconf.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4b6b720..b2eb653 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
 }
 #endif
 
+struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
+					 const struct in6_addr *addr,
+					 bool anycast)
+{
+	struct net *net = dev_net(idev->dev);
+	struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);
+
+	if (rt == NULL)
+		return addrconf_dst_alloc(idev, addr, anycast);
+	else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
+		   ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
+		    (!anycast && rt->rt6i_flags & RTF_LOCAL))))
+		return addrconf_dst_alloc(idev, addr, anycast);
+
+	return ERR_PTR(-EEXIST);
+}
+
 static void init_loopback(struct net_device *dev)
 {
 	struct inet6_dev  *idev;
@@ -2611,10 +2628,7 @@ static void init_loopback(struct net_device *dev)
 			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
 				continue;
 
-			if (sp_ifa->rt)
-				continue;
-
-			sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
+			sp_rt = addrconf_dst_alloc_once(idev, &sp_ifa->addr, false);
 
 			/* Failure cases are ignored */
 			if (!IS_ERR(sp_rt)) {
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-22 15:35 [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up Sabrina Dubroca
@ 2014-01-22 16:20 ` Eric Dumazet
  2014-01-22 21:34 ` Hannes Frederic Sowa
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2014-01-22 16:20 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev, hannes, gaofeng

On Wed, 2014-01-22 at 16:35 +0100, Sabrina Dubroca wrote:

> 
> ---
>  net/ipv6/addrconf.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..b2eb653 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
>  }
>  #endif
>  
> +struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
> +					 const struct in6_addr *addr,
> +					 bool anycast)
> +{
> +	struct net *net = dev_net(idev->dev);
> +	struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);

This allocates a dst

> +
> +	if (rt == NULL)
> +		return addrconf_dst_alloc(idev, addr, anycast);
> +	else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
> +		   ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
> +		    (!anycast && rt->rt6i_flags & RTF_LOCAL))))
> +		return addrconf_dst_alloc(idev, addr, anycast);

But nothing seems to release it

It seems your patch lacks ip6_rt_put(rt); 

> +
> +	return ERR_PTR(-EEXIST);
> +}
> +

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-22 15:35 [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up Sabrina Dubroca
  2014-01-22 16:20 ` Eric Dumazet
@ 2014-01-22 21:34 ` Hannes Frederic Sowa
  2014-01-23  0:56   ` Gao feng
  1 sibling, 1 reply; 7+ messages in thread
From: Hannes Frederic Sowa @ 2014-01-22 21:34 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: netdev, gaofeng

On Wed, Jan 22, 2014 at 04:35:08PM +0100, Sabrina Dubroca wrote:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..b2eb653 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
>  }
>  #endif
>  
> +struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
> +					 const struct in6_addr *addr,
> +					 bool anycast)
> +{
> +	struct net *net = dev_net(idev->dev);
> +	struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);

We should use addrconf_get_prefix_route here. Eric's comment applies
here, too, we do a dst_hold in addrconf_get_prefix_route and thus must
decrease the reference counter with ip6_rt_put then, too.

> +	if (rt == NULL)
> +		return addrconf_dst_alloc(idev, addr, anycast);
> +	else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
> +		   ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
> +		    (!anycast && rt->rt6i_flags & RTF_LOCAL))))
> +		return addrconf_dst_alloc(idev, addr, anycast);

The necessary flags can be given to addrconf_get_prefix_route, then.

> +	return ERR_PTR(-EEXIST);
> +}
> +
>  static void init_loopback(struct net_device *dev)
>  {
>  	struct inet6_dev  *idev;
> @@ -2611,10 +2628,7 @@ static void init_loopback(struct net_device *dev)
>  			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>  				continue;
>  
> -			if (sp_ifa->rt)
> -				continue;

We should try to clean sp_ifa->rt on ifdown so we don't have dangling
pointer to it if it is already in dst gc queue and obsolete. So even if
we leave this code in there we would never hit the continue (it seems we
have to decrease the reference counter in addrconf_ifdown before setting
sp_ifa->rt to NULL).

> -
> -			sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
> +			sp_rt = addrconf_dst_alloc_once(idev, &sp_ifa->addr, false);
>  
>  			/* Failure cases are ignored */
>  			if (!IS_ERR(sp_rt)) {

I am fine if we get this fix in for 3.14 because it solves a real problem.
I'll already work on the full route preserving logic on ifdown/up and
would build this on this patch then.

I currently wonder if we need the relookup logic at all as we currently
remove the prefix routes in all cases (in rt6_ifdown, which iterates
over all routing tables). So we always have a obsolete dst which we
just need to clean up in addrconf_ifdown and just allocate the prefix
route in init_loopback in all cases. We just have to steal some code
from inet6_rtm_newaddr to calculate the appropriate flags from the
inet6_ifaddr's ifa_flags (e.g. RTF_EXPIRES).

Thanks a lot,

  Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-22 21:34 ` Hannes Frederic Sowa
@ 2014-01-23  0:56   ` Gao feng
  2014-01-23  1:01     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 7+ messages in thread
From: Gao feng @ 2014-01-23  0:56 UTC (permalink / raw)
  To: Sabrina Dubroca, netdev, Hannes Frederic Sowa

On 01/23/2014 05:34 AM, Hannes Frederic Sowa wrote:
> On Wed, Jan 22, 2014 at 04:35:08PM +0100, Sabrina Dubroca wrote:
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 4b6b720..b2eb653 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -2578,6 +2578,23 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
>>  }
>>  #endif
>>  
>> +struct rt6_info *addrconf_dst_alloc_once(struct inet6_dev *idev,
>> +					 const struct in6_addr *addr,
>> +					 bool anycast)
>> +{
>> +	struct net *net = dev_net(idev->dev);
>> +	struct rt6_info *rt = rt6_lookup(net, addr, NULL, 0, 0);
> 
> We should use addrconf_get_prefix_route here. Eric's comment applies
> here, too, we do a dst_hold in addrconf_get_prefix_route and thus must
> decrease the reference counter with ip6_rt_put then, too.
> 
>> +	if (rt == NULL)
>> +		return addrconf_dst_alloc(idev, addr, anycast);
>> +	else if (!(rt->rt6i_flags & RTF_NONEXTHOP &&
>> +		   ((anycast && rt->rt6i_flags & RTF_ANYCAST) ||
>> +		    (!anycast && rt->rt6i_flags & RTF_LOCAL))))
>> +		return addrconf_dst_alloc(idev, addr, anycast);
> 
> The necessary flags can be given to addrconf_get_prefix_route, then.
> 
>> +	return ERR_PTR(-EEXIST);
>> +}
>> +
>>  static void init_loopback(struct net_device *dev)
>>  {
>>  	struct inet6_dev  *idev;
>> @@ -2611,10 +2628,7 @@ static void init_loopback(struct net_device *dev)
>>  			if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>>  				continue;
>>  
>> -			if (sp_ifa->rt)
>> -				continue;
> 
> We should try to clean sp_ifa->rt on ifdown so we don't have dangling
> pointer to it if it is already in dst gc queue and obsolete. So even if
> we leave this code in there we would never hit the continue (it seems we
> have to decrease the reference counter in addrconf_ifdown before setting
> sp_ifa->rt to NULL).
> 
>> -
>> -			sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
>> +			sp_rt = addrconf_dst_alloc_once(idev, &sp_ifa->addr, false);
>>  
>>  			/* Failure cases are ignored */
>>  			if (!IS_ERR(sp_rt)) {
> 
> I am fine if we get this fix in for 3.14 because it solves a real problem.
> I'll already work on the full route preserving logic on ifdown/up and
> would build this on this patch then.
> 
> I currently wonder if we need the relookup logic at all as we currently
> remove the prefix routes in all cases (in rt6_ifdown, which iterates
> over all routing tables). So we always have a obsolete dst which we
> just need to clean up in addrconf_ifdown and just allocate the prefix
> route in init_loopback in all cases. We just have to steal some code
> from inet6_rtm_newaddr to calculate the appropriate flags from the
> inet6_ifaddr's ifa_flags (e.g. RTF_EXPIRES).
> 

I still prefer the patch I send before. I don't like the rt6i_flags,
it is not clear and easy to misunderstand.

But if you all think this one is good, I'm ok, but I don't have enough
time to review it deeply.

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1a341f7..4dca886 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2610,8 +2610,16 @@ static void init_loopback(struct net_device *dev)
                        if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
                                continue;

-                       if (sp_ifa->rt)
-                               continue;
+                       if (sp_ifa->rt) {
+                               /* This dst has been added to garbage list when
+                                * lo device down, delete this obsolete dst and
+                                * reallocate new router for ifa. */
+                               if (sp_ifa->rt->dst.obsolete > 0) {
+                                       ip6_del_rt(sp_ifa->rt);
+                                       sp_ifa->rt = NULL;
+                               } else
+                                       continue;
+                       }

                        sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-23  0:56   ` Gao feng
@ 2014-01-23  1:01     ` Hannes Frederic Sowa
  2014-01-23  6:23       ` Gao feng
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Frederic Sowa @ 2014-01-23  1:01 UTC (permalink / raw)
  To: Gao feng; +Cc: Sabrina Dubroca, netdev

On Thu, Jan 23, 2014 at 08:56:37AM +0800, Gao feng wrote:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 1a341f7..4dca886 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2610,8 +2610,16 @@ static void init_loopback(struct net_device *dev)
>                         if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>                                 continue;
> 
> -                       if (sp_ifa->rt)
> -                               continue;
> +                       if (sp_ifa->rt) {
> +                               /* This dst has been added to garbage list when
> +                                * lo device down, delete this obsolete dst and
> +                                * reallocate new router for ifa. */
> +                               if (sp_ifa->rt->dst.obsolete > 0) {
> +                                       ip6_del_rt(sp_ifa->rt);
> +                                       sp_ifa->rt = NULL;
> +                               } else
> +                                       continue;
> +                       }
> 
>                         sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);

I agree, this seems a lot simpler. In the end I would like to replace this
conditional loopback up/down thing with something like below. I haven't done
the correct hookups into the relevant places, but I hope you get the idea:

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 017badb..1648a59a 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -114,7 +114,8 @@ struct rt6_rtnl_dump_arg {
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
-void rt6_ifdown(struct net *net, struct net_device *dev);
+void rt6_ifdown(struct net *net, struct net_device *dev, bool unregister);
+void rt6_ifup(struct net_device *dev);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 
diff --git a/include/uapi/linux/ipv6_route.h b/include/uapi/linux/ipv6_route.h
index 2be7bd1..5dd40ed 100644
--- a/include/uapi/linux/ipv6_route.h
+++ b/include/uapi/linux/ipv6_route.h
@@ -15,6 +15,7 @@
 
 #include <linux/types.h>
 
+#define RTF_DEAD	0x00008000	/* route dead bcs interface down */
 #define RTF_DEFAULT	0x00010000	/* default - learned via ND	*/
 #define RTF_ALLONLINK	0x00020000	/* (deprecated and will be removed)
 					   fallback, no routers on link */
diff --git a/include/uapi/linux/route.h b/include/uapi/linux/route.h
index 6600708..9099a5f 100644
--- a/include/uapi/linux/route.h
+++ b/include/uapi/linux/route.h
@@ -60,7 +60,7 @@ struct rtentry {
 #define RTF_REJECT	0x0200		/* Reject route			*/
 
 /*
- *	<linux/ipv6_route.h> uses RTF values >= 64k
+ *	<linux/ipv6_route.h> uses RTF values >= 32k
  */
 
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6913a82..e2df0f9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -143,7 +143,7 @@ static void addrconf_leave_anycast(struct inet6_ifaddr *ifp);
 
 static void addrconf_type_change(struct net_device *dev,
 				 unsigned long event);
-static int addrconf_ifdown(struct net_device *dev, int how);
+static int addrconf_ifdown(struct net_device *dev, bool unregister);
 
 static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 						  int plen,
@@ -2882,7 +2882,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 			 * IPV6_MIN_MTU stop IPv6 on this interface.
 			 */
 			if (dev->mtu < IPV6_MIN_MTU)
-				addrconf_ifdown(dev, 1);
+				addrconf_ifdown(dev, true);
 		}
 		break;
 
@@ -2952,7 +2952,7 @@ static void addrconf_type_change(struct net_device *dev, unsigned long event)
 		ipv6_mc_unmap(idev);
 }
 
-static int addrconf_ifdown(struct net_device *dev, int how)
+static int addrconf_ifdown(struct net_device *dev, bool unregister)
 {
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
@@ -2961,7 +2961,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	ASSERT_RTNL();
 
-	rt6_ifdown(net, dev);
+	rt6_ifdown(net, dev, unregister);
 	neigh_ifdown(&nd_tbl, dev);
 
 	idev = __in6_dev_get(dev);
@@ -2972,7 +2972,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	 * Step 1: remove reference to ipv6 device from parent device.
 	 *	   Do not dev_put!
 	 */
-	if (how) {
+	if (unregister) {
 		idev->dead = 1;
 
 		/* protected by rtnl_lock */
@@ -3004,10 +3004,10 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	addrconf_del_rs_timer(idev);
 
 	/* Step 2: clear flags for stateless addrconf */
-	if (!how)
+	if (!unregister)
 		idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY);
 
-	if (how && del_timer(&idev->regen_timer))
+	if (unregister && del_timer(&idev->regen_timer))
 		in6_dev_put(idev);
 
 	/* Step 3: clear tempaddr list */
@@ -3053,7 +3053,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	write_unlock_bh(&idev->lock);
 
 	/* Step 5: Discard multicast list */
-	if (how)
+	if (unregister)
 		ipv6_mc_destroy_dev(idev);
 	else
 		ipv6_mc_down(idev);
@@ -3061,7 +3061,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	idev->tstamp = jiffies;
 
 	/* Last: Shot the device (if unregistered) */
-	if (how) {
+	if (unregister) {
 		addrconf_sysctl_unregister(idev);
 		neigh_parms_release(&nd_tbl, idev->nd_parms);
 		neigh_ifdown(&nd_tbl, dev);
@@ -5309,9 +5309,9 @@ void addrconf_cleanup(void)
 	for_each_netdev(&init_net, dev) {
 		if (__in6_dev_get(dev) == NULL)
 			continue;
-		addrconf_ifdown(dev, 1);
+		addrconf_ifdown(dev, true);
 	}
-	addrconf_ifdown(init_net.loopback_dev, 2);
+	addrconf_ifdown(init_net.loopback_dev, true);
 
 	/*
 	 *	Check hash table.
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 075602f..1132cfb 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1,3 +1,4 @@
+
 /*
  *	Linux INET6 implementation
  *	Forwarding Information Database
@@ -1711,7 +1712,7 @@ out_timer:
 
 static void fib6_net_exit(struct net *net)
 {
-	rt6_ifdown(net, NULL);
+	rt6_ifdown(net, NULL, true);
 	del_timer_sync(&net->ipv6.ip6_fib_timer);
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 11dac21..fb69e8b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2259,32 +2259,70 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp)
 	fib6_clean_all(net, fib6_remove_prefsrc, &adni);
 }
 
+enum arg_dev_net_action {
+	ARG_DEV_NET_REMOVE = 0,
+	ARG_DEV_NET_DISABLE,
+	ARG_DEV_NET_ENABLE,
+};
+
 struct arg_dev_net {
 	struct net_device *dev;
 	struct net *net;
+	enum arg_dev_net_action action;
 };
 
-static int fib6_ifdown(struct rt6_info *rt, void *arg)
+static int __fib6_match_or_update_if(struct rt6_info *rt, void *arg)
 {
 	const struct arg_dev_net *adn = arg;
 	const struct net_device *dev = adn->dev;
 
 	if ((rt->dst.dev == dev || !dev) &&
-	    rt != adn->net->ipv6.ip6_null_entry)
-		return -1;
+	    rt != adn->net->ipv6.ip6_null_entry) {
+		switch (adn->action) {
+		case ARG_DEV_NET_REMOVE:
+			/* remove rt */
+			return -1;
+		case ARG_DEV_NET_DISABLE:
+			WARN_ON(rt->rt6i_flags & RTF_DEAD);
+			rt->rt6i_flags |= RTF_DEAD;
+			return 0;
+		case ARG_DEV_NET_ENABLE:
+			WARN_ON(!(rt->rt6i_flags & RTF_DEAD));
+			rt->rt6i_flags &= ~RTF_DEAD;
+			return 0;
+		}
+	}
 
 	return 0;
 }
 
-void rt6_ifdown(struct net *net, struct net_device *dev)
+
+static void __rt6_fib_action(struct net *net, struct net_device *dev,
+			     enum arg_dev_net_action action)
 {
 	struct arg_dev_net adn = {
 		.dev = dev,
 		.net = net,
+		.action = action,
 	};
 
-	fib6_clean_all(net, fib6_ifdown, &adn);
-	icmp6_clean_all(fib6_ifdown, &adn);
+	fib6_clean_all(net, __fib6_match_or_update_if, &adn);
+	if (action == ARG_DEV_NET_REMOVE ||
+	    action == ARG_DEV_NET_DISABLE) {
+		adn.action = ARG_DEV_NET_REMOVE;
+		icmp6_clean_all(__fib6_match_or_update_if, &adn);
+	}
+}
+
+void rt6_ifdown(struct net *net, struct net_device *dev, bool unregister)
+{
+	__rt6_fib_action(net, dev, unregister ? ARG_DEV_NET_REMOVE :
+				       ARG_DEV_NET_DISABLE);
+}
+
+void rt6_ifup(struct net_device *dev)
+{
+	__rt6_fib_action(dev_net(dev), dev, ARG_DEV_NET_ENABLE);
 }
 
 struct rt6_mtu_change_arg {

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-23  1:01     ` Hannes Frederic Sowa
@ 2014-01-23  6:23       ` Gao feng
  2014-01-23 14:11         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 7+ messages in thread
From: Gao feng @ 2014-01-23  6:23 UTC (permalink / raw)
  To: Sabrina Dubroca, netdev, Hannes Frederic Sowa

On 01/23/2014 09:01 AM, Hannes Frederic Sowa wrote:
> On Thu, Jan 23, 2014 at 08:56:37AM +0800, Gao feng wrote:
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 1a341f7..4dca886 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -2610,8 +2610,16 @@ static void init_loopback(struct net_device *dev)
>>                         if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
>>                                 continue;
>>
>> -                       if (sp_ifa->rt)
>> -                               continue;
>> +                       if (sp_ifa->rt) {
>> +                               /* This dst has been added to garbage list when
>> +                                * lo device down, delete this obsolete dst and
>> +                                * reallocate new router for ifa. */
>> +                               if (sp_ifa->rt->dst.obsolete > 0) {
>> +                                       ip6_del_rt(sp_ifa->rt);
>> +                                       sp_ifa->rt = NULL;
>> +                               } else
>> +                                       continue;
>> +                       }
>>
>>                         sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, false);
> 
> I agree, this seems a lot simpler. 

I will post a official version.

> In the end I would like to replace this
> conditional loopback up/down thing with something like below. I haven't done
> the correct hookups into the relevant places, but I hope you get the idea:
> 

The code explains everything. the routers will be invalid when device is down
and be enabled when device is up.

look forward to your patch.

> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 017badb..1648a59a 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h

[...]
>  
> -static int fib6_ifdown(struct rt6_info *rt, void *arg)
> +static int __fib6_match_or_update_if(struct rt6_info *rt, void *arg)
>  {
>  	const struct arg_dev_net *adn = arg;
>  	const struct net_device *dev = adn->dev;
>  
>  	if ((rt->dst.dev == dev || !dev) &&
> -	    rt != adn->net->ipv6.ip6_null_entry)
> -		return -1;
> +	    rt != adn->net->ipv6.ip6_null_entry) {
> +		switch (adn->action) {
> +		case ARG_DEV_NET_REMOVE:
> +			/* remove rt */
> +			return -1;
> +		case ARG_DEV_NET_DISABLE:
> +			WARN_ON(rt->rt6i_flags & RTF_DEAD);
> +			rt->rt6i_flags |= RTF_DEAD;
> +			return 0;
> +		case ARG_DEV_NET_ENABLE:
> +			WARN_ON(!(rt->rt6i_flags & RTF_DEAD));

I think this may happen, think a new router is cloned from this rt but hasn't been inserted
into the router tree on other CPU at the same time.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up
  2014-01-23  6:23       ` Gao feng
@ 2014-01-23 14:11         ` Hannes Frederic Sowa
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Frederic Sowa @ 2014-01-23 14:11 UTC (permalink / raw)
  To: Gao feng; +Cc: Sabrina Dubroca, netdev

On Thu, Jan 23, 2014 at 02:23:27PM +0800, Gao feng wrote:
> >  
> > -static int fib6_ifdown(struct rt6_info *rt, void *arg)
> > +static int __fib6_match_or_update_if(struct rt6_info *rt, void *arg)
> >  {
> >  	const struct arg_dev_net *adn = arg;
> >  	const struct net_device *dev = adn->dev;
> >  
> >  	if ((rt->dst.dev == dev || !dev) &&
> > -	    rt != adn->net->ipv6.ip6_null_entry)
> > -		return -1;
> > +	    rt != adn->net->ipv6.ip6_null_entry) {
> > +		switch (adn->action) {
> > +		case ARG_DEV_NET_REMOVE:
> > +			/* remove rt */
> > +			return -1;
> > +		case ARG_DEV_NET_DISABLE:
> > +			WARN_ON(rt->rt6i_flags & RTF_DEAD);
> > +			rt->rt6i_flags |= RTF_DEAD;
> > +			return 0;
> > +		case ARG_DEV_NET_ENABLE:
> > +			WARN_ON(!(rt->rt6i_flags & RTF_DEAD));
> 
> I think this may happen, think a new router is cloned from this rt but hasn't been inserted
> into the router tree on other CPU at the same time.

Yes, there are still some problems with this patch. I actually shuffeled
the code around to purge all those RTF_CACHE RTF_DEAD routes yesterday
while holding the same write_lock on the table.

Regarding the problem you addressed, I tried to validate the route
using dst.from on insert while holding write_lock, somehow like we do
for checking expire time. But I really dislike this.

Thanks,

  Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-23 14:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-22 15:35 [RFC PATCH net] IPv6: Fix broken IPv6 routing table after loopback down-up Sabrina Dubroca
2014-01-22 16:20 ` Eric Dumazet
2014-01-22 21:34 ` Hannes Frederic Sowa
2014-01-23  0:56   ` Gao feng
2014-01-23  1:01     ` Hannes Frederic Sowa
2014-01-23  6:23       ` Gao feng
2014-01-23 14:11         ` Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).