* [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
@ 2010-09-28 15:24 Nicolas Dichtel
  2010-09-28 16:33 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2010-09-28 15:24 UTC
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

Hi,

I am facing a problem when I try to remove an interface:
netdev_wait_allrefs() complains about the refcount.

Here is a trivial scenario to reproduce the problem:
# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
# ./a.out tunl1
# ip tunnel del tunl1

Note: the a.out binary creates an IPv4 raw socket, binds it to tunl1
(SO_BINDTODEVICE), enables multicast loopback (IP_MULTICAST_LOOP), sets the
multicast interface to tunl1 (IP_MULTICAST_IF), builds the IP header itself
(IP_HDRINCL) and then sends a single packet (192.168.6.1 -> 224.0.0.18).

Note2: when a.out is executed, tunl1 has no IP address and is down.
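
The a.out source was not posted. What follows is a minimal C sketch of the
steps described in the notes above, not the original program: build_probe(),
send_probe() and ip_checksum() are hypothetical names, and the payload
protocol (112, VRRP) and TTL of 255 are assumptions suggested by the
224.0.0.18 destination and the /root/vrrp binary used later in the thread.
send_probe() needs root (CAP_NET_RAW) and a down tunl1 to reproduce anything.

```c
#define _GNU_SOURCE		/* struct ip_mreqn, SO_BINDTODEVICE on glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <net/if.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: protocol 112 (VRRP), guessed from the 224.0.0.18 group. */
#define PROBE_PROTO 112

static_assert(sizeof(struct iphdr) == 20, "expected an option-less IPv4 header");

/* RFC 1071 Internet checksum, returned in network byte order. */
uint16_t ip_checksum(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t sum = 0;

	for (; len > 1; p += 2, len -= 2)
		sum += (uint32_t)(p[0] << 8 | p[1]);
	if (len)
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return htons((uint16_t)~sum);
}

/* Write a 20-byte IPv4 header for src -> dst into buf, followed by
 * payload_len zero bytes. Returns the packet length, or 0 on error. */
size_t build_probe(uint8_t *buf, size_t buflen, const char *src,
		   const char *dst, size_t payload_len)
{
	struct iphdr iph;
	size_t total = sizeof(iph) + payload_len;

	if (total > buflen || total > 0xffff)
		return 0;
	memset(&iph, 0, sizeof(iph));
	iph.version = 4;
	iph.ihl = 5;			/* header length in 32-bit words */
	iph.ttl = 255;
	iph.protocol = PROBE_PROTO;
	iph.tot_len = htons((uint16_t)total);
	if (inet_pton(AF_INET, src, &iph.saddr) != 1 ||
	    inet_pton(AF_INET, dst, &iph.daddr) != 1)
		return 0;
	iph.check = ip_checksum(&iph, sizeof(iph));
	memset(buf, 0, total);
	memcpy(buf, &iph, sizeof(iph));
	return total;
}

/* Replay the reported sequence on ifname. Needs CAP_NET_RAW; returns 0 if
 * the kernel accepted the packet, -1 otherwise. */
int send_probe(const char *ifname)
{
	uint8_t pkt[sizeof(struct iphdr)];
	struct sockaddr_in to = { .sin_family = AF_INET };
	struct ip_mreqn mreq = { .imr_ifindex = (int)if_nametoindex(ifname) };
	int one = 1, ret = -1;
	size_t len = build_probe(pkt, sizeof(pkt), "192.168.6.1", "224.0.0.18", 0);
	int fd;

	if (len == 0 || mreq.imr_ifindex == 0)
		return -1;
	inet_pton(AF_INET, "224.0.0.18", &to.sin_addr);
	fd = socket(AF_INET, SOCK_RAW, PROBE_PROTO);
	if (fd < 0)
		return -1;
	/* tunl1 has no address, so the multicast interface is chosen by index. */
	if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname) + 1) == 0 &&
	    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &one, sizeof(one)) == 0 &&
	    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &mreq, sizeof(mreq)) == 0 &&
	    setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one)) == 0 &&
	    sendto(fd, pkt, len, 0, (struct sockaddr *)&to, sizeof(to)) == (ssize_t)len)
		ret = 0;
	close(fd);
	return ret;
}
```

On a test box, send_probe("tunl1") would be called from a small main()
between "ip tunnel add" and "ip tunnel del". Per raw(7), the kernel always
fills in the checksum and total length for IP_HDRINCL sockets, so computing
them here only makes the header self-describing.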

Then I get a series of "kernel:[1206699.728010] unregister_netdevice:
waiting for tunl1 to become free. Usage count = 3" messages, and only after
some time is the interface removed.

The problem is that, since commit e2ce146848c81af2f6d42e67990191c284bf0c33,
route cache entries are only invalidated on the UNREGISTER event, not
removed. We must then wait for rt_check_expire() to remove the remaining
route cache entries.
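
The difference between invalidating and removing entries can be illustrated
with a user-space toy model. It is only an analogy under stated assumptions:
toy_dev, toy_rt, toy_cache and the cache_*() functions are hypothetical, and
the generation counter loosely imitates how rt_cache_flush() with a negative
delay only bumps the cache generation, while a non-negative delay also frees
the entries. What it shows is that an invalidated entry keeps its device
reference until the next expiry pass, so the usage count stays above 1 in
the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel structures. */
struct toy_dev {
	const char *name;
	int refcnt;			/* 1 == registration reference only */
};

struct toy_rt {
	struct toy_dev *dev;		/* non-NULL: holds one ref on dev */
	unsigned int genid;		/* cache generation at creation */
};

#define TOY_CACHE_SIZE 4

struct toy_cache {
	struct toy_rt slot[TOY_CACHE_SIZE];
	unsigned int genid;
};

/* Cache a route through dev, taking a reference (like dev_hold()). */
bool cache_add(struct toy_cache *c, struct toy_dev *dev)
{
	for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
		if (c->slot[i].dev == NULL) {
			c->slot[i].dev = dev;
			c->slot[i].genid = c->genid;
			dev->refcnt++;
			return true;
		}
	}
	return false;
}

/* Drop one entry and its device reference (like dev_put()). */
static void cache_free(struct toy_rt *rt)
{
	assert(rt->dev->refcnt > 1);	/* never drop the registration ref */
	rt->dev->refcnt--;
	rt->dev = NULL;
}

/* Loosely models rt_cache_flush(net, -1): entries go stale, keep refs. */
void cache_invalidate(struct toy_cache *c)
{
	c->genid++;
}

/* Loosely models rt_cache_flush(net, 0): invalidate, then free everything. */
void cache_flush(struct toy_cache *c)
{
	cache_invalidate(c);
	for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
		if (c->slot[i].dev != NULL)
			cache_free(&c->slot[i]);
}

/* Loosely models the periodic rt_check_expire() pass: free stale entries. */
void cache_expire(struct toy_cache *c)
{
	for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
		if (c->slot[i].dev != NULL && c->slot[i].genid != c->genid)
			cache_free(&c->slot[i]);
}
```

With two cached routes the count goes from 1 to 3; cache_invalidate()
leaves it at 3 and only cache_expire() brings it back to 1, whereas
cache_flush() drops the references immediately.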

To fix the problem, I propose to revert part of that commit.

Regards,
Nicolas

[-- Attachment #2: 0001-ipv4-remove-all-rt-cache-entries-on-UNREGISTER-even.patch --]
[-- Type: text/x-diff, Size: 2258 bytes --]

From 3344e2e0431fe803c4dac8757a8746908357d780 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 28 Sep 2010 16:38:19 +0200
Subject: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event

Commit e2ce146848c81af2f6d42e67990191c284bf0c33 (ipv4: factorize cache clearing
for batched unregister operations) added a new parameter to fib_disable_ip() so
that route cache entries are only invalidated on the unregister event.
This is wrong: we must ensure that all cache entries are removed on the
unregister event, otherwise netdev_wait_allrefs() may complain. A cache entry
can be created between the DOWN and UNREGISTER events.

So, revert part of that patch.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv4/fib_frontend.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7d02a9f..377e815 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -917,11 +917,11 @@ static void nl_fib_lookup_exit(struct net *net)
 	net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force, int delay)
+static void fib_disable_ip(struct net_device *dev, int force)
 {
 	if (fib_sync_down_dev(dev, force))
 		fib_flush(dev_net(dev));
-	rt_cache_flush(dev_net(dev), delay);
+	rt_cache_flush(dev_net(dev), 0);
 	arp_ifdown(dev);
 }
 
@@ -944,7 +944,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 			/* Last address was deleted from this interface.
 			   Disable IP.
 			 */
-			fib_disable_ip(dev, 1, 0);
+			fib_disable_ip(dev, 1);
 		} else {
 			rt_cache_flush(dev_net(dev), -1);
 		}
@@ -959,7 +959,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 	struct in_device *in_dev = __in_dev_get_rtnl(dev);
 
 	if (event == NETDEV_UNREGISTER) {
-		fib_disable_ip(dev, 2, -1);
+		fib_disable_ip(dev, 2);
 		return NOTIFY_DONE;
 	}
 
@@ -977,7 +977,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		rt_cache_flush(dev_net(dev), -1);
 		break;
 	case NETDEV_DOWN:
-		fib_disable_ip(dev, 0, 0);
+		fib_disable_ip(dev, 0);
 		break;
 	case NETDEV_CHANGEMTU:
 	case NETDEV_CHANGE:
-- 
1.5.6.5



* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-28 15:24 [PATCH] ipv4: remove all rt cache entries on UNREGISTER event Nicolas Dichtel
@ 2010-09-28 16:33 ` Eric Dumazet
  2010-09-28 16:45   ` Nicolas Dichtel
  2010-09-28 17:35   ` [PATCH] ipv4: remove all rt cache entries on UNREGISTER event Octavian Purdila
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-09-28 16:33 UTC
  To: nicolas.dichtel; +Cc: netdev, Octavian Purdila

On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
> Hi,
> 
> I face a problem when I try to remove an interface, 
> netdev_wait_allrefs() complains about refcount.
> 
> Here is a trivial scenario to reproduce the problem:
> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> # ./a.out tunl1
> # ip tunnel del tunl1
> 
> Note: a.out binary create an IPv4 raw socket, attach it to tunl1 
> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the 
> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header 
> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
> 
> Note2: when a.out is executed, tunl1 has no ip address and is down.
> 

CC Octavian Purdila, the patch author.

I am just wondering why this route is created in the first place.

Maybe a fix would be to forbid this?

Some machines have a giant route cache, so it's very important to avoid
expensive scans.

> Then, I got a serie of "kernel:[1206699.728010] unregister_netdevice: 
> waiting for tunl1 to become free. Usage count = 3" and after some time, 
> interface is removed.
> 
> The problem is that route cache entries are only invalidate on 
> UNREGISTER event, and not removed (introduced by commit 
> e2ce146848c81af2f6d42e67990191c284bf0c33). We must wait that 
> rt_check_expire() remove the remaining route cache entries.
> 
> To fix the problem, I propose to remove a part of the previous commit.
> 
> Regards,
> Nicolas




* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-28 16:33 ` Eric Dumazet
@ 2010-09-28 16:45   ` Nicolas Dichtel
  2010-09-28 16:56     ` Eric Dumazet
  2010-09-28 17:35   ` [PATCH] ipv4: remove all rt cache entries on UNREGISTER event Octavian Purdila
  1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2010-09-28 16:45 UTC
  To: Eric Dumazet; +Cc: netdev, Octavian Purdila

Eric Dumazet wrote:
> On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
>> Hi,
>>
>> I face a problem when I try to remove an interface, 
>> netdev_wait_allrefs() complains about refcount.
>>
>> Here is a trivial scenario to reproduce the problem:
>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>> # ./a.out tunl1
>> # ip tunnel del tunl1
>>
>> Note: a.out binary create an IPv4 raw socket, attach it to tunl1 
>> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the 
>> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header 
>> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
>>
>> Note2: when a.out is executed, tunl1 has no ip address and is down.
>>
> 
> CC Octavian Purdila, the patch author.
> 
> I am just wondering why this route is created in the first place.
At first, I asked myself the same question, but it seems that sending a
packet through this kind of socket is allowed even if the interface is
down. The packet is then dropped by the noop qdisc.
But I agree that it is strange to perform a route lookup and everything
else only to destroy the packet at the end...
Maybe raw_sendmsg() could drop it directly ;-) ... or maybe
ip_route_output_flow().

Any suggestions are welcome.

Regards,
Nicolas



* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-28 16:45   ` Nicolas Dichtel
@ 2010-09-28 16:56     ` Eric Dumazet
  2010-09-29  7:49       ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-09-28 16:56 UTC
  To: nicolas.dichtel; +Cc: netdev, Octavian Purdila

On Tuesday 28 September 2010 at 18:45 +0200, Nicolas Dichtel wrote:
> Eric Dumazet wrote:
> > On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
> >> Hi,
> >>
> >> I face a problem when I try to remove an interface, 
> >> netdev_wait_allrefs() complains about refcount.
> >>
> >> Here is a trivial scenario to reproduce the problem:
> >> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> >> # ./a.out tunl1
> >> # ip tunnel del tunl1
> >>
> >> Note: a.out binary create an IPv4 raw socket, attach it to tunl1 
> >> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the 
> >> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header 
> >> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
> >>
> >> Note2: when a.out is executed, tunl1 has no ip address and is down.
> >>
> > 
> > CC Octavian Purdila, the patch author.
> > 
> > I am just wondering why this route is created in the first place.
> At first, I asked myself the same question, but it seems that this is 
> allowed to send a packet through this kind of socket, even if interface 
> is down. Packet will be destroyed by the noop qdisk.
> But I agree that it is strange to perform route lookup and everything to 
>    destroy the packet at the end ...
> Maybe raw_sendmsg() can delete it directly ;-) ... or maybe 
> ip_route_output_flow().
> 
> Any suggestions welcome.
> 

Hmm...

One way to track this kind of problem would be to add a WARN_ON() in
dev_hold()

-> Check that when a reference on dev is taken, we are in a known state.

Something like this?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 83de0eb..54bef78 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1773,6 +1774,7 @@ static inline void dev_put(struct net_device *dev)
  */
 static inline void dev_hold(struct net_device *dev)
 {
+	WARN_ON(dev->reg_state != NETREG_REGISTERED);
 	atomic_inc(&dev->refcnt);
 }
 




* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-28 16:33 ` Eric Dumazet
  2010-09-28 16:45   ` Nicolas Dichtel
@ 2010-09-28 17:35   ` Octavian Purdila
  1 sibling, 0 replies; 16+ messages in thread
From: Octavian Purdila @ 2010-09-28 17:35 UTC
  To: nicolas.dichtel; +Cc: Eric Dumazet, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tuesday 28 September 2010, 19:33:49

> On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
> > Hi,
> > 
> > I face a problem when I try to remove an interface,
> > netdev_wait_allrefs() complains about refcount.
> > 
> > Here is a trivial scenario to reproduce the problem:
> > # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> > # ./a.out tunl1
> > # ip tunnel del tunl1
> > 
> > Note: a.out binary create an IPv4 raw socket, attach it to tunl1
> > (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the
> > multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header
> > (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
> > 
> > Note2: when a.out is executed, tunl1 has no ip address and is down.
> 
> CC Octavian Purdila, the patch author.
> 
> I am just wondering why this route is created in the first place.
> 
> Maybe a fix would be to forbid this ?
> 
> Some machines have a giant route cache, so its very important to avoid
> expensive scans.
> 
> > Then, I got a serie of "kernel:[1206699.728010] unregister_netdevice:
> > waiting for tunl1 to become free. Usage count = 3" and after some time,
> > interface is removed.
> > 
> > The problem is that route cache entries are only invalidate on
> > UNREGISTER event, and not removed (introduced by commit
> > e2ce146848c81af2f6d42e67990191c284bf0c33). We must wait that
> > rt_check_expire() remove the remaining route cache entries.
> > 
> > To fix the problem, I propose to remove a part of the previous commit.
> > 


Hi Nicolas,

The purpose of my original patch was to speed up interface deregistration
even more after Eric's batching work. Reverting it might slow things down
again, but since it is breaking things we should probably revert it and think
about a proper optimization afterwards. I know that Eric B has done some more
work in this area, for batched namespace cleanup; maybe the issue is not even
there anymore.

So, Ack from me.

We might even fully revert the patch, since the part that is left no longer
has any value.

Thanks,
tavi


* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-28 16:56     ` Eric Dumazet
@ 2010-09-29  7:49       ` Nicolas Dichtel
  2010-09-29  8:35         ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2010-09-29  7:49 UTC
  To: Eric Dumazet; +Cc: netdev, Octavian Purdila

Eric Dumazet wrote:
> On Tuesday 28 September 2010 at 18:45 +0200, Nicolas Dichtel wrote:
>> Eric Dumazet wrote:
>>> On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
>>>> Hi,
>>>>
>>>> I face a problem when I try to remove an interface, 
>>>> netdev_wait_allrefs() complains about refcount.
>>>>
>>>> Here is a trivial scenario to reproduce the problem:
>>>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>>>> # ./a.out tunl1
>>>> # ip tunnel del tunl1
>>>>
>>>> Note: a.out binary create an IPv4 raw socket, attach it to tunl1 
>>>> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the 
>>>> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header 
>>>> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
>>>>
>>>> Note2: when a.out is executed, tunl1 has no ip address and is down.
>>>>
>>> CC Octavian Purdila, the patch author.
>>>
>>> I am just wondering why this route is created in the first place.
The route is created because no function checks the interface status (up
and running, or down). Only at the very end is the packet enqueued to the
noop qdisc.

>> At first, I asked myself the same question, but it seems that this is 
>> allowed to send a packet through this kind of socket, even if interface 
>> is down. Packet will be destroyed by the noop qdisk.
>> But I agree that it is strange to perform route lookup and everything to 
>>    destroy the packet at the end ...
>> Maybe raw_sendmsg() can delete it directly ;-) ... or maybe 
>> ip_route_output_flow().
>>
>> Any suggestions welcome.
>>
> 
> Hmm...
> 
> One way to track this kind of problem would be to add a WARN_ON() in
> dev_hold()
> 
> -> Check that when a reference on dev is taken, we are in a known state.
> 
> Something like this ?
dev_hold() is called while the interface is down, but before the
unregistering process starts.

Regards,
Nicolas



* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-29  7:49       ` Nicolas Dichtel
@ 2010-09-29  8:35         ` Eric Dumazet
  2010-09-29  9:18           ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-09-29  8:35 UTC
  To: nicolas.dichtel; +Cc: netdev, Octavian Purdila

On Wednesday 29 September 2010 at 09:49 +0200, Nicolas Dichtel wrote:
> Eric Dumazet wrote:
> > On Tuesday 28 September 2010 at 18:45 +0200, Nicolas Dichtel wrote:
> >> Eric Dumazet wrote:
> >>> On Tuesday 28 September 2010 at 17:24 +0200, Nicolas Dichtel wrote:
> >>>> Hi,
> >>>>
> >>>> I face a problem when I try to remove an interface, 
> >>>> netdev_wait_allrefs() complains about refcount.
> >>>>
> >>>> Here is a trivial scenario to reproduce the problem:
> >>>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> >>>> # ./a.out tunl1
> >>>> # ip tunnel del tunl1
> >>>>
> >>>> Note: a.out binary create an IPv4 raw socket, attach it to tunl1 
> >>>> (SO_BINDTODEVICE), set it as multicast (IP_MULTICAST_LOOP), set the 
> >>>> multicast interface to tunl1 (IP_MULTICAST_IF), build the IP header 
> >>>> (IP_HDRINCL) and then send a single packet (192.168.6.1 -> 224.0.0.18).
> >>>>
> >>>> Note2: when a.out is executed, tunl1 has no ip address and is down.
> >>>>
> >>> CC Octavian Purdila, the patch author.
> >>>
> >>> I am just wondering why this route is created in the first place.
> The route is created because no function will check interface status (up 
> and running or down). Just at the end, the packet will be enqueued in 
> the noop qdisc.
> 

In your case maybe, but I think there is another place where we can call
dev_hold() while the device is in its dismantle phase.


> >> At first, I asked myself the same question, but it seems that this is 
> >> allowed to send a packet through this kind of socket, even if interface 
> >> is down. Packet will be destroyed by the noop qdisk.
> >> But I agree that it is strange to perform route lookup and everything to 
> >>    destroy the packet at the end ...
> >> Maybe raw_sendmsg() can delete it directly ;-) ... or maybe 
> >> ip_route_output_flow().
> >>
> >> Any suggestions welcome.
> >>
> > 
> > Hmm...
> > 
> > One way to track this kind of problem would be to add a WARN_ON() in
> > dev_hold()
> > 
> > -> Check that when a reference on dev is taken, we are in a known state.
> > 
> > Something like this ?
> dev_hold() is done when interface is down, but before unregistering 
> process start.

Not on my machine: I sometimes see the backtrace.

There is a race somewhere (maybe several), and your patch only reduces
the window of this race.

I am working on it.




* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-29  8:35         ` Eric Dumazet
@ 2010-09-29  9:18           ` Eric Dumazet
  2010-09-30 11:49             ` Nicolas Dichtel
  2010-12-22  8:32             ` Nicolas Dichtel
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-09-29  9:18 UTC
  To: nicolas.dichtel; +Cc: netdev, Octavian Purdila

I found that the following patch was enough to prevent a route from being
created when the device is down. This is still racy and needs more thought.


diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index ac6559c..1ee0b1a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2586,9 +2586,10 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 			goto out;
 
 		/* RACE: Check return value of inet_select_addr instead. */
-		if (__in_dev_get_rtnl(dev_out) == NULL) {
+		if (!(dev_out->flags & IFF_UP) || __in_dev_get_rtnl(dev_out) == NULL) {
 			dev_put(dev_out);
-			goto out;	/* Wrong error code */
+			err = -ENETUNREACH;
+			goto out;
 		}
 
 		if (ipv4_is_local_multicast(oldflp->fl4_dst) ||




* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-29  9:18           ` Eric Dumazet
@ 2010-09-30 11:49             ` Nicolas Dichtel
  2010-12-22  8:32             ` Nicolas Dichtel
  1 sibling, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2010-09-30 11:49 UTC
  To: Eric Dumazet; +Cc: netdev, Octavian Purdila

The patch works well for my case.
In fact, it is cleaner to return an error to the daemon so that it knows
the packet was not sent.
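
On the daemon side, such an error can be acted on. A minimal sketch,
assuming a kernel with Eric's proposed check so that sending through a down
device fails with ENETUNREACH; send_advert(), classify_send_errno() and
enum send_status are hypothetical names, and treating ENETDOWN the same way
is an extra assumption.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum send_status { SEND_OK, SEND_IFACE_DOWN, SEND_ERROR };

/* Map a sendto() errno value to a daemon-level status. */
enum send_status classify_send_errno(int err)
{
	switch (err) {
	case ENETUNREACH:	/* what the proposed route.c check returns */
	case ENETDOWN:		/* assumption: also means "interface down" */
		return SEND_IFACE_DOWN;
	default:
		return SEND_ERROR;
	}
}

/* Send one advertisement and report why it failed, if it did. */
enum send_status send_advert(int fd, const void *pkt, size_t len,
			     const struct sockaddr_in *to)
{
	ssize_t n;

	assert(to != NULL);
	n = sendto(fd, pkt, len, 0, (const struct sockaddr *)to, sizeof(*to));
	if (n < 0)
		return classify_send_errno(errno);
	return (size_t)n == len ? SEND_OK : SEND_ERROR;
}
```

A daemon could then log SEND_IFACE_DOWN once and wait for the interface to
come up, instead of silently believing the packet went out.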

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Regards,
Nicolas




* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-09-29  9:18           ` Eric Dumazet
  2010-09-30 11:49             ` Nicolas Dichtel
@ 2010-12-22  8:32             ` Nicolas Dichtel
  2010-12-22  9:55               ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2010-12-22  8:32 UTC
  To: Eric Dumazet; +Cc: netdev, Octavian Purdila

What is the status of this patch? The initial problem is still present in 2.6.37-rc5+.


Regards,
Nicolas



* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-12-22  8:32             ` Nicolas Dichtel
@ 2010-12-22  9:55               ` Eric Dumazet
  2010-12-22 10:07                 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-12-22  9:55 UTC
  To: nicolas.dichtel; +Cc: Octavian Purdila, netdev

On Wednesday 22 December 2010 at 09:32 +0100, Nicolas Dichtel wrote:
> What is the status of this patch? The initial problem is still here in 2.6.37-rc5+
> 

I cannot reproduce the problem on net-next-2.6; are you sure we still
need a new patch?

# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
# ip link show dev tunl1
19: tunl1: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN 
    link/ipip 10.16.0.72 peer 10.16.0.164
# /root/vrrp tunl1
# ip tunnel del tunl1





* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-12-22  9:55               ` Eric Dumazet
@ 2010-12-22 10:07                 ` Eric Dumazet
  2010-12-22 13:43                   ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-12-22 10:07 UTC
  To: nicolas.dichtel; +Cc: Octavian Purdila, netdev

On Wednesday 22 December 2010 at 10:55 +0100, Eric Dumazet wrote:
> On Wednesday 22 December 2010 at 09:32 +0100, Nicolas Dichtel wrote:
> > What is the status of this patch? The initial problem is still here in 2.6.37-rc5+
> > 
> 
> I cannot reproduce the problem on net-next-2.6, are you sure we still
> need a new patch ?
> 
> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> # ip link show dev tunl1
> 19: tunl1: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN 
>     link/ipip 10.16.0.72 peer 10.16.0.164
> # /root/vrrp tunl1
> # ip tunnel del tunl1
> 
> 

(I thought commit 332dd96f7ac15e fixed the problem for you, as it did
for me)

net/dst: dst_dev_event() called after other notifiers





* Re: [PATCH] ipv4: remove all rt cache entries on UNREGISTER event
  2010-12-22 10:07                 ` Eric Dumazet
@ 2010-12-22 13:43                   ` Nicolas Dichtel
  2010-12-22 14:39                     ` [PATCH] ipv4: dont create routes on down devices Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2010-12-22 13:43 UTC
  To: Eric Dumazet; +Cc: Octavian Purdila, netdev

Yes, I saw this commit, but I still get the problem:

shelby:/home/root/src# uname -a
Linux shelby 2.6.37-rc5+ #10 SMP Wed Dec 22 05:02:53 EST 2010 i686 GNU/Linux
shelby:/home/root/src# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
shelby:/home/root/src# ./a.out tunl1
shelby:/home/root/src# ip tunnel del tunl1

Message from syslogd@shelby at Dec 22 10:12:08 ...
  kernel:[18459.828011] unregister_netdevice: waiting for tunl1 to become free. Usage count = 3

Message from syslogd@shelby at Dec 22 10:12:19 ...
  kernel:[18470.072017] unregister_netdevice: waiting for tunl1 to become free. Usage count = 3

Message from syslogd@shelby at Dec 22 10:12:29 ...
  kernel:[18480.316011] unregister_netdevice: waiting for tunl1 to become free. Usage count = 3

Message from syslogd@shelby at Dec 22 10:12:39 ...
  kernel:[18490.560010] unregister_netdevice: waiting for tunl1 to become free. Usage count = 3
shelby:/home/root/src#

I don't know if I will have time to investigate more before next year.

Regards,
Nicolas


Le 22.12.2010 11:07, Eric Dumazet a écrit :
> Le mercredi 22 décembre 2010 à 10:55 +0100, Eric Dumazet a écrit :
>> Le mercredi 22 décembre 2010 à 09:32 +0100, Nicolas Dichtel a écrit :
>>> What is the status of this patch? The initial problem is still here in 2.6.37-rc5+
>>>
>> I cannot reproduce the problem on net-next-2.6, are you sure we still
>> need a new patch ?
>>
>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>> # ip link show dev tunl1
>> 19: tunl1: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN 
>>     link/ipip 10.16.0.72 peer 10.16.0.164
>> # /root/vrrp tunl1
>> # ip tunnel del tunl1
>>
>>
> 
> (I thought commit 332dd96f7ac15e fixed the problem for you, as it did
> for me)
> 
> net/dst: dst_dev_event() called after other notifiers

-- 
Nicolas DICHTEL
6WIND
R&D Engineer

Tel: +33 1 39 30 92 10
Fax: +33 1 39 30 92 11
nicolas.dichtel@6wind.com
www.6wind.com


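Commit 332dd96f7ac15e, quoted above, is about ordering: its title says dst_dev_event() is now called after the other netdevice notifiers. Kernel notifier chains invoke callbacks in descending `priority` order, and entries with equal priority keep their registration order. The sketch below is a userspace model of that ordering rule only; `struct model_nb`, `chain_register()`, `chain_call()`, the notifier names and the priority value -10 are illustrative assumptions, not the kernel's code or the exact contents of that commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a notifier_block: a name instead of a callback. */
struct model_nb {
    const char *name;
    int priority;               /* higher values are called first */
    struct model_nb *next;
};

/*
 * Insert n before the first entry with a strictly lower priority, so that
 * entries with equal priority keep their registration order.
 */
static void chain_register(struct model_nb **head, struct model_nb *n)
{
    assert(n != NULL);
    while (*head != NULL && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* "Call" the chain by recording the order in which entries are visited. */
static size_t chain_call(const struct model_nb *head, const char **order, size_t max)
{
    size_t i = 0;

    for (; head != NULL && i < max; head = head->next)
        order[i++] = head->name;
    return i;
}
```

Even if the lower-priority dst notifier is registered first, it is visited last, which is the behaviour the commit title describes: the other notifiers get to drop their references before the dst garbage list is processed.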

* [PATCH] ipv4: dont create routes on down devices
  2010-12-22 13:43                   ` Nicolas Dichtel
@ 2010-12-22 14:39                     ` Eric Dumazet
  2010-12-23  8:50                       ` Octavian Purdila
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-12-22 14:39 UTC (permalink / raw)
  To: nicolas.dichtel, David Miller; +Cc: Octavian Purdila, netdev

On Wednesday 22 December 2010 at 14:43 +0100, Nicolas Dichtel wrote:
> Yes, I saw this commit, but I still have the problem:
> 
> shelby:/home/root/src# uname -a
> Linux shelby 2.6.37-rc5+ #10 SMP Wed Dec 22 05:02:53 EST 2010 i686 GNU/Linux
> shelby:/home/root/src# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> shelby:/home/root/src# ./a.out tunl1
> shelby:/home/root/src# ip tunnel del tunl1
> 
> Message from syslogd@shelby at Dec 22 10:12:08 ...
>   kernel:[18459.828011] unregister_netdevice: waiting for tunl1 to become free. Usage count = 3
> 

On another machine, running net-next-2.6, I reproduced the problem, so we
need the patch after all. Sorry for the delay.

Thanks

[PATCH] ipv4: dont create routes on down devices

In ip_route_output_slow(), instead of allowing a route to be created on
a not UPed device, report -ENETUNREACH immediately.

# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
# (Note: tunl1 is down)
# ping -I tunl1 10.1.2.3
PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
(nothing)
# ./a.out tunl1
# ip tunnel del tunl1
Message from syslogd@shelby at Dec 22 10:12:08 ...
  kernel: unregister_netdevice: waiting for tunl1 to become free. Usage count = 3

After patch:
# ping -I tunl1 10.1.2.3
connect: Network is unreachable


Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/route.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d8b4f4d..f1defb7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2562,9 +2562,10 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 			goto out;
 
 		/* RACE: Check return value of inet_select_addr instead. */
-		if (rcu_dereference(dev_out->ip_ptr) == NULL)
-			goto out;	/* Wrong error code */
-
+		if (!(dev_out->flags & IFF_UP) || !__in_dev_get_rcu(dev_out)) {
+			err = -ENETUNREACH;
+			goto out;
+		}
 		if (ipv4_is_local_multicast(oldflp->fl4_dst) ||
 		    ipv4_is_lbcast(oldflp->fl4_dst)) {
 			if (!fl.fl4_src)



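The hunk above replaces the old `ip_ptr == NULL` test (whose own comment admitted it produced the wrong error code) with a check that the output device is both up and has IPv4 configuration, failing with `-ENETUNREACH` otherwise; that errno is what `ping` reports as "Network is unreachable" after the patch. Below is a userspace model of just that condition, under stated assumptions: `struct model_dev` and `model_route_check()` are hypothetical, the `in_dev` field stands in for what `__in_dev_get_rcu()` returns, and RCU is ignored entirely.

```c
#define _DEFAULT_SOURCE         /* expose IFF_UP from <net/if.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <stddef.h>

/* Hypothetical stand-in for the two net_device properties the patch checks. */
struct model_dev {
    unsigned int flags;         /* IFF_* bits, e.g. IFF_UP */
    const void *in_dev;         /* IPv4 config; NULL if the device has none */
};

/*
 * Model of the new guard in ip_route_output_slow(): refuse to build an
 * output route unless the device is up and has IPv4 configuration.
 */
static int model_route_check(const struct model_dev *dev)
{
    assert(dev != NULL);
    if (!(dev->flags & IFF_UP) || dev->in_dev == NULL)
        return -ENETUNREACH;
    return 0;                   /* the real function would go on to build the route */
}
```

In the model, a device that is down but still has IPv4 configuration is rejected, which is exactly the case the old `ip_ptr` test let through and the case of tunl1 in the reproducer above.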

* Re: [PATCH] ipv4: dont create routes on down devices
  2010-12-22 14:39                     ` [PATCH] ipv4: dont create routes on down devices Eric Dumazet
@ 2010-12-23  8:50                       ` Octavian Purdila
  2010-12-26  4:05                         ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Octavian Purdila @ 2010-12-23  8:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: nicolas.dichtel, David Miller, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wednesday 22 December 2010, 16:39:39

> [PATCH] ipv4: dont create routes on down devices
> 
> In ip_route_output_slow(), instead of allowing a route to be created on
> a not UPed device, report -ENETUNREACH immediately.
> 
> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
> # (Note: tunl1 is down)
> # ping -I tunl1 10.1.2.3
> PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
> (nothing)
> # ./a.out tunl1
> # ip tunnel del tunl1
> Message from syslogd@shelby at Dec 22 10:12:08 ...
>   kernel: unregister_netdevice: waiting for tunl1 to become free. Usage count = 3
> 
> After patch:
> # ping -I tunl1 10.1.2.3
> connect: Network is unreachable
> 
> 
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Cc: Octavian Purdila <opurdila@ixiacom.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Eric!

Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>


* Re: [PATCH] ipv4: dont create routes on down devices
  2010-12-23  8:50                       ` Octavian Purdila
@ 2010-12-26  4:05                         ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2010-12-26  4:05 UTC (permalink / raw)
  To: opurdila; +Cc: eric.dumazet, nicolas.dichtel, netdev

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Thu, 23 Dec 2010 10:50:25 +0200

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wednesday 22 December 2010, 16:39:39
> 
>> [PATCH] ipv4: dont create routes on down devices
>> 
>> In ip_route_output_slow(), instead of allowing a route to be created on
>> a not UPed device, report -ENETUNREACH immediately.
>> 
>> # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
>> # (Note: tunl1 is down)
>> # ping -I tunl1 10.1.2.3
>> PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
>> (nothing)
>> # ./a.out tunl1
>> # ip tunnel del tunl1
>> Message from syslogd@shelby at Dec 22 10:12:08 ...
>>   kernel: unregister_netdevice: waiting for tunl1 to become free. Usage count = 3
>> 
>> After patch:
>> # ping -I tunl1 10.1.2.3
>> connect: Network is unreachable
>> 
>> 
>> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Cc: Octavian Purdila <opurdila@ixiacom.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Thanks Eric!
> 
> Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>

Applied, thanks everyone.


