All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
@ 2010-09-10 13:35 Nicolas Dichtel
  2010-09-10 14:24 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Dichtel @ 2010-09-10 13:35 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 346 bytes --]

Hi all,

We got a scalability problem when we try to remove a lot of virtual interfaces. 
After analysis, we found that a refcnt on a device was released too late.
Here is a proposal patch. If we are not missing something, the refcnt can be 
release before call_rcu(). In IPv6, this is already the case.

Comments are welcome.


Regards,
Nicolas

[-- Attachment #2: 0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch --]
[-- Type: text/x-diff, Size: 1830 bytes --]

>From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 10 Sep 2010 14:52:15 +0200
Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev

When a virtual device is removed, refcnt on dev is released
after rcu barrier, hence we fall always in the msleep(250)
of netdev_wait_allrefs(). This causes a long delay when
a lot of interfaces are removed.
Refcnt can be released before this rcu barrier, this allows
to accelerate the removing of virtual interfaces.

Test of removing 50 ipip tunnel interfaces:
 Before the patch:
  real    0m12.804s
  user    0m0.020s
  sys     0m0.000s

 After the patch:
  real    0m0.988s
  user    0m0.004s
  sys     0m0.016s

Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv4/devinet.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index da14c49..dd59e79 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -131,7 +131,9 @@ static inline void inet_free_ifa(struct in_ifaddr *ifa)
 
 void in_dev_finish_destroy(struct in_device *idev)
 {
+#ifdef NET_REFCNT_DEBUG
 	struct net_device *dev = idev->dev;
+#endif
 
 	WARN_ON(idev->ifa_list);
 	WARN_ON(idev->mc_list);
@@ -139,7 +141,6 @@ void in_dev_finish_destroy(struct in_device *idev)
 	printk(KERN_DEBUG "in_dev_finish_destroy: %p=%s\n",
 	       idev, dev ? dev->name : "NIL");
 #endif
-	dev_put(dev);
 	if (!idev->dead)
 		pr_err("Freeing alive in_device %p\n", idev);
 	else
@@ -215,6 +216,7 @@ static void inetdev_destroy(struct in_device *in_dev)
 	neigh_parms_release(&arp_tbl, in_dev->arp_parms);
 	arp_ifdown(dev);
 
+	dev_put(dev);
 	call_rcu(&in_dev->rcu_head, in_dev_rcu_put);
 }
 
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
  2010-09-10 13:35 [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev Nicolas Dichtel
@ 2010-09-10 14:24 ` Eric Dumazet
  2010-09-10 14:57   ` Nicolas Dichtel
  2010-09-13 22:24   ` [PATCH] net: use rcu_barrier() in rollback_registered_many Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-09-10 14:24 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev

Le vendredi 10 septembre 2010 à 15:35 +0200, Nicolas Dichtel a écrit :
> Hi all,
> 
> We got a scalability problem when we try to remove a lot of virtual interfaces. 
> After analysis, we found that a refcnt on a device was released too late.
> Here is a proposal patch. If we are not missing something, the refcnt can be 
> release before call_rcu(). In IPv6, this is already the case.
> 
> Comments are welcome.
> 
> 
> Regards,
> Nicolas
> pièce jointe différences entre fichiers
> (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
> From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Fri, 10 Sep 2010 14:52:15 +0200
> Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
> 
> When a virtual device is removed, refcnt on dev is released
> after rcu barrier, hence we fall always in the msleep(250)
> of netdev_wait_allrefs(). This causes a long delay when
> a lot of interfaces are removed.
> Refcnt can be released before this rcu barrier, this allows
> to accelerate the removing of virtual interfaces.
> 
> Test of removing 50 ipip tunnel interfaces:
>  Before the patch:
>   real    0m12.804s
>   user    0m0.020s
>   sys     0m0.000s
> 
>  After the patch:
>   real    0m0.988s
>   user    0m0.004s
>   sys     0m0.016s
> 
> Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---

This is a well known problem, (many patches were sent some months ago)
but your patch is not the right solution.

As long as the idev is not yet freed, it can be used and we need to
access idev->dev




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
  2010-09-10 14:24 ` Eric Dumazet
@ 2010-09-10 14:57   ` Nicolas Dichtel
  2010-09-10 15:16     ` Eric Dumazet
  2010-09-13 22:24   ` [PATCH] net: use rcu_barrier() in rollback_registered_many Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Dichtel @ 2010-09-10 14:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Le 10.09.2010 16:24, Eric Dumazet a écrit :
> Le vendredi 10 septembre 2010 à 15:35 +0200, Nicolas Dichtel a écrit :
>> Hi all,
>>
>> We got a scalability problem when we try to remove a lot of virtual interfaces. 
>> After analysis, we found that a refcnt on a device was released too late.
>> Here is a proposal patch. If we are not missing something, the refcnt can be 
>> release before call_rcu(). In IPv6, this is already the case.
>>
>> Comments are welcome.
>>
>>
>> Regards,
>> Nicolas
>> pièce jointe différences entre fichiers
>> (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
>> From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
>> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Date: Fri, 10 Sep 2010 14:52:15 +0200
>> Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
>>
>> When a virtual device is removed, refcnt on dev is released
>> after rcu barrier, hence we fall always in the msleep(250)
>> of netdev_wait_allrefs(). This causes a long delay when
>> a lot of interfaces are removed.
>> Refcnt can be released before this rcu barrier, this allows
>> to accelerate the removing of virtual interfaces.
>>
>> Test of removing 50 ipip tunnel interfaces:
>>  Before the patch:
>>   real    0m12.804s
>>   user    0m0.020s
>>   sys     0m0.000s
>>
>>  After the patch:
>>   real    0m0.988s
>>   user    0m0.004s
>>   sys     0m0.016s
>>
>> Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
> 
> This is a well known problem, (many patches were sent some months ago)
> but your patch is not the right solution.
> 
> As long as the idev is not yet freed, it can be used and we need to
> access idev->dev

Is this not true in IPv6? What is the difference?


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
  2010-09-10 14:57   ` Nicolas Dichtel
@ 2010-09-10 15:16     ` Eric Dumazet
  2010-09-14 20:45       ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-09-10 15:16 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev

Le vendredi 10 septembre 2010 à 16:57 +0200, Nicolas Dichtel a écrit :

> Is this not true in IPv6? What is the difference?

It might be a bug on ipv6, who knows ?

Releasing a reference count, but not setting idev->dev to NULL is 
a sign something is wrong...


diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5bc893e..26e39bc 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -330,6 +330,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev)
 	printk(KERN_DEBUG "in6_dev_finish_destroy: %s\n", dev ? dev->name : "NIL");
 #endif
 	dev_put(dev);
+	idev->dev = NULL;
 	if (!idev->dead) {
 		pr_warning("Freeing alive inet6 device %p\n", idev);
 		return;



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] net: use rcu_barrier() in rollback_registered_many
  2010-09-10 14:24 ` Eric Dumazet
  2010-09-10 14:57   ` Nicolas Dichtel
@ 2010-09-13 22:24   ` Eric Dumazet
  2010-09-14 21:27     ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-09-13 22:24 UTC (permalink / raw)
  To: nicolas.dichtel, David Miller; +Cc: netdev, Octavian Purdila, Benjamin LaHaise

Le vendredi 10 septembre 2010 à 16:24 +0200, Eric Dumazet a écrit : 
> Le vendredi 10 septembre 2010 à 15:35 +0200, Nicolas Dichtel a écrit :
> > Hi all,
> > 
> > We got a scalability problem when we try to remove a lot of virtual interfaces. 
> > After analysis, we found that a refcnt on a device was released too late.
> > Here is a proposal patch. If we are not missing something, the refcnt can be 
> > release before call_rcu(). In IPv6, this is already the case.
> > 
> > Comments are welcome.
> > 
> > 
> > Regards,
> > Nicolas
> > pièce jointe différences entre fichiers
> > (0001-ipv4-release-dev-refcnt-early-when-destroying-inetd.patch)
> > From 6fe291ff56b1f94599dfaa57dfb0ed4c168b603f Mon Sep 17 00:00:00 2001
> > From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > Date: Fri, 10 Sep 2010 14:52:15 +0200
> > Subject: [PATCH] ipv4: release dev refcnt early when destroying inetdev
> > 
> > When a virtual device is removed, refcnt on dev is released
> > after rcu barrier, hence we fall always in the msleep(250)
> > of netdev_wait_allrefs(). This causes a long delay when
> > a lot of interfaces are removed.
> > Refcnt can be released before this rcu barrier, this allows
> > to accelerate the removing of virtual interfaces.
> > 
> > Test of removing 50 ipip tunnel interfaces:
> >  Before the patch:
> >   real    0m12.804s
> >   user    0m0.020s
> >   sys     0m0.000s
> > 
> >  After the patch:
> >   real    0m0.988s
> >   user    0m0.004s
> >   sys     0m0.016s
> > 
> > Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> > ---
> 
> This is a well known problem, (many patches were sent some months ago)
> but your patch is not the right solution.
> 
> As long as the idev is not yet freed, it can be used and we need to
> access idev->dev
> 
> 

I believe I understood one problem.

In rollback_registered_many(), we call the inetdev_event() (and
inetdev_destroy() at line 4844 :

call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

Then, we call synchronize_net() at line 4870

So by the time netdev_wait_allrefs() is called, we should have called
in_dev_finish_destroy() 

But using synchronize_net() is a bit wrong here : 

	"It waits until all pre-existing rcu readers have completed."

We have no guarantee all call_rcu() that we posted to dismantle the
device completed :

- If number of online cpus is 1, synchronize_net() is a no op
- If our thread migrates to another cpu, synchronize_net() can returns
  while old callbacks are not yet processed.

We should probably use rcu_barrier() instead, to wait for all
outstanding RCU callbacks to complete.

I also believe the order of netdevice notifiers is wrong (we dont set
priority), and that we should call fib_netdev_event() _before_
dst_dev_event(). This needs another patch.

Thanks

[PATCH] net: use rcu_barrier() in rollback_registered_many

netdev_wait_allrefs() waits that all references to a device vanishes.

It currently uses a _very_ pessimistic 250 ms delay between each probe.
Some users reported that no more than 4 devices can be dismantled per
second, this is a pretty serious problem for some setups.

Most of the time, a refcount is about to be released by an RCU callback,
that is still in flight because rollback_registered_many() uses a
synchronize_rcu() call instead of rcu_barrier(). Problem is visible if
number of online cpus is one, because synchronize_rcu() is then a no op.

time to remove 50 ipip tunnels on a UP machine :

before patch : real 11.910s
after patch : real 1.250s

Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fc2dc93..6de5a82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4867,7 +4867,7 @@ static void rollback_registered_many(struct list_head *head)
 	dev = list_first_entry(head, struct net_device, unreg_list);
 	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
-	synchronize_net();
+	rcu_barrier();
 
 	list_for_each_entry(dev, head, unreg_list)
 		dev_put(dev);



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
  2010-09-10 15:16     ` Eric Dumazet
@ 2010-09-14 20:45       ` David Miller
  2010-09-15  6:01         ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2010-09-14 20:45 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nicolas.dichtel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 10 Sep 2010 17:16:14 +0200

> Le vendredi 10 septembre 2010 à 16:57 +0200, Nicolas Dichtel a écrit :
> 
>> Is this not true in IPv6? What is the difference?
> 
> It might be a bug on ipv6, who knows ?
> 
> Releasing a reference count, but not setting idev->dev to NULL is 
> a sign something is wrong...

If anything this is more of a BUG trap than a true correctness patch,
but either way if you want me to apply this please formally submit
this with a proper commit message and signoff, thanks!

> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 5bc893e..26e39bc 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -330,6 +330,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev)
>  	printk(KERN_DEBUG "in6_dev_finish_destroy: %s\n", dev ? dev->name : "NIL");
>  #endif
>  	dev_put(dev);
> +	idev->dev = NULL;
>  	if (!idev->dead) {
>  		pr_warning("Freeing alive inet6 device %p\n", idev);
>  		return;
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] net: use rcu_barrier() in rollback_registered_many
  2010-09-13 22:24   ` [PATCH] net: use rcu_barrier() in rollback_registered_many Eric Dumazet
@ 2010-09-14 21:27     ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2010-09-14 21:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: nicolas.dichtel, netdev, opurdila, bcrl

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 14 Sep 2010 00:24:54 +0200

> [PATCH] net: use rcu_barrier() in rollback_registered_many
> 
> netdev_wait_allrefs() waits that all references to a device vanishes.
> 
> It currently uses a _very_ pessimistic 250 ms delay between each probe.
> Some users reported that no more than 4 devices can be dismantled per
> second, this is a pretty serious problem for some setups.
> 
> Most of the time, a refcount is about to be released by an RCU callback,
> that is still in flight because rollback_registered_many() uses a
> synchronize_rcu() call instead of rcu_barrier(). Problem is visible if
> number of online cpus is one, because synchronize_rcu() is then a no op.
> 
> time to remove 50 ipip tunnels on a UP machine :
> 
> before patch : real 11.910s
> after patch : real 1.250s
> 
> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Reported-by: Octavian Purdila <opurdila@ixiacom.com>
> Reported-by: Benjamin LaHaise <bcrl@kvack.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev
  2010-09-14 20:45       ` David Miller
@ 2010-09-15  6:01         ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-09-15  6:01 UTC (permalink / raw)
  To: David Miller; +Cc: nicolas.dichtel, netdev

Le mardi 14 septembre 2010 à 13:45 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 10 Sep 2010 17:16:14 +0200
> 
> > Le vendredi 10 septembre 2010 à 16:57 +0200, Nicolas Dichtel a écrit :
> > 
> >> Is this not true in IPv6? What is the difference?
> > 
> > It might be a bug on ipv6, who knows ?
> > 
> > Releasing a reference count, but not setting idev->dev to NULL is 
> > a sign something is wrong...
> 
> If anything this is more of a BUG trap than a true correctness patch,
> but either way if you want me to apply this please formally submit
> this with a proper commit message and signoff, thanks!

This was not a patch for inclusion, this was to show my point.
I suspect this will trigger NULL dereference at some points later...

I do think a correct patch would be to change ipv6 to mimic ipv4 way of
course. I'll take a look at this later.

Thanks



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-09-15  6:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-10 13:35 [RFC PATCH] ipv4: release dev refcnt early when destroying inetdev Nicolas Dichtel
2010-09-10 14:24 ` Eric Dumazet
2010-09-10 14:57   ` Nicolas Dichtel
2010-09-10 15:16     ` Eric Dumazet
2010-09-14 20:45       ` David Miller
2010-09-15  6:01         ` Eric Dumazet
2010-09-13 22:24   ` [PATCH] net: use rcu_barrier() in rollback_registered_many Eric Dumazet
2010-09-14 21:27     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.