All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ
@ 2022-02-09  8:19 Xin Long
  2022-02-09  8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Xin Long @ 2022-02-09  8:19 UTC (permalink / raw)
  To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan

This issue can be simply reproduced by:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name dummy0.1 type vlan id 1
  # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
  # rmmod 8021q

 unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1

So as to fix it, adjust vlan_dev_uninit() in Patch 1/1 so that it won't
be called twice for the same device, then do the fix in vlan_dev_uninit()
in Patch 2/2.

Xin Long (2):
  vlan: introduce vlan_dev_free_egress_priority
  vlan: move dev_put into vlan_dev_uninit

 net/8021q/vlan.h         |  2 +-
 net/8021q/vlan_dev.c     | 15 +++++++++++----
 net/8021q/vlan_netlink.c |  7 ++++---
 3 files changed, 16 insertions(+), 8 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority
  2022-02-09  8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
@ 2022-02-09  8:19 ` Xin Long
  2022-02-09  8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
  2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
  2 siblings, 0 replies; 12+ messages in thread
From: Xin Long @ 2022-02-09  8:19 UTC (permalink / raw)
  To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan

This patch is to introduce vlan_dev_free_egress_priority() to
free egress priority for vlan dev, and keep vlan_dev_uninit()
static as .ndo_uninit. It makes the code more clear and safer
when adding new code in vlan_dev_uninit() in the future.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/8021q/vlan.h         | 2 +-
 net/8021q/vlan_dev.c     | 7 ++++++-
 net/8021q/vlan_netlink.c | 7 ++++---
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 1a705a4ef7fa..5eaf38875554 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -129,6 +129,7 @@ void vlan_dev_set_ingress_priority(const struct net_device *dev,
 				   u32 skb_prio, u16 vlan_prio);
 int vlan_dev_set_egress_priority(const struct net_device *dev,
 				 u32 skb_prio, u16 vlan_prio);
+void vlan_dev_free_egress_priority(const struct net_device *dev);
 int vlan_dev_change_flags(const struct net_device *dev, u32 flag, u32 mask);
 void vlan_dev_get_realdev_name(const struct net_device *dev, char *result,
 			       size_t size);
@@ -139,7 +140,6 @@ int vlan_check_real_dev(struct net_device *real_dev,
 void vlan_setup(struct net_device *dev);
 int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack);
 void unregister_vlan_dev(struct net_device *dev, struct list_head *head);
-void vlan_dev_uninit(struct net_device *dev);
 bool vlan_dev_inherit_address(struct net_device *dev,
 			      struct net_device *real_dev);
 
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 26d031a43cc1..e5d23e75572a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -622,7 +622,7 @@ static int vlan_dev_init(struct net_device *dev)
 }
 
 /* Note: this function might be called multiple times for the same device. */
-void vlan_dev_uninit(struct net_device *dev)
+void vlan_dev_free_egress_priority(const struct net_device *dev)
 {
 	struct vlan_priority_tci_mapping *pm;
 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
@@ -636,6 +636,11 @@ void vlan_dev_uninit(struct net_device *dev)
 	}
 }
 
+static void vlan_dev_uninit(struct net_device *dev)
+{
+	vlan_dev_free_egress_priority(dev);
+}
+
 static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
 	netdev_features_t features)
 {
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 0db85aeb119b..53b1955b027f 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -183,10 +183,11 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev,
 		return -EINVAL;
 
 	err = vlan_changelink(dev, tb, data, extack);
-	if (!err)
-		err = register_vlan_dev(dev, extack);
 	if (err)
-		vlan_dev_uninit(dev);
+		return err;
+	err = register_vlan_dev(dev, extack);
+	if (err)
+		vlan_dev_free_egress_priority(dev);
 	return err;
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-09  8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
  2022-02-09  8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
@ 2022-02-09  8:19 ` Xin Long
  2022-02-10  1:55   ` Jakub Kicinski
  2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
  2 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-09  8:19 UTC (permalink / raw)
  To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan

Shuang Li reported an QinQ issue by simply doing:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name dummy0.1 type vlan id 1
  # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
  # rmmod 8021q

 unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1

When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().

When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
vlan_dev_free().

This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
netdev_wait_allrefs().

Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/8021q/vlan_dev.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e5d23e75572a..d1902828a18a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -638,7 +638,12 @@ void vlan_dev_free_egress_priority(const struct net_device *dev)
 
 static void vlan_dev_uninit(struct net_device *dev)
 {
+	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+
 	vlan_dev_free_egress_priority(dev);
+
+	/* Get rid of the vlan's reference to real_dev */
+	dev_put_track(vlan->real_dev, &vlan->dev_tracker);
 }
 
 static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
@@ -851,9 +856,6 @@ static void vlan_dev_free(struct net_device *dev)
 
 	free_percpu(vlan->vlan_pcpu_stats);
 	vlan->vlan_pcpu_stats = NULL;
-
-	/* Get rid of the vlan's reference to real_dev */
-	dev_put_track(vlan->real_dev, &vlan->dev_tracker);
 }
 
 void vlan_setup(struct net_device *dev)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ
  2022-02-09  8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
  2022-02-09  8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
  2022-02-09  8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
@ 2022-02-09 13:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-09 13:40 UTC (permalink / raw)
  To: Xin Long; +Cc: netdev, davem, kuba, edumazet, william.xuanziyang

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed,  9 Feb 2022 03:19:54 -0500 you wrote:
> This issue can be simply reproduced by:
> 
>   # ip link add dummy0 type dummy
>   # ip link add link dummy0 name dummy0.1 type vlan id 1
>   # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
>   # rmmod 8021q
> 
> [...]

Here is the summary with links:
  - [net,1/2] vlan: introduce vlan_dev_free_egress_priority
    https://git.kernel.org/netdev/net/c/37aa50c539bc
  - [net,2/2] vlan: move dev_put into vlan_dev_uninit
    https://git.kernel.org/netdev/net/c/d6ff94afd90b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-09  8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
@ 2022-02-10  1:55   ` Jakub Kicinski
  2022-02-10  3:40     ` Xin Long
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10  1:55 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan

On Wed,  9 Feb 2022 03:19:56 -0500 Xin Long wrote:
> Shuang Li reported an QinQ issue by simply doing:
> 
>   # ip link add dummy0 type dummy
>   # ip link add link dummy0 name dummy0.1 type vlan id 1
>   # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
>   # rmmod 8021q
> 
>  unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1

How about we put this in a selftest under tools/testing/selftests/net/
or tools/testing/selftests/drivers/net/ ?

> When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
> and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
> before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
> 
> When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
> NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
> due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
> netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
> release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
> vlan_dev_free().
> 
> This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
> vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
> vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
> netdev_wait_allrefs().
> 
> Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")

As far as I understand this is pretty much a revert of the previous fix.
Note that netdevice_event_work_handler() as seen in the backtrace in the
commit message of the fix in question is called from a workqueue, so the
ordering of netdev notifications saves us from nothing here. We can't
start freeing state until all refs are gone.

I think better fix would be to rewrite netdev_run_todo() to free the
netdevs in any order they become ready. That's gonna solve any
dependency problems and may even speed things up.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  1:55   ` Jakub Kicinski
@ 2022-02-10  3:40     ` Xin Long
  2022-02-10  5:28       ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-10  3:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan

On Thu, Feb 10, 2022 at 9:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed,  9 Feb 2022 03:19:56 -0500 Xin Long wrote:
> > Shuang Li reported an QinQ issue by simply doing:
> >
> >   # ip link add dummy0 type dummy
> >   # ip link add link dummy0 name dummy0.1 type vlan id 1
> >   # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
> >   # rmmod 8021q
> >
> >  unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
>
> How about we put this in a selftest under tools/testing/selftests/net/
> or tools/testing/selftests/drivers/net/ ?
I will try.

>
> > When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
> > and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
> > before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
> >
> > When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
> > NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
> > due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
> > netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
> > release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
> > vlan_dev_free().
> >
> > This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
> > vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
> > vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
> > netdev_wait_allrefs().
> >
> > Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
>
> As far as I understand this is pretty much a revert of the previous fix.
> Note that netdevice_event_work_handler() as seen in the backtrace in the
> commit message of the fix in question is called from a workqueue, so the
> ordering of netdev notifications saves us from nothing here. We can't
> start freeing state until all refs are gone.
understand.

>
> I think better fix would be to rewrite netdev_run_todo() to free the
> netdevs in any order they become ready. That's gonna solve any
> dependency problems and may even speed things up.
>
What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
vlan as before, and fix this problem by using for_each_netdev_reverse()
in __rtnl_kill_links()?
It will make sense as the late added dev should be deleted early when
rtnl_link_unregister a rtnl_link_ops.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  3:40     ` Xin Long
@ 2022-02-10  5:28       ` Jakub Kicinski
  2022-02-10  5:49         ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10  5:28 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan

On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote:
> > I think better fix would be to rewrite netdev_run_todo() to free the
> > netdevs in any order they become ready. That's gonna solve any
> > dependency problems and may even speed things up.
>
> What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
> vlan as before, and fix this problem by using for_each_netdev_reverse()
> in __rtnl_kill_links()?
> It will make sense as the late added dev should be deleted early when
> rtnl_link_unregister a rtnl_link_ops.

Feels like sooner or later we'll run into a scenario when reversing will
cause a problem. Or some data structure will stop preserving the order.

Do you reckon rewriting netdev_run_todo() will be a lot of effort or
it's too risky?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  5:28       ` Jakub Kicinski
@ 2022-02-10  5:49         ` Eric Dumazet
  2022-02-10  5:59           ` Jakub Kicinski
  2022-02-10  6:10           ` Eric Dumazet
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-02-10  5:49 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan

On Wed, Feb 9, 2022 at 9:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote:
> > > I think better fix would be to rewrite netdev_run_todo() to free the
> > > netdevs in any order they become ready. That's gonna solve any
> > > dependency problems and may even speed things up.
> >
> > What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
> > vlan as before, and fix this problem by using for_each_netdev_reverse()
> > in __rtnl_kill_links()?
> > It will make sense as the late added dev should be deleted early when
> > rtnl_link_unregister a rtnl_link_ops.
>
> Feels like sooner or later we'll run into a scenario when reversing will
> cause a problem. Or some data structure will stop preserving the order.
>
> Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> it's too risky?

This is doable, and risky ;)

BTW, I have the plan of generalizing blackhole_netdev for IPv6,
meaning that we could perhaps get rid of the dependency
about loopback dev, being the last device in a netns being dismantled.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  5:49         ` Eric Dumazet
@ 2022-02-10  5:59           ` Jakub Kicinski
  2022-02-11  7:58             ` Xin Long
  2022-02-10  6:10           ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10  5:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Xin Long, network dev, davem, Ziyang Xuan

On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote:
> > Feels like sooner or later we'll run into a scenario when reversing will
> > cause a problem. Or some data structure will stop preserving the order.
> >
> > Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> > it's too risky?  
> 
> This is doable, and risky ;)
> 
> BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> meaning that we could perhaps get rid of the dependency
> about loopback dev, being the last device in a netns being dismantled.

Oh, I see..

I have no great ideas then, we may need to go back to zeroing
vlan->real_dev and making sure the caller can deal with that.
At least for the time being. Xin this was discussed at some 
length in response to the patch under Fixes.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  5:49         ` Eric Dumazet
  2022-02-10  5:59           ` Jakub Kicinski
@ 2022-02-10  6:10           ` Eric Dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-02-10  6:10 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan

On Wed, Feb 9, 2022 at 9:49 PM Eric Dumazet <edumazet@google.com> wrote:

>
> BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> meaning that we could perhaps get rid of the dependency
> about loopback dev, being the last device in a netns being dismantled.

Untested patch to give more ideas:
(Main incentive is that we are still chasing a unregister_device bug,
that I now think is related to IPv6 idev leaking somewhere)


diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4f402bc38f056e08f3761e63a7bc7a51e54e9384..02d31d4fcab3b3d529c4fe3260216ecee1108e82
100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -372,7 +372,7 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)

        ASSERT_RTNL();

-       if (dev->mtu < IPV6_MIN_MTU)
+       if (dev->mtu < IPV6_MIN_MTU && dev != blackhole_netdev)
                return ERR_PTR(-EINVAL);

        ndev = kzalloc(sizeof(struct inet6_dev), GFP_KERNEL);
@@ -400,21 +400,22 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)
        /* We refer to the device */
        dev_hold_track(dev, &ndev->dev_tracker, GFP_KERNEL);

-       if (snmp6_alloc_dev(ndev) < 0) {
-               netdev_dbg(dev, "%s: cannot allocate memory for statistics\n",
-                          __func__);
-               neigh_parms_release(&nd_tbl, ndev->nd_parms);
-               dev_put_track(dev, &ndev->dev_tracker);
-               kfree(ndev);
-               return ERR_PTR(err);
-       }
+       if (dev != blackhole_netdev) {
+               if (snmp6_alloc_dev(ndev) < 0) {
+                       netdev_dbg(dev, "%s: cannot allocate memory
for statistics\n",
+                                  __func__);
+                       neigh_parms_release(&nd_tbl, ndev->nd_parms);
+                       dev_put_track(dev, &ndev->dev_tracker);
+                       kfree(ndev);
+                       return ERR_PTR(err);
+               }

-       if (snmp6_register_dev(ndev) < 0) {
-               netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n",
-                          __func__, dev->name);
-               goto err_release;
+               if (snmp6_register_dev(ndev) < 0) {
+                       netdev_dbg(dev, "%s: cannot create
/proc/net/dev_snmp6/%s\n",
+                                  __func__, dev->name);
+                       goto err_release;
+               }
        }
-
        /* One reference from device. */
        refcount_set(&ndev->refcnt, 1);

@@ -445,25 +446,28 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)

        ipv6_mc_init_dev(ndev);
        ndev->tstamp = jiffies;
-       err = addrconf_sysctl_register(ndev);
-       if (err) {
-               ipv6_mc_destroy_dev(ndev);
-               snmp6_unregister_dev(ndev);
-               goto err_release;
+       if (dev != blackhole_netdev) {
+               err = addrconf_sysctl_register(ndev);
+               if (err) {
+                       ipv6_mc_destroy_dev(ndev);
+                       snmp6_unregister_dev(ndev);
+                       goto err_release;
+               }
        }
        /* protected by rtnl_lock */
        rcu_assign_pointer(dev->ip6_ptr, ndev);

-       /* Join interface-local all-node multicast group */
-       ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);
+       if (dev != blackhole_netdev) {
+               /* Join interface-local all-node multicast group */
+               ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);

-       /* Join all-node multicast group */
-       ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
-
-       /* Join all-router multicast group if forwarding is set */
-       if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
-               ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
+               /* Join all-node multicast group */
+               ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);

+               /* Join all-router multicast group if forwarding is set */
+               if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
+                       ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
+       }
        return ndev;

 err_release:
@@ -7233,26 +7237,8 @@ int __init addrconf_init(void)
                goto out_nowq;
        }

-       /* The addrconf netdev notifier requires that loopback_dev
-        * has it's ipv6 private information allocated and setup
-        * before it can bring up and give link-local addresses
-        * to other devices which are up.
-        *
-        * Unfortunately, loopback_dev is not necessarily the first
-        * entry in the global dev_base list of net devices.  In fact,
-        * it is likely to be the very last entry on that list.
-        * So this causes the notifier registry below to try and
-        * give link-local addresses to all devices besides loopback_dev
-        * first, then loopback_dev, which cases all the non-loopback_dev
-        * devices to fail to get a link-local address.
-        *
-        * So, as a temporary fix, allocate the ipv6 structure for
-        * loopback_dev first by hand.
-        * Longer term, all of the dependencies ipv6 has upon the loopback
-        * device and it being up should be removed.
-        */
        rtnl_lock();
-       idev = ipv6_add_dev(init_net.loopback_dev);
+       idev = ipv6_add_dev(blackhole_netdev);
        rtnl_unlock();
        if (IS_ERR(idev)) {
                err = PTR_ERR(idev);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4884cda13b92e72d041680cbabfee2e07ec0f10..dc9d26a77c48f649ec39084c539d45b378474a35
100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -160,12 +160,8 @@ void rt6_uncached_list_del(struct rt6_info *rt)

 static void rt6_uncached_list_flush_dev(struct net *net, struct
net_device *dev)
 {
-       struct net_device *loopback_dev = net->loopback_dev;
        int cpu;

-       if (dev == loopback_dev)
-               return;
-
        for_each_possible_cpu(cpu) {
                struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
                struct rt6_info *rt;
@@ -176,7 +172,7 @@ static void rt6_uncached_list_flush_dev(struct net
*net, struct net_device *dev)
                        struct net_device *rt_dev = rt->dst.dev;

                        if (rt_idev->dev == dev) {
-                               rt->rt6i_idev = in6_dev_get(loopback_dev);
+                               rt->rt6i_idev = in6_dev_get(blackhole_netdev);
                                in6_dev_put(rt_idev);
                        }

@@ -373,13 +369,12 @@ static void ip6_dst_ifdown(struct dst_entry
*dst, struct net_device *dev,
 {
        struct rt6_info *rt = (struct rt6_info *)dst;
        struct inet6_dev *idev = rt->rt6i_idev;
-       struct net_device *loopback_dev =
-               dev_net(dev)->loopback_dev;

-       if (idev && idev->dev != loopback_dev) {
-               struct inet6_dev *loopback_idev = in6_dev_get(loopback_dev);
-               if (loopback_idev) {
-                       rt->rt6i_idev = loopback_idev;
+       if (idev && idev->dev != blackhole_netdev) {
+               struct inet6_dev *blackhole_idev =
in6_dev_get(blackhole_netdev);
+
+               if (blackhole_idev) {
+                       rt->rt6i_idev = blackhole_idev;
                        in6_dev_put(idev);
                }
        }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-10  5:59           ` Jakub Kicinski
@ 2022-02-11  7:58             ` Xin Long
  2022-02-11 22:06               ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-11  7:58 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan

On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote:
> > > Feels like sooner or later we'll run into a scenario when reversing will
> > > cause a problem. Or some data structure will stop preserving the order.
> > >
> > > Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> > > it's too risky?
> >
> > This is doable, and risky ;)
> >
> > BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> > meaning that we could perhaps get rid of the dependency
> > about loopback dev, being the last device in a netns being dismantled.
>
> Oh, I see..
>
> I have no great ideas then, we may need to go back to zeroing
> vlan->real_dev and making sure the caller can deal with that.
> At least for the time being. Xin this was discussed at some
> length in response to the patch under Fixes.
Hi, Jakub,

What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev()
is called? This issue can still be triggered, right? I don't see any lock
protecting this.

> Feels like sooner or later we'll run into a scenario when reversing will
> cause a problem. Or some data structure will stop preserving the order.
I was checking a few places doing such batch devices freeing, and noticed that:
In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(),
while in default_device_exit_batch(), it's using for_each_netdev_reverse().

shouldn't be in the same order all these places? If yes, which one is the
right one to use?

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
  2022-02-11  7:58             ` Xin Long
@ 2022-02-11 22:06               ` Jakub Kicinski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-11 22:06 UTC (permalink / raw)
  To: Xin Long; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan

On Fri, 11 Feb 2022 15:58:56 +0800 Xin Long wrote:
> On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > This is doable, and risky ;)
> > >
> > > BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> > > meaning that we could perhaps get rid of the dependency
> > > about loopback dev, being the last device in a netns being dismantled.  
> >
> > Oh, I see..
> >
> > I have no great ideas then, we may need to go back to zeroing
> > vlan->real_dev and making sure the caller can deal with that.
> > At least for the time being. Xin this was discussed at some
> > length in response to the patch under Fixes.  
> 
> What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev()
> is called? This issue can still be triggered, right? I don't see any lock
> protecting this.

Maybe the suggestion in the old thread was to NULL the pointer out
before unregister is called. Which seems like a bad idea, as the 
netdev would already be impaired when unregister is called.

> > Feels like sooner or later we'll run into a scenario when reversing will
> > cause a problem. Or some data structure will stop preserving the order.  
> I was checking a few places doing such batch devices freeing, and noticed that:
> In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(),
> while in default_device_exit_batch(), it's using for_each_netdev_reverse().
> 
> shouldn't be in the same order all these places? If yes, which one is the
> right one to use?

I don't know. Maybe this will work maybe it will cause a circular
dependency with something else.

Honestly, I don't have a simple solutions to offer. Jann Horn pointed
out recently that our per-CPU netdev refs are themselves racy...

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-02-11 22:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09  8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
2022-02-09  8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
2022-02-09  8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
2022-02-10  1:55   ` Jakub Kicinski
2022-02-10  3:40     ` Xin Long
2022-02-10  5:28       ` Jakub Kicinski
2022-02-10  5:49         ` Eric Dumazet
2022-02-10  5:59           ` Jakub Kicinski
2022-02-11  7:58             ` Xin Long
2022-02-11 22:06               ` Jakub Kicinski
2022-02-10  6:10           ` Eric Dumazet
2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.