* [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ @ 2022-02-09 8:19 Xin Long 2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw) To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan This issue can be simply reproduced by: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 So as to fix it, adjust vlan_dev_uninit() in Patch 1/1 so that it won't be called twice for the same device, then do the fix in vlan_dev_uninit() in Patch 2/2. Xin Long (2): vlan: introduce vlan_dev_free_egress_priority vlan: move dev_put into vlan_dev_uninit net/8021q/vlan.h | 2 +- net/8021q/vlan_dev.c | 15 +++++++++++---- net/8021q/vlan_netlink.c | 7 ++++--- 3 files changed, 16 insertions(+), 8 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority 2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long @ 2022-02-09 8:19 ` Xin Long 2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long 2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf 2 siblings, 0 replies; 12+ messages in thread From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw) To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan This patch is to introduce vlan_dev_free_egress_priority() to free egress priority for vlan dev, and keep vlan_dev_uninit() static as .ndo_uninit. It makes the code more clear and safer when adding new code in vlan_dev_uninit() in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/8021q/vlan.h | 2 +- net/8021q/vlan_dev.c | 7 ++++++- net/8021q/vlan_netlink.c | 7 ++++--- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index 1a705a4ef7fa..5eaf38875554 100644 --- a/net/8021q/vlan.h +++ b/net/8021q/vlan.h @@ -129,6 +129,7 @@ void vlan_dev_set_ingress_priority(const struct net_device *dev, u32 skb_prio, u16 vlan_prio); int vlan_dev_set_egress_priority(const struct net_device *dev, u32 skb_prio, u16 vlan_prio); +void vlan_dev_free_egress_priority(const struct net_device *dev); int vlan_dev_change_flags(const struct net_device *dev, u32 flag, u32 mask); void vlan_dev_get_realdev_name(const struct net_device *dev, char *result, size_t size); @@ -139,7 +140,6 @@ int vlan_check_real_dev(struct net_device *real_dev, void vlan_setup(struct net_device *dev); int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack); void unregister_vlan_dev(struct net_device *dev, struct list_head *head); -void vlan_dev_uninit(struct net_device *dev); bool vlan_dev_inherit_address(struct net_device *dev, struct net_device *real_dev); diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 26d031a43cc1..e5d23e75572a 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -622,7 +622,7 @@ static int vlan_dev_init(struct net_device *dev) } /* Note: this function might be called multiple times for the same device. */ -void vlan_dev_uninit(struct net_device *dev) +void vlan_dev_free_egress_priority(const struct net_device *dev) { struct vlan_priority_tci_mapping *pm; struct vlan_dev_priv *vlan = vlan_dev_priv(dev); @@ -636,6 +636,11 @@ void vlan_dev_uninit(struct net_device *dev) } } +static void vlan_dev_uninit(struct net_device *dev) +{ + vlan_dev_free_egress_priority(dev); +} + static netdev_features_t vlan_dev_fix_features(struct net_device *dev, netdev_features_t features) { diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index 0db85aeb119b..53b1955b027f 100644 --- a/net/8021q/vlan_netlink.c +++ b/net/8021q/vlan_netlink.c @@ -183,10 +183,11 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev, return -EINVAL; err = vlan_changelink(dev, tb, data, extack); - if (!err) - err = register_vlan_dev(dev, extack); if (err) - vlan_dev_uninit(dev); + return err; + err = register_vlan_dev(dev, extack); + if (err) + vlan_dev_free_egress_priority(dev); return err; } -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long 2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long @ 2022-02-09 8:19 ` Xin Long 2022-02-10 1:55 ` Jakub Kicinski 2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf 2 siblings, 1 reply; 12+ messages in thread From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw) To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan Shuang Li reported an QinQ issue by simply doing: # ip link add dummy0 type dummy # ip link add link dummy0 name dummy0.1 type vlan id 1 # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 # rmmod 8021q unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and release dummy0.1, as it delays dev_put until calling dev->priv_destructor, vlan_dev_free(). This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before netdev_wait_allrefs(). Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> --- net/8021q/vlan_dev.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index e5d23e75572a..d1902828a18a 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -638,7 +638,12 @@ void vlan_dev_free_egress_priority(const struct net_device *dev) static void vlan_dev_uninit(struct net_device *dev) { + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + vlan_dev_free_egress_priority(dev); + + /* Get rid of the vlan's reference to real_dev */ + dev_put_track(vlan->real_dev, &vlan->dev_tracker); } static netdev_features_t vlan_dev_fix_features(struct net_device *dev, @@ -851,9 +856,6 @@ static void vlan_dev_free(struct net_device *dev) free_percpu(vlan->vlan_pcpu_stats); vlan->vlan_pcpu_stats = NULL; - - /* Get rid of the vlan's reference to real_dev */ - dev_put_track(vlan->real_dev, &vlan->dev_tracker); } void vlan_setup(struct net_device *dev) -- 2.27.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long @ 2022-02-10 1:55 ` Jakub Kicinski 2022-02-10 3:40 ` Xin Long 0 siblings, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2022-02-10 1:55 UTC (permalink / raw) To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan On Wed, 9 Feb 2022 03:19:56 -0500 Xin Long wrote: > Shuang Li reported an QinQ issue by simply doing: > > # ip link add dummy0 type dummy > # ip link add link dummy0 name dummy0.1 type vlan id 1 > # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 > # rmmod 8021q > > unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 How about we put this in a selftest under tools/testing/selftests/net/ or tools/testing/selftests/drivers/net/ ? > When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp > and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered > before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). > > When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of > NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, > due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in > netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and > release dummy0.1, as it delays dev_put until calling dev->priv_destructor, > vlan_dev_free(). > > This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in > vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into > vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before > netdev_wait_allrefs(). > > Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") As far as I understand this is pretty much a revert of the previous fix. Note that netdevice_event_work_handler() as seen in the backtrace in the commit message of the fix in question is called from a workqueue, so the ordering of netdev notifications saves us from nothing here. We can't start freeing state until all refs are gone. I think better fix would be to rewrite netdev_run_todo() to free the netdevs in any order they become ready. That's gonna solve any dependency problems and may even speed things up. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 1:55 ` Jakub Kicinski @ 2022-02-10 3:40 ` Xin Long 2022-02-10 5:28 ` Jakub Kicinski 0 siblings, 1 reply; 12+ messages in thread From: Xin Long @ 2022-02-10 3:40 UTC (permalink / raw) To: Jakub Kicinski; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan On Thu, Feb 10, 2022 at 9:56 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 9 Feb 2022 03:19:56 -0500 Xin Long wrote: > > Shuang Li reported an QinQ issue by simply doing: > > > > # ip link add dummy0 type dummy > > # ip link add link dummy0 name dummy0.1 type vlan id 1 > > # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 > > # rmmod 8021q > > > > unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1 > > How about we put this in a selftest under tools/testing/selftests/net/ > or tools/testing/selftests/drivers/net/ ? I will try. > > > When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp > > and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered > > before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links(). > > > > When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of > > NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However, > > due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in > > netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and > > release dummy0.1, as it delays dev_put until calling dev->priv_destructor, > > vlan_dev_free(). > > > > This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in > > vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into > > vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before > > netdev_wait_allrefs(). > > > > Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()") > > As far as I understand this is pretty much a revert of the previous fix. > Note that netdevice_event_work_handler() as seen in the backtrace in the > commit message of the fix in question is called from a workqueue, so the > ordering of netdev notifications saves us from nothing here. We can't > start freeing state until all refs are gone. understand. > > I think better fix would be to rewrite netdev_run_todo() to free the > netdevs in any order they become ready. That's gonna solve any > dependency problems and may even speed things up. > What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for vlan as before, and fix this problem by using for_each_netdev_reverse() in __rtnl_kill_links()? It will make sense as the late added dev should be deleted early when rtnl_link_unregister a rtnl_link_ops. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 3:40 ` Xin Long @ 2022-02-10 5:28 ` Jakub Kicinski 2022-02-10 5:49 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2022-02-10 5:28 UTC (permalink / raw) To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote: > > I think better fix would be to rewrite netdev_run_todo() to free the > > netdevs in any order they become ready. That's gonna solve any > > dependency problems and may even speed things up. > > What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for > vlan as before, and fix this problem by using for_each_netdev_reverse() > in __rtnl_kill_links()? > It will make sense as the late added dev should be deleted early when > rtnl_link_unregister a rtnl_link_ops. Feels like sooner or later we'll run into a scenario when reversing will cause a problem. Or some data structure will stop preserving the order. Do you reckon rewriting netdev_run_todo() will be a lot of effort or it's too risky? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 5:28 ` Jakub Kicinski @ 2022-02-10 5:49 ` Eric Dumazet 2022-02-10 5:59 ` Jakub Kicinski 2022-02-10 6:10 ` Eric Dumazet 0 siblings, 2 replies; 12+ messages in thread From: Eric Dumazet @ 2022-02-10 5:49 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan On Wed, Feb 9, 2022 at 9:28 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote: > > > I think better fix would be to rewrite netdev_run_todo() to free the > > > netdevs in any order they become ready. That's gonna solve any > > > dependency problems and may even speed things up. > > > > What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for > > vlan as before, and fix this problem by using for_each_netdev_reverse() > > in __rtnl_kill_links()? > > It will make sense as the late added dev should be deleted early when > > rtnl_link_unregister a rtnl_link_ops. > > Feels like sooner or later we'll run into a scenario when reversing will > cause a problem. Or some data structure will stop preserving the order. > > Do you reckon rewriting netdev_run_todo() will be a lot of effort or > it's too risky? This is doable, and risky ;) BTW, I have the plan of generalizing blackhole_netdev for IPv6, meaning that we could perhaps get rid of the dependency about loopback dev, being the last device in a netns being dismantled. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 5:49 ` Eric Dumazet @ 2022-02-10 5:59 ` Jakub Kicinski 2022-02-11 7:58 ` Xin Long 2022-02-10 6:10 ` Eric Dumazet 1 sibling, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2022-02-10 5:59 UTC (permalink / raw) To: Eric Dumazet; +Cc: Xin Long, network dev, davem, Ziyang Xuan On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote: > > Feels like sooner or later we'll run into a scenario when reversing will > > cause a problem. Or some data structure will stop preserving the order. > > > > Do you reckon rewriting netdev_run_todo() will be a lot of effort or > > it's too risky? > > This is doable, and risky ;) > > BTW, I have the plan of generalizing blackhole_netdev for IPv6, > meaning that we could perhaps get rid of the dependency > about loopback dev, being the last device in a netns being dismantled. Oh, I see.. I have no great ideas then, we may need to go back to zeroing vlan->real_dev and making sure the caller can deal with that. At least for the time being. Xin this was discussed at some length in response to the patch under Fixes. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 5:59 ` Jakub Kicinski @ 2022-02-11 7:58 ` Xin Long 2022-02-11 22:06 ` Jakub Kicinski 0 siblings, 1 reply; 12+ messages in thread From: Xin Long @ 2022-02-11 7:58 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote: > > > Feels like sooner or later we'll run into a scenario when reversing will > > > cause a problem. Or some data structure will stop preserving the order. > > > > > > Do you reckon rewriting netdev_run_todo() will be a lot of effort or > > > it's too risky? > > > > This is doable, and risky ;) > > > > BTW, I have the plan of generalizing blackhole_netdev for IPv6, > > meaning that we could perhaps get rid of the dependency > > about loopback dev, being the last device in a netns being dismantled. > > Oh, I see.. > > I have no great ideas then, we may need to go back to zeroing > vlan->real_dev and making sure the caller can deal with that. > At least for the time being. Xin this was discussed at some > length in response to the patch under Fixes. Hi, Jakub, What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev() is called? This issue can still be triggered, right? I don't see any lock protecting this. > Feels like sooner or later we'll run into a scenario when reversing will > cause a problem. Or some data structure will stop preserving the order. I was checking a few places doing such batch devices freeing, and noticed that: In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(), while in default_device_exit_batch(), it's using for_each_netdev_reverse(). shouldn't be in the same order all these places? If yes, which one is the right one to use? Thanks. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-11 7:58 ` Xin Long @ 2022-02-11 22:06 ` Jakub Kicinski 0 siblings, 0 replies; 12+ messages in thread From: Jakub Kicinski @ 2022-02-11 22:06 UTC (permalink / raw) To: Xin Long; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan On Fri, 11 Feb 2022 15:58:56 +0800 Xin Long wrote: > On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > This is doable, and risky ;) > > > > > > BTW, I have the plan of generalizing blackhole_netdev for IPv6, > > > meaning that we could perhaps get rid of the dependency > > > about loopback dev, being the last device in a netns being dismantled. > > > > Oh, I see.. > > > > I have no great ideas then, we may need to go back to zeroing > > vlan->real_dev and making sure the caller can deal with that. > > At least for the time being. Xin this was discussed at some > > length in response to the patch under Fixes. > > What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev() > is called? This issue can still be triggered, right? I don't see any lock > protecting this. Maybe the suggestion in the old thread was to NULL the pointer out before unregister is called. Which seems like a bad idea, as the netdev would already be impaired when unregister is called. > > Feels like sooner or later we'll run into a scenario when reversing will > > cause a problem. Or some data structure will stop preserving the order. > I was checking a few places doing such batch devices freeing, and noticed that: > In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(), > while in default_device_exit_batch(), it's using for_each_netdev_reverse(). > > shouldn't be in the same order all these places? If yes, which one is the > right one to use? I don't know. Maybe this will work maybe it will cause a circular dependency with something else. Honestly, I don't have a simple solutions to offer. Jann Horn pointed out recently that our per-CPU netdev refs are themselves racy... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit 2022-02-10 5:49 ` Eric Dumazet 2022-02-10 5:59 ` Jakub Kicinski @ 2022-02-10 6:10 ` Eric Dumazet 1 sibling, 0 replies; 12+ messages in thread From: Eric Dumazet @ 2022-02-10 6:10 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan On Wed, Feb 9, 2022 at 9:49 PM Eric Dumazet <edumazet@google.com> wrote: > > BTW, I have the plan of generalizing blackhole_netdev for IPv6, > meaning that we could perhaps get rid of the dependency > about loopback dev, being the last device in a netns being dismantled. Untested patch to give more ideas: (Main incentive is that we are still chasing a unregister_device bug, that I now think is related to IPv6 idev leaking somewhere) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 4f402bc38f056e08f3761e63a7bc7a51e54e9384..02d31d4fcab3b3d529c4fe3260216ecee1108e82 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -372,7 +372,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) ASSERT_RTNL(); - if (dev->mtu < IPV6_MIN_MTU) + if (dev->mtu < IPV6_MIN_MTU && dev != blackhole_netdev) return ERR_PTR(-EINVAL); ndev = kzalloc(sizeof(struct inet6_dev), GFP_KERNEL); @@ -400,21 +400,22 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) /* We refer to the device */ dev_hold_track(dev, &ndev->dev_tracker, GFP_KERNEL); - if (snmp6_alloc_dev(ndev) < 0) { - netdev_dbg(dev, "%s: cannot allocate memory for statistics\n", - __func__); - neigh_parms_release(&nd_tbl, ndev->nd_parms); - dev_put_track(dev, &ndev->dev_tracker); - kfree(ndev); - return ERR_PTR(err); - } + if (dev != blackhole_netdev) { + if (snmp6_alloc_dev(ndev) < 0) { + netdev_dbg(dev, "%s: cannot allocate memory for statistics\n", + __func__); + neigh_parms_release(&nd_tbl, ndev->nd_parms); + dev_put_track(dev, &ndev->dev_tracker); + kfree(ndev); + return ERR_PTR(err); + } - if (snmp6_register_dev(ndev) < 0) { - netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n", - __func__, dev->name); - goto err_release; + if (snmp6_register_dev(ndev) < 0) { + netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n", + __func__, dev->name); + goto err_release; + } } - /* One reference from device. */ refcount_set(&ndev->refcnt, 1); @@ -445,25 +446,28 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) ipv6_mc_init_dev(ndev); ndev->tstamp = jiffies; - err = addrconf_sysctl_register(ndev); - if (err) { - ipv6_mc_destroy_dev(ndev); - snmp6_unregister_dev(ndev); - goto err_release; + if (dev != blackhole_netdev) { + err = addrconf_sysctl_register(ndev); + if (err) { + ipv6_mc_destroy_dev(ndev); + snmp6_unregister_dev(ndev); + goto err_release; + } } /* protected by rtnl_lock */ rcu_assign_pointer(dev->ip6_ptr, ndev); - /* Join interface-local all-node multicast group */ - ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes); + if (dev != blackhole_netdev) { + /* Join interface-local all-node multicast group */ + ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes); - /* Join all-node multicast group */ - ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes); - - /* Join all-router multicast group if forwarding is set */ - if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST)) - ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters); + /* Join all-node multicast group */ + ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes); + /* Join all-router multicast group if forwarding is set */ + if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST)) + ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters); + } return ndev; err_release: @@ -7233,26 +7237,8 @@ int __init addrconf_init(void) goto out_nowq; } - /* The addrconf netdev notifier requires that loopback_dev - * has it's ipv6 private information allocated and setup - * before it can bring up and give link-local addresses - * to other devices which are up. - * - * Unfortunately, loopback_dev is not necessarily the first - * entry in the global dev_base list of net devices. In fact, - * it is likely to be the very last entry on that list. - * So this causes the notifier registry below to try and - * give link-local addresses to all devices besides loopback_dev - * first, then loopback_dev, which cases all the non-loopback_dev - * devices to fail to get a link-local address. - * - * So, as a temporary fix, allocate the ipv6 structure for - * loopback_dev first by hand. - * Longer term, all of the dependencies ipv6 has upon the loopback - * device and it being up should be removed. - */ rtnl_lock(); - idev = ipv6_add_dev(init_net.loopback_dev); + idev = ipv6_add_dev(blackhole_netdev); rtnl_unlock(); if (IS_ERR(idev)) { err = PTR_ERR(idev); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index f4884cda13b92e72d041680cbabfee2e07ec0f10..dc9d26a77c48f649ec39084c539d45b378474a35 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -160,12 +160,8 @@ void rt6_uncached_list_del(struct rt6_info *rt) static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev) { - struct net_device *loopback_dev = net->loopback_dev; int cpu; - if (dev == loopback_dev) - return; - for_each_possible_cpu(cpu) { struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu); struct rt6_info *rt; @@ -176,7 +172,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev) struct net_device *rt_dev = rt->dst.dev; if (rt_idev->dev == dev) { - rt->rt6i_idev = in6_dev_get(loopback_dev); + rt->rt6i_idev = in6_dev_get(blackhole_netdev); in6_dev_put(rt_idev); } @@ -373,13 +369,12 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev, { struct rt6_info *rt = (struct rt6_info *)dst; struct inet6_dev *idev = rt->rt6i_idev; - struct net_device *loopback_dev = - dev_net(dev)->loopback_dev; - if (idev && idev->dev != loopback_dev) { - struct inet6_dev *loopback_idev = in6_dev_get(loopback_dev); - if (loopback_idev) { - rt->rt6i_idev = loopback_idev; + if (idev && idev->dev != blackhole_netdev) { + struct inet6_dev *blackhole_idev = in6_dev_get(blackhole_netdev); + + if (blackhole_idev) { + rt->rt6i_idev = blackhole_idev; in6_dev_put(idev); } } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ 2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long 2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long 2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long @ 2022-02-09 13:40 ` patchwork-bot+netdevbpf 2 siblings, 0 replies; 12+ messages in thread From: patchwork-bot+netdevbpf @ 2022-02-09 13:40 UTC (permalink / raw) To: Xin Long; +Cc: netdev, davem, kuba, edumazet, william.xuanziyang Hello: This series was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Wed, 9 Feb 2022 03:19:54 -0500 you wrote: > This issue can be simply reproduced by: > > # ip link add dummy0 type dummy > # ip link add link dummy0 name dummy0.1 type vlan id 1 > # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2 > # rmmod 8021q > > [...] Here is the summary with links: - [net,1/2] vlan: introduce vlan_dev_free_egress_priority https://git.kernel.org/netdev/net/c/37aa50c539bc - [net,2/2] vlan: move dev_put into vlan_dev_uninit https://git.kernel.org/netdev/net/c/d6ff94afd90b You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-02-11 22:06 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long 2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long 2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long 2022-02-10 1:55 ` Jakub Kicinski 2022-02-10 3:40 ` Xin Long 2022-02-10 5:28 ` Jakub Kicinski 2022-02-10 5:49 ` Eric Dumazet 2022-02-10 5:59 ` Jakub Kicinski 2022-02-11 7:58 ` Xin Long 2022-02-11 22:06 ` Jakub Kicinski 2022-02-10 6:10 ` Eric Dumazet 2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.