* [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ
@ 2022-02-09 8:19 Xin Long
2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw)
To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan
This issue can be simply reproduced by:
# ip link add dummy0 type dummy
# ip link add link dummy0 name dummy0.1 type vlan id 1
# ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
# rmmod 8021q
unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
So as to fix it, adjust vlan_dev_uninit() in Patch 1/1 so that it won't
be called twice for the same device, then do the fix in vlan_dev_uninit()
in Patch 2/2.
Xin Long (2):
vlan: introduce vlan_dev_free_egress_priority
vlan: move dev_put into vlan_dev_uninit
net/8021q/vlan.h | 2 +-
net/8021q/vlan_dev.c | 15 +++++++++++----
net/8021q/vlan_netlink.c | 7 ++++---
3 files changed, 16 insertions(+), 8 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority
2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
@ 2022-02-09 8:19 ` Xin Long
2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw)
To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan
This patch is to introduce vlan_dev_free_egress_priority() to
free egress priority for vlan dev, and keep vlan_dev_uninit()
static as .ndo_uninit. It makes the code more clear and safer
when adding new code in vlan_dev_uninit() in the future.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/8021q/vlan.h | 2 +-
net/8021q/vlan_dev.c | 7 ++++++-
net/8021q/vlan_netlink.c | 7 ++++---
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 1a705a4ef7fa..5eaf38875554 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -129,6 +129,7 @@ void vlan_dev_set_ingress_priority(const struct net_device *dev,
u32 skb_prio, u16 vlan_prio);
int vlan_dev_set_egress_priority(const struct net_device *dev,
u32 skb_prio, u16 vlan_prio);
+void vlan_dev_free_egress_priority(const struct net_device *dev);
int vlan_dev_change_flags(const struct net_device *dev, u32 flag, u32 mask);
void vlan_dev_get_realdev_name(const struct net_device *dev, char *result,
size_t size);
@@ -139,7 +140,6 @@ int vlan_check_real_dev(struct net_device *real_dev,
void vlan_setup(struct net_device *dev);
int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack);
void unregister_vlan_dev(struct net_device *dev, struct list_head *head);
-void vlan_dev_uninit(struct net_device *dev);
bool vlan_dev_inherit_address(struct net_device *dev,
struct net_device *real_dev);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 26d031a43cc1..e5d23e75572a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -622,7 +622,7 @@ static int vlan_dev_init(struct net_device *dev)
}
/* Note: this function might be called multiple times for the same device. */
-void vlan_dev_uninit(struct net_device *dev)
+void vlan_dev_free_egress_priority(const struct net_device *dev)
{
struct vlan_priority_tci_mapping *pm;
struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
@@ -636,6 +636,11 @@ void vlan_dev_uninit(struct net_device *dev)
}
}
+static void vlan_dev_uninit(struct net_device *dev)
+{
+ vlan_dev_free_egress_priority(dev);
+}
+
static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
netdev_features_t features)
{
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 0db85aeb119b..53b1955b027f 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -183,10 +183,11 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev,
return -EINVAL;
err = vlan_changelink(dev, tb, data, extack);
- if (!err)
- err = register_vlan_dev(dev, extack);
if (err)
- vlan_dev_uninit(dev);
+ return err;
+ err = register_vlan_dev(dev, extack);
+ if (err)
+ vlan_dev_free_egress_priority(dev);
return err;
}
--
2.27.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
@ 2022-02-09 8:19 ` Xin Long
2022-02-10 1:55 ` Jakub Kicinski
2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
2 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-09 8:19 UTC (permalink / raw)
To: network dev; +Cc: davem, kuba, Eric Dumazet, Ziyang Xuan
Shuang Li reported an QinQ issue by simply doing:
# ip link add dummy0 type dummy
# ip link add link dummy0 name dummy0.1 type vlan id 1
# ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
# rmmod 8021q
unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
vlan_dev_free().
This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
netdev_wait_allrefs().
Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/8021q/vlan_dev.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e5d23e75572a..d1902828a18a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -638,7 +638,12 @@ void vlan_dev_free_egress_priority(const struct net_device *dev)
static void vlan_dev_uninit(struct net_device *dev)
{
+ struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+
vlan_dev_free_egress_priority(dev);
+
+ /* Get rid of the vlan's reference to real_dev */
+ dev_put_track(vlan->real_dev, &vlan->dev_tracker);
}
static netdev_features_t vlan_dev_fix_features(struct net_device *dev,
@@ -851,9 +856,6 @@ static void vlan_dev_free(struct net_device *dev)
free_percpu(vlan->vlan_pcpu_stats);
vlan->vlan_pcpu_stats = NULL;
-
- /* Get rid of the vlan's reference to real_dev */
- dev_put_track(vlan->real_dev, &vlan->dev_tracker);
}
void vlan_setup(struct net_device *dev)
--
2.27.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ
2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
@ 2022-02-09 13:40 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-09 13:40 UTC (permalink / raw)
To: Xin Long; +Cc: netdev, davem, kuba, edumazet, william.xuanziyang
Hello:
This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:
On Wed, 9 Feb 2022 03:19:54 -0500 you wrote:
> This issue can be simply reproduced by:
>
> # ip link add dummy0 type dummy
> # ip link add link dummy0 name dummy0.1 type vlan id 1
> # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
> # rmmod 8021q
>
> [...]
Here is the summary with links:
- [net,1/2] vlan: introduce vlan_dev_free_egress_priority
https://git.kernel.org/netdev/net/c/37aa50c539bc
- [net,2/2] vlan: move dev_put into vlan_dev_uninit
https://git.kernel.org/netdev/net/c/d6ff94afd90b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
@ 2022-02-10 1:55 ` Jakub Kicinski
2022-02-10 3:40 ` Xin Long
0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10 1:55 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan
On Wed, 9 Feb 2022 03:19:56 -0500 Xin Long wrote:
> Shuang Li reported an QinQ issue by simply doing:
>
> # ip link add dummy0 type dummy
> # ip link add link dummy0 name dummy0.1 type vlan id 1
> # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
> # rmmod 8021q
>
> unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
How about we put this in a selftest under tools/testing/selftests/net/
or tools/testing/selftests/drivers/net/ ?
> When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
> and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
> before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
>
> When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
> NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
> due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
> netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
> release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
> vlan_dev_free().
>
> This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
> vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
> vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
> netdev_wait_allrefs().
>
> Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
As far as I understand this is pretty much a revert of the previous fix.
Note that netdevice_event_work_handler() as seen in the backtrace in the
commit message of the fix in question is called from a workqueue, so the
ordering of netdev notifications saves us from nothing here. We can't
start freeing state until all refs are gone.
I think better fix would be to rewrite netdev_run_todo() to free the
netdevs in any order they become ready. That's gonna solve any
dependency problems and may even speed things up.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 1:55 ` Jakub Kicinski
@ 2022-02-10 3:40 ` Xin Long
2022-02-10 5:28 ` Jakub Kicinski
0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-10 3:40 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan
On Thu, Feb 10, 2022 at 9:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 9 Feb 2022 03:19:56 -0500 Xin Long wrote:
> > Shuang Li reported an QinQ issue by simply doing:
> >
> > # ip link add dummy0 type dummy
> > # ip link add link dummy0 name dummy0.1 type vlan id 1
> > # ip link add link dummy0.1 name dummy0.1.2 type vlan id 2
> > # rmmod 8021q
> >
> > unregister_netdevice: waiting for dummy0.1 to become free. Usage count = 1
>
> How about we put this in a selftest under tools/testing/selftests/net/
> or tools/testing/selftests/drivers/net/ ?
I will try.
>
> > When rmmods 8021q, all vlan devs are deleted from their real_dev's vlan grp
> > and added into list_kill by unregister_vlan_dev(). dummy0.1 is unregistered
> > before dummy0.1.2, as it's using for_each_netdev() in __rtnl_kill_links().
> >
> > When unregisters dummy0.1, dummy0.1.2 is not unregistered in the event of
> > NETDEV_UNREGISTER, as it's been deleted from dummy0.1's vlan grp. However,
> > due to dummy0.1.2 still holding dummy0.1, dummy0.1 will keep waiting in
> > netdev_wait_allrefs(), while dummy0.1.2 will never get unregistered and
> > release dummy0.1, as it delays dev_put until calling dev->priv_destructor,
> > vlan_dev_free().
> >
> > This issue was introduced by Commit 563bcbae3ba2 ("net: vlan: fix a UAF in
> > vlan_dev_real_dev()"), and this patch is to fix it by moving dev_put() into
> > vlan_dev_uninit(), which is called after NETDEV_UNREGISTER event but before
> > netdev_wait_allrefs().
> >
> > Fixes: 563bcbae3ba2 ("net: vlan: fix a UAF in vlan_dev_real_dev()")
>
> As far as I understand this is pretty much a revert of the previous fix.
> Note that netdevice_event_work_handler() as seen in the backtrace in the
> commit message of the fix in question is called from a workqueue, so the
> ordering of netdev notifications saves us from nothing here. We can't
> start freeing state until all refs are gone.
understand.
>
> I think better fix would be to rewrite netdev_run_todo() to free the
> netdevs in any order they become ready. That's gonna solve any
> dependency problems and may even speed things up.
>
What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
vlan as before, and fix this problem by using for_each_netdev_reverse()
in __rtnl_kill_links()?
It will make sense as the late added dev should be deleted early when
rtnl_link_unregister a rtnl_link_ops.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 3:40 ` Xin Long
@ 2022-02-10 5:28 ` Jakub Kicinski
2022-02-10 5:49 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10 5:28 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, davem, Eric Dumazet, Ziyang Xuan
On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote:
> > I think better fix would be to rewrite netdev_run_todo() to free the
> > netdevs in any order they become ready. That's gonna solve any
> > dependency problems and may even speed things up.
>
> What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
> vlan as before, and fix this problem by using for_each_netdev_reverse()
> in __rtnl_kill_links()?
> It will make sense as the late added dev should be deleted early when
> rtnl_link_unregister a rtnl_link_ops.
Feels like sooner or later we'll run into a scenario when reversing will
cause a problem. Or some data structure will stop preserving the order.
Do you reckon rewriting netdev_run_todo() will be a lot of effort or
it's too risky?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 5:28 ` Jakub Kicinski
@ 2022-02-10 5:49 ` Eric Dumazet
2022-02-10 5:59 ` Jakub Kicinski
2022-02-10 6:10 ` Eric Dumazet
0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-02-10 5:49 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan
On Wed, Feb 9, 2022 at 9:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 11:40:42 +0800 Xin Long wrote:
> > > I think better fix would be to rewrite netdev_run_todo() to free the
> > > netdevs in any order they become ready. That's gonna solve any
> > > dependency problems and may even speed things up.
> >
> > What about I keep dev_put() in dev->priv_destructor()/vlan_dev_free() for
> > vlan as before, and fix this problem by using for_each_netdev_reverse()
> > in __rtnl_kill_links()?
> > It will make sense as the late added dev should be deleted early when
> > rtnl_link_unregister a rtnl_link_ops.
>
> Feels like sooner or later we'll run into a scenario when reversing will
> cause a problem. Or some data structure will stop preserving the order.
>
> Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> it's too risky?
This is doable, and risky ;)
BTW, I have the plan of generalizing blackhole_netdev for IPv6,
meaning that we could perhaps get rid of the dependency
about loopback dev, being the last device in a netns being dismantled.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 5:49 ` Eric Dumazet
@ 2022-02-10 5:59 ` Jakub Kicinski
2022-02-11 7:58 ` Xin Long
2022-02-10 6:10 ` Eric Dumazet
1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-10 5:59 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Xin Long, network dev, davem, Ziyang Xuan
On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote:
> > Feels like sooner or later we'll run into a scenario when reversing will
> > cause a problem. Or some data structure will stop preserving the order.
> >
> > Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> > it's too risky?
>
> This is doable, and risky ;)
>
> BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> meaning that we could perhaps get rid of the dependency
> about loopback dev, being the last device in a netns being dismantled.
Oh, I see..
I have no great ideas then, we may need to go back to zeroing
vlan->real_dev and making sure the caller can deal with that.
At least for the time being. Xin this was discussed at some
length in response to the patch under Fixes.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 5:49 ` Eric Dumazet
2022-02-10 5:59 ` Jakub Kicinski
@ 2022-02-10 6:10 ` Eric Dumazet
1 sibling, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2022-02-10 6:10 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Xin Long, network dev, davem, Ziyang Xuan
On Wed, Feb 9, 2022 at 9:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> meaning that we could perhaps get rid of the dependency
> about loopback dev, being the last device in a netns being dismantled.
Untested patch to give more ideas:
(Main incentive is that we are still chasing a unregister_device bug,
that I now think is related to IPv6 idev leaking somewhere)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4f402bc38f056e08f3761e63a7bc7a51e54e9384..02d31d4fcab3b3d529c4fe3260216ecee1108e82
100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -372,7 +372,7 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)
ASSERT_RTNL();
- if (dev->mtu < IPV6_MIN_MTU)
+ if (dev->mtu < IPV6_MIN_MTU && dev != blackhole_netdev)
return ERR_PTR(-EINVAL);
ndev = kzalloc(sizeof(struct inet6_dev), GFP_KERNEL);
@@ -400,21 +400,22 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)
/* We refer to the device */
dev_hold_track(dev, &ndev->dev_tracker, GFP_KERNEL);
- if (snmp6_alloc_dev(ndev) < 0) {
- netdev_dbg(dev, "%s: cannot allocate memory for statistics\n",
- __func__);
- neigh_parms_release(&nd_tbl, ndev->nd_parms);
- dev_put_track(dev, &ndev->dev_tracker);
- kfree(ndev);
- return ERR_PTR(err);
- }
+ if (dev != blackhole_netdev) {
+ if (snmp6_alloc_dev(ndev) < 0) {
+ netdev_dbg(dev, "%s: cannot allocate memory
for statistics\n",
+ __func__);
+ neigh_parms_release(&nd_tbl, ndev->nd_parms);
+ dev_put_track(dev, &ndev->dev_tracker);
+ kfree(ndev);
+ return ERR_PTR(err);
+ }
- if (snmp6_register_dev(ndev) < 0) {
- netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n",
- __func__, dev->name);
- goto err_release;
+ if (snmp6_register_dev(ndev) < 0) {
+ netdev_dbg(dev, "%s: cannot create
/proc/net/dev_snmp6/%s\n",
+ __func__, dev->name);
+ goto err_release;
+ }
}
-
/* One reference from device. */
refcount_set(&ndev->refcnt, 1);
@@ -445,25 +446,28 @@ static struct inet6_dev *ipv6_add_dev(struct
net_device *dev)
ipv6_mc_init_dev(ndev);
ndev->tstamp = jiffies;
- err = addrconf_sysctl_register(ndev);
- if (err) {
- ipv6_mc_destroy_dev(ndev);
- snmp6_unregister_dev(ndev);
- goto err_release;
+ if (dev != blackhole_netdev) {
+ err = addrconf_sysctl_register(ndev);
+ if (err) {
+ ipv6_mc_destroy_dev(ndev);
+ snmp6_unregister_dev(ndev);
+ goto err_release;
+ }
}
/* protected by rtnl_lock */
rcu_assign_pointer(dev->ip6_ptr, ndev);
- /* Join interface-local all-node multicast group */
- ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);
+ if (dev != blackhole_netdev) {
+ /* Join interface-local all-node multicast group */
+ ipv6_dev_mc_inc(dev, &in6addr_interfacelocal_allnodes);
- /* Join all-node multicast group */
- ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
-
- /* Join all-router multicast group if forwarding is set */
- if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
- ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
+ /* Join all-node multicast group */
+ ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
+ /* Join all-router multicast group if forwarding is set */
+ if (ndev->cnf.forwarding && (dev->flags & IFF_MULTICAST))
+ ipv6_dev_mc_inc(dev, &in6addr_linklocal_allrouters);
+ }
return ndev;
err_release:
@@ -7233,26 +7237,8 @@ int __init addrconf_init(void)
goto out_nowq;
}
- /* The addrconf netdev notifier requires that loopback_dev
- * has it's ipv6 private information allocated and setup
- * before it can bring up and give link-local addresses
- * to other devices which are up.
- *
- * Unfortunately, loopback_dev is not necessarily the first
- * entry in the global dev_base list of net devices. In fact,
- * it is likely to be the very last entry on that list.
- * So this causes the notifier registry below to try and
- * give link-local addresses to all devices besides loopback_dev
- * first, then loopback_dev, which cases all the non-loopback_dev
- * devices to fail to get a link-local address.
- *
- * So, as a temporary fix, allocate the ipv6 structure for
- * loopback_dev first by hand.
- * Longer term, all of the dependencies ipv6 has upon the loopback
- * device and it being up should be removed.
- */
rtnl_lock();
- idev = ipv6_add_dev(init_net.loopback_dev);
+ idev = ipv6_add_dev(blackhole_netdev);
rtnl_unlock();
if (IS_ERR(idev)) {
err = PTR_ERR(idev);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4884cda13b92e72d041680cbabfee2e07ec0f10..dc9d26a77c48f649ec39084c539d45b378474a35
100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -160,12 +160,8 @@ void rt6_uncached_list_del(struct rt6_info *rt)
static void rt6_uncached_list_flush_dev(struct net *net, struct
net_device *dev)
{
- struct net_device *loopback_dev = net->loopback_dev;
int cpu;
- if (dev == loopback_dev)
- return;
-
for_each_possible_cpu(cpu) {
struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
struct rt6_info *rt;
@@ -176,7 +172,7 @@ static void rt6_uncached_list_flush_dev(struct net
*net, struct net_device *dev)
struct net_device *rt_dev = rt->dst.dev;
if (rt_idev->dev == dev) {
- rt->rt6i_idev = in6_dev_get(loopback_dev);
+ rt->rt6i_idev = in6_dev_get(blackhole_netdev);
in6_dev_put(rt_idev);
}
@@ -373,13 +369,12 @@ static void ip6_dst_ifdown(struct dst_entry
*dst, struct net_device *dev,
{
struct rt6_info *rt = (struct rt6_info *)dst;
struct inet6_dev *idev = rt->rt6i_idev;
- struct net_device *loopback_dev =
- dev_net(dev)->loopback_dev;
- if (idev && idev->dev != loopback_dev) {
- struct inet6_dev *loopback_idev = in6_dev_get(loopback_dev);
- if (loopback_idev) {
- rt->rt6i_idev = loopback_idev;
+ if (idev && idev->dev != blackhole_netdev) {
+ struct inet6_dev *blackhole_idev =
in6_dev_get(blackhole_netdev);
+
+ if (blackhole_idev) {
+ rt->rt6i_idev = blackhole_idev;
in6_dev_put(idev);
}
}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-10 5:59 ` Jakub Kicinski
@ 2022-02-11 7:58 ` Xin Long
2022-02-11 22:06 ` Jakub Kicinski
0 siblings, 1 reply; 12+ messages in thread
From: Xin Long @ 2022-02-11 7:58 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan
On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 9 Feb 2022 21:49:51 -0800 Eric Dumazet wrote:
> > > Feels like sooner or later we'll run into a scenario when reversing will
> > > cause a problem. Or some data structure will stop preserving the order.
> > >
> > > Do you reckon rewriting netdev_run_todo() will be a lot of effort or
> > > it's too risky?
> >
> > This is doable, and risky ;)
> >
> > BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> > meaning that we could perhaps get rid of the dependency
> > about loopback dev, being the last device in a netns being dismantled.
>
> Oh, I see..
>
> I have no great ideas then, we may need to go back to zeroing
> vlan->real_dev and making sure the caller can deal with that.
> At least for the time being. Xin this was discussed at some
> length in response to the patch under Fixes.
Hi, Jakub,
What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev()
is called? This issue can still be triggered, right? I don't see any lock
protecting this.
> Feels like sooner or later we'll run into a scenario when reversing will
> cause a problem. Or some data structure will stop preserving the order.
I was checking a few places doing such batch devices freeing, and noticed that:
In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(),
while in default_device_exit_batch(), it's using for_each_netdev_reverse().
shouldn't be in the same order all these places? If yes, which one is the
right one to use?
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit
2022-02-11 7:58 ` Xin Long
@ 2022-02-11 22:06 ` Jakub Kicinski
0 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2022-02-11 22:06 UTC (permalink / raw)
To: Xin Long; +Cc: Eric Dumazet, network dev, davem, Ziyang Xuan
On Fri, 11 Feb 2022 15:58:56 +0800 Xin Long wrote:
> On Thu, Feb 10, 2022 at 1:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > This is doable, and risky ;)
> > >
> > > BTW, I have the plan of generalizing blackhole_netdev for IPv6,
> > > meaning that we could perhaps get rid of the dependency
> > > about loopback dev, being the last device in a netns being dismantled.
> >
> > Oh, I see..
> >
> > I have no great ideas then, we may need to go back to zeroing
> > vlan->real_dev and making sure the caller can deal with that.
> > At least for the time being. Xin this was discussed at some
> > length in response to the patch under Fixes.
>
> What if dev->real_dev is freed and zeroed *after* vlan_dev_real_dev()
> is called? This issue can still be triggered, right? I don't see any lock
> protecting this.
Maybe the suggestion in the old thread was to NULL the pointer out
before unregister is called. Which seems like a bad idea, as the
netdev would already be impaired when unregister is called.
> > Feels like sooner or later we'll run into a scenario when reversing will
> > cause a problem. Or some data structure will stop preserving the order.
> I was checking a few places doing such batch devices freeing, and noticed that:
> In rtnl_group_dellink() and __rtnl_kill_links(), it's using for_each_netdev(),
> while in default_device_exit_batch(), it's using for_each_netdev_reverse().
>
> shouldn't be in the same order all these places? If yes, which one is the
> right one to use?
I don't know. Maybe this will work maybe it will cause a circular
dependency with something else.
Honestly, I don't have a simple solutions to offer. Jann Horn pointed
out recently that our per-CPU netdev refs are themselves racy...
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-02-11 22:06 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-09 8:19 [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ Xin Long
2022-02-09 8:19 ` [PATCH net 1/2] vlan: introduce vlan_dev_free_egress_priority Xin Long
2022-02-09 8:19 ` [PATCH net 2/2] vlan: move dev_put into vlan_dev_uninit Xin Long
2022-02-10 1:55 ` Jakub Kicinski
2022-02-10 3:40 ` Xin Long
2022-02-10 5:28 ` Jakub Kicinski
2022-02-10 5:49 ` Eric Dumazet
2022-02-10 5:59 ` Jakub Kicinski
2022-02-11 7:58 ` Xin Long
2022-02-11 22:06 ` Jakub Kicinski
2022-02-10 6:10 ` Eric Dumazet
2022-02-09 13:40 ` [PATCH net 0/2] vlan: fix a netdev refcnt leak for QinQ patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.