* [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() @ 2021-11-02 2:12 Ziyang Xuan 2021-11-03 13:50 ` Jason Gunthorpe ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Ziyang Xuan @ 2021-11-02 2:12 UTC (permalink / raw) To: davem, kuba; +Cc: jgg, netdev, linux-kernel The real_dev of a vlan net_device may be freed after unregister_vlan_dev(). Access the real_dev continually by vlan_dev_real_dev() will trigger the UAF problem for the real_dev like following: ================================================================== BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120 Call Trace: kasan_report.cold+0x83/0xdf vlan_dev_real_dev+0xf9/0x120 is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0 is_eth_port_of_netdev_filter+0x28/0x40 ib_enum_roce_netdev+0x1a3/0x300 ib_enum_all_roce_netdevs+0xc7/0x140 netdevice_event_work_handler+0x9d/0x210 ... Freed by task 9288: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xfc/0x130 slab_free_freelist_hook+0xdd/0x240 kfree+0xe4/0x690 kvfree+0x42/0x50 device_release+0x9f/0x240 kobject_put+0x1c8/0x530 put_device+0x1b/0x30 free_netdev+0x370/0x540 ppp_destroy_interface+0x313/0x3d0 ... Move the put_device(real_dev) to vlan_dev_free(). Ensure real_dev not be freed before vlan_dev unregistered. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> --- net/8021q/vlan.c | 3 --- net/8021q/vlan_dev.c | 3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 55275ef9a31a..a3a0a5e994f5 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) } vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); - - /* Get rid of the vlan's reference to real_dev */ - dev_put(real_dev); } int vlan_check_real_dev(struct net_device *real_dev, diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 0c21d1fec852..aeeb5f90417b 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) free_percpu(vlan->vlan_pcpu_stats); vlan->vlan_pcpu_stats = NULL; + + /* Get rid of the vlan's reference to real_dev */ + dev_put(vlan->real_dev); } void vlan_setup(struct net_device *dev) -- 2.25.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-02 2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan @ 2021-11-03 13:50 ` Jason Gunthorpe 2021-11-03 15:47 ` Jakub Kicinski 2021-11-03 14:30 ` patchwork-bot+netdevbpf 2021-11-15 17:04 ` Petr Machata 2 siblings, 1 reply; 18+ messages in thread From: Jason Gunthorpe @ 2021-11-03 13:50 UTC (permalink / raw) To: Ziyang Xuan; +Cc: davem, kuba, netdev, linux-kernel On Tue, Nov 02, 2021 at 10:12:18AM +0800, Ziyang Xuan wrote: > The real_dev of a vlan net_device may be freed after > unregister_vlan_dev(). Access the real_dev continually by > vlan_dev_real_dev() will trigger the UAF problem for the > real_dev like following: > > ================================================================== > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120 > Call Trace: > kasan_report.cold+0x83/0xdf > vlan_dev_real_dev+0xf9/0x120 > is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0 > is_eth_port_of_netdev_filter+0x28/0x40 > ib_enum_roce_netdev+0x1a3/0x300 > ib_enum_all_roce_netdevs+0xc7/0x140 > netdevice_event_work_handler+0x9d/0x210 > ... > > Freed by task 9288: > kasan_save_stack+0x1b/0x40 > kasan_set_track+0x1c/0x30 > kasan_set_free_info+0x20/0x30 > __kasan_slab_free+0xfc/0x130 > slab_free_freelist_hook+0xdd/0x240 > kfree+0xe4/0x690 > kvfree+0x42/0x50 > device_release+0x9f/0x240 > kobject_put+0x1c8/0x530 > put_device+0x1b/0x30 > free_netdev+0x370/0x540 > ppp_destroy_interface+0x313/0x3d0 > ... > > Move the put_device(real_dev) to vlan_dev_free(). Ensure > real_dev not be freed before vlan_dev unregistered. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- > net/8021q/vlan.c | 3 --- > net/8021q/vlan_dev.c | 3 +++ > 2 files changed, 3 insertions(+), 3 deletions(-) Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (though I can't tell either if there is a possiblecircular dep problem in some oddball case) Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-03 13:50 ` Jason Gunthorpe @ 2021-11-03 15:47 ` Jakub Kicinski 2021-11-03 16:11 ` Jason Gunthorpe 0 siblings, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2021-11-03 15:47 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: Ziyang Xuan, davem, netdev, linux-kernel On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote: > (though I can't tell either if there is a possiblecircular dep problem > in some oddball case) Same, luckily we're just starting a new dev cycle and syzbot can have at it. We should probably not let this patch get into stable right away - assuming you agree - would you mind nacking the selection if it happens? I'm not sure I'll get CCed since it doesn't have my tags. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-03 15:47 ` Jakub Kicinski @ 2021-11-03 16:11 ` Jason Gunthorpe 0 siblings, 0 replies; 18+ messages in thread From: Jason Gunthorpe @ 2021-11-03 16:11 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Ziyang Xuan, davem, netdev, linux-kernel On Wed, Nov 03, 2021 at 08:47:46AM -0700, Jakub Kicinski wrote: > On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote: > > (though I can't tell either if there is a possiblecircular dep problem > > in some oddball case) > > Same, luckily we're just starting a new dev cycle and syzbot can have > at it. > > We should probably not let this patch get into stable right away - > assuming you agree - would you mind nacking the selection if it happens? > I'm not sure I'll get CCed since it doesn't have my tags. I will make an effort, sure. It is hard to be confident due to all the stable selection emails I get :| Thanks, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-02 2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan 2021-11-03 13:50 ` Jason Gunthorpe @ 2021-11-03 14:30 ` patchwork-bot+netdevbpf 2021-11-15 17:04 ` Petr Machata 2 siblings, 0 replies; 18+ messages in thread From: patchwork-bot+netdevbpf @ 2021-11-03 14:30 UTC (permalink / raw) To: Ziyang Xuan; +Cc: davem, kuba, jgg, netdev, linux-kernel Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Tue, 2 Nov 2021 10:12:18 +0800 you wrote: > The real_dev of a vlan net_device may be freed after > unregister_vlan_dev(). Access the real_dev continually by > vlan_dev_real_dev() will trigger the UAF problem for the > real_dev like following: > > ================================================================== > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120 > Call Trace: > kasan_report.cold+0x83/0xdf > vlan_dev_real_dev+0xf9/0x120 > is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0 > is_eth_port_of_netdev_filter+0x28/0x40 > ib_enum_roce_netdev+0x1a3/0x300 > ib_enum_all_roce_netdevs+0xc7/0x140 > netdevice_event_work_handler+0x9d/0x210 > ... > > [...] Here is the summary with links: - [net,v2] net: vlan: fix a UAF in vlan_dev_real_dev() https://git.kernel.org/netdev/net/c/563bcbae3ba2 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-02 2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan 2021-11-03 13:50 ` Jason Gunthorpe 2021-11-03 14:30 ` patchwork-bot+netdevbpf @ 2021-11-15 17:04 ` Petr Machata 2021-11-15 17:49 ` Jakub Kicinski 2021-11-19 3:04 ` Ziyang Xuan (William) 2 siblings, 2 replies; 18+ messages in thread From: Petr Machata @ 2021-11-15 17:04 UTC (permalink / raw) To: Ziyang Xuan; +Cc: davem, kuba, jgg, netdev, linux-kernel Ziyang Xuan <william.xuanziyang@huawei.com> writes: > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index 55275ef9a31a..a3a0a5e994f5 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) > } > > vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); > - > - /* Get rid of the vlan's reference to real_dev */ > - dev_put(real_dev); > } > > int vlan_check_real_dev(struct net_device *real_dev, > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c > index 0c21d1fec852..aeeb5f90417b 100644 > --- a/net/8021q/vlan_dev.c > +++ b/net/8021q/vlan_dev.c > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) > > free_percpu(vlan->vlan_pcpu_stats); > vlan->vlan_pcpu_stats = NULL; > + > + /* Get rid of the vlan's reference to real_dev */ > + dev_put(vlan->real_dev); > } > > void vlan_setup(struct net_device *dev) This is causing reference counting issues when vetoing is involved. Consider the following snippet: ip link add name bond1 type bond mode 802.3ad ip link set dev swp1 master bond1 ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100 # ^ vetoed, no netdevice created ip link del dev bond1 The setup process goes like this: vlan_newlink() calls register_vlan_dev() calls netdev_upper_dev_link() calls __netdev_upper_dev_link(), which issues a notifier NETDEV_PRECHANGEUPPER, which yields a non-zero error, because a listener vetoed it. So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends up decreasing reference count of the real_dev. Then when when the bond netdevice is removed, we get an endless loop of: kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 Moving the dev_hold(real_dev) to always happen even if the netdev_upper_dev_link() call makes the issue go away. I'm not sure why this wasn't happening before. After the veto, register_vlan_dev() follows with a goto out_unregister_netdev, which calls unregister_netdevice() calls unregister_netdevice_queue(), which issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(), which calls unregister_vlan_dev(), which used to dev_put(real_dev), which seems like it should have caused the same issue. Dunno. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-15 17:04 ` Petr Machata @ 2021-11-15 17:49 ` Jakub Kicinski 2021-11-16 14:20 ` Petr Machata 2021-11-17 11:50 ` Petr Machata 2021-11-19 3:04 ` Ziyang Xuan (William) 1 sibling, 2 replies; 18+ messages in thread From: Jakub Kicinski @ 2021-11-15 17:49 UTC (permalink / raw) To: Petr Machata; +Cc: Ziyang Xuan, davem, jgg, netdev, linux-kernel On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote: > Ziyang Xuan <william.xuanziyang@huawei.com> writes: > > > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > > index 55275ef9a31a..a3a0a5e994f5 100644 > > --- a/net/8021q/vlan.c > > +++ b/net/8021q/vlan.c > > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) > > } > > > > vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); > > - > > - /* Get rid of the vlan's reference to real_dev */ > > - dev_put(real_dev); > > } > > > > int vlan_check_real_dev(struct net_device *real_dev, > > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c > > index 0c21d1fec852..aeeb5f90417b 100644 > > --- a/net/8021q/vlan_dev.c > > +++ b/net/8021q/vlan_dev.c > > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) > > > > free_percpu(vlan->vlan_pcpu_stats); > > vlan->vlan_pcpu_stats = NULL; > > + > > + /* Get rid of the vlan's reference to real_dev */ > > + dev_put(vlan->real_dev); > > } > > > > void vlan_setup(struct net_device *dev) > > This is causing reference counting issues when vetoing is involved. > Consider the following snippet: > > ip link add name bond1 type bond mode 802.3ad > ip link set dev swp1 master bond1 > ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100 > # ^ vetoed, no netdevice created > ip link del dev bond1 > > The setup process goes like this: vlan_newlink() calls > register_vlan_dev() calls netdev_upper_dev_link() calls > __netdev_upper_dev_link(), which issues a notifier > NETDEV_PRECHANGEUPPER, which yields a non-zero error, > because a listener vetoed it. > > So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends > up decreasing reference count of the real_dev. Then when when the bond > netdevice is removed, we get an endless loop of: > > kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 > > Moving the dev_hold(real_dev) to always happen even if the > netdev_upper_dev_link() call makes the issue go away. > > I'm not sure why this wasn't happening before. After the veto, > register_vlan_dev() follows with a goto out_unregister_netdev, which > calls unregister_netdevice() calls unregister_netdevice_queue(), which > issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(), > which calls unregister_vlan_dev(), which used to dev_put(real_dev), > which seems like it should have caused the same issue. Dunno. Does the notifier trigger unregister_vlan_dev()? I thought the notifier triggers when lower dev is unregistered. I think we should move the dev_hold() to ndo_init(), otherwise it's hard to reason if destructor was invoked or not if register_netdevice() errors out. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-15 17:49 ` Jakub Kicinski @ 2021-11-16 14:20 ` Petr Machata 2021-11-17 11:50 ` Petr Machata 1 sibling, 0 replies; 18+ messages in thread From: Petr Machata @ 2021-11-16 14:20 UTC (permalink / raw) To: Jakub Kicinski Cc: Petr Machata, Ziyang Xuan, davem, jgg, netdev, linux-kernel Jakub Kicinski <kuba@kernel.org> writes: > On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote: > >> I'm not sure why this wasn't happening before. After the veto, >> register_vlan_dev() follows with a goto out_unregister_netdev, which >> calls unregister_netdevice() calls unregister_netdevice_queue(), which >> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(), >> which calls unregister_vlan_dev(), which used to dev_put(real_dev), >> which seems like it should have caused the same issue. Dunno. > > Does the notifier trigger unregister_vlan_dev()? I thought the notifier > triggers when lower dev is unregistered. Right, I misinterpreted this bit: vlan_info = rtnl_dereference(dev->vlan_info); if (!vlan_info) goto out; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-15 17:49 ` Jakub Kicinski 2021-11-16 14:20 ` Petr Machata @ 2021-11-17 11:50 ` Petr Machata 2021-11-18 1:46 ` Ziyang Xuan (William) 1 sibling, 1 reply; 18+ messages in thread From: Petr Machata @ 2021-11-17 11:50 UTC (permalink / raw) To: Jakub Kicinski Cc: Petr Machata, Ziyang Xuan, davem, jgg, netdev, linux-kernel Jakub Kicinski <kuba@kernel.org> writes: > On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote: >> Ziyang Xuan <william.xuanziyang@huawei.com> writes: >> >> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >> > index 55275ef9a31a..a3a0a5e994f5 100644 >> > --- a/net/8021q/vlan.c >> > +++ b/net/8021q/vlan.c >> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) >> > } >> > >> > vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); >> > - >> > - /* Get rid of the vlan's reference to real_dev */ >> > - dev_put(real_dev); >> > } >> > >> > int vlan_check_real_dev(struct net_device *real_dev, >> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c >> > index 0c21d1fec852..aeeb5f90417b 100644 >> > --- a/net/8021q/vlan_dev.c >> > +++ b/net/8021q/vlan_dev.c >> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) >> > >> > free_percpu(vlan->vlan_pcpu_stats); >> > vlan->vlan_pcpu_stats = NULL; >> > + >> > + /* Get rid of the vlan's reference to real_dev */ >> > + dev_put(vlan->real_dev); >> > } >> > >> > void vlan_setup(struct net_device *dev) >> >> This is causing reference counting issues when vetoing is involved. >> Consider the following snippet: >> >> ip link add name bond1 type bond mode 802.3ad >> ip link set dev swp1 master bond1 >> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100 >> # ^ vetoed, no netdevice created >> ip link del dev bond1 >> >> The setup process goes like this: vlan_newlink() calls >> register_vlan_dev() calls netdev_upper_dev_link() calls >> __netdev_upper_dev_link(), which issues a notifier >> NETDEV_PRECHANGEUPPER, which yields a non-zero error, >> because a listener vetoed it. >> >> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends >> up decreasing reference count of the real_dev. Then when when the bond >> netdevice is removed, we get an endless loop of: >> >> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 >> >> Moving the dev_hold(real_dev) to always happen even if the >> netdev_upper_dev_link() call makes the issue go away. > > I think we should move the dev_hold() to ndo_init(), otherwise > it's hard to reason if destructor was invoked or not if > register_netdevice() errors out. Ziyang Xuan, do you intend to take care of this? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-17 11:50 ` Petr Machata @ 2021-11-18 1:46 ` Ziyang Xuan (William) 2021-11-18 14:17 ` Jakub Kicinski 0 siblings, 1 reply; 18+ messages in thread From: Ziyang Xuan (William) @ 2021-11-18 1:46 UTC (permalink / raw) To: Petr Machata, Jakub Kicinski; +Cc: davem, jgg, netdev, linux-kernel > > Jakub Kicinski <kuba@kernel.org> writes: > >> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote: >>> Ziyang Xuan <william.xuanziyang@huawei.com> writes: >>> >>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >>>> index 55275ef9a31a..a3a0a5e994f5 100644 >>>> --- a/net/8021q/vlan.c >>>> +++ b/net/8021q/vlan.c >>>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) >>>> } >>>> >>>> vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); >>>> - >>>> - /* Get rid of the vlan's reference to real_dev */ >>>> - dev_put(real_dev); >>>> } >>>> >>>> int vlan_check_real_dev(struct net_device *real_dev, >>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c >>>> index 0c21d1fec852..aeeb5f90417b 100644 >>>> --- a/net/8021q/vlan_dev.c >>>> +++ b/net/8021q/vlan_dev.c >>>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) >>>> >>>> free_percpu(vlan->vlan_pcpu_stats); >>>> vlan->vlan_pcpu_stats = NULL; >>>> + >>>> + /* Get rid of the vlan's reference to real_dev */ >>>> + dev_put(vlan->real_dev); >>>> } >>>> >>>> void vlan_setup(struct net_device *dev) >>> >>> This is causing reference counting issues when vetoing is involved. >>> Consider the following snippet: >>> >>> ip link add name bond1 type bond mode 802.3ad >>> ip link set dev swp1 master bond1 >>> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100 >>> # ^ vetoed, no netdevice created >>> ip link del dev bond1 >>> >>> The setup process goes like this: vlan_newlink() calls >>> register_vlan_dev() calls netdev_upper_dev_link() calls >>> __netdev_upper_dev_link(), which issues a notifier >>> NETDEV_PRECHANGEUPPER, which yields a non-zero error, >>> because a listener vetoed it. >>> >>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends >>> up decreasing reference count of the real_dev. Then when when the bond >>> netdevice is removed, we get an endless loop of: >>> >>> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 >>> >>> Moving the dev_hold(real_dev) to always happen even if the >>> netdev_upper_dev_link() call makes the issue go away. >> >> I think we should move the dev_hold() to ndo_init(), otherwise >> it's hard to reason if destructor was invoked or not if >> register_netdevice() errors out. > > Ziyang Xuan, do you intend to take care of this? > . I am reading the related processes according to the problem scenario. And I will give a more clear sequence and root cause as soon as possible by some necessary tests. Thank you! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-18 1:46 ` Ziyang Xuan (William) @ 2021-11-18 14:17 ` Jakub Kicinski 2021-11-19 3:29 ` Ziyang Xuan (William) 0 siblings, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2021-11-18 14:17 UTC (permalink / raw) To: Ziyang Xuan (William); +Cc: Petr Machata, davem, jgg, netdev, linux-kernel On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote: > >> I think we should move the dev_hold() to ndo_init(), otherwise > >> it's hard to reason if destructor was invoked or not if > >> register_netdevice() errors out. > > > > Ziyang Xuan, do you intend to take care of this? > > . > > I am reading the related processes according to the problem scenario. > And I will give a more clear sequence and root cause as soon as possible > by some necessary tests. Okay, I still don't see the fix. Petr would you mind submitting since you have a repro handy? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-18 14:17 ` Jakub Kicinski @ 2021-11-19 3:29 ` Ziyang Xuan (William) 2021-11-19 10:07 ` Petr Machata 0 siblings, 1 reply; 18+ messages in thread From: Ziyang Xuan (William) @ 2021-11-19 3:29 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Petr Machata, davem, jgg, netdev, linux-kernel > On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote: >>>> I think we should move the dev_hold() to ndo_init(), otherwise >>>> it's hard to reason if destructor was invoked or not if >>>> register_netdevice() errors out. >>> >>> Ziyang Xuan, do you intend to take care of this? >>> . >> >> I am reading the related processes according to the problem scenario. >> And I will give a more clear sequence and root cause as soon as possible >> by some necessary tests. > > Okay, I still don't see the fix. > > Petr would you mind submitting since you have a repro handy? > . The sequence for the problem maybe as following: ============================================================= # create vlan device vlan_newlink [assume real_dev refcnt == 2 just referenced by itself] register_vlan_dev register_netdevice(vlan_dev) [success] netdev_upper_dev_link [failed] unregister_netdevice(vlan_dev) [failure process] ... netdev_run_todo [vlan_dev] vlan_dev_free(vlan_dev) [priv_destructor cb] dev_put(real_dev) [real_dev refcnt == 1] # delete real device unregister_netdevice(real_dev) [real_dev refcnt == 1] unregister_netdevice_many dev_put(real_dev) [real_dev refcnt == 0] net_set_todo(real_dev) ... netdev_run_todo [real_dev] netdev_wait_allrefs(real_dev) [real_dev refcnt == 0] pr_emerg("unregister_netdevice: ...", ...) I am thinking about how to fix the problem. priv_destructor() of the net_device referenced not only by net_set_todo() but also failure process in register_netdevice(). I need some time to test my some ideas. And anyone has good ideas, please do not be stingy. Thank you! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-19 3:29 ` Ziyang Xuan (William) @ 2021-11-19 10:07 ` Petr Machata 2021-11-23 9:01 ` Ziyang Xuan (William) 0 siblings, 1 reply; 18+ messages in thread From: Petr Machata @ 2021-11-19 10:07 UTC (permalink / raw) To: Ziyang Xuan (William) Cc: Jakub Kicinski, Petr Machata, davem, jgg, netdev, linux-kernel Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: > I need some time to test my some ideas. And anyone has good ideas, please > do not be stingy. Jakub Kicinski <kuba@kernel.org> writes: > I think we should move the dev_hold() to ndo_init(), otherwise it's > hard to reason if destructor was invoked or not if > register_netdevice() errors out. That makes sense to me. We always put real_dev in the destructor, so we should always hold it in the constructor... ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-19 10:07 ` Petr Machata @ 2021-11-23 9:01 ` Ziyang Xuan (William) 2021-11-23 12:35 ` Petr Machata 0 siblings, 1 reply; 18+ messages in thread From: Ziyang Xuan (William) @ 2021-11-23 9:01 UTC (permalink / raw) To: Petr Machata; +Cc: Jakub Kicinski, davem, jgg, netdev, linux-kernel > > Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: > >> I need some time to test my some ideas. And anyone has good ideas, please >> do not be stingy. > > Jakub Kicinski <kuba@kernel.org> writes: > >> I think we should move the dev_hold() to ndo_init(), otherwise it's >> hard to reason if destructor was invoked or not if >> register_netdevice() errors out. > > That makes sense to me. We always put real_dev in the destructor, so we > should always hold it in the constructor... Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute the following testcase: ip link add dev dummy1 type dummy ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection ip link del dev dummy1 Make the problem repro. The problem is solved using the following fix according to the Jakub's suggestion: diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index a3a0a5e994f5..abaa5d96ded2 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack) if (err) goto out_unregister_netdev; - /* Account for reference in struct vlan_dev_priv */ - dev_hold(real_dev); - vlan_stacked_transfer_operstate(real_dev, dev, vlan); linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */ diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index ab6dee28536d..a54535cbcf4c 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev) if (!vlan->vlan_pcpu_stats) return -ENOMEM; + /* Get vlan's reference to real_dev */ + dev_hold(real_dev); If there is not any other idea and objection, I will send the fix patch later. Thank you! ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-23 9:01 ` Ziyang Xuan (William) @ 2021-11-23 12:35 ` Petr Machata 2021-11-25 11:33 ` Petr Machata 0 siblings, 1 reply; 18+ messages in thread From: Petr Machata @ 2021-11-23 12:35 UTC (permalink / raw) To: Ziyang Xuan (William) Cc: Petr Machata, Jakub Kicinski, davem, jgg, netdev, linux-kernel Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: >> >> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: >> >>> I need some time to test my some ideas. And anyone has good ideas, please >>> do not be stingy. >> >> Jakub Kicinski <kuba@kernel.org> writes: >> >>> I think we should move the dev_hold() to ndo_init(), otherwise it's >>> hard to reason if destructor was invoked or not if >>> register_netdevice() errors out. >> >> That makes sense to me. We always put real_dev in the destructor, so we >> should always hold it in the constructor... > > Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute > the following testcase: > > ip link add dev dummy1 type dummy > ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection > ip link del dev dummy1 > > Make the problem repro. The problem is solved using the following fix > according to the Jakub's suggestion: > > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index a3a0a5e994f5..abaa5d96ded2 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack) > if (err) > goto out_unregister_netdev; > > - /* Account for reference in struct vlan_dev_priv */ > - dev_hold(real_dev); > - > vlan_stacked_transfer_operstate(real_dev, dev, vlan); > linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */ > > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c > index ab6dee28536d..a54535cbcf4c 100644 > --- a/net/8021q/vlan_dev.c > +++ b/net/8021q/vlan_dev.c > @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev) > if (!vlan->vlan_pcpu_stats) > return -ENOMEM; > > + /* Get vlan's reference to real_dev */ > + dev_hold(real_dev); > > > If there is not any other idea and objection, I will send the fix patch later. > > Thank you! This fixes the issue in my repro, and does not seems to break anything. We'll take it to full regression overnight, I'll report back tomorrow about the result. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-23 12:35 ` Petr Machata @ 2021-11-25 11:33 ` Petr Machata 2021-11-26 1:48 ` Ziyang Xuan (William) 0 siblings, 1 reply; 18+ messages in thread From: Petr Machata @ 2021-11-25 11:33 UTC (permalink / raw) To: Petr Machata Cc: Ziyang Xuan (William), Jakub Kicinski, davem, jgg, netdev, linux-kernel Petr Machata <petrm@nvidia.com> writes: > Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: > >>> >>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: >>> >>>> I need some time to test my some ideas. And anyone has good ideas, please >>>> do not be stingy. >>> >>> Jakub Kicinski <kuba@kernel.org> writes: >>> >>>> I think we should move the dev_hold() to ndo_init(), otherwise it's >>>> hard to reason if destructor was invoked or not if >>>> register_netdevice() errors out. >>> >>> That makes sense to me. We always put real_dev in the destructor, so we >>> should always hold it in the constructor... >> >> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute >> the following testcase: >> >> ip link add dev dummy1 type dummy >> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection >> ip link del dev dummy1 >> >> Make the problem repro. The problem is solved using the following fix >> according to the Jakub's suggestion: >> >> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >> index a3a0a5e994f5..abaa5d96ded2 100644 >> --- a/net/8021q/vlan.c >> +++ b/net/8021q/vlan.c >> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack) >> if (err) >> goto out_unregister_netdev; >> >> - /* Account for reference in struct vlan_dev_priv */ >> - dev_hold(real_dev); >> - >> vlan_stacked_transfer_operstate(real_dev, dev, vlan); >> linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */ >> >> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c >> index ab6dee28536d..a54535cbcf4c 100644 >> --- a/net/8021q/vlan_dev.c >> +++ b/net/8021q/vlan_dev.c >> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev) >> if (!vlan->vlan_pcpu_stats) >> return -ENOMEM; >> >> + /* Get vlan's reference to real_dev */ >> + dev_hold(real_dev); >> >> >> If there is not any other idea and objection, I will send the fix patch later. >> >> Thank you! > > This fixes the issue in my repro, and does not seems to break anything. > We'll take it to full regression overnight, I'll report back tomorrow > about the result. Sorry, was AFK yesterday. It does look good. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-25 11:33 ` Petr Machata @ 2021-11-26 1:48 ` Ziyang Xuan (William) 0 siblings, 0 replies; 18+ messages in thread From: Ziyang Xuan (William) @ 2021-11-26 1:48 UTC (permalink / raw) To: Petr Machata; +Cc: Jakub Kicinski, davem, jgg, netdev, linux-kernel > > Petr Machata <petrm@nvidia.com> writes: > >> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: >> >>>> >>>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes: >>>> >>>>> I need some time to test my some ideas. And anyone has good ideas, please >>>>> do not be stingy. >>>> >>>> Jakub Kicinski <kuba@kernel.org> writes: >>>> >>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's >>>>> hard to reason if destructor was invoked or not if >>>>> register_netdevice() errors out. >>>> >>>> That makes sense to me. We always put real_dev in the destructor, so we >>>> should always hold it in the constructor... >>> >>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute >>> the following testcase: >>> >>> ip link add dev dummy1 type dummy >>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection >>> ip link del dev dummy1 >>> >>> Make the problem repro. The problem is solved using the following fix >>> according to the Jakub's suggestion: >>> >>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >>> index a3a0a5e994f5..abaa5d96ded2 100644 >>> --- a/net/8021q/vlan.c >>> +++ b/net/8021q/vlan.c >>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack) >>> if (err) >>> goto out_unregister_netdev; >>> >>> - /* Account for reference in struct vlan_dev_priv */ >>> - dev_hold(real_dev); >>> - >>> vlan_stacked_transfer_operstate(real_dev, dev, vlan); >>> linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */ >>> >>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c >>> index ab6dee28536d..a54535cbcf4c 100644 >>> --- a/net/8021q/vlan_dev.c >>> +++ b/net/8021q/vlan_dev.c >>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev) >>> if (!vlan->vlan_pcpu_stats) >>> return -ENOMEM; >>> >>> + /* Get vlan's reference to real_dev */ >>> + dev_hold(real_dev); >>> >>> >>> If there is not any other idea and objection, I will send the fix patch later. >>> >>> Thank you! >> >> This fixes the issue in my repro, and does not seems to break anything. >> We'll take it to full regression overnight, I'll report back tomorrow >> about the result. > > Sorry, was AFK yesterday. It does look good. > . Thank you for your efforts very much! I have sent the fix patch. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() 2021-11-15 17:04 ` Petr Machata 2021-11-15 17:49 ` Jakub Kicinski @ 2021-11-19 3:04 ` Ziyang Xuan (William) 1 sibling, 0 replies; 18+ messages in thread From: Ziyang Xuan (William) @ 2021-11-19 3:04 UTC (permalink / raw) To: Petr Machata; +Cc: davem, kuba, jgg, netdev, linux-kernel > > Ziyang Xuan <william.xuanziyang@huawei.com> writes: > >> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c >> index 55275ef9a31a..a3a0a5e994f5 100644 >> --- a/net/8021q/vlan.c >> +++ b/net/8021q/vlan.c >> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) >> } >> >> vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id); >> - >> - /* Get rid of the vlan's reference to real_dev */ >> - dev_put(real_dev); >> } >> >> int vlan_check_real_dev(struct net_device *real_dev, >> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c >> index 0c21d1fec852..aeeb5f90417b 100644 >> --- a/net/8021q/vlan_dev.c >> +++ b/net/8021q/vlan_dev.c >> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev) >> >> free_percpu(vlan->vlan_pcpu_stats); >> vlan->vlan_pcpu_stats = NULL; >> + >> + /* Get rid of the vlan's reference to real_dev */ >> + dev_put(vlan->real_dev); >> } >> >> void vlan_setup(struct net_device *dev) > > This is causing reference counting issues when vetoing is involved. > Consider the following snippet: > > ip link add name bond1 type bond mode 802.3ad > ip link set dev swp1 master bond1 > ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100 > # ^ vetoed, no netdevice created > ip link del dev bond1 > > The setup process goes like this: vlan_newlink() calls > register_vlan_dev() calls netdev_upper_dev_link() calls > __netdev_upper_dev_link(), which issues a notifier > NETDEV_PRECHANGEUPPER, which yields a non-zero error, > because a listener vetoed it. > > So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends > up decreasing reference count of the real_dev. Then when when the bond > netdevice is removed, we get an endless loop of: > > kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 > > Moving the dev_hold(real_dev) to always happen even if the > netdev_upper_dev_link() call makes the issue go away. > > I'm not sure why this wasn't happening before. After the veto, > register_vlan_dev() follows with a goto out_unregister_netdev, which > calls unregister_netdevice() calls unregister_netdevice_queue(), which > issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(), > which calls unregister_vlan_dev(), which used to dev_put(real_dev), > which seems like it should have caused the same issue. Dunno. netdev_upper_dev_link() failure in register_vlan_dev() will result in no dev_hold(real_dev) and vlan_group_set_device(). So NETDEV_UNREGISTER notifier caused by vlan_dev invokes vlan_device_event() will not find the vlan_dev in vlan_group, and no unregister_vlan_dev() for the vlan_dev. > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2021-11-26 1:50 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-02 2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan 2021-11-03 13:50 ` Jason Gunthorpe 2021-11-03 15:47 ` Jakub Kicinski 2021-11-03 16:11 ` Jason Gunthorpe 2021-11-03 14:30 ` patchwork-bot+netdevbpf 2021-11-15 17:04 ` Petr Machata 2021-11-15 17:49 ` Jakub Kicinski 2021-11-16 14:20 ` Petr Machata 2021-11-17 11:50 ` Petr Machata 2021-11-18 1:46 ` Ziyang Xuan (William) 2021-11-18 14:17 ` Jakub Kicinski 2021-11-19 3:29 ` Ziyang Xuan (William) 2021-11-19 10:07 ` Petr Machata 2021-11-23 9:01 ` Ziyang Xuan (William) 2021-11-23 12:35 ` Petr Machata 2021-11-25 11:33 ` Petr Machata 2021-11-26 1:48 ` Ziyang Xuan (William) 2021-11-19 3:04 ` Ziyang Xuan (William)
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.