All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
@ 2021-10-27 12:16 Ziyang Xuan
  2021-10-28  1:46 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Ziyang Xuan @ 2021-10-27 12:16 UTC (permalink / raw)
  To: davem, kuba; +Cc: jgg, netdev, linux-kernel

The real_dev of a vlan net_device may be freed after
unregister_vlan_dev(). Access the real_dev continually by
vlan_dev_real_dev() will trigger the UAF problem for the
real_dev like following:

==================================================================
BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
Call Trace:
 kasan_report.cold+0x83/0xdf
 vlan_dev_real_dev+0xf9/0x120
 is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
 is_eth_port_of_netdev_filter+0x28/0x40
 ib_enum_roce_netdev+0x1a3/0x300
 ib_enum_all_roce_netdevs+0xc7/0x140
 netdevice_event_work_handler+0x9d/0x210
...

Freed by task 9288:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0xfc/0x130
 slab_free_freelist_hook+0xdd/0x240
 kfree+0xe4/0x690
 kvfree+0x42/0x50
 device_release+0x9f/0x240
 kobject_put+0x1c8/0x530
 put_device+0x1b/0x30
 free_netdev+0x370/0x540
 ppp_destroy_interface+0x313/0x3d0
...

Set vlan->real_dev to NULL after dev_put(real_dev) in
unregister_vlan_dev(). Check real_dev is not NULL before
access it in vlan_dev_real_dev().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 net/8021q/vlan.c      | 1 +
 net/8021q/vlan_core.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 55275ef9a31a..1106da84e725 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 
 	/* Get rid of the vlan's reference to real_dev */
 	dev_put(real_dev);
+	vlan->real_dev = NULL;
 }
 
 int vlan_check_real_dev(struct net_device *real_dev,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 59bc13b5f14f..343f34479d8b 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
 {
 	struct net_device *ret = vlan_dev_priv(dev)->real_dev;
 
-	while (is_vlan_dev(ret))
+	while (ret && is_vlan_dev(ret))
 		ret = vlan_dev_priv(ret)->real_dev;
 
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-27 12:16 [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
@ 2021-10-28  1:46 ` Jakub Kicinski
  2021-10-28  5:44   ` Leon Romanovsky
  2021-10-28 11:45   ` Jason Gunthorpe
  0 siblings, 2 replies; 10+ messages in thread
From: Jakub Kicinski @ 2021-10-28  1:46 UTC (permalink / raw)
  To: Ziyang Xuan; +Cc: davem, jgg, netdev, linux-kernel, linux-rdma

On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
> 
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
>  kasan_report.cold+0x83/0xdf
>  vlan_dev_real_dev+0xf9/0x120
>  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
>  is_eth_port_of_netdev_filter+0x28/0x40
>  ib_enum_roce_netdev+0x1a3/0x300
>  ib_enum_all_roce_netdevs+0xc7/0x140
>  netdevice_event_work_handler+0x9d/0x210
> ...
> 
> Freed by task 9288:
>  kasan_save_stack+0x1b/0x40
>  kasan_set_track+0x1c/0x30
>  kasan_set_free_info+0x20/0x30
>  __kasan_slab_free+0xfc/0x130
>  slab_free_freelist_hook+0xdd/0x240
>  kfree+0xe4/0x690
>  kvfree+0x42/0x50
>  device_release+0x9f/0x240
>  kobject_put+0x1c8/0x530
>  put_device+0x1b/0x30
>  free_netdev+0x370/0x540
>  ppp_destroy_interface+0x313/0x3d0
> ...
> 
> Set vlan->real_dev to NULL after dev_put(real_dev) in
> unregister_vlan_dev(). Check real_dev is not NULL before
> access it in vlan_dev_real_dev().
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  net/8021q/vlan.c      | 1 +
>  net/8021q/vlan_core.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 55275ef9a31a..1106da84e725 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>  
>  	/* Get rid of the vlan's reference to real_dev */
>  	dev_put(real_dev);
> +	vlan->real_dev = NULL;
>  }
>  
>  int vlan_check_real_dev(struct net_device *real_dev,
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index 59bc13b5f14f..343f34479d8b 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
>  {
>  	struct net_device *ret = vlan_dev_priv(dev)->real_dev;
>  
> -	while (is_vlan_dev(ret))
> +	while (ret && is_vlan_dev(ret))
>  		ret = vlan_dev_priv(ret)->real_dev;
>  
>  	return ret;

But will make all the callers of vlan_dev_real_dev() feel like they
should NULL-check the result, which is not necessary.

RDMA must be calling this helper on a vlan which was already
unregistered, can we fix RDMA instead?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-28  1:46 ` Jakub Kicinski
@ 2021-10-28  5:44   ` Leon Romanovsky
  2021-10-28 11:45   ` Jason Gunthorpe
  1 sibling, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-10-28  5:44 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ziyang Xuan, davem, jgg, netdev, linux-kernel, linux-rdma

On Wed, Oct 27, 2021 at 06:46:40PM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> > The real_dev of a vlan net_device may be freed after
> > unregister_vlan_dev(). Access the real_dev continually by
> > vlan_dev_real_dev() will trigger the UAF problem for the
> > real_dev like following:
> > 
> > ==================================================================
> > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> > Call Trace:
> >  kasan_report.cold+0x83/0xdf
> >  vlan_dev_real_dev+0xf9/0x120
> >  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> >  is_eth_port_of_netdev_filter+0x28/0x40
> >  ib_enum_roce_netdev+0x1a3/0x300
> >  ib_enum_all_roce_netdevs+0xc7/0x140
> >  netdevice_event_work_handler+0x9d/0x210
> > ...
> > 
> > Freed by task 9288:
> >  kasan_save_stack+0x1b/0x40
> >  kasan_set_track+0x1c/0x30
> >  kasan_set_free_info+0x20/0x30
> >  __kasan_slab_free+0xfc/0x130
> >  slab_free_freelist_hook+0xdd/0x240
> >  kfree+0xe4/0x690
> >  kvfree+0x42/0x50
> >  device_release+0x9f/0x240
> >  kobject_put+0x1c8/0x530
> >  put_device+0x1b/0x30
> >  free_netdev+0x370/0x540
> >  ppp_destroy_interface+0x313/0x3d0
> > ...
> > 
> > Set vlan->real_dev to NULL after dev_put(real_dev) in
> > unregister_vlan_dev(). Check real_dev is not NULL before
> > access it in vlan_dev_real_dev().
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
> > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> > ---
> >  net/8021q/vlan.c      | 1 +
> >  net/8021q/vlan_core.c | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..1106da84e725 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >  
> >  	/* Get rid of the vlan's reference to real_dev */
> >  	dev_put(real_dev);
> > +	vlan->real_dev = NULL;
> >  }
> >  
> >  int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> > index 59bc13b5f14f..343f34479d8b 100644
> > --- a/net/8021q/vlan_core.c
> > +++ b/net/8021q/vlan_core.c
> > @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
> >  {
> >  	struct net_device *ret = vlan_dev_priv(dev)->real_dev;
> >  
> > -	while (is_vlan_dev(ret))
> > +	while (ret && is_vlan_dev(ret))
> >  		ret = vlan_dev_priv(ret)->real_dev;
> >  
> >  	return ret;
> 
> But will make all the callers of vlan_dev_real_dev() feel like they
> should NULL-check the result, which is not necessary.
> 
> RDMA must be calling this helper on a vlan which was already
> unregistered, can we fix RDMA instead?

He tried to fix RDMA first, but we came to conclusion that we real
solution is in netdev land.

https://lore.kernel.org/linux-rdma/20211025163941.GA393143@nvidia.com/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-28  1:46 ` Jakub Kicinski
  2021-10-28  5:44   ` Leon Romanovsky
@ 2021-10-28 11:45   ` Jason Gunthorpe
  2021-10-28 14:00     ` Jakub Kicinski
  1 sibling, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-10-28 11:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ziyang Xuan, davem, netdev, linux-kernel, linux-rdma

On Wed, Oct 27, 2021 at 06:46:40PM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 20:16:06 +0800 Ziyang Xuan wrote:
> > The real_dev of a vlan net_device may be freed after
> > unregister_vlan_dev(). Access the real_dev continually by
> > vlan_dev_real_dev() will trigger the UAF problem for the
> > real_dev like following:
> > 
> > ==================================================================
> > BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> > Call Trace:
> >  kasan_report.cold+0x83/0xdf
> >  vlan_dev_real_dev+0xf9/0x120
> >  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
> >  is_eth_port_of_netdev_filter+0x28/0x40
> >  ib_enum_roce_netdev+0x1a3/0x300
> >  ib_enum_all_roce_netdevs+0xc7/0x140
> >  netdevice_event_work_handler+0x9d/0x210
> > ...
> > 
> > Freed by task 9288:
> >  kasan_save_stack+0x1b/0x40
> >  kasan_set_track+0x1c/0x30
> >  kasan_set_free_info+0x20/0x30
> >  __kasan_slab_free+0xfc/0x130
> >  slab_free_freelist_hook+0xdd/0x240
> >  kfree+0xe4/0x690
> >  kvfree+0x42/0x50
> >  device_release+0x9f/0x240
> >  kobject_put+0x1c8/0x530
> >  put_device+0x1b/0x30
> >  free_netdev+0x370/0x540
> >  ppp_destroy_interface+0x313/0x3d0
> > ...
> > 
> > Set vlan->real_dev to NULL after dev_put(real_dev) in
> > unregister_vlan_dev(). Check real_dev is not NULL before
> > access it in vlan_dev_real_dev().
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
> > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> >  net/8021q/vlan.c      | 1 +
> >  net/8021q/vlan_core.c | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..1106da84e725 100644
> > +++ b/net/8021q/vlan.c
> > @@ -126,6 +126,7 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >  
> >  	/* Get rid of the vlan's reference to real_dev */
> >  	dev_put(real_dev);
> > +	vlan->real_dev = NULL;
> >  }
> >  
> >  int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> > index 59bc13b5f14f..343f34479d8b 100644
> > +++ b/net/8021q/vlan_core.c
> > @@ -103,7 +103,7 @@ struct net_device *vlan_dev_real_dev(const struct net_device *dev)
> >  {
> >  	struct net_device *ret = vlan_dev_priv(dev)->real_dev;
> >  
> > -	while (is_vlan_dev(ret))
> > +	while (ret && is_vlan_dev(ret))
> >  		ret = vlan_dev_priv(ret)->real_dev;
> >  
> >  	return ret;
> 
> But will make all the callers of vlan_dev_real_dev() feel like they
> should NULL-check the result, which is not necessary.

Isn't it better to reliably return NULL instead of a silent UAF in
this edge case? 

> RDMA must be calling this helper on a vlan which was already
> unregistered, can we fix RDMA instead?

RDMA holds a get on the netdev which prevents unregistration, however
unregister_vlan_dev() does:

        unregister_netdevice_queue(dev, head);
        dev_put(real_dev);

Which corrupts the still registered vlan device while it is sitting in
the queue waiting to unregister. So, it is not true that a registered
vlan device always has working vlan_dev_real_dev().

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-28 11:45   ` Jason Gunthorpe
@ 2021-10-28 14:00     ` Jakub Kicinski
  2021-10-29  7:04       ` Ziyang Xuan (William)
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2021-10-28 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Ziyang Xuan, davem, netdev, linux-kernel, linux-rdma

On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
> > But will make all the callers of vlan_dev_real_dev() feel like they
> > should NULL-check the result, which is not necessary.  
> 
> Isn't it better to reliably return NULL instead of a silent UAF in
> this edge case? 

I don't know what the best practice is for maintaining sanity of
unregistered objects.

If there really is a requirement for the real_dev pointer to be sane we
may want to move the put_device(real_dev) to vlan_dev_free(). There
should not be any risk of circular dependency but I'm not 100% sure.

> > RDMA must be calling this helper on a vlan which was already
> > unregistered, can we fix RDMA instead?  
> 
> RDMA holds a get on the netdev which prevents unregistration, however
> unregister_vlan_dev() does:
> 
>         unregister_netdevice_queue(dev, head);
>         dev_put(real_dev);
> 
> Which corrupts the still registered vlan device while it is sitting in
> the queue waiting to unregister. So, it is not true that a registered
> vlan device always has working vlan_dev_real_dev().

That's not my reading, unless we have a different definition of
"registered". The RDMA code in question runs from a workqueue, at the
time the UNREGISTER notification is generated all objects are still
alive and no UAF can happen. Past UNREGISTER extra care is needed when
accessing the object.

Note that unregister_vlan_dev() may queue the unregistration, without
running it. If it clears real_dev the UNREGISTER notification will no
longer be able to access real_dev, which used to be completely legal.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-28 14:00     ` Jakub Kicinski
@ 2021-10-29  7:04       ` Ziyang Xuan (William)
  2021-10-29 12:13         ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Ziyang Xuan (William) @ 2021-10-29  7:04 UTC (permalink / raw)
  To: Jakub Kicinski, Jason Gunthorpe; +Cc: davem, netdev, linux-kernel, linux-rdma

> On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
>>> But will make all the callers of vlan_dev_real_dev() feel like they
>>> should NULL-check the result, which is not necessary.  
>>
>> Isn't it better to reliably return NULL instead of a silent UAF in
>> this edge case? 
> 
> I don't know what the best practice is for maintaining sanity of
> unregistered objects.
> 
> If there really is a requirement for the real_dev pointer to be sane we
> may want to move the put_device(real_dev) to vlan_dev_free(). There
> should not be any risk of circular dependency but I'm not 100% sure.
> 
>>> RDMA must be calling this helper on a vlan which was already
>>> unregistered, can we fix RDMA instead?  
>>
>> RDMA holds a get on the netdev which prevents unregistration, however
>> unregister_vlan_dev() does:
>>
>>         unregister_netdevice_queue(dev, head);
>>         dev_put(real_dev);
>>
>> Which corrupts the still registered vlan device while it is sitting in
>> the queue waiting to unregister. So, it is not true that a registered
>> vlan device always has working vlan_dev_real_dev().
> 
> That's not my reading, unless we have a different definition of
> "registered". The RDMA code in question runs from a workqueue, at the
> time the UNREGISTER notification is generated all objects are still
> alive and no UAF can happen. Past UNREGISTER extra care is needed when
> accessing the object.
> 
> Note that unregister_vlan_dev() may queue the unregistration, without
> running it. If it clears real_dev the UNREGISTER notification will no
> longer be able to access real_dev, which used to be completely legal.
> .
> 

I am sorry. I have made a misunderstanding and given a wrong conclusion
that unregister_vlan_dev() just move the vlan_ndev to a list to unregister
later and it is possible the real_dev has been freed when we access in
netdevice_queue_work().

real_ndev UNREGISTE trigger NETDEV_UNREGISTER notification in
vlan_device_event(), unregister_vlan_dev() and unregister_netdevice_many()
are within real_ndev UNREGISTE process. real_dev and vlan_ndev are all
alive before real_ndev UNREGISTE finished.

Above is the correction for my previous misunderstanding. But the real
scenario of the problem is as following:

__rtnl_newlink
vlan_newlink
register_vlan_dev(vlan_ndev, ...)
register_netdevice(vlan_ndev)
netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)]
queue_work(gid_cache_wq, ...)
...
rtnl_configure_link(vlan_ndev, ...) [failed]
ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev]
unregister_netdevice_many(&list_kill)
...
ppp_release
unregister_netdevice(real_dev)
ppp_destroy_interface
free_netdev(real_dev)
netdev_freemem(real_dev) [real_dev freed]
...
netdevice_event_work_handler [vlan_ndev NETDEV_REGISTER notifier work]
is_eth_port_of_netdev_filter
vlan_dev_real_dev [real_dev UAF]

So my first solution as following for the problem is correct.
https://lore.kernel.org/linux-rdma/20211025163941.GA393143@nvidia.com/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

Thank you!


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-29  7:04       ` Ziyang Xuan (William)
@ 2021-10-29 12:13         ` Jason Gunthorpe
  2021-10-29 13:46           ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-10-29 12:13 UTC (permalink / raw)
  To: Ziyang Xuan (William)
  Cc: Jakub Kicinski, davem, netdev, linux-kernel, linux-rdma

On Fri, Oct 29, 2021 at 03:04:35PM +0800, Ziyang Xuan (William) wrote:
> > On Thu, 28 Oct 2021 08:45:03 -0300 Jason Gunthorpe wrote:
> >>> But will make all the callers of vlan_dev_real_dev() feel like they
> >>> should NULL-check the result, which is not necessary.  
> >>
> >> Isn't it better to reliably return NULL instead of a silent UAF in
> >> this edge case? 
> > 
> > I don't know what the best practice is for maintaining sanity of
> > unregistered objects.
> > 
> > If there really is a requirement for the real_dev pointer to be sane we
> > may want to move the put_device(real_dev) to vlan_dev_free(). There
> > should not be any risk of circular dependency but I'm not 100% sure.
> > 
> >>> RDMA must be calling this helper on a vlan which was already
> >>> unregistered, can we fix RDMA instead?  
> >>
> >> RDMA holds a get on the netdev which prevents unregistration, however
> >> unregister_vlan_dev() does:
> >>
> >>         unregister_netdevice_queue(dev, head);
> >>         dev_put(real_dev);
> >>
> >> Which corrupts the still registered vlan device while it is sitting in
> >> the queue waiting to unregister. So, it is not true that a registered
> >> vlan device always has working vlan_dev_real_dev().
> > 
> > That's not my reading, unless we have a different definition of
> > "registered". The RDMA code in question runs from a workqueue, at the
> > time the UNREGISTER notification is generated all objects are still
> > alive and no UAF can happen. Past UNREGISTER extra care is needed when
> > accessing the object.
> > 
> > Note that unregister_vlan_dev() may queue the unregistration, without
> > running it. If it clears real_dev the UNREGISTER notification will no
> > longer be able to access real_dev, which used to be completely legal.
> > .
> > 
> 
> I am sorry. I have made a misunderstanding and given a wrong conclusion
> that unregister_vlan_dev() just move the vlan_ndev to a list to unregister
> later and it is possible the real_dev has been freed when we access in
> netdevice_queue_work().
> 
> real_ndev UNREGISTE trigger NETDEV_UNREGISTER notification in
> vlan_device_event(), unregister_vlan_dev() and unregister_netdevice_many()
> are within real_ndev UNREGISTE process. real_dev and vlan_ndev are all
> alive before real_ndev UNREGISTE finished.
> 
> Above is the correction for my previous misunderstanding. But the real
> scenario of the problem is as following:
> 
> __rtnl_newlink
> vlan_newlink
> register_vlan_dev(vlan_ndev, ...)
> register_netdevice(vlan_ndev)
> netdevice_queue_work(..., vlan_ndev) [dev_hold(vlan_ndev)]
> queue_work(gid_cache_wq, ...)

This is exactly what I'm saying, the rdma code saw a registered device
and captured a ref on it, passing it to a work queue.

> rtnl_configure_link(vlan_ndev, ...) [failed]
> ops->dellink(vlan_ndev, &list_kill) [unregister_vlan_dev]
	/* Get rid of the vlan's reference to real_dev */
	dev_put(real_dev);
> unregister_netdevice_many(&list_kill)

Then it released the real_dev reference, leaving a dangled pointer and
goes into unregister_netdevice_many which does:

		dev->reg_state = NETREG_UNREGISTERING;
and eventually

		net_set_todo(dev);

then unlocks RTNL. The get prevents it from progressing past
NETREG_UNREGISTERING

Now later we touch the vlan dev, it is reg_state UNREGISTERED and it's
memory is corrupted because it dropped the ref it was holding on the
pointer it returns, which has now since been freed.

The only reason the dangled pointer doesn't cause larger problems, is
because rtnl saves it - but continuing to reference a pointer that no
longer has a valid ref is certainly a bad practice.

> So my first solution as following for the problem is correct.
> https://lore.kernel.org/linux-rdma/20211025163941.GA393143@nvidia.com/T/#m44abbf1ea5e4b5237610c1b389c3340d92a03b8d

No, it still isn't.

Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
in the work queue, but that feels pretty hacky to me as the main point
of the UNREGISTERING state is to keep the object alive enough that
those with outstanding gets can compelte their work and release the
get. Leaving a wrecked object in UNREGISTERING is a bad design.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-29 12:13         ` Jason Gunthorpe
@ 2021-10-29 13:46           ` Jakub Kicinski
  2021-10-29 18:47             ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2021-10-29 13:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ziyang Xuan (William), davem, netdev, linux-kernel, linux-rdma

On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
> Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
> in the work queue, but that feels pretty hacky to me as the main point
> of the UNREGISTERING state is to keep the object alive enough that
> those with outstanding gets can compelte their work and release the
> get. Leaving a wrecked object in UNREGISTERING is a bad design.

That or we should investigate if we could hold the ref for real_dev all
the way until vlan_dev_free().

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-29 13:46           ` Jakub Kicinski
@ 2021-10-29 18:47             ` Jason Gunthorpe
  2021-10-30  4:09               ` Ziyang Xuan (William)
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-10-29 18:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ziyang Xuan (William), davem, netdev, linux-kernel, linux-rdma

On Fri, Oct 29, 2021 at 06:46:10AM -0700, Jakub Kicinski wrote:
> On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
> > Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
> > in the work queue, but that feels pretty hacky to me as the main point
> > of the UNREGISTERING state is to keep the object alive enough that
> > those with outstanding gets can compelte their work and release the
> > get. Leaving a wrecked object in UNREGISTERING is a bad design.
> 
> That or we should investigate if we could hold the ref for real_dev all
> the way until vlan_dev_free().

The latter is certainly better if it works out, no circular deps, etc.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-10-29 18:47             ` Jason Gunthorpe
@ 2021-10-30  4:09               ` Ziyang Xuan (William)
  0 siblings, 0 replies; 10+ messages in thread
From: Ziyang Xuan (William) @ 2021-10-30  4:09 UTC (permalink / raw)
  To: Jason Gunthorpe, Jakub Kicinski; +Cc: davem, netdev, linux-kernel, linux-rdma

> On Fri, Oct 29, 2021 at 06:46:10AM -0700, Jakub Kicinski wrote:
>> On Fri, 29 Oct 2021 09:13:24 -0300 Jason Gunthorpe wrote:
>>> Jakub's path would be to test vlan_dev->reg_state != NETREG_REGISTERED
>>> in the work queue, but that feels pretty hacky to me as the main point
>>> of the UNREGISTERING state is to keep the object alive enough that
>>> those with outstanding gets can compelte their work and release the
>>> get. Leaving a wrecked object in UNREGISTERING is a bad design.
>>
>> That or we should investigate if we could hold the ref for real_dev all
>> the way until vlan_dev_free().
> 

Synchronize test results with the following modification:

@@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
        }

        vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
-
-       /* Get rid of the vlan's reference to real_dev */
-       dev_put(real_dev);
 }

@@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)

        free_percpu(vlan->vlan_pcpu_stats);
        vlan->vlan_pcpu_stats = NULL;
+
+       /* Get rid of the vlan's reference to real_dev */
+       dev_put(vlan->real_dev);
 }

It works on the UAF problem. And I have taken kmemleak tests for vlan_dev and real_dev,
no kmemleak problem and new UAF problem.

So take this modification for this problem?

> The latter is certainly better if it works out, no circular deps, etc.
> 
> Thanks,
> Jason
> .
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-10-30  4:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-27 12:16 [PATCH net] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
2021-10-28  1:46 ` Jakub Kicinski
2021-10-28  5:44   ` Leon Romanovsky
2021-10-28 11:45   ` Jason Gunthorpe
2021-10-28 14:00     ` Jakub Kicinski
2021-10-29  7:04       ` Ziyang Xuan (William)
2021-10-29 12:13         ` Jason Gunthorpe
2021-10-29 13:46           ` Jakub Kicinski
2021-10-29 18:47             ` Jason Gunthorpe
2021-10-30  4:09               ` Ziyang Xuan (William)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.