netdev.vger.kernel.org archive mirror
* [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
@ 2022-06-17  7:29 Jason Wang
  2022-06-17 10:12 ` Michael S. Tsirkin
  2022-06-27  6:33 ` Jason Wang
  0 siblings, 2 replies; 6+ messages in thread
From: Jason Wang @ 2022-06-17  7:29 UTC
  To: mst, jasowang, davem, kuba; +Cc: virtualization, netdev, linux-kernel

We used to call virtio_device_ready() after netdev registration. This
causes a race between ndo_open() and virtio_device_ready(): if
ndo_open() is called before virtio_device_ready(), the driver may
start to use the device before DRIVER_OK, which violates the spec.

Fix this by switching to register_netdevice() and protecting
virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
only be called after virtio_device_ready().

Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index db05b5e930be..8a5810bcb839 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (vi->has_rss || vi->has_rss_hash_report)
 		virtnet_init_default_rss(vi);
 
-	err = register_netdev(dev);
+	/* serialize netdev register + virtio_device_ready() with ndo_open() */
+	rtnl_lock();
+
+	err = register_netdevice(dev);
 	if (err) {
 		pr_debug("virtio_net: registering device failed\n");
+		rtnl_unlock();
 		goto free_failover;
 	}
 
 	virtio_device_ready(vdev);
 
+	rtnl_unlock();
+
 	err = virtnet_cpu_notif_add(vi);
 	if (err) {
 		pr_debug("virtio_net: registering cpu notifier failed\n");
-- 
2.25.1
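
The ordering argument in the commit message can be illustrated with a
small userspace model. This is a sketch under stated assumptions, not the
driver code: a pthread mutex stands in for rtnl_lock(), two flags stand in
for netdev registration and the DRIVER_OK status bit, and the names
opener(), probe_and_count_early_opens() and early_opens are hypothetical.
It relies on ndo_open() being invoked with RTNL held, which is what the
patch depends on.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Toy model: every name below is a hypothetical stand-in, not a kernel API. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER; /* models rtnl_lock() */
static bool registered;   /* netdev is visible, so an open may be attempted */
static bool driver_ok;    /* models the DRIVER_OK status bit */
static int early_opens;   /* opens that ran before DRIVER_OK was set */

/* Models ndo_open(): it only ever runs while the lock is held. */
static void *opener(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&rtnl);
		if (registered) {
			if (!driver_ok)
				early_opens++;
			pthread_mutex_unlock(&rtnl);
			return NULL;
		}
		pthread_mutex_unlock(&rtnl);
		sched_yield(); /* let the probe thread take the lock */
	}
}

/* Models the patched probe path: register and set DRIVER_OK in one lock hold. */
static int probe_and_count_early_opens(void)
{
	pthread_t t;

	registered = false;
	driver_ok = false;
	early_opens = 0;
	if (pthread_create(&t, NULL, opener, NULL))
		return -1;

	pthread_mutex_lock(&rtnl);
	registered = true;  /* models register_netdevice() */
	driver_ok = true;   /* models virtio_device_ready() */
	pthread_mutex_unlock(&rtnl);

	pthread_join(t, NULL);
	return early_opens;
}
```

Because both flags change inside a single critical section, the opener can
never observe registered without driver_ok. Releasing the lock between the
two assignments in this model would reopen the window the patch closes.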



* Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
  2022-06-17  7:29 [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready() Jason Wang
@ 2022-06-17 10:12 ` Michael S. Tsirkin
  2022-06-17 11:46   ` Jason Wang
  2022-06-27  6:33 ` Jason Wang
  1 sibling, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2022-06-17 10:12 UTC
  To: Jason Wang; +Cc: davem, kuba, virtualization, netdev, linux-kernel

On Fri, Jun 17, 2022 at 03:29:49PM +0800, Jason Wang wrote:
> We used to call virtio_device_ready() after netdev registration. This
> causes a race between ndo_open() and virtio_device_ready(): if
> ndo_open() is called before virtio_device_ready(), the driver may
> start to use the device before DRIVER_OK, which violates the spec.
> 
> Fix this by switching to register_netdevice() and protecting
> virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> only be called after virtio_device_ready().
> 
> Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index db05b5e930be..8a5810bcb839 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (vi->has_rss || vi->has_rss_hash_report)
>  		virtnet_init_default_rss(vi);
>  
> -	err = register_netdev(dev);
> +	/* serialize netdev register + virtio_device_ready() with ndo_open() */
> +	rtnl_lock();
> +
> +	err = register_netdevice(dev);
>  	if (err) {
>  		pr_debug("virtio_net: registering device failed\n");
> +		rtnl_unlock();
>  		goto free_failover;
>  	}
>  
>  	virtio_device_ready(vdev);
>  
> +	rtnl_unlock();
> +
>  	err = virtnet_cpu_notif_add(vi);
>  	if (err) {
>  		pr_debug("virtio_net: registering cpu notifier failed\n");


Looks good but then don't we have the same issue when removing the
device?

Actually I looked at  virtnet_remove and I see
        unregister_netdev(vi->dev);

        net_failover_destroy(vi->failover);

        remove_vq_common(vi); <- this will reset the device

a window here?


Really, I think what we had originally was a better idea -
instead of dropping interrupts they were delayed and
when driver is ready to accept them it just enables them.
We just need to make sure driver does not wait for
interrupts before enabling them.

And I suspect we need to make this opt-in on a per driver
basis.
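
The "delay instead of drop" idea described above can be sketched as a toy
model. This is only an illustration under assumptions, not the virtio core
code: struct toy_irq and the toy_* functions are hypothetical names, and
the model assumes that any number of deferred interrupts collapse into a
single replayed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of "delay, don't drop"; not the virtio core code. */
struct toy_irq {
	bool enabled;  /* driver has said it is ready for callbacks */
	bool pending;  /* an interrupt arrived while delivery was off */
	int handled;   /* number of times the handler actually ran */
};

static void toy_handler(struct toy_irq *irq)
{
	irq->handled++;
}

/* Device side: raise an interrupt. */
static void toy_irq_raise(struct toy_irq *irq)
{
	if (!irq->enabled) {
		irq->pending = true; /* remember it instead of dropping it */
		return;
	}
	toy_handler(irq);
}

/* Driver side: enable delivery and replay anything that was deferred. */
static void toy_irq_enable(struct toy_irq *irq)
{
	irq->enabled = true;
	if (irq->pending) {
		irq->pending = false;
		toy_handler(irq);
	}
}
```

In this model an interrupt raised before toy_irq_enable() is not lost; it
runs once delivery is enabled. The model also shows the caveat mentioned
above: a driver that waits for the handler before calling toy_irq_enable()
would wait forever.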



> -- 
> 2.25.1



* Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
  2022-06-17 10:12 ` Michael S. Tsirkin
@ 2022-06-17 11:46   ` Jason Wang
  2022-06-17 12:32     ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2022-06-17 11:46 UTC
  To: Michael S. Tsirkin
  Cc: davem, Jakub Kicinski, virtualization, netdev, linux-kernel

On Fri, Jun 17, 2022 at 6:13 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jun 17, 2022 at 03:29:49PM +0800, Jason Wang wrote:
> > We used to call virtio_device_ready() after netdev registration. This
> > causes a race between ndo_open() and virtio_device_ready(): if
> > ndo_open() is called before virtio_device_ready(), the driver may
> > start to use the device before DRIVER_OK, which violates the spec.
> >
> > Fix this by switching to register_netdevice() and protecting
> > virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> > only be called after virtio_device_ready().
> >
> > Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index db05b5e930be..8a5810bcb839 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> >       if (vi->has_rss || vi->has_rss_hash_report)
> >               virtnet_init_default_rss(vi);
> >
> > -     err = register_netdev(dev);
> > +     /* serialize netdev register + virtio_device_ready() with ndo_open() */
> > +     rtnl_lock();
> > +
> > +     err = register_netdevice(dev);
> >       if (err) {
> >               pr_debug("virtio_net: registering device failed\n");
> > +             rtnl_unlock();
> >               goto free_failover;
> >       }
> >
> >       virtio_device_ready(vdev);
> >
> > +     rtnl_unlock();
> > +
> >       err = virtnet_cpu_notif_add(vi);
> >       if (err) {
> >               pr_debug("virtio_net: registering cpu notifier failed\n");
>
>
> Looks good but then don't we have the same issue when removing the
> device?
>
> Actually I looked at  virtnet_remove and I see
>         unregister_netdev(vi->dev);
>
>         net_failover_destroy(vi->failover);
>
>         remove_vq_common(vi); <- this will reset the device
>
> a window here?

Probably. For safety, we probably need to reset before unregistering.

>
>
> Really, I think what we had originally was a better idea -
> instead of dropping interrupts they were delayed and
> when driver is ready to accept them it just enables them.

The problem is that it works only on some specific setups:

- doesn't work with shared IRQs
- doesn't work for some specific drivers, e.g. virtio-blk

> We just need to make sure driver does not wait for
> interrupts before enabling them.
>
> And I suspect we need to make this opt-in on a per driver
> basis.

Exactly.

Thanks

>
>
>
> > --
> > 2.25.1
>



* Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
  2022-06-17 11:46   ` Jason Wang
@ 2022-06-17 12:32     ` Michael S. Tsirkin
  2022-06-20  3:34       ` Jason Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2022-06-17 12:32 UTC
  To: Jason Wang; +Cc: davem, Jakub Kicinski, virtualization, netdev, linux-kernel

On Fri, Jun 17, 2022 at 07:46:23PM +0800, Jason Wang wrote:
> On Fri, Jun 17, 2022 at 6:13 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Jun 17, 2022 at 03:29:49PM +0800, Jason Wang wrote:
> > > We used to call virtio_device_ready() after netdev registration. This
> > > causes a race between ndo_open() and virtio_device_ready(): if
> > > ndo_open() is called before virtio_device_ready(), the driver may
> > > start to use the device before DRIVER_OK, which violates the spec.
> > >
> > > Fix this by switching to register_netdevice() and protecting
> > > virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> > > only be called after virtio_device_ready().
> > >
> > > Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/net/virtio_net.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index db05b5e930be..8a5810bcb839 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >       if (vi->has_rss || vi->has_rss_hash_report)
> > >               virtnet_init_default_rss(vi);
> > >
> > > -     err = register_netdev(dev);
> > > +     /* serialize netdev register + virtio_device_ready() with ndo_open() */
> > > +     rtnl_lock();
> > > +
> > > +     err = register_netdevice(dev);
> > >       if (err) {
> > >               pr_debug("virtio_net: registering device failed\n");
> > > +             rtnl_unlock();
> > >               goto free_failover;
> > >       }
> > >
> > >       virtio_device_ready(vdev);
> > >
> > > +     rtnl_unlock();
> > > +
> > >       err = virtnet_cpu_notif_add(vi);
> > >       if (err) {
> > >               pr_debug("virtio_net: registering cpu notifier failed\n");
> >
> >
> > Looks good but then don't we have the same issue when removing the
> > device?
> >
> > Actually I looked at  virtnet_remove and I see
> >         unregister_netdev(vi->dev);
> >
> >         net_failover_destroy(vi->failover);
> >
> >         remove_vq_common(vi); <- this will reset the device
> >
> > a window here?
> 
> Probably. For safety, we probably need to reset before unregistering.


careful not to create new races, let's analyse this one to be
sure first.

> >
> >
> > Really, I think what we had originally was a better idea -
> > instead of dropping interrupts they were delayed and
> > when driver is ready to accept them it just enables them.
> 
> The problem is that it works only on some specific setups:
> 
> - doesn't work with shared IRQs
> - doesn't work for some specific drivers, e.g. virtio-blk

can some core irq work fix that?

> > We just need to make sure driver does not wait for
> > interrupts before enabling them.
> >
> > And I suspect we need to make this opt-in on a per driver
> > basis.
> 
> Exactly.
> 
> Thanks
> 
> >
> >
> >
> > > --
> > > 2.25.1
> >



* Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
  2022-06-17 12:32     ` Michael S. Tsirkin
@ 2022-06-20  3:34       ` Jason Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Wang @ 2022-06-20  3:34 UTC
  To: Michael S. Tsirkin
  Cc: davem, Jakub Kicinski, virtualization, netdev, linux-kernel


On 2022/6/17 20:32, Michael S. Tsirkin wrote:
> On Fri, Jun 17, 2022 at 07:46:23PM +0800, Jason Wang wrote:
>> On Fri, Jun 17, 2022 at 6:13 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Fri, Jun 17, 2022 at 03:29:49PM +0800, Jason Wang wrote:
>>>> We used to call virtio_device_ready() after netdev registration. This
>>>> causes a race between ndo_open() and virtio_device_ready(): if
>>>> ndo_open() is called before virtio_device_ready(), the driver may
>>>> start to use the device before DRIVER_OK, which violates the spec.
>>>>
>>>> Fix this by switching to register_netdevice() and protecting
>>>> virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
>>>> only be called after virtio_device_ready().
>>>>
>>>> Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 8 +++++++-
>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index db05b5e930be..8a5810bcb839 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>>        if (vi->has_rss || vi->has_rss_hash_report)
>>>>                virtnet_init_default_rss(vi);
>>>>
>>>> -     err = register_netdev(dev);
>>>> +     /* serialize netdev register + virtio_device_ready() with ndo_open() */
>>>> +     rtnl_lock();
>>>> +
>>>> +     err = register_netdevice(dev);
>>>>        if (err) {
>>>>                pr_debug("virtio_net: registering device failed\n");
>>>> +             rtnl_unlock();
>>>>                goto free_failover;
>>>>        }
>>>>
>>>>        virtio_device_ready(vdev);
>>>>
>>>> +     rtnl_unlock();
>>>> +
>>>>        err = virtnet_cpu_notif_add(vi);
>>>>        if (err) {
>>>>                pr_debug("virtio_net: registering cpu notifier failed\n");
>>>
>>> Looks good but then don't we have the same issue when removing the
>>> device?
>>>
>>> Actually I looked at  virtnet_remove and I see
>>>          unregister_netdev(vi->dev);
>>>
>>>          net_failover_destroy(vi->failover);
>>>
>>>          remove_vq_common(vi); <- this will reset the device
>>>
>>> a window here?
>> Probably. For safety, we probably need to reset before unregistering.
>
> careful not to create new races, let's analyse this one to be
> sure first.


Yes, if we do that, there could be an infinite wait in ctrl commands.

So we are probably fine here, since unregister_netdev() will make sure
of the following (otherwise it would be a bug in unregister_netdev()):

1) NAPI is disabled (and synced), so no new NAPI can be scheduled by the
callbacks
2) TX is disabled (and synced), so the qdisc cannot be scheduled even
if skb_xmit_done() is called within the window
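
The first point can be illustrated with a toy model. This is a sketch
under assumptions and not the kernel NAPI implementation: struct toy_napi
and the toy_napi_* functions are hypothetical, and the "synced" part
(waiting for an in-flight poll to finish) is deliberately left out, so the
model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a poll context that can be disabled. */
struct toy_napi {
	bool disabled;  /* set once the netdev has been torn down */
	int scheduled;  /* how many times a poll was queued */
};

/* Called from a (possibly late) virtqueue callback. */
static bool toy_napi_schedule(struct toy_napi *n)
{
	if (n->disabled)
		return false; /* a late callback becomes a no-op */
	n->scheduled++;
	return true;
}

/* Models the teardown step that disables polling. */
static void toy_napi_disable(struct toy_napi *n)
{
	n->disabled = true;
}
```

In this model a callback that fires in the window between disabling and
the later device reset cannot queue any new work, which is the property
the argument above relies on.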


>
>>>
>>> Really, I think what we had originally was a better idea -
>>> instead of dropping interrupts they were delayed and
>>> when driver is ready to accept them it just enables them.
>> The problem is that it works only on some specific setups:
>>
>> - doesn't work with shared IRQs
>> - doesn't work for some specific drivers, e.g. virtio-blk
> can some core irq work fix that?


Not sure. At least for the shared IRQ part, there's no way to disable a 
specific handler currently. More below.


>
>>> We just need to make sure driver does not wait for
>>> interrupts before enabling them.


This only helps in the case where:

1) virtio_device_ready() is called after subsystem
initialization/registration
2) the driver uses the rx interrupt

It doesn't solve the race between subsystem registration/initialization
and virtio_device_ready(), or the case where virtio_device_ready()
needs to be called before subsystem registration.

Thanks


>>>
>>> And I suspect we need to make this opt-in on a per driver
>>> basis.
>> Exactly.
>>
>> Thanks
>>
>>>
>>>
>>>> --
>>>> 2.25.1



* Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
  2022-06-17  7:29 [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready() Jason Wang
  2022-06-17 10:12 ` Michael S. Tsirkin
@ 2022-06-27  6:33 ` Jason Wang
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Wang @ 2022-06-27  6:33 UTC
  To: mst, jasowang, davem, Jakub Kicinski; +Cc: virtualization, netdev, linux-kernel

On Fri, Jun 17, 2022 at 3:29 PM Jason Wang <jasowang@redhat.com> wrote:
>
> We used to call virtio_device_ready() after netdev registration. This
> causes a race between ndo_open() and virtio_device_ready(): if
> ndo_open() is called before virtio_device_ready(), the driver may
> start to use the device before DRIVER_OK, which violates the spec.
>
> Fix this by switching to register_netdevice() and protecting
> virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> only be called after virtio_device_ready().
>
> Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Ok, I think we're fine with this. So I will repost against -net.

If we spot issues with unregistering, we can use a separate patch for that.

Thanks

> ---
>  drivers/net/virtio_net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index db05b5e930be..8a5810bcb839 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
>         if (vi->has_rss || vi->has_rss_hash_report)
>                 virtnet_init_default_rss(vi);
>
> -       err = register_netdev(dev);
> +       /* serialize netdev register + virtio_device_ready() with ndo_open() */
> +       rtnl_lock();
> +
> +       err = register_netdevice(dev);
>         if (err) {
>                 pr_debug("virtio_net: registering device failed\n");
> +               rtnl_unlock();
>                 goto free_failover;
>         }
>
>         virtio_device_ready(vdev);
>
> +       rtnl_unlock();
> +
>         err = virtnet_cpu_notif_add(vi);
>         if (err) {
>                 pr_debug("virtio_net: registering cpu notifier failed\n");
> --
> 2.25.1
>


