* [PATCH 0/3] Fixing races in probe/remove
@ 2022-06-20 5:11 Jason Wang
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw)
To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin
Hi all:
This series tries to fix races spotted during code review for
caif_virtio in probe() and remove().
Compile test only. (I don't have setup to test them).
Please review.
Thanks
Jason Wang (3):
caif_virtio: remove virtqueue_disable_cb() in probe
caif_virtio: fix the race between virtio_device_ready() and ndo_open()
caif_virtio: fix the race between reset and netdev unregister
drivers/net/caif/caif_virtio.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe
2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang
@ 2022-06-20 5:11 ` Jason Wang
2022-06-20 9:02 ` Michael S. Tsirkin
2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang
2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang
2 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw)
To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin
This disabling is a just a hint with best effort, there's no guarantee
that device doesn't send notification. The driver should survive with
that, so let's remove it.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/caif/caif_virtio.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index 5458f57177a0..c677ded81133 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev)
netdev->needed_headroom = cfv->tx_hr;
netdev->needed_tailroom = cfv->tx_tr;
- /* Disable buffer release interrupts unless we have stopped TX queues */
- virtqueue_disable_cb(cfv->vq_tx);
-
netdev->mtu = cfv->mtu - cfv->tx_tr;
vdev->priv = cfv;
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open()
2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
@ 2022-06-20 5:11 ` Jason Wang
2022-06-20 9:04 ` Michael S. Tsirkin
2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang
2 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw)
To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin
We used to depend on the virtio_device_ready() that is called after
probe() by virtio_dev_probe() after netdev registration. This
cause a race between ndo_open() and virtio_device_ready(): if
ndo_open() is called before virtio_device_ready(), the driver may
start to use the device (e.g TX) before DRIVER_OK which violates the
spec.
Fixing this by switching to use register_netdevice() and protect the
virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
only be called after virtio_device_ready().
Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/caif/caif_virtio.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index c677ded81133..66375bea2fcd 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -719,13 +719,21 @@ static int cfv_probe(struct virtio_device *vdev)
/* Carrier is off until netdevice is opened */
netif_carrier_off(netdev);
+ /* serialize netdev register + virtio_device_ready() with ndo_open() */
+ rtnl_lock();
+
/* register Netdev */
- err = register_netdev(netdev);
+ err = register_netdevice(netdev);
if (err) {
+ rtnl_unlock();
dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err);
goto err;
}
+ virtio_device_ready(vdev);
+
+ rtnl_unlock();
+
debugfs_init(cfv);
return 0;
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang
@ 2022-06-20 5:11 ` Jason Wang
2022-06-20 9:09 ` Michael S. Tsirkin
2 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-06-20 5:11 UTC (permalink / raw)
To: netdev, linux-kernel, mst, jasowang; +Cc: davem, kuba, erwan.yvin
We use to do the following steps during .remove():
static void cfv_remove(struct virtio_device *vdev)
{
struct cfv_info *cfv = vdev->priv;
rtnl_lock();
dev_close(cfv->ndev);
rtnl_unlock();
tasklet_kill(&cfv->tx_release_tasklet);
debugfs_remove_recursive(cfv->debugfs);
vringh_kiov_cleanup(&cfv->ctx.riov);
virtio_reset_device(vdev);
vdev->vringh_config->del_vrhs(cfv->vdev);
cfv->vr_rx = NULL;
vdev->config->del_vqs(cfv->vdev);
unregister_netdev(cfv->ndev);
}
This is racy since device could be re-opened after dev_close() but
before unregister_netdevice():
1) RX vringh is cleaned before resetting the device, rx callbacks that
is called after the vringh_kiov_cleanup() will result a UAF
2) Network stack can still try to use TX virtqueue even if it has been
deleted after dev_vqs()
Fixing this by unregistering the network device first to make sure not
device access from both TX and RX side.
Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/caif/caif_virtio.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index 66375bea2fcd..a29f9b2df5b1 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
{
struct cfv_info *cfv = vdev->priv;
- rtnl_lock();
- dev_close(cfv->ndev);
- rtnl_unlock();
+ /* Make sure NAPI/TX won't try to access the device */
+ unregister_netdev(cfv->ndev);
tasklet_kill(&cfv->tx_release_tasklet);
debugfs_remove_recursive(cfv->debugfs);
@@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
vdev->vringh_config->del_vrhs(cfv->vdev);
cfv->vr_rx = NULL;
vdev->config->del_vqs(cfv->vdev);
- unregister_netdev(cfv->ndev);
}
static struct virtio_device_id id_table[] = {
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
@ 2022-06-20 9:02 ` Michael S. Tsirkin
2022-06-21 3:10 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-06-20 9:02 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin
On Mon, Jun 20, 2022 at 01:11:13PM +0800, Jason Wang wrote:
> This disabling is a just a hint with best effort, there's no guarantee
> that device doesn't send notification. The driver should survive with
> that, so let's remove it.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
I guess, but frankly this change feels gratituous, and just might
uncover latent bugs. Which would be fine if we were out to
find and fix them, but given this was compile-tested only,
I'm not sure that's the case.
> ---
> drivers/net/caif/caif_virtio.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> index 5458f57177a0..c677ded81133 100644
> --- a/drivers/net/caif/caif_virtio.c
> +++ b/drivers/net/caif/caif_virtio.c
> @@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev)
> netdev->needed_headroom = cfv->tx_hr;
> netdev->needed_tailroom = cfv->tx_tr;
>
> - /* Disable buffer release interrupts unless we have stopped TX queues */
> - virtqueue_disable_cb(cfv->vq_tx);
> -
> netdev->mtu = cfv->mtu - cfv->tx_tr;
> vdev->priv = cfv;
>
> --
> 2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open()
2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang
@ 2022-06-20 9:04 ` Michael S. Tsirkin
0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-06-20 9:04 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin
On Mon, Jun 20, 2022 at 01:11:14PM +0800, Jason Wang wrote:
> We used to depend on the virtio_device_ready() that is called after
"We used to" implies it's not the case currently.
I think you mean "we currently depend".
> probe() by virtio_dev_probe() after netdev registration. This
> cause
causes
>a race between ndo_open() and virtio_device_ready(): if
> ndo_open() is called before virtio_device_ready(), the driver may
> start to use the device (e.g TX) before DRIVER_OK which violates the
> spec.
>
> Fixing this
Fix this
> by switching to use register_netdevice() and protect the
> virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> only be called after virtio_device_ready().
>
> Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> drivers/net/caif/caif_virtio.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> index c677ded81133..66375bea2fcd 100644
> --- a/drivers/net/caif/caif_virtio.c
> +++ b/drivers/net/caif/caif_virtio.c
> @@ -719,13 +719,21 @@ static int cfv_probe(struct virtio_device *vdev)
> /* Carrier is off until netdevice is opened */
> netif_carrier_off(netdev);
>
> + /* serialize netdev register + virtio_device_ready() with ndo_open() */
> + rtnl_lock();
> +
> /* register Netdev */
> - err = register_netdev(netdev);
> + err = register_netdevice(netdev);
> if (err) {
> + rtnl_unlock();
> dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err);
> goto err;
> }
>
> + virtio_device_ready(vdev);
> +
> + rtnl_unlock();
> +
> debugfs_init(cfv);
>
> return 0;
> --
> 2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang
@ 2022-06-20 9:09 ` Michael S. Tsirkin
2022-06-20 9:18 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-06-20 9:09 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, davem, kuba, erwan.yvin
On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> We use to do the following steps during .remove():
We currently do
> static void cfv_remove(struct virtio_device *vdev)
> {
> struct cfv_info *cfv = vdev->priv;
>
> rtnl_lock();
> dev_close(cfv->ndev);
> rtnl_unlock();
>
> tasklet_kill(&cfv->tx_release_tasklet);
> debugfs_remove_recursive(cfv->debugfs);
>
> vringh_kiov_cleanup(&cfv->ctx.riov);
> virtio_reset_device(vdev);
> vdev->vringh_config->del_vrhs(cfv->vdev);
> cfv->vr_rx = NULL;
> vdev->config->del_vqs(cfv->vdev);
> unregister_netdev(cfv->ndev);
> }
> This is racy since device could be re-opened after dev_close() but
> before unregister_netdevice():
>
> 1) RX vringh is cleaned before resetting the device, rx callbacks that
> is called after the vringh_kiov_cleanup() will result a UAF
> 2) Network stack can still try to use TX virtqueue even if it has been
> deleted after dev_vqs()
>
> Fixing this by unregistering the network device first to make sure not
> device access from both TX and RX side.
>
> Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/net/caif/caif_virtio.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> index 66375bea2fcd..a29f9b2df5b1 100644
> --- a/drivers/net/caif/caif_virtio.c
> +++ b/drivers/net/caif/caif_virtio.c
> @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> {
> struct cfv_info *cfv = vdev->priv;
>
> - rtnl_lock();
> - dev_close(cfv->ndev);
> - rtnl_unlock();
> + /* Make sure NAPI/TX won't try to access the device */
> + unregister_netdev(cfv->ndev);
>
> tasklet_kill(&cfv->tx_release_tasklet);
> debugfs_remove_recursive(cfv->debugfs);
> @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> vdev->vringh_config->del_vrhs(cfv->vdev);
> cfv->vr_rx = NULL;
> vdev->config->del_vqs(cfv->vdev);
> - unregister_netdev(cfv->ndev);
> }
This gives me pause, callbacks can now trigger after device
has been unregistered. Are we sure this is safe?
Won't it be safer to just keep the rtnl_lock around
the whole process?
> static struct virtio_device_id id_table[] = {
> --
> 2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-20 9:09 ` Michael S. Tsirkin
@ 2022-06-20 9:18 ` Jason Wang
2022-06-20 10:18 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-06-20 9:18 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > We use to do the following steps during .remove():
>
> We currently do
>
>
> > static void cfv_remove(struct virtio_device *vdev)
> > {
> > struct cfv_info *cfv = vdev->priv;
> >
> > rtnl_lock();
> > dev_close(cfv->ndev);
> > rtnl_unlock();
> >
> > tasklet_kill(&cfv->tx_release_tasklet);
> > debugfs_remove_recursive(cfv->debugfs);
> >
> > vringh_kiov_cleanup(&cfv->ctx.riov);
> > virtio_reset_device(vdev);
> > vdev->vringh_config->del_vrhs(cfv->vdev);
> > cfv->vr_rx = NULL;
> > vdev->config->del_vqs(cfv->vdev);
> > unregister_netdev(cfv->ndev);
> > }
> > This is racy since device could be re-opened after dev_close() but
> > before unregister_netdevice():
> >
> > 1) RX vringh is cleaned before resetting the device, rx callbacks that
> > is called after the vringh_kiov_cleanup() will result a UAF
> > 2) Network stack can still try to use TX virtqueue even if it has been
> > deleted after dev_vqs()
> >
> > Fixing this by unregistering the network device first to make sure not
> > device access from both TX and RX side.
> >
> > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/net/caif/caif_virtio.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > index 66375bea2fcd..a29f9b2df5b1 100644
> > --- a/drivers/net/caif/caif_virtio.c
> > +++ b/drivers/net/caif/caif_virtio.c
> > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > {
> > struct cfv_info *cfv = vdev->priv;
> >
> > - rtnl_lock();
> > - dev_close(cfv->ndev);
> > - rtnl_unlock();
> > + /* Make sure NAPI/TX won't try to access the device */
> > + unregister_netdev(cfv->ndev);
> >
> > tasklet_kill(&cfv->tx_release_tasklet);
> > debugfs_remove_recursive(cfv->debugfs);
> > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > vdev->vringh_config->del_vrhs(cfv->vdev);
> > cfv->vr_rx = NULL;
> > vdev->config->del_vqs(cfv->vdev);
> > - unregister_netdev(cfv->ndev);
> > }
>
>
> This gives me pause, callbacks can now trigger after device
> has been unregistered. Are we sure this is safe?
It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
after tasklet_kill(). I can add a comment to explain this.
> Won't it be safer to just keep the rtnl_lock around
> the whole process?
It looks to me we rtnl_lock can't help in synchronizing with the
callbacks, anything I miss?
Thanks
>
> > static struct virtio_device_id id_table[] = {
> > --
> > 2.25.1
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-20 9:18 ` Jason Wang
@ 2022-06-20 10:18 ` Michael S. Tsirkin
2022-06-21 3:09 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-06-20 10:18 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > > We use to do the following steps during .remove():
> >
> > We currently do
> >
> >
> > > static void cfv_remove(struct virtio_device *vdev)
> > > {
> > > struct cfv_info *cfv = vdev->priv;
> > >
> > > rtnl_lock();
> > > dev_close(cfv->ndev);
> > > rtnl_unlock();
> > >
> > > tasklet_kill(&cfv->tx_release_tasklet);
> > > debugfs_remove_recursive(cfv->debugfs);
> > >
> > > vringh_kiov_cleanup(&cfv->ctx.riov);
> > > virtio_reset_device(vdev);
> > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > cfv->vr_rx = NULL;
> > > vdev->config->del_vqs(cfv->vdev);
> > > unregister_netdev(cfv->ndev);
> > > }
> > > This is racy since device could be re-opened after dev_close() but
> > > before unregister_netdevice():
> > >
> > > 1) RX vringh is cleaned before resetting the device, rx callbacks that
> > > is called after the vringh_kiov_cleanup() will result a UAF
> > > 2) Network stack can still try to use TX virtqueue even if it has been
> > > deleted after dev_vqs()
> > >
> > > Fixing this by unregistering the network device first to make sure not
> > > device access from both TX and RX side.
> > >
> > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > drivers/net/caif/caif_virtio.c | 6 ++----
> > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > > index 66375bea2fcd..a29f9b2df5b1 100644
> > > --- a/drivers/net/caif/caif_virtio.c
> > > +++ b/drivers/net/caif/caif_virtio.c
> > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > > {
> > > struct cfv_info *cfv = vdev->priv;
> > >
> > > - rtnl_lock();
> > > - dev_close(cfv->ndev);
> > > - rtnl_unlock();
> > > + /* Make sure NAPI/TX won't try to access the device */
> > > + unregister_netdev(cfv->ndev);
> > >
> > > tasklet_kill(&cfv->tx_release_tasklet);
> > > debugfs_remove_recursive(cfv->debugfs);
> > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > cfv->vr_rx = NULL;
> > > vdev->config->del_vqs(cfv->vdev);
> > > - unregister_netdev(cfv->ndev);
> > > }
> >
> >
> > This gives me pause, callbacks can now trigger after device
> > has been unregistered. Are we sure this is safe?
>
> It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> after tasklet_kill(). I can add a comment to explain this.
that waits for outstanding tasklets but does it really prevent
future ones?
> > Won't it be safer to just keep the rtnl_lock around
> > the whole process?
>
> It looks to me we rtnl_lock can't help in synchronizing with the
> callbacks, anything I miss?
>
> Thanks
good point.
> >
> > > static struct virtio_device_id id_table[] = {
> > > --
> > > 2.25.1
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-20 10:18 ` Michael S. Tsirkin
@ 2022-06-21 3:09 ` Jason Wang
2022-06-21 6:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2022-06-21 3:09 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > > > We use to do the following steps during .remove():
> > >
> > > We currently do
> > >
> > >
> > > > static void cfv_remove(struct virtio_device *vdev)
> > > > {
> > > > struct cfv_info *cfv = vdev->priv;
> > > >
> > > > rtnl_lock();
> > > > dev_close(cfv->ndev);
> > > > rtnl_unlock();
> > > >
> > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > debugfs_remove_recursive(cfv->debugfs);
> > > >
> > > > vringh_kiov_cleanup(&cfv->ctx.riov);
> > > > virtio_reset_device(vdev);
> > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > cfv->vr_rx = NULL;
> > > > vdev->config->del_vqs(cfv->vdev);
> > > > unregister_netdev(cfv->ndev);
> > > > }
> > > > This is racy since device could be re-opened after dev_close() but
> > > > before unregister_netdevice():
> > > >
> > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that
> > > > is called after the vringh_kiov_cleanup() will result a UAF
> > > > 2) Network stack can still try to use TX virtqueue even if it has been
> > > > deleted after dev_vqs()
> > > >
> > > > Fixing this by unregistering the network device first to make sure not
> > > > device access from both TX and RX side.
> > > >
> > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > drivers/net/caif/caif_virtio.c | 6 ++----
> > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > > > index 66375bea2fcd..a29f9b2df5b1 100644
> > > > --- a/drivers/net/caif/caif_virtio.c
> > > > +++ b/drivers/net/caif/caif_virtio.c
> > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > {
> > > > struct cfv_info *cfv = vdev->priv;
> > > >
> > > > - rtnl_lock();
> > > > - dev_close(cfv->ndev);
> > > > - rtnl_unlock();
> > > > + /* Make sure NAPI/TX won't try to access the device */
> > > > + unregister_netdev(cfv->ndev);
> > > >
> > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > debugfs_remove_recursive(cfv->debugfs);
> > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > cfv->vr_rx = NULL;
> > > > vdev->config->del_vqs(cfv->vdev);
> > > > - unregister_netdev(cfv->ndev);
> > > > }
> > >
> > >
> > > This gives me pause, callbacks can now trigger after device
> > > has been unregistered. Are we sure this is safe?
> >
> > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > after tasklet_kill(). I can add a comment to explain this.
>
> that waits for outstanding tasklets but does it really prevent
> future ones?
I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
the future scheduling of a tasklet.
Thanks
>
> > > Won't it be safer to just keep the rtnl_lock around
> > > the whole process?
> >
> > It looks to me we rtnl_lock can't help in synchronizing with the
> > callbacks, anything I miss?
> >
> > Thanks
>
> good point.
>
>
> > >
> > > > static struct virtio_device_id id_table[] = {
> > > > --
> > > > 2.25.1
> > >
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe
2022-06-20 9:02 ` Michael S. Tsirkin
@ 2022-06-21 3:10 ` Jason Wang
0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2022-06-21 3:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Mon, Jun 20, 2022 at 5:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 20, 2022 at 01:11:13PM +0800, Jason Wang wrote:
> > This disabling is a just a hint with best effort, there's no guarantee
> > that device doesn't send notification. The driver should survive with
> > that, so let's remove it.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
> I guess, but frankly this change feels gratituous, and just might
> uncover latent bugs. Which would be fine if we were out to
> find and fix them, but given this was compile-tested only,
> I'm not sure that's the case.
Ok, let me drop this from the next version (no way to test).
Thanks
>
>
> > ---
> > drivers/net/caif/caif_virtio.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > index 5458f57177a0..c677ded81133 100644
> > --- a/drivers/net/caif/caif_virtio.c
> > +++ b/drivers/net/caif/caif_virtio.c
> > @@ -705,9 +705,6 @@ static int cfv_probe(struct virtio_device *vdev)
> > netdev->needed_headroom = cfv->tx_hr;
> > netdev->needed_tailroom = cfv->tx_tr;
> >
> > - /* Disable buffer release interrupts unless we have stopped TX queues */
> > - virtqueue_disable_cb(cfv->vq_tx);
> > -
> > netdev->mtu = cfv->mtu - cfv->tx_tr;
> > vdev->priv = cfv;
> >
> > --
> > 2.25.1
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-21 3:09 ` Jason Wang
@ 2022-06-21 6:00 ` Michael S. Tsirkin
2022-06-21 6:25 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2022-06-21 6:00 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote:
> On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > > > > We use to do the following steps during .remove():
> > > >
> > > > We currently do
> > > >
> > > >
> > > > > static void cfv_remove(struct virtio_device *vdev)
> > > > > {
> > > > > struct cfv_info *cfv = vdev->priv;
> > > > >
> > > > > rtnl_lock();
> > > > > dev_close(cfv->ndev);
> > > > > rtnl_unlock();
> > > > >
> > > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > > debugfs_remove_recursive(cfv->debugfs);
> > > > >
> > > > > vringh_kiov_cleanup(&cfv->ctx.riov);
> > > > > virtio_reset_device(vdev);
> > > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > cfv->vr_rx = NULL;
> > > > > vdev->config->del_vqs(cfv->vdev);
> > > > > unregister_netdev(cfv->ndev);
> > > > > }
> > > > > This is racy since device could be re-opened after dev_close() but
> > > > > before unregister_netdevice():
> > > > >
> > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that
> > > > > is called after the vringh_kiov_cleanup() will result a UAF
> > > > > 2) Network stack can still try to use TX virtqueue even if it has been
> > > > > deleted after dev_vqs()
> > > > >
> > > > > Fixing this by unregistering the network device first to make sure not
> > > > > device access from both TX and RX side.
> > > > >
> > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > > drivers/net/caif/caif_virtio.c | 6 ++----
> > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > > > > index 66375bea2fcd..a29f9b2df5b1 100644
> > > > > --- a/drivers/net/caif/caif_virtio.c
> > > > > +++ b/drivers/net/caif/caif_virtio.c
> > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > {
> > > > > struct cfv_info *cfv = vdev->priv;
> > > > >
> > > > > - rtnl_lock();
> > > > > - dev_close(cfv->ndev);
> > > > > - rtnl_unlock();
> > > > > + /* Make sure NAPI/TX won't try to access the device */
> > > > > + unregister_netdev(cfv->ndev);
> > > > >
> > > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > > debugfs_remove_recursive(cfv->debugfs);
> > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > cfv->vr_rx = NULL;
> > > > > vdev->config->del_vqs(cfv->vdev);
> > > > > - unregister_netdev(cfv->ndev);
> > > > > }
> > > >
> > > >
> > > > This gives me pause, callbacks can now trigger after device
> > > > has been unregistered. Are we sure this is safe?
> > >
> > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > > after tasklet_kill(). I can add a comment to explain this.
> >
> > that waits for outstanding tasklets but does it really prevent
> > future ones?
>
> I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
> the future scheduling of a tasklet.
>
> Thanks
But then in the end it clears it, does it not?
> >
> > > > Won't it be safer to just keep the rtnl_lock around
> > > > the whole process?
> > >
> > > It looks to me we rtnl_lock can't help in synchronizing with the
> > > callbacks, anything I miss?
> > >
> > > Thanks
> >
> > good point.
> >
> >
> > > >
> > > > > static struct virtio_device_id id_table[] = {
> > > > > --
> > > > > 2.25.1
> > > >
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister
2022-06-21 6:00 ` Michael S. Tsirkin
@ 2022-06-21 6:25 ` Jason Wang
0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2022-06-21 6:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, linux-kernel, davem, Jakub Kicinski, erwan.yvin
On Tue, Jun 21, 2022 at 2:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote:
> > On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> > > > > > We use to do the following steps during .remove():
> > > > >
> > > > > We currently do
> > > > >
> > > > >
> > > > > > static void cfv_remove(struct virtio_device *vdev)
> > > > > > {
> > > > > > struct cfv_info *cfv = vdev->priv;
> > > > > >
> > > > > > rtnl_lock();
> > > > > > dev_close(cfv->ndev);
> > > > > > rtnl_unlock();
> > > > > >
> > > > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > > > debugfs_remove_recursive(cfv->debugfs);
> > > > > >
> > > > > > vringh_kiov_cleanup(&cfv->ctx.riov);
> > > > > > virtio_reset_device(vdev);
> > > > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > > cfv->vr_rx = NULL;
> > > > > > vdev->config->del_vqs(cfv->vdev);
> > > > > > unregister_netdev(cfv->ndev);
> > > > > > }
> > > > > > This is racy since device could be re-opened after dev_close() but
> > > > > > before unregister_netdevice():
> > > > > >
> > > > > > 1) RX vringh is cleaned before resetting the device, rx callbacks that
> > > > > > is called after the vringh_kiov_cleanup() will result a UAF
> > > > > > 2) Network stack can still try to use TX virtqueue even if it has been
> > > > > > deleted after dev_vqs()
> > > > > >
> > > > > > Fixing this by unregistering the network device first to make sure not
> > > > > > device access from both TX and RX side.
> > > > > >
> > > > > > Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > > drivers/net/caif/caif_virtio.c | 6 ++----
> > > > > > 1 file changed, 2 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> > > > > > index 66375bea2fcd..a29f9b2df5b1 100644
> > > > > > --- a/drivers/net/caif/caif_virtio.c
> > > > > > +++ b/drivers/net/caif/caif_virtio.c
> > > > > > @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > > {
> > > > > > struct cfv_info *cfv = vdev->priv;
> > > > > >
> > > > > > - rtnl_lock();
> > > > > > - dev_close(cfv->ndev);
> > > > > > - rtnl_unlock();
> > > > > > + /* Make sure NAPI/TX won't try to access the device */
> > > > > > + unregister_netdev(cfv->ndev);
> > > > > >
> > > > > > tasklet_kill(&cfv->tx_release_tasklet);
> > > > > > debugfs_remove_recursive(cfv->debugfs);
> > > > > > @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
> > > > > > vdev->vringh_config->del_vrhs(cfv->vdev);
> > > > > > cfv->vr_rx = NULL;
> > > > > > vdev->config->del_vqs(cfv->vdev);
> > > > > > - unregister_netdev(cfv->ndev);
> > > > > > }
> > > > >
> > > > >
> > > > > This gives me pause, callbacks can now trigger after device
> > > > > has been unregistered. Are we sure this is safe?
> > > >
> > > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > > > after tasklet_kill(). I can add a comment to explain this.
> > >
> > > that waits for outstanding tasklets but does it really prevent
> > > future ones?
> >
> > I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
> > the future scheduling of a tasklet.
> >
> > Thanks
>
> But then in the end it clears it, does it not?
Right, so we need to reset before taskset_kill().
Thanks
>
> > >
> > > > > Won't it be safer to just keep the rtnl_lock around
> > > > > the whole process?
> > > >
> > > > It looks to me we rtnl_lock can't help in synchronizing with the
> > > > callbacks, anything I miss?
> > > >
> > > > Thanks
> > >
> > > good point.
> > >
> > >
> > > > >
> > > > > > static struct virtio_device_id id_table[] = {
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-06-21 6:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-20 5:11 [PATCH 0/3] Fixing races in probe/remove Jason Wang
2022-06-20 5:11 ` [PATCH 1/3] caif_virtio: remove virtqueue_disable_cb() in probe Jason Wang
2022-06-20 9:02 ` Michael S. Tsirkin
2022-06-21 3:10 ` Jason Wang
2022-06-20 5:11 ` [PATCH 2/3] caif_virtio: fix the race between virtio_device_ready() and ndo_open() Jason Wang
2022-06-20 9:04 ` Michael S. Tsirkin
2022-06-20 5:11 ` [PATCH 3/3] caif_virtio: fix the race between reset and netdev unregister Jason Wang
2022-06-20 9:09 ` Michael S. Tsirkin
2022-06-20 9:18 ` Jason Wang
2022-06-20 10:18 ` Michael S. Tsirkin
2022-06-21 3:09 ` Jason Wang
2022-06-21 6:00 ` Michael S. Tsirkin
2022-06-21 6:25 ` Jason Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).