From: Lee Jones <lee.jones@linaro.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	kvm <kvm@vger.kernel.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	netdev <netdev@vger.kernel.org>,
	stable@vger.kernel.org,
	syzbot+adc3cb32385586bec859@syzkaller.appspotmail.com
Subject: Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
Date: Wed, 2 Mar 2022 16:49:17 +0000	[thread overview]
Message-ID: <Yh+gDZUbgBRx/1ro@google.com> (raw)
In-Reply-To: <20220302112945-mutt-send-email-mst@kernel.org>

On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:

> On Wed, Mar 02, 2022 at 05:28:31PM +0100, Stefano Garzarella wrote:
> > On Wed, Mar 2, 2022 at 3:57 PM Lee Jones <lee.jones@linaro.org> wrote:
> > >
> > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > >
> > > > On Wed, Mar 02, 2022 at 01:56:35PM +0000, Lee Jones wrote:
> > > > > On Wed, 02 Mar 2022, Michael S. Tsirkin wrote:
> > > > >
> > > > > > On Wed, Mar 02, 2022 at 07:54:21AM +0000, Lee Jones wrote:
> > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > > > > to vhost_get_vq_desc().  All we have to do is take the same lock
> > > > > > > during virtqueue clean-up and we mitigate the reported issues.
> > > > > > >
> > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > >
> > > > > > > Cc: <stable@vger.kernel.org>
> > > > > > > Reported-by: syzbot+adc3cb32385586bec859@syzkaller.appspotmail.com
> > > > > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > > > > > ---
> > > > > > >  drivers/vhost/vhost.c | 2 ++
> > > > > > >  1 file changed, 2 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > index 59edb5a1ffe28..bbaff6a5e21b8 100644
> > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > @@ -693,6 +693,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > >         int i;
> > > > > > >
> > > > > > >         for (i = 0; i < dev->nvqs; ++i) {
> > > > > > > +               mutex_lock(&dev->vqs[i]->mutex);
> > > > > > >                 if (dev->vqs[i]->error_ctx)
> > > > > > >                         eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > > > > >                 if (dev->vqs[i]->kick)
> > > > > > > @@ -700,6 +701,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > > > > >                 if (dev->vqs[i]->call_ctx.ctx)
> > > > > > >                         eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
> > > > > > >                 vhost_vq_reset(dev, dev->vqs[i]);
> > > > > > > +               mutex_unlock(&dev->vqs[i]->mutex);
> > > > > > >         }
> > > > > >
> > > > > > So this is a mitigation plan, but the bug is still there, though
> > > > > > we don't know exactly what it is.  I would prefer adding something like
> > > > > > WARN_ON(mutex_is_locked(&vqs[i]->mutex)) here - does this make sense?
> > > > >
> > > > > As a rework to this, or as a subsequent patch?
> > > >
> > > > Can be a separate patch.
> > > >
> > > > > Just before the first lock I assume?
> > > >
> > > > I guess so, yes.
> > >
> > > No problem.  Patch to follow.
> > >
> > > I'm also going to attempt to debug the root cause, but I'm new to this
> > > subsystem, so it might take a while for me to get my head around it.
> > 
> > IIUC the root cause should be the same as the one we solved here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a58da53ffd70294ebea8ecd0eb45fd0d74add9f9
> > 
> > The worker was not stopped before calling vhost_dev_cleanup(), so while
> > the worker was still running we could end up freeing memory or
> > re-initializing fields of a virtqueue it was still using.
> 
> Right, and I agree, but it's not the root cause: we do attempt to stop all workers.

Exactly.  This is what happens, but the question I'm going to attempt
to answer is *why* it happens.
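
For reference, here is an untested sketch of what I have in mind for
that follow-up, with the check sitting just before the first lock in
vhost_dev_cleanup() as discussed above:

	for (i = 0; i < dev->nvqs; ++i) {
		/* Catch anyone still holding the vq mutex at cleanup time */
		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
		mutex_lock(&dev->vqs[i]->mutex);
		/* ... existing eventfd_ctx_put()/vhost_vq_reset() calls ... */
		mutex_unlock(&dev->vqs[i]->mutex);
	}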

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Thread overview: 94+ messages
2022-03-02  7:54 [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use Lee Jones
2022-03-02  9:34 ` Stefano Garzarella
2022-03-02 10:07   ` Lee Jones
2022-03-02 13:35   ` Michael S. Tsirkin
2022-03-02 14:11     ` Stefano Garzarella
2022-03-02 14:50       ` Michael S. Tsirkin
2022-03-02 15:36         ` Stefano Garzarella
2022-03-04 16:46           ` Michael S. Tsirkin
2022-03-02 13:30 ` Michael S. Tsirkin
2022-03-02 13:56   ` Lee Jones
2022-03-02 14:51     ` Michael S. Tsirkin
2022-03-02 14:57       ` Lee Jones
2022-03-02 16:28         ` Stefano Garzarella
2022-03-02 16:30           ` Michael S. Tsirkin
2022-03-02 16:49             ` Lee Jones [this message]
2022-03-02 17:10               ` Stefano Garzarella
2022-03-03 14:17                 ` Lee Jones
2022-03-04  5:00 ` Michael S. Tsirkin
2022-03-04 15:22   ` Lee Jones
2022-03-04 16:48 ` Michael S. Tsirkin
2022-03-04 16:56   ` Lee Jones
2022-03-07 19:17 Lee Jones
2022-03-07 19:33 ` Greg KH
2022-03-07 22:39   ` Michael S. Tsirkin
2022-03-08  8:10   ` Lee Jones
2022-03-08  8:11     ` Lee Jones
2022-03-08  8:57     ` Greg KH
2022-03-08  9:15       ` Lee Jones
2022-03-08  9:57         ` Greg KH
2022-03-08 10:08           ` Lee Jones
2022-03-08 10:55           ` Michael S. Tsirkin
2022-03-08 11:45             ` Greg KH
2022-03-08 12:27               ` Michael S. Tsirkin
2022-03-08 13:17                 ` Lee Jones
2022-03-08 17:17                   ` Michael S. Tsirkin
2022-03-08 11:05       ` Michael S. Tsirkin
2022-03-09 18:52       ` Leon Romanovsky
2022-03-07 22:37 ` Michael S. Tsirkin
2022-03-08  8:01   ` Lee Jones
2022-03-08 11:07     ` Michael S. Tsirkin
2022-03-08  6:15 ` Jason Wang
2022-03-08  8:08   ` Lee Jones
2022-03-08 11:06     ` Michael S. Tsirkin
2022-03-14  8:43 Lee Jones
2022-03-14  8:56 ` Greg KH
2022-03-14 11:49 ` Michael S. Tsirkin
2022-03-14 12:47   ` Lee Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the mbox file for this message, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yh+gDZUbgBRx/1ro@google.com \
    --to=lee.jones@linaro.org \
    --cc=jasowang@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=sgarzare@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+adc3cb32385586bec859@syzkaller.appspotmail.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.