From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	Halil Pasic <pasic@linux.ibm.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	"Hetzelt, Felicitas" <f.hetzelt@tu-berlin.de>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"kaplan, david" <david.kaplan@amd.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	mcgrof@kernel.org, David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH V5 1/4] virtio_ring: validate used buffer length
Date: Wed, 24 Nov 2021 15:59:12 +0800	[thread overview]
Message-ID: <CACGkMEsn8xbdEgrCwCWpGz7u=NoX-yADotCaeB2oNbZy_u9iOQ@mail.gmail.com> (raw)
In-Reply-To: <20211124022101-mutt-send-email-mst@kernel.org>

On Wed, Nov 24, 2021 at 3:22 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Nov 24, 2021 at 10:33:28AM +0800, Jason Wang wrote:
> > On Wed, Nov 24, 2021 at 10:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Wed, Nov 24, 2021 at 9:30 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> > > >
> > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > > On Tue, Nov 23, 2021 at 10:25:20AM +0800, Jason Wang wrote:
> > > > >> On Tue, Nov 23, 2021 at 4:24 AM Halil Pasic <pasic@linux.ibm.com> wrote:
> > > > >> >
> > > > >> > On Mon, 22 Nov 2021 14:25:26 +0800
> > > > >> > Jason Wang <jasowang@redhat.com> wrote:
> > > > >> >
> > > > >> > > I think the fixes are:
> > > > >> > >
> > > > >> > > 1) fixing the vhost vsock
> > > > >> > > 2) use suppress_used_validation=true to let the vsock driver validate
> > > > >> > > the in buffer length
> > > > >> > > 3) probably a new feature so the driver can only enable the validation
> > > > >> > > when the feature is enabled.
> > > > >> >
> > > > >> > I'm not sure, I would consider a F_DEV_Y_FIXED_BUG_X a perfectly good
> > > > >> > feature. Frankly the set of such bugs is device implementation
> > > > >> > specific and it makes little sense to specify a feature bit
> > > > >> > that says the device implementation claims to adhere to some
> > > > >> > aspect of the specification. Also what would be the semantic
> > > > >> > of not negotiating F_DEV_Y_FIXED_BUG_X?
> > > > >>
> > > > >> Yes, I agree. Having rethought the feature bit, it seems unnecessary,
> > > > >> especially considering the driver should not care about the used
> > > > >> length for tx.
> > > > >>
> > > > >> >
> > > > >> > On the other hand I see no other way to keep the validation
> > > > >> > permanently enabled for fixed implementations, and get around the problem
> > > > >> > with broken implementations. So we could have something like
> > > > >> > VHOST_USED_LEN_STRICT.
> > > > >>
> > > > >> It's more about a choice of the driver's knowledge. For vsock TX it
> > > > >> should be fine. If we introduce a parameter and disable it by default,
> > > > >> it won't be very useful.
> > > > >>
> > > > >> >
> > > > >> > Maybe, we can also think of 'warn and don't alter behavior' instead of
> > > > >> > 'warn' and alter behavior. Or maybe even not having such checks on in
> > > > >> > production, but only when testing.
> > > > >>
> > > > >> I think there's an agreement that virtio drivers need more hardening,
> > > > >> that's why a lot of patches were merged. Especially considering the
> > > > >> new requirements came from confidential computing, smart NIC and
> > > > >> VDUSE. For virtio drivers, enabling the validation may help to
> > > > >>
> > > > >> 1) protect the driver from the buggy and malicious device
> > > > >> 2) uncover the bugs of the devices (as vsock did, and probably rpmsg)
> > > > >> 3) force drivers to be smart enough to do the validation themselves,
> > > > >> so that we can finally remove the validation in the core
> > > > >>
> > > > >> So I'd like to keep it enabled.
> > > > >>
> > > > >> Thanks
> > > > >
> > > > > Let's see how far we can get. But yes, maybe we were too aggressive in
> > > > > breaking things by default, a warning might be a better choice for a
> > > > > couple of cycles.
> > >
> > > Ok, considering we saw the issues with balloons I think I can post a
> > > patch to use warn instead. I wonder if we need to taint the kernel in
> > > this case.
> >
> > Rethinking this: considering we still have some time, I tend toward
> > converting the drivers to validate the length by themselves. Does this
> > make sense?
> >
> > Thanks
>
> That's separate but let's stop crashing guests for people ASAP.

Ok, will post a patch soon.

Thanks

>
>
> > >
> > > >
> > > > This series appears to break the virtio_balloon driver as well.
> > > >
> > > > The symptom is soft lockup warnings, eg:
> > > >
> > > >   INFO: task kworker/1:1:109 blocked for more than 614 seconds.
> > > >         Not tainted 5.16.0-rc2-gcc-10.3.0 #21
> > > >   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > >   task:kworker/1:1     state:D stack:12496 pid:  109 ppid:     2 flags:0x00000800
> > > >   Workqueue: events_freezable update_balloon_size_func
> > > >   Call Trace:
> > > >   [c000000003cef7c0] [c000000003cef820] 0xc000000003cef820 (unreliable)
> > > >   [c000000003cef9b0] [c00000000001e238] __switch_to+0x1e8/0x2f0
> > > >   [c000000003cefa10] [c000000000f0a00c] __schedule+0x2cc/0xb50
> > > >   [c000000003cefae0] [c000000000f0a8fc] schedule+0x6c/0x140
> > > >   [c000000003cefb10] [c00000000095b6c4] tell_host+0xe4/0x130
> > > >   [c000000003cefba0] [c00000000095d234] update_balloon_size_func+0x394/0x3f0
> > > >   [c000000003cefc70] [c000000000178064] process_one_work+0x2c4/0x5b0
> > > >   [c000000003cefd10] [c0000000001783f8] worker_thread+0xa8/0x640
> > > >   [c000000003cefda0] [c000000000185444] kthread+0x1b4/0x1c0
> > > >   [c000000003cefe10] [c00000000000cee4] ret_from_kernel_thread+0x5c/0x64
> > > >
> > > > Similar backtrace reported here by Luis:
> > > >
> > > >   https://lore.kernel.org/lkml/YY2duTi0wAyAKUTJ@bombadil.infradead.org/
> > > >
> > > > Bisect points to:
> > > >
> > > >   # first bad commit: [939779f5152d161b34f612af29e7dc1ac4472fcf] virtio_ring: validate used buffer length
> > > >
> > > > Adding suppress used validation to the virtio balloon driver "fixes" it, eg.
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > > > index c22ff0117b46..a14b82ceebb2 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -1150,6 +1150,7 @@ static unsigned int features[] = {
> > > >  };
> > > >
> > > >  static struct virtio_driver virtio_balloon_driver = {
> > > > +       .suppress_used_validation = true,
> > > >         .feature_table = features,
> > > >         .feature_table_size = ARRAY_SIZE(features),
> > > >         .driver.name =  KBUILD_MODNAME,
> > >
> > > Looks good, we need a formal patch for this.
> > >
> > > And we need to fix QEMU as well, which advertises a non-zero used
> > > length for the inflate/deflate queue:
> > >
> > > static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > ...
> > >         virtqueue_push(vq, elem, offset);
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > cheers
> > > >
>
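
For readers following the thread, here is a minimal user-space sketch of
the check being discussed. It is not the kernel's virtio_ring code: the
names desc_model, max_used_len, check_used_len and the policy enum are
all hypothetical and exist only for illustration. The idea, as I read the
thread, is that the used length reported by the device is compared with
the number of device-writable ("in") bytes the driver posted, and that the
policy on a mismatch can be either "fail" or "warn and carry on".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one posted buffer; not a kernel structure. */
struct desc_model {
	unsigned int len;      /* buffer size in bytes */
	bool device_writable;  /* true for "in" buffers the device may write */
};

/* What to do when the device reports more than it could have written. */
enum policy { POLICY_FAIL, POLICY_WARN };

/* Sum of device-writable bytes: the largest legitimate used length. */
static unsigned int max_used_len(const struct desc_model *d, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		if (d[i].device_writable)
			total += d[i].len;
	return total;
}

/* Returns true if the buffer should still be handed back to the driver. */
static bool check_used_len(unsigned int used, unsigned int max, enum policy p)
{
	if (used <= max)
		return true;
	fprintf(stderr, "used len %u exceeds writable len %u\n", used, max);
	return p == POLICY_WARN; /* warn-only keeps the old behavior */
}
```

Under this model, a queue that carries only driver-to-device buffers has a
maximum used length of 0, so a device that reports a non-zero used length
fails the strict check. That would be consistent with the balloon report
quoted above, but the sketch is an assumption about the mechanism, not a
description of the merged patch.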

