From: "Michael S. Tsirkin" <mst@redhat.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>,
Alexey Kardashevskiy <aik@ozlabs.ru>,
Michael Roth <mdroth@linux.vnet.ibm.com>,
qemu-devel@nongnu.org,
Christian Borntraeger <borntraeger@de.ibm.com>,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled
Date: Thu, 28 Nov 2019 12:03:01 -0500 [thread overview]
Message-ID: <20191128120223-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20191128174800.2d4a2cc2.pasic@linux.ibm.com>
On Thu, Nov 28, 2019 at 05:48:00PM +0100, Halil Pasic wrote:
> On Tue, 19 Nov 2019 18:50:03 -0600
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
> [..]
> > I.e. the calling code is only scheduling a one-shot BH for
> > virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> > an additional virtqueue entry before we get there. This is likely due
> > to the following check in virtio_queue_host_notifier_aio_poll:
> >
> > static bool virtio_queue_host_notifier_aio_poll(void *opaque)
> > {
> > EventNotifier *n = opaque;
> > VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
> > bool progress;
> >
> > if (!vq->vring.desc || virtio_queue_empty(vq)) {
> > return false;
> > }
> >
> > progress = virtio_queue_notify_aio_vq(vq);
> >
> > namely the call to virtio_queue_empty(). In this case, since no new
> > requests have actually been issued, shadow_avail_idx == last_avail_idx,
> > so we actually try to access the vring via vring_avail_idx() to get
> > the latest non-shadowed idx:
> >
> > int virtio_queue_empty(VirtQueue *vq)
> > {
> > bool empty;
> > ...
> >
> > if (vq->shadow_avail_idx != vq->last_avail_idx) {
> > return 0;
> > }
> >
> > rcu_read_lock();
> > empty = vring_avail_idx(vq) == vq->last_avail_idx;
> > rcu_read_unlock();
> > return empty;
> >
> > but since the IOMMU region has been disabled we get a bogus value (0
> > usually), which causes virtio_queue_empty() to falsely report that
> > there are entries to be processed, which causes errors such as:
> >
> > "virtio: zero sized buffers are not allowed"
> >
> > or
> >
> > "virtio-blk missing headers"
> >
> > and puts the device in an error state.
> >
>
> I've seen something similar on s390x with virtio-ccw-blk under
> protected virtualization, that made me wonder about how virtio-blk in
> particular but also virtio in general handles shutdown and reset.
>
> This makes me wonder if bus-mastering disabled is the only scenario when
> a something like vdev->disabled should be used.
>
> In particular I have the following mechanism in mind
>
> qemu_system_reset() --> ... --> qemu_devices_reset() --> ... -->
> --> virtio_[transport]_reset() --> ... --> virtio_bus_stop_ioeventfd()
> --> virtio_blk_data_plane_stop()
>
> which in turn triggesrs the following cascade:
> virtio_blk_data_plane_stop_bh --> virtio_queue_aio_set_host_notifier_handler() -->
> --> virtio_queue_host_notifier_aio_read()
> which however calls
> virtio_queue_notify_aio_vq() if the notifier tests as
> positive.
>
> Since we still have vq->handle_aio_output that means we may
> call virtqueue_pop() during the reset procedure.
>
> This was a problem for us, because (due to a bug) the shared pages that
> constitute the virtio ring weren't shared any more. And thus we got
> the infamous
> virtio_error(vdev, "virtio: zero sized buffers are not allowed").
>
> Now the bug is no more, and we can tolerate that somewhat late access
> to the virtio ring.
>
> But it keeps nagging me, is it really OK for the device to access the
> virtio ring during reset? My intuition tells me that the device should
> not look for new requests after it has been told to reset.
Well it's after it was told to reset but it's not after
it completed reset. So I think it's fine ...
> Opinions? (Michael, Connie)
>
> Regards,
> Halil
>
> > This patch works around the issue by introducing virtio_set_disabled(),
> > which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
> > when bus-mastering is disabled. Since we'd check this flag at all the
> > same sites as vdev->broken, we replace those checks with an inline
> > function which checks for either vdev->broken or vdev->disabled.
> >
> > The 'disabled' flag is only migrated when set, which should be fairly
> > rare, but to maintain migration compatibility we disable it's use for
> > older machine types. Users requiring the use of the flag in conjunction
> > with older machine types can set it explicitly as a virtio-device
> > option.
> >
next prev parent reply other threads:[~2019-11-28 19:03 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-20 0:50 [PATCH v2] virtio-pci: disable vring processing when bus-mastering is disabled Michael Roth
2019-11-20 6:12 ` no-reply
2019-11-20 16:23 ` Michael Roth
2019-11-20 16:40 ` Dr. David Alan Gilbert
2019-11-28 16:48 ` Halil Pasic
2019-11-28 17:03 ` Michael S. Tsirkin [this message]
2019-11-29 13:51 ` Halil Pasic
2019-12-05 6:04 ` Alexey Kardashevskiy
2019-12-11 16:02 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191128120223-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=aik@ozlabs.ru \
--cc=borntraeger@de.ibm.com \
--cc=david@gibson.dropbear.id.au \
--cc=frankja@linux.ibm.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).