From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [RFC PATCH 13/27] vhost: Send buffers to device
Date: Wed, 24 Mar 2021 19:56:36 +0000
Message-ID: <YFuZdDNIhfROJQCi@stefanha-x1.localdomain>
In-Reply-To: <CAJaqyWfnAYQnuUyp1UUJ0g0KAizycqRQ5kTfFWevzyVktQTzMg@mail.gmail.com>

On Wed, Mar 24, 2021 at 08:04:07PM +0100, Eugenio Perez Martin wrote:
> On Mon, Mar 22, 2021 at 6:40 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 04:55:13PM +0100, Eugenio Perez Martin wrote:
> > > On Mon, Mar 22, 2021 at 11:51 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >
> > > > On Thu, Mar 11, 2021 at 07:53:53PM +0100, Eugenio Perez Martin wrote:
> > > > > On Fri, Jan 22, 2021 at 7:18 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Dec 10, 2020 at 12:55 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 09, 2020 at 07:41:23PM +0100, Eugenio Perez Martin wrote:
> > > > > > > > On Tue, Dec 8, 2020 at 9:16 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > > > > > > > On Fri, Nov 20, 2020 at 07:50:51PM +0100, Eugenio Pérez wrote:
> > > > > > > > > > +        while (true) {
> > > > > > > > > > +            int r;
> > > > > > > > > > +            if (virtio_queue_full(vq)) {
> > > > > > > > > > +                break;
> > > > > > > > > > +            }
> > > > > > > > >
> > > > > > > > > Why is this check necessary? The guest cannot provide more descriptors
> > > > > > > > > than there is ring space. If that happens somehow then it's a driver
> > > > > > > > > error that is already reported in virtqueue_pop() below.
> > > > > > > > >
> > > > > > > >
> > > > > > > > It's checked because virtqueue_pop prints an error in that case,
> > > > > > > > and there is no way to tell that error apart from one with a
> > > > > > > > different cause. Maybe the right thing to do is just not to print
> > > > > > > > that error? The caller should do the error printing in that case.
> > > > > > > > Should we return an error code?
> > > > > > >
> > > > > > > The reason an error is printed today is because it's a guest error that
> > > > > > > never happens with correct guest drivers. Something is broken and the
> > > > > > > user should know about it.
> > > > > > >
> > > > > > > Why is "virtio_queue_full" (I already forgot what that actually means,
> > > > > > > it's not clear whether this is referring to avail elements or used
> > > > > > > elements) a condition that should be silently ignored in shadow vqs?
> > > > > > >
> > > > > >
> > > > > > TL;DR: This can be changed to a check of the number of available
> > > > > > descriptors in the shadow vq instead of treating it as a regular
> > > > > > return. However, I think that making it a special return value of
> > > > > > virtqueue_pop could help devices that run to completion, avoiding
> > > > > > having to duplicate the counting logic in them.
> > > > > >
> > > > > > The function virtio_queue_empty checks whether the vq has no
> > > > > > descriptors available, i.e. the device has no more work to do until
> > > > > > the driver makes another descriptor available. I can see how the name
> > > > > > can be a bad choice, but virtio_queue_full means the opposite: the
> > > > > > device has pop()ed every available descriptor and has not returned
> > > > > > any, so the driver cannot progress until the device marks some
> > > > > > descriptors as used.
> > > > > >
> > > > > > As I understand it, vq->inuse > vq->vring.num would mean we have a
> > > > > > bug in the device vq code, not in the driver. virtio_queue_full could
> > > > > > even be changed to "assert(vq->inuse <= vq->vring.num); return
> > > > > > vq->inuse == vq->vring.num", as long as vq->inuse is maintained
> > > > > > correctly.
> > > > > >
> > > > > > If we hit vq->inuse == vq->vring.num in virtqueue_pop it means the
> > > > > > device tried to pop() one more buffer after having popped all of the
> > > > > > available ones. That would be invalid if the device were counting the
> > > > > > number of in-use descriptors correctly, but then we would be
> > > > > > duplicating that logic in both the device and the vq.
> > > >
> > > > Devices call virtqueue_pop() until it returns NULL. They don't need to
> > > > count virtqueue buffers explicitly. It returns NULL when vq->num
> > > > virtqueue buffers have already been popped (either because
> > > > virtio_queue_empty() is true or because an invalid driver state is
> > > > detected by checking vq->num in virtqueue_pop()).
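> > > >
> > > > As a sketch, the usual device-side pattern looks like this (simplified;
> > > > the element size and completion details vary per device):
> > > >
> > > > for (;;) {
> > > >     VirtQueueElement *elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > > >     if (!elem) {
> > > >         break; /* empty, or an invalid driver state was already
> > > >                 * reported inside virtqueue_pop() */
> > > >     }
> > > >     /* ... process the request, then complete it ... */
> > > >     virtqueue_push(vq, elem, len);
> > > >     g_free(elem);
> > > > }
> > > > virtio_notify(vdev, vq);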
> > >
> > > If I understood you right, virtio_queue_full addresses the reverse
> > > problem: it detects when the virtqueue has run out of buffers to make
> > > available to the device because the device has not consumed any, not
> > > when the driver stops offering buffers because it has no more data to
> > > send.
> > >
> > > I find it easier to explain with the virtio-net rx queue (and I think
> > > it's the easiest way to trigger this issue). You are describing its
> > > regular behavior: the guest fills it (let's say 100%), and the device
> > > picks buffers one by one:
> > >
> > > virtio_net_receive_rcu:
> > > while (offset < size) {
> > >     elem = virtqueue_pop(q->rx_vq, sizeof(VirtQueueElement));
> >
> > The lines before this loop return early when the virtqueue does not have
> > sufficient buffer space:
> >
> >   if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
> >       return 0;
> >   }
> >
> > When entering this loop we know that we can pop the buffers needed to
> > fill one rx packet.
> >
> > >     if (!elem) {
> > >         virtio_error(vdev, "unexpected empty queue");
> > >     }
> > >     /* [1] */
> > >     /* fill elem with rx packet */
> > >     virtqueue_fill(q->rx_vq, elem, ...);
> > >     ...
> > >     virtqueue_flush(q->rx_vq, i);
> > > }
> > >
> > > Every device I know of does this buffer by buffer; there is just
> > > processing code in [1], and it never tries to pop more than one
> > > buffer/chain of buffers at a time. If the queue is empty (no more
> > > available buffers), we hit an error, because there are no more
> > > buffers to write into.
> >
> > It's an error because we already checked that the virtqueue has buffer
> > space. This should never happen.
> >
> > > In other devices (or the tx queue), no available
> > > buffers means there is no more work to do, not an error.
> > >
> > > In the case of the shadow virtqueue, we cannot limit the number of
> > > exposed rx buffers to one buffer/chain of buffers in [1], since that
> > > would hurt batching. We have the opposite problem: all devices (except
> > > the rx queue) want to keep the queue "as empty as possible", i.e. to
> > > consume every available buffer. The net rx queue is fine as long as it
> > > has a buffer/buffer chain big enough to write into, but it fetches
> > > buffers on demand, so "queue full" (as in, all buffers are available)
> > > is not a problem for the device.
> > >
> > > However, the part of the shadow virtqueue that forwards available
> > > buffers seeks the opposite: it wants as many buffers as possible to be
> > > available. That means there is no [1] code that fills/reads and
> > > flushes/detaches the buffer immediately: the shadow virtqueue wants to
> > > make as many buffers as possible available, but the device may not use
> > > them until it has more data. In the extreme case (virtio-net rx queue
> > > full), the shadow virtqueue may make all buffers available, so in a
> > > while(true) loop it will keep making them available until it finds
> > > that all of them are already available (vq->inuse == vq->vring.num).
> > >
> > > The solution is to check the number of buffers already available
> > > before calling virtqueue_pop(). We could duplicate the inuse counter
> > > in the shadow virtqueue, of course, but everything we need is already
> > > tracked in the VirtQueue code, so I think reusing it is the better
> > > solution. Another option would be to treat vq->inuse == vq->vring.num
> > > as a special return code with no printed error in virtqueue_pop, but
> > > exposing whether the queue is full (as vq->inuse == vq->vring.num)
> > > sounds less invasive to me.
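> > >
> > > Something like this (a hypothetical sketch; handle_sw_lm_vq is the
> > > name used in this series, and the actual forwarding to the shadow
> > > vring is elided):
> > >
> > > static void handle_sw_lm_vq(VirtIODevice *vdev, VirtQueue *vq)
> > > {
> > >     while (!virtio_queue_full(vq)) {
> > >         VirtQueueElement *elem;
> > >
> > >         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> > >         if (!elem) {
> > >             break; /* the guest has no more available buffers */
> > >         }
> > >         /* make elem available in the shadow vring; it stays in
> > >          * flight until the device marks it used */
> > >         ...
> > >     }
> > > }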
> > >
> > > >
> > > > > > In the shadow vq this situation happens with a correct guest network
> > > > > > driver, since the rx queue is filled with buffers for the device to
> > > > > > write into. The network device in qemu fetches descriptors on demand,
> > > > > > but the shadow vq fetches all the available ones in a batch. If the
> > > > > > driver just happens to fill the queue with available descriptors, the
> > > > > > error is logged, so we need to check in handle_sw_lm_vq before calling
> > > > > > pop(). Of course the shadow vq could duplicate guest_vq->inuse instead
> > > > > > of checking virtio_queue_full, but then it would need to check two
> > > > > > things for every virtqueue_pop() [1].
> > > >
> > > > I don't understand this scenario. It sounds like you are saying the
> > > > guest and shadow rx vq are not in sync so there is a case where
> > > > vq->in_use > vq->num is triggered?
> > >
> > > Sorry if I explained it badly; what I meant is that there is a case
> > > where SVQ (as device code) will call virtqueue_pop() when vq->inuse ==
> > > vq->vring.num. virtio_queue_full keeps the check as >=, but I think it
> > > should even be safe to code virtio_queue_full as:
> > >
> > > assert(vq->inuse <= vq->vring.num);
> > > return vq->inuse == vq->vring.num;
> > >
> > > Please let me know if this is not clear enough.
> >
> > I don't get it. When virtqueue_split_pop() has popped all requests,
> > virtio_queue_empty_rcu() should return true and we shouldn't reach the
> > if (vq->inuse >= vq->vring.num) check. The guest driver cannot submit
> > more available buffers at this point.
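> >
> > For reference, the relevant structure of virtqueue_split_pop() is
> > roughly this (paraphrased, details elided; the exact code in
> > hw/virtio/virtio.c may differ):
> >
> > static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> > {
> >     if (virtio_queue_empty_rcu(vq)) {
> >         return NULL; /* nothing available: the normal stop condition */
> >     }
> >     ...
> >     if (vq->inuse >= vq->vring.num) {
> >         virtio_error(vdev, "Virtqueue size exceeded");
> >         return NULL; /* only reachable with a buggy driver or device */
> >     }
> >     ...
> > }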
> >
> 
> Hi Stefan.
> 
> After the meeting, and after reviewing the code carefully, I think you
> are right. I'm not sure what I did to reproduce the issue, but I'm not
> able to do it anymore, even under the conditions where I thought it was
> trivially reproducible. Now I think it was caused in the previous
> series by accessing the guest's vring directly.
> 
> So I will delete this commit from the series. I still need to test SVQ
> with the new additions, so if the bug persists it will reproduce for
> sure.

Okay, thanks!

Stefan
