qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eugenio Perez Martin <eperezma@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-level <qemu-devel@nongnu.org>
Subject: Re: [RFC PATCH 13/27] vhost: Send buffers to device
Date: Wed, 24 Mar 2021 20:04:07 +0100	[thread overview]
Message-ID: <CAJaqyWfnAYQnuUyp1UUJ0g0KAizycqRQ5kTfFWevzyVktQTzMg@mail.gmail.com> (raw)
In-Reply-To: <YFjWcswhZWkyKOOR@stefanha-x1.localdomain>

On Mon, Mar 22, 2021 at 6:40 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Mar 22, 2021 at 04:55:13PM +0100, Eugenio Perez Martin wrote:
> > On Mon, Mar 22, 2021 at 11:51 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Thu, Mar 11, 2021 at 07:53:53PM +0100, Eugenio Perez Martin wrote:
> > > > On Fri, Jan 22, 2021 at 7:18 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Thu, Dec 10, 2020 at 12:55 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 09, 2020 at 07:41:23PM +0100, Eugenio Perez Martin wrote:
> > > > > > > On Tue, Dec 8, 2020 at 9:16 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > > > > > > On Fri, Nov 20, 2020 at 07:50:51PM +0100, Eugenio Pérez wrote:
> > > > > > > > > +        while (true) {
> > > > > > > > > +            int r;
> > > > > > > > > +            if (virtio_queue_full(vq)) {
> > > > > > > > > +                break;
> > > > > > > > > +            }
> > > > > > > >
> > > > > > > > Why is this check necessary? The guest cannot provide more descriptors
> > > > > > > > than there is ring space. If that happens somehow then it's a driver
> > > > > > > > error that is already reported in virtqueue_pop() below.
> > > > > > > >
> > > > > > >
> > > > > > > It's just checked because virtqueue_pop prints an error on that case,
> > > > > > > and there is no way to tell the difference between a regular error and
> > > > > > > another caused by other causes. Maybe the right thing to do is just to
> > > > > > > not to print that error? Caller should do the error printing in that
> > > > > > > case. Should we return an error code?
> > > > > >
> > > > > > The reason an error is printed today is because it's a guest error that
> > > > > > never happens with correct guest drivers. Something is broken and the
> > > > > > user should know about it.
> > > > > >
> > > > > > Why is "virtio_queue_full" (I already forgot what that actually means,
> > > > > > it's not clear whether this is referring to avail elements or used
> > > > > > elements) a condition that should be silently ignored in shadow vqs?
> > > > > >
> > > > >
> > > > > TL;DR: It can be changed to a check of the number of available
> > > > > descriptors in shadow vq, instead of returning as a regular operation.
> > > > > However, I think that making it a special return of virtqueue_pop
> > > > > could help in devices that run to completion, avoiding having to
> > > > > duplicate the count logic in them.
> > > > >
> > > > > The function virtio_queue_empty checks if the vq has all descriptors
> > > > > available, so the device has no more work to do until the driver makes
> > > > > another descriptor available. I can see how it can be a bad name
> > > > > choice, but virtio_queue_full means the opposite: device has pop()
> > > > > every descriptor available, and it has not returned any, so the driver
> > > > > cannot progress until the device marks some descriptors as used.
> > > > >
> > > > > As I understand, if vq->in_use >vq->num would mean we have a bug in
> > > > > the device vq code, not in the driver. virtio_queue_full could even be
> > > > > changed to "assert(vq->inuse <= vq->vring.num); return vq->inuse ==
> > > > > vq->vring.num", as long as vq->in_use is operated right.
> > > > >
> > > > > If we hit vq->in_use == vq->num in virtqueue_pop it means the device
> > > > > tried to pop() one more buffer after having all of them available and
> > > > > pop'ed. This would be invalid if the device is counting right the
> > > > > number of in_use descriptors, but then we are duplicating that logic
> > > > > in the device and the vq.
> > >
> > > Devices call virtqueue_pop() until it returns NULL. They don't need to
> > > count virtqueue buffers explicitly. It returns NULL when vq->num
> > > virtqueue buffers have already been popped (either because
> > > virtio_queue_empty() is true or because an invalid driver state is
> > > detected by checking vq->num in virtqueue_pop()).
> >
> > If I understood you right, the virtio_queue_full addresses the reverse
> > problem: it controls when the virtqueue is out of buffers to make
> > available for the device because the latter has not consumed any, not
> > when the driver does not offer more buffers to the device because it
> > has no more data to offer.
> >
> > I find it easier to explain with the virtio-net rx queue (and I think
> > it's the easier way to trigger this issue). You are describing it's
> > regular behavior: The guest fills it (let's say 100%), and the device
> > picks buffers one by one:
> >
> > virtio_net_receive_rcu:
> > while (offset < size) {
> >     elem = virtqueue_pop(q->rx_vq, sizeof(VirtQueueElement));
>
> The lines before this loop return early when the virtqueue does not have
> sufficient buffer space:
>
>   if (!virtio_net_has_buffers(q, size + n->guest_hdr_len - n->host_hdr_len)) {
>       return 0;
>   }
>
> When entering this loop we know that we can pop the buffers needed to
> fill one rx packet.
>
> >     if (!elem) {
> >         virtio_error("unexpected empty queue");
> >     }
> >     /* [1] */
> >     /* fill elem with rx packet */
> >     virtqueue_fill(virtqueue, elem);
> >     ...
> >     virtqueue_flush(q->rx_vq, i);
> > }
> >
> > Every device as far as I know does this buffer by buffer, there is
> > just processing code in [1], and it never tries to pop more than one
> > buffers/chain of buffers at the same time. In the case of a queue
> > empty (no more available buffers), we hit an error, because there are
> > no more buffers to write.
>
> It's an error because we already checked that the virtqueue has buffer
> space. This should never happen.
>
> > In other devices (or tx queue), empty
> > buffers means there is no more work to do, not an error.
> >
> > In the case of shadow virtqueue, we cannot limit the number of exposed
> > rx buffers to 1 buffer/chain of buffers in [1], since it will affect
> > batching. We have the opposite problem: All devices (but rx queue)
> > want to queue "as empty as possible", or "to mark all available
> > buffers empty". Net rx queue is ok as long as it has a buffer/buffer
> > chain big enough to write to, but it will fetch them on demand, so
> > "queue full" (as in all buffers are available) is not a problem for
> > the device.
> >
> > However, the part of the shadow virtqueue that forwards the available
> > buffer seeks the opposite: It wants as many buffers as possible to be
> > available. That means that there is no [1] code that fills/read &
> > flush/detach the buffer immediately: Shadow virtqueue wants to make
> > available as many buffers as possible, but the device may not use them
> > until it has more data available. To the extreme (virtio-net rx queue
> > full), shadow virtqueue may make available all buffers, so in a
> > while(true) loop, it will try to make them available until it hits
> > that all the buffers are already available (vq->in_use == vq->num).
> >
> > The solution is to check the number of buffers already available
> > before calling virtio_queue_pop(). We could duplicate in_use in shadow
> > virtqueue, of course, but everything we need is already done in
> > VirtQueue code, so I think to reuse it is a better solution. Another
> > solution could be to treat vq->in_use == vq->num as an special return
> > code with no printed error in virtqueue_pop, but to expose if the
> > queue is full (as vq->in_use == vq->num) sounds less invasive to me.
> >
> > >
> > > > > In shadow vq this situation happens with the correct guest network
> > > > > driver, since the rx queue is filled for the device to write. Network
> > > > > device in qemu fetch descriptors on demand, but shadow vq fetch all
> > > > > available in batching. If the driver just happens to fill the queue of
> > > > > available descriptors, the log will raise, so we need to check in
> > > > > handle_sw_lm_vq before calling pop(). Of course the shadow vq can
> > > > > duplicate guest_vq->in_use instead of checking virtio_queue_full, but
> > > > > then it needs to check two things for every virtqueue_pop() [1].
> > >
> > > I don't understand this scenario. It sounds like you are saying the
> > > guest and shadow rx vq are not in sync so there is a case where
> > > vq->in_use > vq->num is triggered?
> >
> > Sorry if I explain it bad, what I meant is that there is a case where
> > SVQ (as device code) will call virtqueue_pop() when vq->in_use ==
> > vq->num. virtio_queue_full maintains the check as >=, I think it
> > should be safe to even to code virtio_queue_full to:
> >
> > assert(vq->in_use > vq->num);
> > return vq->inuse == vq->num;
> >
> > Please let me know if this is not clear enough.
>
> I don't get it. When virtqueue_split_pop() has popped all requests
> virtio_queue_empty_rcu() should return true and we shouldn't reach if
> (vq->inuse >= vq->vring.num). The guest driver cannot submit more
> available buffers at this point.
>

Hi Stefan.

After the meeting, and reviewing the code carefully, I think you are
right. I'm not sure what I did to reproduce the issue, but I'm not
able to do it anymore, even in the conditions I thought where it was
trivially reproducible. Now I think it was caused in the previous
series because of accessing directly to guest's vring.

So I will delete this commit from the series. I still need to test SVQ
with the new additions, so if the bug persists it will reproduce for
sure.

Thank you very much!

> I only checked split rings, not packed rings.
>
> Can you point to the SVQ code which has this problem? It may be easier
> to re-read the code than try to describe it in an email.
>
> Stefan



  reply	other threads:[~2021-03-24 19:07 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 18:50 [RFC PATCH 00/27] vDPA software assisted live migration Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 01/27] vhost: Add vhost_dev_can_log Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 02/27] vhost: Add device callback in vhost_migration_log Eugenio Pérez
2020-12-07 16:19   ` Stefan Hajnoczi
2020-12-09 12:20     ` Eugenio Perez Martin
2020-11-20 18:50 ` [RFC PATCH 03/27] vhost: Move log resize/put to vhost_dev_set_log Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 04/27] vhost: add vhost_kernel_set_vring_enable Eugenio Pérez
2020-12-07 16:43   ` Stefan Hajnoczi
2020-12-09 12:00     ` Eugenio Perez Martin
2020-12-09 16:08       ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 05/27] vhost: Add hdev->dev.sw_lm_vq_handler Eugenio Pérez
2020-12-07 16:52   ` Stefan Hajnoczi
2020-12-09 15:02     ` Eugenio Perez Martin
2020-12-10 11:30       ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 06/27] virtio: Add virtio_queue_get_used_notify_split Eugenio Pérez
2020-12-07 16:58   ` Stefan Hajnoczi
2021-01-12 18:21     ` Eugenio Perez Martin
2021-03-02 11:22       ` Stefan Hajnoczi
2021-03-02 18:34         ` Eugenio Perez Martin
2021-03-08 10:46           ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 07/27] vhost: Route guest->host notification through qemu Eugenio Pérez
2020-12-07 17:42   ` Stefan Hajnoczi
2020-12-09 17:08     ` Eugenio Perez Martin
2020-12-10 11:50       ` Stefan Hajnoczi
2021-01-21 20:10         ` Eugenio Perez Martin
2020-11-20 18:50 ` [RFC PATCH 08/27] vhost: Add a flag for software assisted Live Migration Eugenio Pérez
2020-12-08  7:20   ` Stefan Hajnoczi
2020-12-09 17:57     ` Eugenio Perez Martin
2020-11-20 18:50 ` [RFC PATCH 09/27] vhost: Route host->guest notification through qemu Eugenio Pérez
2020-12-08  7:34   ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 10/27] vhost: Allocate shadow vring Eugenio Pérez
2020-12-08  7:49   ` Stefan Hajnoczi
2020-12-08  8:17   ` Stefan Hajnoczi
2020-12-09 18:15     ` Eugenio Perez Martin
2020-11-20 18:50 ` [RFC PATCH 11/27] virtio: const-ify all virtio_tswap* functions Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 12/27] virtio: Add virtio_queue_full Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 13/27] vhost: Send buffers to device Eugenio Pérez
2020-12-08  8:16   ` Stefan Hajnoczi
2020-12-09 18:41     ` Eugenio Perez Martin
2020-12-10 11:55       ` Stefan Hajnoczi
2021-01-22 18:18         ` Eugenio Perez Martin
     [not found]           ` <CAJaqyWdNeaboGaSsXPA8r=mUsbctFLzACFKLX55yRQpTvjqxJw@mail.gmail.com>
2021-03-22 10:51             ` Stefan Hajnoczi
2021-03-22 15:55               ` Eugenio Perez Martin
2021-03-22 17:40                 ` Stefan Hajnoczi
2021-03-24 19:04                   ` Eugenio Perez Martin [this message]
2021-03-24 19:56                     ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 14/27] virtio: Remove virtio_queue_get_used_notify_split Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 15/27] vhost: Do not invalidate signalled used Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 16/27] virtio: Expose virtqueue_alloc_element Eugenio Pérez
2020-12-08  8:25   ` Stefan Hajnoczi
2020-12-09 18:46     ` Eugenio Perez Martin
2020-12-10 11:57       ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 17/27] vhost: add vhost_vring_set_notification_rcu Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 18/27] vhost: add vhost_vring_poll_rcu Eugenio Pérez
2020-12-08  8:41   ` Stefan Hajnoczi
2020-12-09 18:48     ` Eugenio Perez Martin
2020-11-20 18:50 ` [RFC PATCH 19/27] vhost: add vhost_vring_get_buf_rcu Eugenio Pérez
2020-11-20 18:50 ` [RFC PATCH 20/27] vhost: Return used buffers Eugenio Pérez
2020-12-08  8:50   ` Stefan Hajnoczi
2020-11-20 18:50 ` [RFC PATCH 21/27] vhost: Add vhost_virtqueue_memory_unmap Eugenio Pérez
2020-11-20 18:51 ` [RFC PATCH 22/27] vhost: Add vhost_virtqueue_memory_map Eugenio Pérez
2020-11-20 18:51 ` [RFC PATCH 23/27] vhost: unmap qemu's shadow virtqueues on sw live migration Eugenio Pérez
2020-11-27 15:29   ` Stefano Garzarella
2020-11-30  7:54     ` Eugenio Perez Martin
2020-11-20 18:51 ` [RFC PATCH 24/27] vhost: iommu changes Eugenio Pérez
2020-12-08  9:02   ` Stefan Hajnoczi
2020-11-20 18:51 ` [RFC PATCH 25/27] vhost: Do not commit vhost used idx on vhost_virtqueue_stop Eugenio Pérez
2020-11-20 19:35   ` Eugenio Perez Martin
2020-11-20 18:51 ` [RFC PATCH 26/27] vhost: Add vhost_hdev_can_sw_lm Eugenio Pérez
2020-11-20 18:51 ` [RFC PATCH 27/27] vhost: forbid vhost devices logging Eugenio Pérez
2020-11-20 19:03 ` [RFC PATCH 00/27] vDPA software assisted live migration Eugenio Perez Martin
2020-11-20 19:30 ` no-reply
2020-11-25  7:08 ` Jason Wang
2020-11-25 12:03   ` Eugenio Perez Martin
2020-11-25 12:14     ` Eugenio Perez Martin
2020-11-26  3:07     ` Jason Wang
2020-11-27 15:44 ` Stefano Garzarella
2020-12-08  9:37 ` Stefan Hajnoczi
2020-12-09  9:26   ` Jason Wang
2020-12-09 15:57     ` Stefan Hajnoczi
2020-12-10  9:12       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJaqyWfnAYQnuUyp1UUJ0g0KAizycqRQ5kTfFWevzyVktQTzMg@mail.gmail.com \
    --to=eperezma@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).