From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jonah Palmer <jonah.palmer@oracle.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, raphael@enfabrica.net,
	 kwolf@redhat.com, hreitz@redhat.com, jasowang@redhat.com,
	pbonzini@redhat.com,  fam@euphon.net, stefanha@redhat.com,
	qemu-block@nongnu.org,  schalla@marvell.com, leiyang@redhat.com,
	virtio-fs@lists.linux.dev,  si-wei.liu@oracle.com,
	boris.ostrovsky@oracle.com
Subject: Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
Date: Tue, 26 Mar 2024 19:34:22 +0100
Message-ID: <CAJaqyWcHokNf97uwE0=S56CK9cBpB54sF8cdW7+BhFc5VzBRUQ@mail.gmail.com>
In-Reply-To: <d435919b-ec5e-41d5-8bbc-354d027f67d0@oracle.com>

On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
> > On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >>
> >>
> >> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>
> >>>> The goal of these patches is to add support to a variety of virtio and
> >>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >>>> indicates that all buffers are used by the device in the same order in
> >>>> which they were made available by the driver.
> >>>>
> >>>> These patches attempt to implement a generalized, non-device-specific
> >>>> solution to support this feature.
> >>>>
> >>>> The core feature behind this solution is a buffer mechanism in the form
> >>>> of GLib's GHashTable. The decision behind using a hash table was to
> >>>> leverage its ability for quick lookup, insertion, and removal
> >>>> operations. Given that our keys are simply numbers of an ordered
> >>>> sequence, a hash table seemed like the best choice for a buffer
> >>>> mechanism.
> >>>>
> >>>> ---------------------
> >>>>
> >>>> The strategy behind this implementation is as follows:
> >>>>
> >>>> We know that popping buffers from the available ring and enqueuing them
> >>>> for further processing will always be done in the same order in which they
> >>>> were made available by the driver. Given this, we can note their order
> >>>> by assigning the resulting VirtQueueElement a key. This key is a number
> >>>> in a sequence that represents the order in which they were popped from
> >>>> the available ring, relative to the other VirtQueueElements.
> >>>>
> >>>> For example, given 3 "elements" that were popped from the available
> >>>> ring, we assign a key value to them which represents their order (elem0
> >>>> is popped first, then elem1, then lastly elem2):
> >>>>
> >>>>        elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>>>       (key: 2)    (key: 1)    (key: 0)
> >>>>
> >>>> Then these elements are enqueued for further processing by the host.
> >>>>
> >>>> While most devices will return these completed elements in the same
> >>>> order in which they were enqueued, some devices may not (e.g.
> >>>> virtio-blk). To guarantee that these elements are put on the used ring
> >>>> in the same order in which they were enqueued, we can use a buffering
> >>>> mechanism that keeps track of the next expected sequence number of an
> >>>> element.
> >>>>
> >>>> In other words, if the completed element does not have a key value that
> >>>> matches the next expected sequence number, then we know this element is
> >>>> not in-order and we must stash it away in a hash table until an order
> >>>> can be made. The element's key value is used as the key for placing it
> >>>> in the hash table.
> >>>>
> >>>> If the completed element has a key value that matches the next expected
> >>>> sequence number, then we know this element is in-order and we can push
> >>>> it on the used ring. Then we increment the next expected sequence number
> >>>> and check if the hash table contains an element at this key location.
> >>>>
> >>>> If so, we retrieve this element, push it to the used ring, delete the
> >>>> key-value pair from the hash table, increment the next expected sequence
> >>>> number, and check the hash table again for an element at this new key
> >>>> location. This process is repeated until we're unable to find an element
> >>>> in the hash table to continue the order.
> >>>>
> >>>> So, for example, say the 3 elements we enqueued were completed in the
> >>>> following order: elem1, elem2, elem0. The next expected sequence number
> >>>> is 0:
> >>>>
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> >>>>       (key: 1)                                         |
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 1 - elem1|
> >>>>                                                  ================
> >>>>       ---------------------
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> >>>>       (key: 2)                                         |
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 1 - elem1|
> >>>>                                                  |--------------|
> >>>>                                                  |key: 2 - elem2|
> >>>>                                                  ================
> >>>>       ---------------------
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> >>>>       (key: 0)
> >>>>
> >>>>       exp-seq-num = 1:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>>>                                                remove elem from table
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 2 - elem2|
> >>>>                                                  ================
> >>>>
> >>>>       exp-seq-num = 2:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>>>                                                remove elem from table
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |   *empty*    |
> >>>>                                                  ================
> >>>>
> >>>>       exp-seq-num = 3:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> No, done
> >>>>       ---------------------
> >>>>
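(For reference, the stash/drain flow described above boils down to
something like the untested sketch below. The in_order_complete() name,
the seq_num field, the use of elem->len to remember the written length,
and the vq->in_order_ht / vq->next_used_seq members are made up for
illustration; they are not the names used in the patches.)

    /* Sketch only: complete one element in driver order, stashing
     * out-of-order completions in a GHashTable keyed by sequence number
     * and draining consecutive entries once the gap is closed.  Assumes
     * the table was created with g_direct_hash/g_direct_equal and no
     * value destroy notifier. */
    static void in_order_complete(VirtQueue *vq, VirtQueueElement *elem,
                                  unsigned int len)
    {
        if (elem->seq_num != vq->next_used_seq) {
            /* Out of order: remember the written length and stash the
             * element until its turn comes. */
            elem->len = len;
            g_hash_table_insert(vq->in_order_ht,
                                GUINT_TO_POINTER(elem->seq_num), elem);
            return;
        }

        /* In order: push it, then drain any consecutive stashed elements. */
        virtqueue_push(vq, elem, len);
        vq->next_used_seq++;

        while ((elem = g_hash_table_lookup(vq->in_order_ht,
                           GUINT_TO_POINTER(vq->next_used_seq)))) {
            g_hash_table_remove(vq->in_order_ht,
                                GUINT_TO_POINTER(vq->next_used_seq));
            virtqueue_push(vq, elem, elem->len);
            vq->next_used_seq++;
        }
    }
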
> >>>
> >>> I think using a hashtable to handle this has an important drawback:
> >>> it hurts the performance of devices that use in-order correctly,
> >>> because of the hash computation, in order to benefit devices that use
> >>> it badly by completing descriptors out of order. We should use data
> >>> structures that are as cheap as possible for the former, and we don't
> >>> care about worsening the experience of devices that enable in_order
> >>> when they shouldn't.
> >>>
> >>
> >> Right, because if descriptors are coming in in-order, we still search
> >> the (empty) hash table.
> >>
> >> Hmm... what if we introduced a flag to see if we actually should bother
> >> searching the hash table? That way we avoid the cost of searching when
> >> we really don't need to.
> >>
> >>> So I suggest reusing the vq->used_elems array. At each used descriptor
> >>> written in the used ring, you know the next head is elem->index +
> >>> elem->ndescs, so you can check if that element has been filled or not.
> >>> If used, it needs to be flushed too. If not used, just return.
> >>>
> >>> Of course virtqueue_flush also needs to take this into account.
> >>>
> >>> What do you think, does it make sense to you?
> >>>
> >>
> >> I'm having a bit of trouble understanding the suggestion here. Would you
> >> mind elaborating a bit more for me on this?
> >>
> >> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
> >> being first, elem2 last) and then elem2 finishes first, elem1 second,
> >> and elem0 third. Given that these elements finish out-of-order, how
> >> would you handle these out-of-order elements using your suggestion?
> >>
> >
> > virtqueue_fill is called first with elem2. So vq->used_elems[2 %
> > vq->num] is filled with the needed information of the descriptor:
> > index, len and ndescs. idx function parameter is ignored.
> >
> > Optionally, virtqueue_push is called. It checks whether
> > vq->used_elems[vq->used_idx] is valid. "Valid" can be elem->in_num +
> > elem->out_num > 0, resetting both on every used ring write. If it is
> > not valid, this is a no-op. At this point in the example, it is not valid.
> >
> > Same process for elem1.
> >
> > virtqueue_fill is the same for elem0. But now virtqueue_flush gets
> > interesting, as it detects vq->used_elems[0] is used. It scans for the
> > first not-used element, and it finds it is vq->used_elems[3]. So it
> > needs to write a used elem with id = 2 and the corresponding length.
> >
> > It might be interesting to implement ways to speed up the lookup of the
> > last used descriptor, but if so I'd go for a bitmap, and always on top
> > of the base series.
> >
> > The algorithm has not been tested, so maybe I've missed something.
> >
> > Thanks!
> >
>
> Thank you for taking the time to clarify this for me, I appreciate it.
>
> I spent some time yesterday and this morning working this over in my
> head. I believe I understand what you're trying to do here and it makes
> more sense than employing a data structure like a hash table for this
> kind of job. However, I have a few questions regarding this implementation.
>
> So, one question is on the reuse of the VirtQueue's used_elems array.
> Won't reusing this array cause issues with packed VQ operations, since it
> also uses this array? If we want to stick with using this array
> specifically, perhaps we may need to rewrite its logic if the device has
> negotiated the in_order feature? E.g.
>
> virtqueue_packed_flush (...) {
>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>         // new logic
>     } else {
>         // current logic
>     }
> }
> -----------
>

That's right.

> Regarding this paragraph:
>
> "virtqueue_fill is called first with elem2. So vq->used_elems[2 %
> vq->num] is filled with the needed information of the descriptor:
> index, len and ndescs. idx function parameter is ignored."
>
> This looks exactly like virtqueue_packed_fill except for the idx
> parameter we'd pass in (sequence_num % vq->vring.num).
>
> In any case, regardless of whether this element being passed in is
> considered to be in-order or not, we still add this element to
> vq->used_elems in virtqueue_fill. Ok, got it.
>
> Then you say "Optionally, virtqueue_push is called". I assume by
> "optionally" you mean we need to know if this is a single-shot operation
> or a batched operation. A single-shot operation would call for
> virtqueue_push whereas a batched operation would just use
> virtqueue_fill. If this is what you meant by that then ok, I understand
> that too.
>

Totally correct.

> However, I think before we start considering whether or not we need to
> call virtqueue_push or continue with virtqueue_fill, we first should
> know whether or not this element is in-order. And I think to do that we
> should use the check you mentioned:
>
> if (vq->used_elems[vq->used_idx].in_num +
> vq->used_elems[vq->used_idx].out_num > 0)
>
> or perhaps:
>
> if (vq->used_elems[vq->used_idx] != NULL)
>
> If the element is found not to be in-order, I assume we return and we
> are done with the handling of this element for now.
>
> Now my confusion with this part comes from calling virtqueue_push inside
> of the virtqueue_fill function. Wouldn't calling virtqueue_push inside
> of virtqueue_fill present some kind of recursive execution path? Unless
> I'm missing something here, this probably isn't something we need to do,
> right?

Maybe I explained something wrong, but virtqueue_fill should not call
virtqueue_push. It's up to the caller (virtio-net, virtio-blk, etc.) to
call one or the other. Can you elaborate?
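
To make the split concrete, the two entry points are meant to be used
from the device model roughly like this (process_one / process_batch are
made-up callers for illustration; the virtqueue_* calls are the existing
API):

    /* Single completion: virtqueue_push() is fill + flush in one call. */
    static void process_one(VirtQueue *vq, VirtQueueElement *elem,
                            unsigned int written)
    {
        virtqueue_push(vq, elem, written);
    }

    /* Batched completions: fill each element, then flush the batch once. */
    static void process_batch(VirtQueue *vq, VirtQueueElement **elems,
                              unsigned int *written, unsigned int n)
    {
        for (unsigned int i = 0; i < n; i++) {
            virtqueue_fill(vq, elems[i], written[i], i);
        }
        virtqueue_flush(vq, n);
    }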

> -----------
>
> Lastly, when execution reaches virtqueue_flush, what would define an
> element as unused? Perhaps...
>
> if (vq->used_elems[i] == NULL)
>

used_elems is not an array of pointers but an array of
VirtQueueElement, so we cannot do it this way.

> or
>
> if (vq->used_elems[i].in_num + vq->used_elems[i].out_num > 0)
>

Right, I propose to reset both in_num = out_num = 0.
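
With that marker, the flush side would look roughly like the untested
sketch below. virtqueue_write_used_elem() is a hypothetical helper
standing in for the actual used ring write, and the used_idx /
vring.num bookkeeping is simplified:

    /* Sketch only: starting from the next expected head, consume every
     * element that has already been filled (in_num + out_num > 0),
     * resetting both counters as each slot is accounted for. */
    static void virtqueue_in_order_flush(VirtQueue *vq)
    {
        VirtQueueElement *e = &vq->used_elems[vq->used_idx % vq->vring.num];
        VirtQueueElement *last = NULL;

        while (e->in_num + e->out_num > 0) {
            last = e;
            e->in_num = e->out_num = 0;    /* mark the slot as consumed */
            vq->used_idx += e->ndescs;     /* advance to the next head */
            e = &vq->used_elems[vq->used_idx % vq->vring.num];
        }

        if (last) {
            /* With IN_ORDER, one used entry for the last element of the
             * batch (id = last->index, len = last->len) is enough. */
            virtqueue_write_used_elem(vq, last);
        }
    }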

Thanks!

> Thanks Eugenio!
>
> >> Thanks :)
> >>
> >>> Thanks!
> >>>
> >>>
> >>>> Jonah Palmer (8):
> >>>>     virtio: Define InOrderVQElement
> >>>>     virtio: Create/destroy/reset VirtQueue In-Order hash table
> >>>>     virtio: Define order variables
> >>>>     virtio: Implement in-order handling for virtio devices
> >>>>     virtio-net: in-order handling
> >>>>     vhost-svq: in-order handling
> >>>>     vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
> >>>>     virtio: Add VIRTIO_F_IN_ORDER property definition
> >>>>
> >>>>    hw/block/vhost-user-blk.c          |   1 +
> >>>>    hw/net/vhost_net.c                 |   2 +
> >>>>    hw/net/virtio-net.c                |   6 +-
> >>>>    hw/scsi/vhost-scsi.c               |   1 +
> >>>>    hw/scsi/vhost-user-scsi.c          |   1 +
> >>>>    hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
> >>>>    hw/virtio/vhost-user-fs.c          |   1 +
> >>>>    hw/virtio/vhost-user-vsock.c       |   1 +
> >>>>    hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
> >>>>    include/hw/virtio/virtio.h         |  20 +++++-
> >>>>    net/vhost-vdpa.c                   |   1 +
> >>>>    11 files changed, 145 insertions(+), 7 deletions(-)
> >>>>
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>
> >>
> >
>

