All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
@ 2024-03-21 15:57 Jonah Palmer
  2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of GLib's GHashTable. The decision behind using a hash table was to
leverage their ability for quick lookup, insertion, and removal
operations. Given that our keys are simply numbers of an ordered
sequence, a hash table seemed like the best choice for a buffer
mechanism.

---------------------

The strategy behind this implementation is as follows:

We know that buffers that are popped from the available ring and enqueued
for further processing will always done in the same order in which they
were made available by the driver. Given this, we can note their order
by assigning the resulting VirtQueueElement a key. This key is a number
in a sequence that represents the order in which they were popped from
the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign a key value to them which represents their order (elem0
is popped first, then elem1, then lastly elem2):

     elem2   --  elem1   --  elem0   ---> Enqueue for processing
    (key: 2)    (key: 1)    (key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in-order and we must stash it away in a hash table until an order
can be made. The element's key value is used as the key for placing it
in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in-order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

    exp-seq-num = 0:

     elem1   --> elem1.key == exp-seq-num ? --> No, stash it
    (key: 1)                                         |
                                                     |
                                                     v
                                               ================
                                               |key: 1 - elem1|
                                               ================
    ---------------------
    exp-seq-num = 0:

     elem2   --> elem2.key == exp-seq-num ? --> No, stash it
    (key: 2)                                         |
                                                     |
                                                     v
                                               ================
                                               |key: 1 - elem1|
                                               |--------------|
                                               |key: 2 - elem2|
                                               ================
    ---------------------
    exp-seq-num = 0:

     elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
    (key: 0)

    exp-seq-num = 1:

    lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
                                             remove elem from table
                                                     |
                                                     v
                                               ================
                                               |key: 2 - elem2|
                                               ================

    exp-seq-num = 2:

    lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
                                             remove elem from table
                                                     |
                                                     v
                                               ================
                                               |   *empty*    |
                                               ================

    exp-seq-num = 3:

    lookup(table, exp-seq-num) != NULL ? --> No, done
    ---------------------

Jonah Palmer (8):
  virtio: Define InOrderVQElement
  virtio: Create/destroy/reset VirtQueue In-Order hash table
  virtio: Define order variables
  virtio: Implement in-order handling for virtio devices
  virtio-net: in-order handling
  vhost-svq: in-order handling
  vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c          |   1 +
 hw/net/vhost_net.c                 |   2 +
 hw/net/virtio-net.c                |   6 +-
 hw/scsi/vhost-scsi.c               |   1 +
 hw/scsi/vhost-user-scsi.c          |   1 +
 hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
 hw/virtio/vhost-user-fs.c          |   1 +
 hw/virtio/vhost-user-vsock.c       |   1 +
 hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
 include/hw/virtio/virtio.h         |  20 +++++-
 net/vhost-vdpa.c                   |   1 +
 11 files changed, 145 insertions(+), 7 deletions(-)

-- 
2.39.3


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC 1/8] virtio: Define InOrderVQElement
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-22  9:45   ` Eugenio Perez Martin
  2024-03-21 15:57 ` [RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table Jonah Palmer
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
transport feature implementation.

The InOrderVQElement structure is used to encapsulate out-of-order
VirtQueueElement data that was processed by the host. This data
includes:
 - The processed VirtQueueElement (elem)
 - Length of data (len)
 - VirtQueueElement array index (idx)
 - Number of processed VirtQueueElements (count)

InOrderVQElements will be stored in a buffering mechanism until an
order can be achieved.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 include/hw/virtio/virtio.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..c8aa435a5e 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -77,6 +77,13 @@ typedef struct VirtQueueElement
     struct iovec *out_sg;
 } VirtQueueElement;
 
+typedef struct InOrderVQElement {
+    const VirtQueueElement *elem;
+    unsigned int len;
+    unsigned int idx;
+    unsigned int count;
+} InOrderVQElement;
+
 #define VIRTIO_QUEUE_MAX 1024
 
 #define VIRTIO_NO_VECTOR 0xffff
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
  2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-21 15:57 ` [RFC 3/8] virtio: Define order variables Jonah Palmer
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Define a GLib hash table (GHashTable) member in a device's VirtQueue
and add its creation, destruction, and reset functions appropriately.
Also define a function to handle the deallocation of InOrderVQElement
values whenever they're removed from the hash table or the hash table
is destroyed. This hash table is to be used when the device is using
the VIRTIO_F_IN_ORDER transport feature.

A VirtQueue's in-order hash table will take in a uint16_t key with a
InOrderVQElement value as its key-value pair.

The hash table will be used as a buffer mechanism for completed,
out-of-order VirtQueueElements until they can be used in the same order
in which they were made available to the device.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..d2afeeb59a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -152,6 +152,9 @@ struct VirtQueue
     EventNotifier host_notifier;
     bool host_notifier_enabled;
     QLIST_ENTRY(VirtQueue) node;
+
+    /* In-Order */
+    GHashTable *in_order_ht;
 };
 
 const char *virtio_device_names[] = {
@@ -2070,6 +2073,16 @@ static enum virtio_device_endian virtio_current_cpu_endian(void)
     }
 }
 
+/* 
+ * Called when an element is removed from the hash table
+ * or when the hash table is destroyed.
+ */
+static void free_in_order_vq_element(gpointer data)
+{
+    InOrderVQElement *elem = (InOrderVQElement *)data;
+    g_free(elem);
+}
+
 static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
 {
     vdev->vq[i].vring.desc = 0;
@@ -2087,6 +2100,9 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
     vdev->vq[i].notification = true;
     vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
     vdev->vq[i].inuse = 0;
+    if (vdev->vq[i].in_order_ht != NULL) {
+        g_hash_table_remove_all(vdev->vq[i].in_order_ht);
+    }
     virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2334,6 +2350,13 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
     vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
     vdev->vq[i].handle_output = handle_output;
     vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
+    vdev->vq[i].in_order_ht = NULL;
+
+    if (virtio_host_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+        vdev->vq[i].in_order_ht =
+            g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
+                                  free_in_order_vq_element);
+    }
 
     return &vdev->vq[i];
 }
@@ -2345,6 +2368,10 @@ void virtio_delete_queue(VirtQueue *vq)
     vq->handle_output = NULL;
     g_free(vq->used_elems);
     vq->used_elems = NULL;
+    if (virtio_host_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        g_hash_table_destroy(vq->in_order_ht);
+        vq->in_order_ht = NULL;
+    }
     virtio_virtqueue_reset_region_cache(vq);
 }
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 3/8] virtio: Define order variables
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
  2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
  2024-03-21 15:57 ` [RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-21 15:57 ` [RFC 4/8] virtio: Implement in-order handling for virtio devices Jonah Palmer
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Define order variables for their use in a VirtQueue's in-order hash
table. Also initialize current_order variables to 0 when creating or
resetting a VirtQueue. These variables are used when the device has
negotiated the VIRTIO_F_IN_ORDER transport feature.

A VirtQueue's current_order_idx represents the next expected index in
the sequence of VirtQueueElements to be processed (put on the used
ring). The next VirtQueueElement to be processed must match this
sequence number before additional elements can be safely added to the
used ring.

A VirtQueue's current_order_key is essentially a counter whose value is
saved as a key in a VirtQueueElement. After the value has been assigned
to the VirtQueueElement, the counter is incremented. All
VirtQueueElements being used by the device are assigned a key value and
the sequence at which they're assigned must be preserved when the device
puts these elements on the used ring.

A VirtQueueElement's order_key is value of a VirtQueue's
current_order_key at the time of the VirtQueueElement's creation. This
value must match with the VirtQueue's current_order_idx before it's able
to be put on the used ring by the device.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c         | 6 ++++++
 include/hw/virtio/virtio.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d2afeeb59a..40124545d6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -155,6 +155,8 @@ struct VirtQueue
 
     /* In-Order */
     GHashTable *in_order_ht;
+    uint16_t current_order_idx;
+    uint16_t current_order_key;
 };
 
 const char *virtio_device_names[] = {
@@ -2103,6 +2105,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
     if (vdev->vq[i].in_order_ht != NULL) {
         g_hash_table_remove_all(vdev->vq[i].in_order_ht);
     }
+    vdev->vq[i].current_order_idx = 0;
+    vdev->vq[i].current_order_key = 0;
     virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2357,6 +2361,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
             g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
                                   free_in_order_vq_element);
     }
+    vdev->vq[i].current_order_idx = 0;
+    vdev->vq[i].current_order_key = 0;
 
     return &vdev->vq[i];
 }
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8aa435a5e..f83d7e1fee 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -75,6 +75,7 @@ typedef struct VirtQueueElement
     hwaddr *out_addr;
     struct iovec *in_sg;
     struct iovec *out_sg;
+    uint16_t order_key;
 } VirtQueueElement;
 
 typedef struct InOrderVQElement {
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 4/8] virtio: Implement in-order handling for virtio devices
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (2 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 3/8] virtio: Define order variables Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-22 10:46   ` Eugenio Perez Martin
  2024-03-21 15:57 ` [RFC 5/8] virtio-net: in-order handling Jonah Palmer
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Implements in-order handling for most virtio devices using the
VIRTIO_F_IN_ORDER transport feature, specifically those who call
virtqueue_push to push their used elements onto the used ring.

The logic behind this implementation is as follows:

1.) virtqueue_pop always enqueues VirtQueueElements in-order.

virtqueue_pop always retrieves one or more buffer descriptors in-order
from the available ring and converts them into a VirtQueueElement. This
means that the order in which VirtQueueElements are enqueued are
in-order by default.

By virtue, as VirtQueueElements are created, we can assign a sequential
key value to them. This preserves the order of buffers that have been
made available to the device by the driver.

As VirtQueueElements are assigned a key value, the current sequence
number is incremented.

2.) Requests can be completed out-of-order.

While most devices complete requests in the same order that they were
enqueued by default, some devices don't (e.g. virtio-blk). The goal of
this out-of-order handling is to reduce the impact of devices that
process elements in-order by default while also guaranteeing compliance
with the VIRTIO_F_IN_ORDER feature.

Below is the logic behind handling completed requests (which may or may
not be in-order).

3.) Does the incoming used VirtQueueElement preserve the correct order?

In other words, is the sequence number (key) assigned to the
VirtQueueElement the expected number that would preserve the original
order?

3a.)
If it does... immediately push the used element onto the used ring.
Then increment the next expected sequence number and check to see if
any previous out-of-order VirtQueueElements stored on the hash table
has a key that matches this next expected sequence number.

For each VirtQueueElement found on the hash table with a matching key:
push the element on the used ring, remove the key-value pair from the
hash table, and then increment the next expected sequence number. Repeat
this process until we're unable to find an element with a matching key.

Note that if the device uses batching (e.g. virtio-net), then we skip
the virtqueue_flush call and let the device call it themselves.

3b.)
If it does not... stash the VirtQueueElement, along with relevant data,
as a InOrderVQElement on the hash table. The key used is the order_key
that was assigned when the VirtQueueElement was created.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c         | 70 ++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio.h |  8 +++++
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 40124545d6..40e4377f1e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
     }
 }
 
+void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
+                             unsigned int len, unsigned int idx,
+                             unsigned int count)
+{
+    InOrderVQElement *in_order_elem;
+
+    if (elem->order_key == vq->current_order_idx) {
+        /* Element is in-order, push to used ring */
+        virtqueue_fill(vq, elem, len, idx);
+
+        /* Batching? Don't flush */
+        if (count) {
+            virtqueue_flush(vq, count);
+        }
+
+        /* Increment next expected order, search for more in-order elements */
+        while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
+                        GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) {
+            /* Found in-order element, push to used ring */
+            virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
+                           in_order_elem->idx);
+
+            /* Batching? Don't flush */
+            if (count) {
+                virtqueue_flush(vq, in_order_elem->count);
+            }
+
+            /* Remove key-value pair from hash table */
+            g_hash_table_remove(vq->in_order_ht,
+                                GUINT_TO_POINTER(vq->current_order_idx));
+        }
+    } else {
+        /* Element is out-of-order, stash in hash table */
+        in_order_elem = virtqueue_alloc_in_order_element(elem, len, idx,
+                                                         count);
+        g_hash_table_insert(vq->in_order_ht, GUINT_TO_POINTER(elem->order_key),
+                            in_order_elem);
+    }
+}
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len)
 {
     RCU_READ_LOCK_GUARD();
-    virtqueue_fill(vq, elem, len, 0);
-    virtqueue_flush(vq, 1);
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        virtqueue_order_element(vq, elem, len, 0, 1);
+    } else {
+        virtqueue_fill(vq, elem, len, 0);
+        virtqueue_flush(vq, 1);
+    }
 }
 
 /* Called within rcu_read_lock().  */
@@ -1478,6 +1522,18 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
                                                                         false);
 }
 
+void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
+                                       unsigned int len, unsigned int idx,
+                                       unsigned int count)
+{
+    InOrderVQElement *in_order_elem = g_malloc(sizeof(InOrderVQElement));
+    in_order_elem->elem = elem;
+    in_order_elem->len = len;
+    in_order_elem->idx = idx;
+    in_order_elem->count = count;
+    return in_order_elem;
+}
+
 static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
 {
     VirtQueueElement *elem;
@@ -1626,6 +1682,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    /* Assign key for in-order processing */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+        elem->order_key = vq->current_order_key++;
+    }
+
     vq->inuse++;
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1762,6 +1823,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
         vq->last_avail_wrap_counter ^= 1;
     }
 
+    /* Assign key for in-order processing */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+        elem->order_key = vq->current_order_key++;
+    }
+
     vq->shadow_avail_idx = vq->last_avail_idx;
     vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
 
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index f83d7e1fee..eeeda397a9 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -275,6 +275,14 @@ void virtio_del_queue(VirtIODevice *vdev, int n);
 
 void virtio_delete_queue(VirtQueue *vq);
 
+void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
+                                       unsigned int len, unsigned int idx,
+                                       unsigned int count);
+
+void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
+                             unsigned int len, unsigned int idx,
+                             unsigned int count);
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 5/8] virtio-net: in-order handling
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (3 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 4/8] virtio: Implement in-order handling for virtio devices Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-21 15:57 ` [RFC 6/8] vhost-svq: " Jonah Palmer
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Implements in-order handling for the virtio-net device.

Since virtio-net utilizes batching for its Rx VirtQueue, the device is
responsible for calling virtqueue_flush once it has completed its
batching operation.

Note:
-----
It's unclear if this implementation is really necessary to "guarantee"
that used VirtQueueElements are put on the used ring in-order since, by
design, virtio-net already does this with its Rx VirtQueue.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/net/virtio-net.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9959f1932b..b0375f7e5e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2069,7 +2069,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 
     for (j = 0; j < i; j++) {
         /* signal other side */
-        virtqueue_fill(q->rx_vq, elems[j], lens[j], j);
+        if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+            virtqueue_order_element(q->rx_vq, elems[j], lens[j], j, 0);
+        } else {
+            virtqueue_fill(q->rx_vq, elems[j], lens[j], j);
+        }
         g_free(elems[j]);
     }
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 6/8] vhost-svq: in-order handling
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (4 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 5/8] virtio-net: in-order handling Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-21 15:57 ` [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Implements in-order handling for vhost devices using shadow virtqueues.

Since vhost's shadow virtqueues utilize batching in their
vhost_svq_flush calls, the vhost device is responsible for calling
virtqueue_flush once it has completed its batching operation.

Note:
-----
It's unclear if this implementation is really necessary to "guarantee"
in-order handling since, by design, the vhost_svq_flush function puts
used VirtQueueElements in-order already.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..3c42adee87 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -493,11 +493,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                 qemu_log_mask(LOG_GUEST_ERROR,
                          "More than %u used buffers obtained in a %u size SVQ",
                          i, svq->vring.num);
-                virtqueue_fill(vq, elem, len, i);
-                virtqueue_flush(vq, i);
+                if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_IN_ORDER)) {
+                    virtqueue_order_element(vq, elem, len, i, i);
+                } else {
+                    virtqueue_fill(vq, elem, len, i);
+                    virtqueue_flush(vq, i);
+                }
                 return;
             }
-            virtqueue_fill(vq, elem, len, i++);
+
+            if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_IN_ORDER)) {
+                virtqueue_order_element(vq, elem, len, i++, 0);
+            } else {
+                virtqueue_fill(vq, elem, len, i++);
+            }
         }
 
         virtqueue_flush(vq, i);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (5 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 6/8] vhost-svq: " Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-22 10:47   ` Eugenio Perez Martin
  2024-03-21 15:57 ` [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/block/vhost-user-blk.c    | 1 +
 hw/net/vhost_net.c           | 2 ++
 hw/scsi/vhost-scsi.c         | 1 +
 hw/scsi/vhost-user-scsi.c    | 1 +
 hw/virtio/vhost-user-fs.c    | 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c             | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..33d1d4b9d3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_HASH_REPORT,
     VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_RSS,
     VIRTIO_NET_F_HASH_REPORT,
     VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
 
     VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_INDIRECT_DESC,
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
     VIRTIO_F_VERSION_1,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_CSUM,
     VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
     VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (6 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
@ 2024-03-21 15:57 ` Jonah Palmer
  2024-03-22 10:48   ` Eugenio Perez Martin
  2024-03-21 19:48 ` [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Dongli Zhang
  2024-03-22 11:18 ` Eugenio Perez Martin
  9 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 15:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index eeeda397a9..ffd78830a3 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -400,7 +400,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("packed", _state, _field, \
                       VIRTIO_F_RING_PACKED, false), \
     DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-                      VIRTIO_F_RING_RESET, true)
+                      VIRTIO_F_RING_RESET, true), \
+    DEFINE_PROP_BIT64("in_order", _state, _field, \
+                      VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (7 preceding siblings ...)
  2024-03-21 15:57 ` [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
@ 2024-03-21 19:48 ` Dongli Zhang
  2024-03-21 21:25   ` Jonah Palmer
  2024-03-22 11:18 ` Eugenio Perez Martin
  9 siblings, 1 reply; 25+ messages in thread
From: Dongli Zhang @ 2024-03-21 19:48 UTC (permalink / raw)
  To: Jonah Palmer, qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

Hi Jonah,

Would you mind helping explain how does VIRTIO_F_IN_ORDER improve the performance?

https://lore.kernel.org/all/20240321155717.1392787-1-jonah.palmer@oracle.com/#t

I tried to look for it from prior discussions but could not find why.

https://lore.kernel.org/all/BYAPR18MB2791DF7E6C0F61E2D8698E8FA08DA@BYAPR18MB2791.namprd18.prod.outlook.com/

Thank you very much!

Dongli Zhang

On 3/21/24 08:57, Jonah Palmer wrote:
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
> 
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
> 
> The core feature behind this solution is a buffer mechanism in the form
> of GLib's GHashTable. The decision behind using a hash table was to
> leverage their ability for quick lookup, insertion, and removal
> operations. Given that our keys are simply numbers of an ordered
> sequence, a hash table seemed like the best choice for a buffer
> mechanism.
> 
> ---------------------
> 
> The strategy behind this implementation is as follows:
> 
> We know that buffers that are popped from the available ring and enqueued
> for further processing will always done in the same order in which they
> were made available by the driver. Given this, we can note their order
> by assigning the resulting VirtQueueElement a key. This key is a number
> in a sequence that represents the order in which they were popped from
> the available ring, relative to the other VirtQueueElements.
> 
> For example, given 3 "elements" that were popped from the available
> ring, we assign a key value to them which represents their order (elem0
> is popped first, then elem1, then lastly elem2):
> 
>      elem2   --  elem1   --  elem0   ---> Enqueue for processing
>     (key: 2)    (key: 1)    (key: 0)
> 
> Then these elements are enqueued for further processing by the host.
> 
> While most devices will return these completed elements in the same
> order in which they were enqueued, some devices may not (e.g.
> virtio-blk). To guarantee that these elements are put on the used ring
> in the same order in which they were enqueued, we can use a buffering
> mechanism that keeps track of the next expected sequence number of an
> element.
> 
> In other words, if the completed element does not have a key value that
> matches the next expected sequence number, then we know this element is
> not in-order and we must stash it away in a hash table until an order
> can be made. The element's key value is used as the key for placing it
> in the hash table.
> 
> If the completed element has a key value that matches the next expected
> sequence number, then we know this element is in-order and we can push
> it on the used ring. Then we increment the next expected sequence number
> and check if the hash table contains an element at this key location.
> 
> If so, we retrieve this element, push it to the used ring, delete the
> key-value pair from the hash table, increment the next expected sequence
> number, and check the hash table again for an element at this new key
> location. This process is repeated until we're unable to find an element
> in the hash table to continue the order.
> 
> So, for example, say the 3 elements we enqueued were completed in the
> following order: elem1, elem2, elem0. The next expected sequence number
> is 0:
> 
>     exp-seq-num = 0:
> 
>      elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>     (key: 1)                                         |
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 1 - elem1|
>                                                ================
>     ---------------------
>     exp-seq-num = 0:
> 
>      elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>     (key: 2)                                         |
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 1 - elem1|
>                                                |--------------|
>                                                |key: 2 - elem2|
>                                                ================
>     ---------------------
>     exp-seq-num = 0:
> 
>      elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>     (key: 0)
> 
>     exp-seq-num = 1:
> 
>     lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>                                              remove elem from table
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 2 - elem2|
>                                                ================
> 
>     exp-seq-num = 2:
> 
>     lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>                                              remove elem from table
>                                                      |
>                                                      v
>                                                ================
>                                                |   *empty*    |
>                                                ================
> 
>     exp-seq-num = 3:
> 
>     lookup(table, exp-seq-num) != NULL ? --> No, done
>     ---------------------
> 
> Jonah Palmer (8):
>   virtio: Define InOrderVQElement
>   virtio: Create/destroy/reset VirtQueue In-Order hash table
>   virtio: Define order variables
>   virtio: Implement in-order handling for virtio devices
>   virtio-net: in-order handling
>   vhost-svq: in-order handling
>   vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>   virtio: Add VIRTIO_F_IN_ORDER property definition
> 
>  hw/block/vhost-user-blk.c          |   1 +
>  hw/net/vhost_net.c                 |   2 +
>  hw/net/virtio-net.c                |   6 +-
>  hw/scsi/vhost-scsi.c               |   1 +
>  hw/scsi/vhost-user-scsi.c          |   1 +
>  hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>  hw/virtio/vhost-user-fs.c          |   1 +
>  hw/virtio/vhost-user-vsock.c       |   1 +
>  hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio.h         |  20 +++++-
>  net/vhost-vdpa.c                   |   1 +
>  11 files changed, 145 insertions(+), 7 deletions(-)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-21 19:48 ` [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Dongli Zhang
@ 2024-03-21 21:25   ` Jonah Palmer
  0 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-21 21:25 UTC (permalink / raw)
  To: Dongli Zhang, qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/21/24 3:48 PM, Dongli Zhang wrote:
> Hi Jonah,
> 
> Would you mind helping explain how does VIRTIO_F_IN_ORDER improve the performance?
> 
> https://lore.kernel.org/all/20240321155717.1392787-1-jonah.palmer@oracle.com/#t
> 
> I tried to look for it from prior discussions but could not find why.
> 
> https://lore.kernel.org/all/BYAPR18MB2791DF7E6C0F61E2D8698E8FA08DA@BYAPR18MB2791.namprd18.prod.outlook.com/
> 
> Thank you very much!
> 
> Dongli Zhang
> 

Hey Dongli,

So VIRTIO_F_IN_ORDER can theoretically improve performance under certain 
conditions. Whether it can improve performance today, I'm not sure.

But, if we can guarantee that all buffers are used by the device in the 
same order in which they're made available by the driver (enforcing a 
strict in-order processing and completion of requests), then we can 
leverage this to our advantage.

For example, we could simplify device and driver logic such as not 
needing complex mechanisms to track the completion of out-of-order 
requests (reduce request management overhead). Though the need of 
complex mechanisms to force this data to be in-order kind of defeats 
this benefit.

It could also improve cache utilization since sequential access patterns 
are more cache-friendly compared to random access patterns.

Also, in-order processing is more predictable, making it easier to 
optimize device and driver performance. E.g. it can allow us to 
fine-tune things without having to account for the variability of 
out-of-order completions.

But again, the actual performance impact will vary depending on the use 
case and workload. Scenarios that require high levels of parallelism or 
where out-of-order completions are efficiently managed, the flexibility 
of out-of-order processing can still be preferable.

Jonah

> On 3/21/24 08:57, Jonah Palmer wrote:
>> The goal of these patches is to add support to a variety of virtio and
>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
>> indicates that all buffers are used by the device in the same order in
>> which they were made available by the driver.
>>
>> These patches attempt to implement a generalized, non-device-specific
>> solution to support this feature.
>>
>> The core feature behind this solution is a buffer mechanism in the form
>> of GLib's GHashTable. The decision behind using a hash table was to
>> leverage their ability for quick lookup, insertion, and removal
>> operations. Given that our keys are simply numbers of an ordered
>> sequence, a hash table seemed like the best choice for a buffer
>> mechanism.
>>
>> ---------------------
>>
>> The strategy behind this implementation is as follows:
>>
>> We know that buffers that are popped from the available ring and enqueued
>> for further processing will always done in the same order in which they
>> were made available by the driver. Given this, we can note their order
>> by assigning the resulting VirtQueueElement a key. This key is a number
>> in a sequence that represents the order in which they were popped from
>> the available ring, relative to the other VirtQueueElements.
>>
>> For example, given 3 "elements" that were popped from the available
>> ring, we assign a key value to them which represents their order (elem0
>> is popped first, then elem1, then lastly elem2):
>>
>>       elem2   --  elem1   --  elem0   ---> Enqueue for processing
>>      (key: 2)    (key: 1)    (key: 0)
>>
>> Then these elements are enqueued for further processing by the host.
>>
>> While most devices will return these completed elements in the same
>> order in which they were enqueued, some devices may not (e.g.
>> virtio-blk). To guarantee that these elements are put on the used ring
>> in the same order in which they were enqueued, we can use a buffering
>> mechanism that keeps track of the next expected sequence number of an
>> element.
>>
>> In other words, if the completed element does not have a key value that
>> matches the next expected sequence number, then we know this element is
>> not in-order and we must stash it away in a hash table until an order
>> can be made. The element's key value is used as the key for placing it
>> in the hash table.
>>
>> If the completed element has a key value that matches the next expected
>> sequence number, then we know this element is in-order and we can push
>> it on the used ring. Then we increment the next expected sequence number
>> and check if the hash table contains an element at this key location.
>>
>> If so, we retrieve this element, push it to the used ring, delete the
>> key-value pair from the hash table, increment the next expected sequence
>> number, and check the hash table again for an element at this new key
>> location. This process is repeated until we're unable to find an element
>> in the hash table to continue the order.
>>
>> So, for example, say the 3 elements we enqueued were completed in the
>> following order: elem1, elem2, elem0. The next expected sequence number
>> is 0:
>>
>>      exp-seq-num = 0:
>>
>>       elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>>      (key: 1)                                         |
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 1 - elem1|
>>                                                 ================
>>      ---------------------
>>      exp-seq-num = 0:
>>
>>       elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>>      (key: 2)                                         |
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 1 - elem1|
>>                                                 |--------------|
>>                                                 |key: 2 - elem2|
>>                                                 ================
>>      ---------------------
>>      exp-seq-num = 0:
>>
>>       elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>>      (key: 0)
>>
>>      exp-seq-num = 1:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>                                               remove elem from table
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 2 - elem2|
>>                                                 ================
>>
>>      exp-seq-num = 2:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>                                               remove elem from table
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |   *empty*    |
>>                                                 ================
>>
>>      exp-seq-num = 3:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> No, done
>>      ---------------------
>>
>> Jonah Palmer (8):
>>    virtio: Define InOrderVQElement
>>    virtio: Create/destroy/reset VirtQueue In-Order hash table
>>    virtio: Define order variables
>>    virtio: Implement in-order handling for virtio devices
>>    virtio-net: in-order handling
>>    vhost-svq: in-order handling
>>    vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>>    virtio: Add VIRTIO_F_IN_ORDER property definition
>>
>>   hw/block/vhost-user-blk.c          |   1 +
>>   hw/net/vhost_net.c                 |   2 +
>>   hw/net/virtio-net.c                |   6 +-
>>   hw/scsi/vhost-scsi.c               |   1 +
>>   hw/scsi/vhost-user-scsi.c          |   1 +
>>   hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>>   hw/virtio/vhost-user-fs.c          |   1 +
>>   hw/virtio/vhost-user-vsock.c       |   1 +
>>   hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>>   include/hw/virtio/virtio.h         |  20 +++++-
>>   net/vhost-vdpa.c                   |   1 +
>>   11 files changed, 145 insertions(+), 7 deletions(-)
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/8] virtio: Define InOrderVQElement
  2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
@ 2024-03-22  9:45   ` Eugenio Perez Martin
  2024-03-25 17:08     ` Jonah Palmer
  0 siblings, 1 reply; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-22  9:45 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
> transport feature implementation.
>
> The InOrderVQElement structure is used to encapsulate out-of-order
> VirtQueueElement data that was processed by the host. This data
> includes:
>  - The processed VirtQueueElement (elem)
>  - Length of data (len)
>  - VirtQueueElement array index (idx)
>  - Number of processed VirtQueueElements (count)
>
> InOrderVQElements will be stored in a buffering mechanism until an
> order can be achieved.
>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  include/hw/virtio/virtio.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b3c74a1bca..c8aa435a5e 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
>      struct iovec *out_sg;
>  } VirtQueueElement;
>
> +typedef struct InOrderVQElement {
> +    const VirtQueueElement *elem;

Some subsystems allocate space for extra elements after
VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
to allocate this extra space by its second argument. Would it work for
this?

> +    unsigned int len;
> +    unsigned int idx;
> +    unsigned int count;

Now I don't get why these fields cannot be obtained from elem->(len,
index, ndescs) ?

> +} InOrderVQElement;
> +
>  #define VIRTIO_QUEUE_MAX 1024
>
>  #define VIRTIO_NO_VECTOR 0xffff
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices
  2024-03-21 15:57 ` [RFC 4/8] virtio: Implement in-order handling for virtio devices Jonah Palmer
@ 2024-03-22 10:46   ` Eugenio Perez Martin
  2024-03-25 17:34     ` Jonah Palmer
  0 siblings, 1 reply; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-22 10:46 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Implements in-order handling for most virtio devices using the
> VIRTIO_F_IN_ORDER transport feature, specifically those who call
> virtqueue_push to push their used elements onto the used ring.
>
> The logic behind this implementation is as follows:
>
> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
>
> virtqueue_pop always retrieves one or more buffer descriptors in-order
> from the available ring and converts them into a VirtQueueElement. This
> means that the order in which VirtQueueElements are enqueued are
> in-order by default.
>
> By virtue, as VirtQueueElements are created, we can assign a sequential
> key value to them. This preserves the order of buffers that have been
> made available to the device by the driver.
>
> As VirtQueueElements are assigned a key value, the current sequence
> number is incremented.
>
> 2.) Requests can be completed out-of-order.
>
> While most devices complete requests in the same order that they were
> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
> this out-of-order handling is to reduce the impact of devices that
> process elements in-order by default while also guaranteeing compliance
> with the VIRTIO_F_IN_ORDER feature.
>
> Below is the logic behind handling completed requests (which may or may
> not be in-order).
>
> 3.) Does the incoming used VirtQueueElement preserve the correct order?
>
> In other words, is the sequence number (key) assigned to the
> VirtQueueElement the expected number that would preserve the original
> order?
>
> 3a.)
> If it does... immediately push the used element onto the used ring.
> Then increment the next expected sequence number and check to see if
> any previous out-of-order VirtQueueElements stored on the hash table
> has a key that matches this next expected sequence number.
>
> For each VirtQueueElement found on the hash table with a matching key:
> push the element on the used ring, remove the key-value pair from the
> hash table, and then increment the next expected sequence number. Repeat
> this process until we're unable to find an element with a matching key.
>
> Note that if the device uses batching (e.g. virtio-net), then we skip
> the virtqueue_flush call and let the device call it themselves.
>
> 3b.)
> If it does not... stash the VirtQueueElement, along with relevant data,
> as a InOrderVQElement on the hash table. The key used is the order_key
> that was assigned when the VirtQueueElement was created.
>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  hw/virtio/virtio.c         | 70 ++++++++++++++++++++++++++++++++++++--
>  include/hw/virtio/virtio.h |  8 +++++
>  2 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 40124545d6..40e4377f1e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>      }
>  }
>
> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> +                             unsigned int len, unsigned int idx,
> +                             unsigned int count)
> +{
> +    InOrderVQElement *in_order_elem;
> +
> +    if (elem->order_key == vq->current_order_idx) {
> +        /* Element is in-order, push to used ring */
> +        virtqueue_fill(vq, elem, len, idx);
> +
> +        /* Batching? Don't flush */
> +        if (count) {
> +            virtqueue_flush(vq, count);

The "count" parameter is the number of heads used, but here you're
only using one head (elem). Same with the other virtqueue_flush in the
function.

Also, this function sometimes replaces virtqueue_fill and other
replaces virtqueue_fill + virtqueue_flush (both examples in patch
6/8). I have the impression the series would be simpler if
virtqueue_order_element is a static function just handling the
virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
virtqueue_fill, so the caller does not need to know if the in_order
feature is on or off.

> +        }
> +
> +        /* Increment next expected order, search for more in-order elements */
> +        while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
> +                        GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) {
> +            /* Found in-order element, push to used ring */
> +            virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
> +                           in_order_elem->idx);
> +
> +            /* Batching? Don't flush */
> +            if (count) {
> +                virtqueue_flush(vq, in_order_elem->count);
> +            }
> +
> +            /* Remove key-value pair from hash table */
> +            g_hash_table_remove(vq->in_order_ht,
> +                                GUINT_TO_POINTER(vq->current_order_idx));
> +        }
> +    } else {
> +        /* Element is out-of-order, stash in hash table */
> +        in_order_elem = virtqueue_alloc_in_order_element(elem, len, idx,
> +                                                         count);
> +        g_hash_table_insert(vq->in_order_ht, GUINT_TO_POINTER(elem->order_key),
> +                            in_order_elem);
> +    }
> +}
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len)
>  {
>      RCU_READ_LOCK_GUARD();
> -    virtqueue_fill(vq, elem, len, 0);
> -    virtqueue_flush(vq, 1);
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> +        virtqueue_order_element(vq, elem, len, 0, 1);
> +    } else {
> +        virtqueue_fill(vq, elem, len, 0);
> +        virtqueue_flush(vq, 1);
> +    }
>  }
>
>  /* Called within rcu_read_lock().  */
> @@ -1478,6 +1522,18 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
>                                                                          false);
>  }
>
> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
> +                                       unsigned int len, unsigned int idx,
> +                                       unsigned int count)
> +{
> +    InOrderVQElement *in_order_elem = g_malloc(sizeof(InOrderVQElement));
> +    in_order_elem->elem = elem;
> +    in_order_elem->len = len;
> +    in_order_elem->idx = idx;
> +    in_order_elem->count = count;
> +    return in_order_elem;
> +}
> +
>  static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
>  {
>      VirtQueueElement *elem;
> @@ -1626,6 +1682,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>          elem->in_sg[i] = iov[out_num + i];
>      }
>
> +    /* Assign key for in-order processing */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +        elem->order_key = vq->current_order_key++;

Since you're adding this in both split_pop and packed_pop, why not add
it in virtqueue_pop?

> +    }
> +
>      vq->inuse++;
>
>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1762,6 +1823,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>          vq->last_avail_wrap_counter ^= 1;
>      }
>
> +    /* Assign key for in-order processing */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +        elem->order_key = vq->current_order_key++;
> +    }
> +
>      vq->shadow_avail_idx = vq->last_avail_idx;
>      vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index f83d7e1fee..eeeda397a9 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -275,6 +275,14 @@ void virtio_del_queue(VirtIODevice *vdev, int n);
>
>  void virtio_delete_queue(VirtQueue *vq);
>
> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
> +                                       unsigned int len, unsigned int idx,
> +                                       unsigned int count);
> +
> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> +                             unsigned int len, unsigned int idx,
> +                             unsigned int count);
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  2024-03-21 15:57 ` [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
@ 2024-03-22 10:47   ` Eugenio Perez Martin
  0 siblings, 0 replies; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-22 10:47 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
> devices.
>
> The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
> devices ensures that the backend is capable of offering and providing
> support for this feature, and that it can be disabled if the backend
> does not support it.
>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

Thanks!

> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  hw/block/vhost-user-blk.c    | 1 +
>  hw/net/vhost_net.c           | 2 ++
>  hw/scsi/vhost-scsi.c         | 1 +
>  hw/scsi/vhost-user-scsi.c    | 1 +
>  hw/virtio/vhost-user-fs.c    | 1 +
>  hw/virtio/vhost-user-vsock.c | 1 +
>  net/vhost-vdpa.c             | 1 +
>  7 files changed, 8 insertions(+)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 6a856ad51a..d176ed857e 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_IOMMU_PLATFORM,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..33d1d4b9d3 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
>      VIRTIO_F_IOMMU_PLATFORM,
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>      VIRTIO_NET_F_HASH_REPORT,
>      VHOST_INVALID_FEATURE_BIT
>  };
> @@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_F_IOMMU_PLATFORM,
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>      VIRTIO_NET_F_RSS,
>      VIRTIO_NET_F_HASH_REPORT,
>      VIRTIO_NET_F_GUEST_USO4,
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index ae26bc19a4..40e7630191 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>      VIRTIO_RING_F_EVENT_IDX,
>      VIRTIO_SCSI_F_HOTPLUG,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..1d59951ab7 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_RING_F_EVENT_IDX,
>      VIRTIO_SCSI_F_HOTPLUG,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index cca2cd41be..9243dbb128 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_IOMMU_PLATFORM,
>      VIRTIO_F_RING_RESET,
> +    VIRTIO_F_IN_ORDER,
>
>      VHOST_INVALID_FEATURE_BIT
>  };
> diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
> index 9431b9792c..cc7e4e47b4 100644
> --- a/hw/virtio/vhost-user-vsock.c
> +++ b/hw/virtio/vhost-user-vsock.c
> @@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
>      VIRTIO_RING_F_INDIRECT_DESC,
>      VIRTIO_RING_F_EVENT_IDX,
>      VIRTIO_F_NOTIFY_ON_EMPTY,
> +    VIRTIO_F_IN_ORDER,
>      VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 85e73dd6a7..ed3185acfa 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
>      VIRTIO_F_RING_PACKED,
>      VIRTIO_F_RING_RESET,
>      VIRTIO_F_VERSION_1,
> +    VIRTIO_F_IN_ORDER,
>      VIRTIO_NET_F_CSUM,
>      VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>      VIRTIO_NET_F_CTRL_MAC_ADDR,
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition
  2024-03-21 15:57 ` [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
@ 2024-03-22 10:48   ` Eugenio Perez Martin
  0 siblings, 0 replies; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-22 10:48 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Extend the virtio device property definitions to include the
> VIRTIO_F_IN_ORDER feature.
>
> The default state of this feature is disabled, allowing it to be
> explicitly enabled where it's supported.
>

Acked-by: Eugenio Pérez <eperezma@redhat.com>

Thanks!

> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  include/hw/virtio/virtio.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index eeeda397a9..ffd78830a3 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -400,7 +400,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>      DEFINE_PROP_BIT64("packed", _state, _field, \
>                        VIRTIO_F_RING_PACKED, false), \
>      DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> -                      VIRTIO_F_RING_RESET, true)
> +                      VIRTIO_F_RING_RESET, true), \
> +    DEFINE_PROP_BIT64("in_order", _state, _field, \
> +                      VIRTIO_F_IN_ORDER, false)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (8 preceding siblings ...)
  2024-03-21 19:48 ` [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Dongli Zhang
@ 2024-03-22 11:18 ` Eugenio Perez Martin
  2024-03-25 16:52   ` Jonah Palmer
  9 siblings, 1 reply; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-22 11:18 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
>
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
>
> The core feature behind this solution is a buffer mechanism in the form
> of GLib's GHashTable. The decision behind using a hash table was to
> leverage their ability for quick lookup, insertion, and removal
> operations. Given that our keys are simply numbers of an ordered
> sequence, a hash table seemed like the best choice for a buffer
> mechanism.
>
> ---------------------
>
> The strategy behind this implementation is as follows:
>
> We know that buffers that are popped from the available ring and enqueued
> for further processing will always done in the same order in which they
> were made available by the driver. Given this, we can note their order
> by assigning the resulting VirtQueueElement a key. This key is a number
> in a sequence that represents the order in which they were popped from
> the available ring, relative to the other VirtQueueElements.
>
> For example, given 3 "elements" that were popped from the available
> ring, we assign a key value to them which represents their order (elem0
> is popped first, then elem1, then lastly elem2):
>
>      elem2   --  elem1   --  elem0   ---> Enqueue for processing
>     (key: 2)    (key: 1)    (key: 0)
>
> Then these elements are enqueued for further processing by the host.
>
> While most devices will return these completed elements in the same
> order in which they were enqueued, some devices may not (e.g.
> virtio-blk). To guarantee that these elements are put on the used ring
> in the same order in which they were enqueued, we can use a buffering
> mechanism that keeps track of the next expected sequence number of an
> element.
>
> In other words, if the completed element does not have a key value that
> matches the next expected sequence number, then we know this element is
> not in-order and we must stash it away in a hash table until an order
> can be made. The element's key value is used as the key for placing it
> in the hash table.
>
> If the completed element has a key value that matches the next expected
> sequence number, then we know this element is in-order and we can push
> it on the used ring. Then we increment the next expected sequence number
> and check if the hash table contains an element at this key location.
>
> If so, we retrieve this element, push it to the used ring, delete the
> key-value pair from the hash table, increment the next expected sequence
> number, and check the hash table again for an element at this new key
> location. This process is repeated until we're unable to find an element
> in the hash table to continue the order.
>
> So, for example, say the 3 elements we enqueued were completed in the
> following order: elem1, elem2, elem0. The next expected sequence number
> is 0:
>
>     exp-seq-num = 0:
>
>      elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>     (key: 1)                                         |
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 1 - elem1|
>                                                ================
>     ---------------------
>     exp-seq-num = 0:
>
>      elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>     (key: 2)                                         |
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 1 - elem1|
>                                                |--------------|
>                                                |key: 2 - elem2|
>                                                ================
>     ---------------------
>     exp-seq-num = 0:
>
>      elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>     (key: 0)
>
>     exp-seq-num = 1:
>
>     lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>                                              remove elem from table
>                                                      |
>                                                      v
>                                                ================
>                                                |key: 2 - elem2|
>                                                ================
>
>     exp-seq-num = 2:
>
>     lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>                                              remove elem from table
>                                                      |
>                                                      v
>                                                ================
>                                                |   *empty*    |
>                                                ================
>
>     exp-seq-num = 3:
>
>     lookup(table, exp-seq-num) != NULL ? --> No, done
>     ---------------------
>

I think to use a hashtable to handle this has an important drawback:
it hurts performance on the devices that are using right in-order
because of hash calculus, to benefit devices that are using it badly
by using descriptors out of order. We should use data structs that are
as free as possible for the first, and we don't care to worse the
experience of the devices that enable in_order and they shouldn't.

So I suggest reusing vq->used_elems array vq. At each used descriptor
written in the used ring, you know the next head is elem->index +
elem->ndescs, so you can check if that element has been filled or not.
If used, it needs to be flushed too. If not used, just return.

Of course virtqueue_flush also needs to take this into account.

What do you think, does it make sense to you?

Thanks!


> Jonah Palmer (8):
>   virtio: Define InOrderVQElement
>   virtio: Create/destroy/reset VirtQueue In-Order hash table
>   virtio: Define order variables
>   virtio: Implement in-order handling for virtio devices
>   virtio-net: in-order handling
>   vhost-svq: in-order handling
>   vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>   virtio: Add VIRTIO_F_IN_ORDER property definition
>
>  hw/block/vhost-user-blk.c          |   1 +
>  hw/net/vhost_net.c                 |   2 +
>  hw/net/virtio-net.c                |   6 +-
>  hw/scsi/vhost-scsi.c               |   1 +
>  hw/scsi/vhost-user-scsi.c          |   1 +
>  hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>  hw/virtio/vhost-user-fs.c          |   1 +
>  hw/virtio/vhost-user-vsock.c       |   1 +
>  hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio.h         |  20 +++++-
>  net/vhost-vdpa.c                   |   1 +
>  11 files changed, 145 insertions(+), 7 deletions(-)
>
> --
> 2.39.3
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-22 11:18 ` Eugenio Perez Martin
@ 2024-03-25 16:52   ` Jonah Palmer
  2024-03-25 20:33     ` Eugenio Perez Martin
  0 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-25 16:52 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>> The goal of these patches is to add support to a variety of virtio and
>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
>> indicates that all buffers are used by the device in the same order in
>> which they were made available by the driver.
>>
>> These patches attempt to implement a generalized, non-device-specific
>> solution to support this feature.
>>
>> The core feature behind this solution is a buffer mechanism in the form
>> of GLib's GHashTable. The decision behind using a hash table was to
>> leverage their ability for quick lookup, insertion, and removal
>> operations. Given that our keys are simply numbers of an ordered
>> sequence, a hash table seemed like the best choice for a buffer
>> mechanism.
>>
>> ---------------------
>>
>> The strategy behind this implementation is as follows:
>>
>> We know that buffers that are popped from the available ring and enqueued
>> for further processing will always done in the same order in which they
>> were made available by the driver. Given this, we can note their order
>> by assigning the resulting VirtQueueElement a key. This key is a number
>> in a sequence that represents the order in which they were popped from
>> the available ring, relative to the other VirtQueueElements.
>>
>> For example, given 3 "elements" that were popped from the available
>> ring, we assign a key value to them which represents their order (elem0
>> is popped first, then elem1, then lastly elem2):
>>
>>       elem2   --  elem1   --  elem0   ---> Enqueue for processing
>>      (key: 2)    (key: 1)    (key: 0)
>>
>> Then these elements are enqueued for further processing by the host.
>>
>> While most devices will return these completed elements in the same
>> order in which they were enqueued, some devices may not (e.g.
>> virtio-blk). To guarantee that these elements are put on the used ring
>> in the same order in which they were enqueued, we can use a buffering
>> mechanism that keeps track of the next expected sequence number of an
>> element.
>>
>> In other words, if the completed element does not have a key value that
>> matches the next expected sequence number, then we know this element is
>> not in-order and we must stash it away in a hash table until an order
>> can be made. The element's key value is used as the key for placing it
>> in the hash table.
>>
>> If the completed element has a key value that matches the next expected
>> sequence number, then we know this element is in-order and we can push
>> it on the used ring. Then we increment the next expected sequence number
>> and check if the hash table contains an element at this key location.
>>
>> If so, we retrieve this element, push it to the used ring, delete the
>> key-value pair from the hash table, increment the next expected sequence
>> number, and check the hash table again for an element at this new key
>> location. This process is repeated until we're unable to find an element
>> in the hash table to continue the order.
>>
>> So, for example, say the 3 elements we enqueued were completed in the
>> following order: elem1, elem2, elem0. The next expected sequence number
>> is 0:
>>
>>      exp-seq-num = 0:
>>
>>       elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>>      (key: 1)                                         |
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 1 - elem1|
>>                                                 ================
>>      ---------------------
>>      exp-seq-num = 0:
>>
>>       elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>>      (key: 2)                                         |
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 1 - elem1|
>>                                                 |--------------|
>>                                                 |key: 2 - elem2|
>>                                                 ================
>>      ---------------------
>>      exp-seq-num = 0:
>>
>>       elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>>      (key: 0)
>>
>>      exp-seq-num = 1:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>                                               remove elem from table
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |key: 2 - elem2|
>>                                                 ================
>>
>>      exp-seq-num = 2:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>                                               remove elem from table
>>                                                       |
>>                                                       v
>>                                                 ================
>>                                                 |   *empty*    |
>>                                                 ================
>>
>>      exp-seq-num = 3:
>>
>>      lookup(table, exp-seq-num) != NULL ? --> No, done
>>      ---------------------
>>
> 
> I think to use a hashtable to handle this has an important drawback:
> it hurts performance on the devices that are using right in-order
> because of hash calculus, to benefit devices that are using it badly
> by using descriptors out of order. We should use data structs that are
> as free as possible for the first, and we don't care to worse the
> experience of the devices that enable in_order and they shouldn't.
> 

Right, because if descriptors are coming in in-order, we still search 
the (empty) hash table.

Hmm... what if we introduced a flag to see if we actually should bother 
searching the hash table? That way we avoid the cost of searching when 
we really don't need to.

> So I suggest reusing vq->used_elems array vq. At each used descriptor
> written in the used ring, you know the next head is elem->index +
> elem->ndescs, so you can check if that element has been filled or not.
> If used, it needs to be flushed too. If not used, just return.
> 
> Of course virtqueue_flush also needs to take this into account.
> 
> What do you think, does it make sense to you?
> 

I'm having a bit of trouble understanding the suggestion here. Would you 
mind elaborating a bit more for me on this?

For example, say elem0, elem1, and elem2 were enqueued in-order (elem0 
being first, elem2 last) and then elem2 finishes first, elem1 second, 
and elem0 third. Given that these elements finish out-of-order, how 
would you handle these out-of-order elements using your suggestion?

Thanks :)

> Thanks!
> 
> 
>> Jonah Palmer (8):
>>    virtio: Define InOrderVQElement
>>    virtio: Create/destroy/reset VirtQueue In-Order hash table
>>    virtio: Define order variables
>>    virtio: Implement in-order handling for virtio devices
>>    virtio-net: in-order handling
>>    vhost-svq: in-order handling
>>    vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>>    virtio: Add VIRTIO_F_IN_ORDER property definition
>>
>>   hw/block/vhost-user-blk.c          |   1 +
>>   hw/net/vhost_net.c                 |   2 +
>>   hw/net/virtio-net.c                |   6 +-
>>   hw/scsi/vhost-scsi.c               |   1 +
>>   hw/scsi/vhost-user-scsi.c          |   1 +
>>   hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>>   hw/virtio/vhost-user-fs.c          |   1 +
>>   hw/virtio/vhost-user-vsock.c       |   1 +
>>   hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>>   include/hw/virtio/virtio.h         |  20 +++++-
>>   net/vhost-vdpa.c                   |   1 +
>>   11 files changed, 145 insertions(+), 7 deletions(-)
>>
>> --
>> 2.39.3
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/8] virtio: Define InOrderVQElement
  2024-03-22  9:45   ` Eugenio Perez Martin
@ 2024-03-25 17:08     ` Jonah Palmer
  2024-03-25 19:12       ` Eugenio Perez Martin
  0 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-25 17:08 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/22/24 5:45 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
>> transport feature implementation.
>>
>> The InOrderVQElement structure is used to encapsulate out-of-order
>> VirtQueueElement data that was processed by the host. This data
>> includes:
>>   - The processed VirtQueueElement (elem)
>>   - Length of data (len)
>>   - VirtQueueElement array index (idx)
>>   - Number of processed VirtQueueElements (count)
>>
>> InOrderVQElements will be stored in a buffering mechanism until an
>> order can be achieved.
>>
>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> ---
>>   include/hw/virtio/virtio.h | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> index b3c74a1bca..c8aa435a5e 100644
>> --- a/include/hw/virtio/virtio.h
>> +++ b/include/hw/virtio/virtio.h
>> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
>>       struct iovec *out_sg;
>>   } VirtQueueElement;
>>
>> +typedef struct InOrderVQElement {
>> +    const VirtQueueElement *elem;
> 
> Some subsystems allocate space for extra elements after
> VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
> to allocate this extra space by its second argument. Would it work for
> this?
> 

I don't see why not. Although this may not be necessary due to me 
missing a key aspect mentioned in your comment below.

>> +    unsigned int len;
>> +    unsigned int idx;
>> +    unsigned int count;
> 
> Now I don't get why these fields cannot be obtained from elem->(len,
> index, ndescs) ?
> 

Interesting. I didn't realize that these values are equivalent to a 
VirtQueueElement's len, index, and ndescs fields.

Is this always true? Else I would've expected, for example, 
virtqueue_push to not need the 'unsigned int len' parameter if this 
information is already included via. the VirtQueueElement being passed in.

>> +} InOrderVQElement;
>> +
>>   #define VIRTIO_QUEUE_MAX 1024
>>
>>   #define VIRTIO_NO_VECTOR 0xffff
>> --
>> 2.39.3
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices
  2024-03-22 10:46   ` Eugenio Perez Martin
@ 2024-03-25 17:34     ` Jonah Palmer
  2024-03-25 19:45       ` Eugenio Perez Martin
  0 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-25 17:34 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/22/24 6:46 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>> Implements in-order handling for most virtio devices using the
>> VIRTIO_F_IN_ORDER transport feature, specifically those who call
>> virtqueue_push to push their used elements onto the used ring.
>>
>> The logic behind this implementation is as follows:
>>
>> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
>>
>> virtqueue_pop always retrieves one or more buffer descriptors in-order
>> from the available ring and converts them into a VirtQueueElement. This
>> means that the order in which VirtQueueElements are enqueued are
>> in-order by default.
>>
>> By virtue, as VirtQueueElements are created, we can assign a sequential
>> key value to them. This preserves the order of buffers that have been
>> made available to the device by the driver.
>>
>> As VirtQueueElements are assigned a key value, the current sequence
>> number is incremented.
>>
>> 2.) Requests can be completed out-of-order.
>>
>> While most devices complete requests in the same order that they were
>> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
>> this out-of-order handling is to reduce the impact of devices that
>> process elements in-order by default while also guaranteeing compliance
>> with the VIRTIO_F_IN_ORDER feature.
>>
>> Below is the logic behind handling completed requests (which may or may
>> not be in-order).
>>
>> 3.) Does the incoming used VirtQueueElement preserve the correct order?
>>
>> In other words, is the sequence number (key) assigned to the
>> VirtQueueElement the expected number that would preserve the original
>> order?
>>
>> 3a.)
>> If it does... immediately push the used element onto the used ring.
>> Then increment the next expected sequence number and check to see if
>> any previous out-of-order VirtQueueElements stored on the hash table
>> has a key that matches this next expected sequence number.
>>
>> For each VirtQueueElement found on the hash table with a matching key:
>> push the element on the used ring, remove the key-value pair from the
>> hash table, and then increment the next expected sequence number. Repeat
>> this process until we're unable to find an element with a matching key.
>>
>> Note that if the device uses batching (e.g. virtio-net), then we skip
>> the virtqueue_flush call and let the device call it themselves.
>>
>> 3b.)
>> If it does not... stash the VirtQueueElement, along with relevant data,
>> as a InOrderVQElement on the hash table. The key used is the order_key
>> that was assigned when the VirtQueueElement was created.
>>
>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> ---
>>   hw/virtio/virtio.c         | 70 ++++++++++++++++++++++++++++++++++++--
>>   include/hw/virtio/virtio.h |  8 +++++
>>   2 files changed, 76 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index 40124545d6..40e4377f1e 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>>       }
>>   }
>>
>> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
>> +                             unsigned int len, unsigned int idx,
>> +                             unsigned int count)
>> +{
>> +    InOrderVQElement *in_order_elem;
>> +
>> +    if (elem->order_key == vq->current_order_idx) {
>> +        /* Element is in-order, push to used ring */
>> +        virtqueue_fill(vq, elem, len, idx);
>> +
>> +        /* Batching? Don't flush */
>> +        if (count) {
>> +            virtqueue_flush(vq, count);
> 
> The "count" parameter is the number of heads used, but here you're
> only using one head (elem). Same with the other virtqueue_flush in the
> function.
> 

True. This acts more as a flag than an actual count since, unless we're 
batching (which in the current setup, the device would explicitly call 
virtqueue_flush separately), this value will be either 0 or 1.

> Also, this function sometimes replaces virtqueue_fill and other
> replaces virtqueue_fill + virtqueue_flush (both examples in patch
> 6/8). I have the impression the series would be simpler if
> virtqueue_order_element is a static function just handling the
> virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
> virtqueue_fill, so the caller does not need to know if the in_order
> feature is on or off.
> 

Originally I wanted this function to replace virtqueue_fill + 
virtqueue_flush but after looking at virtio_net_receive_rcu and 
vhost_svq_flush, where multiple virtqueue_fill's can be called before a 
single virtqueue_flush, I added this 'if (count)' conditional to handle 
both cases.

I did consider virtqueue_order_element just handling the virtqueue_fill 
path but then I wasn't sure how to handle calling virtqueue_flush when 
retrieving out-of-order data from the hash table.

For example, devices that call virtqueue_push would call virtqueue_fill 
and then virtqueue_flush afterwards. In the scenario where, say, elem1 
was found out of order and put into the hash table, and then elem0 comes 
along. For elem0 we'd call virtqueue_fill and then we should call 
virtqueue_flush to keep the order going. Then we'd find elem1 and do the 
same. I have trouble seeing how we could properly call virtqueue_flush 
after finding out-of-order elements (that are now ready to be placed on 
the used ring in-order) in the hash table.

>> +        }
>> +
>> +        /* Increment next expected order, search for more in-order elements */
>> +        while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
>> +                        GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) {
>> +            /* Found in-order element, push to used ring */
>> +            virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
>> +                           in_order_elem->idx);
>> +
>> +            /* Batching? Don't flush */
>> +            if (count) {
>> +                virtqueue_flush(vq, in_order_elem->count);
>> +            }
>> +
>> +            /* Remove key-value pair from hash table */
>> +            g_hash_table_remove(vq->in_order_ht,
>> +                                GUINT_TO_POINTER(vq->current_order_idx));
>> +        }
>> +    } else {
>> +        /* Element is out-of-order, stash in hash table */
>> +        in_order_elem = virtqueue_alloc_in_order_element(elem, len, idx,
>> +                                                         count);
>> +        g_hash_table_insert(vq->in_order_ht, GUINT_TO_POINTER(elem->order_key),
>> +                            in_order_elem);
>> +    }
>> +}
>> +
>>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>                       unsigned int len)
>>   {
>>       RCU_READ_LOCK_GUARD();
>> -    virtqueue_fill(vq, elem, len, 0);
>> -    virtqueue_flush(vq, 1);
>> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
>> +        virtqueue_order_element(vq, elem, len, 0, 1);
>> +    } else {
>> +        virtqueue_fill(vq, elem, len, 0);
>> +        virtqueue_flush(vq, 1);
>> +    }
>>   }
>>
>>   /* Called within rcu_read_lock().  */
>> @@ -1478,6 +1522,18 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
>>                                                                           false);
>>   }
>>
>> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
>> +                                       unsigned int len, unsigned int idx,
>> +                                       unsigned int count)
>> +{
>> +    InOrderVQElement *in_order_elem = g_malloc(sizeof(InOrderVQElement));
>> +    in_order_elem->elem = elem;
>> +    in_order_elem->len = len;
>> +    in_order_elem->idx = idx;
>> +    in_order_elem->count = count;
>> +    return in_order_elem;
>> +}
>> +
>>   static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
>>   {
>>       VirtQueueElement *elem;
>> @@ -1626,6 +1682,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>           elem->in_sg[i] = iov[out_num + i];
>>       }
>>
>> +    /* Assign key for in-order processing */
>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>> +        elem->order_key = vq->current_order_key++;
> 
> Since you're adding this in both split_pop and packed_pop, why not add
> it in virtqueue_pop?
> 

I wanted to add this order_key to the VirtQueueElement after it was 
created. I suppose I could do this directly in virtqueue_alloc_element 
but I'd have to add another parameter to it, which might be unnecessary 
given it'd only be applicable for this specific in_order feature.

I also suppose I could just capture the VirtQueueElement being returned 
from virtqueue_packed_pop/virtqueue_split_pop and make the assignment 
there, but it felt out of place to do it in virtqueue_pop.

>> +    }
>> +
>>       vq->inuse++;
>>
>>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>> @@ -1762,6 +1823,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>           vq->last_avail_wrap_counter ^= 1;
>>       }
>>
>> +    /* Assign key for in-order processing */
>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>> +        elem->order_key = vq->current_order_key++;
>> +    }
>> +
>>       vq->shadow_avail_idx = vq->last_avail_idx;
>>       vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>
>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> index f83d7e1fee..eeeda397a9 100644
>> --- a/include/hw/virtio/virtio.h
>> +++ b/include/hw/virtio/virtio.h
>> @@ -275,6 +275,14 @@ void virtio_del_queue(VirtIODevice *vdev, int n);
>>
>>   void virtio_delete_queue(VirtQueue *vq);
>>
>> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
>> +                                       unsigned int len, unsigned int idx,
>> +                                       unsigned int count);
>> +
>> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
>> +                             unsigned int len, unsigned int idx,
>> +                             unsigned int count);
>> +
>>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>                       unsigned int len);
>>   void virtqueue_flush(VirtQueue *vq, unsigned int count);
>> --
>> 2.39.3
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 1/8] virtio: Define InOrderVQElement
  2024-03-25 17:08     ` Jonah Palmer
@ 2024-03-25 19:12       ` Eugenio Perez Martin
  0 siblings, 0 replies; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-25 19:12 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Mon, Mar 25, 2024 at 6:08 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/22/24 5:45 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
> >> transport feature implementation.
> >>
> >> The InOrderVQElement structure is used to encapsulate out-of-order
> >> VirtQueueElement data that was processed by the host. This data
> >> includes:
> >>   - The processed VirtQueueElement (elem)
> >>   - Length of data (len)
> >>   - VirtQueueElement array index (idx)
> >>   - Number of processed VirtQueueElements (count)
> >>
> >> InOrderVQElements will be stored in a buffering mechanism until an
> >> order can be achieved.
> >>
> >> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >> ---
> >>   include/hw/virtio/virtio.h | 7 +++++++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index b3c74a1bca..c8aa435a5e 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
> >>       struct iovec *out_sg;
> >>   } VirtQueueElement;
> >>
> >> +typedef struct InOrderVQElement {
> >> +    const VirtQueueElement *elem;
> >
> > Some subsystems allocate space for extra elements after
> > VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
> > to allocate this extra space by its second argument. Would it work for
> > this?
> >
>
> I don't see why not. Although this may not be necessary due to me
> missing a key aspect mentioned in your comment below.
>
> >> +    unsigned int len;
> >> +    unsigned int idx;
> >> +    unsigned int count;
> >
> > Now I don't get why these fields cannot be obtained from elem->(len,
> > index, ndescs) ?
> >
>
> Interesting. I didn't realize that these values are equivalent to a
> VirtQueueElement's len, index, and ndescs fields.
>
> Is this always true? Else I would've expected, for example,
> virtqueue_push to not need the 'unsigned int len' parameter if this
> information is already included via. the VirtQueueElement being passed in.
>

The code uses "len" to store the written length values of each used
descriptor between virtqueue_fill and virtqueue_flush. But not all
devices use these separately, only the ones that batches: virtio-net
and SVQ.

A smarter / less simpler implementation of virtqueue_push could
certainly avoid storing elem->len. But the performance gain is
probably tiny, and the code complexity grows.

> >> +} InOrderVQElement;
> >> +
> >>   #define VIRTIO_QUEUE_MAX 1024
> >>
> >>   #define VIRTIO_NO_VECTOR 0xffff
> >> --
> >> 2.39.3
> >>
> >
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices
  2024-03-25 17:34     ` Jonah Palmer
@ 2024-03-25 19:45       ` Eugenio Perez Martin
  0 siblings, 0 replies; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-25 19:45 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Mon, Mar 25, 2024 at 6:35 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/22/24 6:46 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >> Implements in-order handling for most virtio devices using the
> >> VIRTIO_F_IN_ORDER transport feature, specifically those who call
> >> virtqueue_push to push their used elements onto the used ring.
> >>
> >> The logic behind this implementation is as follows:
> >>
> >> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
> >>
> >> virtqueue_pop always retrieves one or more buffer descriptors in-order
> >> from the available ring and converts them into a VirtQueueElement. This
> >> means that the order in which VirtQueueElements are enqueued are
> >> in-order by default.
> >>
> >> By virtue, as VirtQueueElements are created, we can assign a sequential
> >> key value to them. This preserves the order of buffers that have been
> >> made available to the device by the driver.
> >>
> >> As VirtQueueElements are assigned a key value, the current sequence
> >> number is incremented.
> >>
> >> 2.) Requests can be completed out-of-order.
> >>
> >> While most devices complete requests in the same order that they were
> >> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
> >> this out-of-order handling is to reduce the impact of devices that
> >> process elements in-order by default while also guaranteeing compliance
> >> with the VIRTIO_F_IN_ORDER feature.
> >>
> >> Below is the logic behind handling completed requests (which may or may
> >> not be in-order).
> >>
> >> 3.) Does the incoming used VirtQueueElement preserve the correct order?
> >>
> >> In other words, is the sequence number (key) assigned to the
> >> VirtQueueElement the expected number that would preserve the original
> >> order?
> >>
> >> 3a.)
> >> If it does... immediately push the used element onto the used ring.
> >> Then increment the next expected sequence number and check to see if
> >> any previous out-of-order VirtQueueElements stored on the hash table
> >> has a key that matches this next expected sequence number.
> >>
> >> For each VirtQueueElement found on the hash table with a matching key:
> >> push the element on the used ring, remove the key-value pair from the
> >> hash table, and then increment the next expected sequence number. Repeat
> >> this process until we're unable to find an element with a matching key.
> >>
> >> Note that if the device uses batching (e.g. virtio-net), then we skip
> >> the virtqueue_flush call and let the device call it themselves.
> >>
> >> 3b.)
> >> If it does not... stash the VirtQueueElement, along with relevant data,
> >> as a InOrderVQElement on the hash table. The key used is the order_key
> >> that was assigned when the VirtQueueElement was created.
> >>
> >> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >> ---
> >>   hw/virtio/virtio.c         | 70 ++++++++++++++++++++++++++++++++++++--
> >>   include/hw/virtio/virtio.h |  8 +++++
> >>   2 files changed, 76 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 40124545d6..40e4377f1e 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
> >>       }
> >>   }
> >>
> >> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> >> +                             unsigned int len, unsigned int idx,
> >> +                             unsigned int count)
> >> +{
> >> +    InOrderVQElement *in_order_elem;
> >> +
> >> +    if (elem->order_key == vq->current_order_idx) {
> >> +        /* Element is in-order, push to used ring */
> >> +        virtqueue_fill(vq, elem, len, idx);
> >> +
> >> +        /* Batching? Don't flush */
> >> +        if (count) {
> >> +            virtqueue_flush(vq, count);
> >
> > The "count" parameter is the number of heads used, but here you're
> > only using one head (elem). Same with the other virtqueue_flush in the
> > function.
> >
>
> True. This acts more as a flag than an actual count since, unless we're
> batching (which in the current setup, the device would explicitly call
> virtqueue_flush separately), this value will be either 0 or 1.
>

If possible, I think it is better to keep consistent with the current
API: fill+flush for batching, push for a single shot.

> > Also, this function sometimes replaces virtqueue_fill and other
> > replaces virtqueue_fill + virtqueue_flush (both examples in patch
> > 6/8). I have the impression the series would be simpler if
> > virtqueue_order_element is a static function just handling the
> > virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
> > virtqueue_fill, so the caller does not need to know if the in_order
> > feature is on or off.
> >
>
> Originally I wanted this function to replace virtqueue_fill +
> virtqueue_flush but after looking at virtio_net_receive_rcu and
> vhost_svq_flush, where multiple virtqueue_fill's can be called before a
> single virtqueue_flush, I added this 'if (count)' conditional to handle
> both cases.
>
> I did consider virtqueue_order_element just handling the virtqueue_fill
> path but then I wasn't sure how to handle calling virtqueue_flush when
> retrieving out-of-order data from the hash table.
>
> For example, devices that call virtqueue_push would call virtqueue_fill
> and then virtqueue_flush afterwards. In the scenario where, say, elem1
> was found out of order and put into the hash table, and then elem0 comes
> along. For elem0 we'd call virtqueue_fill and then we should call
> virtqueue_flush to keep the order going. Then we'd find elem1 and do the
> same. I have trouble seeing how we could properly call virtqueue_flush
> after finding out-of-order elements (that are now ready to be placed on
> the used ring in-order) in the hash table.
>

I see, that's a good point indeed. The way I thought, it is a no-op in
that case, and the later virtqueue_flush needs to check if it has
pending buffers to use.

The next question is what to do with the virtqueue_fill idx and
virtqueue_flush count parameters. More on that in the cover letter, as
the discussion goes that direction there.

> >> +        }
> >> +
> >> +        /* Increment next expected order, search for more in-order elements */
> >> +        while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
> >> +                        GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) {
> >> +            /* Found in-order element, push to used ring */
> >> +            virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
> >> +                           in_order_elem->idx);
> >> +
> >> +            /* Batching? Don't flush */
> >> +            if (count) {
> >> +                virtqueue_flush(vq, in_order_elem->count);
> >> +            }
> >> +
> >> +            /* Remove key-value pair from hash table */
> >> +            g_hash_table_remove(vq->in_order_ht,
> >> +                                GUINT_TO_POINTER(vq->current_order_idx));
> >> +        }
> >> +    } else {
> >> +        /* Element is out-of-order, stash in hash table */
> >> +        in_order_elem = virtqueue_alloc_in_order_element(elem, len, idx,
> >> +                                                         count);
> >> +        g_hash_table_insert(vq->in_order_ht, GUINT_TO_POINTER(elem->order_key),
> >> +                            in_order_elem);
> >> +    }
> >> +}
> >> +
> >>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
> >>                       unsigned int len)
> >>   {
> >>       RCU_READ_LOCK_GUARD();
> >> -    virtqueue_fill(vq, elem, len, 0);
> >> -    virtqueue_flush(vq, 1);
> >> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> >> +        virtqueue_order_element(vq, elem, len, 0, 1);
> >> +    } else {
> >> +        virtqueue_fill(vq, elem, len, 0);
> >> +        virtqueue_flush(vq, 1);
> >> +    }
> >>   }
> >>
> >>   /* Called within rcu_read_lock().  */
> >> @@ -1478,6 +1522,18 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem)
> >>                                                                           false);
> >>   }
> >>
> >> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
> >> +                                       unsigned int len, unsigned int idx,
> >> +                                       unsigned int count)
> >> +{
> >> +    InOrderVQElement *in_order_elem = g_malloc(sizeof(InOrderVQElement));
> >> +    in_order_elem->elem = elem;
> >> +    in_order_elem->len = len;
> >> +    in_order_elem->idx = idx;
> >> +    in_order_elem->count = count;
> >> +    return in_order_elem;
> >> +}
> >> +
> >>   static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
> >>   {
> >>       VirtQueueElement *elem;
> >> @@ -1626,6 +1682,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>           elem->in_sg[i] = iov[out_num + i];
> >>       }
> >>
> >> +    /* Assign key for in-order processing */
> >> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >> +        elem->order_key = vq->current_order_key++;
> >
> > Since you're adding this in both split_pop and packed_pop, why not add
> > it in virtqueue_pop?
> >
>
> I wanted to add this order_key to the VirtQueueElement after it was
> created. I suppose I could do this directly in virtqueue_alloc_element
> but I'd have to add another parameter to it, which might be unnecessary
> given it'd only be applicable for this specific in_order feature.
>
> I also suppose I could just capture the VirtQueueElement being returned
> from virtqueue_packed_pop/virtqueue_split_pop and make the assignment
> there, but it felt out of place to do it in virtqueue_pop.
>

I see. I keep finding it simpler to do the assignment in one point
only, as it is not a specific split / packed operation. But let's see.

Thanks!

> >> +    }
> >> +
> >>       vq->inuse++;
> >>
> >>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >> @@ -1762,6 +1823,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>           vq->last_avail_wrap_counter ^= 1;
> >>       }
> >>
> >> +    /* Assign key for in-order processing */
> >> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >> +        elem->order_key = vq->current_order_key++;
> >> +    }
> >> +
> >>       vq->shadow_avail_idx = vq->last_avail_idx;
> >>       vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index f83d7e1fee..eeeda397a9 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -275,6 +275,14 @@ void virtio_del_queue(VirtIODevice *vdev, int n);
> >>
> >>   void virtio_delete_queue(VirtQueue *vq);
> >>
> >> +void *virtqueue_alloc_in_order_element(const VirtQueueElement *elem,
> >> +                                       unsigned int len, unsigned int idx,
> >> +                                       unsigned int count);
> >> +
> >> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> >> +                             unsigned int len, unsigned int idx,
> >> +                             unsigned int count);
> >> +
> >>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
> >>                       unsigned int len);
> >>   void virtqueue_flush(VirtQueue *vq, unsigned int count);
> >> --
> >> 2.39.3
> >>
> >
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-25 16:52   ` Jonah Palmer
@ 2024-03-25 20:33     ` Eugenio Perez Martin
  2024-03-26 16:49       ` Jonah Palmer
  0 siblings, 1 reply; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-25 20:33 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >> The goal of these patches is to add support to a variety of virtio and
> >> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >> indicates that all buffers are used by the device in the same order in
> >> which they were made available by the driver.
> >>
> >> These patches attempt to implement a generalized, non-device-specific
> >> solution to support this feature.
> >>
> >> The core feature behind this solution is a buffer mechanism in the form
> >> of GLib's GHashTable. The decision behind using a hash table was to
> >> leverage their ability for quick lookup, insertion, and removal
> >> operations. Given that our keys are simply numbers of an ordered
> >> sequence, a hash table seemed like the best choice for a buffer
> >> mechanism.
> >>
> >> ---------------------
> >>
> >> The strategy behind this implementation is as follows:
> >>
> >> We know that buffers that are popped from the available ring and enqueued
> >> for further processing will always done in the same order in which they
> >> were made available by the driver. Given this, we can note their order
> >> by assigning the resulting VirtQueueElement a key. This key is a number
> >> in a sequence that represents the order in which they were popped from
> >> the available ring, relative to the other VirtQueueElements.
> >>
> >> For example, given 3 "elements" that were popped from the available
> >> ring, we assign a key value to them which represents their order (elem0
> >> is popped first, then elem1, then lastly elem2):
> >>
> >>       elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>      (key: 2)    (key: 1)    (key: 0)
> >>
> >> Then these elements are enqueued for further processing by the host.
> >>
> >> While most devices will return these completed elements in the same
> >> order in which they were enqueued, some devices may not (e.g.
> >> virtio-blk). To guarantee that these elements are put on the used ring
> >> in the same order in which they were enqueued, we can use a buffering
> >> mechanism that keeps track of the next expected sequence number of an
> >> element.
> >>
> >> In other words, if the completed element does not have a key value that
> >> matches the next expected sequence number, then we know this element is
> >> not in-order and we must stash it away in a hash table until an order
> >> can be made. The element's key value is used as the key for placing it
> >> in the hash table.
> >>
> >> If the completed element has a key value that matches the next expected
> >> sequence number, then we know this element is in-order and we can push
> >> it on the used ring. Then we increment the next expected sequence number
> >> and check if the hash table contains an element at this key location.
> >>
> >> If so, we retrieve this element, push it to the used ring, delete the
> >> key-value pair from the hash table, increment the next expected sequence
> >> number, and check the hash table again for an element at this new key
> >> location. This process is repeated until we're unable to find an element
> >> in the hash table to continue the order.
> >>
> >> So, for example, say the 3 elements we enqueued were completed in the
> >> following order: elem1, elem2, elem0. The next expected sequence number
> >> is 0:
> >>
> >>      exp-seq-num = 0:
> >>
> >>       elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> >>      (key: 1)                                         |
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 1 - elem1|
> >>                                                 ================
> >>      ---------------------
> >>      exp-seq-num = 0:
> >>
> >>       elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> >>      (key: 2)                                         |
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 1 - elem1|
> >>                                                 |--------------|
> >>                                                 |key: 2 - elem2|
> >>                                                 ================
> >>      ---------------------
> >>      exp-seq-num = 0:
> >>
> >>       elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> >>      (key: 0)
> >>
> >>      exp-seq-num = 1:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>                                               remove elem from table
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |key: 2 - elem2|
> >>                                                 ================
> >>
> >>      exp-seq-num = 2:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>                                               remove elem from table
> >>                                                       |
> >>                                                       v
> >>                                                 ================
> >>                                                 |   *empty*    |
> >>                                                 ================
> >>
> >>      exp-seq-num = 3:
> >>
> >>      lookup(table, exp-seq-num) != NULL ? --> No, done
> >>      ---------------------
> >>
> >
> > I think to use a hashtable to handle this has an important drawback:
> > it hurts performance on the devices that are using right in-order
> > because of hash calculus, to benefit devices that are using it badly
> > by using descriptors out of order. We should use data structs that are
> > as free as possible for the first, and we don't care to worse the
> > experience of the devices that enable in_order and they shouldn't.
> >
>
> Right, because if descriptors are coming in in-order, we still search
> the (empty) hash table.
>
> Hmm... what if we introduced a flag to see if we actually should bother
> searching the hash table? That way we avoid the cost of searching when
> we really don't need to.
>
> > So I suggest reusing vq->used_elems array vq. At each used descriptor
> > written in the used ring, you know the next head is elem->index +
> > elem->ndescs, so you can check if that element has been filled or not.
> > If used, it needs to be flushed too. If not used, just return.
> >
> > Of course virtqueue_flush also needs to take this into account.
> >
> > What do you think, does it make sense to you?
> >
>
> I'm having a bit of trouble understanding the suggestion here. Would you
> mind elaborating a bit more for me on this?
>
> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
> being first, elem2 last) and then elem2 finishes first, elem1 second,
> and elem0 third. Given that these elements finish out-of-order, how
> would you handle these out-of-order elements using your suggestion?
>

virtqueue_fill is called first with elem2. So vq->used_elems[2 %
vq->num] is filled with the needed information of the descriptor:
index, len and ndescs. idx function parameter is ignored.

Optionally, virtqueue_push is called. It checks if
vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num +
elem->out_num > 0, and reset them on every used ring write. If it is
not valid, this is a no-op. Currently, it is not valid.

Same process for elem1.

virtqueue_fill is the same for elem0. But now virtqueue_flush gets
interesting, as it detects vq->used_elems[0] is used. It scans for the
first not-used element, and it finds it is vq->used_elems[3]. So it
needs to write an used elem with id = 2 and the corresponding length.

Maybe it is interesting to implement ways to improve the look for the
last used descriptor, but if any I'd go for a bitmap and always on top
of the basis series.

The algorithm has not been tested, so maybe I've missed something.

Thanks!

> Thanks :)
>
> > Thanks!
> >
> >
> >> Jonah Palmer (8):
> >>    virtio: Define InOrderVQElement
> >>    virtio: Create/destroy/reset VirtQueue In-Order hash table
> >>    virtio: Define order variables
> >>    virtio: Implement in-order handling for virtio devices
> >>    virtio-net: in-order handling
> >>    vhost-svq: in-order handling
> >>    vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
> >>    virtio: Add VIRTIO_F_IN_ORDER property definition
> >>
> >>   hw/block/vhost-user-blk.c          |   1 +
> >>   hw/net/vhost_net.c                 |   2 +
> >>   hw/net/virtio-net.c                |   6 +-
> >>   hw/scsi/vhost-scsi.c               |   1 +
> >>   hw/scsi/vhost-user-scsi.c          |   1 +
> >>   hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
> >>   hw/virtio/vhost-user-fs.c          |   1 +
> >>   hw/virtio/vhost-user-vsock.c       |   1 +
> >>   hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
> >>   include/hw/virtio/virtio.h         |  20 +++++-
> >>   net/vhost-vdpa.c                   |   1 +
> >>   11 files changed, 145 insertions(+), 7 deletions(-)
> >>
> >> --
> >> 2.39.3
> >>
> >
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-25 20:33     ` Eugenio Perez Martin
@ 2024-03-26 16:49       ` Jonah Palmer
  2024-03-26 18:34         ` Eugenio Perez Martin
  0 siblings, 1 reply; 25+ messages in thread
From: Jonah Palmer @ 2024-03-26 16:49 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
> On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>> The goal of these patches is to add support to a variety of virtio and
>>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
>>>> indicates that all buffers are used by the device in the same order in
>>>> which they were made available by the driver.
>>>>
>>>> These patches attempt to implement a generalized, non-device-specific
>>>> solution to support this feature.
>>>>
>>>> The core feature behind this solution is a buffer mechanism in the form
>>>> of GLib's GHashTable. The decision behind using a hash table was to
>>>> leverage their ability for quick lookup, insertion, and removal
>>>> operations. Given that our keys are simply numbers of an ordered
>>>> sequence, a hash table seemed like the best choice for a buffer
>>>> mechanism.
>>>>
>>>> ---------------------
>>>>
>>>> The strategy behind this implementation is as follows:
>>>>
>>>> We know that buffers that are popped from the available ring and enqueued
>>>> for further processing will always done in the same order in which they
>>>> were made available by the driver. Given this, we can note their order
>>>> by assigning the resulting VirtQueueElement a key. This key is a number
>>>> in a sequence that represents the order in which they were popped from
>>>> the available ring, relative to the other VirtQueueElements.
>>>>
>>>> For example, given 3 "elements" that were popped from the available
>>>> ring, we assign a key value to them which represents their order (elem0
>>>> is popped first, then elem1, then lastly elem2):
>>>>
>>>>        elem2   --  elem1   --  elem0   ---> Enqueue for processing
>>>>       (key: 2)    (key: 1)    (key: 0)
>>>>
>>>> Then these elements are enqueued for further processing by the host.
>>>>
>>>> While most devices will return these completed elements in the same
>>>> order in which they were enqueued, some devices may not (e.g.
>>>> virtio-blk). To guarantee that these elements are put on the used ring
>>>> in the same order in which they were enqueued, we can use a buffering
>>>> mechanism that keeps track of the next expected sequence number of an
>>>> element.
>>>>
>>>> In other words, if the completed element does not have a key value that
>>>> matches the next expected sequence number, then we know this element is
>>>> not in-order and we must stash it away in a hash table until an order
>>>> can be made. The element's key value is used as the key for placing it
>>>> in the hash table.
>>>>
>>>> If the completed element has a key value that matches the next expected
>>>> sequence number, then we know this element is in-order and we can push
>>>> it on the used ring. Then we increment the next expected sequence number
>>>> and check if the hash table contains an element at this key location.
>>>>
>>>> If so, we retrieve this element, push it to the used ring, delete the
>>>> key-value pair from the hash table, increment the next expected sequence
>>>> number, and check the hash table again for an element at this new key
>>>> location. This process is repeated until we're unable to find an element
>>>> in the hash table to continue the order.
>>>>
>>>> So, for example, say the 3 elements we enqueued were completed in the
>>>> following order: elem1, elem2, elem0. The next expected sequence number
>>>> is 0:
>>>>
>>>>       exp-seq-num = 0:
>>>>
>>>>        elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>>>>       (key: 1)                                         |
>>>>                                                        |
>>>>                                                        v
>>>>                                                  ================
>>>>                                                  |key: 1 - elem1|
>>>>                                                  ================
>>>>       ---------------------
>>>>       exp-seq-num = 0:
>>>>
>>>>        elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>>>>       (key: 2)                                         |
>>>>                                                        |
>>>>                                                        v
>>>>                                                  ================
>>>>                                                  |key: 1 - elem1|
>>>>                                                  |--------------|
>>>>                                                  |key: 2 - elem2|
>>>>                                                  ================
>>>>       ---------------------
>>>>       exp-seq-num = 0:
>>>>
>>>>        elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>>>>       (key: 0)
>>>>
>>>>       exp-seq-num = 1:
>>>>
>>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>>>                                                remove elem from table
>>>>                                                        |
>>>>                                                        v
>>>>                                                  ================
>>>>                                                  |key: 2 - elem2|
>>>>                                                  ================
>>>>
>>>>       exp-seq-num = 2:
>>>>
>>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>>>                                                remove elem from table
>>>>                                                        |
>>>>                                                        v
>>>>                                                  ================
>>>>                                                  |   *empty*    |
>>>>                                                  ================
>>>>
>>>>       exp-seq-num = 3:
>>>>
>>>>       lookup(table, exp-seq-num) != NULL ? --> No, done
>>>>       ---------------------
>>>>
>>>
>>> I think to use a hashtable to handle this has an important drawback:
>>> it hurts performance on the devices that are using right in-order
>>> because of hash calculus, to benefit devices that are using it badly
>>> by using descriptors out of order. We should use data structs that are
>>> as free as possible for the first, and we don't care to worse the
>>> experience of the devices that enable in_order and they shouldn't.
>>>
>>
>> Right, because if descriptors are coming in in-order, we still search
>> the (empty) hash table.
>>
>> Hmm... what if we introduced a flag to see if we actually should bother
>> searching the hash table? That way we avoid the cost of searching when
>> we really don't need to.
>>
>>> So I suggest reusing vq->used_elems array vq. At each used descriptor
>>> written in the used ring, you know the next head is elem->index +
>>> elem->ndescs, so you can check if that element has been filled or not.
>>> If used, it needs to be flushed too. If not used, just return.
>>>
>>> Of course virtqueue_flush also needs to take this into account.
>>>
>>> What do you think, does it make sense to you?
>>>
>>
>> I'm having a bit of trouble understanding the suggestion here. Would you
>> mind elaborating a bit more for me on this?
>>
>> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
>> being first, elem2 last) and then elem2 finishes first, elem1 second,
>> and elem0 third. Given that these elements finish out-of-order, how
>> would you handle these out-of-order elements using your suggestion?
>>
> 
> virtqueue_fill is called first with elem2. So vq->used_elems[2 %
> vq->num] is filled with the needed information of the descriptor:
> index, len and ndescs. idx function parameter is ignored.
> 
> Optionally, virtqueue_push is called. It checks if
> vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num +
> elem->out_num > 0, and reset them on every used ring write. If it is
> not valid, this is a no-op. Currently, it is not valid.
> 
> Same process for elem1.
> 
> virtqueue_fill is the same for elem0. But now virtqueue_flush gets
> interesting, as it detects vq->used_elems[0] is used. It scans for the
> first not-used element, and it finds it is vq->used_elems[3]. So it
> needs to write an used elem with id = 2 and the corresponding length.
> 
> Maybe it is interesting to implement ways to improve the look for the
> last used descriptor, but if any I'd go for a bitmap and always on top
> of the basis series.
> 
> The algorithm has not been tested, so maybe I've missed something.
> 
> Thanks!
> 

Thank you for taking the time to clarify for this for me, I appreciate it.

I spent some time yesterday and this morning working this over in my 
head. I believe I understand what you're trying to do here and it makes 
more sense than employing a data structure like a hash table for this 
kind of job. However, I have a few questions regarding this implementation.

So, one question is on the reuse of the VirtQueue's used_elems array. 
Wont reusing this array cause issues with packed VQ operations, since it 
also uses this array? If we want to stick with using this array 
specifically, perhaps we may need to rewrite its logic if the device has 
negotiated the in_order feature? E.g.

virtqueue_packed_flush (...) {
    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER) {
       // new logic
    } else {
      // current logic
    }
}
-----------

Regarding this paragraph:

"virtqueue_fill is called first with elem2. So vq->used_elems[2 %
vq->num] is filled with the needed information of the descriptor:
index, len and ndescs. idx function parameter is ignored."

This looks exactly like virtqueue_packed_fill except for the idx 
parameter we'd pass in (sequence_num % vq->vring.num).

In any case, regardless of whether this element being passed in is 
considered to be in-order or not, we still add this element to 
vq->used_elems in virtqueue_fill. Ok, got it.

Then you say "Optionally, virtqueue_push is called". I assume by 
"optionally" you mean we need to know if this is a single-shot operation 
or a batched operation. A single-shot operation would call for 
virtqueue_push whereas a batched operation would just use 
virtqueue_fill. If this is what you meant by that then ok, I understand 
that too.

However, I think before we start considering whether or not we need to 
call virtqueue_push or continue with virtqueue_fill, we first should 
know whether or not this element is in-order. And I think to do that we 
should use the check you mentioned:

if (vq->used_elems[vq->used_idx].in_num + 
vq->used_elems[vq->used_idx].out_num > 0)

or perhaps:

if (vq->used_elems[vq->used_idx] != NULL)

If the element is found not to be in-order, I assume we return and we 
are done with the handling of this element for now.

Now my confusion with this part comes from calling virtqueue_push inside 
of the virtqueue_fill function. Wouldn't calling virtqueue_push inside 
of virtqueue_fill present some kind of recursive execution path? Unless 
I'm missing something here, this probably isn't something we need to do, 
right?
-----------

Lastly, when execution reaches virtqueue_flush, what would define an 
element as unused? Perhaps...

if (vq->used_elems[i] == NULL)

or

if (vq->used_elems[i].in_num + vq->used_elems[i].out_num > 0)

Thanks Eugenio!

>> Thanks :)
>>
>>> Thanks!
>>>
>>>
>>>> Jonah Palmer (8):
>>>>     virtio: Define InOrderVQElement
>>>>     virtio: Create/destroy/reset VirtQueue In-Order hash table
>>>>     virtio: Define order variables
>>>>     virtio: Implement in-order handling for virtio devices
>>>>     virtio-net: in-order handling
>>>>     vhost-svq: in-order handling
>>>>     vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>>>>     virtio: Add VIRTIO_F_IN_ORDER property definition
>>>>
>>>>    hw/block/vhost-user-blk.c          |   1 +
>>>>    hw/net/vhost_net.c                 |   2 +
>>>>    hw/net/virtio-net.c                |   6 +-
>>>>    hw/scsi/vhost-scsi.c               |   1 +
>>>>    hw/scsi/vhost-user-scsi.c          |   1 +
>>>>    hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>>>>    hw/virtio/vhost-user-fs.c          |   1 +
>>>>    hw/virtio/vhost-user-vsock.c       |   1 +
>>>>    hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>>>>    include/hw/virtio/virtio.h         |  20 +++++-
>>>>    net/vhost-vdpa.c                   |   1 +
>>>>    11 files changed, 145 insertions(+), 7 deletions(-)
>>>>
>>>> --
>>>> 2.39.3
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-26 16:49       ` Jonah Palmer
@ 2024-03-26 18:34         ` Eugenio Perez Martin
  2024-03-26 19:01           ` Jonah Palmer
  0 siblings, 1 reply; 25+ messages in thread
From: Eugenio Perez Martin @ 2024-03-26 18:34 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
> > On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >>
> >>
> >> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>
> >>>> The goal of these patches is to add support to a variety of virtio and
> >>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >>>> indicates that all buffers are used by the device in the same order in
> >>>> which they were made available by the driver.
> >>>>
> >>>> These patches attempt to implement a generalized, non-device-specific
> >>>> solution to support this feature.
> >>>>
> >>>> The core feature behind this solution is a buffer mechanism in the form
> >>>> of GLib's GHashTable. The decision behind using a hash table was to
> >>>> leverage their ability for quick lookup, insertion, and removal
> >>>> operations. Given that our keys are simply numbers of an ordered
> >>>> sequence, a hash table seemed like the best choice for a buffer
> >>>> mechanism.
> >>>>
> >>>> ---------------------
> >>>>
> >>>> The strategy behind this implementation is as follows:
> >>>>
> >>>> We know that buffers that are popped from the available ring and enqueued
> >>>> for further processing will always done in the same order in which they
> >>>> were made available by the driver. Given this, we can note their order
> >>>> by assigning the resulting VirtQueueElement a key. This key is a number
> >>>> in a sequence that represents the order in which they were popped from
> >>>> the available ring, relative to the other VirtQueueElements.
> >>>>
> >>>> For example, given 3 "elements" that were popped from the available
> >>>> ring, we assign a key value to them which represents their order (elem0
> >>>> is popped first, then elem1, then lastly elem2):
> >>>>
> >>>>        elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>>>       (key: 2)    (key: 1)    (key: 0)
> >>>>
> >>>> Then these elements are enqueued for further processing by the host.
> >>>>
> >>>> While most devices will return these completed elements in the same
> >>>> order in which they were enqueued, some devices may not (e.g.
> >>>> virtio-blk). To guarantee that these elements are put on the used ring
> >>>> in the same order in which they were enqueued, we can use a buffering
> >>>> mechanism that keeps track of the next expected sequence number of an
> >>>> element.
> >>>>
> >>>> In other words, if the completed element does not have a key value that
> >>>> matches the next expected sequence number, then we know this element is
> >>>> not in-order and we must stash it away in a hash table until an order
> >>>> can be made. The element's key value is used as the key for placing it
> >>>> in the hash table.
> >>>>
> >>>> If the completed element has a key value that matches the next expected
> >>>> sequence number, then we know this element is in-order and we can push
> >>>> it on the used ring. Then we increment the next expected sequence number
> >>>> and check if the hash table contains an element at this key location.
> >>>>
> >>>> If so, we retrieve this element, push it to the used ring, delete the
> >>>> key-value pair from the hash table, increment the next expected sequence
> >>>> number, and check the hash table again for an element at this new key
> >>>> location. This process is repeated until we're unable to find an element
> >>>> in the hash table to continue the order.
> >>>>
> >>>> So, for example, say the 3 elements we enqueued were completed in the
> >>>> following order: elem1, elem2, elem0. The next expected sequence number
> >>>> is 0:
> >>>>
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> >>>>       (key: 1)                                         |
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 1 - elem1|
> >>>>                                                  ================
> >>>>       ---------------------
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> >>>>       (key: 2)                                         |
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 1 - elem1|
> >>>>                                                  |--------------|
> >>>>                                                  |key: 2 - elem2|
> >>>>                                                  ================
> >>>>       ---------------------
> >>>>       exp-seq-num = 0:
> >>>>
> >>>>        elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> >>>>       (key: 0)
> >>>>
> >>>>       exp-seq-num = 1:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>>>                                                remove elem from table
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |key: 2 - elem2|
> >>>>                                                  ================
> >>>>
> >>>>       exp-seq-num = 2:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
> >>>>                                                remove elem from table
> >>>>                                                        |
> >>>>                                                        v
> >>>>                                                  ================
> >>>>                                                  |   *empty*    |
> >>>>                                                  ================
> >>>>
> >>>>       exp-seq-num = 3:
> >>>>
> >>>>       lookup(table, exp-seq-num) != NULL ? --> No, done
> >>>>       ---------------------
> >>>>
> >>>
> >>> I think to use a hashtable to handle this has an important drawback:
> >>> it hurts performance on the devices that are using right in-order
> >>> because of hash calculus, to benefit devices that are using it badly
> >>> by using descriptors out of order. We should use data structs that are
> >>> as free as possible for the first, and we don't care to worse the
> >>> experience of the devices that enable in_order and they shouldn't.
> >>>
> >>
> >> Right, because if descriptors are coming in in-order, we still search
> >> the (empty) hash table.
> >>
> >> Hmm... what if we introduced a flag to see if we actually should bother
> >> searching the hash table? That way we avoid the cost of searching when
> >> we really don't need to.
> >>
> >>> So I suggest reusing vq->used_elems array vq. At each used descriptor
> >>> written in the used ring, you know the next head is elem->index +
> >>> elem->ndescs, so you can check if that element has been filled or not.
> >>> If used, it needs to be flushed too. If not used, just return.
> >>>
> >>> Of course virtqueue_flush also needs to take this into account.
> >>>
> >>> What do you think, does it make sense to you?
> >>>
> >>
> >> I'm having a bit of trouble understanding the suggestion here. Would you
> >> mind elaborating a bit more for me on this?
> >>
> >> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
> >> being first, elem2 last) and then elem2 finishes first, elem1 second,
> >> and elem0 third. Given that these elements finish out-of-order, how
> >> would you handle these out-of-order elements using your suggestion?
> >>
> >
> > virtqueue_fill is called first with elem2. So vq->used_elems[2 %
> > vq->num] is filled with the needed information of the descriptor:
> > index, len and ndescs. idx function parameter is ignored.
> >
> > Optionally, virtqueue_push is called. It checks if
> > vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num +
> > elem->out_num > 0, and reset them on every used ring write. If it is
> > not valid, this is a no-op. Currently, it is not valid.
> >
> > Same process for elem1.
> >
> > virtqueue_fill is the same for elem0. But now virtqueue_flush gets
> > interesting, as it detects vq->used_elems[0] is used. It scans for the
> > first not-used element, and it finds it is vq->used_elems[3]. So it
> > needs to write an used elem with id = 2 and the corresponding length.
> >
> > Maybe it is interesting to implement ways to improve the look for the
> > last used descriptor, but if any I'd go for a bitmap and always on top
> > of the basis series.
> >
> > The algorithm has not been tested, so maybe I've missed something.
> >
> > Thanks!
> >
>
> Thank you for taking the time to clarify for this for me, I appreciate it.
>
> I spent some time yesterday and this morning working this over in my
> head. I believe I understand what you're trying to do here and it makes
> more sense than employing a data structure like a hash table for this
> kind of job. However, I have a few questions regarding this implementation.
>
> So, one question is on the reuse of the VirtQueue's used_elems array.
> Wont reusing this array cause issues with packed VQ operations, since it
> also uses this array? If we want to stick with using this array
> specifically, perhaps we may need to rewrite its logic if the device has
> negotiated the in_order feature? E.g.
>
> virtqueue_packed_flush (...) {
>     if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER) {
>        // new logic
>     } else {
>       // current logic
>     }
> }
> -----------
>

That's right.

> Regarding this paragraph:
>
> "virtqueue_fill is called first with elem2. So vq->used_elems[2 %
> vq->num] is filled with the needed information of the descriptor:
> index, len and ndescs. idx function parameter is ignored."
>
> This looks exactly like virtqueue_packed_fill except for the idx
> parameter we'd pass in (sequence_num % vq->vring.num).
>
> In any case, regardless of whether this element being passed in is
> considered to be in-order or not, we still add this element to
> vq->used_elems in virtqueue_fill. Ok, got it.
>
> Then you say "Optionally, virtqueue_push is called". I assume by
> "optionally" you mean we need to know if this is a single-shot operation
> or a batched operation. A single-shot operation would call for
> virtqueue_push whereas a batched operation would just use
> virtqueue_fill. If this is what you meant by that then ok, I understand
> that too.
>

Totally correct.

> However, I think before we start considering whether or not we need to
> call virtqueue_push or continue with virtqueue_fill, we first should
> know whether or not this element is in-order. And I think to do that we
> should use the check you mentioned:
>
> if (vq->used_elems[vq->used_idx].in_num +
> vq->used_elems[vq->used_idx].out_num > 0)
>
> or perhaps:
>
> if (vq->used_elems[vq->used_idx] != NULL)
>
> If the element is found not to be in-order, I assume we return and we
> are done with the handling of this element for now.
>
> Now my confusion with this part comes from calling virtqueue_push inside
> of the virtqueue_fill function. Wouldn't calling virtqueue_push inside
> of virtqueue_fill present some kind of recursive execution path? Unless
> I'm missing something here, this probably isn't something we need to do,
> right?

Maybe I explained something wrong, but virtqueue_fill should not call
virtqueue_push. It's up to the caller (virtio-net, virtio-blk, etc) to
call one or another. Can you elaborate?

> -----------
>
> Lastly, when execution reaches virtqueue_flush, what would define an
> element as unused? Perhaps...
>
> if (vq->used_elems[i] == NULL)
>

used_elems is not an array of pointers but an array of
VirtQueueElement so we cannot do this way.

> or
>
> if (vq->used_elems[i].in_num + vq->used_elems[i].out_num > 0)
>

Right, I propose to reset both in_num = out_num = 0.

Thanks!

> Thanks Eugenio!
>
> >> Thanks :)
> >>
> >>> Thanks!
> >>>
> >>>
> >>>> Jonah Palmer (8):
> >>>>     virtio: Define InOrderVQElement
> >>>>     virtio: Create/destroy/reset VirtQueue In-Order hash table
> >>>>     virtio: Define order variables
> >>>>     virtio: Implement in-order handling for virtio devices
> >>>>     virtio-net: in-order handling
> >>>>     vhost-svq: in-order handling
> >>>>     vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
> >>>>     virtio: Add VIRTIO_F_IN_ORDER property definition
> >>>>
> >>>>    hw/block/vhost-user-blk.c          |   1 +
> >>>>    hw/net/vhost_net.c                 |   2 +
> >>>>    hw/net/virtio-net.c                |   6 +-
> >>>>    hw/scsi/vhost-scsi.c               |   1 +
> >>>>    hw/scsi/vhost-user-scsi.c          |   1 +
> >>>>    hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
> >>>>    hw/virtio/vhost-user-fs.c          |   1 +
> >>>>    hw/virtio/vhost-user-vsock.c       |   1 +
> >>>>    hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
> >>>>    include/hw/virtio/virtio.h         |  20 +++++-
> >>>>    net/vhost-vdpa.c                   |   1 +
> >>>>    11 files changed, 145 insertions(+), 7 deletions(-)
> >>>>
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>
> >>
> >
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
  2024-03-26 18:34         ` Eugenio Perez Martin
@ 2024-03-26 19:01           ` Jonah Palmer
  0 siblings, 0 replies; 25+ messages in thread
From: Jonah Palmer @ 2024-03-26 19:01 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 3/26/24 2:34 PM, Eugenio Perez Martin wrote:
> On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
>>> On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>>>>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>>>
>>>>>> The goal of these patches is to add support to a variety of virtio and
>>>>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
>>>>>> indicates that all buffers are used by the device in the same order in
>>>>>> which they were made available by the driver.
>>>>>>
>>>>>> These patches attempt to implement a generalized, non-device-specific
>>>>>> solution to support this feature.
>>>>>>
>>>>>> The core feature behind this solution is a buffer mechanism in the form
>>>>>> of GLib's GHashTable. The decision behind using a hash table was to
>>>>>> leverage their ability for quick lookup, insertion, and removal
>>>>>> operations. Given that our keys are simply numbers of an ordered
>>>>>> sequence, a hash table seemed like the best choice for a buffer
>>>>>> mechanism.
>>>>>>
>>>>>> ---------------------
>>>>>>
>>>>>> The strategy behind this implementation is as follows:
>>>>>>
>>>>>> We know that buffers that are popped from the available ring and enqueued
>>>>>> for further processing will always done in the same order in which they
>>>>>> were made available by the driver. Given this, we can note their order
>>>>>> by assigning the resulting VirtQueueElement a key. This key is a number
>>>>>> in a sequence that represents the order in which they were popped from
>>>>>> the available ring, relative to the other VirtQueueElements.
>>>>>>
>>>>>> For example, given 3 "elements" that were popped from the available
>>>>>> ring, we assign a key value to them which represents their order (elem0
>>>>>> is popped first, then elem1, then lastly elem2):
>>>>>>
>>>>>>         elem2   --  elem1   --  elem0   ---> Enqueue for processing
>>>>>>        (key: 2)    (key: 1)    (key: 0)
>>>>>>
>>>>>> Then these elements are enqueued for further processing by the host.
>>>>>>
>>>>>> While most devices will return these completed elements in the same
>>>>>> order in which they were enqueued, some devices may not (e.g.
>>>>>> virtio-blk). To guarantee that these elements are put on the used ring
>>>>>> in the same order in which they were enqueued, we can use a buffering
>>>>>> mechanism that keeps track of the next expected sequence number of an
>>>>>> element.
>>>>>>
>>>>>> In other words, if the completed element does not have a key value that
>>>>>> matches the next expected sequence number, then we know this element is
>>>>>> not in-order and we must stash it away in a hash table until an order
>>>>>> can be made. The element's key value is used as the key for placing it
>>>>>> in the hash table.
>>>>>>
>>>>>> If the completed element has a key value that matches the next expected
>>>>>> sequence number, then we know this element is in-order and we can push
>>>>>> it on the used ring. Then we increment the next expected sequence number
>>>>>> and check if the hash table contains an element at this key location.
>>>>>>
>>>>>> If so, we retrieve this element, push it to the used ring, delete the
>>>>>> key-value pair from the hash table, increment the next expected sequence
>>>>>> number, and check the hash table again for an element at this new key
>>>>>> location. This process is repeated until we're unable to find an element
>>>>>> in the hash table to continue the order.
>>>>>>
>>>>>> So, for example, say the 3 elements we enqueued were completed in the
>>>>>> following order: elem1, elem2, elem0. The next expected sequence number
>>>>>> is 0:
>>>>>>
>>>>>>        exp-seq-num = 0:
>>>>>>
>>>>>>         elem1   --> elem1.key == exp-seq-num ? --> No, stash it
>>>>>>        (key: 1)                                         |
>>>>>>                                                         |
>>>>>>                                                         v
>>>>>>                                                   ================
>>>>>>                                                   |key: 1 - elem1|
>>>>>>                                                   ================
>>>>>>        ---------------------
>>>>>>        exp-seq-num = 0:
>>>>>>
>>>>>>         elem2   --> elem2.key == exp-seq-num ? --> No, stash it
>>>>>>        (key: 2)                                         |
>>>>>>                                                         |
>>>>>>                                                         v
>>>>>>                                                   ================
>>>>>>                                                   |key: 1 - elem1|
>>>>>>                                                   |--------------|
>>>>>>                                                   |key: 2 - elem2|
>>>>>>                                                   ================
>>>>>>        ---------------------
>>>>>>        exp-seq-num = 0:
>>>>>>
>>>>>>         elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>>>>>>        (key: 0)
>>>>>>
>>>>>>        exp-seq-num = 1:
>>>>>>
>>>>>>        lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>>>>>                                                 remove elem from table
>>>>>>                                                         |
>>>>>>                                                         v
>>>>>>                                                   ================
>>>>>>                                                   |key: 2 - elem2|
>>>>>>                                                   ================
>>>>>>
>>>>>>        exp-seq-num = 2:
>>>>>>
>>>>>>        lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>>>>>>                                                 remove elem from table
>>>>>>                                                         |
>>>>>>                                                         v
>>>>>>                                                   ================
>>>>>>                                                   |   *empty*    |
>>>>>>                                                   ================
>>>>>>
>>>>>>        exp-seq-num = 3:
>>>>>>
>>>>>>        lookup(table, exp-seq-num) != NULL ? --> No, done
>>>>>>        ---------------------
>>>>>>
>>>>>
>>>>> I think to use a hashtable to handle this has an important drawback:
>>>>> it hurts performance on the devices that are using right in-order
>>>>> because of hash calculus, to benefit devices that are using it badly
>>>>> by using descriptors out of order. We should use data structs that are
>>>>> as free as possible for the first, and we don't care to worse the
>>>>> experience of the devices that enable in_order and they shouldn't.
>>>>>
>>>>
>>>> Right, because if descriptors are coming in in-order, we still search
>>>> the (empty) hash table.
>>>>
>>>> Hmm... what if we introduced a flag to see if we actually should bother
>>>> searching the hash table? That way we avoid the cost of searching when
>>>> we really don't need to.
>>>>
>>>>> So I suggest reusing vq->used_elems array vq. At each used descriptor
>>>>> written in the used ring, you know the next head is elem->index +
>>>>> elem->ndescs, so you can check if that element has been filled or not.
>>>>> If used, it needs to be flushed too. If not used, just return.
>>>>>
>>>>> Of course virtqueue_flush also needs to take this into account.
>>>>>
>>>>> What do you think, does it make sense to you?
>>>>>
>>>>
>>>> I'm having a bit of trouble understanding the suggestion here. Would you
>>>> mind elaborating a bit more for me on this?
>>>>
>>>> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
>>>> being first, elem2 last) and then elem2 finishes first, elem1 second,
>>>> and elem0 third. Given that these elements finish out-of-order, how
>>>> would you handle these out-of-order elements using your suggestion?
>>>>
>>>
>>> virtqueue_fill is called first with elem2. So vq->used_elems[2 %
>>> vq->num] is filled with the needed information of the descriptor:
>>> index, len and ndescs. idx function parameter is ignored.
>>>
>>> Optionally, virtqueue_push is called. It checks if
>>> vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num +
>>> elem->out_num > 0, and reset them on every used ring write. If it is
>>> not valid, this is a no-op. Currently, it is not valid.
>>>
>>> Same process for elem1.
>>>
>>> virtqueue_fill is the same for elem0. But now virtqueue_flush gets
>>> interesting, as it detects vq->used_elems[0] is used. It scans for the
>>> first not-used element, and it finds it is vq->used_elems[3]. So it
>>> needs to write an used elem with id = 2 and the corresponding length.
>>>
>>> Maybe it is interesting to implement ways to improve the look for the
>>> last used descriptor, but if any I'd go for a bitmap and always on top
>>> of the basis series.
>>>
>>> The algorithm has not been tested, so maybe I've missed something.
>>>
>>> Thanks!
>>>
>>
>> Thank you for taking the time to clarify for this for me, I appreciate it.
>>
>> I spent some time yesterday and this morning working this over in my
>> head. I believe I understand what you're trying to do here and it makes
>> more sense than employing a data structure like a hash table for this
>> kind of job. However, I have a few questions regarding this implementation.
>>
>> So, one question is on the reuse of the VirtQueue's used_elems array.
>> Wont reusing this array cause issues with packed VQ operations, since it
>> also uses this array? If we want to stick with using this array
>> specifically, perhaps we may need to rewrite its logic if the device has
>> negotiated the in_order feature? E.g.
>>
>> virtqueue_packed_flush (...) {
>>      if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER) {
>>         // new logic
>>      } else {
>>        // current logic
>>      }
>> }
>> -----------
>>
> 
> That's right.
> 
>> Regarding this paragraph:
>>
>> "virtqueue_fill is called first with elem2. So vq->used_elems[2 %
>> vq->num] is filled with the needed information of the descriptor:
>> index, len and ndescs. idx function parameter is ignored."
>>
>> This looks exactly like virtqueue_packed_fill except for the idx
>> parameter we'd pass in (sequence_num % vq->vring.num).
>>
>> In any case, regardless of whether this element being passed in is
>> considered to be in-order or not, we still add this element to
>> vq->used_elems in virtqueue_fill. Ok, got it.
>>
>> Then you say "Optionally, virtqueue_push is called". I assume by
>> "optionally" you mean we need to know if this is a single-shot operation
>> or a batched operation. A single-shot operation would call for
>> virtqueue_push whereas a batched operation would just use
>> virtqueue_fill. If this is what you meant by that then ok, I understand
>> that too.
>>
> 
> Totally correct.
> 
>> However, I think before we start considering whether or not we need to
>> call virtqueue_push or continue with virtqueue_fill, we first should
>> know whether or not this element is in-order. And I think to do that we
>> should use the check you mentioned:
>>
>> if (vq->used_elems[vq->used_idx].in_num +
>> vq->used_elems[vq->used_idx].out_num > 0)
>>
>> or perhaps:
>>
>> if (vq->used_elems[vq->used_idx] != NULL)
>>
>> If the element is found not to be in-order, I assume we return and we
>> are done with the handling of this element for now.
>>
>> Now my confusion with this part comes from calling virtqueue_push inside
>> of the virtqueue_fill function. Wouldn't calling virtqueue_push inside
>> of virtqueue_fill present some kind of recursive execution path? Unless
>> I'm missing something here, this probably isn't something we need to do,
>> right?
> 
> Maybe I explained something wrong, but virtqueue_fill should not call
> virtqueue_push. It's up to the caller (virtio-net, virtio-blk, etc) to
> call one or another. Can you elaborate?
> 

Oh I see, my apologies! I misunderstood and thought you were suggesting 
to call virtqueue_push inside of virtqueue_fill.

>> -----------
>>
>> Lastly, when execution reaches virtqueue_flush, what would define an
>> element as unused? Perhaps...
>>
>> if (vq->used_elems[i] == NULL)
>>
> 
> used_elems is not an array of pointers but an array of
> VirtQueueElement so we cannot do this way.
> 
>> or
>>
>> if (vq->used_elems[i].in_num + vq->used_elems[i].out_num > 0)
>>
> 
> Right, I propose to reset both in_num = out_num = 0.
> 
> Thanks!
> 

Gotcha. I'll take this and work on getting a v2 RFC out.

Thank you for the re-clarifying things again Eugenio!

>> Thanks Eugenio!
>>
>>>> Thanks :)
>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>> Jonah Palmer (8):
>>>>>>      virtio: Define InOrderVQElement
>>>>>>      virtio: Create/destroy/reset VirtQueue In-Order hash table
>>>>>>      virtio: Define order variables
>>>>>>      virtio: Implement in-order handling for virtio devices
>>>>>>      virtio-net: in-order handling
>>>>>>      vhost-svq: in-order handling
>>>>>>      vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>>>>>>      virtio: Add VIRTIO_F_IN_ORDER property definition
>>>>>>
>>>>>>     hw/block/vhost-user-blk.c          |   1 +
>>>>>>     hw/net/vhost_net.c                 |   2 +
>>>>>>     hw/net/virtio-net.c                |   6 +-
>>>>>>     hw/scsi/vhost-scsi.c               |   1 +
>>>>>>     hw/scsi/vhost-user-scsi.c          |   1 +
>>>>>>     hw/virtio/vhost-shadow-virtqueue.c |  15 ++++-
>>>>>>     hw/virtio/vhost-user-fs.c          |   1 +
>>>>>>     hw/virtio/vhost-user-vsock.c       |   1 +
>>>>>>     hw/virtio/virtio.c                 | 103 ++++++++++++++++++++++++++++-
>>>>>>     include/hw/virtio/virtio.h         |  20 +++++-
>>>>>>     net/vhost-vdpa.c                   |   1 +
>>>>>>     11 files changed, 145 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>>
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-03-26 19:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
2024-03-22  9:45   ` Eugenio Perez Martin
2024-03-25 17:08     ` Jonah Palmer
2024-03-25 19:12       ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table Jonah Palmer
2024-03-21 15:57 ` [RFC 3/8] virtio: Define order variables Jonah Palmer
2024-03-21 15:57 ` [RFC 4/8] virtio: Implement in-order handling for virtio devices Jonah Palmer
2024-03-22 10:46   ` Eugenio Perez Martin
2024-03-25 17:34     ` Jonah Palmer
2024-03-25 19:45       ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 5/8] virtio-net: in-order handling Jonah Palmer
2024-03-21 15:57 ` [RFC 6/8] vhost-svq: " Jonah Palmer
2024-03-21 15:57 ` [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
2024-03-22 10:47   ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
2024-03-22 10:48   ` Eugenio Perez Martin
2024-03-21 19:48 ` [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Dongli Zhang
2024-03-21 21:25   ` Jonah Palmer
2024-03-22 11:18 ` Eugenio Perez Martin
2024-03-25 16:52   ` Jonah Palmer
2024-03-25 20:33     ` Eugenio Perez Martin
2024-03-26 16:49       ` Jonah Palmer
2024-03-26 18:34         ` Eugenio Perez Martin
2024-03-26 19:01           ` Jonah Palmer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.