* [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support
@ 2024-03-28 16:21 Jonah Palmer
  2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:21 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core of this solution is a buffer mechanism in the form of a
VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to see only minimal
overhead. Devices that may not always use buffers in-order will
likely see a performance hit; how large that hit is depends on how
frequently elements are completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at
which a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order slots in
used_elems and are only written to the used/descriptor ring once the
elements in used_elems can continue the expected order.

To differentiate between a "used" and "unused" element in the
used_elems array (a "used" element being one that has already been
written to the used/descriptor ring and an "unused" element being one
that hasn't), we use an element's in_num and out_num values. If the
sum of these two values is greater than 0, the element is considered
unused. If the sum is 0, the element is considered used and no longer
valid. When we find an in-order run and write an element to the
used/descriptor ring, we set these two values to 0 to indicate that
it has been used.
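
For illustration, a minimal sketch of this used/unused convention
(not part of the patches; the helper names are hypothetical):

    /* An element in used_elems is still "unused" (processed but not
     * yet written to the used/descriptor ring) while it holds a
     * non-zero descriptor count. */
    static bool virtqueue_elem_is_pending(const VirtQueueElement *elem)
    {
        return elem->in_num + elem->out_num > 0;
    }

    /* Once written to the ring, zero both counts to mark the slot
     * as used. */
    static void virtqueue_elem_mark_used(VirtQueueElement *elem)
    {
        elem->in_num = 0;
        elem->out_num = 0;
    }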

---
v2: Use a VirtQueue's used_elems array as a buffer mechanism

v1: Implement custom GLib GHashTable as a buffer mechanism

Jonah Palmer (5):
  virtio: Initialize sequence variables
  virtio: In-order support for split VQs
  virtio: In-order support for packed VQs
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c    |   1 +
 hw/net/vhost_net.c           |   2 +
 hw/scsi/vhost-scsi.c         |   1 +
 hw/scsi/vhost-user-scsi.c    |   1 +
 hw/virtio/vhost-user-fs.c    |   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c           | 118 +++++++++++++++++++++++++++++++----
 include/hw/virtio/virtio.h   |   5 +-
 net/vhost-vdpa.c             |   1 +
 9 files changed, 119 insertions(+), 12 deletions(-)

-- 
2.39.3



* [RFC v2 1/5] virtio: Initialize sequence variables
  2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
@ 2024-03-28 16:21 ` Jonah Palmer
  2024-04-03 10:18   ` Eugenio Perez Martin
  2024-03-28 16:22 ` [RFC v2 2/5] virtio: In-order support for split VQs Jonah Palmer
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:21 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It is also useful for determining the number of new
elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter whose current value is assigned
to each new VirtQueueElement and is then incremented. Given its
uint16_t type, the sequence number ranges from 0 to 65,535 and wraps
around on overflow.

A VirtQueueElement's seq_idx represents the sequence number assigned
to the VirtQueueElement when it was created. This value must match
the VirtQueue's used_seq_idx before the element can be put on the
used ring by the device.
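
As a sketch of how the three indices are meant to interact
(illustrative pseudo-C only, not code from this patch):

    /* pop: each new element takes the next ticket (uint16_t, so
     * the counter wraps modulo 2^16) */
    elem->seq_idx = vq->current_seq_idx++;

    /* fill/flush: an element may be written to the used ring only
     * when its ticket matches the next expected one */
    if (elem->seq_idx == vq->used_seq_idx) {
        /* write elem at ring slot used_seq_idx % vq->vring.num */
        vq->used_seq_idx++;
    }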

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c         | 18 ++++++++++++++++++
 include/hw/virtio/virtio.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
     uint16_t used_idx;
     bool used_wrap_counter;
 
+    /* In-Order sequence indices */
+    uint16_t used_seq_idx;
+    uint16_t current_seq_idx;
+
     /* Last used index value we have signalled on */
     uint16_t signalled_used;
 
@@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
         elem->in_sg[i] = iov[out_num + i];
     }
 
+    /* Assign sequence index for in-order processing */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+        elem->seq_idx = vq->current_seq_idx++;
+    }
+
     vq->inuse++;
 
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
     vq->shadow_avail_idx = vq->last_avail_idx;
     vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
 
+    /* Assign sequence index for in-order processing */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+        elem->seq_idx = vq->current_seq_idx++;
+    }
+
     trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
     address_space_cache_destroy(&indirect_desc_cache);
@@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
     vdev->vq[i].notification = true;
     vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
     vdev->vq[i].inuse = 0;
+    vdev->vq[i].used_seq_idx = 0;
+    vdev->vq[i].current_seq_idx = 0;
     virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
     vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
     vdev->vq[i].handle_output = handle_output;
     vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
+    vdev->vq[i].used_seq_idx = 0;
+    vdev->vq[i].current_seq_idx = 0;
 
     return &vdev->vq[i];
 }
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..910b2a3427 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -75,6 +75,7 @@ typedef struct VirtQueueElement
     hwaddr *out_addr;
     struct iovec *in_sg;
     struct iovec *out_sg;
+    uint16_t seq_idx;
 } VirtQueueElement;
 
 #define VIRTIO_QUEUE_MAX 1024
-- 
2.39.3



* [RFC v2 2/5] virtio: In-order support for split VQs
  2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
  2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
@ 2024-03-28 16:22 ` Jonah Palmer
  2024-03-28 16:22 ` [RFC v2 3/5] virtio: In-order support for packed VQs Jonah Palmer
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:22 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Implements VIRTIO_F_IN_ORDER feature support for virtio devices using
the split virtqueue layout.

For a virtio device that has negotiated the VIRTIO_F_IN_ORDER feature
whose virtqueues use a split virtqueue layout, it's essential that
used VirtQueueElements are written to the used ring in-order.

For devices that use this in-order feature, the VirtQueue's
used_elems array is used to hold processed VirtQueueElements until
they can be presented to the driver in-order.

In the split virtqueue case, we check whether the element is the next
expected element to be written to the used ring. If it's not, nothing
gets written to the used ring and we're done. If it is, the element
is written to the used ring and we then check whether the next
expected element continues the order. This process is repeated until
we're unable to continue the order.

If no elements were written to the used ring, no update to the used
ring's index is needed.
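
A worked trace of that loop, assuming a 4-entry ring with
used_seq_idx and used_idx both starting at 0 (illustrative only):

    /* fill(elem1): slot 1 recorded; seq 0 still pending -> no write */
    /* fill(elem0): slot 0 recorded; seq 0 in-order -> write elem0;
     *              seq 1 (elem1) now in-order -> write elem1;
     *              used_seq_idx = 2                                  */
    /* fill(elem2): slot 2 recorded; seq 2 in-order -> write elem2;
     *              used_seq_idx = 3                                  */
    /* flush:      count = used_seq_idx - used_idx = 3; used_idx = 3  */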

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c | 50 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 069d96df99..19d3d43816 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,16 +856,38 @@ static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
     VRingUsedElem uelem;
+    uint16_t uelem_idx;
 
     if (unlikely(!vq->vring.used)) {
         return;
     }
 
-    idx = (idx + vq->used_idx) % vq->vring.num;
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        /* Write element(s) to used ring if they're in-order */
+        while (true) {
+            uelem_idx = vq->used_seq_idx % vq->vring.num;
 
-    uelem.id = elem->index;
-    uelem.len = len;
-    vring_used_write(vq, &uelem, idx);
+            /* Stop if element has been used */
+            if (vq->used_elems[uelem_idx].in_num +
+                vq->used_elems[uelem_idx].out_num <= 0) {
+                break;
+            }
+            uelem.id = vq->used_elems[uelem_idx].index;
+            uelem.len = vq->used_elems[uelem_idx].len;
+            vring_used_write(vq, &uelem, uelem_idx);
+
+            /* Mark this element as used */
+            vq->used_elems[uelem_idx].in_num = 0;
+            vq->used_elems[uelem_idx].out_num = 0;
+            vq->used_seq_idx++;
+        }
+    } else {
+        idx = (idx + vq->used_idx) % vq->vring.num;
+
+        uelem.id = elem->index;
+        uelem.len = len;
+        vring_used_write(vq, &uelem, idx);
+    }
 }
 
 static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
@@ -918,6 +940,8 @@ static void virtqueue_packed_fill_desc(VirtQueue *vq,
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
+    uint16_t seq_idx;
+
     trace_virtqueue_fill(vq, elem, len, idx);
 
     virtqueue_unmap_sg(vq, elem, len);
@@ -926,6 +950,16 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
         return;
     }
 
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        seq_idx = elem->seq_idx % vq->vring.num;
+
+        vq->used_elems[seq_idx].index = elem->index;
+        vq->used_elems[seq_idx].len = elem->len;
+        vq->used_elems[seq_idx].ndescs = elem->ndescs;
+        vq->used_elems[seq_idx].in_num = elem->in_num;
+        vq->used_elems[seq_idx].out_num = elem->out_num;
+    }
+
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
         virtqueue_packed_fill(vq, elem, len, idx);
     } else {
@@ -944,6 +978,14 @@ static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
 
     /* Make sure buffer is written before we update index. */
     smp_wmb();
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        count = (vq->used_seq_idx - vq->used_idx) % vq->vring.num;
+
+        /* No in-order elements were written, nothing to update */
+        if (!count) {
+            return;
+        }
+    }
     trace_virtqueue_flush(vq, count);
     old = vq->used_idx;
     new = old + count;
-- 
2.39.3



* [RFC v2 3/5] virtio: In-order support for packed VQs
  2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
  2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
  2024-03-28 16:22 ` [RFC v2 2/5] virtio: In-order support for split VQs Jonah Palmer
@ 2024-03-28 16:22 ` Jonah Palmer
  2024-03-28 16:22 ` [RFC v2 4/5] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
  2024-03-28 16:22 ` [RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
  4 siblings, 0 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:22 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Implements VIRTIO_F_IN_ORDER feature support for virtio devices using
the packed virtqueue layout.

For a virtio device that has negotiated the VIRTIO_F_IN_ORDER feature
whose virtqueues use a packed virtqueue layout, it's essential that used
VirtQueueElements are written to the descriptor ring in-order.

In the packed virtqueue case, since we already write to the
virtqueue's used_elems array at the start of virtqueue_fill, we don't
need to call virtqueue_packed_fill. Furthermore, due to the change in
behavior of the used_elems array and not knowing how many unused
in-order elements exist, separate logic is required for the flush
operation on packed virtqueues.
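
Condensed, the in-order flush scan added here looks like the
following (a sketch; elem_pending() is shorthand for the
in_num + out_num > 0 check, not a real helper):

    j = vq->used_idx;                   /* first expected element */
    if (!elem_pending(&vq->used_elems[j])) {
        return;                         /* nothing in-order to flush */
    }
    for (i = j + 1; ; i++) {
        uelem_idx = i % vq->vring.num;
        if (!elem_pending(&vq->used_elems[uelem_idx])) {
            break;                      /* stop at first used slot */
        }
        /* fill descriptor, accumulate ndescs, zero in_num/out_num */
    }
    /* used_elems[j] is filled last, with strict order set, so the
     * driver observes the batch only once it is complete */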

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/virtio/virtio.c | 50 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 19d3d43816..dc2eabd18b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -960,7 +960,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
         vq->used_elems[seq_idx].out_num = elem->out_num;
     }
 
-    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) &&
+        !virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
         virtqueue_packed_fill(vq, elem, len, idx);
     } else {
         virtqueue_split_fill(vq, elem, len, idx);
@@ -997,18 +998,53 @@ static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
 
 static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
 {
-    unsigned int i, ndescs = 0;
+    unsigned int i, j, uelem_idx, ndescs = 0;
 
     if (unlikely(!vq->vring.desc)) {
         return;
     }
 
-    for (i = 1; i < count; i++) {
-        virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
-        ndescs += vq->used_elems[i].ndescs;
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+        /* First expected element is used, nothing to do */
+        if (vq->used_elems[vq->used_idx].in_num +
+            vq->used_elems[vq->used_idx].out_num <= 0) {
+            return;
+        }
+
+        j = vq->used_idx;
+
+        for (i = j + 1; ; i++) {
+            uelem_idx = i % vq->vring.num;
+
+            /* Stop if element has been used */
+            if (vq->used_elems[uelem_idx].in_num +
+                vq->used_elems[uelem_idx].out_num <= 0) {
+                break;
+            }
+
+            virtqueue_packed_fill_desc(vq, &vq->used_elems[uelem_idx],
+                                       uelem_idx, false);
+            ndescs += vq->used_elems[uelem_idx].ndescs;
+
+            /* Mark this element as used */
+            vq->used_elems[uelem_idx].in_num = 0;
+            vq->used_elems[uelem_idx].out_num = 0;
+        }
+
+        /* Mark first expected element as used */
+        vq->used_elems[vq->used_idx].in_num = 0;
+        vq->used_elems[vq->used_idx].out_num = 0;
+    } else {
+        j = 0;
+
+        for (i = 1; i < count; i++) {
+            virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
+            ndescs += vq->used_elems[i].ndescs;
+        }
     }
-    virtqueue_packed_fill_desc(vq, &vq->used_elems[0], 0, true);
-    ndescs += vq->used_elems[0].ndescs;
+
+    virtqueue_packed_fill_desc(vq, &vq->used_elems[j], j, true);
+    ndescs += vq->used_elems[j].ndescs;
 
     vq->inuse -= ndescs;
     vq->used_idx += ndescs;
-- 
2.39.3



* [RFC v2 4/5] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (2 preceding siblings ...)
  2024-03-28 16:22 ` [RFC v2 3/5] virtio: In-order support for packed VQs Jonah Palmer
@ 2024-03-28 16:22 ` Jonah Palmer
  2024-03-28 16:22 ` [RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
  4 siblings, 0 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:22 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

Including VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices allows the feature to be offered to and negotiated with the
backend, and ensures it is disabled if the backend does not support
it.

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 hw/block/vhost-user-blk.c    | 1 +
 hw/net/vhost_net.c           | 2 ++
 hw/scsi/vhost-scsi.c         | 1 +
 hw/scsi/vhost-user-scsi.c    | 1 +
 hw/virtio/vhost-user-fs.c    | 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c             | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_HASH_REPORT,
     VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_RSS,
     VIRTIO_NET_F_HASH_REPORT,
     VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_IN_ORDER,
 
     VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_INDIRECT_DESC,
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_F_IN_ORDER,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
     VIRTIO_F_VERSION_1,
+    VIRTIO_F_IN_ORDER,
     VIRTIO_NET_F_CSUM,
     VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
     VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3



* [RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition
  2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
                   ` (3 preceding siblings ...)
  2024-03-28 16:22 ` [RFC v2 4/5] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
@ 2024-03-28 16:22 ` Jonah Palmer
  4 siblings, 0 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-03-28 16:22 UTC
  To: qemu-devel
  Cc: mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam, eperezma,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky, jonah.palmer

Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.
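
With this property, the feature can then be toggled per device, e.g.
(illustrative command line, assuming a virtio-blk-pci device):

    -device virtio-blk-pci,in_order=on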

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 910b2a3427..dd0ba6e57f 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -385,7 +385,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("packed", _state, _field, \
                       VIRTIO_F_RING_PACKED, false), \
     DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-                      VIRTIO_F_RING_RESET, true)
+                      VIRTIO_F_RING_RESET, true), \
+    DEFINE_PROP_BIT64("in_order", _state, _field, \
+                      VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3



* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
@ 2024-04-03 10:18   ` Eugenio Perez Martin
  2024-04-03 16:51     ` Jonah Palmer
  0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2024-04-03 10:18 UTC
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Initialize sequence variables for VirtQueue and VirtQueueElement
> structures. A VirtQueue's sequence variables are initialized when a
> VirtQueue is being created or reset. A VirtQueueElement's sequence
> variable is initialized when a VirtQueueElement is being initialized.
> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>
> A VirtQueue's used_seq_idx represents the next expected index in a
> sequence of VirtQueueElements to be processed (put on the used ring).
> The next VirtQueueElement added to the used ring must match this
> sequence number before additional elements can be safely added to the
> used ring. It's also particularly useful for helping find the number of
> new elements added to the used ring.
>
> A VirtQueue's current_seq_idx represents the current sequence index.
> This value is essentially a counter where the value is assigned to a new
> VirtQueueElement and then incremented. Given its uint16_t type, this
> sequence number can be between 0 and 65,535.
>
> A VirtQueueElement's seq_idx represents the sequence number assigned to
> the VirtQueueElement when it was created. This value must match with the
> VirtQueue's used_seq_idx before the element can be put on the used ring
> by the device.
>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  hw/virtio/virtio.c         | 18 ++++++++++++++++++
>  include/hw/virtio/virtio.h |  1 +
>  2 files changed, 19 insertions(+)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index fb6b4ccd83..069d96df99 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -132,6 +132,10 @@ struct VirtQueue
>      uint16_t used_idx;
>      bool used_wrap_counter;
>
> +    /* In-Order sequence indices */
> +    uint16_t used_seq_idx;
> +    uint16_t current_seq_idx;
> +

I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?

>      /* Last used index value we have signalled on */
>      uint16_t signalled_used;
>
> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>          elem->in_sg[i] = iov[out_num + i];
>      }
>
> +    /* Assign sequence index for in-order processing */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +        elem->seq_idx = vq->current_seq_idx++;
> +    }
> +
>      vq->inuse++;
>
>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>      vq->shadow_avail_idx = vq->last_avail_idx;
>      vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>
> +    /* Assign sequence index for in-order processing */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +        elem->seq_idx = vq->current_seq_idx++;
> +    }
> +
>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>  done:
>      address_space_cache_destroy(&indirect_desc_cache);
> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>      vdev->vq[i].notification = true;
>      vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>      vdev->vq[i].inuse = 0;
> +    vdev->vq[i].used_seq_idx = 0;
> +    vdev->vq[i].current_seq_idx = 0;
>      virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>  }
>
> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>      vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>      vdev->vq[i].handle_output = handle_output;
>      vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> +    vdev->vq[i].used_seq_idx = 0;
> +    vdev->vq[i].current_seq_idx = 0;
>
>      return &vdev->vq[i];
>  }
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b3c74a1bca..910b2a3427 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>      hwaddr *out_addr;
>      struct iovec *in_sg;
>      struct iovec *out_sg;
> +    uint16_t seq_idx;
>  } VirtQueueElement;
>
>  #define VIRTIO_QUEUE_MAX 1024
> --
> 2.39.3
>



* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-03 10:18   ` Eugenio Perez Martin
@ 2024-04-03 16:51     ` Jonah Palmer
  2024-04-04 11:35       ` Eugenio Perez Martin
  0 siblings, 1 reply; 14+ messages in thread
From: Jonah Palmer @ 2024-04-03 16:51 UTC
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>> Initialize sequence variables for VirtQueue and VirtQueueElement
>> structures. A VirtQueue's sequence variables are initialized when a
>> VirtQueue is being created or reset. A VirtQueueElement's sequence
>> variable is initialized when a VirtQueueElement is being initialized.
>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>>
>> A VirtQueue's used_seq_idx represents the next expected index in a
>> sequence of VirtQueueElements to be processed (put on the used ring).
>> The next VirtQueueElement added to the used ring must match this
>> sequence number before additional elements can be safely added to the
>> used ring. It's also particularly useful for helping find the number of
>> new elements added to the used ring.
>>
>> A VirtQueue's current_seq_idx represents the current sequence index.
>> This value is essentially a counter where the value is assigned to a new
>> VirtQueueElement and then incremented. Given its uint16_t type, this
>> sequence number can be between 0 and 65,535.
>>
>> A VirtQueueElement's seq_idx represents the sequence number assigned to
>> the VirtQueueElement when it was created. This value must match with the
>> VirtQueue's used_seq_idx before the element can be put on the used ring
>> by the device.
>>
>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> ---
>>   hw/virtio/virtio.c         | 18 ++++++++++++++++++
>>   include/hw/virtio/virtio.h |  1 +
>>   2 files changed, 19 insertions(+)
>>
>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>> index fb6b4ccd83..069d96df99 100644
>> --- a/hw/virtio/virtio.c
>> +++ b/hw/virtio/virtio.c
>> @@ -132,6 +132,10 @@ struct VirtQueue
>>       uint16_t used_idx;
>>       bool used_wrap_counter;
>>
>> +    /* In-Order sequence indices */
>> +    uint16_t used_seq_idx;
>> +    uint16_t current_seq_idx;
>> +
> 
> I'm having a hard time understanding the difference between these and
> last_avail_idx and used_idx. It seems to me if we replace them
> everything will work? What am I missing?
> 

For used_seq_idx, it does work like used_idx; the difference is when
their values get updated, specifically in the split VQ case.

As you know, in the split VQ case, used_idx is updated during
virtqueue_split_flush. However, imagine a batch of elements coming in
where virtqueue_split_fill is called multiple times before
virtqueue_split_flush. We want to make sure we write these elements
to the used ring in-order, and we'll know their order based on
used_seq_idx.

Alternatively, I thought about replicating the logic for the packed VQ 
case (where this used_seq_idx isn't used) where we start looking at 
vq->used_elems[vq->used_idx] and iterate through until we find a used 
element, but I wasn't sure how to handle the case where elements get 
used (written to the used ring) and new elements get put in used_elems 
before the used_idx is updated. Since this search would require us to 
always start at index vq->used_idx.

For example, say three elements get filled (elem0 - elem2), and
elem1 and elem0 come back first (vq->used_idx = 0):

elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
         in-order, write elem0 and elem1 to used ring, mark elements as
         used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to 
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1 
(elem1) and iterate to vq->used_idx + 2 (elem2)?

Hmm... now that I'm thinking about it, maybe for the split VQ case we 
could continue looking through the vq->used_elems array until we find an 
unused element... but then again how would we (1) know if the element is 
in-order and (2) know when to stop searching?

In any case, the use of this variable could be seen as an optimization 
as its value will tell us where to start looking in vq->used_elems 
instead of always starting at vq->used_idx.

If this is like a one-shot scenario where one element gets written and 
then flushed after, then yes in this case used_seq_idx == used_idx.

------

For current_seq_idx, this is pretty much just a counter. Every new 
VirtQueueElement created from virtqueue_pop is given a number and the 
counter is incremented. Like grabbing a ticket number and waiting for 
your number to be called. The next person to grab a ticket number will 
be your number + 1.

Let me know if I'm making any sense. Thanks :)

Jonah

>>       /* Last used index value we have signalled on */
>>       uint16_t signalled_used;
>>
>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>           elem->in_sg[i] = iov[out_num + i];
>>       }
>>
>> +    /* Assign sequence index for in-order processing */
>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>> +        elem->seq_idx = vq->current_seq_idx++;
>> +    }
>> +
>>       vq->inuse++;
>>
>>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>       vq->shadow_avail_idx = vq->last_avail_idx;
>>       vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>
>> +    /* Assign sequence index for in-order processing */
>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>> +        elem->seq_idx = vq->current_seq_idx++;
>> +    }
>> +
>>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>   done:
>>       address_space_cache_destroy(&indirect_desc_cache);
>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>>       vdev->vq[i].notification = true;
>>       vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>>       vdev->vq[i].inuse = 0;
>> +    vdev->vq[i].used_seq_idx = 0;
>> +    vdev->vq[i].current_seq_idx = 0;
>>       virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>>   }
>>
>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>       vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>>       vdev->vq[i].handle_output = handle_output;
>>       vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
>> +    vdev->vq[i].used_seq_idx = 0;
>> +    vdev->vq[i].current_seq_idx = 0;
>>
>>       return &vdev->vq[i];
>>   }
>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> index b3c74a1bca..910b2a3427 100644
>> --- a/include/hw/virtio/virtio.h
>> +++ b/include/hw/virtio/virtio.h
>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>>       hwaddr *out_addr;
>>       struct iovec *in_sg;
>>       struct iovec *out_sg;
>> +    uint16_t seq_idx;
>>   } VirtQueueElement;
>>
>>   #define VIRTIO_QUEUE_MAX 1024
>> --
>> 2.39.3
>>
> 


* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-03 16:51     ` Jonah Palmer
@ 2024-04-04 11:35       ` Eugenio Perez Martin
  2024-04-04 14:41         ` Jonah Palmer
  0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2024-04-04 11:35 UTC
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >> Initialize sequence variables for VirtQueue and VirtQueueElement
> >> structures. A VirtQueue's sequence variables are initialized when a
> >> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >> variable is initialized when a VirtQueueElement is being initialized.
> >> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>
> >> A VirtQueue's used_seq_idx represents the next expected index in a
> >> sequence of VirtQueueElements to be processed (put on the used ring).
> >> The next VirtQueueElement added to the used ring must match this
> >> sequence number before additional elements can be safely added to the
> >> used ring. It's also particularly useful for helping find the number of
> >> new elements added to the used ring.
> >>
> >> A VirtQueue's current_seq_idx represents the current sequence index.
> >> This value is essentially a counter where the value is assigned to a new
> >> VirtQueueElement and then incremented. Given its uint16_t type, this
> >> sequence number can be between 0 and 65,535.
> >>
> >> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >> the VirtQueueElement when it was created. This value must match with the
> >> VirtQueue's used_seq_idx before the element can be put on the used ring
> >> by the device.
> >>
> >> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >> ---
> >>   hw/virtio/virtio.c         | 18 ++++++++++++++++++
> >>   include/hw/virtio/virtio.h |  1 +
> >>   2 files changed, 19 insertions(+)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index fb6b4ccd83..069d96df99 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -132,6 +132,10 @@ struct VirtQueue
> >>       uint16_t used_idx;
> >>       bool used_wrap_counter;
> >>
> >> +    /* In-Order sequence indices */
> >> +    uint16_t used_seq_idx;
> >> +    uint16_t current_seq_idx;
> >> +
> >
> > I'm having a hard time understanding the difference between these and
> > last_avail_idx and used_idx. It seems to me if we replace them
> > everything will work? What am I missing?
> >
>
> For used_seq_idx, it does work like used_idx except the difference is
> when their values get updated, specifically for the split VQ case.
>
> As you know, for the split VQ case, the used_idx is updated during
> virtqueue_split_flush. However, imagine a batch of elements coming in
> where virtqueue_split_fill is called multiple times before
> virtqueue_split_flush. We want to make sure we write these elements to
> the used ring in-order and we'll know its order based on used_seq_idx.
>
> Alternatively, I thought about replicating the logic for the packed VQ
> case (where this used_seq_idx isn't used) where we start looking at
> vq->used_elems[vq->used_idx] and iterate through until we find a used
> element, but I wasn't sure how to handle the case where elements get
> used (written to the used ring) and new elements get put in used_elems
> before the used_idx is updated. Since this search would require us to
> always start at index vq->used_idx.
>
> For example, say, of three elements getting filled (elem0 - elem2),
> elem1 and elem0 come back first (vq->used_idx = 0):
>
> elem1 - not in-order
> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>          in-order, write elem0 and elem1 to used ring, mark elements as
>          used
>
> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>
> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> could continue looking through the vq->used_elems array until we find an
> unused element... but then again how would we (1) know if the element is
> in-order and (2) know when to stop searching?
>

Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if we fill vq->used_elems at virtqueue_pop and then have
virtqueue_push just mark them as used somehow? Two booleans (or a
flag) would do for a first iteration.

If we go with this approach I think used_elems should be renamed actually.

> In any case, the use of this variable could be seen as an optimization
> as its value will tell us where to start looking in vq->used_elems
> instead of always starting at vq->used_idx.
>
> If this is like a one-shot scenario where one element gets written and
> then flushed after, then yes in this case used_seq_idx == used_idx.
>
> ------
>
> For current_seq_idx, this is pretty much just a counter. Every new
> VirtQueueElement created from virtqueue_pop is given a number and the
> counter is incremented. Like grabbing a ticket number and waiting for
> your number to be called. The next person to grab a ticket number will
> be your number + 1.
>

So it's like last_avail_idx, isn't it?

> Let me know if I'm making any sense. Thanks :)
>
> Jonah
>
> >>       /* Last used index value we have signalled on */
> >>       uint16_t signalled_used;
> >>
> >> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>           elem->in_sg[i] = iov[out_num + i];
> >>       }
> >>
> >> +    /* Assign sequence index for in-order processing */
> >> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >> +        elem->seq_idx = vq->current_seq_idx++;
> >> +    }
> >> +
> >>       vq->inuse++;
> >>
> >>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>       vq->shadow_avail_idx = vq->last_avail_idx;
> >>       vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>
> >> +    /* Assign sequence index for in-order processing */
> >> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >> +        elem->seq_idx = vq->current_seq_idx++;
> >> +    }
> >> +
> >>       trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>   done:
> >>       address_space_cache_destroy(&indirect_desc_cache);
> >> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
> >>       vdev->vq[i].notification = true;
> >>       vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>       vdev->vq[i].inuse = 0;
> >> +    vdev->vq[i].used_seq_idx = 0;
> >> +    vdev->vq[i].current_seq_idx = 0;
> >>       virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> >>   }
> >>
> >> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>       vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
> >>       vdev->vq[i].handle_output = handle_output;
> >>       vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> >> +    vdev->vq[i].used_seq_idx = 0;
> >> +    vdev->vq[i].current_seq_idx = 0;
> >>
> >>       return &vdev->vq[i];
> >>   }
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index b3c74a1bca..910b2a3427 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
> >>       hwaddr *out_addr;
> >>       struct iovec *in_sg;
> >>       struct iovec *out_sg;
> >> +    uint16_t seq_idx;
> >>   } VirtQueueElement;
> >>
> >>   #define VIRTIO_QUEUE_MAX 1024
> >> --
> >> 2.39.3
> >>
> >
>



* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-04 11:35       ` Eugenio Perez Martin
@ 2024-04-04 14:41         ` Jonah Palmer
  2024-04-04 16:33           ` Eugenio Perez Martin
  0 siblings, 1 reply; 14+ messages in thread
From: Jonah Palmer @ 2024-04-04 14:41 UTC
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
>>>> structures. A VirtQueue's sequence variables are initialized when a
>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
>>>> variable is initialized when a VirtQueueElement is being initialized.
>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>>>>
>>>> A VirtQueue's used_seq_idx represents the next expected index in a
>>>> sequence of VirtQueueElements to be processed (put on the used ring).
>>>> The next VirtQueueElement added to the used ring must match this
>>>> sequence number before additional elements can be safely added to the
>>>> used ring. It's also particularly useful for helping find the number of
>>>> new elements added to the used ring.
>>>>
>>>> A VirtQueue's current_seq_idx represents the current sequence index.
>>>> This value is essentially a counter where the value is assigned to a new
>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
>>>> sequence number can be between 0 and 65,535.
>>>>
>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
>>>> the VirtQueueElement when it was created. This value must match with the
>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
>>>> by the device.
>>>>
>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>> ---
>>>>    hw/virtio/virtio.c         | 18 ++++++++++++++++++
>>>>    include/hw/virtio/virtio.h |  1 +
>>>>    2 files changed, 19 insertions(+)
>>>>
>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>> index fb6b4ccd83..069d96df99 100644
>>>> --- a/hw/virtio/virtio.c
>>>> +++ b/hw/virtio/virtio.c
>>>> @@ -132,6 +132,10 @@ struct VirtQueue
>>>>        uint16_t used_idx;
>>>>        bool used_wrap_counter;
>>>>
>>>> +    /* In-Order sequence indices */
>>>> +    uint16_t used_seq_idx;
>>>> +    uint16_t current_seq_idx;
>>>> +
>>>
>>> I'm having a hard time understanding the difference between these and
>>> last_avail_idx and used_idx. It seems to me if we replace them
>>> everything will work? What am I missing?
>>>
>>
>> For used_seq_idx, it does work like used_idx except the difference is
>> when their values get updated, specifically for the split VQ case.
>>
>> As you know, for the split VQ case, the used_idx is updated during
>> virtqueue_split_flush. However, imagine a batch of elements coming in
>> where virtqueue_split_fill is called multiple times before
>> virtqueue_split_flush. We want to make sure we write these elements to
>> the used ring in-order and we'll know its order based on used_seq_idx.
>>
>> Alternatively, I thought about replicating the logic for the packed VQ
>> case (where this used_seq_idx isn't used) where we start looking at
>> vq->used_elems[vq->used_idx] and iterate through until we find a used
>> element, but I wasn't sure how to handle the case where elements get
>> used (written to the used ring) and new elements get put in used_elems
>> before the used_idx is updated. Since this search would require us to
>> always start at index vq->used_idx.
>>
>> For example, say, of three elements getting filled (elem0 - elem2),
>> elem1 and elem0 come back first (vq->used_idx = 0):
>>
>> elem1 - not in-order
>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>>           in-order, write elem0 and elem1 to used ring, mark elements as
>>           used
>>
>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>>
>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
>> could continue looking through the vq->used_elems array until we find an
>> unused element... but then again how would we (1) know if the element is
>> in-order and (2) know when to stop searching?
>>
> 
> Ok I think I understand the problem now. It is aggravated if we add
> chained descriptors to the mix.
> 
> We know that the order of used descriptors must be the exact same as
> the order they were made available, leaving out in order batching.
> What if vq->used_elems at virtqueue_pop and then virtqueue_push just
> marks them as used somehow? Two booleans (or flag) would do for a
> first iteration.
> 
> If we go with this approach I think used_elems should be renamed actually.
> 

If I'm understanding correctly, I don't think adding newly created 
elements to vq->used_elems at virtqueue_pop will do much for us. We 
could just keep adding processed elements to vq->used_elems at 
virtqueue_fill but instead of:

vq->used_elems[seq_idx].in_num = elem->in_num;
vq->used_elems[seq_idx].out_num = elem->out_num;

We could do:

vq->used_elems[seq_idx].in_num = 1;
vq->used_elems[seq_idx].out_num = 1;

We'd use in_num and out_num as separate flags. in_num could indicate
whether this element has been written to the used ring, while out_num
could indicate whether this element has been flushed (1 for no, 0 for
yes). In other words, when we go to write to the used ring, we start
at index vq->used_idx and iterate through the used elements.

If a used element's in_num and out_num == 0, then this element is 
invalid (not yet processed) and we stop the search.

If a used element's in_num and out_num == 1, then this element is valid, 
written to the used ring, in_num is set to 0, and the search continues.

Lastly, if a used element's in_num == 0 but out_num == 1, then this 
element has already been written to the used ring but not yet flushed, 
so ignore this element and continue searching.

There should never be a case where in_num == 1 and out_num == 0.

However, this would probably mean that before (or right after) we
actually perform the flush, we'd have to iterate through the
used_elems array one more time and set their out_num fields to 0 to
indicate the elements have been flushed.

Again, this is just for the batched split VQ case where we have to keep 
track of elements that have been written but not flushed and elements 
that have been written and flushed, given that we don't know which 
elements have actually been written to the used ring until the used_idx 
is updated.

This approach appears more costly, though, if we're really trying to
avoid adding the new used_seq_idx VirtQueue member.
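
Roughly, the per-slot state machine I'm describing (a sketch only,
reusing in_num/out_num as flags):

    /* (in_num, out_num) for a used_elems slot:
     *   (0, 0): not yet processed        -> stop the search
     *   (1, 1): processed, not written   -> write to used ring,
     *                                       then set in_num = 0
     *   (0, 1): written, not yet flushed -> skip, keep searching
     *   (1, 0): should never occur
     */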

>> In any case, the use of this variable could be seen as an optimization
>> as its value will tell us where to start looking in vq->used_elems
>> instead of always starting at vq->used_idx.
>>
>> If this is like a one-shot scenario where one element gets written and
>> then flushed after, then yes in this case used_seq_idx == used_idx.
>>
>> ------
>>
>> For current_seq_idx, this is pretty much just a counter. Every new
>> VirtQueueElement created from virtqueue_pop is given a number and the
>> counter is incremented. Like grabbing a ticket number and waiting for
>> your number to be called. The next person to grab a ticket number will
>> be your number + 1.
>>
> 
> So it's like last_avail_idx, isn't it?
> 

For the split VQ case, pretty much. Though if we hit this case in 
virtqueue_split_pop, we may get into some trouble:

if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
     goto done;
}

However for the packed VQ case, last_avail_idx might not work in the way 
we'd need it to for this implementation. In virtqueue_packed_pop, we see 
this:

elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
vq->last_avail_idx += elem->ndescs;

It would appear as though last_avail_idx is incremented by the total
number of descriptors associated with the element, which can be
greater than 1, whereas this implementation increments by 1 for each
element.

Actually... It's good you mentioned this because I think my packed VQ 
implementation is wrong. For packed VQs, vq->used_idx is incremented by 
the total number of descriptors in the flushed elements and not 
necessarily the number of elements being flushed like in the split VQ 
case. I'm adding elements to vq->used_elems in a per-element sequence 
rather than going by the number of descriptors an element holds, which 
should be the case for packed VQs.
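
To illustrate the divergence (hypothetical numbers): with a
3-descriptor chained element followed by a 1-descriptor element, the
packed ring indices advance by descriptor count, not element count:

    vq->last_avail_idx += elem0->ndescs;   /* += 3 */
    vq->last_avail_idx += elem1->ndescs;   /* += 1 */

so indexing used_elems with a per-element counter diverges from
vq->used_idx as soon as a chain longer than one descriptor appears.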

>> Let me know if I'm making any sense. Thanks :)
>>
>> Jonah
>>
>>>>        /* Last used index value we have signalled on */
>>>>        uint16_t signalled_used;
>>>>
>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>>>            elem->in_sg[i] = iov[out_num + i];
>>>>        }
>>>>
>>>> +    /* Assign sequence index for in-order processing */
>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>> +    }
>>>> +
>>>>        vq->inuse++;
>>>>
>>>>        trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>>>        vq->shadow_avail_idx = vq->last_avail_idx;
>>>>        vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>
>>>> +    /* Assign sequence index for in-order processing */
>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>> +    }
>>>> +
>>>>        trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>    done:
>>>>        address_space_cache_destroy(&indirect_desc_cache);
>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>>>>        vdev->vq[i].notification = true;
>>>>        vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>>>>        vdev->vq[i].inuse = 0;
>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>        virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>>>>    }
>>>>
>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>>>        vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>>>>        vdev->vq[i].handle_output = handle_output;
>>>>        vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>
>>>>        return &vdev->vq[i];
>>>>    }
>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>>> index b3c74a1bca..910b2a3427 100644
>>>> --- a/include/hw/virtio/virtio.h
>>>> +++ b/include/hw/virtio/virtio.h
>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>>>>        hwaddr *out_addr;
>>>>        struct iovec *in_sg;
>>>>        struct iovec *out_sg;
>>>> +    uint16_t seq_idx;
>>>>    } VirtQueueElement;
>>>>
>>>>    #define VIRTIO_QUEUE_MAX 1024
>>>> --
>>>> 2.39.3
>>>>
>>>
>>
> 


* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-04 14:41         ` Jonah Palmer
@ 2024-04-04 16:33           ` Eugenio Perez Martin
  2024-04-05 13:58             ` Jonah Palmer
  0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2024-04-04 16:33 UTC
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> > On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >>
> >>
> >> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>
> >>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>
> >>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>> The next VirtQueueElement added to the used ring must match this
> >>>> sequence number before additional elements can be safely added to the
> >>>> used ring. It's also particularly useful for helping find the number of
> >>>> new elements added to the used ring.
> >>>>
> >>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>> This value is essentially a counter where the value is assigned to a new
> >>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>> sequence number can be between 0 and 65,535.
> >>>>
> >>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >>>> the VirtQueueElement when it was created. This value must match with the
> >>>> VirtQueue's used_seq_idx before the element can be put on the used ring
> >>>> by the device.
> >>>>
> >>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >>>> ---
> >>>>    hw/virtio/virtio.c         | 18 ++++++++++++++++++
> >>>>    include/hw/virtio/virtio.h |  1 +
> >>>>    2 files changed, 19 insertions(+)
> >>>>
> >>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>> index fb6b4ccd83..069d96df99 100644
> >>>> --- a/hw/virtio/virtio.c
> >>>> +++ b/hw/virtio/virtio.c
> >>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>        uint16_t used_idx;
> >>>>        bool used_wrap_counter;
> >>>>
> >>>> +    /* In-Order sequence indices */
> >>>> +    uint16_t used_seq_idx;
> >>>> +    uint16_t current_seq_idx;
> >>>> +
> >>>
> >>> I'm having a hard time understanding the difference between these and
> >>> last_avail_idx and used_idx. It seems to me if we replace them
> >>> everything will work? What am I missing?
> >>>
> >>
> >> For used_seq_idx, it does work like used_idx except the difference is
> >> when their values get updated, specifically for the split VQ case.
> >>
> >> As you know, for the split VQ case, the used_idx is updated during
> >> virtqueue_split_flush. However, imagine a batch of elements coming in
> >> where virtqueue_split_fill is called multiple times before
> >> virtqueue_split_flush. We want to make sure we write these elements to
> >> the used ring in-order and we'll know its order based on used_seq_idx.
> >>
> >> Alternatively, I thought about replicating the logic for the packed VQ
> >> case (where this used_seq_idx isn't used) where we start looking at
> >> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >> element, but I wasn't sure how to handle the case where elements get
> >> used (written to the used ring) and new elements get put in used_elems
> >> before the used_idx is updated. Since this search would require us to
> >> always start at index vq->used_idx.
> >>
> >> For example, say, of three elements getting filled (elem0 - elem2),
> >> elem1 and elem0 come back first (vq->used_idx = 0):
> >>
> >> elem1 - not in-order
> >> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>           in-order, write elem0 and elem1 to used ring, mark elements as
> >>           used
> >>
> >> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> >> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> >> (elem1) and iterate to vq->used_idx + 2 (elem2)?
> >>
> >> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> >> could continue looking through the vq->used_elems array until we find an
> >> unused element... but then again how would we (1) know if the element is
> >> in-order and (2) know when to stop searching?
> >>
> >
> > Ok I think I understand the problem now. It is aggravated if we add
> > chained descriptors to the mix.
> >
> > We know that the order of used descriptors must be the exact same as
> > the order they were made available, leaving out in order batching.
> > What if we fill vq->used_elems at virtqueue_pop and then virtqueue_push just
> > marks them as used somehow? Two booleans (or flag) would do for a
> > first iteration.
> >
> > If we go with this approach I think used_elems should be renamed actually.
> >
>
> If I'm understanding correctly, I don't think adding newly created
> elements to vq->used_elems at virtqueue_pop will do much for us.

By knowing what descriptor id must go in each position of the used ring.

Following your example, let's say avail_idx is 10 at that moment.
Then, the driver makes available the three elements you mention, so:
used_elems[10] = elem0
used_elems[11] = elem1
used_elems[12] = elem2

Now the device uses elem1. virtqueue_push can search linearly for
elem->index in used_elems[used_idx]...used_elems[avail_idx] range. As
the device is mis-behaving, no need to optimize this behavior.
used_elems[11].index == elem->index, so we mark it as used somehow.
Let's say we add a boolean to VirtQueueElement.

virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
looking for used elements. At this moment used_elems[used_idx] is not
used so it stops there.

Now elem0 is pushed. It is the first one in the
used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
to the used ring at fill. used_idx++. We use the rest of the
descriptors until we find one in used_elems that is not used, which is
used_elems[12].

After that virtqueue_flush is called. At its scanning, used_elems[10]
is used, so it writes it to the used ring. After that, used_elems[11]
is also used, so it is written also. used_elems[12] is not used, so it
stops there.

Finally, elem2 is pushed, so used_elems[12] is written.
virtqueue_flush detects it, so it writes it to the guest's used ring.

Let me know what you think. I find it simpler than declaring new
indexes, but I may be wrong.
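
In code terms, my idea is roughly the following sketch. It is untested,
the "used" bool and the helper names are invented here, and the
wrap-around details would need care:

/* At virtqueue_pop(): remember which element sits at each avail slot */
vq->used_elems[vq->last_avail_idx % vq->vring.num] = *elem;

/* At virtqueue_fill(): only mark the element as used */
static void virtqueue_order_mark_used(VirtQueue *vq,
                                      const VirtQueueElement *elem)
{
    uint16_t i;

    /* linear search; a well-behaved device hits the first entry */
    for (i = vq->used_idx; i != vq->last_avail_idx; i++) {
        VirtQueueElement *e = &vq->used_elems[i % vq->vring.num];

        if (e->index == elem->index) {
            e->used = true;
            return;
        }
    }
}

/* At virtqueue_flush(): write out the leading run of used elements */
static void virtqueue_order_flush(VirtQueue *vq)
{
    VirtQueueElement *e = &vq->used_elems[vq->used_idx % vq->vring.num];

    while (e->used) {
        /* write e to the used ring, as virtqueue_split_fill() does */
        e->used = false;
        vq->used_idx++;
        e = &vq->used_elems[vq->used_idx % vq->vring.num];
    }

    /* then expose the new index to the guest, as _flush does today */
}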

This makes it difficult to actually batch used descriptors. My
proposal is to address it in another series, by delegating it to the
caller and recovering proper usage of the virtqueue_push(idx) parameter.
The caller can specify either to batch as usual, or to delegate the
automatic (and potentially inefficient) ordering I'm proposing here.

> We
> could just keep adding processed elements to vq->used_elems at
> virtqueue_fill but instead of:
>
> vq->used_elems[seq_idx].in_num = elem->in_num;
> vq->used_elems[seq_idx].out_num = elem->out_num;
>
> We could do:
>
> vq->used_elems[seq_idx].in_num = 1;
> vq->used_elems[seq_idx].out_num = 1;
>
> We'd use in_num and out_num as separate flags. in_num could indicate if
> this element has been written to the used ring while out_num could
> indicate if this element has been flushed (1 for no, 0 for yes). In
> other words, when we go to write to the used ring, start at index
> vq->used_idx and iterate through the used elements.
>
> If a used element's in_num and out_num == 0, then this element is
> invalid (not yet processed) and we stop the search.
>
> If a used element's in_num and out_num == 1, then this element is valid,
> written to the used ring, in_num is set to 0, and the search continues.
>
> Lastly, if a used element's in_num == 0 but out_num == 1, then this
> element has already been written to the used ring but not yet flushed,
> so ignore this element and continue searching.
>
> There should never be a case where in_num == 1 and out_num == 0.
>
> However, this would probably mean that before (or right after) we
> actually perform the flush we'll have to iterate through the used_elems
> array one more time and set their out_num's to 0 to indicate the element
> has been flushed.
>
> Again, this is just for the batched split VQ case where we have to keep
> track of elements that have been written but not flushed and elements
> that have been written and flushed, given that we don't know which
> elements have actually been written to the used ring until the used_idx
> is updated.
>
> This approach appears more costly though if we're really trying to avoid
> having this new used_seq_idx VirtQueue member.
>
> >> In any case, the use of this variable could be seen as an optimization
> >> as its value will tell us where to start looking in vq->used_elems
> >> instead of always starting at vq->used_idx.
> >>
> >> If this is like a one-shot scenario where one element gets written and
> >> then flushed after, then yes in this case used_seq_idx == used_idx.
> >>
> >> ------
> >>
> >> For current_seq_idx, this is pretty much just a counter. Every new
> >> VirtQueueElement created from virtqueue_pop is given a number and the
> >> counter is incremented. Like grabbing a ticket number and waiting for
> >> your number to be called. The next person to grab a ticket number will
> >> be your number + 1.
> >>
> >
> > So it's like last_avail_idx, isn't it?
> >
>
> For the split VQ case, pretty much. Though if we hit this case in
> virtqueue_split_pop, we may get into some trouble:
>
> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
>      goto done;
> }
>

That's a fatal mistake and the device breaks, vdev->broken = true. It
cannot be used anymore from that point on, because of all the checks
of that variable.

Does that solve the problem?

> However for the packed VQ case, last_avail_idx might not work in the way
> we'd need it to for this implementation. In virtqueue_packed_pop, we see
> this:
>
> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
> vq->last_avail_idx += elem->ndescs;
>
> It would appear as though last_avail_idx is incremented by total number
> of descriptors associated with the element, which can be greater than 1.
> This implementation increments by 1 for each element.
>
> Actually... It's good you mentioned this because I think my packed VQ
> implementation is wrong. For packed VQs, vq->used_idx is incremented by
> the total number of descriptors in the flushed elements and not
> necessarily the number of elements being flushed like in the split VQ
> case. I'm adding elements to vq->used_elems in a per-element sequence
> rather than going by the number of descriptors an element holds, which
> should be the case for packed VQs.
>

If you keep your proposed index I think you can increase it by one
per head, as heads are the entries that are written in both cases.
used_idx should already increment properly.

If you move to my proposal, both should increase by elem->ndescs as
you suggest here.
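
In rough code terms (sketch only):

/* your approach: one tick per head, split and packed alike */
elem->seq_idx = vq->current_seq_idx++;

/* used_elems approach, packed case: advance by the whole chain,
 * mirroring what virtqueue_packed_pop() already does, e.g. at flush */
vq->used_idx += elem->ndescs;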

> >> Let me know if I'm making any sense. Thanks :)
> >>
> >> Jonah
> >>
> >>>>        /* Last used index value we have signalled on */
> >>>>        uint16_t signalled_used;
> >>>>
> >>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>>>            elem->in_sg[i] = iov[out_num + i];
> >>>>        }
> >>>>
> >>>> +    /* Assign sequence index for in-order processing */
> >>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>> +    }
> >>>> +
> >>>>        vq->inuse++;
> >>>>
> >>>>        trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>>>        vq->shadow_avail_idx = vq->last_avail_idx;
> >>>>        vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>>>
> >>>> +    /* Assign sequence index for in-order processing */
> >>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>> +    }
> >>>> +
> >>>>        trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>    done:
> >>>>        address_space_cache_destroy(&indirect_desc_cache);
> >>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
> >>>>        vdev->vq[i].notification = true;
> >>>>        vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>>>        vdev->vq[i].inuse = 0;
> >>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>        virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> >>>>    }
> >>>>
> >>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>>>        vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
> >>>>        vdev->vq[i].handle_output = handle_output;
> >>>>        vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> >>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>
> >>>>        return &vdev->vq[i];
> >>>>    }
> >>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >>>> index b3c74a1bca..910b2a3427 100644
> >>>> --- a/include/hw/virtio/virtio.h
> >>>> +++ b/include/hw/virtio/virtio.h
> >>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
> >>>>        hwaddr *out_addr;
> >>>>        struct iovec *in_sg;
> >>>>        struct iovec *out_sg;
> >>>> +    uint16_t seq_idx;
> >>>>    } VirtQueueElement;
> >>>>
> >>>>    #define VIRTIO_QUEUE_MAX 1024
> >>>> --
> >>>> 2.39.3
> >>>>
> >>>
> >>
> >
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-04 16:33           ` Eugenio Perez Martin
@ 2024-04-05 13:58             ` Jonah Palmer
  2024-04-05 15:04               ` Eugenio Perez Martin
  0 siblings, 1 reply; 14+ messages in thread
From: Jonah Palmer @ 2024-04-05 13:58 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
>>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
>>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>>>
>>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
>>>>>> structures. A VirtQueue's sequence variables are initialized when a
>>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
>>>>>> variable is initialized when a VirtQueueElement is being initialized.
>>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>>>>>>
>>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
>>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
>>>>>> The next VirtQueueElement added to the used ring must match this
>>>>>> sequence number before additional elements can be safely added to the
>>>>>> used ring. It's also particularly useful for helping find the number of
>>>>>> new elements added to the used ring.
>>>>>>
>>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
>>>>>> This value is essentially a counter where the value is assigned to a new
>>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
>>>>>> sequence number can be between 0 and 65,535.
>>>>>>
>>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
>>>>>> the VirtQueueElement when it was created. This value must match with the
>>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
>>>>>> by the device.
>>>>>>
>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>>>> ---
>>>>>>     hw/virtio/virtio.c         | 18 ++++++++++++++++++
>>>>>>     include/hw/virtio/virtio.h |  1 +
>>>>>>     2 files changed, 19 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>>>> index fb6b4ccd83..069d96df99 100644
>>>>>> --- a/hw/virtio/virtio.c
>>>>>> +++ b/hw/virtio/virtio.c
>>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
>>>>>>         uint16_t used_idx;
>>>>>>         bool used_wrap_counter;
>>>>>>
>>>>>> +    /* In-Order sequence indices */
>>>>>> +    uint16_t used_seq_idx;
>>>>>> +    uint16_t current_seq_idx;
>>>>>> +
>>>>>
>>>>> I'm having a hard time understanding the difference between these and
>>>>> last_avail_idx and used_idx. It seems to me if we replace them
>>>>> everything will work? What am I missing?
>>>>>
>>>>
>>>> For used_seq_idx, it does work like used_idx except the difference is
>>>> when their values get updated, specifically for the split VQ case.
>>>>
>>>> As you know, for the split VQ case, the used_idx is updated during
>>>> virtqueue_split_flush. However, imagine a batch of elements coming in
>>>> where virtqueue_split_fill is called multiple times before
>>>> virtqueue_split_flush. We want to make sure we write these elements to
>>>> the used ring in-order and we'll know its order based on used_seq_idx.
>>>>
>>>> Alternatively, I thought about replicating the logic for the packed VQ
>>>> case (where this used_seq_idx isn't used) where we start looking at
>>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
>>>> element, but I wasn't sure how to handle the case where elements get
>>>> used (written to the used ring) and new elements get put in used_elems
>>>> before the used_idx is updated. Since this search would require us to
>>>> always start at index vq->used_idx.
>>>>
>>>> For example, say, of three elements getting filled (elem0 - elem2),
>>>> elem1 and elem0 come back first (vq->used_idx = 0):
>>>>
>>>> elem1 - not in-order
>>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>>>>            in-order, write elem0 and elem1 to used ring, mark elements as
>>>>            used
>>>>
>>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
>>>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
>>>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>>>>
>>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
>>>> could continue looking through the vq->used_elems array until we find an
>>>> unused element... but then again how would we (1) know if the element is
>>>> in-order and (2) know when to stop searching?
>>>>
>>>
>>> Ok I think I understand the problem now. It is aggravated if we add
>>> chained descriptors to the mix.
>>>
>>> We know that the order of used descriptors must be the exact same as
>>> the order they were made available, leaving out in order batching.
>>> What if we fill vq->used_elems at virtqueue_pop and then virtqueue_push just
>>> marks them as used somehow? Two booleans (or flag) would do for a
>>> first iteration.
>>>
>>> If we go with this approach I think used_elems should be renamed actually.
>>>
>>
>> If I'm understanding correctly, I don't think adding newly created
>> elements to vq->used_elems at virtqueue_pop will do much for us.
> 
> By knowing what descriptor id must go in each position of the used ring.
> 
> Following your example, let's say avail_idx is 10 at that moment.
> Then, the driver makes available the three elements you mention, so:
> used_elems[10] = elem0
> used_elems[11] = elem1
> used_elems[12] = elem2
> 
> Now the device uses elem1. virtqueue_push can search linearly for
> elem->index in used_elems[used_idx]...used_elems[avail_idx] range. As
> the device is mis-behaving, no need to optimize this behavior.
> used_elems[11].index == elem->index, so we mark it as used somehow.
> Let's say we add a boolean to VirtQueueElement.
> 
> virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
> looking for used elements. At this moment used_elems[used_idx] is not
> used so it stops there.
> 
> Now elem0 is pushed. It is the first one in the
> used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
> to the used ring at fill. used_idx++. We use the rest of the
> descriptors until we find one in used_elems that is not used, which is
> used_elems[12].
> 
> After that virtqueue_flush is called. At its scanning, used_elems[10]
> is used, so it writes it to the used ring. After that, used_elems[11]
> is also used, so it is written also. used_elems[12] is not used, so it
> stops there.
> 
> Finally, elem2 is pushed, so used_elems[12] is written.
> virtqueue_flush detects it, so it writes it to the guest's used ring.
> 
> Let me know what you think. I find it simpler than declaring new
> indexes, but I may be wrong.
> 

I think I see what you're getting at, but I just have a few clarifying 
questions about your proposal here.

So you're proposing to add entries to used_elems at virtqueue_pop, ok.

avail_idx = 10, then the driver makes some new entries (elems) available 
in the avail ring:

used_elems[10] = elem0
used_elems[11] = elem1
used_elems[12] = elem2

At this point, avail_idx = 13, used_idx = 10.

elem1 gets used first, ok.

Now, if I'm understanding correctly, you're saying that in 
virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan 
used_elems[used_idx] - used_elems[avail_idx] to find used_elems[i].index 
== elem->index and mark it as used, e.g. used_elems[i].used = true. 
Okay, now used_elems[11] has been marked as used.

Now we make it to virtqueue_fill. What's the role you want 
virtqueue_fill to take here (if any)?

You say that after we mark this element as used, we go to 
virtqueue_flush and scan for used elements in used_elems[used_idx] - 
used_elems[avail_idx]. Of course, the first one of this order will be in 
used_elems[used_idx], which is currently showing the element as unused, 
so we're done with this element for now.

So, what exactly is the role of virtqueue_flush here? I'm inferring here 
that you want the virtqueue_flush role (for split VQs) to do both the 
writing to the used ring (normally done in virtqueue_fill) and 
updating the used_idx (normally done in virtqueue_flush). Is this correct?

Next, elem0 gets used second.

Again, in virtqueue_push we scan used_elems[used_idx] - 
used_elems[avail_idx] to find used_elems[i].index == elem->index and 
mark it as used. Okay, used_elems[10] has been marked as used.

Then you say, "It is the first one in the used_elems[used_idx] - 
used_elems[avail_idx] range, so we can write it to the used ring at 
fill. used_idx++. We use the rest of the descriptors until we find one in 
used_elems that is not used, which is used_elems[12]."

This, to me, sounds like "in virtqueue_fill, when we find an order (e.g. 
used_elems[used_idx].index == elem->index) write it to the used ring AND 
increment the used_idx. Keep writing elements to the used ring if 
used_elems[used_idx].used == true and, for each element being written, 
increment used_idx."

This is a bit confusing to me since next you say "After that, 
virtqueue_flush is called. At its scanning, used_elems[10] is used, so 
it writes it to the used ring. After that, used_elems[11] is also used, 
so it is written also. used_elems[12] is not used, so it stops there."

This sounds very similar to what you proposed for virtqueue_fill, except 
it looks like you're also saying to do this in virtqueue_flush, hence my 
confusion.

If you wouldn't mind, could you clarify the roles of virtqueue_fill and 
virtqueue_flush here for me? Thanks :)!

> This makes it difficult to actually batch used descriptors. My
> proposal is to address it in another series, by delegating it to the
> caller and recovering proper usage of the virtqueue_push(idx) parameter.
> The caller can specify either to batch as usual, or to delegate the
> automatic (and potentially inefficient) ordering I'm proposing here.
> 

Just to be clear, for this series, you'd like me to implement a solution 
that does *not* consider the case where virtqueue_fill is called 
multiple times before virtqueue_flush (and to put a solution for this in 
a separate series)?

Are we not concerned that we might shoot ourselves in the foot here by 
implementing a process that may not work well for a batching solution, 
especially when we have an almost-working solution for batching and 
non-batching cases?

>> We
>> could just keep adding processed elements to vq->used_elems at
>> virtqueue_fill but instead of:
>>
>> vq->used_elems[seq_idx].in_num = elem->in_num;
>> vq->used_elems[seq_idx].out_num = elem->out_num;
>>
>> We could do:
>>
>> vq->used_elems[seq_idx].in_num = 1;
>> vq->used_elems[seq_idx].out_num = 1;
>>
>> We'd use in_num and out_num as separate flags. in_num could indicate if
>> this element has been written to the used ring while out_num could
>> indicate if this element has been flushed (1 for no, 0 for yes). In
>> other words, when we go to write to the used ring, start at index
>> vq->used_idx and iterate through the used elements.
>>
>> If a used element's in_num and out_num == 0, then this element is
>> invalid (not yet processed) and we stop the search.
>>
>> If a used element's in_num and out_num == 1, then this element is valid,
>> written to the used ring, in_num is set to 0, and the search continues.
>>
>> Lastly, if a used element's in_num == 0 but out_num == 1, then this
>> element has already been written to the used ring but not yet flushed,
>> so ignore this element and continue searching.
>>
>> There should never be a case where in_num == 1 and out_num == 0.
>>
>> However, this would probably mean that before (or right after) we
>> actually perform the flush we'll have to iterate through the used_elems
>> array one more time and set their out_num's to 0 to indicate the element
>> has been flushed.
>>
>> Again, this is just for the batched split VQ case where we have to keep
>> track of elements that have been written but not flushed and elements
>> that have been written and flushed, given that we don't know which
>> elements have actually been written to the used ring until the used_idx
>> is updated.
>>
>> This approach appears more costly though if we're really trying to avoid
>> having this new used_seq_idx VirtQueue member.
>>
>>>> In any case, the use of this variable could be seen as an optimization
>>>> as its value will tell us where to start looking in vq->used_elems
>>>> instead of always starting at vq->used_idx.
>>>>
>>>> If this is like a one-shot scenario where one element gets written and
>>>> then flushed after, then yes in this case used_seq_idx == used_idx.
>>>>
>>>> ------
>>>>
>>>> For current_seq_idx, this is pretty much just a counter. Every new
>>>> VirtQueueElement created from virtqueue_pop is given a number and the
>>>> counter is incremented. Like grabbing a ticket number and waiting for
>>>> your number to be called. The next person to grab a ticket number will
>>>> be your number + 1.
>>>>
>>>
>>> So it's like last_avail_idx, isn't it?
>>>
>>
>> For the split VQ case, pretty much. Though if we hit this case in
>> virtqueue_split_pop, we may get into some trouble:
>>
>> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
>>       goto done;
>> }
>>
> 
> That's a fatal mistake and the device breaks, vdev->broken = true. It
> cannot be used anymore from that point on, because of all the checks
> of that variable.
> 
> Does that solve the problem?
> 

Ah, it does. My apologies, I should've recognized this would result in 
the device breaking.

>> However for the packed VQ case, last_avail_idx might not work in the way
>> we'd need it to for this implementation. In virtqueue_packed_pop, we see
>> this:
>>
>> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
>> vq->last_avail_idx += elem->ndescs;
>>
>> It would appear as though last_avail_idx is incremented by total number
>> of descriptors associated with the element, which can be greater than 1.
>> This implementation increments by 1 for each element.
>>
>> Actually... It's good you mentioned this because I think my packed VQ
>> implementation is wrong. For packed VQs, vq->used_idx is incremented by
>> the total number of descriptors in the flushed elements and not
>> necessarily the number of elements being flushed like in the split VQ
>> case. I'm adding elements to vq->used_elems in a per-element sequence
>> rather than going by the number of descriptors an element holds, which
>> should be the case for packed VQs.
>>
> 
> If you keep your proposed index I think you can increase it by one
> per head, as heads are the entries that are written in both cases.
> used_idx should already increment properly.
> 
> If you move to my proposal, both should increase by elem->ndescs as
> you suggest here.
> 

Ack! Thanks!

>>>> Let me know if I'm making any sense. Thanks :)
>>>>
>>>> Jonah
>>>>
>>>>>>         /* Last used index value we have signalled on */
>>>>>>         uint16_t signalled_used;
>>>>>>
>>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>>>>>             elem->in_sg[i] = iov[out_num + i];
>>>>>>         }
>>>>>>
>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>> +    }
>>>>>> +
>>>>>>         vq->inuse++;
>>>>>>
>>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>>>>>         vq->shadow_avail_idx = vq->last_avail_idx;
>>>>>>         vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>>>
>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>> +    }
>>>>>> +
>>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>>     done:
>>>>>>         address_space_cache_destroy(&indirect_desc_cache);
>>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>>>>>>         vdev->vq[i].notification = true;
>>>>>>         vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>>>>>>         vdev->vq[i].inuse = 0;
>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>         virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>>>>>>     }
>>>>>>
>>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>>>>>         vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>>>>>>         vdev->vq[i].handle_output = handle_output;
>>>>>>         vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>
>>>>>>         return &vdev->vq[i];
>>>>>>     }
>>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>>>>> index b3c74a1bca..910b2a3427 100644
>>>>>> --- a/include/hw/virtio/virtio.h
>>>>>> +++ b/include/hw/virtio/virtio.h
>>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>>>>>>         hwaddr *out_addr;
>>>>>>         struct iovec *in_sg;
>>>>>>         struct iovec *out_sg;
>>>>>> +    uint16_t seq_idx;
>>>>>>     } VirtQueueElement;
>>>>>>
>>>>>>     #define VIRTIO_QUEUE_MAX 1024
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>>
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-05 13:58             ` Jonah Palmer
@ 2024-04-05 15:04               ` Eugenio Perez Martin
  2024-04-05 15:37                 ` Jonah Palmer
  0 siblings, 1 reply; 14+ messages in thread
From: Eugenio Perez Martin @ 2024-04-05 15:04 UTC (permalink / raw)
  To: Jonah Palmer
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky

On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> > On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >>
> >>
> >> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>>>
> >>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>>>
> >>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>>>> The next VirtQueueElement added to the used ring must match this
> >>>>>> sequence number before additional elements can be safely added to the
> >>>>>> used ring. It's also particularly useful for helping find the number of
> >>>>>> new elements added to the used ring.
> >>>>>>
> >>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>>>> This value is essentially a counter where the value is assigned to a new
> >>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>>>> sequence number can be between 0 and 65,535.
> >>>>>>
> >>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >>>>>> the VirtQueueElement when it was created. This value must match with the
> >>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
> >>>>>> by the device.
> >>>>>>
> >>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >>>>>> ---
> >>>>>>     hw/virtio/virtio.c         | 18 ++++++++++++++++++
> >>>>>>     include/hw/virtio/virtio.h |  1 +
> >>>>>>     2 files changed, 19 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>> index fb6b4ccd83..069d96df99 100644
> >>>>>> --- a/hw/virtio/virtio.c
> >>>>>> +++ b/hw/virtio/virtio.c
> >>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>>>         uint16_t used_idx;
> >>>>>>         bool used_wrap_counter;
> >>>>>>
> >>>>>> +    /* In-Order sequence indices */
> >>>>>> +    uint16_t used_seq_idx;
> >>>>>> +    uint16_t current_seq_idx;
> >>>>>> +
> >>>>>
> >>>>> I'm having a hard time understanding the difference between these and
> >>>>> last_avail_idx and used_idx. It seems to me if we replace them
> >>>>> everything will work? What am I missing?
> >>>>>
> >>>>
> >>>> For used_seq_idx, it does work like used_idx except the difference is
> >>>> when their values get updated, specifically for the split VQ case.
> >>>>
> >>>> As you know, for the split VQ case, the used_idx is updated during
> >>>> virtqueue_split_flush. However, imagine a batch of elements coming in
> >>>> where virtqueue_split_fill is called multiple times before
> >>>> virtqueue_split_flush. We want to make sure we write these elements to
> >>>> the used ring in-order and we'll know its order based on used_seq_idx.
> >>>>
> >>>> Alternatively, I thought about replicating the logic for the packed VQ
> >>>> case (where this used_seq_idx isn't used) where we start looking at
> >>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >>>> element, but I wasn't sure how to handle the case where elements get
> >>>> used (written to the used ring) and new elements get put in used_elems
> >>>> before the used_idx is updated. Since this search would require us to
> >>>> always start at index vq->used_idx.
> >>>>
> >>>> For example, say, of three elements getting filled (elem0 - elem2),
> >>>> elem1 and elem0 come back first (vq->used_idx = 0):
> >>>>
> >>>> elem1 - not in-order
> >>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>>>            in-order, write elem0 and elem1 to used ring, mark elements as
> >>>>            used
> >>>>
> >>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> >>>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> >>>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
> >>>>
> >>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> >>>> could continue looking through the vq->used_elems array until we find an
> >>>> unused element... but then again how would we (1) know if the element is
> >>>> in-order and (2) know when to stop searching?
> >>>>
> >>>
> >>> Ok I think I understand the problem now. It is aggravated if we add
> >>> chained descriptors to the mix.
> >>>
> >>> We know that the order of used descriptors must be the exact same as
> >>> the order they were made available, leaving out in order batching.
> >>> What if we fill vq->used_elems at virtqueue_pop and then virtqueue_push just
> >>> marks them as used somehow? Two booleans (or flag) would do for a
> >>> first iteration.
> >>>
> >>> If we go with this approach I think used_elems should be renamed actually.
> >>>
> >>
> >> If I'm understanding correctly, I don't think adding newly created
> >> elements to vq->used_elems at virtqueue_pop will do much for us.
> >
> > By knowing what descriptor id must go in each position of the used ring.
> >
> > Following your example, let's say avail_idx is 10 at that moment.
> > Then, the driver makes available the three elements you mention, so:
> > used_elems[10] = elem0
> > used_elems[11] = elem1
> > used_elems[12] = elem2
> >
> > Now the device uses elem1. virtqueue_push can search linearly for
> > elem->index in used_elems[used_idx]...used_elems[avail_idx] range. As
> > the device is mis-behaving, no need to optimize this behavior.
> > used_elems[11].index == elem->index, so we mark it as used somehow.
> > Let's say we add a boolean to VirtQueueElement.
> >
> > virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
> > looking for used elements. At this moment used_elems[used_idx] is not
> > used so it stops there.
> >
> > Now elem0 is pushed. It is the first one in the
> > used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
> > to the used ring at fill. used_idx++. We use the rest of the
> > descriptors until we find one in used_elems that is not used, which is
> > used_elems[12].
> >
> > After that virtqueue_flush is called. At its scanning, used_elems[10]
> > is used, so it writes it to the used ring. After that, used_elems[11]
> > is also used, so it is written also. used_elems[12] is not used, so it
> > stops there.
> >
> > Finally, elem2 is pushed, so used_elems[12] is written.
> > virtqueue_flush detects it, so it writes it to the guest's used ring.
> >
> > Let me know what you think. I find it simpler than declaring new
> > indexes, but I may be wrong.
> >
>
> I think I see what you're getting at, but I just have a few clarifying
> questions about your proposal here.
>
> So you're proposing to add entries to used_elems at virtqueue_pop, ok.
>
> avail_idx = 10, then the driver makes some new entries (elems) available
> in the avail ring:
>
> used_elems[10] = elem0
> used_elems[11] = elem1
> used_elems[12] = elem2
>
> At this point, avail_idx = 13, used_idx = 10.
>
> elem1 gets used first, ok.
>
> Now, if I'm understanding correctly, you're saying that in
> virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan
> used_elems[used_idx] - used_elems[avail_idx] to find used_elems[i].index
> == elem->index and mark it as used, e.g. used_elems[i].used = true.
> Okay, now used_elems[11] has been marked as used.
>
> Now we make it to virtqueue_fill. What's the role you want
> virtqueue_fill to take here (if any)?
>

Sorry I meant virtqueue_fill here.

> You say that after we mark this element as used, we go to
> virtqueue_flush and scan for used elements in used_elems[used_idx] -
> used_elems[avail_idx]. Of course, the first one of this order will be in
> used_elems[used_idx], which is currently showing the element as unused,
> so we're done with this element for now.
>
> So, what exactly is the role of virtqueue_flush here? I'm inferring here
> that you want the virtqueue_flush role (for split VQs) to do both the
> writing to the used ring (normally done in virtqueue_fill) and
> updating the used_idx (normally done in virtqueue_flush). Is this correct?
>

I modelled this after the packed vq scenario, where all is updated at
_flush. But yes, I think you got it totally right.

> Next, elem0 gets used second.
>
> Again, in virtqueue_push we scan used_elems[used_idx] -
> used_elems[avail_idx] to find used_elems[i].index == elem->index and
> mark it as used. Okay, used_elems[10] has been marked as used.
>
> Then you say, "It is the first one in the used_elems[used_idx] -
> used_elems[avail_idx] range, so we can write it to the used ring at
> fill. used_idx++. We use the rest of the descriptors until we find one in
> used_elems that is not used, which is used_elems[12]."
>
> This, to me, sounds like "in virtqueue_fill, when we find an order (e.g.
> used_elems[used_idx].index == elem->index) write it to the used ring AND
> increment the used_idx. Keep writing elements to the used ring if
> used_elems[used_idx].used == true and, for each element being written,
> increment used_idx."
>
> This is a bit confusing to me since next you say "After that,
> virtqueue_flush is called. At its scanning, used_elems[10] is used, so
> it writes it to the used ring. After that, used_elems[11] is also used,
> so it is written also. used_elems[12] is not used, so it stops there."
>
> This sounds very similar to what you proposed for virtqueue_fill, except
> it looks like you're also saying to do this in virtqueue_flush, hence my
> confusion.
>
> If you wouldn't mind, could you clarify the roles of virtqueue_fill and
> virtqueue_flush here for me? Thanks :)!
>

I see how they're confusing if you follow the split vq way, sorry!
* _fill: Only update used_elems (or equivalent)
* _flush: Write information to used ring or descriptor ring.
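
In skeleton form, with today's signatures (details omitted, just to
show the split of responsibilities):

void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                    unsigned int len, unsigned int idx)
{
    /* in-order: bookkeeping only, i.e. find elem in used_elems and
     * mark it as used; nothing is written to the guest here */
}

void virtqueue_flush(VirtQueue *vq, unsigned int count)
{
    /* in-order: write the leading run of used used_elems entries to
     * the used ring (split) or descriptor ring (packed), then update
     * the guest-visible index */
}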

> > This makes it difficult to actually batch used descriptors. My
> > proposal is to address it in another series, by delegating it to the
> > caller and recovering proper usage of the virtqueue_push(idx) parameter.
> > The caller can specify either to batch as usual, or to delegate the
> > automatic (and potentially inefficient) ordering I'm proposing here.
> >
>
> Just to be clear, for this series, you'd like me to implement a solution
> that does *not* consider the case where virtqueue_fill is called
> multiple times before virtqueue_flush (and to put a solution for this in
> a separate series)?
>

No, it is supported. It is just not very efficient because of the linear search.

For it to be supported properly, the caller should pass a proper idx
to virtqueue_fill. But that requires modifications to all devices, so
I'm proposing to do it on top.
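
For reference, the batching pattern devices already use looks roughly
like this, with idx being the offset from used_idx at which the
element must land:

virtqueue_fill(vq, elem0, len0, 0);
virtqueue_fill(vq, elem1, len1, 1);
virtqueue_fill(vq, elem2, len2, 2);
virtqueue_flush(vq, 3);

With in-order, a correct idx from the caller would let us place each
element directly instead of searching linearly.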

> Are we not concerned that we might shoot ourselves in the foot here by
> implementing a process that may not work well for a batching solution,
> especially when we have an almost-working solution for batching and
> non-batching cases?
>
> >> We
> >> could just keep adding processed elements to vq->used_elems at
> >> virtqueue_fill but instead of:
> >>
> >> vq->used_elems[seq_idx].in_num = elem->in_num;
> >> vq->used_elems[seq_idx].out_num = elem->out_num;
> >>
> >> We could do:
> >>
> >> vq->used_elems[seq_idx].in_num = 1;
> >> vq->used_elems[seq_idx].out_num = 1;
> >>
> >> We'd use in_num and out_num as separate flags. in_num could indicate if
> >> this element has been written to the used ring while out_num could
> >> indicate if this element has been flushed (1 for no, 0 for yes). In
> >> other words, when we go to write to the used ring, start at index
> >> vq->used_idx and iterate through the used elements.
> >>
> >> If a used element's in_num and out_num == 0, then this element is
> >> invalid (not yet processed) and we stop the search.
> >>
> >> If a used element's in_num and out_num == 1, then this element is valid,
> >> written to the used ring, in_num is set to 0, and the search continues.
> >>
> >> Lastly, if a used element's in_num == 0 but out_num == 1, then this
> >> element has already been written to the used ring but not yet flushed,
> >> so ignore this element and continue searching.
> >>
> >> There should never be a case where in_num == 1 and out_num == 0.
> >>
> >> However, this would probably mean that before (or right after) we
> >> actually perform the flush we'll have to iterate through the used_elems
> >> array one more time and set their out_num's to 0 to indicate the element
> >> has been flushed.
> >>
> >> Again, this is just for the batched split VQ case where we have to keep
> >> track of elements that have been written but not flushed and elements
> >> that have been written and flushed, given that we don't know which
> >> elements have actually been written to the used ring until the used_idx
> >> is updated.
> >>
> >> This approach appears more costly though if we're really trying to avoid
> >> having this new used_seq_idx VirtQueue member.
> >>
> >>>> In any case, the use of this variable could be seen as an optimization
> >>>> as its value will tell us where to start looking in vq->used_elems
> >>>> instead of always starting at vq->used_idx.
> >>>>
> >>>> If this is like a one-shot scenario where one element gets written and
> >>>> then flushed after, then yes in this case used_seq_idx == used_idx.
> >>>>
> >>>> ------
> >>>>
> >>>> For current_seq_idx, this is pretty much just a counter. Every new
> >>>> VirtQueueElement created from virtqueue_pop is given a number and the
> >>>> counter is incremented. Like grabbing a ticket number and waiting for
> >>>> your number to be called. The next person to grab a ticket number will
> >>>> be your number + 1.
> >>>>
> >>>
> >>> So it's like last_avail_idx, isn't it?
> >>>
> >>
> >> For the split VQ case, pretty much. Though if we hit this case in
> >> virtqueue_split_pop, we may get into some trouble:
> >>
> >> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
> >>       goto done;
> >> }
> >>
> >
> > That's a fatal mistake and the device breaks, vdev->broken = true. It
> > cannot be used anymore from that point on, because of all the checks
> > of that variable.
> >
> > Does that solve the problem?
> >
>
> Ah, it does. My apologies, I should've recognized this would result in
> the device breaking.
>
> >> However for the packed VQ case, last_avail_idx might not work in the way
> >> we'd need it to for this implementation. In virtqueue_packed_pop, we see
> >> this:
> >>
> >> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
> >> vq->last_avail_idx += elem->ndescs;
> >>
> >> It would appear as though last_avail_idx is incremented by total number
> >> of descriptors associated with the element, which can be greater than 1.
> >> This implementation increments by 1 for each element.
> >>
> >> Actually... It's good you mentioned this because I think my packed VQ
> >> implementation is wrong. For packed VQs, vq->used_idx is incremented by
> >> the total number of descriptors in the flushed elements and not
> >> necessarily the number of elements being flushed like in the split VQ
> >> case. I'm adding elements to vq->used_elems in a per-element sequence
> >> rather than going by the number of descriptors an element holds, which
> >> should be the case for packed VQs.
> >>
> >
> > If you keep your proposed index I think you can increase it by one
> > per head, as heads are the entries that are written in both cases.
> > used_idx should already increment properly.
> >
> > If you move to my proposal, both should increase by elem->ndescs as
> > you suggest here.
> >
>
> Ack! Thanks!
>
> >>>> Let me know if I'm making any sense. Thanks :)
> >>>>
> >>>> Jonah
> >>>>
> >>>>>>         /* Last used index value we have signalled on */
> >>>>>>         uint16_t signalled_used;
> >>>>>>
> >>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>>>>>             elem->in_sg[i] = iov[out_num + i];
> >>>>>>         }
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>         vq->inuse++;
> >>>>>>
> >>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>>>>>         vq->shadow_avail_idx = vq->last_avail_idx;
> >>>>>>         vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>>     done:
> >>>>>>         address_space_cache_destroy(&indirect_desc_cache);
> >>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
> >>>>>>         vdev->vq[i].notification = true;
> >>>>>>         vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>>>>>         vdev->vq[i].inuse = 0;
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>         virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> >>>>>>     }
> >>>>>>
> >>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>>>>>         vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
> >>>>>>         vdev->vq[i].handle_output = handle_output;
> >>>>>>         vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>
> >>>>>>         return &vdev->vq[i];
> >>>>>>     }
> >>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >>>>>> index b3c74a1bca..910b2a3427 100644
> >>>>>> --- a/include/hw/virtio/virtio.h
> >>>>>> +++ b/include/hw/virtio/virtio.h
> >>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
> >>>>>>         hwaddr *out_addr;
> >>>>>>         struct iovec *in_sg;
> >>>>>>         struct iovec *out_sg;
> >>>>>> +    uint16_t seq_idx;
> >>>>>>     } VirtQueueElement;
> >>>>>>
> >>>>>>     #define VIRTIO_QUEUE_MAX 1024
> >>>>>> --
> >>>>>> 2.39.3
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC v2 1/5] virtio: Initialize sequence variables
  2024-04-05 15:04               ` Eugenio Perez Martin
@ 2024-04-05 15:37                 ` Jonah Palmer
  0 siblings, 0 replies; 14+ messages in thread
From: Jonah Palmer @ 2024-04-05 15:37 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: qemu-devel, mst, raphael, kwolf, hreitz, jasowang, pbonzini, fam,
	stefanha, qemu-block, schalla, leiyang, virtio-fs, si-wei.liu,
	boris.ostrovsky



On 4/5/24 11:04 AM, Eugenio Perez Martin wrote:
> On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
>>> On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
>>>>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
>>>>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>>>>>
>>>>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
>>>>>>>> structures. A VirtQueue's sequence variables are initialized when a
>>>>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
>>>>>>>> variable is initialized when a VirtQueueElement is being initialized.
>>>>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>>>>>>>>
>>>>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
>>>>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
>>>>>>>> The next VirtQueueElement added to the used ring must match this
>>>>>>>> sequence number before additional elements can be safely added to the
>>>>>>>> used ring. It's also particularly useful for helping find the number of
>>>>>>>> new elements added to the used ring.
>>>>>>>>
>>>>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
>>>>>>>> This value is essentially a counter where the value is assigned to a new
>>>>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
>>>>>>>> sequence number can be between 0 and 65,535.
>>>>>>>>
>>>>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
>>>>>>>> the VirtQueueElement when it was created. This value must match with the
>>>>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
>>>>>>>> by the device.
>>>>>>>>
>>>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>>>>>> ---
>>>>>>>>      hw/virtio/virtio.c         | 18 ++++++++++++++++++
>>>>>>>>      include/hw/virtio/virtio.h |  1 +
>>>>>>>>      2 files changed, 19 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>>>>>> index fb6b4ccd83..069d96df99 100644
>>>>>>>> --- a/hw/virtio/virtio.c
>>>>>>>> +++ b/hw/virtio/virtio.c
>>>>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
>>>>>>>>          uint16_t used_idx;
>>>>>>>>          bool used_wrap_counter;
>>>>>>>>
>>>>>>>> +    /* In-Order sequence indices */
>>>>>>>> +    uint16_t used_seq_idx;
>>>>>>>> +    uint16_t current_seq_idx;
>>>>>>>> +
>>>>>>>
>>>>>>> I'm having a hard time understanding the difference between these and
>>>>>>> last_avail_idx and used_idx. It seems to me if we replace them
>>>>>>> everything will work? What am I missing?
>>>>>>>
>>>>>>
>>>>>> For used_seq_idx, it does work like used_idx except the difference is
>>>>>> when their values get updated, specifically for the split VQ case.
>>>>>>
>>>>>> As you know, for the split VQ case, the used_idx is updated during
>>>>>> virtqueue_split_flush. However, imagine a batch of elements coming in
>>>>>> where virtqueue_split_fill is called multiple times before
>>>>>> virtqueue_split_flush. We want to make sure we write these elements to
>>>>>> the used ring in-order and we'll know its order based on used_seq_idx.
>>>>>>
>>>>>> Alternatively, I thought about replicating the logic for the packed VQ
>>>>>> case (where this used_seq_idx isn't used) where we start looking at
>>>>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
>>>>>> element, but I wasn't sure how to handle the case where elements get
>>>>>> used (written to the used ring) and new elements get put in used_elems
>>>>>> before the used_idx is updated. Since this search would require us to
>>>>>> always start at index vq->used_idx.
>>>>>>
>>>>>> For example, say, of three elements getting filled (elem0 - elem2),
>>>>>> elem1 and elem0 come back first (vq->used_idx = 0):
>>>>>>
>>>>>> elem1 - not in-order
>>>>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>>>>>>             in-order, write elem0 and elem1 to used ring, mark elements as
>>>>>>             used
>>>>>>
>>>>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
>>>>>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
>>>>>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>>>>>>
>>>>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
>>>>>> could continue looking through the vq->used_elems array until we find an
>>>>>> unused element... but then again how would we (1) know if the element is
>>>>>> in-order and (2) know when to stop searching?
>>>>>>
>>>>>
>>>>> Ok I think I understand the problem now. It is aggravated if we add
>>>>> chained descriptors to the mix.
>>>>>
>>>>> We know that the order of used descriptors must be the exact same as
>>>>> the order they were made available, leaving out in order batching.
>>>>> What if we fill vq->used_elems at virtqueue_pop and then virtqueue_push just
>>>>> marks them as used somehow? Two booleans (or flag) would do for a
>>>>> first iteration.
>>>>>
>>>>> If we go with this approach I think used_elems should be renamed actually.
>>>>>
>>>>
>>>> If I'm understanding correctly, I don't think adding newly created
>>>> elements to vq->used_elems at virtqueue_pop will do much for us.
>>>
>>> By knowing what descriptor id must go in each position of the used ring.
>>>
>>> Following your example, let's say avail_idx is 10 at that moment.
>>> Then, the driver makes available the three elements you mention, so:
>>> used_elems[10] = elem0
>>> used_elems[11] = elem1
>>> used_elems[12] = elem2
>>>
>>> Now the device uses elem1. virtqueue_push can search linearly for
>>> elem->index in used_elems[used_idx]...used_elems[avail_idx] range. As
>>> the device is mis-behaving, no need to optimize this behavior.
>>> used_elems[11].index == elem->index, so we mark it as used somehow.
>>> Let's say we add a boolean to VirtQueueElement.
>>>
>>> virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
>>> looking for used elements. At this moment used_elems[used_idx] is not
>>> used so it stops there.
>>>
>>> Now elem0 is pushed. It is the first one in the
>>> used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
>>> to the used ring at fill. used_idx++. We use the rest of the
>>> descriptors until we find one in used_elems that is not used, which is
>>> used_elems[12].
>>>
>>> After that, virtqueue_flush is called. During its scan, used_elems[10]
>>> is used, so it is written to the used ring. After that, used_elems[11]
>>> is also used, so it is written as well. used_elems[12] is not used, so
>>> the scan stops there.
>>>
>>> Finally, elem2 is pushed, so used_elems[12] is marked used.
>>> virtqueue_flush detects it, so it writes it to the guest's used ring.
>>>
>>> Let me know what you think. I find it simpler than declaring new
>>> indexes, but I may be wrong.
>>>
>>
>> I think I see where you're getting at, but I just have a few clarifying
>> questions about your proposal here.
>>
>> So you're proposing to add entries to used_elems at virtqueue_pop, ok.
>>
>> avail_idx = 10, then the driver makes some new entries (elems) available
>> in the avail ring:
>>
>> used_elems[10] = elem0
>> used_elems[11] = elem1
>> used_elems[12] = elem2
>>
>> At this point, avail_idx = 13, used_idx = 10.
>>
>> elem1 gets used first, ok.
>>
>> Now, if I'm understanding correctly, you're saying that in
>> virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan
>> used_elems[used_idx] - used_elems[avail_idx] to find used_elems[i].index
>> == elem->index and mark it as used, e.g. used_elems[i].used = true.
>> Okay, now used_elems[11] has been marked as used.
>>
>> Now we make it to virtqueue_fill. What's the role you want
>> virtqueue_fill to take here (if any)?
>>
> 
> Sorry I meant virtqueue_fill here.
> 
>> You say that after we mark this element as used, we go to
>> virtqueue_flush and scan for used elements in used_elems[used_idx] -
>> used_elems[avail_idx]. Of course, the first one in this order will be in
>> used_elems[used_idx], which is currently showing the element as unused,
>> so we're done with this element for now.
>>
>> So, what exactly is the role of virtqueue_flush here? I'm inferring
>> that you want virtqueue_flush (for split VQs) to do both the
>> writing to the used ring (normally done in virtqueue_fill) as well as
>> updating the used_idx (normally done in virtqueue_flush). Is this correct?
>>
> 
> I modelled this after the packed vq scenario, where everything is
> updated at _flush. But yes, I think you got it totally right.
> 
>> Next, elem0 gets used second.
>>
>> Again, in virtqueue_push we scan used_elems[used_idx] -
>> used_elems[avail_idx] to find used_elems[i].index == elem->index and
>> mark it as used. Okay, used_elems[10] has been marked as used.
>>
>> Then you say, "It is the first one in the used_elems[used_idx] -
>> used_elems[avail_idx] range, so we can write it to the used ring at
>> fill. used_idx++. We use the rest of the descriptors until we find one in
>> used_elems that is not used, which is used_elems[12]."
>>
>> This, to me, sounds like "in virtqueue_fill, when we find an order (e.g.
>> used_elems[used_idx].index == elem->index) write it to the used ring AND
>> increment the used_idx. Keep writing elements to the used ring if
>> used_elems[used_idx].used == true and, for each element being written,
>> increment used_idx."
>>
>> This is a bit confusing to me since next you say "After that,
>> virtqueue_flush is called. During its scan, used_elems[10] is used, so
>> it is written to the used ring. After that, used_elems[11] is also used,
>> so it is written as well. used_elems[12] is not used, so the scan stops
>> there."
>>
>> This sounds very similar to what you proposed for virtqueue_fill, except
>> it looks like you're also saying to do this in virtqueue_flush, hence my
>> confusion.
>>
>> If you wouldn't mind, could you clarify the roles of virtqueue_fill and
>> virtqueue_flush here for me? Thanks :)!
>>
> 
> I see how they're confusing when following the split vq way, sorry!
> * _fill: Only update used_elems (or equivalent)
> * _flush: Write information to the used ring or descriptor ring.
> 

Thank you for the clarification, Eugenio! This makes sense to me now :)
I'll get started on a v3 RFC.

Or should I send this next one as a patch?
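
For the v3, here's a rough C sketch of how I'm reading the split VQ
flow, with _fill only updating used_elems and _flush being the only
place that writes the guest-visible ring. The in_order_used flag and
both function names are made up for illustration, and I'm assuming
virtqueue_pop already stored each element at
used_elems[last_avail_idx % vring.num]; vring_used_write(),
vring_used_idx_set(), VRingUsedElem, and shadow_avail_idx are the
existing ones in hw/virtio/virtio.c:

    /* _fill: mark the element as used in its pre-assigned slot */
    static void virtqueue_split_fill_in_order(VirtQueue *vq,
                                              const VirtQueueElement *elem,
                                              unsigned int len)
    {
        uint16_t i;

        /* Linear search from used_idx to shadow_avail_idx; only a
         * device completing out of order pays for a long scan. */
        for (i = vq->used_idx; i != vq->shadow_avail_idx; i++) {
            VirtQueueElement *e = &vq->used_elems[i % vq->vring.num];

            if (e->index == elem->index) {
                e->len = len;            /* remember the written length */
                e->in_order_used = true; /* hypothetical flag */
                break;
            }
        }
    }

    /* _flush: write out the leading run of used elements */
    static void virtqueue_split_flush_in_order(VirtQueue *vq)
    {
        uint16_t idx = vq->used_idx;

        for (;;) {
            VirtQueueElement *e = &vq->used_elems[idx % vq->vring.num];
            VRingUsedElem uelem;

            if (!e->in_order_used) {
                break;              /* gap: wait for this element */
            }
            uelem.id = e->index;
            uelem.len = e->len;
            vring_used_write(vq, &uelem, idx % vq->vring.num);
            e->in_order_used = false;
            idx++;
        }
        vring_used_idx_set(vq, idx);
    }

Replaying your elem0-elem2 example: _fill(elem1) only sets the flag and
_flush writes nothing, since used_elems[10] is still unused; _fill(elem0)
sets its flag and the next _flush writes slots 10 and 11, stopping at 12;
_fill(elem2) plus a final _flush writes slot 12.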

>>> This makes it difficult to actually batch used descriptors. My
>>> proposal is to address it in another series, by delegating it to the
>>> caller and recovering proper usage of the virtqueue_push(idx) parameter.
>>> The caller can specify either to batch as usual, or to delegate the
>>> automatic (and potentially inefficient) ordering I'm proposing here.
>>>
>>
>> Just to be clear, for this series, you'd like me to implement a solution
>> that does *not* consider the case where virtqueue_fill is called
>> multiple times before virtqueue_flush (and to put a solution for this in
>> a separate series)?
>>
> 
> No, it is supported. It is just not very efficient because of the linear search.
> 
> For it to be fully supported, the caller should pass a proper idx to
> virtqueue_fill. But that requires modifications to all
> devices, so I'm proposing to do it on top.
> 

Ahh I see now. Thank you for clarifying that!
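
Just to write down my understanding of the interim vs. on-top states:
virtqueue_fill() already takes an idx that is an offset from used_idx,
so from a device's point of view the difference would look roughly like
this (a sketch of the usage, not taken from the actual patches):

    /* interim: in-order handled internally via the linear search */
    virtqueue_push(vq, elem, len);

    /* on top: the caller batches explicitly with a meaningful idx */
    virtqueue_fill(vq, elem0, len0, 0);
    virtqueue_fill(vq, elem1, len1, 1);
    virtqueue_flush(vq, 2);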

Thanks again!

Jonah

>> Are we not concerned that we might shoot ourselves in the foot here by
>> implementing a process that may not work well for a batching solution,
>> especially when we have an almost-working solution for batching and
>> non-batching cases?
>>
>>>> We
>>>> could just keep adding processed elements to vq->used_elems at
>>>> virtqueue_fill but instead of:
>>>>
>>>> vq->used_elems[seq_idx].in_num = elem->in_num;
>>>> vq->used_elems[seq_idx].out_num = elem->out_num;
>>>>
>>>> We could do:
>>>>
>>>> vq->used_elems[seq_idx].in_num = 1;
>>>> vq->used_elems[seq_idx].out_num = 1;
>>>>
>>>> We'd use in_num and out_num as separate flags. in_num could indicate if
>>>> this element has been written to the used ring while out_num could
>>>> indicate if this element has been flushed (1 for no, 0 for yes). In
>>>> other words, when we go to write to the used ring, start at index
>>>> vq->used_idx and iterate through the used elements.
>>>>
>>>> If a used element's in_num and out_num are both 0, then this element is
>>>> invalid (not yet processed) and we stop the search.
>>>>
>>>> If a used element's in_num and out_num are both 1, then this element is valid,
>>>> written to the used ring, in_num is set to 0, and the search continues.
>>>>
>>>> Lastly, if a used element's in_num == 0 but out_num == 1, then this
>>>> element has already been written to the used ring but not yet flushed,
>>>> so ignore this element and continue searching.
>>>>
>>>> There should never be a case where in_num == 1 and out_num == 0.
>>>>
>>>> However, this would probably mean that before (or right after) we
>>>> actually perform the flush we'll have to iterate through the used_elems
>>>> array one more time and set each element's out_num to 0 to indicate
>>>> that it has been flushed.
>>>>
>>>> Again, this is just for the batched split VQ case where we have to keep
>>>> track of elements that have been written but not flushed and elements
>>>> that have been written and flushed, given that we don't know which
>>>> elements have actually been written to the used ring until the used_idx
>>>> is updated.
>>>>
>>>> This approach appears more costly though if we're really trying to avoid
>>>> having this new used_seq_idx VirtQueue member.
>>>>
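
(Spelling the in_num/out_num scheme above out as a loop, purely for
comparison with the boolean flag — illustrative only, reusing the state
encoding described in the quoted text:

    uint16_t i;

    /* 1,1 = filled, not yet written; 0,1 = written, not yet flushed;
     * 0,0 = not yet processed, so the search stops */
    for (i = vq->used_idx; ; i++) {
        VirtQueueElement *e = &vq->used_elems[i % vq->vring.num];

        if (e->in_num == 0 && e->out_num == 0) {
            break;
        }
        if (e->in_num == 1 && e->out_num == 1) {
            /* write e to the used ring here */
            e->in_num = 0;
        }
        /* 0,1: already on the used ring, keep scanning */
    }

)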
>>>>>> In any case, the use of this variable could be seen as an optimization
>>>>>> as its value will tell us where to start looking in vq->used_elems
>>>>>> instead of always starting at vq->used_idx.
>>>>>>
>>>>>> If this is like a one-shot scenario where one element gets written and
>>>>>> then flushed right after, then yes, in this case used_seq_idx == used_idx.
>>>>>>
>>>>>> ------
>>>>>>
>>>>>> For current_seq_idx, this is pretty much just a counter. Every new
>>>>>> VirtQueueElement created from virtqueue_pop is given a number and the
>>>>>> counter is incremented. Like grabbing a ticket number and waiting for
>>>>>> your number to be called. The next person to grab a ticket number will
>>>>>> be your number + 1.
>>>>>>
>>>>>
>>>>> So it's like last_avail_idx, isn't it?
>>>>>
>>>>
>>>> For the split VQ case, pretty much. Though if we hit this case in
>>>> virtqueue_split_pop, we may get into some trouble:
>>>>
>>>> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
>>>>        goto done;
>>>> }
>>>>
>>>
>>> That's a fatal mistake and the device breaks (vdev->broken = true). It
>>> cannot be used anymore from that point on, because of all the checks
>>> on that variable.
>>>
>>> Does that solve the problem?
>>>
>>
>> Ah, it does. My apologies, I should've recognized this would result in
>> the device breaking.
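
(For reference, virtqueue_get_head() already catches this: on an
out-of-range head it calls virtio_error(), which sets vdev->broken, so
every later pop/fill/flush bails out. Roughly, from memory:

    if (*head >= vq->vring.num) {
        virtio_error(vq->vdev, "Guest says index %u is available", *head);
        return false;
    }

so the in-order bookkeeping can't run past a broken device.)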
>>
>>>> However for the packed VQ case, last_avail_idx might not work in the way
>>>> we'd need it to for this implementation. In virtqueue_packed_pop, we see
>>>> this:
>>>>
>>>> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
>>>> vq->last_avail_idx += elem->ndescs;
>>>>
>>>> It would appear as though last_avail_idx is incremented by the total number
>>>> of descriptors associated with the element, which can be greater than 1.
>>>> This implementation increments by 1 for each element.
>>>>
>>>> Actually... It's good you mentioned this because I think my packed VQ
>>>> implementation is wrong. For packed VQs, vq->used_idx is incremented by
>>>> the total number of descriptors in the flushed elements and not
>>>> necessarily the number of elements being flushed like in the split VQ
>>>> case. I'm adding elements to vq->used_elems in a per-element sequence
>>>> rather than going by the number of descriptors an element holds, which
>>>> should be the case for packed VQs.
>>>>
>>>
>>> If you keep your proposed index, I think you can increase it by one
>>> per head, as heads are the entries that are written in both cases.
>>> used_idx should increment properly already.
>>>
>>> If you move to my proposal, both should increase by elem->ndescs as
>>> you suggest here.
>>>
>>
>> Ack! Thanks!
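
(Noting this down for the v3: with your proposal, the packed VQ
bookkeeping advances by descriptor count rather than per element.
Something like the below, where `slot` is a stand-in for whichever
index we end up keeping:

    /* at pop: store the element, then skip its descriptors */
    vq->used_elems[slot % vq->vring.num] = *elem;
    slot += elem->ndescs;

    /* at flush: advance used_idx by the same amount per element */
    vq->used_idx += elem->ndescs;

)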
>>
>>>>>> Let me know if I'm making any sense. Thanks :)
>>>>>>
>>>>>> Jonah
>>>>>>
>>>>>>>>          /* Last used index value we have signalled on */
>>>>>>>>          uint16_t signalled_used;
>>>>>>>>
>>>>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>>>>>>>              elem->in_sg[i] = iov[out_num + i];
>>>>>>>>          }
>>>>>>>>
>>>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          vq->inuse++;
>>>>>>>>
>>>>>>>>          trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>>>>>>>          vq->shadow_avail_idx = vq->last_avail_idx;
>>>>>>>>          vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>>>>>
>>>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>>>>      done:
>>>>>>>>          address_space_cache_destroy(&indirect_desc_cache);
>>>>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>>>>>>>>          vdev->vq[i].notification = true;
>>>>>>>>          vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>>>>>>>>          vdev->vq[i].inuse = 0;
>>>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>>>          virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>>>>>>>>      }
>>>>>>>>
>>>>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>>>>>>>          vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>>>>>>>>          vdev->vq[i].handle_output = handle_output;
>>>>>>>>          vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
>>>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>>>
>>>>>>>>          return &vdev->vq[i];
>>>>>>>>      }
>>>>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>>>>>>> index b3c74a1bca..910b2a3427 100644
>>>>>>>> --- a/include/hw/virtio/virtio.h
>>>>>>>> +++ b/include/hw/virtio/virtio.h
>>>>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>>>>>>>>          hwaddr *out_addr;
>>>>>>>>          struct iovec *in_sg;
>>>>>>>>          struct iovec *out_sg;
>>>>>>>> +    uint16_t seq_idx;
>>>>>>>>      } VirtQueueElement;
>>>>>>>>
>>>>>>>>      #define VIRTIO_QUEUE_MAX 1024
>>>>>>>> --
>>>>>>>> 2.39.3
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

end of thread

Thread overview: 14+ messages
2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
2024-04-03 10:18   ` Eugenio Perez Martin
2024-04-03 16:51     ` Jonah Palmer
2024-04-04 11:35       ` Eugenio Perez Martin
2024-04-04 14:41         ` Jonah Palmer
2024-04-04 16:33           ` Eugenio Perez Martin
2024-04-05 13:58             ` Jonah Palmer
2024-04-05 15:04               ` Eugenio Perez Martin
2024-04-05 15:37                 ` Jonah Palmer
2024-03-28 16:22 ` [RFC v2 2/5] virtio: In-order support for split VQs Jonah Palmer
2024-03-28 16:22 ` [RFC v2 3/5] virtio: In-order support for packed VQs Jonah Palmer
2024-03-28 16:22 ` [RFC v2 4/5] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
2024-03-28 16:22 ` [RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
